Re: wget: unable to resolve host address
Given that RFCs 3490-3492 came out in 2003 and RFCs 5890-5895 came out in 2010, I would have expected IDNA support by now. Does anybody know for sure?

From: Bug-wget on behalf of pythonomor...@gmail.com
Sent: Tuesday, February 8, 2022 1:26 PM
To: bug-wget@gnu.org
Subject: wget: unable to resolve host address

Hello,

I am trying to download from a list of files (JPEG images). The website uses Cyrillic in its URL. I get the following error message:

wget: unable to resolve host address 'xn--h-xubc'

I've checked the links manually and they do work. I am enclosing a shortened version of the file list. I've tried different commands to no avail:

wget.exe -i C:\dl_files\url-list.txt --secure-protocol=auto --remote-encoding=Windows-1251 -nc -c -P C:\dl_files\

I used Windows-1251 because I did not see a list of encoding names in the manual:
https://www.gnu.org/software/wget/manual/wget.html#Wgetrc-Commands

wget.exe -i C:\dl_files\url-list.txt --secure-protocol=auto -nc -c -P C:\dl_files\

Apparently the problem is caused by the Cyrillic characters. I have an inkling that I am not using the correct options for the program. I would appreciate a hint on how to solve the problem.

Regards,
Max
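For readers unfamiliar with the "xn--" prefix in the error message: it marks an IDNA A-label, the ASCII (Punycode, RFC 3492) form of a non-ASCII host label. The following is a minimal Python sketch of that conversion using the standard library's "idna" codec, which implements the 2003 rules (RFC 3490). The host name is a made-up example, not the one from the report, and the sketch says nothing about how any particular wget build performs the conversion.

```python
# Hypothetical host name, chosen only for illustration.
host = "пример.example"

# Python's built-in "idna" codec applies the IDNA 2003 ToASCII operation
# label by label; non-ASCII labels receive the "xn--" ACE prefix,
# while pure-ASCII labels such as "example" pass through unchanged.
ascii_host = host.encode("idna")
print(ascii_host)

# Decoding reverses the conversion and recovers the Unicode form.
print(ascii_host.decode("idna"))
```

Comparing the A-label that a tool reports against one produced this way can help show whether the host name was converted from the intended Unicode string or from bytes read in the wrong encoding.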
RE: [bug #58354] Wget doesn't parse URIs starting with http:/
I've got code for parsing broken URLs at http://mason.gmu.edu/~smetz3/source/unobfuscate.zip if that's of any use to you.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

From: Bug-wget [bug-wget-bounces+smetz3=gmu@gnu.org] on behalf of Luca Bernardi [invalid.nore...@gnu.org]
Sent: Tuesday, May 12, 2020 6:57 AM
To: Luca Bernardi; gscriv...@gnu.org; tim.rueh...@gmx.de; bug-wget@gnu.org; dar...@gnu.org
Subject: [bug #58354] Wget doesn't parse URIs starting with http:/

Follow-up Comment #1, bug #58354 (project wget):

PS: This bug happened when trying to crawl a website with the default WordPress template.

___

Reply to this item at:

<https://savannah.gnu.org/bugs/?58354>

___
Message sent via Savannah
https://savannah.gnu.org/
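As a rough illustration of what repairing a URI that starts with "http:/" could look like, here is a minimal Python sketch. It is neither the code in unobfuscate.zip nor anything wget does; `repair_scheme` is a hypothetical helper, and it rests on the assumption that the first path segment was meant to be the host. Strictly speaking, "http:/path" is a syntactically valid URI with no authority, so this is a heuristic, not a parsing rule.

```python
import re
from urllib.parse import urlsplit

# Matches an http/https scheme followed by exactly one slash.
_SINGLE_SLASH = re.compile(r"^(https?):/(?!/)", re.IGNORECASE)


def repair_scheme(url: str) -> str:
    """Turn 'http:/host/path' into 'http://host/path'; leave other URLs alone."""
    return _SINGLE_SLASH.sub(r"\1://", url)


# Hypothetical example link of the kind a crawler might encounter.
fixed = repair_scheme("http:/example.com/wp-content/style.css")
print(fixed)
print(urlsplit(fixed).netloc)
```

URLs that already contain "://" are returned unchanged, so the helper can be applied to every extracted link before it is parsed.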
RE: [bug #57884] wget reveals my operating system to the server
Which raises far more serious security concerns than reporting browser capabilities.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

From: Bug-wget [bug-wget-bounces+smetz3=gmu@gnu.org] on behalf of Bruno Haible [br...@clisp.org]
Sent: Monday, February 24, 2020 6:42 AM
To: ge...@mweb.co.za; Tim Ruehsen
Cc: bug-wget
Subject: Re: [bug #57884] wget reveals my operating system to the server

ge...@mweb.co.za wrote:
> I wonder about the reason given: "To avoid compatibility issues."
> That was - if I recall correctly - the reason for having the string
> to start with: So that servers can format pages to suit the capabilities
> of the browser and version used.

That was how web applications were written 15-20 years ago. 10 years ago, the browser capabilities were queried by the JavaScript toolkit [1][2]. Nowadays, they prefer feature detection in JavaScript.

Bruno

[1] https://dojotoolkit.org/reference-guide/1.7/quickstart/browser-sniffing.html
[2] https://api.jquery.com/jQuery.browser/
Re: [bug #57356] Don't use smart quotes in output messages
If the code tailors the delimiters to the locale then I see nothing wrong with “text” or ‘text’. OTOH, if it's hardwired then I agree that only ASCII characters should be used.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

From: Bug-wget on behalf of anonymous
Sent: Wednesday, December 4, 2019 10:03 AM
To: gscriv...@gnu.org; tim.rueh...@gmx.de; bug-wget@gnu.org; dar...@gnu.org
Subject: [bug #57356] Don't use smart quotes in output messages

URL: <https://savannah.gnu.org/bugs/?57356>

Summary: Don't use smart quotes in output messages
Project: GNU Wget
Submitted by: None
Submitted on: Wed 04 Dec 2019 03:03:56 PM UTC
Category: User Interface
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: trunk
Operating System: None
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None

___

Details:

The redirect message _Redirecting output to ‘wget-log’_ uses Unicode smart quotes.

Expected: use normal typed ' ' quote characters. Programs used by programmers / technical users should NEVER display smart quotes.

1.20.3 homebrew
macOS 10.14.6

___

Reply to this item at:

<https://savannah.gnu.org/bugs/?57356>
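The distinction drawn in the reply, delimiters tailored to the locale versus hardwired ones, can be sketched in Python as follows. This is an illustration only and not how wget chooses its quotes; `quote` is a hypothetical helper that assumes the caller knows the output encoding, and it falls back to ASCII apostrophes whenever that encoding cannot represent the curly quotes.

```python
def quote(text: str, encoding: str) -> str:
    """Wrap text in curly quotes if `encoding` can represent them, else ASCII."""
    try:
        # Probe whether the target encoding can encode both quote characters.
        "\u2018\u2019".encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # Unencodable characters or an unknown encoding name: stay with ASCII.
        return f"'{text}'"
    return f"\u2018{text}\u2019"


print(quote("wget-log", "utf-8"))
print(quote("wget-log", "ascii"))
```

In a real program the encoding argument could come from `locale.getpreferredencoding(False)`, so that a UTF-8 terminal gets typographic quotes while a C/POSIX locale gets plain ASCII.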
Re: [Bug-wget] Standard cookie file extension
RFC 6265 does not define cookie files; the way that a browser stores cookies is up to the browser.

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

From: Bug-wget on behalf of Peng Yu
Sent: Wednesday, October 23, 2019 9:49 AM
To: bug-wget
Subject: [Bug-wget] Standard cookie file extension

Hi,

I am wondering if there is a standard file extension for cookie files written by wget. So far, I have only seen filenames like cookie.txt and cookies.txt, so the extension is just .txt. But .txt is not specific to cookie files. I'd like an extension dedicated to cookie files for disambiguation purposes.

Thanks.

--
Regards,
Peng