wget input/output using stdin/stdout
Greetings, I have a program that loads and executes wget using the command line: wget -i - -O -, and dups wget's stdin, stdout (and stderr) handles so that I can write URLs to wget's stdin and read the responses from wget's stdout. What I wanted to do was write a sequence of URLs to wget's stdin, reading each response before the next URL is sent. Instead, wget buffers its output so that it doesn't output anything until I close its stdin. As a result, it seems that I can only send all of the URLs to wget, close its stdin, and then read all of the responses. Is there any wget command line option that will cause wget to output a response after each URL without waiting for me to close its stdin? Thanks! Dan
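For reference, a minimal POSIX sketch of the setup described above (wget spawned with both stdin and stdout attached to pipes); the URL is a placeholder and error handling is omitted:

```
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int in_pipe[2], out_pipe[2];   /* in_pipe: us -> wget's stdin */
    pipe(in_pipe);                 /* out_pipe: wget's stdout -> us */
    pipe(out_pipe);

    pid_t pid = fork();
    if (pid == 0) {                /* child: become wget */
        dup2(in_pipe[0], STDIN_FILENO);
        dup2(out_pipe[1], STDOUT_FILENO);
        close(in_pipe[1]);
        close(out_pipe[0]);
        execlp("wget", "wget", "-q", "-i", "-", "-O", "-", (char *)NULL);
        _exit(127);
    }
    close(in_pipe[0]);
    close(out_pipe[1]);

    /* Write one URL, then try to read its response.  As described
       above, wget typically does not flush until stdin is closed. */
    const char *url = "https://example.com/\n";
    write(in_pipe[1], url, strlen(url));
    close(in_pipe[1]);

    char buf[4096];
    ssize_t n;
    while ((n = read(out_pipe[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, n, stdout);
    waitpid(pid, NULL, 0);
    return 0;
}
```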
Integer overflows in parse_content_range() and gethttp()
Security Vulnerability Report

File: src/http.c
Functions: parse_content_range() and gethttp()
Vulnerability Type: Integer Overflow
Location: Lines 936, 942, 955 and 3739
Severity: High

Description: In the parse_content_range() function, at lines 936, 942 and 955, there is an integer overflow vulnerability. It arises from the calculation of the variable num:

num = 10 * num + (*hdr - '0');

Due to the lack of validation, both the multiplication and the addition can overflow and lead to unexpected behavior. Furthermore, similarly to curl/curl#12983 (https://github.com/curl/curl/issues/12983), at line 3739 in gethttp(), the calculation of the contlen variable can also overflow:

contlen = last_byte_pos - first_byte_pos + 1;

Exploitation Scenario: An attacker may craft a malicious request with carefully chosen values in the Content-Range header, triggering an integer overflow during the calculation of num and contlen. Depending on how these variables are subsequently used, this could lead to various security issues, such as memory corruption, buffer overflows, or other unexpected behavior.

Impact: The impact of this vulnerability could be severe, potentially leading to:
- Memory Corruption: If the calculated num and contlen values are used to allocate memory or to copy data, an integer overflow could corrupt memory, leading to crashes or arbitrary code execution.
- Security Bypass: Where num and contlen are used to enforce boundaries or permissions, an attacker may exploit the overflow to bypass security checks or gain unauthorized access to sensitive resources.
- Denial of Service (DoS): A carefully crafted request exploiting the overflow could push the application into an unexpected state or consume excessive resources.

Recommendations:
- Bounds Checking: Ensure that the values of num and contlen are within acceptable ranges before performing calculations.
- Safe Arithmetic Operations: Use safer arithmetic operations or alternative calculation methods to prevent integer overflows, especially when dealing with large or close-to-boundary values.
- Input Validation: Validate input parameters against expected ranges and constraints before performing calculations.
- Error Handling: Handle gracefully the scenarios where input parameters result in unexpected or invalid calculations.

Severity Justification: Exploitation of this vulnerability could lead to severe consequences, including memory corruption, security bypass, or denial of service.

Affected Versions: This vulnerability affects all versions of the application that include the vulnerable parse_content_range() and gethttp() functions.

References:
- OWASP: Integer Overflow
- CWE-190: Integer Overflow or Wraparound
- CERT Secure Coding: INT32-C

Conclusion: The integer overflows at lines 936, 942 and 955 of parse_content_range() and at line 3739 of gethttp() pose a high risk to the security and stability of the application.
It is imperative to address this vulnerability promptly by implementing appropriate bounds checking and error handling mechanisms to prevent potential exploitation and associated security risks.
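The report's recommended fix can be sketched in C. A minimal illustration, assuming a signed 64-bit accumulator like wget's wgint (this is not the actual wget code):

```
#include <stdbool.h>
#include <stdint.h>

#define WGINT_MAX INT64_MAX
typedef int64_t wgint;

/* Accumulate decimal digits, refusing to wrap instead of overflowing. */
static bool parse_number(const char **hdr, wgint *out)
{
    const char *p = *hdr;
    wgint num = 0;

    if (*p < '0' || *p > '9')
        return false;
    for (; *p >= '0' && *p <= '9'; p++) {
        int digit = *p - '0';
        /* Would 10 * num + digit exceed WGINT_MAX? */
        if (num > (WGINT_MAX - digit) / 10)
            return false;
        num = 10 * num + digit;
    }
    *hdr = p;
    *out = num;
    return true;
}

/* Similarly, before computing contlen = last_byte_pos - first_byte_pos + 1,
   one can require first_byte_pos <= last_byte_pos and
   last_byte_pos - first_byte_pos <= WGINT_MAX - 1. */
```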
Wget fails to download some URLs from www.investing.com
Hi guys, This is not so much a bug as requests being blocked by the Cloudflare server. Check out: https://www.investing.com/crypto/bitcoin/btc-usd-historical-data The URL works in Firefox, but fails to download using cURL or Wget. I have tried various user-agent strings, so that is not the problem. The following URL works for Firefox, cURL and Wget: https://www.investing.com/equities/lloyds-banking-grp-historical-data Kind regards, Chris Smith
[Feature Request] Add a short option for --content-disposition
Nowadays it seems increasingly common to find a file that is not being hosted where it's actually stored, presumably for access control, and it seems to make no sense to have to type out --content-disposition in full when a single-letter flag would do.
Re: Rejecting 'index.html*' files causes recursion to include parent-directories
Ok here's what worked: wget -P dir -r -R 'index.html*' -R '..' -nH -np --cut-dirs 3 https://site.org/X/Y/Z Can anyone tell me why the behavior was happening in the first place, though? Why would excluding "index.html" cause recursion into the parent directories, when that had been disabled?
Rejecting 'index.html*' files causes recursion to include parent-directories
I'm running wget version 1.20.3 (and earlier) using this command line: wget -P dir -r -nH -np --cut-dirs 3 https://svn.site.org/X/Y/Z to retrieve the contents of the remote directory "Z" into local directory "dir". This works fine, except that I also get "index.html" files in all the subdirectories, which I don't want. Yeah, I know I can delete them afterward, but is there a way to just filter them out in the first place? If I try this form: wget -P dir -r -R 'index.html*' -nH -np --cut-dirs 3 https://site.org/X/Y/Z I find that it's downloading subdirectories from the parent levels as well, even though I set the -np parameter.
Wget recursive option not working correctly with scheme relative URLs
Hello, I have part of a website (`example.com/index.html`) I want to mirror which contains scheme-relative URLs (`//otherexample.com/image.png`). Trying to download these with the -r flag results in wget converting them to a wrong URL (`example.com//otherexample.com`). So using `wget -r example.com/index.html` will produce links like `https://example.com/index.html//otherexample.com/image.png` in the output. Using the debug flag reveals this: `merge("example.com/index.html", "//otherexample.com/image.png") -> https://example.com/index.html//otherexample.com/image.png`
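For comparison, RFC 3986 (section 5.3) treats a reference beginning with "//" as a network-path reference that should take only the scheme from the base URL. A rough sketch of that rule (resolve_ref is a hypothetical helper, not wget's actual merge code):

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Resolve a network-path reference ("//host/path") against a base URL
   by keeping only the base's scheme.  Other cases are omitted. */
static char *resolve_ref(const char *base, const char *ref)
{
    if (strncmp(ref, "//", 2) == 0) {
        const char *colon = strchr(base, ':');
        size_t scheme_len = colon ? (size_t)(colon - base) + 1 : 0;
        char *out = malloc(scheme_len + strlen(ref) + 1);
        memcpy(out, base, scheme_len);   /* e.g. "https:" */
        strcpy(out + scheme_len, ref);   /* + "//otherexample.com/..." */
        return out;
    }
    return strdup(ref);   /* absolute and relative cases not handled here */
}

int main(void)
{
    char *u = resolve_ref("https://example.com/index.html",
                          "//otherexample.com/image.png");
    puts(u);   /* prints https://otherexample.com/image.png */
    free(u);
    return 0;
}
```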
Wget - Sync remote site with dynamic URLs that contain tokens?
Hello, First of all, apologies: this doesn't quite fit the category of a bug, because I can't figure out whether it's possible or an ever-intended use of wget. I reviewed the man page and did some searching, but didn't find a solution to the problem. I was interested in leveraging wget to keep a remote list of files in sync. I construct a file list on the fly that I'm feeding to wget, and a location I'm downloading files to: wget -nc -a $logfile -i $filePath/filesynclist.txt -P $folderSyncPath The issue is that these are coming from a CDN and are signed URLs. So they come with a temporary token value appended onto the file name, for example: installer.pkg?token=088817451a9c490093t5ob22eyaorncifukx However, the next time those URLs are looked up, that token value may change. Thus, that file list will also change, because the old token value is not present/valid any longer. Ultimately, wget needs the token value to access the file, verify its presence, see if there has been a change, and download it, but because the token value itself is dynamic, the content is downloaded over and over again. Is it possible to tell wget to ignore the token for the purposes of the comparison with the existing file list (e.g., wget looks up the full URL, but then removes the token string and checks whether that matches something locally downloaded, as sketched below), and only download if there is no matching "base URL string"? This is quite a complex problem. Thank you for the help. Casey Jensen
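A sketch of the comparison rule proposed above: treat two URLs as naming the same file when they match after the query string is stripped. This is a hypothetical helper, not an existing wget feature:

```
#include <stdbool.h>
#include <string.h>

/* Compare two URLs, ignoring everything from '?' onwards. */
static bool same_base_url(const char *a, const char *b)
{
    size_t la = strcspn(a, "?");   /* prefix length up to '?' or end */
    size_t lb = strcspn(b, "?");
    return la == lb && strncmp(a, b, la) == 0;
}

/* same_base_url("installer.pkg?token=abc", "installer.pkg?token=xyz")
   returns true, so the file would not be fetched again. */
```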
Wget2 can't mirror a website
Dear devs, I was trying to use wget2 to archive websites, since wget1 doesn't support resource sets. I can't get it to work as expected; whether it's a bug or simple user ignorance I can't tell, so I would appreciate some support. The tests were done on Ubuntu using: GNU Wget 1.21.2 built on linux-gnu, and GNU Wget2 1.99.1 (multithreaded metalink/file/website downloader). For example, wget1 will retrieve this small website (~650 MB), just missing some resource sets: wget \ --recursive \ --no-clobber \ --page-requisites \ --html-extension \ --convert-links \ --domains copetti.org \ www.copetti.org/ But wget2 will just download a couple of HTML pages and stop (~0.5 MB). I have looked at the options, since they are not the same, and added --mirror: wget2 \ --mirror \ --recursive \ --no-clobber \ --page-requisites \ --html-extension \ --convert-links \ --domains copetti.org \ www.copetti.org/ Any guidance would be greatly appreciated. If there is any other kind of channel more appropriate for my support request (IRC, another mailing list, etc.), let me know. Best, Retromouse
unsubscribe
On Friday, February 10, 2023, 12:01:36 p.m. GMT-5, wrote: Date: Fri, 10 Feb 2023 14:20:03 +0100 From: Stephane Ascoet To: bug-wget@gnu.org Subject: Re: Download webpages for offline viewing but get PAGE NOT FOUND 404 in browser later > "File"/"Open" by the browser does not work. The browser display the proper webpage for about a second, then the browser display: PAGE NOT FOUND 404 Out of nothing, something. Hi, Wget isn't guilty. I've saved it from Firefox and the same thing happens... except if there is no Internet access... so I'm pretty sure it's one of the numerous scripts of this bloated Website... -- Sincerely, Stephane Ascoet
Please use gzip/gunzip when fetching webpages
More often than not, I try recursively downloading a webpage using wget, only to have it download a single `index.html.gz` and then stop. Obviously wget can't read gzipped files, so it fails to find any links for recursive downloading... I ended up using a wget fork[1] that was last updated 10 years ago, and it works fine; however, I find it odd that such a basic feature never made it into mainline wget. Please add a feature for automatically detecting and uncompressing gzipped webpages before crawling them. [1] https://github.com/ptolts/wget-with-gzip-compression
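The detection step being requested is simple to sketch: a gzip stream starts with the magic bytes 0x1f 0x8b, and a library such as zlib can then unwrap it. An illustration only, assuming zlib is available (this is not wget's code):

```
#include <stddef.h>

/* Return nonzero if the buffer looks like a gzip stream. */
static int looks_gzipped(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}

/* With zlib, inflateInit2(&strm, 16 + MAX_WBITS) sets up an inflate
   stream that understands the gzip wrapper, so a body detected this
   way could be decompressed before the link extractor runs. */
```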
Query on downloading a script with windows 10
Hi there, Sorry to bother you, but how do I run a wget script (.sh extension) in Windows 10? I can't figure out the correct command. Kind regards, Doug Lawrence
missing something to download mp3 files from host
Hello, I am trying to download all the radio programmes from this page: https://www.radiofrance.fr/personnes/gilles-deleuze with: wget -r -l 1 -H -nd -np -A '*.mp3' -D media.radiofrance-podcast.net https://www.radiofrance.fr/personnes/gilles-deleuze In vain. The terminal prints this: --2022-12-17 20:06:36-- https://www.radiofrance.fr/personnes/gilles-deleuze Resolving www.radiofrance.fr (www.radiofrance.fr)... 23.210.120.113, 2a02:26f0:300:192::3658, 2a02:26f0:300:1a7::3658 Connecting to www.radiofrance.fr (www.radiofrance.fr)|23.210.120.113|:443... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: ‘gilles-deleuze.tmp’ gilles-deleuze.tmp [ <=> ] 157.09K --.-KB/s in 0.1s 2022-12-17 20:06:36 (1.21 MB/s) - ‘gilles-deleuze.tmp’ saved [160862] Removing gilles-deleuze.tmp since it should be rejected. FINISHED --2022-12-17 20:06:36-- Total wall clock time: 0.3s Downloaded: 1 files, 157K in 0.1s (1.21 MB/s) Could you please help me see what I am missing? Kind regards, Paolo.
iframe srcdoc resources recursive mode
Hey there, I tried searching the mailing list, bug tracker and source code for srcdoc support; it seems to be missing. If I missed something, please don't hesitate to point it out here. I'm using GNU Wget 1.21.3, trying to archive some websites for posterity. The only missing feature for my case is this one. For example, suppose the html file at https://example.com/subfolder/about.html contains something like:

```
<iframe srcdoc='<img src="relative.jpg"> <img src="/absolute.jpg">'></iframe>
```

The expected behavior is that it selects the following for download:

https://example.com/subfolder/relative.jpg
https://example.com/absolute.jpg

documentation: https://html.spec.whatwg.org/multipage/iframe-embed-object.html#attr-iframe-srcdoc Browser support: https://caniuse.com/?search=srcdoc edge case (or absence of one): it seems that due to some past oversight the iframe inherits the parent's base URL: https://github.com/whatwg/html/issues/8105 Due to backwards compatibility this is not expected to change much. Thanks to the maintainers
Re: wget 1.21.3 "make check" fails on M1 Mac "Out Of The Box"
Ah looks like HTTP::Daemon is required and is not automatically installed or otherwise flagged as needed in obvious ways. Thanks for pointing out the log location. % cat tests/Test-c.log Can't locate HTTP/Daemon.pm in @INC (you may need to install the HTTP::Daemon module) (@INC contains: . /Users/dew/perl5/lib/perl5/darwin-thread-multi-2level /Users/dew/perl5/lib/perl5 /opt/homebrew/Cellar/perl/5.34.0/lib/perl5/site_perl/5.34.0/darwin-thread-multi-2level /opt/homebrew/Cellar/perl/5.34.0/lib/perl5/site_perl/5.34.0 /opt/homebrew/Cellar/perl/5.34.0/lib/perl5/5.34.0/darwin-thread-multi-2level /opt/homebrew/Cellar/perl/5.34.0/lib/perl5/5.34.0 /opt/homebrew/lib/perl5/site_perl/5.34.0) at HTTPServer.pm line 6. BEGIN failed--compilation aborted at HTTPServer.pm line 6. Compilation failed in require at HTTPTest.pm line 6. BEGIN failed--compilation aborted at HTTPTest.pm line 6. Compilation failed in require at ./Test-c.px line 6. BEGIN failed--compilation aborted at ./Test-c.px line 6. FAIL Test-c.px (exit status: 2) I then installed HTTP::Daemon with CPAN, ran a make clean, re-ran ./configure && make && make check and got the same clang linker error as originally, though now there's no longer any tests/Test-c.log file. Trying to make -C tests & testenv seemed to be a no-op? % make -C tests && make -C testenv make: Nothing to be done for `all'. make: Nothing to be done for `all'. (In theory I would have expected make check to ignore the fuzz tests since I'm using the default ./configure which has fuzzing off.) Sorry for the n00b issues here. Cheers, David E. Weekly (@dweekly) On Sat, Jun 18, 2022 at 7:33 AM Tim Rühsen wrote: > Hey, > > a while ago, a nice person gave me a login to an M1 in order to build > and test wget and wget2. I had no issues running make check. > > aarch64 and arm64 is two different names for the same thing, like e.g. > x86_64 and amd64. > > The FAILs in tests/ (make -C tests) may be due to some Perl > misconfiguration. Check the output of `cat tests/Test-c.log` ? > > I have no idea what goes wrong in fuzz/ and why this only happens on > your setup. I'd suggest to skip those tests and just do >make -C tests && make -C testenv > > Is there any other M1 user here who knows (or guesses) what is going wrong > ? > > Regards, Tim > > On 18.06.22 01:44, David Weekly via Primary discussion list for GNU Wget > wrote: > > Dear Maintainers, > > > > After downloading the wget 1.21.3 from > > https://ftp.gnu.org/gnu/wget/wget-latest.tar.lz and unpacking, I ran > > "./configure" and "make" without issue. But when I ran "make check" I got > > the following output: > > > >GEN public-submodule-commit > > > > /Applications/Xcode.app/Contents/Developer/usr/bin/make check-recursive > > > > Making check in lib > > > > /Applications/Xcode.app/Contents/Developer/usr/bin/make check-am > > > > make[3]: Nothing to be done for `check-am'. > > > > Making check in src > > > > /Applications/Xcode.app/Contents/Developer/usr/bin/make libunittest.a > > > > make[3]: `libunittest.a' is up to date. > > > > Making check in doc > > > > make[2]: Nothing to be done for `check'. > > > > Making check in po > > > > make[2]: Nothing to be done for `check'. > > > > Making check in gnulib_po > > > > make[2]: Nothing to be done for `check'. > > > > Making check in util > > > > make[2]: Nothing to be done for `check'. 
> > > > Making check in fuzz > > > > /Applications/Xcode.app/Contents/Developer/usr/bin/make > wget_cookie_fuzzer > > wget_css_fuzzer wget_ftpls_fuzzer wget_html_fuzzer wget_netrc_fuzzer > > wget_options_fuzzer wget_progress_fuzzer wget_read_hunk_fuzzer > > wget_robots_fuzzer wget_url_fuzzer wget_ntlm_fuzzer > > > > /Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../src > > libunittest.a > > > > make[4]: `libunittest.a' is up to date. > > > >CCLD wget_cookie_fuzzer > > > >CCLD wget_css_fuzzer > > > >CCLD wget_ftpls_fuzzer > > > >CCLD wget_html_fuzzer > > > >CCLD wget_netrc_fuzzer > > > > Undefined symbols for architecture arm64: > > > >"_exec_name", referenced from: > > > >_search_netrc in libunittest.a(libunittest_a-netrc.o) > > > >_parse_netrc_fp in libunittest.a(libunittest_a-netrc.o) > > > >_memfatal in libunittest.a(libunittest_a-utils.o) > > > >_log_init in libunittest.a(libunittest_a-log.o) > > > >"
wget 1.21.3 "make check" fails on M1 Mac "Out Of The Box"
= make[4]: [test-suite.log] Error 1 (ignored) Making check in tests /Applications/Xcode.app/Contents/Developer/usr/bin/make unit-tests cd ../src && /Applications/Xcode.app/Contents/Developer/usr/bin/make libunittest.a make[4]: `libunittest.a' is up to date. CCLD unit-tests /Applications/Xcode.app/Contents/Developer/usr/bin/make check-TESTS cd ../src && /Applications/Xcode.app/Contents/Developer/usr/bin/make libunittest.a make[4]: `libunittest.a' is up to date. CCLD unit-tests cd ../src && /Applications/Xcode.app/Contents/Developer/usr/bin/make libunittest.a make[5]: `libunittest.a' is up to date. CCLD unit-tests PASS: unit-tests FAIL: Test-auth-basic.px FAIL: Test-auth-no-challenge.px FAIL: Test-auth-no-challenge-url.px FAIL: Test-auth-with-content-disposition.px FAIL: Test-auth-retcode.px FAIL: Test-c-full.px FAIL: Test-c-partial.px FAIL: Test-c.px FAIL: Test-c-shorter.px FAIL: Test-cookies.px FAIL: Test-cookies-401.px FAIL: Test-E-k-K.px FAIL: Test-E-k.px PASS: Test-ftp.px PASS: Test-ftp-dir.px PASS: Test-ftp-pasv-fail.px PASS: Test-ftp-bad-list.px PASS: Test-ftp-recursive.px FAIL: Test-ftp-iri.px FAIL: Test-ftp-iri-fallback.px FAIL: Test-ftp-iri-recursive.px FAIL: Test-ftp-iri-disabled.px PASS: Test-ftp-list-Multinet.px PASS: Test-ftp-list-Unknown.px PASS: Test-ftp-list-Unknown-a.px PASS: Test-ftp-list-Unknown-hidden.px PASS: Test-ftp-list-Unknown-list-a-fails.px PASS: Test-ftp-list-UNIX-hidden.px PASS: Test-ftp--start-pos.px FAIL: Test-HTTP-Content-Disposition-1.px FAIL: Test-HTTP-Content-Disposition-2.px FAIL: Test-HTTP-Content-Disposition.px PASS: Test-i-ftp.px FAIL: Test-i-http.px FAIL: Test-idn-headers.px FAIL: Test-idn-meta.px FAIL: Test-idn-cmd.px FAIL: Test-idn-cmd-utf8.px FAIL: Test-idn-robots.px FAIL: Test-idn-robots-utf8.px FAIL: Test-iri.px FAIL: Test-iri-percent.px FAIL: Test-iri-disabled.px FAIL: Test-iri-forced-remote.px FAIL: Test-iri-list.px FAIL: Test-k.px FAIL: Test-meta-robots.px FAIL: Test-N-current.px FAIL: Test-N-HTTP-Content-Disposition.px FAIL: Test-N--no-content-disposition.px FAIL: Test-N--no-content-disposition-trivial.px FAIL: Test-N-no-info.px FAIL: Test--no-content-disposition.px FAIL: Test--no-content-disposition-trivial.px FAIL: Test-N-old.px FAIL: Test-nonexisting-quiet.px FAIL: Test-noop.px FAIL: Test-np.px FAIL: Test-N.px FAIL: Test-N-smaller.px FAIL: Test-O-HTTP-Content-Disposition.px FAIL: Test-O-nc.px FAIL: Test-O--no-content-disposition.px FAIL: Test-O--no-content-disposition-trivial.px FAIL: Test-O-nonexisting.px FAIL: Test-O.px FAIL: Test--post-file.px FAIL: Test-proxied-https-auth.px FAIL: Test-proxied-https-auth-keepalive.px FAIL: Test-proxy-auth-basic.px FAIL: Test-restrict-ascii.px FAIL: Test-Restrict-Lowercase.px FAIL: Test-Restrict-Uppercase.px FAIL: Test-stdouterr.px FAIL: Test--spider-fail.px FAIL: Test--spider.px FAIL: Test--spider-r-HTTP-Content-Disposition.px FAIL: Test--spider-r--no-content-disposition.px FAIL: Test--spider-r--no-content-disposition-trivial.px FAIL: Test--spider-r.px FAIL: Test--start-pos.px FAIL: Test--start-pos--continue.px FAIL: Test--httpsonly-r.px FAIL: Test-204.px PASS: Test-ftp-pasv-not-supported.px FAIL: Test-https-pfs.px FAIL: Test-https-tlsv1.px FAIL: Test-https-tlsv1x.px FAIL: Test-https-selfsigned.px FAIL: Test-https-weboftrust.px FAIL: Test-https-clientcert.px FAIL: Test-https-crl.px FAIL: Test-https-badcerts.px Testsuite summary for wget 1.21.3 *# TOTAL: 94* # PASS: 15 # SKIP: 0 # XFAIL: 0 # FAIL: 79 # XPASS: 0 # ERROR: 0 See tests/test-suite.log Please report to bug-wget@gnu.org make[4]: 
[test-suite.log] Error 1 (ignored) Making check in testenv /Applications/Xcode.app/Contents/Developer/usr/bin/make check-TESTS PASS: Test-504.py PASS: Test-416.py PASS: Test-auth-basic-fail.py PASS: Test-auth-basic.py PASS: Test-auth-basic-netrc.py PASS: Test-auth-basic-netrc-user-given.py PASS: Test-auth-basic-netrc-pass-given.py PASS: Test-auth-basic-no-netrc-fail.py PASS: Test-auth-both.py PASS: Test-auth-digest.py PASS: Test-auth-no-challenge.py PASS: Test-auth-no-challenge-url.py PASS: Test-auth-retcode.py PASS: Test-auth-with-content-disposition.py PASS: Test-c-full.py PASS: Test-condget.py PASS: Test-Content-disposition-2.py PASS: Test-Content-disposition.py PASS: Test--convert-links--content-on-error.py PASS: Test-cookie-401.py PASS: Test-cookie-domain-mismatch.py PASS: Test-cookie-expires.py PASS: Test-cookie.py PASS: Test-Head.py PASS: Test-hsts.py PASS: Test--https.py PASS: Test--https-crl.py
Manpage and infopage of wget need to mention whether the regex of wget is Extended or Basic
The man page of wget 1.21.2 (also 1.20.3) describes the following options concerning regular expressions. > --accept-regex urlregex > --reject-regex urlregex > Specify a regular expression to accept or reject > the complete URL. > > > --regex-type regextype > Specify the regular expression type. > Possible types are posix or pcre. > Note that to be able to use pcre type > wget has to be compiled with libpcre support. However, the above option description forgets to mention which kind of POSIX regular expression wget uses. The info page of wget also forgets to mention which. There are two kinds of POSIX regular expressions: 1. POSIX Extended Regular Expression (ERE) 2. POSIX Basic Regular Expression (BRE) The difference between BRE and ERE follows: POSIX ERE ? + | ( ) { } have special meanings by themselves without being preceded by a backslash (\). To be literal, they need to be escaped. POSIX BRE ? + | are always literal and never have special meanings, no matter whether preceded by a backslash (\). ( ) { } are literal by themselves, but have special meanings if and only if they are escaped as in \( \) \{ \} All other special symbols have no difference between POSIX ERE and POSIX BRE. While the man page of the latest version of wget still forgets to mention whether wget uses ERE or BRE, a very old mail in the mailing list system suggests that wget should use ERE. Gijs van Tulder wrote on 11 Apr 2012 (https://lists.gnu.org/archive/html/bug-wget/2012-05/msg00021.html): > Here is a new version of the regular expressions patch. > The new version combines POSIX (always, from gnulib) > and PCRE (if available). > > The patch adds these options: > > --accept-regex="..." > --reject-regex="..." > > --regex-type=posix for POSIX extended regexes (the default) > --regex-type=pcre for PCRE regexes (if PCRE is available) Please verify that wget currently uses ERE (as opposed to BRE) and that it is the default, by looking at the source code and by running wget. If so verified, please add the sentence "posix is the default, and refers to POSIX Extended Regular Expression (ERE)." to the man page and the info page. Thus, the option description should become: --regex-type regextype Specify the regular expression type. Possible types are posix or pcre. posix is the default, and refers to POSIX Extended Regular Expression (ERE). Note that to be able to use pcre type wget has to be compiled with libpcre support. To test whether the regex of wget is ERE, you need to know the following. ? + | ( ) { } have the following meanings when they have special meanings. ? zero or one of the preceding element + one or more of the preceding element | alternation ( ) grouping {n} the preceding element occurs exactly n times {n,} the preceding element occurs at least n times {n,m} the preceding element occurs at least n times but at most m times Before actually running `wget` to see whether the posix regex of wget is ERE, let us get familiar with the behavior of ERE by running `grep`. The -E option of GNU grep enables POSIX Extended Regular Expression (ERE). Without -E, the regex of GNU grep is basic but slightly deviates from POSIX BRE. Here is the difference between the three: POSIX ERE ? + | ( ) { } have special meanings by themselves without being preceded by a backslash (\). To be literal, they need to be escaped. POSIX BRE ? + | are always literal and never have special meanings, no matter whether preceded by a backslash (\).
( ) { } are literal by themselves, but have special meanings if and only if they are escaped as in \( \) \{ \} GNU-grep basic (default for GNU grep) ? + | ( ) { } are literal by themselves, but have special meanings if and only if escaped as in \? \+ \| \( \) \{ \} All other special symbols have no difference between POSIX ERE, POSIX BRE, and GNU-grep basic. Let me mention two such symbols. * zero or more of the preceding element . matches any character except newline The dot character '.' appears in a domain name such as "ftp.gnu.org" and before a file extension such as "report.pdf". For '.' to literally mean a dot in regex, it has to be escaped like "ftp\.gnu\.org" and "report\.pdf". Note that, in the context of regular expressions, a special character means a character that has a meaning special to regular expressions. This is not to be confused with a special character for bash. Many characters special to regex are also special to bash (but the meanings to regex and the meanings to bash may differ).
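A quick way to see the difference with GNU grep (illustrative commands; the # comments are annotations):

```
echo aaa | grep -E 'a+'   # matches: in ERE, + means "one or more"
echo aaa | grep 'a+'      # no match: without -E, a+ is the literal string "a+"
echo a+  | grep 'a+'      # matches: the input literally contains "a+"
```

If wget's posix regex type is indeed ERE, a pattern such as --accept-regex='.*\.(mp3|ogg)' should then work without backslashes before ( ) and |.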
"OpenSSL: unimplemented 'secure-protocol' option value 2"
Hi, Please advise. Getting an error "OpenSSL: unimplemented 'secure-protocol' option value 2" (debug file enclosed). Thanks in advance. Kind regards, Danny Tuerlings Setting --method (method) to POST Setting --body-file (bodyfile) to c:/temp/upsertcandidate2inforequest.xml DEBUG output created by Wget 1.21.2 on mingw32. Reading HSTS entries from C:\Temp\bin\GnuWin32\bin2/.wget-hsts URI encoding = 'CP1252' converted 'https://uat-ws-esb.emea.adecco.net:9305/NL/CORE/MAX' (CP1252) -> 'https://uat-ws-esb.emea.adecco.net:9305/NL/CORE/MAX' (UTF-8) --2021-11-29 14:48:57-- https://uat-ws-esb.emea.adecco.net:9305/NL/CORE/MAX OpenSSL: unimplemented 'secure-protocol' option value 2 Please report this issue to bug-wget@gnu.org
Patch for bug 56909
Hello wget maintainers, Attached is a patch file that strips the Authorization header when following redirects. This should solve https://savannah.gnu.org/bugs/?56909 / CVE-2021-31879. Regards, Aleksander Bułanowski wget-redirect-auth.patch Description: Binary data
This version does not have support for IRIs
If I add the option "--local-encoding=UTF-8" to my wget script, wget 1.19.1 (the version on my NAS) says: "This version does not have support for IRIs". If I run "wget --help" on my NAS, both "--local-encoding" and "--remote-encoding" are listed as options. This error message was reported as a bug against 1.12.x. Is it still a known bug? Was it fixed between 1.19.1 and 1.21.1? Am I doing something wrong? Thanks in advance for your advice.
Unexpected Versioning
With the following wget script I am getting unexpected versioning of the resulting files: >> wget -EkKrNpH \ --output-file=wget.log \ --domains=imcz.club,sf.wildapricot.org \ --exclude-domains=webmail.imcz.club \ --exclude-directories=calendar,Club-Events,External-Events,Fonts,fonts,Sys \ --ignore-case \ --level=1 \ --no-parent \ --no-proxy \ --random-wait \ --regex-type=pcre \ --reject=ashx,"overlay*" \ --reject-regex="calendar[@\?].*|Club-Events[@\?].*|External-Events[@\?].*|event-\d+[@\?].*|/[Ff]onts" \ --rejected-log=wget-rejected.log \ --restrict-file-names=windows \ --wait=1 \ https://imcz.club/ << Some of the downloaded pages have ".1" inserted into the filenames, for no apparent reason. Since I am using -r without --no-clobber, I would expect no versioning. In the case of the above script, a versioned file, "FAQ-Forum.1", is produced in the absence of any unversioned one: >> --2021-07-22 11:03:44-- https://imcz.club/FAQ-Forum Connecting to imcz.club|34.226.77.200|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://imcz.club/Sys/Login?ReturnUrl=%2fFAQ-Forum [following] --2021-07-22 11:03:46-- https://imcz.club/Sys/Login?ReturnUrl=%2fFAQ-Forum Connecting to imcz.club|34.226.77.200|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 41667 (41K) [text/html] Saving to: 'imcz.club/FAQ-Forum.1.html' 0K .. .. .. .. 100% 225K=0.2s Last-modified header missing -- time-stamps turned off. 2021-07-22 11:03:47 (225 KB/s) - 'imcz.club/FAQ-Forum.1.html' saved [41667/41667] << Replacing "--level=1" with "--level=2" results in many more versioned files, a few of which have unversioned counterparts, but most of which do not. The full version of the script includes login parameters and "--level=4", but I have posted a simplified version here so others can reproduce the problem. Similar problems have been reported in the past: https://lists.gnu.org/archive/html/bug-wget/2015-01/msg00076.html https://lists.gnu.org/archive/html/bug-wget/2014-11/msg00321.html https://lists.gnu.org/archive/html/bug-wget/2014-06/msg00107.html but the advice in those threads doesn't seem to apply to my case. I am using the not-so-ancient v1.19.1 of wget. Thanks for any help! Regards, Roger
wget bandwidth usage
Hello All, Thank you for your work on wget. My apologies if this isn't the place to ask a generic question. For background, I'm using wget 1.19.4 on Ubuntu 18.04.5. I'm curious about wget's usage of the bandwidth available to it. Specifically: 1 - does it use the entire pipe available? 2 - does it "monitor" how much bandwidth is available during a download and adjust its usage accordingly? Meaning, I guess, does it share the bandwidth with other programs or other instances of wget that might start up during a wget download? 3 - would one expect different download performance if multiple instances of wget were running simultaneously? For example, would "wget file1 & wget file2 & wget file3" download all 3 files in the same time as "wget file1; wget file2; wget file3" would sequentially? Thanks for any insights you can give! Steve
Re: Wget passes Authorization header cross-domain upon redirect
hi team, Is this mailing list the right address for these issues? On Fri, Jan 22, 2021 at 11:35 PM Dolev Farhi wrote: > [...] -- Dolev Farhi Principal Security Engineer | Wealthsimple www.wealthsimple.com
Wget passes Authorization header cross-domain upon redirect
hi Wget team! When making an HTTP GET request with an Authorization header, together with the follow-redirect flag (-L), e.g.: wget -v --header="Authorization: z==" http://1.1.1.1:8000 -L If the remote server (1.1.1.1) redirects to 2.2.2.2:8181 (different host + port), the Authorization header will be passed to the redirected new host on the new port. 1. Client sends HTTP GET with Authorization header to Server1:8080 2. Server1 redirects Client to Server2:8081 3. Server2:8081 receives the Authorization header My understanding is: if the scheme, host or port are different, then it is a different origin, and effectively cross-origin. Which means the header shouldn't be passed on in this case, and needs to be stripped? This is reproducible in the following versions: GNU Wget 1.21 built on MacOSX GNU Wget 1.18 on Ubuntu cURL apparently experienced the same issue in 2018, described here: https://curl.se/docs/CVE-2018-107.html Thanks!
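The origin rule described above is easy to sketch: the Authorization header should only be resent when scheme, host and port all match. A minimal illustration in C (same_origin is a hypothetical helper, not wget's code):

```
#include <stdbool.h>
#include <string.h>
#include <strings.h>

struct origin { const char *scheme; const char *host; int port; };

/* Two URLs share an origin only if scheme, host and port all match;
   host names compare case-insensitively. */
static bool same_origin(struct origin a, struct origin b)
{
    return strcmp(a.scheme, b.scheme) == 0
        && strcasecmp(a.host, b.host) == 0
        && a.port == b.port;
}

/* Before following a redirect: if !same_origin(original, target),
   drop the Authorization header from the new request. */
```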
unsubscribe
unsubscribe On Friday, October 9, 2020, 01:45:31 AM EDT, BAHRI INCELER wrote: Hello, How are you? I hope you are well. Many thanks for wget; you have been saving our lives since the 1990's. I really need your help; I have looked everywhere and nothing worked. I am downloading a file like this: wget --user xxx --password xxx "ftp://x.com/Hanimaganin.Gelinleri%202020.1080p.HDTV.x264.mkv" and it must be saved exactly as Hanimaganin.Gelinleri%202020.1080p.HDTV.x264.mkv, but wget decodes the name. I don't want that; how can I prevent it? I have tried many methods, like --restrict-file-names, but nothing worked. Thanks..
Re: Download page with scripted table
Hi, Might anybody know of a better place to ask my question below, or where I can get consulting for wget? I did not see any replies. Morris On Monday, June 22, 2020, 12:13:05 AM EDT, Morris West wrote: Hi, Is it possible for wget to save the page at the link below with the table as it appears on the page? My understanding is the table is the result of a script within the page. I have not been able to save it with wget. Any direction, insight and/or the command line would be greatly appreciated!! https://www.benzinga.com/calendar/ratings Morris
Bug?
Hi! I was trying to install ROCm and got stuck in the second part:
- Add the ROCm apt repository. For Debian-based systems like Ubuntu, configure the Debian ROCm repository as follows:

wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list

(is that word tee the right word in that place after the last sudo? I think it should be make?) I did all of the above, and when I tried sudo apt update from the third part I got the following messages:

E: Tyyppi "gpg" on tuntematon rivillä 1 lähdeluettelossa /etc/apt/sources.list.d/rocm.list [Type "gpg" is unknown on line 1 of the source list /etc/apt/sources.list.d/rocm.list]
E: Lähteiden luetteloa ei pystynyt lukemaan. [The list of sources could not be read.]

I'm using Ubuntu 18.04.3 LTS and, as you might be able to tell, I'm a newbie with all this. If you need more info I'll try to find it for you. Janne
Re: Problem building/installing wget2
Hi, I went looking for a wget2 home page and found the git repo... YAY!!! Installed pandoc et. al. and followed the directions to build... IT WORKED! I plan on doing the tests and the valgrind and stuff after the build. You want to see this, right? By the way, this is a Fedora Core x86_64 (rawhide) system... FULLY upgraded except for the kernel which has some serious bugs. Sigh. Wrote a bug report to bugzilla.redhat.com but have had NO responses yet. I don't have much luck with them and my kernel bug reports. Sigh. oh well... I'll get back to you guys when I have more results. THANKS for the cool code and your help. George... On Wednesday, November 6, 2019, 10:53:05 AM PST, Darshit Shah wrote: Tim is right. This is an issue that came up with an updated version of Doxygen. The new version broke our existing Doxygen configuration. You can either downgrade the version of Doxygen you use or use the git master for Wget2. I have fixed this issue in git already * Tim Rühsen [191106 10:25]: > Hi George, > > can you make sure you have the latest git master (commit > a1f3f7bcc59ea071a153fed8288d1d66527e8b9d or later) ? > > Darshit meanwhile fixed the doxygen issue, should work on your Fedora 31 > (?) even without pandoc. > > Regards, Tim > > On 11/6/19 9:50 AM, Tim Rühsen wrote: > > On 11/6/19 4:03 AM, George R Goffe via Primary discussion list for GNU > > Wget wrote: > >> Hi, > >> > >> I just tried to build/install wget2 but there are some problems at the end > >> of the install related to man pages. > >> > >> Here's a copy of the log. > >> > >> Did I do something wrong or is this really a bug? > > > > Hi George, > > > > likely it's a bug coming up in a certain environment. Darshit and I > > recently discussed a similar issue, but somehow we lost focus... > > > > What version of doxygen do you have installed ? > > > > What if you install pandoc and build again (starting with ./configure ...) > > > > As a work-around, you can skip the docs with > > ./configure --disable-doc > > > > Regards, Tim > > > -- Thanking You, Darshit Shah PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
Re: Problem building/installing wget2
Guys, My doxygen is at: doxygen-1.8.16-2.fc32.x86_64 Working on pandoc now. What is git master? Are you talking about the wget2 source git repository? If you will give me the URL for that, I'll update wget2 and try it all again. Ok? Best regards, George... On Wednesday, November 6, 2019, 10:53:05 AM PST, Darshit Shah wrote: [...]
Problem building/installing wget2
Hi, I just tried to build/install wget2 but there are some problems at the end of the install related to man pages. Here's a copy of the log. Did I do something wrong or is this really a bug? Best regards, George... mkconfig-wget2.build.log.gz Description: application/gzip
[Bug-wget] lex compile problem on AIX 7.1
Hi! I get the following error when compiling wget 1.19.1 on AIX 7.1: make all-am CC connect.o CC convert.o CC cookies.o CC ftp.o lex -ocss.c 0: Warning: 1285-300 The o flag is not valid. 0: Warning: 1285-300 The s flag is not valid. 0: Warning: 1285-300 The s flag is not valid. 0: Warning: 1285-300 The . flag is not valid. Seems the LEX arguments are not valid? Any suggestions? br Markus