wget input/output using stdin/stdout

2024-03-01 Thread Dan Lewis via Primary discussion list for GNU Wget
Greetings,

I have a program that loads and executes wget using the following command
line:

wget -i - -O -


and dups wget's stdin, stdout (and stderr) handles so that I can write URLs
to wget's stdin and read the responses from wget's stdout. What I want to
do is write a sequence of URLs to wget's stdin, reading each response
before sending the next URL. Instead, wget buffers its output so that it
doesn't output anything until I close its stdin. As a result, it seems that I
can only send all of the URLs to wget, close its stdin, and then read all
of the responses.
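
For reference, a minimal POSIX sketch of the setup described above (a
hypothetical illustration with minimal error handling, not a drop-in
implementation): wget is spawned with stdin and stdout on pipes, one URL is
written, and the response only arrives once the write end is closed, which is
the buffering behavior in question.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int in_pipe[2], out_pipe[2];      /* parent -> wget, wget -> parent */
    if (pipe(in_pipe) < 0 || pipe(out_pipe) < 0)
        return 1;

    pid_t pid = fork();
    if (pid == 0) {                   /* child: become wget */
        dup2(in_pipe[0], STDIN_FILENO);
        dup2(out_pipe[1], STDOUT_FILENO);
        close(in_pipe[1]);
        close(out_pipe[0]);
        execlp("wget", "wget", "-q", "-i", "-", "-O", "-", (char *)NULL);
        _exit(127);
    }
    close(in_pipe[0]);
    close(out_pipe[1]);

    /* Write one URL. As observed above, wget reads its -i list up to EOF
       before fetching, so nothing comes back until this close(). */
    const char *url = "https://example.com/\n";
    write(in_pipe[1], url, strlen(url));
    close(in_pipe[1]);

    char buf[4096];
    ssize_t n;
    while ((n = read(out_pipe[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    return 0;
}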

Is there any wget command line option that will cause wget to output a
response after each URL without waiting for me to close its stdin?

Thanks!
Dan


Integer overflows in parse_content_range() and gethttp()

2024-02-24 Thread vulnerabilityspotter--- via Primary discussion list for GNU Wget
Security Vulnerability Report

File: src/http.c

Functions: parse_content_range() and gethttp()

Vulnerability Type: Integer Overflow

Location: Lines 936, 942, 955 and 3739

Severity: High

Description:

In the parse_content_range() function, at lines 936, 942, 955, there exists a 
vulnerability related to an integer overflow. The vulnerability arises from the 
calculation of the variable num, which is assigned the value of

num = 10 * num + (*hdr - '0');

Both the multiplication and the addition can overflow, leading to unexpected 
behavior, because the accumulated value is never validated.

Furthermore, similarly to 
[curl/curl#12983](https://github.com/curl/curl/issues/12983), at line 3739 of 
the gethttp() function, the calculation of the contlen variable can also overflow:

contlen = last_byte_pos - first_byte_pos + 1;

Exploitation Scenario:

An attacker may craft a malicious request with carefully chosen values in the 
Content-Range header, triggering an integer overflow during the calculation of 
num and contlen. This could potentially lead to various security issues, such 
as memory corruption, buffer overflows, or unexpected behavior, depending on 
how the num and contlen variables are subsequently used.

Impact:

The impact of this vulnerability could be severe, potentially leading to:

Memory Corruption: If the calculated num and contlen values are used to allocate 
memory or perform operations such as copying data, an integer overflow could 
result in memory corruption, leading to crashes or arbitrary code execution.

Security Bypass: In scenarios where the num and contlen values are used to enforce 
boundaries or permissions, an attacker may exploit the integer overflow to 
bypass security checks or gain unauthorized access to sensitive resources.

Denial of Service (DoS): A carefully crafted request exploiting the integer 
overflow could cause the application to enter an unexpected state or consume 
excessive resources, leading to a denial of service condition.

Recommendations:

Bounds Checking: Implement proper bounds checking to ensure that the values of 
num and contlen are within acceptable ranges before performing calculations.

Safe Arithmetic Operations: Consider using safer arithmetic operations or 
alternative calculation methods to prevent integer overflows, especially when 
dealing with potentially large or close-to-boundary values.

Input Validation: Validate input parameters to ensure they adhere to expected 
ranges and constraints before performing calculations.

Error Handling: Implement robust error handling mechanisms to gracefully handle 
scenarios where input parameters result in unexpected or invalid calculations.
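
To make the "Safe Arithmetic Operations" recommendation concrete, here is a 
hedged C sketch of overflow-checked versions of both calculations. The helpers 
are hypothetical, not wget's actual code (wget uses its own wgint type; 
int64_t stands in for it here):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: accumulate one decimal digit into *num (assumed
   non-negative) without wrapping; the checked form of
   num = 10 * num + (*hdr - '0'); */
static bool
accumulate_digit(int64_t *num, char c)
{
    int d = c - '0';
    if (*num > (INT64_MAX - d) / 10)
        return false;             /* 10 * *num + d would overflow */
    *num = 10 * *num + d;
    return true;
}

/* Hypothetical helper: the checked form of
   contlen = last_byte_pos - first_byte_pos + 1; */
static bool
range_length(int64_t first_byte_pos, int64_t last_byte_pos,
             int64_t *contlen)
{
    if (first_byte_pos < 0 || last_byte_pos < first_byte_pos)
        return false;             /* malformed range */
    /* both ends are non-negative and ordered, so the subtraction below
       cannot wrap; only the + 1 still needs a check */
    if (last_byte_pos - first_byte_pos == INT64_MAX)
        return false;
    *contlen = last_byte_pos - first_byte_pos + 1;
    return true;
}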

Severity Justification:

The presence of an integer overflow vulnerability at lines 936, 942, 955 and 
3739 poses a high risk to the security and stability of the application. 
Exploitation of this vulnerability could lead to severe consequences, including 
memory corruption, security bypass, or denial of service conditions.

Affected Versions:

This vulnerability affects all versions of the application that include the 
vulnerable parse_content_range() and gethttp() functions.

References:

OWASP Integer Overflow
CWE-190: Integer Overflow or Wraparound
CERT Secure Coding - INT32-C

Conclusion:

The presence of an integer overflow vulnerability at lines 936, 942, 955 in the 
parse_content_range() function and line 3739 of gethttp() poses a high risk to 
the security and stability of the application. It is imperative to address this 
vulnerability promptly by implementing appropriate bounds checking and error 
handling mechanisms to prevent potential exploitation and associated security 
risks.


Wget fails to download some URLs from www.investing.com

2024-02-14 Thread Chris Smith via Primary discussion list for GNU Wget
Hi guys,
This is not so much a bug as requests being blocked by the Cloudflare server.

Check out:
https://www.investing.com/crypto/bitcoin/btc-usd-historical-data

The URL works in Firefox, but fails to download using cURL or Wget.
I have tried various user-agent strings, so that is not the problem.

The following URL works for Firefox, cURL and Wget:
https://www.investing.com/equities/lloyds-banking-grp-historical-data

Kind regards,
Chris Smith 






[Feature Request] Add a short option for --content-disposition

2023-10-29 Thread No-Reply-Wolfietech via Primary discussion list for GNU Wget
Nowadays it seems increasingly common to find a file that is not served from 
where it's actually stored, presumably for access control, and it seems to make 
no sense to have to type --content-disposition when a single-letter flag is all 
that is needed.


Re: Rejecting 'index.html*' files causes recursion to include parent-directories

2023-08-16 Thread Carl Ponder via Primary discussion list for GNU Wget



Ok here's what worked:

   wget -P dir -r -R 'index.html*' -R '..' -nH -np --cut-dirs 3 https://site.org/X/Y/Z


Can anyone tell me why the behavior was happening in the first place, 
though? Why would excluding "index.html" cause recursion into the 
parent directories, when that had been disabled?


Rejecting 'index.html*' files causes recursion to include parent-directories

2023-08-07 Thread Carl Ponder via Primary discussion list for GNU Wget



I'm running wget version 1.20.3 (and earlier) using this command-line

   wget -P dir -r -nH -np --cut-dirs 3 https://svn.site.org/X/Y/Z

to retrieve the contents of the remote-directory "Z" into local 
directory "dir".
This works fine, except that I also get files "index.html" in all the 
subdirectories, which I don't want.
Yeah, I know I can delete them afterward, but is there a way to just 
filter them out in the first place?

If I try this form

   wget -P dir -r -R 'index.html*' -nH -np --cut-dirs 3 https://site.org/X/Y/Z

I find that it's downloading subdirectories from the parent levels as 
well, even though I set the -np parameter.


Wget recursive option not working correctly with scheme relative URLs

2023-07-01 Thread Jan Bidler via Primary discussion list for GNU Wget
Hello,
I have part of a website (`example.com/index.html`) that I want to mirror, which
contains scheme-relative URLs (`//otherexample.com/image.png`). Downloading
these with the -r flag results in wget converting them to a wrong URL
(`example.com//otherexample.com`).

So using
`wget -r example.com/index.html`
produces links like
`https://example.com/index.html//otherexample.com/image.png` in the output.
Using the debug flag reveals this:
`merge("example.com/index.html", "//otherexample.com/image.png") ->
https://example.com/index.html//otherexample.com/image.png`
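
For reference, RFC 3986, section 5.2, resolves a network-path reference such as
//otherexample.com/image.png by taking only the scheme from the base URL. Below
is a hedged C sketch of just that rule; merge_network_path() is a hypothetical
helper, not wget's actual merge():

#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: resolve a scheme-relative ("network-path")
   reference against a base URL per RFC 3986, section 5.2: keep the
   base's scheme, take everything else from the reference. */
static void
merge_network_path(const char *base, const char *ref,
                   char *out, size_t outlen)
{
    const char *colon = strchr(base, ':');
    if (strncmp(ref, "//", 2) == 0 && colon)
        /* base "https://example.com/index.html" +
           ref "//otherexample.com/image.png"
           -> "https://otherexample.com/image.png" */
        snprintf(out, outlen, "%.*s:%s", (int)(colon - base), base, ref);
    else
        snprintf(out, outlen, "%s", ref);   /* other cases omitted here */
}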

Wget - Sync remote site with dynamic URLs that contain tokens?

2023-02-22 Thread Casey Jensen via Primary discussion list for GNU Wget
Hello,

First of all, apologies: this doesn't fit the category of a bug, because I
can't figure out whether what I want is possible or was ever an intended use
of wget. I reviewed the man page and did some searching, but didn't find a
solution to the problem.

I was interested in leveraging wget to keep a remote list of files in sync.
I construct a file list on the fly that I'm feeding to wget, and a location
I'm downloading files to.
wget -nc -a $logfile -i $filePath/filesynclist.txt -P $folderSyncPath

The issue is that these are coming from a CDN and are signed URLs. So they
come with a temporary token value appended onto the file name, for example:
installer.pkg?token=088817451a9c490093t5ob22eyaorncifukx

However, the next time those URLs are looked up, that token value may
change. Thus, that file list will also change because the old token value
is not present/valid any longer.

Ultimately, wget needs the token value to access the file, verify its
presence, see if there has been a change, and download it, but because the
token value itself is dynamic, it results in the content being downloaded
over and over again.

Is it possible to tell wget to ignore the token for the purposes of the
comparison with the existing file list (e.g. wget looks up the full URL, but
then removes the token string and checks whether that matches something
already downloaded locally), and only download if there is no matching
"base URL string"?

This is quite a complex problem. Thank you for the help.
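
For what it's worth, the comparison being asked for can be sketched outside of
wget itself; the helper below is hypothetical: derive the local file name by
cutting the URL at the first '?', then check whether that name already exists
before handing the URL to wget.

#include <string.h>

/* Hypothetical helper: copy the last path component of 'url' into
   'name', ignoring everything from the first '?', so that
   "installer.pkg?token=0888..." compares as "installer.pkg". */
static void
base_name_without_query(const char *url, char *name, size_t len)
{
    size_t path_len = strcspn(url, "?");     /* length up to the query */
    const char *end = url + path_len;
    const char *start = end;
    while (start > url && start[-1] != '/')  /* back up to the last '/' */
        start--;
    size_t n = (size_t)(end - start);
    if (n >= len)
        n = len - 1;
    memcpy(name, start, n);
    name[n] = '\0';
}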




Casey Jensen

Associate Experience Engineering

Macintosh Engineering

casey.jen...@capitalone.com






Wget2 can't mirror a website

2023-02-16 Thread retro mouse via Primary discussion list for GNU Wget
Dear devs,

I was trying to use wget2 to archive websites, since wget1 doesn't support 
resourcesets. I can't get it to work as expected; whether that is a bug or 
simple user ignorance I can't tell, so I would appreciate some support.

The tests were done on Ubuntu using:
GNU Wget 1.21.2 built on linux-gnu.
GNU Wget2 1.99.1 - multithreaded metalink/file/website downloader

For example, Wget1 will retrieve this small website (~650 MB), just missing 
some resourcesets.

wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --domains copetti.org \
         www.copetti.org/

But using Wget2, it will just download a couple of HTML pages and stop at 
~0.5 MB. I have looked at the options, since they are not the same, and added 
--mirror.

wget2 \
     --mirror \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --domains copetti.org \
         www.copetti.org/

Any guidance would be greatly appreciated.

If there is any other channel more appropriate for my support request (IRC, 
another mailing list, etc.), let me know.

Best,
Retromouse


unsubscribe

2023-02-10 Thread Crusade 36 via Primary discussion list for GNU Wget
 

On Friday, February 10, 2023, 12:01:36 p.m. GMT-5, 
 wrote:  
 


Today's Topics:

  1. Re: Download webpages for offline viewing but get PAGE NOT
      FOUND 404 in browser later (Stephane Ascoet)


--

Message: 1
Date: Fri, 10 Feb 2023 14:20:03 +0100
From: Stephane Ascoet 
To: bug-wget@gnu.org
Subject: Re: Download webpages for offline viewing but get PAGE NOT
    FOUND 404 in browser later
Message-ID: 
Content-Type: text/plain; charset=utf-8; format=flowed

>
> "File"/"Open" by the browser does not work.  The browser display the proper
> webpage for about a second, then the browser display:
> PAGE NOT FOUND
> 404
> Out of nothing, something.
>
>

Hi, Wget isn't guilty. I've saved it from Firefox and the same thing 
happens... except if there is no Internet access... so I'm pretty sure 
it's one of the numerous scripts of this bloated Website...
-- 
Sincerely, Stephane Ascoet






Please use gzip/gunzip when fetching webpages

2023-02-01 Thread itstheworm--- via Primary discussion list for GNU Wget
More often than not I try recursively downloading a webpage using wget, only to 
have it download a single `index.html.gz` and then stop. Obviously wget can't 
read gzipped files, so it fails to find any links for recursive downloading. I 
ended up using a wget fork[1] that was last updated 10 years ago, and it works 
fine; however, I find it odd that such a basic feature never made it into 
mainline wget.

Please add a feature for automatically detecting and uncompressing gzipped 
webpages before crawling them.

[1] https://github.com/ptolts/wget-with-gzip-compression
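
For illustration, the decompression step being requested amounts to something
like this zlib sketch (a hypothetical standalone program, not mainline wget
code; assumes zlib, build with -lz): gunzip the fetched file so the HTML can
be scanned for links.

#include <stdio.h>
#include <zlib.h>

/* Minimal sketch: read a .gz file via zlib's gzread() and write the
   decompressed bytes to stdout, the step a crawler would need before
   scanning the HTML for links. */
int main(int argc, char **argv)
{
    if (argc != 2)
        return 1;
    gzFile in = gzopen(argv[1], "rb");
    if (!in)
        return 1;
    char buf[4096];
    int n;
    while ((n = gzread(in, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    gzclose(in);
    return 0;
}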


Query on downloading a script with windows 10

2023-01-26 Thread Doug Lawrence via Primary discussion list for GNU Wget
Hi there,
Sorry to bother you, but how do I run a wget script (.sh extension) in Windows 
10? I can't figure out the correct command.
Kind regards,
Doug Lawrence


missing something to download mp3 files from host

2022-12-17 Thread Paolo Dista via Primary discussion list for GNU Wget
Hello,

I am trying to download all the radio programmes from this page: 
https://www.radiofrance.fr/personnes/gilles-deleuze with:

wget -r -l 1 -H -nd -np -A '*.mp3' -D media.radiofrance-podcast.net 
https://www.radiofrance.fr/personnes/gilles-deleuze

In vain.

The terminal prints this:

--2022-12-17 20:06:36-- https://www.radiofrance.fr/personnes/gilles-deleuze
Resolving www.radiofrance.fr (www.radiofrance.fr)... 23.210.120.113, 
2a02:26f0:300:192::3658, 2a02:26f0:300:1a7::3658
Connecting to www.radiofrance.fr (www.radiofrance.fr)|23.210.120.113|:443... 
connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gilles-deleuze.tmp’

gilles-deleuze.tmp [ <=> ] 157.09K --.-KB/s in 0.1s

2022-12-17 20:06:36 (1.21 MB/s) - ‘gilles-deleuze.tmp’ saved [160862]

Removing gilles-deleuze.tmp since it should be rejected.

FINISHED --2022-12-17 20:06:36--
Total wall clock time: 0.3s
Downloaded: 1 files, 157K in 0.1s (1.21 MB/s)

Could you please help me see what I am missing?

Kind regards,
Paolo.

iframe srcdoc resources recursive mode

2022-09-02 Thread wget--- via Primary discussion list for GNU Wget
Hey there,

I tried searching the mailing list, bug tracker and source code for srcdoc 
support; it seems to be missing. If I missed something, please don't hesitate 
to point it out here.

I’m using GNU Wget 1.21.3, trying to archive some websites for posterity.
The only missing feature for my case is this one.

Example, suppose the html file at https://example.com/subfolder/about.html 
contains:
```
<iframe srcdoc="&lt;img src='relative.jpg'&gt;&lt;img src='/absolute.jpg'&gt;"></iframe>
```

The expected behavior is that it selects the following for download:
https://example.com/subfolder/relative.jpg
https://example.com/absolute.jpg

documentation:
https://html.spec.whatwg.org/multipage/iframe-embed-object.html#attr-iframe-srcdoc

Browser support:
https://caniuse.com/?search=srcdoc


Edge case (or absence of one):
It seems that due to some past oversight the iframe inherits the parent's base URL:
https://github.com/whatwg/html/issues/8105
Due to backwards compatibility this is not expected to change much.


Thanks to the maintainers



Re: wget 1.21.3 "make check" fails on M1 Mac "Out Of The Box"

2022-06-20 Thread David Weekly via Primary discussion list for GNU Wget
Ah looks like HTTP::Daemon is required and is not automatically installed
or otherwise flagged as needed in obvious ways. Thanks for pointing out the
log location.


% cat tests/Test-c.log

Can't locate HTTP/Daemon.pm in @INC (you may need to install the
HTTP::Daemon module) (@INC contains: .
/Users/dew/perl5/lib/perl5/darwin-thread-multi-2level
/Users/dew/perl5/lib/perl5
/opt/homebrew/Cellar/perl/5.34.0/lib/perl5/site_perl/5.34.0/darwin-thread-multi-2level
/opt/homebrew/Cellar/perl/5.34.0/lib/perl5/site_perl/5.34.0
/opt/homebrew/Cellar/perl/5.34.0/lib/perl5/5.34.0/darwin-thread-multi-2level
/opt/homebrew/Cellar/perl/5.34.0/lib/perl5/5.34.0
/opt/homebrew/lib/perl5/site_perl/5.34.0) at HTTPServer.pm line 6.

BEGIN failed--compilation aborted at HTTPServer.pm line 6.

Compilation failed in require at HTTPTest.pm line 6.

BEGIN failed--compilation aborted at HTTPTest.pm line 6.

Compilation failed in require at ./Test-c.px line 6.

BEGIN failed--compilation aborted at ./Test-c.px line 6.

FAIL Test-c.px (exit status: 2)

I then installed HTTP::Daemon with CPAN, ran a make clean, re-ran
./configure && make && make check and got the same clang linker error as
originally, though now there's no longer any tests/Test-c.log file.

Trying `make -C tests && make -C testenv` seemed to be a no-op?

% make -C tests && make -C testenv

make: Nothing to be done for `all'.

make: Nothing to be done for `all'.


(In theory I would have expected make check to ignore the fuzz tests since
I'm using the default ./configure which has fuzzing off.)

Sorry for the n00b issues here.

Cheers,
 David E. Weekly (@dweekly)


On Sat, Jun 18, 2022 at 7:33 AM Tim Rühsen  wrote:

> Hey,
>
> a while ago, a nice person gave me a login to an M1 in order to build
> and test wget and wget2. I had no issues running make check.
>
> aarch64 and arm64 are two different names for the same thing, like e.g.
> x86_64 and amd64.
>
> The FAILs in tests/ (make -C tests) may be due to some Perl
> misconfiguration. Check the output of `cat tests/Test-c.log` ?
>
> I have no idea what goes wrong in fuzz/ and why this only happens on
> your setup. I'd suggest to skip those tests and just do
>make -C tests && make -C testenv
>
> Is there any other M1 user here who knows (or guesses) what is going wrong
> ?
>
> Regards, Tim
>
> On 18.06.22 01:44, David Weekly via Primary discussion list for GNU Wget
> wrote:
> > Dear Maintainers,
> >
> > After downloading the wget 1.21.3 from
> > https://ftp.gnu.org/gnu/wget/wget-latest.tar.lz and unpacking, I ran
> > "./configure" and "make" without issue. But when I ran "make check" I got
> > the following output:
> >
> >GEN  public-submodule-commit
> >
> > /Applications/Xcode.app/Contents/Developer/usr/bin/make  check-recursive
> >
> > Making check in lib
> >
> > /Applications/Xcode.app/Contents/Developer/usr/bin/make  check-am
> >
> > make[3]: Nothing to be done for `check-am'.
> >
> > Making check in src
> >
> > /Applications/Xcode.app/Contents/Developer/usr/bin/make  libunittest.a
> >
> > make[3]: `libunittest.a' is up to date.
> >
> > Making check in doc
> >
> > make[2]: Nothing to be done for `check'.
> >
> > Making check in po
> >
> > make[2]: Nothing to be done for `check'.
> >
> > Making check in gnulib_po
> >
> > make[2]: Nothing to be done for `check'.
> >
> > Making check in util
> >
> > make[2]: Nothing to be done for `check'.
> >
> > Making check in fuzz
> >
> > /Applications/Xcode.app/Contents/Developer/usr/bin/make
> wget_cookie_fuzzer
> > wget_css_fuzzer wget_ftpls_fuzzer wget_html_fuzzer wget_netrc_fuzzer
> > wget_options_fuzzer wget_progress_fuzzer wget_read_hunk_fuzzer
> > wget_robots_fuzzer wget_url_fuzzer wget_ntlm_fuzzer
> >
> > /Applications/Xcode.app/Contents/Developer/usr/bin/make  -C ../src
> > libunittest.a
> >
> > make[4]: `libunittest.a' is up to date.
> >
> >CCLD wget_cookie_fuzzer
> >
> >CCLD wget_css_fuzzer
> >
> >CCLD wget_ftpls_fuzzer
> >
> >CCLD wget_html_fuzzer
> >
> >CCLD wget_netrc_fuzzer
> >
> > Undefined symbols for architecture arm64:
> >
> >"_exec_name", referenced from:
> >
> >_search_netrc in libunittest.a(libunittest_a-netrc.o)
> >
> >_parse_netrc_fp in libunittest.a(libunittest_a-netrc.o)
> >
> >_memfatal in libunittest.a(libunittest_a-utils.o)
> >
> >_log_init in libunittest.a(libunittest_a-log.o)
> >
> >"

wget 1.21.3 "make check" fails on M1 Mac "Out Of The Box"

2022-06-17 Thread David Weekly via Primary discussion list for GNU Wget
=

make[4]: [test-suite.log] Error 1 (ignored)

Making check in tests

/Applications/Xcode.app/Contents/Developer/usr/bin/make  unit-tests

cd ../src && /Applications/Xcode.app/Contents/Developer/usr/bin/make
libunittest.a

make[4]: `libunittest.a' is up to date.

  CCLD unit-tests

/Applications/Xcode.app/Contents/Developer/usr/bin/make  check-TESTS

cd ../src && /Applications/Xcode.app/Contents/Developer/usr/bin/make
libunittest.a

make[4]: `libunittest.a' is up to date.

  CCLD unit-tests

cd ../src && /Applications/Xcode.app/Contents/Developer/usr/bin/make
libunittest.a

make[5]: `libunittest.a' is up to date.

  CCLD unit-tests

PASS: unit-tests

FAIL: Test-auth-basic.px

FAIL: Test-auth-no-challenge.px

FAIL: Test-auth-no-challenge-url.px

FAIL: Test-auth-with-content-disposition.px

FAIL: Test-auth-retcode.px

FAIL: Test-c-full.px

FAIL: Test-c-partial.px

FAIL: Test-c.px

FAIL: Test-c-shorter.px

FAIL: Test-cookies.px

FAIL: Test-cookies-401.px

FAIL: Test-E-k-K.px

FAIL: Test-E-k.px

PASS: Test-ftp.px

PASS: Test-ftp-dir.px

PASS: Test-ftp-pasv-fail.px

PASS: Test-ftp-bad-list.px

PASS: Test-ftp-recursive.px

FAIL: Test-ftp-iri.px

FAIL: Test-ftp-iri-fallback.px

FAIL: Test-ftp-iri-recursive.px

FAIL: Test-ftp-iri-disabled.px

PASS: Test-ftp-list-Multinet.px

PASS: Test-ftp-list-Unknown.px

PASS: Test-ftp-list-Unknown-a.px

PASS: Test-ftp-list-Unknown-hidden.px

PASS: Test-ftp-list-Unknown-list-a-fails.px

PASS: Test-ftp-list-UNIX-hidden.px

PASS: Test-ftp--start-pos.px

FAIL: Test-HTTP-Content-Disposition-1.px

FAIL: Test-HTTP-Content-Disposition-2.px

FAIL: Test-HTTP-Content-Disposition.px

PASS: Test-i-ftp.px

FAIL: Test-i-http.px

FAIL: Test-idn-headers.px

FAIL: Test-idn-meta.px

FAIL: Test-idn-cmd.px

FAIL: Test-idn-cmd-utf8.px

FAIL: Test-idn-robots.px

FAIL: Test-idn-robots-utf8.px

FAIL: Test-iri.px

FAIL: Test-iri-percent.px

FAIL: Test-iri-disabled.px

FAIL: Test-iri-forced-remote.px

FAIL: Test-iri-list.px

FAIL: Test-k.px

FAIL: Test-meta-robots.px

FAIL: Test-N-current.px

FAIL: Test-N-HTTP-Content-Disposition.px

FAIL: Test-N--no-content-disposition.px

FAIL: Test-N--no-content-disposition-trivial.px

FAIL: Test-N-no-info.px

FAIL: Test--no-content-disposition.px

FAIL: Test--no-content-disposition-trivial.px

FAIL: Test-N-old.px

FAIL: Test-nonexisting-quiet.px

FAIL: Test-noop.px

FAIL: Test-np.px

FAIL: Test-N.px

FAIL: Test-N-smaller.px

FAIL: Test-O-HTTP-Content-Disposition.px

FAIL: Test-O-nc.px

FAIL: Test-O--no-content-disposition.px

FAIL: Test-O--no-content-disposition-trivial.px

FAIL: Test-O-nonexisting.px

FAIL: Test-O.px

FAIL: Test--post-file.px

FAIL: Test-proxied-https-auth.px

FAIL: Test-proxied-https-auth-keepalive.px

FAIL: Test-proxy-auth-basic.px

FAIL: Test-restrict-ascii.px

FAIL: Test-Restrict-Lowercase.px

FAIL: Test-Restrict-Uppercase.px

FAIL: Test-stdouterr.px

FAIL: Test--spider-fail.px

FAIL: Test--spider.px

FAIL: Test--spider-r-HTTP-Content-Disposition.px

FAIL: Test--spider-r--no-content-disposition.px

FAIL: Test--spider-r--no-content-disposition-trivial.px

FAIL: Test--spider-r.px

FAIL: Test--start-pos.px

FAIL: Test--start-pos--continue.px

FAIL: Test--httpsonly-r.px

FAIL: Test-204.px

PASS: Test-ftp-pasv-not-supported.px

FAIL: Test-https-pfs.px

FAIL: Test-https-tlsv1.px

FAIL: Test-https-tlsv1x.px

FAIL: Test-https-selfsigned.px

FAIL: Test-https-weboftrust.px

FAIL: Test-https-clientcert.px

FAIL: Test-https-crl.px

FAIL: Test-https-badcerts.px



Testsuite summary for wget 1.21.3



# TOTAL: 94

# PASS:  15

# SKIP:  0

# XFAIL: 0

# FAIL:  79

# XPASS: 0

# ERROR: 0



See tests/test-suite.log

Please report to bug-wget@gnu.org



make[4]: [test-suite.log] Error 1 (ignored)

Making check in testenv

/Applications/Xcode.app/Contents/Developer/usr/bin/make  check-TESTS

PASS: Test-504.py

PASS: Test-416.py

PASS: Test-auth-basic-fail.py

PASS: Test-auth-basic.py

PASS: Test-auth-basic-netrc.py

PASS: Test-auth-basic-netrc-user-given.py

PASS: Test-auth-basic-netrc-pass-given.py

PASS: Test-auth-basic-no-netrc-fail.py

PASS: Test-auth-both.py

PASS: Test-auth-digest.py

PASS: Test-auth-no-challenge.py

PASS: Test-auth-no-challenge-url.py

PASS: Test-auth-retcode.py

PASS: Test-auth-with-content-disposition.py

PASS: Test-c-full.py

PASS: Test-condget.py

PASS: Test-Content-disposition-2.py

PASS: Test-Content-disposition.py

PASS: Test--convert-links--content-on-error.py

PASS: Test-cookie-401.py

PASS: Test-cookie-domain-mismatch.py

PASS: Test-cookie-expires.py

PASS: Test-cookie.py

PASS: Test-Head.py

PASS: Test-hsts.py

PASS: Test--https.py

PASS: Test--https-crl.py

Manpage and infopage of wget need mention whether regex of wget is Extended or Basic

2021-12-17 Thread Rabvit via Primary discussion list for GNU Wget
The man page of wget 1.21.2 (also 1.20.3) describes the following options 
concerning regular expressions.

> --accept-regex urlregex
> --reject-regex urlregex
> Specify a regular expression to accept or reject
> the complete URL.
>
>
> --regex-type regextype
> Specify the regular expression type.
> Possible types are posix or pcre.
> Note that to be able to use pcre type
> wget has to be compiled with libpcre support.

However, the above option description does not mention which kind of POSIX 
regular expression wget uses.  The info page of wget also does not say.

There are two kinds of POSIX regular expressions:
1. POSIX Extended Regular Expression (ERE)
2. POSIX Basic Regular Expression (BRE)

The difference between BRE and ERE follows:

  POSIX ERE
    ?  +  |    ( )   { }   have special meanings by themselves
    without being preceded by a backslash (\).
    To be literal, they need be escaped.

  POSIX BRE
    ?  +  |   are always literal and
    never have special meanings,
    no matter whether preceded by a backslash (\).

    ( )   { }   are literal by themselves,
    but have special meanings if and only if
    they are escaped as in   \(  \)    \{  \}

All other special symbols have no difference between POSIX ERE and POSIX BRE.


While the man page of the latest version of wget still forgets to mention 
whether wget uses ERE or BRE, a very old mail in the mailing list system 
suggests that wget should use ERE.

Gijs van Tulder wrote on 11 Apr 2012
(https://lists.gnu.org/archive/html/bug-wget/2012-05/msg00021.html):

> Here is a new version of the regular expressions patch.
> The new version combines POSIX (always, from gnulib)
> and PCRE (if available).
>
> The patch adds these options:
>
>  --accept-regex="..."
>  --reject-regex="..."
>
>  --regex-type=posix   for POSIX extended regexes (the default)
>  --regex-type=pcre    for PCRE regexes (if PCRE is available)


Please verify that wget currently uses ERE (as opposed to BRE) and that it is 
the default, by looking at the source code and by running wget.  If so 
verified, then, please add the sentence "posix is the default, and refers to 
POSIX Extended Regular Expression (ERE)." to the manpage and the infopage.  
Thus, the option description should become:

  --regex-type regextype
  Specify the regular expression type.
  Possible types are posix or pcre.
  posix is the default, and refers to
  POSIX Extended Regular Expression (ERE).
  Note that to be able to use pcre type
  wget has to be compiled with libpcre support.


To test whether the regex of wget is ERE, you need to know the following.

?  +  |    ( )   { }   have the following meanings
when they have special meanings.

  ?    zero or one of the preceding element
  +    one or more of the preceding element
  |    alternation
  ( )  grouping

  {n}    the preceding element occurs exactly n times
  {n,}   the preceding element occurs at least n times

  {n,m}  the preceding element occurs at least n times
  but at most m times



Before actually running `wget` to see whether the posix regex of wget is ERE, 
let us get familiar with the behavior of ERE by running `grep`.  The -E option 
of GNU grep enables POSIX Extended Regular Expression (ERE).  Without -E, the 
regex of GNU grep is basic, but deviates slightly from POSIX BRE.

Here is the difference between the three:

  POSIX ERE
    ?  +  |    ( )   { }   have special meanings by themselves
    without being preceded by a backslash (\).
    To be literal, they need be escaped.

  POSIX BRE
    ?  +  |   are always literal and
    never have special meanings,
    no matter whether preceded by a backslash (\).

    ( )   { }   are literal by themselves,
    but have special meanings if and only if
    they are escaped as in   \(  \)    \{  \}


  GNU-grep basic (default for GNU grep)
    ?  +  |    ( )   { }   are literal by themselves,
    but have special meanings if and only if
    escaped as in
    \?  \+  \| \(  \)   \{  \}

All other special symbols have no difference between POSIX ERE, POSIX BRE, and 
GNU-grep basic.  Let me mention two of such symbols.

  *    zero or more of the preceding element
  .    matches any character except newline

The dot character '.' appears in a domain name such as "ftp.gnu.org" and before 
a file extension such as "report.pdf".  For '.' to literally mean a dot in 
regex, it has to be escaped like "ftp\.gnu\.org" and "report\.pdf".
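
As a standalone illustration of the ERE/BRE difference (hedged test code, not
wget's source): POSIX regcomp() selects ERE with the REG_EXTENDED flag and BRE
without it, and the 2012 patch quoted above made the extended flavor wget's
posix default.

#include <regex.h>
#include <stdio.h>

/* Standalone check: does 'pattern' match 'text' under the given
   compilation flags? */
static int matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return 0;
    int ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

int main(void)
{
    /* Under ERE, "ab+c" means 'a', one or more 'b', then 'c'.
       Under BRE, '+' is literal, so it means the literal string "ab+c". */
    printf("ERE matches abbbc: %d\n", matches("ab+c", "abbbc", REG_EXTENDED)); /* 1 */
    printf("BRE matches abbbc: %d\n", matches("ab+c", "abbbc", 0));            /* 0 */
    printf("BRE matches ab+c:  %d\n", matches("ab+c", "ab+c", 0));             /* 1 */
    return 0;
}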

Note that, in the context of regular expressions, a special character means a 
character that has a meaning special to regular expressions.  This is not to be 
confused with a special character for bash.  Many characters special to regex 
are also special to bash (but the meanings to regex and the meanings to bash 
may differ).  Thu

"OpenSSL: unimplemented 'secure-protocol' option value 2"

2021-11-29 Thread Danny Tuerlings via Primary discussion list for GNU Wget
Hi,

Please advise. Getting an error "OpenSSL: unimplemented 'secure-protocol' 
option value 2"
(debug file enclosed).

Thanks in advance.
Kind regards,
Danny Tuerlings



Setting --method (method) to POST
Setting --body-file (bodyfile) to c:/temp/upsertcandidate2inforequest.xml
DEBUG output created by Wget 1.21.2 on mingw32.

Reading HSTS entries from C:\Temp\bin\GnuWin32\bin2/.wget-hsts
URI encoding = 'CP1252'
converted 'https://uat-ws-esb.emea.adecco.net:9305/NL/CORE/MAX' (CP1252) -> 
'https://uat-ws-esb.emea.adecco.net:9305/NL/CORE/MAX' (UTF-8)
--2021-11-29 14:48:57--  https://uat-ws-esb.emea.adecco.net:9305/NL/CORE/MAX
OpenSSL: unimplemented 'secure-protocol' option value 2
Please report this issue to bug-wget@gnu.org


Patch for bug 56909

2021-09-07 Thread Aleksander Bułanowski via Primary discussion list for GNU Wget
Hello wget maintainers,

Attached is a patch file that stops sending the Authorization header on
redirects.
This should fix https://savannah.gnu.org/bugs/?56909 / CVE-2021-31879.

Regards,
Aleksander Bułanowski


wget-redirect-auth.patch
Description: Binary data


This version does not have support for IRIs

2021-07-26 Thread Roger Brooks via Primary discussion list for GNU Wget
If I add the option “--local-encoding=UTF-8” to my wget script, wget 1.19.1
(the version on my NAS) says:

“This version does not have support for IRIs”

If I run “wget --help” on my NAS, both “--local-encoding” and
“--remote-encoding” are listed as options.
This error message was reported as a bug against 1.12.x
This error message was reported as a bug against 1.12.x

Is it still a known bug?

Was it fixed between 1.19.1 and 1.21.1?

Am I doing something wrong?

Thanks in advance for your advice.


Unexpected Versioning

2021-07-23 Thread Roger Brooks via Primary discussion list for GNU Wget
With the following wget script I am getting unexpected versioning of the
resulting files:
>>
wget -EkKrNpH \
 --output-file=wget.log \
 --domains=imcz.club,sf.wildapricot.org \
 --exclude-domains=webmail.imcz.club \
 --exclude-directories=calendar,Club-Events,External-Events,Fonts,fonts,Sys \
 --ignore-case \
 --level=1 \
 --no-parent \
 --no-proxy \
 --random-wait \
 --regex-type=pcre \
 --reject=ashx,"overlay*" \
 --reject-regex="calendar[@\?].*|Club-Events[@\?].*|External-Events[@\?].*|event-\d+[@\?].*|/[Ff]onts" \
 --rejected-log=wget-rejected.log \
 --restrict-file-names=windows \
 --wait=1 \
 https://imcz.club/
<<
Some of the downloaded pages have ".1" inserted into the filenames, for no
apparent reason.
Since I am using -r without --no-clobber, I would expect no versioning.
In the case of the above script, a versioned file, "FAQ-Forum.1", is
produced in the absence of any unversioned one:
>>
--2021-07-22 11:03:44--  https://imcz.club/FAQ-Forum
Connecting to imcz.club|34.226.77.200|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://imcz.club/Sys/Login?ReturnUrl=%2fFAQ-Forum [following]
--2021-07-22 11:03:46--  https://imcz.club/Sys/Login?ReturnUrl=%2fFAQ-Forum
Connecting to imcz.club|34.226.77.200|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41667 (41K) [text/html]
Saving to: 'imcz.club/FAQ-Forum.1.html'

 0K .. .. .. ..   100%
225K=0.2s

Last-modified header missing -- time-stamps turned off.
2021-07-22 11:03:47 (225 KB/s) - 'imcz.club/FAQ-Forum.1.html' saved
[41667/41667]
<<
Replacing "--level=2" results in many more versioned files, a few of which
have unversioned counterparts, but most of which do not.
The full version of the script includes login parameters and "--level=4",
but I have posted a simplified version here so others can reproduce the
problem.
Similar problems have been reported in the past:
https://lists.gnu.org/archive/html/bug-wget/2015-01/msg00076.html
https://lists.gnu.org/archive/html/bug-wget/2014-11/msg00321.html
https://lists.gnu.org/archive/html/bug-wget/2014-06/msg00107.html
but the advice in those threads doesn't seem to apply to my case.
I am using the not-so-ancient v1.19.1 of wget.
Thanks for any help!
Regards, Roger



wget bandwidth usage

2021-05-10 Thread Stephen Adams via Primary discussion list for GNU Wget
Hello All,

Thank you for your work on wget. My apologies if this isn't the place to
ask a generic question. For background, I'm using wget 1.19.4 on Ubuntu
18.04.5

I'm curious about wget's usage of the bandwidth available to it.
Specifically:
1 - does it use the entire pipe available?
2 - does it "monitor" how much bandwidth is available during a download and
adjust it's usage accordingly? Meaning I guess, does it share the bandwidth
with other programs or other instances of wget that might start up during a
wget download?
3 - would one expect different download performance if multiple instances
of wget were running simultaneously?
For example would
   "wget file1 &; wget file2 &; wget file3;"
download all 3 files in the same time as
   "wget file1; wget file2; wget file3;"
would sequentially?

Thanks for any insights you can give!

Steve


Re: Wget passes Authorization header cross-domain upon redirect

2021-02-04 Thread Dolev Farhi via Primary discussion list for GNU Wget
hi team,

Is this mailing list the right address for these issues?

On Fri, Jan 22, 2021 at 11:35 PM Dolev Farhi 
wrote:

> hi Wget team!
>
> When making an HTTP GET request with Authorization header, together with
> the follow redirect flag (-L), e.g.:
>
> wget -v --header="Authorization: z==" http://1.1.1.1:8000 -L
>
> If the remote server (1.1.1.1) redirects to 2.2.2.2:8181 (different host
> + port), the Authorization header will be passed to the redirected new host
> on the new port.
>
> 1. Client sends HTTP GET with Authorization header to Server1:8080
> 2. Server1 redirects Client to Server2:8081
> 3. Server2:8081 receives the Authorization header
>
> My understanding is that if the scheme, host, or port differ, then it is a
> different origin, effectively cross-origin, which means the header
> shouldn't be passed on in this case and needs to be stripped?
>
> This is reproducible in the following versions:
>
> GNU Wget 1.21 built on MacOSX
> GNU Wget 1.18 on Ubuntu
>
> cURL apparently experienced the same issue in 2018, described here:
> https://curl.se/docs/CVE-2018-107.html
>
> Thanks!
>
>
>

-- 
Dolev Farhi
Principal Security Engineer | Wealthsimple
www.wealthsimple.com


Wget passes Authorization header cross-domain upon redirect

2021-01-23 Thread dfarhi--- via Primary discussion list for GNU Wget
hi Wget team!

When making an HTTP GET request with Authorization header, together with
the follow redirect flag (-L), e.g.:

wget -v --header="Authorization: z==" http://1.1.1.1:8000 -L

If the remote server (1.1.1.1) redirects to 2.2.2.2:8181 (different host +
port), the Authorization header will be passed to the redirected new host
on the new port.

1. Client sends HTTP GET with Authorization header to Server1:8080
2. Server1 redirects Client to Server2:8081
3. Server2:8081 receives the Authorization header

My understanding is that if the scheme, host, or port differ, then it is a
different origin, effectively cross-origin, which means the header shouldn't
be passed on in this case and needs to be stripped?
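
For illustration, a hedged sketch of the origin comparison proposed above (a
hypothetical helper, not wget's code): forward the Authorization header on a
redirect only when scheme, host, and port all match.

#include <stdbool.h>
#include <strings.h>

/* Hypothetical origin triple. */
struct origin { const char *scheme; const char *host; int port; };

/* Same-origin test as described above: scheme, host and port must all
   match; otherwise the Authorization header should be dropped before
   following the redirect. */
static bool same_origin(const struct origin *a, const struct origin *b)
{
    return strcasecmp(a->scheme, b->scheme) == 0
        && strcasecmp(a->host, b->host) == 0
        && a->port == b->port;
}

/* Example from this report: http://1.1.1.1:8000 redirecting to
   http://2.2.2.2:8181 gives same_origin(...) == false, so the header
   should be stripped. */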

This is reproducible in the following versions:

GNU Wget 1.21 built on MacOSX
GNU Wget 1.18 on Ubuntu

cURL apparently experienced the same issue in 2018, described here:
https://curl.se/docs/CVE-2018-107.html

Thanks!


unsubscribe

2020-10-09 Thread Morris West via Primary discussion list for GNU Wget
 unsubscribe
 On Friday, October 9, 2020, 01:45:31 AM EDT, BAHRI INCELER 
 wrote:  
 
Hello, how are you? I hope you are well.
I would like to say many thanks for wget; you have been saving our lives since 
the 1990's. I really need your help; I have looked everywhere and nothing worked.
I am downloading a file like this:
wget --user xxx --password xxx 
"ftp://x.com/Hanimaganin.Gelinleri%202020.1080p.HDTV.x264.mkv"
and it must be saved exactly as Hanimaganin.Gelinleri%202020.1080p.HDTV.x264.mkv,
but wget decodes the name. I don't want that; I have tried many methods and 
nothing worked. How can I solve this?
I did try everything, like --restrict-file-names, but nothing worked.
Thanks.




  


Re: Download page with scripted table

2020-07-16 Thread Morris West via Primary discussion list for GNU Wget
 Hi,

Might anybody know if there is a better place to ask my question below or know 
where I can get consulting for wget?
I did not see any replies.


Morris

 On Monday, June 22, 2020, 12:13:05 AM EDT, Morris West 
 wrote:  
 
 Hi,

Is it possible for wget to save the page at the link below with the table as 
it appears on the page?  My understanding is that the table is the result of a 
script within the page.  I have not been able to save it with wget.  Any 
direction, insight, and/or the command line would be greatly appreciated!

https://www.benzinga.com/calendar/ratings


Morris
  


Bug?

2019-12-12 Thread Jan Lindfors via Primary discussion list for GNU Wget
Hi!
I was trying to install ROCm and got stuck in the second part:

Add the ROCm apt repository.

For Debian-based systems like Ubuntu, configure the Debian ROCm repository as
follows:

   wget -q0 – http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
   echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list

(is that word "tee" the right word in that place after the last sudo? I think
it should be "make"?)
I did all of that above, and when I tried to do `sudo apt update` from the
third part I got the following messages:

E: Type "gpg" is unknown on line 1 in source list /etc/apt/sources.list.d/rocm.list
E: The list of sources could not be read.

I'm using Ubuntu 18.04.3 LTS, and as you might be able to tell, I'm a newbie
with all this.
If you need more info I'll try to find it for you.
Janne




Re: Problem building/installing wget2

2019-11-06 Thread George R Goffe via Primary discussion list for GNU Wget
Hi,

I went looking for a wget2 home page and found the git repo... YAY!!!

Installed pandoc et al. and followed the directions to build... IT WORKED!

I plan on running the tests and the valgrind checks after the build. You want 
to see the results, right?

By the way, this is a Fedora Core x86_64 (rawhide) system... FULLY upgraded 
except for the kernel which has some serious bugs. Sigh. Wrote a bug report to 
bugzilla.redhat.com but have had NO responses yet. I don't have much luck with 
them and my kernel bug reports. Sigh.

oh well... I'll get back to you guys when I have more results.

THANKS for the cool code and your help.

George... 

On Wednesday, November 6, 2019, 10:53:05 AM PST, Darshit Shah 
 wrote:  
 
 Tim is right. This is an issue that came up with an updated version of Doxygen.
The new version broke our existing Doxygen configuration. You can either
downgrade the version of Doxygen you use or use the git master for Wget2.

I have fixed this issue in git already

* Tim Rühsen  [191106 10:25]:
> Hi George,
> 
> can you make sure you have the latest git master (commit
> a1f3f7bcc59ea071a153fed8288d1d66527e8b9d or later) ?
> 
> Darshit meanwhile fixed the doxygen issue, should work on your Fedora 31
> (?) even without pandoc.
> 
> Regards, Tim
> 
> On 11/6/19 9:50 AM, Tim Rühsen wrote:
> > On 11/6/19 4:03 AM, George R Goffe via Primary discussion list for GNU
> > Wget wrote:
> >> Hi,
> >>
> >> I just tried to build/install wget2 but there are some problems at the end 
> >> of the install related to man pages.
> >>
> >> Here's a copy of the log. 
> >>
> >> Did I do something wrong or is this really a bug?
> > 
> > Hi George,
> > 
> > likely it's a bug coming up in a certain environment. Darshit and I
> > recently discussed a similar issue, but somehow we lost focus...
> > 
> > What version of doxygen do you have installed ?
> > 
> > What if you install pandoc and build again (starting with ./configure ...)
> > 
> > As a work-around, you can skip the docs with
> > ./configure --disable-doc
> > 
> > Regards, Tim
> > 
> 




-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6  


Re: Problem building/installing wget2

2019-11-06 Thread George R Goffe via Primary discussion list for GNU Wget
Guys,

My doxygen is at: doxygen-1.8.16-2.fc32.x86_64
 
Working on pandoc now.

What is git master? Are you talking about the wget2 source git repository? If
you give me the URL for that, I'll update wget2 and try it all again, OK?

Best regards,

George...


On Wednesday, November 6, 2019, 10:53:05 AM PST, Darshit Shah 
 wrote:  
 
 Tim is right. This is an issue that came up with an updated version of Doxygen.
The new version broke our existing Doxygen configuration. You can either
downgrade the version of Doxygen you use or use the git master for Wget2.

I have fixed this issue in git already

* Tim Rühsen  [191106 10:25]:
> Hi George,
> 
> can you make sure you have the latest git master (commit
> a1f3f7bcc59ea071a153fed8288d1d66527e8b9d or later) ?
> 
> Darshit meanwhile fixed the doxygen issue, should work on your Fedora 31
> (?) even without pandoc.
> 
> Regards, Tim
> 
> On 11/6/19 9:50 AM, Tim Rühsen wrote:
> > On 11/6/19 4:03 AM, George R Goffe via Primary discussion list for GNU
> > Wget wrote:
> >> Hi,
> >>
> >> I just tried to build/install wget2 but there are some problems at the end 
> >> of the install related to man pages.
> >>
> >> Here's a copy of the log. 
> >>
> >> Did I do something wrong or is this really a bug?
> > 
> > Hi George,
> > 
> > likely it's a bug coming up in a certain environment. Darshit and I
> > recently discussed a similar issue, but somehow we lost focus...
> > 
> > What version of doxygen do you have installed ?
> > 
> > What if you install pandoc and build again (starting with ./configure ...)
> > 
> > As a work-around, you can skip the docs with
> > ./configure --disable-doc
> > 
> > Regards, Tim
> > 
> 




-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6  


Problem building/installing wget2

2019-11-05 Thread George R Goffe via Primary discussion list for GNU Wget
Hi,

I just tried to build/install wget2 but there are some problems at the end of 
the install related to man pages.

Here's a copy of the log. 

Did I do something wrong or is this really a bug?

Best regards,

George...



mkconfig-wget2.build.log.gz
Description: application/gzip


[Bug-wget] lex compile problem on AIX 7.1

2017-10-02 Thread list
Hi!

I get the following error when compiling wget 1.19.1 on AIX 7.1:

make all-am
CC connect.o
CC convert.o
CC cookies.o
CC ftp.o
lex -ocss.c
0: Warning: 1285-300 The o flag is not valid.
0: Warning: 1285-300 The s flag is not valid.
0: Warning: 1285-300 The s flag is not valid.
0: Warning: 1285-300 The . flag is not valid.

It seems the LEX arguments are not valid?

Any suggestions?

br
Markus