[Bug-wget] [bug #56909] wget Authorization header leak via 3xx redirects

2019-10-04 Thread Darshit Shah
Update of bug #56909 (project wget):

 Privacy: Private => Public 

___

Follow-up Comment #4:

I agree with Tim here that this is not a security issue.

Wget provides an option to use the Authorization header correctly. If the user
instead chooses to coerce Wget into doing something different, we should not
stop them from doing so.

Using `--header=Authorization: ds` means that the user is explicitly opting to
send the header every time, rather than only to a specific domain.

At your request, I'm making this issue public.


___

Reply to this item at:

  <https://savannah.gnu.org/bugs/?56909>

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #56808] wget uses HEAD method when both --spider and --post-data options are used

2019-08-26 Thread Darshit Shah
Follow-up Comment #2, bug #56808 (project wget):

`--post-data` works by setting `opt.method` and `opt.body_data` while
`--spider` works by setting `opt.method`, albeit indirectly.

Now, I believe that it makes absolutely no sense to set both of those in
contradicting ways, especially with methods like `POST` or `PUT` that require
a request body, since a request body is not allowed with a HEAD request.

My suggestion here would be to add a check that prevents _any_ change to
`opt.method` if `--spider` is also passed.

This prevents not only `--post-data`, but also `--method` from setting
something funny that makes no sense.
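
For reference, this is the kind of conflicting invocation the report is about
(the URL is a placeholder):

```
# --spider forces a HEAD request, so the POST body is silently dropped:
wget --spider --post-data="a=b" https://example.com/
```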

___

Reply to this item at:

  <https://savannah.gnu.org/bugs/?56808>

___
  Message sent via Savannah
  https://savannah.gnu.org/




Re: [Bug-wget] Does `wget -q -O /dev/null -S -o- url` ignore response body?

2019-08-12 Thread Darshit Shah
That is precisely what the `--spider` option does: it sends a HEAD request,
just like curl's `--head` option.

If you want it to be more explicit, you can use `--method=HEAD` instead. It
will still do the same thing though.
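
For example, either of these sends a HEAD request, so no response body is
transferred (the URL is a placeholder):

```
wget --spider -S https://example.com/file
wget --method=HEAD -S https://example.com/file
```
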
* Peng Yu  [190812 20:56]:
> curl has the --head option. Is there a reason why wget doesn't have it?
> 
>-I, --head
>   (HTTP  FTP  FILE)  Fetch the headers only! HTTP-servers
> feature the command HEAD which this uses to get nothing but the header
> of a document. When used on an
>   FTP or FILE file, curl displays the file size and last
> modification time only.
> 
> On 8/9/19, Tim Rühsen  wrote:
> > On 09.08.19 18:06, Peng Yu wrote:
> >> Hi,
> >>
> >> I just want to retrieve the response header instead of the response body.
> >>
> >> Does `wget -q -O /dev/null -S -o- url` still download the response
> >> body, but then dump it to /dev/null? Or wget is smart enough to know
> >> the destination is /dev/null so that it will not download the response
> >> body at all? Thanks.
> >
> > /dev/null is just another file.
> >
> > Try with --spider. It will send a HEAD request instead of a GET request
> > - thus no body is downloaded. The server just serves the header as if it
> > was a GET request.
> >
> > Regards, Tim
> >
> >
> 
> 
> -- 
> Regards,
> Peng
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] gnuwget - VU#605641 Vulnerability Report

2019-07-23 Thread Darshit Shah
Hi,

You sent an email to this mailing list encrypted with the GPG key of a
person who is no longer actively involved with the project.

If you'd like to get in touch with the maintainers of GNU Wget regarding a
vulnerability, please contact Tim Ruehsen and Darshit Shah privately.
Our GPG keys can easily be found online, as well as on the GNU keyring.
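
For example, my key can be fetched using the fingerprint in the signature
below:

```
gpg --keyserver keys.gnupg.net --recv-keys 2A1743EDA91A35B6
```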

* CERT Coordination Center  [190723 04:50]:
Error: decryption/verification failed: No secret key
> 



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] [PATCH] Disable automatic wget headers.

2019-05-30 Thread Darshit Shah
;File" + str(index)
> +Files[0].append (WgetFile(file_name, file_content, rules=File_rules))
> +WGET_OPTIONS += header  + (',' if index < headers_len else '"')
> +WGET_URLS[0].append (file_name)
> +
> +Servers = [HTTP]
> +
> +ExpectedReturnCode = 0
> +
> + Pre and Post Test Hooks 
> #
> +pre_test = {
> +"ServerFiles"   : Files
> +}
> +test_options = {
> +"WgetCommands"  : WGET_OPTIONS,
> +"Urls"  : WGET_URLS
> +}
> +post_test = {
> +"ExpectedRetcode"   : ExpectedReturnCode
> +}
> +
> +err = HTTPTest (
> +pre_hook=pre_test,
> +test_params=test_options,
> +post_hook=post_test,
> +protocols=Servers
> +).begin ()
> +
> +exit (err)
> diff --git a/testenv/Test-disable-headers-after.py 
> b/testenv/Test-disable-headers-after.py
> new file mode 100644
> index ..344301a3
> --- /dev/null
> +++ b/testenv/Test-disable-headers-after.py
> @@ -0,0 +1,80 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from test.base_test import HTTP, HTTPS
> +from misc.wget_file import WgetFile
> +
> +"""
> +This test ensures that the --disable-header option removes user 
> headers
> +from the HTTP request when it's placed after --header="header: value".
> +"""
> +# File Definitions 
> ###
> +file_content = """Les paroles de la bouche d'un homme sont des eaux 
> profondes;
> +La source de la sagesse est un torrent qui jaillit."""
> +
> +Headers = {
> +'Authorization',
> +'User-Agent',
> +'Referer',
> +'Cache-Control',
> +'Pragma',
> +'If-Modified-Since',
> +'Range',
> +'Accept',
> +'Accept-Encoding',
> +'Host',
> +'Connection',
> +'Proxy-Connection',
> +'Content-Type',
> +'Content-Length',
> +'Proxy-Authorization',
> +'Cookie',
> +'MyHeader',
> +}
> +
> +WGET_OPTIONS = ''
> +WGET_URLS = [[]]
> +Files = [[]]
> +
> +# Define user defined headers
> +for header in Headers:
> +WGET_OPTIONS += ' --header="' + header + ': any"'
> +
> +WGET_OPTIONS += ' --disable-header="'
> +headers_len = len(Headers)
> +
> +for index, header in enumerate(Headers, start=1):
> +File_rules = {
> +"RejectHeader": {
> +header : 'any'
> +}
> +}
> +file_name = "File" + str(index)
> +Files[0].append(WgetFile(file_name, file_content, rules=File_rules))
> +WGET_OPTIONS += header  + (',' if index < headers_len else '"')
> +WGET_URLS[0].append(file_name)
> +
> +Servers = [HTTP]
> +
> +ExpectedReturnCode = 0
> +
> + Pre and Post Test Hooks 
> #
> +pre_test = {
> +"ServerFiles"   : Files
> +}
> +test_options = {
> +"WgetCommands"  : WGET_OPTIONS,
> +"Urls"  : WGET_URLS
> +}
> +post_test = {
> +"ExpectedRetcode"   : ExpectedReturnCode
> +}
> +
> +err = HTTPTest (
> +pre_hook=pre_test,
> +test_params=test_options,
> +post_hook=post_test,
> +protocols=Servers
> +).begin ()
> +
> +exit (err)
> diff --git a/testenv/Test-disable-headers-before.py 
> b/testenv/Test-disable-headers-before.py
> new file mode 100644
> index ..bc19fda9
> --- /dev/null
> +++ b/testenv/Test-disable-headers-before.py
> @@ -0,0 +1,78 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from test.base_test import HTTP, HTTPS
> +from misc.wget_file import WgetFile
> +
> +"""
> +This test ensures that the --disable-header option doesn't remove 
> user headers
> +from the HTTP request when it's placed before --header="header: value".
> +"""
> +# File Definitions 
> ###
> +file_content = """Les paroles de la bouche d'un homme sont des eaux 
> profondes;
> +La source de la sagesse est un torrent qui jaillit."""
> +
> +Headers = {
> +'Authorization',
> +'User-Agent',
> +'Referer',
> +'Cache-Control',
> +'Pragma',
> +'If-Modified-Since',
> +'Range',
> +'Accept',
> +'Accept-Encoding',

Re: [Bug-wget] informations about patch

2019-05-29 Thread Darshit Shah
Hi,

Sorry it has taken so long; the patch has not been forgotten.
I'm a little too busy with other things right now and haven't had the time to
review it.

I will comment on it as soon as possible. Sorry for the delay.

* adham elkarn  [190529 21:33]:
> 
> 
> Sent from Outlook<http://aka.ms/weboutlook>
> 
> From: adham elkarn
> Sent: Saturday, 18 May 2019 16:32
> To: bug-wget@gnu.org
> Subject: informations about patch
> 
> Hello,
> is there any news about our patch for bug #54769
> (https://savannah.gnu.org/bugs/?54769)?
> 
> Adham EL KARN
> 
> Sent from Outlook<http://aka.ms/weboutlook>
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] /usr/bin/env: invalid option -- 'S'

2019-05-29 Thread Darshit Shah
That's very weird; the shebang line in that file reads:

```
#!/usr/bin/env perl
```

No options are being passed to env there. I'm going to have to take another
look at this later.
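
One guess worth checking: GNU coreutils env only supports the `-S` option
since version 8.30, and your box has 8.26, so any script using the newer
`#!/usr/bin/env -S ...` shebang form would fail with exactly this error:

```
grep -rn '#!/usr/bin/env -S' tests/ testenv/
```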

* Jeffrey Walton  [190529 14:21]:
> Hi Everyone/Tim,
> 
> Debian 9.9:
> 
> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Debian
> Description:Debian GNU/Linux 9.9 (stretch)
> Release:9.9
> Codename:   stretch
> 
> $ make check
> ...
> 
> PASS: Test-ftp-pasv-not-supported.px
> FAIL: Test-https-pfs.px
> FAIL: Test-https-tlsv1.px
> FAIL: Test-https-tlsv1x.px
> FAIL: Test-https-selfsigned.px
> SKIP: Test-https-weboftrust.px
> FAIL: Test-https-clientcert.px
> FAIL: Test-https-crl.px
> PASS: Test-https-badcerts.px
> 
> Trying to run manually:
> 
> $ ./wget-1.20.3/tests/Test-https-pfs.px
> /usr/bin/env: invalid option -- 'S'
> Try '/usr/bin/env --help' for more information.
> 
> And
> 
> $ /usr/bin/env --version
> env (GNU coreutils) 8.26
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Fwd: Fwd: Re: RESEND1: wget-1.20-win32

2019-05-18 Thread Darshit Shah
Hi,

Yes, I get the point. You've sent a couple of examples; please do not keep
sending single links in emails.

As Jernej mentioned, this seems to happen in all cases where the filename is
too long. There seems to be a patch; I will look into it.

Once again, as I have stated in the past, this is a Windows-only issue. Without
access to a Windows machine, it is difficult to debug. Extensive testing on
multiple Linux machines has been unable to reproduce the issue. Filename
scrolling seems to work perfectly on Linux.

* WQ  [190518 10:24]:
> *Also :*
> 
> https://fpdownload.macromedia.com/pub/flashplayer/latest/help/install_flash_player.exe
> 
> 
>  Forwarded Message 
> Subject:  Fwd: Re: [Bug-wget] RESEND1: wget-1.20-win32
> Date: Thu, 16 May 2019 14:31:49 +0200
> From: WQ 
> To:   bug-wget@gnu.org
> 
> 
> 
> *Also :*
> 
> https://saimei.ftp.acc.umu.se/mirror/ipfire.org/releases/ipfire-2.x/2.23-core131/ipfire-2.23.x86_64-full-core131.iso
> 
> 
>  Forwarded Message 
> Subject:  Re: [Bug-wget] RESEND1: wget-1.20-win32
> Date: Sun, 12 May 2019 18:57:38 +0200
> From: WQ 
> To:   bug-wget@gnu.org
> 
> 
> 
> *1.20.3:* (screenshot not archived)
> 
> *1.20:* (screenshot not archived)
> 
> *Please see also what I wrote and the picture in my original mail (below)*
> 
> Thanks
> 
> Walter
> 
> On 12/05/2019 17:45, Darshit Shah wrote:
> > Could you please let us know which sites?
> > 
> > * WQ  [190512 17:41]:
> > > See below !
> > > FYI: The updated 1.20.3 still gives problems with some sites
> > > 
> > > 
> > >  Forwarded Message 
> > > Subject:  wget-1.20-win32
> > > Date: Wed, 17 Apr 2019 01:18:01 +0200
> > > From: WQ
> > > To:   bug-wget@gnu.org
> > > 
> > > 
> > > 
> > > Hi,
> > > 
> > > I'm using wget-1.20-win32 (not wget-1.20.3-win32 because there is a
> > > problem).
> > > 
> > > During download, there is a problem with the "scrolling" name; the last
> > > character of the name is repeated several times:
> > > 
> > > 
> > > 
> > > Kind regards
> > > 
> > > Walter
> > > 
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] RESEND1: wget-1.20-win32

2019-05-18 Thread Darshit Shah
Thanks!

I'll try and see if I can reproduce this for a test.

* Jernej Simončič  [190513 09:27]:
> On Sunday, May 12, 2019, 17:45:31, Darshit Shah wrote:
> 
> > Could you please let us know which sites?
> 
> It happens everywhere, as long as the filename is long enough to
> require scrolling. I received a patch from Bykov Alexey
>  that supposedly fixes this a few days ago, but
> haven't had time to test it yet:
> 
> diff --git a/src/progress.c b/src/progress.c
> index 8e5709c7..6a69b4e2 100644
> --- a/src/progress.c
> +++ b/src/progress.c
> @@ -845,8 +845,8 @@ static int count_cols (const char *mbs) { return (int) 
> strlen(mbs); }
>  static int
>  cols_to_bytes (const char *mbs _GL_UNUSED, const int cols, int *ncols)
>  {
> -  *ncols = cols;
> -  return cols;
> +  *ncols = min(strlen(mbs),cols);
> +  return *ncols;
>  }
>  #endif
>  
> 
> 
> -- 
> < Jernej Simončič ><><><><>< https://eternallybored.org/ >
> 
> Needs are a function of what other people have.
>-- Jones's Principle
> 
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] RESEND1: wget-1.20-win32

2019-05-12 Thread Darshit Shah
Could you please let us know which sites?

* WQ  [190512 17:41]:
> See below !
> FYI: The updated 1.20.3 still gives problems with some sites
> 
> 
>  Forwarded Message 
> Subject:  wget-1.20-win32
> Date: Wed, 17 Apr 2019 01:18:01 +0200
> From: WQ 
> To:   bug-wget@gnu.org
> 
> 
> 
> Hi,
> 
> I'm using wget-1.20-win32 (not wget-1.20.3-win32 because there is a
> problem).
> 
> During download, there is a problem with the "scrolling" name; the last
> character of the name is repeated several times:
> 
> 
> 
> Kind regards
> 
> Walter
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


[Bug-wget] Wget at the GNU Hacker's Meet (GHM), 2019

2019-05-10 Thread Darshit Shah
Hi Everyone,

Do you GNU? Attend the GNU Hackers' Meeting in Madrid this summer!

Twelve years after its first edition in Orense, the GHM is back in Spain!
This time, we will be gathering in the nice city of Madrid for hacking, 
learning and meeting each other.

The GNU Hackers' Meeting is a friendly, semi-formal forum to discuss technical,
social, and organizational issues concerning free software and GNU. This is a
great opportunity to meet GNU maintainers and active contributors.

The GHM will take place at ETSISI, Universidad Politécnica de Madrid, from
Wednesday, 4th September to Friday, 6th September. For more information, visit:
https://www.gnu.org/ghm/upcoming.html

Both Tim and I will be attending the GHM and would love to meet some of our
users and contributors in person.

Have problems using Wget? Or do you want to use our shiny new (alpha) libwget
API for networking operations in your application? We'll be there to help you out!

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] [PATCH] Disable automatic wget headers.

2019-05-06 Thread Darshit Shah
> + Pre and Post Test Hooks 
> #
> +pre_test = {
> +"ServerFiles"   : Files
> +}
> +test_options = {
> +"WgetCommands"  : WGET_OPTIONS,
> +"Urls"  : WGET_URLS
> +}
> +post_test = {
> +"ExpectedRetcode"   : ExpectedReturnCode
> +}
> +
> +err = HTTPTest (
> +pre_hook=pre_test,
> +test_params=test_options,
> +post_hook=post_test,
> +protocols=Servers
> +).begin ()
> +
> +exit (err)
> diff --git a/testenv/Test-disable-headers-after.py 
> b/testenv/Test-disable-headers-after.py
> new file mode 100644
> index ..344301a3
> --- /dev/null
> +++ b/testenv/Test-disable-headers-after.py
> @@ -0,0 +1,80 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from test.base_test import HTTP, HTTPS
> +from misc.wget_file import WgetFile
> +
> +"""
> +This test ensures that the --disable-header option removes user 
> headers
> +from the HTTP request when it's placed after --header="header: value".
> +"""
> +# File Definitions 
> ###
> +file_content = """Les paroles de la bouche d'un homme sont des eaux 
> profondes;
> +La source de la sagesse est un torrent qui jaillit."""
> +
> +Headers = {
> +'Authorization',
> +'User-Agent',
> +'Referer',
> +'Cache-Control',
> +'Pragma',
> +'If-Modified-Since',
> +'Range',
> +'Accept',
> +'Accept-Encoding',
> +'Host',
> +'Connection',
> +'Proxy-Connection',
> +'Content-Type',
> +'Content-Length',
> +'Proxy-Authorization',
> +'Cookie',
> +'MyHeader',
> +}
> +
> +WGET_OPTIONS = ''
> +WGET_URLS = [[]]
> +Files = [[]]
> +
> +# Define user defined headers
> +for header in Headers:
> +WGET_OPTIONS += ' --header="' + header + ': any"'
> +
> +WGET_OPTIONS += ' --disable-header="'
> +headers_len = len(Headers)
> +
> +for index, header in enumerate(Headers, start=1):
> +File_rules = {
> +"RejectHeader": {
> +header : 'any'
> +}
> +}
> +file_name = "File" + str(index)
> +Files[0].append(WgetFile(file_name, file_content, rules=File_rules))
> +WGET_OPTIONS += header  + (',' if index < headers_len else '"')
> +WGET_URLS[0].append(file_name)
> +
> +Servers = [HTTP]
> +
> +ExpectedReturnCode = 0
> +
> + Pre and Post Test Hooks 
> #
> +pre_test = {
> +"ServerFiles"   : Files
> +}
> +test_options = {
> +"WgetCommands"  : WGET_OPTIONS,
> +"Urls"  : WGET_URLS
> +}
> +post_test = {
> +"ExpectedRetcode"   : ExpectedReturnCode
> +}
> +
> +err = HTTPTest (
> +pre_hook=pre_test,
> +test_params=test_options,
> +post_hook=post_test,
> +protocols=Servers
> +).begin ()
> +
> +exit (err)
> diff --git a/testenv/Test-disable-headers-before.py 
> b/testenv/Test-disable-headers-before.py
> new file mode 100644
> index ..bc19fda9
> --- /dev/null
> +++ b/testenv/Test-disable-headers-before.py
> @@ -0,0 +1,78 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from test.base_test import HTTP, HTTPS
> +from misc.wget_file import WgetFile
> +
> +"""
> +This test ensures that the --disable-header option doesn't remove 
> user headers
> +from the HTTP request when it's placed before --header="header: value".
> +"""
> +# File Definitions 
> ###
> +file_content = """Les paroles de la bouche d'un homme sont des eaux 
> profondes;
> +La source de la sagesse est un torrent qui jaillit."""
> +
> +Headers = {
> +'Authorization',
> +'User-Agent',
> +'Referer',
> +'Cache-Control',
> +'Pragma',
> +'If-Modified-Since',
> +'Range',
> +'Accept',
> +'Accept-Encoding',
> +'Host',
> +    'Connection',
> +'Proxy-Connection',
> +'Content-Type',
> +'Content-Length',
> +'Proxy-Authorization',
> +'Cookie',
> +'MyHeader',
> +}
> +
> +WGET_OPTIONS = '--disable-header="'
> +WGET_URLS = [[]]
> +Files = [[]]
> +headers_len = len(Headers)
> +
> +for index, header in enumerate(Headers, start=1):
> +File_rules = {
> +"ExpectHeader": {
> +header : 'any'
> +}
> +}
> +file_name = "File" + str(index)
> +Files[0].append (WgetFile(file_name, file_content, rules=File_rules))
> +WGET_OPTIONS += header  + (',' if index < headers_len else '"')
> +WGET_URLS[0].append (file_name)
> +
> +# Define user defined headers
> +for header in Headers:
> +WGET_OPTIONS += ' --header="' + header + ': any"'
> +
> +Servers = [HTTP]
> +
> +ExpectedReturnCode = 0
> +
> + Pre and Post Test Hooks 
> #
> +pre_test = {
> +"ServerFiles"   : Files
> +}
> +test_options = {
> +"WgetCommands"  : WGET_OPTIONS,
> +"Urls"  : WGET_URLS
> +}
> +post_test = {
> +"ExpectedRetcode"   : ExpectedReturnCode
> +}
> +
> +err = HTTPTest (
> +pre_hook=pre_test,
> +test_params=test_options,
> +post_hook=post_test,
> +protocols=Servers
> +).begin ()
> +
> +exit (err)
> diff --git a/testenv/conf/reject_header_field.py 
> b/testenv/conf/reject_header_field.py
> new file mode 100644
> index ..e1009cdd
> --- /dev/null
> +++ b/testenv/conf/reject_header_field.py
> @@ -0,0 +1,12 @@
> +from conf import rule
> +
> +""" Rule: RejectHeaderField
> +This is a server side rule which expects a string list of Header Fields
> +which should be blacklisted by the server for a particular file's requests.
> +"""
> +
> +
> +@rule()
> +class RejectHeaderField:
> +def __init__(self, header_fields):
> +self.header_fields = header_fields
> diff --git a/testenv/server/http/http_server.py 
> b/testenv/server/http/http_server.py
> index 2cc82fb9..6f358335 100644
> --- a/testenv/server/http/http_server.py
> +++ b/testenv/server/http/http_server.py
> @@ -370,6 +370,14 @@ class _Handler(BaseHTTPRequestHandler):
>  header_line)
>  raise ServerError("Header " + header_line + ' received')
>  
> +def RejectHeaderField(self, header_fields_obj):
> +rej_header_fields = header_fields_obj.header_fields
> +for field in rej_header_fields:
> +if field in self.headers:
> +self.send_error(400, 'Blacklisted Header Field %s received' %
> +field)
> +raise ServerError('Header Field %s received' % field)
> +
>  def __log_request(self, method):
>  req = method + " " + self.path
>  self.server.request_headers.append(req)
> -- 
> 2.21.0
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] [PATCH] Disable automatic wget headers.

2019-05-04 Thread Darshit Shah
gt; +"WgetCommands"  : WGET_OPTIONS,
> +"Urls"  : WGET_URLS
> +}
> +post_test = {
> +"ExpectedRetcode"   : ExpectedReturnCode
> +}
> +
> +err = HTTPTest (
> +pre_hook=pre_test,
> +test_params=test_options,
> +post_hook=post_test,
> +protocols=Servers
> +).begin ()
> +
> +exit (err)
> diff --git a/testenv/Test-disable-headers-after.py 
> b/testenv/Test-disable-headers-after.py
> new file mode 100644
> index ..c0ffc84d
> --- /dev/null
> +++ b/testenv/Test-disable-headers-after.py
> @@ -0,0 +1,77 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from test.base_test import HTTP, HTTPS
> +from misc.wget_file import WgetFile
> +
> +"""
> +This test ensures that the --disable-header option removes user 
> headers
> +from the HTTP request when it's placed after --header="header: value".
> +"""
> +# File Definitions 
> ###
> +file_content = """Les paroles de la bouche d'un homme sont des eaux 
> profondes;
> +La source de la sagesse est un torrent qui jaillit."""
> +
> +Headers = {
> +'Authorization',
> +'User-Agent',
> +'Referer',
> +'Cache-Control',
> +'Pragma',
> +'If-Modified-Since',
> +'Range',
> +'Accept',
> +'Accept-Encoding',
> +'Host',
> +'Connection',
> +'Proxy-Connection',
> +'Content-Type',
> +'Content-Length',
> +'Proxy-Authorization',
> +'Cookie',
> +'MyHeader',
> +}
> +
> +WGET_OPTIONS = ''
> +WGET_URLS = [[]]
> +Files = [[]]
> +
> +# Define user defined headers
> +for header in Headers:
> +WGET_OPTIONS += ' --header="' + header + ': any"'
> +
> +for index, header in enumerate(Headers, start=1):
> +File_rules = {
> +"RejectHeader": {
> +header : 'any'
> +}
> +}
> +file_name = "File" + str(index)
> +Files[0].append(WgetFile(file_name, file_content, rules=File_rules))
> +WGET_OPTIONS += ' --disable-header="' + header + '"'
> +WGET_URLS[0].append(file_name)
> +
> +Servers = [HTTP]
> +
> +ExpectedReturnCode = 0
> +
> + Pre and Post Test Hooks 
> #
> +pre_test = {
> +"ServerFiles"   : Files
> +}
> +test_options = {
> +"WgetCommands"  : WGET_OPTIONS,
> +"Urls"  : WGET_URLS
> +}
> +post_test = {
> +"ExpectedRetcode"   : ExpectedReturnCode
> +}
> +
> +err = HTTPTest (
> +pre_hook=pre_test,
> +test_params=test_options,
> +post_hook=post_test,
> +protocols=Servers
> +).begin ()
> +
> +exit (err)
> diff --git a/testenv/Test-disable-headers-before.py 
> b/testenv/Test-disable-headers-before.py
> new file mode 100644
> index ..d442b008
> --- /dev/null
> +++ b/testenv/Test-disable-headers-before.py
> @@ -0,0 +1,77 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from test.base_test import HTTP, HTTPS
> +from misc.wget_file import WgetFile
> +
> +"""
> +This test ensures that the --disable-header option doesn't remove 
> user headers
> +from the HTTP request when it's placed before --header="header: value".
> +"""
> +# File Definitions 
> ###
> +file_content = """Les paroles de la bouche d'un homme sont des eaux 
> profondes;
> +La source de la sagesse est un torrent qui jaillit."""
> +
> +Headers = {
> +'Authorization',
> +'User-Agent',
> +'Referer',
> +'Cache-Control',
> +'Pragma',
> +'If-Modified-Since',
> +'Range',
> +'Accept',
> +'Accept-Encoding',
> +'Host',
> +'Connection',
> +'Proxy-Connection',
> +'Content-Type',
> +'Content-Length',
> +'Proxy-Authorization',
> +'Cookie',
> +'MyHeader',
> +}
> +
> +WGET_OPTIONS = ''
> +WGET_URLS = [[]]
> +Files = [[]]
> +
> +for index, header in enumerate(Headers, start=1):
> +File_rules = {
> +"ExpectHeader": {
> +header : 'any'
> +}
> +}
> +file_name = "File" + str(index)
> +Files[0].append (WgetFile(file_name, file_content, rules=File_rules))
> +WGET_OPTIONS += ' --disable-header="' + header + '"'
> +WGET_URLS[0].append (file_name)
> +
> +# Define user defined headers
> +for header in Headers:
> +WGET_OPTIONS += ' --header="' + header + ': any"'
> +
> +Servers = [HTTP]
> +
> +ExpectedReturnCode = 0
> +
> + Pre and Post Test Hooks 
> #
> +pre_test = {
> +"ServerFiles"   : Files
> +}
> +test_options = {
> +"WgetCommands"  : WGET_OPTIONS,
> +"Urls"  : WGET_URLS
> +}
> +post_test = {
> +"ExpectedRetcode"   : ExpectedReturnCode
> +}
> +
> +err = HTTPTest (
> +pre_hook=pre_test,
> +test_params=test_options,
> +post_hook=post_test,
> +protocols=Servers
> +).begin ()
> +
> +exit (err)
> diff --git a/testenv/conf/reject_header_field.py 
> b/testenv/conf/reject_header_field.py
> new file mode 100644
> index ..e1009cdd
> --- /dev/null
> +++ b/testenv/conf/reject_header_field.py
> @@ -0,0 +1,12 @@
> +from conf import rule
> +
> +""" Rule: RejectHeaderField
> +This is a server side rule which expects a string list of Header Fields
> +which should be blacklisted by the server for a particular file's requests.
> +"""
> +
> +
> +@rule()
> +class RejectHeaderField:
> +def __init__(self, header_fields):
> +self.header_fields = header_fields
> diff --git a/testenv/server/http/http_server.py 
> b/testenv/server/http/http_server.py
> index 2cc82fb9..6f358335 100644
> --- a/testenv/server/http/http_server.py
> +++ b/testenv/server/http/http_server.py
> @@ -370,6 +370,14 @@ class _Handler(BaseHTTPRequestHandler):
>  header_line)
>  raise ServerError("Header " + header_line + ' received')
>  
> +def RejectHeaderField(self, header_fields_obj):
> +rej_header_fields = header_fields_obj.header_fields
> +for field in rej_header_fields:
> +if field in self.headers:
> +self.send_error(400, 'Blacklisted Header Field %s received' %
> +field)
> +raise ServerError('Header Field %s received' % field)
> +
>  def __log_request(self, method):
>  req = method + " " + self.path
>  self.server.request_headers.append(req)
> -- 
> 2.21.0
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] [PATCH] Disable automatic wget headers.

2019-05-04 Thread Darshit Shah
> +Files[0].append(WgetFile(file_name, file_content, rules=File_rules))
> +WGET_OPTIONS += ' --disable-header="' + header + '"'
> +WGET_URLS[0].append(file_name)
> +
> +Servers = [HTTP]
> +
> +ExpectedReturnCode = 0
> +
> + Pre and Post Test Hooks 
> #####
> +pre_test = {
> +"ServerFiles"   : Files
> +}
> +test_options = {
> +"WgetCommands"  : WGET_OPTIONS,
> +"Urls"  : WGET_URLS
> +}
> +post_test = {
> +"ExpectedRetcode"   : ExpectedReturnCode
> +}
> +
> +err = HTTPTest (
> +pre_hook=pre_test,
> +test_params=test_options,
> +post_hook=post_test,
> +protocols=Servers
> +).begin ()
> +
> +exit (err)
> diff --git a/testenv/Test-disable-headers-before.py 
> b/testenv/Test-disable-headers-before.py
> new file mode 100644
> index ..4356b4c1
> --- /dev/null
> +++ b/testenv/Test-disable-headers-before.py
> @@ -0,0 +1,77 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from test.base_test import HTTP, HTTPS
> +from misc.wget_file import WgetFile
> +
> +"""
> +This test ensures that the --disable-header option doesn't remove 
> user headers
> +from the HTTP request when it's placed before --header="header: value".
> +"""
> +# File Definitions 
> ###
> +file_content = """Les paroles de la bouche d'un homme sont des eaux 
> profondes; 
> +La source de la sagesse est un torrent qui jaillit."""
> +
> +Headers = {
> +'Authorization',
> +'User-Agent',
> +'Referer',
> +'Cache-Control',
> +'Pragma',
> +'If-Modified-Since',
> +'Range',
> +'Accept',
> +'Accept-Encoding',
> +'Host',
> +'Connection',
> +'Proxy-Connection',
> +'Content-Type',
> +'Content-Length',
> +'Proxy-Authorization',
> +'Cookie',
> +'MyHeader',
> +}
> +
> +WGET_OPTIONS = ''
> +WGET_URLS = [[]]
> +Files = [[]]
> +
> +for index, header in enumerate(Headers, start=1):
> +File_rules = {
> +"ExpectHeader": {
> +header : 'any'
> +}
> +}
> +file_name = "File" + str(index)
> +Files[0].append (WgetFile(file_name, file_content, rules=File_rules))
> +WGET_OPTIONS += ' --disable-header="' + header + '"'
> +WGET_URLS[0].append (file_name)
> +
> +# Define user defined headers
> +for header in Headers:
> +WGET_OPTIONS += ' --header="' + header + ': any"'
> +
> +Servers = [HTTP]
> +
> +ExpectedReturnCode = 0
> +
> + Pre and Post Test Hooks 
> #
> +pre_test = {
> +"ServerFiles"   : Files
> +}
> +test_options = {
> +"WgetCommands"  : WGET_OPTIONS,
> +"Urls"  : WGET_URLS
> +}
> +post_test = {
> +"ExpectedRetcode"   : ExpectedReturnCode
> +}
> +
> +err = HTTPTest (
> +pre_hook=pre_test,
> +test_params=test_options,
> +post_hook=post_test,
> +protocols=Servers
> +).begin ()
> +
> +exit (err)
> diff --git a/testenv/conf/reject_header_field.py 
> b/testenv/conf/reject_header_field.py
> new file mode 100644
> index ..e1009cdd
> --- /dev/null
> +++ b/testenv/conf/reject_header_field.py
> @@ -0,0 +1,12 @@
> +from conf import rule
> +
> +""" Rule: RejectHeaderField
> +This is a server side rule which expects a string list of Header Fields
> +which should be blacklisted by the server for a particular file's requests.
> +"""
> +
> +
> +@rule()
> +class RejectHeaderField:
> +def __init__(self, header_fields):
> +self.header_fields = header_fields
> diff --git a/testenv/server/http/http_server.py 
> b/testenv/server/http/http_server.py
> index 2cc82fb9..6f358335 100644
> --- a/testenv/server/http/http_server.py
> +++ b/testenv/server/http/http_server.py
> @@ -370,6 +370,14 @@ class _Handler(BaseHTTPRequestHandler):
>  header_line)
>  raise ServerError("Header " + header_line + ' received')
>  
> +def RejectHeaderField(self, header_fields_obj):
> +rej_header_fields = header_fields_obj.header_fields
> +for field in rej_header_fields:
> +if field in self.headers:
> +self.send_error(400, 'Blacklisted Header Field %s received' %
> +field)
> +raise ServerError('Header Field %s received' % field)
> +
>  def __log_request(self, method):
>  req = method + " " + self.path
>  self.server.request_headers.append(req)
> -- 
> 2.21.0
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Wget 1.20.3 on Solaris

2019-05-02 Thread Darshit Shah
That is a terrible suggestion. Please don't do that, or suggest it.
I'm sure Jeffrey here knows what this is and what the implications of such a
binary are.

However, this is a publicly archived mailing list. People will find this when
they search the web in the future (not necessarily related to the same issue)
and this will break their systems in very confusing ways.

Making a fake binary in /usr/bin is something that should be done _very_ _very_
carefully.

* Tim Rühsen  [190502 14:10]:
> On 5/2/19 10:09 AM, Tim Rühsen wrote:
> > On 5/2/19 10:02 AM, Jeffrey Walton wrote:
> >> On Thu, May 2, 2019 at 4:00 AM Tim Rühsen  wrote:
> >>>
> >>> Hi Jeff,
> >>>
> >>> On 5/1/19 11:38 PM, Jeffrey Walton wrote:
> >>>> On Wed, May 1, 2019 at 3:51 PM Tim Rühsen  wrote:
> >>>>>
> >>>>> could you post e.g. the content of tests/Test-504.log ?
> >>>>
> >>>> Yes, attached.
> >>>>
> >>>> Do you want an account on the box. I keep it around for testing, and I
> >>>> can make you admin. You can connect to it with 'ssh
> >>>> trushen@151.196.22.177'. If so, send over your authorized_keys.
> >>>
> >>> thanks for the offer, but this issue is not Solaris specific.
> >>> But if the need arises, I have access to the OpenCSW Solaris boxes :-)
> >>>
> >>> Please check README.checkout which lists python3 as requirement for
> >>> running the tests in testenv/ (as Darshit also pointed out).
> >>
> >> Ack, thanks.
> >>
> >> Since I got you on the line, how do I disable them. There is no need
> >> to run them if all they are going to do is fail. I did not see a
> >> configure option.
> > 
> > Maybe the easiest way is before you bootstrap / autoconf (in the project
> > main dir):
> > 
> >   sed -i 's/ testenv//g' Makefile.am
> > 
> 
> Another way to do this once and forever:
> create /usr/bin/python3 with content
>   exit 77
> and chmod a+x it.
> 
> If in doubt about the path, consult your $PATH variable.
> 
> Regards, Tim
> 



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Wget 1.20.3 on Solaris

2019-05-02 Thread Darshit Shah
* Tim Rühsen  [190502 10:21]:
> On 5/2/19 10:02 AM, Jeffrey Walton wrote:
> > On Thu, May 2, 2019 at 4:00 AM Tim Rühsen  wrote:
> >>
> >> Hi Jeff,
> >>
> >> On 5/1/19 11:38 PM, Jeffrey Walton wrote:
> >>> On Wed, May 1, 2019 at 3:51 PM Tim Rühsen  wrote:
> >>>>
> >>>> could you post e.g. the content of tests/Test-504.log ?
> >>>
> >>> Yes, attached.
> >>>
> >>> Do you want an account on the box. I keep it around for testing, and I
> >>> can make you admin. You can connect to it with 'ssh
> >>> trushen@151.196.22.177'. If so, send over your authorized_keys.
> >>
> >> thanks for the offer, but this issue is not Solaris specific.
> >> But if the need arises, I have access to the OpenCSW Solaris boxes :-)
> >>
> >> Please check README.checkout which lists python3 as requirement for
> >> running the tests in testenv/ (as Darshit also pointed out).
> > 
> > Ack, thanks.
> > 
> > Since I got you on the line, how do I disable them. There is no need
> > to run them if all they are going to do is fail. I did not see a
> > configure option.
> 
> Maybe the easiest way is before you bootstrap / autoconf (in the project
> main dir):
> 
>   sed -i 's/ testenv//g' Makefile.am
> 
> Regards, Tim
> 

The easiest fix right now would be to install Python3. If that is not
possible, the workaround is the one Tim suggested.

However, this _is_ a bug: testenv/Makefile.am has a `HAVE_PYTHON3` conditional
block, and configure also checks for it.

This is why I am interested in seeing the contents of your config.log file. I
would like to see what went wrong that caused configure to believe you indeed
have Python3 installed.



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Wget 1.20.3 on Solaris

2019-05-01 Thread Darshit Shah
The error seems to arise from a missing python3 binary on the test machine.

I do remember that we added a check for this to our build script to deal with
this issue. I wonder why it is broken on your machine. Could you also please
share the config.log file that is generated?

Anyway, a quick fix for you would be to install Python 3.

* Jeffrey Walton  [190502 00:01]:
> On Wed, May 1, 2019 at 3:51 PM Tim Rühsen  wrote:
> >
> > could you post e.g. the content of tests/Test-504.log ?
> 
> Yes, attached.
> 
> Do you want an account on the box. I keep it around for testing, and I
> can make you admin. You can connect to it with 'ssh
> trushen@151.196.22.177'. If so, send over your authorized_keys.
> 
> Jeff




-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Bug: resources can't be mirrored via FTP if their name starts with a space character

2019-04-08 Thread Darshit Shah

* Tim Rühsen  [190408 09:24]:
> Hello Christian,
> 
> please add options '-d -olog --no-remove-listing' and send us (privately
> if you have secrets in the log file) the files 'log' and .listing.
> 
> A problem with FTP directory listings is that there is no standard. A
> leading space could easily be part of the formatting which is meant to
> be human readable, not machine readable. Assuming one space as delimiter
> likely is a regression for other users.
> 
> In the long term, you should consider HTTPS instead of FTP.
> 

Actually, from the provided data, I can already see what is happening and why. 
Just as Tim mentioned, the listing file doesn't have a fixed standard and uses
whitespace as a field separator. 

Looking at the `.listing` file that Wget downloaded, I can see that there is
just whitespace between the size and the filename. Wget has no way to tell
whether the whitespace is a field separator or part of the filename. The
correct way to deal with this would be for your FTP server to generate a
listing similar to `ls`, wrapping the filename in quotes to show that the
whitespace is indeed part of the filename.
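
For instance, these two lines from the `.listing` excerpt quoted below parse
identically as far as Wget can tell; nothing marks where the separator ends
and the filename begins:

```
-rw-r--r--   1 foobar   foobar  0 Apr  8 00:31 foobar-normal.txt
-rw-r--r--   1 foobar   foobar  0 Apr  8 00:31   foobar-with-leading-spaces.txt
```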

What FTP server are you using?

> Regards, Tim
> 
> On 08.04.19 02:13, Christian Rosentreter wrote:
> > 
> > Hi there,
> > 
> > A small bug I encountered: if any file or directory on a remote machine has 
> > pathnames
> > starting with one or more spaces (e.g. " foobar.txt") then Wget fails to 
> > mirror those
> > resources via the good old FTP protocol. Tested with '1.18' and up-to-date 
> > '1.20.3' on
> > Mac OS X.
> > 
> > If required, I could provide a testing FTP account for this particular 
> > setup to a
> > Wget developer (privately.)
> > 
> > 
> > with kind regards,
> > Christian Rosentreter
> > 
> > 
> > 
> > 
> > 
> > 
> > Example:
> > 
> > wget \
> > --config auth.config \
> > --mirror \
> > --no-host-directories \
> > --append-output wget.log \
> > ftp://wget.annex.binaryriot.org/ 
> > 
> > The test setup on the remote server (up-to-date Ubuntu) looks something 
> > like this, 
> > output via 'ls -l':
> > test/
> > -rw-r--r-- 1 foobar foobar0 Apr  8 00:31 '  
> > foobar-with-leading-spaces.txt'
> > -rw-r--r-- 1 foobar foobar0 Apr  8 00:31 ' foo bar with spaces .txt 
> > '
> > -rw-r--r-- 1 foobar foobar0 Apr  8 00:31  foobar-normal.txt
> > -rw-r--r-- 1 foobar foobar0 Apr  8 00:31 
> > 'foobar-with-trailing-spaces.txt  '
> > drwxr-xr-x 2 foobar foobar 4096 Apr  8 00:31 '  dir-with-leading-spaces'
> > drwxr-xr-x 2 foobar foobar 4096 Apr  8 00:31 'dir-with-trailing-spaces  
> >   '
> > drwxr-xr-x 2 foobar foobar 4096 Apr  8 00:31 '   dir with spaces  '
> > 
> > 
> > The content of the locally generated .listing file inside the "test" 
> > directory,
> > looks like this:
> > -rw-r--r--   1 foobar   foobar  0 Apr  8 00:31   
> > foobar-with-leading-spaces.txt
> > -rw-r--r--   1 foobar   foobar  0 Apr  8 00:31  foo bar with 
> > spaces .txt 
> > -rw-r--r--   1 foobar   foobar  0 Apr  8 00:31 foobar-normal.txt
> > -rw-r--r--   1 foobar   foobar  0 Apr  8 00:31 
> > foobar-with-trailing-spaces.txt  
> > drwxr-xr-x   2 foobar   foobar   4096 Apr  8 00:31   
> > dir-with-leading-spaces
> > drwxr-xr-x   2 foobar   foobar   4096 Apr  8 00:31 
> > dir-with-trailing-spaces
> > drwxr-xr-x   2 foobar   foobar   4096 Apr  8 00:31dir with 
> > spaces  
> > 
> > 
> > The actual locally mirrored directories and files, output via OS X's 'ls - 
> > l' look
> > like this (any resource with leading spaces is obviously missing in this 
> > local copy
> > now. That's a wee-bit bad for backups when important files can be 
> > unexpectedly
> > M.I.A. ;) )
> > test/
> > -rw-r--r--  1  foobar   foobar0 Apr  8 00:31 foobar-normal.txt
> > -rw-r--r--  1  foobar   foobar0 Apr  8 00:31 
> > foobar-with-trailing-spaces.txt  
> > drwxr-xr-x  3  foobar   foobar  102 Apr  8 00:32 
> > dir-with-trailing-spaces
> > 
> > 
> > In the log file (via --append-output) various errors like this are 
> > generated (the
> > leading spaces are missing), but with the return code of the command 
> > wrongly indicating
> > that the full mirror was actually a success in the end:
> > ...
> > No such file 'test/foobar-with-leading-spaces.txt'.
> > No such file 'test/foo bar with spaces .txt '.
> > No such directory 'test/dir-with-leading-spaces'.
> > No such directory 'test/dir with spaces  '.
> > ...
> > 
> > 
> > 
> 



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


[Bug-wget] wget-1.20.3 released [stable]

2019-04-05 Thread Darshit Shah
Please find below the new release of GNU Wget. It fixes a buffer overflow
vulnerability which was reported to us by JPCERT.

Here are the compressed sources and a GPG detached signature[*]:
  https://ftp.gnu.org/gnu/wget/wget-1.20.3.tar.gz
  https://ftp.gnu.org/gnu/wget/wget-1.20.3.tar.gz.sig

Use a mirror for higher download bandwidth:
  https://ftpmirror.gnu.org/wget/wget-1.20.3.tar.gz
  https://ftpmirror.gnu.org/wget/wget-1.20.3.tar.gz.sig

Here are the MD5 and SHA1 checksums:

db4e6dc7977cbddcd543b240079a4899  wget-1.20.3.tar.gz
2b886eab5b97267cc358ab35e42d14d33d6dfc95  wget-1.20.3.tar.gz

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify wget-1.20.3.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 2A1743EDA91A35B6

and rerun the 'gpg --verify' command.

NEWS

* Changes in Wget 1.20.3

** Fixed a buffer overflow vulnerability



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Bugs in make check in wget2 on mac

2019-04-02 Thread Darshit Shah
> > >> ../build-aux/test-driver: line 107: 34502 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_atom_url_fuzzer
> > >> ../build-aux/test-driver: line 107: 34521 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_bar_fuzzer
> > >> ../build-aux/test-driver: line 107: 34540 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_base64_fuzzer
> > >> ../build-aux/test-driver: line 107: 34559 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_cookie_fuzzer
> > >> ../build-aux/test-driver: line 107: 34578 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_css_url_fuzzer
> > >> ../build-aux/test-driver: line 107: 34597 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_hpkp_fuzzer
> > >> ../build-aux/test-driver: line 107: 34616 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_hsts_fuzzer
> > >> ../build-aux/test-driver: line 107: 34635 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_html_url_fuzzer
> > >> ../build-aux/test-driver: line 107: 34654 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_http_client_fuzzer
> > >> ../build-aux/test-driver: line 107: 34673 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_http_parse_fuzzer
> > >> ../build-aux/test-driver: line 107: 34692 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_iri_fuzzer
> > >> ../build-aux/test-driver: line 107: 34711 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_metalink_parse_fuzzer
> > >> ../build-aux/test-driver: line 107: 34730 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_netrc_fuzzer
> > >> ../build-aux/test-driver: line 107: 34749 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_ocsp_fuzzer
> > >> ../build-aux/test-driver: line 107: 34768 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_robots_parse_fuzzer
> > >> ../build-aux/test-driver: line 107: 34787 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_sitemap_url_fuzzer
> > >> ../build-aux/test-driver: line 107: 34806 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_tlssess_fuzzer
> > >> ../build-aux/test-driver: line 107: 34825 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_utils_fuzzer
> > >> ../build-aux/test-driver: line 107: 34844 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: libwget_xml_parse_buffer_fuzzer
> > >> ../build-aux/test-driver: line 107: 34863 Abort trap: 6   "$@" >
> > >> $log_file 2>&1
> > >> FAIL: wget_options_fuzzer
> > >>
> > >>
> > 
> > >> Testsuite summary for wget2 1.99.1
> > >>
> > >>
> > 
> > >> # TOTAL: 20
> > >> # PASS:  0
> > >> # SKIP:  0
> > >> # XFAIL: 0
> > >> # FAIL:  20
> > >> # XPASS: 0
> > >> # ERROR: 0
> > >>
> > >>
> > 
> > >> See fuzz/test-suite.log
> > >> Please report to bug-wget@gnu.org
> > >>
> > >>
> > 
> > >> make[3]: *** [test-suite.log] Error 1
> > >> make[2]: *** [check-TESTS] Error 2
> > >> make[1]: *** [check-am] Error 2
> > >> make: *** [check-recursive] Error 1
> > >>
> > >
> > > Vriendelijke groeten,
> > > Kind regards,
> > >
> > > Dirk Loeckx
> > >
> > >
> > > d...@zeronary.care :: T +32 486 68 38 33 :: zeronary.care
> > > zeronary.care is an initiative of Jomale bvba :: VAT BE 0597.858.312
> > >
> >
> >



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


[Bug-wget] wget-1.20.2 released [stable]

2019-04-01 Thread Darshit Shah
Wget 1.20.2 has been released. There are no major user-facing changes.

This release was made possible thanks to the work of:

André Wolski
Darshit Shah
Jeffrey Walton
Leon Klingele
Nam Nguyen
Simon Dales
Tim Rühsen

Here are the compressed sources and a GPG detached signature[*]:
  https://ftp.gnu.org/gnu/wget/wget-1.20.2.tar.gz
  https://ftp.gnu.org/gnu/wget/wget-1.20.2.tar.gz.sig

Use a mirror for higher download bandwidth:
  https://ftpmirror.gnu.org/wget/wget-1.20.2.tar.gz
  https://ftpmirror.gnu.org/wget/wget-1.20.2.tar.gz.sig

Here are the MD5 and SHA1 checksums:

2692f6678e93601441306b5c1fc6a77a  wget-1.20.2.tar.gz
03869edd390bcca0c5296c7aafb8882b727cac2b  wget-1.20.2.tar.gz

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify wget-1.20.2.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 2A1743EDA91A35B6

and rerun the 'gpg --verify' command.


NEWS

* Changes in Wget 1.20.2

** NTLM authentication will retry under certain cases

** Fixed a buffer overflow vulnerability



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] [Wget-Bug][PATCH] Disable automatic wget headers

2019-03-15 Thread Darshit Shah
Hi Adham,

Please send responses to the mailing list.

* adham elkarn  [190315 18:43]:
> Hi Darnir,
> 
> Thank you for your answer.
> 
> This is my first patch so i thank you also for your patience.
> 
> I understood your implementation but i want to be sure. A command like:
> 
> wget  --disable-header="User-Agent" --disable-header="Accept" is 
> equivalent to:
> 
> wget  --header=User-Agent --header=Accept.

No, that leads to confusion. It should be:

wget  --header="User-Agent: " --header="Accept: "

> 
> Also, would you keep the --no-headers option? I think it allows someone who
> doesn't know what the default request headers are to remove them all.

I am not too keen on it. The default request headers can be mentioned in the
manual. Adding new options incurs a maintenance overhead.
> 
> Of course the previous patch will be modified.
> 
> 
> Sent from Outlook<http://aka.ms/weboutlook>
> 
> 
> 
> From: Darshit Shah 
> Sent: Friday, 15 March 2019 11:18
> To: adham elkarn
> Cc: elkarni...@hotmail.fr; bug-wget@gnu.org
> Subject: Re: [Bug-wget] [Wget-Bug][PATCH] Disable automatic wget headers
> 
> Hi Adham,
> 
> Thanks for working on this bug. However, I am not convinced that this is the
> best way of implementing this feature.
> 
> I would rather that we have a `--disable-header` switch which takes as input
> the names of the headers that should be disabled. This way it is more 
> flexible.
> Simultaneously, it would be nice if you would also implement a parallel 
> feature
> in `--header` where the input of "header-name: " will cause that header to
> not be added to the request headers. This allows users to use the existing
> `--header` switch as well.
> 
> Also, in the current implementation, I would prefer that you disabled the
> generation of the header itself, rather than removing it at a later stage. 
> That
> is just inefficient.
> 
> * adham elkarn  [190315 10:26]:
> > From: a-elk 
> >
> >
> > Disable automatic wget headers.
> >
> > *options.h: added no-headers member
> > *http.c: removed default headers
> > *main.c: added new option noheaders, added help description
> > *init.c: adde new option noheaders
> >
> > From bug #54769 (https://savannah.gnu.org/bugs/?54769).
> > Some servers don't handle some headers well. A --no-headers option will
> > ensure a request does not include the default headers. This option disables
> > the default headers except the Accept and Host headers.
> >
> > Signed-off-by: Moises Torres, Adham El karn
> > ---
> >  src/http.c| 8 
> >  src/init.c| 1 +
> >  src/main.c| 3 +++
> >  src/options.h | 1 +
> >  5 files changed, 14 insertions(+)
> >
> > diff --git a/src/http.c b/src/http.c
> > index 304a2f86..e4bcbf27 100644
> > --- a/src/http.c
> > +++ b/src/http.c
> > @@ -3259,6 +3259,14 @@ gethttp (const struct url *u, struct url 
> > *original_url, struct http_stat *hs,
> > ),
> >  rel_value);
> >
> > +  /* Remove default headers */
> > +  if (opt.no_headers)
> > +{
> > +  int i;
> > +  for (i = 0; i < req->hcount; i++)
> > +  request_remove_header(req, req->headers[i].name);
> > +}
> > +
> >/* Add the user headers. */
> >if (opt.user_headers)
> >  {
> > diff --git a/src/init.c b/src/init.c
> > index 9b6665a6..ae2adeff 100644
> > --- a/src/init.c
> > +++ b/src/init.c
> > @@ -262,6 +262,7 @@ static const struct {
> >{ "netrc",, cmd_boolean },
> >{ "noclobber",, cmd_boolean },
> >{ "noconfig", ,  cmd_boolean },
> > +  { "noheaders",_headers, cmd_boolean},
> >{ "noparent", _parent, cmd_boolean },
> >{ "noproxy",  _proxy,  cmd_vector },
> >{ "numtries", ,  cmd_number_inf },/* 
> > deprecated*/
> > diff --git a/src/main.c b/src/main.c
> > index 65b7f3f3..92f87171 100644
> > --- a/src/main.c
> > +++ b/src/main.c
> > @@ -377,6 +377,7 @@ static struct cmdline_option option_data[] =
> >  { "no", 'n', OPT__NO, NULL, required_argument },
> >  { "no-clobber", 0, OPT_BOOLEAN, "noclobber", -1 },
> >  { "no-config", 0, OPT_BOOLEAN, "noconfig", -1},
> > +{ "no-headers", 0, OPT_BOOLEAN, "noheaders", no_argument},

Re: [Bug-wget] [Wget-Bug][PATCH] Disable automatic wget headers

2019-03-15 Thread Darshit Shah
Hi Adham,

Thanks for working on this bug. However, I am not convinced that this is the
best way of implementing this feature.

I would rather that we have a `--disable-header` switch which takes as input
the names of the headers that should be disabled. This way it is more flexible.
Simultaneously, it would be nice if you would also implement a parallel feature
in `--header` where the input of "header-name: " will cause that header to
not be added to the request headers. This allows users to use the existing
`--header` switch as well.
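
Concretely, the two spellings would be (note that neither exists in any
released Wget yet; the URL is a placeholder):

```
wget --disable-header="User-Agent" https://example.com/
wget --header="User-Agent: " https://example.com/  # empty value suppresses the header
```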

Also, in the current implementation, I would prefer that you disabled the
generation of the header itself, rather than removing it at a later stage. That
is just inefficient.

* adham elkarn  [190315 10:26]:
> From: a-elk 
> 
> 
> Disable automatic wget headers.
> 
> *options.h: added no-headers member
> *http.c: removed default headers
> *main.c: added new option noheaders, added help description
> *init.c: adde new option noheaders
> 
> From bug #54769 (https://savannah.gnu.org/bugs/?54769).
> Some servers don't handle some headers well. A --no-headers option will
> ensure a request does not include the default headers. This option disables
> the default headers except the Accept and Host headers.
> 
> Signed-off-by: Moises Torres, Adham El karn
> ---
>  src/http.c| 8 
>  src/init.c| 1 +
>  src/main.c| 3 +++
>  src/options.h | 1 +
>  5 files changed, 14 insertions(+)
> 
> diff --git a/src/http.c b/src/http.c
> index 304a2f86..e4bcbf27 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -3259,6 +3259,14 @@ gethttp (const struct url *u, struct url 
> *original_url, struct http_stat *hs,
> ),
>  rel_value);
>  
> +  /* Remove default headers */
> +  if (opt.no_headers)
> +{
> +  int i;
> +  for (i = 0; i < req->hcount; i++)
> + request_remove_header(req, req->headers[i].name);
> +}
> +  
>/* Add the user headers. */
>if (opt.user_headers)
>  {
> diff --git a/src/init.c b/src/init.c
> index 9b6665a6..ae2adeff 100644
> --- a/src/init.c
> +++ b/src/init.c
> @@ -262,6 +262,7 @@ static const struct {
>{ "netrc",, cmd_boolean },
>{ "noclobber",, cmd_boolean },
>{ "noconfig", ,  cmd_boolean },
> +  { "noheaders",_headers, cmd_boolean},
>{ "noparent", _parent, cmd_boolean },
>{ "noproxy",  _proxy,  cmd_vector },
>{ "numtries", ,  cmd_number_inf },/* 
> deprecated*/
> diff --git a/src/main.c b/src/main.c
> index 65b7f3f3..92f87171 100644
> --- a/src/main.c
> +++ b/src/main.c
> @@ -377,6 +377,7 @@ static struct cmdline_option option_data[] =
>  { "no", 'n', OPT__NO, NULL, required_argument },
>  { "no-clobber", 0, OPT_BOOLEAN, "noclobber", -1 },
>  { "no-config", 0, OPT_BOOLEAN, "noconfig", -1},
> +{ "no-headers", 0, OPT_BOOLEAN, "noheaders", no_argument},
>  { "no-parent", 0, OPT_BOOLEAN, "noparent", -1 },
>  { "output-document", 'O', OPT_VALUE, "outputdocument", -1 },
>  { "output-file", 'o', OPT_VALUE, "logfile", -1 },
> @@ -1025,6 +1026,8 @@ Recursive accept/reject:\n"),
>-X,  --exclude-directories=LIST  list of excluded directories\n"),
>  N_("\
>-np, --no-parent don't ascend to the parent directory\n"),
> +N_("\
> +   --no-headers   don't include default headers\n"),
>  "\n",
>  N_("Email bug reports, questions, discussions to \n"),
>  N_("and/or open issues at 
> https://savannah.gnu.org/bugs/?func=additem&group=wget.\n")
> diff --git a/src/options.h b/src/options.h
> index 881e2b2e..65055ad8 100644
> --- a/src/options.h
> +++ b/src/options.h
> @@ -147,6 +147,7 @@ struct options
>char *http_user;  /* HTTP username. */
>char *http_passwd;/* HTTP password. */
>char **user_headers;  /* User-defined header(s). */
> +  bool no_headers;  /* Don't include default headers */
>bool http_keep_alive; /* whether we use keep-alive */
>  
>bool use_proxy;   /* Do we use proxy? */
> -- 
> 2.17.1
> 
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Some URLs are saved as gzip-compressed data

2019-03-08 Thread Darshit Shah
Hi Mark,

The server in this case seems to be badly configured. Wget's request to the
server looks like this:

> ---request begin---
> GET /wcsstore/CrucialSAS/firmware/mx100/MU03/MX100_MU03_Update.zip HTTP/1.1
> User-Agent: Wget/1.20.1 (linux-gnu)
> Accept: */*
> Accept-Encoding: identity
> Host: assets.crucial.com
> Connection: Keep-Alive

If you notice here, the encoding requested by Wget is "identity", that is,
please don't make any changes to the file. And yet, the server replies back
with "Content-Encoding: gzip": the server is sending the file in a format that
Wget did not request. Firefox, on the other hand, does send an
"Accept-Encoding" header that contains gzip, so it is able to deal with the
resulting data correctly.

One thing you can do is use the `--compression=gzip` option. This will make
Wget request a gzip encoding, causing the resulting file to be correct.
However, remember this is only a workaround for a broken server, and
`--compression=gzip` is not the default since it doesn't always work as
expected. Sadly, there are too many servers out there that break the spec in
weird ways, and we are unable to support all those cases at this point.
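
For your URL, that would be:

```
wget --compression=gzip https://assets.crucial.com/wcsstore/CrucialSAS/firmware/mx100/MU03/MX100_MU03_Update.zip
```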


* mar...@iname.com  [190308 23:20]:
> I'm testing wget 1.20.1 on Lubuntu 16.04 x86-64.
> 
> I noticed an issue when trying to download a certain URL which is
> a .zip archive. The file created by wget is gzip-compressed. But
> downloading using e.g. Firefox the URL is saved as a .zip file.
> 
> Example:
> 
> $ wget 
> https://assets.crucial.com/wcsstore/CrucialSAS/firmware/mx100/MU03/MX100_MU03_Update.zip
> --2019-03-08 20:24:31--  
> https://assets.crucial.com/wcsstore/CrucialSAS/firmware/mx100/MU03/MX100_MU03_Update.zip
> Resolving assets.crucial.com... 68.232.35.24
> Connecting to assets.crucial.com|68.232.35.24|:443... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 12408650 (12M) [application/zip]
> Saving to: ‘MX100_MU03_Update.zip’
> 
> MX100_MU03_Update.zip  
> 100%[==>]
>   11.83M   738KB/sin 18s 
> 
> 2019-03-08 20:24:49 (680 KB/s) - ‘MX100_MU03_Update.zip’ saved 
> [12408650/12408650]
> 
> $ file MX100_MU03_Update.zip 
> MX100_MU03_Update.zip: gzip compressed data, from Unix
> 
> 
> Mark
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] New to list and program

2019-02-24 Thread Darshit Shah
Hi Annette,

That error would happen if the web server you're trying to backup doesn't like
it. One thing you can do to fix it is to throttle your requests, of course,
this will slow down the entire process. In order to do this, use the options:
"--wait=1 --random-wait".

This will cause Wget to wait between 0.5 and 1.5 seconds between each page it
downloads. That should ideally prevent any "429 Too Many Requests" errors.

If it still causes problems, you could attempt to add the following options as
well:
"--retry-on-http-error=429 --wait-retry=20"
This will cause Wget to wait for a few seconds when the server complains before
resuming its job.

You can also pass the "-k -K" options to Wget so that it converts the links in
the downloaded pages to local links. That is, the links will no longer take you
back to the actual website on the web.

When trying again, you should also use "-c" to ask Wget to try and continue the
download rather than doing it all again. It may or may not help, but it can't
hurt.


So, finally the options you need are:

$ wget -m -c -k -K --wait=1 --random-wait 

* Crusade 36  [190224 17:23]:
> Thank you so much for your reply!
> I did check the: 
> Internet Archive 
> 
> The didn't have most of the board backed up, mostly a surface page, then a 
> link to where the page is on the web.
> I did try as you suggested:
> $ wget -m 
> It worked wonders,  to start, then:
> connected.
> HTTP request sent, awaiting response... 429 Too Many Requests
> 2019-02-24 10:56:46 ERROR 429: Too Many Requests.
> 
> It saved the surface page, some pages, but then the links would take you back 
> to the actual website on the web. 
> 
> I  was wondering if you had any ideas, on what I might be done next. 
> 
> Thank you so much!
> 
> Annette
>  
> 
> On Friday, February 22, 2019 5:59 PM, Darshit Shah  wrote:
>  
> 
>  Hi Annette,
> 
> This is absolutely the perfect place for you to ask questions and get help.
> We'd be glad to help you archive your message board.
> 
> With wget, the basic command you should need is:
> 
> $ wget -m 
> 
> This will invoke Wget in the mirror mode which tries to make a perfect local
> copy of the website. If it doesn't work to your taste, you can come back to us
> and we'll help you to tweak the options to get it just right.
> 
> However, before you attempt to do this, may I suggest you go through The
> Internet Archive (https://archive.org). They might already have a full backup
> of the website which you can browse and even download locally.
> 
> * Crusade 36  [190222 23:53]:
> > Hello,
> > I've subscribed to this list, because it said it was for bugs or for 
> > getting help with the program.
> > To be upfront, I was a participant / owner,  for 17 years on a message 
> > board, first Ezboard, then Yuku, now its own by Taptalk, and it was so bad 
> > we moved the board. But there is 17 years of data, role playing, that I 
> > would love to back up. 
> > 
> > 99.% of the customization of the board is gone, but the data is still 
> > there:Solaris Humanus
> 
> > But have no clue on how to do it. Which is why I am writing for help, I am 
> > computer illiterate,  I know the basic things, but when I looked at the 
> > document page, how to page, and starting reading at the top, I got so 
> > confused not longer after. 
> > 
> > Help would be appreciated, as I would love to back up the entire site, but 
> > at the same time, if I writing this inappropriate, then please forgive me.
> > 
> > Thank you
> > Annette
> > 
> > 
> > 
> > 
> 
> -- 
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
> 
>

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] New to list and program

2019-02-22 Thread Darshit Shah
Hi Annette,

This is absolutely the perfect place for you to ask questions and get help.
We'd be glad to help you archive your message board.

With wget, the basic command you should need is:

$ wget -m 

This will invoke Wget in the mirror mode which tries to make a perfect local
copy of the website. If it doesn't work to your taste, you can come back to us
and we'll help you to tweak the options to get it just right.

However, before you attempt to do this, may I suggest you go through The
Internet Archive (https://archive.org). They might already have a full backup
of the website which you can browse and even download locally.

* Crusade 36  [190222 23:53]:
> Hello,
> I've subscribed to this list, because it said it was for bugs or for getting 
> help with the program.
> To be upfront, I was a participant / owner,  for 17 years on a message board, 
> first Ezboard, then Yuku, now its own by Taptalk, and it was so bad we moved 
> the board. But there is 17 years of data, role playing, that I would love to 
> back up. 
> 
> 99.% of the customization of the board is gone, but the data is still 
> there:Solaris Humanus
> 
> But have no clue on how to do it. Which is why I am writing for help, I am 
> computer illiterate,  I know the basic things, but when I looked at the 
> document page, how to page, and starting reading at the top, I got so 
> confused not longer after. 
> 
> Help would be appreciated, as I would love to back up the entire site, but at 
> the same time, if I writing this inappropriate, then please forgive me.
> 
> Thank you
> Annette
> 
> 
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Unexpected wget -N behaviour for 1.17 onwards?

2019-02-11 Thread Darshit Shah
itting download, even though I knew the file sizes were different
> >> between the server and wget's copy - though the wget man page
> >> explicitly states that if the file sizes do not match, -N will trigger
> >> a download.
> >>
> >> I tried on OpenSUSE 42.3 (wget 1.14) and the incomplete file triggered
> >> a download, even though wgetrc was identical.
> >>
> >> Again, on Leap 15, I compiled 1.20.1 (latest), 1.17.1, and then
> >> finally with 1.16.3 the behaviour went back to what I expected (and I
> >> got my corrupted phone backups fixed).
> >>
> >> Was a bug possibly introduced in 1.17 with the support for 
> >> --if-modified-since?
> >>
> >> Version shipping with OpenSUSE Leap 15:
> >> GNU Wget 1.19.5 built on linux-gnu.
> >> +cares +digest +gpgme +https +ipv6 +iri +large-file +metalink +nls
> >> +ntlm +opie +psl +ssl/openssl
> >>
> >> Last version I tried where "wget -r -N" works as expected:
> >> GNU Wget 1.16.3 built on linux-gnu.
> >> +digest +https +ipv6 -iri +large-file +nls +ntlm +opie +psl +ssl/gnutls
> >>
> >> I'm open to the possibility that there may be something else causing
> >> this bug, I have not found many mentions of it, but then again it is
> >> subtle. You get pretty confident when you just let wget do its thing,
> >> so there may be a lot of incomplete files out there... :)
> >>
> >> Thanks so much for your help. I can provide any other info that would
> >> be helpful.
> >>
> >> Lawrence Wade
> >> Ottawa, Canada
> > 
> 



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] gnulib in wget needs an update

2019-02-10 Thread Darshit Shah
I've updated the gnulib version in the git repos. However, unless there is an
actual problem with Wget crashing, the change will be available only with the
next release.

* LRN  [190210 10:08]:
> This[0] commit in gnulib fixed a critical bug that makes wget crash with
> stackoverflow. The gnulib version in wget is slightly older than that.
> 
> [0]:
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=72e936e89c09bcf1a76479258881d91b0a27003f
> 



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget]

2019-01-24 Thread Darshit Shah
* mich...@cyber-dome.com  [190124 21:40]:
> 
> Hello all,
> 
> I noticed that my WordPress snapshot has missing image.
> It seems that html5 has new attribute of  called srcset.
> 
> Are we downloading the images mentioned in srcset attribute?

As far as I'm aware both Wget and Wget2 know about the srcset attribute, so it
should work just fine.

If you could please share the logs for the recursive retrieval, we could
probably try to find out why the image isn't being downloaded.
> 
> Michael
> 
> 
> 
>

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] WGET2: '--convert-links' breaks from '--html-extension' as well as '--adjust-extension'

2019-01-24 Thread Darshit Shah
Hi,

Thanks for the bug report. Yes, it is a known issue and we'd like to solve it.
However, the problem is that we lack the manpower. There are just one or two of
us working on Wget2 on a regular basis and even then we do it purely in our
free time. So, some of these things take a while to get resolved.

For reference, I opened #415 sometime in November and it is exactly the same as
the issue you opened. We had some ideas, but neither of us had the time to
completely deal with this. Any support in the form of code contributions is
extremely welcome!

* Jeffrey Fetterman  [190124 23:13]:
> If you specify --html-extension or --adjust-extension when downloading a
> page that does not end with an extension (might also be a problem with any
> site that doesn't end in .html), wget2 can't find the file to convert the
> links afterward.
> 
> Can this please get looked into? It's been 3 weeks since I've posted this
> up as an issue on gitlab <https://gitlab.com/gnuwget/wget2/issues/423> and
> there hasn't been any response, I've been having to use the December 12th
> build in the meantime. The build the day after, which was supposed to fix
> an issue related to convert-links, is what broke it.
> 
> At the time, it wasn't a big deal, but there's been a ton of updates since
> then and no word if convert-links is going to be resolved.
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


[Bug-wget] Fwd: wget-1.20 released [stable]

2018-12-03 Thread Darshit Shah
- Forwarded message from Darshit Shah  -

> We are pleased to announce the release of GNU Wget 1.20.
> 
> GNU Wget is a free utility for non-interactive download of files from the Web.
> It supports HTTP(S), and FTP(S) protocols, as well as retrieval through HTTP
> proxies.
> 
> This is a small release with some bugfixes and a few quality of life
> improvements.
> 
> With this announcement, I'd also like to state that from now on, the latest
> tarball of GNU Wget can always be obtained using the following link:
> 
> https://ftp.gnu.org/gnu/wget/wget-latest.tar.gz
> https://ftp.gnu.org/gnu/wget/wget-latest.tar.gz.sig
> 
> Many thanks to everyone who contributed to this release:
> 
> Darshit Shah
> ethus3h
> Jay Satiro
> Josef Moellers
> Kapus, Timotej
> Luiz Angelo Daros de Luca
> Nicholas Sielicki
> Nikos Mavrogiannopoulos
> Noël Köthe
> Rosen Penev
> Tim Rühsen
> Tomas Hozza
> Tomas Korbar
> 
> =
> 
> Here are the compressed sources and a GPG detached signature[*]:
>   https://ftp.gnu.org/gnu/wget/wget-1.20.tar.gz
>   https://ftp.gnu.org/gnu/wget/wget-1.20.tar.gz.sig
> 
> Use a mirror for higher download bandwidth:
>   https://ftpmirror.gnu.org/wget/wget-1.20.tar.gz
>   https://ftpmirror.gnu.org/wget/wget-1.20.tar.gz.sig
> 
> Here are the MD5 and SHA1 checksums:
> 
> 9f1515d083b769e9ff7642ce6016518e  wget-1.20.tar.gz
> 467c0ec7dab302cf1826970c1925999d64a6ee9d  wget-1.20.tar.gz
> 
> [*] Use a .sig file to verify that the corresponding file (without the
> .sig suffix) is intact.  First, be sure to download both the .sig file
> and the corresponding tarball.  Then, run a command like this:
> 
>   gpg --verify wget-1.20.tar.gz.sig
> 
> If that command fails because you don't have the required public key,
> then run this command to import it:
> 
>   gpg --keyserver keys.gnupg.net --recv-keys 2A1743EDA91A35B6
> 
> and rerun the 'gpg --verify' command.
> 
> NEWS
> 
> * Changes in Wget 1.20
> 
> ** Add new option `--retry-on-host-error` to treat local errors as transient,
> so that Wget retries the download after a brief waiting period.
> 
> ** Fixed multiple potential resource leaks as found by static analysis
> 
> ** Wget no longer creates an empty wget-log file when running with the -q and
> -b switches together
> 
> ** When compiled against GnuTLS >= 3.6.3, Wget now has support for TLSv1.3
> 
> ** Now there is support for using libpcre2 for regex pattern matching
> 
> ** When downloading over FTP recursively, one can now use the
> --{accept,reject}-regex switches to fine-tune the downloaded files
> 
> ** Building Wget from the git sources now requires autoconf 2.63 or above.
> Building from the Tarballs works as it used to.


-- 
On Behalf of the maintainers of GNU Wget,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


[Bug-wget] wget-1.20 released [stable]

2018-12-03 Thread Darshit Shah
We are pleased to announce the release of GNU Wget 1.20.

GNU Wget is a free utility for non-interactive download of files from the Web.
It supports HTTP(S), and FTP(S) protocols, as well as retrieval through HTTP
proxies.

This is a small release with some bugfixes and a few quality of life
improvements.

With this announcement, I'd also like to state that from now on, the latest
tarball of GNU Wget can always be obtained using the following link:

https://ftp.gnu.org/gnu/wget/wget-latest.tar.gz
https://ftp.gnu.org/gnu/wget/wget-latest.tar.gz.sig

Many thanks to everyone who contributed to this release:

Darshit Shah
ethus3h
Jay Satiro
Josef Moellers
Kapus, Timotej
Luiz Angelo Daros de Luca
Nicholas Sielicki
Nikos Mavrogiannopoulos
Noël Köthe
Rosen Penev
Tim Rühsen
Tomas Hozza
Tomas Korbar

=

Here are the compressed sources and a GPG detached signature[*]:
  https://ftp.gnu.org/gnu/wget/wget-1.20.tar.gz
  https://ftp.gnu.org/gnu/wget/wget-1.20.tar.gz.sig

Use a mirror for higher download bandwidth:
  https://ftpmirror.gnu.org/wget/wget-1.20.tar.gz
  https://ftpmirror.gnu.org/wget/wget-1.20.tar.gz.sig

Here are the MD5 and SHA1 checksums:

9f1515d083b769e9ff7642ce6016518e  wget-1.20.tar.gz
467c0ec7dab302cf1826970c1925999d64a6ee9d  wget-1.20.tar.gz

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify wget-1.20.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 2A1743EDA91A35B6

and rerun the 'gpg --verify' command.

NEWS

* Changes in Wget 1.20

** Add new option `--retry-on-host-error` to treat local errors as transient,
so that Wget retries the download after a brief waiting period.

** Fixed multiple potential resource leaks as found by static analysis

** Wget no longer creates an empty wget-log file when running with the -q and
-b switches together

** When compiled against GnuTLS >= 3.6.3, Wget now has support for TLSv1.3

** Now there is support for using libpcre2 for regex pattern matching

** When downloading over FTP recursively, one can now use the
--{accept,reject}-regex switches to fine-tune the downloaded files

** Building Wget from the git sources now requires autoconf 2.63 or above.
Building from the Tarballs works as it used to.


-- 
On Behalf of the maintainers of GNU Wget,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature
-- 
If you have a working or partly working program that you'd like
to offer to the GNU project as a GNU package,
see https://www.gnu.org/help/evaluation.html.

Re: [Bug-wget] "Referer" when using spider mode

2018-11-27 Thread Darshit Shah
Hi Fernando,

Once again, the answer is quite the same. You could parse the --debug output of
Wget to do this. Though, remember, parsing the debug output is not always safe
since we may change it at any point.
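
If you want to experiment, a rough (and version-dependent) sketch is to pair
each request line with its Referer header from the debug output:

$ wget -r --spider -d http://example.com/ 2>&1 | grep -E '^(GET|HEAD|Referer:)'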

However, even for this case, I guess, using Wget2 is a better choice for you.

https://gitlab.com/gnuwget/wget2

* Fernando Gont  [181127 13:08]:
> Folks,
> 
> I'm using wget in a script to find broken and "moved" links in a web site.
> 
> My problem is that, when parsing the output of "wget --spider", I cannot
> tell which page triggered the retrieval of a URL (i.e., the "referer" of
> such URL) -- so, while I can find that there are broken links, I cannot
> easily tell which page contains the broken link.
> 
> Any clues on how to obtain such info?
> 
> Thanks!
> Fernando
> 
> 
> 
> 
> -- 
> Fernando Gont
> SI6 Networks
> e-mail: fg...@si6networks.com
> PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492
> 
> 
> 
> 
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Check external reference, but don't process further

2018-11-27 Thread Darshit Shah
Hi Fernando,

As far as I'm aware there is no way to limit the recursion depth only on
foreign hosts. Something like this would definitely be a lot easier to do using
Wget2 which offers a few more powerful tools than Wget does. Wget2's alpha is
currently available in the Debian repositories and Arch Linux's AUR.

If you'd still like to continue using Wget, one way to pull this off would be
to have Wget print its debug output and then parse that to extract all the URIs
on foreign hosts. You can then have a second invocation of Wget to test for
their existence. An example of doing this would be:

$ wget -r --spider -d example.com 2>&1 | grep -B1 "This is not the same hostname as 
the parent's" | grep "Deciding whether to enqueue" | sed 
's/.*\"\(.*\)\"\./\1/g' | wget --spider -i-

Of course, you may want to modify this to meet your own needs, but the general
idea should work for you

* Fernando Gont  [181127 13:08]:
> Folks,
> 
> I'm using wget in a script to check for broken links in a web site,
> which uses the "--spider" mode.
> 
> I'd like wget to operate in recursive mode for pages in the target
> domain, but not for pages in other hosts/sites.
> 
> That is, if I'm crawling www.example.com, I'd like wget to process all
> pages in that domain recursively. However, if there's a link to an
> external site, I just want wget to check that URL, but not process that
> external reference recursively.
> 
> "-D" would seem to prevent checking external references, so I cannot use
> it. And "--level" would mean that pages on external sites my still be
> processed recursively.
> 
> Any advice on how to implement this?
> 
> Thanks!
> 
> Cheers,
> Fernando
> 
> 
> 
> 
> -- 
> Fernando Gont
> SI6 Networks
> e-mail: fg...@si6networks.com
> PGP Fingerprint:  31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492
> 
> 
> 
> 
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


[Bug-wget] [bug #52705] HTML assets embedding with --page-requisites

2018-11-12 Thread Darshit Shah
Update of bug #52705 (project wget):

  Status:None => Wont Fix   
 Open/Closed:Open => Closed 


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #52986] Does not compile due to Python

2018-11-12 Thread Darshit Shah
Update of bug #52986 (project wget):

 Open/Closed:Open => Closed 


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #53020] wget -b produces empty wget-log file

2018-11-12 Thread Darshit Shah
Update of bug #53020 (project wget):

 Open/Closed:Open => Closed 


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #53191] wget doesn't unzip data even with an Encoding:gzip header

2018-11-12 Thread Darshit Shah
Update of bug #53191 (project wget):

  Status:None => Needs Discussion   
 Open/Closed:Open => Closed 

___

Follow-up Comment #1:

This has been fixed in a past update where we changed the default to not
ask for gzip.

There are quite a few issues with various servers behaving differently and so
we need a little time to figure out all these cases.

___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-11-12 Thread Darshit Shah
Update of bug #53322 (project wget):

  Status:None => Invalid
 Open/Closed:Open => Closed 


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #53750] Regex are ignored with ftp

2018-11-12 Thread Darshit Shah
Update of bug #53750 (project wget):

  Status:   Confirmed => Ready for Merge
 Assigned to:None => darnir 
Operating System:  Mac OS => None   
 Planned Release:None => 1.20   

___

Follow-up Comment #2:

I've sent a patch to the mailing lists that adds this support. It should be
available in the next release.

___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #53818] Proposal: Check HTML suffix (for TEXTHTML flag) also on unchanged files

2018-11-12 Thread Darshit Shah
Update of bug #53818 (project wget):

  Status:None => Inspected  
 Planned Release:None => 1.20   


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #54126] Wget keeps crashing in Windows sometimes when the filename is large enough to scroll it

2018-11-12 Thread Darshit Shah
Update of bug #54126 (project wget):

 Assigned to:None => darnir 
 Planned Release:None => 1.21   

___

Follow-up Comment #1:

Thanks for the report! I'll take a look at the issue and see if I can spot any
problems. Seems like this only occurs under Windows, which makes it slightly
harder to reproduce.

___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #53884] Can wget (and any GNU projects) avoid spitting non-ASCII on the command line???

2018-11-12 Thread Darshit Shah
Update of bug #53884 (project wget):

  Status:None => Invalid
 Open/Closed:Open => Closed 


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




Re: [Bug-wget] Debugging issues

2018-11-10 Thread Darshit Shah
[Moving to wget-dev]

You should never touch any generated files. If you want debug symbols, set the
appropriate CFLAGS environment variable before running configure:

$ ./bootstrap
$ CFLAGS="-g -ggdb3 -O0" ./configure
$ make

Also, I don't use Netbeans (or any other IDE), but I don't understand why you
would have so many projects. You should just need one rooted at the top
directory of the repo.


* mich...@cyber-dome.com  [181110 10:13]:
> 
> Hello all,
> 
> I managed to get my Netbeans working! (I created several projects for each
> directory. Too much visual studio solutions)
> 
> Now, I changed the "gcc" to "gcc -g" In the make files.
> 
> When I attempt to run a debug run, I get the error:
> "wget2/netbeans/src/.libs/wget2: symbol lookup error:
> /home/mik/wget2/netbeans/src/.libs/wget2: undefined symbol: wget_strerror"
> 
> Can you salvage me on this?
> 
> Do you have a way to run the make with debugging information generated?
> (without changing the Makefile)?
> 
> For instance: 
> 
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Timestamping vs incomplete downloads

2018-10-23 Thread Darshit Shah
On October 22, 2018 11:49:12 PM UTC, Dave Warren  wrote:
>Currently when a download with timestamping enabled gets interrupted, 
>the timestamp of the resulting file ends up being the current time and 
>when wget is re-executed after connectivity is restored the local file 
>is then seen as newer and skipped.
>
>robocopy handles this a little differently, by setting a date far in
>the 
>past as a way of ensuring that on a subsequent execution the transfer 
>can be resumed.
>
>Is there a better way to handle this situation in wget? A way to force 
>an old date on the file? I'd be happy with a fixed "in the past" date, 
>the service supplied date minus a second, etc. Or some way to detect 
>that the file is incomplete (too small) on a subsequent run?

I haven't tested it but what you say indeed sounds like a valid bug.

The cleanest approach, IMO, is to use the extended file attributes in modern 
systems to store this time at the very beginning and look for it on 
continuation. Setting the time in the past doesn't work since every packet that 
is written will once again update the last modified time. Setting the time 
after each write() is not a feasible solution. What you suggest can only work 
when the client gets a clean exit in the face of an interruption and this isn't 
always the case. 
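
Purely as an illustration of the xattr idea (the attribute name below is
invented for this example, not something Wget uses):

$ setfattr -n user.server_mtime -v 1540230000 download.part
$ getfattr -n user.server_mtime download.part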
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [Bug-wget] exit status problem with pipe

2018-10-22 Thread Darshit Shah
This is difficult to do correctly for network applications. Is there any
particular reason you're looking for identifying if Wget was killed by SIGPIPE?

The problem here is that a closed socket will also result in a SIGPIPE on unix
systems. But in such a case, we don't want to kill the application. As a
result, we simply ignore SIGPIPE. So when Wget dies in the command you shared,
it's not really killed by SIGPIPE. Hence, in theory, it would be incorrect for
Wget to exit with a code of 141.

* Peng Yu  [181022 16:29]:
> Hi,
> 
> wget returns the following exit code when it is dealing with pipe. But
> it does not follow the common practice. Should this behavior be fixed?
> 
> $ wget -qO- http://httpbin.org/get | echo
> 
> $ echo ${PIPESTATUS[@]}
> 3 0
> $ seq 10 | echo
> 
> $ echo ${PIPESTATUS[@]}
> 141 0
> 
> -- 
> Regards,
> Peng
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Recommendations for adding log statements after checking setsockopt()

2018-10-15 Thread Darshit Shah
Hi,

Thanks for the analysis! However, the code that you have identified comes from
gnulib, which is essentially a statically linked library. As a result code in
there is out of bounds for us to change.

Also, if you look closely, the function you've pointed to is
"rpl_setsockopt", that is, a replacement function for systems where
setsockopt doesn't behave in a sane manner. Hence, all the invocations of
setsockopt are indeed checked and logged.

* niuxu <20121...@cqu.edu.cn> [181015 11:56]:
> Our team works on enhance logging practices by learning from historical log 
> revisions in evolution.
> We find that 2 patches have added validation code about the return value of 
> setsockopt() along with logging statements. 
> 
> 
> So we suggest that the return value of setsockopt() should be checked and 
> logged if the check pass.
> 
> And, we find 1 missed spot in line 35 of wget-1.19.2/lib/setsockopt.c:
> int
> rpl_setsockopt (int fd, int level, int optname, const void *optval, socklen_t 
> optlen)
> {
>   ...
>   if (level == SOL_SOCKET
>   && (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
>  {
> const struct timeval *tv = optval;
> int milliseconds = tv->tv_sec * 1000 + tv->tv_usec / 1000;
> optval = &milliseconds;
> r = setsockopt (sock, level, optname, optval, sizeof (int));
>  }
>   else
>  {
> r = setsockopt (sock, level, optname, optval, optlen);
>  }
>   if (r < 0)
>  set_winsock_errno ();
> 
>   return r;
> }
> 
> And the 2 patches that support us are:
> 1) In line 334 of File: wget-1.18/src/connect.c
>  if (opt.limit_rate && opt.limit_rate < 8192)
>  {
>int bufsize = opt.limit_rate;
>if (bufsize < 512)
>  bufsize = 512;  /* avoid pathologically small values */
>  #ifdef SO_RCVBUF
> -  setsockopt (sock, SOL_SOCKET, SO_RCVBUF,
> -  (void *)&bufsize, (socklen_t)sizeof (bufsize));
> +  if (setsockopt (sock, SOL_SOCKET, SO_RCVBUF,
> +  (void *) &bufsize, (socklen_t) sizeof (bufsize)))
> +logprintf (LOG_NOTQUIET, _("setsockopt SO_RCVBUF failed: %s\n"),
> +   strerror (errno));
>  #endif
> 
> 2) In line 474 of File:  wget-1.18/src/connect.c
>sock = socket (bind_address->family, SOCK_STREAM, 0);
>if (sock < 0)
>  return -1;
>  
>  #ifdef SO_REUSEADDR
> -  setsockopt (sock, SOL_SOCKET, SO_REUSEADDR, setopt_ptr, setopt_size);
> +  if (setsockopt (sock, SOL_SOCKET, SO_REUSEADDR, setopt_ptr, setopt_size))
> +logprintf (LOG_NOTQUIET, _("setsockopt SO_REUSEADDR failed: %s\n"),
> +   strerror (errno));
>  #endif
> 
> Thanks for your reading and we are looking forward to your reply about the 
> correctness of our suggestion.
> May you a good day! ^^
> 
> Best Regards,
> Xu

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


[Bug-wget] [bug #54828] wget stalled download shows wrong speed per second

2018-10-12 Thread Darshit Shah
Update of bug #54828 (project wget):

  Status:None => Invalid
 Open/Closed:Open => Closed 

___

Follow-up Comment #1:

This is actually two different issues.

Anyways, firstly, the speed being stuck is a technical limitation of Wget
being a single threaded application. Wget uses only blocking network sockets
and while it is blocked on a read() call, there is nothing it can do to update
the UI. Similarly, mentioning that it is stalled is also not possible without
adding threading support. Something that we do not intend to do. These issues
don't exist in Wget2 which has been designed with multi-threading from the
very beginning.

Regarding the "(Success)" string, that is not controlled by Wget at all. It is
in fact the string reported by the kernel for the last error. In your case, it
seems to be that the last socket operation was a success, but the connection
was still terminated. 

___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #54825] unexpected wget appends .1 after the file extension

2018-10-12 Thread Darshit Shah
Update of bug #54825 (project wget):

  Status:None => Wont Fix   
 Open/Closed:Open => Closed 

___

Follow-up Comment #3:

EDIT: Will fix in Wget2.

___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #54826] too much output on wget --version

2018-10-12 Thread Darshit Shah
Update of bug #54826 (project wget):

  Status:None => Wont Fix   
 Open/Closed:Open => Closed 

___

Follow-up Comment #3:

Sure, you've shown that a bunch of other programs don't dump the compile time
options. But that's not a rationale for removing it. I still don't see a
reason why I should remove it now that it is already in there. We've had this
output for ~10 years now and as a developer, I know it's been helpful on
occasion when debugging something for a user.

I'm closing this bug for now. If you have a good reason for why it should not
be there, apart from, "I don't like it", or "the others don't do it", please
feel free to re-open.

___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #54825] unexpected wget appends .1 after the file extension

2018-10-12 Thread Darshit Shah
Follow-up Comment #1, bug #54825 (project wget):

I kind of agree here. Over time Wget has had the --no-clobber option do many
things and trying to preserve backwards compatibility has only complicated
everything. It would be ideal to have a --force option which causes Wget to
overwrite the file. You can currently do that by explicitly specifying the
filename using -O.


However, all new features are currently being added only to Wget2, which is
the next version of Wget, with (almost) complete command-line parity. We are
also making some backwards-incompatible modifications which make changes like
the one you've proposed a lot easier. Please take a look at the
source available here: https://www.gitlab.com/gnuwget/wget2.git. It is also
available on Savannah and has been packaged for Debian already.

___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #54826] too much output on wget --version

2018-10-12 Thread Darshit Shah
Update of bug #54826 (project wget):

Severity:  3 - Normal => 1 - Wish   

___

Follow-up Comment #1:

Why exactly? As a developer I like that information in the --version output
since it gives me clear information about the build when trying to debug an
issue remotely.

And I don't see how a little extra information in the --version output harms
anything at all. Do you have any concrete reasons apart from a personal
preference?
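
For scripting, the first line alone is easy to extract, e.g.:

$ wget --version | head -n1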

___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




Re: [Bug-wget] Hello again

2018-10-11 Thread 'Darshit Shah'
* mich...@cyber-dome.com  [181009 17:12]:
> 
> Hello Darshit Shah,
> 
> Thank you for your welcome message. I am glad to be part of your project!
> 
> I don't understand the term "javascript engine". AFAK javascript is code that 
> run on the browser side, and we have no problem fetching it.
>
Exactly! Javascript is code that is executed on the client side and hence
requires a javascript engine which interprets the code and executes it.
However, Wget does not and will not package a javscript engine in order to run
those scripts. This means, sites where Javascript is used to create hyperlinks
won't work well when scraped through Wget.
> 
> There might be an "ajax" issues with sites rely on it. Ajax is dealt heavy by 
> programmers and they will have to take some action on their site to 
> incorporate the engine.

Similarly, sites that use Javascript to show menus or create AJAX requests are
usually not amenable to being scraped as a static HTML page.
> 
> POST requests to comments and mail will need to taken care of so they will 
> work on static site. One solution is to do hosted supplier that will carry 
> the task and deliver spam removal as well.
> I think I will be able to a howto document on that.
> 
> Michael
> 
> -Original Message-
> From: Darshit Shah  
> Sent: Tuesday, 9 October, 2018 2:52 PM
> To: mich...@cyber-dome.com
> Cc: bug-wget@gnu.org
> Subject: Re: [Bug-wget] Hello again
> 
> Hi Michael,
> 
> Nice to hear from you again. I vaguely remember a mention of someone who 
> wanted
> to work on this feature. When deciding to make this work, please remember that
> any of this can only work if the site does not rely on Javascript; which given
> Wordpress is a difficult thing. The reason for this is that we do _not_ intend
> to ship a javascript engine alongwith Wget2. It is too large, unwieldy and too
> much of a maintenance nightmare. However, if the site can work without
> Javascript, then I would assume that Wget2 can already handle making a static
> copy. If it can't handle something, please let us know / file a bug report
> about it.
> 
> Of course, I welcome you to work on Wget2 as you see fit. And we would love to
> look at any contributions you can make. We will also try and help you out as
> much as possible when dealing with the codebase.
> 
> About the dev setup, I only use vim and gdb to work with Wget. As Tim has
> already mentioned, he uses Netbeans and might be able to help you out.
> 
> You also mentioned something about the lib/ directory. That is an
> auto-generated dir with compatibility libs that you don't need to care about.
> All the code for Wget2 is in src/ and the code for the library is in libwget/.
> Those are the two main directories you need to care about. And of course 
> tests/
> for the tests.
> 
> * mich...@cyber-dome.com  [181008 21:22]:
> > 
> > Hello again,
> > 
> > My name is Michael. I have approached you about a year ago.
> > 
> > I am interested in making wget2 a tool that can convert content management
> > systems (like WordPress) output to HTML. This actually limits the content
> > management system to generate the website every time it is changed, and the
> > presentation is done using the HTTP server only.
> > 
> > This is an important feature as it prevents security risk - penetration of
> > hacker to the site and installing viruses or stealing data.
> > It also allows the website to be delivered much faster as no PHP code needs
> > to run in order to deliver the content. Google already announced that site
> > download speed is a factor in its SEO evaluation.
> > 
> > I will be able to work for 3 hours every week on the project. I do need some
> > guidance from you.
> > 
> > I have started to configure Netbeans IDE as using a debugger can help me
> > delve into the code much faster. There are some issues with the Netbeans. Do
> > you use Id? Which one?
> > 
> > Best regards,
> > 
> > Michael
> > 
> > 
> > 
> > 
> 
> -- 
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Header line removal behavior

2018-10-09 Thread Darshit Shah
Hi Gregory,

What you've mentioned is indeed the intended behaviour.

* Gregory R Fellow  [181001 08:55]:
> Hi. Is it the intended behavior for wget to allow sending custom header
> lines with no value?
>
> 
> The following clears previous user-defined headers as described in the
> documentation:
> --header=

Yes, as you've correctly mentioned, this case is mentioned in the documentation
as a way to remove any previously defined headers.
> 
> The following both send a header with no value:
> --header="Accept-Encoding:"
> --header="Accept-Encoding: "
>
These are both accepted by Wget and they will send the exact same HTTP request.
This is because, both of them are considered equivalent _and_ valid according
to the HTTP/1.1 Spec in RFC 7230. The spec clearly mentions that a leading
whitespace is okay and so is an empty Header value.
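
You can verify what actually goes out on the wire via the debug output, e.g.:

$ wget -d --header="Accept-Encoding:" -O /dev/null http://example.com/ 2>&1 | grep -i accept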

> 
> This gets an "Invalid header" error from wget:
> --header="Accept-Encoding"

Of course. Because it is an invalid header according to the HTTP spec :). Per
RFC 7230, a header _must_ have the form: ": ". The
colon being immediately after the header-name is a hard requirement.
> 
> I noticed this behavior while trying to disable some of wget's automatically
> generated headers (which apparently isn't possible except for User-Agent via
> --user-agent=).

In Wget, you can't disable the headers, but you can indeed overwrite them to
something else or even make them empty. The real question here is, why do you
want to disable some headers? Wget only generates headers that are absolutely
necessary for the request to be handled correctly.
> 
> Thanks in advance for your help.
> 
> Greg
> 
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Hello again

2018-10-09 Thread Darshit Shah
Hi Michael,

Nice to hear from you again. I vaguely remember a mention of someone who wanted
to work on this feature. When deciding to make this work, please remember that
any of this can only work if the site does not rely on Javascript; which given
Wordpress is a difficult thing. The reason for this is that we do _not_ intend
to ship a javascript engine alongwith Wget2. It is too large, unwieldy and too
much of a maintenance nightmare. However, if the site can work without
Javascript, then I would assume that Wget2 can already handle making a static
copy. If it can't handle something, please let us know / file a bug report
about it.

Of course, I welcome you to work on Wget2 as you see fit. And we would love to
look at any contributions you can make. We will also try and help you out as
much as possible when dealing with the codebase.

About the dev setup, I only use vim and gdb to work with Wget. As Tim has
already mentioned, he uses Netbeans and might be able to help you out.

You also mentioned something about the lib/ directory. That is an
auto-generated dir with compatibility libs that you don't need to care about.
All the code for Wget2 is in src/ and the code for the library is in libwget/.
Those are the two main directories you need to care about. And of course tests/
for the tests.

* mich...@cyber-dome.com  [181008 21:22]:
> 
> Hello again,
> 
> My name is Michael. I have approached you about a year ago.
> 
> I am interested in making wget2 a tool that can convert content management
> systems (like WordPress) output to HTML. This actually limits the content
> management system to generate the website every time it is changed, and the
> presentation is done using the HTTP server only.
> 
> This is an important feature as it prevents security risk - penetration of
> hacker to the site and installing viruses or stealing data.
> It also allows the website to be delivered much faster as no PHP code needs
> to run in order to deliver the content. Google already announced that site
> download speed is a factor in its SEO evaluation.
> 
> I will be able to work for 3 hours every week on the project. I do need some
> guidance from you.
> 
> I have started to configure Netbeans IDE as using a debugger can help me
> delve into the code much faster. There are some issues with the Netbeans. Do
> you use Id? Which one?
> 
> Best regards,
> 
> Michael
> 
> 
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] [PATCH] Fixes for issues found by Coverity static analysis

2018-08-25 Thread Darshit Shah
Hi Tomas,

Thanks for running the scan and the patches you've made! I briefly glanced
through those and they seem fine. Of course, they will need to be slightly
modified to apply to the current git HEAD. I can do that in the coming days and
apply these patches.

I would like to ask you if there is a regular scan of Wget that you have set up
on Coverity. We used to run coverity scans regularly, but since the last year
or so, I haven't managed to get the coverity binaries to execute on my system.
So the scans stopped. If you have a scheduled run, I would like to be able to
see the results on Coverity so that we can keep fixing those issues.

P.S.: It seems like you haven't assigned your copyrights to the FSF for Wget.
Do you happen to know if your employer has assigned the copyrights on your
behalf? I couldn't find any mentions in the list I have locally. You will
shortly receive the assignment form in a separate email.

* Tomas Hozza  [180825 02:21]:
> Hi.
> 
> We scanned the latest version of wget (1.19.5) with Coverity static analyzer. 
> It found some potentially important issues like RESOURCE LEAKS. I'm attaching 
> my proposed fixes for these issues. Each commit includes the output from 
> Coverity and the outcome of my analysis of the problem from sources.
> 
> Regards,
> Tomas
> -- 
> Tomas Hozza
> Associate Manager, Software Engineering - EMEA ENG Core Services
> 
> PGP: 1D9F3C2D
> UTC+1 (CET)
> Red Hat Inc.     http://cz.redhat.com


-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Inconsistent cookie handling between different machines

2018-08-18 Thread Darshit Shah
ile (no difference between files).
> 
> For the life of me, I can't figure out why that third cookie isn't being 
> stored from machine 2?  The only thing I noticed that is different about that 
> cookie is that it's marked as "secure" while the other two are not.
> 
> I looked through the wget man pages and didn't see any other options that 
> impact cookie processing aside from the ones I've used.  Any help would be 
> greatly appreciated.
> 
> Thanks
> Sean
> -This e-mail and any attachments may contain CONFIDENTIAL information, 
> including PROTECTED HEALTH INFORMATION. If you are not the intended 
> recipient, any use or disclosure of this information is STRICTLY PROHIBITED; 
> you are requested to delete this e-mail and any attachments, notify the 
> sender immediately, and notify the LabCorp Privacy Officer at 
> privacyoffi...@labcorp.com or call (877) 23-HIPAA / (877) 234-4722. 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Async webcrawling

2018-08-01 Thread Darshit Shah
Hi James,

Wget2 is built on top of the libwget library which uses asynchronous network
calls. However, Wget2 is written such that it only utilizes one connection per
thread. This is essentially a design decision to simplify the codebase. In case
you want a more complex crawler, you can use libwget to write your own as Tim
suggested in his email.

Instead of this kind of async behaviour, we rely on HTTP/2 multiplexed streams
which allow you to send multiple requests over the same connection in parallel.
So, when crawling any website using HTTP/2, Wget2 can get the benefits of async
access without requiring all those code paths.
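
For instance, a recursive crawl that keeps several requests in flight at once
(--max-threads is a current wget2 option; the count here is only an
illustration):

$ wget2 --max-threads=5 -r https://example.com/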


* James Read  [180731 20:28]:
> Thanks,
> 
> as I understand it though there is only so much you can do with threading.
> For more scalable solutions you need to go with async programming
> techniques. See http://www.kegel.com/c10k.html for a summary of the
> problem. I want to do large scale webcrawling and am not sure if wget2 is
> up to the job.
> 
> On Tue, Jul 31, 2018 at 6:22 PM, Tim Rühsen  wrote:
> 
> > On 31.07.2018 18:39, James Read wrote:
> > > Hi,
> > >
> > > how much work would it take to convert wget into a fully fledged
> > > asynchronous webcrawler?
> > >
> > > I was thinking something like using select. Ideally, I want to be able to
> > > supply wget with a list of starting point URLs and then for wget to crawl
> > > the web from those starting points in an asynchronous fashion.
> > >
> > > James
> > >
> >
> > Just use wget2. It is already packaged in Debian sid.
> > To build from git source, see https://gitlab.com/gnuwget/wget2.
> >
> > To build from tarball (much easier), download from
> > https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz.
> >
> > Regards, Tim
> >
> >
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Failed 1.19.5 install on Solaris 11.3

2018-07-18 Thread Darshit Shah
Are you trying to compile Wget from git? Or are you using the tarballs?

If you are using the tarballs, this should not happen unless you have modified
some of the build files. In which case, I would ask you to share your changes
with us so that we can fix the build for everyone.

You should not require automake to compile and install Wget. In case you do,
you can run that step on a different / newer machine (even a Linux box updated
five minutes ago). Then copy all the files over to the other machine and run
configure and make.
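
If make still insists on re-running automake, a common workaround (a sketch,
untested on Solaris; generated file names can vary between packages) is to
refresh the timestamps of the generated files so make stops trying to
regenerate them:

$ touch aclocal.m4 configure Makefile.in */Makefile.in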

* Jeffrey Walton  [180718 14:54]:
> On Wed, Jul 18, 2018 at 7:14 AM, Tim Rühsen  wrote:
> > Maybe it's an bash/sh incompatibility. Anyways - what does 'make
> > install' do !? It basically copies the 'wget' executable into a
> > directory (e.g. /usr/local/bin/) that is listed in your PATH env variable.
> >
> > You can do that by hand. If you want the updated man file, copy wget.1
> > into your man1 directory (e.g. /usr/local/share/man/man1/).
> 
> thanks.
> 
> It appears things are not getting as far as an install. My bad.
> 
> $ make
> make  all-recursive
> Making all in lib
> make  all-recursive
> Making all in src
> make  all-am
> Making all in doc
> Making all in po
> Making all in util
> Making all in fuzz
>  cd .. && /bin/sh
> /export/home/jwalton/Build-Scripts/wget-1.19.5/build-aux/missing
> automake-1.15 --gnu fuzz/Makefile
> /export/home/jwalton/Build-Scripts/wget-1.19.5/build-aux/missing[81]:
> automake-1.15: not found [No such file or directory]
> WARNING: 'automake-1.15' is missing on your system.
>  You should only need it if you modified 'Makefile.am' or
>  'configure.ac' or m4 files included by 'configure.ac'.
>  The 'automake' program is part of the GNU Automake package:
>  <http://www.gnu.org/software/automake>
>  It also requires GNU Autoconf, GNU m4 and Perl in order to run:
>  <http://www.gnu.org/software/autoconf>
>  <http://www.gnu.org/software/m4/>
>  <http://www.perl.org/>
> *** Error code 127
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] How specify a CA zoo file during configuration

2018-07-18 Thread Darshit Shah
At run time you can use the --ca-certificate option to pass the filename.
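
For example, with the bundle you installed:

$ wget --ca-certificate=/usr/local/share/ca-certs.pem https://www.example.com/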

* Jeffrey Walton  [180718 11:17]:
> Hi Everyone,
> 
> I'm working on an ancient system. I need to bootstrap an updated Wget.
> 
> I installed a new ca-certs.pem in /usr/local/share. I need to tell
> Wget to use it. I don't see a configuration option:
> 
> $ ./configure --help | grep -i ca
>   --cache-file=FILE   cache test results in FILE [disabled]
>   -C, --config-cache  alias for `--cache-file=config.cache'
>   [/usr/local]
> `/usr/local/bin', `/usr/local/lib' etc.  You can specify
> an installation prefix other than `/usr/local' using `--prefix',
>   --localstatedir=DIR modifiable single-machine data [PREFIX/var]
>   --runstatedir=DIR   modifiable per-process data [LOCALSTATEDIR/run]
>   --localedir=DIR locale-dependent data [DATAROOTDIR/locale]
>   --with-caresenable support for C-Ares DNS lookup.
>   (use with caution on other systems).
>   CARES_CFLAGS
>   C compiler flags for CARES, overriding pkg-config
>   CARES_LIBS  linker flags for CARES, overriding pkg-config
> 
> I don't want to overwrite/replace the existing one because I fear the
> modern X509 extensions will break some things.
> 
> How do I tell configure where to look for the ca zoo file?
> 
> Jeff
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Deprecate TLS 1.0 and TLS 1.1

2018-06-19 Thread Darshit Shah
* Tim Rühsen  [180619 13:18]:
> On 06/19/2018 12:44 PM, Loganaden Velvindron wrote:
> > Hi All,
> > 
> > As per:
> > https://tools.ietf.org/html/draft-moriarty-tls-oldversions-diediedie-00
> > 
> > Attached is a tentative patch to disable TLS 1.0 and TLS 1.1 by
> > default. No doubt that this will cause some discussions, I'm open to
> > hearing all opinions on this.
> > 
> 
> Good idea for the public internet.
> 
> IMO there are too many 'internal' devices / hardware that are not
> up-to-date and impossible to update.
> 
> What about amending the patch so that we apply it only to public IP
> addresses ?
> 
I like this idea. Also, the user should retain their freedom to connect to an
insecure server. We should have a switch that allows falling back to TLS 1.0
and 1.1.
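
For reference, --secure-protocol already gives users such a knob; e.g. one can
explicitly opt into an older protocol with:

$ wget --secure-protocol=TLSv1 https://legacy.example.com/

(legacy.example.com is only a placeholder here.) Any fallback switch could
hang off that option.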

> And even then - we should not just 'fail' on older servers but tell the
> user why wget fails and what to do about it. In the end, the user is
> responsible and in control.
> 
> Regards, Tim
> 



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


[Bug-wget] [bug #51181] Unexpected "Redirecting output to 'wget-log'."

2018-06-16 Thread Darshit Shah
Follow-up Comment #12, bug #51181 (project wget):

Actually, I am unable to reproduce the problem.

`$ timeout -k 26s 25s wget example.com`

does _not_ put Wget in the background. The entire task runs in the
foreground.

And even when wget does run in the background, I don't see how the manual is
incorrect. It says, wget will download to `wget-log`, but if the local file
already exists, due to no-clobbering, Wget will create a unique filename by
appending a counter.

I just don't see what is wrong here

___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




Re: [Bug-wget] robots.txt seemingly ignored

2018-05-15 Thread Darshit Shah
Hi,

You are using a very old version of Wget.  v1.12 was released in 2009 if I
remember correctly. 

The current version of Wget doesn't seem to have any issues with the parsing of
that robots.txt. I just tried it locally and it downloads no files at all.

Please update your version of Wget.
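
(For completeness: if you ever want the opposite behaviour, robots.txt
processing can be disabled explicitly with `-e robots=off`.)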

* Daniel Feenberg <feenb...@nber.org> [180514 16:51]:
>
> I have the following wget command line:
> 
>wget -r  http://wwwdev.nber.org/
> 
> http://wwwdev.nber.org/robots.txt  is:
> 
>   User-agent: *
>   Disallow: /
> 
>   User-Agent: W3C-checklink
>   Disallow:
> 
> 
> However wget fetches thousands of pages from wwwdev.nber.org. I would have
> thought nothing would be found. (This is a demonstration, obviously in real
> life I'd have a more detailed robots.txt to control the process).
> 
> Obviously too, I don't understand something about wget or robots.txt. Can
> anyone help me out?
> 
> This is GNU Wget 1.12 built on linux-gnu.
> 
> Thank you
> Daniel Feenberg
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] strange behaviour

2018-05-11 Thread Darshit Shah
This is very strange behavior. It seems like name resolution works only from
the second attempt onwards. I've never seen anything like that before.

In any case, I see that you're using a very old version of Wget. The current
version is 1.19.5. Could you please update your version of Wget and try again
to see if the problem persists?

* VINEETHSIVARAMAN <vineethsivara...@gmail.com> [180510 15:06]:
> *[~]$ wget hello  google.com <http://google.com> --no-proxy -d*
> DEBUG output created by Wget 1.14 on linux-gnu.
> 
> URI encoding = ‘UTF-8’
> Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> --2018-05-10 06:28:04--  http://hello/
> Resolving hello (hello)... failed: Name or service not known.
> wget: unable to resolve host address ‘hello’
> URI encoding = ‘UTF-8’
> Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> --2018-05-10 06:28:04--  http://google.com/
> Resolving google.com (google.com)... 74.125.24.102, 74.125.24.101,
> 74.125.24.100, ...
> Caching google.com => 74.125.24.102 74.125.24.101 74.125.24.100
> 74.125.24.138 74.125.24.113 74.125.24.139 2404:6800:4003:c03::71
> Connecting to google.com (google.com)|74.125.24.102|:80... ^C
> [~]$ wget  google.com --no-proxy -d
> DEBUG output created by Wget 1.14 on linux-gnu.
> 
> URI encoding = ‘UTF-8’
> Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> --2018-05-10 06:28:14--  http://google.com/
> Resolving google.com (google.com)... failed: Name or service not known.
> wget: unable to resolve host address ‘google.com’
>  ~]$ wget hell  google.com --no-proxy -d
> DEBUG output created by Wget 1.14 on linux-gnu.
> 
> URI encoding = ‘UTF-8’
> Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> --2018-05-10 06:28:21--  http://hell/
> Resolving hell (hell)... failed: Name or service not known.
> wget: unable to resolve host address ‘hell’
> URI encoding = ‘UTF-8’
> Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> --2018-05-10 06:28:21--  http://google.com/
> Resolving google.com (google.com)... 74.125.24.101, 74.125.24.100,
> 74.125.24.138, ...
> Caching google.com => 74.125.24.101 74.125.24.100 74.125.24.138
> 74.125.24.113 74.125.24.139 74.125.24.102 2404:6800:4003:c03::71
> Connecting to google.com (google.com)|74.125.24.101|:80... ^C
> [ ~]$
> 
> 
> On Thu, May 10, 2018 at 3:57 PM VINEETHSIVARAMAN <vineethsivara...@gmail.com>
> wrote:
> 
> >  Hello Team ,
> >
> > My server is behind a firewall and a  proxy, but when i give  2 "wget" in
> > command  gives me a DNS resolution but not with the single wget !
> >
> >
> > *
> >
> > *
> >
> >
> >
> >
> > [~]$ nslookup google.com
> >
> > Non-authoritative answer:
> > Name:   google.com
> > Address: 74.125.24.102
> > Name:   google.com
> > Address: 74.125.24.101
> > Name:   google.com
> > Address: 74.125.24.139
> > Name:   google.com
> > Address: 74.125.24.113
> > Name:   google.com
> > Address: 74.125.24.138
> > Name:   google.com
> > Address: 74.125.24.100
> >
> > [~]$ wget google.com --no-proxy -d
> > DEBUG output created by Wget 1.14 on linux-gnu.
> >
> > URI encoding = ‘UTF-8’
> > Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> > Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> > --2018-05-10 06:24:33--  http://google.com/
> > Resolving google.com (google.com)... failed: Name or service not known.
> > wget: unable to resolve host address ‘google.com’
> > [ ~]$ wget wget google.com --no-proxy -d
> > DEBUG output created by Wget 1.14 on linux-gnu.
> >
> > URI encoding = ‘UTF-8’
> > Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> > Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
> > --2018-05-10 06:24:40--  http://wget/
> > Resolving wget (wget)... failed: Name or service not known.
> > wget: unable to resolve host address ‘wget’
> > URI encoding = ‘UTF-8’
> > Converted file name 'index.html' (UTF-8) -

Re: [Bug-wget] further help needed!!

2018-05-08 Thread Darshit Shah
Hi,

Please do not create a separate thread when replying to a previous email.
It makes it hard for us to keep track of the context.

Answering inline...

* Sameeran Joshi <joshisameera...@gmail.com> [180506 16:16]:
> thanks for u r help!@Darshit Shah
> 1.GOAL:To learn the internal logic of simple wget command 'wget
> www.google.com' and get acquainted with the code base of any open source
> project

This command looks simple, but in reality a lot goes on underneath.
If you're interested in the logic, I would suggest you start from the main()
function in `src/wget.c` and follow the logic. Dig deeper into any function
call which you think leads to downloading the web-page.

Hint: The actual download is always done on a separate thread. Look for the
thread function calls.
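
For example, a grep along these lines should surface the relevant call sites
(the function name is from memory, so treat it as a starting point rather than
gospel):

  $ grep -rn 'wget_thread_start' src/ libwget/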

>
> 2.how do you come to know which directory contains what and does what,

Usually, we try to keep the naming pretty accurate. So, if you see a directory
called `libwget`, it contains the source for "libwget". Similarly, if a file
is called "http_highlevel.c", then there's a fairly good chance that the file
contains the high-level HTTP API. Now, what counts as high-level may be
ambiguous, but that is something you'll just have to look for.

In general, follow the names.

>  2.A.while looking at /docs I found it dosen't contain the documentation
> which tells you about the structure and use of functions

Run `make` first. It will build the libwget API documentation.

>  2.B.googling out gives me this
> https://www.gnu.org/software/wget/manual/wget.html#Wgetrc-Location
> which too tells about the use of wget
> 
That's for Wget 1.x, not Wget2. 

>   SO WHERE DO I FIND THE CORRECT DOCUMENTATION FOR DEVELOPERS?

Firstly, please remember that on the internet, ALL CAPS is considered shouting
and is generally frowned upon.

Specifically what kind of documentation are you looking for? If you simply want
to use the libwget API, you can take a look at 
https://gnuwget.gitlab.io/wget2/reference/

This should also serve as decent documentation for understanding what function
does what.

You can build this documentation locally using `make` as well.

> 3.Are there any kind of algorithms or flowcharts describing the working of
> the file?,as it would be easy rather to dive into code to understand it
> better.

Sadly, nope. We don't have any design documentation. You'll just have to wade
through the code to figure it out. If you're stuck on something, you can ask on
the mailing lists / IRC.

Any help / efforts on improving the documentation are definitely appreciated :)

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Need help

2018-05-08 Thread Darshit Shah
Hi,

Apart from what Tim has already mentioned, please also consider:

1. Do not create a new mail thread for every email. Please use the same thread
   for logically connected mails. I do not wish to go searching for the context
   your email through my inbox.

2. Like I've mentioned before, responses may take on the order of days. You
   sent a message less than 2 days ago, and it was ambiguously worded.

We are a team of volunteers doing this in our spare time. Sometimes, no one has
time to look into matters for GNU Wget since we all have day jobs to work on.
Please be patient, and if your queries haven't been answered after 2+ days,
just send a little reminder, *on the same thread*.

* Sameeran Joshi <joshisameera...@gmail.com> [180508 05:53]:
> Hi,I have sent emails on mailing list regarding some doubts,they aren't
> replied so generally how many days does it take for reply from
> maintainers,as I am newbie I thought no one is replying to mails,but one of
> my friend told to ask on mailing list may be the people are busy.
> Thanku

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] newbie help

2018-05-06 Thread Darshit Shah
Hi Sameeran,

What exactly is your goal?

The libwget/ directory contains all the code for libwget, the HTTP library that
powers wget2.

The examples/ directory contains a bunch of small toy programs which show how
one can use libwget.

Sadly, there is no single source file you can read which contains the entire
process of accepting a URL, parsing it, and downloading said web page. If you
are interested in the logic of downloading a web page, you should look at
`libwget/http_highlevel.c`. This file contains the very high-level API for
downloading a webpage. From there, you can go deeper into the specific details
of constructing HTTP requests and parsing the responses.

* Sameeran Joshi <joshisameera...@gmail.com> [180504 05:38]:
> *hi,i have followed all the instructions of downloading and installing
> wget2,i am a newbie to any of the open source projects,so i just went
> through the examples/ directory and the libwget directory but I am afraid
> which .c file should i start with.*
> 
> *how should I read and understand the .c file,i  am quite comfortable with
> the c language programming,also which is the .c file which contains the
> very basic code of just accepting URL and parsing it and downloading the
> supplied webpage.I am interested in seeing the logic of the same.*
> *thanks*

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Submitted a merge request

2018-05-06 Thread Darshit Shah
* sameeran joshi <gsocsamee...@gmail.com> [180506 11:24]:
> Can anyone verify the merge request by me

Hi,

Thanks for the contribution!

However, once you have submitted a merge request on gitlab, you don't need to
send another mail to the mailing list. Someone will review and merge it as soon
as possible.

Sometimes that may take time (think days), since we are all volunteers and
aren't always available. Please wait for at least 48 hours for a response.

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


[Bug-wget] Add support to bind to a local port

2018-05-03 Thread Darshit Shah
This patch adds support for binding Wget's client socket to a user-specified port.
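
For illustration, a hypothetical invocation (the patch deliberately rejects
--bind-port unless --bind-address is also given):

  wget --bind-address=192.168.0.10 --bind-port=5000 https://example.com/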


-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
From 033a14cc952bd93618fa9b52f26beb4783a55d5f Mon Sep 17 00:00:00 2001
From: Darshit Shah <dar...@gnu.org>
Date: Thu, 3 May 2018 18:26:38 +0200
Subject: [PATCH] Add support for binding to local port

* src/options.h: Introduce bind_port variable
* src/init.c: initialize bind_port to -1 to indicate no user value
* src/main.c: Add bind_port to command line options. Also ensure that
port is only specified along with bind_address
* src/connect.c: Use the specified bind_port when binding to a specific
address.
Also set the SO_REUSEADDR option on the socket when binding to a local
address
* doc/wget.texi: Add documentation for new option
---
 doc/wget.texi | 11 +++
 src/connect.c | 17 +++--
 src/init.c|  2 ++
 src/main.c| 12 
 src/options.h |  1 +
 5 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/doc/wget.texi b/doc/wget.texi
index 5fd11137..9b2f1a0a 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -591,6 +591,14 @@ the local machine.  @var{ADDRESS} may be specified as a hostname or IP
 address.  This option can be useful if your machine is bound to multiple
 IPs.
 
+@cindex bind port
+@cindex client port number
+@cindex IP address, client, port
+@item --bind-port=@var{PORT}
+When making client TCP/IP connections using @samp{--bind-address}, additionally
+bind to a specific @var{PORT} on the client machine.  If a negative value is
passed as the parameter, then the default value of 0 will be used.
+
 @cindex bind DNS address
 @cindex client DNS address
 @cindex DNS IP address, client, DNS
@@ -3243,6 +3251,9 @@ as being relative to @var{string}---the same as @samp{--base=@var{string}}.
 @item bind_address = @var{address}
 Bind to @var{address}, like the @samp{--bind-address=@var{address}}.
 
+@item bind_port = @var{port}
+In addition to @samp{bind_address}, bind to specific @var{port}.
+
 @item ca_certificate = @var{file}
 Set the certificate authority bundle file to @var{file}.  The same
 as @samp{--ca-certificate=@var{file}}.
diff --git a/src/connect.c b/src/connect.c
index 37dae215..37a30879 100644
--- a/src/connect.c
+++ b/src/connect.c
@@ -187,7 +187,7 @@ resolve_bind_address (struct sockaddr *sa)
   if (called)
 {
   if (should_bind)
-        sockaddr_set_data (sa, &ip, 0);
+        sockaddr_set_data (sa, &ip, opt.bind_port);
   return should_bind;
 }
   called = true;
@@ -209,7 +209,7 @@ resolve_bind_address (struct sockaddr *sa)
   ip = *address_list_address_at (al, 0);
   address_list_release (al);
 
-  sockaddr_set_data (sa, &ip, 0);
+  sockaddr_set_data (sa, &ip, opt.bind_port);
   should_bind = true;
   return true;
 }
@@ -340,6 +340,19 @@ connect_to_ip (const ip_address *ip, int port, const char *print)
   struct sockaddr *bind_sa = (struct sockaddr *)&bind_ss;
   if (resolve_bind_address (bind_sa))
 {
+
+   // Set the SO_REUSEADDR socket option if it is available. It is
+  // useful when explicitly binding to a given address
+#ifdef SO_REUSEADDR
+  /* For setting options with setsockopt. */
+  int setopt_val = 1;
+  void *setopt_ptr = (void *)&setopt_val;
+  socklen_t setopt_size = sizeof (setopt_val);
+
+  if (setsockopt (sock, SOL_SOCKET, SO_REUSEADDR, setopt_ptr, setopt_size))
+logprintf (LOG_NOTQUIET, _("setsockopt SO_REUSEADDR failed: %s\n"),
+   strerror (errno));
+#endif
   if (bind (sock, bind_sa, sockaddr_size (bind_sa)) < 0)
 goto err;
 }
diff --git a/src/init.c b/src/init.c
index e4186abe..98b6ac45 100644
--- a/src/init.c
+++ b/src/init.c
@@ -150,6 +150,7 @@ static const struct {
 #ifdef HAVE_LIBCARES
   { "binddnsaddress",   _dns_address,  cmd_string },
 #endif
+  { "bindport",_port, 
cmd_number },
   { "bodydata", _data, cmd_string },
   { "bodyfile", _file, cmd_string },
 #ifdef HAVE_SSL
@@ -396,6 +397,7 @@ defaults (void)
   opt.metalink_index = -1;
 #endif
 
+  opt.bind_port = -1;
   opt.cookies = true;
   opt.verbose = -1;
   opt.ntry = 20;
diff --git a/src/main.c b/src/main.c
index 46824efd..c6e560bd 100644
--- a/src/main.c
+++ b/src/main.c
@@ -275,6 +275,7 @@ static struct cmdline_option option_data[] =
 #ifdef HAVE_LIBCARES
 { "bind-dns-address", 0, OPT_VALUE, "binddnsaddress", -1 },
 #endif
+{ "bind-port", 0, OPT_VALUE, "bindport", -1 },
 { "body-data", 0, OPT_VALUE, "bodydata", -1 },
 { "body-file", 0, OPT_VALUE, "bodyfile", -1 },
 { IF_SSL ("ca-certificate"), 0, OPT_VALUE, "cacertificate", -1 },
@@ -692,6 +693,8 @@ Download:\n"),
   -Q,  --quot

Re: [Bug-wget] wget 1.18-5+deb9u1 with --hsts -E -k fails

2018-04-25 Thread Darshit Shah
Hi Karl,

Thanks! That's a pretty detailed bug report. Definitely helpful :)

1. Yes, I'm aware of the issue. Even 1.20 is not listed on it. I'll get right
   on fixing that ASAP. 

2. and 3. These are a little more involved than I have the time to look into
   right now. It does seem to be caused by the webserver serving autoindex
   pages. The situation with HSTS is even weirder. I will take a look at the
   specifics, most likely next week, and try to come up with either an
   explanation or a solution.

* Karl O. Pinc <k...@meme.com> [180419 00:19]:
> Hello,
> 
> This is a bad bug report.  Sorry.  I'm thinking that
> you'd rather hear _something_ than nothing.
> 
> I'm using wget 1.18-5+deb9u1, which is 1.18
> on Debian Squeeze (9.4).
> 
> I can't say I'm certain that there is even a bug,
> although there is a functionality problem at some
> level.
> 
> 3 different problems, from simplest and most trivial
> to more complex.
> 
> 1) The wget Savannah page makes no mention of
> version 1.19 in the news section.
> 
> 2) The situation described below (with --adjust-extension)
> produces a "doc/guide" directory and a "doc/guide.1.html"
> file.  It would be nice if the file were instead named
> "doc/guide.html", without the ".1".  (There is no such
> file.)
> 
> 3)  I am mirroring a site where the url paths ending
> in "/" deliver pages, but there are additional, longer,
> urls which extend these urls.  So --adjust-extension (-E)
> is required so that wget can write an "index file",
> ending in ".html", and create directories to hold
> additional content.
> 
> I am also using --convert (-k) so as to have relative
> links in the downloaded material.
> 
> The problem is that when I use --hsts I get (sometimes,
> but consistently for particular urls)
> a "foo/" directory, a "foo.1.html" file containing
> some converted links, and a "foo.html" file without
> converted links.  FYI, "foo" is downloaded by linking
> "upwards" in the url path from the targeted url to
> mirror.  The downloaded, --convert-ed, material contains
> some links to "foo.1.html" and some to "foo.html".
> 
> When using --no-hsts I get 301 (Permanent redirect)
> from the mirrored site to https pages (and it seems
> in this particular case https pages on the target
> top-level domain).  I then have no problems with
> --convert-ed data.
> 
> With --hsts I get some pages on other sub-domains
> of the target domain, FYI.  This is not obviously
> related to the problem.
> 
> Now, for the specifics.  Apologies that the
> example is not clean and the site it hits may change
> in ways that make the problem not reproducible.
> 
> The goal is to mirror the Yii 1.1 reference documentation
> and user guide.  The command which "works" is:
> 
> wget --no-hsts --directory-prefix mirror --timestamping -F
> --no-remove-listing --domains=www.yiiframework.com,yiiframework.com
> --regex-type=pcre
> --reject-regex='^https?://www\.yiiframework\.com/(?:(?:forum)|(?:wiki)|(?:user)|(?:extension)|(?:doc-2\.0)|(?:doc/(?:(?:(?:(?:guide)|(?:api))/(?:1|2)\.0)|(?:guide/1\.1/(?:(?:de)|(?:es)|(?:fr)|(?:he)|(?:id)|(?:it)|(?:ja)|(?:pl)|(?:pt)|(?:pt-br)|(?:ro)|(?:ru)|(?:sv)|(?:uk)|(?:zh-cn)))|(?:download/yii-.*-2\.0)|(?:blog)))|(?:news)|(?:blog)|(?:team)|(?:user)|(?:badge))'
> --adjust-extension --recursive --level inf --convert-links
> --page-requisites --span-hosts --no-clobber
> https://www.yiiframework.com/doc/guide/1.1/en
> 
> Some notes:
> 
> I happen to know that the guide contains links to the API
> docs, and all the API docs cross reference each other, 
> so I mirrored the guide and picked up the API docs as well.
> 
> The above command downloads 521 files comprising 43MB. (!)
> Sorry.
> 
> -F probably does nothing, but I included it because
> that's what I ran with.
> 
> Leaving off the --no-hsts I get:
> 
> mirror/
>   www.yiiframework.com/
> doc/
>   api/
>   api.1.html
>   api.html
>   guide/
> 1.1/
>   en/
>   en.1.html
>   en.html
>   guide.1.html
>   terms/
> 
> As noted, "en.html" and "api.html" contain un-converted links and
> some downloaded content links to these files.  I _think_ these get
> created late in the download.
> 
> With --no-hists (as in the command above) I get:
> 
> mirror/
>   www.yiiframework.com/
> doc/
>   api/
>   api.html
>   guide/
> 1.1/
>   en/
>   en.html
>   guide.1.html
>   terms/
> 
> FYI.  I first tri

Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-08 Thread Darshit Shah
* Jeffrey Fetterman <jfett...@mail.ccsf.edu> [180408 04:53]:
> Yes! Multiplexing was indeed partially the culprit, I've changed it
> to --http2-request-window=5
> 
> However the download queue (AKA 'Todo') still gets enormous. It's why I was
> wanting to use non-verbose mode in the first place, screens and screens of
> 'Adding url:'. There should really be a limit on how many urls it adds!
> 
The URLs are added first because of the way Wget traverses the links. It just
adds these URLs to the download queue; it doesn't start downloading them
instantly. If you traverse a web page and Wget finds links on it, it will
obviously add them to the download queue. What else would you expect Wget to
do?

> Darshit, as it stands it doesn't look like --force-progress does anything
> because --progress=bar forces the same non-verbose mode, and
> --force-progress is meant to be something used in non-verbose mode.
> 
> However, the progress bar is still really... not useful. See here:
> https://i.imgur.com/KvbGmKe.png
> 
> It's a single bar displaying a nonsense percentage, and it sounds like with
> multiplexing there's supposed to be, by default, 30 transfers going
> concurrently.
> 
Yes, I am aware of this. Sadly, Wget is developed entirely through volunteer
effort, and currently I don't have the time on my hands to fix the progress
bar. It's caused by HTTP/2 connection multiplexing. I will fix it when I find
some time for it.

> > Both reduce RTT by 1, but they can't be combined.
> 
> I was using TLS Resume because, well, for a 300+GB download it just seemed
> to make sense, so it wouldn't have to check over 100GB of files before
> getting back to where I left off.
> 
> > You use TLS Resume, but you don't explicitly need to specify a file. By
> default it will use ~/.wget-session.
> 
> I figure a 300GB+ transfer should have its own session file just in case I
> do something smaller between resumes that might overwrite .wget-session,
> plus you've got to remember I'm on WSL and I'd rather have relevant files
> kept within my normal folders rather than my WSL filesystem.
> 
I'm not sure if you've understood TLS Session Resume correctly. TLS Session
Resume is not going to resume your download session from where it left off. Due
to the way HTTP works, Wget will still have to scan all your existing files and
send HEAD requests for each of them when resuming. This is just a limitation of
HTTP and there's nothing anybody can do about it.

TLS Session Resume simply saves 1 RTT when starting a new TLS session; it
matters only for the TLS handshake and nothing else. It doesn't resume the
Wget session at all. Also, the ~/.wget-session file just stores the TLS
session information for each TLS session, so you can use it across multiple
sessions. It is merely a cache.
> On Sat, Apr 7, 2018 at 3:04 AM, Darshit Shah <dar...@gmail.com> wrote:
> 
> > Hi Jefferey,
> >
> > Thanks a lot for your feedback. This is what helps us improve.
> >
> > * Tim Rühsen <tim.rueh...@gmx.de> [180407 00:01]:
> > >
> > > On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> > > > Thanks to the fix that Tim posted on gitlab, I've got wget2 running
> > just
> > > > fine in WSL. Unfortunately it means I don't have TCP Fast Open, but
> > given
> > > > how fast it's downloading a ton of files at once, it seems like it
> > must've
> > > > been only a small gain.
> > > >
> > TCP Fast Open will not save you a lot in your particular scenario. It
> > simply
> > saves one round trip when opening a new connection. So, if you're using
> > Wget2
> > to download a lot of files, you are probably only opening ~5 connections
> > at the
> > beginning and reusing them all. It depends on your RTT to the server, but
> > 1 RTT
> > when downloading several megabytes is already an insignificant amount if
> > time.
> >
> > > >
> > > > I've come across a few annoyances however.
> > > >
> > > > 1. There doesn't seem to be any way to control the size of the download
> > > > queue, which I dislike because I want to download a lot of large files
> > at
> > > > once and I wish it'd just focus on a few at a time, rather than over a
> > > > dozen.
> > > The number of parallel downloads ? --max-threads=n
> >
> > I don't think he meant --max-threads. Given how he is using HTTP/2,
> > there's a
> > chance what he's seeing is HTTP Stream Multiplexing. There is also,
> > `--http2-request-window` which you can try.
> > >
> > > > 3. Doing a TLS resume will cause a 'Failed to write 305 bytes (32:
> > Broken
> > > &

Re: [Bug-wget] Miscellaneous thoughts & concerns

2018-04-07 Thread Darshit Shah
 perhaps help me address some of the remarks above if
> > possible).
> 
> No need for --continue.
> Think about using TLS Session Resumption.
> --domains is not needed in your example.
> 

You use TLS Resume, but you don't explicitly need to specify a file. By default
it will use ~/.wget-session.

> Did you build with http/2 and compression support ?
> 
> Regards, Tim
> > #!/bin/bash
> >
> > wget2 \
> >   `#WSL compatibility` \
> >   --restrict-file-names=windows --no-tcp-fastopen \
> >   \
> >   `#No certificate checking` \
> >   --no-check-certificate \
> >   \
> >   `#Scrape the whole site` \
> >   --continue --mirror --adjust-extension \
> >   \
> >   `#Local viewing` \
> >   --convert-links --backup-converted \
> >   \
> >   `#Efficient resuming` \
> >   --tls-resume --tls-session-file=.\tls.session \
> >   \
> >   `#Chunk-based downloading` \
> >   --chunk-size=2M \
> >   \
> >   `#Swiper no swiping` \
> >   --robots=off --random-wait \
> >   \
> >   `#Target` \
> >   --domains=example.com example.com
> >
> 
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] make.exe warnings

2018-04-06 Thread Darshit Shah
All of these warnings happen because gnulib replaces standard Unix API calls
with system-specific implementations.

I guess that silencing the redefinition warnings definitely makes sense in
the gnulib code. Redefining functions is quite literally its job.

As for the pointer-type warnings, I would want to look at them once and decide
whether to fix or silence them.

On April 6, 2018 7:39:30 AM UTC, "Tim Rühsen"  wrote:
>On 04/06/2018 04:30 AM, Jeffrey Fetterman wrote:
>> I've successfully built wget2 through msys2 as a Windows binary, and
>it
>> appears to be working (granted I've not used it much yet), but I'm
>> concerned about some of the warnings that occurred during
>compilation.
>> 
>> Unsurprisingly they seem to be socket-related.
>> 
>> https://spit.mixtape.moe/view/9f38bd83
>
>These are warnings from gnulib code. The code itself looks good to me.
>Our CFLAGS for building the gnulib code are maybe too strong, I'll see
>if reducing verbosity is recommended here.
>
>With Best Regards, Tim

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [Bug-wget] GSoC'18: DNS over HTTPS.

2018-03-22 Thread Darshit Shah
Hi,

I'll get to a discussion about the proposal shortly, but in the meantime, may I
please request everyone to avoid continuing this email thread on
summer-of-c...@gnu.org?

That is a generic mailing list for organizing the Summer of Code program within
GNU.

The discussion about any particular project is off-topic there. It is,
however, very much on-topic on bug-wget@gnu.org, and we should continue this
discussion only there.

* Tim Rühsen <tim.rueh...@gmx.de> [180322 14:20]:
> On 03/22/2018 02:01 PM, Aniketh Gireesh wrote:
> > Further, In my opinion, I think it would be better as a different
> > library/directory. I think that would be a better refactoring method as
> > well as it would be easier to work on the codebase at a later point in
> > time. Further, as far as my understanding goes, libwget is a library
> > handling HTTP, helping in creating an HTTP request. It seems better to have
> > something different to handle DNS and other things regarding that. It would
> > feel like all cluttered up inside libwget.
> > 
> > If this is not the way we want it in Wget2, just let me know. I will change
> > the proposal as well as the plans for implementation :)
> 
> Since your code will likely use functions from libwget and the other way
> round, we should place it in libwget/. But if it makes your development
> easier during GSOC, feel free to put it into a separate directory.
> 
> For the future we have a splitting of libwget in several libraries in
> mind, but it currently has low priority. We may have some day
> libwget-common, libwget-doh, libwget-warc, ...
> 
> Regards, Tim
> 



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Error with Ambari Set Up

2018-03-09 Thread Darshit Shah
Hi Aniket,

If the webserver says that the page you were looking for does not exist, it
does not exist. You should probably try to talk to the person hosting the file
or to the person that gave you the link. I'm not sure how Wget can help you in
this regard.

All Wget does is report back to you what the server responded.


* Aniket Chowdhury <anik...@pentationanalytics.com> [180309 16:24]:
> Dear All,
> 
> Hope you are doing well.
> 
> Actually I'm facing a problem (...404 Not Found) while trying to set up 
> Amabari on linux (putty) console.
> 
> My link was 
> http://public-repo-1.hortonworks.com/ambari/centos/1.x/updates/1.4.3.38/ambari.rep.
> I've searched & tried for the probable solutions.
> I've tried for 
> with single quotes,
> with single quotes,
> wget -4 
> http://public-repo-1.hortonworks.com/ambari/centos/1.x/updates/1.4.3.38/ambari.repo
>  
>  (as I'm using IPV4).
> But every time I was getting the same error.
> I'm sharing the screen. Could you please guide me to resolve it out.
> 
> 
> 
> Regards,
> Aniket.



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] [GSoC 2018] Introductiion

2018-02-26 Thread Darshit Shah
Hi Divesh,

Glad to see you're interested in Wget2!

We have a few interesting projects this year for Wget2. Apart from those, feel
free to propose a project idea on your own as well.

While you decide on a project / we help you find a good one, I would suggest
you go through the GSoC Information page for Wget here:
https://gitlab.com/darnir/wget2/wikis/wget-gsoc

I would strongly advise you to work on some bugs on our issue tracker. The easy
ones are often marked "Junior". Fixing some bugs is important since it allows
you to understand our codebase and it gives us a good understanding of your
coding style and quality as well.

Anytime you are stuck, please don't hesitate to contact us. The issue tracker /
mailing list / mails to mentors are all valid forms of communication.


* Divesh Uttamchandani <diveshuttamchand...@gmail.com> [180226 16:02]:
> Hi everyone ,
> 
> I am Divesh Uttamchandani, currently a sophomore at Birla Institute of
> Technology and Science , Pilani (India) majoring in Computer Science. I
> found the GSoC projects for wget2 really interesting and would like to
> contribute to them this summer.
> 
> I am familiar with C and somewhat familiar with Unix systems programming. I
> have keen interest in working of network based applications. I am also
> currently studying design of small network applications on Unix based
> systems as a part of my school curriculum.
> 
> You can find some of my work on Unix systems programming in C (Specifically
> IPC) on this GitHub repo <http://github.com/diveshuttam/IS-F421>. It is a
> part of the same course I mentioned.
> 
> Apart from this I have knowledge of python and have done a few interesting
> projects including a CLI <https://github.com/termicoder/termicoder>
> and a Telegram
> Chatbot <https://github.com/diveshuttam/MessMenuBot> and have basic
> familiarity with various http requests. I also pursue sport algorithmic
> programming as a hobby and have keen interest in algorithms.
> 
> I am not sure which project will be fit for me and personally find all of
> them interesting.
> I have successfully built wget2 on my PC. Since, I am new to your code
> base, can you guys give me some pointers to issues that I can work on based
> upon my skills and familiarity.
> 
> 
> Regards,
> Divesh Uttamchandani
> GitHub <https://github.com/diveshuttam> | GitLab
> <https://gitlab.com/diveshuttamchandani>

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Interest in GSoC 2018

2018-02-25 Thread Darshit Shah
Hi Jiading!

Thanks for pointing out the broken link :)

I've fixed it now. However, since these are open wikis, anyone can make a
change if you spot an error. 

Welcome to our little dev community! 

* Tong Zing <zing...@gmail.com> [180217 04:23]:
> Hi everyone! I'm Jiading Guo, a third year CS undergraduate student. I'm
> interested in contributing towards Wget2 project.
> 
> I have little experience in open source before but I'm trying to get my
> feet wet. I've created a merge request for one issue:
> https://gitlab.com/gnuwget/wget2/merge_requests/349. Please correct me if
> I've done something inappropriate.
> 
> Also I noticed that the link for  'Am I Good Enough?' in Gsoc: frequently
> asked questions(
> https://gitlab.com/darnir/wget2/wikis/GSoC:%20Frequently%20Asked%20Questions)
> is dead. Here is the new link:
> http://write.flossmanuals.net/gsocstudentguide/am-i-good-enough/
> 
> Thanks!

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Need help with verify-sig option

2018-02-21 Thread Darshit Shah
Hi Jay,

Are you sure that you have compiled Wget2 with GPGME?

What does `grep "GPGME: " config.log` say?

If it says no, then you probably need to install the gpgme header files. The
exact method will depend upon your specific distro. For Debian-based distros
it should be something like `apt-get install libgpgme-dev`.

This also reminds me: we should have the GPGME status reported in `wget2 --version`.
I'll add that later today.

* Jay Bhavsar <jbh...@gmail.com> [180221 15:23]:
> Hello folks. I need some help with wget2. When I use --verify-sig option,
> it says it's invalid. Also, it's not listed in --help. But I can see it in
> options.c file. What am I doing wrong here? Do I need to set some flags
> during compilation? Help me out.
> 
> Thanks.

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Interest in contributing towards GNU Wget project for GSOC 2018

2018-02-16 Thread Darshit Shah
Hi Diti,

Welcome to GNU! Head on over to our GitLab repository [1] for a start.

There's a bunch of FAQs for GSoC students, have you read them already? If you
have, great! Then move on to compiling Wget2 locally and fixing some of the
smaller issues.


[1]: https://gitlab.com/gnuwget/wget2

* DITI MODI <diti.m...@somaiya.edu> [180216 14:41]:
> Dear Mentor,
> 
> I,Diti Modi, a third year Computer Science undergrad, am interested in the
> GNU Wget package projects, particularly the HTTP/2 Test Suite project.
> 
>  I have a sound knowledge of C,C++ as well as the open source. I would like
> to contribute towards this project.
> 
> Please provide me with further guidance.
> 
> Thank You
> 
> -- 
> 
> <http://www.somaiya.edu> <http://www.somaiya.edu/kjsmc>  
> <http://www.nareshwadi.org> <http://www.somaiya.edu>  
> <http://www.helpachild.in> <http://nareshwadi.org>

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] no data downloaded

2018-02-05 Thread Darshit Shah
From your description, it seems like the device renders all of this data using
JavaScript.

Try taking a look at the source of the HTML page you downloaded to see if this
is the case.

Wget will download the related JavaScript files if you use the -p option for
page requisites. However, Wget does not, and will not, support executing the
JavaScript to produce the final data.
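
As a quick check (using the device address from your mail), compare what the
following saves against what Firefox shows; if the numbers are missing from
the raw HTML, they are being filled in by scripts at display time:

  wget --page-requisites --convert-links http://160.190.0.1/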

On February 5, 2018 11:03:02 AM UTC, jos vaessen  
wrote:
>Hello,
>
>I am downloading the home-webpage from my solar-power-converter using 
>WIFI and its IP adress 160.190.0.1 with WGET. No problem beside that
>the 
>page is zipped as 7ZIP. After unzipping it shows the content of 
>index.html like a copy/paste/save home.html version on mouseclick using
>
>Firefox.
>
>BUT..
>
>Without the devicenumber, date, time, converted Watts, etc. So there is
>
>no variable data in the downloaded html using WGET.
>
>What I get is an empty html and the question is: what trigger to use to
>
>download these data too like Firefox or IE show on screen?
>
>Thanks, Jos

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


[Bug-wget] wget-1.19.4 released [stable]

2018-01-22 Thread Darshit Shah
We would like to announce GNU Wget 1.19.4.

This is a bug fix version that fixes a major bug with v1.19.3.
With v1.19.3 Wget would request the origin server for a gzip compressed
document. However, due to a logic bug, it would never decompress it for the
user. This caused almost all downloads to break.

As it turns out, implementing gzip support is not trivial, especially in the
face of the many buggy servers that we have to support. Hence, for the time
being, connection compression support has been marked as experimental and is
disabled by default.

==

Here are the compressed sources and a GPG detached signature[*]:
  https://ftp.gnu.org/gnu/wget/wget-1.19.4.tar.gz
  https://ftp.gnu.org/gnu/wget/wget-1.19.4.tar.gz.sig

Use a mirror for higher download bandwidth:
  https://ftpmirror.gnu.org/wget/wget-1.19.4.tar.gz
  https://ftpmirror.gnu.org/wget/wget-1.19.4.tar.gz.sig

Here are the MD5 and SHA1 checksums:

e044d97067298662a277986b58a42211  wget-1.19.4.tar.gz
cace624a5edc6168557d9507903cb794111aa9d4  wget-1.19.4.tar.gz

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify wget-1.19.4.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 2A1743EDA91A35B6

and rerun the 'gpg --verify' command.

NEWS

* Changes in Wget 1.19.4

* A major bug that caused GZip'ed pages to never be decompressed has been fixed

* Support for Content-Encoding and Transfer-Encoding have been marked as
  experimental and disabled by default


-- 
On behalf of the maintainers of GNU Wget,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature
-- 
If you have a working or partly working program that you'd like
to offer to the GNU project as a GNU package,
see https://www.gnu.org/help/evaluation.html.

Re: [Bug-wget] Wget1.19.3 seem to have the bug in decompress

2018-01-20 Thread Darshit Shah
Hi,

Thanks for reporting the bug. It indeed seems like an issue there. We really
need tests for this feature.

In any case, here is a slightly amended patch that I intend to push by this
evening if nobody complains. 
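
For anyone following along, here is the root cause in one self-contained
snippet. Plain strcmp() stands in for gnulib's c_strcasecmp(); both return 0
on a match, so the old condition was true for *every* extension (a string can
never equal both ".gz" and ".tgz" at once), which forced ENC_NONE
unconditionally:

  #include <stdio.h>
  #include <string.h>

  #define cmp(a, b) strcmp ((a), (b))   /* 0 on match, like c_strcasecmp */

  int
  main (void)
  {
    const char *p = ".html";            /* extension of an ordinary page */

    /* old test: nonzero-on-mismatch results ORed together -> always true */
    printf ("buggy: %d\n", cmp (p, ".gz") || cmp (p, ".tgz"));

    /* fixed test: true only for the two extensions we want to skip */
    printf ("fixed: %d\n", cmp (p, ".gz") == 0 || cmp (p, ".tgz") == 0);
    return 0;
  }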

* G-Ey3dr <gey...@gmail.com> [180120 06:49]:
> Hello all,
> 
> Wget1.19.3 seem to have the bug in decompress. See below.
> 
> diff -aur wget-1.19.3_old/src/http.c wget-1.19.3_new/src/http.c
> --- wget-1.19.3_old/src/http.c2018-01-14 19:22:42.0 +0900
> +++ wget-1.19.3_new/src/http.c2018-01-20 11:46:15.897109600 +0900
> @@ -3744,7 +3744,7 @@
>/* don't uncompress if a file ends with '.gz' or '.tgz' */
>if (hs->remote_encoding == ENC_GZIP
>&& (p = strrchr(u->file, '.'))
> -  && (c_strcasecmp(p, ".gz") || c_strcasecmp(p, ".tgz")))
> +  && (!c_strcasecmp(p, ".gz") || !c_strcasecmp(p, ".tgz")))
>      {
>     hs->remote_encoding = ENC_NONE;
>  }
> 
> Best regards,
> Reiji

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
From a7cc4e2b3706c17bd64afec121e0b2515aacaf63 Mon Sep 17 00:00:00 2001
From: Reiji <gey...@gmail.com>
Date: Sat, 20 Jan 2018 14:01:37 +0100
Subject: [PATCH] * src/http.c (gethttp): Fix bug that prevented all files from
 being decompressed

Signed-off-by: Darshit Shah <dar...@gnu.org>
---
 src/http.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/http.c b/src/http.c
index 1cd2768c..5bbaa52c 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3717,7 +3717,7 @@ gethttp (const struct url *u, struct url *original_url, struct http_stat *hs,
   /* Make sure the Content-Type is not gzip before decompressing */
   if (type)
 {
-  const char * p = strchr (type, '/');
+  p = strchr (type, '/');
   if (p == NULL)
 {
   hs->remote_encoding = ENC_GZIP;
@@ -3744,8 +3744,9 @@ gethttp (const struct url *u, struct url *original_url, struct http_stat *hs,
   /* don't uncompress if a file ends with '.gz' or '.tgz' */
   if (hs->remote_encoding == ENC_GZIP
   && (p = strrchr(u->file, '.'))
-  && (c_strcasecmp(p, ".gz") || c_strcasecmp(p, ".tgz")))
+  && (c_strcasecmp(p, ".gz") == 0 || c_strcasecmp(p, ".tgz") == 0))
 {
+   DEBUGP (("Enabling broken server workaround. Will not 
decompress this GZip file.\n"));
hs->remote_encoding = ENC_NONE;
 }
 }
-- 
2.16.0



signature.asc
Description: PGP signature


[Bug-wget] wget-1.19.3 released [stable]

2018-01-19 Thread Darshit Shah
We are pleased to announce the release of GNU Wget 1.19.3.

GNU Wget is a free utility for non-interactive download of files from the Web.
It supports HTTP(S), and FTP(S) protocols, as well as retrieval through HTTP
proxies.

This is a minor bugfix release, primarily to fix the behaviour of GNU Wget with
some incorrectly configured servers.

Many thanks to everyone who contributed to this release:

Arkadiusz Miśkiewicz
Ben Fuchs
Darshit Shah
Gisle Vanem
Iru Cai
Jeffrey Walton
Matthew Thode
Noël Köthe
Peter Wu
Tim Rühsen
YX Hao

=

Here are the compressed sources and a GPG detached signature[*]:
  https://ftp.gnu.org/gnu/wget/wget-1.19.3.tar.gz
  https://ftp.gnu.org/gnu/wget/wget-1.19.3.tar.gz.sig

Use a mirror for higher download bandwidth:
  https://ftpmirror.gnu.org/wget/wget-1.19.3.tar.gz
  https://ftpmirror.gnu.org/wget/wget-1.19.3.tar.gz.sig

Here are the MD5 and SHA1 checksums:

160e3164519a062d6492d5316a884d87  wget-1.19.3.tar.gz
1d82d696c3418dea77e6861315534db7ffde25f5  wget-1.19.3.tar.gz

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify wget-1.19.3.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 2A1743EDA91A35B6

and rerun the 'gpg --verify' command.

NEWS

* Changes in Wget 1.19.3

* Prevent erroneous decompression of .gz and .tgz files with broken servers

* Added support for HTTP 308 Permanent Redirect response

* Fix a segfault in some cases where the Content-Type header is not sent

* Support OpenSSL 1.1 builds without using deprecated features

* Fix netrc file detection on Windows

* Several minor bug fixes


-- 
On behalf of the maintainers of GNU Wget,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Unexpected result with -H and -D

2018-01-17 Thread Darshit Shah
Hi,

This is a bug in Wget, and apparently a really old one! It seems the bug has
been around since at least 1997.

Looking at the source, the issue is that Wget does a very simple suffix match
between the actual domain and the accepted-domains list. This is obviously
wrong, as you have just found out: werkenbijscapino.nl is accepted merely
because the string "scapino.nl" is a plain suffix of it.

I'm going to try and implement this correctly, but I'm currently a little
short on time, so if anyone else wants to pick it up, please feel free to.
It's simple: use libpsl to get the proper registrable domain and match against
that.


Of course, this change will require libpsl to no longer be an optional
dependency.
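
To make the failure concrete, here is a minimal sketch of the difference. The
libpsl calls are written from memory (psl_builtin() and
psl_registrable_domain()), so double-check them against the headers:

  #include <stdio.h>
  #include <string.h>
  #include <libpsl.h>

  /* the naive test Wget effectively performs today */
  static int
  suffix_match (const char *host, const char *accepted)
  {
    size_t hl = strlen (host), al = strlen (accepted);
    return hl >= al && strcmp (host + hl - al, accepted) == 0;
  }

  int
  main (void)
  {
    const psl_ctx_t *psl = psl_builtin ();
    const char *host = "werkenbijscapino.nl";

    /* prints 1: "scapino.nl" is a plain string suffix of the host */
    printf ("suffix match: %d\n", suffix_match (host, "scapino.nl"));

    /* prints "werkenbijscapino.nl": comparing registrable domains
       instead would correctly reject the host */
    printf ("registrable domain: %s\n", psl_registrable_domain (psl, host));
    return 0;
  }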

* Friso van Vollenhoven <f.van.vollenho...@gmail.com> [180117 14:40]:
> Hello all,
> 
> I am trying to do a recursive download of a webpage and span multiple hosts
> within the same domain, but not cross to other domains. The issue is that
> the crawl does extend to other domains. My full command is this:
> 
> wget \
> --recursive \
> --no-clobber \
> --page-requisites \
> --adjust-extension \
> --span-hosts \
> --domains=scapino.nl \
> --no-parent \
> --tries=2 \
> --wait=1 \
> --random-wait \
> --waitretry=2 \
> --header='User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2)
> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36' \
> https://www.scapino.nl/winkels/scapino-utrecht-510061
> 
> From this combination of --span-hosts and --domains, I would expect to
> download assets from cdn.scapino.nl and www.scapino.nl, but not other
> domains. For some reason that I don't understand, wget also starts to do
> what looks like a full crawl of the domain werkenbijscapino.nl, which is
> referenced from the original page.
> 
> Any thoughts or direction would be much appreciated.
> 
> I am using wget 1.18 on Debian.
> 
> 
> Best regards,
> Friso

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


[Bug-wget] [bug #52349] Seg fault with wget -O - smxi.org/sm/sm-versions

2018-01-17 Thread Darshit Shah
Follow-up Comment #4, bug #52349 (project wget):

Hi, 

You can simply apply the patch from master to your working directory. I
haven't tested it, but it should work just fine. There aren't many changes to
that part of the codebase.

Very soon we'll have a new version released with these fixes.

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #44817] http/gzip compression

2018-01-17 Thread Darshit Shah
Update of bug #44817 (project wget):

  Status:       None => Fixed
  Open/Closed:  Open => Closed
  Release:      1.16.3 => 1.19.1


___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #52898] ADD BUG EMAIL SUBMISSION ADDRESS TO DOCUMENTATION.

2018-01-15 Thread Darshit Shah
Update of bug #52898 (project wget):

  Status:       None => Invalid
  Open/Closed:  Open => Closed

___

Follow-up Comment #1:

Both `wget -h` and `man wget` state that you can write to bug-wget@gnu.org to
submit bug reports.

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #52897] certificate warnings not seen in chromium

2018-01-15 Thread Darshit Shah
Follow-up Comment #1, bug #52897 (project wget):

Wget uses the certificate store provided by the system. Take a look at the
output of `wget -d` to see which certificates are being loaded.

Probably the certificate store being used by Wget (provided by your distro) is
out of date. Both Chromium and Firefox ship with their own certificate stores.

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




Re: [Bug-wget] Trying to compile wget on CentOS 7.4 ...

2018-01-09 Thread Darshit Shah
Hi,

Yes, you're missing the libpsl library.

You may build without the library by passing the appropriate configure flags, 
but we strongly suggest that you use the library. 
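
If I remember the flag correctly, building without it would be:

  ./configure --without-libpsl

But note that libpsl is what backs the cookie domain checks (the
check_domain_match() your linker error points at), so installing your
distro's libpsl development package is the better fix.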

On January 9, 2018 5:51:59 PM UTC, Thomas Schweikle  
wrote:
>Hi!
>
>I am trying to compile wget on CentOS 7.4:
>
> CC   version.o
>  CC   ftp-opie.o
>  CC   openssl.o
>  CC   http-ntlm.o
>  CCLD wget
>cookies.o: In function `check_domain_match':
>cookies.c:(.text+0xd6c): undefined reference to `psl_builtin_outdated'
>collect2: error: ld returned 1 exit status
>make[3]: *** [wget] Error 1
>make[3]: Leaving directory
>`/var/lib/jenkins/sharedspace/wget/build/src'
>make[2]: *** [all] Error 2
>make[2]: Leaving directory
>`/var/lib/jenkins/sharedspace/wget/build/src'
>make[1]: *** [all-recursive] Error 1
>make[1]: Leaving directory `/var/lib/jenkins/sharedspace/wget/build'
>make: *** [all] Fehler 2
>
>Compile fails at ps1_buildin_outdated anything I am missing?
>
>
>-- 
>Thomas

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


[Bug-wget] [bug #49281] wget crash on Windows 10

2018-01-09 Thread Darshit Shah
Update of bug #49281 (project wget):

  Status:       None => Fixed
  Open/Closed:  Open => Closed


___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #52535] wget ignores no_proxy in ~/.wgetrc

2018-01-09 Thread Darshit Shah
Update of bug #52535 (project wget):

  Status:       None => Fixed
  Open/Closed:  Open => Closed


___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #52743] calculated left time and download speed freezes sometimes.

2018-01-09 Thread Darshit Shah
Update of bug #52743 (project wget):

  Status:       None => Wont Fix
  Open/Closed:  Open => Closed

___

Follow-up Comment #1:

Hi,

Wget is built entirely without multi-threading support. Hence, the idea of
having the progress printed from a second thread is not feasible.

There have been multiple attempts in the past to add threading support to
Wget, but the old code base is really hard to refactor.

Instead, do take a look at GNU Wget2, a rewrite of Wget in C which was built
with concurrency in mind.
https://gitlab.com/gnuwget/wget2

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #52705] HTML assets embedding with --page-requisites

2017-12-21 Thread Darshit Shah
Follow-up Comment #2, bug #52705 (project wget):

While MHTML was a convenient way to create snapshots of pages, sadly it was
never properly standardized and most popular browsers no longer support it.

WARC has been almost standardized and is considered the de-facto way of
archiving a web page / web site.

Wget supports saving into the WARC format. So you may want to look into using
that. 
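
For example (hypothetical URL; by default Wget appends the .warc.gz extension
itself):

  wget --page-requisites --warc-file=snapshot https://example.com/page.html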

Otherwise, implementing MHTML should not be too hard: just some postprocessing
code in all the places where WARC data is stored. However, none of the
developers currently has time to work on a new feature. So, if you could write
a patch, we might review and accept it.

Implementing this as a plugin for Wget would however be easier and cleaner.

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




Re: [Bug-wget] bug in socket reuse when using wget -c

2017-12-15 Thread Darshit Shah
I've merged the above patches to master. They will be available with the next
version of Wget.

* Darshit Shah <dar...@gmail.com> [171208 18:47]:
> Hi,
> 
> Thanks for your report. It is indeed a bug in Wget, as you've rightfully
> investigated. The socket still had some data which caused the next request to
> have problems.
> 
> I've attached two patches here, the first one fixes the issue. It tries to 
> read
> and discard any HTTP body still available and then re-use the socket. The
> second patch adds a test case for this scenario
> 
> * Iru Cai <mytbk920...@gmail.com> [171208 17:19]:
> > Hello wget developers,
> > 
> > I found an issue when using `wget -c`, as in:
> > 
> >   https://github.com/mholt/caddy/issues/1965#issuecomment-349220927
> > 
> > By checking out the wget source code, I can confirm that it doesn't
> > drain the response body when it meets a 416 Requested Range Not
> > Satisfiable, and then the socket will be reused for the second request
> > (http get 2.dat in this case). When parse its response, it will
> > encounter the first response's body, so it failed to get the correct
> > response header. This is why you get a blank response header.
> > 
> > Hope this can be fixed.
> > 
> > Thanks,
> > Iru
> 
> 
> 
> -- 
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6

> From a0ffc151036c3d63f153ab3a3d8a30994c47fedf Mon Sep 17 00:00:00 2001
> From: Darshit Shah <dar...@gnu.org>
> Date: Fri, 8 Dec 2017 18:13:00 +0100
> Subject: [PATCH 1/2] Don't assume a 416 response has no body
> 
> * http.c(gethttp): In case of a 416 response, try to drain the socket of
> any bytes before reusing the connection
> 
> Reported-By: Iru Cai <mytbk920...@gmail.com>
> ---
>  src/http.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/src/http.c b/src/http.c
> index 95d26258..e4ff0107 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -3969,11 +3969,16 @@ gethttp (const struct url *u, struct url 
> *original_url, struct http_stat *hs,
>hs->res = 0;
>/* Mark as successfully retrieved. */
>*dt |= RETROKF;
> -  if (statcode == HTTP_STATUS_RANGE_NOT_SATISFIABLE)
> +
> +  /* Try to maintain the keep-alive connection. It is often cheaper to
> +   * consume some bytes which have already been sent than to negotiate
> +   * a new connection. However, if the body is too large, or we don't
> +   * care about keep-alive, then simply terminate the connection */
> +  if (keep_alive &&
> +  skip_short_body (sock, contlen, chunked_transfer_encoding))
>  CLOSE_FINISH (sock);
>else
> -CLOSE_INVALIDATE (sock);/* would be CLOSE_FINISH, but there
> -   might be more bytes in the body. */
> +CLOSE_INVALIDATE (sock);
>retval = RETRUNNEEDED;
>goto cleanup;
>  }
> -- 
> 2.15.1
> 

> From c17b04767a1b58ee8f9db53af431ef1e63b5 Mon Sep 17 00:00:00 2001
> From: Darshit Shah <dar...@gnu.org>
> Date: Fri, 8 Dec 2017 18:41:07 +0100
> Subject: [PATCH 2/2] Add new test for 416 responses
> 
> * testenv/server/http/http_server.py: If there are multiple requests in
> which the requested range is unsatisfiable, then send a body in the
> 2nd response onwards
> * testenv/Test-416.py: New test to check how Wget handles 416 responses
> ---
>  testenv/Test-416.py| 53 
> ++
>  testenv/server/http/http_server.py |  8 ++
>  2 files changed, 61 insertions(+)
>  create mode 100755 testenv/Test-416.py
> 
> diff --git a/testenv/Test-416.py b/testenv/Test-416.py
> new file mode 100755
> index ..76b94213
> --- /dev/null
> +++ b/testenv/Test-416.py
> @@ -0,0 +1,53 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from misc.wget_file import WgetFile
> +
> +"""
> +Ensure that Wget behaves well when the server responds with a HTTP 416
> +status code. This test checks both cases:
> +1. Server sends no body
> +2. Server sends a body
> +"""
> +# File Definitions 
> ###
> +File1 = 
> "abababababababababababababababababababababababababababababababababab"
> +File2 = "ababababababababababababababababababab"
> +
> +A_File = WgetFile ("File1", File1)
> +B_File = WgetFile ("File1", File1)
> +
> +C_File = WgetFile ("File2&qu

Re: [Bug-wget] bug in socket reuse when using wget -c

2017-12-11 Thread Darshit Shah
If there are no objections by tomorrow, I'll push the patches to master after
adding Test-416.py to testenv/Makefile.am.

* Darshit Shah <dar...@gmail.com> [171208 18:47]:
> Hi,
> 
> Thanks for your report. It is indeed a bug in Wget, as you've rightfully
> investigated. The socket still had some data which caused the next request to
> have problems.
> 
> I've attached two patches here, the first one fixes the issue. It tries to 
> read
> and discard any HTTP body still available and then re-use the socket. The
> second patch adds a test case for this scenario
> 
> * Iru Cai <mytbk920...@gmail.com> [171208 17:19]:
> > Hello wget developers,
> > 
> > I found an issue when using `wget -c`, as in:
> > 
> >   https://github.com/mholt/caddy/issues/1965#issuecomment-349220927
> > 
> > By checking out the wget source code, I can confirm that it doesn't
> > drain the response body when it meets a 416 Requested Range Not
> > Satisfiable, and then the socket will be reused for the second request
> > (http get 2.dat in this case). When parse its response, it will
> > encounter the first response's body, so it failed to get the correct
> > response header. This is why you get a blank response header.
> > 
> > Hope this can be fixed.
> > 
> > Thanks,
> > Iru
> 
> 
> 
> -- 
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6

> From a0ffc151036c3d63f153ab3a3d8a30994c47fedf Mon Sep 17 00:00:00 2001
> From: Darshit Shah <dar...@gnu.org>
> Date: Fri, 8 Dec 2017 18:13:00 +0100
> Subject: [PATCH 1/2] Don't assume a 416 response has no body
> 
> * http.c(gethttp): In case of a 416 response, try to drain the socket of
> any bytes before reusing the connection
> 
> Reported-By: Iru Cai <mytbk920...@gmail.com>
> ---
>  src/http.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/src/http.c b/src/http.c
> index 95d26258..e4ff0107 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -3969,11 +3969,16 @@ gethttp (const struct url *u, struct url 
> *original_url, struct http_stat *hs,
>hs->res = 0;
>/* Mark as successfully retrieved. */
>*dt |= RETROKF;
> -  if (statcode == HTTP_STATUS_RANGE_NOT_SATISFIABLE)
> +
> +  /* Try to maintain the keep-alive connection. It is often cheaper to
> +   * consume some bytes which have already been sent than to negotiate
> +   * a new connection. However, if the body is too large, or we don't
> +   * care about keep-alive, then simply terminate the connection */
> +  if (keep_alive &&
> +  skip_short_body (sock, contlen, chunked_transfer_encoding))
>  CLOSE_FINISH (sock);
>else
> -CLOSE_INVALIDATE (sock);/* would be CLOSE_FINISH, but there
> -   might be more bytes in the body. */
> +CLOSE_INVALIDATE (sock);
>retval = RETRUNNEEDED;
>goto cleanup;
>  }
> -- 
> 2.15.1
> 

> From c17b04767a1b58ee8f9db53af431ef1e63b5 Mon Sep 17 00:00:00 2001
> From: Darshit Shah <dar...@gnu.org>
> Date: Fri, 8 Dec 2017 18:41:07 +0100
> Subject: [PATCH 2/2] Add new test for 416 responses
> 
> * testenv/server/http/http_server.py: If there are multiple requests in
> which the requested range is unsatisfiable, then send a body in the
> 2nd response onwards
> * testenv/Test-416.py: New test to check how Wget handles 416 responses
> ---
>  testenv/Test-416.py| 53 
> ++
>  testenv/server/http/http_server.py |  8 ++
>  2 files changed, 61 insertions(+)
>  create mode 100755 testenv/Test-416.py
> 
> diff --git a/testenv/Test-416.py b/testenv/Test-416.py
> new file mode 100755
> index ..76b94213
> --- /dev/null
> +++ b/testenv/Test-416.py
> @@ -0,0 +1,53 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from misc.wget_file import WgetFile
> +
> +"""
> +Ensure that Wget behaves well when the server responds with a HTTP 416
> +status code. This test checks both cases:
> +1. Server sends no body
> +2. Server sends a body
> +"""
> +# File Definitions 
> ###
> +File1 = 
> "abababababababababababababababababababababababababababababababababab"
> +File2 = "ababababababababababababababababababab"
> +
> +A_File = WgetFile ("File1", File1)
> +B_File = WgetFile ("File1", File1)
> +
> +C_File = WgetFile ("File2", File2)

Re: [Bug-wget] bug in socket reuse when using wget -c

2017-12-08 Thread Darshit Shah
Hi,

Thanks for your report. It is indeed a bug in Wget, as you've rightly
diagnosed: the socket still held unread data, which broke the next request.

I've attached two patches. The first one fixes the issue: it reads and
discards any HTTP body still pending on the socket before re-using the
connection. The second patch adds a test case for this scenario.
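
For illustration, the failure mode looks roughly like this (schematic
exchange, not actual wget output; the file names are made up):

  --- request 1, fresh connection ---
  GET /1.dat HTTP/1.1
  Range: bytes=100-

  HTTP/1.1 416 Requested Range Not Satisfiable
  Content-Length: 42

  <42 bytes of body that wget previously left unread on the socket>

  --- request 2, same connection reused ---
  GET /2.dat HTTP/1.1

  <the parser now reads the stale 42 bytes where it expects an
   "HTTP/1.1 200 OK" status line, hence the blank response header>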

* Iru Cai <mytbk920...@gmail.com> [171208 17:19]:
> Hello wget developers,
> 
> I found an issue when using `wget -c`, as in:
> 
>   https://github.com/mholt/caddy/issues/1965#issuecomment-349220927
> 
> By checking out the wget source code, I can confirm that it doesn't
> drain the response body when it meets a 416 Requested Range Not
> Satisfiable, and the socket is then reused for the second request
> (HTTP GET 2.dat in this case). When parsing its response, wget
> encounters the first response's body, so it fails to get the correct
> response header. This is why you get a blank response header.
> 
> Hope this can be fixed.
> 
> Thanks,
> Iru



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
From a0ffc151036c3d63f153ab3a3d8a30994c47fedf Mon Sep 17 00:00:00 2001
From: Darshit Shah <dar...@gnu.org>
Date: Fri, 8 Dec 2017 18:13:00 +0100
Subject: [PATCH 1/2] Don't assume a 416 response has no body

* http.c(gethttp): In case of a 416 response, try to drain the socket of
any bytes before reusing the connection

Reported-By: Iru Cai <mytbk920...@gmail.com>
---
 src/http.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/http.c b/src/http.c
index 95d26258..e4ff0107 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3969,11 +3969,16 @@ gethttp (const struct url *u, struct url *original_url, struct http_stat *hs,
   hs->res = 0;
   /* Mark as successfully retrieved. */
   *dt |= RETROKF;
-  if (statcode == HTTP_STATUS_RANGE_NOT_SATISFIABLE)
+
+  /* Try to maintain the keep-alive connection. It is often cheaper to
+   * consume some bytes which have already been sent than to negotiate
+   * a new connection. However, if the body is too large, or we don't
+   * care about keep-alive, then simply terminate the connection */
+  if (keep_alive &&
+  skip_short_body (sock, contlen, chunked_transfer_encoding))
 CLOSE_FINISH (sock);
   else
-CLOSE_INVALIDATE (sock);/* would be CLOSE_FINISH, but there
-   might be more bytes in the body. */
+CLOSE_INVALIDATE (sock);
   retval = RETRUNNEEDED;
   goto cleanup;
 }
-- 
2.15.1
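
(To reproduce by hand: with 1.dat already fully downloaded and a server
that sends a body along with its 416 -- caddy, in the linked report --
a command of this shape showed the blank second header before the fix;
the URLs are placeholders:)

  wget -c -S http://example.com/1.dat http://example.com/2.dat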

From c17b04767a1b58ee8f9db53af431ef1e63b5 Mon Sep 17 00:00:00 2001
From: Darshit Shah <dar...@gnu.org>
Date: Fri, 8 Dec 2017 18:41:07 +0100
Subject: [PATCH 2/2] Add new test for 416 responses

* testenv/server/http/http_server.py: If there are multiple requests in
which the requested range is unsatisfiable, then send a body from the
2nd response onwards
* testenv/Test-416.py: New test to check how Wget handles 416 responses
---
 testenv/Test-416.py| 53 ++
 testenv/server/http/http_server.py |  8 ++
 2 files changed, 61 insertions(+)
 create mode 100755 testenv/Test-416.py

diff --git a/testenv/Test-416.py b/testenv/Test-416.py
new file mode 100755
index ..76b94213
--- /dev/null
+++ b/testenv/Test-416.py
@@ -0,0 +1,53 @@
+#!/usr/bin/env python3
+from sys import exit
+from test.http_test import HTTPTest
+from misc.wget_file import WgetFile
+
+"""
+Ensure that Wget behaves well when the server responds with an HTTP 416
+status code. This test checks both cases:
+1. Server sends no body
+2. Server sends a body
+"""
+# File Definitions ###
+File1 = "abababababababababababababababababababababababababababababababababab"
+File2 = "ababababababababababababababababababab"
+
+A_File = WgetFile ("File1", File1)
+B_File = WgetFile ("File1", File1)
+
+C_File = WgetFile ("File2", File2)
+D_File = WgetFile ("File2", File1)
+
+E_File = WgetFile ("File3", File1)
+
+WGET_OPTIONS = "-c"
+WGET_URLS = [["File1", "File2", "File3"]]
+
+Files = [[A_File, C_File, E_File]]
+Existing_Files = [B_File, D_File]
+
+ExpectedReturnCode = 0
+ExpectedDownloadedFiles = [B_File, D_File, E_File]
+
+ Pre and Post Test Hooks #
+pre_test = {
+"ServerFiles"   : Files,
+"LocalFiles": Existing_Files
+}
+test_options = {
+"WgetCommands"  : WGET_OPTIONS,
+"Urls"  : WGET_URLS
+}
+post_test = {
+"ExpectedFiles" : ExpectedDownloadedFiles,
+

[Bug-wget] [bug #52581] Segfault while trying to download woff2 file.

2017-12-03 Thread Darshit Shah
Update of bug #52581 (project wget):

  Status: None => Fixed
 Open/Closed: Open => Closed

___

Follow-up Comment #1:

Hi,

Thanks for the report. This issue has already been fixed in the master branch
with commit: 973c26ed7d51052a7b6e120ed1b84e4727e1

The fix will be available with the next release.

In the meantime, you may use the wget-git package on AUR, where you will not
encounter this issue.

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




Re: [Bug-wget] static builds?

2017-11-26 Thread Darshit Shah
Hi,

I agree that might be a good idea. Static binaries are indeed used in a few
environments.

Since you already have some experience in this area, would you mind writing a
patch for Wget? 
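
In the meantime, a manual invocation along these lines should work as a
workaround (untested sketch; the exact LIBS depend on the chosen TLS
backend -- the ones below match the GnuTLS setup from your report):

  ./configure --with-ssl=gnutls \
              LDFLAGS="-static" \
              LIBS="-ltasn1 -lnettle -lhogweed -lgmp"
  make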

* Liviu Ionescu <i...@livius.net> [171127 02:17]:
> Hi,
> 
> These days I prepared a build box that uses mainly static tools, and while
> compiling wget I ran into dependency problems: I had to manually add the
> libraries to the linker flags (-ltasn1 -lnettle -lhogweed -lgmp).
> 
> I checked the `configure` script and could not find a `--static` option; 
> perhaps it would be useful to add one, to do this automatically.
> 
> 
> Regards,
> 
> Liviu
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Failing to compile without deprecated features from openssl

2017-11-21 Thread Darshit Shah
At first sight, this seems fine to me. However, I would leave it to either Tim
or Ander to ACK this patch since they are the experts when it comes to the
SSL/TLS stack code.
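
(For context, the usual shape of such a fix is to guard the pre-1.1
initialisation calls, roughly like this sketch -- not necessarily what
the Gentoo patch does:)

  #include <openssl/ssl.h>

  static void
  ssl_init_compat (void)
  {
  #if OPENSSL_VERSION_NUMBER >= 0x10100000L
    /* OpenSSL 1.1+ initialises itself; the old calls are absent
       when the library is built with OPENSSL_NO_DEPRECATED.  */
    OPENSSL_init_ssl (0, NULL);
  #else
    SSL_library_init ();
    SSL_load_error_strings ();
  #endif
  }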

* Matthew Thode <prometheanf...@gentoo.org> [171121 08:49]:
> Hi,
> 
> It looks like openssl-1.1 support needs to be tweaked a bit to support
> building when openssl does not support deprecated features.
> 
> We are tracking the bug here, https://bugs.gentoo.org/604490 and have an
> attached patch here https://bugs.gentoo.org/attachment.cgi?id=498698
> 
> The patch looks straightforward to my untrained eyes, but I'd like an
> ack on it or to possibly get the patch committed. (If it's just an ack,
> I'd start carrying it in our tree.)
> 
> Thanks for looking,
> 
> -- 
> Matthew Thode (prometheanfire)



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature


Re: [Bug-wget] Fwd: patch proposition

2017-09-14 Thread Darshit Shah
* kalle <ka...@projektwerkstatt.de> [170914 10:06]:
> Hello,
> I hereby repeat my request (in standard English, for Darshit):
> can someone please apply this patch?
> No one responded to my patch proposal last time.
> Kalle
> 
> 
>  Forwarded Message 
> Subject: patch proposition
> Date: Mon, 10 Jul 2017 19:36:56 +0200
> From: kalle <ka...@projektwerkstatt.de>
> To: Dale R. Worley <wor...@alum.mit.edu>
> 
> At the end of the second segment of the node '3 Recursive Download',
> which ends with the words "parsed and followed further", add:
> "However, wget by default will not follow links to a different host than
> the one the link was found on."
> 
> kalle
> 
> 
> Am 08.07.2017 um 03:06 schrieb Dale R. Worley:
> > The effective way to do this is to propose as a patch the specific changes
> > in the documentation that you would like to see.
> 
> 

Hi,

We can indeed do that. I'll try to fix the info pages accordingly.

Thanks for the readability improvements to the documentation.
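
(Side note: when cross-host recursion is actually wanted, the default can
be overridden, e.g.

  wget -r -H -D example.com,cdn.example.com http://example.com/

where `-H`/`--span-hosts` allows following links onto other hosts and
`-D`/`--domains` restricts which ones; the domains here are placeholders.)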

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


signature.asc
Description: PGP signature

