Re: [Bug-wget] Meet Wget maintainers at FOSDEM

2019-02-01 Thread Giuseppe Scrivano
I am going to be around, but in the containers room :-)

I'll poke at you on Twitter.

For anyone else, I am "@gscrivano" on Twitter and Telegram.

Regards,
Giuseppe



Tim Rühsen  writes:

> Everybody who likes to meet me at FOSDEM is encouraged to do so !
>
> DM me on Twitter via @ruehsen or ask around in the security devroom
> (https://fosdem.org/2019/schedule/track/security/). I won't check my
> emails regularly during that time.
>
> Have a nice weekend :-)
>
> Tim



Re: [Bug-wget] Add support to bind to a local port

2018-05-03 Thread Giuseppe Scrivano
Hi,

some minor comments:

Darshit Shah  writes:

> +@cindex bind port
> +@cindex client port number
> +@cindex IP address, client, port
> +@item --bind-port=@var{PORT}
> +When making client TCP/IP connections using @samp{--bind-address}, 
> additionally
> +bind to a specific @var{PORT} on the client machine.  If a negative value is
> +passed as the parameter, then the default vallue of 0 will be used.

s|vallue|value|

> diff --git a/src/connect.c b/src/connect.c
> index 37dae215..37a30879 100644
> --- a/src/connect.c
> +++ b/src/connect.c
> @@ -187,7 +187,7 @@ resolve_bind_address (struct sockaddr *sa)
>if (called)
>  {
>if (should_bind)
> -sockaddr_set_data (sa, , 0);
> +sockaddr_set_data (sa, , opt.bind_port);
>return should_bind;
>  }
>called = true;
> @@ -209,7 +209,7 @@ resolve_bind_address (struct sockaddr *sa)
>ip = *address_list_address_at (al, 0);
>address_list_release (al);
>  
> -  sockaddr_set_data (sa, , 0);
> +  sockaddr_set_data (sa, , opt.bind_port);
>should_bind = true;
>return true;
>  }
> @@ -340,6 +340,19 @@ connect_to_ip (const ip_address *ip, int port, const 
> char *print)
>struct sockaddr *bind_sa = (struct sockaddr *)_ss;
>if (resolve_bind_address (bind_sa))
>  {
> +
> +   // Set the SO_REUSEADDR socket option if it is available. It is
> +// useful when explicitly binding to a given address

mix of spaces and tabs here.

> +#ifdef SO_REUSEADDR
> +  /* For setting options with setsockopt. */
> +  int setopt_val = 1;
> +  void *setopt_ptr = (void *)_val;
> +  socklen_t setopt_size = sizeof (setopt_val);
> +
> +  if (setsockopt (sock, SOL_SOCKET, SO_REUSEADDR, setopt_ptr, 
> setopt_size))
> +logprintf (LOG_NOTQUIET, _("setsockopt SO_REUSEADDR failed: 
> %s\n"),
> +   strerror (errno));
> +#endif
>if (bind (sock, bind_sa, sockaddr_size (bind_sa)) < 0)
>  goto err;
>  }
> diff --git a/src/init.c b/src/init.c
> index e4186abe..98b6ac45 100644
> --- a/src/init.c
> +++ b/src/init.c
> @@ -150,6 +150,7 @@ static const struct {
>  #ifdef HAVE_LIBCARES
>{ "binddnsaddress",   _dns_address,  cmd_string },
>  #endif
> +  { "bindport",  _port, 
> cmd_number },
>{ "bodydata", _data, cmd_string },
>{ "bodyfile", _file, cmd_string },
>  #ifdef HAVE_SSL
> @@ -396,6 +397,7 @@ defaults (void)
>opt.metalink_index = -1;
>  #endif
>  
> +  opt.bind_port = -1;
>opt.cookies = true;
>opt.verbose = -1;
>opt.ntry = 20;
> diff --git a/src/main.c b/src/main.c
> index 46824efd..c6e560bd 100644
> --- a/src/main.c
> +++ b/src/main.c
> @@ -275,6 +275,7 @@ static struct cmdline_option option_data[] =
>  #ifdef HAVE_LIBCARES
>  { "bind-dns-address", 0, OPT_VALUE, "binddnsaddress", -1 },
>  #endif
> +{ "bind-port", 0, OPT_VALUE, "bindport", -1 },
>  { "body-data", 0, OPT_VALUE, "bodydata", -1 },
>  { "body-file", 0, OPT_VALUE, "bodyfile", -1 },
>  { IF_SSL ("ca-certificate"), 0, OPT_VALUE, "cacertificate", -1 },
> @@ -692,6 +693,8 @@ Download:\n"),
>-Q,  --quota=NUMBER  set retrieval quota to NUMBER\n"),
>  N_("\
> --bind-address=ADDRESS  bind to ADDRESS (hostname or IP) on local 
> host\n"),
> +N_("\
> +   --bind-port=PORTbind to PORT on local host\n"),
>  N_("\
> --limit-rate=RATE   limit download rate to RATE\n"),
>  N_("\
> @@ -1749,6 +1752,15 @@ for details.\n\n"));
>exit (WGET_EXIT_GENERIC_ERROR);
>  }
>  
> +  if (opt.bind_port != -1 && !opt.bind_address) {
> +fprintf (stderr, _("bind-port requires bind-address to also be 
> specified.\n"));
> +exit (WGET_EXIT_GENERIC_ERROR);
> +  } else {
> +// We explicitly set the port to 0 if nothing (or a negative value) is
> +// specified
> +opt.bind_port = MAX(opt.bind_port, 0);
> +  }
> +
>/* Compile the regular expressions.  */
>switch (opt.regex_type)
>  {
> diff --git a/src/options.h b/src/options.h
> index 30845a1b..777affad 100644
> --- a/src/options.h
> +++ b/src/options.h
> @@ -219,6 +219,7 @@ struct options
>bool page_requisites; /* Whether we need to download all files
> necessary to display a page properly. */
>char *bind_address;   /* What local IP address to bind to. */
> +  int bind_port; /* What local port to bind to. 
> */

same here, there are some tabs.

Thanks,
Giuseppe



Re: [Bug-wget] Patch: Call WSACleanup for Windows when sockets finished

2017-11-28 Thread Giuseppe Scrivano
Hi Yuxi,

"Yuxi Hao"  writes:

> +#ifdef WINDOWS
> +  ws_cleanup ();
> +#endif
> +
>cleanup ();

ws_cleanup is registered using an atexit callback.  Have you observed
any issue with it?  Isn't that triggered?

Giuseppe



Re: [Bug-wget] Wget1 Gzip Compression

2017-07-26 Thread Giuseppe Scrivano
Hi Tim,

Tim Schlueter  writes:

> Hi,
>
> I was wondering if there is any interest here in adding gzip compression
> support to wget1.
>
> I recently came across a misconfigured web server which would gzip all
> responses regardless of the accept-encoding HTTP request header.
>
> This motivated me to spend some time working on adding on the fly gzip
> decompression to wget1 and making some other compression-related
> improvements to the codebase (before I discovered that wget2 is a work
> in progress).
>
> So before I spend more time working on it, I wanted to see if gzip
> support is something that you would consider adding to wget1 if I
> submitted a patch.

I think that would be a nice feature to have.  We are already linking to
libz for the WARC support so gzip compression won't require a new
dependency for wget.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 3/3] Add command line option to disable use of .netrc

2017-05-16 Thread Giuseppe Scrivano
Tomas Hozza  writes:

> The option to turn off reading .netrc by configuration in .wgetrc is
> still there. This command line option was supposed to be just
> something extra. Anyway Tim already merged the change. Do you want to
> revert it?

Tim, are you in favor of this change?

Well, not worth a revert probably, we are already exposing many options
this way.

Giuseppe



Re: [Bug-wget] [PATCH 3/3] Add command line option to disable use of .netrc

2017-05-16 Thread Giuseppe Scrivano
Hi Tomas,

Tomas Hozza  writes:

> If you don't want to add new option, maybe just documenting the possibility 
> to turn off reading of .netrc would be good enough.
>
> Please let me know what is your preference as I don't have one.

I'd prefer if we just document how to turn off reading .netrc with what
we have now.  The advantage is that it can be done permanent adding the
same configuration in ~/.wgetrc.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 3/3] Add command line option to disable use of .netrc

2017-05-12 Thread Giuseppe Scrivano
Hi Tomas,

Tomas Hozza  writes:

> Although internally code uses option for (not) reading .netrc for
> credentials, it was not possible to turn this behavior off on command
> line. Note that it was possible to turn it off using wgetrc.
>
> Idea for this change came from Bruce Jerrick (bmj...@gmail.com).
> Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1425097

wouldn't "-e netrc=off" be sufficient?

Regards,
Giuseppe



Re: [Bug-wget] Article explaining Wget in German computer magazin c't

2017-03-31 Thread Giuseppe Scrivano
Hi Tim,

Tim Rühsen  writes:

> FYI,
>
> c't 8/2017 has a 4 paged article explaining Wget and several kinds of
> usages (German language only).

cool!  Is there any online version of it?

Regards,
Giuseppe



Re: [Bug-wget] [PATCH] Add support for --retry-http503

2017-01-31 Thread Giuseppe Scrivano
Hi Tom,

> Sure, that is entirely doable from a technical point of view.
>
> However, I have thought about this beforehand, and to me, 503 and 504
> were the only 5xx errors potentially worth retrying. All the others
> signal fatal conditions. And Wget already retries 504 without any
> patch. Hence, the patch is for 503 only.
>
> Let me know if you still want me to prepare a new patch based on your
> proposal.

yes please.  I was just giving an example of how that would look like.
I was just wondering that the same mechanism could be used for other
errors (if it will ever be needed).

Regards,
Giuseppe



Re: [Bug-wget] [PATCH] Add support for --retry-http503

2017-01-31 Thread Giuseppe Scrivano
Hi Tom,

Tom Szilagyi  writes:

> Consider HTTP 503 (Service unavailable) as a non-fatal, transient
> error. Normally Wget gives up immediately on receiving this HTTP
> response. Certain special use cases might require Wget to retry even
> in the face of this error. With this option, such retries are
> performed subject to the normal retry timing and retry count
> limitations of Wget. Using this option is generally not recommended.

thanks for working on it.  If we are going to add something like this, I
think it should be made more generic.  What about something like:

--retry-on-http-error=503,504,505...

Regards,
Giuseppe



Re: [Bug-wget] Why does wget send all messages to stderr?

2017-01-25 Thread Giuseppe Scrivano
V for Vortex  writes:

> wget sends all messages to stderr, not just errors. Why is that?

differently than what the name suggests, stderr is the destination for
diagnostic messages not only errors, while stdout is where you get the
expected output.

In the case of wget, the expected output is the file you download, for
example if you specify:

wget -O- something | some_filter..

If we were sending diagnostic messages to stdout, that would not work.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH v2 01/27] new Metalink functionalities

2016-10-03 Thread Giuseppe Scrivano
Tim Ruehsen  writes:

> @Giuseppe Please go ahead with merging.

I've just pushed the series.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH v2 01/27] new Metalink functionalities

2016-09-29 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:

> Series of patches to implement new Metalink functionalities.
>
> In response to Giuseppe 
> http://lists.gnu.org/archive/html/bug-wget/2016-09/msg00127.html , here you 
> find my revised series of patches.
>
> Posting with `git send-email` as requested.
>
> This series of patches supersedes the following:

thanks for your work!  I went quickly through it and it seems you
addressed the reported issues, I will give it another look tomorrow or
during the weekend.
Given the size of the changes, I'll wait for other comments before
merging it though.
If there are no other comments, I will amend the minor things I reported
before merging the series without the need of a v3.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 27/27] New: Metalink/XML v3 python class, update tests to use this class

2016-09-29 Thread Giuseppe Scrivano
Matthew White  writes:

> * NEWS: Mention the Metalink/XML v3 python class
> * testenv/misc/metalinkv3_xml.py: New Metalink/XML v3 python class

I think this should not be included.  NEWS is for users to know what
changed in a new release, we don't add internal changes.

Giuseppe



Re: [Bug-wget] [PATCH 21/27] New option --metalink-index to process Metalink application/metalink4+xml

2016-09-29 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:

> diff --git a/src/init.c b/src/init.c
> index 6729c5a..26f3886 100644
> --- a/src/init.c
> +++ b/src/init.c
> @@ -248,6 +248,7 @@ static const struct {
>{ "login",_user,  cmd_string },/* deprecated*/
>{ "maxredirect",  _redirect,  cmd_number },
>  #ifdef HAVE_METALINK
> +  { "metalinkindex",_index, cmd_number_inf },
>{ "metalinkoverhttp", _over_http, cmd_boolean },
>  #endif
>{ "method",   ,cmd_string_uppercase },
> @@ -385,6 +386,10 @@ defaults (void)
>   bit pattern will be the least of the implementors' worries.  */
>xzero (opt);
>  
> +#ifdef HAVE_METALINK
> +  opt.metalink_index = -1;
> +#endif
> +
>opt.cookies = true;
>opt.verbose = -1;
>opt.ntry = 20;
> diff --git a/src/main.c b/src/main.c
> index ac6ee2c..d48e3b2 100644
> --- a/src/main.c
> +++ b/src/main.c
> @@ -354,6 +354,7 @@ static struct cmdline_option option_data[] =
>  { "rejected-log", 0, OPT_VALUE, "rejectedlog", -1 },
>  { "max-redirect", 0, OPT_VALUE, "maxredirect", -1 },
>  #ifdef HAVE_METALINK
> +{ "metalink-index", 0, OPT_VALUE, "metalinkindex", -1 },
>  { "metalink-over-http", 0, OPT_BOOLEAN, "metalinkoverhttp", -1 },
>  #endif
>  { "method", 0, OPT_VALUE, "method", -1 },
> @@ -714,6 +715,8 @@ Download:\n"),
>  N_("\
> --keep-badhash  keep files with checksum mismatch (append 
> .badhash)\n"),
>  N_("\
> +   --metalink-index=NUMBER Metalink application/metalink4+xml 
> metaurl ordinal NUMBER\n"),

"Metalink" should be lower case.

Giuseppe



Re: [Bug-wget] [PATCH 25/25] New: --metalink-over-http Content-Type/Disposition Metalink/XML processing

2016-09-26 Thread Giuseppe Scrivano
Hi,

Matthew White  writes:

>> Could you please re-send the updated series?
>
> You may find the series of patches in my branch 
> https://github.com/mehw/wget/tree/metalink_staging .

it works for me, I will take a look soon at them.

>
> There's a new Metalink/XML v3 python class:
> * testenv/misc/metalinkv3_xml.py
>
> I still have to update the NEWS file.
>
>> 
>> I had some problems applying your patches, as some comments from the
>> emails got into the patch itself, could you try with git send-email?
>
> About `git send-email`, what Subject, In-Reply-to, and References should be 
> used to post the series of patches?

you could use the subject to tag the new version of the series: like
"PATCH v2".  In-reply-to and References should be empty for a new
version of a patch series.  They are more helpful if you are replying to
a single message.


> What about the gpg signature?

I have not tried that, but I guess git send-email should be able to keep
signed commits.  At least if the patches are applied to the same commit
and the checksum is not changed.

Thanks,
Giuseppe



Re: [Bug-wget] bug #46584: wget --spider always returns zero exit status

2016-09-25 Thread Giuseppe Scrivano
Darshit Shah  writes:

> Apart from the patch format, the patch itself looks good.
>
> @Giuseppe: We will require Copyright assignments for Piotr right? This
> patch may be small, but there are a couple others in the pipeline.

yes, I think we will need that.  The sum of all the contributions cannot
be considered trivial.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 25/25] New: --metalink-over-http Content-Type/Disposition Metalink/XML processing

2016-09-25 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:

> [Coverity Scan is ok, make syntax-check is ok, make check-valgrind is ok, 
> contrib/check-hard is ok]
>
> This implements the auto-processing of Metalink files part of a 
> "Content-Type: application/metalink4+xml" header answer.
>
> e.g.:
> $ wget --metalink-over-http \
>--metalink-index=inf \
>[--content-disposition \]
>[--trust-server-names \]
>[--header="Accept: */*,application/metalink4+xml" \]
>
>
> The following description is verbatim from the patch:
> -
> Process the Content-Type header, identify an application/metalink4+xml
> file.  The Content-Disposition could provide an alternate name through
> the "filename" field for the metalink xml file.  Respectively, the cli
> options --metalink-over-http and --content-disposition are required.
>
> When Metalink/XML auto-processing, to use the Content-Disposition's
> filename, the cli option --trust-server-names is also required.
> -

this looks fine to me as well.

Could you please re-send the updated series?

I had some problems applying your patches, as some comments from the
emails got into the patch itself, could you try with git send-email?

Thanks,
Giuseppe



Re: [Bug-wget] [PATCH 20/25] New option --metalink-index to process Metalink application/metalink4+xml

2016-09-15 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:

>> > Function amended to modify *name in place.
>> >
>> > Followed Tim's suggestions for Patch 09/25 about different environments 
>> > compatibility, the function now uses last_component() to detect the 
>> > basename:
>> > http://lists.gnu.org/archive/html/bug-wget/2016-09/msg00083.html
>> >
>> > NOTES: if *name is NULL and ref is like 'dir/C:D:file', the result will be 
>> > 'C:D:file'; is it advisable to remove the drive letters 'C:D:' and return 
>> > only 'file'?
>> >
>> > /*
>> >   Replace/remove the basename of a file name.
>> >
>> >   The file name is permanently modified.
>> >
>> >   Always set NAME to a string, even an empty one.
>> >
>> >   Use REF's basename as replacement.  If REF is NULL or if it doesn't
>> >   provide a valid basename candidate, then remove NAME's basename.
>> > */
>> > void
>> > replace_metalink_basename (char **name, char *ref)
>> > {
>> 
>> is it something we could do using only dirname and basename?  What you
>> need here is "dirname(name) + basename(ref)"?
>
> You asked to avoid superfluous memory allocations, right?

yes, and that is still my idea.  I was just wondering if the cost of
these extra memory allocations was worth.  Is it enough to use
last_component() to get it working on Windows?

Thanks,
Giuseppe



Re: [Bug-wget] [PATCH 20/25] New option --metalink-index to process Metalink application/metalink4+xml

2016-09-14 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White <mehw.is...@inventati.org> writes:

> On Sun, 11 Sep 2016 23:39:09 +0200
> Giuseppe Scrivano <gscriv...@gnu.org> wrote:
>
>> Hi Matthew,
>> 
>> Matthew White <mehw.is...@inventati.org> writes:
>> 
>> > +void
>> > +replace_metalink_basename (char **name, char *ref)
>> > +{
>> > +  size_t dir_len = 0;
>> > +  char *p, *dir, *file, *new;
>> > +
>> > +  if (!name)
>> > +return;
>> > +
>> > +  /* New basename from file name reference.  */
>> > +  file = ref;
>> > +  if (file)
>> > +{
>> > +  p = strrchr (file, '/');
>> > +  if (p)
>> > +file = p + 1;
>> > +}
>> > +
>> > +  /* Old directory.  */
>> > +  dir = NULL;
>> > +  if (*name)
>> > +{
>> > +  p = strrchr (*name, '/');
>> > +  if (p)
>> > +dir_len = (p - *name) + 1;
>> > +}
>> > +  dir = xstrndup (*name, dir_len);
>> 
>> I'll review this patch more in details, but one small thing here, could
>> we modify name in place and avoid a memory allocation for dir?
>> 
>> name[dir_len] = '\0';
>
> Function amended to modify *name in place.
>
> Followed Tim's suggestions for Patch 09/25 about different environments 
> compatibility, the function now uses last_component() to detect the basename:
> http://lists.gnu.org/archive/html/bug-wget/2016-09/msg00083.html
>
> NOTES: if *name is NULL and ref is like 'dir/C:D:file', the result will be 
> 'C:D:file'; is it advisable to remove the drive letters 'C:D:' and return 
> only 'file'?
>
> /*
>   Replace/remove the basename of a file name.
>
>   The file name is permanently modified.
>
>   Always set NAME to a string, even an empty one.
>
>   Use REF's basename as replacement.  If REF is NULL or if it doesn't
>   provide a valid basename candidate, then remove NAME's basename.
> */
> void
> replace_metalink_basename (char **name, char *ref)
> {

is it something we could do using only dirname and basename?  What you
need here is "dirname(name) + basename(ref)"?

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 01/25] Add two Metalink/XML tests

2016-09-13 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:

> Next patches add more conditions to testenv/Test-metalink-xml-relpath.py.
>
> Also variants are introduced, like:
> * testenv/Test-metalink-xml-trust.py
> * testenv/Test-metalink-xml-homepath.py
> * testenv/Test-metalink-xml-homepath-trust.py
> * testenv/Test-metalink-xml-abspath-trust.py
> * testenv/Test-metalink-xml-relpath-trust.py
>
> And there will be other tests with other meanings too.
>
> So, I don't know if it's a good idea to mix specific tests together.
>
> This topic also applies to Patch 05/25, Patch 10/25, and Patch 17/25:
> http://lists.gnu.org/archive/html/bug-wget/2016-09/msg00046.html
> http://lists.gnu.org/archive/html/bug-wget/2016-09/msg00050.html
> http://lists.gnu.org/archive/html/bug-wget/2016-09/msg00057.html
>
> WDYT?

Could we have a base file, that is shared by these files?  I agree it is
good to have a separate file for each different test, but if the
difference is only for one line, I think we can refactor them a bit.  It
will make maintainance easier later.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 25/25] New: --metalink-over-http Content-Type/Disposition Metalink/XML processing

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> From ec75cdd3f0e748523c64856f01b28128f806fb16 Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Mon, 29 Aug 2016 20:59:35 +0200
> Subject: [PATCH 25/25] New: --metalink-over-http Content-Type/Disposition
>  Metalink/XML processing
>
> * src/http.c (metalink_from_http): Process the Content-Type header.
>   Add an application/metalink4+xml URL as metalink metaurl.  If the
>   option opt.content_disposition is true, the Content-Disposition's
>   filename is the metaurl's name
> * doc/wget.texi: Update --content-disposition and --metalink-over-http
> * doc/metalink-standard.txt: Update doc. Content-Type/Disposition
>   processing through --metalink-over-http. Update download naming
>   system about --trust-server-names and --content-disposition
> * testenv/Makefile.am: Add new files
> * testenv/Test-metalink-http-xml-type.py: New file. Metalink/HTTP
>   Content-Type/Disposition header automated Metalink/XML tests
> * testenv/Test-metalink-http-xml-type-trust.py: New file. Metalink/HTTP
>   Content-Type/Disposition header with --trust-server-names automated
>   Metalink/XML tests
> * testenv/Test-metalink-http-xml-type-content.py: New file. Metalink/HTTP
>   Content-Type/Disposition header with --content-disposition automated
>   Metalink/XML tests
> * testenv/Test-metalink-http-xml-type-trust-content.py: New file.
>   Metalink/HTTP Content-Type/Disposition header with --trust-server-names
>   and --content-disposition automated Metalink/XML tests
>
> Process the Content-Type header, identify an application/metalink4+xml
> file.  The Content-Disposition could provide an alternate name through
> the "filename" field for the metalink xml file.  Respectively, the cli
> options --metalink-over-http and --content-disposition are required.
>
> When Metalink/XML auto-processing, to use the Content-Disposition's
> filename, the cli option --trust-server-names is also required.
> ---
>  doc/metalink-standard.txt  |   4 +
>  doc/wget.texi  |   5 +
>  src/http.c |  81 
>  testenv/Makefile.am|   4 +
>  testenv/Test-metalink-http-xml-type-content.py | 222 
> +
>  .../Test-metalink-http-xml-type-trust-content.py   | 222 
> +
>  testenv/Test-metalink-http-xml-type-trust.py   | 222 
> +
>  testenv/Test-metalink-http-xml-type.py | 222 
> +
>  8 files changed, 982 insertions(+)
>  create mode 100755 testenv/Test-metalink-http-xml-type-content.py
>  create mode 100755 testenv/Test-metalink-http-xml-type-trust-content.py
>  create mode 100755 testenv/Test-metalink-http-xml-type-trust.py
>  create mode 100755 testenv/Test-metalink-http-xml-type.py
>

I'll have a better look at this patch.

Great work on your series!  Could you also update the NEWS file with all
that is worth to say about all your new changes?

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 24/25] Bugfix: Set NULL variable due to --content-disposition to Metalink origin

2016-09-11 Thread Giuseppe Scrivano
> [Coverity Scan is ok, make syntax-check is ok, make check-valgrind is ok, 
> contrib/check-hard is ok]
>
> This solves a segmentation fault when local_file is not properly set, i.e. 
> due to the use of the option --content-disposition.
>
> The segmentation fault was generated trying to dereference a NULL local_file.
>
> Regards,
> Matthew
>
> -- 
> Matthew White 
>
> From 68e71eba7b3384543c46f43e873421ee0b8722a0 Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Tue, 30 Aug 2016 12:09:38 +0200
> Subject: [PATCH 24/25] Bugfix: Set NULL variable due to --content-disposition
>  to Metalink origin
>
> * src/http.c (http_loop): Prevent SIGSEGV when hstat.local_file is
>   NULL, opt.content_disposition has a role in leaving the value unset
> * src/http.c (gethttp): If hs->local_file is NULL (aka http_loop()'s
>   hstat.local_file), set it to the value of hs->metalink->origin
> ---
>  src/http.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/src/http.c b/src/http.c
> index 753f960..e7d18dd 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -3380,6 +3380,9 @@ gethttp (const struct url *u, struct url *original_url, 
> struct http_stat *hs,
>if (metalink)
>  {
>hs->metalink = metalink_from_http (resp, hs, u);
> +  /* Bugfix: hs->local_file is NULL (opt.content_disposition).  */
> +  if (!hs->local_file && hs->metalink && hs->metalink->origin)
> +hs->local_file = xstrdup (hs->metalink->origin);
>xfree (hs->message);
>retval = RETR_WITH_METALINK;
>CLOSE_FINISH (sock);
> @@ -4499,7 +4502,10 @@ exit:
>if ((ret == RETROK || opt.content_on_error) && local_file)
>  {
>xfree (*local_file);
> -  *local_file = xstrdup (hstat.local_file);
> +  /* Bugfix: Prevent SIGSEGV when hstat.local_file was left NULL
> + (i.e. due to opt.content_disposition).  */
> +  if (hstat.local_file)
> +*local_file = xstrdup (hstat.local_file);
>  }
>free_hstat ();

ACK

Giuseppe



Re: [Bug-wget] [PATCH 22/25] Bugfix: Detect when a metalink:file doesn't have any hash

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> From 3555fe04ab1fc22c93d48f88584c7116d0356bb5 Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Sun, 28 Aug 2016 14:03:56 +0200
> Subject: [PATCH 22/25] Bugfix: Detect when a metalink:file doesn't have any
>  hash
>
> * src/metalink.c (retrieve_from_metalink): Reject any metalink:file
>   without hashes. Prompt the error and switch to the next file
> * testenv/Makefile.am: Add new file
> * testenv/Test-metalink-xml-nohash.py: New file. Metalink/XML with no
>   hashes tests
>
> Prevent SIGSEGV.
> ---

ACK

Giuseppe



Re: [Bug-wget] [PATCH 23/25] New: --trust-server-names saves Metalink/HTTP xml files using the "name" field

2016-09-11 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:

> [Coverity Scan is ok, make syntax-check is ok, make check-valgrind is ok, 
> contrib/check-hard is ok]
>
> This allows to use the value of the "name" field, part of a Metalink/HTTP 
> application/metalink4+xml Link, as file name for the Metalink file to be 
> saved.
>
> This conforms to the RFC6249 standard:
> https://tools.ietf.org/html/rfc6249#section-4
>
> e.g.
> $ wget --metalink-over-http --metalink-index=inf --trust-server-names  a Metalink/HTTP header with application/metlaink4+xml links>
>
>   Link: ; rel=describedby; 
> type="application/metalink4+xml"; name="newname.meta4"
>
>   When --trust-server-names is used, file.ext.meta4 is saved as newname.meta4.
>
> Regards,
> Matthew
>
> -- 
> Matthew White 
>
> From 30b09eb7dbfbb9423efb6bd4009203f887d90786 Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Tue, 30 Aug 2016 09:26:41 +0200
> Subject: [PATCH 23/25] New: --trust-server-names saves Metalink/HTTP xml files
>  using the "name" field
>
> * src/metalink.c (retrieve_from_metalink): If opt.trustservernames is
>   true, use the basename of the metaurl's name to save the xml file
> * doc/metalink-standard.txt: Update doc. With --trust-server-names any
>   Metalink/HTTP Link application/metalink4+xml file is saved using the
>   basename of the "name" field, if any. Update Metalink/HTTP examples
> * testenv/Makefile.am: Add new file
> * testenv/Test-metalink-http-xml-trust-name.py: New file. Metalink/HTTP
>   automated Metalink/XML, save xml files using the "name" field tests
> ---
>  doc/metalink-standard.txt|  10 +
>  src/metalink.c   |   2 +-
>  testenv/Makefile.am  |   1 +
>  testenv/Test-metalink-http-xml-trust-name.py | 273 
> +++
>  4 files changed, 285 insertions(+), 1 deletion(-)
>  create mode 100755 testenv/Test-metalink-http-xml-trust-name.py

ACK

Giuseppe



Re: [Bug-wget] [PATCH 19/25] Bugfix: Prevent sorting of unallocated metalink resource/metaurl

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> diff --git a/src/main.c b/src/main.c
> index ac6ee2c..11ea86d 100644
> --- a/src/main.c
> +++ b/src/main.c
> @@ -2136,10 +2136,11 @@ only if outputting to a regular file.\n"));
>for (mres_ptr = mfile->resources; *mres_ptr; mres_ptr++)
>  mres_count++;
>  
> -  stable_sort (mfile->resources,
> -   mres_count,
> -   sizeof (metalink_resource_t *),
> -   metalink_res_cmp);
> +  if (mres_count > 1)
> +stable_sort (mfile->resources,
> + mres_count,
> + sizeof (metalink_resource_t *),
> + metalink_res_cmp);
>  }
>  }
>retr_err = retrieve_from_metalink (metalink);

hm.. actually stable_sort checks for size, but I think it should check
for nmemb.  Could you please fix stable_sort (utils.c) to do the right
thing and replace this patch with that?

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 19/25] Bugfix: Prevent sorting of unallocated metalink resource/metaurl

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> From cbed79d8394c8c0a74b2f65cb92d01178184e5fe Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Thu, 25 Aug 2016 15:26:18 +0200
> Subject: [PATCH 19/25] Bugfix: Prevent sorting of unallocated metalink
>  resource/metaurl
>
> * src/main.c (main): Sort metalink resources only if there is more
>   than 1 real resource allocated
> * src/http.c (metalink_from_http): Sort metalink resources only if
>   there is more than 1 real resource allocated
> * src/http.c (metalink_from_http): Sort metalink metaurls only if
>   there is more than 1 real metaurl allocated
>
> Trying to sort the resources/metaurls of a metalink when there is only
> one element allocated and no tail, or no elements at all, will lead to
> a segmentation fault.
>
> Also, sorting the resources/metaurls of a metalink when there is only
> one element allocated and a NULL tail is useless.
> ---
>  src/http.c | 6 --
>  src/main.c | 9 +
>  2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/src/http.c b/src/http.c
> index e6af7c1..a7ad76e 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -2933,8 +2933,10 @@ metalink_from_http (const struct response *resp, const 
> struct http_stat *hs,
>  
>/* Metalink data is OK. Now we just need to sort the resources based
>   on their priorities, preference, and perhaps location.  */
> -  stable_sort (mfile->resources, res_count, sizeof (metalink_resource_t *), 
> metalink_res_cmp);
> -  stable_sort (mfile->metaurls, meta_count, sizeof (metalink_metaurl_t *), 
> metalink_meta_cmp);
> +  if (res_count > 1)
> +stable_sort (mfile->resources, res_count, sizeof (metalink_resource_t 
> *), metalink_res_cmp);
> +  if (meta_count > 1)
> +stable_sort (mfile->metaurls, meta_count, sizeof (metalink_metaurl_t *), 
> metalink_meta_cmp);
>  
>/* Restore sensible preference values (in case someone cares to look).  */
>for (i = 0; i < res_count; ++i)
> diff --git a/src/main.c b/src/main.c
> index ac6ee2c..11ea86d 100644
> --- a/src/main.c
> +++ b/src/main.c
> @@ -2136,10 +2136,11 @@ only if outputting to a regular file.\n"));
>for (mres_ptr = mfile->resources; *mres_ptr; mres_ptr++)
>  mres_count++;
>  
> -  stable_sort (mfile->resources,
> -   mres_count,
> -   sizeof (metalink_resource_t *),
> -   metalink_res_cmp);
> +  if (mres_count > 1)
> +stable_sort (mfile->resources,
> + mres_count,
> + sizeof (metalink_resource_t *),
> + metalink_res_cmp);
>  }
>  }
>retr_err = retrieve_from_metalink (metalink);

stable_sort is already checking for size > 1.  Isn't this redundant?

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 18/25] New: Parse Metalink/HTTP header for application/metalink4+xml

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> [Coverity Scan is ok, make syntax-check is ok, contrib/check-hard is ok]
>
> `make check-valgrind` fails the following metalink tests due to the bugs 
> which will be fixed in '[PATCH 19/25] Bugfix: Prevent sorting of unallocated 
> metalink resource/metaurl':
> * testenv/Test-metalink-http.py
> * testenv/Test-metalink-http-quoted.py
>
> This implements the parsing of Metalink/HTTP application/metalink4+xml Links.
>
> The following description is verbatim from the patch:
> -
> Add Metalink/HTTP application/metalink4+xml media types as metaurls to
> the metalink variable that will be used to download the files.
> -
>
> Regards,
> Matthew
>
> -- 
> Matthew White 
>
> From e1ff59ba9086bac0211e2cb54ef2b91038b45590 Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Thu, 25 Aug 2016 05:09:47 +0200
> Subject: [PATCH 18/25] New: Parse Metalink/HTTP header for
>  application/metalink4+xml
>
> * src/http.c (metalink_from_http): Parse Metalink/HTTP header for
>   metaurls application/metalink4+xml media types
> * src/metalink.h: Add function declaration metalink_meta_cmp()
> * src/metalink.c: Add function metalink_meta_cmp() compare metalink

ACK

Giuseppe



Re: [Bug-wget] [PATCH 17/25] New test: Metalink shall not concatenate '/' to an empty directory prefix

2016-09-11 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:

> Matthew White 
>
> From 9f1492729e3771405b13b9375be45e76fdffad62 Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Sat, 27 Aug 2016 16:16:35 +0200
> Subject: [PATCH 17/25] New test: Metalink shall not concatenate '/' to an
>  empty directory prefix
>
> * testenv/Makefile.am: Add new file
> * testenv/Test-metalink-xml-emptyprefix-trust.py: New file.
>   Metalink/XML empty directory prefix (--directory-prefix '') tests

could this share some code with testenv/Test-metalink-xml-homeprefix.py?
Anyway, not a big thing if it takes too much refactoring, so ACK.

Giuseppe



Re: [Bug-wget] [PATCH 16/25] Bugfix: Remove surrounding quotes from Metalink/HTTP key's value

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> +/*
> +  Remove the quotation surrounding a string.
> +
> +  The string is permanently modified.
> + */
> +void
> +dequote_metalink_string (char **str)
> +{
> +  char *new, *beg, *end;
> +  size_t str_len, new_len;
> +
> +  if (!str || !*str)
> +return;
> +
> +  str_len = strlen (*str); /* current string length */
> +
> +  if (str_len < 2)
> +return;
> +
> +  new_len = str_len - 2;   /* predict dequoted length */
> +
> +  beg = *str; /* begin of current string */
> +  end = *str + (str_len - 1); /* end of current string */
> +
> +  /* Verify if the current string is surrounded by quotes.  */
> +  if (!(*beg == '\"' && *end == '\"') && !(*beg == '\'' && *end == '\''))
> +return;

could we just verify that (*str[0] == '\"' || *str[0] == '\'') and in
case return before computing the strlen?

Giuseppe



Re: [Bug-wget] [PATCH 15/25] Bugfix: Process Metalink/XML url strings containing white spaces and CRLF

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> +void
> +clean_metalink_string (char **str)
> +{
> +  int c;
> +  size_t len;
> +  char *new, *beg, *end;
> +
> +  if (!str || !*str)
> +return;
> +
> +  beg = *str;
> +
> +  while ((c = *beg) && (c == '\n' || c == '\r' || c == '\t' || c == ' '))
> +beg++;
> +
> +  end = beg;
> +
> +  /* To not truncate a string containing spaces, search the first '\r'
> + or '\n' which ipotetically marks the end of the string.  */
> +  while ((c = *end) && (c != '\r') && (c != '\n'))
> +end++;
> +
> +  /* If we are at the end of the string, search the first legit
> + character going backward.  */
> +  if (*end == '\0')
> +while ((c = *(end - 1)) && (c == '\n' || c == '\r' || c == '\t' || c == 
> ' '))
> +  end--;
> +
> +  len = end - beg;
> +
> +  new = xmemdup0 (beg, len);

ACK

Giuseppe



Re: [Bug-wget] [PATCH 14/25] New test: Detect when there are no good Metalink url resources

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> [Coverity Scan is ok, make syntax-check is ok, make check-valgrind is ok, 
> contrib/check-hard is ok]
>
> This adds a new Metalink test.
>
> The following description is verbatim from the patch:
> -
> Test if when there are no good Metalink url resources there is any
> segmentation fault.
> -
>
> Regards,
> Matthew
>
> -- 
> Matthew White 
>
> From fe1c53b831387742e0d21c5e0fd1935596f4b1ed Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Tue, 23 Aug 2016 04:57:33 +0200
> Subject: [PATCH 14/25] New test: Detect when there are no good Metalink url
>  resources
>
> * testenv/Makefile.am: Add new file
> * testenv/Test-metalink-xml-nourls.py: New file. Metalink/XML unknown
>   urls tests
>
> Test if when there are no good Metalink url resources there is any
> segmentation fault.
> ---
>  testenv/Makefile.am |   3 +-
>  testenv/Test-metalink-xml-nourls.py | 196 
> 
>  2 files changed, 198 insertions(+), 1 deletion(-)
>  create mode 100755 testenv/Test-metalink-xml-nourls.py
>
> diff --git a/testenv/Makefile.am b/testenv/Makefile.am
> index 5a67822..3d9ea44 100644
> --- a/testenv/Makefile.am
> +++ b/testenv/Makefile.am
> @@ -45,7 +45,8 @@ if METALINK_IS_ENABLED
>  Test-metalink-xml-absprefix-trust.py\
>  Test-metalink-xml-homeprefix-trust.py   \
>  Test-metalink-xml-continue.py   \
> -Test-metalink-xml-size.py
> +Test-metalink-xml-size.py   \
> +Test-metalink-xml-nourls.py
>  else
>METALINK_TESTS =
>  endif
> diff --git a/testenv/Test-metalink-xml-nourls.py 
> b/testenv/Test-metalink-xml-nourls.py
> new file mode 100755
> index 000..60a748b
> --- /dev/null
> +++ b/testenv/Test-metalink-xml-nourls.py
> @@ -0,0 +1,196 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from misc.wget_file import WgetFile
> +import re
> +import hashlib
> +
> +"""
> +This is to test Metalink/XML with unknown url types.
> +
> +With --trust-server-names, trust the metalink:file names.
> +
> +Without --trust-server-names, don't trust the metalink:file names:
> +use the basename of --input-metalink, and add a sequential number
> +(e.g. .#1, .#2, etc.).
> +
> +Strip the directory from unsafe paths.
> +"""
> +# File Definitions 
> ###
> +bad = "Ouch!"
> +
> +File1 = "Would you like some Tea?"
> +File1_lowPref = "Do not take this"
> +File1_sha256 = hashlib.sha256 (File1.encode ('UTF-8')).hexdigest ()
> +
> +File2 = "This is gonna be good"
> +File2_lowPref = "Not this one too"
> +File2_sha256 = hashlib.sha256 (File2.encode ('UTF-8')).hexdigest ()
> +
> +File3 = "A little more, please"
> +File3_lowPref = "That's just too much"
> +File3_sha256 = hashlib.sha256 (File3.encode ('UTF-8')).hexdigest ()
> +
> +File4 = "Maybe a biscuit?"
> +File4_lowPref = "No, thanks"
> +File4_sha256 = hashlib.sha256 (File4.encode ('UTF-8')).hexdigest ()
> +
> +File5 = "More Tea...?"
> +File5_lowPref = "I have to go..."
> +File5_sha256 = hashlib.sha256 (File5.encode ('UTF-8')).hexdigest ()
> +
> +MetaXml = \
> +"""
> +http://www.metalinker.org/;>
> +  
> +GNU Wget
> +  
> +  
> +GNU GPL
> +http://www.gnu.org/licenses/gpl.html
> +  
> +  Wget Test Files
> +  1.2.3
> +  Wget Test Files description
> +  
> +
> +  
> +{{FILE1_HASH}}
> +  
> +  
> + preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file
> + preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404
> + preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File1_lowPref
> + preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File1
> +  
> +
> + 
> +  
> +{{FILE2_HASH}}
> +  
> +  
> + preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file
> + preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404
> + preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File2_lowPref
> + preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File2
> +  
> +
> +
> +  
> +{{FILE3_HASH}}
> +  
> +  
> + preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file
> + preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404
> + preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File3_lowPref
> + preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File3
> +  
> +
> + 
> +  
> +{{FILE4_HASH}}
> +  
> +  
> + preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file
> + preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404
> + preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File4_lowPref
> + preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File4
> +  
> +
> +
> +  
> +{{FILE5_HASH}}
> +  
> +

Re: [Bug-wget] [PATCH 13/25] New: Metalink file size mismatch returns error code METALINK_SIZE_ERROR

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> From 448e3c06cd9589509cd3914a43fe4c7e93e9d622 Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Mon, 22 Aug 2016 09:13:52 +0200
> Subject: [PATCH 13/25] New: Metalink file size mismatch returns error code
>  METALINK_SIZE_ERROR
>
> * src/wget.h (uerr_t): Add error code METALINK_SIZE_ERROR to enum
> * src/metalink.c (retrieve_from_metalink): Use boolean variable
>   size_ok, when false set retr_err to METALINK_SIZE_ERROR
> * testenv/Makefile.am: Add new file
> * testenv/Test-metalink-xml-size.py: New file. Metalink/XML file size
>   tests ()

ACK

Giuseppe



Re: [Bug-wget] [PATCH 12/25] New test: --continue shall keep fully retrieved Metalink files (HTTP 416)

2016-09-11 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:

> [Coverity Scan is ok, make syntax-check is ok, make check-valgrind is ok, 
> contrib/check-hard is ok]
>
> This introduces a new Metalink test to verify that when --continue is used, 
> after a HTTP 416 answer a file that fails a sanity test isn't renamed/removed.
>
> The following description is verbatim from the patch:
> -
> Ensure that --continue doesn't rename/remove existing and/or fully
> retrieved files which fail the sanity tests.
> -

can this test be included together with the patch that introduces this change?

> +MetaXml = re.sub (r'{{FILE1_HASH}}', File1_sha256, MetaXml)
> +MetaXml = re.sub (r'{{FILE2_HASH}}', File2_sha256, MetaXml)
> +MetaXml = re.sub (r'{{FILE3_HASH}}', File3_sha256, MetaXml)
> +MetaXml = re.sub (r'{{FILE4_HASH}}', File4_sha256, MetaXml)
> +MetaXml = re.sub (r'{{FILE5_HASH}}', File5_sha256, MetaXml)
> +MetaXml = re.sub (r'{{SRV_HOST}}', srv_host, MetaXml)
> +MetaXml = re.sub (r'{{SRV_PORT}}', str (srv_port), MetaXml)
> +MetaFile.content = MetaXml

would just MetaXml.replace be enough here as well?

Thanks,
Giuseppe



Re: [Bug-wget] [PATCH 10/25] New document: Metalink/XML and Metalink/HTTP standard reference

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> [Coverity Scan is ok, make syntax-check is ok, make check-valgrind is ok, 
> contrib/check-hard is ok]
>
> This introduces the new document doc/metalink-standard.txt.
>
> The purpose of the document is to serve as a Metalink/XML and Metalink/HTTP 
> standard reference of the Metalink functionalities currently implemented in 
> Wget.
>
> Regards,
> Matthew
>
> -- 
> Matthew White 
>
> From 40442c885ab06dbef19caeef6bc4ba22a26dbb31 Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Fri, 19 Aug 2016 13:17:34 +0200
> Subject: [PATCH 10/25] New document: Metalink/XML and Metalink/HTTP standard
>  reference
>
> * doc/metalink-standard.txt: New doc. Implemented and recommended
>   Metalink/XML and Metalink/HTTP standard features
> ---
>  doc/metalink-standard.txt | 156 
> ++
>  1 file changed, 156 insertions(+)
>  create mode 100644 doc/metalink-standard.txt
>
> diff --git a/doc/metalink-standard.txt b/doc/metalink-standard.txt
> new file mode 100644
> index 000..d00c384
> --- /dev/null
> +++ b/doc/metalink-standard.txt
> @@ -0,0 +1,156 @@
> +GNU Wget Metalink recommended behaviour
> +
> +  Metalink/XML and Metalink/HTTP standard reference
> +
> +
> +1. Security features
> +
> +
> +Only metalink:file elements with safe "name" fields shall be accepted
> +[1 #section-4.1.2.1]. If unsafe metalink:file elements are saved, any
> +related test shall fail (see '2. Tests').
> +
> +By design, libmetalink rejects unsafe metalink:file elements [3]:
> +* lib/metalink_helper.c (metalink_check_safe_path): Verify path
> +
> +1.1 Exceptions
> +==
> +
> +The option --directory-prefix could allow to use an absolute, relative
> +or home path.
> +
> +2. Tests
> +
> +
> +Saving a file to an unexpected path poses a security problem. We must
> +ensure that Wget's automated tests never modify the root and the home
> +paths or descend/escalate to a relative path unexpectedly.
> +
> +2.1 Metalink/XML implemented tests
> +==
> +
> +* testenv/Test-metalink-xml.py: Accept safe paths
> +* testenv/Test-metalink-xml-abspath.py: Reject absolute paths
> +* testenv/Test-metalink-xml-relpath.py: Reject relative paths
> +* testenv/Test-metalink-xml-homepath.py: Reject home paths

ACK with these tests merged together.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 08/25] Implement Metalink/XML --directory-prefix option in Metalink module

2016-09-11 Thread Giuseppe Scrivano
> +  /* The directory prefix for opt.metalink_over_http is handled by
> + src/url.c (url_file_name), do not add it a second time.  */
> +  if (!metalink->origin && opt.dir_prefix && strlen (opt.dir_prefix))
> +filename = aprintf("%s/%s", opt.dir_prefix, mfile->name);

please leave a space between aprintf and '('.

Giuseppe



Re: [Bug-wget] [PATCH 07/25] Change mfile->name to filename in Metalink module's messages

2016-09-11 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:

> From 0a41d7a593bf8b1be7231146af2b8bdf7755e4fa Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Tue, 16 Aug 2016 23:18:46 +0200
> Subject: [PATCH 07/25] Change mfile->name to filename in Metalink module's
>  messages
>
> * src/metalink.c (retrieve_from_metalink): Change mfile->name to
>   filename when referring to the downloaded file
>
> The file name could have been changed by unique_create() (or by any
> other mean) before downloading. Use the name of the downloaded file
> (filename) when printing output which refer to it.
> ---

ACK

Giuseppe



Re: [Bug-wget] [PATCH 06/25] Add file size computation in Metalink module

2016-09-11 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:
> Matthew White 
>
> From f9da9729d70d79ce245877672a2d611ca9c4667f Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Tue, 16 Aug 2016 23:12:56 +0200
> Subject: [PATCH 06/25] Add file size computation in Metalink module
>
> * src/metalink.c (retrieve_from_metalink): Add file size computation
> * doc/metalink.txt: Update document. Remove resolved bugs
>
> Reject downloaded files when they do not agree with their Metalink/XML
> metalink:size: https://tools.ietf.org/html/rfc5854#section-4.2.14
>
> At the moment of writing, Metalink/HTTP headers do not provide a file
> size field. This information could be obtained from the Content-Length
> header field: https://tools.ietf.org/html/rfc6249#section-7
> ---
>  doc/metalink.txt | 11 ---
>  src/metalink.c   | 35 +++
>  2 files changed, 39 insertions(+), 7 deletions(-)

ACK

Giuseppe



Re: [Bug-wget] [PATCH 05/25] Update Metalink/XML tests and add a new test for home paths

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> +

same as before, can be just another test in the same file, not a
complete new test file.

Giuseppe



Re: [Bug-wget] [PATCH 04/25] Bugfix: Keep the download progress when alternating metalink:url

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> From f9fc03c0788675275041d0876d3e7fffd3f50eee Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Thu, 4 Aug 2016 11:35:42 +0200
> Subject: [PATCH 04/25] Bugfix: Keep the download progress when alternating
>  metalink:url
>
> * src/metalink.c (retrieve_from_metalink): On download error, resume
>   output_stream with the next mres->url. Keep fully downloaded files
>   started with --continue, otherwise rename/remove the file
>
> Before this patch, with --continue, existing and/or fully retrieved
> files which fail the sanity tests were renamed (--keep-badhash), or
> removed.
>
> This patch ensures that --continue doesn't rename/remove existing
> and/or fully retrieved files which fail the sanity tests.
> ---

ACK

Giuseppe



Re: [Bug-wget] [PATCH 03/25] Bugfix: Fix NULL filename and output_stream in Metalink module

2016-09-11 Thread Giuseppe Scrivano
Matthew White  writes:

> +  */
> +  if (!output_stream && (output_stream = fopen (filename, "rb")))
> +{
> +  fclose (output_stream);
> +  output_stream = fopen (filename, "ab");
> +}

please use file_exists_p instead of opening and closing the file:

if (!output_stream && file_exists_p (output_stream))
   output_stream = fopen (filename, "ab");

Giuseppe



Re: [Bug-wget] [PATCH 02/25] Add metalink description

2016-09-11 Thread Giuseppe Scrivano
Hi Matthew,

Matthew White  writes:

> Matthew White 
>
> From 2a418049a0678f781ff03cecd4bde4ecfdffec2e Mon Sep 17 00:00:00 2001
> From: Matthew White 
> Date: Fri, 5 Aug 2016 20:10:19 +0200
> Subject: [PATCH 02/25] Add metalink description
>
> * doc/metalink.txt
>
> Evaluation of "Directory Options" on the command line interacting with
> the option '--input-metalink=file':
>
> $ wget --input-metalink=file 
> ---

ACK

Giuseppe



Re: [Bug-wget] [PATCH 01/25] Add two Metalink/XML tests

2016-09-11 Thread Giuseppe Scrivano
Hi Matthew,

thanks for all your work, I am going to comment on each patch separately:

Matthew White  writes:


> index 000..041d772
> --- /dev/null
> +++ b/testenv/Test-metalink-xml-relpath.py
> @@ -0,0 +1,87 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from misc.wget_file import WgetFile
> +import re
> +import hashlib
> +
> +"""
> +This is to test if Metalink XML file escapes current directory.
> +"""
> +# File Definitions 
> ###
> +File1 = "Would you like some Tea?"
> +File1_lowPref = "Do not take this"
> +File1_sha256 = hashlib.sha256 (File1.encode ('UTF-8')).hexdigest ()
> +MetaXml = \
> +"""
> +http://www.metalinker.org/;>
> +  
> +GNU Wget
> +  
> +  
> +GNU GPL
> +http://www.gnu.org/licenses/gpl.html
> +  
> +  Wget Test File 1
> +  1.2.3
> +  Wget Test File 1 description
> +  
> +

If the difference between the two files is only here, could we have only
one test file and another template substition for file name?  Somethging
like ?


> +MetaXml = re.sub (r'{{FILE1_HASH}}', File1_sha256, MetaXml)
> +MetaXml = re.sub (r'{{SRV_HOST}}', srv_host, MetaXml)
> +MetaXml = re.sub (r'{{SRV_PORT}}', str (srv_port), MetaXml)
> +MetaFile.content = MetaXml

should be enough to use replace instead of the re module?

MetaXml = MetaXml.replace('{{FILE1_HASH}}', File1_sha256)

Thanks,
Giuseppe



Re: [Bug-wget] [PATCH] Patch to change behavior with redirects under --recurse.

2016-08-26 Thread Giuseppe Scrivano
Hi Dale,

wor...@alum.mit.edu (Dale R. Worley) writes:

> This is the change that I'm interested in.  I don't expect this to be
> put into the distribution without a lot of discussion.
>
> This version changes the behavior of --recurse:  If a file is
> downloaded, it will be scanned for links to follow.  This differs from
> the current behavior, in which the URL from which the contents were
> obtained (after any redirections) is further checked to see if that URL
> passes the recursion limitations.

I have emailed you separately the instructions to get copyright
assignments to the FSF before we can accept this change.

Regards,
Giuseppe



Re: [Bug-wget] Wget - acess list bypass / race condition PoC

2016-08-24 Thread Giuseppe Scrivano
Hi,

Eli Zaretskii  writes:

>>  #else /* def __VMS */
>> -  *fp = fopen (hs->local_file, "wb");
>> +  if (opt.delete_after
>> +|| opt.spider /* opt.recursive is implicitely true */
>> +|| !acceptable (hs->local_file))
>> +{
>> +  *fp = fdopen (open (hs->local_file, O_CREAT | O_TRUNC | 
>> O_WRONLY, S_IRUSR | S_IWUSR), "wb");
>> +}
>
> For this to work on MS-Windows, the 'open' call should use O_BINARY,
> in addition to the other flags.  Otherwise, the "b" in "wb" of
> 'fdopen' will be ignored by the MS runtime.

FYI: I have pushed the two patches, with the fix amended.

Cheers,
Giuseppe



Re: [Bug-wget] Wget - acess list bypass / race condition PoC

2016-08-21 Thread Giuseppe Scrivano
Hi Eli,

Eli Zaretskii <e...@gnu.org> writes:

>> From: Giuseppe Scrivano <gscriv...@gnu.org>
>> Date: Sun, 21 Aug 2016 15:26:58 +0200
>> Cc: "bug-wget@gnu.org" <bug-wget@gnu.org>,
>>  Dawid Golunski <da...@legalhackers.com>,
>>  "kseifr...@redhat.com" <kseifr...@redhat.com>
>> 
>>  #else /* def __VMS */
>> -  *fp = fopen (hs->local_file, "wb");
>> +  if (opt.delete_after
>> +|| opt.spider /* opt.recursive is implicitely true */
>> +|| !acceptable (hs->local_file))
>> +{
>> +  *fp = fdopen (open (hs->local_file, O_CREAT | O_TRUNC | 
>> O_WRONLY, S_IRUSR | S_IWUSR), "wb");
>> +}
>
> For this to work on MS-Windows, the 'open' call should use O_BINARY,
> in addition to the other flags.  Otherwise, the "b" in "wb" of
> 'fdopen' will be ignored by the MS runtime.

thanks for the review, I am going to amend it to the patch.

Giuseppe



Re: [Bug-wget] Wget - acess list bypass / race condition PoC

2016-08-21 Thread Giuseppe Scrivano
Hi,

"Misra, Deapesh" <dmi...@verisign.com> writes:

> Yes - I whole heartedly agree with the following:
>
>> 
>> To cite myself :)
>> "But there is also non-obvious wget behavior in creating those (temp) files 
>> in 
>> the filesystem."
>> 
>> The Wget docs just don't make clear that these files come into existence for 
>> a 
>> while. Of course we could amend the docs and lean back... but it still is 
>> not 
>> an intuitive behavior and I fear people might trap into that pit. And we 
>> could 
>> easily prevent it with some lines of code...
>> 
>> Regards, Tim
>
> Although I am late to this thread, I would like to elucidate the basic issue 
> I had with the current scenario with an analogy:
>
> If I assign a guard to a room and order the guard not to allow (say)
> people wearing yellow shirts, I intuitively expect that the people
> with yellow shirts will be prevented from entering the room and not
> that everyone will be allowed into the room and then the yellow
> shirted people will be asked to leave.
>
> When I had thought about the possible solutions, I had thought of storing the 
> files in a temporary location. But you guys (developers) are on the right 
> track with all your solutions and the ensuing discussion, IMHO.

If nobody is going to complain, I will push these patches shortly:

>From 1904fbfb40d817075a809124cb03640b9b7234df Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tim=20R=C3=BChsen?= <tim.rueh...@gmx.de>
Date: Sun, 14 Aug 2016 21:04:58 +0200
Subject: [PATCH 1/2] Limit file mode to u=rw on temp. downloaded files

* bootstrap.conf: Add gnulib modules fopen, open.
* src/http.c (open_output_stream): Limit file mode to u=rw
on temporary downloaded files.

Reported-by: "Misra, Deapesh" <dmi...@verisign.com>
Discovered by: Dawid Golunski (http://legalhackers.com)
---
 bootstrap.conf |  2 ++
 src/http.c | 13 -
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/bootstrap.conf b/bootstrap.conf
index 2b225b7..d9a5f90 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -40,6 +40,7 @@ dirname
 fcntl
 flock
 fnmatch
+fopen
 futimens
 ftello
 getaddrinfo
@@ -71,6 +72,7 @@ crypto/md5
 crypto/sha1
 crypto/sha256
 crypto/sha512
+open
 quote
 quotearg
 recv
diff --git a/src/http.c b/src/http.c
index 56b8669..d463f29 100644
--- a/src/http.c
+++ b/src/http.c
@@ -39,6 +39,7 @@ as that of the covered work.  */
 #include 
 #include 
 #include 
+#include 
 
 #include "hash.h"
 #include "http.h"
@@ -2471,7 +2472,17 @@ open_output_stream (struct http_stat *hs, int count, FILE **fp)
   open_id = 22;
   *fp = fopen (hs->local_file, "wb", FOPEN_OPT_ARGS);
 #else /* def __VMS */
-  *fp = fopen (hs->local_file, "wb");
+  if (opt.delete_after
+|| opt.spider /* opt.recursive is implicitely true */
+|| !acceptable (hs->local_file))
+{
+  *fp = fdopen (open (hs->local_file, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR), "wb");
+}
+  else
+{
+  *fp = fopen (hs->local_file, "wb");
+}
+
 #endif /* def __VMS [else] */
 }
   else
-- 
2.7.4

>From 7013f8260fc0a2b71f2414593aedb45c77972d98 Mon Sep 17 00:00:00 2001
From: Giuseppe Scrivano <gscri...@redhat.com>
Date: Sun, 21 Aug 2016 15:21:44 +0200
Subject: [PATCH 2/2] Append .tmp to temporary files

* src/http.c (struct http_stat): Add `temporary` flag.
(check_file_output): Append .tmp to temporary files.
(open_output_stream): Refactor condition to use hs->temporary instead.

Reported-by: "Misra, Deapesh" <dmi...@verisign.com>
Discovered by: Dawid Golunski (http://legalhackers.com)
---
 NEWS   |  6 ++
 src/http.c | 14 +++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 5073d7e..56c21a5 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,12 @@ See the end for copying conditions.
 
 Please send GNU Wget bug reports to <bug-wget@gnu.org>.
 
+* Changes in Wget X.Y.Z
+
+* On a recursive download, append a .tmp suffix to temporary files
+  that will be deleted after being parsed, and create them
+  readable/writable only by the owner.
+
 * Changes in Wget 1.18
 
 * By default, on server redirects to a FTP resource, use the original
diff --git a/src/http.c b/src/http.c
index d463f29..43bc2cb 100644
--- a/src/http.c
+++ b/src/http.c
@@ -1569,6 +1569,7 @@ struct http_stat
 #ifdef HAVE_METALINK
   metalink_t *metalink;
 #endif
+  bool temporary;   /* downloading a temporary file */
 };
 
 static void
@@ -2259,6 +2260,15 @@ check_file_output (struct url *u, struct http_stat *hs,
   xfree (local_file);
 }
 
+  hs->temporary = opt.delete_after || opt.spider

Re: [Bug-wget] Wget - acess list bypass / race condition PoC

2016-08-18 Thread Giuseppe Scrivano
Hi,

Tim Rühsen  writes:

> Please review / test this patch.
>
> Please check the 'Reported-by' in the commit message and if you got a CVE 
> number, please report for inclusion into the commit message (and/or the code).
>
> Regards, Tim
>
> On Mittwoch, 17. August 2016 10:40:35 CEST Dawid Golunski wrote:
>> Random file name + .part extension on temporary files would already be
>> good improvement (even if still stored within the same directory) and
>> help prevent the exploitation.

I still think we should used a fixed extension, not a random file name.
If wget crashes or the process is terminated for any reason, these files
will be left around.  With a deterministic name, at least we can recover
from what was left.

IMO, it is enough to open these files with rw only for the user and not
add any extra complexity.  It is not wget responsibility to take care of
a misconfigured server that allows to execute random files fetched from
http/ftp.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH] Fix signal race condition

2016-08-09 Thread Giuseppe Scrivano
Hi Tobias,

Tobias Stoeckmann  writes:

> -  signal (SIGALRM, abort_run_with_timeout);
>if (SETJMP (run_with_timeout_env) != 0)
>  {
>/* Longjumped out of FUN with a timeout. */
>signal (SIGALRM, SIG_DFL);
>return true;
>  }
> +  else
> +{
> +  signal (SIGALRM, abort_run_with_timeout);
> +}
>alarm_set (timeout);
>fun (arg);

Looks fine to me.

I have added a line to the commit message and pushed your patch.

Thanks,
Giuseppe



[Bug-wget] GNU wget 1.18 released

2016-06-09 Thread Giuseppe Scrivano
Hello,

We are pleased to announce the new version of GNU wget.

This version fixes a security vulnerability (CVE-2016-4971) present in
all old versions of wget.  The vulnerability was discovered by Dawid
Golunski which were reported to us by Beyond Security's SecuriTeam.

On a server redirect from HTTP to a FTP resource, wget would trust the
HTTP server and uses the name in the redirected URL as the destination
filename.
This behaviour was changed and now it works similarly as a redirect from
HTTP to another HTTP resource so the original name is used as
the destination file.  To keep the previous behaviour the user must
provide --trust-server-names.

The new version is available for download here:

ftp://ftp.gnu.org/gnu/wget/wget-1.18.tar.gz
ftp://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz

and the GPG detached signatures using the key E163E1EA:

ftp://ftp.gnu.org/gnu/wget/wget-1.18.tar.gz.sig
ftp://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig

To reduce load on the main server, you can use this redirector service
which automatically redirects you to a mirror:

http://ftpmirror.gnu.org/wget/wget-1.18.tar.gz
http://ftpmirror.gnu.org/wget/wget-1.18.tar.xz

Noteworthy changes:

* By default, on server redirects to a FTP resource, use the original
  URL to get the local file name. Close CVE-2016-4971.  This
  introduces a backward-incompatibility for HTTP->FTP redirects and
  any script that relies on the old  behaviour must use
  --trust-server-names.

* Check the HSTS file is not world-writable before using it.

* Parse  attributes on a recursive download.

* Fix problem with SNI server names having trailing dot(s)

* New options --bind-dns-address and --dns-servers.

* When Wget is built with libiconv, it now converts non-ASCII URIs to
  the locale's codeset when it creates files.  The encoding of the
  remote files and URIs is taken from --remote-encoding, defaulting to
  UTF-8.  The result is that non-ASCII URIs and files downloaded via
  HTTP/HTTPS and FTP will have names on the local filesystem that
  correspond to their remote names.

Please report any problem you may experience to the bug-wget@gnu.org
mailing list.

For the maintainers of wget,
Giuseppe



Re: [Bug-wget] New test

2016-06-03 Thread Giuseppe Scrivano
Hi Tim,

Tim Ruehsen  writes:

> functionality as your test and just pushed it. I mention you in the commit 
> message.

what do you think if we rename tests -> old-tests and testenv -> tests?

It will probably be clearer for contributors to not touch them.


Regards,
Giuseppe



Re: [Bug-wget] New test

2016-06-03 Thread Giuseppe Scrivano
Hi Zdenek,

Zdenek Dohnal  writes:

> Hello,
>
> I was fixing a bug from old version of wget (1.12), which is fixed in newer 
> version wget, and I created script in perl to test it.
> Script tests return value of "missing scheme" case. Do you want to add it 
> into your project?

would it be possible for you to rewrite it using the new Python tests
suite (testenv/)?  We would like patches to the mailing list to be send
in the "git format-patch" format, where the commit message is using the
ChangeLog format (you can take a look at recent commits in the git
repository to have an idea about it).

Thanks,
Giuseppe



Re: [Bug-wget] [PATCH] Fix python test suite

2016-04-09 Thread Giuseppe Scrivano
Hi Tim,

Tim Ruehsen  writes:

> @@ -27,7 +28,7 @@ class BaseTest:
>  Define the class-wide variables (or attributes).
>  Attributes should not be defined outside __init__.
>  """
> -self.name = name
> +self.name = os.path.basename(os.path.realpath(sys.argv[0]))
>  # if pre_hook == None, then {} (an empty dict object) is passed to
>  # self.pre_configs
>  self.pre_configs = pre_hook or {}

shouldn't we remove name from __init__?

Regards,
Giuseppe



Re: [Bug-wget] [PATCH] Trivial changes in HSTS

2016-04-07 Thread Giuseppe Scrivano
Hi Juaristi,

"Juaristi Álamos, Ander"  writes:

> -  if (file_exists_p (filename))
> +  if (file_exists_p (filename) && hsts_file_access_valid (filename))

can the file_exists_p check just be moved to hsts_file_access_valid that
doesn't return an error on ENOENT?  In other words, just have here:

if (hsts_file_access_valid (filename))


Thanks,
Giuseppe



Re: [Bug-wget] [PATCH] Anyone want to add libcares support to wget?

2016-03-21 Thread Giuseppe Scrivano
Hi Tim,

Tim Rühsen  writes:

> diff --git a/README.checkout b/README.checkout
> index 3265c81..45c0585 100644
> --- a/README.checkout
> +++ b/README.checkout
> @@ -99,6 +99,9 @@ Compiling From Repository Sources
>  
>   * [47]GnuPG with GPGME is used to verify GPG-signed Metalink resources.
>  
> + * [48]libcares is needed to bind DNS resolving to a given IP address.
> +   The command line options --dns-servers and --bind-dns-address are
> +   only available when configured with --with-ares.

with-cares here?

Except a few missing spaces after the function names, I have no
objections for the code.

Thanks,
Giuseppe



Re: [Bug-wget] Anyone want to add libcares support to wget?

2016-03-19 Thread Giuseppe Scrivano
Hi Ben,

Ben Greear  writes:

>> Did you consider using a container (e.g. docker) for such a task ? Easy to 
>> set
>> up and you'll have your feature not only for wget. IMO, that is much more
>> flexible. (It was Giuseppe's idea during a private talk).
>
> Containers will not work for me.  I need to scale to thousands of
> instances on modest hardware.  I'm certain the libcares and binding
> approach will work because we do similar things with curl and other
> programs already.

how modest is this hardware?

I tried to build a minimal Docker container for wget on top of Alpine:

Dockerfile:

FROM alpine
RUN apk add --update wget
WORKDIR /out
ENTRYPOINT ["/usr/bin/wget"]

and simply running (192.168.1.13 is an internal DNS server):

docker run --dns=192.168.1.13 --rm -v $(pwd):/out:Z wget wdserver

takes around 5M, plus the docker daemon.

If this is not enough, you can even just run wget in a chroot, and
provide a different /etc/resolv.conf.

Cheers,
Giuseppe



Re: [Bug-wget] Anyone want to add libcares support to wget?

2016-03-19 Thread Giuseppe Scrivano
Hi Ben,

Ben Greear  writes:

> wget can already bind to a local IP.  It might be nice to add support for 
> SO_BINDTODEVICE, but not sure
> it is required for what I need.
>
> The rest of your suggestions are total hacks!

have you tried to run wget from a container?  Would that work for you?
I am a bit reluctant to add/test another dependency, so if there are
other possibilities let's explore them first.

Thanks,
Giuseppe



[Bug-wget] [bug #46611] log errors with --trust-server-names

2016-03-19 Thread Giuseppe Scrivano
Follow-up Comment #8, bug #46611 (project wget):

I think it is too risky to change this behavior now and for something that is
not clear.  Who knows how many scripts we are going to break with such a
change.

___

Reply to this item at:

  

___
  Messaggio inviato con/da Savannah
  http://savannah.gnu.org/




Re: [Bug-wget] Patch for understanding srcset= on img tags.

2016-03-03 Thread Giuseppe Scrivano
Maksim Orlovich  writes:

>> should the condition be (c == ')' && in_paren)  ?
>
> Indeed.
>
> Thanks,
> Maks

Thanks for the changes, I am going to push it shortly.

Regards,
Giuseppe



Re: [Bug-wget] Google Summer of Code 2016

2016-03-01 Thread Giuseppe Scrivano
Kushagra Singh  writes:

> Hi,
>
> Will we be taking part in GSoC this year? I would really like to work on a
> project related to Wget this summer. Any specific ideas that are of
> importance to the community presently?

yes, we will be take part in GSoC.  I think we would like to see more
work happening on wget2, at the moment there is a list of issues on
github that can be useful to you to pick some ideas to work on:

  https://github.com/rockdaboot/wget2/issues

Could you take a look at it?  Do you see anything interesting that you
would like to work on?

Regards,
Giuseppe



Re: [Bug-wget] Patch for understanding srcset= on img tags.

2016-03-01 Thread Giuseppe Scrivano
Hi Maksim,

Maksim Orlovich  writes:

> Hi... wget currently doesn't understand HTML5's srcset= attribute on
> images. The attached adds support for it.
> This is under Google copyright, so should be covered by the company's
> copyright assignment with the FSF.
>
> If you might be interested in incorporating this in some form, please
> let me know if you want any changes (e.g. tests, etc.), ---
> not really familiar with how you folks do things.
>
> Hoping this may be of some use to someone else,
> Maks

thanks for your patch!  I have some comments.  Please amend this:

diff --git a/src/html-url.c b/src/html-url.c
index dff8d57..2f205c7 100644
--- a/src/html-url.c
+++ b/src/html-url.c
@@ -692,8 +692,8 @@ tag_handle_img (int tagid, struct taginfo *tag, struct 
map_context *ctx) {
   if (srcset)
 {
   /* These are relative to the input text. */
-  int base_ind = ATTR_POS(tag,attrind,ctx);
-  int size = strlen(srcset);
+  int base_ind = ATTR_POS (tag, attrind, ctx);
+  int size = strlen (srcset);
 
   /* These are relative to srcset. */
   int offset, url_start, url_end;


> +  /* If the URL wasn't terminated by a , there may also be a 
> descriptor
> + which we just skip. */
> +  if (has_descriptor)
> +{
> +  /* This is comma-terminated, except there may be one level of
> + parentheses escaping that. */
> +  bool in_paren = false;
> +  for (offset = url_end; offset < size; ++offset)
> +{
> +  char c = srcset[offset];
> +  if (c == '(')
> +in_paren = true;
> +  else if (c == '(' && in_paren)
> +in_paren = false;

should the condition be (c == ')' && in_paren)  ?


Thanks,
Giuseppe



Re: [Bug-wget] Support non-ASCII URLs

2016-01-12 Thread Giuseppe Scrivano
Eli Zaretskii <e...@gnu.org> writes:

>> From: Tim Rühsen <tim.rueh...@gmx.de>
>> Cc: Giuseppe Scrivano <gscriv...@gnu.org>, Eli Zaretskii <e...@gnu.org>
>> Date: Fri, 18 Dec 2015 22:41:29 +0100
>> 
>> 1. Maybe do_conversion() should take a char * argument instead of const char 
>> *. We avoid one ugly const -> non-const cast an also a warning about iconv.
>
> I agree.
>
>> 2. contrib/check-hard fails with
>> TESTS_ENVIRONMENT="LC_ALL=tr_TR.utf8 VALGRIND_TESTS=0" make check
>> 
>> FAIL: Test-iri-forced-remote
>> 
>> My son has birthday tomorrow, so I am not sure how much time I can spend on 
>> the weekend on this issue. Maybe Eli or you could have a look ?
>
> I cannot bootstrap the Git repo (too many prerequisites I don't have).
> Can you or someone else produce a distribution tarball out of Git that
> I could then build "as usual"?
>
> Also, can you show me the log of the failed test?  Turkish locales
> have "an issue" with certain upper/lower-case characters, maybe that's
> the problem.  Or maybe it's something else; looking at the log might
> give good clues.

sorry for taking so long, this is the log I get when I run

$ TESTS_ENVIRONMENT="LC_ALL=tr_TR.utf8 VALGRIND_TESTS=0" make check

===
   wget 1.17.1.10-c78d: tests/test-suite.log
===

# TOTAL: 85
# PASS:  84
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: Test-iri-forced-remote


Unescaped left brace in regex is deprecated, passed through in regex; marked by 
<-- HERE in m/{{ <-- HERE port}}/ at HTTPServer.pm line 313.
Unescaped left brace in regex is deprecated, passed through in regex; marked by 
<-- HERE in m/{{ <-- HERE port}}/ at HTTPTest.pm line 50.
Running test Test-iri-forced-remote
Calling /home/gscrivano/src/wget/tests/../src/wget --iri -e robots=on 
--trust-server-names --remote-encoding=iso-8859-1 -nH -r http://localhost:36098/
--2016-01-12 12:11:57--  http://localhost:36098/
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:36098... connected.
HTTP request sent, awaiting response... 200 Ok
Length: 279 [text/html]
Saving to: ‘index.html’

 0K   100% 30,8M=0s

2016-01-12 12:11:57 (30,8 MB/s) - ‘index.html’ saved [279/279]

Loading robots.txt; please ignore errors.
--2016-01-12 12:11:57--  http://localhost:36098/robots.txt
Reusing existing connection to localhost:36098.
HTTP request sent, awaiting response... 200 Ok
Length: unspecified [text/plain]
Saving to: ‘robots.txt’

 0K0,00 =0s

2016-01-12 12:11:57 (0,00 B/s) - ‘robots.txt’ saved [0]

--2016-01-12 12:11:57--  http://localhost:36098/p1_fran%C3%A7ais.html
Connecting to localhost (localhost)|127.0.0.1|:36098... connected.
HTTP request sent, awaiting response... 200 Ok
Length: 278 [text/html]
Saving to: ‘p1_français.html’

 0K   100% 32,7M=0s

2016-01-12 12:11:57 (32,7 MB/s) - ‘p1_français.html’ saved [278/278]

--2016-01-12 12:11:57--  http://localhost:36098/p3_%C2%A4%C2%A4%C2%A4.html
Reusing existing connection to localhost:36098.
HTTP request sent, awaiting response... 200 Ok
Length: 119 [text/plain]
Saving to: ‘p3_¤¤¤.html’

 0K   100% 13,9M=0s

2016-01-12 12:11:57 (13,9 MB/s) - ‘p3_¤¤¤.html’ saved [119/119]

--2016-01-12 12:11:57--  http://localhost:36098/p2_%C3%A9%C3%A9n.html
Reusing existing connection to localhost:36098.
HTTP request sent, awaiting response... 200 Ok
Length: 254 [text/html]
Saving to: ‘p2_één.html’

 0K   100% 26,2M=0s

2016-01-12 12:11:57 (26,2 MB/s) - ‘p2_één.html’ saved [254/254]

FINISHED --2016-01-12 12:11:57--
Total wall clock time: 0,09s
Downloaded: 5 files, 930 in 0s (26,0 MB/s)
Test failed: file p1_français.html not downloaded
FAIL Test-iri-forced-remote.px (exit status: 1)




Re: [Bug-wget] Support non-ASCII URLs

2016-01-12 Thread Giuseppe Scrivano
Eli Zaretskii  writes:

> This was fixed by Tim in the meantime.  Are you running the current
> Git version?

sorry my mistake, I was using an outdated version.  All works now for me
as well.

Regards,
Giuseppe



Re: [Bug-wget] short option for --content-disposition (feature request)

2016-01-05 Thread Giuseppe Scrivano
Hanno Böck  writes:

> Hi,
>
> I quite often use the --content-disposition command line option of wget.
> It's a bit annoying to type in, but currently it seems there is no
> short option for it.
> Could such a short option be added?
>
> c and d are already taken and I think also all other characters in
> content disposition. So I'd like to propose to use -z or -y (just
> because they're not used yet and easy to remember), but I'd be okay
> with any other char.

you can shorten long command line options specifying only a prefix if it
doesn't collide with another one, in your case you can specify
--content-d (since wget has --content-on-error as well).

Regards,
Giuseppe



Re: [Bug-wget] Support non-ASCII URLs

2015-12-18 Thread Giuseppe Scrivano
Tim Rühsen <tim.rueh...@gmx.de> writes:

> Am Donnerstag, 17. Dezember 2015, 20:09:56 schrieb Eli Zaretskii:
>> > From: Tim Ruehsen <tim.rueh...@gmx.de>
>> > Cc: Giuseppe Scrivano <gscriv...@gnu.org>
>> > Date: Thu, 17 Dec 2015 17:50:47 +0100
>> > 
>> > @Eli: If my change is ok for Giuseppe, please apply the changes from iri.c
>> > to your patch. If possible, make a local commit and create the
>> > attachment/patch with 'git format -1' (or -2 for the latest two commits).
>> > That makes it easier for us to apply the patch since author (you) and
>> > commit message are copied as well.
>> 
>> Attached.
>
> Nice, thank you.
>
> There is just one test not passing: Test-ftp-iri.px
>
> Maybe the test is wrong (using --local-encoding=iso-8859-1, but writing to an 
> UTF-8 filename). I am not very much into FTP. How do we know the remote 
> encoding ?

the patch looks fine to me.  Eli, could you please modify the test the
pass and add a note in NEWS?

Thanks,
Giuseppe



Re: [Bug-wget] Support non-ASCII URLs

2015-12-18 Thread Giuseppe Scrivano
Eli Zaretskii  writes:

> Attached.

great, thanks!  Tim, do you have any objections or can we push it?

Regards,
Giuseppe



Re: [Bug-wget] flock is not available on solaris 10 (at least sparc)

2015-12-17 Thread Giuseppe Scrivano
Tim Rühsen  writes:

>> but we can perhaps use this when flock is not available.
>
> Sounds like doing gnulib's job !?
>
> If flock() is not available, gnulib throws in it's flock() emulation. They 
> already check for flock and fcntl being available. Not sure, what problem 
> exists on Solaris 11 - there should be a working fcntl available.
>
> Is the gnulib documentation up-to-date ?
>
> But back to 0... the flock failure has been addressed by adding it to 
> boostrap.conf.

not sure why it is documented that way, perhaps it is only fixing the
build issue but not really working?


> What Christian and I experience is this (just got it on OpenCSW Solaris 11):
>
> /opt/csw/lib/gcc/i386-pc-solaris2.10/5.2.0/include-
> fixed/sys/feature_tests.h:346:2: error: #error "Compiler or options invalid 
> for pre-UNIX 03 X/Open applications   and pre-2001 POSIX applications"
>  #error "Compiler or options invalid for pre-UNIX 03 X/Open applications \
>   ^
> In file included from :12:0,
>  from ../lib/sys/types.h:28,
>  from sysdep.h:85,
>  from wget.h:47,
>  from ftp-ls.c:32:
>
>
> From feature_test.h:
> /*
>  * It is invalid to compile an XPG3, XPG4, XPG4v2, or XPG5 application
>  * using c99.  The same is true for POSIX.1-1990, POSIX.2-1992, POSIX.1b,
>  * and POSIX.1c applications. Likewise, it is invalid to compile an XPG6
>  * or a POSIX.1-2001 application with anything other than a c99 or later
>  * compiler.  Therefore, we force an error in both cases.
>  */
> #if defined(_STDC_C99) && (defined(__XOPEN_OR_POSIX) && !defined(_XPG6))
> #error "Compiler or options invalid for pre-UNIX 03 X/Open applications \
> and pre-2001 POSIX applications"
> #elif !defined(_STDC_C99) && \
> (defined(__XOPEN_OR_POSIX) && defined(_XPG6))
> #error "Compiler or options invalid; UNIX 03 and POSIX.1-2001 applications \
> require the use of c99"
> #endif
>
>
> Seems pretty clear - gcc 5 has C99 as default standard (and Solaris has gcc 
> 5.2).
> But we explicitely define _XOPEN_SOURCE 500 in src/sysdep.h, which says we 
> want 'X/Open 5, POSIX 1995' functions and nothing newer.
> Solaris headers just make some checks to ensure proper definitions here and 
> they fail.
>
> When I simply remove the define, everything compiles ok.
> The question is if we still want what the comment says: "/* Request the "Unix 
> 98 compilation environment". */". I think, we should allow Wget to be 
> compiled 
> on newer environments as well.
>
> WDYT ?

yes, I agree.  Please proceed and drop that line.

Regards,
Giuseppe



Re: [Bug-wget] Support non-ASCII URLs

2015-12-17 Thread Giuseppe Scrivano
Hi Tim,

thanks to have worked on this:

Tim Ruehsen  writes:

> diff --git a/src/iri.c b/src/iri.c
> index 6c6e8d3..354bfd9 100644
> --- a/src/iri.c
> +++ b/src/iri.c
> @@ -180,16 +180,10 @@ do_conversion (const char *tocode, const char 
> *fromcode, char const *in_org, siz
>  }
>else if (errno == E2BIG) /* Output buffer full */
>  {
> -  char *new;
> -
>tooshort++;
>done = len;
> -  outlen = done + inlen * 2;
> -  new = xmalloc (outlen + 1);
> -  memcpy (new, s, done);
> -  xfree (s);
> -  s = new;
> -  len = outlen;
> +  len = outlen = done + inlen * 2;
> +  s = xrealloc(s, outlen + 1);

ACK with a space between "xrealloc" and '('  :-)

Regards,
Giuseppe




Re: [Bug-wget] flock is not available on solaris 10 (at least sparc)

2015-12-16 Thread Giuseppe Scrivano
Tim Ruehsen <tim.rueh...@gmx.de> writes:

> On Wednesday 16 December 2015 10:31:33 Giuseppe Scrivano wrote:
>> Tim Ruehsen <tim.rueh...@gmx.de> writes:
>> > On Tuesday 15 December 2015 21:55:53 Kiyoshi KANAZAWA wrote:
>> >> 2. about #error "Compiler or options invalid for pre-UNIX 03 X/Open
>> >> applications" This error may depends on compilers.
>> >> Both of  "solarisstudio12.4"  &  "gcc-4.8.5" can build wget-1.17.1
>> >> without
>> >> errors.
>> >> 
>> >> But I once experienced the similar problem when trying to compile
>> >> "sam2p-0.49.2". At that time, gcc-4.4.7 passed, but gcc-4.8.* stopped
>> >> with
>> >> this error and I had to set "-D_XPG6" as optional cc flags.
>> >> 
>> >> ./configure of wget checks compiler to accept ISO C89, such as
>> >> "checking for gcc -O3 option to accept ISO C89... none needed".
>> >> If "-std=c89" is required for some versions of gcc, shouldn't it set in
>> >> configure ?
>> > 
>> > You could try to insert AC_PROG_CC_C89 after AC_PROG_CC. AFAIK, it will
>> > add - std=c89 but only if needed (if the compiler does not support C89
>> > without it).
>> > 
>> > autoreconf && ./configure && make clean && make
>> > to see if the problem still occurs.
>> > 
>> > Setting CFLAGS directly in configure.ac is not recommended, AFAIK. Maybe
>> > just a set of compilers understands -std=c89 !? Also, setting -std=c89
>> > might be helpful or not, depending of the compiler and version. Nobody is
>> > going to waste time working on such a list/table.
>> > 
>> > Maybe we should ask the gnulib people, why flock() isn't emulated on
>> > Solaris when -std=c89 is missing !? I could simply be a bug...
>> 
>> the gnulib documentation for flock says:
>> 
>>   Portability problems not fixed by Gnulib:
>> 
>> This function is missing on some platforms:
>> AIX 5.1, HP-UX 11.23, Solaris 11 2011-11, BeOS.
>> 
>> This can either be because nobody cared before, or because there is no
>> way to emulate it on these platforms.
>> 
>> Couldn't we replace the flock part with a write to a tmp file+atomic
>> rename?  In the remote case of two wget processes trying to write the
>> same file, at least we won't get garbage there.  What do you think?
>
> There is no atomic rename for all OSes, AFAIK.
>
> How about using fcntl() for locking ? The gnulib emulation might be better 
> and 
> it is a POSIX function.

it will break on mingw :(

fcntl:

  Portability problems not fixed by Gnulib:
The replacement function does not support F_SETFD,
  F_GETFL, F_SETFL, F_GETOWN, F_SETOWN,
  F_GETLK, F_SETLK, and F_SETLKW on some platforms:
  mingw, MSVC 9.

but we can perhaps use this when flock is not available.

Regards,
Giuseppe



Re: [Bug-wget] Marking Release v1.17.1?

2015-12-16 Thread Giuseppe Scrivano
Eli Zaretskii <e...@gnu.org> writes:

>> From: Giuseppe Scrivano <gscriv...@gnu.org>
>> Cc: Gisle Vanem <gva...@yahoo.no>, bug-wget@gnu.org
>> Date: Wed, 16 Dec 2015 10:34:12 +0100
>> 
>> do you mind to send it in the git am format with a ChangeLog entry?
>
> Attached.  (I presume by "ChangeLog entry" you meant a commit log
> message formatted according to ChangeLog rules.)

yes, correct and thanks for the patch.  I've just pushed it.

Regards,
Giuseppe



Re: [Bug-wget] flock is not available on solaris 10 (at least sparc)

2015-12-16 Thread Giuseppe Scrivano
Tim Ruehsen  writes:

> On Tuesday 15 December 2015 21:55:53 Kiyoshi KANAZAWA wrote:
>> 2. about #error "Compiler or options invalid for pre-UNIX 03 X/Open
>> applications" This error may depends on compilers.
>> Both of  "solarisstudio12.4"  &  "gcc-4.8.5" can build wget-1.17.1 without
>> errors.
>> 
>> But I once experienced the similar problem when trying to compile
>> "sam2p-0.49.2". At that time, gcc-4.4.7 passed, but gcc-4.8.* stopped with
>> this error and I had to set "-D_XPG6" as optional cc flags.
>> 
>> ./configure of wget checks compiler to accept ISO C89, such as
>> "checking for gcc -O3 option to accept ISO C89... none needed".
>> If "-std=c89" is required for some versions of gcc, shouldn't it set in
>> configure ?
>
> You could try to insert AC_PROG_CC_C89 after AC_PROG_CC. AFAIK, it will add -
> std=c89 but only if needed (if the compiler does not support C89 without it).
>
> autoreconf && ./configure && make clean && make
> to see if the problem still occurs.
>
> Setting CFLAGS directly in configure.ac is not recommended, AFAIK. Maybe just 
> a set of compilers understands -std=c89 !? Also, setting -std=c89 might be 
> helpful or not, depending of the compiler and version. Nobody is going to 
> waste time working on such a list/table.
>
> Maybe we should ask the gnulib people, why flock() isn't emulated on Solaris 
> when -std=c89 is missing !? I could simply be a bug...

the gnulib documentation for flock says:

  Portability problems not fixed by Gnulib:

This function is missing on some platforms:
AIX 5.1, HP-UX 11.23, Solaris 11 2011-11, BeOS.

This can either be because nobody cared before, or because there is no
way to emulate it on these platforms.

Couldn't we replace the flock part with a write to a tmp file+atomic
rename?  In the remote case of two wget processes trying to write the
same file, at least we won't get garbage there.  What do you think?

Regards,
Giuseppe




Re: [Bug-wget] Marking Release v1.17.1?

2015-12-16 Thread Giuseppe Scrivano
Eli Zaretskii  writes:

> If we call fd_close here with a socket that failed to connect, wget
> hangs inside Gnulib's close_fd_maybe_socket, waiting for
> WSAEnumNetworkEvents that never returns.  Why it never returns, I
> don't know, but I suspect that a failed connection and a blocking
> socket have something to do with that.
>
> And yes, I think this should be applied.

do you mind to send it in the git am format with a ChangeLog entry?

Thanks,
Giuseppe



Re: [Bug-wget] Support non-ASCII URLs

2015-12-16 Thread Giuseppe Scrivano
Hi Eli,

thanks for working on it, I have a few questions:

Eli Zaretskii  writes:

> This second part is the main part of the change.  It uses 'iconv',
> when available, to convert the file names to the local encoding,
> before saving the files.  Note that the same function I modified is
> used by ftp.c, so downloading via FTP should also work with non-ASCII
> file names now; however, I didn't test that.
>
> Thanks.
>
> diff --git a/src/url.c b/src/url.c
> index c62867f..d984bf7 100644
> --- a/src/url.c
> +++ b/src/url.c
> @@ -43,6 +43,11 @@ as that of the covered work.  */
>  #include "host.h"  /* for is_valid_ipv6_address */
>  #include "c-strcase.h"
>  
> +#if HAVE_ICONV
> +#include 
> +#include 
> +#endif
> +
>  #ifdef __VMS
>  #include "vms.h"
>  #endif /* def __VMS */
> @@ -1531,6 +1536,90 @@ append_uri_pathel (const char *b, const char *e, bool 
> escaped,
>append_null (dest);
>  }
>  
> +static char *
> +convert_fname (const char *fname)
> +{
> +  char *converted_fname = (char *)fname;
> +#if HAVE_ICONV
> +  const char *from_encoding = opt.encoding_remote;
> +  const char *to_encoding = opt.locale;
> +  iconv_t cd;
> +  /* sXXXav : hummm hard to guess... */
> +  size_t len, done, inlen, outlen;
> +  char *s;
> +  const char *orig_fname = fname;;
> +
> +  /* Defaults for remote and local encodings.  */
> +  if (!from_encoding)
> +from_encoding = "UTF-8";
> +  if (!to_encoding)
> +to_encoding = nl_langinfo (CODESET);
> +
> +  cd = iconv_open (to_encoding, from_encoding);
> +  if (cd == (iconv_t)(-1))
> +logprintf (LOG_VERBOSE, _("Conversion from %s to %s isn't supported\n"),
> +quote (from_encoding), quote (to_encoding));
> +  else
> +{
> +  inlen = strlen (fname);
> +  len = outlen = inlen * 2;
> +  converted_fname = s = xmalloc (outlen + 1);
> +  done = 0;
> +
> +  for (;;)
> + {
> +   if (iconv (cd, , , , ) != (size_t)(-1))
> + {
> +   /* Flush the last bytes.  */
> +   iconv (cd, NULL, NULL, , );

should not the return code be checked here?

> +   *(converted_fname + len - outlen - done) = '\0';
> +   iconv_close(cd);
> +   DEBUGP (("Converted file name '%s' (%s) -> '%s' (%s)\n",
> +orig_fname, from_encoding, converted_fname, 
> to_encoding));
> +   return converted_fname;
> + }
> +
> +   /* Incomplete or invalid multibyte sequence */
> +   if (errno == EINVAL || errno == EILSEQ)
> + {
> +   logprintf (LOG_VERBOSE,
> +  _("Incomplete or invalid multibyte sequence 
> encountered\n"));
> +   xfree (converted_fname);
> +   converted_fname = (char *)orig_fname;
> +   break;
> + }
> +   else if (errno == E2BIG) /* Output buffer full */
> + {
> +   char *new;
> +
> +   done = len;
> +   outlen = done + inlen * 2;
> +   new = xmalloc (outlen + 1);
> +   memcpy (new, converted_fname, done);
> +   xfree (converted_fname);

What would be the extra cost in terms of copied bytes if we just replace
the three lines above with xrealloc?

Regards,
Giuseppe



Re: [Bug-wget] Marking Release v1.17.1?

2015-12-11 Thread Giuseppe Scrivano
Jernej Simončič  writes:

> On Tuesday, December 8, 2015, 21:45:31, Darshit Shah wrote:
>
>> With my last set of patches, we have fixed all the issues reported / 
>> identified
>> after the 1.17 release. Hence, maybe we should consider releasing a 1.17.1
>> bugfix release?
>
> Sorry about reporting this late, it forgot about it a bit: when I was
> preparing my Windows release, I noticed that metalink support first
> didn't want to compile (due to missing sys/errno.h - just #ifdefing it
> out didn't seem to break anything), and then that hash validation
> failed, because the file was being opened in text mode. I did a quick
> and dirty fix for my build
> ,
> but this probably isn't correct for non-Windows systems.

Thanks for the patch, one piece is already present upstream, I will
commit the missing part:

>From bf56bf4560cef3c1591487e7df1f2cb5e5ad0303 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jernej=20Simon=C4=8Di=C4=8D?=
 
Date: Fri, 11 Dec 2015 09:58:30 +0100
Subject: [PATCH] * src/metalink.c: Specify 'rb' as mode to open file

---
 src/metalink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/metalink.c b/src/metalink.c
index 986303d..25737b3 100644
--- a/src/metalink.c
+++ b/src/metalink.c
@@ -168,7 +168,7 @@ retrieve_from_metalink (const metalink_t* metalink)
   FILE *local_file;
 
   /* Check the digest.  */
-  local_file = fopen (filename, "r");
+  local_file = fopen (filename, "rb");
   if (!local_file)
 {
   logprintf (LOG_NOTQUIET, _("Could not open downloaded 
file.\n"));
-- 
2.5.0



> Another thing that would be nice to include is the taskbar progressbar
> patch:
> 
> This one is based on patch by Ángel González and tbprogress.c from
> Gisle Vanem.

Darshit, is there anything left on the progress bar?

I will probably tag 1.17.1 later today.


Regards,
Giuseppe



[Bug-wget] GNU wget 1.17.1 released

2015-12-11 Thread Giuseppe Scrivano
Hello,

I am pleased to announce the new version of GNU wget.  We consider it a
bug fixes release as it addresses issues found in 1.17, which contained
quite a few new features.

It is available for download here:

ftp://ftp.gnu.org/gnu/wget/wget-1.17.1.tar.gz
ftp://ftp.gnu.org/gnu/wget/wget-1.17.1.tar.xz

and the GPG detached signatures using the key E163E1EA:

ftp://ftp.gnu.org/gnu/wget/wget-1.17.1.tar.gz.sig
ftp://ftp.gnu.org/gnu/wget/wget-1.17.1.tar.xz.sig

To reduce load on the main server, you can use this redirector service
which automatically redirects you to a mirror:

http://ftpmirror.gnu.org/wget/wget-1.17.1.tar.gz
http://ftpmirror.gnu.org/wget/wget-1.17.1.tar.xz

Noteworthy changes:

* Fix compile error when IPv6 is disabled or SSL is not present.

* Fix HSTS memory leak.

* Fix progress output in non-C locales.

* Fix SIGSEGV when -N and --content-disposition are used together.

* Add --check-certificate=quiet to tell wget to not print any warning about
  invalid certificates.

Please report any problem you may experience to the bug-wget@gnu.org
mailing list.

Have fun!
Giuseppe



Re: [Bug-wget] --no-check-cert does not avoid cert warning

2015-12-10 Thread Giuseppe Scrivano
Ángel González <keis...@gmail.com> writes:

> Giuseppe Scrivano wrote:
>> Hi Ángel,
>>
>>
>> thanks for the suggestion, it looks fine to me.  I already pushed the
>> patch, could you prepare a new one that adds this part?
>>
>> Regards,
>> Giuseppe
> Sure. This was actually a diff against the file in master, so if you
> are ok with the contents,
> it only needed a commit log.

thanks, recommendations for users are never enough :)  I have shortened
the commit message and pushed it.

Giuseppe



Re: [Bug-wget] Fixing progress bar assertions in multibyte locales

2015-12-09 Thread Giuseppe Scrivano
Darshit Shah  writes:

> Hi everyone,
>
> As mentioned earlier, there was a bug in the progress bar
> implementation that caused the tests in multi-byte and multi-column
> locales to fail. I've rectified the issue by re-working each part of
> the progress bar to ensure that they are indeed using exactly the
> amount of space that has been allocated.
>
> I also tried to clean up some of the code. I tested these commits on
> Travis and there seem to be no issues. All tests are passing with
> Russian, Turkish and C locales.
>
> Please review the patch, and if good, I'll push it to master. Since it
> touches the most user-centric part of Wget, I'd like to see it as well
> reviewed and tested as possible.

I have looked at the patches and they seem fine to me.  Feel free to
push them.

Thanks,
Giuseppe



Re: [Bug-wget] Marking Release v1.17.1?

2015-12-08 Thread Giuseppe Scrivano
Darshit Shah  writes:

> With my last set of patches, we have fixed all the issues reported /
> identified after the 1.17 release. Hence, maybe we should consider
> releasing a 1.17.1 bugfix release?

yes, we should do that.  I can tag a new release tomorrow or on Friday.

Regards,
Giuseppe



Re: [Bug-wget] --no-check-cert does not avoid cert warning

2015-12-03 Thread Giuseppe Scrivano
Hi Karl,

Karl Berry  writes:

> the right thing to do is to listen to the Tim's opinion, 
>
> Sure.  I meant no disrespect!  Everything he said was perfectly
> reasonable, just insufficient for my purposes :).  (Thanks Tim.)
> 
> since he is one of the maintainers 
>
> I did not know.  He's not listed in the maintainers file ... maybe he
> should be listed as an official co-maintainer?

yes, both Tim and Darshit.

> No complaints from me.  I suppose it should also be listed in NEWS.
>
> The English could be tweaked if you wish, e.g.:

Thanks for the rewiew, I am going to push the patch with your
suggestion.

Regards,
Giuseppe



Re: [Bug-wget] metalink for wget releases & tests

2015-12-03 Thread Giuseppe Scrivano
Darshit Shah  writes:

> From 2d440d5b4f9abeb84d17fc5474d0102f2fffbcc3 Mon Sep 17 00:00:00 2001
> From: Darshit Shah 
> Date: Thu, 3 Dec 2015 10:49:37 +0100
> Subject: [PATCH] Include Metalink and GPG information in version
>
> * src/build_info.c.in: Include the presence of Metalink and GPGME features in
> the output for wget --version
> ---
>  src/build_info.c.in | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/build_info.c.in b/src/build_info.c.in
> index ce1fe25..83b7664 100644
> --- a/src/build_info.c.in
> +++ b/src/build_info.c.in
> @@ -9,6 +9,9 @@ ntlmdefined ENABLE_NTLM
>  opiedefined ENABLE_OPIE
>  psl defined HAVE_LIBPSL
>  
> +metalinkdefined HAVE_METALINK
> +gpgme   defined HAVE_GPGME
> +
>  ssl choice:
>  openssl defined HAVE_LIBSSL || defined HAVE_LIBSSL32
>  gnutls  defined HAVE_LIBGNUTLS

the patch looks fine, please push it.

Thanks,
Giuseppe



Re: [Bug-wget] --no-check-cert does not avoid cert warning

2015-12-02 Thread Giuseppe Scrivano
Hi Karl,

Karl Berry  writes:

> { wget -d xxx 2>&1 1>&3 | grep -v Saving 1>&2; } 3>&1
>
> This changes the exit value, so that's no good.  Sure, with even more
> complexity the exit status could be preserved too, but IMHO wrapping
> wget in layers of shell mechanisms to work around a warning is crazy.
>
> Giuseppe - please just do the right thing and provide a way to shut off
> the warning.  Especially since it's simple to do.  -k

the right thing to do is to listen to the Tim's opinion, since he is one
of the maintainers and he does a lot of work here.  I would not push any
change if he doesn't agree with it.

Tim agreed so I've changed the patch to add --check-certificate=quiet.

How does the new version look?

Thanks,
Giuseppe

diff --git a/doc/wget.texi b/doc/wget.texi
index c647e33..9cc2bb2 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -1725,6 +1725,9 @@ this option to bypass the verification and proceed with 
the download.
 site's authenticity, or if you really don't care about the validity of
 its certificate.}  It is almost always a bad idea not to check the
 certificates when transmitting confidential or important data.
+If you are really sure of what you are doing, you can specify
+--check-certificate=quiet to ask wget to not print any warning about
+invalid certificates, in most cases this is the wrong thing to do.
 
 @cindex SSL certificate
 @item --certificate=@var{file}
diff --git a/src/gnutls.c b/src/gnutls.c
index d1444fe..d39371f 100644
--- a/src/gnutls.c
+++ b/src/gnutls.c
@@ -692,6 +692,10 @@ ssl_check_certificate (int fd, const char *host)
   const char *severity = opt.check_cert ? _("ERROR") : _("WARNING");
   bool success = true;
 
+  /* The user explicitly said to not check for the certificate.  */
+  if (opt.check_cert == CHECK_CERT_QUIET)
+return success;
+
   err = gnutls_certificate_verify_peers2 (ctx->session, );
   if (err < 0)
 {
@@ -766,5 +770,5 @@ ssl_check_certificate (int fd, const char *host)
 }
 
  out:
-  return opt.check_cert ? success : true;
+  return opt.check_cert == CHECK_CERT_ON ? success : true;
 }
diff --git a/src/init.c b/src/init.c
index 67c94b9..87fbc9b 100644
--- a/src/init.c
+++ b/src/init.c
@@ -115,6 +115,7 @@ CMD_DECLARE (cmd_spec_secure_protocol);
 CMD_DECLARE (cmd_spec_timeout);
 CMD_DECLARE (cmd_spec_useragent);
 CMD_DECLARE (cmd_spec_verbose);
+CMD_DECLARE (cmd_check_cert);
 
 /* List of recognized commands, each consisting of name, place and
function.  When adding a new command, simply add it to the list,
@@ -152,7 +153,7 @@ static const struct {
   { "cadirectory",  _directory,  cmd_directory },
   { "certificate",  _file, cmd_file },
   { "certificatetype",  _type, cmd_cert_type },
-  { "checkcertificate", _cert,cmd_boolean },
+  { "checkcertificate", _cert,cmd_check_cert },
 #endif
   { "chooseconfig", _config, cmd_file },
   { "connecttimeout",   _timeout,   cmd_time },
@@ -415,7 +416,7 @@ defaults (void)
   opt.retr_symlinks = true;
 
 #ifdef HAVE_SSL
-  opt.check_cert = true;
+  opt.check_cert = CHECK_CERT_ON;
   opt.ftps_resume_ssl = true;
   opt.ftps_fallback_to_ftp = false;
   opt.ftps_implicit = false;
@@ -955,6 +956,18 @@ static bool simple_atof (const char *, const char *, 
double *);
  && (p)[3] == '\0')
 
 
+static int
+cmd_boolean_internal (const char *com, const char *val, void *place)
+{
+  if (CMP2 (val, 'o', 'n') || CMP3 (val, 'y', 'e', 's') || CMP1 (val, '1'))
+/* "on", "yes" and "1" mean true. */
+return 1;
+  else if (CMP3 (val, 'o', 'f', 'f') || CMP2 (val, 'n', 'o') || CMP1 (val, 
'0'))
+/* "off", "no" and "0" mean false. */
+return 0;
+  return -1;
+}
+
 /* Store the boolean value from VAL to PLACE.  COM is ignored,
except for error messages.  */
 static bool
@@ -962,24 +975,62 @@ cmd_boolean (const char *com, const char *val, void 
*place)
 {
   bool value;
 
-  if (CMP2 (val, 'o', 'n') || CMP3 (val, 'y', 'e', 's') || CMP1 (val, '1'))
-/* "on", "yes" and "1" mean true. */
-value = true;
-  else if (CMP3 (val, 'o', 'f', 'f') || CMP2 (val, 'n', 'o') || CMP1 (val, 
'0'))
-/* "off", "no" and "0" mean false. */
-value = false;
-  else
+  switch (cmd_boolean_internal (com, val, place))
 {
-  fprintf (stderr,
-   _("%s: %s: Invalid boolean %s; use `on' or `off'.\n"),
-   exec_name, com, quote (val));
-  return false;
-}
+case 0:
+  value = false;
+  break;
 
+case 1:
+  value = true;
+  break;
+
+default:
+  {
+fprintf (stderr,
+ _("%s: %s: Invalid boolean %s; use `on' or `off'.\n"),
+ exec_name, com, quote (val));
+return false;
+  }
+}
   *(bool *) place = value;
   return true;
 }
 
+/* Store the check_cert value from VAL to PLACE.  COM is ignored,
+   except for error messages.  */
+static bool
+cmd_check_cert (const char *com, const char *val, void 

Re: [Bug-wget] --no-check-cert does not avoid cert warning

2015-12-01 Thread Giuseppe Scrivano
Ángel González  writes:

> On 30/11/15 22:33, Tim Rühsen wrote:
>> There is the situation where --no-check-cert is implicitly set (.wgetrc,
>> /etc/wgetrc, alias) and the user isn't aware of it. Just downloading without 
>> a
>> warning opens a huge security hole because you can't verify where you
>> downloaded it from (DNS attacks, MITM).
>> I leave it to your imagination what could happen to people in unsafe
>> countries... this warning could save lives.
>>
>> For an expert like Karl, this is just annoying.
>>
>> The warning text could be worked on, makeing clear that you are really 
>> leaving
>> secure ground, that cert checking has been explicitly turned off and how to
>> turn it on again. And only proceed if you really, really are aware of what 
>> you
>> are doing.
>>
>> Of course all this applies to HTTP (plain text) as well. But someone
>> requesting HTTPS and than dropping the gained security should be warned by
>> default.
>>
>> My thinking is a pessimistic approach, but as long as you can't be 100% sure
>> that bad things can't happend due to dropping the warning, we should leave it
>> (and improve it the best we can).
>>
>> Tim
>
> An alternative to make  --no-check-certificate silent would be to
> provide a parameter to explicitely silence it:
>  --no-check-certificate=quiet

good idea, it looks like a good compromise.  Tim, would it work for you?
We will keep the current behavior, and brave users can use the new
parameter.

Regards,
Giuseppe



Re: [Bug-wget] --no-check-cert does not avoid cert warning

2015-11-30 Thread Giuseppe Scrivano
Hi Karl,


Karl Berry  writes:

> With wget 1.17 (at least),
>
> $ wget -nv --no-check-cert https://www.gnu.org -O /dev/null
> WARNING: cannot verify www.gnu.org's certificate, issued by 'CN=Gandi 
> Standard SSL CA 2,O=Gandi,L=Paris,ST=Paris,C=FR':
>   Unable to locally verify the issuer's authority.
>
> Maybe I'm crazy, but it seems like pointless noise to complain that a
> certificate cannot be verified when wget has been explicitly told not to
> check it.  Looking at the source, the only way I see to get rid of the
> warning is with --silent, which would also eliminate real errors.

the only difference with --no-check-cert is that wget will fail and exit
immediately when the certificate is not valid.  The idea behind
--no-check-cert was probably to not abort the execution of wget but
still inform the user about an invalid certificate, as the documentation
says:

  This option forces an ``insecure'' mode of
  operation that turns the certificate verification errors into warnings
  and allows you to proceed.

I am personally in favor of dropping the warning, as it is doing
something the user asked to not do.

Anybody has something against this patch?

Regards,
Giuseppe

diff --git a/doc/wget.texi b/doc/wget.texi
index c647e33..6aeda72 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -1714,9 +1714,7 @@ handshake and aborting the download if the verification 
fails.
 Although this provides more secure downloads, it does break
 interoperability with some sites that worked with previous Wget
 versions, particularly those using self-signed, expired, or otherwise
-invalid certificates.  This option forces an ``insecure'' mode of
-operation that turns the certificate verification errors into warnings
-and allows you to proceed.
+invalid certificates.
 
 If you encounter ``certificate verification'' errors or ones saying
 that ``common name doesn't match requested host name'', you can use
diff --git a/src/gnutls.c b/src/gnutls.c
index d1444fe..b48e4e8 100644
--- a/src/gnutls.c
+++ b/src/gnutls.c
@@ -686,12 +686,13 @@ ssl_check_certificate (int fd, const char *host)
 
   unsigned int status;
   int err;
-
-  /* If the user has specified --no-check-cert, we still want to warn
- him about problems with the server's certificate.  */
-  const char *severity = opt.check_cert ? _("ERROR") : _("WARNING");
+  const char *severity = _("ERROR");
   bool success = true;
 
+  /* The user explicitly said to not check for the certificate.  */
+  if (!opt.check_cert)
+return success;
+
   err = gnutls_certificate_verify_peers2 (ctx->session, );
   if (err < 0)
 {
@@ -766,5 +767,5 @@ ssl_check_certificate (int fd, const char *host)
 }
 
  out:
-  return opt.check_cert ? success : true;
+  return success;
 }
diff --git a/src/openssl.c b/src/openssl.c
index 4876048..f5fe675 100644
--- a/src/openssl.c
+++ b/src/openssl.c
@@ -673,15 +673,15 @@ ssl_check_certificate (int fd, const char *host)
   long vresult;
   bool success = true;
   bool alt_name_checked = false;
-
-  /* If the user has specified --no-check-cert, we still want to warn
- him about problems with the server's certificate.  */
-  const char *severity = opt.check_cert ? _("ERROR") : _("WARNING");
-
+  const char *severity = _("ERROR");
   struct openssl_transport_context *ctx = fd_transport_context (fd);
   SSL *conn = ctx->conn;
   assert (conn != NULL);
 
+  /* The user explicitly said to not check for the certificate.  */
+  if (!opt.check_cert)
+return success;
+
   cert = SSL_get_peer_certificate (conn);
   if (!cert)
 {
@@ -885,8 +885,7 @@ ssl_check_certificate (int fd, const char *host)
 To connect to %s insecurely, use `--no-check-certificate'.\n"),
quotearg_style (escape_quoting_style, host));
 
-  /* Allow --no-check-cert to disable certificate checking. */
-  return opt.check_cert ? success : true;
+  return success;
 }
 
 /*



[Bug-wget] [bug #46479] null pointer dereference: gnutls_free (ctx->session_data->data)

2015-11-20 Thread Giuseppe Scrivano
Update of bug #46479 (project wget):

 Open/Closed:Open => Closed 

___

Follow-up Comment #2:

thanks, the patch looks correct and I pushed it.

___

Reply to this item at:

  

___
  Messaggio inviato con/da Savannah
  http://savannah.gnu.org/




Re: [Bug-wget] Wget 1.17 doesn't compile on Windows (hsts.c)

2015-11-17 Thread Giuseppe Scrivano
Darshit Shah  writes:

> I think it's because flock isn't implemented in Windows. The ideal
> solution would be to use some WIN32 API to get it done. However, I saw
> that gnulib actually provides an implementation of flock that can be
> used on Windows.
> Hence, maybe just using the gnulib module may work.
>
> @Giuseppe, @Tim: What do you'll think?

using the gnulib module should fix the build issue, from the gnulib code
I see in flock.c: "Emulate flock on platforms that lack it, primarily
Windows and MinGW."

I think your patch should be enough, feel free to push it.

Regards,
Giuseppe



Re: [Bug-wget] Cygwin

2015-11-13 Thread Giuseppe Scrivano
Jim Macaulay  writes:

> I am using Cygwin . If i use wget command i am getting and error as ,
>
> wget http://google.co.in/
> --2015-11-13 14:04:35--  http://google.co.in/
> Resolving google.co.in (google.co.in)... 216.58.196.163,
> 2404:6800:4003:809::2003
> Connecting to google.co.in (google.co.in)|216.58.196.163|:80... connected.
> HTTP request sent, awaiting response... Read error (Connection reset by
> peer) in headers.
> Retrying.
>
> [image: Inline image 1]
>
>   Can you kindly guide me in this .

how did you get the executable for wget?  Did you build it by yourself?
Can you please provide the version of wget you used (wget --version) and
attach the --debug output when you run your command?

I think the issue might be related to how we  manage
blocking/non-blocking sockets on Windows.

Regards,
Giuseppe



[Bug-wget] [bug #46061] wget https://contributors.debian.org fails with "No data received."

2015-09-28 Thread Giuseppe Scrivano
Follow-up Comment #3, bug #46061 (project wget):

Good analysis Tim!  I have attacheda patch that solves the problem for me

(file #35002)
___

Additional Item Attachment:

File name: 0001-gnutls-honor-error-GNUTLS_E_REHANDSHAKE.patch Size:4 KB


___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




Re: [Bug-wget] [PATCH] [bug #46061] wget https://contributors.debian.org fails with "No data received."

2015-09-28 Thread Giuseppe Scrivano
Tim Ruehsen  writes:

> Please review / test this patch.

we basically wrote the same patch :)

It looks fine to me, just please add a space in the function calls:

"_do_handshake(" -> "_do_handshake ("

and feel free to push it.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH] [bug #46061] wget https://contributors.debian.org fails with "No data received."

2015-09-28 Thread Giuseppe Scrivano
Tim Ruehsen  writes:

> Please review / test this patch.
>
> BTW, I am not sure if contributors.debian.org is configured correctly.
> The rehandshake occurs right after the HTTP request and it has a pretty heavy 
>  
> impact on download duration.
>
> Regards, Tim
>
> On Sunday 27 September 2015 20:03:54 Tim Ruehsen wrote:
>> Follow-up Comment #2, bug #46061 (project wget):
>> 
>> Wget is not reacting on GNUTLS_E_REHANDSHAKE. Should be straight forward...
>> 
>> 
>> ___
>> 
>> Reply to this item at:
>> 
>>   
>> 
>> ___
>>   Nachricht gesendet von/durch Savannah
>>   http://savannah.gnu.org/
>
> From cbec5b0c780f9d1fc343fabf22e8ee7c7cb3222d Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Tim=20R=C3=BChsen?= 
> Date: Mon, 28 Sep 2015 12:00:33 +0200
> Subject: [PATCH] Handle TLS rehandshakes in GnuTLS code
>
> * src/gnutls.c: New static function _do_handshake()
> * src/gnutls.c (wgnutls_read_timeout): Handle rehandshake
> * src/gnutls.c (wgnutls_write): Handle rehandshake
> * src/gnutls.c (ssl_connect_wget): Move handshake code into _do_handshake()
>
> Fixes #46061
> ---
>  src/gnutls.c | 179 
> ++-
>  1 file changed, 102 insertions(+), 77 deletions(-)
>
> diff --git a/src/gnutls.c b/src/gnutls.c
> index a38301a..2f53592 100644
> --- a/src/gnutls.c
> +++ b/src/gnutls.c
> @@ -57,6 +57,9 @@ as that of the covered work.  */
>  #include "host.h"
>  
>  static int
> +_do_handshake(gnutls_session_t session, int fd, double timeout);
> +
> +static int
>  key_type_to_gnutls_type (enum keyfile_type type)
>  {
>switch (type)
> @@ -277,6 +280,12 @@ wgnutls_read_timeout (int fd, char *buf, int bufsize, 
> void *arg, double timeout)
>  {
>ret = gnutls_record_recv (ctx->session, buf, bufsize);
>timed_out = timeout && ptimer_measure (timer) >= timeout;
> +  if (!timed_out && ret == GNUTLS_E_REHANDSHAKE)
> +{
> +  DEBUGP (("GnuTLS: *** REHANDSHAKE while reading\n"));
> +  if ((ret = _do_handshake(ctx->session, fd, timeout)) == 0)
> +ret = GNUTLS_E_AGAIN; /* restart reading */
> +}
>  }
>  }
>while (ret == GNUTLS_E_INTERRUPTED || (ret == GNUTLS_E_AGAIN && 
> !timed_out));
> @@ -332,7 +341,15 @@ wgnutls_write (int fd _GL_UNUSED, char *buf, int 
> bufsize, void *arg)
>int ret;
>struct wgnutls_transport_context *ctx = arg;
>do
> -ret = gnutls_record_send (ctx->session, buf, bufsize);
> +{
> +  ret = gnutls_record_send (ctx->session, buf, bufsize);
> +  if (ret == GNUTLS_E_REHANDSHAKE)
> +{
> +  DEBUGP (("GnuTLS: *** REHANDSHAKE while writing\n"));
> +  if ((ret = _do_handshake(ctx->session, fd, 0)) == 0)

one thing: timeout here should be the remaining time instead of 0?

Regards,
Giuseppe



Re: [Bug-wget] [PATCH] [bug #46061] wget https://contributors.debian.org fails with "No data received."

2015-09-28 Thread Giuseppe Scrivano
Tim Ruehsen  writes:

> wgnutls_write is called without timeout. So, what can we do here ?
>
> But maybe we don't have to check for GNUTLS_E_REHANDSHAKE at this point at 
> all. Regarding 
> http://www.gnutls.org/manual/html_node/Re_002dauthentication.html, only 
> gnutls_record_recv() returns GNUTLS_E_REHANDSHAKE.
>
> If you don't mind, I'll remove the rehandshake code from wgnutls_write().

yes, if it cannot return that error code, please drop the code that
checks for it.

Regards,
Giuseppe



Re: [Bug-wget] [GSoC 2015] Basic HTTP/2 support

2015-08-26 Thread Giuseppe Scrivano
Daniel Stenberg dan...@haxx.se writes:

 On Thu, 30 Apr 2015, Daniel Stenberg wrote:

 Did anything happen with this project?

As far I know, nothing happened with it, but maybe Miquel has more
details about it.

Regards,
Giuseppe



Re: [Bug-wget] [bug #45790] wget prints it's progress even when background

2015-08-18 Thread Giuseppe Scrivano
Darshit Shah dar...@gmail.com writes:

 This affects an invokation using the shell's background operator () too.

 E.g.: wget 
 http://cdimage.debian.org/debian-cd/current/multi-arch/iso-cd/debian-8.1.0-amd64-i386-netinst.iso
 
 will cause the logging output and progress bar to be displayed on the
 terminal as explained in the bug report.

 However, I am not willing to fix that behaviour. A huge number of
 people copy URLs and paste them in their terminals for Wget to
 download without double-quoting them. A large number of these URLs
 have the  character which causes the shell to background the
 process. They tend to realise that something went wrong when the
 screen is garbled by a background process spewing messages to stdout
 and stderr. If this behaviour is changed, many people won't realise
 their error and un-necessarily invoke multiple instances of
 backgrounded Wget processes, eventually coming back here with new bug
 reports.

 The bahviour has remained so for a long time and I'm inclined to
 retain the status quo.

I agree with you here, the behavior should not be changed.  It would be
a bug if wget stops reporting errors when resumed.

Regards,
Giuseppe



Re: [Bug-wget] FTP PORT command code in v1.16.3?

2015-08-11 Thread Giuseppe Scrivano
Tim Ruehsen tim.rueh...@gmx.de writes:

 From d8d545994be399705c483ea924e71c3e6348d99d Mon Sep 17 00:00:00 2001
 From: =?UTF-8?q?Tim=20R=C3=BChsen?= tim.rueh...@gmx.de
 Date: Tue, 11 Aug 2015 16:48:08 +0200
 Subject: [PATCH] Fix IP address exposure in FTP code

 * src/ftp.c (getftp): Do not use PORT when PASV fails.
 * tests/FTPServer.px: Add pasv_not_supported server flag.
 * tests/Makefile.am: Add Test-ftp-pasv-not-supported.px
 * tests/Test-ftp-pasv-not-supported.px: New test

 Fix IP address exposure when automatically falling back from
 passive mode to active mode (using the PORT command). A behavior that
 may be used to expose a client's privacy even when using a proxy.
 ---
  src/ftp.c| 19 +++-
  tests/FTPServer.pm   |  8 +
  tests/Makefile.am|  3 +-
  tests/Test-ftp-pasv-not-supported.px | 57 
 
  4 files changed, 79 insertions(+), 8 deletions(-)
  create mode 100755 tests/Test-ftp-pasv-not-supported.px

 diff --git a/src/ftp.c b/src/ftp.c
 index 68f1a33..9dab99c 100644
 --- a/src/ftp.c
 +++ b/src/ftp.c
 @@ -252,7 +252,6 @@ getftp (struct url *u, wgint passed_expected_bytes, wgint 
 *qtyread,
char *respline, *tms;
const char *user, *passwd, *tmrate;
int cmd = con-cmd;
 -  bool pasv_mode_open = false;
wgint expected_bytes = 0;
bool got_expected_bytes = false;
bool rest_failed = false;
 @@ -883,13 +882,19 @@ Error in server response, closing control 
 connection.\n));
? CONERROR : CONIMPOSSIBLE);
  }
  
 -  pasv_mode_open = true;  /* Flag to avoid accept port */
if (!opt.server_response)
  logputs (LOG_VERBOSE, _(done.));
 -} /* err==FTP_OK */
 -}
 +}
 +  else
 +return err;
  
 -  if (!pasv_mode_open)   /* Try to use a port command if PASV failed */
 +  /*
 +   * We do not want to fall back from PASSIVE mode to ACTIVE mode !
 +   * The reason is the PORT command exposes the client's real IP 
 address
 +   * to the server. Bad for someone who relies on privacy via a ftp 
 proxy.
 +   */
 +}
 +  else
  {
err = ftp_do_port (csock, local_sock);
/* FTPRERR, WRITEFAILED, bindport (FTPSYSERR), HOSTERR,
 @@ -1148,8 +1153,8 @@ Error in server response, closing control 
 connection.\n));
  }
  
/* If no transmission was required, then everything is OK.  */
 -  if (!pasv_mode_open)  /* we are not using pasive mode so we need
 -  to accept */
 +  if (!opt.ftp_pasv)  /* we are not using passive mode so we need
 + to accept */
  {
/* Wait for the server to connect to the address we're waiting
   at.  */

ACK from me.  Could you please also update NEWS?  It looks like some
important change we want to inform people about :)

Regards,
Giuseppe



Re: [Bug-wget] FTP PORT command code in v1.16.3?

2015-08-11 Thread Giuseppe Scrivano
Tim Ruehsen tim.rueh...@gmx.de writes:

 On Tuesday 11 August 2015 17:24:42 Giuseppe Scrivano wrote:
 Tim Ruehsen tim.rueh...@gmx.de writes:
  * src/ftp.c (getftp): Do not use PORT when PASV fails.
  * tests/FTPServer.px: Add pasv_not_supported server flag.
  * tests/Makefile.am: Add Test-ftp-pasv-not-supported.px
  * tests/Test-ftp-pasv-not-supported.px: New test
  
  Fix IP address exposure when automatically falling back from
  passive mode to active mode (using the PORT command). A behavior that
  may be used to expose a client's privacy even when using a proxy.
 
 ACK from me.  Could you please also update NEWS?  It looks like some
 important change we want to inform people about :)
 
 Regards,
 Giuseppe

 Updated NEWS and the description in tests/Test-ftp-pasv-not-supported.px.

Looks fine to me, feel free to push it.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH 2/2] Rewrite the --rejected-log test using the new framework.

2015-08-07 Thread Giuseppe Scrivano
Jookia 166...@gmail.com writes:

  * tests/Test--rejected-log.px: Remove old test.
  * testenv/Test--rejected-log.py: Create new test.
 ---
  testenv/Makefile.am   |   1 +
  testenv/Test--rejected-log.py | 104 +++
  tests/Makefile.am |   1 -
  tests/Test--rejected-log.px   | 138 
 --
  4 files changed, 105 insertions(+), 139 deletions(-)
  create mode 100755 testenv/Test--rejected-log.py
  delete mode 100755 tests/Test--rejected-log.px

Fast :-)  Thanks for the additional test.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH] Add option to write URL rejections to a tab-delimited CSV log.

2015-08-06 Thread Giuseppe Scrivano
Jookia 166...@gmail.com writes:

  * main.c: Add --rejected-log option.
  * init.c: Add rejectedlog command.
  * options.h: Add rejected_log parameter string.
  * wget.texi: Add brief documentation on new --rejected-log option.
  * recur.c: Optionally log details of URLs not traversed.
Add reject_reason enum.
(download_child_p - download_child): Return a reject_reason.
(descend_redirect_p - descend_redirect): Return a reject_reason.
(retrieve_tree): Support logging reasons for rejection.
Add write_reject_log_header that writes a CSV format header to a file.
Add write_reject_log_url that writes a url struct to a file in CSV format.
Add write_reject_log_reason that writes the URL and parent URL as well as 
 the
rejection reason to a CSV file.
  * Test--rejected-log.px: Add a basic test for the --rejected-log command.
  * tests/Makefile.am: Run Test--rejected-log.px.

 This allows you to figure out why URLs are being rejected and some context
 around it. CSV is used as the output format since it can be used easily 
 parsed,
 it's delimited by tabs instead of commas to allow using all (quoted) URL
 characters and includes column names which may be used for compatibility.
 ---
  doc/wget.texi   |   5 ++
  src/init.c  |   2 +
  src/main.c  |   3 +
  src/options.h   |   2 +
  src/recur.c | 189 
 
  tests/Makefile.am   |   1 +
  tests/Test--rejected-log.px | 138 
  7 files changed, 308 insertions(+), 32 deletions(-)
  create mode 100755 tests/Test--rejected-log.px

Thanks for the patch, I have nothing against it, if nobody complains
before then I am going to push it later today.

I forgot to ask you about writing a Python test instead of
tests/Test--rejected-log.px, under testenv/ as our plan to get rid of
the old Perl tests suite under tests/.  If you have some extra time
would you please do that as a separate patch?

Regards,
Giuseppe



Re: [Bug-wget] [PATCH] rejected-log: Add option to dump URL rejections to a log.

2015-07-27 Thread Giuseppe Scrivano
Jookia 166...@gmail.com writes:

 This allows you to figure out why URLs are being rejected and some context
 around it.
 ---
  doc/wget.texi |   5 ++
  src/init.c|   2 +
  src/main.c|   3 ++
  src/options.h |   2 +
  src/recur.c   | 149 
 --
  5 files changed, 135 insertions(+), 26 deletions(-)

I tried the patch but some tests fail after I apply it, could you verify
that make check passes without problems?  It seems wget exits with the
wrong return code.

In addition, could you add a ChangeLog-like format for the commit
message?  You can check recent commits to see how the style looks like.

Would be nice to have a test for the new option.

Thanks,
Giuseppe



Re: [Bug-wget] [bug #20398] Save a list of the links that were not followed

2015-07-21 Thread Giuseppe Scrivano
Jookia 166...@gmail.com writes:

 On Mon, Jul 20, 2015 at 09:31:45AM +0200, Giuseppe Scrivano wrote:
 Jookia 166...@gmail.com writes:
 
  Hello again!
 
  A few days ago I was informed that my assignment letter was received by 
  the FSF.
 
  Is it possible to move forward with this patch now that any work I create 
  for
  Wget can be legally included? I'm happy to make modifications as needed.
 
 could you please send the last version rebased on top of git master
 branch?
 
 Thanks!
 Giuseppe

 No problem, done! It should be in the issue tracker.

it doesn't apply to master.  Please rebase it and send it as attachment
to bug-wget@gnu.org, this is the place where we do patch reviews.

Thanks,
Giuseppe



Re: [Bug-wget] [bug #20398] Save a list of the links that were not followed

2015-07-20 Thread Giuseppe Scrivano
Jookia 166...@gmail.com writes:

 Hello again!

 A few days ago I was informed that my assignment letter was received by the 
 FSF.

 Is it possible to move forward with this patch now that any work I create for
 Wget can be legally included? I'm happy to make modifications as needed.

could you please send the last version rebased on top of git master
branch?

Thanks!
Giuseppe



Re: [Bug-wget] Metalink support

2015-07-20 Thread Giuseppe Scrivano
Hubert Tarasiuk hubert.taras...@gmail.com writes:

 W dniu 04.07.2015 o 00:15, Anthony Bryan pisze:

 Of course it works for me, but we are not talking about me :-)

 The main question is: what does the typical wget user expect ?
 I would thinks (s)he expects the download of the file described by the
 .metalink file. Of course a few users really want to have only the .metalink
 file (for testing/inspection). So there should be a possibility to do 
 exactly
 that (e.g. --keep-metalink).

 Before making any decision/action we should wait for some other voices.
 
 I'm not sure what the typical wget user expects, but here are some examples.
 
 if you keep the current behavior, it might be helpful to print to the
 user that they've downloaded a metalink, and how to make wget use it.
 
 aria2 (another command line metalink downloader) uses these commands
 to download the metalink and process it (download the file described
 by it  do a hash check) by default. you have to specify
 '--follow-metalink=false' to only download the metalink XML file.
 I like that option.

 Also, I am including the amended patches. There was an issue with
 freeing memory when the Metalink parsing failed (I have moved the
 metalink_delete call from outside to inside of the 'else' block in
 main.c:1806).

pushed!

Giuseppe



  1   2   3   4   5   6   7   >