Re: [Bug-wget] [PATCH] Failing tests

2014-10-20 Thread Giuseppe Scrivano
Darshit Shah dar...@gmail.com writes:

 Hi Tim,

 Thanks for the patch. This patch fixes the issues with the tests
 failing when make check was run from the tests/ directory. Seems to
 pass all my initial tests.

 Maybe we should push this? Giuseppe?

sorry for the delay.  I've just pushed it.

Regards,
Giuseppe



Re: [Bug-wget] [Bug-Wget] Summary of Pending Patches

2014-10-20 Thread Giuseppe Scrivano
Darshit Shah dar...@gmail.com writes:

 I was going through my local repository and noticed that the following
 patches are currently pending. I wanted a final round of reviews on
 them so that we may merge them into mainline.

 I've tried to explain the context behind each patch below and have
 also resent the version of each patch I had locally to the relevant
 threads. So, I'm sorry for spamming your mailboxes.

 1. Fix make distcheck for Python tests

 Tim discovered that the Python-based test suite wasn't passing make
 distcheck for a variety of reasons. I've fixed most of the issues with
 the code. Some hacks still need to be eliminated, but in general
 everything works now.

 2. Minor optimizations of Python tests

 While working with the Python test suite to fix the above-mentioned issues,
 I came across some inefficiencies and cruft in the test suite. Those have
 been eliminated in this patch.

Please push these two; let's have a release this weekend or by the
beginning of next week.

Regards,
Giuseppe



Re: [Bug-wget] wget

2014-10-20 Thread Bryan Baas
Fair enough.  Thanks, all, for the timely replies.


-- 
Bryan Baas
Weyco IT
x1808
414 241 0499 (cell)

On 10/18/2014 10:44 AM, Yousong Zhou wrote:
 Hi, Bryan
 
 On 18.10.2014 21:43, Bryan Baas bb...@weycogroup.com wrote:

 Hi,

 I was wondering about the command output of wget.  I used a Java Runtime
 exec and, although the wget process ended with a 0 completion code, the
 results appeared in the error stream and not the output stream.

 As a further test, I executed the same command at the command line and
 redirected output to a file using the > operator.  Upon completion the
 file was empty, but the results scrolled down the screen.  This had me
 thinking that the wget command itself is directing its regular output to
 stderr instead of stdout.
 
 Yes, that is expected.  It is possible to set the output file to stdout
 with '-O -', in which case you do not want the log output of wget and the
 file content mangled together.
 

 The results of the wget command, from what I could tell, weren't error
 conditions but regular output from a successful execution.

 
 I think it is a convention that the debug, informational, error and
 verbose output of Unix programs is written to stderr.  However, users can
 always redirect stderr to whatever file descriptor they prefer.
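
 For illustration only (this is not wget's source, just a minimal C sketch
 of the convention): log text goes to stderr and payload to stdout, so
 'prog > page.html' captures only the payload while the progress messages
 still reach the terminal.

    #include <stdio.h>

    int
    main (void)
    {
      fprintf (stderr, "Connecting to example.com... ");  /* log -> stderr */
      fprintf (stderr, "200 OK\n");                        /* log -> stderr */
      fputs ("<html>payload</html>\n", stdout);            /* data -> stdout */
      return 0;
    }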
 
 regards.
 
 yousong
 
 Your feedback would be appreciated.

 regards,


 --
 Bryan Baas
 Weyco IT
 x1808
 414 241 0499 (cell)

 






Re: [Bug-wget] Send Content-Length with POST 0 length body

2014-10-20 Thread Giuseppe Scrivano
Matthew Atkinson mutley...@ntlworld.com writes:

 diff --git a/src/ChangeLog b/src/ChangeLog
 index 1c4e2d5..447179e 100644
 --- a/src/ChangeLog
 +++ b/src/ChangeLog
 @@ -1,3 +1,8 @@
 +2014-10-19  Matthew Atkinson  mutley...@ntlworld.com
 +
 + * http.c (gethttp): Always send Content-Length header when method is post,
 + even with no post body, as some servers will reject the request otherwise
 +
  2014-05-03  Tim Ruehsen  tim.rueh...@gmx.de
  
   * retr.c (retrieve_url): fixed memory leak
 diff --git a/src/http.c b/src/http.c
 index 4b99c17..e020682 100644
 --- a/src/http.c
 +++ b/src/http.c
 @@ -1875,6 +1875,8 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, 
 struct url *proxy,
xstrdup (number_to_static_string 
 (body_data_size)),
rel_value);
  }
 +  else if (strcasecmp (opt.method, "post") == 0)
 +    request_set_header (req, "Content-Length", "0", rel_none);

should we do this only for POST requests?  What about doing it in any
case that !(opt.body_data || opt.body_file)?

Regards,
Giuseppe



Re: [Bug-wget] Bug #43324: file names don't show up correctly in the log when using --backups

2014-10-20 Thread Tim Rühsen
On Monday, 13 October 2014, 10:36:41, Tim Ruehsen wrote:
 On Sunday 12 October 2014 10:48:05 Bartosz Szczesny wrote:
  Hello,
  
  I think https://savannah.gnu.org/bugs/?43324 is not a bug (see
  https://savannah.gnu.org/bugs/?43324#comment2) and can be closed.
 
 Hi Bartosz,
 
 thanks for pointing that out. I think you are right.
 
 The 'bug' is not a bug.
 
 If someone needs a more verbose output regarding file name rotation, a new
 'wishlist' bug should be opened.

I just closed this bug.

Thanks, Bartosz.

Tim




Re: [Bug-wget] [Bug-Wget] Summary of Pending Patches

2014-10-20 Thread Ángel González

On 20/10/14 05:14, Darshit Shah wrote:
Maybe we could automate all of these under a single make target? Or else, I
intend to set up a Travis CI build every week to run these tests. What is
everyone's opinion on this?


IMHO, they should be in a script / make target (make maintainer-tests ? ☺)
Otherwise, the list of incantations is not easily discoverable and might 
even be forgotten.







Re: [Bug-wget] Send Content-Length with POST 0 length body

2014-10-20 Thread Matthew Atkinson
On 20/10/14 16:26, Giuseppe Scrivano wrote:
 should we do this only for POST requests?  What about doing it in any
 case that !(opt.body_data || opt.body_file)?
 
 Regards,
 Giuseppe

From http://tools.ietf.org/html/rfc7230#section-3.3.2
 A user agent SHOULD send a Content-Length in a request message when
 no Transfer-Encoding is sent and the request method defines a
 meaning for an enclosed payload body.  For example, a Content-Length
 header field is normally sent in a POST request even when the value
 is 0 (indicating an empty payload body).  A user agent SHOULD
 NOT send a Content-Length header field when the request message
 does not contain a payload body and the method semantics do not
 anticipate such a body.

I would think that we should do this for POST and PUT, but nothing
else

Thanks,
Matt



Re: [Bug-wget] [Win32 Patch] console-close events

2014-10-20 Thread Ángel González

On 20/10/14 03:03, Darshit Shah wrote:

On 06/17, Gisle Vanem wrote:

Ángel González keis...@gmail.com wrote:
PS. The above Sleep() seems to be ignored by WinCon. At least I failed to
make it sleep more than ~500 msec.

There may be a timeout on how long you can stay processing the event.

Why do you need that Sleep() call at all? I would remove  it.


When logging to the console (no '-o log-file' option), the Sleep(500) will
make the final '... cleanup.' message stay a tiny bit longer (but barely
readable).

Without a Sleep(), the console gets closed with only a message-beep.



Any final reviews / comments on this patch? I haven't tested it out for
lack of a Windows system. If there are no objections, maybe we can push
this patch?

I would expect someone running a console application to have a console
open. I understand the rationale when it's a beginner learning to program
who is creating a console application, but I don't really see a use case
for wget. How are you running wget that you get an auto-closing console?
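
For readers without the original patch at hand, the mechanism under
discussion is roughly the following (a minimal, untested sketch of the
Win32 API involved, not the actual patch):

    #include <windows.h>
    #include <stdio.h>

    /* Sketch: on CTRL_CLOSE_EVENT print a final message and linger briefly.
       Windows enforces its own timeout on the handler, which may be why a
       Sleep() longer than ~500 msec appears to be ignored. */
    static BOOL WINAPI
    console_handler (DWORD event)
    {
      if (event == CTRL_CLOSE_EVENT)
        {
          fputs ("... cleanup.\n", stderr);
          Sleep (500);
          return TRUE;
        }
      return FALSE;
    }

    int
    main (void)
    {
      SetConsoleCtrlHandler (console_handler, TRUE);
      /* ... normal work ... */
      return 0;
    }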


Cheers




Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Pär Karlsson
Thank you for your feedback and suggestions.

I thought about this during the weekend and figured it could be done much
more efficiently, by only looping through the arguments once. Also, I
realized that the original version would have segfaulted if called with more
than 5 args, since the destination memory never got allocated before the
memcpy (in the current code, it never is, though!).

I cleaned up the code according to the GNU coding standards as well. The
test suite rolls with this, and I think it looks better (although the
function is only really called in a handful of places in all of wget).

Best regards,

/Pär

Patch below:

diff --git a/src/ChangeLog b/src/ChangeLog
index d5aeca0..87abd85 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,8 @@
+2014-10-20 Pär Karlsson  feino...@gmail.com
+
+   * utils.c (concat_strings): got rid of double loop, cleaned up potential
+   memory corruption if concat_strings was called with more than five args
+
 2014-10-16  Tim Ruehsen  tim.rueh...@gmx.de

* url.c (url_parse): little code cleanup
diff --git a/src/utils.c b/src/utils.c
index 78c282e..dbeb9fe 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -356,42 +356,36 @@ char *
 concat_strings (const char *str0, ...)
 {
   va_list args;
-  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
   char *ret, *p;

   const char *next_str;
-  int total_length = 0;
-  size_t argcount;
+  size_t len;
+  size_t total_length = 0;
+  size_t charsize = sizeof (char);
+  size_t chunksize = 64;
+  size_t bufsize = 64;
+
+  p = ret = xmalloc (charsize * bufsize);

   /* Calculate the length of and allocate the resulting string. */

-  argcount = 0;
   va_start (args, str0);
   for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
 {
-  int len = strlen (next_str);
-  if (argcount < countof (saved_lengths))
-saved_lengths[argcount++] = len;
+  len = strlen (next_str);
+  if (len == 0) {
+  continue;
+  }
   total_length += len;
-}
-  va_end (args);
-  p = ret = xmalloc (total_length + 1);
-
-  /* Copy the strings into the allocated space. */
-
-  argcount = 0;
-  va_start (args, str0);
-  for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
-{
-  int len;
-  if (argcount < countof (saved_lengths))
-len = saved_lengths[argcount++];
-  else
-len = strlen (next_str);
+  if (total_length > bufsize) {
+  bufsize += chunksize;
+  ret = xrealloc (ret, charsize * bufsize);
+  }
   memcpy (p, next_str, len);
   p += len;
 }
   va_end (args);
+  ret = xrealloc (ret, charsize * total_length + 1);
   *p = '\0';

   return ret;



2014-10-19 16:16 GMT+02:00 Giuseppe Scrivano gscriv...@gnu.org:

 Pär Karlsson feino...@gmail.com writes:

 Hi, I found a potential gotcha when playing with clang's code analysis
 tool.
 
  The concat_strings function silently stopped counting string lengths when
  given more than 5 arguments. clang warned about potential garbage values
 in
  the saved_lengths array, so I redid it with this approach.
 
  All tests working ok with this patch.

 thanks for your contribution.  I've just a few comments:


  commit 2d855670e0e1fbe578506b376cdd40b0e465d3ef
  Author: Pär Karlsson feino...@gmail.com
  Date:   Thu Oct 16 21:41:36 2014 +0200
 
  Updated ChangeLog
 

 we usually update the changelog in the same commit that made the change,
 so please squash these two commits into one.

 Also, it doesn't apply on current git master, as it seems to be based on
 an old version of the tree (judging from the ChangeLog file context); could
 you rebase onto the master branch?


  diff --git a/src/utils.c b/src/utils.c
  index 78c282e..93c9ddc 100644
  --- a/src/utils.c
  +++ b/src/utils.c
  @@ -356,7 +356,8 @@ char *
   concat_strings (const char *str0, ...)
   {
 va_list args;
  -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
  +  size_t psize = sizeof(int);

 please leave a space between sizeof and '(' as mandated by the GNU
 coding standards.

  +  int *saved_lengths = xmalloc (psize);
 char *ret, *p;
 
 const char *next_str;
  @@ -370,8 +371,8 @@ concat_strings (const char *str0, ...)
 for (next_str = str0; next_str != NULL; next_str = va_arg (args, char
 *))
   {
 int len = strlen (next_str);
  -  if (argcount < countof (saved_lengths))
  -saved_lengths[argcount++] = len;
  +  saved_lengths[argcount++] = len;
  +  xrealloc(saved_lengths, psize * argcount);

 same here.

 total_length += len;
   }
 va_end (args);
  @@ -393,7 +394,7 @@ concat_strings (const char *str0, ...)
   }
 va_end (args);
 *p = '\0';
  -
  +  free(saved_lengths);

 and here.

 Regards,
 Giuseppe



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Pär Karlsson
Whoops, I realised I failed on the GNU coding standards, please disregard
the last one; the patch below should be better.

My apologies :-/

/Pär

diff --git a/src/ChangeLog b/src/ChangeLog
index d5aeca0..87abd85 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,8 @@
+2014-10-20 Pär Karlsson  feino...@gmail.com
+
+   * utils.c (concat_strings): got rid of double loop, cleaned up potential
+   memory corruption if concat_strings was called with more than five args
+
 2014-10-16  Tim Ruehsen  tim.rueh...@gmx.de

* url.c (url_parse): little code cleanup
diff --git a/src/utils.c b/src/utils.c
index 78c282e..5f359e0 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -356,42 +356,36 @@ char *
 concat_strings (const char *str0, ...)
 {
   va_list args;
-  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
   char *ret, *p;

   const char *next_str;
-  int total_length = 0;
-  size_t argcount;
+  size_t len;
+  size_t total_length = 0;
+  size_t charsize = sizeof (char);
+  size_t chunksize = 64;
+  size_t bufsize = 64;
+
+  p = ret = xmalloc (charsize * bufsize);

   /* Calculate the length of and allocate the resulting string. */

-  argcount = 0;
   va_start (args, str0);
   for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
 {
-  int len = strlen (next_str);
-  if (argcount < countof (saved_lengths))
-saved_lengths[argcount++] = len;
+  len = strlen (next_str);
+  if (len == 0)
+continue;
   total_length += len;
-}
-  va_end (args);
-  p = ret = xmalloc (total_length + 1);
-
-  /* Copy the strings into the allocated space. */
-
-  argcount = 0;
-  va_start (args, str0);
-  for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
-{
-  int len;
-  if (argcount < countof (saved_lengths))
-len = saved_lengths[argcount++];
-  else
-len = strlen (next_str);
+  if (total_length > bufsize)
+  {
+bufsize += chunksize;
+ret = xrealloc (ret, charsize * bufsize);
+  }
   memcpy (p, next_str, len);
   p += len;
 }
   va_end (args);
+  ret = xrealloc (ret, charsize * total_length + 1);
   *p = '\0';

   return ret;

2014-10-20 22:14 GMT+02:00 Pär Karlsson feino...@gmail.com:

 Thank you for your feedback and suggestions.

 I thought about this during the weekend and figured it could be done much
 more efficiently, by only looping through the arguments once. Also, I
 realized that the original version would have segfaulted if called with more
 than 5 args, since the destination memory never got allocated before the
 memcpy (in the current code, it never is, though!).

 I cleaned up the code according to the GNU coding standards as well. The
 test suite rolls with this, and I think it looks better (although the
 function is only really called in a handful of places in all of wget).

 Best regards,

 /Pär

 Patch below:

 diff --git a/src/ChangeLog b/src/ChangeLog
 index d5aeca0..87abd85 100644
 --- a/src/ChangeLog
 +++ b/src/ChangeLog
 @@ -1,3 +1,8 @@
 +2014-10-20 Pär Karlsson  feino...@gmail.com
 +
 +   * utils.c (concat_strings): got rid of double loop, cleaned up
 potential
 +   memory corruption if concat_strings was called with more than five
 args
 +
  2014-10-16  Tim Ruehsen  tim.rueh...@gmx.de

 * url.c (url_parse): little code cleanup
 diff --git a/src/utils.c b/src/utils.c
 index 78c282e..dbeb9fe 100644
 --- a/src/utils.c
 +++ b/src/utils.c
 @@ -356,42 +356,36 @@ char *
  concat_strings (const char *str0, ...)
  {
va_list args;
 -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
char *ret, *p;

const char *next_str;
 -  int total_length = 0;
 -  size_t argcount;
 +  size_t len;
 +  size_t total_length = 0;
 +  size_t charsize = sizeof (char);
 +  size_t chunksize = 64;
 +  size_t bufsize = 64;
 +
 +  p = ret = xmalloc (charsize * bufsize);

/* Calculate the length of and allocate the resulting string. */

 -  argcount = 0;
va_start (args, str0);
for (next_str = str0; next_str != NULL; next_str = va_arg (args, char
 *))
  {
 -  int len = strlen (next_str);
 -  if (argcount < countof (saved_lengths))
 -saved_lengths[argcount++] = len;
 +  len = strlen (next_str);
 +  if (len == 0) {
 +  continue;
 +  }
total_length += len;
 -}
 -  va_end (args);
 -  p = ret = xmalloc (total_length + 1);
 -
 -  /* Copy the strings into the allocated space. */
 -
 -  argcount = 0;
 -  va_start (args, str0);
 -  for (next_str = str0; next_str != NULL; next_str = va_arg (args, char
 *))
 -{
 -  int len;
 -  if (argcount < countof (saved_lengths))
 -len = saved_lengths[argcount++];
 -  else
 -len = strlen (next_str);
 +  if (total_length > bufsize) {
 +  bufsize += chunksize;
 +  ret = xrealloc (ret, charsize * bufsize);
 +  }
memcpy (p, next_str, len);
p += len;
   

Re: [Bug-wget] Regression in git master branch (commit 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2)

2014-10-20 Thread Ángel González

On 21/10/14 01:20, Gabriel Somlo wrote:

Hi Giuseppe,

I think I found a regression in the development branch of wget, which
wasn't present in 1.15. Using git bisect, it appears the offending
commit was 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2 (Drop usage of
strncpy) from Jun. 9 2014.

To reproduce, grab a wide and shallow copy of wikipedia.org, like so:


   wget -rpkEHN -e robots=off --random-wait -t 2 -U mozilla -l 1 \
-P ./vservers wikipedia.org


then (some 20 minutes or so later) open ./vservers/wikipedia.org/index.html
in your (firefox-32.0.2-1.fc20.x86_64) browser as a file.

Before commit 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2, the page looks
normal. After said commit, it looks (for me) like in the attached
screenshot.

I couldn't revert the commit, as subsequent changes have accumulated
since it was applied. I'm also relatively new to the code base, so
figuring out exactly what went wrong might be significantly faster
for you :)

I can (if necessary) open a ticket in the bug tracker (e.g. for a
better place to upload the screenshot, which may not make it through
on the mailing list), just let me know what you think.

Thanks much,
--Gabriel

Thanks for your report!

After a quick look, get_uri_string seems to not be taking into account 
the end of the url() parameter (it was before 8e6de1fb5).

Can you check if this patch fixes the issue?

diff --git a/src/css-url.c b/src/css-url.c
index c605798..34a20af 100644
--- a/src/css-url.c
+++ b/src/css-url.c
@@ -72,6 +72,8 @@ extern int yylex (void);
 static char *
 get_uri_string (const char *at, int *pos, int *length)
 {
+  char *uri;
+
   if (0 != strncasecmp (at + *pos, "url(", 4))
 return NULL;

@@ -97,7 +99,14 @@ get_uri_string (const char *at, int *pos, int *length)
   *length -= 2;
 }

-  return xstrdup (at + *pos);
+  uri = xmalloc (*length + 1);
+  if (uri)
+{
+  strncpy (uri, at + *pos, *length);
+  uri[*length] = '\0';
+}
+
+  return uri;
 }

 void





Re: [Bug-wget] Send Content-Length with POST 0 length body

2014-10-20 Thread Darshit Shah
I think we should handle PUT and POST alone, and add additional logic to
handle a 411 response to any request.

Wget has traditionally supported POST requests, and PUT has a very similar
usage; for anything else, the user should use the --send-header commands.
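
As a standalone illustration of that idea (the helper and flag names below
are made up for the sketch, not wget's actual retry machinery):

    #include <stdbool.h>
    #include <stdio.h>

    #define HTTP_LENGTH_REQUIRED 411

    /* Hypothetical helper: retry a body-less request once, with an explicit
       "Content-Length: 0", if the server refused it with 411 Length Required. */
    static bool
    should_retry_with_zero_length (int status, bool sent_content_length,
                                   bool already_retried)
    {
      return status == HTTP_LENGTH_REQUIRED
             && !sent_content_length
             && !already_retried;
    }

    int
    main (void)
    {
      printf ("%d\n", should_retry_with_zero_length (411, false, false)); /* 1 */
      printf ("%d\n", should_retry_with_zero_length (411, true, false));  /* 0 */
      return 0;
    }
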
On 20-Oct-2014 1:16 pm, Tim Rühsen tim.rueh...@gmx.de wrote:

On Monday, 20 October 2014, 20:08:05, Matthew Atkinson wrote:
 On 20/10/14 16:26, Giuseppe Scrivano wrote:
  should we do this only for POST requests?  What about doing it in any
  case that !(opt.body_data || opt.body_file)?
 
  Regards,
  Giuseppe

 From http://tools.ietf.org/html/rfc7230#section-3.3.2

  A user agent SHOULD send a Content-Length in a request message when
  no Transfer-Encoding is sent and the request method defines a
  meaning for an enclosed payload body.  For example, a Content-Length
  header field is normally sent in a POST request even when the value
  is 0 (indicating an empty payload body).  A user agent SHOULD
  NOT send a Content-Length header field when the request message
  does not contain a payload body and the method semantics do not
  anticipate such a body.

 I would think that we should do this for POST and PUT, but nothing
 else

Since Wget does not explicitly support PUT, we may just care about the POST
request. Since servers may reject a POST without Content-Length, we are
better off supplying one. Since PUT (and also PATCH) requests 'anticipate'
a body, we could also care for these.

Matthew, could you also check for 'put' and 'patch' requests and send an
amended version of the patch? FYI, the HTTP PATCH request is defined in
RFC 5789.


RFC 7230
3.3.2. Content-Length

   A user agent SHOULD send a Content-Length in a request message when
   no Transfer-Encoding is sent and the request method defines a meaning
   for an enclosed payload body.  For example, a Content-Length header
   field is normally sent in a POST request even when the value is 0
   (indicating an empty payload body).  A user agent SHOULD NOT send a
   Content-Length header field when the request message does not contain
   a payload body and the method semantics do not anticipate such a
   body.


RFC7231
6.5.10. 411 Length Required

   The 411 (Length Required) status code indicates that the server
   refuses to accept the request without a defined Content-Length
   (Section 3.3.2 of [RFC7230]).  The client MAY repeat the request if
   it adds a valid Content-Length header field containing the length of
   the message body in the request message.
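
Distilled into a standalone sketch (decision logic only; the function name
is made up and this is not wget code, and whether PATCH belongs in the list
is exactly what is being discussed):

    #include <stdio.h>
    #include <strings.h>   /* strcasecmp */

    /* Send "Content-Length: 0" only for methods whose semantics anticipate
       a payload body, when no body was supplied. */
    static int
    wants_zero_content_length (const char *method)
    {
      return strcasecmp (method, "post") == 0
             || strcasecmp (method, "put") == 0
             || strcasecmp (method, "patch") == 0;
    }

    int
    main (void)
    {
      printf ("POST: %d\n", wants_zero_content_length ("POST"));  /* 1 */
      printf ("GET:  %d\n", wants_zero_content_length ("GET"));   /* 0 */
      return 0;
    }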


Tim


Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Yousong Zhou
Hi, Pär.  I got a few comments inline.

On 21 October 2014 05:47, Pär Karlsson feino...@gmail.com wrote:
 Whoops, I realised I failed on the GNU coding standards, please disregard
 the last one; the patch below should be better.

 My apologies :-/

 /Pär

 diff --git a/src/ChangeLog b/src/ChangeLog
 index d5aeca0..87abd85 100644
 --- a/src/ChangeLog
 +++ b/src/ChangeLog
 @@ -1,3 +1,8 @@
 +2014-10-20 Pär Karlsson  feino...@gmail.com
 +
 +   * utils.c (concat_strings): got rid of double loop, cleaned up
 potential
 +   memory corruption if concat_strings was called with more than five
 args
 +
  2014-10-16  Tim Ruehsen  tim.rueh...@gmx.de

 * url.c (url_parse): little code cleanup
 diff --git a/src/utils.c b/src/utils.c
 index 78c282e..5f359e0 100644
 --- a/src/utils.c
 +++ b/src/utils.c
 @@ -356,42 +356,36 @@ char *
  concat_strings (const char *str0, ...)
  {
va_list args;
 -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
char *ret, *p;

const char *next_str;
 -  int total_length = 0;
 -  size_t argcount;
 +  size_t len;
 +  size_t total_length = 0;
 +  size_t charsize = sizeof (char);

I am not sure here.  Do we always assume sizeof(char) to be 1 for
platforms supported by wget?

 +  size_t chunksize = 64;
 +  size_t bufsize = 64;
 +
 +  p = ret = xmalloc (charsize * bufsize);

/* Calculate the length of and allocate the resulting string. */

 -  argcount = 0;
va_start (args, str0);
for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
  {
 -  int len = strlen (next_str);
 -  if (argcount < countof (saved_lengths))
 -saved_lengths[argcount++] = len;
 +  len = strlen (next_str);
 +  if (len == 0)
 +continue;
total_length += len;
 -}
 -  va_end (args);
 -  p = ret = xmalloc (total_length + 1);
 -
 -  /* Copy the strings into the allocated space. */
 -
 -  argcount = 0;
 -  va_start (args, str0);
 -  for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
 -{
 -  int len;
 -  if (argcount < countof (saved_lengths))
 -len = saved_lengths[argcount++];
 -  else
 -len = strlen (next_str);
 +  if (total_length > bufsize)
 +  {
 +bufsize += chunksize;

Should be `bufsize = total_length` ?

 +ret = xrealloc (ret, charsize * bufsize);
 +  }
memcpy (p, next_str, len);

Xrealloc may return a new block different from p, so memcpy(p, ...)
may not be what you want.
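
A small self-contained sketch of this point (plain malloc/realloc instead of
wget's xmalloc/xrealloc, and the 'append' helper is made up): keep the write
position as an offset so it stays valid when realloc moves the block.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static char *
    append (char *buf, size_t *bufsize, size_t *used, const char *s)
    {
      size_t len = strlen (s);
      if (*used + len + 1 > *bufsize)
        {
          *bufsize = 2 * (*used + len + 1);
          buf = realloc (buf, *bufsize);   /* may return a different block */
        }
      memcpy (buf + *used, s, len);        /* offset survives the move */
      *used += len;
      buf[*used] = '\0';
      return buf;
    }

    int
    main (void)
    {
      size_t bufsize = 8, used = 0;
      char *buf = malloc (bufsize);
      buf[0] = '\0';
      buf = append (buf, &bufsize, &used, "hello, ");
      buf = append (buf, &bufsize, &used, "world");
      puts (buf);
      free (buf);
      return 0;
    }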

p += len;
  }
va_end (args);
 +  ret = xrealloc (ret, charsize * total_length + 1);
*p = '\0';

Malloc takes time.  How about counting total_length in one loop and
doing the copy in another?

Regards.

yousong


return ret;




Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Yousong Zhou
On 21 October 2014 10:02, Yousong Zhou yszhou4t...@gmail.com wrote:
 Hi, Pär.  I got a few comments inline.

 On 21 October 2014 05:47, Pär Karlsson feino...@gmail.com wrote:
 Whoops, I realised I failed on the GNU coding standards, please disregard
 the last one; the patch below should be better.

 My apologies :-/

 /Pär

 diff --git a/src/ChangeLog b/src/ChangeLog
 index d5aeca0..87abd85 100644
 --- a/src/ChangeLog
 +++ b/src/ChangeLog
 @@ -1,3 +1,8 @@
 +2014-10-20 Pär Karlsson  feino...@gmail.com
 +
 +   * utils.c (concat_strings): got rid of double loop, cleaned up
 potential
 +   memory corruption if concat_strings was called with more than five
 args
 +
  2014-10-16  Tim Ruehsen  tim.rueh...@gmx.de

 * url.c (url_parse): little code cleanup
 diff --git a/src/utils.c b/src/utils.c
 index 78c282e..5f359e0 100644
 --- a/src/utils.c
 +++ b/src/utils.c
 @@ -356,42 +356,36 @@ char *
  concat_strings (const char *str0, ...)
  {
va_list args;
 -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
char *ret, *p;

const char *next_str;
 -  int total_length = 0;
 -  size_t argcount;
 +  size_t len;
 +  size_t total_length = 0;
 +  size_t charsize = sizeof (char);

 I am not sure here.  Do we always assume sizeof(char) to be 1 for
 platforms supported by wget?

 +  size_t chunksize = 64;
 +  size_t bufsize = 64;
 +
 +  p = ret = xmalloc (charsize * bufsize);

/* Calculate the length of and allocate the resulting string. */

 -  argcount = 0;
va_start (args, str0);
for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
  {
 -  int len = strlen (next_str);
 -  if (argcount < countof (saved_lengths))
 -saved_lengths[argcount++] = len;
 +  len = strlen (next_str);
 +  if (len == 0)
 +continue;
total_length += len;
 -}
 -  va_end (args);
 -  p = ret = xmalloc (total_length + 1);
 -
 -  /* Copy the strings into the allocated space. */
 -
 -  argcount = 0;
 -  va_start (args, str0);
 -  for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
 -{
 -  int len;
 -  if (argcount < countof (saved_lengths))
 -len = saved_lengths[argcount++];
 -  else
 -len = strlen (next_str);
 +  if (total_length > bufsize)
 +  {
 +bufsize += chunksize;

 Should be `bufsize = total_length` ?

 +ret = xrealloc (ret, charsize * bufsize);
 +  }
memcpy (p, next_str, len);

 Xrealloc may return a new block different from p, so memcpy(p, ...)
 may not be what you want.

p += len;
  }
va_end (args);
 +  ret = xrealloc (ret, charsize * total_length + 1);
*p = '\0';

 Malloc takes time.  How about counting total_length in one loop and
 doing the copy in another?

I mean, we can skip the strlen part and just do strcpy in the second
loop as we already know we have enough space in the dest buffer for
all those null-terminated arguments.
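
A compilable sketch of that shape (plain malloc and POSIX stpcpy instead of
wget's xmalloc; 'concat_sketch' is just an illustrative name, and stpcpy's
portability would need checking):

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Pass 1: measure.  Pass 2: copy; stpcpy returns a pointer to the new
       terminating '\0', so no lengths need to be cached or recomputed. */
    static char *
    concat_sketch (const char *str0, ...)
    {
      va_list args;
      const char *s;
      size_t total = 0;
      char *ret, *p;

      va_start (args, str0);
      for (s = str0; s != NULL; s = va_arg (args, char *))
        total += strlen (s);
      va_end (args);

      p = ret = malloc (total + 1);

      va_start (args, str0);
      for (s = str0; s != NULL; s = va_arg (args, char *))
        p = stpcpy (p, s);
      va_end (args);

      *p = '\0';
      return ret;
    }

    int
    main (void)
    {
      char *s = concat_sketch ("foo", "/", "bar", (char *) NULL);
      puts (s);
      free (s);
      return 0;
    }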

 yousong



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Yousong Zhou
Hi, Pär

On 17 October 2014 03:50, Pär Karlsson feino...@gmail.com wrote:
 Hi, I found a potential gotcha when playing with clang's code analysis tool.

 The concat_strings function silently stopped counting string lengths when
 given more than 5 arguments. clang warned about potential garbage values in
 the saved_lengths array, so I redid it with this approach.

After taking a closer look, I guess the old implementation is fine.
saved_lengths[] is used as a buffer for the lengths of the first 5
arguments and there is a bounds check against its length.  Maybe it's a
false positive from the clang tool?

Sorry for the noise...

Regards.

yousong


 All tests working ok with this patch.

 This is my first patch to this list, by the way. I'd be happy to help out
 more in the future.

 Best regards,

 /Pär Karlsson, Sweden

 

 commit 2d855670e0e1fbe578506b376cdd40b0e465d3ef
 Author: Pär Karlsson feino...@gmail.com
 Date:   Thu Oct 16 21:41:36 2014 +0200

 Updated ChangeLog

 diff --git a/src/ChangeLog b/src/ChangeLog
 index 1c4e2d5..1e39475 100644
 --- a/src/ChangeLog
 +++ b/src/ChangeLog
 @@ -1,3 +1,8 @@
 +2014-10-16  Pär Karlsson  feino...@gmail.com
 +
 +   * utils.c (concat_strings): fixed arbitrary limit of 5 arguments to
 +   function
 +
  2014-05-03  Tim Ruehsen  tim.rueh...@gmx.de

 * retr.c (retrieve_url): fixed memory leak

 commit 1fa9ff274dcb6e5a2dbbbc7d3fe2f139059c47f1
 Author: Pär Karlsson feino...@gmail.com
 Date:   Wed Oct 15 00:00:31 2014 +0200

 Generalized concat_strings argument length

   The concat_strings function seemed arbitrarily limited in accepting a
   maximum of 5 arguments (the others were silently ignored).

   Also it had a potential garbage read of the values in the array.
   Updated with xmalloc/xrealloc/free

 diff --git a/src/utils.c b/src/utils.c
 index 78c282e..93c9ddc 100644
 --- a/src/utils.c
 +++ b/src/utils.c
 @@ -356,7 +356,8 @@ char *
  concat_strings (const char *str0, ...)
  {
va_list args;
 -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
 +  size_t psize = sizeof(int);
 +  int *saved_lengths = xmalloc (psize);
char *ret, *p;

const char *next_str;
 @@ -370,8 +371,8 @@ concat_strings (const char *str0, ...)
for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
  {
int len = strlen (next_str);
 -  if (argcount < countof (saved_lengths))
 -saved_lengths[argcount++] = len;
 +  saved_lengths[argcount++] = len;
 +  xrealloc(saved_lengths, psize * argcount);
total_length += len;
  }
va_end (args);
 @@ -393,7 +394,7 @@ concat_strings (const char *str0, ...)
  }
va_end (args);
*p = '\0';
 -
 +  free(saved_lengths);
return ret;
  }
  ^L



Re: [Bug-wget] Regression in git master branch (commit 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2)

2014-10-20 Thread Gabriel Somlo
On Tue, Oct 21, 2014 at 01:49:41AM +0200, Ángel González wrote:
 On 21/10/14 01:20, Gabriel Somlo wrote:
 I think I found a regression in the development branch of wget, which
 wasn't present in 1.15. Using git bisect, it appears the offending
 commit was 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2 (Drop usage of
 strncpy) from Jun. 9 2014.
 
 To reproduce, grab a wide and shallow copy of wikipedia.org, like so:
 
 
wget -rpkEHN -e robots=off --random-wait -t 2 -U mozilla -l 1 \
 -P ./vservers wikipedia.org
 
 
 then (some 20 minutes or so later) open ./vservers/wikipedia.org/index.html
 in your (firefox-32.0.2-1.fc20.x86_64) browser as a file.
 
 Thanks for your report!
 
 After a quick look, get_uri_string seems to not be taking into account the
 end of the url() parameter (it was before 8e6de1fb5).
 Can you check if this patch fixes the issue?

Yeah, that took care of it for me!

Thanks again,
--Gabriel

 diff --git a/src/css-url.c b/src/css-url.c
 index c605798..34a20af 100644
 --- a/src/css-url.c
 +++ b/src/css-url.c
 @@ -72,6 +72,8 @@ extern int yylex (void);
  static char *
  get_uri_string (const char *at, int *pos, int *length)
  {
 +  char *uri;
 +
   if (0 != strncasecmp (at + *pos, "url(", 4))
  return NULL;
 
 @@ -97,7 +99,14 @@ get_uri_string (const char *at, int *pos, int *length)
*length -= 2;
  }
 
 -  return xstrdup (at + *pos);
 +  uri = xmalloc (*length + 1);
 +  if (uri)
 +{
 +  strncpy (uri, at + *pos, *length);
 +  uri[*length] = '\0';
 +}
 +
 +  return uri;
  }
 
  void