[Bug-wget] Two fixes: Memory leak with chunked responses / Chunked responses and WARC files

2012-01-27 Thread Gijs van Tulder

Hi,

Here are two small patches. I hope they will be useful.

First, a patch that fixes a memory leak in fd_read_body (src/retr.c) and 
skip_short_body (src/http.c) when it retrieves a response with 
Transfer-Encoding: chunked. Both functions make calls to fd_read_line 
but never free the result.


Second, a patch to the fd_read_body function that changes the way 
chunked responses are saved in the WARC file. Until now, wget would 
write a de-chunked response to the WARC file, which is wrong: the WARC 
file is supposed to have an exact copy of the HTTP response, so it 
should also include the chunk headers.


The first patch fixes the memory leaks. The second patch changes 
fd_read_body to save the full, chunked response in the WARC file.


Regards,

Gijs

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2012-01-11 14:27:06 +
+++ src/ChangeLog	2012-01-26 21:30:19 +
@@ -1,3 +1,8 @@
+2012-01-27  Gijs van Tulder  gvtul...@gmail.com
+
+	* retr.c (fd_read_body): Fix a memory leak with chunked responses.
+	* http.c (skip_short_body): Fix the same memory leak.
+
 2012-01-09  Gijs van Tulder  gvtul...@gmail.com
 
 	* init.c: Disable WARC compression if zlib is disabled.

=== modified file 'src/http.c'
--- src/http.c	2012-01-08 23:03:23 +
+++ src/http.c	2012-01-26 21:30:19 +
@@ -951,9 +951,12 @@
 break;
 
   remaining_chunk_size = strtol (line, endl, 16);
+  xfree (line);
+
   if (remaining_chunk_size == 0)
 {
-  fd_read_line (fd);
+  line = fd_read_line (fd);
+  xfree_null (line);
   break;
 }
 }
@@ -978,8 +981,13 @@
 {
   remaining_chunk_size -= ret;
   if (remaining_chunk_size == 0)
-if (fd_read_line (fd) == NULL)
-  return false;
+{
+  char *line = fd_read_line (fd);
+  if (line == NULL)
+return false;
+  else
+xfree (line);
+}
 }
 
   /* Safe even if %.*s bogusly expects terminating \0 because

=== modified file 'src/retr.c'
--- src/retr.c	2011-11-04 21:25:00 +
+++ src/retr.c	2012-01-26 21:30:19 +
@@ -307,11 +307,16 @@
 }
 
   remaining_chunk_size = strtol (line, endl, 16);
+  xfree (line);
+
   if (remaining_chunk_size == 0)
 {
   ret = 0;
-  if (fd_read_line (fd) == NULL)
+  line = fd_read_line (fd);
+  if (line == NULL)
 ret = -1;
+  else
+xfree (line);
   break;
 }
 }
@@ -371,11 +376,16 @@
 {
   remaining_chunk_size -= ret;
   if (remaining_chunk_size == 0)
-if (fd_read_line (fd) == NULL)
-  {
-ret = -1;
-break;
-  }
+{
+  char *line = fd_read_line (fd);
+  if (line == NULL)
+{
+  ret = -1;
+  break;
+}
+  else
+xfree (line);
+}
 }
 }
 


=== modified file 'src/ChangeLog'
--- src/ChangeLog	2012-01-26 21:30:19 +
+++ src/ChangeLog	2012-01-26 21:56:27 +
@@ -1,3 +1,9 @@
+2012-01-27  Gijs van Tulder  gvtul...@gmail.com
+
+	* retr.c (fd_read_body): If the response is chunked, the chunk
+	headers are now written to the WARC file, making the WARC file
+	an exact copy of the HTTP response.
+
 2012-01-27  Gijs van Tulder  gvtul...@gmail.com
 
 	* retr.c (fd_read_body): Fix a memory leak with chunked responses.
 	* http.c (skip_short_body): Fix the same memory leak.

=== modified file 'src/retr.c'
--- src/retr.c	2012-01-26 21:30:19 +
+++ src/retr.c	2012-01-26 21:56:27 +
@@ -213,6 +213,9 @@
the data is stored to ELAPSED.
 
If OUT2 is non-NULL, the contents is also written to OUT2.
+   OUT2 will get an exact copy of the response: if this is a chunked
+   response, everything -- including the chunk headers -- is written
+   to OUT2.  (OUT will only get the unchunked response.)
 
The function exits and returns the amount of data read.  In case of
error while reading data, -1 is returned.  In case of error while
@@ -305,6 +308,8 @@
   ret = -1;
   break;
 }
+  else if (out2 != NULL)
+fwrite (line, 1, strlen (line), out2);
 
   remaining_chunk_size = strtol (line, endl, 16);
   xfree (line);
@@ -316,7 +321,11 @@
   if (line == NULL)
 ret = -1;
   else
-xfree (line);
+{
+  if (out2 != NULL)
+  

[Bug-wget] Fwd: [PATCH] [wget-bug #33210], Add an option to output bandwidth in bits

2012-01-27 Thread Sasikanth
  Modified the calc_rate function to calculate bandwidth in powers of ten
(SI-prefix format)
  for --bits option.

 Please review the changes

  Thanks
  Sasi

-- Forwarded message --
From: Sasikanth sasikanth@gmail.com
Date: Wed, Jan 18, 2012 at 5:43 PM
Subject: Re: [Bug-wget] [PATCH] [wget-bug #33210], Add an option to output
bandwidth in bits
To: Hrvoje Niksic hnik...@xemacs.org
Cc: bug-wget@gnu.org


On Sun, Jan 15, 2012 at 8:51 PM, Hrvoje Niksic hnik...@xemacs.org wrote:

 Sasikanth sasikanth@gmail.com writes:

  No one asked. i had just thought it will be good to display all the
 output
  in either bits or bytes to avoid confusion to the user (I had
  confused).

 I understand that, but I have never seen a downloading agent output data
 length in bits, so displaying the data in bits would likely cause much
 more confusion and/or be less useful.  (Data throughput in bits, on the
 other hand, is quite common.)  With the original implementation of
 --bits I expect that someone would soon ask for
 --bits-for-bandwidth-only.

  Anyhow thanks I will modify the patch.

 Thanks.

 Note that the patch has another problem: while Wget's K, M, and G
 refer to (what is now known as) kibibytes, mebibytes, and gibibytes,
 bandwidth is measured in kilobits, megabits, and gigabits per second.
 Bandwidth units all refer to powers of ten, not to powers of two, so it
 is incorrect for calc_rate to simply increase the byte multipliers by 8.

 Hrvoje


  Modified the calc_rate function to calculate bandwidth in powers of ten
(SI-prefix format)
  for --bits option.

 Please review the changes

  Thanks
  Sasi
diff -ur orig/wget-1.13.4/src/init.c wget-1.13.4/src/init.c
--- orig/wget-1.13.4/src/init.c 2011-08-19 15:36:20.0 +0530
+++ wget-1.13.4/src/init.c  2012-01-18 14:42:56.240973950 +0530
@@ -126,6 +126,7 @@
   { backups,  opt.backups,   cmd_number },
   { base, opt.base_href, cmd_string },
   { bindaddress,  opt.bind_address,  cmd_string },
+  { bits, opt.bits_fmt,  cmd_boolean},
 #ifdef HAVE_SSL
   { cacertificate,opt.ca_cert,   cmd_file },
 #endif
diff -ur orig/wget-1.13.4/src/main.c wget-1.13.4/src/main.c
--- orig/wget-1.13.4/src/main.c 2011-09-06 19:20:11.0 +0530
+++ wget-1.13.4/src/main.c  2012-01-18 14:42:56.241973599 +0530
@@ -166,6 +166,7 @@
 { backups, 0, OPT_BOOLEAN, backups, -1 },
 { base, 'B', OPT_VALUE, base, -1 },
 { bind-address, 0, OPT_VALUE, bindaddress, -1 },
+{ bits, 0, OPT_BOOLEAN, bits, -1 },
 { IF_SSL (ca-certificate), 0, OPT_VALUE, cacertificate, -1 },
 { IF_SSL (ca-directory), 0, OPT_VALUE, cadirectory, -1 },
 { cache, 0, OPT_BOOLEAN, cache, -1 },
@@ -704,6 +705,11 @@
   -np, --no-parent don't ascend to the parent directory.\n),
 \n,
 
+N_(\
+Output format:\n),
+N_(\
+   --bits  Output bandwidth in bits.\n),
+\n,
 N_(Mail bug reports and suggestions to bug-wget@gnu.org.\n)
   };
 
diff -ur orig/wget-1.13.4/src/options.h wget-1.13.4/src/options.h
--- orig/wget-1.13.4/src/options.h  2011-08-06 15:54:32.0 +0530
+++ wget-1.13.4/src/options.h   2012-01-18 14:42:56.247982676 +0530
@@ -255,6 +255,7 @@
 
   bool show_all_dns_entries; /* Show all the DNS entries when resolving a
 name. */
+  bool bits_fmt;  /*Output bandwidth in bits format*/
 };
 
 extern struct options opt;
diff -ur orig/wget-1.13.4/src/progress.c wget-1.13.4/src/progress.c
--- orig/wget-1.13.4/src/progress.c 2011-01-01 17:42:35.0 +0530
+++ wget-1.13.4/src/progress.c  2012-01-18 14:42:56.249098685 +0530
@@ -861,7 +861,7 @@
   struct bar_progress_hist *hist = bp-hist;
 
   /* The progress bar should look like this:
- xx% [=== ] nn,nnn 12.34K/s  eta 36m 51s
+ xx% [=== ] nn,nnn 12.34KB/s  eta 36m 51s
 
  Calculate the geometry.  The idea is to assign as much room as
  possible to the progress bar.  The other idea is to never let
@@ -873,7 +873,7 @@
  xx%  or 100%  - percentage   - 4 chars
  []  - progress bar decorations - 2 chars
   nnn,nnn,nnn- downloaded bytes - 12 chars or very rarely 
more
-  12.5K/s- download rate - 8 chars
+  12.5KB/s- download rate   - 9 chars
eta 36m 51s   - ETA  - 14 chars
 
  =...   - progress bar - the rest
@@ -977,10 +977,11 @@
   *p++ = ' ';
 }
 
-  /*  12.52K/s */
+  /*  12.52Kb/s or 12.52KB/s */
   if (hist-total_time  0  hist-total_bytes)
 {
-  static const char *short_units[] = { B/s, K/s, M/s, G/s };
+  static const char *short_units[] = { B/s, KB/s, MB/s, GB/s };
+  static const char *short_units_bits[] = { b/s, Kb/s, Mb/s, Gb/s 
};
   int units = 0;
   /* Calculate the download speed using the history ring