Re: [Bug-wget] Multi segment download

2015-08-29 Thread Darshit Shah
Thanking You,
Darshit Shah
Sent from mobile device. Please excuse my brevity
On 29-Aug-2015 1:13 pm, Tim Rühsen tim.rueh...@gmx.de wrote:

 Hi,

 normally it makes much more sense when you have several download mirrors and
 checksums for each chunk. The perfect technique for this is called 'Metalink'
 (more on www.metalinker.org).
 Wget has it in the 'master' branch, as a GSoC project of Hubert Tarasiuk.

Sometimes evil ISPs enforce a per-connection bandwidth limit. In such a
case, multi-segment downloads from a single server do make sense.

Since Metalink already has support for downloading a file over multiple
connections, it should not be too difficult to reuse that code outside of
Metalink.

I think it would be a good idea to do so. I'm not sure whether all the
possible variations of the range headers are parsed by Wget.
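
For reference, RFC 7233 allows several Content-Range forms that a complete
parser would need to handle, for example:

  Content-Range: bytes 0-499/1234   (range with known complete length)
  Content-Range: bytes 500-999/*    (complete length unknown)
  Content-Range: bytes */1234       (unsatisfied range, as in a 416 response)
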
 Additionally, Wget2 is under development, already having the option
 --chunk-size (e.g. --chunk-size=1M) to start a multi-threaded download of a
 file.

 Regards, Tim


 On Friday, 28 August 2015, 15:41:27, Random Coder wrote:
  On Fri, Aug 28, 2015 at 3:06 PM, Ander Juaristi ajuari...@gmx.es wrote:
   Hi,
  
   Would you point us to some potential use cases? How would a Wget user
   benefit from such a feature? One of the best-regarded features of download
   managers is the ability to resume paused downloads, and that's already
   supported by Wget. Apart from that, I can't think of any other use
   case. But that's me, maybe you have a broader overview.
  One possible feature, described in flowery language from a product
  description: "... splits files into several sections and downloads
  them simultaneously, allowing you to use any type of connection at the
  maximum available speed. With FDM download speed increases, or even
  more!"
 
  And, just to show this can help, at least in some situations, here's an
  example using curl (sorry, I don't know how to do a similar request in
  wget).  First, a normal download of the file:
 
  curl -o all http://mirror.internode.on.net/pub/test/100meg.test
 
  This command takes an average of 48.9 seconds to run on my current
  network connection.  Now, if I split up the download as a download
  manager would, and run these four commands at the same instant:
 
  curl -o part1 -r0-2500 http://mirror.internode.on.net/pub/test/100meg.test
  curl -o part2 -r2501-5000 http://mirror.internode.on.net/pub/test/100meg.test
  curl -o part3 -r5001-7500 http://mirror.internode.on.net/pub/test/100meg.test
  curl -o part4 -r7501- http://mirror.internode.on.net/pub/test/100meg.test
 
  The total time it takes for all four commands to finish ends up being an
  average of 19.9 seconds over a few test runs on the same connection.
  There's some penalty here because I need to spend time combining the
  files afterwards, but if the command supported this logic internally,
  no doubt much of that work could be done up front as the file is
  downloaded.
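
As a rough illustration (not Wget code; all names below are made up), the
combining step could be avoided by pre-sizing the output file once and
letting each connection write its byte range in place:

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Open the output file and reserve its full length up front, so each
   worker can write its segment at the right offset with pwrite()
   instead of producing a separate partN file. */
static int
open_preallocated (const char *path, off_t total_size)
{
  int fd = open (path, O_WRONLY | O_CREAT, 0644);
  if (fd < 0)
    return -1;
  if (ftruncate (fd, total_size) < 0)
    {
      close (fd);
      return -1;
    }
  return fd;
}

/* A worker downloading the byte range [start, end] then writes each
   chunk of n bytes with: pwrite (fd, buf, n, start + written);
   so no post-download merge step is needed. */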


Re: [Bug-wget] Multi segment download

2015-08-29 Thread Tim Rühsen
Hi,

normally it makes much more sense when you have several download mirrors and
checksums for each chunk. The perfect technique for this is called 'Metalink'
(more on www.metalinker.org).
Wget has it in the 'master' branch, as a GSoC project of Hubert Tarasiuk.

Additionally, Wget2 is under development, already having the option
--chunk-size (e.g. --chunk-size=1M) to start a multi-threaded download of a file.
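
For example (with a hypothetical URL), such a download could be started as:

  wget2 --chunk-size=1M http://example.com/big.iso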

Regards, Tim


On Friday, 28 August 2015, 15:41:27, Random Coder wrote:
 On Fri, Aug 28, 2015 at 3:06 PM, Ander Juaristi ajuari...@gmx.es wrote:
  Hi,
  
  Would you point us to some potential use cases? How would a Wget user
  benefit from such a feature? One of the best-regarded features of download
  managers is the ability to resume paused downloads, and that's already
  supported by Wget. Apart from that, I can't think of any other use
  case. But that's me, maybe you have a broader overview.
 One possible feature, described in flowery language from a product
 description: "... splits files into several sections and downloads
 them simultaneously, allowing you to use any type of connection at the
 maximum available speed. With FDM download speed increases, or even
 more!"
 
 And, just to show this can help, at least in some situations, here's an
 example using curl (sorry, I don't know how to do a similar request in
 wget).  First, a normal download of the file:
 
 curl -o all http://mirror.internode.on.net/pub/test/100meg.test
 
 This command takes an average of 48.9 seconds to run on my current
 network connection.  Now, if I split up the download as a download
 manager would, and run these four commands at the same instant:
 
 curl -o part1 -r0-2500 http://mirror.internode.on.net/pub/test/100meg.test
 curl -o part2 -r2501-5000 http://mirror.internode.on.net/pub/test/100meg.test
 curl -o part3 -r5001-7500 http://mirror.internode.on.net/pub/test/100meg.test
 curl -o part4 -r7501- http://mirror.internode.on.net/pub/test/100meg.test
 
 The total time it takes for all four commands to finish ends up being an
 average of 19.9 seconds over a few test runs on the same connection.
 There's some penalty here because I need to spend time combining the
 files afterwards, but if the command supported this logic internally,
 no doubt much of that work could be done up front as the file is
 downloaded.




[Bug-wget] Unit test case for parse_content_range()

2015-08-29 Thread Darshit Shah
I've written a unit test for the parse_content_range() function.
However, I haven't yet populated it with various test cases.
Sharing the patch for the unit test here. I will add more test cases
for this test later.

Kindly do review the patch. If no one complains, I'll push it in a
couple of days.

-- 
Thanking You,
Darshit Shah
From 154760a79b3981f8cb5fcf7f643ae2e2579aa887 Mon Sep 17 00:00:00 2001
From: Darshit Shah dar...@gmail.com
Date: Sat, 29 Aug 2015 23:08:39 +0530
Subject: [PATCH] Add unit test for the parse_content_range() function

* http.c (test_parse_range_header): New function to test the parsing of
the HTTP/1.1 Content-Range header.
* test.[ch]: Likewise.
---
 src/http.c | 38 ++
 src/test.c |  1 +
 src/test.h |  1 +
 3 files changed, 40 insertions(+)

diff --git a/src/http.c b/src/http.c
index e96cad7..9bba036 100644
--- a/src/http.c
+++ b/src/http.c
@@ -4892,6 +4892,44 @@ ensure_extension (struct http_stat *hs, const char *ext, int *dt)
 }
 
 #ifdef TESTING
+
+const char *
+test_parse_range_header(void)
+{
+  static const struct {
+    const char *rangehdr;
+    const wgint firstbyte;
+    const wgint lastbyte;
+    const wgint length;
+  } test_array[] = {
+      { "bytes 0-1000/1", 0, 1000, 1}
+  };
+
+  /* wgint *firstbyteptr = xmalloc(sizeof(wgint)); */
+  wgint firstbyteptr[1];
+  wgint *lastbyteptr = xmalloc(sizeof(wgint));
+  wgint *lengthptr = xmalloc(sizeof(wgint));
+  bool result;
+  for (unsigned i = 0; i < countof (test_array); i++)
+    {
+      result = parse_content_range(test_array[i].rangehdr, firstbyteptr, lastbyteptr, lengthptr);
+#if 0
+      printf("%ld %ld\n", test_array[i].firstbyte, *firstbyteptr);
+      printf("%ld %ld\n", test_array[i].lastbyte, *lastbyteptr);
+      printf("%ld %ld\n", test_array[i].length, *lengthptr);
+#endif
+      mu_assert("test_parse_range_header: Parsing failed", result);
+      mu_assert("test_parse_range_header: Bad parse", test_array[i].firstbyte == *firstbyteptr
+                && test_array[i].lastbyte == *lastbyteptr
+                && test_array[i].length == *lengthptr);
+    }
+
+  /* xfree(firstbyteptr); */
+  xfree(lastbyteptr);
+  xfree(lengthptr);
+  return NULL;
+}
+
 const char *
 test_parse_content_disposition(void)
 {
diff --git a/src/test.c b/src/test.c
index 5278925..cb01de3 100644
--- a/src/test.c
+++ b/src/test.c
@@ -54,6 +54,7 @@ all_tests(void)
   mu_run_test (test_has_key);
 #endif
   mu_run_test (test_parse_content_disposition);
+  mu_run_test (test_parse_range_header);
   mu_run_test (test_subdir_p);
   mu_run_test (test_dir_matches_p);
   mu_run_test (test_commands_sorted);
diff --git a/src/test.h b/src/test.h
index f74c162..4e0e1f2 100644
--- a/src/test.h
+++ b/src/test.h
@@ -48,6 +48,7 @@ const char *test_has_key (void);
 const char *test_find_key_value (void);
 const char *test_find_key_values (void);
 const char *test_parse_content_disposition(void);
+const char *test_parse_range_header(void);
 const char *test_commands_sorted(void);
 const char *test_cmd_spec_restrict_file_names(void);
 const char *test_is_robots_txt_url(void);
-- 
2.5.0



Re: [Bug-wget] Unit test case for parse_content_range()

2015-08-29 Thread Tim Rühsen
On Saturday, 29 August 2015, 23:13:23, Darshit Shah wrote:
 I've written a unit test for the parse_content_range() function.
 However, I haven't yet populated it with various test cases.
 Sharing the patch for the unit test here. I will add more test cases
 for this test later.
 
 Kindly do review the patch. If no one complains, I'll push it in a
 couple of days.

Hi Darshit,

some of the 'valid' tests

0-max
{ "bytes 0-1000/1000", 0, 1000, 1000}
non0-max
{ "bytes 1-1000/1000", 1, 1000, 1000}
0-valid
{ "bytes 0-500/1000", 0, 500, 1000}
non0-valid
{ "bytes 1-500/1000", 1, 500, 1000}
0-(max-1)
{ "bytes 0-999/1000", 0, 999, 1000}
non0-(max-1)
{ "bytes 1-999/1000", 1, 999, 1000}

And please add some tests using >= 2^31 and >= 2^32 as values.
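
For instance, entries like these (hypothetical, in the same format as above)
would straddle the 32-bit boundaries:

{ "bytes 0-2147483646/2147483647", 0, 2147483646, 2147483647 }  /* 2^31 - 1 */
{ "bytes 0-2147483647/2147483648", 0, 2147483647, 2147483648 }  /* 2^31 */
{ "bytes 0-4294967295/4294967296", 0, 4294967295, 4294967296 }  /* 2^32 */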

Regards, Tim




Re: [Bug-wget] Unit test case for parse_content_range()

2015-08-29 Thread Darshit Shah
On Sun, Aug 30, 2015 at 12:51 AM, Tim Rühsen tim.rueh...@gmx.de wrote:
 On Saturday, 29 August 2015, 23:13:23, Darshit Shah wrote:
 I've written a unit test for the parse_content_range() function.
 However, I haven't yet populated it with various test cases.
 Sharing the patch for the unit test here. I will add more test cases
 for this test later.

 Kindly do review the patch. If no one complains, I'll push it in a
 couple of days.

 Hi Darshit,

 some of the 'valid' tests

On closer inspection, some of these are *NOT* valid header values, but
Wget currently accepts them. This is a parsing bug, in my opinion.

RFC 7233 states:

 A Content-Range field value is invalid if it contains a
   byte-range-resp that has a last-byte-pos value less than its
   first-byte-pos value, or a complete-length value less than or equal
   to its last-byte-pos value.  The recipient of an invalid
   Content-Range MUST NOT attempt to recombine the received content with
   a stored representation.

Based on this, the first two examples provided are illegal. Similarly,
a header value such as:
{ "bytes 100-99/1000", 100, 99, 1000}
should also very clearly be illegal, but Wget currently allows it. I'm
not sure about the behaviour of the program on receipt of such a
header, but the function should clearly fail on this test.
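
A minimal sketch of that rule (a hypothetical helper, not Wget's actual
code; 'wgint' below is a stand-in typedef for Wget's integer type):

#include <stdbool.h>

typedef long long wgint;   /* stand-in for Wget's wgint */

/* Per RFC 7233, a Content-Range is invalid if last-byte-pos is less than
   first-byte-pos, or if complete-length is <= last-byte-pos. */
static bool
content_range_valid (wgint first, wgint last, wgint length)
{
  return first <= last && last < length;
}

With such a check, "bytes 100-99/1000" and "bytes 0-1000/1000" would both
be rejected.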

 0-max
 { "bytes 0-1000/1000", 0, 1000, 1000}
 non0-max
 { "bytes 1-1000/1000", 1, 1000, 1000}
 0-valid
 { "bytes 0-500/1000", 0, 500, 1000}
 non0-valid
 { "bytes 1-500/1000", 1, 500, 1000}
 0-(max-1)
 { "bytes 0-999/1000", 0, 999, 1000}
 non0-(max-1)
 { "bytes 1-999/1000", 1, 999, 1000}

 And please add some tests using >= 2^31 and >= 2^32 as values.

 Regards, Tim



-- 
Thanking You,
Darshit Shah