Re: [Bug-wget] Multi segment download
Thanking You,
Darshit Shah
Sent from mobile device. Please excuse my brevity.

On 29-Aug-2015 1:13 pm, Tim Rühsen <tim.rueh...@gmx.de> wrote:
> Hi,
>
> normally it makes much more sense when you have several download mirrors and
> checksums for each chunk. The perfect technique for this is called
> 'Metalink' (more on www.metalinker.org). Wget has it in branch 'master'; it
> was a GSoC project of Hubert Tarasiuk.

Sometimes the evil ISPs enforce a per-connection bandwidth limit. In such a
case, multi-segment downloads from a single server do make sense. Since
Metalink already has support for downloading a file over multiple
connections, it should not be too difficult to reuse that code outside of
Metalink. I think it would be a good idea to do so. I'm not sure whether all
the possible variations of the Range headers are parsed by Wget.

> Additionally, Wget2 is under development and already has the option
> --chunk-size (e.g. --chunk-size=1M) to start a multi-threaded download of a
> file.
>
> Regards, Tim
>
> On Friday, 28 August 2015, 15:41:27, Random Coder wrote:
> > On Fri, Aug 28, 2015 at 3:06 PM, Ander Juaristi <ajuari...@gmx.es> wrote:
> > > Hi,
> > >
> > > Would you point us to some potential use cases? How would a Wget user
> > > benefit from such a feature?
> > >
> > > One of the best regarded features of download managers is the ability
> > > to resume paused downloads, and that's already supported by Wget. Apart
> > > from that, I can't come up with any other use case. But that's me;
> > > maybe you have a broader overview.
> >
> > One possible feature, described in flowery language from a product
> > description:
> >
> >   ... splits files into several sections and downloads them
> >   simultaneously, allowing you to use any type of connection at the
> >   maximum available speed. With FDM download speed increases, or even
> >   more!
> >
> > And, just to show this can help, at least in some situations, here's an
> > example using curl (sorry, I don't know how to do a similar request in
> > wget).
> > First a normal download of the file:
> >
> >   curl -o all http://mirror.internode.on.net/pub/test/100meg.test
> >
> > This command takes an average of 48.9 seconds to run on my current
> > network connection.
> >
> > Now, if I split up the download as a download manager would, and run
> > these four commands at the same instant:
> >
> >   curl -o part1 -r0-2500 http://mirror.internode.on.net/pub/test/100meg.test
> >   curl -o part2 -r2501-5000 http://mirror.internode.on.net/pub/test/100meg.test
> >   curl -o part3 -r5001-7500 http://mirror.internode.on.net/pub/test/100meg.test
> >   curl -o part4 -r7501- http://mirror.internode.on.net/pub/test/100meg.test
> >
> > The total time it takes all four commands to finish averages 19.9 seconds
> > over a few test runs on the same connection. There's some penalty here
> > because I need to spend time combining the files afterwards, but if the
> > command supported this logic internally, no doubt much of that work could
> > be done up front as the file is downloaded.
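[Editorial note: the range arithmetic behind the four -r options above can be sketched in a few lines of C. This is only an illustration, not wget or curl code; chunk_start and chunk_end are hypothetical helper names, and the sketch splits [0, total) into n contiguous chunks with the last chunk absorbing the remainder.]

```c
#include <stdint.h>

/* First byte (inclusive) of chunk i when [0, total) is split into n parts. */
static int64_t chunk_start (int64_t total, int n, int i)
{
  return (total / n) * i;
}

/* Last byte (inclusive) of chunk i, matching curl's -rSTART-END syntax.
   The final chunk runs to the end of the file, like "-r7501-" above. */
static int64_t chunk_end (int64_t total, int n, int i)
{
  return (i == n - 1) ? total - 1 : chunk_start (total, n, i + 1) - 1;
}
```

For a 1000-byte file split four ways this yields 0-249, 250-499, 500-749 and 750-999: contiguous, non-overlapping, and covering every byte exactly once, which is what lets the downloaded parts be concatenated afterwards.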
Re: [Bug-wget] Multi segment download
Hi,

normally it makes much more sense when you have several download mirrors and
checksums for each chunk. The perfect technique for this is called 'Metalink'
(more on www.metalinker.org). Wget has it in branch 'master'; it was a GSoC
project of Hubert Tarasiuk.

Additionally, Wget2 is under development and already has the option
--chunk-size (e.g. --chunk-size=1M) to start a multi-threaded download of a
file.

Regards, Tim

On Friday, 28 August 2015, 15:41:27, Random Coder wrote:
> On Fri, Aug 28, 2015 at 3:06 PM, Ander Juaristi <ajuari...@gmx.es> wrote:
> > Hi,
> >
> > Would you point us to some potential use cases? How would a Wget user
> > benefit from such a feature?
> >
> > One of the best regarded features of download managers is the ability to
> > resume paused downloads, and that's already supported by Wget. Apart from
> > that, I can't come up with any other use case. But that's me; maybe you
> > have a broader overview.
>
> One possible feature, described in flowery language from a product
> description:
>
>   ... splits files into several sections and downloads them simultaneously,
>   allowing you to use any type of connection at the maximum available
>   speed. With FDM download speed increases, or even more!
>
> And, just to show this can help, at least in some situations, here's an
> example using curl (sorry, I don't know how to do a similar request in
> wget).
>
> First a normal download of the file:
>
>   curl -o all http://mirror.internode.on.net/pub/test/100meg.test
>
> This command takes an average of 48.9 seconds to run on my current network
> connection.
> Now, if I split up the download as a download manager would, and run these
> four commands at the same instant:
>
>   curl -o part1 -r0-2500 http://mirror.internode.on.net/pub/test/100meg.test
>   curl -o part2 -r2501-5000 http://mirror.internode.on.net/pub/test/100meg.test
>   curl -o part3 -r5001-7500 http://mirror.internode.on.net/pub/test/100meg.test
>   curl -o part4 -r7501- http://mirror.internode.on.net/pub/test/100meg.test
>
> The total time it takes all four commands to finish averages 19.9 seconds
> over a few test runs on the same connection. There's some penalty here
> because I need to spend time combining the files afterwards, but if the
> command supported this logic internally, no doubt much of that work could
> be done up front as the file is downloaded.

signature.asc
Description: This is a digitally signed message part.
[Bug-wget] Unit test case for parse_content_range()
I've written a unit test for the parse_content_range() method. However, I
haven't yet populated it with various test cases. Sharing the patch for the
unit test here. I will add more test cases for this test later. Kindly do
review the patch. If no one complains, I'll push it in a couple of days.

-- 
Thanking You,
Darshit Shah

From 154760a79b3981f8cb5fcf7f643ae2e2579aa887 Mon Sep 17 00:00:00 2001
From: Darshit Shah <dar...@gmail.com>
Date: Sat, 29 Aug 2015 23:08:39 +0530
Subject: [PATCH] Add unit test for parse_content_range() method

* http.c (test_parse_range_header): New function to test the function for
  parsing the HTTP/1.1 Content-Range header.
* test.[ch]: Same
---
 src/http.c | 38 ++++++++++++++++++++++++++++++++++++++
 src/test.c |  1 +
 src/test.h |  1 +
 3 files changed, 40 insertions(+)

diff --git a/src/http.c b/src/http.c
index e96cad7..9bba036 100644
--- a/src/http.c
+++ b/src/http.c
@@ -4892,6 +4892,44 @@ ensure_extension (struct http_stat *hs, const char *ext, int *dt)
 }
 
 #ifdef TESTING
+
+const char *
+test_parse_range_header(void)
+{
+  static const struct {
+    const char *rangehdr;
+    const wgint firstbyte;
+    const wgint lastbyte;
+    const wgint length;
+  } test_array[] = {
+      { "bytes 0-1000/1", 0, 1000, 1}
+  };
+
+  /* wgint *firstbyteptr = xmalloc(sizeof(wgint)); */
+  wgint firstbyteptr[sizeof(wgint)];
+  wgint *lastbyteptr = xmalloc(sizeof(wgint));
+  wgint *lengthptr = xmalloc(sizeof(wgint));
+  bool result;
+  for (unsigned i = 0; i < countof (test_array); i++)
+    {
+      result = parse_content_range(test_array[i].rangehdr, firstbyteptr,
+                                   lastbyteptr, lengthptr);
+#if 0
+      printf("%ld %ld", test_array[i].firstbyte, *firstbyteptr);
+      printf("%ld %ld", test_array[i].lastbyte, *lastbyteptr);
+      printf("%ld %ld", test_array[i].length, *lengthptr);
+#endif
+      mu_assert("test_parse_range_header: Parsing failed", result);
+      mu_assert("test_parse_range_header: Bad parse",
+                test_array[i].firstbyte == *firstbyteptr
+                && test_array[i].lastbyte == *lastbyteptr
+                && test_array[i].length == *lengthptr);
+    }
+
+  /* xfree(firstbyteptr); */
+  xfree(lastbyteptr);
+  xfree(lengthptr);
+  return NULL;
+}
+
 const char *
 test_parse_content_disposition(void)
 {
diff --git a/src/test.c b/src/test.c
index 5278925..cb01de3 100644
--- a/src/test.c
+++ b/src/test.c
@@ -54,6 +54,7 @@ all_tests(void)
   mu_run_test (test_has_key);
 #endif
   mu_run_test (test_parse_content_disposition);
+  mu_run_test (test_parse_range_header);
   mu_run_test (test_subdir_p);
   mu_run_test (test_dir_matches_p);
   mu_run_test (test_commands_sorted);
diff --git a/src/test.h b/src/test.h
index f74c162..4e0e1f2 100644
--- a/src/test.h
+++ b/src/test.h
@@ -48,6 +48,7 @@ const char *test_has_key (void);
 const char *test_find_key_value (void);
 const char *test_find_key_values (void);
 const char *test_parse_content_disposition(void);
+const char *test_parse_range_header(void);
 const char *test_commands_sorted(void);
 const char *test_cmd_spec_restrict_file_names(void);
 const char *test_is_robots_txt_url(void);
-- 
2.5.0
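[Editorial note: the patch above exercises wget's internal parse_content_range(). As a rough standalone illustration of what such a parser has to extract (this is not wget's implementation, and parse_range_sketch is a made-up name), a "bytes FIRST-LAST/LENGTH" value can be pulled apart with sscanf. Validity checking of the extracted numbers is deliberately left out here, mirroring the patch's test.]

```c
#include <stdio.h>
#include <stdint.h>

/* Minimal stand-in for a Content-Range parser: extracts FIRST, LAST and
   LENGTH from a header value of the form "bytes FIRST-LAST/LENGTH".
   Returns 1 on a successful parse, 0 otherwise. */
static int parse_range_sketch (const char *hdr, int64_t *first,
                               int64_t *last, int64_t *length)
{
  long long f, l, len;
  if (sscanf (hdr, "bytes %lld-%lld/%lld", &f, &l, &len) != 3)
    return 0;
  *first = f;
  *last = l;
  *length = len;
  return 1;
}
```

Note that this accepts the patch's "bytes 0-1000/1" vector (it parses syntactically) even though, as the follow-up messages discuss, such a value is semantically invalid.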
Re: [Bug-wget] Unit test case for parse_content_range()
On Saturday, 29 August 2015, 23:13:23, Darshit Shah wrote:
> I've written a unit test for the parse_content_range() method. However, I
> haven't yet populated it with various test cases. Sharing the patch for the
> unit test here. I will add more test cases for this test later. Kindly do
> review the patch. If no one complains, I'll push it in a couple of days.

Hi Darshit,

some of the 'valid' tests:

  0-max         { "bytes 0-1000/1000", 0, 1000, 1000}
  non0-max      { "bytes 1-1000/1000", 1, 1000, 1000}
  0-valid       { "bytes 0-500/1000",  0,  500, 1000}
  non0-valid    { "bytes 1-500/1000",  1,  500, 1000}
  0-(max-1)     { "bytes 0-999/1000",  0,  999, 1000}
  non0-(max-1)  { "bytes 1-999/1000",  1,  999, 1000}

And please add some tests using values >= 2^31 and >= 2^32.

Regards, Tim

signature.asc
Description: This is a digitally signed message part.
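[Editorial note: the request for test values at and beyond 2^31 and 2^32 guards against 32-bit truncation. A brief sketch of the point, assuming a valid non-negative decimal string; parse_offset64 is a hypothetical helper, not wget code.]

```c
#include <stdint.h>
#include <stdlib.h>

/* Byte offsets for files larger than 2 GiB / 4 GiB do not fit in 32 bits,
   so parsed Content-Range numbers must land in a 64-bit type (such as
   wget's wgint on large-file builds).  strtoll parses into long long,
   which is at least 64 bits wide. */
static int64_t parse_offset64 (const char *s)
{
  return (int64_t) strtoll (s, NULL, 10);
}
```

A 32-bit signed type would overflow on "2147483648" (2^31) and an unsigned 32-bit type on "4294967296" (2^32), which is exactly what such test vectors would catch.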
Re: [Bug-wget] Unit test case for parse_content_range()
On Sun, Aug 30, 2015 at 12:51 AM, Tim Rühsen <tim.rueh...@gmx.de> wrote:
> On Saturday, 29 August 2015, 23:13:23, Darshit Shah wrote:
> > I've written a unit test for the parse_content_range() method. However,
> > I haven't yet populated it with various test cases. Sharing the patch
> > for the unit test here. I will add more test cases for this test later.
> > Kindly do review the patch. If no one complains, I'll push it in a
> > couple of days.
>
> Hi Darshit,
>
> some of the 'valid' tests:

On closer inspection, some of these are *NOT* valid header values, yet Wget
currently passes them. This is a parsing bug in my opinion. RFC 7233 states:

  A Content-Range field value is invalid if it contains a byte-range-resp
  that has a last-byte-pos value less than its first-byte-pos value, or a
  complete-length value less than or equal to its last-byte-pos value. The
  recipient of an invalid Content-Range MUST NOT attempt to recombine the
  received content with a stored representation.

Based on this, the first two examples provided are illegal. Similarly, a
header value such as:

  { "bytes 100-99/1000", 100, 99, 1000}

should also very clearly be illegal, but Wget currently allows it. I'm not
sure about the behaviour of the program on receipt of such a header, but the
function should clearly fail on this test.

> 0-max         { "bytes 0-1000/1000", 0, 1000, 1000}
> non0-max      { "bytes 1-1000/1000", 1, 1000, 1000}
> 0-valid       { "bytes 0-500/1000",  0,  500, 1000}
> non0-valid    { "bytes 1-500/1000",  1,  500, 1000}
> 0-(max-1)     { "bytes 0-999/1000",  0,  999, 1000}
> non0-(max-1)  { "bytes 1-999/1000",  1,  999, 1000}
>
> And please add some tests using values >= 2^31 and >= 2^32.
>
> Regards, Tim

-- 
Thanking You,
Darshit Shah
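[Editorial note: the RFC 7233 rule quoted above reduces to two comparisons. A minimal sketch of the check; range_is_valid is a hypothetical helper illustrating the rule, not wget's code.]

```c
#include <stdint.h>

/* A byte-range-resp "bytes FIRST-LAST/LENGTH" is invalid per RFC 7233 if
   last-byte-pos < first-byte-pos, or if the complete-length is less than
   or equal to last-byte-pos. */
static int range_is_valid (int64_t first, int64_t last, int64_t length)
{
  if (first < 0 || last < first)
    return 0;                 /* e.g. "bytes 100-99/1000" */
  if (length <= last)
    return 0;                 /* e.g. "bytes 0-1000/1000" */
  return 1;
}
```

Under this check, the first two vectors above ("bytes 0-1000/1000", "bytes 1-1000/1000") and "bytes 100-99/1000" are rejected, while the remaining four vectors pass, matching the analysis in the message.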