Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
I have set up a repository on GitHub where you can follow the progress of the project: https://github.com/jy987321/Wget/

Whenever I have a useful piece of code, I will submit a patch for it to the main repository (to avoid merge conflicts in the future). Otherwise I will try to rebase my changes onto the current state of Wget, so expect force pushes in the master-hubert branch.

On 29.04.2015 at 10:02, Hubert Tarasiuk wrote:

Hello developers,

My proposal for *Speed up Wget's Download Mechanism* has been accepted by the mentors! There are two tasks to be done there:
- conditional GET requests (if-modified-since) (RFC7232)
- TCP Fast Open (RFC7413)

A summarized version of my proposal is available: http://pliki.h.trsk.org/gsoc/wget_public.pdf

IMHO it is quite obvious how the first feature should be implemented in Wget. However, there is some more moving around needed to use TFO. I have proposed two possible ways in the above PDF. Perhaps you can express your opinion about the approaches, or you have another idea for accomplishing it?

Another issue I am thinking about is how to test the TFO feature. I am not very familiar with network API in Python, but my first idea would be to count the TCP segments sent and received and/or to check that the first packet (with SYN flag) contains data (the request). What do you think?

I will be thankful for any other suggestions, as well.

Have a good day,
Hubert
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Darshit Shah wrote:

2. Regarding the socket options, we should spend some more time evaluating our options. My understanding of TCP_CORK is that it may be a useful option for servers, but it doesn't really affect TCP clients in any useful way. This is because TCP_CORK modifies the minimum TCP packet size by buffering as much data as possible before sending it out. With the small request sizes that an HTTP client would generally send, I think it is better to follow Nagle's algorithm, since TCP_CORK will not afford us any noticeable advantage. On the other hand, its non-portability will be a nightmare for us when trying to support OSX, BSD and Windows.

Most requests won't need TCP_CORK, as the request is already small enough to fit in a single packet (however, it may still be useful for providing the needed TFO hints to the kernel). Setting and unsetting a socket option is a couple of setsockopt calls that can easily be hidden behind an #ifdef. Changing the whole application to sendto() is much more invasive, and I would be quite surprised if it worked cleanly on Windows. (In addition to TFO already being Linux-only :P)
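To make the "set and unset behind a guard" idea concrete, here is a minimal sketch in Python rather than Wget's C. It assumes that `socket.TCP_CORK` is only defined where the platform supports the option, so the `getattr` check plays the role of the #ifdef. The helper name `corked` is made up for this illustration and is not part of Wget.

```python
import socket
from contextlib import contextmanager

# TCP_CORK is Linux-specific; elsewhere the constant is simply absent.
# This lookup stands in for the #ifdef mentioned in the message.
TCP_CORK = getattr(socket, "TCP_CORK", None)


@contextmanager
def corked(sock):
    """Cork the socket for the duration of the block, if the platform allows it.

    Yields True when corking was actually enabled, False otherwise.
    """
    enabled = False
    if TCP_CORK is not None:
        try:
            sock.setsockopt(socket.IPPROTO_TCP, TCP_CORK, 1)
            enabled = True
        except OSError:
            pass  # option is known but was refused; carry on uncorked
    try:
        yield enabled
    finally:
        if enabled:
            # Clearing the option lets the kernel flush what was queued.
            sock.setsockopt(socket.IPPROTO_TCP, TCP_CORK, 0)
```

A caller would wrap the sends that make up one request in `with corked(sock):` and would not need to care whether the option exists on the current platform.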
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Date: Fri, 01 May 2015 11:01:15 +0200
From: Gisle Vanem gva...@yahoo.no
CC: bug-wget@gnu.org

Eli Zaretskii wrote:

I don't see any threads created by run_with_timeout on my system, when I download the above URL. In fact, if I set a breakpoint in run_with_timeout, the only 2 calls to it during the whole download are from getaddrinfo_with_timeout and from connect_with_timeout, both with timeout of zero, which calls the function synchronously, both on Windows and on Posix hosts. So I guess I don't see what Gisle describes as separate thread for HTTPS reads. What am I missing?

It depends on what 'opt.read_timeout' is globally. So don't use read-timeout = 0 in your wgetrc.

I don't have any such settings in my ~/.wgetrc or in the system-global wgetrc, and the value of opt.read_timeout is the default 900 if I look at it in the debugger.

The reason seems to be different: my reading of gnutls.c is that when Wget uses GnuTLS for HTTPS connections, it indeed doesn't call run_with_timeout. Instead, wgnutls_read_timeout calls 'select' with the timeout value, so the thread-launching code in run_with_timeout is not used in this case. By contrast, the equivalent code in openssl.c does call run_with_timeout.

So perhaps you should try building Wget with GnuTLS, and see if you get any perceptible speed-up.
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Eli Zaretskii wrote: I don't see any threads created by run_with_timeout on my system, when I download the above URL. In fact, if I set a breakpoint in run_with_timeout, the only 2 calls to it during the whole download are from getaddrinfo_with_timeout and from connect_with_timeout, both with timeout of zero, which calls the function synchronously, both on Windows and on Posix hosts. So I guess I don't see what Gisle describes as separate thread for HTTPS reads. What am I missing? It depends on what 'opt.read_timeout' is globally. So don't use read-timeout = 0 in your wgetrc. -- --gv
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Date: Thu, 30 Apr 2015 19:30:39 +0200
From: Gisle Vanem gva...@yahoo.no
CC: bug-wget@gnu.org

wget -q -O NUL https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox%20Setup%2037.0.2.exe

results in 9931 DLL attach/detaches! For a 40 MByte file that is approx. 1 new thread per 4 kByte read.

I don't see any threads created by run_with_timeout on my system, when I download the above URL. In fact, if I set a breakpoint in run_with_timeout, the only 2 calls to it during the whole download are from getaddrinfo_with_timeout and from connect_with_timeout, both with timeout of zero, which calls the function synchronously, both on Windows and on Posix hosts. So I guess I don't see what Gisle describes as separate thread for HTTPS reads. What am I missing?

My Wget is built with GnuTLS, if that matters.

I guess the timing results I sent are not really interesting, given that no extra threads are involved.
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Date: Thu, 30 Apr 2015 19:30:39 +0200
From: Gisle Vanem gva...@yahoo.no
CC: bug-wget@gnu.org

wget -q -O NUL https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox%20Setup%2037.0.2.exe

results in 9931 DLL attach/detaches! For a 40 MByte file that is approx. 1 new thread per 4 kByte read. I was thinking that increasing read-buffer would help. But where? The code is bit of a mess IMHO. Increasing the Rx buffer in fd_read_body() didn't help. Is this the chief in this regard?

Without getting any numbers, I can see in 'Process Explorer' that all those run_with_timeout() calls (and no '-T0') amount to some more user+kernel time. I guess using a profiler is next. Or maybe someone knows of a Win-program that can report total CPU (kernel/user) time from the cmd-line?

'timep' from Windows System Programming can (let me know if you want the source I use). This is based on the average of 2 runs of Wget 1.16.1 running on a 32-bit Windows XP, with a 30 Mbit/sec cable connection:

real 00h01m56.500s
user 00h00m00.823s
sys  00h00m00.355s

And here's the same from a GNU/Linux machine that downloads at 20.7 MB/sec:

real 0m2.300s
user 0m1.600s
sys  0m0.6000s

BTW. My ISP gives me 25 Mbit/s in and 10 MBit/s out.

See above; removing -q from the command line indicates that the actual download speed for this file is around 500 KB/sec.
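One way to read the two timing reports is to ask what fraction of the wall-clock time was spent on the CPU at all. The following is a small arithmetic sketch using only the figures quoted in the message; the function name `cpu_share` is introduced here for illustration.

```python
def cpu_share(real_s, user_s, sys_s):
    """Fraction of elapsed (wall-clock) time spent on the CPU, user plus kernel."""
    return (user_s + sys_s) / real_s


# Windows XP run: real 00h01m56.500s, user 0.823 s, sys 0.355 s
windows = cpu_share(real_s=1 * 60 + 56.5, user_s=0.823, sys_s=0.355)

# GNU/Linux run: real 2.300 s, user 1.600 s, sys 0.600 s
linux = cpu_share(real_s=2.3, user_s=1.6, sys_s=0.6)

print(f"Windows XP run: {windows:.1%} of elapsed time on the CPU")
print(f"GNU/Linux run:  {linux:.1%} of elapsed time on the CPU")
```

With these numbers the Windows run spends roughly 1% of its elapsed time on the CPU, so that download is bound by the network, while the much faster GNU/Linux download is close to CPU-bound.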
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Date: Thu, 30 Apr 2015 19:30:39 +0200
From: Gisle Vanem gva...@yahoo.no
CC: bug-wget@gnu.org

wget -q -O NUL https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox%20Setup%2037.0.2.exe

results in 9931 DLL attach/detaches! For a 40 MByte file that is approx. 1 new thread per 4 kByte read.

I don't see any threads created by run_with_timeout on my system, when I download the above URL. In fact, if I set a breakpoint in run_with_timeout, the only 2 calls to it during the whole download are from getaddrinfo_with_timeout and from connect_with_timeout, both with timeout of zero, which calls the function synchronously, both on Windows and on Posix hosts. So I guess I don't see what Gisle describes as separate thread for HTTPS reads. What am I missing?

My Wget is built with GnuTLS, if that matters.

I guess the Windows timing results I sent are not really interesting, given that no extra threads are involved.
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Hi Hubert,

congrats on being selected for GSoC!

Here is a researchers' report about testing TFO: https://reproducingnetworkresearch.wordpress.com/2014/06/03/cs244-14-tcp-fast-open-2/

Some additional thoughts:
- TFO won't work with HTTPS as long as the SSL library in use does not support TFO.
- Servers might not be ready to detect SYN packets duplicated by the network. No problem with GET (well, there might be corner cases or misconfigurations), but it might be a general problem with POST.
- I am not aware of how to (programmatically) detect TFO on the (test) server side. Maybe there is a way via the /proc file system. Maybe a question for the kernel mailing list!?
- I recently made a patch for torify/torsocks to catch sendto(). It was accepted, but I am not sure if they have made a new release already. At least on systems that are not up to date, TFO would leak (circumvent the Tor network). We should keep this in mind for the documentation.
- Because of these points, TFO should not be enabled by default.

TFO is an interesting technology. RTT is an issue that becomes more relevant with faster servers and faster networks. Even at the speed of light, you have an RTT of ~130 ms from one side of the world to the other: 20,000 km there and 20,000 km back. TFO is part of the answer :-)

Have fun!

Tim

On Wednesday 29 April 2015 10:02:48 Hubert Tarasiuk wrote:

Hello developers,

My proposal for *Speed up Wget's Download Mechanism* has been accepted by the mentors! There are two tasks to be done there:
- conditional GET requests (if-modified-since) (RFC7232)
- TCP Fast Open (RFC7413)

A summarized version of my proposal is available: http://pliki.h.trsk.org/gsoc/wget_public.pdf

IMHO it is quite obvious how the first feature should be implemented in Wget. However, there is some more moving around needed to use TFO. I have proposed two possible ways in the above PDF. Perhaps you can express your opinion about the approaches, or you have another idea for accomplishing it?

Another issue I am thinking about is how to test the TFO feature. I am not very familiar with network API in Python, but my first idea would be to count the TCP segments sent and received and/or to check that the first packet (with SYN flag) contains data (the request). What do you think?

I will be thankful for any other suggestions, as well.

Have a good day,
Hubert
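On the question of detecting TFO on the test server through /proc: one possible approach is sketched below. It assumes a Linux server whose /proc/net/netstat lists counters whose names begin with `TCPFastOpen` in the `TcpExt` table (for example `TCPFastOpenPassive`). Whether a given kernel exposes them should be checked on the machine itself. The function names and the sample text are made up for this illustration.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {table: {counter: value}}.

    The file alternates a line of counter names with a line of values,
    both starting with the same "Table:" prefix.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    tables = {}
    for names_line, values_line in zip(lines[0::2], lines[1::2]):
        prefix, names = names_line.split(":", 1)
        _, values = values_line.split(":", 1)
        tables[prefix] = dict(zip(names.split(), map(int, values.split())))
    return tables


def tfo_counters(text):
    """Return only the TcpExt counters whose names start with TCPFastOpen."""
    ext = parse_netstat(text).get("TcpExt", {})
    return {name: value for name, value in ext.items()
            if name.startswith("TCPFastOpen")}


# Shortened, hypothetical sample in the same layout as the real file.
SAMPLE = """\
TcpExt: SyncookiesSent TCPFastOpenActive TCPFastOpenPassive
TcpExt: 0 3 1
IpExt: InOctets OutOctets
IpExt: 1000 2000
"""

print(tfo_counters(SAMPLE))
```

A test could read the real file before and after a request, e.g. `tfo_counters(open("/proc/net/netstat").read())`, and compare the two snapshots. That only works when client and server are not sharing counters in a confusing way, which is one more reason the localhost case is awkward.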
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
On Thursday 30 April 2015 13:35:06 Gisle Vanem wrote:

Tim Ruehsen wrote:

Some additional thoughts:
- TFO won't work with HTTPS as long as the used SSL library does not support TFO.

Isn't SSL in Wget already rather slow? Due to the way SSL_Read() is called in a SIGALRM-handler or separate Win32-thread for all (?) HTTPS reads. 'run_with_timeout()' seems to waste 1000s of good cycles per SSL-read (at least on Win32). Couldn't this perhaps be improved to use an a priori pool of, e.g., 10 alarm handlers or threads? Just my 0.02.

Hi Gisle,

this is a bit OT here. Maybe you could open another thread or a bug report?

There are not too many Windows developers around here. You are one of them and you have the knowledge to write a patch. It will likely be welcome if the improvement shows either in the code or in measurable download time.

BTW, 1000 cycles on a GHz CPU is 1 microsecond. How much does it influence the overall download duration for your use case? How often is SSL_Read called in a real-life use case (e.g. downloading 1 GB on a 2/10/50/100 Mbps connection)?

Regards, Tim
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Tim Ruehsen wrote:

Some additional thoughts:
- TFO won't work with HTTPS as long as the used SSL library does not support TFO.

Isn't SSL in Wget already rather slow? Due to the way SSL_Read() is called in a SIGALRM-handler or separate Win32-thread for all (?) HTTPS reads. 'run_with_timeout()' seems to waste 1000s of good cycles per SSL-read (at least on Win32). Couldn't this perhaps be improved to use an a priori pool of, e.g., 10 alarm handlers or threads?

Just my 0.02.

-- --gv
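The pool idea can be illustrated with a short Python sketch. This is not Wget's C code and not its run_with_timeout(); it only shows the pattern of handing each call to a worker from a pool created once, instead of starting a new thread per call. The name `pooled_run_with_timeout` and the pool size of 10 (taken from the "e.g. 10" above) are illustrative.

```python
import concurrent.futures
import time

# Created once and reused for every call, so no thread is spawned per read.
POOL = concurrent.futures.ThreadPoolExecutor(max_workers=10)


def pooled_run_with_timeout(fn, timeout, *args):
    """Run fn(*args) on a pooled worker and wait at most `timeout` seconds.

    Returns (timed_out, result). On timeout the worker is not killed;
    only the wait is abandoned, which is a real limitation of this sketch.
    """
    future = POOL.submit(fn, *args)
    try:
        return False, future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return True, None


if __name__ == "__main__":
    print(pooled_run_with_timeout(lambda: "data", 1.0))   # completes in time
    print(pooled_run_with_timeout(time.sleep, 0.05, 0.3))  # exceeds the timeout
    POOL.shutdown()
```

The open question for a real patch is the same as in this sketch: what to do with a worker that is still blocked in a read after the timeout has expired.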
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Also a pretty good view on TFO that I just stumbled upon: https://bradleyf.id.au/nix/shaving-your-rtt-wth-tfo/ Regards, Tim
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
On Thu, 30 Apr 2015, Gisle Vanem wrote: Hard to tell since I didn't find any large files I could D/L via SSL. You have one? But some quick tests (only a 48 kByte file): Here's a HTTPS URL that gives you a 40651008 bytes Firefox installation: https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox%20Setup%2037.0.2.exe ... but I would guess that the faster the network/RTT gets, the bigger diff you'll see so if you'd run a local test server and download a 1GB file or something I figure the speed difference will scale up somewhat. -- / daniel.haxx.se
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
On Thursday 30 April 2015 17:01:03 Daniel Stenberg wrote:

On Thu, 30 Apr 2015, Gisle Vanem wrote:

Hard to tell since I didn't find any large files I could D/L via SSL. You have one? But some quick tests (only a 48 kByte file):

Here's a HTTPS URL that gives you a 40651008 bytes Firefox installation:

https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox%20Setup%2037.0.2.exe

... but I would guess that the faster the network/RTT gets, the bigger diff you'll see so if you'd run a local test server and download a 1GB file or something I figure the speed difference will scale up somewhat.

Originally, Gisle talked about CPU cycles, not elapsed time. That is quite a difference...

Tim
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Tim Ruehsen wrote:

BTW, 1000 cycles on a GHz CPU is 1 micro second. How much does it influence the overall download duration for your use case ? How often is SSL_Read called in a real life use-case (e.g. downloading 1GB on a 2/10/50/100 mbps connection).

Hard to tell since I didn't find any large files I could D/L via SSL. You have one? But some quick tests (only a 48 kByte file):

wget -q -O test_ssl.html https://www.ssllabs.com/ssltest/viewMyClient.html
Elapsed: 0:00:02,35

wget -qT0 -O test_ssl.html https://www.ssllabs.com/ssltest/viewMyClient.html
Elapsed: 0:00:01,86

curl -so test_ssl.html https://www.ssllabs.com/ssltest/viewMyClient.html
Elapsed: 0:00:01,79

With '-T0' Wget shouldn't create any threads (just like libcurl), hence roughly the same speed (though that depends on many factors).

BTW, the timer is in my 4NT shell, and both Wget and curl use exactly the same OpenSSL DLLs (all built with the same 32-bit MSVC v18).

Will investigate further.

-- --gv
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
On Thu, 30 Apr 2015, Tim Ruehsen wrote: Originally, Gisle talked about CPU cycles, not elapsed time. That is quite a difference... Thousands of cycles per invoke * many invokes = measurable elapsed time -- / daniel.haxx.se
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Daniel Stenberg wrote:

On Thu, 30 Apr 2015, Tim Ruehsen wrote:

Originally, Gisle talked about CPU cycles, not elapsed time. That is quite a difference...

Thousands of cycles per invoke * many invokes = measurable elapsed time

True it seems, but I've not tried SSL times on a local net.

Some more info with the aid of the URL you provided:

wget -q -O NUL https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox%20Setup%2037.0.2.exe

results in 9931 DLL attach/detaches! For a 40 MByte file that is approx. 1 new thread per 4 kByte read. I was thinking that increasing read-buffer would help. But where? The code is bit of a mess IMHO. Increasing the Rx buffer in fd_read_body() didn't help. Is this the chief in this regard?

Without getting any numbers, I can see in 'Process Explorer' that all those run_with_timeout() calls (and no '-T0') amount to some more user+kernel time. I guess using a profiler is next. Or maybe someone knows of a Win-program that can report total CPU (kernel/user) time from the cmd-line?

BTW. My ISP gives me 25 Mbit/s in and 10 MBit/s out.

-- --gv
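The "1 new thread per 4 kByte" estimate can be checked with a line of arithmetic, using the 40651008-byte file size Daniel quoted for this installer and the 9931 attach/detach events reported above. It assumes each event corresponds to one thread, as the message does.

```python
FILE_SIZE_BYTES = 40651008  # size quoted in this thread for the Firefox installer
THREAD_EVENTS = 9931        # DLL attach/detach count reported above

bytes_per_thread = FILE_SIZE_BYTES / THREAD_EVENTS
print(f"{bytes_per_thread:.0f} bytes per thread "
      f"(about {bytes_per_thread / 1024:.1f} KiB)")
```

The result is just under 4096 bytes per thread, which is consistent with one thread being started for each read of about 4 KiB.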
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
On Thursday, 30 April 2015, 18:45:05, Daniel Stenberg wrote:

On Thu, 30 Apr 2015, Tim Ruehsen wrote:

Originally, Gisle talked about CPU cycles, not elapsed time. That is quite a difference...

Thousands of cycles per invoke * many invokes = measurable elapsed time

Again: That is quite a difference... On a 1 GHz CPU, 1 cycle is ~1 ns, which means 1000 * 1 ns = 1 us (microsecond). But if one packet comes 10 ms later (pretty normal on the network), that would be equal to ~10 million cycles (equal to about 10,000 calls to run_with_timeout, if Gisle's assumptions are right).

How could you distinguish these two, latency and wasted cycles?

On Linux even the 'time' command is helpful here (if your download is large enough to generate a CPU cycle footprint of a few ms). Much better is 'valgrind --tool=callgrind' plus a tool like kcachegrind. But Gisle is on Windows... I don't know what tools are available there.

Regards, Tim
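The cycle arithmetic above can be written out as a short worked example. The 1 GHz clock and the 10 ms delay come from the message; the figure of 1000 cycles per call is the assumption under discussion, not a measurement, and 9931 is the thread count Gisle reported for the 40 MByte download.

```python
CLOCK_HZ = 1_000_000_000   # 1 GHz CPU, so one cycle takes about 1 ns
CYCLES_PER_CALL = 1000     # assumed overhead of one run_with_timeout call
NETWORK_DELAY_S = 0.010    # a single packet arriving 10 ms late
REPORTED_CALLS = 9931      # thread count reported for the 40 MByte download

seconds_per_call = CYCLES_PER_CALL / CLOCK_HZ
delay_in_cycles = NETWORK_DELAY_S * CLOCK_HZ
equivalent_calls = delay_in_cycles / CYCLES_PER_CALL
total_overhead_s = REPORTED_CALLS * seconds_per_call

print(f"one call:          {seconds_per_call * 1e6:.1f} us")
print(f"10 ms of latency:  {delay_in_cycles:,.0f} cycles, "
      f"or {equivalent_calls:,.0f} calls")
print(f"all 9931 calls:    {total_overhead_s * 1e3:.1f} ms")
```

Under these assumptions the overhead of all 9931 calls adds up to about 10 ms, the same order as one delayed packet. If the real per-call cost is far higher than 1000 cycles, the picture changes, which is why a profiler run is the next step.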
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Gisle, Ángel, Darshit, Tim, Thank you all for the useful suggestions and links! I'll read through it and think it over. Hubert
[Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Hello developers,

My proposal for *Speed up Wget's Download Mechanism* has been accepted by the mentors! There are two tasks to be done there:
- conditional GET requests (if-modified-since) (RFC7232)
- TCP Fast Open (RFC7413)

A summarized version of my proposal is available: http://pliki.h.trsk.org/gsoc/wget_public.pdf

IMHO it is quite obvious how the first feature should be implemented in Wget. However, there is some more moving around needed to use TFO. I have proposed two possible ways in the above PDF. Perhaps you can express your opinion about the approaches, or you have another idea for accomplishing it?

Another issue I am thinking about is how to test the TFO feature. I am not very familiar with network API in Python, but my first idea would be to count the TCP segments sent and received and/or to check that the first packet (with SYN flag) contains data (the request). What do you think?

I will be thankful for any other suggestions, as well.

Have a good day,
Hubert
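For readers unfamiliar with the first task, the core of a conditional GET is small: send the local file's modification time in an If-Modified-Since header and treat a 304 reply as "keep what you have". The sketch below is in Python, not Wget's C, and is not taken from the proposal; both function names are made up for illustration.

```python
from email.utils import formatdate


def conditional_headers(mtime):
    """Build the request header for a conditional GET.

    `mtime` is the local file's modification time in seconds since the
    epoch; the header value uses the HTTP date format in GMT.
    """
    return {"If-Modified-Since": formatdate(mtime, usegmt=True)}


def should_keep_local_copy(status_code):
    """A 304 Not Modified reply carries no body: the local file is current."""
    return status_code == 304


# Example with a fixed timestamp; a client would pass os.path.getmtime(path).
print(conditional_headers(0))
```

The saving comes from the 304 case, where the server sends only headers and the client skips the download entirely.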
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
On 29/04/15 10:02, Hubert Tarasiuk wrote:

Hello developers, There are two tasks to be done there:
- TCP Fast Open (RFC7413)

I didn't know about TFO. https://lwn.net/Articles/508865/ was a nice read!

A summarized version of my proposal is available: http://pliki.h.trsk.org/gsoc/wget_public.pdf

IMHO it is quite obvious how the first feature should be implemented in Wget. However, there is some more moving around needed to use TFO. I have proposed two possible ways in the above PDF. Perhaps you can express your opinion about the approaches, or you have another idea for accomplishing it?

So you are asking us about using sendto vs sendmsg? I don't know if the kernel supports this as a TFO knob*, but using TCP_CORK (as suggested in the lwn comments) would be a much saner way (with benefits on non-TFO machines, too).

* It would make sense for setsockopt(TCP_FASTOPEN|TCP_CORK) to do this.

Another issue I am thinking about is how to test the TFO feature. I am not very familiar with network API in Python, but my first idea would be to count the TCP segments sent and received and/or to check that the first packet (with SYN flag) contains data (the request). What do you think?

This is very hard. If you have a remote server supporting TFO, you can easily verify whether it's being used with a network sniffer. But I don't expect the check to be easy to automate, especially for localhost.
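The check "does the SYN carry data" is simple once a raw TCP segment is in hand; capturing that segment is the hard part and is not covered here. The sketch below assumes the bytes of one TCP segment (header plus payload, without the IP header) obtained from some capture tool. The function names and the sample ports are made up for illustration.

```python
import struct

SYN = 0x02  # SYN bit in the TCP flags byte


def syn_carries_data(segment):
    """Return True if a raw TCP segment has SYN set and a non-empty payload."""
    if len(segment) < 20:
        raise ValueError("shorter than a minimal TCP header")
    data_offset = (segment[12] >> 4) * 4  # header length in bytes
    flags = segment[13]
    return bool(flags & SYN) and len(segment) > data_offset


def make_segment(flags, payload=b""):
    """Build a 20-byte TCP header without options, followed by the payload.

    Ports, sequence numbers and checksum are placeholder values.
    """
    header = struct.pack("!HHIIBBHHH",
                         40000, 80,   # source and destination port
                         0, 0,        # sequence and acknowledgement numbers
                         5 << 4,      # data offset: 5 words, i.e. 20 bytes
                         flags,
                         65535, 0, 0)  # window, checksum, urgent pointer
    return header + payload


print(syn_carries_data(make_segment(SYN, b"GET / HTTP/1.1\r\n\r\n")))
print(syn_carries_data(make_segment(SYN)))
```

Note that under RFC 7413 a client first has to obtain a cookie from the server, so the SYN of the very first connection to a host is not expected to carry the request. A test would have to look at a second connection.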
Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism
Hi Hubert!

Congrats on your selection. I look forward to a great summer of code in Wget this time around.

On 04/29, Hubert Tarasiuk wrote:

Hello developers,

My proposal for *Speed up Wget's Download Mechanism* has been accepted by the mentors! There are two tasks to be done there:
- conditional GET requests (if-modified-since) (RFC7232)
- TCP Fast Open (RFC7413)

A summarized version of my proposal is available: http://pliki.h.trsk.org/gsoc/wget_public.pdf

IMHO it is quite obvious how the first feature should be implemented in Wget. However, there is some more moving around needed to use TFO. I have proposed two possible ways in the above PDF. Perhaps you can express your opinion about the approaches, or you have another idea for accomplishing it?

There are two separate points I want to make here:

1. With respect to the changes in the Wget source, I think it is saner to merge the connect methods. Just ensure that we can handle proxies and FTP connections without any code duplication. I don't think there should be anything special when making an HTTPS connection?

2. Regarding the socket options, we should spend some more time evaluating our options. My understanding of TCP_CORK is that it may be a useful option for servers, but it doesn't really affect TCP clients in any useful way. This is because TCP_CORK modifies the minimum TCP packet size by buffering as much data as possible before sending it out. With the small request sizes that an HTTP client would generally send, I think it is better to follow Nagle's algorithm, since TCP_CORK will not afford us any noticeable advantage. On the other hand, its non-portability will be a nightmare for us when trying to support OSX, BSD and Windows.

Another issue I am thinking about is how to test the TFO feature. I am not very familiar with network API in Python, but my first idea would be to count the TCP segments sent and received and/or to check that the first packet (with SYN flag) contains data (the request). What do you think?

I haven't gone through this code thoroughly yet, but they tried to reproduce the results of the original TFO whitepaper using a Python HTTP Server, like the one we use for our test suite. Maybe we can borrow some code from them?

I will be thankful for any other suggestions, as well.

Have a good day,
Hubert

-- Thanking You, Darshit Shah