Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-05-06 Thread Hubert Tarasiuk
I have set up a repository on Github where you can view progress within
the project: https://github.com/jy987321/Wget/

Whenever I have a useful piece of code, I will submit a patch for
it to the main repository (to avoid merge conflicts in the future).

Otherwise I will try to rebase my changes onto the current state of Wget, so
expect force pushes to the master-hubert branch.

On 29.04.2015 at 10:02, Hubert Tarasiuk wrote:
 Hello developers,
 
 My proposal for *Speed up Wget's Download Mechanism* has been accepted
 by the mentors!
 
 There are two tasks to be done there:
 - conditional GET requests (if-modified-since) (RFC7232)
 - TCP Fast Open (RFC7413)
 
 A summarized version of my proposal is available:
 http://pliki.h.trsk.org/gsoc/wget_public.pdf
 
 IMHO it is quite obvious how the first feature should be implemented in
 Wget. However, there is some more moving around needed to use TFO. I
 have proposed two possible ways in the above PDF. Perhaps you can
 express your opinion about the approaches, or you have another idea for
 accomplishing it?
 
 Another issue I am thinking about is how to test the TFO feature. I am
 not very familiar with network API in Python, but my first idea would be
 to count the TCP segments sent and received and/or to check that the
 first packet (with SYN flag) contains data (the request). What do you think?
 
 I will be thankful for any other suggestions, as well.
 
 Have a good day,
 Hubert
 
 





Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-05-04 Thread Ángel González

Darshit Shah wrote:
2. Regarding the socket options, we should spend some more time
evaluating our options. My understanding of TCP_CORK is that it may
be a useful option for servers, but it doesn't really affect TCP
clients in any useful way. This is because TCP_CORK modifies the
minimum TCP packet size by buffering as much data as possible before
sending it out. With the small request sizes that an HTTP client would
generally send, I think it is better to follow Nagle's algorithm,
since TCP_CORK will not afford us any noticeable advantage. On the
other hand, its non-portability will be a nightmare for us when
trying to support OSX, BSD and Windows.
Most requests won't need TCP_CORK, as the request is already small 
enough to fit in a single packet (however, it may still be useful for 
providing the needed TFO hints to the kernel).
Setting and unsetting a socket option is a couple of calls that can easily
be hidden behind an #ifdef. Changing the whole application to sendto() is
much more invasive, and I would be quite surprised if it worked cleanly
on Windows.

(In addition to TFO already being Linux-only :P)
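
As a rough illustration of the "couple of calls behind a guard" idea, here is a minimal Python sketch (Wget itself is C, where the guard would be an #ifdef). The helper names set_cork and send_request_corked are hypothetical and not part of Wget; the sketch assumes that the platform either exposes TCP_CORK or silently falls back to plain sends.

```python
import socket


def set_cork(sock, enabled):
    """Try to toggle TCP_CORK on `sock`; return True if the option was applied."""
    # Python only defines socket.TCP_CORK where the platform has it
    # (the equivalent of an #ifdef TCP_CORK guard in C).
    cork = getattr(socket, "TCP_CORK", None)
    if cork is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 1 if enabled else 0)
    except OSError:
        # Not a TCP socket, or the option is not supported here.
        return False
    return True


def send_request_corked(sock, parts):
    """Send several buffers, corking the socket around them when possible."""
    corked = set_cork(sock, True)
    try:
        for part in parts:
            sock.sendall(part)
    finally:
        if corked:
            # Removing the cork lets the kernel flush whatever it buffered.
            set_cork(sock, False)
```

On systems without TCP_CORK the function degrades to ordinary sendall() calls, which is the portability property being argued for here.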





Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-05-01 Thread Eli Zaretskii
 Date: Fri, 01 May 2015 11:01:15 +0200
 From: Gisle Vanem gva...@yahoo.no
 CC: bug-wget@gnu.org
 
 Eli Zaretskii wrote:
 
  I don't see any threads created by run_with_timeout on my system, when
  I download the above URL.  In fact, if I set a breakpoint in
  run_with_timeout, the only 2 calls to it during the whole download are
  from getaddrinfo_with_timeout and from connect_with_timeout, both with
  timeout of zero, which calls the function synchronously, both on
  Windows and on Posix hosts.
 
  So I guess I don't see what Gisle describes as separate thread for
  HTTPS reads.  What am I missing?
 
 It depends on what 'opt.read_timeout' is globally. So
 don't use read-timeout = 0 in your wgetrc.

I don't have any such settings in my ~/.wgetrc or in system-global
wgetrc, and the value of opt.read_timeout is the default 900 if I look
at it in the debugger.

The reason seems to be different: my reading of gnutls.c is that when
Wget uses GnuTLS for HTTPS connections, it indeed doesn't call
run_with_timeout.  Instead, wgnutls_read_timeout calls 'select' with
the timeout value, so the thread-launching code in run_with_timeout is
not used in this case.

By contrast, the equivalent code in openssl.c does call
run_with_timeout.

So perhaps you should try building Wget with GnuTLS, and see if you
get any perceptible speed-up.
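
To make the difference concrete, here is a small Python sketch of the select-based approach described above: wait for readability with a timeout, then read, with no helper thread involved. It is only an illustration of the pattern, not a translation of gnutls.c; the function name read_with_timeout is hypothetical, and a real TLS reader would also have to account for data the TLS library has already buffered.

```python
import select
import socket


def read_with_timeout(sock, bufsize, timeout):
    """Wait up to `timeout` seconds for data on `sock`, then read it.

    Returns the bytes read, or None if nothing arrived in time.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out without starting any thread
    return sock.recv(bufsize)
```

Each call costs one select() plus one recv(), rather than a thread creation and teardown per read.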



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-05-01 Thread Gisle Vanem

Eli Zaretskii wrote:


I don't see any threads created by run_with_timeout on my system, when
I download the above URL.  In fact, if I set a breakpoint in
run_with_timeout, the only 2 calls to it during the whole download are
from getaddrinfo_with_timeout and from connect_with_timeout, both with
timeout of zero, which calls the function synchronously, both on
Windows and on Posix hosts.

So I guess I don't see what Gisle describes as separate thread for
HTTPS reads.  What am I missing?


It depends on what 'opt.read_timeout' is globally. So
don't use read-timeout = 0 in your wgetrc.


--
--gv



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-05-01 Thread Eli Zaretskii
 Date: Thu, 30 Apr 2015 19:30:39 +0200
 From: Gisle Vanem gva...@yahoo.no
 CC: bug-wget@gnu.org
 
 wget -q -O NUL "https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox Setup 37.0.2.exe"
 
 results in 9931 DLL attach/detaches!
 
 For a 40 MByte file that is approx. 1 new thread per 4 kByte read.

I don't see any threads created by run_with_timeout on my system, when
I download the above URL.  In fact, if I set a breakpoint in
run_with_timeout, the only 2 calls to it during the whole download are
from getaddrinfo_with_timeout and from connect_with_timeout, both with
timeout of zero, which calls the function synchronously, both on
Windows and on Posix hosts.

So I guess I don't see what Gisle describes as separate thread for
HTTPS reads.  What am I missing?

My Wget is built with GnuTLS, if that matters.

I guess the timing results I sent are not really interesting, given
that no extra threads are involved.



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-05-01 Thread Eli Zaretskii
 Date: Thu, 30 Apr 2015 19:30:39 +0200
 From: Gisle Vanem gva...@yahoo.no
 CC: bug-wget@gnu.org
 
 wget -q -O NUL "https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox Setup 37.0.2.exe"
 
 results in 9931 DLL attach/detaches!
 
 For a 40 MByte file that is approx. 1 new thread per 4 kByte read.
 I was thinking that increasing read-buffer would help. But where?
 The code is bit of a mess IMHO. Increasing the Rx buffer in
 fd_read_body() didn't help. Is this the chief in this regard?
 
 Without getting any numbers, I can see in 'Process Explorer'
 that all those run_with_timeout() calls (and no '-T0') amount
 to some more user+kernel time. I guess using a profiler is next.
 Or maybe someone knows of a Win-program that can report total
 CPU (kernel/user) time from the cmd-line?

'timep' from Windows System Programming can (let me know if you want
the source I use).  This is based on an average of 2 runs of Wget 1.16.1
running on 32-bit Windows XP, with a 30 Mbit/sec cable connection:

  real  00h01m56.500s
  user  00h00m00.823s
  sys   00h00m00.355s

And here's the same from a GNU/Linux machine that downloads at 20.7
MB/sec:

  real  0m2.300s
  user  0m1.600s
  sys   0m0.6000s

 BTW. My ISP gives me 25 Mbit/s in and 10 MBit/s out.

See above; removing -q from the command line indicates that the actual
download speed for this file is around 500 KB/sec.



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-05-01 Thread Eli Zaretskii
 Date: Thu, 30 Apr 2015 19:30:39 +0200
 From: Gisle Vanem gva...@yahoo.no
 CC: bug-wget@gnu.org
 
 wget -q -O NUL "https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox Setup 37.0.2.exe"
 
 results in 9931 DLL attach/detaches!
 
 For a 40 MByte file that is approx. 1 new thread per 4 kByte read.

I don't see any threads created by run_with_timeout on my system, when
I download the above URL.  In fact, if I set a breakpoint in
run_with_timeout, the only 2 calls to it during the whole download are
from getaddrinfo_with_timeout and from connect_with_timeout, both with
timeout of zero, which calls the function synchronously, both on
Windows and on Posix hosts.

So I guess I don't see what Gisle describes as separate thread for
HTTPS reads.  What am I missing?

My Wget is built with GnuTLS, if that matters.

I guess the Windows timing results I sent are not really interesting,
given that no extra threads are involved.



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Tim Ruehsen
Hi Hubert,

Congrats on being selected for GSoC!

Here is a researchers' report on testing TFO:
https://reproducingnetworkresearch.wordpress.com/2014/06/03/cs244-14-tcp-fast-open-2/

Some additional thoughts:
- TFO won't work with HTTPS as long as the used SSL library does not support 
TFO.
- Servers might not be ready to detect network-doubled SYN packets. No problem 
with GET (well, there might be corner cases or misconfigurations), but might 
be a general problem with POST.
- I am not aware how to (programmatically) detect TFO on the (test) server 
side. Maybe there is a way via /proc file system. Maybe a question for the 
kernel mailing list !?
- I recently made a patch for torify/torsocks to catch sendto(). It was 
accepted, but I am not sure if they made a new release already. At least on 
not-up-to-date systems, TFO would leak (circumvent tor network). We should 
keep this in mind for documentation.
- Because of these points, TFO should not be enabled by default.
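
On the /proc question in the list above, one possible starting point is sketched below in Python. It assumes a Linux kernel that exposes counters whose names begin with TCPFastOpen in the TcpExt section of /proc/net/netstat; whether those counters are sufficient for an automated test is an open question. The function names are hypothetical.

```python
def parse_tfo_counters(netstat_text):
    """Extract TCPFastOpen* counters from the text of /proc/net/netstat.

    The file alternates a header line of counter names with a line of values,
    each prefixed by the section name (for example "TcpExt:").
    """
    counters = {}
    lines = netstat_text.splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        names = header.split()
        numbers = values.split()
        if not names or names[0] != "TcpExt:":
            continue
        for name, number in zip(names[1:], numbers[1:]):
            if name.startswith("TCPFastOpen"):
                counters[name] = int(number)
    return counters


def read_tfo_counters(path="/proc/net/netstat"):
    """Read the counters from the running kernel (Linux only)."""
    with open(path) as handle:
        return parse_tfo_counters(handle.read())
```

A test could read the counters before and after a download and compare them, assuming nothing else on the machine is using TFO at the same time.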

TFO is an interesting technology. RTT is an issue that becomes more relevant
with faster servers and faster networks. Even at the speed of light, you
have an RTT of ~130 ms from one side of the world to the other: 20,000 km there
and back. TFO is part of the answer :-)
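
The RTT estimate can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes a straight path at the vacuum speed of light; real links are slower.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in vacuum


def round_trip_ms(one_way_km):
    """Lower bound on the round-trip time, in milliseconds, for a given distance."""
    return 2 * one_way_km / SPEED_OF_LIGHT_KM_PER_S * 1000


print(f"{round_trip_ms(20_000):.1f} ms")
```

For 20,000 km each way this gives a little over 133 ms, in line with the ~130 ms figure.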

Have fun !

Tim

On Wednesday 29 April 2015 10:02:48 Hubert Tarasiuk wrote:
 Hello developers,
 
 My proposal for *Speed up Wget's Download Mechanism* has been accepted
 by the mentors!
 
 There are two tasks to be done there:
 - conditional GET requests (if-modified-since) (RFC7232)
 - TCP Fast Open (RFC7413)
 
 A summarized version of my proposal is available:
 http://pliki.h.trsk.org/gsoc/wget_public.pdf
 
 IMHO it is quite obvious how the first feature should be implemented in
 Wget. However, there is some more moving around needed to use TFO. I
 have proposed two possible ways in the above PDF. Perhaps you can
 express your opinion about the approaches, or you have another idea for
 accomplishing it?
 
 Another issue I am thinking about is how to test the TFO feature. I am
 not very familiar with network API in Python, but my first idea would be
 to count the TCP segments sent and received and/or to check that the
 first packet (with SYN flag) contains data (the request). What do you think?
 
 I will be thankful for any other suggestions, as well.
 
 Have a good day,
 Hubert




Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Tim Ruehsen
On Thursday 30 April 2015 13:35:06 Gisle Vanem wrote:
 Tim Ruehsen wrote:
  Some additional thoughts:
  - TFO won't work with HTTPS as long as the used SSL library does not
  support TFO.
 
 Isn't SSL in Wget already rather slow? Due to the way SSL_Read()
 is called in a SIGALRM-handler or separate Win32-thread for
 all (?) HTTPS reads.
 
 'run_with_timeout()' seems to waste 1000s of good cycles per
 SSL-read (at least on Win32). Couldn't perhaps this be improved
 to do use a priori pool of e.g. 10 alarm-handlers or threads?
 Just my €0.02.

Hi Gisle,

this is a bit OT here.

Maybe you could open another thread or a bug report?
There are not too many Windows developers around here. You are one of them and
you have the knowledge to write a patch. It will likely be welcome if the
improvement shows either in the code or in measurable download time.

BTW, 1000 cycles on a GHz CPU is 1 microsecond. How much does it influence
the overall download duration for your use case? How often is SSL_Read called
in a real-life use case (e.g. downloading 1 GB on a 2/10/50/100 Mbps
connection)?

Regards, Tim




Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Gisle Vanem

Tim Ruehsen wrote:


Some additional thoughts:
- TFO won't work with HTTPS as long as the used SSL library does not support
TFO.


Isn't SSL in Wget already rather slow? Due to the way SSL_Read()
is called in a SIGALRM-handler or separate Win32-thread for
all (?) HTTPS reads.

'run_with_timeout()' seems to waste 1000s of good cycles per
SSL-read (at least on Win32). Couldn't this perhaps be improved
to use a pre-created pool of e.g. 10 alarm-handlers or threads?
Just my €0.02.
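
The pool idea can be sketched in Python as follows. This is not Wget code: run_with_timeout_pooled is a hypothetical name, and the sketch only shows the general pattern of reusing a fixed set of worker threads instead of starting a new one for each call.

```python
import concurrent.futures

# Ten long-lived workers, created once and reused for every call.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=10)


def run_with_timeout_pooled(timeout, func, *args):
    """Run func(*args) on a pooled thread.

    Returns (timed_out, result); result is None when the call timed out.
    """
    future = _POOL.submit(func, *args)
    try:
        return False, future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The worker is still busy with the call; it returns to the pool
        # only when func() eventually finishes.
        return True, None
```

One design caveat: a call that times out keeps occupying its worker, so a pool of 10 can be exhausted by 10 stuck reads.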

--
--gv



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Tim Ruehsen
Also a pretty good view on TFO that I just stumbled upon:

https://bradleyf.id.au/nix/shaving-your-rtt-wth-tfo/

Regards, Tim




Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Daniel Stenberg

On Thu, 30 Apr 2015, Gisle Vanem wrote:

Hard to tell since I didn't find any large files I could D/L via SSL. You 
have one? But some quick tests (only a 48 kByte file):


Here's a HTTPS URL that gives you a 40651008 bytes Firefox installation:

https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox%20Setup%2037.0.2.exe

... but I would guess that the faster the network/RTT gets, the bigger diff 
you'll see so if you'd run a local test server and download a 1GB file or 
something I figure the speed difference will scale up somewhat.


--

 / daniel.haxx.se



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Tim Ruehsen
On Thursday 30 April 2015 17:01:03 Daniel Stenberg wrote:
 On Thu, 30 Apr 2015, Gisle Vanem wrote:
  Hard to tell since I didn't find any large files I could D/L via SSL. You
 
  have one? But some quick tests (only a 48 kByte file):
 Here's a HTTPS URL that gives you a 40651008 bytes Firefox installation:
 
 https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32
 /en-GB/Firefox%20Setup%2037.0.2.exe
 
 ... but I would guess that the faster the network/RTT gets, the bigger diff
 you'll see so if you'd run a local test server and download a 1GB file or
 something I figure the speed difference will scale up somewhat.

Originally, Gisle talked about CPU cycles, not elapsed time.
That is quite a difference...

Tim




Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Gisle Vanem

Tim Ruehsen wrote:


BTW, 1000 cycles on a GHz CPU is 1 micro second. How much does it influence
the overall download duration for your use case ? How often is SSL_Read called
in a real life use-case (e.g. downloading 1GB on a 2/10/50/100 mbps
connection).


Hard to tell since I didn't find any large files I could D/L via SSL.
You have one? But some quick tests (only a 48 kByte file):

  wget -q -O test_ssl.html https://www.ssllabs.com/ssltest/viewMyClient.html
  Elapsed: 0:00:02,35

  wget -qT0 -O test_ssl.html https://www.ssllabs.com/ssltest/viewMyClient.html
  Elapsed: 0:00:01,86

  curl -so test_ssl.html https://www.ssllabs.com/ssltest/viewMyClient.html
  Elapsed: 0:00:01,79

'-T0' shouldn't create any threads (like libcurl does). Hence the same
speed (but depends on many factors).

BTW, the timer is in my 4NT shell, and both Wget and curl use exactly the
  same OpenSSL DLLs (all built with the same 32-bit MSVC v18).

Will investigate further.

--
--gv



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Daniel Stenberg

On Thu, 30 Apr 2015, Tim Ruehsen wrote:


Originally, Gisle talked about CPU cycles, not elapsed time.
That is quite a difference...


Thousands of cycles per invoke * many invokes = measurable elapsed time

--

 / daniel.haxx.se



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Gisle Vanem

Daniel Stenberg wrote:


On Thu, 30 Apr 2015, Tim Ruehsen wrote:


Originally, Gisle talked about CPU cycles, not elapsed time.
That is quite a difference...


Thousands of cycles per invoke * many invokes = measurable elapsed time


True it seems, but I've not tried SSL times on a local net.
Some more info with the aid of the URL you provided:

wget -q -O NUL "https://download-installer.cdn.mozilla.net/pub/firefox/releases/37.0.2/win32/en-GB/Firefox Setup 37.0.2.exe"

results in 9931 DLL attach/detaches!

For a 40 MByte file that is approx. 1 new thread per 4 kByte read.
I was thinking that increasing read-buffer would help. But where?
The code is a bit of a mess IMHO. Increasing the Rx buffer in
fd_read_body() didn't help. Is this the main place to look in this regard?

Without getting any numbers, I can see in 'Process Explorer'
that all those run_with_timeout() calls (and no '-T0') amount
to some more user+kernel time. I guess using a profiler is next.
Or maybe someone knows of a Win-program that can report total
CPU (kernel/user) time from the cmd-line?

BTW. My ISP gives me 25 Mbit/s in and 10 MBit/s out.
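
The "1 new thread per 4 kByte" estimate follows from simple division. The Python sketch below uses the 40651008-byte file size quoted elsewhere in this thread for the same installer, and assumes that each DLL attach/detach corresponds to one thread.

```python
FILE_SIZE_BYTES = 40_651_008  # installer size quoted in this thread
THREAD_EVENTS = 9_931         # DLL attach/detach count reported above

# Average number of payload bytes read per thread, under the assumption above.
bytes_per_thread = FILE_SIZE_BYTES / THREAD_EVENTS
print(f"{bytes_per_thread:.0f} bytes per thread")
```

The result is just under 4096 bytes, so roughly one thread per 4 kByte read.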

--
--gv



Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Tim Rühsen
On Thursday, 30 April 2015, 18:45:05, Daniel Stenberg wrote:
 On Thu, 30 Apr 2015, Tim Ruehsen wrote:
  Originally, Gisle talked about CPU cycles, not elapsed time.
  That is quite a difference...
 
 Thousands of cycles per invoke * many invokes = measurable elapsed time

Again: That is quite a difference...

1 GHz CPU: 1 cycle ≈ 1 ns, which means 1000 * 1 ns = 1 µs (microsecond).
But if one packet arrives 10 ms later (pretty normal on the network), that would
be equal to ~10 million cycles (equal to about 10,000 calls to
run_with_timeout, if Gisle's assumptions are right).
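
The same arithmetic as a short Python sketch, assuming a 1 GHz clock and 1000 cycles of overhead per call as in the discussion:

```python
CLOCK_HZ = 1_000_000_000   # 1 GHz, so one cycle takes 1 ns
CYCLES_PER_CALL = 1_000    # assumed overhead of one run_with_timeout call
NETWORK_DELAY_MS = 10      # one packet arriving 10 ms late

overhead_seconds = CYCLES_PER_CALL / CLOCK_HZ          # time lost per call
delay_cycles = CLOCK_HZ * NETWORK_DELAY_MS // 1000     # cycles spent waiting
equivalent_calls = delay_cycles // CYCLES_PER_CALL     # calls that fit in the delay

print(overhead_seconds, delay_cycles, equivalent_calls)
```

One late packet therefore costs as much wall-clock time as about ten thousand such calls.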

How could you distinguish these two, latency and wasted cycles ?

On Linux even the 'time' command is helpful here (if your download is
large enough to generate a CPU cycle footprint of more than a few ms).
Much better is 'valgrind --tool=callgrind' plus a tool like kcachegrind.
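
For a scripted version of what 'time' reports, a Python sketch along these lines separates wall-clock time from CPU time for a child process. It relies on the Unix-only resource module, so it does not help on Windows, and time_command is a hypothetical helper name.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run a command and return its real, user and sys times in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=False,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,                                # includes network latency
        "user": after.ru_utime - before.ru_utime,    # CPU time in user mode
        "sys": after.ru_stime - before.ru_stime,     # CPU time in the kernel
    }


if __name__ == "__main__":
    print(time_command([sys.executable, "-c", "pass"]))
```

A large gap between "real" and "user" + "sys" points to waiting on the network rather than wasted cycles.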

But Gisle is on Windows... I don't know what tools are available there.

Regards, Tim




Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-30 Thread Hubert Tarasiuk
Gisle, Ángel, Darshit, Tim,
Thank you all for the useful suggestions and links! I'll read through them
and think it over.

Hubert


[Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-29 Thread Hubert Tarasiuk
Hello developers,

My proposal for *Speed up Wget's Download Mechanism* has been accepted
by the mentors!

There are two tasks to be done there:
- conditional GET requests (if-modified-since) (RFC7232)
- TCP Fast Open (RFC7413)
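
For the first task, the core of a conditional GET can be sketched in Python as below. This is an illustration under assumptions, not the planned Wget implementation: the helper names are hypothetical, and the local file's modification time is assumed to be the timestamp sent to the server.

```python
import os
from email.utils import formatdate


def if_modified_since_header(mtime):
    """Build the If-Modified-Since header for a Unix timestamp (HTTP-date, GMT)."""
    return {"If-Modified-Since": formatdate(mtime, usegmt=True)}


def conditional_headers_for(path):
    """Headers for re-fetching `path`; empty if there is no local copy yet."""
    if not os.path.exists(path):
        return {}
    return if_modified_since_header(os.path.getmtime(path))


def should_download(status_code):
    """A 304 Not Modified response means the local copy can be kept."""
    return status_code != 304
```

With this, an unchanged file costs one request and a short 304 response instead of a full transfer.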

A summarized version of my proposal is available:
http://pliki.h.trsk.org/gsoc/wget_public.pdf

IMHO it is quite obvious how the first feature should be implemented in
Wget. However, there is some more moving around needed to use TFO. I
have proposed two possible ways in the above PDF. Perhaps you can
express your opinion about the approaches, or you have another idea for
accomplishing it?

Another issue I am thinking about is how to test the TFO feature. I am
not very familiar with network API in Python, but my first idea would be
to count the TCP segments sent and received and/or to check that the
first packet (with SYN flag) contains data (the request). What do you think?
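
The second testing idea, checking that the first SYN carries data, could look roughly like this in Python once a raw packet has been captured. Capturing is out of scope here; the sketch assumes raw IPv4 packets without a link-layer header, and both function names are hypothetical.

```python
import struct

TCP_PROTOCOL = 6
SYN, ACK = 0x02, 0x10


def syn_payload_length(packet):
    """Return the TCP payload length of an initial SYN, or None for anything else."""
    if len(packet) < 20 or packet[0] >> 4 != 4 or packet[9] != TCP_PROTOCOL:
        return None  # too short, not IPv4, or not TCP
    ip_header_len = (packet[0] & 0x0F) * 4
    total_len = struct.unpack("!H", packet[2:4])[0]
    tcp = packet[ip_header_len:total_len]
    if len(tcp) < 20:
        return None
    tcp_header_len = (tcp[12] >> 4) * 4  # data offset, includes TCP options
    flags = tcp[13]
    if not flags & SYN or flags & ACK:
        return None  # only the client's initial SYN is of interest
    return len(tcp) - tcp_header_len


def demo_packet(flags, payload=b""):
    """Build a minimal IPv4/TCP packet for trying out the check (checksums left at 0)."""
    loopback = bytes([127, 0, 0, 1])
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40 + len(payload), 0, 0, 64,
                     TCP_PROTOCOL, 0, loopback, loopback)
    tcp = struct.pack("!HHIIBBHHH", 40000, 80, 0, 0, 5 << 4, flags, 65535, 0, 0)
    return ip + tcp + payload
```

A test would then assert that syn_payload_length() of the first captured packet is greater than zero when TFO is in use, and zero for an ordinary handshake.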

I will be thankful for any other suggestions, as well.

Have a good day,
Hubert






Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-29 Thread Ángel González

On 29/04/15 10:02, Hubert Tarasiuk wrote:

Hello developers,



There are two tasks to be done there:



- TCP Fast Open (RFC7413)

I didn't know about TFO. https://lwn.net/Articles/508865/ was a nice read!


A summarized version of my proposal is available:
http://pliki.h.trsk.org/gsoc/wget_public.pdf

IMHO it is quite obvious how the first feature should be implemented in
Wget. However, there is some more moving around needed to use TFO. I
have proposed two possible ways in the above PDF. Perhaps you can
express your opinion about the approaches, or you have another idea for
accomplishing it?
So you are asking us about using sendto vs sendmsg? I don't know if the
kernel supports this as a TFO knob*, but using TCP_CORK (as suggested in the
lwn comments) would be a much saner way (with benefits on non-TFO
machines, too).

* It would make sense if setsockopt(TCP_FASTOPEN|TCP_CORK) did this.



Another issue I am thinking about is how to test the TFO feature. I am
not very familiar with network API in Python, but my first idea would be
to count the TCP segments sent and received and/or to check that the
first packet (with SYN flag) contains data (the request). What do you think?
This is very hard. If you have a remote server supporting TFO, you can
easily verify whether it's being used with a network sniffer. But I don't
expect the check to be easy to automate, especially for localhost.





Re: [Bug-wget] GSoC15: Speed up Wget's Download Mechanism

2015-04-29 Thread Darshit Shah

Hi Hubert!

Congrats on your selection. I look forward to a great summer of code in Wget 
this time around.


On 04/29, Hubert Tarasiuk wrote:

Hello developers,

My proposal for *Speed up Wget's Download Mechanism* has been accepted
by the mentors!

There are two tasks to be done there:
- conditional GET requests (if-modified-since) (RFC7232)
- TCP Fast Open (RFC7413)

A summarized version of my proposal is available:
http://pliki.h.trsk.org/gsoc/wget_public.pdf

IMHO it is quite obvious how the first feature should be implemented in
Wget. However, there is some more moving around needed to use TFO. I
have proposed two possible ways in the above PDF. Perhaps you can
express your opinion about the approaches, or you have another idea for
accomplishing it?


There are two separate points I want to make here:

1. With respect to the changes in the Wget source, I think it is saner to merge 
  the connect methods. Just ensure that we can handle proxies and FTP 
  connections without any code duplication. I don't think there should be 
  anything special when making a HTTPS connection?
2. Regarding the socket options, we should spend some more time evaluating our
  options. My understanding of TCP_CORK is that it may be a useful option for
  servers, but it doesn't really affect TCP clients in any useful way. This is
  because TCP_CORK modifies the minimum TCP packet size by buffering as much
  data as possible before sending it out. With the small request sizes that an
  HTTP client would generally send, I think it is better to follow Nagle's
  algorithm, since TCP_CORK will not afford us any noticeable advantage. On the
  other hand, its non-portability will be a nightmare for us when trying to
  support OSX, BSD and Windows.




Another issue I am thinking about is how to test the TFO feature. I am
not very familiar with network API in Python, but my first idea would be
to count the TCP segments sent and received and/or to check that the
first packet (with SYN flag) contains data (the request). What do you think?

I haven't gone through this code thoroughly yet, but they tried to reproduce the 
results of the original TFO whitepaper using a Python HTTP Server, like the one 
we use for our test suite. Maybe we can borrow some code from them?



I will be thankful for any other suggestions, as well.

Have a good day,
Hubert






--
Thanking You,
Darshit Shah

