Re: [Bug-wget] bug in socket reuse when using wget -c

2017-12-15 Thread Darshit Shah
I've merged the above patches to master. They will be available with the next
version of Wget

* Darshit Shah  [171208 18:47]:
> Hi,
> 
> Thanks for your report. It is indeed a bug in Wget, as you've rightfully
> investigated. The socket still had some data which caused the next request to
> have problems.
> 
> I've attached two patches here, the first one fixes the issue. It tries to 
> read
> and discard any HTTP body still available and then re-use the socket. The
> second patch adds a test case for this scenario
> 
> * Iru Cai  [171208 17:19]:
> > Hello wget developers,
> > 
> > I found an issue when using `wget -c`, as in:
> > 
> >   https://github.com/mholt/caddy/issues/1965#issuecomment-349220927
> > 
> > By checking out the wget source code, I can confirm that it doesn't
> > drain the response body when it meets a 416 Requested Range Not
> > Satisfiable, and then the socket will be reused for the second request
> > (http get 2.dat in this case). When parse its response, it will
> > encounter the first response's body, so it failed to get the correct
> > response header. This is why you get a blank response header.
> > 
> > Hope this can be fixed.
> > 
> > Thanks,
> > Iru
> 
> 
> 
> -- 
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6

> From a0ffc151036c3d63f153ab3a3d8a30994c47fedf Mon Sep 17 00:00:00 2001
> From: Darshit Shah 
> Date: Fri, 8 Dec 2017 18:13:00 +0100
> Subject: [PATCH 1/2] Don't assume a 416 response has no body
> 
> * http.c(gethttp): In case of a 416 response, try to drain the socket of
> any bytes before reusing the connection
> 
> Reported-By: Iru Cai 
> ---
>  src/http.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/src/http.c b/src/http.c
> index 95d26258..e4ff0107 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -3969,11 +3969,16 @@ gethttp (const struct url *u, struct url 
> *original_url, struct http_stat *hs,
>hs->res = 0;
>/* Mark as successfully retrieved. */
>*dt |= RETROKF;
> -  if (statcode == HTTP_STATUS_RANGE_NOT_SATISFIABLE)
> +
> +  /* Try to maintain the keep-alive connection. It is often cheaper to
> +   * consume some bytes which have already been sent than to negotiate
> +   * a new connection. However, if the body is too large, or we don't
> +   * care about keep-alive, then simply terminate the connection */
> +  if (keep_alive &&
> +  skip_short_body (sock, contlen, chunked_transfer_encoding))
>  CLOSE_FINISH (sock);
>else
> -CLOSE_INVALIDATE (sock);/* would be CLOSE_FINISH, but there
> -   might be more bytes in the body. */
> +CLOSE_INVALIDATE (sock);
>retval = RETRUNNEEDED;
>goto cleanup;
>  }
> -- 
> 2.15.1
> 

> From c17b04767a1b58ee8f9db53af431ef1e63b5 Mon Sep 17 00:00:00 2001
> From: Darshit Shah 
> Date: Fri, 8 Dec 2017 18:41:07 +0100
> Subject: [PATCH 2/2] Add new test for 416 responses
> 
> * testenv/server/http/http_server.py: If there are multiple requests in
> which the requested range is unsatisfiable, then send a body in the in
> the 2nd response onwards
> * testenv/Test-416.py: New test to check how Wget handles 416 responses
> ---
>  testenv/Test-416.py| 53 
> ++
>  testenv/server/http/http_server.py |  8 ++
>  2 files changed, 61 insertions(+)
>  create mode 100755 testenv/Test-416.py
> 
> diff --git a/testenv/Test-416.py b/testenv/Test-416.py
> new file mode 100755
> index ..76b94213
> --- /dev/null
> +++ b/testenv/Test-416.py
> @@ -0,0 +1,53 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from misc.wget_file import WgetFile
> +
> +"""
> +Ensure that Wget behaves well when the server responds with a HTTP 416
> +status code. This test checks both cases:
> +1. Server sends no body
> +2. Server sends a body
> +"""
> +# File Definitions 
> ###
> +File1 = 
> "abababababababababababababababababababababababababababababababababab"
> +File2 = "ababababababababababababababababababab"
> +
> +A_File = WgetFile ("File1", File1)
> +B_File = WgetFile ("File1", File1)
> +
> +C_File = WgetFile ("File2", File2)
> +D_File = WgetFile ("File2", File1)
> +
> +E_File = WgetFile ("File3", File1)
> +
> +WGET_OPTIONS = "-c"
> +WGET_URLS = [["File1", "File2", "File3"]]
> +
> +Files = [[A_File, C_File, E_File]]
> +Existing_Files = [B_File, D_File]
> +
> +ExpectedReturnCode = 0
> +ExpectedDownloadedFiles = [B_File, D_File, E_File]
> +
> + Pre and Post Test Hooks 
> #
> +pre_test = {
> +"ServerFiles"   : Files,
> +"LocalFiles": Existing_Files
> +}
> +test_options = {
> +

Re: [Bug-wget] bug in socket reuse when using wget -c

2017-12-11 Thread Darshit Shah
If there's no objections by tomorrow, I'll push the patches to master after
adding Test-46.py to testenv/Makefile.am

* Darshit Shah  [171208 18:47]:
> Hi,
> 
> Thanks for your report. It is indeed a bug in Wget, as you've rightfully
> investigated. The socket still had some data which caused the next request to
> have problems.
> 
> I've attached two patches here, the first one fixes the issue. It tries to 
> read
> and discard any HTTP body still available and then re-use the socket. The
> second patch adds a test case for this scenario
> 
> * Iru Cai  [171208 17:19]:
> > Hello wget developers,
> > 
> > I found an issue when using `wget -c`, as in:
> > 
> >   https://github.com/mholt/caddy/issues/1965#issuecomment-349220927
> > 
> > By checking out the wget source code, I can confirm that it doesn't
> > drain the response body when it meets a 416 Requested Range Not
> > Satisfiable, and then the socket will be reused for the second request
> > (http get 2.dat in this case). When parse its response, it will
> > encounter the first response's body, so it failed to get the correct
> > response header. This is why you get a blank response header.
> > 
> > Hope this can be fixed.
> > 
> > Thanks,
> > Iru
> 
> 
> 
> -- 
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6

> From a0ffc151036c3d63f153ab3a3d8a30994c47fedf Mon Sep 17 00:00:00 2001
> From: Darshit Shah 
> Date: Fri, 8 Dec 2017 18:13:00 +0100
> Subject: [PATCH 1/2] Don't assume a 416 response has no body
> 
> * http.c(gethttp): In case of a 416 response, try to drain the socket of
> any bytes before reusing the connection
> 
> Reported-By: Iru Cai 
> ---
>  src/http.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/src/http.c b/src/http.c
> index 95d26258..e4ff0107 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -3969,11 +3969,16 @@ gethttp (const struct url *u, struct url 
> *original_url, struct http_stat *hs,
>hs->res = 0;
>/* Mark as successfully retrieved. */
>*dt |= RETROKF;
> -  if (statcode == HTTP_STATUS_RANGE_NOT_SATISFIABLE)
> +
> +  /* Try to maintain the keep-alive connection. It is often cheaper to
> +   * consume some bytes which have already been sent than to negotiate
> +   * a new connection. However, if the body is too large, or we don't
> +   * care about keep-alive, then simply terminate the connection */
> +  if (keep_alive &&
> +  skip_short_body (sock, contlen, chunked_transfer_encoding))
>  CLOSE_FINISH (sock);
>else
> -CLOSE_INVALIDATE (sock);/* would be CLOSE_FINISH, but there
> -   might be more bytes in the body. */
> +CLOSE_INVALIDATE (sock);
>retval = RETRUNNEEDED;
>goto cleanup;
>  }
> -- 
> 2.15.1
> 

> From c17b04767a1b58ee8f9db53af431ef1e63b5 Mon Sep 17 00:00:00 2001
> From: Darshit Shah 
> Date: Fri, 8 Dec 2017 18:41:07 +0100
> Subject: [PATCH 2/2] Add new test for 416 responses
> 
> * testenv/server/http/http_server.py: If there are multiple requests in
> which the requested range is unsatisfiable, then send a body in the in
> the 2nd response onwards
> * testenv/Test-416.py: New test to check how Wget handles 416 responses
> ---
>  testenv/Test-416.py| 53 
> ++
>  testenv/server/http/http_server.py |  8 ++
>  2 files changed, 61 insertions(+)
>  create mode 100755 testenv/Test-416.py
> 
> diff --git a/testenv/Test-416.py b/testenv/Test-416.py
> new file mode 100755
> index ..76b94213
> --- /dev/null
> +++ b/testenv/Test-416.py
> @@ -0,0 +1,53 @@
> +#!/usr/bin/env python3
> +from sys import exit
> +from test.http_test import HTTPTest
> +from misc.wget_file import WgetFile
> +
> +"""
> +Ensure that Wget behaves well when the server responds with a HTTP 416
> +status code. This test checks both cases:
> +1. Server sends no body
> +2. Server sends a body
> +"""
> +# File Definitions 
> ###
> +File1 = 
> "abababababababababababababababababababababababababababababababababab"
> +File2 = "ababababababababababababababababababab"
> +
> +A_File = WgetFile ("File1", File1)
> +B_File = WgetFile ("File1", File1)
> +
> +C_File = WgetFile ("File2", File2)
> +D_File = WgetFile ("File2", File1)
> +
> +E_File = WgetFile ("File3", File1)
> +
> +WGET_OPTIONS = "-c"
> +WGET_URLS = [["File1", "File2", "File3"]]
> +
> +Files = [[A_File, C_File, E_File]]
> +Existing_Files = [B_File, D_File]
> +
> +ExpectedReturnCode = 0
> +ExpectedDownloadedFiles = [B_File, D_File, E_File]
> +
> + Pre and Post Test Hooks 
> #
> +pre_test = {
> +"ServerFiles"   : Files,
> +"LocalFiles": Existing_Files
> +}
> 

Re: [Bug-wget] bug in socket reuse when using wget -c

2017-12-08 Thread Darshit Shah
Hi,

Thanks for your report. It is indeed a bug in Wget, as you've rightfully
investigated. The socket still had some data which caused the next request to
have problems.

I've attached two patches here, the first one fixes the issue. It tries to read
and discard any HTTP body still available and then re-use the socket. The
second patch adds a test case for this scenario

* Iru Cai  [171208 17:19]:
> Hello wget developers,
> 
> I found an issue when using `wget -c`, as in:
> 
>   https://github.com/mholt/caddy/issues/1965#issuecomment-349220927
> 
> By checking out the wget source code, I can confirm that it doesn't
> drain the response body when it meets a 416 Requested Range Not
> Satisfiable, and then the socket will be reused for the second request
> (http get 2.dat in this case). When parse its response, it will
> encounter the first response's body, so it failed to get the correct
> response header. This is why you get a blank response header.
> 
> Hope this can be fixed.
> 
> Thanks,
> Iru



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
From a0ffc151036c3d63f153ab3a3d8a30994c47fedf Mon Sep 17 00:00:00 2001
From: Darshit Shah 
Date: Fri, 8 Dec 2017 18:13:00 +0100
Subject: [PATCH 1/2] Don't assume a 416 response has no body

* http.c(gethttp): In case of a 416 response, try to drain the socket of
any bytes before reusing the connection

Reported-By: Iru Cai 
---
 src/http.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/http.c b/src/http.c
index 95d26258..e4ff0107 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3969,11 +3969,16 @@ gethttp (const struct url *u, struct url *original_url, 
struct http_stat *hs,
   hs->res = 0;
   /* Mark as successfully retrieved. */
   *dt |= RETROKF;
-  if (statcode == HTTP_STATUS_RANGE_NOT_SATISFIABLE)
+
+  /* Try to maintain the keep-alive connection. It is often cheaper to
+   * consume some bytes which have already been sent than to negotiate
+   * a new connection. However, if the body is too large, or we don't
+   * care about keep-alive, then simply terminate the connection */
+  if (keep_alive &&
+  skip_short_body (sock, contlen, chunked_transfer_encoding))
 CLOSE_FINISH (sock);
   else
-CLOSE_INVALIDATE (sock);/* would be CLOSE_FINISH, but there
-   might be more bytes in the body. */
+CLOSE_INVALIDATE (sock);
   retval = RETRUNNEEDED;
   goto cleanup;
 }
-- 
2.15.1

From c17b04767a1b58ee8f9db53af431ef1e63b5 Mon Sep 17 00:00:00 2001
From: Darshit Shah 
Date: Fri, 8 Dec 2017 18:41:07 +0100
Subject: [PATCH 2/2] Add new test for 416 responses

* testenv/server/http/http_server.py: If there are multiple requests in
which the requested range is unsatisfiable, then send a body in the in
the 2nd response onwards
* testenv/Test-416.py: New test to check how Wget handles 416 responses
---
 testenv/Test-416.py| 53 ++
 testenv/server/http/http_server.py |  8 ++
 2 files changed, 61 insertions(+)
 create mode 100755 testenv/Test-416.py

diff --git a/testenv/Test-416.py b/testenv/Test-416.py
new file mode 100755
index ..76b94213
--- /dev/null
+++ b/testenv/Test-416.py
@@ -0,0 +1,53 @@
+#!/usr/bin/env python3
+from sys import exit
+from test.http_test import HTTPTest
+from misc.wget_file import WgetFile
+
+"""
+Ensure that Wget behaves well when the server responds with a HTTP 416
+status code. This test checks both cases:
+1. Server sends no body
+2. Server sends a body
+"""
+# File Definitions ###
+File1 = "abababababababababababababababababababababababababababababababababab"
+File2 = "ababababababababababababababababababab"
+
+A_File = WgetFile ("File1", File1)
+B_File = WgetFile ("File1", File1)
+
+C_File = WgetFile ("File2", File2)
+D_File = WgetFile ("File2", File1)
+
+E_File = WgetFile ("File3", File1)
+
+WGET_OPTIONS = "-c"
+WGET_URLS = [["File1", "File2", "File3"]]
+
+Files = [[A_File, C_File, E_File]]
+Existing_Files = [B_File, D_File]
+
+ExpectedReturnCode = 0
+ExpectedDownloadedFiles = [B_File, D_File, E_File]
+
+ Pre and Post Test Hooks #
+pre_test = {
+"ServerFiles"   : Files,
+"LocalFiles": Existing_Files
+}
+test_options = {
+"WgetCommands"  : WGET_OPTIONS,
+"Urls"  : WGET_URLS
+}
+post_test = {
+"ExpectedFiles" : ExpectedDownloadedFiles,
+"ExpectedRetcode"   : ExpectedReturnCode
+}
+
+err = HTTPTest (
+pre_hook=pre_test,
+test_params=test_options,
+post_hook=post_test
+).begin ()
+
+exit (err)
diff --git a/testenv/server/http/http_server.py 

[Bug-wget] bug in socket reuse when using wget -c

2017-12-08 Thread Iru Cai
Hello wget developers,

I found an issue when using `wget -c`, as in:

  https://github.com/mholt/caddy/issues/1965#issuecomment-349220927

By checking out the wget source code, I can confirm that it doesn't
drain the response body when it meets a 416 Requested Range Not
Satisfiable, and then the socket will be reused for the second request
(http get 2.dat in this case). When parse its response, it will
encounter the first response's body, so it failed to get the correct
response header. This is why you get a blank response header.

Hope this can be fixed.

Thanks,
Iru


signature.asc
Description: PGP signature