Re: [Bug-wget] bug in socket reuse when using wget -c
I've merged the above patches to master. They will be available with the next version of Wget * Darshit Shah[171208 18:47]: > Hi, > > Thanks for your report. It is indeed a bug in Wget, as you've rightfully > investigated. The socket still had some data which caused the next request to > have problems. > > I've attached two patches here, the first one fixes the issue. It tries to > read > and discard any HTTP body still available and then re-use the socket. The > second patch adds a test case for this scenario > > * Iru Cai [171208 17:19]: > > Hello wget developers, > > > > I found an issue when using `wget -c`, as in: > > > > https://github.com/mholt/caddy/issues/1965#issuecomment-349220927 > > > > By checking out the wget source code, I can confirm that it doesn't > > drain the response body when it meets a 416 Requested Range Not > > Satisfiable, and then the socket will be reused for the second request > > (http get 2.dat in this case). When parse its response, it will > > encounter the first response's body, so it failed to get the correct > > response header. This is why you get a blank response header. > > > > Hope this can be fixed. > > > > Thanks, > > Iru > > > > -- > Thanking You, > Darshit Shah > PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6 > From a0ffc151036c3d63f153ab3a3d8a30994c47fedf Mon Sep 17 00:00:00 2001 > From: Darshit Shah > Date: Fri, 8 Dec 2017 18:13:00 +0100 > Subject: [PATCH 1/2] Don't assume a 416 response has no body > > * http.c(gethttp): In case of a 416 response, try to drain the socket of > any bytes before reusing the connection > > Reported-By: Iru Cai > --- > src/http.c | 11 --- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/src/http.c b/src/http.c > index 95d26258..e4ff0107 100644 > --- a/src/http.c > +++ b/src/http.c > @@ -3969,11 +3969,16 @@ gethttp (const struct url *u, struct url > *original_url, struct http_stat *hs, >hs->res = 0; >/* Mark as successfully retrieved. */ >*dt |= RETROKF; > - if (statcode == HTTP_STATUS_RANGE_NOT_SATISFIABLE) > + > + /* Try to maintain the keep-alive connection. It is often cheaper to > + * consume some bytes which have already been sent than to negotiate > + * a new connection. However, if the body is too large, or we don't > + * care about keep-alive, then simply terminate the connection */ > + if (keep_alive && > + skip_short_body (sock, contlen, chunked_transfer_encoding)) > CLOSE_FINISH (sock); >else > -CLOSE_INVALIDATE (sock);/* would be CLOSE_FINISH, but there > - might be more bytes in the body. */ > +CLOSE_INVALIDATE (sock); >retval = RETRUNNEEDED; >goto cleanup; > } > -- > 2.15.1 > > From c17b04767a1b58ee8f9db53af431ef1e63b5 Mon Sep 17 00:00:00 2001 > From: Darshit Shah > Date: Fri, 8 Dec 2017 18:41:07 +0100 > Subject: [PATCH 2/2] Add new test for 416 responses > > * testenv/server/http/http_server.py: If there are multiple requests in > which the requested range is unsatisfiable, then send a body in the in > the 2nd response onwards > * testenv/Test-416.py: New test to check how Wget handles 416 responses > --- > testenv/Test-416.py| 53 > ++ > testenv/server/http/http_server.py | 8 ++ > 2 files changed, 61 insertions(+) > create mode 100755 testenv/Test-416.py > > diff --git a/testenv/Test-416.py b/testenv/Test-416.py > new file mode 100755 > index ..76b94213 > --- /dev/null > +++ b/testenv/Test-416.py > @@ -0,0 +1,53 @@ > +#!/usr/bin/env python3 > +from sys import exit > +from test.http_test import HTTPTest > +from misc.wget_file import WgetFile > + > +""" > +Ensure that Wget behaves well when the server responds with a HTTP 416 > +status code. This test checks both cases: > +1. Server sends no body > +2. Server sends a body > +""" > +# File Definitions > ### > +File1 = > "abababababababababababababababababababababababababababababababababab" > +File2 = "ababababababababababababababababababab" > + > +A_File = WgetFile ("File1", File1) > +B_File = WgetFile ("File1", File1) > + > +C_File = WgetFile ("File2", File2) > +D_File = WgetFile ("File2", File1) > + > +E_File = WgetFile ("File3", File1) > + > +WGET_OPTIONS = "-c" > +WGET_URLS = [["File1", "File2", "File3"]] > + > +Files = [[A_File, C_File, E_File]] > +Existing_Files = [B_File, D_File] > + > +ExpectedReturnCode = 0 > +ExpectedDownloadedFiles = [B_File, D_File, E_File] > + > + Pre and Post Test Hooks > # > +pre_test = { > +"ServerFiles" : Files, > +"LocalFiles": Existing_Files > +} > +test_options = { > +
Re: [Bug-wget] bug in socket reuse when using wget -c
If there's no objections by tomorrow, I'll push the patches to master after adding Test-46.py to testenv/Makefile.am * Darshit Shah[171208 18:47]: > Hi, > > Thanks for your report. It is indeed a bug in Wget, as you've rightfully > investigated. The socket still had some data which caused the next request to > have problems. > > I've attached two patches here, the first one fixes the issue. It tries to > read > and discard any HTTP body still available and then re-use the socket. The > second patch adds a test case for this scenario > > * Iru Cai [171208 17:19]: > > Hello wget developers, > > > > I found an issue when using `wget -c`, as in: > > > > https://github.com/mholt/caddy/issues/1965#issuecomment-349220927 > > > > By checking out the wget source code, I can confirm that it doesn't > > drain the response body when it meets a 416 Requested Range Not > > Satisfiable, and then the socket will be reused for the second request > > (http get 2.dat in this case). When parse its response, it will > > encounter the first response's body, so it failed to get the correct > > response header. This is why you get a blank response header. > > > > Hope this can be fixed. > > > > Thanks, > > Iru > > > > -- > Thanking You, > Darshit Shah > PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6 > From a0ffc151036c3d63f153ab3a3d8a30994c47fedf Mon Sep 17 00:00:00 2001 > From: Darshit Shah > Date: Fri, 8 Dec 2017 18:13:00 +0100 > Subject: [PATCH 1/2] Don't assume a 416 response has no body > > * http.c(gethttp): In case of a 416 response, try to drain the socket of > any bytes before reusing the connection > > Reported-By: Iru Cai > --- > src/http.c | 11 --- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/src/http.c b/src/http.c > index 95d26258..e4ff0107 100644 > --- a/src/http.c > +++ b/src/http.c > @@ -3969,11 +3969,16 @@ gethttp (const struct url *u, struct url > *original_url, struct http_stat *hs, >hs->res = 0; >/* Mark as successfully retrieved. */ >*dt |= RETROKF; > - if (statcode == HTTP_STATUS_RANGE_NOT_SATISFIABLE) > + > + /* Try to maintain the keep-alive connection. It is often cheaper to > + * consume some bytes which have already been sent than to negotiate > + * a new connection. However, if the body is too large, or we don't > + * care about keep-alive, then simply terminate the connection */ > + if (keep_alive && > + skip_short_body (sock, contlen, chunked_transfer_encoding)) > CLOSE_FINISH (sock); >else > -CLOSE_INVALIDATE (sock);/* would be CLOSE_FINISH, but there > - might be more bytes in the body. */ > +CLOSE_INVALIDATE (sock); >retval = RETRUNNEEDED; >goto cleanup; > } > -- > 2.15.1 > > From c17b04767a1b58ee8f9db53af431ef1e63b5 Mon Sep 17 00:00:00 2001 > From: Darshit Shah > Date: Fri, 8 Dec 2017 18:41:07 +0100 > Subject: [PATCH 2/2] Add new test for 416 responses > > * testenv/server/http/http_server.py: If there are multiple requests in > which the requested range is unsatisfiable, then send a body in the in > the 2nd response onwards > * testenv/Test-416.py: New test to check how Wget handles 416 responses > --- > testenv/Test-416.py| 53 > ++ > testenv/server/http/http_server.py | 8 ++ > 2 files changed, 61 insertions(+) > create mode 100755 testenv/Test-416.py > > diff --git a/testenv/Test-416.py b/testenv/Test-416.py > new file mode 100755 > index ..76b94213 > --- /dev/null > +++ b/testenv/Test-416.py > @@ -0,0 +1,53 @@ > +#!/usr/bin/env python3 > +from sys import exit > +from test.http_test import HTTPTest > +from misc.wget_file import WgetFile > + > +""" > +Ensure that Wget behaves well when the server responds with a HTTP 416 > +status code. This test checks both cases: > +1. Server sends no body > +2. Server sends a body > +""" > +# File Definitions > ### > +File1 = > "abababababababababababababababababababababababababababababababababab" > +File2 = "ababababababababababababababababababab" > + > +A_File = WgetFile ("File1", File1) > +B_File = WgetFile ("File1", File1) > + > +C_File = WgetFile ("File2", File2) > +D_File = WgetFile ("File2", File1) > + > +E_File = WgetFile ("File3", File1) > + > +WGET_OPTIONS = "-c" > +WGET_URLS = [["File1", "File2", "File3"]] > + > +Files = [[A_File, C_File, E_File]] > +Existing_Files = [B_File, D_File] > + > +ExpectedReturnCode = 0 > +ExpectedDownloadedFiles = [B_File, D_File, E_File] > + > + Pre and Post Test Hooks > # > +pre_test = { > +"ServerFiles" : Files, > +"LocalFiles": Existing_Files > +} >
Re: [Bug-wget] bug in socket reuse when using wget -c
Hi, Thanks for your report. It is indeed a bug in Wget, as you've rightfully investigated. The socket still had some data which caused the next request to have problems. I've attached two patches here, the first one fixes the issue. It tries to read and discard any HTTP body still available and then re-use the socket. The second patch adds a test case for this scenario * Iru Cai[171208 17:19]: > Hello wget developers, > > I found an issue when using `wget -c`, as in: > > https://github.com/mholt/caddy/issues/1965#issuecomment-349220927 > > By checking out the wget source code, I can confirm that it doesn't > drain the response body when it meets a 416 Requested Range Not > Satisfiable, and then the socket will be reused for the second request > (http get 2.dat in this case). When parse its response, it will > encounter the first response's body, so it failed to get the correct > response header. This is why you get a blank response header. > > Hope this can be fixed. > > Thanks, > Iru -- Thanking You, Darshit Shah PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6 From a0ffc151036c3d63f153ab3a3d8a30994c47fedf Mon Sep 17 00:00:00 2001 From: Darshit Shah Date: Fri, 8 Dec 2017 18:13:00 +0100 Subject: [PATCH 1/2] Don't assume a 416 response has no body * http.c(gethttp): In case of a 416 response, try to drain the socket of any bytes before reusing the connection Reported-By: Iru Cai --- src/http.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/src/http.c b/src/http.c index 95d26258..e4ff0107 100644 --- a/src/http.c +++ b/src/http.c @@ -3969,11 +3969,16 @@ gethttp (const struct url *u, struct url *original_url, struct http_stat *hs, hs->res = 0; /* Mark as successfully retrieved. */ *dt |= RETROKF; - if (statcode == HTTP_STATUS_RANGE_NOT_SATISFIABLE) + + /* Try to maintain the keep-alive connection. It is often cheaper to + * consume some bytes which have already been sent than to negotiate + * a new connection. However, if the body is too large, or we don't + * care about keep-alive, then simply terminate the connection */ + if (keep_alive && + skip_short_body (sock, contlen, chunked_transfer_encoding)) CLOSE_FINISH (sock); else -CLOSE_INVALIDATE (sock);/* would be CLOSE_FINISH, but there - might be more bytes in the body. */ +CLOSE_INVALIDATE (sock); retval = RETRUNNEEDED; goto cleanup; } -- 2.15.1 From c17b04767a1b58ee8f9db53af431ef1e63b5 Mon Sep 17 00:00:00 2001 From: Darshit Shah Date: Fri, 8 Dec 2017 18:41:07 +0100 Subject: [PATCH 2/2] Add new test for 416 responses * testenv/server/http/http_server.py: If there are multiple requests in which the requested range is unsatisfiable, then send a body in the in the 2nd response onwards * testenv/Test-416.py: New test to check how Wget handles 416 responses --- testenv/Test-416.py| 53 ++ testenv/server/http/http_server.py | 8 ++ 2 files changed, 61 insertions(+) create mode 100755 testenv/Test-416.py diff --git a/testenv/Test-416.py b/testenv/Test-416.py new file mode 100755 index ..76b94213 --- /dev/null +++ b/testenv/Test-416.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python3 +from sys import exit +from test.http_test import HTTPTest +from misc.wget_file import WgetFile + +""" +Ensure that Wget behaves well when the server responds with a HTTP 416 +status code. This test checks both cases: +1. Server sends no body +2. Server sends a body +""" +# File Definitions ### +File1 = "abababababababababababababababababababababababababababababababababab" +File2 = "ababababababababababababababababababab" + +A_File = WgetFile ("File1", File1) +B_File = WgetFile ("File1", File1) + +C_File = WgetFile ("File2", File2) +D_File = WgetFile ("File2", File1) + +E_File = WgetFile ("File3", File1) + +WGET_OPTIONS = "-c" +WGET_URLS = [["File1", "File2", "File3"]] + +Files = [[A_File, C_File, E_File]] +Existing_Files = [B_File, D_File] + +ExpectedReturnCode = 0 +ExpectedDownloadedFiles = [B_File, D_File, E_File] + + Pre and Post Test Hooks # +pre_test = { +"ServerFiles" : Files, +"LocalFiles": Existing_Files +} +test_options = { +"WgetCommands" : WGET_OPTIONS, +"Urls" : WGET_URLS +} +post_test = { +"ExpectedFiles" : ExpectedDownloadedFiles, +"ExpectedRetcode" : ExpectedReturnCode +} + +err = HTTPTest ( +pre_hook=pre_test, +test_params=test_options, +post_hook=post_test +).begin () + +exit (err) diff --git a/testenv/server/http/http_server.py
[Bug-wget] bug in socket reuse when using wget -c
Hello wget developers, I found an issue when using `wget -c`, as in: https://github.com/mholt/caddy/issues/1965#issuecomment-349220927 By checking out the wget source code, I can confirm that it doesn't drain the response body when it meets a 416 Requested Range Not Satisfiable, and then the socket will be reused for the second request (http get 2.dat in this case). When parse its response, it will encounter the first response's body, so it failed to get the correct response header. This is why you get a blank response header. Hope this can be fixed. Thanks, Iru signature.asc Description: PGP signature