Re: [PATCH] always multi v5
On Jan 21, 2013, at 6:33 PM, Nick Zitzmann n...@chronosnet.com wrote: But it started happening once the change was made; that test used to pass. Would it help if I ran all tests and then submitted what was logged prior to the stall? Well, I did it anyway, and according to trace254 it appears that after some recent changes, curl can no longer interpret the response to an FTP request to enter passive mode on my system: : 227 Entering Passive Mode (::1,246,130). 11:36:36.937919 == Info: Couldn't interpret the 227-response And test 255 gets up to this point and then stalls: 11:36:47.008582 = Send header, 5 bytes (0x5) : PWD 11:36:47.008595 == Info: FTP 0x7f9abb0118e8 (line 857) state change from PASS to PWD 11:36:47.010172 = Recv header, 30 bytes (0x1e) : 257 / is current directory So could the person responsible for FTP look into these? Here's the contents of my log folder at the time of the stall in test 255. Nick Zitzmann http://www.chronosnet.com/ logs254-255.tar.gz Description: GNU Zip compressed data --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Fri, 18 Jan 2013, Nick Zitzmann wrote: I tried running the test suite on this build. Test 254 always fails, and test 255 causes the curl tool to stall indefinitely, blocking execution of later tests. Here's a sample of curl in that test: Thanks for testing. Let's address the failures one at a time. Run test 254 only, and when it fails please tar up the entire tests/logs/ directory and post the contents here so that I can get a change to investigate all possible details that were recorded! -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Jan 21, 2013, at 3:02 AM, Daniel Stenberg dan...@haxx.se wrote: Let's address the failures one at a time. Run test 254 only, and when it fails please tar up the entire tests/logs/ directory and post the contents here so that I can get a change to investigate all possible details that were recorded! Hope this helps then! Nick Zitzmann http://www.chronosnet.com/ log254.tar.gz Description: GNU Zip compressed data --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Mon, 21 Jan 2013, Nick Zitzmann wrote: Let's address the failures one at a time. Run test 254 only, and when it fails please tar up the entire tests/logs/ directory and post the contents here so that I can get a change to investigate all possible details that were recorded! Hope this helps then! This seems to be related to the test server or at least sockfilt, which isn't part of the always-multi change: $ cat ftp_ipv6_server.log 11:04:30.683105 Failed to read input 11:04:30.683343 Error: FTP-IPv6 server, read zero 11:04:30.683660 Exited from sysread_or_die() at ./ftpserver.pl line 417. FTP-IPv6 server, read zero -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Jan 21, 2013, at 12:25 PM, Daniel Stenberg dan...@haxx.se wrote: This seems to be related to the test server or at least sockfilt, which isn't part of the always-multi change: $ cat ftp_ipv6_server.log 11:04:30.683105 Failed to read input 11:04:30.683343 Error: FTP-IPv6 server, read zero 11:04:30.683660 Exited from sysread_or_die() at ./ftpserver.pl line 417. FTP-IPv6 server, read zero But it started happening once the change was made; that test used to pass. Would it help if I ran all tests and then submitted what was logged prior to the stall? Nick Zitzmann http://www.chronosnet.com/ --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
RE: [PATCH] always multi v5
Hi Daniel, On Thu, 17 Jan 2013, Daniel Stenberg wrote: Great. I intend to merge my always-multi patch now. Or at least during this evening. Whilst reviewing some of my code (the non-blocking ssl connect in pop3.c) I got thinking about the _easy_statemach() functions that exist in ftp.c, pop3.c, smtp.c and imap.c Excuse me if this is a rather daft question but I was wondering are those functions, along with Curl_pp_easy_statemach() in pingpong.c, still needed now or should we be fixing up the code to use the _multi_statemach() functions instead? Kind Regards Steve --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
RE: [PATCH] always multi v5
On Fri, 18 Jan 2013, Steve Holme wrote: Whilst reviewing some of my code (the non-blocking ssl connect in pop3.c) I got thinking about the _easy_statemach() functions that exist in ftp.c, pop3.c, smtp.c and imap.c Excuse me if this is a rather daft question but I was wondering are those functions, along with Curl_pp_easy_statemach() in pingpong.c, still needed now or should we be fixing up the code to use the _multi_statemach() functions instead? It is a fair question! I think perhaps we should consider renaming them since they are no longer in fact used for driving any easy interface state machine. They are however still used for a blocking-mode state machine: ftp_quit() is the only user of ftp_easy_statemach() since it (still) relies on a synchronous function to do the full request and response. The other protocols have similar uses. -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
RE: [PATCH] always multi v5
Hi all, On Fri, 18 Jan 2013, Daniel Stenberg wrote: Excuse me if this is a rather daft question but I was wondering are those functions, along with Curl_pp_easy_statemach() in pingpong.c, still needed now or should we be fixing up the code to use the _multi_statemach() functions instead? It is a fair question! I think perhaps we should consider renaming them since they are no longer in fact used for driving any easy interface state machine. I agree - I think it would be less confusing for anyone reading the code to know that the easy state machine doesn't exist anymore, but does anyone have any strong opinion for not doing this? Additionally, does anyone have any suggestions as to what we should name these functions? Perhaps: 1) xxx_blocking_statemach() - For example ftp_blocking_statemach() and pop3_blocking_statemach() 2) Curl_pp_blocking_statemach() instead of Curl_pp_easy_statemach() They are however still used for a blocking-mode state machine: ftp_quit() is the only user of ftp_easy_statemach() since it (still) relies on a synchronous function to do the full request and response. The other protocols have similar uses. A quick review of the code reveals that: * In ftp.c, ftp_quit() is the only user of ftp_easy_statemach() * In imap.c, imap_logout() is the only user of imap_easy_statemach() * In pop3.c, pop3_quit() is the only user of pop3_easy_statemach() * smtp.c calls smtp_easy_statemach() twice - once in smtp_quit() and the other in smtp_done() - note there is a huge TODO comment in the latter about its usage! * ftp_easy_statemach(), imap_easy_statemach(), pop3_easy_statemach() and smtp_easy_statemach() are the only users of Curl_pp_blocking_statemach(). I'm happy to do the rename but I would appreciate a bit of guidance as to whether or not we should 1) Do this at all and 2) Do this for the v7.29 release? Kind Regards Steve --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
Hi guys, the best solution would be to eliminate all the blocking stuff, of course. Then Curl_pp_easy_statemach() could be removed, Curl_pp_multi_statemach() renamed to just Curl_pp_statemach() and the PP-based protocols could not get stuck in an unexpected blocking call anymore. But that would apparently need some deeper changes. * smtp.c calls smtp_easy_statemach() twice - once in smtp_quit() and the other in smtp_done() - note there is a huge TODO comment in the latter about its usage! Please note a similar hack will be needed in IMAP for proper FETCH and APPEND handling. For some protocols it is just necessary to send or receive some data in the DONE phase, after the data has been transferred - ssh.c contains a similar code and even the same TODO note in ssh_done(). If this is resolved as the note suggests, e.g. by allowing the done handler to return CURLE_AGAIN, these cases could be made fully non-blocking. But there is the part with Curl_done() invokes on several places in the code, which would need to know about this and handle it right. * In ftp.c, ftp_quit() is the only user of ftp_easy_statemach() * In imap.c, imap_logout() is the only user of imap_easy_statemach() * In pop3.c, pop3_quit() is the only user of pop3_easy_statemach() These logout-type functions are invoked from the disconnect handler from Curl_disconnect(), which is called from various places and I'm not sure there are provisions for asynchronous operation in all of them. Perhaps a similar situation as with Curl_done() above. So maybe the blocking state machine loop will have to stay after all... 1) xxx_blocking_statemach() - For example ftp_blocking_statemach() What about xxx_statemach_loop()? If that's what it does, loops calling the state machine again and again till it's finished. Kind regards, Jiri --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Jan 17, 2013, at 1:49 PM, Daniel Stenberg dan...@haxx.se wrote: On Thu, 17 Jan 2013, Daniel Stenberg wrote: Great. I intend to merge my always-multi patch now. Or at least during this evening. Thanks for all your help with this! And it has been pushed as commit c43127414d89ccb9. A repeated lengthier explanation of the work: http://daniel.haxx.se/blog/2013/01/17/internally-were-all-multi-now/ I tried running the test suite on this build. Test 254 always fails, and test 255 causes the curl tool to stall indefinitely, blocking execution of later tests. Here's a sample of curl in that test: Call graph: 2751 Thread_429482 DispatchQueue_1: com.apple.main-thread (serial) 2751 start (in libdyld.dylib) + 1 [0x7fff936847e1] 2751 main (in curl) + 142 [0x103ed8f6e] tool_main.c:94 2751 operate (in curl) + 25325 [0x103edfc9d] tool_operate.c:1356 2750 curl_easy_perform (in curl) + 320 [0x103f28fd0] easy.c:445 + 2749 curl_multi_wait (in curl) + 971 [0x103f34ddb] multi.c:885 + ! 2749 Curl_poll (in curl) + 1474 [0x103f3f7c2] select.c:473 + ! 2749 __select (in libsystem_kernel.dylib) + 10 [0x7fff8d4ad322] + 1 curl_multi_wait (in curl) + 1021 [0x103f34e0d] multi.c:889 + 1 curl_dofree (in curl) + 185 [0x103f2e709] memdebug.c:307 + 1 curl_memlog (in curl) + 422 [0x103f2ed46] memdebug.c:433 + 1 curl_mvsnprintf (in curl) + 86 [0x103f1e5b6] mprintf.c:1031 + 1 dprintf_formatf (in curl) + 205 [0x103f1cebd] mprintf.c:671 + 1 addbyter (in curl) + 19 [0x103f1e4e3] mprintf.c:1009 1 curl_easy_perform (in curl) + 352 [0x103f28ff0] easy.c:448 1 curl_multi_perform (in curl) + 232 [0x103f36aa8] multi.c:1708 1 multi_runsingle (in curl) + 6710 [0x103f36886] multi.c:1653 1 Curl_pgrsUpdate (in curl) + 3348 [0x103eed2d4] progress.c:449 1 curl_mfprintf (in curl) + 374 [0x103f1f176] mprintf.c:1176 1 dprintf_formatf (in curl) + 3470 [0x103f1db7e] mprintf.c:877 1 fputc (in libsystem_c.dylib) + 96 [0x7fff934efe72] 1 __swbuf (in libsystem_c.dylib) + 180 [0x7fff934b2e02] 1 __sflush (in libsystem_c.dylib) + 94 [0x7fff934eead8] 1 _swrite (in libsystem_c.dylib) + 87 [0x7fff934f3857] 1 __write_nocancel (in libsystem_kernel.dylib) + 10 [0x7fff8d4ad6fe] Is there anything I can do to help resolve the issue, and if so, then how can I help? Nick Zitzmann http://www.chronosnet.com/ --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
Commit https://github.com/bagder/curl/commit/9fd88abb pushed now for autobuild testing, solves the problems shown locally with test cases 1059 and 1316. The patch has been tested without 'always multi' on top of commit 4ed6b07a, with test suite passing ok. Additionally, patch tested with 'always multi'... [1] 'always-multi-v5' applied on top of commit fea3a3d, plus... [2] 'retry handling patch' Test suite also passes ok. (Except 238 which now is handled differently [3]). Currently I have no functional pending issues with the 'always multi' modification. Two remaining nits would be: *) Extract into a different patch the 'non always multi' stuff, modification/addition of: tests/Makefile.am tests/runtests.pl tests/valgrind.supp *) lib/ftp.c lines 104 and 784 use DEBUGBUILD instead of CURLDEBUG, Thanks, [1] http://curl.haxx.se/mail/lib-2013-01/0108.html [2] http://curl.haxx.se/mail/lib-2013-01/0170.html [3] https://github.com/bagder/curl/commit/533c31b7 -- -=[Yang]=- --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Thu, 17 Jan 2013, Yang Tse wrote: Commit https://github.com/bagder/curl/commit/9fd88abb pushed now for autobuild testing, solves the problems shown locally with test cases 1059 and 1316. Currently I have no functional pending issues with the 'always multi' modification. Great. I intend to merge my always-multi patch now. Or at least during this evening. Thanks for all your help with this! Two remaining nits would be: *) Extract into a different patch the 'non always multi' stuff, modification/addition of: tests/Makefile.am tests/runtests.pl tests/valgrind.supp I thought about that but decided not to. Mostly because the valgrind.supp I made doesn't apply in this shape without the patch (due to the different code paths with and without the patch) and I don't think it is worth the effort to fix it first without the patch only to change it again when the patch is applied. *) lib/ftp.c lines 104 and 784 use DEBUGBUILD instead of CURLDEBUG, Ack, thanks. I'll fix this before I merge. -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Thu, 17 Jan 2013, Daniel Stenberg wrote: Great. I intend to merge my always-multi patch now. Or at least during this evening. Thanks for all your help with this! And it has been pushed as commit c43127414d89ccb9. A repeated lengthier explanation of the work: http://daniel.haxx.se/blog/2013/01/17/internally-were-all-multi-now/ -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Tue, Jan 15, 2013, Daniel Stenberg wrote: I run this and compare the traces and that tells Curl_proxyCONNECT() seems to not be working entirely correct for you, although I haven't been able to pinpoint why/how. [...] To me it looks as if in your case it doesn't properly detect the end of the response (headers) so it continues to wait and then eventually the connection gets closed and it gets upset. Can you sprinkle some debug logs in there to see if you can figure it out? Attached trace1059 shows that first call to Curl_proxyCONNECT() returns CURLE_OK without having actually connected to the proxy, awaiting to be called again. No problem here. socket is actually connected but it has not detected activity yet. The problem is in Curl_protocol_connect() Curl_protocol_connect() calls Curl_proxy_connect() which is the one calling Curl_proxyCONNECT() and getting a CURLE_OK with low level transport connection established but without having received a single bit from the proxy. Afterwards Curl_protocol_connect() calls ftp_connect() the protocol-specific connect() procedure and this is the one that is reading the proxy CONNECT response. k!! That's the reason why trace shows an FTP state change from STOP to WAIT220 inmediately after first Curl_proxyCONNECT() that returns CURLE_OK, and it is the FTP handler the one that reads the http proxy 501 response. So it seems that somehow we should avoid calling the protocol specific connect() in Curl_protocol_connect() until the proxy connection has been fully established. -- -=[Yang]=- trace1059 Description: Binary data --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
Daniel, Attached patch (--no-prefix against current master) fixes the 1059 issue for me I'm now running full 'always multi' test suite. Later on I'll also apply patch to master without 'always multi' and run test suite. The issue doesn't actually seem an 'always multi' one, but simply a bug that thanks to 'always multi' has been uncovered. Tomorrow I'll have the results for these tests. -- -=[Yang]=- url_c.patch Description: Binary data --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Wed, Jan 16, 2013, Yang Tse wrote: Attached patch (--no-prefix against current master) fixes the 1059 issue for me I'm now running full 'always multi' test suite. Buarghh. NTLM and SFTP tests fail like mad and others sigsegv :-( I've killed the test suite. Something more involved is certainly needed. -- -=[Yang]=- --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Mon, 14 Jan 2013, Yang Tse wrote: When trying to use this connection in multi.c:1430 with Curl_readwrite() it finds that the connection has already been disconnected. easy-result gets CURLE_RECV_ERROR in multi.c:1430 and 'done' is FALSE. Thanks for that analysis. It brights much clarity! The 'done' being FALSE is not good. If the retry is to be done, we must consider that request as done so we need to explicitly make it so. Please try the patch below and see if it makes anything better. It works fine for me with the patch applied and I believe it is the right approach. And finally the big question. Do we really have to retry CURLM_STATE_PERFORM upon such conditions or should this be going to a previous state or aborting upon a given retry count or even at first failure? Yes! It needs to change back to the CURLM_STATE_CONNECT state. That is what it will (should) do if the request is done and a retry is scheduled. There's no need for a retry count since the failed connection will get closed in this procedure so it can't happen again on the same connection. Here's the patch (made on top of v5): --- a/lib/multi.c +++ b/lib/multi.c @@ -1450,9 +1450,12 @@ static CURLMcode multi_runsingle(struct Curl_multi *multi if(!ret) retry = (newurl)?TRUE:FALSE; -if(retry) - /* if we are to retry, set the result to OK */ +if(retry) { + /* if we are to retry, set the result to OK and consider the + request as done */ easy-result = CURLE_OK; + done = TRUE; +} } if(easy-result) { -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Tue, 15 Jan 2013, Yang Tse wrote: Remaining failures are test cases 238, 1316 and 1059. I decided test 238 was not handled properly already in the existing code. I've now added a check that makes it bail earlier, already when receiving an illegal 229 response. I'm on the 1059 issue now. -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Tue, 15 Jan 2013, Daniel Stenberg wrote: I'm on the 1059 issue now. I run this and compare the traces and that tells Curl_proxyCONNECT() seems to not be working entirely correct for you, although I haven't been able to pinpoint why/how. Your trace says after the CONNECT response: 18:39:38.693556 = Recv header, 19 bytes (0x13) : Content-Length: 0 18:39:38.693556 = Recv header, 2 bytes (0x2) : 18:39:38.693556 == Info: 0xb5f1a0 is at send pipe head! 18:39:38.693556 == Info: STATE: CONNECT = WAITPROXYCONNECT handle 0xb82f68; line 1029 (connection #0) 18:39:38.713556 == Info: Multi mode finished polling for response from proxy CONNECT 18:39:38.713556 == Info: Proxy CONNECT aborted While my log says: 23:07:24.111203 = Recv header, 19 bytes (0x13) : Content-Length: 0 23:07:24.112317 = Recv header, 2 bytes (0x2) : 23:07:24.118289 == Info: Received HTTP code 501 from proxy after CONNECT 23:07:24.129116 == Info: Closing connection 0 To me it looks as if in your case it doesn't properly detect the end of the response (headers) so it continues to wait and then eventually the connection gets closed and it gets upset. Can you sprinkle some debug logs in there to see if you can figure it out? -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Fri, 11 Jan 2013, Yang Tse wrote: 00:25:55.571476 == Info: Recv failure: Connection reset by peer 00:25:55.571476 == Info: Connection died, retrying a fresh connect ... it doesn't look like this is truly trying a fresh connect, it seems to mostly just _say_ it! Yep, something could probably be improved. This is meant to get trapped in lib/multi.c:1444 for the CURLM_STATE_PERFORM state, but clearly it isn't or it isn't acting correctly. Can you figure out if the if() matches and the action is wrong or if it never even invokes Curl_retry_request() there? -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Mon, Jan 14, 2013, Daniel Stenberg wrote: On Fri, 11 Jan 2013, Yang Tse wrote: 00:25:55.571476 == Info: Recv failure: Connection reset by peer 00:25:55.571476 == Info: Connection died, retrying a fresh connect ... it doesn't look like this is truly trying a fresh connect, it seems to mostly just _say_ it! Yep, something could probably be improved. This is meant to get trapped in lib/multi.c:1444 for the CURLM_STATE_PERFORM state, but clearly it isn't or it isn't acting correctly. Can you figure out if the if() matches and the action is wrong or if it never even invokes Curl_retry_request() there? In this case multi attempts to re-use an existing connection returned by ConnectionExists() which happens to be alive when checked inside ConnectionExists(). [Marker A] When trying to use this connection in multi.c:1430 with Curl_readwrite() it finds that the connection has already been disconnected. easy-result gets CURLE_RECV_ERROR in multi.c:1430 and 'done' is FALSE. Condition tested in multi.c:1444 is true and as such Curl_retry_request() is called. Curl_retry_request() call in multi.c:1449 returns 0 into 'ret' variable and sets (newurl != NULL) With above results, multi.c:1455 gets executed setting easy-result = CURLE_OK. Making next line of code actually executed multi.c:1536. Remaining in CURLM_STATE_PERFORM state. Next call to multi_runsingle() makes it iterate again in CURLM_STATE_PERFORM state, and as such, same as flow control and results as described from [Marker A] above down to here are repeated endlessly. So the issue there is that the retrying logic there has to be somehow controlled or used differently. Adding in multi.c:1444 an additional condition such as ( ! easy-easy_conn-bits.reuse) avoids the endless loop, but it subverts the purpose of the 'retrying' (at least as described in lines 1445 to 1447). Checking there if the socket is dead can also be done, but happens the same as above. Introducing some sort of retry control, count-based, time-based, or both as we do in tool_operate.c lines 1362 to 1382 would limit the number of iterations. And finally the big question. Do we really have to retry CURLM_STATE_PERFORM upon such conditions or should this be going to a previous state or aborting upon a given retry count or even at first failure? -- -=[Yang]=- --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Fri, 11 Jan 2013, Yang Tse wrote: I'll study your trace, compare with mine and later on start to debug this. Just wanted now to handle you all logs for test 64. Can you first check and see if there's a time descripancy between the trace64 log and the http_server.log ? I can't make sense of the times in there: http server: 00:25:55.470998 Client connect 00:25:55.470998 TCP_NODELAY set 00:25:55.470998 accept_connection 3 returned 4 00:25:55.470998 accept_connection 3 returned 0 00:25:55.490998 Read 84 bytes 00:25:55.490998 Process 84 bytes request 00:25:55.490998 Got request: GET /64 HTTP/1.1 trace: 00:25:55.461476 == Info: About to connect() to 127.0.0.1 port 8990 (#0) 00:25:55.471476 == Info: Trying 127.0.0.1... 00:25:55.491476 == Info: connected 00:25:55.491476 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#0) 00:25:55.491476 == Info: 0xb4f1a0 is at send pipe head! 00:25:55.491476 == Info: STATE: CONNECT = DO handle 0xb72f68; line 1025 (connection #0) 00:25:55.491476 == Info: Server auth using Digest with user 'testuser' 00:25:55.491476 = Send header, 84 bytes (0x54) See? The server claims it got the GET already before libcurl logged 'connected' ! And then for the next sequence of interest to me: server log: 00:25:55.510998 Response sent (245 bytes) and written to log/server.response 00:25:55.510998 instructed to close connection after server-reply 00:25:55.510998 Client disconnect 0 trace log: 00:25:55.511476 == Info: Connection #0 to host 127.0.0.1 left intact 00:25:55.511476 == Info: Issue another request to this URL: 'http://127.0.0.1:8990/64' 00:25:55.511476 == Info: Found bundle for host 127.0.0.1: 0xb7c180 00:25:55.511476 == Info: Re-using existing connection! (#0) with host 127.0.0.1 00:25:55.511476 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#0) ... which looks like SocketIsDead() didn't think the connection it tries to re-use is dead, but the time stamps indicate it should've seen it. Then there's clearly a bug when things went south due to the dead connection being attempted for sending the request on: 00:25:55.571476 == Info: Recv failure: Connection reset by peer 00:25:55.571476 == Info: Connection died, retrying a fresh connect 00:25:55.571476 == Info: Recv failure: Connection reset by peer 00:25:55.571476 == Info: Connection died, retrying a fresh connect ... it doesn't look like this is truly trying a fresh connect, it seems to mostly just _say_ it! -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Fri, Jan 11, 2013, Daniel Stenberg wrote: Can you first check and see if there's a time descripancy between the trace64 log and the http_server.log ? I can't make sense of the times in there: These are the best kind of timestamps we can get on this kind of 'system', not too bad accuracy and not too bad resolution. But certainly not good enough in order to timestamp with 'real' microsecond resolution and accuracy each call, so trying to mentally sync two logs with timestamps is just an approximation. http server: 00:25:55.470998 Client connect 00:25:55.470998 TCP_NODELAY set 00:25:55.470998 accept_connection 3 returned 4 00:25:55.470998 accept_connection 3 returned 0 00:25:55.490998 Read 84 bytes 00:25:55.490998 Process 84 bytes request 00:25:55.490998 Got request: GET /64 HTTP/1.1 trace: 00:25:55.461476 == Info: About to connect() to 127.0.0.1 port 8990 (#0) 00:25:55.471476 == Info: Trying 127.0.0.1... 00:25:55.491476 == Info: connected 00:25:55.491476 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#0) 00:25:55.491476 == Info: 0xb4f1a0 is at send pipe head! 00:25:55.491476 == Info: STATE: CONNECT = DO handle 0xb72f68; line 1025 (connection #0) 00:25:55.491476 == Info: Server auth using Digest with user 'testuser' 00:25:55.491476 = Send header, 84 bytes (0x54) See? The server claims it got the GET already before libcurl logged 'connected' ! Yeah, and server claims it read the 84 bytes 0,000478 seconds before curl logs that the 84 bytes have been sent. With this we know that there's at least that offset between timestamps in both logs, although other events might be apart even with double that offset. Fun, Eh? ;-) And then for the next sequence of interest to me: server log: 00:25:55.510998 Response sent (245 bytes) and written to log/server.response 00:25:55.510998 instructed to close connection after server-reply 00:25:55.510998 Client disconnect 0 trace log: 00:25:55.511476 == Info: Connection #0 to host 127.0.0.1 left intact 00:25:55.511476 == Info: Issue another request to this URL: 'http://127.0.0.1:8990/64' 00:25:55.511476 == Info: Found bundle for host 127.0.0.1: 0xb7c180 00:25:55.511476 == Info: Re-using existing connection! (#0) with host 127.0.0.1 00:25:55.511476 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#0) ... which looks like SocketIsDead() didn't think the connection it tries to re-use is dead, but the time stamps indicate it should've seen it. (server log 00:25:55.510998) + 0.000478 = (trace.log 00:25:55.511476) But lack of a better resolution, makes it impossible to truly tell apart timestamps for: 00:25:55.511476 == Info: Found bundle for host 127.0.0.1: 0xb7c180 00:25:55.511476 == Info: Re-using existing connection! (#0) with host I've been tracing SocketIsDead() and certainly when called in this case it does not detect that the connection has been closed on the other side, with a select timeout of zero. :-( Given that all this is running on cygwin with its own implementation of select on top of pthreads which calls underlying windows select with whatever implementation it has on top of message passing and windows 'threading model' on a non-realtime system, I consider absolutely normal that in such conditions there exist race conditions which makes it impossible to detect closing connection instantly. As a matter of fact if select call in SocketIsDead() is given a timeout of 60ms then test #64 succeeds, given that in that case it detects that the connection has been closed. I'll do more tests in order to find a better workaround that does not involve using a timeout. Then there's clearly a bug when things went south due to the dead connection being attempted for sending the request on: 00:25:55.571476 == Info: Recv failure: Connection reset by peer 00:25:55.571476 == Info: Connection died, retrying a fresh connect 00:25:55.571476 == Info: Recv failure: Connection reset by peer 00:25:55.571476 == Info: Connection died, retrying a fresh connect ... it doesn't look like this is truly trying a fresh connect, it seems to mostly just _say_ it! Yep, something could probably be improved. As I've mentioned, increassing select timeout in SocketIsDead() to 60ms allows test #64 to pass, but on the other hand the timeout has to be increased to more than 1000ms in order to make test #160 pass which makes the server wait 1 second before closing the connection. Even with all the bizarreness of this cygwin on windows testing framework, multi should be capable of properly handling remote disconnection at any point where network activity is taking place. I'll keep digging on Monday. -- -=[Yang]=- --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
Attached file logs_7_tests.tar.gz has test traces for test cases 64, 69, 90, 153, 160, 1079 and 1095. These http tests have in common that http test server is instructed to close connection with 'swsclose', while test case is doing more than one request. Above seven tests needed kiil -9 to stop them, otherwise they seem to run for ever, and trace files edited to remove trailing lines which repeat. Attached file logs_238.tar.gz has full logs for test case 238 an ftp one. This also needs kill -9 in order to stop it, but trace238 and memdump do not grow endlessly. Attached file logs_1059.tar.gz has full logs for test case 1059. Given that tis is an 'HTTP CONNECT with proxytunnel to unsupported FTP URL' test, fixing server disconnection handling in 7 cases above most probably will also fix this. -- -=[Yang]=- logs_7_tests.tar.gz Description: GNU Zip compressed data logs_238.tar.gz Description: GNU Zip compressed data logs_1059.tar.gz Description: GNU Zip compressed data --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
Daniel, Some tests I've done... Forced libcurl to build using select() instead of poll() to rule out any problem in that area. No joy, exhibits same behavior. Used http test server sws.c rolled back to commit d759a70d and also back to commit 4d3fb91f. But it exhibits same behavior. And, given that without the 'always multi' patch tests succeed, it has to be something closely related, or a sws.c weakness it uncovers. Thanks, -- -=[Yang]=- --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Thu, 10 Jan 2013, Yang Tse wrote: Attached file logs_7_tests.tar.gz has test traces for test cases 64, 69, 90, 153, 160, 1079 and 1095. These http tests have in common that http test server is instructed to close connection with 'swsclose', while test case is doing more than one request. Thanks, this is helpful! Lets focus on the single test 64 first. What's the exact test target/OS/version etc you're using to trigger this? Does it fail on all runs? Also, can you show us the log/http_server.log after a run where the test fails? Clearly when you run this test case it doesn't properly detect that the connection is dead when the second request is made and goes on, while in my end it detects the problem, creates a new connection and everything is fine. Compare to my log/trace64 piece: 22:34:20.834635 == Info: Ignoring the response-body 22:34:20.835136 = Recv data, 26 bytes (0x1a) : This is not the real page. 22:34:20.850927 == Info: Connection #0 to host 127.0.0.1 left intact 22:34:20.853020 == Info: Issue another request to this URL: 'http://127.0.0.1:8990/64' 22:34:20.855881 == Info: Found bundle for host 127.0.0.1: 0x780bf58 22:34:20.856924 == Info: Connection 0 seems to be dead! 22:34:20.859015 == Info: Closing connection 0 22:34:20.863244 == Info: The cache now contains 0 members 22:34:20.869979 == Info: About to connect() to 127.0.0.1 port 8990 (#1) 22:34:20.870115 == Info: Trying 127.0.0.1... 22:34:20.870228 == Info: connected 22:34:20.870345 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#1) 22:34:20.870461 == Info: 0x76955f8 is at send pipe head! 22:34:20.871375 == Info: STATE: CONNECT = DO handle 0x77eb418; line 1025 (connection #1) 22:34:20.892268 == Info: Server auth using Digest with user 'testuser' 22:34:20.893184 = Send header, 222 bytes (0xde) : GET /64 HTTP/1.1 0012: Authorization: Digest username=testuser, realm=testrealm, no 0052: nce=1053604145, uri=/64, response=c55f7f30d83d774a3d2dcacf7 0092: 25abaca 009c: User-Agent: curl/7.28.2-DEV 00b9: Host: 127.0.0.1:8990 00cf: Accept: */* ... and the log/http_server.log shows this a few milliseconds before: 22:34:20.793753 Sent off 200 bytes 22:34:20.793770 Sent off 45 bytes 22:34:20.793784 Response sent (245 bytes) and written to log/server.response 22:34:20.793794 instructed to close connection after server-reply 22:34:20.793803 Client disconnect 0 Right now I'm leaning towards a problematic test case/test server since this is code that really isn't affected by my changes - I've just changed some timing and code flows, not the actual real socket/TCP handling. -- / daniel.haxx.se --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Thu, Jan 10, Daniel Stenberg wrote: Lets focus on the single test 64 first. What's the exact test target/OS/version etc you're using to trigger this? Does it fail on all runs? cygwin 1.5.25 on w2k. Yes it fails in all runs. Also, can you show us the log/http_server.log after a run where the test fails? I've just run it again to get a fresh and full set of logs. Attached logs_64.tar.gz with them. I've let the test run for about 1 minute and then killed the curl tool process from another shell. Server seems to disconnect when instructed to do so. I'll study your trace, compare with mine and later on start to debug this. Just wanted now to handle you all logs for test 64. -- -=[Yang]=- logs_64.tar.gz Description: GNU Zip compressed data --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Wed, Jan 9, Daniel Stenberg wrote: On Wed, 9 Jan 2013, Yang Tse wrote: Nearly all test cases pass except for: 64, 69, 90, 153, 160, 1079 and 1095. I think I would need details on these in order to do something. I've now run these tests MANY times here and none of them failed even once, with or without valgrind. I don't see any obvious pattern among these tests that I can start looking at either. OK, I'll do some local digging, and report back when I have something more specific. -- -=[Yang]=- --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Tue, Jan 8, 2013, Daniel Stenberg wrote: I accidentally posted a version that wasn't the latest rebase yesterday, and with Steve Holme's POP3 TLS fix this morning (that required a little extra merge edit) I decided to post a follow-up version. Nothing else changed. Hi Daniel, I'm trying to test always-multi-v5.diff, but I'm having some problems... First, I needed to dos2unix -U always-multi-v5.diff, otherwise no joy at all applying it. Probably it is gmail web interface acting funny, but given that the patch affects test case definitions and by design some lines of these include ^M line endings while others not it would be interesting if you provided it gzip'ed to preserve these. And second, even after mentioned above step, patch hunks relative to tests/data/test* do not apply except for test1407, test506 and test808. -- -=[Yang]=- --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
On Tue, Jan 8, 2013, Daniel Stenberg wrote: Those line endings are a bit painful to handle in patches. Try the attached gzip version and see if it makes anything better! Perfect! Building now... -- -=[Yang]=- --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: [PATCH] always multi v5
Daniel, Patch applies cleanly on top of fea3a3d5. Builds successfully with: Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp Features: Debug TrackMemory IDN Largefile NTLM NTLM_WB SSL libz TLS-SRP Nearly all test cases pass except for: 64, 69, 90, 153, 160, 1079 and 1095. All seven listed above consume 100% cpu, with memdump and trace growing at very fast rate. Test case 238 hangs consuming no CPU and with fixed memdump and trace238 growth. There was a glitch with test case 1059 which failed once, but no it seems to pass with no problem All other tests pass ok here. -- -=[Yang]=- --- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.html