Re: [PATCH] always multi v5

2013-01-23 Thread Nick Zitzmann

On Jan 21, 2013, at 6:33 PM, Nick Zitzmann n...@chronosnet.com wrote:

 But it started happening once the change was made; that test used to pass. 
 Would it help if I ran all tests and then submitted what was logged prior to 
 the stall?


Well, I did it anyway, and according to trace254 it appears that after some 
recent changes, curl can no longer interpret the response to an FTP request to 
enter passive mode on my system:

 : 227 Entering Passive Mode (::1,246,130).
 11:36:36.937919 == Info: Couldn't interpret the 227-response

And test 255 gets up to this point and then stalls:

 11:36:47.008582 = Send header, 5 bytes (0x5)
 : PWD
 11:36:47.008595 == Info: FTP 0x7f9abb0118e8 (line 857) state change from PASS 
 to PWD
 11:36:47.010172 = Recv header, 30 bytes (0x1e)
 : 257 / is current directory

So could the person responsible for FTP look into these? Here's the contents of 
my log folder at the time of the stall in test 255.

Nick Zitzmann
http://www.chronosnet.com/


logs254-255.tar.gz
Description: GNU Zip compressed data
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html

Re: [PATCH] always multi v5

2013-01-21 Thread Daniel Stenberg

On Fri, 18 Jan 2013, Nick Zitzmann wrote:

I tried running the test suite on this build. Test 254 always fails, and 
test 255 causes the curl tool to stall indefinitely, blocking execution of 
later tests. Here's a sample of curl in that test:


Thanks for testing.

Let's address the failures one at a time. Run test 254 only, and when it fails 
please tar up the entire tests/logs/ directory and post the contents here so 
that I can get a change to investigate all possible details that were 
recorded!


--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-21 Thread Nick Zitzmann

On Jan 21, 2013, at 3:02 AM, Daniel Stenberg dan...@haxx.se wrote:

 Let's address the failures one at a time. Run test 254 only, and when it 
 fails please tar up the entire tests/logs/ directory and post the contents 
 here so that I can get a change to investigate all possible details that were 
 recorded!

Hope this helps then!

Nick Zitzmann
http://www.chronosnet.com/


log254.tar.gz
Description: GNU Zip compressed data
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html

Re: [PATCH] always multi v5

2013-01-21 Thread Daniel Stenberg

On Mon, 21 Jan 2013, Nick Zitzmann wrote:

Let's address the failures one at a time. Run test 254 only, and when it 
fails please tar up the entire tests/logs/ directory and post the contents 
here so that I can get a change to investigate all possible details that 
were recorded!


Hope this helps then!


This seems to be related to the test server or at least sockfilt, which isn't 
part of the always-multi change:


$ cat ftp_ipv6_server.log
11:04:30.683105 Failed to read input
11:04:30.683343 Error: FTP-IPv6 server, read zero
11:04:30.683660 Exited from sysread_or_die() at ./ftpserver.pl line 417. 
FTP-IPv6 server, read zero


--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-21 Thread Nick Zitzmann

On Jan 21, 2013, at 12:25 PM, Daniel Stenberg dan...@haxx.se wrote:

 This seems to be related to the test server or at least sockfilt, which isn't 
 part of the always-multi change:
 
 $ cat ftp_ipv6_server.log
 11:04:30.683105 Failed to read input
 11:04:30.683343 Error: FTP-IPv6 server, read zero
 11:04:30.683660 Exited from sysread_or_die() at ./ftpserver.pl line 417. 
 FTP-IPv6 server, read zero


But it started happening once the change was made; that test used to pass. 
Would it help if I ran all tests and then submitted what was logged prior to 
the stall?

Nick Zitzmann
http://www.chronosnet.com/




---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


RE: [PATCH] always multi v5

2013-01-18 Thread Steve Holme
Hi Daniel,

On Thu, 17 Jan 2013, Daniel Stenberg wrote:

 Great. I intend to merge my always-multi patch now. Or at least during
this evening.

Whilst reviewing some of my code (the non-blocking ssl connect in pop3.c) I
got thinking about the _easy_statemach() functions that exist in ftp.c,
pop3.c, smtp.c and imap.c

Excuse me if this is a rather daft question but I was wondering are those
functions, along with Curl_pp_easy_statemach() in pingpong.c, still needed
now or should we be fixing up the code to use the _multi_statemach()
functions instead?

Kind Regards

Steve
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


RE: [PATCH] always multi v5

2013-01-18 Thread Daniel Stenberg

On Fri, 18 Jan 2013, Steve Holme wrote:

Whilst reviewing some of my code (the non-blocking ssl connect in pop3.c) I 
got thinking about the _easy_statemach() functions that exist in ftp.c, 
pop3.c, smtp.c and imap.c


Excuse me if this is a rather daft question but I was wondering are those 
functions, along with Curl_pp_easy_statemach() in pingpong.c, still needed 
now or should we be fixing up the code to use the _multi_statemach() 
functions instead?


It is a fair question! I think perhaps we should consider renaming them since 
they are no longer in fact used for driving any easy interface state machine.


They are however still used for a blocking-mode state machine: ftp_quit() is 
the only user of ftp_easy_statemach() since it (still) relies on a synchronous 
function to do the full request and response. The other protocols have similar 
uses.


--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


RE: [PATCH] always multi v5

2013-01-18 Thread Steve Holme
Hi all,

On Fri, 18 Jan 2013, Daniel Stenberg wrote:

  Excuse me if this is a rather daft question but I was wondering are 
  those functions, along with Curl_pp_easy_statemach() in pingpong.c, 
  still needed now or should we be fixing up the code to use the 
  _multi_statemach() functions instead?

 It is a fair question! I think perhaps we should consider renaming them
 since they are no longer in fact used for driving any easy interface state
 machine.

I agree - I think it would be less confusing for anyone reading the code to
know that the easy state machine doesn't exist anymore, but does anyone have
any strong opinion for not doing this?

Additionally, does anyone have any suggestions as to what we should name
these functions? Perhaps:

1) xxx_blocking_statemach() - For example ftp_blocking_statemach() and
pop3_blocking_statemach()
2) Curl_pp_blocking_statemach() instead of Curl_pp_easy_statemach()

 They are however still used for a blocking-mode state machine:
 ftp_quit() is the only user of ftp_easy_statemach() since it (still)
 relies on a synchronous function to do the full request and
 response. The other protocols have similar uses.

A quick review of the code reveals that:

* In ftp.c, ftp_quit() is the only user of ftp_easy_statemach()
* In imap.c, imap_logout() is the only user of imap_easy_statemach()
* In pop3.c, pop3_quit() is the only user of pop3_easy_statemach()
* smtp.c calls smtp_easy_statemach() twice - once in smtp_quit() and the
other in smtp_done() - note there is a huge TODO comment in the latter about
its usage!
* ftp_easy_statemach(), imap_easy_statemach(), pop3_easy_statemach() and
smtp_easy_statemach() are the only users of Curl_pp_blocking_statemach().

I'm happy to do the rename but I would appreciate a bit of guidance as to
whether or not we should 1) Do this at all and 2) Do this for the v7.29
release?

Kind Regards

Steve
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-18 Thread Jiří Hruška
Hi guys,

the best solution would be to eliminate all the blocking stuff, of
course. Then Curl_pp_easy_statemach() could be removed,
Curl_pp_multi_statemach() renamed to just Curl_pp_statemach() and the
PP-based protocols could not get stuck in an unexpected blocking call
anymore. But that would apparently need some deeper changes.

 * smtp.c calls smtp_easy_statemach() twice - once in smtp_quit() and the
 other in smtp_done() - note there is a huge TODO comment in the latter about
 its usage!
Please note a similar hack will be needed in IMAP for proper FETCH and
APPEND handling. For some protocols it is just necessary to send or
receive some data in the DONE phase, after the data has been
transferred - ssh.c contains a similar code and even the same TODO
note in ssh_done().
If this is resolved as the note suggests, e.g. by allowing the done
handler to return CURLE_AGAIN, these cases could be made fully
non-blocking. But there is the part with Curl_done() invokes on
several places in the code, which would need to know about this and
handle it right.

 * In ftp.c, ftp_quit() is the only user of ftp_easy_statemach()
 * In imap.c, imap_logout() is the only user of imap_easy_statemach()
 * In pop3.c, pop3_quit() is the only user of pop3_easy_statemach()
These logout-type functions are invoked from the disconnect handler
from Curl_disconnect(), which is called from various places and I'm
not sure there are provisions for asynchronous operation in all of
them. Perhaps a similar situation as with Curl_done() above.

So maybe the blocking state machine loop will have to stay after all...

1) xxx_blocking_statemach() - For example ftp_blocking_statemach()
What about xxx_statemach_loop()? If that's what it does, loops calling
the state machine again and again till it's finished.

Kind regards,
Jiri
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-18 Thread Nick Zitzmann

On Jan 17, 2013, at 1:49 PM, Daniel Stenberg dan...@haxx.se wrote:

 On Thu, 17 Jan 2013, Daniel Stenberg wrote:
 
 Great. I intend to merge my always-multi patch now. Or at least during this 
 evening. Thanks for all your help with this!
 
 And it has been pushed as commit c43127414d89ccb9.
 
 A repeated lengthier explanation of the work:
 
 http://daniel.haxx.se/blog/2013/01/17/internally-were-all-multi-now/


I tried running the test suite on this build. Test 254 always fails, and test 
255 causes the curl tool to stall indefinitely, blocking execution of later 
tests. Here's a sample of curl in that test:

Call graph:
2751 Thread_429482   DispatchQueue_1: com.apple.main-thread  (serial)
  2751 start  (in libdyld.dylib) + 1  [0x7fff936847e1]
2751 main  (in curl) + 142  [0x103ed8f6e]  tool_main.c:94
  2751 operate  (in curl) + 25325  [0x103edfc9d]  tool_operate.c:1356
2750 curl_easy_perform  (in curl) + 320  [0x103f28fd0]  easy.c:445
+ 2749 curl_multi_wait  (in curl) + 971  [0x103f34ddb]  multi.c:885
+ ! 2749 Curl_poll  (in curl) + 1474  [0x103f3f7c2]  select.c:473
+ !   2749 __select  (in libsystem_kernel.dylib) + 10  
[0x7fff8d4ad322]
+ 1 curl_multi_wait  (in curl) + 1021  [0x103f34e0d]  multi.c:889
+   1 curl_dofree  (in curl) + 185  [0x103f2e709]  memdebug.c:307
+ 1 curl_memlog  (in curl) + 422  [0x103f2ed46]  memdebug.c:433
+   1 curl_mvsnprintf  (in curl) + 86  [0x103f1e5b6]  
mprintf.c:1031
+ 1 dprintf_formatf  (in curl) + 205  [0x103f1cebd]  
mprintf.c:671
+   1 addbyter  (in curl) + 19  [0x103f1e4e3]  
mprintf.c:1009
1 curl_easy_perform  (in curl) + 352  [0x103f28ff0]  easy.c:448
  1 curl_multi_perform  (in curl) + 232  [0x103f36aa8]  multi.c:1708
1 multi_runsingle  (in curl) + 6710  [0x103f36886]  multi.c:1653
  1 Curl_pgrsUpdate  (in curl) + 3348  [0x103eed2d4]  
progress.c:449
1 curl_mfprintf  (in curl) + 374  [0x103f1f176]  
mprintf.c:1176
  1 dprintf_formatf  (in curl) + 3470  [0x103f1db7e]  
mprintf.c:877
1 fputc  (in libsystem_c.dylib) + 96  [0x7fff934efe72]
  1 __swbuf  (in libsystem_c.dylib) + 180  
[0x7fff934b2e02]
1 __sflush  (in libsystem_c.dylib) + 94  
[0x7fff934eead8]
  1 _swrite  (in libsystem_c.dylib) + 87  
[0x7fff934f3857]
1 __write_nocancel  (in libsystem_kernel.dylib) 
+ 10  [0x7fff8d4ad6fe]

Is there anything I can do to help resolve the issue, and if so, then how can I 
help?

Nick Zitzmann
http://www.chronosnet.com/


---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-17 Thread Yang Tse
Commit https://github.com/bagder/curl/commit/9fd88abb pushed now for
autobuild testing, solves the problems shown locally with test cases
1059 and 1316.

The patch has been tested without 'always multi' on top of commit
4ed6b07a, with test suite passing ok.

Additionally, patch tested with 'always multi'...

[1] 'always-multi-v5' applied on top of commit fea3a3d, plus...
[2] 'retry handling patch'

Test suite also passes ok. (Except 238 which now is handled differently [3]).

Currently I have no functional pending issues with the 'always multi'
modification.

Two remaining nits would be:

*) Extract into a different patch the 'non always multi' stuff,
modification/addition of:

 tests/Makefile.am
 tests/runtests.pl
 tests/valgrind.supp

*) lib/ftp.c lines 104 and 784 use DEBUGBUILD instead of CURLDEBUG,

Thanks,

[1] http://curl.haxx.se/mail/lib-2013-01/0108.html
[2] http://curl.haxx.se/mail/lib-2013-01/0170.html
[3] https://github.com/bagder/curl/commit/533c31b7
-- 
-=[Yang]=-
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-17 Thread Daniel Stenberg

On Thu, 17 Jan 2013, Yang Tse wrote:

Commit https://github.com/bagder/curl/commit/9fd88abb pushed now for 
autobuild testing, solves the problems shown locally with test cases 1059 
and 1316.


Currently I have no functional pending issues with the 'always multi' 
modification.


Great. I intend to merge my always-multi patch now. Or at least during this 
evening. Thanks for all your help with this!



Two remaining nits would be:

*) Extract into a different patch the 'non always multi' stuff,
modification/addition of:

tests/Makefile.am
tests/runtests.pl
tests/valgrind.supp


I thought about that but decided not to. Mostly because the valgrind.supp I 
made doesn't apply in this shape without the patch (due to the different code 
paths with and without the patch) and I don't think it is worth the effort to 
fix it first without the patch only to change it again when the patch is 
applied.



*) lib/ftp.c lines 104 and 784 use DEBUGBUILD instead of CURLDEBUG,


Ack, thanks. I'll fix this before I merge.

--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-17 Thread Daniel Stenberg

On Thu, 17 Jan 2013, Daniel Stenberg wrote:

Great. I intend to merge my always-multi patch now. Or at least during this 
evening. Thanks for all your help with this!


And it has been pushed as commit c43127414d89ccb9.

A repeated lengthier explanation of the work:

http://daniel.haxx.se/blog/2013/01/17/internally-were-all-multi-now/

--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-16 Thread Yang Tse
On Tue, Jan 15, 2013, Daniel Stenberg wrote:

 I run this and compare the traces and that tells Curl_proxyCONNECT() seems
 to not be working entirely correct for you, although I haven't been able to
 pinpoint why/how.

[...]

 To me it looks as if in your case it doesn't properly detect the end of the
 response (headers) so it continues to wait and then eventually the
 connection gets closed and it gets upset.

 Can you sprinkle some debug logs in there to see if you can figure it out?

Attached trace1059 shows that first call to Curl_proxyCONNECT()
returns CURLE_OK without having actually connected to the proxy,
awaiting to be called again. No problem here. socket is actually
connected but it has not detected activity yet.

The problem is in Curl_protocol_connect()

Curl_protocol_connect() calls Curl_proxy_connect() which is the one
calling Curl_proxyCONNECT() and getting a CURLE_OK with low level
transport connection established but without having received a single
bit from the proxy.

Afterwards Curl_protocol_connect() calls ftp_connect() the
protocol-specific connect() procedure and this is the one that is
reading the proxy CONNECT response. k!!

That's the reason why trace shows an FTP state change from STOP to
WAIT220 inmediately after first Curl_proxyCONNECT() that returns
CURLE_OK, and it is the FTP handler the one that reads the http proxy
501 response.

So it seems that somehow we should avoid calling the protocol specific
connect() in Curl_protocol_connect() until the proxy connection has
been fully established.

-- 
-=[Yang]=-


trace1059
Description: Binary data
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html

Re: [PATCH] always multi v5

2013-01-16 Thread Yang Tse
Daniel,

Attached patch (--no-prefix against current master) fixes the 1059 issue for me

I'm now running full 'always multi' test suite. Later on I'll also
apply patch to master without 'always multi' and run test suite. The
issue doesn't actually seem an 'always multi' one, but simply a bug
that thanks to 'always multi' has been uncovered.

Tomorrow I'll have the results for these tests.
-- 
-=[Yang]=-


url_c.patch
Description: Binary data
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html

Re: [PATCH] always multi v5

2013-01-16 Thread Yang Tse
On Wed, Jan 16, 2013, Yang Tse wrote:

 Attached patch (--no-prefix against current master) fixes the 1059 issue for 
 me

 I'm now running full 'always multi' test suite.

Buarghh. NTLM and SFTP tests fail like mad and others sigsegv :-( I've
killed the test suite. Something more involved is certainly needed.

-- 
-=[Yang]=-
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-15 Thread Daniel Stenberg

On Mon, 14 Jan 2013, Yang Tse wrote:

When trying to use this connection in multi.c:1430 with Curl_readwrite() it 
finds that the connection has already been disconnected. easy-result gets 
CURLE_RECV_ERROR in multi.c:1430 and 'done' is FALSE.


Thanks for that analysis. It brights much clarity!

The 'done' being FALSE is not good. If the retry is to be done, we must 
consider that request as done so we need to explicitly make it so. Please try 
the patch below and see if it makes anything better. It works fine for me with 
the patch applied and I believe it is the right approach.


And finally the big question. Do we really have to retry CURLM_STATE_PERFORM 
upon such conditions or should this be going to a previous state or aborting 
upon a given retry count or even at first failure?


Yes! It needs to change back to the CURLM_STATE_CONNECT state. That is what it 
will (should) do if the request is done and a retry is scheduled.


There's no need for a retry count since the failed connection will get closed 
in this procedure so it can't happen again on the same connection.


Here's the patch (made on top of v5):

--- a/lib/multi.c
+++ b/lib/multi.c
@@ -1450,9 +1450,12 @@ static CURLMcode multi_runsingle(struct Curl_multi 
*multi

 if(!ret)
   retry = (newurl)?TRUE:FALSE;

-if(retry)
-  /* if we are to retry, set the result to OK */
+if(retry) {
+  /* if we are to retry, set the result to OK and consider the
+ request as done */
   easy-result = CURLE_OK;
+  done = TRUE;
+}
   }

   if(easy-result) {


--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-15 Thread Daniel Stenberg

On Tue, 15 Jan 2013, Yang Tse wrote:


Remaining failures are test cases 238, 1316 and 1059.


I decided test 238 was not handled properly already in the existing code. I've 
now added a check that makes it bail earlier, already when receiving an 
illegal 229 response.


I'm on the 1059 issue now.

--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-15 Thread Daniel Stenberg

On Tue, 15 Jan 2013, Daniel Stenberg wrote:


I'm on the 1059 issue now.


I run this and compare the traces and that tells Curl_proxyCONNECT() seems to 
not be working entirely correct for you, although I haven't been able to 
pinpoint why/how.


Your trace says after the CONNECT response:

18:39:38.693556 = Recv header, 19 bytes (0x13)
: Content-Length: 0
18:39:38.693556 = Recv header, 2 bytes (0x2)
:
18:39:38.693556 == Info: 0xb5f1a0 is at send pipe head!
18:39:38.693556 == Info: STATE: CONNECT = WAITPROXYCONNECT handle 0xb82f68; 
line 1029 (connection #0)
18:39:38.713556 == Info: Multi mode finished polling for response from proxy 
CONNECT

18:39:38.713556 == Info: Proxy CONNECT aborted

While my log says:

23:07:24.111203 = Recv header, 19 bytes (0x13)
: Content-Length: 0
23:07:24.112317 = Recv header, 2 bytes (0x2)
:
23:07:24.118289 == Info: Received HTTP code 501 from proxy after CONNECT
23:07:24.129116 == Info: Closing connection 0

To me it looks as if in your case it doesn't properly detect the end of the 
response (headers) so it continues to wait and then eventually the connection 
gets closed and it gets upset.


Can you sprinkle some debug logs in there to see if you can figure it out?

--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-14 Thread Daniel Stenberg

On Fri, 11 Jan 2013, Yang Tse wrote:


00:25:55.571476 == Info: Recv failure: Connection reset by peer
00:25:55.571476 == Info: Connection died, retrying a fresh connect

... it doesn't look like this is truly trying a fresh connect, it seems to
mostly just _say_ it!


Yep, something could probably be improved.


This is meant to get trapped in lib/multi.c:1444 for the CURLM_STATE_PERFORM 
state, but clearly it isn't or it isn't acting correctly. Can you figure out 
if the if() matches and the action is wrong or if it never even invokes 
Curl_retry_request() there?


--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-14 Thread Yang Tse
On Mon, Jan 14, 2013, Daniel Stenberg wrote:
 On Fri, 11 Jan 2013, Yang Tse wrote:

 00:25:55.571476 == Info: Recv failure: Connection reset by peer
 00:25:55.571476 == Info: Connection died, retrying a fresh connect

 ... it doesn't look like this is truly trying a fresh connect, it seems
 to
 mostly just _say_ it!


 Yep, something could probably be improved.


 This is meant to get trapped in lib/multi.c:1444 for the CURLM_STATE_PERFORM
 state, but clearly it isn't or it isn't acting correctly. Can you figure out
 if the if() matches and the action is wrong or if it never even invokes
 Curl_retry_request() there?

In this case multi attempts to re-use an existing connection returned
by ConnectionExists() which happens to be alive when checked inside
ConnectionExists().

[Marker A]

When trying to use this connection in multi.c:1430 with
Curl_readwrite() it finds that the connection has already been
disconnected. easy-result gets CURLE_RECV_ERROR in multi.c:1430 and
'done' is FALSE.

Condition tested in multi.c:1444 is true and as such
Curl_retry_request() is called.

Curl_retry_request() call in multi.c:1449 returns 0 into 'ret'
variable and sets (newurl != NULL)

With above results, multi.c:1455 gets executed setting easy-result =
CURLE_OK. Making next line of code actually executed multi.c:1536.
Remaining in CURLM_STATE_PERFORM state.

Next call to multi_runsingle() makes it iterate again in
CURLM_STATE_PERFORM state, and as such, same as flow control and
results as described from [Marker A] above down to here are repeated
endlessly.

So the issue there is that the retrying logic there has to be somehow
controlled or used differently.

Adding in multi.c:1444 an additional condition such as  ( !
easy-easy_conn-bits.reuse) avoids the endless loop, but it subverts
the purpose of the 'retrying' (at least as described in lines 1445 to
1447).

Checking there if the socket is dead can also be done, but happens the
same as above.

Introducing some sort of retry control, count-based, time-based, or
both as we do in tool_operate.c lines 1362 to 1382 would limit the
number of iterations.

And finally the big question. Do we really have to retry
CURLM_STATE_PERFORM upon such conditions or should this be going to a
previous state or aborting upon a given retry count or even at first
failure?

-- 
-=[Yang]=-
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-11 Thread Daniel Stenberg

On Fri, 11 Jan 2013, Yang Tse wrote:


I'll study your trace, compare with mine and later on start to debug this.

Just wanted now to handle you all logs for test 64.


Can you first check and see if there's a time descripancy between the trace64 
log and the http_server.log ? I can't make sense of the times in there:


http server:

00:25:55.470998  Client connect
00:25:55.470998 TCP_NODELAY set
00:25:55.470998 accept_connection 3 returned 4
00:25:55.470998 accept_connection 3 returned 0
00:25:55.490998 Read 84 bytes
00:25:55.490998 Process 84 bytes request
00:25:55.490998 Got request: GET /64 HTTP/1.1

trace:

00:25:55.461476 == Info: About to connect() to 127.0.0.1 port 8990 (#0)
00:25:55.471476 == Info:   Trying 127.0.0.1...
00:25:55.491476 == Info: connected
00:25:55.491476 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#0)
00:25:55.491476 == Info: 0xb4f1a0 is at send pipe head!
00:25:55.491476 == Info: STATE: CONNECT = DO handle 0xb72f68; line 1025 
(connection #0)

00:25:55.491476 == Info: Server auth using Digest with user 'testuser'
00:25:55.491476 = Send header, 84 bytes (0x54)

See? The server claims it got the GET already before libcurl logged 
'connected' !


And then for the next sequence of interest to me:

server log:

00:25:55.510998 Response sent (245 bytes) and written to log/server.response
00:25:55.510998 instructed to close connection after server-reply
00:25:55.510998  Client disconnect 0

trace log:

00:25:55.511476 == Info: Connection #0 to host 127.0.0.1 left intact
00:25:55.511476 == Info: Issue another request to this URL: 
'http://127.0.0.1:8990/64'

00:25:55.511476 == Info: Found bundle for host 127.0.0.1: 0xb7c180
00:25:55.511476 == Info: Re-using existing connection! (#0) with host 
127.0.0.1

00:25:55.511476 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#0)

... which looks like SocketIsDead() didn't think the connection it tries to 
re-use is dead, but the time stamps indicate it should've seen it.


Then there's clearly a bug when things went south due to the dead connection 
being attempted for sending the request on:


00:25:55.571476 == Info: Recv failure: Connection reset by peer
00:25:55.571476 == Info: Connection died, retrying a fresh connect
00:25:55.571476 == Info: Recv failure: Connection reset by peer
00:25:55.571476 == Info: Connection died, retrying a fresh connect

... it doesn't look like this is truly trying a fresh connect, it seems to 
mostly just _say_ it!


--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-11 Thread Yang Tse
On Fri, Jan 11, 2013, Daniel Stenberg wrote:

 Can you first check and see if there's a time descripancy between the
 trace64 log and the http_server.log ? I can't make sense of the times in
 there:

These are the best kind of timestamps we can get on this kind of
'system', not too bad accuracy and not too bad resolution. But
certainly not good enough in order to timestamp with 'real'
microsecond resolution and accuracy each call, so trying to mentally
sync two logs with timestamps is just an approximation.

 http server:

 00:25:55.470998  Client connect
 00:25:55.470998 TCP_NODELAY set
 00:25:55.470998 accept_connection 3 returned 4
 00:25:55.470998 accept_connection 3 returned 0
 00:25:55.490998 Read 84 bytes
 00:25:55.490998 Process 84 bytes request
 00:25:55.490998 Got request: GET /64 HTTP/1.1

 trace:

 00:25:55.461476 == Info: About to connect() to 127.0.0.1 port 8990 (#0)
 00:25:55.471476 == Info:   Trying 127.0.0.1...
 00:25:55.491476 == Info: connected
 00:25:55.491476 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#0)
 00:25:55.491476 == Info: 0xb4f1a0 is at send pipe head!
 00:25:55.491476 == Info: STATE: CONNECT = DO handle 0xb72f68; line 1025
 (connection #0)
 00:25:55.491476 == Info: Server auth using Digest with user 'testuser'
 00:25:55.491476 = Send header, 84 bytes (0x54)

 See? The server claims it got the GET already before libcurl logged
 'connected' !

Yeah, and server claims it read the 84 bytes 0,000478 seconds before
curl logs that the 84 bytes have been sent. With this we know that
there's at least that offset between timestamps in both logs, although
other events might be apart even with double that offset.

Fun, Eh? ;-)

 And then for the next sequence of interest to me:

 server log:

 00:25:55.510998 Response sent (245 bytes) and written to log/server.response
 00:25:55.510998 instructed to close connection after server-reply
 00:25:55.510998  Client disconnect 0

 trace log:

 00:25:55.511476 == Info: Connection #0 to host 127.0.0.1 left intact
 00:25:55.511476 == Info: Issue another request to this URL:
 'http://127.0.0.1:8990/64'
 00:25:55.511476 == Info: Found bundle for host 127.0.0.1: 0xb7c180
 00:25:55.511476 == Info: Re-using existing connection! (#0) with host
 127.0.0.1
 00:25:55.511476 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#0)

 ... which looks like SocketIsDead() didn't think the connection it tries to
 re-use is dead, but the time stamps indicate it should've seen it.

(server log 00:25:55.510998) + 0.000478 = (trace.log 00:25:55.511476)

But lack of a better resolution, makes it impossible to truly tell
apart timestamps for:

 00:25:55.511476 == Info: Found bundle for host 127.0.0.1: 0xb7c180
 00:25:55.511476 == Info: Re-using existing connection! (#0) with host

I've been tracing SocketIsDead() and certainly when called in this
case it does not detect that the connection has been closed on the
other side, with a select timeout of zero. :-(

Given that all this is running on cygwin with its own implementation
of select on top of pthreads which calls underlying windows select
with whatever implementation it has on top of message passing and
windows 'threading model' on a non-realtime system, I consider
absolutely normal that in such conditions there exist race conditions
which makes it impossible to detect closing connection instantly.

As a matter of fact if select call in SocketIsDead() is given a
timeout of 60ms then test #64 succeeds, given that in that case it
detects that the connection has been closed.

I'll do more tests in order to find a better workaround that does not
involve using a timeout.

 Then there's clearly a bug when things went south due to the dead connection
 being attempted for sending the request on:

 00:25:55.571476 == Info: Recv failure: Connection reset by peer
 00:25:55.571476 == Info: Connection died, retrying a fresh connect
 00:25:55.571476 == Info: Recv failure: Connection reset by peer
 00:25:55.571476 == Info: Connection died, retrying a fresh connect

 ... it doesn't look like this is truly trying a fresh connect, it seems to
 mostly just _say_ it!

Yep, something could probably be improved. As I've mentioned,
increassing select timeout in SocketIsDead() to 60ms allows test #64
to pass, but on the other hand the timeout has to be increased to more
than 1000ms in order to make test #160 pass which makes the server
wait 1 second before closing the connection.

Even with all the bizarreness of this cygwin on windows testing
framework, multi should be capable of properly handling remote
disconnection at any point where network activity is taking place.

I'll keep digging on Monday.
-- 
-=[Yang]=-
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-10 Thread Yang Tse
Attached file logs_7_tests.tar.gz has test traces for test cases 64,
69, 90, 153, 160, 1079 and 1095. These http tests have in common that
http test server is instructed to close connection with 'swsclose',
while test case is doing more than one request.

Above seven tests needed kiil -9 to stop them, otherwise they seem to
run for ever, and trace files edited to remove trailing lines which
repeat.

Attached file logs_238.tar.gz has full logs for test case 238 an ftp
one. This also needs kill -9 in order to stop it, but trace238 and
memdump do not grow endlessly.

Attached file logs_1059.tar.gz has full logs for test case 1059. Given
that tis is an 'HTTP CONNECT with proxytunnel to unsupported FTP URL'
test, fixing server disconnection handling in 7 cases above most
probably will also fix this.

-- 
-=[Yang]=-


logs_7_tests.tar.gz
Description: GNU Zip compressed data


logs_238.tar.gz
Description: GNU Zip compressed data


logs_1059.tar.gz
Description: GNU Zip compressed data
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html

Re: [PATCH] always multi v5

2013-01-10 Thread Yang Tse
Daniel,

Some tests I've done...

Forced libcurl to build using select() instead of poll() to rule out
any problem in that area. No joy, exhibits same behavior.

Used http test server sws.c rolled back to commit d759a70d and also
back to commit 4d3fb91f. But it exhibits same behavior.

And, given that without the 'always multi' patch tests succeed, it has
to be something closely related, or a sws.c weakness it uncovers.

Thanks,
-- 
-=[Yang]=-
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-10 Thread Daniel Stenberg

On Thu, 10 Jan 2013, Yang Tse wrote:

Attached file logs_7_tests.tar.gz has test traces for test cases 64, 69, 90, 
153, 160, 1079 and 1095. These http tests have in common that http test 
server is instructed to close connection with 'swsclose', while test case is 
doing more than one request.


Thanks, this is helpful!

Lets focus on the single test 64 first. What's the exact test 
target/OS/version etc you're using to trigger this? Does it fail on all runs?


Also, can you show us the log/http_server.log after a run where the test 
fails?


Clearly when you run this test case it doesn't properly detect that the 
connection is dead when the second request is made and goes on, while in my 
end it detects the problem, creates a new connection and everything is fine. 
Compare to my log/trace64 piece:


22:34:20.834635 == Info: Ignoring the response-body
22:34:20.835136 = Recv data, 26 bytes (0x1a)
: This is not the real page.
22:34:20.850927 == Info: Connection #0 to host 127.0.0.1 left intact
22:34:20.853020 == Info: Issue another request to this URL: 
'http://127.0.0.1:8990/64'

22:34:20.855881 == Info: Found bundle for host 127.0.0.1: 0x780bf58
22:34:20.856924 == Info: Connection 0 seems to be dead!
22:34:20.859015 == Info: Closing connection 0
22:34:20.863244 == Info: The cache now contains 0 members
22:34:20.869979 == Info: About to connect() to 127.0.0.1 port 8990 (#1)
22:34:20.870115 == Info:   Trying 127.0.0.1...
22:34:20.870228 == Info: connected
22:34:20.870345 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#1)
22:34:20.870461 == Info: 0x76955f8 is at send pipe head!
22:34:20.871375 == Info: STATE: CONNECT = DO handle 0x77eb418; line 1025 
(connection #1)

22:34:20.892268 == Info: Server auth using Digest with user 'testuser'
22:34:20.893184 = Send header, 222 bytes (0xde)
: GET /64 HTTP/1.1
0012: Authorization: Digest username=testuser, realm=testrealm, no
0052: nce=1053604145, uri=/64, response=c55f7f30d83d774a3d2dcacf7
0092: 25abaca
009c: User-Agent: curl/7.28.2-DEV
00b9: Host: 127.0.0.1:8990
00cf: Accept: */*

... and the log/http_server.log shows this a few milliseconds before:

22:34:20.793753 Sent off 200 bytes
22:34:20.793770 Sent off 45 bytes
22:34:20.793784 Response sent (245 bytes) and written to log/server.response
22:34:20.793794 instructed to close connection after server-reply
22:34:20.793803  Client disconnect 0

Right now I'm leaning towards a problematic test case/test server since this 
is code that really isn't affected by my changes - I've just changed some 
timing and code flows, not the actual real socket/TCP handling.


--

 / daniel.haxx.se
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-10 Thread Yang Tse
On Thu, Jan 10, Daniel Stenberg wrote:

 Lets focus on the single test 64 first. What's the exact test
 target/OS/version etc you're using to trigger this? Does it fail on all
 runs?

cygwin 1.5.25 on w2k. Yes it fails in all runs.

 Also, can you show us the log/http_server.log after a run where the test
 fails?

I've just run it again to get a fresh and full set of logs. Attached
logs_64.tar.gz with them. I've let the test run for about 1 minute and
then killed the curl tool process from another shell.

Server seems to disconnect when instructed to do so.

I'll study your trace, compare with mine and later on start to debug this.

Just wanted now to handle you all logs for test 64.

-- 
-=[Yang]=-


logs_64.tar.gz
Description: GNU Zip compressed data
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html

Re: [PATCH] always multi v5

2013-01-09 Thread Yang Tse
On Wed, Jan 9, Daniel Stenberg wrote:

 On Wed, 9 Jan 2013, Yang Tse wrote:

 Nearly all test cases pass except for:

 64, 69, 90, 153, 160, 1079 and 1095.


 I think I would need details on these in order to do something. I've now run
 these tests MANY times here and none of them failed even once, with or
 without valgrind.

 I don't see any obvious pattern among these tests that I can start looking
 at either.

OK, I'll do some local digging, and report back when I have something
more specific.

-- 
-=[Yang]=-
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-08 Thread Yang Tse
On Tue, Jan 8, 2013, Daniel Stenberg wrote:

 I accidentally posted a version that wasn't the latest rebase yesterday, and
 with Steve Holme's POP3 TLS fix this morning (that required a little extra
 merge edit) I decided to post a follow-up version. Nothing else changed.

Hi Daniel,

I'm trying to test always-multi-v5.diff, but I'm having some problems...

First, I needed to dos2unix -U always-multi-v5.diff, otherwise no joy
at all applying it.

Probably it is gmail web interface acting funny, but given that the
patch affects test case definitions and by design some lines of these
include ^M line endings while others not it would be interesting if
you provided it gzip'ed to preserve these.

And second, even after mentioned above step, patch hunks relative to
tests/data/test* do not apply except for test1407, test506 and
test808.

-- 
-=[Yang]=-
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-08 Thread Yang Tse
On Tue, Jan 8, 2013, Daniel Stenberg wrote:

 Those line endings are a bit painful to handle in patches. Try the attached
 gzip version and see if it makes anything better!

Perfect!

Building now...
-- 
-=[Yang]=-
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: [PATCH] always multi v5

2013-01-08 Thread Yang Tse
Daniel,

Patch applies cleanly on top of fea3a3d5.

Builds successfully with:

Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s
rtsp scp sftp smtp smtps telnet tftp
Features: Debug TrackMemory IDN Largefile NTLM NTLM_WB SSL libz TLS-SRP

Nearly all test cases pass except for:

64, 69, 90, 153, 160, 1079 and 1095.

All seven listed above consume 100% cpu, with memdump and trace
growing at very fast rate.

Test case 238 hangs consuming no CPU and with fixed memdump and trace238 growth.

There was a glitch with test case 1059 which failed once, but no it
seems to pass with no problem

All other tests pass ok here.
-- 
-=[Yang]=-
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html