Re: [BUG?] git http connection reuse
On Tue, 18 Feb 2014, Jeff King wrote: I'm not clear whether you mean by this that it is _expected_ in my test program for curl not to reuse the connection. Or that curl may simply have to do a little more work, and it is still a bug that the connection is not reused. Okey, I checked this closer now and this is the full explanation to what happens. It seems to work as intended: It's all about where the connection cache is held by libcurl. When you create a multi handle, it will create a connection cache that will automatically be shared by all easy handles that are added to it. If you create an easy handle and make a curl_easy_perform() on that, it will create its own connection cache and keep it associated with this easy handle. When first using an easy handle within a multi handle it will use the shared connection cache in there as long as it is in that multi handle family, but as soon as you remove it from there it will be detached from that connection cache. Then, when doing a fresh request with easy_perform using the handle that was detached from the multi handle, it will create and use its own private cache as it can't re-use the previous connection that is cached within the multi handle. We can certainly teach git to use the multi interface, even when doing a single blocking request. For connection re-use purposes, that may make a lot of sense. -- / daniel.haxx.se -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] git http connection reuse
On Tue, 18 Feb 2014, Jeff King wrote: I think there is still an unrelated issue with curl_multi preventing connection reuse, but I'm not sure from what you say below... I'm not clear whether you mean by this that it is _expected_ in my test program for curl not to reuse the connection. Or that curl may simply have to do a little more work, and it is still a bug that the connection is not reused. Argh, sorry. I thought were still referring to the previous problem. I can indeed repeat the problem you talk about with your test code. Thanks! I'll get back to you. -- / daniel.haxx.se -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] git http connection reuse
On Tue, Feb 18, 2014 at 08:13:16AM +0100, Daniel Stenberg wrote: > >Right; I'd expect multiple connections for parallel requests, but > >in this case we are completing the first and removing the handle > >before starting the second. Digging further, I was able to > >reproduce the behavior with a simple program: > > Yeah, given your description I had no problems to repeat it either. > Turns out we had no decent test case that checked for this so in our > eagerness to fix a security problem involving "over-reuse" we broke > this simpler reuse case. Two steps forward, one step backward... :-/ You are talking about the NTLM fix here, right? I think there is still an unrelated issue with curl_multi preventing connection reuse, but I'm not sure from what you say below... > >Does that apply to the handle after it has finished its transaction > >and been removed from the multi object (in which case git is Doing > >It Wrong)? > > No it doesn't. The man page should probably be clarified to express > that slightly better. It just means that _while_ a handle is added to > a multi handle it cannot be used with curl_easy_perform(). Thanks for the clarification. That was my guess from reading it, but given that it wasn't working as expected, I wondered if we were violating the rules. I saw the documentation fix you just pushed; it looks reasonable to me. > Several internal caches are kept in the multi handle when that's used > though, so when getting the easy handle out after having used it with > the multi interface and then using it with the easy interface may > cause libcurl to do a little more work than if you would be able to > re-add it to the same multi handle do to the operation with that... I'm not clear whether you mean by this that it is _expected_ in my test program for curl not to reuse the connection. Or that curl may simply have to do a little more work, and it is still a bug that the connection is not reused. We can certainly teach git to use the multi interface, even when doing a single blocking request. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] git http connection reuse
On Mon, 17 Feb 2014, Jeff King wrote: Right; I'd expect multiple connections for parallel requests, but in this case we are completing the first and removing the handle before starting the second. Digging further, I was able to reproduce the behavior with a simple program: Yeah, given your description I had no problems to repeat it either. Turns out we had no decent test case that checked for this so in our eagerness to fix a security problem involving "over-reuse" we broke this simpler reuse case. Two steps forward, one step backward... :-/ The manpage for curl_multi_add_handle does say: When an easy handle has been added to a multi stack, you can not and you must not use curl_easy_perform(3) on that handle! Does that apply to the handle after it has finished its transaction and been removed from the multi object (in which case git is Doing It Wrong)? No it doesn't. The man page should probably be clarified to express that slightly better. It just means that _while_ a handle is added to a multi handle it cannot be used with curl_easy_perform(). So yes, you can remove indeed it from the handle and then do curl_easy_perform(). It works just fine! Several internal caches are kept in the multi handle when that's used though, so when getting the easy handle out after having used it with the multi interface and then using it with the easy interface may cause libcurl to do a little more work than if you would be able to re-add it to the same multi handle do to the operation with that... -- / daniel.haxx.se -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] git http connection reuse
On Sun, Feb 16, 2014 at 01:18:52PM +0100, Daniel Stenberg wrote: > On Sat, 15 Feb 2014, Kyle J. McKay wrote: > > >If pipelining is off (the default) and total connections is not 1 > >it sounds to me from the description above that the requests will > >be executed on separate connections until the maximum number of > >connections is in use and then there might be some reuse. > > Not exactly. When about to do a request, libcurl will always try to > find an existing idle but open connection in its connection pool to > re-use. With the multi interface you can of course easily start N > requests at once to the same host and then they'll only re-use > connections to the extent there are connections to pick, otherwise > it'll create new connections. Right; I'd expect multiple connections for parallel requests, but in this case we are completing the first and removing the handle before starting the second. Digging further, I was able to reproduce the behavior with a simple program: -- >8 -- #include #include #include int main(int argc, char **argv) { CURL *curl = curl_easy_init(); CURLM *multi = curl_multi_init(); int want_multi = atoi(argv[2]); int i; curl_easy_setopt(curl, CURLOPT_URL, argv[1]); curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L); for (i = 0; i < 2; i++) { if (i < want_multi) { int nr; curl_multi_add_handle(multi, curl); do { curl_multi_perform(multi, &nr); } while (nr); curl_multi_remove_handle(multi, curl); } else curl_easy_perform(curl); } return 0; } -- 8< -- The program just requests the same URL twice using the same curl handle, optionally using the multi interface for zero, one, or both of the requests. For "0" and "2" (i.e., either both curl_easy or both curl_multi), the connection is reused. But for "1" (in which we use multi for the first request, but easy for the second), we do not reuse the connection. The manpage for curl_multi_add_handle does say: When an easy handle has been added to a multi stack, you can not and you must not use curl_easy_perform(3) on that handle! Does that apply to the handle after it has finished its transaction and been removed from the multi object (in which case git is Doing It Wrong)? Is the program above failing to take some step to finalize the first request, so that the second one knows its connection can be reused? Or should curl be handling this? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] git http connection reuse
On Sun, Feb 16, 2014 at 02:33:06PM +0100, Daniel Stenberg wrote: > On Sun, 16 Feb 2014, Daniel Stenberg wrote: > > >Right, the problem is there to make sure that a NTLM-auth > >connection with different credentials aren't re-used. NTLM with its > >connection-oriented authentication breaks the traditional HTTP > >paradigms and before this change there was a risk that libcurl > >would wrongly re-use a NTLM connection that was done with different > >credentials! > > > >I suspect we introduced a regression here with that fix. I'll dig into this. > > Thanks for pointing this out! > > I could repeat this problem with a curl test case so I wrote up two, > and then I made a fix to curl: > > https://github.com/bagder/curl/commit/d765099813f5 Thanks for the quick turnaround. I've confirmed that your patch fixes both my limited test case and Git itself (when using just the curl_easy interface). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] git http connection reuse
On Sun, 16 Feb 2014, Daniel Stenberg wrote: Right, the problem is there to make sure that a NTLM-auth connection with different credentials aren't re-used. NTLM with its connection-oriented authentication breaks the traditional HTTP paradigms and before this change there was a risk that libcurl would wrongly re-use a NTLM connection that was done with different credentials! I suspect we introduced a regression here with that fix. I'll dig into this. Thanks for pointing this out! I could repeat this problem with a curl test case so I wrote up two, and then I made a fix to curl: https://github.com/bagder/curl/commit/d765099813f5 -- / daniel.haxx.se -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] git http connection reuse
On Sat, 15 Feb 2014, Kyle J. McKay wrote: If pipelining is off (the default) and total connections is not 1 it sounds to me from the description above that the requests will be executed on separate connections until the maximum number of connections is in use and then there might be some reuse. Not exactly. When about to do a request, libcurl will always try to find an existing idle but open connection in its connection pool to re-use. With the multi interface you can of course easily start N requests at once to the same host and then they'll only re-use connections to the extent there are connections to pick, otherwise it'll create new connections. Daniel Stenberg (7 Jan 2014) - ConnectionExists: fix NTLM check for new connection Looks like you're just lucky as that above change first appears in 7.35.0. But it seems there are some patches for older versions so they might be affected as well [2]. Right, the problem is there to make sure that a NTLM-auth connection with different credentials aren't re-used. NTLM with its connection-oriented authentication breaks the traditional HTTP paradigms and before this change there was a risk that libcurl would wrongly re-use a NTLM connection that was done with different credentials! I suspect we introduced a regression here with that fix. I'll dig into this. -- / daniel.haxx.se -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] git http connection reuse
On Feb 15, 2014, at 20:05, Jeff King wrote: I've noticed that git does not reuse http connections when fetching, and I'm trying to figure out why. It seems related to our use of two curl features: 1. If we use the curl_multi interface (even though we are doing the requests sequentially), we do not get reuse. Take a look at the curl_multi_setopt page [1]. I think these explain the curl_multi issue: CURLMOPT_PIPELINING Pass a long set to 1 to enable or 0 to disable. Enabling pipelining on a multi handle will make it attempt to perform HTTP Pipelining as far as possible for transfers using this handle. This means that if you add a second request that can use an already existing connection, the second request will be "piped" on the same connection rather than being executed in parallel. (Added in 7.16.0) CURLMOPT_MAX_TOTAL_CONNECTIONS Pass a long. The set number will be used as the maximum amount of simultaneously open connections in total. For each new session, libcurl will open a new connection up to the limit set by CURLMOPT_MAX_TOTAL_CONNECTIONS. When the limit is reached, the sessions will be pending until there are available connections. If CURLMOPT_PIPELINING is 1, libcurl will try to pipeline if the host is capable of it. The default value is 0, which means that there is no limit. However, for backwards compatibility, setting it to 0 when CURLMOPT_PIPELINING is 1 will not be treated as unlimited. Instead it will open only 1 connection and try to pipeline on it. (Added in 7.30.0) If pipelining is off (the default) and total connections is not 1 it sounds to me from the description above that the requests will be executed on separate connections until the maximum number of connections is in use and then there might be some reuse. That might not be what's actually going on with multi, but that's my guess and I think setting CURLMOPT_PIPELINING to to 1 and then also setting CURLMOPT_MAX_TOTAL_CONNECTIONS to a non-zero value might be what you want. 2. If we set CURLOPT_HTTPAUTH to CURLAUTH_ANY, we do not get reuse. [snip] My curl version is 7.35.0, if that makes a difference. And that is explained by this from the CHANGES file: Daniel Stenberg (7 Jan 2014) - ConnectionExists: fix NTLM check for new connection When the requested authentication bitmask includes NTLM, we cannot re-use a connection for another username/password as we then risk re-using NTLM (connection-based auth). This has the unfortunate downside that if you include NTLM as a possible auth, you cannot re-use connections for other usernames/passwords even if NTLM doesn't end up the auth type used. Reported-by: Paras S Patched-by: Paras S Bug: http://curl.haxx.se/mail/lib-2014-01/0046.html Looks like you're just lucky as that above change first appears in 7.35.0. But it seems there are some patches for older versions so they might be affected as well [2]. Adjusting your test program to leave the first connection set to CURLAUTH_ANY, but then setting the second one to CURLAUTH_ANY with the two NTLM bits turned off (CURLAUTH_NTLM and CURLAUTH_NTLM_WB) results in the connection being reused for me. :) With the older cURL versions I already had installed, the second connection is always reused. I didn't see the non-reuse with your sample code until I fetched 7.35.0. --Kyle [1] http://curl.haxx.se/libcurl/c/curl_multi_setopt.html [2] http://curl.haxx.se/docs/adv_20140129.html -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html