Re: [PATCH] http.c: don't rewrite the user:passwd string multiple times

2013-06-19 Thread Daniel Stenberg

On Tue, 18 Jun 2013, Jeff King wrote:

>> But, I don't know if there is any multi-processing happening within the
>> curl library.


> I don't think curl does any threading; when we are not inside
> curl_*_perform, there is no curl code running at all (Daniel can correct me
> if I'm wrong on that).


Correct, that's true. The default setup of libcurl never uses any threading at 
all, everything is done using non-blocking calls and state-machines.


There's but a minor exception, so let me describe that case just to be 
perfectly clear:


When you've built libcurl with the threaded resolver backend, libcurl fires
up a new thread to resolve host names during the name resolving phase of
a transfer, and that thread can then actually continue to run when
curl_multi_perform() returns.


That's however very isolated, strictly only for name resolving and there should 
be no way for an application to mess that up. Nothing of what you've discussed 
in this thread would affect or harm that thread. The biggest impact it tends 
to have on applications (that aren't following the API properly or assume a 
little too much) is that it changes the nature of what file descriptors to 
wait for slightly during the name resolve phase.


Some Linux distros ship their default libcurl builds using the threaded 
resolver.


--

 / daniel.haxx.se
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] http.c: don't rewrite the user:passwd string multiple times

2013-06-18 Thread Daniel Stenberg

On Tue, 18 Jun 2013, Jeff King wrote:

TL;DR: I'm just confirming what's said here! =)


> My understanding of curl's pointer requirements is:
>
>  1. Older versions of curl (and I do not recall which version off-hand,
>     but it is not important) stored just the pointer. Calling code was
>     required to manage the string lifetime itself.
>
>  2. Newer versions of curl will strdup the string in curl_easy_setopt.


That's correct. This new behavior in (2) was introduced in libcurl 7.17.0,
released in September 2007, so older versions should be fairly rare by now.
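
The difference between (1) and (2) can be illustrated without libcurl at all. The following is only a sketch: `struct fake_handle`, `setopt_store_pointer()` and `setopt_copy_string()` are hypothetical stand-ins invented for illustration, not libcurl's internals or API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a curl handle; not libcurl's real layout. */
struct fake_handle {
	const char *userpwd; /* string the "library" will read later */
	char *owned;         /* non-NULL only when the library made a copy */
};

/* Pre-7.17.0 style: remember the caller's pointer, nothing more. */
static void setopt_store_pointer(struct fake_handle *h, const char *s)
{
	h->userpwd = s;
}

/* 7.17.0+ style: take a private copy, so the caller's buffer may change. */
static int setopt_copy_string(struct fake_handle *h, const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);

	if (!copy)
		return -1;
	memcpy(copy, s, n);
	free(h->owned);
	h->owned = copy;
	h->userpwd = copy;
	return 0;
}

/* Free the private copy, if any. */
static void fake_handle_release(struct fake_handle *h)
{
	free(h->owned);
	h->owned = NULL;
	h->userpwd = NULL;
}
```

If the caller overwrites its buffer after both calls, the pointer-storing handle sees the new bytes while the copying handle still sees the old ones. That is why only the older versions need the analysis in this thread.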


I mention this primarily because I think it should be noted that there will
probably be very little testing by users with such old libcurl versions. It
may increase the time between a change being committed and people noticing
breakages caused by it. Even Debian old-stable has a much newer version.


> For older versions, if we were to grow the strbuf, we might free() the
> pointer provided to an earlier call to curl_easy_setopt. But since we are
> about to call curl_easy_setopt with the new value, I would assume that curl
> will never actually look at the old one (i.e., when replacing an old
> pointer, it would not dereference it, but simply overwrite it with the new
> value).


Another accurate description.

--

 / daniel.haxx.se


Re: [PATCH] http.c: don't rewrite the user:passwd string multiple times

2013-06-18 Thread Junio C Hamano
Daniel Stenberg dan...@haxx.se writes:

 On Tue, 18 Jun 2013, Jeff King wrote:

 TL;DR: I'm just confirming what's said here! =)

Thanks.  We are very fortunate to have you as the cURL guru who
gives prompt responses and sanity checks to us.


Re: [PATCH] http.c: don't rewrite the user:passwd string multiple times

2013-06-18 Thread Brandon Casey
On Mon, Jun 17, 2013 at 10:19 PM, Jeff King p...@peff.net wrote:
 On Mon, Jun 17, 2013 at 07:00:40PM -0700, Brandon Casey wrote:

 Curl requires that we manage any strings that we pass to it as pointers.
 So, we should not be overwriting this strbuf after we've passed it to
 curl.

 My understanding of curl's pointer requirements is:

   1. Older versions of curl (and I do not recall which version off-hand,
  but it is not important) stored just the pointer. Calling code was
  required to manage the string lifetime itself.

Daniel mentions that the change happened in libcurl 7.17.  RHEL 4.X
(yes, ancient, dead, I realize) provides 7.12 and RHEL 5.X (yes,
ancient, but still widely in use) provides 7.15.  Just pointing it
out.

   2. Newer versions of curl will strdup the string in curl_easy_setopt.

 So we do not have to worry about newer versions, as they do not care
 about our pointer after curl_easy_setopt returns.

I was probably reading the docs on one of these older platforms when I
wrote this.  I've actually had this patch sitting around for a while.

 For older versions, if we were to grow the strbuf, we might free() the
 pointer provided to an earlier call to curl_easy_setopt. But since we
 are about to call curl_easy_setopt with the new value, I would assume
 that curl will never actually look at the old one (i.e., when replacing
 an old pointer, it would not dereference it, but simply overwrite it
 with the new value).

 So for a single curl handle, I don't think it is a problem.

 It could be a problem when we have multiple handles in play
 simultaneously (we invalidate the pointer that another simultaneous
 handle is using, but do not immediately reset its pointer).

Don't we have multiple handles in play at the same time?  What's going
on in get_active_slot() when USE_CURL_MULTI is defined?  It appears to
be maintaining a list of slots, each with its own curl handle
initialized either by curl_easy_duphandle() or get_curl_handle().

So, yeah, this is what I was referring to when I mentioned
"potentially dangerous".  Since the current code does not change the
size of the string, the pointer will never change, so we won't ever
invalidate a pointer that another handle is using.
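
A minimal sketch of why a same-size rewrite is harmless while growth is not. The `minibuf` type below is hypothetical and only loosely modelled on the idea of a strbuf; it is not git's implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; git's real strbuf differs in detail. */
struct minibuf {
	char *buf;
	size_t len;
	size_t alloc;
};

/* Replace the contents with s, growing the allocation only when needed. */
static int minibuf_set(struct minibuf *mb, const char *s)
{
	size_t need = strlen(s) + 1;

	if (need > mb->alloc) {
		/* realloc() may return a new address; old pointers go stale. */
		char *p = realloc(mb->buf, need);

		if (!p)
			return -1;
		mb->buf = p;
		mb->alloc = need;
	}
	memcpy(mb->buf, s, need);
	mb->len = need - 1;
	return 0;
}

/* Free the storage and return the buffer to its empty state. */
static void minibuf_release(struct minibuf *mb)
{
	free(mb->buf);
	mb->buf = NULL;
	mb->len = mb->alloc = 0;
}
```

Setting the same string twice never enters the realloc() branch, so a handle that saved `mb.buf` earlier still points at valid memory. Setting a longer string may move the storage, and any handle still holding the old address would then read freed memory.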

The other thing I thought was potentially dangerous, was just
truncating the string.  Again, if there are multiple curl handles in
use (which I thought was a possibility), then merely truncating the
string that contains the username/password could potentially cause a
problem for another handle that could be in the middle of
authenticating using the string.  But, I don't know if there is any
multi-processing happening within the curl library.

snip

Snip the remaining comments about allowing the user to specify
multiple passwords since I'm not sure they're relevant if we are
indeed using multiple curl handles.

If we _don't_ ever use multiple curl handles, and/or if there is no
threading going on in the background within libcurl, then I don't
think there is really any danger in what the current code does.  It
would just be an issue of needlessly rewriting the same string over
and over again, which is probably not a big deal depending on how
often that happens.

-Brandon


Re: [PATCH] http.c: don't rewrite the user:passwd string multiple times

2013-06-18 Thread Jeff King
On Tue, Jun 18, 2013 at 12:29:03PM -0700, Brandon Casey wrote:

1. Older versions of curl (and I do not recall which version off-hand,
   but it is not important) stored just the pointer. Calling code was
   required to manage the string lifetime itself.
 
 Daniel mentions that the change happened in libcurl 7.17.  RHEL 4.X
 (yes, ancient, dead, I realize) provides 7.12 and RHEL 5.X (yes,
 ancient, but still widely in use) provides 7.15.  Just pointing it
 out.

Yeah, I didn't mean to imply we don't care about these versions, only
that our analysis is different between the two sets. We have #ifdefs for
curl going back to 7.7.4. That's probably excessive, but AFAIK, we would
still work with such old versions.
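
For reference, the two sets can be told apart by version number. libcurl encodes its version as 0xXXYYZZ in LIBCURL_VERSION_NUM, so 7.17.0 is 0x071100. The sketch below reproduces that encoding with a hypothetical `VERNUM()` macro and helper so that it compiles without the curl headers.

```c
/* Same 0xXXYYZZ layout as LIBCURL_VERSION_NUM; VERNUM itself is made up. */
#define VERNUM(major, minor, patch) \
	(((unsigned long)(major) << 16) | \
	 ((unsigned long)(minor) << 8) | \
	 (unsigned long)(patch))

/* 7.17.0 is the release named in this thread as the first to copy strings. */
#define COPIES_STRINGS_SINCE VERNUM(7, 17, 0)

/* Return 1 if a libcurl of this version copies string options on setopt. */
static int curl_copies_strings(unsigned long vernum)
{
	return vernum >= COPIES_STRINGS_SINCE;
}
```

In real code the same test would presumably be written as `#if LIBCURL_VERSION_NUM >= 0x071100` after including <curl/curl.h>. By this measure the 7.12 and 7.15 versions mentioned for RHEL 4.X and 5.X both fall in the "stores just the pointer" set.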

  It could be a problem when we have multiple handles in play
  simultaneously (we invalidate the pointer that another simultaneous
  handle is using, but do not immediately reset its pointer).
 
 Don't we have multiple handles in play at the same time?  What's going
 on in get_active_slot() when USE_CURL_MULTI is defined?  It appears to
 be maintaining a list of slots, each with its own curl handle
 initialized either by curl_easy_duphandle() or get_curl_handle().

Yes, we do; the dumb http walker will pipeline loose pack and object
requests (which makes a big difference when fetching small files). The
smart http code may use the curl-multi interface under the hood, but it
should only have a single handle, and the use of the multi interface is
just for sharing code with the dumb fetch.

 So, yeah, this is what I was referring to when I mentioned
 "potentially dangerous".  Since the current code does not change the
 size of the string, the pointer will never change, so we won't ever
 invalidate a pointer that another handle is using.

Agreed. I did not so much mean to dispute your "potentially dangerous"
claim as clarify exactly what the potential is. :)

 The other thing I thought was potentially dangerous, was just
 truncating the string.  Again, if there are multiple curl handles in
 use (which I thought was a possibility), then merely truncating the
 string that contains the username/password could potentially cause a
 problem for another handle that could be in the middle of
 authenticating using the string.  But, I don't know if there is any
 multi-processing happening within the curl library.

I don't think curl does any threading; when we are not inside
curl_*_perform, there is no curl code running at all (Daniel can correct
me if I'm wrong on that).

So I think from curl's perspective a truncation and exact rewrite is
atomic, and it sees only the final content.  I don't know what would
happen if you truncated and put in _different_ contents. For example, if
curl would have written out half of the username/password, blocked and
returned from curl_multi_perform, then you update the buffer, then it
resumes writing.

IOW, I believe the current code is safe (though in a very subtle way),
but if you were to allow password update, I'm not sure if it would be or
not (and if not, you would need a per-handle buffer to make it safe).
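
A rough sketch of what a per-handle buffer could look like, assuming each slot owns its own copy of the credentials. `struct auth_slot` and its helpers are hypothetical; git's real slot structure holds much more and is not shown here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-request slot holding only the credential string. */
struct auth_slot {
	char *userpwd; /* owned by this slot alone */
};

/* Give the slot its own "user:password" string, replacing any earlier one. */
static int auth_slot_set(struct auth_slot *slot,
			 const char *user, const char *pass)
{
	int n = snprintf(NULL, 0, "%s:%s", user, pass);
	char *fresh;

	if (n < 0)
		return -1;
	fresh = malloc((size_t)n + 1);
	if (!fresh)
		return -1;
	snprintf(fresh, (size_t)n + 1, "%s:%s", user, pass);
	free(slot->userpwd);
	slot->userpwd = fresh;
	return 0;
}

/* Free the slot's credential string. */
static void auth_slot_release(struct auth_slot *slot)
{
	free(slot->userpwd);
	slot->userpwd = NULL;
}
```

Updating one slot leaves every other slot's string untouched, so a handle in the middle of a request never sees its credentials change. With a libcurl that stores only the pointer, the caller would still have to pass the new `slot->userpwd` to curl_easy_setopt on that slot's handle before the next transfer, since the old string is freed.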

I'm fine with making the safety less subtle (e.g., your patch, with a
comment added).

 If we _don't_ ever use multiple curl handles, and/or if there is no
 threading going on in the background within libcurl, then I don't
 think there is really any danger in what the current code does.  It
 would just be an issue of needlessly rewriting the same string over
 and over again, which is probably not a big deal depending on how
 often that happens.

It should be once per http request. But copying a dozen bytes is
probably nothing compared to the actual request.


[PATCH] http.c: don't rewrite the user:passwd string multiple times

2013-06-17 Thread Brandon Casey
From: Brandon Casey draf...@gmail.com

Curl requires that we manage any strings that we pass to it as pointers.
So, we should not be overwriting this strbuf after we've passed it to
curl.

Additionally, it is unnecessary since we only prompt for the user name
and password once, so we end up overwriting the strbuf with the same
sequence of characters each time.  This is why in practice it has not
caused any problems for git's use of curl; the internal strbuf char
pointer does not change, and get's overwritten with the same string
each time.

But it's unnecessary and potentially dangerous, so let's avoid it.

Signed-off-by: Brandon Casey draf...@gmail.com
---
 http.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/http.c b/http.c
index 92aba59..6828269 100644
--- a/http.c
+++ b/http.c
@@ -228,8 +228,8 @@ static void init_curl_http_auth(CURL *result)
 #else
 	{
 		static struct strbuf up = STRBUF_INIT;
-		strbuf_reset(&up);
-		strbuf_addf(&up, "%s:%s",
+		if (!up.len)
+			strbuf_addf(&up, "%s:%s",
 			    http_auth.username, http_auth.password);
 		curl_easy_setopt(result, CURLOPT_USERPWD, up.buf);
 	}
-- 
1.8.3.1.440.gc2bf105



Re: [PATCH] http.c: don't rewrite the user:passwd string multiple times

2013-06-17 Thread Eric Sunshine
On Mon, Jun 17, 2013 at 10:00 PM, Brandon Casey bca...@nvidia.com wrote:
 From: Brandon Casey draf...@gmail.com

 Curl requires that we manage any strings that we pass to it as pointers.
 So, we should not be overwriting this strbuf after we've passed it to
 curl.

 Additionally, it is unnecessary since we only prompt for the user name
 and password once, so we end up overwriting the strbuf with the same
 sequence of characters each time.  This is why in practice it has not
 caused any problems for git's use of curl; the internal strbuf char
 pointer does not change, and get's overwritten with the same string

s/get's/gets/

 each time.

 But it's unnecessary and potentially dangerous, so let's avoid it.

 Signed-off-by: Brandon Casey draf...@gmail.com


Re: [PATCH] http.c: don't rewrite the user:passwd string multiple times

2013-06-17 Thread Jeff King
On Mon, Jun 17, 2013 at 07:00:40PM -0700, Brandon Casey wrote:

 Curl requires that we manage any strings that we pass to it as pointers.
 So, we should not be overwriting this strbuf after we've passed it to
 curl.

My understanding of curl's pointer requirements is:

  1. Older versions of curl (and I do not recall which version off-hand,
 but it is not important) stored just the pointer. Calling code was
 required to manage the string lifetime itself.

  2. Newer versions of curl will strdup the string in curl_easy_setopt.

So we do not have to worry about newer versions, as they do not care
about our pointer after curl_easy_setopt returns.

For older versions, if we were to grow the strbuf, we might free() the
pointer provided to an earlier call to curl_easy_setopt. But since we
are about to call curl_easy_setopt with the new value, I would assume
that curl will never actually look at the old one (i.e., when replacing
an old pointer, it would not dereference it, but simply overwrite it
with the new value).

So for a single curl handle, I don't think it is a problem.

It could be a problem when we have multiple handles in play
simultaneously (we invalidate the pointer that another simultaneous
handle is using, but do not immediately reset its pointer).

 Additionally, it is unnecessary since we only prompt for the user name
 and password once, so we end up overwriting the strbuf with the same
 sequence of characters each time.  This is why in practice it has not
 caused any problems for git's use of curl; the internal strbuf char
 pointer does not change, and get's overwritten with the same string
 each time.

In the current code, yes, we only do this once (and if we have a
username/password from the URL, we do not re-prompt if that fails).

 diff --git a/http.c b/http.c
 index 92aba59..6828269 100644
 --- a/http.c
 +++ b/http.c
 @@ -228,8 +228,8 @@ static void init_curl_http_auth(CURL *result)
  #else
  	{
  		static struct strbuf up = STRBUF_INIT;
 -		strbuf_reset(&up);
 -		strbuf_addf(&up, "%s:%s",
 +		if (!up.len)
 +			strbuf_addf(&up, "%s:%s",
  			    http_auth.username, http_auth.password);
  		curl_easy_setopt(result, CURLOPT_USERPWD, up.buf);

This is correct for the current code because of the reasoning above.
I'm slightly negative on this only because it feels like we are setting
a trap for somebody who later wants to do:

  for (sanity = 0; sanity < 5; sanity++) {
          int ret = http_request(...);
          if (ret != HTTP_REAUTH)
                  return ret;
  }

to give the user a few chances to input.  We would continue to update
the credential struct but never actually give the new value to curl.
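
The trap can be reproduced in isolation. This is only a sketch: `struct fake_auth` and `userpwd_once()` are hypothetical, and a fixed-size array stands in for the strbuf to keep the example short.

```c
#include <stdio.h>

/* Hypothetical credential struct; field names mirror those in the patch. */
struct fake_auth {
	const char *username;
	const char *password;
};

/* Build "user:password" on the first call only, as the patched code does. */
static const char *userpwd_once(const struct fake_auth *auth)
{
	static char up[256]; /* zero-initialized, so empty on first entry */

	if (!up[0])
		snprintf(up, sizeof(up), "%s:%s",
			 auth->username, auth->password);
	return up;
}
```

After the first call, changing `auth.password` has no effect on the returned string, so a retry loop like the one above would keep sending the stale password.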

Another option would be to just use a static fixed-size buffer. That
removes all memory management issues.
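
A minimal sketch of the fixed-size variant, assuming a made-up limit of 512 bytes and a hypothetical helper name `format_userpwd()`. Rejecting oversized input rather than truncating it is a choice made for this sketch, not something decided in this thread.

```c
#include <stdio.h>

#define USERPWD_MAX 512 /* arbitrary limit chosen for this sketch */

/*
 * Format "user:password" into one static buffer and return it.
 * Returns NULL if the result would not fit, rather than truncating.
 */
static const char *format_userpwd(const char *user, const char *pass)
{
	static char up[USERPWD_MAX];
	int n = snprintf(up, sizeof(up), "%s:%s", user, pass);

	if (n < 0 || (size_t)n >= sizeof(up)) {
		up[0] = '\0';
		return NULL;
	}
	return up;
}
```

The address handed to curl never changes, so no handle can end up with a dangling pointer, and new credentials are picked up on every call. The contents are still shared between handles, so the concern about changing the bytes while another handle is mid-request would remain.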

I dunno. Maybe I am being too picky, as I do not have plans to do
anything like the above (since we don't do significant work before the
http contact, there is no reason not to just die() and let the user
re-run the shell command).  I'd also be OK with just putting a comment
above the code in question to say something like "Note that we assume we
only ever have a single set of credentials in a given program run, so we
do not have to worry about updating this buffer, only setting its
initial value." Then the trap at least has a warning sign. :)

What do you think?

-Peff