Re: [PATCH v2 3/3] http-backend: spool ref negotiation requests to buffer

2015-05-25 Thread Konstantin Ryabitsev
On 20 May 2015 at 03:37, Jeff King p...@peff.net wrote:
 +   /* partial read from read_in_full means we hit EOF */
 +   len += cnt;
 +   if (len < alloc) {
 +   *out = buf;
 +   warning("request size was %lu", (unsigned long)len);
 +   return len;
 +   }

Jeff:

This patch appears to work well -- the only complaint I have is that I
now have "warning: request size was NNN" all over my error logs. :) Is
it supposed to convey an actual warning message, or is it merely a
debug statement?

Best,
-- 
Konstantin Ryabitsev
Sr. Systems Administrator
Linux Foundation Collab Projects
541-224-6067
Montréal, Québec
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] http-backend: spool ref negotiation requests to buffer

2015-05-25 Thread Jeff King
On Mon, May 25, 2015 at 10:07:50PM -0400, Konstantin Ryabitsev wrote:

 On 20 May 2015 at 03:37, Jeff King p...@peff.net wrote:
  +   /* partial read from read_in_full means we hit EOF */
  +   len += cnt;
  +   if (len < alloc) {
  +   *out = buf;
  +   warning("request size was %lu", (unsigned long)len);
  +   return len;
  +   }
 
 Jeff:
 
 This patch appears to work well -- the only complaint I have is that I
 now have "warning: request size was NNN" all over my error logs. :) Is
 it supposed to convey an actual warning message, or is it merely a
 debug statement?

Whoops, yeah, it was just for debugging. I missed that one when sending
out the patch.

Junio, the squashable patch is below (on jk/http-backend-deadlock-2.2),
and it looks like nothing has hit next yet. But you did do some
up-merging of the topic. Let me know if you would prefer to just have a
patch on top.

diff --git a/http-backend.c b/http-backend.c
index d1333b8..6bf139b 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -295,7 +295,6 @@ static ssize_t read_request(int fd, unsigned char **out)
len += cnt;
if (len < alloc) {
*out = buf;
-   warning("request size was %lu", (unsigned long)len);
return len;
}
 

-Peff


[PATCH v2 3/3] http-backend: spool ref negotiation requests to buffer

2015-05-20 Thread Jeff King
When http-backend spawns upload-pack to do ref
negotiation, it streams the http request body to
upload-pack, who then streams the http response back to the
client as it reads. In theory, git can go full-duplex; the
client can consume our response while it is still sending
the request.  In practice, however, HTTP is a half-duplex
protocol. Even if our client is ready to read and write
simultaneously, we may have other HTTP infrastructure in the
way, including the webserver that spawns our CGI, or any
intermediate proxies.

In at least one documented case[1], this leads to deadlock
when trying a fetch over http. What happens is basically:

  1. Apache proxies the request to the CGI, http-backend.

  2. http-backend gzip-inflates the data and sends
 the result to upload-pack.

  3. upload-pack acts on the data and generates output over
 the pipe back to Apache. Apache isn't reading because
 it's busy writing (step 1).

This works fine most of the time, because the upload-pack
output ends up in a system pipe buffer, and Apache reads
it as soon as it finishes writing. But if both the request
and the response exceed the system pipe buffer size, then we
deadlock (Apache blocks writing to http-backend,
http-backend blocks writing to upload-pack, and upload-pack
blocks writing to Apache).

We need to break the deadlock by spooling either the input
or the output. In this case, it's ideal to spool the input,
because Apache does not start reading either stdout _or_
stderr until we have consumed all of the input. So until we
do so, we cannot even get an error message out to the
client.

The solution is fairly straightforward: we read the request
body into an in-memory buffer in http-backend, freeing up
Apache, and then feed the data ourselves to upload-pack. But
there are a few important things to note:

  1. We limit the in-memory buffer to prevent an obvious
 denial-of-service attack. This is a new hard limit on
 requests, but it's unlikely to come into play. The
 default value is 10MB, which covers even the ridiculous
 100,000-ref negotiation in the included test (that
 actually caps out just over 5MB). But it's configurable
 on the off chance that you don't mind spending some
 extra memory to make even ridiculous requests work.

  2. We must take care only to buffer when we have to. For
 pushes, the incoming packfile may be of arbitrary
 size, and we should connect the input directly to
 receive-pack. There's no deadlock problem here, though,
 because we do not produce any output until the whole
 packfile has been read.

 For upload-pack's initial ref advertisement, we
 similarly do not need to buffer. Even though we may
 generate a lot of output, there is no request body at
 all (i.e., it is a GET, not a POST).
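
The buffering side of this can be sketched as follows. This is a hypothetical simplification, not the patch's actual code: the real read_request() in http-backend.c uses git's read_in_full() and die() helpers, and spool_request() here is an invented name. It reads the whole body from a descriptor into a geometrically growing heap buffer and fails once the configured cap is reached.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Default cap, matching the patch's 10MB max_request_buffer. */
#define MAX_REQUEST_BUFFER (10 * 1024 * 1024)

/* Spool the entire request body from fd into a malloc'd buffer,
 * doubling the allocation as needed but never past "max".
 * Returns the number of bytes read, or -1 on error or if the
 * body exceeds the cap.  Hypothetical sketch of the technique;
 * the real read_request() differs in detail. */
static ssize_t spool_request(int fd, unsigned char **out, size_t max)
{
	size_t alloc = 8192, len = 0;
	unsigned char *buf = malloc(alloc);

	if (!buf)
		return -1;
	for (;;) {
		ssize_t cnt = read(fd, buf + len, alloc - len);
		if (cnt < 0) {
			free(buf);
			return -1;
		}
		if (cnt == 0)
			break; /* EOF: the whole body is spooled */
		len += cnt;
		if (len == alloc) {
			unsigned char *tmp;
			if (alloc >= max) {
				free(buf);
				return -1; /* request exceeds the cap */
			}
			alloc = alloc * 2 > max ? max : alloc * 2;
			tmp = realloc(buf, alloc);
			if (!tmp) {
				free(buf);
				return -1;
			}
			buf = tmp;
		}
	}
	*out = buf;
	return len;
}
```

Because the reader drains Apache's output completely before upload-pack produces a single byte, the circular wait described above can no longer form.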

[1] http://article.gmane.org/gmane.comp.version-control.git/269020

Test-adapted-from: Dennis Kaarsemaker den...@kaarsemaker.net
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-http-backend.txt |  9 
 http-backend.c | 97 +-
 t/t5551-http-fetch-smart.sh| 15 ++
 3 files changed, 110 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-http-backend.txt b/Documentation/git-http-backend.txt
index d422ba4..8c6acbe 100644
--- a/Documentation/git-http-backend.txt
+++ b/Documentation/git-http-backend.txt
@@ -255,6 +255,15 @@ The GIT_HTTP_EXPORT_ALL environmental variable may be passed to
 'git-http-backend' to bypass the check for the git-daemon-export-ok
 file in each repository before allowing export of that repository.
 
+The `GIT_HTTP_MAX_REQUEST_BUFFER` environment variable (or the
+`http.maxRequestBuffer` config variable) may be set to change the
+largest ref negotiation request that git will handle during a fetch; any
+fetch requiring a larger buffer will not succeed.  This value should not
+normally need to be changed, but may be helpful if you are fetching from
+a repository with an extremely large number of refs.  The value can be
+specified with a unit (e.g., `100M` for 100 megabytes). The default is
+10 megabytes.
+
 The backend process sets GIT_COMMITTER_NAME to '$REMOTE_USER' and
 GIT_COMMITTER_EMAIL to '$\{REMOTE_USER}@http.$\{REMOTE_ADDR\}',
 ensuring that any reflogs created by 'git-receive-pack' contain some
diff --git a/http-backend.c b/http-backend.c
index 3ad82a8..d1333b8 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -13,18 +13,20 @@ static const char content_type[] = "Content-Type";
 static const char content_length[] = "Content-Length";
 static const char last_modified[] = "Last-Modified";
 static int getanyfile = 1;
+static unsigned long max_request_buffer = 10 * 1024 * 1024;
 
 static struct string_list *query_params;
 
 struct rpc_service {
const char *name;
const char *config_name;
+   unsigned buffer_input : 1;
signed enabled : 2;
 };
 
 static struct rpc_service rpc_service[] = {
-