Re: [PATCH v6 1/2] http-backend: respect CONTENT_LENGTH as specified by rfc3875

2017-12-04 Thread Jeff King
On Mon, Dec 04, 2017 at 07:18:55AM +, Florian Manschwetus wrote:

> I could provide a bash script I used in between to make this working
> with IIS, without fixing http-backend binary, maybe this helps to
> understand how this cases might be handled.

I think it would be pretty easy to handle all cases by inserting another
process in front of http-backend that just reads $CONTENT_LENGTH bytes
and then gives an EOF to http-backend. I assume that's what your bash
script does. It's just that this passes the data through an extra layer
of pipes.

If that's an acceptable tradeoff, it seems like an ideal solution from a
maintainability standpoint, since it skips all the tricky situations
inside the http-backend code.

-Peff


AW: [PATCH v6 1/2] http-backend: respect CONTENT_LENGTH as specified by rfc3875

2017-12-03 Thread Florian Manschwetus
Hi All,
I could provide a bash script I used in between to make this working with IIS, 
without fixing http-backend binary, maybe this helps to understand how this 
cases might be handled.

Mit freundlichen Grüßen / With kind regards
Florian Manschwetus



> -Ursprüngliche Nachricht-
> Von: Junio C Hamano [mailto:gits...@pobox.com]
> Gesendet: Sonntag, 3. Dezember 2017 07:07
> An: Jeff King
> Cc: Max Kirillov; Eric Sunshine; Florian Manschwetus; Chris Packham;
> Konstantin Khomoutov; git@vger.kernel.org
> Betreff: Re: [PATCH v6 1/2] http-backend: respect CONTENT_LENGTH as
> specified by rfc3875
> 
> Jeff King  writes:
> 
> > ... Eventually our fill() will block trying to get data that is not
> > there. On an Apache server, the webserver would know there is nothing
> > left to send us and close() the pipe, and we'd get EOF.
> > But on IIS, I think the pipe remains open and we'd just block
> > indefinitely trying to read().
> 
> Ah, yeah, under that scenario, trusting content-length and trying to read,
> waiting for input that would never come, will be a problem, and it would
> probably want to get documented.



Re: [PATCH v6 1/2] http-backend: respect CONTENT_LENGTH as specified by rfc3875

2017-12-02 Thread Junio C Hamano
Jeff King  writes:

> ... Eventually our fill() will block trying to get data that is
> not there. On an Apache server, the webserver would know there is
> nothing left to send us and close() the pipe, and we'd get EOF.
> But on IIS, I think the pipe remains open and we'd just block
> indefinitely trying to read().

Ah, yeah, under that scenario, trusting content-length and trying to
read, waiting for input that would never come, will be a problem,
and it would probably want to get documented.



Re: [PATCH v6 1/2] http-backend: respect CONTENT_LENGTH as specified by rfc3875

2017-12-02 Thread Jeff King
On Sat, Dec 02, 2017 at 05:02:37PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >   3. For large inputs (like incoming packfiles), we connect the
> >  descriptor directly to index-pack or unpack-objects, and they try
> >  to read to EOF.
> >
> >  For a well-formed pack, I _think_ this would work OK. We'd see the
> >  end of the pack and quit (there's a check for garbage at the end of
> >  the pack, but it triggers only for the non-pipe case).
> >
> >  For a truncated input, we'd hang forever rather than report an
> >  error.
> 
> Hmm.  index-pack and/or unpack-objects would be fed directly from
> the incoming pipe, they are not told how many bytes to expect (by
> design), so they try to read to EOF, which may come after the end of
> the pack-stream data, and they write the remaining junk to their
> standard output IIRC.
> 
> For a well-formed pack, the above is what should heppen.

Yeah, there should be zero bytes of "remaining junk" in the normal
well-formed case. And as long as the webserver does not mind us asking
to read() a few extra bytes, we are fine (it tells us there are no more
bytes available right now). The original problem report with IIS was
that it would hang trying to read() that any final EOF, and I don't
think that would happen here.

I wouldn't be surprised if there are webservers or situations where that
extra read() behaves badly (e.g., because it is connected directly to
the client socket and the client is trying to pipeline requests or
something).

> I am having trouble trying to come up with a case where the input
> stream is mangled and we cannot detect where the end of the
> pack-stream is without reading more than we will be fed through the
> pipe, and yet we do not trigger an "we tried to read because the data
> we received so far is incomplete, and got an EOF" error.
> 
> Wouldn't "early EOF" trigger in the fill() helper that these two
> programs have (but not share X-<)?

I think the original problem was the opposite of "early EOF": the other
side of the pipe never gives us EOF at all. So imagine the pack is
mangled so that the zlib stream of an object never ends, and just keeps
asking for more data. Eventually our fill() will block trying to get
data that is not there. On an Apache server, the webserver would know
there is nothing left to send us and close() the pipe, and we'd get EOF.
But on IIS, I think the pipe remains open and we'd just block
indefinitely trying to read().

I don't have such a setup to test on, and it's possible I'm
mis-interpreting or mis-remembering the discussion around the original
patch.

-Peff


Re: [PATCH v6 1/2] http-backend: respect CONTENT_LENGTH as specified by rfc3875

2017-12-02 Thread Junio C Hamano
Jeff King  writes:

>   3. For large inputs (like incoming packfiles), we connect the
>  descriptor directly to index-pack or unpack-objects, and they try
>  to read to EOF.
>
>  For a well-formed pack, I _think_ this would work OK. We'd see the
>  end of the pack and quit (there's a check for garbage at the end of
>  the pack, but it triggers only for the non-pipe case).
>
>  For a truncated input, we'd hang forever rather than report an
>  error.

Hmm.  index-pack and/or unpack-objects would be fed directly from
the incoming pipe, they are not told how many bytes to expect (by
design), so they try to read to EOF, which may come after the end of
the pack-stream data, and they write the remaining junk to their
standard output IIRC.

For a well-formed pack, the above is what should heppen.

I am having trouble trying to come up with a case where the input
stream is mangled and we cannot detect where the end of the
pack-stream is without reading more than we will be fed through the
pipe, and yet we do not trigger an "we tried to read because the data
we received so far is incomplete, and got an EOF" error.

Wouldn't "early EOF" trigger in the fill() helper that these two
programs have (but not share X-<)?



Re: [PATCH v6 1/2] http-backend: respect CONTENT_LENGTH as specified by rfc3875

2017-11-28 Thread Jeff King
On Sun, Nov 26, 2017 at 09:38:12PM +0200, Max Kirillov wrote:

> From: Florian Manschwetus 
> Date: Wed, 30 Mar 2016 10:54:21 +0200
> 
> http-backend reads whole input until EOF. However, the RFC 3875 specifies
> that a script must read only as many bytes as specified by CONTENT_LENGTH
> environment variable. Web server may exercise the specification by not closing
> the script's standard input after writing content. In that case http-backend
> would hang waiting for the input. The issue is known to happen with
> IIS/Windows, for example.
> 
> Make http-backend read only CONTENT_LENGTH bytes, if it's defined, rather than
> the whole input until EOF. If the variable is not defined, keep older behavior
> of reading until EOF because it is used to support chunked transfer-encoding.

I missed out on review of the earlier iterations from the past few days,
but I looked this over in light of the comments I made long ago in:

  https://public-inbox.org/git/20160329201349.gb9...@sigill.intra.peff.net/

The concerns I had there were:

  1. It did the wrong thing when CONTENT_LENGTH was not present in the
 environment. Your version here gets this right.

  2. I earlier was worried that this wouldn't kick in for the
 inflate_request() code path. It looks like we do use read_request()
 from inflate_request(), but only when buffer_input is true. Which
 means we wouldn't do so for receive-pack (i.e., for pushes).

 I'm not sure if the client would ever gzip a push request. Without
 double-checking, I suspect it would if the list of refs to update
 was sufficiently large (and at any rate, I think other clients
 potentially could do so).

 That _might_ be OK in practice. If the gzip stream is well-formed,
 we'd stop at its end. We'd possibly still try to read() more bytes
 than were promised, but if I understand the original problem,
 that's not that big a deal as long as we don't hang waiting for
 EOF.

 If the stream isn't well-formed, we'd hang waiting for more bytes
 (rather than seeing the EOF and complaining that the gzip stream is
 truncated). That's poor, but no worse than the current behavior.

  3. For large inputs (like incoming packfiles), we connect the
 descriptor directly to index-pack or unpack-objects, and they try
 to read to EOF.

 For a well-formed pack, I _think_ this would work OK. We'd see the
 end of the pack and quit (there's a check for garbage at the end of
 the pack, but it triggers only for the non-pipe case).

 For a truncated input, we'd hang forever rather than report an
 error.

So I suspect there are lurking problems that may trigger in corner
cases. That said, I don't think this pack makes any case _worse_, and it
may make some common ones better. So I'm not opposed, though we may be
giving people a false sense of security that it actually works robustly
on IIS.

> ---
>  config.c   |  2 +-
>  config.h   |  1 +
>  http-backend.c | 50 +-
>  3 files changed, 51 insertions(+), 2 deletions(-)

The patch itself looks OK, modulo the comments already made by others.
Two minor observations, and a possible bug:

> +static ssize_t read_request_fixed_len(int fd, ssize_t req_len, unsigned char 
> **out)
> +{

I did wonder if this "stop at n bytes" could simply be rolled into the
existing read_request loop (by limiting the length we pass to
read_in_full there). But it may be cleaner to just have a separate
function. There's some repetition, but not much since we can rely on a
single malloc and read_in_full() for this case.

> + unsigned char *buf = NULL;
> + ssize_t cnt = 0;
> +
> + if (max_request_buffer < req_len) {
> + die("request was larger than our maximum size (%lu): %lu;"
> + " try setting GIT_HTTP_MAX_REQUEST_BUFFER",
> + max_request_buffer,
> + req_len);
> + }
> +
> + if (req_len <= 0) {
> + *out = NULL;
> + return 0;
> + }

I was slightly surprised by "<= 0" here. We should never get here with a
negative req_len, since we'd catch that in the read_request() wrapper
here. If we want to document that assumption, should this be
assert(req_len >= 0)?

I'm also puzzled about the behavior with a zero-byte CONTENT_LENGTH.
We'd return NULL here. But in the other read_request_eof() path, we'd
end up with a non-NULL pointer. I'm not sure if it matters to the caller
or not, but it seems like a potential trap.

Is it even worth special-casing here? Our xmalloc and read_in_full
wrappers should handle the zero-byte just fine. I think this whole
conditional could just go away.

-Peff


Re: [PATCH v6 1/2] http-backend: respect CONTENT_LENGTH as specified by rfc3875

2017-11-26 Thread Eric Sunshine
On Sun, Nov 26, 2017 at 2:38 PM, Max Kirillov  wrote:
> [...]
> Make http-backend read only CONTENT_LENGTH bytes, if it's defined, rather than
> the whole input until EOF. If the variable is not defined, keep older behavior
> of reading until EOF because it is used to support chunked transfer-encoding.
> [...]

A few small comments below; with the possible exception of one,
probably none worth a re-roll...

> diff --git a/http-backend.c b/http-backend.c
> @@ -317,6 +317,54 @@ static ssize_t read_request(int fd, unsigned char **out)
> +static ssize_t read_request_fixed_len(int fd, ssize_t req_len, unsigned char 
> **out)
> +{
> +   unsigned char *buf = NULL;
> +   ssize_t cnt = 0;
> +
> +   if (max_request_buffer < req_len) {
> +   die("request was larger than our maximum size (%lu): %lu;"
> +   " try setting GIT_HTTP_MAX_REQUEST_BUFFER",
> +   max_request_buffer,
> +   req_len);

Unsigned format conversion '%lu' doesn't seem correct for ssize_t.

> +   }
> +
> +   if (req_len <= 0) {
> +   *out = NULL;
> +   return 0;
> +   }
> +
> +   buf = xmalloc(req_len);
> +   cnt = read_in_full(fd, buf, req_len);
> +   if (cnt < 0) {
> +   free(buf);
> +   return -1;
> +   } else {
> +   *out = buf;
> +   return cnt;
> +   }

This could have been written:

if (cnt < 0) {
free(buf);
return -1;
}
*out = buf;
return cnt;

but not worth a re-roll.

> +}
> +
> +static ssize_t env_content_length(void)

The caller of this function doesn't care how the content length is
being determined -- whether it comes from an environment variable or
is computed some other way; it cares only about the result. Having
"env" in the name ties it to checking only the environment. A more
generic name, such as get_content_length(), would help to decouple the
API from the implementation.

Nevertheless, not worth a re-roll.

> +{
> +   ssize_t val = -1;
> +   const char *str = getenv("CONTENT_LENGTH");
> +
> +   if (str && !git_parse_ssize_t(str, &val))

git_parse_ssize_t() does the right thing even when 'str' is NULL, so
this condition could be simplified (but not worth a re-roll and may
not improve clarity).

> +   die("failed to parse CONTENT_LENGTH: %s", str);
> +   return val;
> +}
> +
> +static ssize_t read_request(int fd, unsigned char **out)
> +{
> +   ssize_t req_len = env_content_length();

Grabbing and parsing the value from the environment variable is
effectively a one-liner, so env_content_length() could be dropped
altogether, and instead (taking advantage of git_parse_ssize_t()'s
proper NULL-handling):

if (!git_parse_ssize_t(getenv(...), &req_len))
die(...);

Not worth a re-roll.

> +   if (req_len < 0)
> +   return read_request_eof(fd, out);
> +   else
> +   return read_request_fixed_len(fd, req_len, out);
> +}


[PATCH v6 1/2] http-backend: respect CONTENT_LENGTH as specified by rfc3875

2017-11-26 Thread Max Kirillov
From: Florian Manschwetus 

From: Florian Manschwetus 
Date: Wed, 30 Mar 2016 10:54:21 +0200

http-backend reads whole input until EOF. However, the RFC 3875 specifies
that a script must read only as many bytes as specified by CONTENT_LENGTH
environment variable. Web server may exercise the specification by not closing
the script's standard input after writing content. In that case http-backend
would hang waiting for the input. The issue is known to happen with
IIS/Windows, for example.

Make http-backend read only CONTENT_LENGTH bytes, if it's defined, rather than
the whole input until EOF. If the variable is not defined, keep older behavior
of reading until EOF because it is used to support chunked transfer-encoding.

Signed-off-by: Florian Manschwetus 
[mk: fixed trivial build failures and polished style issues]
Signed-off-by: Max Kirillov 
---
 config.c   |  2 +-
 config.h   |  1 +
 http-backend.c | 50 +-
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index 231f9a750b..d3ec14ab74 100644
--- a/config.c
+++ b/config.c
@@ -853,7 +853,7 @@ int git_parse_ulong(const char *value, unsigned long *ret)
return 1;
 }
 
-static int git_parse_ssize_t(const char *value, ssize_t *ret)
+int git_parse_ssize_t(const char *value, ssize_t *ret)
 {
intmax_t tmp;
if (!git_parse_signed(value, &tmp, 
maximum_signed_value_of_type(ssize_t)))
diff --git a/config.h b/config.h
index 0352da117b..46a4989def 100644
--- a/config.h
+++ b/config.h
@@ -48,6 +48,7 @@ extern void git_config(config_fn_t fn, void *);
 extern int config_with_options(config_fn_t fn, void *,
   struct git_config_source *config_source,
   const struct config_options *opts);
+extern int git_parse_ssize_t(const char *, ssize_t *);
 extern int git_parse_ulong(const char *, unsigned long *);
 extern int git_parse_maybe_bool(const char *);
 extern int git_config_int(const char *, const char *);
diff --git a/http-backend.c b/http-backend.c
index 519025d2c3..af7dd00d70 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -280,7 +280,7 @@ static struct rpc_service *select_service(struct strbuf 
*hdr, const char *name)
  * hit max_request_buffer we die (we'd rather reject a
  * maliciously large request than chew up infinite memory).
  */
-static ssize_t read_request(int fd, unsigned char **out)
+static ssize_t read_request_eof(int fd, unsigned char **out)
 {
size_t len = 0, alloc = 8192;
unsigned char *buf = xmalloc(alloc);
@@ -317,6 +317,54 @@ static ssize_t read_request(int fd, unsigned char **out)
}
 }
 
+static ssize_t read_request_fixed_len(int fd, ssize_t req_len, unsigned char 
**out)
+{
+   unsigned char *buf = NULL;
+   ssize_t cnt = 0;
+
+   if (max_request_buffer < req_len) {
+   die("request was larger than our maximum size (%lu): %lu;"
+   " try setting GIT_HTTP_MAX_REQUEST_BUFFER",
+   max_request_buffer,
+   req_len);
+   }
+
+   if (req_len <= 0) {
+   *out = NULL;
+   return 0;
+   }
+
+   buf = xmalloc(req_len);
+   cnt = read_in_full(fd, buf, req_len);
+   if (cnt < 0) {
+   free(buf);
+   return -1;
+   } else {
+   *out = buf;
+   return cnt;
+   }
+}
+
+static ssize_t env_content_length(void)
+{
+   ssize_t val = -1;
+   const char *str = getenv("CONTENT_LENGTH");
+
+   if (str && !git_parse_ssize_t(str, &val))
+   die("failed to parse CONTENT_LENGTH: %s", str);
+   return val;
+}
+
+static ssize_t read_request(int fd, unsigned char **out)
+{
+   ssize_t req_len = env_content_length();
+
+   if (req_len < 0)
+   return read_request_eof(fd, out);
+   else
+   return read_request_fixed_len(fd, req_len, out);
+}
+
 static void inflate_request(const char *prog_name, int out, int buffer_input)
 {
git_zstream stream;
-- 
2.11.0.1122.gc3fec58.dirty