Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list

2017-11-17 Thread Jeff King
On Fri, Nov 17, 2017 at 10:42:52AM -0500, Jeff Hostetler wrote:

> > Yes, I share the same feeling.  It does not help that the series
> > defines its own notion of arg_needs_armor() and uses it to set a
> > field called requires_armor that is not yet used, the definition of
> > "armor"ing being each byte getting encoded as two hexadecimal digits
> > without any sign (which makes me wonder what a receiver of
> > "deadbeef" would do---did it receive an armored string or a plain
> > one???).  I do not understand why these strings are not passed as
> > opaque sequences of bytes and instead converted at this low a layer.
> 
> I'm probably being too paranoid.  My fear is that a client could pass
> an expression to clone/fetch/fetch-pack that would be sent to the
> server and evaluated by the interface between upload-pack and pack-objects.
> I'm not worried about the pack-protocol transport.  I'm mainly concerned
> in how upload-pack passes that *client-expression* to pack-objects and are
> there ways for that to go south on the server with a carefully crafted
> expression.

I think you have to trust that those interfaces are capable of passing
raw bytes, whether directly via execve() or because we got the quoting
right. If there's a bug there, it's going to be a bigger problem than
just this code path (and the fix needs to be there, not second-guessing
it in the callers).

So I'd say that yeah, you are being too paranoid.

As an aside, though, I wonder if these client expressions should be fed
over stdin to pack-objects. That removes any argv limits we might run
into on the server side. It also removes shell injections as a
possibility, though of course we'd need quoting in that layer to avoid
argument-injection to pack-objects.

-Peff


Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list

2017-11-17 Thread Jeff Hostetler



On 11/16/2017 9:14 PM, Junio C Hamano wrote:

Jeff King  writes:


Those encodings don't necessarily need to be the same, because they're
about transport. Inside each process we'd have the raw bytes, and encode
them as appropriate to whatever sub-program we're going to pass to (or
not at all if we skip the shell for sub-processes, which is usually a
good idea).


Yes, I share the same feeling.  It does not help that the series
defines its own notion of arg_needs_armor() and uses it to set a
field called requires_armor that is not yet used, the definition of
"armor"ing being each byte getting encoded as two hexadecimal digits
without any sign (which makes me wonder what a receiver of
"deadbeef" would do---did it receive an armored string or a plain
one???).  I do not understand why these strings are not passed as
opaque sequences of bytes and instead converted at this low a layer.


I'm probably being too paranoid.  My fear is that a client could pass
an expression to clone/fetch/fetch-pack that would be sent to the
server and evaluated by the interface between upload-pack and pack-objects.
I'm not worried about the pack-protocol transport.  I'm mainly concerned
in how upload-pack passes that *client-expression* to pack-objects and are
there ways for that to go south on the server with a carefully crafted
expression.

Even if we assume that upload-pack on the server directly invokes
pack-objects (rather than a shell), there still might be issues.
For platforms like Linux which have a native execve() and can pass
args in an argv (and which the sub-process also receives in an argv
in their main()), my paranoia is probably overkill.

But on Windows, where the native interface takes a command-line string
rather than an argv, I was concerned.  Yes, there is code in compat/mingw.c
to quote args when building a command line from an argv (and I'm *not*
saying there are bugs in that), but again maybe I am being paranoid.

I'll take another look and the existing quoting mechanisms and re-eval.

Thanks,
Jeff




Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list

2017-11-16 Thread Junio C Hamano
Jeff King  writes:

> Those encodings don't necessarily need to be the same, because they're
> about transport. Inside each process we'd have the raw bytes, and encode
> them as appropriate to whatever sub-program we're going to pass to (or
> not at all if we skip the shell for sub-processes, which is usually a
> good idea).

Yes, I share the same feeling.  It does not help that the series
defines its own notion of arg_needs_armor() and uses it to set a
field called requires_armor that is not yet used, the definition of
"armor"ing being each byte getting encoded as two hexadecimal digits
without any sign (which makes me wonder what a receiver of
"deadbeef" would do---did it receive an armored string or a plain
one???).  I do not understand why these strings are not passed as
opaque sequences of bytes and instead converted at this low a layer.







Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list

2017-11-16 Thread Jeff King
On Thu, Nov 16, 2017 at 04:49:08PM -0500, Jeff Hostetler wrote:

> > First of all, about the injection problem, replying to your previous e-mail
> > [1]:
> > 
> > https://public-inbox.org/git/61855872-221b-0e97-abaa-24a011ad8...@jeffhostetler.com/
> > 
> > > I couldn't use quote.[ch] because it is more concerned with
> > > quoting pathnames because of LF and CR characters within
> > > them -- rather than semicolons and quotes and the like which
> > > I was concerned about.
> > 
> > sq_quote_buf() (or one of the other similarly-named functions) should
> > solve this problem, right? The single quotes around the argument takes
> > care of LF, CR, and semicolons, and things like backslashes and quotes
> > are taken care of as documented.
> > 
> > I don't think we need to invent another encoding to solve this.
> 
> I'll take another look, sq_quote_buf() looks like it might work.
> I was looking at quote_c_style() and that didn't seem right for
> my needs.  Thanks.

I admit I haven't been following this thread closely, but I couldn't
seem to find any indication of exactly which interfaces need quoting, or
who is expected to unquote (here or in the previous iterations).

It sounds like you're worried about shell injection, but shouldn't we
worry about that the actual shell boundary? Likewise, if these values
are being passed over the git protocol, shouldn't that part of the
protocol be designed to encode arbitrary bytes?

Those encodings don't necessarily need to be the same, because they're
about transport. Inside each process we'd have the raw bytes, and encode
them as appropriate to whatever sub-program we're going to pass to (or
not at all if we skip the shell for sub-processes, which is usually a
good idea).

I have the feeling I'm missing something.

-Peff


Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list

2017-11-16 Thread Jeff Hostetler



On 11/16/2017 3:21 PM, Jonathan Tan wrote:

On Thu, 16 Nov 2017 18:07:41 +
Jeff Hostetler  wrote:


+/*
+ * Return 1 if the given string needs armoring because of "special"
+ * characters that may cause injection problems when a command passes
+ * the argument to a subordinate command (such as when upload-pack
+ * launches pack-objects).
+ *
+ * The usual alphanumeric and key punctuation do not trigger it.
+ */
+static int arg_needs_armor(const char *arg)


First of all, about the injection problem, replying to your previous e-mail
[1]:

https://public-inbox.org/git/61855872-221b-0e97-abaa-24a011ad8...@jeffhostetler.com/


I couldn't use quote.[ch] because it is more concerned with
quoting pathnames because of LF and CR characters within
them -- rather than semicolons and quotes and the like which
I was concerned about.


sq_quote_buf() (or one of the other similarly-named functions) should
solve this problem, right? The single quotes around the argument takes
care of LF, CR, and semicolons, and things like backslashes and quotes
are taken care of as documented.

I don't think we need to invent another encoding to solve this.


I'll take another look, sq_quote_buf() looks like it might work.
I was looking at quote_c_style() and that didn't seem right for
my needs.  Thanks.





+{
+   const unsigned char *p;
+
+   for (p = (const unsigned char *)arg; *p; p++) {
+   if (*p >= 'a' && *p <= 'z')
+   continue;
+   if (*p >= 'A' && *p <= 'Z')
+   continue;
+   if (*p >= '0' && *p <= '9')
+   continue;
+   if (*p == '-' || *p == '_' || *p == '.' || *p == '/')
+   continue;


If we do take this approach, can ':' also be included?


Sure.  I just picked the common ones.




+   if (skip_prefix(arg, "sparse:oid=", )) {
+   struct object_context oc;
+   struct object_id sparse_oid;
+
+   /*
+* Try to parse  into an OID for the current
+* command, but DO NOT complain if we don't have the blob or
+* ref locally.
+*/
+   if (!get_oid_with_context(v0, GET_OID_BLOB,
+ _oid, ))
+   filter_options->sparse_oid_value = oiddup(_oid);
+   filter_options->choice = LOFC_SPARSE_OID;
+   if (arg_needs_armor(v0))
+   filter_options->requires_armor = v0 - arg;
+   return 0;
+   }


In your previous e-mail, you mentioned:


yes.  I always pass filter_options.raw_value over the wire.
The code above tries to parse it and put it in an OID for
private use by the current process -- just like the size limit
value in the blob:limit filter.


So I think this function should complain if you don't have the blob or
ref locally. (I envision that if a filter string is to be directly sent
to a server, it should be stored as a string, not processed by this
function first.)


The whole point was for clone to be able to ask the server to use
a known sparse-checkout spec, based on a branch name and a pathname
within the tree.  That's more usable.  My expectation was that users
could keep one or more sparse-checkout specs in the tree based upon
the area/subset of the tree they want to work on.  Then do a partial
clone based upon the desired subset.  Granted, we make them publish
OIDs (outside of git) and then make the user request the partial
clone by OID, but that's awkward.

There are 2 uses of the "--filter=..." parser -- one to actually use
the data within the current process (rev-list and pack-objects) and
one to capture the value and pass it on to another process (fetch-pack
and upload-pack).  The former want the fields fully decoded into
actual variables and the latter want the raw string to pass on.
The parser routine provides both values for its callers.  (In an
earlier draft, the parser would replace the raw_value of a sparse
spec with the actual OID when it had it locally, but I took that
out based on your earlier comments.  Now, the raw_value is always
passed by the second type of usages.)

Jeff



Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list

2017-11-16 Thread Jonathan Tan
On Thu, 16 Nov 2017 18:07:41 +
Jeff Hostetler  wrote:

> +/*
> + * Return 1 if the given string needs armoring because of "special"
> + * characters that may cause injection problems when a command passes
> + * the argument to a subordinate command (such as when upload-pack
> + * launches pack-objects).
> + *
> + * The usual alphanumeric and key punctuation do not trigger it.
> + */ 
> +static int arg_needs_armor(const char *arg)

First of all, about the injection problem, replying to your previous e-mail
[1]:

https://public-inbox.org/git/61855872-221b-0e97-abaa-24a011ad8...@jeffhostetler.com/

> I couldn't use quote.[ch] because it is more concerned with
> quoting pathnames because of LF and CR characters within
> them -- rather than semicolons and quotes and the like which
> I was concerned about.

sq_quote_buf() (or one of the other similarly-named functions) should
solve this problem, right? The single quotes around the argument takes
care of LF, CR, and semicolons, and things like backslashes and quotes
are taken care of as documented.

I don't think we need to invent another encoding to solve this.

> +{
> + const unsigned char *p;
> +
> + for (p = (const unsigned char *)arg; *p; p++) {
> + if (*p >= 'a' && *p <= 'z')
> + continue;
> + if (*p >= 'A' && *p <= 'Z')
> + continue;
> + if (*p >= '0' && *p <= '9')
> + continue;
> + if (*p == '-' || *p == '_' || *p == '.' || *p == '/')
> + continue;

If we do take this approach, can ':' also be included?

> + if (skip_prefix(arg, "sparse:oid=", )) {
> + struct object_context oc;
> + struct object_id sparse_oid;
> +
> + /*
> +  * Try to parse  into an OID for the current
> +  * command, but DO NOT complain if we don't have the blob or
> +  * ref locally.
> +  */
> + if (!get_oid_with_context(v0, GET_OID_BLOB,
> +   _oid, ))
> + filter_options->sparse_oid_value = oiddup(_oid);
> + filter_options->choice = LOFC_SPARSE_OID;
> + if (arg_needs_armor(v0))
> + filter_options->requires_armor = v0 - arg;
> + return 0;
> + }

In your previous e-mail, you mentioned:

> yes.  I always pass filter_options.raw_value over the wire.
> The code above tries to parse it and put it in an OID for
> private use by the current process -- just like the size limit
> value in the blob:limit filter.

So I think this function should complain if you don't have the blob or
ref locally. (I envision that if a filter string is to be directly sent
to a server, it should be stored as a string, not processed by this
function first.)