Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list
On Fri, Nov 17, 2017 at 10:42:52AM -0500, Jeff Hostetler wrote: > > Yes, I share the same feeling. It does not help that the series > > defines its own notion of arg_needs_armor() and uses it to set a > > field called requires_armor that is not yet used, the definition of > > "armor"ing being each byte getting encoded as two hexadecimal digits > > without any sign (which makes me wonder what a receiver of > > "deadbeef" would do---did it receive an armored string or a plain > > one???). I do not understand why these strings are not passed as > > opaque sequences of bytes and instead converted at this low a layer. > > I'm probably being too paranoid. My fear is that a client could pass > an expression to clone/fetch/fetch-pack that would be sent to the > server and evaluated by the interface between upload-pack and pack-objects. > I'm not worried about the pack-protocol transport. I'm mainly concerned > in how upload-pack passes that *client-expression* to pack-objects and are > there ways for that to go south on the server with a carefully crafted > expression. I think you have to trust that those interfaces are capable of passing raw bytes, whether directly via execve() or because we got the quoting right. If there's a bug there, it's going to be a bigger problem than just this code path (and the fix needs to be there, not second-guessing it in the callers). So I'd say that yeah, you are being too paranoid. As an aside, though, I wonder if these client expressions should be fed over stdin to pack-objects. That removes any argv limits we might run into on the server side. It also removes shell injections as a possibility, though of course we'd need quoting in that layer to avoid argument-injection to pack-objects. -Peff
Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list
On 11/16/2017 9:14 PM, Junio C Hamano wrote: Jeff Kingwrites: Those encodings don't necessarily need to be the same, because they're about transport. Inside each process we'd have the raw bytes, and encode them as appropriate to whatever sub-program we're going to pass to (or not at all if we skip the shell for sub-processes, which is usually a good idea). Yes, I share the same feeling. It does not help that the series defines its own notion of arg_needs_armor() and uses it to set a field called requires_armor that is not yet used, the definition of "armor"ing being each byte getting encoded as two hexadecimal digits without any sign (which makes me wonder what a receiver of "deadbeef" would do---did it receive an armored string or a plain one???). I do not understand why these strings are not passed as opaque sequences of bytes and instead converted at this low a layer. I'm probably being too paranoid. My fear is that a client could pass an expression to clone/fetch/fetch-pack that would be sent to the server and evaluated by the interface between upload-pack and pack-objects. I'm not worried about the pack-protocol transport. I'm mainly concerned in how upload-pack passes that *client-expression* to pack-objects and are there ways for that to go south on the server with a carefully crafted expression. Even if we assume that upload-pack on the server directly invokes pack-objects (rather than a shell), there still might be issues. For platforms like Linux which have a native execve() and can pass args in an argv (and which the sub-process also receives in an argv in their main()), my paranoia is probably overkill. But on Windows, where the native interface takes a command-line string rather than an argv, I was concerned. Yes, there is code in compat/mingw.c to quote args when building a command line from an argv (and I'm *not* saying there are bugs in that), but again maybe I am being paranoid. I'll take another look and the existing quoting mechanisms and re-eval. Thanks, Jeff
Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list
Jeff Kingwrites: > Those encodings don't necessarily need to be the same, because they're > about transport. Inside each process we'd have the raw bytes, and encode > them as appropriate to whatever sub-program we're going to pass to (or > not at all if we skip the shell for sub-processes, which is usually a > good idea). Yes, I share the same feeling. It does not help that the series defines its own notion of arg_needs_armor() and uses it to set a field called requires_armor that is not yet used, the definition of "armor"ing being each byte getting encoded as two hexadecimal digits without any sign (which makes me wonder what a receiver of "deadbeef" would do---did it receive an armored string or a plain one???). I do not understand why these strings are not passed as opaque sequences of bytes and instead converted at this low a layer.
Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list
On Thu, Nov 16, 2017 at 04:49:08PM -0500, Jeff Hostetler wrote: > > First of all, about the injection problem, replying to your previous e-mail > > [1]: > > > > https://public-inbox.org/git/61855872-221b-0e97-abaa-24a011ad8...@jeffhostetler.com/ > > > > > I couldn't use quote.[ch] because it is more concerned with > > > quoting pathnames because of LF and CR characters within > > > them -- rather than semicolons and quotes and the like which > > > I was concerned about. > > > > sq_quote_buf() (or one of the other similarly-named functions) should > > solve this problem, right? The single quotes around the argument takes > > care of LF, CR, and semicolons, and things like backslashes and quotes > > are taken care of as documented. > > > > I don't think we need to invent another encoding to solve this. > > I'll take another look, sq_quote_buf() looks like it might work. > I was looking at quote_c_style() and that didn't seem right for > my needs. Thanks. I admit I haven't been following this thread closely, but I couldn't seem to find any indication of exactly which interfaces need quoting, or who is expected to unquote (here or in the previous iterations). It sounds like you're worried about shell injection, but shouldn't we worry about that the actual shell boundary? Likewise, if these values are being passed over the git protocol, shouldn't that part of the protocol be designed to encode arbitrary bytes? Those encodings don't necessarily need to be the same, because they're about transport. Inside each process we'd have the raw bytes, and encode them as appropriate to whatever sub-program we're going to pass to (or not at all if we skip the shell for sub-processes, which is usually a good idea). I have the feeling I'm missing something. -Peff
Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list
On 11/16/2017 3:21 PM, Jonathan Tan wrote: On Thu, 16 Nov 2017 18:07:41 + Jeff Hostetlerwrote: +/* + * Return 1 if the given string needs armoring because of "special" + * characters that may cause injection problems when a command passes + * the argument to a subordinate command (such as when upload-pack + * launches pack-objects). + * + * The usual alphanumeric and key punctuation do not trigger it. + */ +static int arg_needs_armor(const char *arg) First of all, about the injection problem, replying to your previous e-mail [1]: https://public-inbox.org/git/61855872-221b-0e97-abaa-24a011ad8...@jeffhostetler.com/ I couldn't use quote.[ch] because it is more concerned with quoting pathnames because of LF and CR characters within them -- rather than semicolons and quotes and the like which I was concerned about. sq_quote_buf() (or one of the other similarly-named functions) should solve this problem, right? The single quotes around the argument takes care of LF, CR, and semicolons, and things like backslashes and quotes are taken care of as documented. I don't think we need to invent another encoding to solve this. I'll take another look, sq_quote_buf() looks like it might work. I was looking at quote_c_style() and that didn't seem right for my needs. Thanks. +{ + const unsigned char *p; + + for (p = (const unsigned char *)arg; *p; p++) { + if (*p >= 'a' && *p <= 'z') + continue; + if (*p >= 'A' && *p <= 'Z') + continue; + if (*p >= '0' && *p <= '9') + continue; + if (*p == '-' || *p == '_' || *p == '.' || *p == '/') + continue; If we do take this approach, can ':' also be included? Sure. I just picked the common ones. + if (skip_prefix(arg, "sparse:oid=", )) { + struct object_context oc; + struct object_id sparse_oid; + + /* +* Try to parse into an OID for the current +* command, but DO NOT complain if we don't have the blob or +* ref locally. +*/ + if (!get_oid_with_context(v0, GET_OID_BLOB, + _oid, )) + filter_options->sparse_oid_value = oiddup(_oid); + filter_options->choice = LOFC_SPARSE_OID; + if (arg_needs_armor(v0)) + filter_options->requires_armor = v0 - arg; + return 0; + } In your previous e-mail, you mentioned: yes. I always pass filter_options.raw_value over the wire. The code above tries to parse it and put it in an OID for private use by the current process -- just like the size limit value in the blob:limit filter. So I think this function should complain if you don't have the blob or ref locally. (I envision that if a filter string is to be directly sent to a server, it should be stored as a string, not processed by this function first.) The whole point was for clone to be able to ask the server to use a known sparse-checkout spec, based on a branch name and a pathname within the tree. That's more usable. My expectation was that users could keep one or more sparse-checkout specs in the tree based upon the area/subset of the tree they want to work on. Then do a partial clone based upon the desired subset. Granted, we make them publish OIDs (outside of git) and then make the user request the partial clone by OID, but that's awkward. There are 2 uses of the "--filter=..." parser -- one to actually use the data within the current process (rev-list and pack-objects) and one to capture the value and pass it on to another process (fetch-pack and upload-pack). The former want the fields fully decoded into actual variables and the latter want the raw string to pass on. The parser routine provides both values for its callers. (In an earlier draft, the parser would replace the raw_value of a sparse spec with the actual OID when it had it locally, but I took that out based on your earlier comments. Now, the raw_value is always passed by the second type of usages.) Jeff
Re: [PATCH v4 4/6] list-objects: filter objects in traverse_commit_list
On Thu, 16 Nov 2017 18:07:41 + Jeff Hostetlerwrote: > +/* > + * Return 1 if the given string needs armoring because of "special" > + * characters that may cause injection problems when a command passes > + * the argument to a subordinate command (such as when upload-pack > + * launches pack-objects). > + * > + * The usual alphanumeric and key punctuation do not trigger it. > + */ > +static int arg_needs_armor(const char *arg) First of all, about the injection problem, replying to your previous e-mail [1]: https://public-inbox.org/git/61855872-221b-0e97-abaa-24a011ad8...@jeffhostetler.com/ > I couldn't use quote.[ch] because it is more concerned with > quoting pathnames because of LF and CR characters within > them -- rather than semicolons and quotes and the like which > I was concerned about. sq_quote_buf() (or one of the other similarly-named functions) should solve this problem, right? The single quotes around the argument takes care of LF, CR, and semicolons, and things like backslashes and quotes are taken care of as documented. I don't think we need to invent another encoding to solve this. > +{ > + const unsigned char *p; > + > + for (p = (const unsigned char *)arg; *p; p++) { > + if (*p >= 'a' && *p <= 'z') > + continue; > + if (*p >= 'A' && *p <= 'Z') > + continue; > + if (*p >= '0' && *p <= '9') > + continue; > + if (*p == '-' || *p == '_' || *p == '.' || *p == '/') > + continue; If we do take this approach, can ':' also be included? > + if (skip_prefix(arg, "sparse:oid=", )) { > + struct object_context oc; > + struct object_id sparse_oid; > + > + /* > + * Try to parse into an OID for the current > + * command, but DO NOT complain if we don't have the blob or > + * ref locally. > + */ > + if (!get_oid_with_context(v0, GET_OID_BLOB, > + _oid, )) > + filter_options->sparse_oid_value = oiddup(_oid); > + filter_options->choice = LOFC_SPARSE_OID; > + if (arg_needs_armor(v0)) > + filter_options->requires_armor = v0 - arg; > + return 0; > + } In your previous e-mail, you mentioned: > yes. I always pass filter_options.raw_value over the wire. > The code above tries to parse it and put it in an OID for > private use by the current process -- just like the size limit > value in the blob:limit filter. So I think this function should complain if you don't have the blob or ref locally. (I envision that if a filter string is to be directly sent to a server, it should be stored as a string, not processed by this function first.)