Re: [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item
On Thu, Apr 05, 2018 at 01:06:30PM -0700, Elijah Newren wrote:

> > There are other similar trailing-slash matches in that function, but I'm
> > not sure of all the cases in which they're used. I don't know if any of
> > those would need similar treatment (sorry for being vague; I expect I'd
> > need a few hours to dig into how the pathspec code actually works, and I
> > don't have that today).
>
> If it'd only take you a few hours, then you're a lot faster than me.
> It took me a while to start wrapping my head around it.

OK, I was being overly optimistic. :)

> The other trailing-slash matches in the function are all correct,
> according to the testsuite. (I'm not sure I like the
> DO_MATCH_DIRECTORY stuff, but it is encoded in tests and backward
> compatibility is important.) In particular, changing the earlier code
> to have the same offset trick would make it claim that e.g. either
> "a/b" or "a/b/" as names match unconditionally against "a/b/c" as a
> pathspec. We need it to be conditional: we only want that to be
> considered a match when checking whether we want to recurse into the
> directory for other matches, not when checking whether the directory
> itself matches the pathspec. Thus, it should be behind a separate
> flag, in a subsequent check, which is what this series does (namely
> with DO_MATCH_LEADING_PATHSPEC).

OK, that makes some sense to me.

> To be more precise, here is how a matrix of pathnames and pathspecs
> would be treated by match_pathspec_item() (where I am abbreviating
> names like MATCHED_RECURSIVELY_LEADING_PATHSPEC to LEADING):
>
>                         Pathspecs
>               |    a/b    |   a/b/    |   a/b/c
>         ------+-----------+-----------+------------
>         a/b   |   EXACT   | RECURSIVE | LEADING[3]
> Names   a/b/  | EXACT[1]  |   EXACT   | LEADING[2]
>         a/b/c | RECURSIVE | RECURSIVE |   EXACT
>
> [1] Only if DO_MATCH_DIRECTORY is passed. Otherwise,
>     this is NOT a match at all.
> [2] Only if DO_MATCH_LEADING_PATHSPEC is passed,
>     after applying this series. Otherwise, not a match
>     at all.
> [3] Without the fix in this thread that you highlighted,
>     and assuming we apply patch 7, this would actually
>     mistakenly return RECURSIVE.
>
> Now for a separate question: How much of the above would you like
> added to the commit message...or even as a comment in the code to make
> it clearer to other folks trying to make sense of it?

That table seems quite illuminating to me. It's hard to pick out all the
special-cases from the code, or what they're _supposed_ to be doing. I
think it makes sense as a code comment.

-Peff

PS I'm going to be on a 3-week vacation starting tomorrow, so apologies
in advance for ignoring any follow-ups.
Re: [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item
On Thu, Apr 5, 2018 at 12:04 PM, Jeff King wrote:
> On Thu, Apr 05, 2018 at 11:36:45AM -0700, Elijah Newren wrote:
>
>> > Do we care about matching the name "foo" against the pathspec_item "foo/"?
>> >
>> > That matches now, but wouldn't after your patch.
>>
>> So I should probably make the check handle both cases:
>>
>> @@ -383,8 +383,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
>>  	/* Perform checks to see if "name" is a super set of the pathspec */
>>  	if (flags & DO_MATCH_LEADING_PATHSPEC) {
>>  		/* name is a literal prefix of the pathspec */
>> +		int offset = name[namelen-1] == '/' ? 1 : 0;
>>  		if ((namelen < matchlen) &&
>> -		    (match[namelen] == '/') &&
>> +		    (match[namelen-offset] == '/') &&
>>  		    !ps_strncmp(item, match, name, namelen))
>>  			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
>
> That seems reasonable to me, and your "offset" trick here should prevent
> us from getting confused. Can namelen ever be zero here? I guess
> probably not (I could see an empty pathspec, but an empty path does not
> make sense).

Right, I don't see how an empty path would make sense.

> There are other similar trailing-slash matches in that function, but I'm
> not sure of all the cases in which they're used. I don't know if any of
> those would need similar treatment (sorry for being vague; I expect I'd
> need a few hours to dig into how the pathspec code actually works, and I
> don't have that today).

If it'd only take you a few hours, then you're a lot faster than me.
It took me a while to start wrapping my head around it.

The other trailing-slash matches in the function are all correct,
according to the testsuite. (I'm not sure I like the
DO_MATCH_DIRECTORY stuff, but it is encoded in tests and backward
compatibility is important.) In particular, changing the earlier code
to have the same offset trick would make it claim that e.g. either
"a/b" or "a/b/" as names match unconditionally against "a/b/c" as a
pathspec. We need it to be conditional: we only want that to be
considered a match when checking whether we want to recurse into the
directory for other matches, not when checking whether the directory
itself matches the pathspec. Thus, it should be behind a separate
flag, in a subsequent check, which is what this series does (namely
with DO_MATCH_LEADING_PATHSPEC).

To be more precise, here is how a matrix of pathnames and pathspecs
would be treated by match_pathspec_item() (where I am abbreviating
names like MATCHED_RECURSIVELY_LEADING_PATHSPEC to LEADING):

                        Pathspecs
              |    a/b    |   a/b/    |   a/b/c
        ------+-----------+-----------+------------
        a/b   |   EXACT   | RECURSIVE | LEADING[3]
Names   a/b/  | EXACT[1]  |   EXACT   | LEADING[2]
        a/b/c | RECURSIVE | RECURSIVE |   EXACT

[1] Only if DO_MATCH_DIRECTORY is passed. Otherwise,
    this is NOT a match at all.
[2] Only if DO_MATCH_LEADING_PATHSPEC is passed,
    after applying this series. Otherwise, not a match
    at all.
[3] Without the fix in this thread that you highlighted,
    and assuming we apply patch 7, this would actually
    mistakenly return RECURSIVE.

Now for a separate question: How much of the above would you like
added to the commit message...or even as a comment in the code to make
it clearer to other folks trying to make sense of it?

Elijah
Re: [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item
On Thu, Apr 05, 2018 at 11:36:45AM -0700, Elijah Newren wrote:

> > Do we care about matching the name "foo" against the pathspec_item "foo/"?
> >
> > That matches now, but wouldn't after your patch.
>
> Technically, the tests pass anyway due to the fallback behavior
> mentioned in the commit message, but this is a really good point. It
> looks like the call to submodule_path_match() from builtin/grep.c is
> going to be passing name without the trailing '/', which is contrary
> to how read_directory_recursive() in dir.c builds up paths (namely
> with the trailing '/'). If we tried to force consistency (either
> always omit the trailing slash or always include it), then we'd
> probably want to do so for match_pathspec() calls as well, and there
> are lots of those throughout the code and auditing it all looks
> painful.
>
> So I should probably make the check handle both cases:
>
> @@ -383,8 +383,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
>  	/* Perform checks to see if "name" is a super set of the pathspec */
>  	if (flags & DO_MATCH_LEADING_PATHSPEC) {
>  		/* name is a literal prefix of the pathspec */
> +		int offset = name[namelen-1] == '/' ? 1 : 0;
>  		if ((namelen < matchlen) &&
> -		    (match[namelen] == '/') &&
> +		    (match[namelen-offset] == '/') &&
>  		    !ps_strncmp(item, match, name, namelen))
>  			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;

That seems reasonable to me, and your "offset" trick here should prevent
us from getting confused. Can namelen ever be zero here? I guess
probably not (I could see an empty pathspec, but an empty path does not
make sense).

There are other similar trailing-slash matches in that function, but I'm
not sure of all the cases in which they're used. I don't know if any of
those would need similar treatment (sorry for being vague; I expect I'd
need a few hours to dig into how the pathspec code actually works, and I
don't have that today).

-Peff
Re: [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item
On Thu, Apr 5, 2018 at 10:49 AM, Jeff King wrote:
>> diff --git a/dir.c b/dir.c
>> index 19212129f0..c915a69385 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -384,7 +384,7 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
>>  	if (flags & DO_MATCH_SUBMODULE) {
>>  		/* name is a literal prefix of the pathspec */
>>  		if ((namelen < matchlen) &&
>> -		    (match[namelen] == '/') &&
>> +		    (match[namelen-1] == '/') &&
>>  		    !ps_strncmp(item, match, name, namelen))
>>  			return MATCHED_RECURSIVELY;
>
> Do we care about matching the name "foo" against the pathspec_item "foo/"?
>
> That matches now, but wouldn't after your patch.

Technically, the tests pass anyway due to the fallback behavior
mentioned in the commit message, but this is a really good point. It
looks like the call to submodule_path_match() from builtin/grep.c is
going to be passing name without the trailing '/', which is contrary
to how read_directory_recursive() in dir.c builds up paths (namely
with the trailing '/'). If we tried to force consistency (either
always omit the trailing slash or always include it), then we'd
probably want to do so for match_pathspec() calls as well, and there
are lots of those throughout the code and auditing it all looks
painful.

So I should probably make the check handle both cases:

@@ -383,8 +383,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
 	/* Perform checks to see if "name" is a super set of the pathspec */
 	if (flags & DO_MATCH_LEADING_PATHSPEC) {
 		/* name is a literal prefix of the pathspec */
+		int offset = name[namelen-1] == '/' ? 1 : 0;
 		if ((namelen < matchlen) &&
-		    (match[namelen] == '/') &&
+		    (match[namelen-offset] == '/') &&
 		    !ps_strncmp(item, match, name, namelen))
 			return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
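
As a rough illustration of the offset idea in the hunk above, here is a
minimal standalone sketch. It is not the code from dir.c:
leading_prefix_match() is a hypothetical helper, plain strncmp() stands in
for ps_strncmp() (so case-insensitive pathspec handling is ignored), and
the early return for an empty name is an added assumption rather than
something the patch contains.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: returns 1 when "name" is a
 * leading directory of the literal pathspec "match", whether or not
 * "name" carries a trailing slash, and 0 otherwise.
 */
int leading_prefix_match(const char *name, const char *match)
{
	size_t namelen, matchlen, offset;

	assert(name && match);
	namelen = strlen(name);
	matchlen = strlen(match);

	/* Assumption: an empty name never matches (avoids reading name[-1]). */
	if (namelen == 0)
		return 0;

	/* With a trailing slash, the separator is the last byte of name. */
	offset = (name[namelen - 1] == '/') ? 1 : 0;

	return namelen < matchlen &&
	       match[namelen - offset] == '/' &&
	       !strncmp(match, name, namelen);
}
```

Under these assumptions both "foo" and "foo/" are accepted as leading
prefixes of the pathspec "foo/bar", while "foob" and "foo/bar" itself are
rejected.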
Re: [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item
On Thu, Apr 05, 2018 at 10:34:41AM -0700, Elijah Newren wrote:

> For a pathspec like 'foo/bar' comparing against a path named "foo/",
> namelen will be 4, and match[namelen] will be 'b'. The correct location
> of the directory separator is namelen-1.
>
> The reason the code worked anyway was that the following code immediately
> checked whether the first matchlen characters matched (which they do) and
> then bailed and return MATCHED_RECURSIVELY anyway since wildmatch doesn't
> have the ability to check if "name" can be matched as a directory (or
> prefix) against the pathspec.
>
> Signed-off-by: Elijah Newren
> ---
>  dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index 19212129f0..c915a69385 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -384,7 +384,7 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
>  	if (flags & DO_MATCH_SUBMODULE) {
>  		/* name is a literal prefix of the pathspec */
>  		if ((namelen < matchlen) &&
> -		    (match[namelen] == '/') &&
> +		    (match[namelen-1] == '/') &&
>  		    !ps_strncmp(item, match, name, namelen))
>  			return MATCHED_RECURSIVELY;

Do we care about matching the name "foo" against the pathspec_item "foo/"?

That matches now, but wouldn't after your patch.

-Peff
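
To make the index arithmetic in this exchange concrete, the sketch below
isolates the separator test as it reads before the patch
(match[namelen] == '/') and after it (match[namelen-1] == '/'). The names
sep_before_patch() and sep_after_patch() are hypothetical and exist only
for illustration; the prefix comparison done by ps_strncmp() is
deliberately left out, and the asserts encode the namelen < matchlen
precondition from the surrounding code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: the separator test as written before the patch. */
int sep_before_patch(const char *name, const char *match)
{
	size_t namelen = strlen(name);

	assert(namelen < strlen(match));
	return match[namelen] == '/';
}

/* Hypothetical: the separator test as changed by the patch. */
int sep_after_patch(const char *name, const char *match)
{
	size_t namelen = strlen(name);

	assert(namelen > 0 && namelen < strlen(match));
	return match[namelen - 1] == '/';
}
```

For the pathspec "foo/bar" and the name "foo/" (namelen 4), the old test
inspects match[4], which is 'b', and fails, while the patched test
inspects match[3], which is '/', and succeeds. For the name "foo"
(namelen 3) the situation is reversed: the old test sees the '/' at
match[3], and the patched test sees 'o' at match[2]. Each variant handles
only one of the two spellings of the directory name.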