Re: [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item

2018-04-06 Thread Jeff King
On Thu, Apr 05, 2018 at 01:06:30PM -0700, Elijah Newren wrote:

> > There are other similar trailing-slash matches in that function, but I'm
> > not sure of all the cases in which they're used. I don't know if any of
> > those would need similar treatment (sorry for being vague; I expect I'd
> > need a few hours to dig into how the pathspec code actually works, and I
> > don't have that today).
> 
> If it'd only take you a few hours, then you're a lot faster than me.
> It took me a while to start wrapping my head around it.

OK, I was being overly optimistic. :)

> The other trailing-slash matches in the function are all correct,
> according to the testsuite.  (I'm not sure I like the
> DO_MATCH_DIRECTORY stuff, but it is encoded in tests and backward
> compatibility is important.)  In particular, changing the earlier code
> to have the same offset trick would make it claim that e.g. either
> "a/b" or "a/b/" as names match unconditionally against "a/b/c" as a
> pathspec.  We need it to be conditional: we only want that to be
> considered a match when checking whether we want to recurse into the
> directory for other matches, not when checking whether the directory
> itself matches the pathspec.  Thus, it should be behind a separate
> flag, in a subsequent check, which is what this series does (namely
> with DO_MATCH_LEADING_PATHSPEC).

OK, that makes some sense to me.

> To be more precise, here is how a matrix of pathnames and pathspecs
> would be treated by match_pathspec_item() (where I am abbreviating
> names like MATCHED_RECURSIVELY_LEADING_PATHSPEC to LEADING):
> 
>                     Pathspecs
>                 |    a/b    |   a/b/    |   a/b/c
>           ------+-----------+-----------+------------
>           a/b   |  EXACT    | RECURSIVE |  LEADING[3]
>   Names   a/b/  |  EXACT[1] |  EXACT    |  LEADING[2]
>           a/b/c | RECURSIVE | RECURSIVE |  EXACT
> 
> [1] Only if DO_MATCH_DIRECTORY is passed.  Otherwise,
> this is NOT a match at all.
> [2] Only if DO_MATCH_LEADING_PATHSPEC is passed,
> after applying this series.  Otherwise, not a match
> at all.
> [3] Without the fix in this thread that you highlighted,
> and assuming we apply patch 7, this would actually
> mistakenly return RECURSIVE.
> 
> 
> Now for a separate question: How much of the above would you like
> added to the commit message...or even as a comment in the code to make
> it clearer to other folks trying to make sense of it?

That table seems quite illuminating to me. It's hard to pick out all the
special-cases from the code, or what they're _supposed_ to be doing. I
think it makes sense as a code comment.

-Peff

PS I'm going to be on a 3-week vacation starting tomorrow, so apologies
   in advance for ignoring any follow-ups.


Re: [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item

2018-04-05 Thread Elijah Newren
On Thu, Apr 5, 2018 at 12:04 PM, Jeff King wrote:
> On Thu, Apr 05, 2018 at 11:36:45AM -0700, Elijah Newren wrote:
>
>> > Do we care about matching the name "foo" against the pathspec_item "foo/"?
>> >
>> > That matches now, but wouldn't after your patch.
>>

>> So I should probably make the check handle both cases:
>>
>> @@ -383,8 +383,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
>>      /* Perform checks to see if "name" is a super set of the pathspec */
>>      if (flags & DO_MATCH_LEADING_PATHSPEC) {
>>          /* name is a literal prefix of the pathspec */
>> +        int offset = name[namelen-1] == '/' ? 1 : 0;
>>          if ((namelen < matchlen) &&
>> -            (match[namelen] == '/') &&
>> +            (match[namelen-offset] == '/') &&
>>              !ps_strncmp(item, match, name, namelen))
>>              return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
>
> That seems reasonable to me, and your "offset" trick here should prevent
> us from getting confused. Can namelen ever be zero here? I guess
> probably not (I could see an empty pathspec, but an empty path does not
> make sense).

Right, I don't see how an empty path would make sense.

> There are other similar trailing-slash matches in that function, but I'm
> not sure of all the cases in which they're used. I don't know if any of
> those would need similar treatment (sorry for being vague; I expect I'd
> need a few hours to dig into how the pathspec code actually works, and I
> don't have that today).

If it'd only take you a few hours, then you're a lot faster than me.
It took me a while to start wrapping my head around it.

The other trailing-slash matches in the function are all correct,
according to the testsuite.  (I'm not sure I like the
DO_MATCH_DIRECTORY stuff, but it is encoded in tests and backward
compatibility is important.)  In particular, changing the earlier code
to have the same offset trick would make it claim that e.g. either
"a/b" or "a/b/" as names match unconditionally against "a/b/c" as a
pathspec.  We need it to be conditional: we only want that to be
considered a match when checking whether we want to recurse into the
directory for other matches, not when checking whether the directory
itself matches the pathspec.  Thus, it should be behind a separate
flag, in a subsequent check, which is what this series does (namely
with DO_MATCH_LEADING_PATHSPEC).

To be more precise, here is how a matrix of pathnames and pathspecs
would be treated by match_pathspec_item() (where I am abbreviating
names like MATCHED_RECURSIVELY_LEADING_PATHSPEC to LEADING):

                    Pathspecs
                |    a/b    |   a/b/    |   a/b/c
          ------+-----------+-----------+------------
          a/b   |  EXACT    | RECURSIVE |  LEADING[3]
  Names   a/b/  |  EXACT[1] |  EXACT    |  LEADING[2]
          a/b/c | RECURSIVE | RECURSIVE |  EXACT

[1] Only if DO_MATCH_DIRECTORY is passed.  Otherwise,
this is NOT a match at all.
[2] Only if DO_MATCH_LEADING_PATHSPEC is passed,
after applying this series.  Otherwise, not a match
at all.
[3] Without the fix in this thread that you highlighted,
and assuming we apply patch 7, this would actually
mistakenly return RECURSIVE.


Now for a separate question: How much of the above would you like
added to the commit message...or even as a comment in the code to make
it clearer to other folks trying to make sense of it?


Elijah


Re: [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item

2018-04-05 Thread Jeff King
On Thu, Apr 05, 2018 at 11:36:45AM -0700, Elijah Newren wrote:

> > Do we care about matching the name "foo" against the pathspec_item "foo/"?
> >
> > That matches now, but wouldn't after your patch.
> 
> Technically, the tests pass anyway due to the fallback behavior
> mentioned in the commit message, but this is a really good point.  It
> looks like the call to submodule_path_match() from builtin/grep.c is
> going to be passing name without the trailing '/', which is contrary
> to how read_directory_recursive() in dir.c builds up paths (namely
> with the trailing '/'). If we tried to force consistency (either
> always omit the trailing slash or always include it), then we'd
> probably want to do so for match_pathspec() calls as well, and there
> are lots of those throughout the code and auditing it all looks
> painful.
> 
> So I should probably make the check handle both cases:
> 
> @@ -383,8 +383,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
>      /* Perform checks to see if "name" is a super set of the pathspec */
>      if (flags & DO_MATCH_LEADING_PATHSPEC) {
>          /* name is a literal prefix of the pathspec */
> +        int offset = name[namelen-1] == '/' ? 1 : 0;
>          if ((namelen < matchlen) &&
> -            (match[namelen] == '/') &&
> +            (match[namelen-offset] == '/') &&
>              !ps_strncmp(item, match, name, namelen))
>              return MATCHED_RECURSIVELY_LEADING_PATHSPEC;

That seems reasonable to me, and your "offset" trick here should prevent
us from getting confused. Can namelen ever be zero here? I guess
probably not (I could see an empty pathspec, but an empty path does not
make sense).

There are other similar trailing-slash matches in that function, but I'm
not sure of all the cases in which they're used. I don't know if any of
those would need similar treatment (sorry for being vague; I expect I'd
need a few hours to dig into how the pathspec code actually works, and I
don't have that today).

-Peff


Re: [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item

2018-04-05 Thread Elijah Newren
On Thu, Apr 5, 2018 at 10:49 AM, Jeff King wrote:
>> diff --git a/dir.c b/dir.c
>> index 19212129f0..c915a69385 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -384,7 +384,7 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
>>   if (flags & DO_MATCH_SUBMODULE) {
>>   /* name is a literal prefix of the pathspec */
>>   if ((namelen < matchlen) &&
>> - (match[namelen] == '/') &&
>> + (match[namelen-1] == '/') &&
>>   !ps_strncmp(item, match, name, namelen))
>>   return MATCHED_RECURSIVELY;
>
> Do we care about matching the name "foo" against the pathspec_item "foo/"?
>
> That matches now, but wouldn't after your patch.

Technically, the tests pass anyway due to the fallback behavior
mentioned in the commit message, but this is a really good point.  It
looks like the call to submodule_path_match() from builtin/grep.c is
going to be passing name without the trailing '/', which is contrary
to how read_directory_recursive() in dir.c builds up paths (namely
with the trailing '/'). If we tried to force consistency (either
always omit the trailing slash or always include it), then we'd
probably want to do so for match_pathspec() calls as well, and there
are lots of those throughout the code and auditing it all looks
painful.

So I should probably make the check handle both cases:

@@ -383,8 +383,9 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
     /* Perform checks to see if "name" is a super set of the pathspec */
     if (flags & DO_MATCH_LEADING_PATHSPEC) {
         /* name is a literal prefix of the pathspec */
+        int offset = name[namelen-1] == '/' ? 1 : 0;
         if ((namelen < matchlen) &&
-            (match[namelen] == '/') &&
+            (match[namelen-offset] == '/') &&
             !ps_strncmp(item, match, name, namelen))
             return MATCHED_RECURSIVELY_LEADING_PATHSPEC;
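
For readers following along outside of dir.c, here is a minimal standalone
sketch of the check in the hunk above. It is not git's code: the helper name
name_is_leading_prefix() is hypothetical, plain strncmp() stands in for
ps_strncmp() (so the "item" and "flags" arguments are dropped), and the
explicit empty-name guard is an added assumption rather than something the
patch contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only (not part of git).
 * Returns true when "name" is a leading directory of the pathspec
 * "match", whether or not "name" carries a trailing '/'.
 */
static bool name_is_leading_prefix(const char *name, const char *match)
{
    size_t namelen, matchlen, offset;

    assert(name && match);
    namelen = strlen(name);
    matchlen = strlen(match);

    /* Assumption: treat an empty name as "no match" instead of reading name[-1]. */
    if (!namelen)
        return false;

    /* If name ends in '/', the separator in match sits at namelen-1. */
    offset = name[namelen - 1] == '/' ? 1 : 0;

    return namelen < matchlen &&
           match[namelen - offset] == '/' &&
           !strncmp(match, name, namelen);
}
```

Under these assumptions, both "foo" and "foo/" are reported as leading
prefixes of the pathspec "foo/bar", while "fo" and "foo/bar" itself are not.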


Re: [RFC PATCH 2/7] dir.c: fix off-by-one error in match_pathspec_item

2018-04-05 Thread Jeff King
On Thu, Apr 05, 2018 at 10:34:41AM -0700, Elijah Newren wrote:

> For a pathspec like 'foo/bar' comparing against a path named "foo/",
> namelen will be 4, and match[namelen] will be 'b'.  The correct location
> of the directory separator is namelen-1.
> 
> The reason the code worked anyway was that the following code immediately
> checked whether the first matchlen characters matched (which they do) and
> then bailed and returned MATCHED_RECURSIVELY anyway since wildmatch doesn't
> have the ability to check if "name" can be matched as a directory (or
> prefix) against the pathspec.
> 
> Signed-off-by: Elijah Newren 
> ---
>  dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index 19212129f0..c915a69385 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -384,7 +384,7 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
>   if (flags & DO_MATCH_SUBMODULE) {
>   /* name is a literal prefix of the pathspec */
>   if ((namelen < matchlen) &&
> - (match[namelen] == '/') &&
> + (match[namelen-1] == '/') &&
>   !ps_strncmp(item, match, name, namelen))
>   return MATCHED_RECURSIVELY;

Do we care about matching the name "foo" against the pathspec_item "foo/"?

That matches now, but wouldn't after your patch.

-Peff
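
The two cases discussed in this message can be reproduced with a small
standalone sketch. It is not git's code: the helper names prefix_check_old()
and prefix_check_patched() are hypothetical, plain strncmp() stands in for
ps_strncmp(), and the namelen > 0 guard in the patched variant is an added
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Check as it stood before the patch: separator expected at match[namelen]. */
static bool prefix_check_old(const char *name, const char *match)
{
    size_t namelen, matchlen;

    assert(name && match);
    namelen = strlen(name);
    matchlen = strlen(match);

    return namelen < matchlen &&
           match[namelen] == '/' &&
           !strncmp(match, name, namelen);
}

/* Check as proposed in patch 2/7: separator expected at match[namelen-1]. */
static bool prefix_check_patched(const char *name, const char *match)
{
    size_t namelen, matchlen;

    assert(name && match);
    namelen = strlen(name);
    matchlen = strlen(match);

    /* Assumption: reject an empty name rather than index match[-1]. */
    return namelen > 0 &&
           namelen < matchlen &&
           match[namelen - 1] == '/' &&
           !strncmp(match, name, namelen);
}
```

For the name "foo/" against the pathspec "foo/bar", the old check fails
(match[4] is 'b') and the patched check succeeds, which is the off-by-one the
commit message describes. For the name "foo" against the pathspec "foo/", the
old check succeeds and the patched check fails (match[2] is 'o'), which is the
regression raised above.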