Re: [PATCH v2] ls-files: add pathspec matching for submodules

2016-09-21 Thread Junio C Hamano
Junio C Hamano  writes:

> Brandon Williams  writes:
>
>> On a similar but slightly different note.  In general do we want
>> the pathspec '??b' to match against the sib/ directory and
>> subsequently have ls-files print all entries inside of the sib/
>> directory?  (this is in the non-recursive case)
>
> I'd need to find time to dig a bit of history before I can give a
> firm opinion on this, but here is a knee-jerk version of my reaction.

In the context of what you are doing, i.e. "ls-files that recurses
into submodules", my opinion is that "ls-files --recurse-submodules"
should behave wrt pathspecs AS IF all the submodule contents are
flattened into a single index of the superproject.

In the sample scenario under discussion, i.e.

In the superproject we have these
$ git ls-files -s
100644 c489803d5bdec1755f650854fe7ef5ab7a3ee58d 0   .gitmodules
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   sib/file
16 1f5a0695289c500f25e7fa55e3ad27e394d1206b 0   sub

In 'sub' submodule we have this
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   file

such a flattend index would look like this:

100644 c489803d5bdec1755f650854fe7ef5ab7a3ee58d 0   .gitmodules
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   sib/file
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   sub/file

i.e. removing 'sub' submodule entry from the index of the
superproject and overlay everything in the submodule with sub/
prefixed to its path.

And with such an index, if and only if a path matches a pathspec,
"git ls-files --recurse-submodules" run at the toplevel with the
same pathspec should show the path.  That means

$ git ls-files --recurse-submodules '??b'

would show nothing (not even 'sub'), while

$ git ls-files --recurse-submodules '??b*'

should show sib/file and sub/file.  That is because that is how the
command without "--recurse-submodules" working on that flat index
would produce.

The "we have historically two kinds of pathspecs and they differ how
they work with wildcard" is a separate issue, I would think, even
though the result would affect what should happen in the above
example (i.e. if we said "either a pattern match or a literal match
to a leading directory path should make everything underneath
match", '??b' would make sib/ and sub/ to be
shown).


Re: [PATCH v2] ls-files: add pathspec matching for submodules

2016-09-21 Thread Junio C Hamano
Brandon Williams  writes:

> On a similar but slightly different note.  In general do we want
> the pathspec '??b' to match against the sib/ directory and
> subsequently have ls-files print all entries inside of the sib/
> directory?  (this is in the non-recursive case)

I'd need to find time to dig a bit of history before I can give a
firm opinion on this, but here is a knee-jerk version of my reaction.

 * A pathspec element that matches literally to a directory causes
   itself and everything underneath the directory match that
   element, e.g. "sib" would be considered a match.

 * Otherwise, a pathspec that matches with the whole path as a
   pattern matches the path, e.g. "??b" would match "sib" itself,
   but not "sib/file".  Note that "??b*" would match "sib" and
   "sib/file" because the pattern match is without FNM_PATNAME
   unless ':(glob)' magic is in effect.

Historically, some commands treated a pathspec as purely a prefix
match (i.e. the former) and did not use _any_ pattern matching,
while other commands did both of the above two (e.g. compare ls-tree
and ls-files).  I thought we were slowly moving towards unifying
them, but apparently 'git log -- "D?cumentation"' does not show
anything close to what 'git log -- Documentation' gives us even in
today's Git.

Probably we want to change it at some point so that a pattern that
matches one leading directory would cause everything underneath to
match, e.g. "??b" would include "sib/file" just because "sib" would.




Re: [PATCH v2] ls-files: add pathspec matching for submodules

2016-09-20 Thread Brandon Williams
>>> +
>>> + if (item->flags & PATHSPEC_ONESTAR) {
>>> + return WM_MATCH;
>>> + } else if (item->magic & PATHSPEC_GLOB) {
>>> + return wildmatch(pattern, string,
>>> +  WM_PATHNAME |
>>> +  (item->magic & PATHSPEC_ICASE ?
>>> +   WM_CASEFOLD : 0),
>>> +  NULL);
>>
>> Isn't this last one overly tight?  I am wondering about a scenario
>> where you have a submodule at "sub/" in the superproject, and "sub/"
>> has a "file" at the top of its working tree.  And you do:
>>
>> git ls-files --recurse-submodules ':(glob)??b/fi?e'
>>
>> at the top of the superproject.  The "pattern" would be '??b/fi?e"
>> while string would be 'sub', and wildmatch() would not like it, but
>> there is no way for this caller to append anything to 'sub' before
>> making this call, as it hasn't looked into what paths appear in the
>> submodule repository (and it should not want to).  And I think we
>> would want it to recurse to find sub/file.  IOW, this looks like a
>> false negative we must avoid in this function.  As we cannot afford
>> to check if anything that matches 'fi?e' is in the index file of the
>> submodule repository, we shouldn't try to match 'fi?e' portion of
>> the given pathspec pattern.
>
> good point.  Let me think about this some more.

On a similar but slightly different note.  In general do we want the
pathspec '??b' to
match against the sib/ directory and subsequently have ls-files print
all entries inside
of the sib/ directory?  (this is in the non-recursive case)


Re: [PATCH v2] ls-files: add pathspec matching for submodules

2016-09-20 Thread Brandon Williams
On Mon, Sep 19, 2016 at 4:21 PM, Junio C Hamano  wrote:
>
> As the previous one that used a wrong (sorry) argument is not even
> in 'next' yet, let's pretend that it never happened.  It is OK to
> still keep it and this patch as two separate steps, i.e. a topic
> with two patches in it.
>
> That means that this patch will become 2/2 of a series, and 1/2 is
> rerolled to use submodule_prefix from the get-go, without ever
> introducing output_path_prefix variable, so that many of the above
> lines we won't have to review in 2/2.

Ah ok.  Would you like me to resend the first patch with the desired
change then?

>
>> + /*
>> +  * Pass in the original pathspec args.  The submodule will be
>> +  * responsible for prepending the 'submodule_prefix' prior to comparing
>> +  * against the pathspec for matches.
>> +  */
>
> Good.
>
>> + argv_array_push(, "--");
>> + for (i = 0; i < pathspec.nr; ++i)
>> + argv_array_push(, pathspec.items[i].original);
>> +
>
> Please prefer post-increment i++ over pre-increment ++i when writing
> a for(;;) loop, unless there is a strong reason not to (familiarlity
> in C++ is not a good reason).

I had a compiler instructor drill into my head that ++i in a for loop
was faster historically
since it wouldn't have to create a temporary value.  Of course now a days there
probably isn't much (or any) difference between the two.  If post-fix
operators are the
norm in git code then I can try to remember to use them :)

>> +
>> + if (item->flags & PATHSPEC_ONESTAR) {
>> + return WM_MATCH;
>> + } else if (item->magic & PATHSPEC_GLOB) {
>> + return wildmatch(pattern, string,
>> +  WM_PATHNAME |
>> +  (item->magic & PATHSPEC_ICASE ?
>> +   WM_CASEFOLD : 0),
>> +  NULL);
>
> Isn't this last one overly tight?  I am wondering about a scenario
> where you have a submodule at "sub/" in the superproject, and "sub/"
> has a "file" at the top of its working tree.  And you do:
>
> git ls-files --recurse-submodules ':(glob)??b/fi?e'
>
> at the top of the superproject.  The "pattern" would be '??b/fi?e"
> while string would be 'sub', and wildmatch() would not like it, but
> there is no way for this caller to append anything to 'sub' before
> making this call, as it hasn't looked into what paths appear in the
> submodule repository (and it should not want to).  And I think we
> would want it to recurse to find sub/file.  IOW, this looks like a
> false negative we must avoid in this function.  As we cannot afford
> to check if anything that matches 'fi?e' is in the index file of the
> submodule repository, we shouldn't try to match 'fi?e' portion of
> the given pathspec pattern.

good point.  Let me think about this some more.


Re: [PATCH v2] ls-files: add pathspec matching for submodules

2016-09-19 Thread Junio C Hamano
Brandon Williams  writes:

> diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
> index a623ebf..09e4449 100644
> --- a/Documentation/git-ls-files.txt
> +++ b/Documentation/git-ls-files.txt
> @@ -19,7 +19,7 @@ SYNOPSIS
> - [--output-path-prefix=]
> + [--submodule-prefix=]
> ...
> ---output-path-prefix=::
> +--submodule-prefix=::
> ...
> -static const char *output_path_prefix;
> +static const char *submodule_prefix;
> ...
> - if (output_path_prefix && *output_path_prefix) {
> + if (submodule_prefix && *submodule_prefix) {
>   strbuf_reset(_name);
> - strbuf_addstr(_name, output_path_prefix);
> + strbuf_addstr(_name, submodule_prefix);
> ...
> - argv_array_pushf(, "--output-path-prefix=%s%s/",
> -  output_path_prefix ? output_path_prefix : "",
> + argv_array_pushf(, "--submodule-prefix=%s%s/",
> +  submodule_prefix ? submodule_prefix : "",
>ce->name);

As the previous one that used a wrong (sorry) argument is not even
in 'next' yet, let's pretend that it never happened.  It is OK to
still keep it and this patch as two separate steps, i.e. a topic
with two patches in it.

That means that this patch will become 2/2 of a series, and 1/2 is
rerolled to use submodule_prefix from the get-go, without ever
introducing output_path_prefix variable, so that many of the above
lines we won't have to review in 2/2.

> + /*
> +  * Pass in the original pathspec args.  The submodule will be
> +  * responsible for prepending the 'submodule_prefix' prior to comparing
> +  * against the pathspec for matches.
> +  */

Good.

> + argv_array_push(, "--");
> + for (i = 0; i < pathspec.nr; ++i)
> + argv_array_push(, pathspec.items[i].original);
> +

Please prefer post-increment i++ over pre-increment ++i when writing
a for(;;) loop, unless there is a strong reason not to (familiarlity
in C++ is not a good reason).

> diff --git a/dir.c b/dir.c
> index 0ea235f..1afc3ff 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -63,6 +63,35 @@ int fspathncmp(const char *a, const char *b, size_t count)
>   return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
>  }
>  
> +/**
> + * Used to perform prefix matching against a pathspec_item for determining 
> if we
> + * should decend into a submodule.  This function can result in false 
> positives
> + * since we are only trying to match the 'string' to a prefix of the 
> 'pattern'
> + */

Perhaps s/can/is allowed to/.  Implicit but equally important is
that it is not OK to produce false negative.

> +static int prefix_fnmatch(const struct pathspec_item *item,
> +const char *pattern, const char *string,
> +int prefix)
> +{
> + if (prefix > 0) {
> + if (ps_strncmp(item, pattern, string, prefix))
> + return WM_NOMATCH;
> + pattern += prefix;
> + string += prefix;
> + }
> +
> + if (item->flags & PATHSPEC_ONESTAR) {
> + return WM_MATCH;
> + } else if (item->magic & PATHSPEC_GLOB) {
> + return wildmatch(pattern, string,
> +  WM_PATHNAME |
> +  (item->magic & PATHSPEC_ICASE ?
> +   WM_CASEFOLD : 0),
> +  NULL);

Isn't this last one overly tight?  I am wondering about a scenario
where you have a submodule at "sub/" in the superproject, and "sub/"
has a "file" at the top of its working tree.  And you do:

git ls-files --recurse-submodules ':(glob)??b/fi?e'

at the top of the superproject.  The "pattern" would be '??b/fi?e"
while string would be 'sub', and wildmatch() would not like it, but
there is no way for this caller to append anything to 'sub' before
making this call, as it hasn't looked into what paths appear in the
submodule repository (and it should not want to).  And I think we
would want it to recurse to find sub/file.  IOW, this looks like a
false negative we must avoid in this function.  As we cannot afford
to check if anything that matches 'fi?e' is in the index file of the
submodule repository, we shouldn't try to match 'fi?e' portion of
the given pathspec pattern.

Extending the scenario, if I also create a directory "sib/" in the
superproject next to "sub/" and add a "file" in it, the same
pathspec:

git ls-files [--recurse-submodules] ':(glob)??b/fi?e'

does find "sib/file" (with or without the recursion option).

Duy cc'ed because I am not quite sure why it is a good thing
that I need to add an exlicit ':(glob)' in these examples to
illustrate the breakage.  This comes from v1.8.3.2-849-gbd30c2e
("pathspec: support :(glob) syntax", 2013-07-14) and I do not
think we want to change its behaviour. I just want to be
reminded why we did this.  I am guessing that it was because of
an