Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-19 Thread Brandon Williams
I thought as much.  Thanks for the quick explanation :)

On Mon, Sep 19, 2016 at 11:34 AM, Junio C Hamano  wrote:
> Brandon Williams  writes:
>
>> side question if the answer is short:  Any reason as to why all of the
>> pathspec matching code lives inside of dir.c and not pathspec.c?
>
> Hysterical Raisins.
>
> A pathspec used to be just a "const char **" in simpler times, which
> was no more elaborate than "argv + i" (i.e. after parsing options
> and revisions out, the remainders are pathspec elements).  Major
> part of the matching was done inside dir.c and we haven't had chance
> to look at the larger picture to move the code around.


Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-19 Thread Junio C Hamano
Brandon Williams  writes:

> side question if the answer is short:  Any reason as to why all of the
> pathspec matching code lives inside of dir.c and not pathspec.c?

Hysterical Raisins.

A pathspec used to be just a "const char **" in simpler times, which
was no more elaborate than "argv + i" (i.e. after parsing options
and revisions out, the remainders are pathspec elements).  Major
part of the matching was done inside dir.c and we haven't had chance
to look at the larger picture to move the code around.


Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-19 Thread Brandon Williams
side question if the answer is short:  Any reason as to why all of the
pathspec matching code lives inside of dir.c and not pathspec.c?

On Mon, Sep 19, 2016 at 11:22 AM, Junio C Hamano  wrote:
> Brandon Williams  writes:
>
>> Yes in that case it wouldn't have passed ps_strncmp()...but we should have 
>> never
>> made it there in the first place due to a piece of logic in 
>> match_pathspec_item:
>
> Ah, OK.
>
> Thanks for clarifying.


Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-19 Thread Junio C Hamano
Brandon Williams  writes:

> Yes in that case it wouldn't have passed ps_strncmp()...but we should have 
> never
> made it there in the first place due to a piece of logic in 
> match_pathspec_item:

Ah, OK.

Thanks for clarifying.


Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-19 Thread Brandon Williams
On Mon, Sep 19, 2016 at 11:04 AM, Junio C Hamano  wrote:
> Brandon Williams  writes:
>
>>> Again, what do we have in "name" and "item" at this point?  If we
>>> have a submodule at "sub/" and we are checking a pathspec element
>>> "sub/dir1/*", what is the non-wildcard part of the pathspec and what
>>> is the "string"?  Aren't they "sub/dir1/" and "sub/" respectively,
>>> which would not pass ps_strncmp() and produce a (false) negative?
>>
>> item will be the pathspec_item struct that we are trying to match against.
>
> ... which would mean "sub/dir1/" in the above example (which is
> followed by '*' that is wildcard).
>
>> name will be the file we are trying to match, which should already have the
>> 'prefix' cut off (this is the prefix that is used as an optimization
>> in the common
>> case, which isn't used in the submodule case).
>
> ... which would be "sub/" in the above example, because we disable
> the common-prefix optimization.
>
> So in short, the answer to the last questions in the first quoted
> paragraph are yes, yes, and "no they do not pass ps_strncmp()"?

Yes in that case it wouldn't have passed ps_strncmp()...but we should have never
made it there in the first place due to a piece of logic in match_pathspec_item:

@@ -283,6 +308,24 @@ static int match_pathspec_item(const struct
pathspec_item *item, int prefix,
 item->nowildcard_len - prefix))
return MATCHED_FNMATCH;

+   /* Perform checks to see if "name" is a super set of the pathspec */
+   if (flags & DO_MATCH_SUBMODULE) {
+   int matched = 0;
+
+   /* Check if the name is a literal prefix of the pathspec */
+   if ((item->match[namelen] == '/') &&
+   !ps_strncmp(item, match, name, namelen)) {
+   matched = MATCHED_RECURSIVELY;
+   /* Check if the name wildmatches to the pathspec */
+   } else if (item->nowildcard_len < item->len &&
+  !prefix_fnmatch(item, match, name,
+  item->nowildcard_len - prefix)) {
+   matched = MATCHED_FNMATCH;
+   }
+
+   return matched;
+   }

Perhaps the call structure and code organization could be changes a bit to make
a little more sense.


Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-19 Thread Junio C Hamano
Brandon Williams  writes:

>  static void show_ce_entry(const char *tag, const struct cache_entry *ce)
>  {
> + struct strbuf name = STRBUF_INIT;
>   int len = max_prefix_len;
> + if (submodule_prefix)
> + strbuf_addstr(, submodule_prefix);
> + strbuf_addstr(, ce->name);

Continuing with the previous review, which concentrated on what
happens in the superproject; let's see what happens in the recursive
invocation in a submodule.

So a recursively spawned "ls-files --submodule-prefix=sub/" finds
a path in the index PATH and forms "sub/PATH" in "name".  From here
on, where we used to match pathspec against ce->name, we would be
matching it against name->buf.

> + if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> + submodule_path_match(, name.buf, ps_matched)) {
>   show_gitlink(ce);

This is primarily what happens in the superproject to decide if the
submodule is worth showing.  When we are in a submodule, we can
descend into subsubmodule (if our ls-files run in the superproject
passed --recurse-submodule down) from here.

> + } else if (match_pathspec(, name.buf, name.len,
> +   len, ps_matched,
> +   S_ISDIR(ce->ce_mode) ||
> +   S_ISGITLINK(ce->ce_mode))) {

This is interesting bit to see what happens in the recursive
invocation.  It uses the usual match_pathspec(), as we want to be
precise and correct, iow, we do not want to use DO_MATCH_SUBMODULE,
aka "it might be worth descending into submodule".

> + if (tag && *tag && show_valid_bit &&
> + (ce->ce_flags & CE_VALID)) {
> +...
> + }
> + write_eolinfo(ce, ce->name);
> + write_name(ce->name);

The prefixing is taken care of by write_name(), so it is correct to
use ce->name here.

> ...  
> + strbuf_release();
>  }

OK, everything I saw so far for the recursive invocation here makes
sense.

Thanks.


Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-19 Thread Junio C Hamano
Brandon Williams  writes:

>> Again, what do we have in "name" and "item" at this point?  If we
>> have a submodule at "sub/" and we are checking a pathspec element
>> "sub/dir1/*", what is the non-wildcard part of the pathspec and what
>> is the "string"?  Aren't they "sub/dir1/" and "sub/" respectively,
>> which would not pass ps_strncmp() and produce a (false) negative?
>
> item will be the pathspec_item struct that we are trying to match against.

... which would mean "sub/dir1/" in the above example (which is
followed by '*' that is wildcard).

> name will be the file we are trying to match, which should already have the
> 'prefix' cut off (this is the prefix that is used as an optimization
> in the common
> case, which isn't used in the submodule case).  

... which would be "sub/" in the above example, because we disable
the common-prefix optimization.

So in short, the answer to the last questions in the first quoted
paragraph are yes, yes, and "no they do not pass ps_strncmp()"?

>> I am starting to have a feeling that the best we can do in this
>> function safely is to see if prefix (i.e. the constant part of the
>> pathspec before the first wildcard) is long enough to cover the
>> "name" and if "name" part [matches or does not match] ...
>> If these two checks cannot decide, we may have to be pessimistic and
>> say "it may match; we don't know until we descend into it".
>> ...
>> So I would think we'd be in the business of counting slashes in the
>> name (called "string" in this function) and the pathspec, while
>> noticing '*' and '**' in the latter, and we may be able to be more
>> precise, but I am not sure how complex the end result would become.
>
> I agree, I'm not too sure how much more complex the logic would need
> to be to handle
> all matters of wildcard characters.  We could initially be more
> lenient on what qualifies as
> a match and then later (or in the near future) revisit the wildmatch
> function (which is complex)
> and see if we can add better matching capabilities more suited for
> submodules while at the
> same time fixing that bug discussed above.

I think it is reasonable to start a function that is meant to never
have false negatives pessimistic and return "might match" from it
when in doubt.

Thanks.


Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-19 Thread Brandon Williams
On Mon, Sep 19, 2016 at 10:00 AM, Junio C Hamano  wrote:
>
> I think you were clear enough.
>
> Don't read everything other people say in their reviews as pointing
> out issues.  Often trying to rephrase what they read in the code in
> their own words is a good way to make sure the reviewers and the
> original author are on the same page.  The above was one of these
> cases.

That makes sense, I'll be sure to remember that!


>
 + if (prefix > 0) {
 + if (ps_strncmp(item, pattern, string, prefix))
 + return WM_NOMATCH;
>>>
>>> This says: when we have a set prefix that must literally match, and
>>> that part does not match what we have, it cannot possibly match.
>>>
>>> Is that correct?  What do we have in "name" and "item" at this
>>> point?  We disable the common-prefix optimization, so we do not have
>>> to worry about a pathspec with two elements "sub/dir1/*" and "sub/dir2/*"
>>> giving you "sub/dir" as the common prefix, when you are wondering if
>>> it is worth descending into "sub/" without knowing what it contains.
>>> Is that what guarantees why this part is correct?
>>
>> I adopted this structure from another part of the code.  The caller
>> uses a field in
>> the pathspec item which indicates the location of the first wildcard 
>> character.
>> So the prefix (everything prior to the wildcard char) must match
>> literally before
>> we drop into a more expensive wildmatch function.
>
> "Another part of the code" is about tree walking, right?  Weren't
> you saying that part of the code may be buggy or something earlier
> (e.g. pathspec "su?/" vs entry "sub")?

I was refering to the logic that is used to do normal pathspec checking:
'match_pathspec' and the functions it calls, in particular
git_fnmatch.  And the bug I
pointed out before, I believe is due to how the wildmatch function
works.  It requires
that the pattern and the string being compared must match exactly (in
length as well).
The "su?/" case would drop into wildmatch funciton and wouldn't match
against any file
in the directory "sub/file1" for example, because it doesn't exactly
match "su?/".  In the
case of "sub" there is logic prior to the wildmatch function call
which would classify it a
match and return before descending into wildmatch.

> Again, what do we have in "name" and "item" at this point?  If we
> have a submodule at "sub/" and we are checking a pathspec element
> "sub/dir1/*", what is the non-wildcard part of the pathspec and what
> is the "string"?  Aren't then "sub/dir1/" and "sub/" respectively,
> which would not pass ps_strncmp() and produce a (false) negative?

item will be the pathspec_item struct that we are trying to match against.
name will be the file we are trying to match, which should already have the
'prefix' cut off (this is the prefix that is used as an optimization
in the common
case, which isn't used in the submodule case).  The 'prefix' in this function's
context is the part of the pattern prior to the first wildcard character. which
we can do a literal comparison on before descending into the wildmatch function.

> I am starting to have a feeling that the best we can do in this
> function safely is to see if prefix (i.e. the constant part of the
> pathspec before the first wildcard) is long enough to cover the
> "name" and if "name" part does not match to the directory boundary,
> e.g. for this combination
>
> pathspec = "a/b/sib/c/*"
> name = "a/b/sub/"
>
> we can say with confidence that it is not worth descending into.
>
> When prefix is long enough and "name" and leading part of the prefix
> matches to the directory boundary, e.g.
>
> pathspec = "a/b/sub/c/*"
> name = "a/b/sub/"
>
> we can say it is worth descending into.
>
> If these two checks cannot decide, we may have to be pessimistic and
> say "it may match; we don't know until we descend into it".  When
> prefix is shorter than name, I am not sure if we can devise a set of
> simple rules, e.g.
>
> pathspec = "a/**/c/*"
> name = "a/b/sub/"
>
> may match with its ** "b/sub" part and worth descending into, so is
>
> pathspec = "a/b/*/c/*"
> name = "a/b/sub/"
>
> but not this one:
>
> pathspec = "a/b/su[c-z]/c/*"
> name = "a/b/sub/"
>
> but this is OK:
>
> pathspec = "a/b/su[a-z]/c/*"
> name = "a/b/sub/"
>
> So I would think we'd be in the business of counting slashes in the
> name (called "string" in this function) and the pathspec, while
> noticing '*' and '**' in the latter, and we may be able to be more
> precise, but I am not sure how complex the end result would become.
>

I agree, I'm not too sure how much more complex the logic would need
to be to handle
all matters of wildcard characters.  We could initially be more
lenient on what qualifies as
a match and then later (or in the near future) revisit the wildmatch
function (which is complex)
and see if we can add better matching 

Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-19 Thread Junio C Hamano
Brandon Williams  writes:

>> OK, so as discussed previously with Heiko and Stefan, the idea is to
>>
>>  - pass the original pathspec as-is,
>>
>>  - when --submodule-prefix is given, a path discovered in a
>>submodule repository is first prefixed with that string before
>>getting checked to see if it matches the original pathspec.
>>
>> And this loop is about relaying the original pathspec.
>
> Exactly.  Perhaps I should have made this more clear either with a
> detailed comment or
> more information in the commit msg.

I think you were clear enough.

Don't read everything other people say in their reviews as pointing
out issues.  Often trying to rephrase what they read in the code in
their own words is a good way to make sure the reviewers and the
original author are on the same page.  The above was one of these
cases.

>>> + if (prefix > 0) {
>>> + if (ps_strncmp(item, pattern, string, prefix))
>>> + return WM_NOMATCH;
>>
>> This says: when we have a set prefix that must literally match, and
>> that part does not match what we have, it cannot possibly match.
>>
>> Is that correct?  What do we have in "name" and "item" at this
>> point?  We disable the common-prefix optimization, so we do not have
>> to worry about a pathspec with two elements "sub/dir1/*" and "sub/dir2/*"
>> giving you "sub/dir" as the common prefix, when you are wondering if
>> it is worth descending into "sub/" without knowing what it contains.
>> Is that what guarantees why this part is correct?
>
> I adopted this structure from another part of the code.  The caller
> uses a field in
> the pathspec item which indicates the location of the first wildcard 
> character.
> So the prefix (everything prior to the wildcard char) must match
> literally before
> we drop into a more expensive wildmatch function.

"Another part of the code" is about tree walking, right?  Weren't
you saying that part of the code may be buggy or something earlier
(e.g. pathspec "su?/" vs entry "sub")?

Again, what do we have in "name" and "item" at this point?  If we
have a submodule at "sub/" and we are checking a pathspec element
"sub/dir1/*", what is the non-wildcard part of the pathspec and what
is the "string"?  Aren't then "sub/dir1/" and "sub/" respectively,
which would not pass ps_strncmp() and produce a (false) negative?

I am starting to have a feeling that the best we can do in this
function safely is to see if prefix (i.e. the constant part of the
pathspec before the first wildcard) is long enough to cover the
"name" and if "name" part does not match to the directory boundary,
e.g. for this combination

pathspec = "a/b/sib/c/*"
name = "a/b/sub/"

we can say with confidence that it is not worth descending into.

When prefix is long enough and "name" and leading part of the prefix
matches to the directory boundary, e.g.

pathspec = "a/b/sub/c/*"
name = "a/b/sub/"

we can say it is worth descending into.

If these two checks cannot decide, we may have to be pessimistic and
say "it may match; we don't know until we descend into it".  When
prefix is shorter than name, I am not sure if we can devise a set of
simple rules, e.g.

pathspec = "a/**/c/*"
name = "a/b/sub/"

may match with its ** "b/sub" part and worth descending into, so is

pathspec = "a/b/*/c/*"
name = "a/b/sub/"

but not this one:

pathspec = "a/b/su[c-z]/c/*"
name = "a/b/sub/"

but this is OK:

pathspec = "a/b/su[a-z]/c/*"
name = "a/b/sub/"

So I would think we'd be in the business of counting slashes in the
name (called "string" in this function) and the pathspec, while
noticing '*' and '**' in the latter, and we may be able to be more
precise, but I am not sure how complex the end result would become.



Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-18 Thread Brandon Williams
On Fri, Sep 16, 2016 at 8:46 PM, Junio C Hamano  wrote:
>
> As the previous one that used a wrong (sorry) argument is not even
> in 'next' yet, let's pretend that it never happened.  It is OK to
> still keep it and this patch as two separate steps, i.e. a topic
> with two patches in it.

Yeah since I've been using what I built in the last patch to test the pathspecs
I based this patch off of what you have in bw/ls-files... branch.

>
>> + /* Add pathspec args */
>> + argv_array_push(, "--");
>> + for (i = 0; i < pathspec.nr; ++i)
>> + argv_array_push(, pathspec.items[i].original);
>
> OK, so as discussed previously with Heiko and Stefan, the idea is to
>
>  - pass the original pathspec as-is,
>
>  - when --submodule-prefix is given, a path discovered in a
>submodule repository is first prefixed with that string before
>getting checked to see if it matches the original pathspec.
>
> And this loop is about relaying the original pathspec.

Exactly.  Perhaps I should have made this more clear either with a
detailed comment or
more information in the commit msg.

>> @@ -192,57 +210,63 @@ static void show_gitlink(const struct cache_entry *ce)
>>
>>  static void show_ce_entry(const char *tag, const struct cache_entry *ce)
>>  {
>> + struct strbuf name = STRBUF_INIT;
>>   int len = max_prefix_len;
>> + if (submodule_prefix)
>> + strbuf_addstr(, submodule_prefix);
>> + strbuf_addstr(, ce->name);
>>
>>   if (len >= ce_namelen(ce))
>> - die("git ls-files: internal error - cache entry not superset 
>> of prefix");
>> + die("git ls-files: internal error - cache entry not "
>> + "superset of prefix");
>
> This is not such a great thing to do.  Upon a bug report, we can no
> longer do
>
> git grep 'cache entry not superset'
>
> to see where the error message is coming from.

Oh i wasn't really thinking about that.  I simply noticed that the
line was longer than
80 characters and figured I should break it to adhere to style guidelines.

>
>> - if (!match_pathspec(, ce->name, ce_namelen(ce),
>> - len, ps_matched,
>> - S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
>> - return;
>> - if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
>> + if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
>> + submodule_path_match(, name.buf, ps_matched)) {
>>   show_gitlink(ce);
>> - return;
>> - }
>> + } else if (match_pathspec(, name.buf, name.len,
>> +   len, ps_matched,
>> +   S_ISDIR(ce->ce_mode) ||
>> +   S_ISGITLINK(ce->ce_mode))) {
>> + if (tag && *tag && show_valid_bit &&
>> + ...
>
> Argh.  If we had a preparatory clean-up step, would it have helped
> to avoid this big re-indentation that makes the patch harder to read
> than necessary, I wonder?
>
> Another way would have been to "goto" from the end of this block
>
>> + if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
>> + submodule_path_match(, name.buf, ps_matched)) {
>
> where we used to "return" out to the central clean-up location, i.e.
> here.
>
>> + strbuf_release();
>>  }

Yeah, the lack of destructors in C makes this a bit of a challenge when we need
to free up memory.  I've also always been taught to avoid using goto's
as they can
be error prone and lead to making the code more difficult to read.
Hence why I did
 some adjustments to make it so the function had a single exit point
to make it easy
 for cleanup.


>
>
>>   parse_pathspec(, 0,
>>  PATHSPEC_PREFER_CWD |
>>  PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>>  prefix, argv);
>>
>> - /* Find common prefix for all pathspec's */
>> - max_prefix = common_prefix();
>> + /*
>> +  * Find common prefix for all pathspec's
>> +  * This is used as a performance optimization which violates 
>> correctness
>> +  * in the recurse_submodules mode
>> +  */
>
> The two new lines phrase it overly negatively and also misleading.
> I thought you were saying "We do this as optimization anyway; damn
> the correctness in the submodule case!" in my first reading before
> reading the statements the comment talks about.  "This optimization
> unfortunately cannot be done when recursing into submodules" would
> have been better.

haha yeah, I wasn't trying to be overly negative!  I agree that your
wording makes more sense.

>
>> + if (recurse_submodules)
>> + max_prefix = NULL;
>> + else
>> + max_prefix = common_prefix();
>>   max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
>
>> diff --git a/dir.c b/dir.c
>> index 0ea235f..630dc7a 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -63,6 +63,30 @@ int fspathncmp(const char *a, const char *b, size_t count)
>>

Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-16 Thread Junio C Hamano
Brandon Williams  writes:

> ...
>   [--full-name] [--recurse-submodules]
> - [--output-path-prefix=]
> + [--submodule-prefix=]
>   [--abbrev] [--] [...]
>  
> ---output-path-prefix=::
> +--submodule-prefix=::
>   Prepend the provided path to the output of each file
> ...
>  static int show_eol;
> -static const char *output_path_prefix;
> +static const char *submodule_prefix;
>  static int recurse_submodules;
> ...
>   static struct strbuf full_name = STRBUF_INIT;
> - if (output_path_prefix && *output_path_prefix) {
> + if (submodule_prefix && *submodule_prefix) {
>   strbuf_reset(_name);
> - strbuf_addstr(_name, output_path_prefix);
> + strbuf_addstr(_name, submodule_prefix);
>   strbuf_addstr(_name, name);

As the previous one that used a wrong (sorry) argument is not even
in 'next' yet, let's pretend that it never happened.  It is OK to
still keep it and this patch as two separate steps, i.e. a topic
with two patches in it.

> + /* Add pathspec args */
> + argv_array_push(, "--");
> + for (i = 0; i < pathspec.nr; ++i)
> + argv_array_push(, pathspec.items[i].original);

OK, so as discussed previously with Heiko and Stefan, the idea is to

 - pass the original pathspec as-is,

 - when --submodule-prefix is given, a path discovered in a
   submodule repository is first prefixed with that string before
   getting checked to see if it matches the original pathspec.

And this loop is about relaying the original pathspec.

> @@ -192,57 +210,63 @@ static void show_gitlink(const struct cache_entry *ce)
>  
>  static void show_ce_entry(const char *tag, const struct cache_entry *ce)
>  {
> + struct strbuf name = STRBUF_INIT;
>   int len = max_prefix_len;
> + if (submodule_prefix)
> + strbuf_addstr(, submodule_prefix);
> + strbuf_addstr(, ce->name);
>  
>   if (len >= ce_namelen(ce))
> - die("git ls-files: internal error - cache entry not superset of 
> prefix");
> + die("git ls-files: internal error - cache entry not "
> + "superset of prefix");

This is not such a great thing to do.  Upon a bug report, we can no
longer do

git grep 'cache entry not superset'

to see where the error message is coming from.

> - if (!match_pathspec(, ce->name, ce_namelen(ce),
> - len, ps_matched,
> - S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
> - return;
> - if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
> + if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> + submodule_path_match(, name.buf, ps_matched)) {
>   show_gitlink(ce);
> - return;
> - }
> + } else if (match_pathspec(, name.buf, name.len,
> +   len, ps_matched,
> +   S_ISDIR(ce->ce_mode) ||
> +   S_ISGITLINK(ce->ce_mode))) {
> + if (tag && *tag && show_valid_bit &&
> + ...

Argh.  If we had a preparatory clean-up step, would it have helped
to avoid this big re-indentation that makes the patch harder to read
than necessary, I wonder?

Another way would have been to "goto" from the end of this block

> + if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> + submodule_path_match(, name.buf, ps_matched)) {

where we used to "return" out to the central clean-up location, i.e.
here.

> + strbuf_release();
>  }


>   parse_pathspec(, 0,
>  PATHSPEC_PREFER_CWD |
>  PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>  prefix, argv);
>  
> - /* Find common prefix for all pathspec's */
> - max_prefix = common_prefix();
> + /*
> +  * Find common prefix for all pathspec's
> +  * This is used as a performance optimization which violates correctness
> +  * in the recurse_submodules mode
> +  */

The two new lines phrase it overly negatively and also misleading.
I thought you were saying "We do this as optimization anyway; damn
the correctness in the submodule case!" in my first reading before
reading the statements the comment talks about.  "This optimization
unfortunately cannot be done when recursing into submodules" would
have been better.

> + if (recurse_submodules)
> + max_prefix = NULL;
> + else
> + max_prefix = common_prefix();
>   max_prefix_len = max_prefix ? strlen(max_prefix) : 0;

> diff --git a/dir.c b/dir.c
> index 0ea235f..630dc7a 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -63,6 +63,30 @@ int fspathncmp(const char *a, const char *b, size_t count)
>   return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
>  }
>  
> +static int prefix_fnmatch(const struct pathspec_item *item,
> +const char *pattern, const char *string,
> +int prefix)
>