Re: [PATCH v2] ls-files: add pathspec matching for submodules
Junio C Hamano writes: > Brandon Williams writes: > >> On a similar but slightly different note. In general do we want >> the pathspec '??b' to match against the sib/ directory and >> subsequently have ls-files print all entries inside of the sib/ >> directory? (this is in the non-recursive case) > > I'd need to find time to dig a bit of history before I can give a > firm opinion on this, but here is a knee-jerk version of my reaction. In the context of what you are doing, i.e. "ls-files that recurses into submodules", my opinion is that "ls-files --recurse-submodules" should behave wrt pathspecs AS IF all the submodule contents are flattened into a single index of the superproject. In the sample scenario under discussion, i.e. In the superproject we have these $ git ls-files -s 100644 c489803d5bdec1755f650854fe7ef5ab7a3ee58d 0 .gitmodules 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 sib/file 16 1f5a0695289c500f25e7fa55e3ad27e394d1206b 0 sub In 'sub' submodule we have this 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 file such a flattend index would look like this: 100644 c489803d5bdec1755f650854fe7ef5ab7a3ee58d 0 .gitmodules 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 sib/file 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 sub/file i.e. removing 'sub' submodule entry from the index of the superproject and overlay everything in the submodule with sub/ prefixed to its path. And with such an index, if and only if a path matches a pathspec, "git ls-files --recurse-submodules" run at the toplevel with the same pathspec should show the path. That means $ git ls-files --recurse-submodules '??b' would show nothing (not even 'sub'), while $ git ls-files --recurse-submodules '??b*' should show sib/file and sub/file. That is because that is how the command without "--recurse-submodules" working on that flat index would produce. The "we have historically two kinds of pathspecs and they differ how they work with wildcard" is a separate issue, I would think, even though the result would affect what should happen in the above example (i.e. if we said "either a pattern match or a literal match to a leading directory path should make everything underneath match", '??b' would make sib/ and sub/ to be shown).
Re: [PATCH v2] ls-files: add pathspec matching for submodules
Brandon Williams writes: > On a similar but slightly different note. In general do we want > the pathspec '??b' to match against the sib/ directory and > subsequently have ls-files print all entries inside of the sib/ > directory? (this is in the non-recursive case) I'd need to find time to dig a bit of history before I can give a firm opinion on this, but here is a knee-jerk version of my reaction. * A pathspec element that matches literally to a directory causes itself and everything underneath the directory match that element, e.g. "sib" would be considered a match. * Otherwise, a pathspec that matches with the whole path as a pattern matches the path, e.g. "??b" would match "sib" itself, but not "sib/file". Note that "??b*" would match "sib" and "sib/file" because the pattern match is without FNM_PATNAME unless ':(glob)' magic is in effect. Historically, some commands treated a pathspec as purely a prefix match (i.e. the former) and did not use _any_ pattern matching, while other commands did both of the above two (e.g. compare ls-tree and ls-files). I thought we were slowly moving towards unifying them, but apparently 'git log -- "D?cumentation"' does not show anything close to what 'git log -- Documentation' gives us even in today's Git. Probably we want to change it at some point so that a pattern that matches one leading directory would cause everything underneath to match, e.g. "??b" would include "sib/file" just because "sib" would.
Re: [PATCH v2] ls-files: add pathspec matching for submodules
>>> + >>> + if (item->flags & PATHSPEC_ONESTAR) { >>> + return WM_MATCH; >>> + } else if (item->magic & PATHSPEC_GLOB) { >>> + return wildmatch(pattern, string, >>> + WM_PATHNAME | >>> + (item->magic & PATHSPEC_ICASE ? >>> + WM_CASEFOLD : 0), >>> + NULL); >> >> Isn't this last one overly tight? I am wondering about a scenario >> where you have a submodule at "sub/" in the superproject, and "sub/" >> has a "file" at the top of its working tree. And you do: >> >> git ls-files --recurse-submodules ':(glob)??b/fi?e' >> >> at the top of the superproject. The "pattern" would be '??b/fi?e" >> while string would be 'sub', and wildmatch() would not like it, but >> there is no way for this caller to append anything to 'sub' before >> making this call, as it hasn't looked into what paths appear in the >> submodule repository (and it should not want to). And I think we >> would want it to recurse to find sub/file. IOW, this looks like a >> false negative we must avoid in this function. As we cannot afford >> to check if anything that matches 'fi?e' is in the index file of the >> submodule repository, we shouldn't try to match 'fi?e' portion of >> the given pathspec pattern. > > good point. Let me think about this some more. On a similar but slightly different note. In general do we want the pathspec '??b' to match against the sib/ directory and subsequently have ls-files print all entries inside of the sib/ directory? (this is in the non-recursive case)
Re: [PATCH v2] ls-files: add pathspec matching for submodules
On Mon, Sep 19, 2016 at 4:21 PM, Junio C Hamano wrote: > > As the previous one that used a wrong (sorry) argument is not even > in 'next' yet, let's pretend that it never happened. It is OK to > still keep it and this patch as two separate steps, i.e. a topic > with two patches in it. > > That means that this patch will become 2/2 of a series, and 1/2 is > rerolled to use submodule_prefix from the get-go, without ever > introducing output_path_prefix variable, so that many of the above > lines we won't have to review in 2/2. Ah ok. Would you like me to resend the first patch with the desired change then? > >> + /* >> + * Pass in the original pathspec args. The submodule will be >> + * responsible for prepending the 'submodule_prefix' prior to comparing >> + * against the pathspec for matches. >> + */ > > Good. > >> + argv_array_push(&cp.args, "--"); >> + for (i = 0; i < pathspec.nr; ++i) >> + argv_array_push(&cp.args, pathspec.items[i].original); >> + > > Please prefer post-increment i++ over pre-increment ++i when writing > a for(;;) loop, unless there is a strong reason not to (familiarlity > in C++ is not a good reason). I had a compiler instructor drill into my head that ++i in a for loop was faster historically since it wouldn't have to create a temporary value. Of course now a days there probably isn't much (or any) difference between the two. If post-fix operators are the norm in git code then I can try to remember to use them :) >> + >> + if (item->flags & PATHSPEC_ONESTAR) { >> + return WM_MATCH; >> + } else if (item->magic & PATHSPEC_GLOB) { >> + return wildmatch(pattern, string, >> + WM_PATHNAME | >> + (item->magic & PATHSPEC_ICASE ? >> + WM_CASEFOLD : 0), >> + NULL); > > Isn't this last one overly tight? I am wondering about a scenario > where you have a submodule at "sub/" in the superproject, and "sub/" > has a "file" at the top of its working tree. And you do: > > git ls-files --recurse-submodules ':(glob)??b/fi?e' > > at the top of the superproject. The "pattern" would be '??b/fi?e" > while string would be 'sub', and wildmatch() would not like it, but > there is no way for this caller to append anything to 'sub' before > making this call, as it hasn't looked into what paths appear in the > submodule repository (and it should not want to). And I think we > would want it to recurse to find sub/file. IOW, this looks like a > false negative we must avoid in this function. As we cannot afford > to check if anything that matches 'fi?e' is in the index file of the > submodule repository, we shouldn't try to match 'fi?e' portion of > the given pathspec pattern. good point. Let me think about this some more.
Re: [PATCH v2] ls-files: add pathspec matching for submodules
Brandon Williams writes: > diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt > index a623ebf..09e4449 100644 > --- a/Documentation/git-ls-files.txt > +++ b/Documentation/git-ls-files.txt > @@ -19,7 +19,7 @@ SYNOPSIS > - [--output-path-prefix=] > + [--submodule-prefix=] > ... > ---output-path-prefix=:: > +--submodule-prefix=:: > ... > -static const char *output_path_prefix; > +static const char *submodule_prefix; > ... > - if (output_path_prefix && *output_path_prefix) { > + if (submodule_prefix && *submodule_prefix) { > strbuf_reset(&full_name); > - strbuf_addstr(&full_name, output_path_prefix); > + strbuf_addstr(&full_name, submodule_prefix); > ... > - argv_array_pushf(&cp.args, "--output-path-prefix=%s%s/", > - output_path_prefix ? output_path_prefix : "", > + argv_array_pushf(&cp.args, "--submodule-prefix=%s%s/", > + submodule_prefix ? submodule_prefix : "", >ce->name); As the previous one that used a wrong (sorry) argument is not even in 'next' yet, let's pretend that it never happened. It is OK to still keep it and this patch as two separate steps, i.e. a topic with two patches in it. That means that this patch will become 2/2 of a series, and 1/2 is rerolled to use submodule_prefix from the get-go, without ever introducing output_path_prefix variable, so that many of the above lines we won't have to review in 2/2. > + /* > + * Pass in the original pathspec args. The submodule will be > + * responsible for prepending the 'submodule_prefix' prior to comparing > + * against the pathspec for matches. > + */ Good. > + argv_array_push(&cp.args, "--"); > + for (i = 0; i < pathspec.nr; ++i) > + argv_array_push(&cp.args, pathspec.items[i].original); > + Please prefer post-increment i++ over pre-increment ++i when writing a for(;;) loop, unless there is a strong reason not to (familiarlity in C++ is not a good reason). > diff --git a/dir.c b/dir.c > index 0ea235f..1afc3ff 100644 > --- a/dir.c > +++ b/dir.c > @@ -63,6 +63,35 @@ int fspathncmp(const char *a, const char *b, size_t count) > return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count); > } > > +/** > + * Used to perform prefix matching against a pathspec_item for determining > if we > + * should decend into a submodule. This function can result in false > positives > + * since we are only trying to match the 'string' to a prefix of the > 'pattern' > + */ Perhaps s/can/is allowed to/. Implicit but equally important is that it is not OK to produce false negative. > +static int prefix_fnmatch(const struct pathspec_item *item, > +const char *pattern, const char *string, > +int prefix) > +{ > + if (prefix > 0) { > + if (ps_strncmp(item, pattern, string, prefix)) > + return WM_NOMATCH; > + pattern += prefix; > + string += prefix; > + } > + > + if (item->flags & PATHSPEC_ONESTAR) { > + return WM_MATCH; > + } else if (item->magic & PATHSPEC_GLOB) { > + return wildmatch(pattern, string, > + WM_PATHNAME | > + (item->magic & PATHSPEC_ICASE ? > + WM_CASEFOLD : 0), > + NULL); Isn't this last one overly tight? I am wondering about a scenario where you have a submodule at "sub/" in the superproject, and "sub/" has a "file" at the top of its working tree. And you do: git ls-files --recurse-submodules ':(glob)??b/fi?e' at the top of the superproject. The "pattern" would be '??b/fi?e" while string would be 'sub', and wildmatch() would not like it, but there is no way for this caller to append anything to 'sub' before making this call, as it hasn't looked into what paths appear in the submodule repository (and it should not want to). And I think we would want it to recurse to find sub/file. IOW, this looks like a false negative we must avoid in this function. As we cannot afford to check if anything that matches 'fi?e' is in the index file of the submodule repository, we shouldn't try to match 'fi?e' portion of the given pathspec pattern. Extending the scenario, if I also create a directory "sib/" in the superproject next to "sub/" and add a "file" in it, the same pathspec: git ls-files [--recurse-submodules] ':(glob)??b/fi?e' does find "sib/file" (with or without the recursion option). Duy cc'ed because I am not quite sure why it is a good thing that I need to add an exlicit ':(glob)' in these examples to illustrate the breakage. This comes from v1.8.3.2-849-gbd30c2e ("pathspec: support :(glob) syntax", 2013-07-14) and I do not think we want to change its behaviour. I just want to be reminded why we did this. I am guessing tha
[PATCH v2] ls-files: add pathspec matching for submodules
Pathspecs can be a bit tricky when trying to apply them to submodules. The main challenge is that the pathspecs will be with respect to the super module and not with respect to paths in the submodule. The approach this patch takes is to pass in the identical pathspec from the super module to the submodule in addition to the submodule-prefix, which is the path from the root of the super module to the submodule, and then we can compare an entry in the submodule prepended with the submodule-prefix to the pathspec in order to determine if there is a match. This patch also permits the pathspec logic to perform a prefix match against submodules since a pathspec could refer to a file inside of a submodule. This can can result in false positive matches since a super module doesn't know what files could be in the submodule. Signed-off-by: Brandon Williams --- Documentation/git-ls-files.txt | 4 +- builtin/ls-files.c | 144 - dir.c | 67 ++- dir.h | 4 + t/t3007-ls-files-recurse-submodules.sh | 66 +-- 5 files changed, 215 insertions(+), 70 deletions(-) diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt index a623ebf..09e4449 100644 --- a/Documentation/git-ls-files.txt +++ b/Documentation/git-ls-files.txt @@ -19,7 +19,7 @@ SYNOPSIS [--exclude-standard] [--error-unmatch] [--with-tree=] [--full-name] [--recurse-submodules] - [--output-path-prefix=] + [--submodule-prefix=] [--abbrev] [--] [...] DESCRIPTION @@ -143,7 +143,7 @@ a space) at the start of each line: Recursively calls ls-files on each submodule in the repository. Currently there is only support for the --cached mode. ---output-path-prefix=:: +--submodule-prefix=:: Prepend the provided path to the output of each file --abbrev[=]:: diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 687e475..1417f44 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -29,7 +29,7 @@ static int show_valid_bit; static int line_terminator = '\n'; static int debug_mode; static int show_eol; -static const char *output_path_prefix; +static const char *submodule_prefix; static int recurse_submodules; static const char *prefix; @@ -78,9 +78,9 @@ static void write_name(const char *name) * churn. */ static struct strbuf full_name = STRBUF_INIT; - if (output_path_prefix && *output_path_prefix) { + if (submodule_prefix && *submodule_prefix) { strbuf_reset(&full_name); - strbuf_addstr(&full_name, output_path_prefix); + strbuf_addstr(&full_name, submodule_prefix); strbuf_addstr(&full_name, name); name = full_name.buf; } @@ -177,12 +177,34 @@ static void show_gitlink(const struct cache_entry *ce) { struct child_process cp = CHILD_PROCESS_INIT; int status; + int i; argv_array_push(&cp.args, "ls-files"); argv_array_push(&cp.args, "--recurse-submodules"); - argv_array_pushf(&cp.args, "--output-path-prefix=%s%s/", -output_path_prefix ? output_path_prefix : "", + argv_array_pushf(&cp.args, "--submodule-prefix=%s%s/", +submodule_prefix ? submodule_prefix : "", ce->name); + /* add options */ + if (show_eol) + argv_array_push(&cp.args, "--eol"); + if (show_valid_bit) + argv_array_push(&cp.args, "-v"); + if (show_stage) + argv_array_push(&cp.args, "--stage"); + if (show_cached) + argv_array_push(&cp.args, "--cached"); + if (debug_mode) + argv_array_push(&cp.args, "--debug"); + + /* +* Pass in the original pathspec args. The submodule will be +* responsible for prepending the 'submodule_prefix' prior to comparing +* against the pathspec for matches. +*/ + argv_array_push(&cp.args, "--"); + for (i = 0; i < pathspec.nr; ++i) + argv_array_push(&cp.args, pathspec.items[i].original); + cp.git_cmd = 1; cp.dir = ce->name; status = run_command(&cp); @@ -192,57 +214,62 @@ static void show_gitlink(const struct cache_entry *ce) static void show_ce_entry(const char *tag, const struct cache_entry *ce) { + struct strbuf name = STRBUF_INIT; int len = max_prefix_len; + if (submodule_prefix) + strbuf_addstr(&name, submodule_prefix); + strbuf_addstr(&name, ce->name); if (len >= ce_namelen(ce)) die("git ls-files: internal error - cache entry not superset of prefix"); - if (!match_pathspec(&pathspec, ce->name, ce_namelen(ce), - len, ps_matched, -