Re: [RFC] extending pathspec support to submodules
On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano wrote: > > * Your program that runs in the top-level superproject still needs >to be able to say "this pathspec from the top cannot possibly >match anything in the submodule, so let's not even bother >descending into it". > Yes, we would need to first check if the submodule is a prefix match to the pathspec. ie a submodule 'sub' would need to match the pathspec 'sub/somedir' or '*.txt' but not the pathspecs 'subdirectory' or 'otherdir' > > >So we may have to rethink what this option name should be. "You > > >are running in a repository that is used as a submodule in a > > >larger context, which has the submodule at this path" is what the > > >option tells the command; if any existing command already has > > >such an option, we should use it. If we are inventing one, > > >perhaps "--submodule-path" (I didn't check if there are existing > > >options that sound similar to it and mean completely different > > >things, in which case that name is not usable)? > > > > Would it make sense to add the '--submodule-path' to a more generic > > part of the code? It's not just ls-files/grep that have to solve exactly > > this > > problem. Up to now we just did not go for those commands, though. > > Yes I think so, since it should also handle starting from a submodule > with a pathspec to the superproject or other submodule. In case we > go with my above suggestion I would suggest a more generic name since > the option could also be passed to processes handling the superproject. > E.g. something like --module-prefix or --repository-prefix comes to my > mind, not checked though. Yeah we may want to come up with a more descriptive option name now which can be generally applied, especially if we are going to continue adding submodule support for other commands. -Brandon
Re: [RFC] extending pathspec support to submodules
Hi, On Thu, Sep 15, 2016 at 03:28:21PM -0700, Stefan Beller wrote: > On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano wrote: > > Brandon Williams writes: > > > >> You're right that seems like the best course of action and it already falls > >> inline with what I did with a first patch to ls-files to support > >> submodules. > >> In that patch I did exactly as you suggest and pass in the prefix to the > >> submodule and make the child responsible for prepending the prefix to all > >> of > >> its output. This way we can simply pass through the whole pathspec (as > >> apposed > >> to my original idea of stripping the prefix off the pathspec prior to > >> passing > >> it to the child...which can get complicated with wild characters) to the > >> childprocess and when checking if a file matches the pathspec we can check > >> if > >> the prefix + file path matches. > > > > That's brilliant. A few observations. > > > > * With that change to tell the command that is spawned in a > >submodule directory where the submodule repository is in the > >context of the top-level superproject _and_ require it to take a > >pathspec as relative to the top-level superproject, you no longer > >worry about having to find where to cut the pathspec given at the > >top-level to adjust it for the submodule's context. That may > >simplify things. > > I wonder how this plays together with the prefix in the superproject, e.g. > > cd super/unrelated-path > # when invoking a git command the internal prefix is "unrelated-path/" > git ls-files ../submodule-* > # a submodule in submodule-A would be run in submodule-A > # with a superproject prefix of super/ ? but additionally we nned > to know we're > # not at the root of the superproject. Do we need to know that? The internal prefix is internal to each repository and can be treated as such. I would expect that the prefix is only prefixed when needed. E.g. when we display output to the user, match files, ... How about "../submodule-A" as the submodule prefix in the situation you describe? The wildcard would be resolved by the superproject since the directory is still in its domain. I would think of the submodule prefix as the path relative to the command that started everything. E.g. if we have a tree like this: *--subA / super *--subB--subsubB \ *--dirC where subA, subB and subsubB are submodules and dirC is just a directory inside super. We would get the following prefixes when issuing a command in dirC that has a pathspec for subsubB: subB: ../subB subsubB: ../subB/subsubB An interesting case is when we issue a command in subA: super: .. subB:../subB subsubB: ../subB/subsubB A rule for the prefix option could be: Always specified when crossing a repository boundary with the pathspec (including upwards). I have not completely thought this through though so just take this as some food for thought. Since I am not sure what Junio's rationale behind making the prefix relative to the toplevel superproject was, but I guess finding it could be a challenge in some situations. I.e. is the repository in home directory tracking all the dot-files really the superproject or was it that other one I found before? > >So we may have to rethink what this option name should be. "You > >are running in a repository that is used as a submodule in a > >larger context, which has the submodule at this path" is what the > >option tells the command; if any existing command already has > >such an option, we should use it. If we are inventing one, > >perhaps "--submodule-path" (I didn't check if there are existing > >options that sound similar to it and mean completely different > >things, in which case that name is not usable)? > > Would it make sense to add the '--submodule-path' to a more generic > part of the code? It's not just ls-files/grep that have to solve exactly this > problem. Up to now we just did not go for those commands, though. Yes I think so, since it should also handle starting from a submodule with a pathspec to the superproject or other submodule. In case we go with my above suggestion I would suggest a more generic name since the option could also be passed to processes handling the superproject. E.g. something like --module-prefix or --repository-prefix comes to my mind, not checked though. Cheers Heiko
Re: [RFC] extending pathspec support to submodules
On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano wrote: > Brandon Williams writes: > >> You're right that seems like the best course of action and it already falls >> inline with what I did with a first patch to ls-files to support submodules. >> In that patch I did exactly as you suggest and pass in the prefix to the >> submodule and make the child responsible for prepending the prefix to all of >> its output. This way we can simply pass through the whole pathspec (as >> apposed >> to my original idea of stripping the prefix off the pathspec prior to passing >> it to the child...which can get complicated with wild characters) to the >> childprocess and when checking if a file matches the pathspec we can check if >> the prefix + file path matches. > > That's brilliant. A few observations. > > * With that change to tell the command that is spawned in a >submodule directory where the submodule repository is in the >context of the top-level superproject _and_ require it to take a >pathspec as relative to the top-level superproject, you no longer >worry about having to find where to cut the pathspec given at the >top-level to adjust it for the submodule's context. That may >simplify things. I wonder how this plays together with the prefix in the superproject, e.g. cd super/unrelated-path # when invoking a git command the internal prefix is "unrelated-path/" git ls-files ../submodule-* # a submodule in submodule-A would be run in submodule-A # with a superproject prefix of super/ ? but additionally we nned to know we're # not at the root of the superproject. >So we may have to rethink what this option name should be. "You >are running in a repository that is used as a submodule in a >larger context, which has the submodule at this path" is what the >option tells the command; if any existing command already has >such an option, we should use it. If we are inventing one, >perhaps "--submodule-path" (I didn't check if there are existing >options that sound similar to it and mean completely different >things, in which case that name is not usable)? Would it make sense to add the '--submodule-path' to a more generic part of the code? It's not just ls-files/grep that have to solve exactly this problem. Up to now we just did not go for those commands, though. Thanks
Re: [RFC] extending pathspec support to submodules
Brandon Williams writes: > You're right that seems like the best course of action and it already falls > inline with what I did with a first patch to ls-files to support submodules. > In that patch I did exactly as you suggest and pass in the prefix to the > submodule and make the child responsible for prepending the prefix to all of > its output. This way we can simply pass through the whole pathspec (as > apposed > to my original idea of stripping the prefix off the pathspec prior to passing > it to the child...which can get complicated with wild characters) to the > childprocess and when checking if a file matches the pathspec we can check if > the prefix + file path matches. That's brilliant. A few observations. * With that change to tell the command that is spawned in a submodule directory where the submodule repository is in the context of the top-level superproject _and_ require it to take a pathspec as relative to the top-level superproject, you no longer worry about having to find where to cut the pathspec given at the top-level to adjust it for the submodule's context. That may simplify things. * Your program that runs in the top-level superproject still needs to be able to say "this pathspec from the top cannot possibly match anything in the submodule, so let's not even bother descending into it". * Earlier while reviewing "ls-files" recursion, I suggested (and you took) --output-path-prefix as the option name, because it was meant to be "when you output any path, prefix this string". But the suggested name is suboptimal, as it is no longer an option that is only about "output". A command that runs in a submodule would: - enumerate paths in the context of the submodule repository, - prepend the "prefix" to these paths, - filter by applying the full-tree pathspec, and - work on the surviving paths after filtering. When the last step, "work on", involves just "printing", the whole path (with "prefix") is sent to the output. If it involves some operation relative to the submodule repository (e.g. seeing if it is in the index), the "prefix" may have to be stripped while the operation is carried out. So we may have to rethink what this option name should be. "You are running in a repository that is used as a submodule in a larger context, which has the submodule at this path" is what the option tells the command; if any existing command already has such an option, we should use it. If we are inventing one, perhaps "--submodule-path" (I didn't check if there are existing options that sound similar to it and mean completely different things, in which case that name is not usable)? Thanks.
Re: [RFC] extending pathspec support to submodules
On Thu, Sep 15, 2016 at 4:57 AM, Heiko Voigt wrote: > > The problem when you do that is that the child is not aware that it is > actually run as a submodule process. E.g. > >git grep --recurse-submodules foobar -- sub/dir/a > > would report back matches in 'dir/a' instead of 'sub/dir/a'. From the > users perspective this will be confusing since she started the grep in > the superproject. > > So what I would suggest is that we make the childs aware of the fact > that they are called from a superproject by passing a prefix when > calling it. E.g. like this > >git grep --submodule-prefix=sub foobar -- sub/dir/a > > Whether we pass the full or stripped path is now a matter of taste or > ease of implementation, since we could either preprend or strip it in > the child. But the important difference is that we have information > about the original path. > > Another approach could, of course, be to omit passing the prefix and > parse the output of the child and try to substitute, paths but that > seems quite error prone to me. > > Cheers Heiko You're right that seems like the best course of action and it already falls inline with what I did with a first patch to ls-files to support submodules. In that patch I did exactly as you suggest and pass in the prefix to the submodule and make the child responsible for prepending the prefix to all of its output. This way we can simply pass through the whole pathspec (as apposed to my original idea of stripping the prefix off the pathspec prior to passing it to the child...which can get complicated with wild characters) to the childprocess and when checking if a file matches the pathspec we can check if the prefix + file path matches. -Brandon
Re: [RFC] extending pathspec support to submodules
Hi, On Wed, Sep 14, 2016 at 04:57:53PM -0700, Brandon Williams wrote: > --- > I've been trying to think through how we could potentially add pathspec > support > for --recurse-submodule options (for builtins like ls-files or grep down the > line). This is something that could be useful if the user supply's a pathspec > that could match to a file in a submodule. We could match the submodule to > the > pathspec and then fork the process to recursively run the command on the > submodule which can be passed a modified pathspec. > > For example with a pathspec 'sub/dir/a', where sub is a submodule in the root > directory of the supermodule's repo, we could match 'sub' to that spec and > then > recursively call the git command with a pathspec of 'dir/a'. The child > process > would then have the responsibility of matching 'dir/a' to files in its repo. > > Does this seem like a reasonable feature to add? And if so are how is my > initial approach at solving the problem? The problem when you do that is that the child is not aware that it is actually run as a submodule process. E.g. git grep --recurse-submodules foobar -- sub/dir/a would report back matches in 'dir/a' instead of 'sub/dir/a'. From the users perspective this will be confusing since she started the grep in the superproject. So what I would suggest is that we make the childs aware of the fact that they are called from a superproject by passing a prefix when calling it. E.g. like this git grep --submodule-prefix=sub foobar -- sub/dir/a Whether we pass the full or stripped path is now a matter of taste or ease of implementation, since we could either preprend or strip it in the child. But the important difference is that we have information about the original path. Another approach could, of course, be to omit passing the prefix and parse the output of the child and try to substitute, paths but that seems quite error prone to me. Cheers Heiko
[RFC] extending pathspec support to submodules
--- I've been trying to think through how we could potentially add pathspec support for --recurse-submodule options (for builtins like ls-files or grep down the line). This is something that could be useful if the user supply's a pathspec that could match to a file in a submodule. We could match the submodule to the pathspec and then fork the process to recursively run the command on the submodule which can be passed a modified pathspec. For example with a pathspec 'sub/dir/a', where sub is a submodule in the root directory of the supermodule's repo, we could match 'sub' to that spec and then recursively call the git command with a pathspec of 'dir/a'. The child process would then have the responsibility of matching 'dir/a' to files in its repo. Does this seem like a reasonable feature to add? And if so are how is my initial approach at solving the problem? One idea I had was to add a submodule match flag in order to perform special matching just in the --recurse-submodules cases since we'll want somethings to match here that wouldn't normally match. @@ -283,6 +284,29 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix, item->nowildcard_len - prefix)) return MATCHED_FNMATCH; + /* +* Preform some checks to see if "name" is a super set of the pathspec +*/ + if (flags & DO_MATCH_SUBMODULE) { + struct strbuf buf = STRBUF_INIT; + strbuf_addstr(&buf, name); + strbuf_addch(&buf, '/'); + /* +* Check if the name is a prefix of the pathspec +*/ + if ((item->match[namelen] == '/') && + !ps_strncmp(item, match, name, namelen)) + return MATCHED_RECURSIVELY; + /* +* Check if the name wildmatches to the pathspec +*/ + if (!wildmatch(item->match, buf.buf, + WM_PREFIX | + (item->magic & PATHSPEC_ICASE ? WM_CASEFOLD : 0), + NULL)); + return MATCHED_FNMATCH; + } + return 0; } One of the main difficulties I was having is figuring out how wildmatching should be applied in this case. What I believe we want is the ability for the whole name of the submodule to match a prefix of the pathspec pattern. To do this I was thinking of adding a flag to do prefix matching to the wildmatch function like so: diff --git a/wildmatch.c b/wildmatch.c index 57c8765..f1e1725 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -60,8 +60,12 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) for ( ; (p_ch = *p) != '\0'; text++, p++) { int matched, match_slash, negated; uchar t_ch, prev_ch; - if ((t_ch = *text) == '\0' && p_ch != '*') - return WM_ABORT_ALL; + if ((t_ch = *text) == '\0' && p_ch != '*') { + if ((flags & WM_PREFIX) && (*(p-1) == '/')) + return WM_MATCH; + else + return WM_ABORT_ALL; + } if ((flags & WM_CASEFOLD) && ISUPPER(t_ch)) t_ch = tolower(t_ch); if ((flags & WM_CASEFOLD) && ISUPPER(p_ch)) diff --git a/wildmatch.h b/wildmatch.h index 4090c8f..490db51 100644 --- a/wildmatch.h +++ b/wildmatch.h @@ -3,6 +3,7 @@ #define WM_CASEFOLD 1 #define WM_PATHNAME 2 +#define WM_PREFIX 4 #define WM_ABORT_MALFORMED 2 #define WM_NOMATCH 1 -- Any comments or thoughts on this would be appreciated. Thanks, Brandon