Re: [RFC] extending pathspec support to submodules

2016-09-16 Thread Brandon Williams
On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano  wrote:
>
>  * Your program that runs in the top-level superproject still needs
>to be able to say "this pathspec from the top cannot possibly
>match anything in the submodule, so let's not even bother
>descending into it".
>

Yes, we would need to first check if the submodule is a prefix match to the
pathspec.  ie a submodule 'sub' would need to match the pathspec 'sub/somedir'
or '*.txt' but not the pathspecs 'subdirectory' or 'otherdir'

> > >So we may have to rethink what this option name should be.  "You
> > >are running in a repository that is used as a submodule in a
> > >larger context, which has the submodule at this path" is what the
> > >option tells the command; if any existing command already has
> > >such an option, we should use it.  If we are inventing one,
> > >perhaps "--submodule-path" (I didn't check if there are existing
> > >options that sound similar to it and mean completely different
> > >things, in which case that name is not usable)?
> >
> > Would it make sense to add the '--submodule-path' to a more generic
> > part of the code? It's not just ls-files/grep that have to solve exactly 
> > this
> > problem. Up to now we just did not go for those commands, though.
>
> Yes I think so, since it should also handle starting from a submodule
> with a pathspec to the superproject or other submodule. In case we
> go with my above suggestion I would suggest a more generic name since
> the option could also be passed to processes handling the superproject.
> E.g. something like --module-prefix or --repository-prefix comes to my
> mind, not checked though.

Yeah we may want to come up with a more descriptive option name now which can
be generally applied, especially if we are going to continue adding submodule
support for other commands.

-Brandon


Re: [RFC] extending pathspec support to submodules

2016-09-16 Thread Heiko Voigt
Hi,

On Thu, Sep 15, 2016 at 03:28:21PM -0700, Stefan Beller wrote:
> On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano  wrote:
> > Brandon Williams  writes:
> >
> >> You're right that seems like the best course of action and it already falls
> >> inline with what I did with a first patch to ls-files to support 
> >> submodules.
> >> In that patch I did exactly as you suggest and pass in the prefix to the
> >> submodule and make the child responsible for prepending the prefix to all 
> >> of
> >> its output.  This way we can simply pass through the whole pathspec (as 
> >> apposed
> >> to my original idea of stripping the prefix off the pathspec prior to 
> >> passing
> >> it to the child...which can get complicated with wild characters) to the
> >> childprocess and when checking if a file matches the pathspec we can check 
> >> if
> >> the prefix + file path matches.
> >
> > That's brilliant.  A few observations.
> >
> >  * With that change to tell the command that is spawned in a
> >submodule directory where the submodule repository is in the
> >context of the top-level superproject _and_ require it to take a
> >pathspec as relative to the top-level superproject, you no longer
> >worry about having to find where to cut the pathspec given at the
> >top-level to adjust it for the submodule's context.  That may
> >simplify things.
> 
> I wonder how this plays together with the prefix in the superproject, e.g.
> 
> cd super/unrelated-path
> # when invoking a git command the internal prefix is "unrelated-path/"
> git ls-files ../submodule-*
> # a submodule in submodule-A would be run in  submodule-A
> # with a superproject prefix of super/ ? but additionally we nned
> to know we're
> # not at the root of the superproject.

Do we need to know that? The internal prefix is internal to each
repository and can be treated as such. I would expect that the prefix is
only prefixed when needed. E.g. when we display output to the user,
match files, ...

How about "../submodule-A" as the submodule prefix in the situation you
describe? The wildcard would be resolved by the superproject since the
directory is still in its domain.

I would think of the submodule prefix as the path relative to the
command that started everything. E.g. if we have a tree like this:

*--subA
   /
super *--subB--subsubB
   \
*--dirC

where subA, subB and subsubB are submodules and dirC is just a
directory inside super.

We would get the following prefixes when issuing a command in dirC that
has a pathspec for subsubB:

  subB: ../subB
  subsubB: ../subB/subsubB

An interesting case is when we issue a command in subA:

  super:   ..
  subB:../subB
  subsubB: ../subB/subsubB

A rule for the prefix option could be: Always specified when crossing a
repository boundary with the pathspec (including upwards).

I have not completely thought this through though so just take this as
some food for thought. Since I am not sure what Junio's rationale behind
making the prefix relative to the toplevel superproject was, but I guess
finding it could be a challenge in some situations. I.e. is the
repository in home directory tracking all the dot-files really the
superproject or was it that other one I found before?

> >So we may have to rethink what this option name should be.  "You
> >are running in a repository that is used as a submodule in a
> >larger context, which has the submodule at this path" is what the
> >option tells the command; if any existing command already has
> >such an option, we should use it.  If we are inventing one,
> >perhaps "--submodule-path" (I didn't check if there are existing
> >options that sound similar to it and mean completely different
> >things, in which case that name is not usable)?
> 
> Would it make sense to add the '--submodule-path' to a more generic
> part of the code? It's not just ls-files/grep that have to solve exactly this
> problem. Up to now we just did not go for those commands, though.

Yes I think so, since it should also handle starting from a submodule
with a pathspec to the superproject or other submodule. In case we
go with my above suggestion I would suggest a more generic name since
the option could also be passed to processes handling the superproject.
E.g. something like --module-prefix or --repository-prefix comes to my
mind, not checked though.

Cheers Heiko


Re: [RFC] extending pathspec support to submodules

2016-09-15 Thread Stefan Beller
On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano  wrote:
> Brandon Williams  writes:
>
>> You're right that seems like the best course of action and it already falls
>> inline with what I did with a first patch to ls-files to support submodules.
>> In that patch I did exactly as you suggest and pass in the prefix to the
>> submodule and make the child responsible for prepending the prefix to all of
>> its output.  This way we can simply pass through the whole pathspec (as 
>> apposed
>> to my original idea of stripping the prefix off the pathspec prior to passing
>> it to the child...which can get complicated with wild characters) to the
>> childprocess and when checking if a file matches the pathspec we can check if
>> the prefix + file path matches.
>
> That's brilliant.  A few observations.
>
>  * With that change to tell the command that is spawned in a
>submodule directory where the submodule repository is in the
>context of the top-level superproject _and_ require it to take a
>pathspec as relative to the top-level superproject, you no longer
>worry about having to find where to cut the pathspec given at the
>top-level to adjust it for the submodule's context.  That may
>simplify things.

I wonder how this plays together with the prefix in the superproject, e.g.

cd super/unrelated-path
# when invoking a git command the internal prefix is "unrelated-path/"
git ls-files ../submodule-*
# a submodule in submodule-A would be run in  submodule-A
# with a superproject prefix of super/ ? but additionally we nned
to know we're
# not at the root of the superproject.

>So we may have to rethink what this option name should be.  "You
>are running in a repository that is used as a submodule in a
>larger context, which has the submodule at this path" is what the
>option tells the command; if any existing command already has
>such an option, we should use it.  If we are inventing one,
>perhaps "--submodule-path" (I didn't check if there are existing
>options that sound similar to it and mean completely different
>things, in which case that name is not usable)?

Would it make sense to add the '--submodule-path' to a more generic
part of the code? It's not just ls-files/grep that have to solve exactly this
problem. Up to now we just did not go for those commands, though.

Thanks


Re: [RFC] extending pathspec support to submodules

2016-09-15 Thread Junio C Hamano
Brandon Williams  writes:

> You're right that seems like the best course of action and it already falls
> inline with what I did with a first patch to ls-files to support submodules.
> In that patch I did exactly as you suggest and pass in the prefix to the
> submodule and make the child responsible for prepending the prefix to all of
> its output.  This way we can simply pass through the whole pathspec (as 
> apposed
> to my original idea of stripping the prefix off the pathspec prior to passing
> it to the child...which can get complicated with wild characters) to the
> childprocess and when checking if a file matches the pathspec we can check if
> the prefix + file path matches.

That's brilliant.  A few observations.

 * With that change to tell the command that is spawned in a
   submodule directory where the submodule repository is in the
   context of the top-level superproject _and_ require it to take a
   pathspec as relative to the top-level superproject, you no longer
   worry about having to find where to cut the pathspec given at the
   top-level to adjust it for the submodule's context.  That may
   simplify things.

 * Your program that runs in the top-level superproject still needs
   to be able to say "this pathspec from the top cannot possibly
   match anything in the submodule, so let's not even bother
   descending into it".

 * Earlier while reviewing "ls-files" recursion, I suggested (and
   you took) --output-path-prefix as the option name, because it was
   meant to be "when you output any path, prefix this string".  But
   the suggested name is suboptimal, as it is no longer an option
   that is only about "output".  A command that runs in a submodule
   would:

   - enumerate paths in the context of the submodule repository,
   - prepend the "prefix" to these paths,
   - filter by applying the full-tree pathspec, and
   - work on the surviving paths after filtering.

   When the last step, "work on", involves just "printing", the
   whole path (with "prefix") is sent to the output.  If it involves
   some operation relative to the submodule repository (e.g. seeing
   if it is in the index), the "prefix" may have to be stripped
   while the operation is carried out.

   So we may have to rethink what this option name should be.  "You
   are running in a repository that is used as a submodule in a
   larger context, which has the submodule at this path" is what the
   option tells the command; if any existing command already has
   such an option, we should use it.  If we are inventing one,
   perhaps "--submodule-path" (I didn't check if there are existing
   options that sound similar to it and mean completely different
   things, in which case that name is not usable)?

Thanks.

   


Re: [RFC] extending pathspec support to submodules

2016-09-15 Thread Brandon Williams
On Thu, Sep 15, 2016 at 4:57 AM, Heiko Voigt  wrote:
>
> The problem when you do that is that the child is not aware that it is
> actually run as a submodule process. E.g.
>
>git grep --recurse-submodules foobar -- sub/dir/a
>
> would report back matches in 'dir/a' instead of 'sub/dir/a'. From the
> users perspective this will be confusing since she started the grep in
> the superproject.
>
> So what I would suggest is that we make the childs aware of the fact
> that they are called from a superproject by passing a prefix when
> calling it. E.g. like this
>
>git grep --submodule-prefix=sub foobar -- sub/dir/a
>
> Whether we pass the full or stripped path is now a matter of taste or
> ease of implementation, since we could either preprend or strip it in
> the child. But the important difference is that we have information
> about the original path.
>
> Another approach could, of course, be to omit passing the prefix and
> parse the output of the child and try to substitute, paths but that
> seems quite error prone to me.
>
> Cheers Heiko

You're right that seems like the best course of action and it already falls
inline with what I did with a first patch to ls-files to support submodules.
In that patch I did exactly as you suggest and pass in the prefix to the
submodule and make the child responsible for prepending the prefix to all of
its output.  This way we can simply pass through the whole pathspec (as apposed
to my original idea of stripping the prefix off the pathspec prior to passing
it to the child...which can get complicated with wild characters) to the
childprocess and when checking if a file matches the pathspec we can check if
the prefix + file path matches.

-Brandon


Re: [RFC] extending pathspec support to submodules

2016-09-15 Thread Heiko Voigt
Hi,

On Wed, Sep 14, 2016 at 04:57:53PM -0700, Brandon Williams wrote:
> ---
> I've been trying to think through how we could potentially add pathspec 
> support
> for --recurse-submodule options (for builtins like ls-files or grep down the
> line).  This is something that could be useful if the user supply's a pathspec
> that could match to a file in a submodule.  We could match the submodule to 
> the
> pathspec and then fork the process to recursively run the command on the
> submodule which can be passed a modified pathspec.
> 
> For example with a pathspec 'sub/dir/a', where sub is a submodule in the root
> directory of the supermodule's repo, we could match 'sub' to that spec and 
> then
> recursively call the git command with a pathspec of 'dir/a'.  The child 
> process
> would then have the responsibility of matching 'dir/a' to files in its repo.
> 
> Does this seem like a reasonable feature to add? And if so are how is my
> initial approach at solving the problem?

The problem when you do that is that the child is not aware that it is
actually run as a submodule process. E.g.

   git grep --recurse-submodules foobar -- sub/dir/a

would report back matches in 'dir/a' instead of 'sub/dir/a'. From the
users perspective this will be confusing since she started the grep in
the superproject.

So what I would suggest is that we make the childs aware of the fact
that they are called from a superproject by passing a prefix when
calling it. E.g. like this

   git grep --submodule-prefix=sub foobar -- sub/dir/a

Whether we pass the full or stripped path is now a matter of taste or
ease of implementation, since we could either preprend or strip it in
the child. But the important difference is that we have information
about the original path.

Another approach could, of course, be to omit passing the prefix and
parse the output of the child and try to substitute, paths but that
seems quite error prone to me.

Cheers Heiko


[RFC] extending pathspec support to submodules

2016-09-14 Thread Brandon Williams
---
I've been trying to think through how we could potentially add pathspec support
for --recurse-submodule options (for builtins like ls-files or grep down the
line).  This is something that could be useful if the user supply's a pathspec
that could match to a file in a submodule.  We could match the submodule to the
pathspec and then fork the process to recursively run the command on the
submodule which can be passed a modified pathspec.

For example with a pathspec 'sub/dir/a', where sub is a submodule in the root
directory of the supermodule's repo, we could match 'sub' to that spec and then
recursively call the git command with a pathspec of 'dir/a'.  The child process
would then have the responsibility of matching 'dir/a' to files in its repo.

Does this seem like a reasonable feature to add? And if so are how is my
initial approach at solving the problem?

One idea I had was to add a submodule match flag in order to perform special
matching just in the --recurse-submodules cases since we'll want somethings to
match here that wouldn't normally match.

@@ -283,6 +284,29 @@ static int match_pathspec_item(const struct pathspec_item 
*item, int prefix,
 item->nowildcard_len - prefix))
return MATCHED_FNMATCH;
 
+   /*
+* Preform some checks to see if "name" is a super set of the pathspec
+*/
+   if (flags & DO_MATCH_SUBMODULE) {
+   struct strbuf buf = STRBUF_INIT;
+   strbuf_addstr(&buf, name);
+   strbuf_addch(&buf, '/');
+   /*
+* Check if the name is a prefix of the pathspec
+*/
+   if ((item->match[namelen] == '/') &&
+   !ps_strncmp(item, match, name, namelen))
+   return MATCHED_RECURSIVELY;
+   /*
+* Check if the name wildmatches to the pathspec
+*/
+   if (!wildmatch(item->match, buf.buf,
+  WM_PREFIX |
+  (item->magic & PATHSPEC_ICASE ? WM_CASEFOLD : 0),
+  NULL));
+   return MATCHED_FNMATCH;
+   }
+
return 0;
 }
 
One of the main difficulties I was having is figuring out how wildmatching
should be applied in this case.  What I believe we want is the ability for the
whole name of the submodule to match a prefix of the pathspec pattern.  To do
this I was thinking of adding a flag to do prefix matching to the wildmatch
function like so: 


diff --git a/wildmatch.c b/wildmatch.c
index 57c8765..f1e1725 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -60,8 +60,12 @@ static int dowild(const uchar *p, const uchar *text, 
unsigned int flags)
for ( ; (p_ch = *p) != '\0'; text++, p++) {
int matched, match_slash, negated;
uchar t_ch, prev_ch;
-   if ((t_ch = *text) == '\0' && p_ch != '*')
-   return WM_ABORT_ALL;
+   if ((t_ch = *text) == '\0' && p_ch != '*') {
+   if ((flags & WM_PREFIX) && (*(p-1) == '/'))
+   return WM_MATCH;
+   else
+   return WM_ABORT_ALL;
+   }
if ((flags & WM_CASEFOLD) && ISUPPER(t_ch))
t_ch = tolower(t_ch);
if ((flags & WM_CASEFOLD) && ISUPPER(p_ch))
diff --git a/wildmatch.h b/wildmatch.h
index 4090c8f..490db51 100644
--- a/wildmatch.h
+++ b/wildmatch.h
@@ -3,6 +3,7 @@
 
 #define WM_CASEFOLD 1
 #define WM_PATHNAME 2
+#define WM_PREFIX 4
 
 #define WM_ABORT_MALFORMED 2
 #define WM_NOMATCH 1
-- 

Any comments or thoughts on this would be appreciated.

Thanks,
Brandon