Re: [PATCH v3 01/10] wildmatch: fix "**" special case

2013-01-24 Thread Junio C Hamano
Duy Nguyen  writes:

> On Fri, Jan 25, 2013 at 1:22 AM, Junio C Hamano  wrote:
>> Duy Nguyen  writes:
>>
>>> I don't think we need to support two different sets of wildcards in
>>> the long run. I'm thinking of adding ":(glob)" with WM_PATHNAME.
>>> Pathspec without :(glob) still uses the current wildcards (i.e. no
>>> FNM_PATHNAME). At some point, like 2.0, we either switch the behavior
>>> of patterns-without-:(glob) to WM_PATHNAME, or just disable wildcards
>>> when :(glob) is not present.
>>
>> Yeah, I think that is sensible.
>>
>> I am meaning to merge your retire-fnmatch topic to 'master'.
>
> I thought you wanted it to stay in 'next' longer :-)

I only said "a bit longer than other topics", and the topic has been
cooking on 'next' for three weeks by now, which is a lot longer.  I
haven't been keeping exact tallies, but trivial ones graduate from
'next' to 'master' in 3 to 5 days, ones with medium complexity in 7
to 10 days on average, I think.

> That looks ok. You may want to mention that "**" syntax is enabled in
> .gitignore and .gitattributes as well (even without USE_WILDMATCH).

Yeah, I forgot about that behaviour, and it is a bit confusing
story.  You did that in 237ec6e (Support "**" wildcard in .gitignore
and .gitattributes, 2012-10-15).

c41244e (wildmatch: support "no FNM_PATHNAME" mode, 2013-01-01) that
is part of the retire-fnmatch topic, changes the behaviour of
wildmatch() with respect to /**/ magic drastically, but everything
cancels out in the end:

 - With the current master without nd/retire-fnmatch, wildmatch()
   always works in WM_PATHNAME mode wrt '/**/'; match_pathname()
   that is used for attr/ignore matching uses wildmatch() so a
   pattern "**/Makefile" matches "Makefile" at the top, affected by
   the "**/" magic;

 - After merging nd/retire-fnmatch, wildmatch() needs WM_PATHNAME
   passed in order to grok '/**/' magic, but match_pathname() is
   changed in the same commit to pass WM_PATHNAME, so the "**/"
   magic is retained for ignore/attr matching.

> We
> could even stop the behavior change in for-each-ref (FNM_PATHNAME in
> general) but I guess because it's a nice regression, nobody would
> complain.

That is not a "regression" (whose definition is "we inadvertently
changed the behaviour and fixed that once, but the breakage came
back").  It is a behaviour change that is backward incompatible.

I would agree that it is a nice behaviour change, though ;-)

Thanks.
 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 01/10] wildmatch: fix "**" special case

2013-01-24 Thread Duy Nguyen
On Fri, Jan 25, 2013 at 1:22 AM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> I don't think we need to support two different sets of wildcards in
>> the long run. I'm thinking of adding ":(glob)" with WM_PATHNAME.
>> Pathspec without :(glob) still uses the current wildcards (i.e. no
>> FNM_PATHNAME). At some point, like 2.0, we either switch the behavior
>> of patterns-without-:(glob) to WM_PATHNAME, or just disable wildcards
>> when :(glob) is not present.
>
> Yeah, I think that is sensible.
>
> I am meaning to merge your retire-fnmatch topic to 'master'.

I thought you wanted it to stay in 'next' longer :-)

> What should the Release Notes say for the upcoming release?
>
> You can build with USE_WILDMATCH=YesPlease to use a replacement
> implementation of pattern matching logic used for pathname-like
> things, e.g.  refnames and paths in the repository.  The new
> implementation is not expected change the existing behaviour of
> Git at all, except for "git for-each-ref" where you can now say
> "refs/**/master" and match with both refs/heads/master and
> refs/remotes/origin/master.  In future versions of Git, we plan
> to use this new implementation in wider places (e.g. "git log
> '**/t*.sh'" may find commits that touch a shell script whose
> name begins with "t" at any level), but we are not there yet.
> By building with USE_WILDMATCH, using the resulting Git daily
> and reporting when you find breakages, you can help us get
> closer to that goal.

That looks ok. You may want to mention that "**" syntax is enabled in
.gitignore and .gitattributes as well (even without USE_WILDMATCH). We
could even stop the behavior change in for-each-ref (FNM_PATHNAME in
general) but I guess because it's a nice regression, nobody would
complain.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 01/10] wildmatch: fix "**" special case

2013-01-24 Thread Junio C Hamano
Duy Nguyen  writes:

> I don't think we need to support two different sets of wildcards in
> the long run. I'm thinking of adding ":(glob)" with WM_PATHNAME.
> Pathspec without :(glob) still uses the current wildcards (i.e. no
> FNM_PATHNAME). At some point, like 2.0, we either switch the behavior
> of patterns-without-:(glob) to WM_PATHNAME, or just disable wildcards
> when :(glob) is not present.

Yeah, I think that is sensible.

I am meaning to merge your retire-fnmatch topic to 'master'.  What
should the Release Notes say for the upcoming release?  

You can build with USE_WILDMATCH=YesPlease to use a replacement
implementation of pattern matching logic used for pathname-like
things, e.g.  refnames and paths in the repository.  The new
implementation is not expected change the existing behaviour of
Git at all, except for "git for-each-ref" where you can now say
"refs/**/master" and match with both refs/heads/master and
refs/remotes/origin/master.  In future versions of Git, we plan
to use this new implementation in wider places (e.g. "git log
'**/t*.sh'" may find commits that touch a shell script whose
name begins with "t" at any level), but we are not there yet.
By building with USE_WILDMATCH, using the resulting Git daily
and reporting when you find breakages, you can help us get
closer to that goal.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 01/10] wildmatch: fix "**" special case

2013-01-23 Thread Duy Nguyen
On Thu, Jan 24, 2013 at 11:49 AM, Junio C Hamano  wrote:
>> The only problem I see is, without the version string, there's no way
>> to know if "**" is supported. Old git versions will happily take "**"
>> and interpret as "*". When you advise someone to use "**" you might
>> need to add "check if you have this version of git". This problem does
>> not exist with pathspec magic like :(glob)
>
> OK, so what do we want to do when we do the real "USE_WILDMATCH"
> that is not the current experimental curiosity?  Use ":(wild)" or
> something?

I don't think we need to support two different sets of wildcards in
the long run. I'm thinking of adding ":(glob)" with WM_PATHNAME.
Pathspec without :(glob) still uses the current wildcards (i.e. no
FNM_PATHNAME). At some point, like 2.0, we either switch the behavior
of patterns-without-:(glob) to WM_PATHNAME, or just disable wildcards
when :(glob) is not present.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 01/10] wildmatch: fix "**" special case

2013-01-23 Thread Junio C Hamano
Duy Nguyen  writes:

> If we do that, we need to do the same in tree_entry_interesting(). In
> other words, pathspec learns the new glob syntax. It's fine for an
> experimental flag like USE_WILDMATCH. But after fnmatch is replaced by
> wildmatch unconditionally (thus USE_WILDMATCH becomes obsolete), then
> what? Should people who expects new syntax with USE_WILDMATCH continue
> to have new syntax? How does a user know which syntax may be used in
> his friend's "git" binary?

Good point.

> On a related subject, I've been meaning to write a mail about the
> other use of patterns in git (e.g. in git-branch, git-tag,
> git-apply...) but never got around to. Some use without FNM_PATHNAME,
> some do and the document does not go into details about the
> differences. We might want to unify the syntax there too. It'll
> probably break backward compatibility so we can do the same when we
> switch pathspec syntax.

Right now, I think for-each-ref is the only one with FNM_PATHNAME.
With the experimental USE_WILDMATCH, "for-each-ref refs/**/master"
will give us what is naturally expected.  With a working wildmatch,
I think it probably makes sense to globally enable FNM_PATHNAME;
it would probably be nice if we could do so at Git 2.0 version bump
boundary, but I suspect we are not ready yet (as you pointed out,
there are still codepaths that need to be adjusted).

> The only problem I see is, without the version string, there's no way
> to know if "**" is supported. Old git versions will happily take "**"
> and interpret as "*". When you advise someone to use "**" you might
> need to add "check if you have this version of git". This problem does
> not exist with pathspec magic like :(glob)

OK, so what do we want to do when we do the real "USE_WILDMATCH"
that is not the current experimental curiosity?  Use ":(wild)" or
something?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 01/10] wildmatch: fix "**" special case

2013-01-22 Thread Duy Nguyen
On Wed, Jan 23, 2013 at 5:51 AM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> We obviously do not want to set FNM_PATHNAME when we are not
>> substituting fnmatch() with wildmatch(), but I wonder if it may make
>> sense to unconditionally use WM_PATHNAME semantics when we build the
>> system with USE_WILDMATCH and calling wildmatch() in this codepath.
>> Users can always use "*/**/*" in place of "*" in their patterns
>> where they want to ignore directory boundaries.

If we do that, we need to do the same in tree_entry_interesting(). In
other words, pathspec learns the new glob syntax. It's fine for an
experimental flag like USE_WILDMATCH. But after fnmatch is replaced by
wildmatch unconditionally (thus USE_WILDMATCH becomes obsolete), then
what? Should people who expects new syntax with USE_WILDMATCH continue
to have new syntax? How does a user know which syntax may be used in
his friend's "git" binary?

On a related subject, I've been meaning to write a mail about the
other use of patterns in git (e.g. in git-branch, git-tag,
git-apply...) but never got around to. Some use without FNM_PATHNAME,
some do and the document does not go into details about the
differences. We might want to unify the syntax there too. It'll
probably break backward compatibility so we can do the same when we
switch pathspec syntax.

> Another possibility, which _might_ make more practical sense, is to
> update dowild() so that any pattern that has (^|/)**(/|$) in it
> implicitly turns the WM_PATHNAME flag on.  There is no reason for
> the user to feed it a double-asterisk unless it is an attempt to
> defeat some directory boundaries,

They may also put "**" by mistake (or realize they just put "**" but
too lazy to remove one asterisk).

> so we already know that the user
> expects WM_PATHNAME behaviour at that point.  Otherwise, the user
> would have simply said '*' instead, wouldn't he?

The only problem I see is, without the version string, there's no way
to know if "**" is supported. Old git versions will happily take "**"
and interpret as "*". When you advise someone to use "**" you might
need to add "check if you have this version of git". This problem does
not exist with pathspec magic like :(glob)

> I said "_might_" because it sounds a bit too magical to my taste.
> I dunno.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 01/10] wildmatch: fix "**" special case

2013-01-22 Thread Junio C Hamano
Junio C Hamano  writes:

> We obviously do not want to set FNM_PATHNAME when we are not
> substituting fnmatch() with wildmatch(), but I wonder if it may make
> sense to unconditionally use WM_PATHNAME semantics when we build the
> system with USE_WILDMATCH and calling wildmatch() in this codepath.
> Users can always use "*/**/*" in place of "*" in their patterns
> where they want to ignore directory boundaries.

Another possibility, which _might_ make more practical sense, is to
update dowild() so that any pattern that has (^|/)**(/|$) in it
implicitly turns the WM_PATHNAME flag on.  There is no reason for
the user to feed it a double-asterisk unless it is an attempt to
defeat some directory boundaries, so we already know that the user
expects WM_PATHNAME behaviour at that point.  Otherwise, the user
would have simply said '*' instead, wouldn't he?

I said "_might_" because it sounds a bit too magical to my taste.
I dunno.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 01/10] wildmatch: fix "**" special case

2013-01-22 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> "**" is adjusted to only be effective when surrounded by slashes, in
> 40bbee0 (wildmatch: adjust "**" behavior - 2012-10-15). Except that
> the commit did it wrong:
>
> 1. when it checks for "the preceding slash unless ** is at the
>beginning", it compares to wrong pointer. It should have compared
>to the beginning of the pattern, not the text.

So should

git ls-files '**/Makefile'

list the Makefile at the top-level of the repository (I think it
should)?

But that does not seem to be working.

I think the callpath goes like this:

 match_pathspec()
  -> match_one()
-> fnmatch_icase()
  -> fnmatch()
   -> wildmatch()

and the problem is that the fnmatch_icase() call made by match_one()
always passes 0 as the value for flags.  Without WM_PATHNAME,
however, the underlying dowild() does not honor the "**/" magic.

We obviously do not want to set FNM_PATHNAME when we are not
substituting fnmatch() with wildmatch(), but I wonder if it may make
sense to unconditionally use WM_PATHNAME semantics when we build the
system with USE_WILDMATCH and calling wildmatch() in this codepath.
Users can always use "*/**/*" in place of "*" in their patterns
where they want to ignore directory boundaries.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html