Re: [PATCH] minifileset: allow 'path:' patterns to have an explicit trailing slash

2018-02-18 Thread Yuya Nishihara
On Thu, 15 Feb 2018 22:05:47 -0500, Matt Harbison wrote:
> >> Basically, I spent some time last week writing ignore rules for some  
> >> converted repos, and got into the habit of appending a trailing '/' to  
> >> ensure the match is a directory, and not just a substring.  When I did  
> >> that here, it took awhile to figure out why the path was being  
> >> ignored.  ('path:' only matches directories)
> >>
> >> > Can't we reuse some parts of the match module to build a function or  
> >> regexp
> >> > from a pattern string?
> >>
> >> Probably.  I’ve seen a couple cases where a regex pattern would be  
> >> useful.  I just assumed those other match types were part of the  
> >> performance concern that was the reason for splitting out the mini  
> >> language in the first place.
> >
> > (CC Jun)
> >
> > I think the O(n) concern came from how fileset filters n-length list, not
> > from the matcher function itself.
> 
> Unless I'm missing something, the only time patternmatcher walks ctx is if
> there's a 'set:' kind.

Perhaps. And we can effectively disable 'set:' by not passing ctx to matcher.
57d6c0c74b1b could be partially backed out if we want to handle unsupported
'set:' in matcher.

> So if we filter that out that, the relative kinds
> (except relglob), and 'subinclude:', I don't see why we can't create one  
> of those to build the match function.  That would allow regex,  
> rootfilesin, and (rel)glob support too.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] minifileset: allow 'path:' patterns to have an explicit trailing slash

2018-02-15 Thread Matt Harbison

On Tue, 13 Feb 2018 08:38:27 -0500, Yuya Nishihara  wrote:


On Tue, 13 Feb 2018 07:58:50 -0500, Matt Harbison wrote:


> On Feb 13, 2018, at 6:27 AM, Yuya Nishihara  wrote:
>
>> On Mon, 12 Feb 2018 22:17:35 -0500, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison 
>> # Date 1518488713 18000
>> #  Mon Feb 12 21:25:13 2018 -0500
>> # Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743
>> # Parent  9b5df6e19a4f308e14703a8136cd0530c1e1d1a9
>> minifileset: allow 'path:' patterns to have an explicit trailing  
slash

>>
>> We allow for it on the command line, with `hg status`, for example.   
I thought
>> that I copied this "n.startswith(p) and (len(n) == pl or n[pl] ==  
'/')" pattern

>> from somewhere, but I don't see it now.
>>
>> diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py
>> --- a/mercurial/minifileset.py
>> +++ b/mercurial/minifileset.py
>> @@ -26,7 +26,7 @@
>> raise error.ParseError(_('reserved character:  
%s') % c)

>> return lambda n, s: n.endswith(ext)
>> elif name.startswith('path:'): # directory or full path test
>> -p = name[5:] # prefix
>> +p = name[5:] if name[-1] != '/' else name[5:-1] # prefix
>
> Doesn't it mean 'a/' matches 'a'?

Yes. (But 'a' won’t match 'ab'.)


Ugh, I never thought 'path:hg/' would match the file 'hg', but it does
probably because of util.normpath().

  % hg debugwalk 'path:hg/'
  matcher: 
  f  hg  hg  exact


Hmm, that surprises me too.  It does look like util.normpath() is involved:

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -199,7 +199,10 @@
 if kind in cwdrelativepatternkinds:
 pat = pathutil.canonpath(root, cwd, pat, auditor)
 elif kind in ('relglob', 'path', 'rootfilesin'):
+isdir = pat[-1] in br'\/'
 pat = util.normpath(pat)
+if isdir:
+pat += b'/'
 elif kind in ('listfile', 'listfile0'):
 try:
 files = util.readfile(pat)

$ ../hg debugwalk 'path:hg/'
matcher: 
..\hg\: The filename, directory name, or volume label syntax is incorrect

(Though that pattern looks like it would have problems matching against  
files in an actual directory.  In practice, it could walk tests/, though  
it double slashed the first separator.)


Perhaps applying normpath() would look saner than testing if name[-1] ==  
'/'.


Yeah, that was definitely a micro optimization.

Basically, I spent some time last week writing ignore rules for some  
converted repos, and got into the habit of appending a trailing '/' to  
ensure the match is a directory, and not just a substring.  When I did  
that here, it took awhile to figure out why the path was being  
ignored.  ('path:' only matches directories)


> Can't we reuse some parts of the match module to build a function or  
regexp

> from a pattern string?

Probably.  I’ve seen a couple cases where a regex pattern would be  
useful.  I just assumed those other match types were part of the  
performance concern that was the reason for splitting out the mini  
language in the first place.


(CC Jun)

I think the O(n) concern came from how fileset filters n-length list, not
from the matcher function itself.


Unless I'm missing something, the only time patternmatcher walks ctx is if  
there's a 'set:' kind.  So if we filter that out that, the relative kinds  
(except relglob), and 'subinclude:', I don't see why we can't create one  
of those to build the match function.  That would allow regex,  
rootfilesin, and (rel)glob support too.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] minifileset: allow 'path:' patterns to have an explicit trailing slash

2018-02-13 Thread Matt Harbison

> On Feb 13, 2018, at 6:27 AM, Yuya Nishihara  wrote:
> 
>> On Mon, 12 Feb 2018 22:17:35 -0500, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison 
>> # Date 1518488713 18000
>> #  Mon Feb 12 21:25:13 2018 -0500
>> # Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743
>> # Parent  9b5df6e19a4f308e14703a8136cd0530c1e1d1a9
>> minifileset: allow 'path:' patterns to have an explicit trailing slash
>> 
>> We allow for it on the command line, with `hg status`, for example.  I 
>> thought
>> that I copied this "n.startswith(p) and (len(n) == pl or n[pl] == '/')" 
>> pattern
>> from somewhere, but I don't see it now.
>> 
>> diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py
>> --- a/mercurial/minifileset.py
>> +++ b/mercurial/minifileset.py
>> @@ -26,7 +26,7 @@
>> raise error.ParseError(_('reserved character: %s') % c)
>> return lambda n, s: n.endswith(ext)
>> elif name.startswith('path:'): # directory or full path test
>> -p = name[5:] # prefix
>> +p = name[5:] if name[-1] != '/' else name[5:-1] # prefix
> 
> Doesn't it mean 'a/' matches 'a'?

Yes. (But 'a' won’t match 'ab'.)  Basically, I spent some time last week 
writing ignore rules for some converted repos, and got into the habit of 
appending a trailing '/' to ensure the match is a directory, and not just a 
substring.  When I did that here, it took awhile to figure out why the path was 
being ignored.  ('path:' only matches directories)

> Can't we reuse some parts of the match module to build a function or regexp
> from a pattern string?

Probably.  I’ve seen a couple cases where a regex pattern would be useful.  I 
just assumed those other match types were part of the performance concern that 
was the reason for splitting out the mini language in the first place.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH] minifileset: allow 'path:' patterns to have an explicit trailing slash

2018-02-12 Thread Matt Harbison
# HG changeset patch
# User Matt Harbison 
# Date 1518488713 18000
#  Mon Feb 12 21:25:13 2018 -0500
# Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743
# Parent  9b5df6e19a4f308e14703a8136cd0530c1e1d1a9
minifileset: allow 'path:' patterns to have an explicit trailing slash

We allow for it on the command line, with `hg status`, for example.  I thought
that I copied this "n.startswith(p) and (len(n) == pl or n[pl] == '/')" pattern
from somewhere, but I don't see it now.

diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py
--- a/mercurial/minifileset.py
+++ b/mercurial/minifileset.py
@@ -26,7 +26,7 @@
 raise error.ParseError(_('reserved character: %s') % c)
 return lambda n, s: n.endswith(ext)
 elif name.startswith('path:'): # directory or full path test
-p = name[5:] # prefix
+p = name[5:] if name[-1] != '/' else name[5:-1] # prefix
 pl = len(p)
 f = lambda n, s: n.startswith(p) and (len(n) == pl or n[pl] == '/')
 return f
diff --git a/tests/test-minifileset.py b/tests/test-minifileset.py
--- a/tests/test-minifileset.py
+++ b/tests/test-minifileset.py
@@ -25,6 +25,8 @@
 check('"path:a" & (**.b | **.c)', [('a/b.b', 0), ('a/c.c', 0)], [('b/c.c', 0)])
 check('(path:a & **.b) | **.c',
   [('a/b.b', 0), ('a/c.c', 0), ('b/c.c', 0)], [])
+check('path:a/', [('a/b.b', 0), ('a/c.c', 0)], [('ab/c.c', 0)])
+check('path:a', [('a/b.b', 0), ('a/c.c', 0)], [('ab/c.c', 0)])
 
 check('**.bin - size("<20B")', [('b.bin', 21)], [('a.bin', 11), ('b.txt', 21)])
 
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel