Re: [PATCH] minifileset: allow 'path:' patterns to have an explicit trailing slash
On Thu, 15 Feb 2018 22:05:47 -0500, Matt Harbison wrote: > >> Basically, I spent some time last week writing ignore rules for some > >> converted repos, and got into the habit of appending a trailing '/' to > >> ensure the match is a directory, and not just a substring. When I did > >> that here, it took awhile to figure out why the path was being > >> ignored. ('path:' only matches directories) > >> > >> > Can't we reuse some parts of the match module to build a function or > >> regexp > >> > from a pattern string? > >> > >> Probably. I’ve seen a couple cases where a regex pattern would be > >> useful. I just assumed those other match types were part of the > >> performance concern that was the reason for splitting out the mini > >> language in the first place. > > > > (CC Jun) > > > > I think the O(n) concern came from how fileset filters n-length list, not > > from the matcher function itself. > > Unless I'm missing something, the only time patternmatcher walks ctx is if > there's a 'set:' kind. Perhaps. And we can effectively disable 'set:' by not passing ctx to matcher. 57d6c0c74b1b could be partially backed out if we want to handle unsupported 'set:' in matcher. > So if we filter that out that, the relative kinds > (except relglob), and 'subinclude:', I don't see why we can't create one > of those to build the match function. That would allow regex, > rootfilesin, and (rel)glob support too. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] minifileset: allow 'path:' patterns to have an explicit trailing slash
On Tue, 13 Feb 2018 08:38:27 -0500, Yuya Nishihara wrote: On Tue, 13 Feb 2018 07:58:50 -0500, Matt Harbison wrote: > On Feb 13, 2018, at 6:27 AM, Yuya Nishihara wrote: > >> On Mon, 12 Feb 2018 22:17:35 -0500, Matt Harbison wrote: >> # HG changeset patch >> # User Matt Harbison >> # Date 1518488713 18000 >> # Mon Feb 12 21:25:13 2018 -0500 >> # Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743 >> # Parent 9b5df6e19a4f308e14703a8136cd0530c1e1d1a9 >> minifileset: allow 'path:' patterns to have an explicit trailing slash >> >> We allow for it on the command line, with `hg status`, for example. I thought >> that I copied this "n.startswith(p) and (len(n) == pl or n[pl] == '/')" pattern >> from somewhere, but I don't see it now. >> >> diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py >> --- a/mercurial/minifileset.py >> +++ b/mercurial/minifileset.py >> @@ -26,7 +26,7 @@ >> raise error.ParseError(_('reserved character: %s') % c) >> return lambda n, s: n.endswith(ext) >> elif name.startswith('path:'): # directory or full path test >> -p = name[5:] # prefix >> +p = name[5:] if name[-1] != '/' else name[5:-1] # prefix > > Doesn't it mean 'a/' matches 'a'? Yes. (But 'a' won’t match 'ab'.) Ugh, I never thought 'path:hg/' would match the file 'hg', but it does probably because of util.normpath(). % hg debugwalk 'path:hg/' matcher: f hg hg exact Hmm, that surprises me too. It does look like util.normpath() is involved: diff --git a/mercurial/match.py b/mercurial/match.py --- a/mercurial/match.py +++ b/mercurial/match.py @@ -199,7 +199,10 @@ if kind in cwdrelativepatternkinds: pat = pathutil.canonpath(root, cwd, pat, auditor) elif kind in ('relglob', 'path', 'rootfilesin'): +isdir = pat[-1] in br'\/' pat = util.normpath(pat) +if isdir: +pat += b'/' elif kind in ('listfile', 'listfile0'): try: files = util.readfile(pat) $ ../hg debugwalk 'path:hg/' matcher: ..\hg\: The filename, directory name, or volume label syntax is incorrect (Though that pattern looks like it would have problems matching against files in an actual directory. In practice, it could walk tests/, though it double slashed the first separator.) Perhaps applying normpath() would look saner than testing if name[-1] == '/'. Yeah, that was definitely a micro optimization. Basically, I spent some time last week writing ignore rules for some converted repos, and got into the habit of appending a trailing '/' to ensure the match is a directory, and not just a substring. When I did that here, it took awhile to figure out why the path was being ignored. ('path:' only matches directories) > Can't we reuse some parts of the match module to build a function or regexp > from a pattern string? Probably. I’ve seen a couple cases where a regex pattern would be useful. I just assumed those other match types were part of the performance concern that was the reason for splitting out the mini language in the first place. (CC Jun) I think the O(n) concern came from how fileset filters n-length list, not from the matcher function itself. Unless I'm missing something, the only time patternmatcher walks ctx is if there's a 'set:' kind. So if we filter that out that, the relative kinds (except relglob), and 'subinclude:', I don't see why we can't create one of those to build the match function. That would allow regex, rootfilesin, and (rel)glob support too. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] minifileset: allow 'path:' patterns to have an explicit trailing slash
On Tue, 13 Feb 2018 07:58:50 -0500, Matt Harbison wrote: > > > On Feb 13, 2018, at 6:27 AM, Yuya Nishihara wrote: > > > >> On Mon, 12 Feb 2018 22:17:35 -0500, Matt Harbison wrote: > >> # HG changeset patch > >> # User Matt Harbison > >> # Date 1518488713 18000 > >> # Mon Feb 12 21:25:13 2018 -0500 > >> # Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743 > >> # Parent 9b5df6e19a4f308e14703a8136cd0530c1e1d1a9 > >> minifileset: allow 'path:' patterns to have an explicit trailing slash > >> > >> We allow for it on the command line, with `hg status`, for example. I > >> thought > >> that I copied this "n.startswith(p) and (len(n) == pl or n[pl] == '/')" > >> pattern > >> from somewhere, but I don't see it now. > >> > >> diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py > >> --- a/mercurial/minifileset.py > >> +++ b/mercurial/minifileset.py > >> @@ -26,7 +26,7 @@ > >> raise error.ParseError(_('reserved character: %s') % c) > >> return lambda n, s: n.endswith(ext) > >> elif name.startswith('path:'): # directory or full path test > >> -p = name[5:] # prefix > >> +p = name[5:] if name[-1] != '/' else name[5:-1] # prefix > > > > Doesn't it mean 'a/' matches 'a'? > > Yes. (But 'a' won’t match 'ab'.) Ugh, I never thought 'path:hg/' would match the file 'hg', but it does probably because of util.normpath(). % hg debugwalk 'path:hg/' matcher: f hg hg exact Perhaps applying normpath() would look saner than testing if name[-1] == '/'. > Basically, I spent some time last week writing ignore rules for some > converted repos, and got into the habit of appending a trailing '/' to ensure > the match is a directory, and not just a substring. When I did that here, it > took awhile to figure out why the path was being ignored. ('path:' only > matches directories) > > > Can't we reuse some parts of the match module to build a function or regexp > > from a pattern string? > > Probably. I’ve seen a couple cases where a regex pattern would be useful. I > just assumed those other match types were part of the performance concern > that was the reason for splitting out the mini language in the first place. (CC Jun) I think the O(n) concern came from how fileset filters n-length list, not from the matcher function itself. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] minifileset: allow 'path:' patterns to have an explicit trailing slash
> On Feb 13, 2018, at 6:27 AM, Yuya Nishihara wrote: > >> On Mon, 12 Feb 2018 22:17:35 -0500, Matt Harbison wrote: >> # HG changeset patch >> # User Matt Harbison >> # Date 1518488713 18000 >> # Mon Feb 12 21:25:13 2018 -0500 >> # Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743 >> # Parent 9b5df6e19a4f308e14703a8136cd0530c1e1d1a9 >> minifileset: allow 'path:' patterns to have an explicit trailing slash >> >> We allow for it on the command line, with `hg status`, for example. I >> thought >> that I copied this "n.startswith(p) and (len(n) == pl or n[pl] == '/')" >> pattern >> from somewhere, but I don't see it now. >> >> diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py >> --- a/mercurial/minifileset.py >> +++ b/mercurial/minifileset.py >> @@ -26,7 +26,7 @@ >> raise error.ParseError(_('reserved character: %s') % c) >> return lambda n, s: n.endswith(ext) >> elif name.startswith('path:'): # directory or full path test >> -p = name[5:] # prefix >> +p = name[5:] if name[-1] != '/' else name[5:-1] # prefix > > Doesn't it mean 'a/' matches 'a'? Yes. (But 'a' won’t match 'ab'.) Basically, I spent some time last week writing ignore rules for some converted repos, and got into the habit of appending a trailing '/' to ensure the match is a directory, and not just a substring. When I did that here, it took awhile to figure out why the path was being ignored. ('path:' only matches directories) > Can't we reuse some parts of the match module to build a function or regexp > from a pattern string? Probably. I’ve seen a couple cases where a regex pattern would be useful. I just assumed those other match types were part of the performance concern that was the reason for splitting out the mini language in the first place. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] minifileset: allow 'path:' patterns to have an explicit trailing slash
On Mon, 12 Feb 2018 22:17:35 -0500, Matt Harbison wrote: > # HG changeset patch > # User Matt Harbison > # Date 1518488713 18000 > # Mon Feb 12 21:25:13 2018 -0500 > # Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743 > # Parent 9b5df6e19a4f308e14703a8136cd0530c1e1d1a9 > minifileset: allow 'path:' patterns to have an explicit trailing slash > > We allow for it on the command line, with `hg status`, for example. I thought > that I copied this "n.startswith(p) and (len(n) == pl or n[pl] == '/')" > pattern > from somewhere, but I don't see it now. > > diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py > --- a/mercurial/minifileset.py > +++ b/mercurial/minifileset.py > @@ -26,7 +26,7 @@ > raise error.ParseError(_('reserved character: %s') % c) > return lambda n, s: n.endswith(ext) > elif name.startswith('path:'): # directory or full path test > -p = name[5:] # prefix > +p = name[5:] if name[-1] != '/' else name[5:-1] # prefix Doesn't it mean 'a/' matches 'a'? Can't we reuse some parts of the match module to build a function or regexp from a pattern string? ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel