RE: [PATCH 1 of 5] extdata: add extdatasource reader

2016-10-07 Thread De Mare, Mathias (Nokia - BE)


> -Original Message-
> From: Mercurial-devel [mailto:mercurial-devel-bounces@mercurial-
> scm.org] On Behalf Of Matt Mackall
> Sent: donderdag 6 oktober 2016 23:36
> To: mathias.dem...@gmail.com
> Cc: Mercurial-devel
> Subject: Re: [PATCH 1 of 5] extdata: add extdatasource reader
> 
> On Thu, 2016-10-06 at 07:52 +0200, Mathias De Maré wrote:
> > On Thu, Sep 22, 2016 at 8:21 PM, Matt Mackall <m...@selenic.com>
> wrote:
> >
> > >
> > > # HG changeset patch
> > > # User Matt Mackall <m...@selenic.com> # Date 1473794045 18000
> #
> > > Tue Sep 13 14:14:05 2016 -0500 # Node ID
> > > 19bf2776dfe39befdc479253e1e7d030b41c08f9
> > > # Parent  5271ae66615207f39cc41d78f4541bc6f8ca6ff6
> > > extdata: add extdatasource reader
> > >
> > > This adds basic support for extdata, a way to add external data
> > > sources for revsets and templates. An extdata data source is simply
> > > a list of lines of the form:
> > >
> > > []\n
> > >
> > > An extdata source is configured thusly:
> > >
> > > [extdata]
> > > name = 
> > >
> > > urls of the form shell: are launch shell commands to generate data.
> > >
> > This looks great!
> > We've been using a small extension to load an external file with tags
> > into one of our repositories. Do you think that could fit here as
> well?
> >
> > Something like this would be nice:
> > [extdata]
> > name = 
> > name.type = tags
> 
> Do you actually need it to be a 'tag' or is it good enough that it be a
> named identifier in the current union of namespaces?
> 
> Extdata as presently designed doesn't add new names for nodes to the
> namespaces.
> But having a way to import an arbitrary name mapping would indeed be
> handy. For instance, for having a mapping to other version control
> system identifiers.

An identifier would be fine. We currently use tags, but I guess it's not 
strictly required.
Our users mostly need to be able to do things like 'hg update someid' or 'hg 
diff -r someid -r someotherid'.

> 
> > name.checkrev =  > before loading the tags>
> 
> Can you explain more about checkrev?

One of our repositories uses about 1 identifiers (or tags, using our 
current extension). We only load these if a specific revision is present, to 
avoid the overhead on any other repositories.
I guess this could be a general command (which in our case would check for 
revision presence) to allow alternative checks.

> 
> --
> Mathematics is the supreme nostalgia of our time.
> 
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5] extdata: add extdatasource reader

2016-10-06 Thread Matt Mackall
On Thu, 2016-10-06 at 07:52 +0200, Mathias De Maré wrote:
> On Thu, Sep 22, 2016 at 8:21 PM, Matt Mackall  wrote:
> 
> > 
> > # HG changeset patch
> > # User Matt Mackall 
> > # Date 1473794045 18000
> > #  Tue Sep 13 14:14:05 2016 -0500
> > # Node ID 19bf2776dfe39befdc479253e1e7d030b41c08f9
> > # Parent  5271ae66615207f39cc41d78f4541bc6f8ca6ff6
> > extdata: add extdatasource reader
> > 
> > This adds basic support for extdata, a way to add external data
> > sources for revsets and templates. An extdata data source is simply a
> > list of lines of the form:
> > 
> > []\n
> > 
> > An extdata source is configured thusly:
> > 
> > [extdata]
> > name = 
> > 
> > urls of the form shell: are launch shell commands to generate data.
> > 
> This looks great!
> We've been using a small extension to load an external file with tags into
> one of our repositories. Do you think that could fit here as well?
> 
> Something like this would be nice:
> [extdata]
> name = 
> name.type = tags

Do you actually need it to be a 'tag' or is it good enough that it be a named
identifier in the current union of namespaces?

Extdata as presently designed doesn't add new names for nodes to the namespaces.
But having a way to import an arbitrary name mapping would indeed be handy. For
instance, for having a mapping to other version control system identifiers.

> name.checkrev =  loading the tags>

Can you explain more about checkrev?

-- 
Mathematics is the supreme nostalgia of our time.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5] extdata: add extdatasource reader

2016-10-05 Thread Mathias De Maré
On Thu, Sep 22, 2016 at 8:21 PM, Matt Mackall  wrote:

> # HG changeset patch
> # User Matt Mackall 
> # Date 1473794045 18000
> #  Tue Sep 13 14:14:05 2016 -0500
> # Node ID 19bf2776dfe39befdc479253e1e7d030b41c08f9
> # Parent  5271ae66615207f39cc41d78f4541bc6f8ca6ff6
> extdata: add extdatasource reader
>
> This adds basic support for extdata, a way to add external data
> sources for revsets and templates. An extdata data source is simply a
> list of lines of the form:
>
> []\n
>
> An extdata source is configured thusly:
>
> [extdata]
> name = 
>
> urls of the form shell: are launch shell commands to generate data.
>

This looks great!
We've been using a small extension to load an external file with tags into
one of our repositories. Do you think that could fit here as well?

Something like this would be nice:
[extdata]
name = 
name.type = tags
name.checkrev = 
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5] extdata: add extdatasource reader

2016-09-25 Thread Yuya Nishihara
On Sat, 24 Sep 2016 17:34:20 -0500, Matt Mackall wrote:
> On Sat, 2016-09-24 at 10:17 +0900, Yuya Nishihara wrote:
> > On Thu, 22 Sep 2016 13:21:35 -0500, Matt Mackall wrote:
> > > 
> > > # HG changeset patch
> > > # User Matt Mackall 
> > > # Date 1473794045 18000
> > > #  Tue Sep 13 14:14:05 2016 -0500
> > > # Node ID 19bf2776dfe39befdc479253e1e7d030b41c08f9
> > > # Parent  5271ae66615207f39cc41d78f4541bc6f8ca6ff6
> > > extdata: add extdatasource reader
> > > 
> > > +def extdatasource(repo, source):
> > > +"""gather a map of rev -> value dict from the specified source
> > > +
> > > +A source spec is treated as a URL, with a special case shell: type
> > > +for parsing the output from a shell command.
> > > +
> > > +The data is parsed as a series of newline-separated records where
> > > +each record is a revision specifier optionally followed by a space
> > > +and a freeform string value. If the revision is known locally, it
> > > +is converted to a rev, otherwise the record is skipped.
> > > +
> > > +Note that both key and value are treated as UTF-8 and converted to
> > > +the local encoding. This allows uniformity between local and
> > > +remote data sources.
> > > +"""
> > > +
> > > +spec = repo.ui.config("extdata", source)
> > > +if not spec:
> > > +raise util.Abourt(_("unknown extdata source '%s'") % source)
> > > +
> > > +try:
> > > +# prepare for future expansion
> > > +expand = spec % ()
> > > +except TypeError:
> > > +raise error.Abort(_("extdata doesn't support parameters yet"),
> > > +  hint=_("use double % for escaping"))
> > Using '%' as a parameter symbol would be a bit unfortunate since URLs have
> > %HEX-escape syntax. How about using '$' or '{...}' ?
> 
> I picked it because it's how Firefox search URLs work. Was in my RFC.

It appears that Firefox uses '{}' these days. I remember '%' was used in
old Firefox (or maybe Netscape) though.

https://developer.mozilla.org/en/Add-ons/Creating_OpenSearch_plugins_for_Firefox

> > > +os.chdir(cwd)
> > Temporary chdir() may cause problems in hgweb, which can be threaded.
> 
> True. But we've had the same problem with hooks for a decade.

Hooks are run by subprocess, so only child process will chdir to repo.root.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5] extdata: add extdatasource reader

2016-09-23 Thread Augie Fackler
On Thu, Sep 22, 2016 at 06:20:03PM -0500, Kevin Bullock wrote:
> > On Sep 22, 2016, at 13:21, Matt Mackall  wrote:
> >
> > # HG changeset patch
> > # User Matt Mackall 
> > # Date 1473794045 18000
> > #  Tue Sep 13 14:14:05 2016 -0500
> > # Node ID 19bf2776dfe39befdc479253e1e7d030b41c08f9
> > # Parent  5271ae66615207f39cc41d78f4541bc6f8ca6ff6
> > extdata: add extdatasource reader
> >
[...]
> > +"""
> > +
> > +spec = repo.ui.config("extdata", source)
> > +if not spec:
> > +raise util.Abourt(_("unknown extdata source '%s'") % source)
>
> Typo here. I suppose there's no great way to test against this.

Actually something like pytype or mypy should let us catch stuff like
this. I want to carve out some time to try and find a way to get one
of those tools running on our code regularly.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5] extdata: add extdatasource reader

2016-09-23 Thread Kevin Bullock
> On Sep 23, 2016, at 12:47, Matt Mackall  wrote:
> 
> On Thu, 2016-09-22 at 18:20 -0500, Kevin Bullock wrote:
>>> 
>>> On Sep 22, 2016, at 13:21, Matt Mackall  wrote:
[...]
>>> +try:
>>> +src = util.popen(cmd)
>> Erm, don't we want to use util.popen2 or one of the other variants that use
>> subprocess instead?
> 
> The universal advantages of subprocess are overstated. For the simple task of
> reading stdout from a subprocess, util.popen is perfectly suited. If it 
> wasn't..
> we'd fix util.popen.

Related to my reply below: if we use popen, does stderr not get captured? I.e. 
will the user see the stderr output in their terminal?

>> ...and maybe handle ENOENT gracefully?
> 
> We can't, because cmd is an arbitrary shell expression.

I mean that it would be nice to inform a user somehow that their arbitrary 
shell expression failed and what the error was (ENOENT meaning "command not 
found" in this case).

pacem in terris / мир / शान्ति / ‎‫سَلاَم‬ / 平和
Kevin R. Bullock

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5] extdata: add extdatasource reader

2016-09-23 Thread Matt Mackall
On Thu, 2016-09-22 at 18:20 -0500, Kevin Bullock wrote:
> > 
> > On Sep 22, 2016, at 13:21, Matt Mackall  wrote:
> > 
> > # HG changeset patch
> > # User Matt Mackall 
> > # Date 1473794045 18000
> > #  Tue Sep 13 14:14:05 2016 -0500
> > # Node ID 19bf2776dfe39befdc479253e1e7d030b41c08f9
> > # Parent  5271ae66615207f39cc41d78f4541bc6f8ca6ff6
> > extdata: add extdatasource reader
> > 
> > This adds basic support for extdata, a way to add external data
> > sources for revsets and templates. An extdata data source is simply a
> > list of lines of the form:
> > 
> > []\n
> > 
> > An extdata source is configured thusly:
> > 
> > [extdata]
> > name = 
> > 
> > urls of the form shell: are launch shell commands to generate data.
> > 
> > diff -r 5271ae666152 -r 19bf2776dfe3 mercurial/scmutil.py
> > --- a/mercurial/scmutil.py  Wed Sep 21 17:05:27 2016 -0400
> > +++ b/mercurial/scmutil.py  Tue Sep 13 14:14:05 2016 -0500
> > @@ -29,6 +29,7 @@
> > phases,
> > revset,
> > similar,
> > +url,
> > util,
> > )
> > 
> > @@ -1418,3 +1419,66 @@
> > return
> > 
> > self._queue.put(fh, block=True, timeout=None)
> > +
> > +def extdatasources(repo):
> > +sources = set()
> > +for k, v in repo.ui.configitems("extdata"):
> > +sources.add(k)
> > +return sources
> > +
> > +def extdatasource(repo, source):
> > +"""gather a map of rev -> value dict from the specified source
> > +
> > +A source spec is treated as a URL, with a special case shell: type
> > +for parsing the output from a shell command.
> > +
> > +The data is parsed as a series of newline-separated records where
> > +each record is a revision specifier optionally followed by a space
> > +and a freeform string value. If the revision is known locally, it
> > +is converted to a rev, otherwise the record is skipped.
> > +
> > +Note that both key and value are treated as UTF-8 and converted to
> > +the local encoding. This allows uniformity between local and
> > +remote data sources.
> That's a bit unfortunate. If we're expecting them to be read as UTF-8, can't
> we just keep them in UTF-8 all the way thru?

We always work internally in the local encoding. Sane local encodings include:
utf-8. We've normally got two strategies: files owned by users and stored in
local encoding (hgrc) and files owned by Mercurial and stored in utf-8
(bookmarks). So this note is to point out that this is different from the usual
case.. because URLs might (or might not) be remote, shared resources that need
to be in utf-8 for portability.

> > 
> > +"""
> > +
> > +spec = repo.ui.config("extdata", source)
> > +if not spec:
> > +raise util.Abourt(_("unknown extdata source '%s'") % source)
> Typo here. I suppose there's no great way to test against this.

Indeed, both of the current callers prevent this code from being reached.

> > 
> > +
> > +try:
> > +# prepare for future expansion
> > +expand = spec % ()
> > +except TypeError:
> > +raise error.Abort(_("extdata doesn't support parameters yet"),
> > +  hint=_("use double % for escaping"))
> > +
> > +data = {}
> > +if spec.startswith("shell:"):
> > +# external commands should be run relative to the repo root
> > +cmd = spec[6:]
> > +cwd = os.getcwd()
> > +os.chdir(repo.root)
> > +try:
> > +src = util.popen(cmd)
> Erm, don't we want to use util.popen2 or one of the other variants that use
> subprocess instead?

The universal advantages of subprocess are overstated. For the simple task of
reading stdout from a subprocess, util.popen is perfectly suited. If it wasn't..
we'd fix util.popen.

> ...and maybe handle ENOENT gracefully?

We can't, because cmd is an arbitrary shell expression.

> pacem in terris / мир / शान्ति / ‎‫سَلاَم‬ / 平和
> Kevin R. Bullock
-- 
Mathematics is the supreme nostalgia of our time.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5] extdata: add extdatasource reader

2016-09-22 Thread Kevin Bullock
> On Sep 22, 2016, at 13:21, Matt Mackall  wrote:
> 
> # HG changeset patch
> # User Matt Mackall 
> # Date 1473794045 18000
> #  Tue Sep 13 14:14:05 2016 -0500
> # Node ID 19bf2776dfe39befdc479253e1e7d030b41c08f9
> # Parent  5271ae66615207f39cc41d78f4541bc6f8ca6ff6
> extdata: add extdatasource reader
> 
> This adds basic support for extdata, a way to add external data
> sources for revsets and templates. An extdata data source is simply a
> list of lines of the form:
> 
> []\n
> 
> An extdata source is configured thusly:
> 
> [extdata]
> name = 
> 
> urls of the form shell: are launch shell commands to generate data.
> 
> diff -r 5271ae666152 -r 19bf2776dfe3 mercurial/scmutil.py
> --- a/mercurial/scmutil.pyWed Sep 21 17:05:27 2016 -0400
> +++ b/mercurial/scmutil.pyTue Sep 13 14:14:05 2016 -0500
> @@ -29,6 +29,7 @@
> phases,
> revset,
> similar,
> +url,
> util,
> )
> 
> @@ -1418,3 +1419,66 @@
> return
> 
> self._queue.put(fh, block=True, timeout=None)
> +
> +def extdatasources(repo):
> +sources = set()
> +for k, v in repo.ui.configitems("extdata"):
> +sources.add(k)
> +return sources
> +
> +def extdatasource(repo, source):
> +"""gather a map of rev -> value dict from the specified source
> +
> +A source spec is treated as a URL, with a special case shell: type
> +for parsing the output from a shell command.
> +
> +The data is parsed as a series of newline-separated records where
> +each record is a revision specifier optionally followed by a space
> +and a freeform string value. If the revision is known locally, it
> +is converted to a rev, otherwise the record is skipped.
> +
> +Note that both key and value are treated as UTF-8 and converted to
> +the local encoding. This allows uniformity between local and
> +remote data sources.

That's a bit unfortunate. If we're expecting them to be read as UTF-8, can't we 
just keep them in UTF-8 all the way thru?

> +"""
> +
> +spec = repo.ui.config("extdata", source)
> +if not spec:
> +raise util.Abourt(_("unknown extdata source '%s'") % source)

Typo here. I suppose there's no great way to test against this.

> +
> +try:
> +# prepare for future expansion
> +expand = spec % ()
> +except TypeError:
> +raise error.Abort(_("extdata doesn't support parameters yet"),
> +  hint=_("use double % for escaping"))
> +
> +data = {}
> +if spec.startswith("shell:"):
> +# external commands should be run relative to the repo root
> +cmd = spec[6:]
> +cwd = os.getcwd()
> +os.chdir(repo.root)
> +try:
> +src = util.popen(cmd)

Erm, don't we want to use util.popen2 or one of the other variants that use 
subprocess instead?

...and maybe handle ENOENT gracefully?

pacem in terris / мир / शान्ति / ‎‫سَلاَم‬ / 平和
Kevin R. Bullock

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 1 of 5] extdata: add extdatasource reader

2016-09-22 Thread Matt Mackall
# HG changeset patch
# User Matt Mackall 
# Date 1473794045 18000
#  Tue Sep 13 14:14:05 2016 -0500
# Node ID 19bf2776dfe39befdc479253e1e7d030b41c08f9
# Parent  5271ae66615207f39cc41d78f4541bc6f8ca6ff6
extdata: add extdatasource reader

This adds basic support for extdata, a way to add external data
sources for revsets and templates. An extdata data source is simply a
list of lines of the form:

[]\n

An extdata source is configured thusly:

[extdata]
name = 

urls of the form shell: are launch shell commands to generate data.

diff -r 5271ae666152 -r 19bf2776dfe3 mercurial/scmutil.py
--- a/mercurial/scmutil.py  Wed Sep 21 17:05:27 2016 -0400
+++ b/mercurial/scmutil.py  Tue Sep 13 14:14:05 2016 -0500
@@ -29,6 +29,7 @@
 phases,
 revset,
 similar,
+url,
 util,
 )
 
@@ -1418,3 +1419,66 @@
 return
 
 self._queue.put(fh, block=True, timeout=None)
+
+def extdatasources(repo):
+sources = set()
+for k, v in repo.ui.configitems("extdata"):
+sources.add(k)
+return sources
+
+def extdatasource(repo, source):
+"""gather a map of rev -> value dict from the specified source
+
+A source spec is treated as a URL, with a special case shell: type
+for parsing the output from a shell command.
+
+The data is parsed as a series of newline-separated records where
+each record is a revision specifier optionally followed by a space
+and a freeform string value. If the revision is known locally, it
+is converted to a rev, otherwise the record is skipped.
+
+Note that both key and value are treated as UTF-8 and converted to
+the local encoding. This allows uniformity between local and
+remote data sources.
+"""
+
+spec = repo.ui.config("extdata", source)
+if not spec:
+raise util.Abourt(_("unknown extdata source '%s'") % source)
+
+try:
+# prepare for future expansion
+expand = spec % ()
+except TypeError:
+raise error.Abort(_("extdata doesn't support parameters yet"),
+  hint=_("use double % for escaping"))
+
+data = {}
+if spec.startswith("shell:"):
+# external commands should be run relative to the repo root
+cmd = spec[6:]
+cwd = os.getcwd()
+os.chdir(repo.root)
+try:
+src = util.popen(cmd)
+finally:
+os.chdir(cwd)
+else:
+# treat as a URL or file
+src = url.open(repo.ui, spec)
+
+try:
+for l in src.readlines():
+if " " in l:
+k, v = l.strip().split(" ", 1)
+else:
+k, v = l.strip(), ""
+
+k = encoding.tolocal(k)
+if k in repo:
+# we ignore data for nodes that don't exist locally
+data[repo[k].rev()] = encoding.tolocal(v)
+finally:
+src.close()
+
+return data
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel