[PATCH 07 of 10 V2] util: add a stream compression API to compression engines

2016-11-07 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478573827 28800
#  Mon Nov 07 18:57:07 2016 -0800
# Node ID fa24595b79b603ff7be6f32b849c07ddfdee3da4
# Parent  8672777162085c92b836ce1e97ca254734b0fae0
util: add a stream compression API to compression engines

It is a common pattern throughout the code to perform compression
on an iterator of chunks, yielding an iterator of compressed chunks.
Let's formalize that as part of the compression engine API.

The zlib and bzip2 implementations allow an optional "level" option
to control the compression level. The default values are the same as
what the Python modules use. This option will be used in subsequent
patches.
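
As a rough standalone illustration of the pattern described above (Python 3, not the code from this patch; the names `compress_chunks` and `roundtrip` are hypothetical), an iterator of chunks can be turned into an iterator of compressed chunks like this:

```python
import zlib


def compress_chunks(chunks, opts=None):
    """Yield zlib-compressed chunks for an iterable of bytes chunks."""
    opts = opts or {}
    # -1 selects zlib's own default compression level.
    z = zlib.compressobj(opts.get('level', -1))
    for chunk in chunks:
        data = z.compress(chunk)
        # compress() may buffer input and return b''; skip empty output.
        if data:
            yield data
    # flush() emits whatever the compressor is still holding.
    yield z.flush()


def roundtrip(chunks, opts=None):
    """Compress the chunks, then decompress the joined result."""
    return zlib.decompress(b''.join(compress_chunks(chunks, opts)))
```

Because the function is a generator, nothing is compressed until the caller starts iterating, which is presumably why the docstring in the patch says "ideally a generator".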

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -2966,10 +2966,22 @@ class compressionengine(object):
         exclude the name from external usage, set the first element to ``None``.
 
         If bundle compression is supported, the class must also implement
-        ``compressorobj`` and `decompressorreader``.
+        ``compressstream``, ``compressorobj`` and `decompressorreader``.
         """
         return None
 
+    def compressstream(self, it, opts=None):
+        """Compress an iterator of chunks.
+
+        The method receives an iterator (ideally a generator) of chunks of
+        bytes to be compressed. It returns an iterator (ideally a generator)
+        of bytes of chunks representing the compressed output.
+
+        Optionally accepts an argument defining how to perform compression.
+        Each engine treats this argument differently.
+        """
+        raise NotImplementedError()
+
     def compressorobj(self):
         """(Temporary) Obtain an object used for compression.
 
@@ -2997,6 +3009,19 @@ class _zlibengine(compressionengine):
     def compressorobj(self):
         return zlib.compressobj()
 
+    def compressstream(self, it, opts=None):
+        opts = opts or {}
+
+        z = zlib.compressobj(opts.get('level', -1))
+        for chunk in it:
+            data = z.compress(chunk)
+            # Not all calls to compress emit data. It is cheaper to inspect
+            # here than to feed empty chunks through generator.
+            if data:
+                yield data
+
+        yield z.flush()
+
     def decompressorreader(self, fh):
         def gen():
             d = zlib.decompressobj()
@@ -3017,6 +3042,16 @@ class _bz2engine(compressionengine):
     def compressorobj(self):
         return bz2.BZ2Compressor()
 
+    def compressstream(self, it, opts=None):
+        opts = opts or {}
+        z = bz2.BZ2Compressor(opts.get('level', 9))
+        for chunk in it:
+            data = z.compress(chunk)
+            if data:
+                yield data
+
+        yield z.flush()
+
     def decompressorreader(self, fh):
         def gen():
             d = bz2.BZ2Decompressor()
@@ -3065,6 +3100,9 @@ class _noopengine(compressionengine):
     def compressorobj(self):
         return nocompress()
 
+    def compressstream(self, it, opts=None):
+        return it
+
     def decompressorreader(self, fh):
         return fh
 
 
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2] commands: introduce `hg display`

2016-11-07 Thread Gregory Szorc
On Mon, Nov 7, 2016 at 2:03 AM, Denis Laxalde 
wrote:

> Gregory Szorc a écrit :
>
>> For the command name, we would have preferred `hg show` because it is
>> shorter and not ambiguous with any other core command. However, a
>> number of people have created `hg show` as effectively an alias to
>> `hg export`. And, some were concerned that Git users used to `git show`
>> being equivalent to `hg export` would be confused by a `hg show` doing
>> something different.
>>
>
> `git show` is not equivalent to `hg export`, quoting git-show(1):
>
>Shows one or more objects (blobs, trees, tags and commits).
>
>For commits it shows the log message and textual diff. It also
>presents the merge commit in a special format as produced by git
>diff-tree --cc.
>
>For tags, it shows the tag message and the referenced objects.
>
>For trees, it shows the names (equivalent to git ls-tree with
>--name-only).
>
>For plain blobs, it shows the plain contents.
>

TIL. I've only ever used `git show` for the "show a commit representation"
use case and `git cat-file` for displaying low-level objects.


>
> So only the first case is equivalent to `hg export` (or probably more
> `hg log -vpr`). Other cases are quite close to the "view" concept
> introduced here, as far as I understand.
>
> Then if a revision can be registered as a view, `hg show` could just be
> a plain replacement to the aforementioned alias I guess.
>
> Given this and the conflict with `hg diff`, could we reconsider
> the command name?
>

That is an interesting proposal. But I'm concerned with overlapping
namespaces. What values do we allow for the non-view behavior? Hash
fragments? Names (bookmarks, branches, tags)? If we allow names, what
happens when a name in a repo conflicts with a registered view name? What
happens if a view name conflicts with a changeset prefix? Of course, to
know if there is a collision you have to load names. That means (slightly
more) overhead to run the command.

FWIW, my idea for this command was to show representations of multiple
things. I'm willing to entertain the idea of "show me single entity X"
(changeset, tag, bookmark, etc). The easy solution is an argument to a view
(`hg display tag my-tag`). Things get harder when we merge namespaces.


Re: [PATCH 2 of 2] commands: introduce `hg display`

2016-11-07 Thread Gregory Szorc
On Sun, Nov 6, 2016 at 1:52 AM, timeless  wrote:

> Gregory Szorc wrote:
> > @@ -2019,6 +2026,13 @@ Dish up an empty repo; serve it cold.
> >diff repository (or selected files)
> >
> >
> > +  
> > +  display
> > +  
> > +  
> > +  show various repository information
> > +  
> > +  
> >
> >export
> >
>
> Will /help/display list the views it supports?
>

It should. I forgot to implement that. It can be done as a follow-up easily
enough.


[PATCH 04 of 10 V2] bundle2: use compression engines API to obtain decompressor

2016-11-07 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478572608 28800
#  Mon Nov 07 18:36:48 2016 -0800
# Node ID 439b96dc6e896d9875ae24fae4a59e41b00b63c6
# Parent  8b1e72914d246af5703ea5bad9bd3cb051463164
bundle2: use compression engines API to obtain decompressor

Like the recent change for the compressor side, this too is
relatively straightforward. We now store a compression engine
on the instance instead of a low-level decompressor. Again, this
will allow us to easily transition to different compression engine
APIs when they are implemented.
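
For readers unfamiliar with the ``decompressorreader`` idea (an object with a ``read(size)`` method returning compressed data goes in, an object with a ``read(size)`` method returning uncompressed data comes out), here is a minimal standalone sketch in Python 3. The class name `zlibreader`, its `blocksize` parameter and its buffering strategy are assumptions made for illustration; this is not Mercurial's implementation.

```python
import io
import zlib


class zlibreader(object):
    """File-like wrapper whose read(size) returns decompressed bytes."""

    def __init__(self, fh, blocksize=2 ** 16):
        self._fh = fh                    # source of compressed bytes
        self._d = zlib.decompressobj()
        self._buf = b''                  # decompressed bytes not yet returned
        self._eof = False
        self._blocksize = blocksize

    def read(self, size):
        # Pull compressed blocks until enough output is buffered
        # or the underlying file is exhausted.
        while len(self._buf) < size and not self._eof:
            block = self._fh.read(self._blocksize)
            if block:
                self._buf += self._d.decompress(block)
            else:
                self._buf += self._d.flush()
                self._eof = True
        data, self._buf = self._buf[:size], self._buf[size:]
        return data
```

Usage would look like `zlibreader(io.BytesIO(compressed)).read(4096)`; an empty return value signals the end of the stream.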

diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
--- a/mercurial/bundle2.py
+++ b/mercurial/bundle2.py
@@ -681,7 +681,7 @@ class unbundle20(unpackermixin):
     def __init__(self, ui, fp):
         """If header is specified, we do not read it out of the stream."""
         self.ui = ui
-        self._decompressor = util.decompressors[None]
+        self._compengine = util.compengines.forbundletype('UN')
         self._compressed = None
         super(unbundle20, self).__init__(fp)
 
@@ -755,9 +755,9 @@ class unbundle20(unpackermixin):
             params = self._readexact(paramssize)
             self._processallparams(params)
             yield params
-            assert self._decompressor is util.decompressors[None]
+            assert self._compengine.bundletype == 'UN'
         # From there, payload might need to be decompressed
-        self._fp = self._decompressor(self._fp)
+        self._fp = self._compengine.decompressorreader(self._fp)
         emptycount = 0
         while emptycount < 2:
             # so we can brainlessly loop
@@ -781,7 +781,7 @@ class unbundle20(unpackermixin):
         # make sure param have been loaded
         self.params
         # From there, payload need to be decompressed
-        self._fp = self._decompressor(self._fp)
+        self._fp = self._compengine.decompressorreader(self._fp)
         indebug(self.ui, 'start extraction of bundle2 parts')
         headerblock = self._readpartheader()
         while headerblock is not None:
@@ -823,10 +823,10 @@ def b2streamparamhandler(name):
 @b2streamparamhandler('compression')
 def processcompression(unbundler, param, value):
     """read compression parameter and install payload decompression"""
-    if value not in util.decompressors:
+    if value not in util.compengines.supportedbundletypes:
         raise error.BundleUnknownFeatureError(params=(param,),
                                               values=(value,))
-    unbundler._decompressor = util.decompressors[value]
+    unbundler._compengine = util.compengines.forbundletype(value)
     if value is not None:
         unbundler._compressed = True
 


[PATCH 05 of 10 V2] changegroup: use compression engines API

2016-11-07 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478572693 28800
#  Mon Nov 07 18:38:13 2016 -0800
# Node ID 5642a2b769a73befd6c3e3539e7e373a20392f3a
# Parent  439b96dc6e896d9875ae24fae4a59e41b00b63c6
changegroup: use compression engines API

The new API does not treat None and 'UN' as equivalent, so we
introduce code to use 'UN' explicitly.

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -137,14 +137,16 @@ class cg1unpacker(object):
     _grouplistcount = 1 # One list of files after the manifests
 
     def __init__(self, fh, alg, extras=None):
-        if alg == 'UN':
-            alg = None # get more modern without breaking too much
-        if not alg in util.decompressors:
+        if alg is None:
+            alg = 'UN'
+        if alg not in util.compengines.supportedbundletypes:
             raise error.Abort(_('unknown stream compression type: %s')
                               % alg)
         if alg == 'BZ':
             alg = '_truncatedBZ'
-        self._stream = util.decompressors[alg](fh)
+
+        compengine = util.compengines.forbundletype(alg)
+        self._stream = compengine.decompressorreader(fh)
         self._type = alg
         self.extras = extras or {}
         self.callback = None


[PATCH 02 of 10 V2] bundle2: use new compression engine API for compression

2016-11-07 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478572543 28800
#  Mon Nov 07 18:35:43 2016 -0800
# Node ID 9c4c59fa0b44412bd59170f850463c68497b43da
# Parent  f3c9da54ff5e23becaa4d0e90a20c9de704a70ba
bundle2: use new compression engine API for compression

Now that we have a new API to define compression engines, let's put it
to use!

The new code stores a reference to the compression engine instead of
a low-level compressor object. This will allow us to more easily
transition to different APIs on the compression engine interface
once we implement them.

As part of this, we change the registration in bundletypes to use 'UN'
instead of None. Previously, util.compressors had the no-op compressor
registered under both the 'UN' and None keys. Since we're switching to
a new API, I don't see the point in carrying this dual registration
forward.

diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
--- a/mercurial/bundle2.py
+++ b/mercurial/bundle2.py
@@ -485,11 +485,11 @@ def encodecaps(caps):
     return '\n'.join(chunks)
 
 bundletypes = {
-    "": ("", None),   # only when using unbundle on ssh and old http servers
+    "": ("", 'UN'),   # only when using unbundle on ssh and old http servers
                       # since the unification ssh accepts a header but there
                       # is no capability signaling it.
     "HG20": (), # special-cased below
-    "HG10UN": ("HG10UN", None),
+    "HG10UN": ("HG10UN", 'UN'),
     "HG10BZ": ("HG10", 'BZ'),
     "HG10GZ": ("HG10GZ", 'GZ'),
 }
@@ -511,7 +511,7 @@ class bundle20(object):
         self._params = []
         self._parts = []
         self.capabilities = dict(capabilities)
-        self._compressor = util.compressors[None]()
+        self._compengine = util.compengines.forbundletype('UN')
 
     def setcompression(self, alg):
         """setup core part compression to """
@@ -519,7 +519,7 @@ class bundle20(object):
             return
         assert not any(n.lower() == 'Compression' for n, v in self._params)
         self.addparam('Compression', alg)
-        self._compressor = util.compressors[alg]()
+        self._compengine = util.compengines.forbundletype(alg)
 
     @property
     def nbparts(self):
@@ -572,11 +572,12 @@ class bundle20(object):
         if param:
             yield param
         # starting compression
+        compressor = self._compengine.compressorobj()
         for chunk in self._getcorechunk():
-            data = self._compressor.compress(chunk)
+            data = compressor.compress(chunk)
             if data:
                 yield data
-        yield self._compressor.flush()
+        yield compressor.flush()
 
     def _paramchunk(self):
         """return a encoded version of all stream parameters"""
@@ -1318,18 +1319,19 @@ def writebundle(ui, cg, filename, bundle
             raise error.Abort(_('old bundle types only supports v1 '
                                 'changegroups'))
         header, comp = bundletypes[bundletype]
-        if comp not in util.compressors:
+        if comp not in util.compengines.supportedbundletypes:
             raise error.Abort(_('unknown stream compression type: %s')
                               % comp)
-        z = util.compressors[comp]()
+        compengine = util.compengines.forbundletype(comp)
+        compressor = compengine.compressorobj()
         subchunkiter = cg.getchunks()
         def chunkiter():
             yield header
             for chunk in subchunkiter:
-                data = z.compress(chunk)
+                data = compressor.compress(chunk)
                 if data:
                     yield data
-            yield z.flush()
+            yield compressor.flush()
         chunkiter = chunkiter()
 
     # parse the changegroup data, otherwise we will block


[PATCH 01 of 10 V2] util: create new abstraction for compression engines

2016-11-07 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478572299 28800
#  Mon Nov 07 18:31:39 2016 -0800
# Node ID f3c9da54ff5e23becaa4d0e90a20c9de704a70ba
# Parent  0911191dc4c97cbc8334c8b83782e8134bf621f0
util: create new abstraction for compression engines

Currently, util.py has "compressors" and "decompressors" dicts
mapping compression algorithms to callables returning objects that
perform well-defined operations. In addition, revlog.py has code
for calling into a compressor or decompressor explicitly. And, there
is code in the wire protocol for performing zlib compression.

The 3rd party lz4revlog extension has demonstrated the utility of
supporting alternative compression formats for revlog storage. But
it stops short of supporting lz4 for bundles and the wire protocol.

There are also plans to support zstd as a general compression
replacement.

So, there appears to be a market for a unified API for registering
compression engines. This commit starts the process of establishing
one.

This commit establishes a base class/interface for defining
compression engines and how they will be used. A collection class
to hold references to registered compression engines has also been
introduced.

The built-in zlib, bz2, truncated bz2, and no-op compression engines
are registered with a singleton instance of the collection class.

The compression engine API will change once consumers are ported
to the new API and some common patterns can be simplified at the
engine API level. So don't get too attached to the API...
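
To make the registration flow concrete, here is a condensed standalone sketch (Python 3) of a registry keyed by engine name and by bundle type. It is a simplification, not the code in this patch: it raises ``ValueError`` instead of ``error.Abort`` so that it runs on its own, it omits the bundle-name map, and the ``lz4engine`` class with its ``'lz4'`` and ``'L4'`` identifiers is purely hypothetical.

```python
class compressionengine(object):
    """Minimal stand-in for the base class introduced by this patch."""

    def name(self):
        raise NotImplementedError()

    def bundletype(self):
        # (external bundle name, internal bundle identifier), or None.
        return None


class compressormanager(object):
    """Condensed registry: engine name -> engine, bundle type -> name."""

    def __init__(self):
        self._engines = {}
        self._bundletypes = {}

    def register(self, engine):
        name = engine.name()
        if name in self._engines:
            raise ValueError('compression engine %s already registered'
                             % name)
        info = engine.bundletype()
        if info:
            bundlename, bundletype = info
            if bundletype in self._bundletypes:
                raise ValueError('bundle type %s already registered'
                                 % bundletype)
            self._bundletypes[bundletype] = name
        self._engines[name] = engine

    def forbundletype(self, bundletype):
        # Raises KeyError if the bundle type is not registered.
        return self._engines[self._bundletypes[bundletype]]


class lz4engine(compressionengine):
    """Hypothetical engine an extension might register."""

    def name(self):
        return 'lz4'

    def bundletype(self):
        return 'lz4', 'L4'  # hypothetical identifiers


compengines = compressormanager()
compengines.register(lz4engine())
```

Consumers then look engines up by what they have in hand (a bundle type here) rather than reaching into per-algorithm dicts.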

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -2856,13 +2856,219 @@ class ctxmanager(object):
             raise exc_val
         return received and suppressed
 
-# compression utility
+# compression code
+
+class compressormanager(object):
+    """Holds registrations of various compression engines.
+
+    This class essentially abstracts the differences between compression
+    engines to allow new compression formats to be added easily, possibly from
+    extensions.
+
+    Compressors are registered against the global instance by calling its
+    ``register()`` method.
+    """
+    def __init__(self):
+        self._engines = {}
+        # Bundle spec human name to engine name.
+        self._bundlenames = {}
+        # Internal bundle identifier to engine name.
+        self._bundletypes = {}
+
+    def __getitem__(self, key):
+        return self._engines[key]
+
+    def __contains__(self, key):
+        return key in self._engines
+
+    def __iter__(self):
+        return iter(self._engines.keys())
+
+    def register(self, engine):
+        """Register a compression engine with the manager.
+
+        The argument must be a ``compressionengine`` instance.
+        """
+        if not isinstance(engine, compressionengine):
+            raise ValueError(_('argument must be a compressionengine'))
+
+        name = engine.name()
+
+        if name in self._engines:
+            raise error.Abort(_('compression engine %s already registered') %
+                              name)
+
+        bundleinfo = engine.bundletype()
+        if bundleinfo:
+            bundlename, bundletype = bundleinfo
+
+            if bundlename in self._bundlenames:
+                raise error.Abort(_('bundle name %s already registered') %
+                                  bundlename)
+            if bundletype in self._bundletypes:
+                raise error.Abort(_('bundle type %s already registered by %s') %
+                                  (bundletype, self._bundletypes[bundletype]))
+
+            # No external facing name declared.
+            if bundlename:
+                self._bundlenames[bundlename] = name
+
+            self._bundletypes[bundletype] = name
+
+        self._engines[name] = engine
+
+    @property
+    def supportedbundlenames(self):
+        return set(self._bundlenames.keys())
+
+    @property
+    def supportedbundletypes(self):
+        return set(self._bundletypes.keys())
+
+    def forbundlename(self, bundlename):
+        """Obtain a compression engine registered to a bundle name.
+
+        Will raise KeyError if the bundle type isn't registered.
+        """
+        return self._engines[self._bundlenames[bundlename]]
+
+    def forbundletype(self, bundletype):
+        """Obtain a compression engine registered to a bundle type.
+
+        Will raise KeyError if the bundle type isn't registered.
+        """
+        return self._engines[self._bundletypes[bundletype]]
+
+compengines = compressormanager()
+
+class compressionengine(object):
+    """Base class for compression engines.
+
+    Compression engines must implement the interface defined by this class.
+    """
+    def name(self):
+        """Returns the name of the compression engine.
+
+        This is the key the engine is registered under.
+
+        This method must be implemented.
+        """
+        raise 

[PATCH 08 of 10 V2] bundle2: use compressstream compression engine API

2016-11-07 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478573197 28800
#  Mon Nov 07 18:46:37 2016 -0800
# Node ID fc931794a250e605717cc066f26512c0dcc81224
# Parent  fa24595b79b603ff7be6f32b849c07ddfdee3da4
bundle2: use compressstream compression engine API

Compression engines now have an API for compressing a stream of
chunks. Switch to it and make low-level compression code disappear.

diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
--- a/mercurial/bundle2.py
+++ b/mercurial/bundle2.py
@@ -571,13 +571,8 @@ class bundle20(object):
         yield _pack(_fstreamparamsize, len(param))
         if param:
             yield param
-        # starting compression
-        compressor = self._compengine.compressorobj()
-        for chunk in self._getcorechunk():
-            data = compressor.compress(chunk)
-            if data:
-                yield data
-        yield compressor.flush()
+        for chunk in self._compengine.compressstream(self._getcorechunk()):
+            yield chunk
 
     def _paramchunk(self):
         """return a encoded version of all stream parameters"""
@@ -1323,15 +1318,10 @@ def writebundle(ui, cg, filename, bundle
             raise error.Abort(_('unknown stream compression type: %s')
                               % comp)
         compengine = util.compengines.forbundletype(comp)
-        compressor = compengine.compressorobj()
-        subchunkiter = cg.getchunks()
         def chunkiter():
             yield header
-            for chunk in subchunkiter:
-                data = compressor.compress(chunk)
-                if data:
-                    yield data
-            yield compressor.flush()
+            for chunk in compengine.compressstream(cg.getchunks()):
+                yield chunk
         chunkiter = chunkiter()
 
     # parse the changegroup data, otherwise we will block


Re: [PATCH 09 of 11] bundle2: use compressstream compression engine API

2016-11-07 Thread Gregory Szorc
On Mon, Nov 7, 2016 at 6:13 AM, Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:

>
>
> On 11/02/2016 01:08 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc 
>> # Date 1477160145 25200
>> #  Sat Oct 22 11:15:45 2016 -0700
>> # Node ID 03555032b7e3bc7192fd8bebf6af3f05b1e70516
>> # Parent  1d4d111b644453acc4893478528a5f2ecd7ca023
>> bundle2: use compressstream compression engine API
>>
>> Compression engines now have an API for compressing a stream of
>> chunks. Switch to it and make low-level compression code disappear.
>>
>
> Do we get any performance benefit from this? I know you have spent a lot
> of time tracking performance gains in bundle creation/application, and this
> likely has some effect.
>
> Talking about performance, Philippe Pépiot has a patch to set up an
> official performance tracking tool. If you could help review it, we could
> include these operations in it and have an easy, standard way
> to get these numbers.


From this patch, most likely not. The code is nearly identical, and I
expect any performance changes due to how functions are called to be
dwarfed by the time spent inside the compressor.


>
>
> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
>> --- a/mercurial/bundle2.py
>> +++ b/mercurial/bundle2.py
>> @@ -566,23 +566,18 @@ class bundle20(object):
>>  self.ui.debug(''.join(msg))
>>  outdebug(self.ui, 'start emission of %s stream' % self._magicstring)
>>  yield self._magicstring
>>  param = self._paramchunk()
>>  outdebug(self.ui, 'bundle parameter: %s' % param)
>>  yield _pack(_fstreamparamsize, len(param))
>>  if param:
>>  yield param
>> -# starting compression
>> -compressor = self._compengine.compressorobj()
>> -for chunk in self._getcorechunk():
>> -data = compressor.compress(chunk)
>> -if data:
>> -yield data
>> -yield compressor.flush()
>> +for chunk in self._compengine.compressstream(self._getcorechunk()):
>> +yield chunk
>>
>>  def _paramchunk(self):
>>  """return a encoded version of all stream parameters"""
>>  blocks = []
>>  for par, value in self._params:
>>  par = urlreq.quote(par)
>>  if value is not None:
>>  value = urlreq.quote(value)
>> @@ -1318,25 +1313,20 @@ def writebundle(ui, cg, filename, bundle
>>  if cg.version != '01':
>>  raise error.Abort(_('old bundle types only supports v1 '
>>  'changegroups'))
>>  header, comp = bundletypes[bundletype]
>>  if comp not in util.compressionengines.supportedbundletypes:
>>  raise error.Abort(_('unknown stream compression type: %s')
>>% comp)
>>  compengine = util.compressionengines.forbundletype(comp)
>> -compressor = compengine.compressorobj()
>> -subchunkiter = cg.getchunks()
>>  def chunkiter():
>>  yield header
>> -for chunk in subchunkiter:
>> -data = compressor.compress(chunk)
>> -if data:
>> -yield data
>> -yield compressor.flush()
>> +for chunk in compengine.compressstream(cg.getchunks()):
>> +yield chunk
>>  chunkiter = chunkiter()
>>
>>  # parse the changegroup data, otherwise we will block
>>  # in case of sshrepo because we don't know the end of the stream
>>  return changegroup.writechunks(ui, chunkiter, filename, vfs=vfs)
>>
>>  @parthandler('changegroup', ('version', 'nbchanges', 'treemanifest'))
>>  def handlechangegroup(op, inpart):
>>
>
>
> --
> Pierre-Yves David
>


Re: [PATCH 02 of 11] util: create new abstraction for compression engines

2016-11-07 Thread Gregory Szorc
On Mon, Nov 7, 2016 at 5:36 AM, Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:

>
>
> On 11/02/2016 01:08 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc 
>> # Date 1477966026 25200
>> #  Mon Oct 31 19:07:06 2016 -0700
>> # Node ID 4015d575d311cd7ebc923d1320e55a76c655c485
>> # Parent  60f180c9a030ebcee6c6f4f8584fdb94c73ac337
>> util: create new abstraction for compression engines
>>
>> Currently, util.py has "compressors" and "decompressors" dicts
>> mapping compression algorithms to callables returning objects that
>> perform well-defined operations. In addition, revlog.py has code
>> for calling into a compressor or decompressor explicitly. And, there
>> is code in the wire protocol for performing zlib compression.
>>
>> The 3rd party lz4revlog extension has demonstrated the utility of
>> supporting alternative compression formats for revlog storage. But
>> it stops short of supporting lz4 for bundles and the wire protocol.
>>
>> There are also plans to support zstd as a general compression
>> replacement.
>>
>> So, there appears to be a market for a unified API for registering
>> compression engines. This commit starts the process of establishing
>> one. It establishes a new container class for holding registered
>> compression engine objects. Each object declares and supports common
>> operations via attributes.
>>
>> The built-in zlib, bz2, truncated bz2, and no-op compression engines
>> are registered with a singleton instance of this class.
>>
>> It's worth stating that I'm no fan of the "decompressorreader" API.
>> But this is what existing consumers expect. My plans are to get
>> consumers using the new "engines" API then transition them to a
>> better decompression primitive. This partially explains why I don't
>> care about the duplicated code pattern used for decompressors
>> (it is abstracted into _makedecompressor in the existing code).
>>
>
> The plan seems good overall; I have some suggestions on the implementation.
>
>
> diff --git a/mercurial/util.py b/mercurial/util.py
>> --- a/mercurial/util.py
>> +++ b/mercurial/util.py
>> @@ -2851,21 +2851,156 @@ class ctxmanager(object):
>>  exc_type, exc_val, exc_tb = pending = sys.exc_info()
>>  del self._atexit
>>  if pending:
>>  raise exc_val
>>  return received and suppressed
>>
>>  # compression utility
>>
>> +class compressormanager(object):
>> +"""Holds registrations of various compression engines.
>> +
>> +This class essentially abstracts the differences between compression
>> +engines to allow new compression formats to be added easily, possibly from
>> +extensions.
>> +
>> +Compressors are registered against the global instance by calling its
>> +``register()`` method.
>> +"""
>> +def __init__(self):
>> +self._engines = {}
>> +self._bundletypes = {}
>> +
>> +def __getitem__(self, key):
>> +return self._engines[key]
>> +
>> +def __contains__(self, key):
>> +return key in self._engines
>> +
>> +def __iter__(self):
>> +return iter(self._engines.keys())
>> +
>> +def register(self, name, engine):
>> +"""Register a compression format with the manager.
>> +
>> +The passed compression engine is an object with attributes describing
>> +behavior and methods performing well-defined actions. The following
>> +attributes are recognized (all are optional):
>> +
>> +* bundletype -- Attribute containing the identifier of this compression
>> +  format as used by bundles.
>> +
>> +* compressorobj -- Method returning an object with ``compress(data)``
>> +  and ``flush()`` methods. This object and these methods are used to
>> +  incrementally feed data (presumably uncompressed) chunks into a
>> +  compressor. Calls to these methods return compressed bytes, which
>> +  may be 0-length if there is no output for the operation.
>> +
>> +* decompressorreader -- Method that is used to perform decompression
>> +  on a file object. Argument is an object with a ``read(size)`` method
>> +  that returns compressed data. Return value is an object with a
>> +  ``read(size)`` that returns uncompressed data.
>> +"""
>>
>
> This method would be a great decorator candidate. Could we get the name
> from the object (as we do for the other property?) or have it declared as
> part of a decorator (but I think the property approach is more consistent
> with the other bits).
>
> Being a decorator probably means to move away from
>
> +bundletype = getattr(engine, 'bundletype', None)
>>
>
> Apparently the 'bundletype' can be None but there is no mention of it in
> the documentation. Can the documentation be updated?
> Also, I'm not sure why the bundletype attribute is optional. Could we just
> have it mandatory
>
> +   

[Bug 5420] New: rebase -b should calculate ancestors separately

2016-11-07 Thread bugzilla
https://bz.mercurial-scm.org/show_bug.cgi?id=5420

Bug ID: 5420
   Summary: rebase -b should calculate ancestors separately
   Product: Mercurial
   Version: default branch
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: feature
  Priority: wish
 Component: rebase
  Assignee: bugzi...@selenic.com
  Reporter: arcppzju+hg...@gmail.com
CC: mercurial-de...@selenic.com

Given the following graph:

5
| 4
|/
3
| 2
|/
1

rebase -b 2+4 -d 5 will use the revset (ancestor(2+4)::(2+4) - ancestor(2+4))::
as the source, which is (1::(2+4) - 1)::, and it's finally 2+3+4+5.

So it may be better if we calculate ancestors for each revision -b specifies:
(ancestor(4,5):: - ancestor(4,5)):: + (ancestor(2,5):: - ancestor(2,5)):: and
that resolves to 2+4 as expected.
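
The two computations can be checked on the example graph with a small standalone sketch (Python 3, plain sets, no Mercurial APIs). All function names here are hypothetical. It also assumes that each per-revision range in the proposal is bounded by the -b revision itself, that is `ancestor(r, 5)::r`, mirroring the `ancestor(2+4)::(2+4)` form of the current revset; that bound is an interpretation, not something the report spells out.

```python
# Parent map for the graph in the report: child -> parent.
PARENTS = {1: None, 2: 1, 3: 1, 4: 3, 5: 3}


def ancestors(rev):
    """Return rev together with all of its ancestors."""
    out = set()
    while rev is not None:
        out.add(rev)
        rev = PARENTS[rev]
    return out


def descendants(revs):
    """Return revs plus everything descending from them (the x:: revset)."""
    revs = set(revs)
    return {r for r in PARENTS if ancestors(r) & revs}


def gca(revs):
    """Greatest common ancestor; max() works because numbering is topological."""
    return max(set.intersection(*(ancestors(r) for r in revs)))


def between(base, heads):
    """The base::heads revset: descendants of base that are ancestors of heads."""
    up = set().union(*(ancestors(h) for h in heads))
    return descendants({base}) & up


def current_source(bases):
    """(ancestor(bases)::bases - ancestor(bases)):: as computed today."""
    a = gca(bases)
    return descendants(between(a, bases) - {a})


def proposed_source(bases, dest):
    """Union over each base r of (ancestor(r, dest)::r - ancestor(r, dest))::"""
    out = set()
    for r in bases:
        a = gca([r, dest])
        out |= descendants(between(a, [r]) - {a})
    return out
```

With these definitions `current_source([2, 4])` yields revisions 2, 3, 4 and 5, while `proposed_source([2, 4], 5)` yields only 2 and 4.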

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[PATCH] keyword: handle filectx _customcmp

2016-11-07 Thread Christian Ebert
# HG changeset patch
# User Christian Ebert 
# Date 1476718966 -7200
#  Mon Oct 17 17:42:46 2016 +0200
# Node ID 94e42c8808cdd96891a9f375f02a0760670e33d8
# Parent  d06c049695e6ad3219e7479c65ce98a2f123e878
keyword: handle filectx _customcmp

Suggested by Yuya Nishihara:
https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/089461.html

Related to issue5364.

diff --git a/hgext/keyword.py b/hgext/keyword.py
--- a/hgext/keyword.py
+++ b/hgext/keyword.py
@@ -737,6 +737,8 @@ def reposetup(ui, repo):
 return ret
 
     def kwfilectx_cmp(orig, self, fctx):
+        if fctx._customcmp:
+            return fctx.cmp(self)
         # keyword affects data size, comparing wdir and filelog size does
         # not make sense
         if (fctx._filenode is None and
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[Bug 5419] New: hg revert crashes with multiple renames and --rev

2016-11-07 Thread bugzilla
https://bz.mercurial-scm.org/show_bug.cgi?id=5419

Bug ID: 5419
   Summary: hg revert crashes with multiple renames and --rev
   Product: Mercurial
   Version: default branch
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: bug
  Priority: wish
 Component: Mercurial
  Assignee: bugzi...@selenic.com
  Reporter: arcppzju+hg...@gmail.com
CC: mercurial-de...@selenic.com

The following commands will crash "hg revert":

  $ hg init repo
  $ cd repo
  $ touch a
  $ hg commit -A a -m a
  $ hg mv a a1
  $ hg commit -m a1
  $ hg mv a1 a2
  $ hg revert -a -r 0



Re: [PATCH 10 of 11] hgweb: use compression engine API for zlib compression

2016-11-07 Thread Gregory Szorc
On Mon, Nov 7, 2016 at 6:25 AM, Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:

>
>
> On 11/02/2016 01:08 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc 
>> # Date 1477160356 25200
>> #  Sat Oct 22 11:19:16 2016 -0700
>> # Node ID fc426af4f25c3403703e913ccb4a6865865fcb02
>> # Parent  03555032b7e3bc7192fd8bebf6af3f05b1e70516
>> hgweb: use compression engine API for zlib compression
>>
>> More low-level compression code elimination because we now have nice
>> APIs.
>>
>> diff --git a/mercurial/hgweb/protocol.py b/mercurial/hgweb/protocol.py
>> --- a/mercurial/hgweb/protocol.py
>> +++ b/mercurial/hgweb/protocol.py
>> @@ -83,24 +83,18 @@ class webproto(wireproto.abstractserverp
>>  yield chunk
>>
>>  return self.compresschunks(getchunks())
>>
>>  def compresschunks(self, chunks):
>>  # Don't allow untrusted settings because disabling compression or
>>  # setting a very high compression level could lead to flooding
>>  # the server's network or CPU.
>> -z = zlib.compressobj(self.ui.configint('server', 'zliblevel', -1))
>> -for chunk in chunks:
>> -data = z.compress(chunk)
>> -# Not all calls to compress() emit data. It is cheaper to inspect
>> -# that here than to send it via the generator.
>> -if data:
>> -yield data
>> -yield z.flush()
>> +opts = {'level': self.ui.configint('server', 'zliblevel', -1)}
>> +return util.compressionengines['zlib'].compressstream(chunks, opts)
>>
>
> Out of curiosity, what is the long-term plan for the zliblevel option
> here?
> * Having a special case for each compressor in the code,
> * Having a generic callback to set this up,
> * Passing ui to the compressors for auto-configuration,
> * Something else?
>

I haven't fully solved this problem for all cases.

For bundles, I plan on extending the "bundle spec" mechanism to allow
defining compression parameters. See
https://hg.mozilla.org/users/gszorc_mozilla.com/hg/rev/04f0144c9142.

For the wire protocol, I was tentatively planning on reusing [server]. For
revlogs, we could reuse [format]. In many cases, yes, we'd need to pass a
ui or have the caller pass in options read from a ui.

The project survived for years without having any configuration knobs for
zlib. So I think we make sane default choices for new engines and add the
knobs later.
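
The "have the caller pass in options read from a ui" approach could look roughly like the sketch below. It is an illustration only: FakeUI is a hypothetical stand-in for Mercurial's ui object, and wirecompress is an invented helper name; only the [server] zliblevel knob and the opts dict with a 'level' key come from the patch itself.

```python
import zlib


class FakeUI(object):
    """Hypothetical stand-in for Mercurial's ui object (illustration only)."""

    def __init__(self, values):
        self._values = values

    def configint(self, section, name, default=None):
        value = self._values.get((section, name))
        return default if value is None else int(value)


def compressstream(chunks, opts=None):
    """Compress an iterator of byte chunks with zlib, skipping empty output."""
    opts = opts or {}
    z = zlib.compressobj(opts.get('level', -1))
    for chunk in chunks:
        data = z.compress(chunk)
        if data:
            yield data
    yield z.flush()


def wirecompress(ui, chunks):
    # The caller, not the engine, reads the knob from configuration.
    opts = {'level': ui.configint('server', 'zliblevel', -1)}
    return compressstream(chunks, opts)
```

With this split the engine never sees a ui; each call site decides which config section feeds the options.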


>
>  def _client(self):
>>  return 'remote:%s:%s:%s' % (
>>  self.req.env.get('wsgi.url_scheme') or 'http',
>>  urlreq.quote(self.req.env.get('REMOTE_HOST', '')),
>>  urlreq.quote(self.req.env.get('REMOTE_USER', '')))
>>
>>  def iscmd(cmd):
>>
>
> Cheers,
>
> --
> Pierre-Yves David
>


Re: [PATCH STABLE V2] hgweb: cache fctx.parents() in annotate command (issue5414)

2016-11-07 Thread Jun Wu
Excerpts from Gregory Szorc's message of 2016-11-07 09:29:40 -0800:
> Could we change basefilectx.annotate() to return a rich data structure
> instead of a list of tuples? That data structure could have the cached
> parents and other reusable cache data (which could be passed into
> subsequent calls if needed).

It's already returning "fctx", which is rich...


[PATCH RESEND] revert: do not reverse hunks in interactive when REV is not parent (issue5096)

2016-11-07 Thread Denis Laxalde
# HG changeset patch
# User Denis Laxalde 
# Date 1475237490 -7200
#  Fri Sep 30 14:11:30 2016 +0200
# Node ID 4f7b7750403ce48e78d0f361236f65ac03584c3c
# Parent  d06c049695e6ad3219e7479c65ce98a2f123e878
revert: do not reverse hunks in interactive when REV is not parent (issue5096)

And introduce a new "apply" operation verb for this case as suggested in
issue5096. This replaces the no longer used "revert" operation.

In interactive revert, when reverting to something other than the parent
revision, display an "apply this change" message with a diff that is not
reversed.

The rationale is that `hg revert -i -r REV` will show hunks of the diff from
the working directory to REV and prompt the user to select them for applying
(to working directory). This somehow contradicts dcc56e10c23b in which it was
decided to have the "direction" of prompted hunks reversed...

Drop no longer used "experimental.revertalternateinteractivemode"
configuration option. (Keeping it would lead to inconsistent prompt message
vs. hunks display.)
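
The direction question in the rationale above can be illustrated outside Mercurial with the standard difflib module. This is only a sketch with made-up file contents; hunklines is a hypothetical helper, not part of the patch.

```python
import difflib


def hunklines(old, new):
    """Return only the added/removed lines of a unified diff from old to new."""
    diff = difflib.unified_diff(old, new, lineterm='')
    return [line for line in diff
            if line.startswith(('+', '-'))
            and not line.startswith(('+++', '---'))]


workingdir = ['a', '1', '2', '3']   # working directory has an extra line 'a'
target = ['1', '2', '3']            # content at REV

# Working directory -> REV: the change that would be applied.
applied = hunklines(workingdir, target)
# REV -> working directory: the same change shown reversed.
reversed_ = hunklines(target, workingdir)
```

Here applied holds the line '-a' (what "apply this change" will do to the working directory), while reversed_ holds '+a'.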

diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -3291,15 +3291,17 @@ def _performrevert(repo, parents, ctx, a
 diffopts = patch.difffeatureopts(repo.ui, whitespace=True)
 diffopts.nodates = True
 diffopts.git = True
-reversehunks = repo.ui.configbool('experimental',
-  'revertalternateinteractivemode',
-  True)
+if node == parent:
+operation = 'discard'
+reversehunks = True
+else:
+operation = 'apply'
+reversehunks = False
 if reversehunks:
 diff = patch.diff(repo, ctx.node(), None, m, opts=diffopts)
 else:
 diff = patch.diff(repo, None, ctx.node(), m, opts=diffopts)
 originalchunks = patch.parsepatch(diff)
-operation = 'discard' if node == parent else 'revert'
 
 try:
 
diff --git a/mercurial/patch.py b/mercurial/patch.py
--- a/mercurial/patch.py
+++ b/mercurial/patch.py
@@ -980,14 +980,14 @@ def filterpatch(ui, headers, operation=N
 operation = 'record'
 messages = {
 'multiple': {
+'apply': _("apply change %d/%d to '%s'?"),
 'discard': _("discard change %d/%d to '%s'?"),
 'record': _("record change %d/%d to '%s'?"),
-'revert': _("revert change %d/%d to '%s'?"),
 }[operation],
 'single': {
+'apply': _("apply this change to '%s'?"),
 'discard': _("discard this change to '%s'?"),
 'record': _("record this change to '%s'?"),
-'revert': _("revert this change to '%s'?"),
 }[operation],
 }
 
diff --git a/tests/test-revert-interactive.t b/tests/test-revert-interactive.t
--- a/tests/test-revert-interactive.t
+++ b/tests/test-revert-interactive.t
@@ -57,45 +57,45 @@ 10 run the same test than 8 from within 
   2 hunks, 2 lines changed
   examine changes to 'f'? [Ynesfdaq?] y
   
-  @@ -1,5 +1,6 @@
-  +a
-   1
-   2
-   3
-   4
-   5
-  revert change 1/6 to 'f'? [Ynesfdaq?] y
-  
-  @@ -1,5 +2,6 @@
+  @@ -1,6 +1,5 @@
+  -a
1
2
3
4
5
-  +b
-  revert change 2/6 to 'f'? [Ynesfdaq?] y
+  apply change 1/6 to 'f'? [Ynesfdaq?] y
+  
+  @@ -2,6 +1,5 @@
+   1
+   2
+   3
+   4
+   5
+  -b
+  apply change 2/6 to 'f'? [Ynesfdaq?] y
   
   diff --git a/folder1/g b/folder1/g
   2 hunks, 2 lines changed
   examine changes to 'folder1/g'? [Ynesfdaq?] y
   
-  @@ -1,5 +1,6 @@
-  +c
+  @@ -1,6 +1,5 @@
+  -c
1
2
3
4
5
-  revert change 3/6 to 'folder1/g'? [Ynesfdaq?] y
+  apply change 3/6 to 'folder1/g'? [Ynesfdaq?] y
   
-  @@ -1,5 +2,6 @@
+  @@ -2,6 +1,5 @@
1
2
3
4
5
-  +d
-  revert change 4/6 to 'folder1/g'? [Ynesfdaq?] n
+  -d
+  apply change 4/6 to 'folder1/g'? [Ynesfdaq?] n
   
   diff --git a/folder2/h b/folder2/h
   2 hunks, 2 lines changed
@@ -143,12 +143,12 @@ Test that a noop revert doesn't do an un
   1 hunks, 1 lines changed
   examine changes to 'folder1/g'? [Ynesfdaq?] y
   
-  @@ -3,3 +3,4 @@
+  @@ -3,4 +3,3 @@
3
4
5
-  +d
-  revert this change to 'folder1/g'? [Ynesfdaq?] n
+  -d
+  apply this change to 'folder1/g'? [Ynesfdaq?] n
   
   $ ls folder1/
   g
@@ -159,12 +159,12 @@ Test --no-backup
   1 hunks, 1 lines changed
   examine changes to 'folder1/g'? [Ynesfdaq?] y
   
-  @@ -3,3 +3,4 @@
+  @@ -3,4 +3,3 @@
3
4
5
-  +d
-  revert this change to 'folder1/g'? [Ynesfdaq?] y
+  -d
+  apply this change to 'folder1/g'? [Ynesfdaq?] y
   
   $ ls folder1/
   g
@@ -190,45 +190,45 @@ Test --no-backup
   2 hunks, 2 lines changed
   examine changes to 'f'? [Ynesfdaq?] y
   
-  @@ -1,5 +1,6 @@
-  +a
-   1
-   2
-   3
-   4
-   5
-  revert change 1/6 to 'f'? [Ynesfdaq?] y
-  
-  @@ -1,5 +2,6 @@
+  @@ -1,6 +1,5 @@
+  -a
1
2
3
4
5
-  +b
-  

[Bug 5418] New: Operation not permitted for utime

2016-11-07 Thread bugzilla
https://bz.mercurial-scm.org/show_bug.cgi?id=5418

   Bug ID: 5418
  Summary: Operation not permitted for utime
  Product: Mercurial
  Version: 3.9
 Hardware: PC
       OS: Linux
   Status: UNCONFIRMED
 Severity: feature
 Priority: wish
Component: Mercurial
 Assignee: bugzi...@selenic.com
 Reporter: m...@kiilerich.com
       CC: mercurial-de...@selenic.com

When using a repo owned by another user but where I have group suid and proper
umask, I get 'Operation not permitted' from os.utime.

From the Linux utime man page:

   Changing timestamps is permitted when: either the process has
   appropriate privileges, or the effective user ID equals the user ID
   of the file, or times is NULL and the process has write permission
   for the file.

   If times is NULL, then the access and modification times of the
   file are set to the current time.

The utime usage in 731ced087a4b thus apparently does not work with use cases
that Mercurial has "always" supported.
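
The permission rule quoted above can be sketched in Python as follows. This is not the code from 731ced087a4b and not a proposed fix; setmtime is a hypothetical helper that only shows the difference between passing explicit times and passing None (the times=NULL case).

```python
import os


def setmtime(path, mtime):
    """Try to set an explicit mtime; fall back to "now" if not permitted.

    Returns True if the explicit timestamp was applied, False if only the
    times=NULL form (os.utime(path, None)) was used.
    """
    try:
        os.utime(path, (mtime, mtime))
        return True
    except PermissionError:
        # Not the owner: explicit times are refused, but with write
        # permission the "set to current time" form is still allowed.
        os.utime(path, None)
        return False
```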



Re: [PATCH 6 of 7] py3: use try/except to check for basestring

2016-11-07 Thread Pulkit Goyal
On Mon, Nov 7, 2016 at 5:55 PM, Yuya Nishihara  wrote:
> On Mon, 7 Nov 2016 00:15:21 +0530, Pulkit Goyal wrote:
>> This
>> https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/089099.html
>> is a better version of what I want to do; since it didn't go
>> through, I will be using this.
>
> I'm okay with that pycompat.basestring stuff, but I'm pretty sure most of
> our basestring uses are moot since we avoid using unicodes except for very
> specific string manipulations.
>
>> >> @@ -520,7 +520,12 @@
>> >>  result = self.config(section, name, untrusted=untrusted)
>> >>  if result is None:
>> >>  result = default or []
>> >> -if isinstance(result, basestring):
>> >> +checkunicode = False
>> >> +try:
>> >> +checkunicode = isinstance(result, basestring)
>> >> +except NameError:
>> >> +checkunicode = isinstance(result, str)
>> >> +if checkunicode:
>> >>  result = _configlist(result.lstrip(' ,\n'))
>
> And with this change, ui.configlist() would look as if it supports unicodes,
> which seems confusing.
Can you cherry pick some commits from that series?
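
For comparison, a module-level shim avoids repeating the try/except at every call site. The sketch below assumes that is roughly what the "pycompat.basestring stuff" refers to; aslist is a hypothetical helper, not ui.configlist().

```python
try:
    basestring  # Python 2: common base class of str and unicode
except NameError:
    basestring = str  # Python 3: assumption that text strings are meant


def aslist(value):
    """Split a comma/space separated string into a list; pass lists through."""
    if isinstance(value, basestring):
        return [v for v in value.replace(',', ' ').split() if v]
    return list(value or [])
```

Note that on Python 3 a bytes value is not an instance of str, so the concern about appearing to support unicode still applies to this form.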


Re: hglib uses distutils that is being deprecated

2016-11-07 Thread Pierre-Yves David



On 11/07/2016 03:37 PM, Barry Scott wrote:
> On Monday, 7 November 2016 15:28:33 GMT Pierre-Yves David wrote:
>> On 11/07/2016 03:23 PM, Barry Scott wrote:
>>> So that I could use the recent improvements to python-hglib I built a
>>> wheel.
>>>
>>> I needed to patch setup.py to do this /distutils/setuptools/ so that I
>>> could create the wheel with
>>>
>>>  python3 setup.py sdist bdist_wheel
>>>
>>> Then when I installed my wheel I got this:
>>>
>>> $ pip3.5 install --upgrade  /home/barry/wc/hg/hglib/dist/
>>> python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
>>> Processing /home/barry/wc/hg/hglib/dist/
>>> python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
>>> Installing collected packages: python-hglib
>>>   Found existing installation: python-hglib 2.0
>>> DEPRECATION: Uninstalling a distutils installed project (python-hglib)
>>> has been deprecated and will be removed in a future version. This is
>>> due to the fact that uninstalling a distutils project will only
>>> partially uninstall the project.
>>> Uninstalling python-hglib-2.0:
>>>   Successfully uninstalled python-hglib-2.0
>>> Successfully installed python-hglib-2.2-6-0f81ed8e147b-20161107
>>>
>>> Do you have a plan to update to setuptools?
>>>
>>> I'm guessing that you want to use distutils to support very old Python
>>> versions. If that is true then I'd guess that the setup.py would need
>>> to do something like:
>>>
>>> try:
>>>  from setuptools import setup
>>> except ImportError:
>>>  from distutils.core import setup
>>
>> I know that Gregory Szorc is building wheels for Mercurial itself. We can
>> probably use the same approach used by Mercurial in hglib (whatever
>> this approach is). Can you send a patch for hglib?
>
> I don't think a wheel is created for mercurial.

The internet disagrees: https://pypi.python.org/pypi/Mercurial

> On Windows it's a .exe and
> on Fedora the site-specific/mercurial is installed from the RPM.
>
> Looking a bit closer at hglib I only see PyPI with a .tar.gz source file.
> I guess you do not use wheels at all and pip will do the setup.py install
> dance for the user.
>
> I could patch to change from distutils to setuptools. But someone that knows
> hglib's packaging strategy needs to speak to what is sensible to do.

Cheers,

--
Pierre-Yves David


Re: [PATCH 5 of 5 rfc] bdiff: make sure we append to repeated lines instead of inserting into range

2016-11-07 Thread Pierre-Yves David



On 11/03/2016 10:34 PM, Mads Kiilerich wrote:

# HG changeset patch
# User Mads Kiilerich 
# Date 1478208837 -3600
#  Thu Nov 03 22:33:57 2016 +0100
# Node ID be83c5f4ec8931cb7e771a80a3ef5e2042a005c1
# Parent  3e0216b2a0995cb21946bc13fb21391013332c57
bdiff: make sure we append to repeated lines instead of inserting into range

This will mitigate the symptoms that tests exposed in the previous changeset.


Small nit: would it be possible to have something patch-5-ish before
patch 4, to reduce the line changes in patch 5?



Arguably, we need similar handling for longer sequences of repeated lines...

But also, we already have examples of how the heuristics handle other cases in
a similar way.

diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
--- a/mercurial/bdiff.c
+++ b/mercurial/bdiff.c
@@ -187,7 +187,7 @@ static int longest_match(struct bdiff_li
} else if (i == mi && findbetterb) {
/* better j in first upper half */
mj = j;
-   if (j <= bhalf)
+   if (j <= bhalf && !(j > 0 && k == 1 && b[j - 1].e == b[j].e))
findbetterb = 0;
}
}
diff --git a/tests/test-annotate.t b/tests/test-annotate.t
--- a/tests/test-annotate.t
+++ b/tests/test-annotate.t
@@ -91,8 +91,8 @@ annotate (JSON)
 annotate -n b

   $ hg annotate -n b
+  0: a
   1: a
-  0: a
   1: a
   3: b4
   3: b5
@@ -111,8 +111,8 @@ annotate --no-follow b
 annotate -nl b

   $ hg annotate -nl b
-  1:1: a
   0:1: a
+  1:2: a
   1:3: a
   3:4: b4
   3:5: b5
@@ -121,8 +121,8 @@ annotate -nl b
 annotate -nf b

   $ hg annotate -nf b
+  0 a: a
   1 a: a
-  0 a: a
   1 a: a
   3 b: b4
   3 b: b5
@@ -131,8 +131,8 @@ annotate -nf b
 annotate -nlf b

   $ hg annotate -nlf b
-  1 a:1: a
   0 a:1: a
+  1 a:2: a
   1 a:3: a
   3 b:4: b4
   3 b:5: b5
@@ -156,8 +156,8 @@ annotate -nlf b
 annotate after merge

   $ hg annotate -nf b
+  0 a: a
   1 a: a
-  0 a: a
   1 a: a
   3 b: b4
   4 b: c
@@ -166,8 +166,8 @@ annotate after merge
 annotate after merge with -l

   $ hg annotate -nlf b
-  1 a:1: a
   0 a:1: a
+  1 a:2: a
   1 a:3: a
   3 b:4: b4
   4 b:5: c
@@ -198,7 +198,7 @@ annotate after merge with -l
 annotate after rename merge

   $ hg annotate -nf b
-  1 a: a
+  0 a: a
   6 b: z
   1 a: a
   3 b: b4
@@ -209,7 +209,7 @@ annotate after rename merge
 annotate after rename merge with -l

   $ hg annotate -nlf b
-  1 a:1: a
+  0 a:1: a
   6 b:2: z
   1 a:3: a
   3 b:4: b4
@@ -226,7 +226,7 @@ Issue2807: alignment of line numbers wit
   $ echo more >> b
   $ hg ci -mmore -d '7 0'
   $ hg annotate -nlf b
-   1 a: 1: a
+   0 a: 1: a
6 b: 2: z
1 a: 3: a
3 b: 4: b4
@@ -240,15 +240,15 @@ Issue2807: alignment of line numbers wit
 linkrev vs rev

   $ hg annotate -r tip -n a
+  0: a
   1: a
-  0: a
   1: a

 linkrev vs rev with -l

   $ hg annotate -r tip -nl a
-  1:1: a
   0:1: a
+  1:2: a
   1:3: a

 Issue589: "undelete" sequence leads to crash
diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t
--- a/tests/test-bhalf.t
+++ b/tests/test-bhalf.t
@@ -105,8 +105,8 @@ Explore some bdiff implementation edge c
   --- a/x
   +++ b/x
   @@ -1,1 +1,3 @@
+   a
   +a
-   a
   +a
   diff --git a/y b/y
   --- a/y
diff --git a/tests/test-commit-amend.t b/tests/test-commit-amend.t
--- a/tests/test-commit-amend.t
+++ b/tests/test-commit-amend.t
@@ -47,8 +47,8 @@ Amending changeset with changes in worki
  --- a/a  Thu Jan 01 00:00:00 1970 +0000
  +++ b/a  Thu Jan 01 00:00:00 1970 +0000
   @@ -1,1 +1,3 @@
+   a
   +a
-   a
   +a
   $ hg log
   changeset:   1:43f1ba15f28a
@@ -122,13 +122,13 @@ No changes, just a different message:
   uncompressed size of bundle content:
254 (changelog)
163 (manifests)
-   141  a
+   129  a
  saved backup bundle to $TESTTMP/.hg/strip-backup/74609c7f506e-1bfde511-amend-backup.hg (glob)
   1 changesets found
   uncompressed size of bundle content:
250 (changelog)
163 (manifests)
-   141  a
+   129  a
   adding branch
   adding changesets
   adding manifests
@@ -140,8 +140,8 @@ No changes, just a different message:
  --- a/a  Thu Jan 01 00:00:00 1970 +0000
  +++ b/a  Thu Jan 01 00:00:00 1970 +0000
   @@ -1,1 +1,3 @@
+   a
   +a
-   a
   +a
   $ hg log
   changeset:   1:1cd866679df8
@@ -266,13 +266,13 @@ then, test editing custom commit message
   uncompressed size of bundle content:
249 (changelog)
163 (manifests)
-   143  a
+   131  a
  saved backup bundle to $TESTTMP/.hg/strip-backup/5f357c7560ab-e7c84ade-amend-backup.hg (glob)
   1 changesets found
   uncompressed size of bundle content:
257 (changelog)
163 (manifests)
-   143  a
+   131  a
   adding branch
   adding changesets
   adding manifests
@@ -309,13 

Re: hglib uses distutils that is being deprecated

2016-11-07 Thread Barry Scott
On Monday, 7 November 2016 15:28:33 GMT Pierre-Yves David wrote:
> On 11/07/2016 03:23 PM, Barry Scott wrote:
> > So that I could use the recent improvements to python-hglib I built a
> > wheel.
> >
> > I needed to patch setup.py to do this /distutils/setuptools/ so that I
> > could create the wheel with
> >
> >  python3 setup.py sdist bdist_wheel
> >
> > Then when I installed my wheel I got this:
> >
> > $ pip3.5 install --upgrade  /home/barry/wc/hg/hglib/dist/
> > python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
> > Processing /home/barry/wc/hg/hglib/dist/
> > python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
> > Installing collected packages: python-hglib
> >   Found existing installation: python-hglib 2.0
> > DEPRECATION: Uninstalling a distutils installed project (python-hglib)
> > has been deprecated and will be removed in a future version. This is
> > due to the fact that uninstalling a distutils project will only
> > partially uninstall the project.
> > Uninstalling python-hglib-2.0:
> >   Successfully uninstalled python-hglib-2.0
> > Successfully installed python-hglib-2.2-6-0f81ed8e147b-20161107
> >
> > Do you have a plan to update to setuptools?
> >
> > I'm guessing that you want to use distutils to support very old Python
> > versions. If that is true then I'd guess that the setup.py would need
> > to do something like:
> >
> > try:
> >  from setuptools import setup
> > except ImportError:
> >  from distutils.core import setup
>
> I know that Gregory Szorc is building wheels for Mercurial itself. We can
> probably use the same approach used by Mercurial in hglib (whatever
> this approach is). Can you send a patch for hglib?
> 

I don't think a wheel is created for Mercurial. On Windows it's a .exe and
on Fedora the site-specific/mercurial is installed from the RPM.

Looking a bit closer at hglib, I only see a .tar.gz source file on PyPI.
I guess you do not use wheels at all and pip will do the setup.py install
dance for the user.

I could patch to change from distutils to setuptools. But someone who knows
hglib's packaging strategy needs to speak to what is sensible to do.

Barry



Re: [PATCH 11 of 11] util: remove compressorobj API from compression engines

2016-11-07 Thread Pierre-Yves David



On 11/02/2016 01:08 AM, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc 
# Date 1477160459 25200
#  Sat Oct 22 11:20:59 2016 -0700
# Node ID 4f491f7958229b370c5929d2e2599b9ed69d8254
# Parent  fc426af4f25c3403703e913ccb4a6865865fcb02
util: remove compressorobj API from compression engines

It was quite low-level and there are no callers of it now that
everyone is using compressstream()


Wait, what ‽‽ Plot twist!! You should probably mention upfront that
eventually killing this method is one of your goals.



diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -2884,22 +2884,16 @@ class compressormanager(object):

 The passed compression engine is an object with attributes describing
 behavior and methods performing well-defined actions. The following
 attributes are recognized (all are optional):

 * bundletype -- Attribute containing the identifier of this compression
   format as used by bundles.

-* compressorobj -- Method returning an object with ``compress(data)``
-  and ``flush()`` methods. This object and these methods are used to
-  incrementally feed data (presumably uncompressed) chunks into a
-  compressor. Calls to these methods return compressed bytes, which
-  may be 0-length if there is no output for the operation.
-
 * compressstream -- Compress an iterator of chunks and return an
   iterator of compressed chunks.

   Optionally accepts an argument defining how to perform compression.
   Each engine treats this argument differently.

 * decompressorreader -- Method that is used to perform decompression
   on a file object. Argument is an object with a ``read(size)`` method
@@ -2928,19 +2922,16 @@ class compressormanager(object):

 compressionengines = compressormanager()

 class _zlibengine(object):
 @property
 def bundletype(self):
 return 'GZ'

-def compressorobj(self):
-return zlib.compressobj()
-
 def compressstream(self, it, opts=None):
 opts = opts or {}

 z = zlib.compressobj(opts.get('level', -1))
 for chunk in it:
 data = z.compress(chunk)
 # Not all calls to compress emit data. It is cheaper to inspect
 # here than to feed empty chunks through generator.
@@ -2959,19 +2950,16 @@ class _zlibengine(object):

 compressionengines.register('zlib', _zlibengine())

 class _bz2engine(object):
 @property
 def bundletype(self):
 return 'BZ'

-def compressorobj(self):
-return bz2.BZ2Compressor()
-
 def compressstream(self, it, opts=None):
 opts = opts or {}
 z = bz2.BZ2Compressor(opts.get('level', 9))
 for chunk in it:
 data = z.compress(chunk)
 if data:
 yield data

@@ -2987,45 +2975,35 @@ class _bz2engine(object):

 compressionengines.register('bz2', _bz2engine())

 class _truncatedbz2engine(object):
 @property
 def bundletype(self):
 return '_truncatedBZ'

-# We don't implement compressorobj because it is hackily handled elsewhere.
+# We don't implement compressstream because it is hackily handled elsewhere.

 def decompressorreader(self, fh):
 def gen():
 # The input stream doesn't have the 'BZ' header. So add it back.
 d = bz2.BZ2Decompressor()
 d.decompress('BZ')
 for chunk in filechunkiter(fh):
 yield d.decompress(chunk)

 return chunkbuffer(gen())

 compressionengines.register('bz2truncated', _truncatedbz2engine())

-class nocompress(object):
-def compress(self, x):
-return x
-
-def flush(self):
-return ''
-
 class _noopengine(object):
 @property
 def bundletype(self):
 return 'UN'

-def compressorobj(self):
-return nocompress()
-
 def compressstream(self, it, opts=None):
 return it

 def decompressorreader(self, fh):
 return fh

 compressionengines.register('none', _noopengine())




--
Pierre-Yves David


hglib uses distutils that is being deprecated

2016-11-07 Thread Barry Scott
So that I could use the recent improvements to python-hglib I built a wheel.

I needed to patch setup.py to do this /distutils/setuptools/ so that I could
create the wheel with

 python3 setup.py sdist bdist_wheel

Then when I installed my wheel I got this:

$ pip3.5 install --upgrade  /home/barry/wc/hg/hglib/dist/
python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
Processing /home/barry/wc/hg/hglib/dist/
python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
Installing collected packages: python-hglib
  Found existing installation: python-hglib 2.0
DEPRECATION: Uninstalling a distutils installed project (python-hglib) has
been deprecated and will be removed in a future version. This is due to the
fact that uninstalling a distutils project will only partially uninstall the
project.
Uninstalling python-hglib-2.0:
  Successfully uninstalled python-hglib-2.0
Successfully installed python-hglib-2.2-6-0f81ed8e147b-20161107

Do you have a plan to update to setuptools?

I'm guessing that you want to use distutils to support very old Python
versions. If that is true then I'd guess that the setup.py would need to do
something like:

try:
 from setuptools import setup
except ImportError:
 from distutils.core import setup
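
The same fallback can be written so that it is testable outside a real setup.py. The sketch below is only an illustration: load_first is a hypothetical helper, and the only names taken from the message are setuptools and distutils (whose setup function lives in distutils.core).

```python
import importlib


def load_first(candidates, attr='setup'):
    """Return (module_name, attribute) from the first importable candidate."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        return name, getattr(module, attr)
    raise ImportError('none of %s could be imported' % ', '.join(candidates))


# In a setup.py one might then write (not executed here):
#     backend, setup = load_first(('setuptools', 'distutils.core'))
```

Preferring setuptools first is what makes the bdist_wheel command available; the distutils branch only matters on installations without setuptools.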

Barry



Re: [PATCH 08 of 11] util: add a stream compression API to compression engines

2016-11-07 Thread Pierre-Yves David



On 11/02/2016 01:08 AM, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc 
# Date 1477159930 25200
#  Sat Oct 22 11:12:10 2016 -0700
# Node ID 1d4d111b644453acc4893478528a5f2ecd7ca023
# Parent  289da69280d95f1b983fdf9216739411a9953fb6
util: add a stream compression API to compression engines

It is a common pattern throughout the code to perform compression
on an iterator of chunks, yielding an iterator of compressed chunks.
Let's formalize that as part of the compression engine API.


The basic implementation of stream compression will be similar across
engines. Should we maybe have a base class for these objects?
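
A base class along those lines might look like the sketch below. It is an assumption about what such a class could be, not something from the series: streamengine and the _compressor hook are invented names, while the loop body and the default levels (-1 for zlib, 9 for bz2) mirror the patch.

```python
import bz2
import zlib


class streamengine(object):
    """Hypothetical base class sharing the chunk-compression loop."""

    def _compressor(self, opts):
        """Return an object with compress(data) and flush() methods."""
        raise NotImplementedError()

    def compressstream(self, it, opts=None):
        z = self._compressor(opts or {})
        for chunk in it:
            data = z.compress(chunk)
            # Not all calls to compress() emit data; skip empty chunks.
            if data:
                yield data
        yield z.flush()


class zlibengine(streamengine):
    def _compressor(self, opts):
        return zlib.compressobj(opts.get('level', -1))


class bz2engine(streamengine):
    def _compressor(self, opts):
        return bz2.BZ2Compressor(opts.get('level', 9))
```

Each engine then only states how to build its compressor; the no-op engine would still override compressstream to return the iterator unchanged.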




diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -2890,16 +2890,22 @@ class compressormanager(object):
   format as used by bundles.

 * compressorobj -- Method returning an object with ``compress(data)``
   and ``flush()`` methods. This object and these methods are used to
   incrementally feed data (presumably uncompressed) chunks into a
   compressor. Calls to these methods return compressed bytes, which
   may be 0-length if there is no output for the operation.

+* compressstream -- Compress an iterator of chunks and return an
+  iterator of compressed chunks.
+
+  Optionally accepts an argument defining how to perform compression.
+  Each engine treats this argument differently.
+
 * decompressorreader -- Method that is used to perform decompression
   on a file object. Argument is an object with a ``read(size)`` method
   that returns compressed data. Return value is an object with a
   ``read(size)`` that returns uncompressed data.
 """
 bundletype = getattr(engine, 'bundletype', None)
 if bundletype and bundletype in self._bundletypes:
 raise error.Abort(_('bundle type %s is already registered') %
@@ -2925,16 +2931,29 @@ compressionengines = compressormanager()
 class _zlibengine(object):
 @property
 def bundletype(self):
 return 'GZ'

 def compressorobj(self):
 return zlib.compressobj()

+def compressstream(self, it, opts=None):
+opts = opts or {}
+
+z = zlib.compressobj(opts.get('level', -1))
+for chunk in it:
+data = z.compress(chunk)
+# Not all calls to compress emit data. It is cheaper to inspect
+# here than to feed empty chunks through generator.
+if data:
+yield data
+
+yield z.flush()
+
 def decompressorreader(self, fh):
 def gen():
 d = zlib.decompressobj()
 for chunk in filechunkiter(fh):
 yield d.decompress(chunk)

 return chunkbuffer(gen())

@@ -2943,16 +2962,26 @@ compressionengines.register('zlib', _zli
 class _bz2engine(object):
 @property
 def bundletype(self):
 return 'BZ'

 def compressorobj(self):
 return bz2.BZ2Compressor()

+def compressstream(self, it, opts=None):
+opts = opts or {}
+z = bz2.BZ2Compressor(opts.get('level', 9))
+for chunk in it:
+data = z.compress(chunk)
+if data:
+yield data
+
+yield z.flush()
+
 def decompressorreader(self, fh):
 def gen():
 d = bz2.BZ2Decompressor()
 for chunk in filechunkiter(fh):
 yield d.decompress(chunk)

 return chunkbuffer(gen())

@@ -2987,15 +3016,18 @@ class nocompress(object):
 class _noopengine(object):
 @property
 def bundletype(self):
 return 'UN'

 def compressorobj(self):
 return nocompress()

+def compressstream(self, it, opts=None):
+return it
+
 def decompressorreader(self, fh):
 return fh

 compressionengines.register('none', _noopengine())

 # convenient shortcut
 dst = debugstacktrace



--
Pierre-Yves David


Re: [PATCH 02 of 11] util: create new abstraction for compression engines

2016-11-07 Thread Pierre-Yves David



On 11/02/2016 01:08 AM, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc 
# Date 1477966026 25200
#  Mon Oct 31 19:07:06 2016 -0700
# Node ID 4015d575d311cd7ebc923d1320e55a76c655c485
# Parent  60f180c9a030ebcee6c6f4f8584fdb94c73ac337
util: create new abstraction for compression engines

Currently, util.py has "compressors" and "decompressors" dicts
mapping compression algorithms to callables returning object that
perform well-defined operations. In addition, revlog.py has code
for calling into a compressor or decompressor explicitly. And, there
is code in the wire protocol for performing zlib compression.

The 3rd party lz4revlog extension has demonstrated the utility of
supporting alternative compression formats for revlog storage. But
it stops short of supporting lz4 for bundles and the wire protocol.

There are also plans to support zstd as a general compression
replacement.

So, there appears to be a market for a unified API for registering
compression engines. This commit starts the process of establishing
one. It establishes a new container class for holding registered
compression engine objects. Each object declares and supports common
operations via attributes.

The built-in zlib, bz2, truncated bz2, and no-op compression engines
are registered with a singleton instance of this class.

It's worth stating that I'm no fan of the "decompressorreader" API.
But this is what existing consumers expect. My plans are to get
consumers using the new "engines" API then transition them to a
better decompression primitive. This partially explains why I don't
care about the duplicated code pattern used for decompressors
(it is abstracted into _makedecompressor in the existing code).


The plan seems good overall; I have some suggestions on the implementation.


diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -2851,21 +2851,156 @@ class ctxmanager(object):
 exc_type, exc_val, exc_tb = pending = sys.exc_info()
 del self._atexit
 if pending:
 raise exc_val
 return received and suppressed

 # compression utility

+class compressormanager(object):
+    """Holds registrations of various compression engines.
+
+    This class essentially abstracts the differences between compression
+    engines to allow new compression formats to be added easily, possibly from
+    extensions.
+
+    Compressors are registered against the global instance by calling its
+    ``register()`` method.
+    """
+    def __init__(self):
+        self._engines = {}
+        self._bundletypes = {}
+
+    def __getitem__(self, key):
+        return self._engines[key]
+
+    def __contains__(self, key):
+        return key in self._engines
+
+    def __iter__(self):
+        return iter(self._engines.keys())
+
+    def register(self, name, engine):
+        """Register a compression format with the manager.
+
+        The passed compression engine is an object with attributes describing
+        behavior and methods performing well-defined actions. The following
+        attributes are recognized (all are optional):
+
+        * bundletype -- Attribute containing the identifier of this compression
+          format as used by bundles.
+
+        * compressorobj -- Method returning an object with ``compress(data)``
+          and ``flush()`` methods. This object and these methods are used to
+          incrementally feed data (presumably uncompressed) chunks into a
+          compressor. Calls to these methods return compressed bytes, which
+          may be 0-length if there is no output for the operation.
+
+        * decompressorreader -- Method that is used to perform decompression
+          on a file object. Argument is an object with a ``read(size)`` method
+          that returns compressed data. Return value is an object with a
+          ``read(size)`` that returns uncompressed data.
+        """


This method would be a great decorator candidate. Could we get the name 
from the object (as we do for the other properties), or have it declared 
as part of a decorator? (I think the property approach is more 
consistent with the other bits.)
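
To make the suggestion concrete, here is a minimal sketch of decorator-style registration. It assumes the engine exposes its name through a hypothetical ``name`` attribute; the ``compressormanager`` below is a pared-down stand-in for the class in the patch, and ``zlibengine`` is illustrative only.

```python
import zlib


class compressormanager(object):
    """Pared-down stand-in for the manager in the patch (illustration only)."""

    def __init__(self):
        self._engines = {}

    def __getitem__(self, key):
        return self._engines[key]

    def __contains__(self, key):
        return key in self._engines

    def register(self, cls):
        """Class decorator: instantiate ``cls`` and file it under its name.

        The ``name`` attribute is an assumption of this sketch; the patch
        passes the name as a separate argument instead.
        """
        engine = cls()
        if engine.name in self._engines:
            raise ValueError('engine %s is already registered' % engine.name)
        self._engines[engine.name] = engine
        return cls


compengines = compressormanager()


@compengines.register
class zlibengine(object):
    name = 'zlib'
    bundletype = 'GZ'

    def compressorobj(self):
        return zlib.compressobj()
```

With this shape, an extension would only need to define its engine class under the decorator; nothing else has to call ``register()`` explicitly.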


Being a decorator probably means to move away from


+        bundletype = getattr(engine, 'bundletype', None)


Apparently 'bundletype' can be None, but there is no mention of that 
in the documentation. Can the documentation be updated?
Also, I'm not sure why the bundletype attribute is optional. Could we 
just make it mandatory?



+        if bundletype and bundletype in self._bundletypes:
+            raise error.Abort(_('bundle type %s is already registered') %
+                              bundletype)


note: Having the name on the object would allow us to provide a better 
error message here.


  "bundle type X provided by Y is already provided by Z"

This piece of code is also tickling the idea of a ProgrammingError of 
some sort.
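
As a rough sketch of that error message: the registry below remembers which engine name claimed each bundle type, so a conflict can name both parties. ``ProgrammingError`` and ``bundletyperegistry`` are hypothetical names for illustration, not existing Mercurial APIs.

```python
class ProgrammingError(Exception):
    """Hypothetical exception for bugs in calling code, not user errors."""


class bundletyperegistry(object):
    """Tracks which engine name claimed each bundle type (illustration only)."""

    def __init__(self):
        self._bundletypes = {}  # bundle type -> name of the owning engine

    def claim(self, bundletype, enginename):
        owner = self._bundletypes.get(bundletype)
        if owner is not None:
            raise ProgrammingError(
                'bundle type %s provided by %s is already provided by %s'
                % (bundletype, enginename, owner))
        self._bundletypes[bundletype] = enginename
```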



+

Re: [PATCH STABLE V2] hgweb: cache fctx.parents() in annotate command (issue5414)

2016-11-07 Thread Yuya Nishihara
On Sun, 6 Nov 2016 17:01:05 +, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-11-06 11:31:04 +0900:
> > Perhaps fctx.parents() can be property-cached, but we'll need to drop
> > uninteresting chains of parents in fctx.annotate().
> 
> If we go the property-cache approach, I think it's better to cache
> "_adjustedlinkrev". It's at a lower level and covers both "parents"
> and "introrev". Caching "parents" may increase memory usage unintentionally.
> 
> I don't fully get what "uninteresting chains of parents" means here.
> In the annotate case, let's say f1, f2 = f0.parents().
> Both f1 and f2 have _descendantrev set to f0's adjusted linkrev.

As you said, what's in my mind was the memory usage. Caching fctx.parents()
would mean annotate() builds a full link from self to root nodes. Some of
these intermediate nodes aren't useful for hgweb.

> Suppose there is a global cache dict: {(path, filenode, srcrev): linkrev}, I
> think if srcrev=_descendantrev (it's true for f1, f2) and _descendantrev is
> adjusted from the direct child (f0), then it is "interesting" and can be
> cached. This is similar to what marmoute said during the sprint - for the
> log -f or annotate case, once the first fctx's introrev is known, the cache
> can be used to calculate the ancestors' adjusted linkrevs.

Given we have ugly hacks to pass ancestry data around fctx objects, a global
cache might be useful.
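
A minimal sketch of such a cache, assuming the expensive adjustment can be passed in as a callable. ``linkrevcache`` and ``adjust`` are hypothetical names, and the rule for deciding which entries are "interesting" enough to keep is left out.

```python
class linkrevcache(object):
    """Maps (path, filenode, srcrev) to an adjusted linkrev."""

    def __init__(self, adjust):
        # ``adjust`` is a hypothetical callable standing in for the
        # expensive linkrev adjustment.
        self._adjust = adjust
        self._cache = {}
        self.misses = 0

    def lookup(self, path, filenode, srcrev):
        key = (path, filenode, srcrev)
        try:
            return self._cache[key]
        except KeyError:
            # Compute once, then serve later lookups from the dict.
            self.misses += 1
            linkrev = self._cache[key] = self._adjust(path, filenode, srcrev)
            return linkrev
```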
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 8 of 8] py3: have bytes version of sys.argv

2016-11-07 Thread Yuya Nishihara
On Sun, 6 Nov 2016 23:55:50 +0530, Pulkit Goyal wrote:
> On Sun, Nov 6, 2016 at 9:04 AM, Yuya Nishihara  wrote:
> > On Sun, 06 Nov 2016 04:46:25 +0530, Pulkit Goyal wrote:
> >> # HG changeset patch
> >> # User Pulkit Goyal <7895pul...@gmail.com>
> >> # Date 1478387186 -19800
> >> #  Sun Nov 06 04:36:26 2016 +0530
> >> # Node ID b5fc4e71286dd4f6e4f38e0b9fb17f51f1e3
> >> # Parent  6eed3ee0df425da61d03bfe024dd082f3176ce5d
> >> py3: have bytes version of sys.argv
> >>
> >> sys.argv returns unicodes on Python 3. We need a bytes version for us.
> >> There was also a Python bug/feature request asking them to implement
> >> one. They rejected it, and one of the comments says that we can use
> >> fsencode() to get a bytes version of sys.argv. I am not sure about its
> >> correctness, though.
> >>
> >> Link to the comment: http://bugs.python.org/issue8776#msg217416
> >>
> >> After this patch we will have pycompat.sysargv, which gives us a bytes
> >> version of sys.argv. If this patch goes in, I would like to make the transformer
> >> rewrite sys.argv to pycompat.sysargv because there are a lot of occurrences.
> >>
> >> diff -r 6eed3ee0df42 -r b5fc4e71286d mercurial/pycompat.py
> >> --- a/mercurial/pycompat.py   Sun Nov 06 04:17:19 2016 +0530
> >> +++ b/mercurial/pycompat.py   Sun Nov 06 04:36:26 2016 +0530
> >> @@ -41,6 +41,7 @@
> >>  osname = os.name.encode('ascii')
> >>  ospathsep = os.pathsep.encode('ascii')
> >>  ossep = os.sep.encode('ascii')
> >> +sysargv = list(map(os.fsencode, sys.argv))
> >
> > Looks good to me. Can you add a comment why we can use os.fsencode() here
> > (and the weirdness of Python 3 on Unix.) We might need a Windows workaround
> > because the situation is slightly different, but we wouldn't want to care
> > for now.
> 
> Well, I will resend this patch because I am still not sure about its
> correctness. I followed that issue, where Victor Stinner, who wrote
> os.environb, made this comment. There are a few doubts/confusions, or
> maybe I just want to confirm it with MJ once.

It's generally wrong to assume argv is in filesystem encoding, but I think
that's okay for Python 3 on Unix. It builds "wchar_t argv" from "char argv"
with Py_DecodeLocale(), which would be identical to fsdecode() on Unix.

https://hg.python.org/cpython/file/v3.5.1/Programs/python.c#l55

On Windows, the native argv appears to be wchar_t, so we'll need a different
hack to simulate the Python 2 (i.e. ANSI Win32 API) behavior.
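
For the Unix case, the round trip can be sketched as follows. This is only an illustration of why os.fsencode() recovers the original bytes; the sample byte string is made up, and the Windows behaviour is not covered.

```python
import os
import sys

# Bytes that are not valid UTF-8, as a shell might pass them in argv.
raw = b'caf\xe9.txt'

# os.fsdecode() mirrors what Python 3 does to argv at startup on Unix:
# bytes that cannot be decoded survive as lone surrogates (surrogateescape).
decoded = os.fsdecode(raw)

# os.fsencode() reverses that step, recovering the original bytes.
roundtripped = os.fsencode(decoded)

# The same transformation applied to the real argument list, as in the patch.
sysargv = list(map(os.fsencode, sys.argv))
```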


Re: [PATCH 1 of 6 RFC] manifest: introduce an accessor class for manifests

2016-11-07 Thread Pierre-Yves David



On 11/05/2016 02:05 PM, Yuya Nishihara wrote:

On Thu, 3 Nov 2016 15:27:37 -0700, Durham Goode wrote:

# HG changeset patch
# User Durham Goode 
# Date 1478208817 25200
#  Thu Nov 03 14:33:37 2016 -0700
# Branch stable
# Node ID 1788ee9e1df92ac94b9be84eac6d16e3bad903a9
# Parent  b9f7b0c10027764cee77f9c6d61877fcffea837f
manifest: introduce an accessor class for manifests

This introduces a revlogaccessor class which can be used to allow multiple
objects to hold an auto-invalidating reference to a revlog, without having to hold
a reference to the actual repo object. Future patches will switch repo.manifest
and repo.manifestlog to access the manifest through this accessor. This will fix
the circular reference caused by manifestlog and manifestctx holding a reference
to the repo.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -514,6 +514,11 @@ class localrepository(object):
         # manifest creation.
         return manifest.manifest(self.svfs)

+    @unfilteredpropertycache
+    def manifestaccessor(self):
+        return revlogaccessor('00manifest.i', self.svfs,
+                              self._constructmanifest)


Honestly I don't get why manifestlog and manifestctxs have to live longer
than the other repo properties.


I'm also a bit curious about that.


But suppose that is necessary, I agree we'll
need this kind of a wrapper.


Any reason why we are using the wrapper approach over a weak 
reference? Weakrefs are not great, but we use them in multiple spots when 
needed. It seems this would be simpler than the current approach, but I 
might be missing something.
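
For comparison, a minimal sketch of the weak reference alternative. ``fakerepo`` and ``fakemanifestlog`` are stand-ins invented for this illustration, not the real classes.

```python
import weakref


class fakemanifestlog(object):
    """Stand-in for manifestlog (illustration only)."""

    def __init__(self, repo):
        # Hold the repo weakly so repo -> manifestlog -> repo is not a
        # strong reference cycle.
        self._reporef = weakref.ref(repo)

    @property
    def repo(self):
        repo = self._reporef()
        if repo is None:
            raise RuntimeError('repository has been garbage collected')
        return repo


class fakerepo(object):
    """Stand-in for a repository object (illustration only)."""

    def __init__(self):
        self.manifestlog = fakemanifestlog(self)
```

The cost is that every access has to handle a dead reference, which is what the ``repo`` property does here.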



Maybe we can move the accessor to the manifestlog,
but still the accessor will have to be shared (and updated transparently) by
the manifestlog and its cachable manifestctxs.


+def revlogaccessor(filename, opener, constructor):
+    """Creates an accessor that provides cached and invalidated access to a
+    revlog, via instance.revlog. This is useful for letting multiple objects
+    hold a reference to the revlog, without having to hold a possibly-circular
+    reference to the actual repository.  """
+
+    # We have to use a runtime type here, because the only way to create a
+    # property is to put it on a class itself, and the property is dynamically
+    # defined by the filename parameter.
+    class accessor(object):


Perhaps we could refactor filecache to avoid dynamically creating a class, but
that would be a minor issue of this RFC series.


--
Pierre-Yves David


Re: [PATCH 3 of 4] tests: merge 'test-push-validation.t' into 'test-push.t'

2016-11-07 Thread Pierre-Yves David



On 11/04/2016 08:15 PM, timeless wrote:

Pierre-Yves David wrote:

tests: merge 'test-push-validation.t' into 'test-push.t'

That test file is very small and is merged into the new 'test-push.t'. No logic
is changed, but repository names are updated to avoid collisions.

We don't register this as a copy because it is actually a "ypoc" merging two files
together without replacing the destination, and Mercurial cannot express that.


Actually, it can:

0: a b
0->1: rename a->d
0->2: rename b->d
1+2->3: merge d

This should give you the history you want in `hg ann`.


Huh, good point, but the UI only offers it through merge. We could make 
copy able to record them, but I'm not sure it's a good idea.


Cheers,

--
Pierre-Yves David


Re: [PATCH 2 of 2] commands: introduce `hg display`

2016-11-07 Thread Denis Laxalde

Gregory Szorc a écrit :

For the command name, we would have preferred `hg show` because it is
shorter and not ambiguous with any other core command. However, a
number of people have created `hg show` as effectively an alias to
`hg export`. And, some were concerned that Git users used to `git show`
being equivalent to `hg export` would be confused by a `hg show` doing
something different.


`git show` is not equivalent to `hg export`, quoting git-show(1):

   Shows one or more objects (blobs, trees, tags and commits).

   For commits it shows the log message and textual diff. It also
   presents the merge commit in a special format as produced by git
   diff-tree --cc.

   For tags, it shows the tag message and the referenced objects.

   For trees, it shows the names (equivalent to git ls-tree with
   --name-only).

   For plain blobs, it shows the plain contents.

So only the first case is equivalent to `hg export` (or probably more
`hg log -vpr`). Other cases are quite close to the "view" concept
introduced here, as far as I understand.

Then if a revision can be registered as a view, `hg show` could just be
a plain replacement to the aforementioned alias I guess.

Given this and the conflict with `hg diff`, could we reconsider
the command name?


--
Denis Laxalde
Logilab http://www.logilab.fr


Re: [PATCH 1 of 2] commands: add "di" alias for "diff"

2016-11-07 Thread Denis Laxalde

Jun Wu a écrit :

Therefore I'm very sensitive about this. I think we should always make sure
"d" = "diff" (although the complaint was only about "di").


I'm also quite used to `hg d`, for what it's worth.


Re: [PATCH] bdiff: replace hash algorithm

2016-11-07 Thread Gregory Szorc


> On Nov 6, 2016, at 19:06, Gregory Szorc  wrote:
> 
> # HG changeset patch
> # User Gregory Szorc 
> # Date 1478487117 28800
> #  Sun Nov 06 18:51:57 2016 -0800
> # Node ID bb7c6d6f4a10e80ff4bdf88919692f08497d2d66
> # Parent  1c7269484883804b6f960e87309169ef4ae85043
> bdiff: replace hash algorithm
> 
> This patch replaces lyhash with the hash algorithm used by diffutils.
> The algorithm has its origins in Git commit 2e9d1410, which is all the
> way back from 1992. The license header in the code at that revision
> is GPL v2.
> 
> I have not performed an extensive analysis of the distribution
> (and therefore buckets) of hash output. However, `hg perfbdiff`
> gives some clear wins. I'd like to think that if it is good enough
> for diffutils it is good enough for us?

Searching the Internets seems to reveal that xxHash is the state of the art for 
fast string hashing with great distribution. We'll have a copy of xxHash 
vendored as part of zstd and it should be relatively easy to plug in then.

Honestly, I'm not sure if we should take the quick win or hold out for xxHash 
in a few weeks (assuming my compression engine series and zstd vendoring move 
forward...).

> 
> From the mozilla-unified repository:
> 
> $ perfbdiff -m 3041e4d59df2
> ! wall 0.053271 comb 0.06 user 0.06 sys 0.00 (best of 100)
> ! wall 0.035827 comb 0.04 user 0.04 sys 0.00 (best of 100)
> 
> $ perfbdiff 0e9928989e9c --alldata --count 100
> ! wall 6.204277 comb 6.20 user 6.20 sys 0.00 (best of 3)
> ! wall 4.309710 comb 4.30 user 4.30 sys 0.00 (best of 3)
> 
> From the hg repo:
> 
> $ perfbdiff 35000 --alldata --count 1000
> ! wall 0.660358 comb 0.66 user 0.66 sys 0.00 (best of 15)
> ! wall 0.534092 comb 0.53 user 0.53 sys 0.00 (best of 19)
> 
> Looking at the generated assembly and statistical profiler output
> from the kernel level, I believe there is room to make this function
> even faster. Namely, we're still consuming data character by character
> instead of at the word level. This translates to more loop iterations
> and more instructions.
> 
> At this juncture though, the real performance killer is that we're
> hashing every line. We should get a significant speedup if we change
> the algorithm to find the longest prefix, longest suffix, treat those
> as single "lines" and then only do the line splitting and hashing on
> the parts that are different. That will require a lot of C code,
> however. I'm optimistic this approach could result in a ~2x speedup.
> 
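
To illustrate the prefix/suffix idea in Python rather than C, here is a rough sketch that finds the longest common prefix and suffix at the byte level. ``trimcommon`` is a hypothetical helper; a real implementation would also need to snap both boundaries to line breaks, which is left out here.

```python
def trimcommon(a, b):
    """Find the longest common prefix and suffix of ``a`` and ``b``.

    Returns (prefixlen, suffixlen). Only the middle parts still need
    line splitting and hashing.
    """
    limit = min(len(a), len(b))

    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1

    # The suffix must not overlap the prefix in the shorter input.
    suffix = 0
    while (suffix < limit - prefix
           and a[len(a) - 1 - suffix] == b[len(b) - 1 - suffix]):
        suffix += 1

    return prefix, suffix
```

For two revisions of a file that differ in a single line, the remaining middle sections are tiny compared to the full inputs.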
> diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
> --- a/mercurial/bdiff.c
> +++ b/mercurial/bdiff.c
> @@ -17,6 +17,10 @@
> #include "bitmanipulation.h"
> #include "bdiff.h"
> 
> +/* Hash implementation from diffutils */
> +#define ROL(v, n) ((v) << (n) | (v) >> (sizeof(v) * CHAR_BIT - (n)))
> +#define HASH(h, c) ((c) + ROL(h ,7))
> +
> struct pos {
>int pos, len;
> };
> @@ -44,8 +48,7 @@ int bdiff_splitlines(const char *a, ssiz
>/* build the line array and calculate hashes */
>hash = 0;
>for (p = a; p < a + len; p++) {
> -/* Leonid Yuriev's hash */
> -hash = (hash * 1664525) + (unsigned char)*p + 1013904223;
> +hash = HASH(hash, *p);
> 
>if (*p == '\n' || p == plast) {
>l->hash = hash;
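
For readers who want to experiment with the two hashes outside of C, here is a sketch of both in Python. It assumes a 32-bit unsigned hash and treats each input byte as unsigned; the function names are made up for this illustration.

```python
MASK = 0xffffffff  # assumption: the hash is a 32-bit unsigned value


def rol(v, n, bits=32):
    """Rotate the ``bits``-wide value ``v`` left by ``n`` positions."""
    return ((v << n) | (v >> (bits - n))) & MASK


def linehash(line):
    """Rotate-and-add hash, following the HASH() macro in the patch."""
    h = 0
    for c in bytearray(line):
        h = (c + rol(h, 7)) & MASK
    return h


def oldlinehash(line):
    """The multiply-and-add hash that the patch removes."""
    h = 0
    for c in bytearray(line):
        h = (h * 1664525 + c + 1013904223) & MASK
    return h
```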