[PATCH 07 of 10 V2] util: add a stream compression API to compression engines
# HG changeset patch
# User Gregory Szorc
# Date 1478573827 28800
#      Mon Nov 07 18:57:07 2016 -0800
# Node ID fa24595b79b603ff7be6f32b849c07ddfdee3da4
# Parent  8672777162085c92b836ce1e97ca254734b0fae0
util: add a stream compression API to compression engines

It is a common pattern throughout the code to perform compression
on an iterator of chunks, yielding an iterator of compressed chunks.
Let's formalize that as part of the compression engine API.

The zlib and bzip2 implementations allow an optional "level" option
to control the compression level. The default values are the same
as what the Python modules use. This option will be used in subsequent
patches.

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -2966,10 +2966,22 @@ class compressionengine(object):
         exclude the name from external usage, set the first element to ``None``.
 
         If bundle compression is supported, the class must also implement
-        ``compressorobj`` and `decompressorreader``.
+        ``compressstream``, ``compressorobj`` and `decompressorreader``.
         """
         return None
 
+    def compressstream(self, it, opts=None):
+        """Compress an iterator of chunks.
+
+        The method receives an iterator (ideally a generator) of chunks of
+        bytes to be compressed. It returns an iterator (ideally a generator)
+        of bytes of chunks representing the compressed output.
+
+        Optionally accepts an argument defining how to perform compression.
+        Each engine treats this argument differently.
+        """
+        raise NotImplementedError()
+
     def compressorobj(self):
         """(Temporary) Obtain an object used for compression.
 
@@ -2997,6 +3009,19 @@ class _zlibengine(compressionengine):
     def compressorobj(self):
         return zlib.compressobj()
 
+    def compressstream(self, it, opts=None):
+        opts = opts or {}
+
+        z = zlib.compressobj(opts.get('level', -1))
+        for chunk in it:
+            data = z.compress(chunk)
+            # Not all calls to compress emit data. It is cheaper to inspect
+            # here than to feed empty chunks through generator.
+            if data:
+                yield data
+
+        yield z.flush()
+
     def decompressorreader(self, fh):
         def gen():
             d = zlib.decompressobj()
@@ -3017,6 +3042,16 @@ class _bz2engine(compressionengine):
     def compressorobj(self):
         return bz2.BZ2Compressor()
 
+    def compressstream(self, it, opts=None):
+        opts = opts or {}
+        z = bz2.BZ2Compressor(opts.get('level', 9))
+        for chunk in it:
+            data = z.compress(chunk)
+            if data:
+                yield data
+
+        yield z.flush()
+
     def decompressorreader(self, fh):
         def gen():
             d = bz2.BZ2Decompressor()
@@ -3065,6 +3100,9 @@ class _noopengine(compressionengine):
     def compressorobj(self):
         return nocompress()
 
+    def compressstream(self, it, opts=None):
+        return it
+
     def decompressorreader(self, fh):
         return fh
 
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
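For readers who want to try the pattern outside Mercurial, here is a minimal standalone sketch of the chunk-iterator compression described in this patch. It mirrors the zlib engine above but as a free function; the sample chunks and the round-trip check are illustrative assumptions, not part of the patch.

```python
import zlib


def compressstream(it, opts=None):
    """Yield compressed chunks for an iterator of byte chunks (sketch)."""
    opts = opts or {}
    # -1 asks zlib for its default level, matching the patch's default.
    z = zlib.compressobj(opts.get('level', -1))
    for chunk in it:
        data = z.compress(chunk)
        # compress() may buffer input and return b''; skip empty output.
        if data:
            yield data
    # flush() emits whatever the compressor still holds.
    yield z.flush()


# Illustrative input only: three chunks, one of them empty.
chunks = [b'hello ' * 100, b'', b'world' * 100]
compressed = b''.join(compressstream(iter(chunks), {'level': 9}))
roundtrip = zlib.decompress(compressed)
```

Because the function is a generator, nothing is compressed until the caller starts iterating, so a consumer can write each chunk to a file or socket as soon as it is produced.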
Re: [PATCH 2 of 2] commands: introduce `hg display`
On Mon, Nov 7, 2016 at 2:03 AM, Denis Laxalde wrote:

> Gregory Szorc wrote:
>
>> For the command name, we would have preferred `hg show` because it is
>> shorter and not ambiguous with any other core command. However, a
>> number of people have created `hg show` as effectively an alias to
>> `hg export`. And, some were concerned that Git users used to `git show`
>> being equivalent to `hg export` would be confused by a `hg show` doing
>> something different.
>>
>
> `git show` is not equivalent to `hg export`, quoting git-show(1):
>
>    Shows one or more objects (blobs, trees, tags and commits).
>
>    For commits it shows the log message and textual diff. It also
>    presents the merge commit in a special format as produced by git
>    diff-tree --cc.
>
>    For tags, it shows the tag message and the referenced objects.
>
>    For trees, it shows the names (equivalent to git ls-tree with
>    --name-only).
>
>    For plain blobs, it shows the plain contents.
>

TIL. I've only ever used `git show` for the "show a commit representation"
use case and `git cat-file` for displaying low-level objects.

>
> So only the first case is equivalent to `hg export` (or probably more
> `hg log -vpr`). Other cases are quite close to the "view" concept
> introduced here, as far as I understand.
>
> Then if a revision can be registered as a view, `hg show` could just be
> a plain replacement to the aforementioned alias I guess.
>
> Given this and the conflict with `hg diff`, could we reconsider
> the command name?
>

That is an interesting proposal. But I'm concerned with overlapping
namespaces. What values do we allow for the non-view behavior? Hash
fragments? Names (bookmarks, branches, tags)? If we allow names, what
happens when a name in a repo conflicts with a registered view name? What
happens if a view name conflicts with a changeset prefix? Of course, to
know if there is a collision you have to load names. That means (slightly
more) overhead to run the command.

FWIW, my idea for this command was to show representations of multiple
things. I'm willing to entertain the idea of "show me single entity X"
(changeset, tag, bookmark, etc). The easy solution is an argument to a
view (`hg display tag my-tag`). Things get harder when we merge namespaces.
Re: [PATCH 2 of 2] commands: introduce `hg display`
On Sun, Nov 6, 2016 at 1:52 AM, timeless wrote:

> Gregory Szorc wrote:
> > @@ -2019,6 +2026,13 @@ Dish up an empty repo; serve it cold.
> >    diff repository (or selected files)
> >
> >
> > +
> > + display
> > +
> > +
> > + show various repository information
> > +
> > +
> >
> >    export
> >
>
> Will /help/display list the views it supports?
>

It should. I forgot to implement that. It can be done as a follow-up
easily enough.
[PATCH 04 of 10 V2] bundle2: use compression engines API to obtain decompressor
# HG changeset patch
# User Gregory Szorc
# Date 1478572608 28800
#      Mon Nov 07 18:36:48 2016 -0800
# Node ID 439b96dc6e896d9875ae24fae4a59e41b00b63c6
# Parent  8b1e72914d246af5703ea5bad9bd3cb051463164
bundle2: use compression engines API to obtain decompressor

Like the recent change for the compressor side, this too is
relatively straightforward. We now store a compression engine
on the instance instead of a low-level decompressor. Again, this
will allow us to easily transition to different compression engine
APIs when they are implemented.

diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
--- a/mercurial/bundle2.py
+++ b/mercurial/bundle2.py
@@ -681,7 +681,7 @@ class unbundle20(unpackermixin):
     def __init__(self, ui, fp):
         """If header is specified, we do not read it out of the stream."""
         self.ui = ui
-        self._decompressor = util.decompressors[None]
+        self._compengine = util.compengines.forbundletype('UN')
         self._compressed = None
         super(unbundle20, self).__init__(fp)
 
@@ -755,9 +755,9 @@ class unbundle20(unpackermixin):
             params = self._readexact(paramssize)
             self._processallparams(params)
             yield params
-            assert self._decompressor is util.decompressors[None]
+            assert self._compengine.bundletype == 'UN'
         # From there, payload might need to be decompressed
-        self._fp = self._decompressor(self._fp)
+        self._fp = self._compengine.decompressorreader(self._fp)
         emptycount = 0
         while emptycount < 2:
             # so we can brainlessly loop
@@ -781,7 +781,7 @@ class unbundle20(unpackermixin):
         # make sure param have been loaded
         self.params
         # From there, payload need to be decompressed
-        self._fp = self._decompressor(self._fp)
+        self._fp = self._compengine.decompressorreader(self._fp)
         indebug(self.ui, 'start extraction of bundle2 parts')
         headerblock = self._readpartheader()
         while headerblock is not None:
@@ -823,10 +823,10 @@ def b2streamparamhandler(name):
 @b2streamparamhandler('compression')
 def processcompression(unbundler, param, value):
     """read compression parameter and install payload decompression"""
-    if value not in util.decompressors:
+    if value not in util.compengines.supportedbundletypes:
         raise error.BundleUnknownFeatureError(params=(param,),
                                               values=(value,))
-    unbundler._decompressor = util.decompressors[value]
+    unbundler._compengine = util.compengines.forbundletype(value)
     if value is not None:
         unbundler._compressed = True
 
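The patch relies on each engine exposing a ``decompressorreader(fh)`` method that wraps a file object yielding compressed bytes and returns an object whose ``read(size)`` yields uncompressed bytes. The following is a rough standalone sketch of that contract for zlib only. The class name ``DecompressorReader`` and the ``chunksize`` parameter are hypothetical; Mercurial's actual implementation is built differently (a generator fed into a chunk buffer).

```python
import io
import zlib


class DecompressorReader:
    """Hypothetical read(size) wrapper over a zlib-compressed file object."""

    def __init__(self, fh, chunksize=4096):
        self._fh = fh
        self._d = zlib.decompressobj()
        self._buf = b''
        self._eof = False
        self._chunksize = chunksize  # assumed read granularity

    def read(self, size):
        # Pull compressed input until enough output is buffered or the
        # underlying stream is exhausted.
        while len(self._buf) < size and not self._eof:
            chunk = self._fh.read(self._chunksize)
            if chunk:
                self._buf += self._d.decompress(chunk)
            else:
                self._buf += self._d.flush()
                self._eof = True
        data, self._buf = self._buf[:size], self._buf[size:]
        return data


# Illustrative usage: decompress a payload in 1000-byte reads.
payload = b'abc' * 5000 + b'tail'
reader = DecompressorReader(io.BytesIO(zlib.compress(payload)))
pieces = []
while True:
    piece = reader.read(1000)
    if not piece:
        break
    pieces.append(piece)
result = b''.join(pieces)
```

The point of the wrapper shape is that callers such as ``unbundle20`` can keep reading from ``self._fp`` without knowing which engine, if any, sits underneath.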
[PATCH 05 of 10 V2] changegroup: use compression engines API
# HG changeset patch
# User Gregory Szorc
# Date 1478572693 28800
#      Mon Nov 07 18:38:13 2016 -0800
# Node ID 5642a2b769a73befd6c3e3539e7e373a20392f3a
# Parent  439b96dc6e896d9875ae24fae4a59e41b00b63c6
changegroup: use compression engines API

The new API doesn't have the equivalence for None and 'UN' so we
introduce code to use 'UN' explicitly.

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -137,14 +137,16 @@ class cg1unpacker(object):
     _grouplistcount = 1 # One list of files after the manifests
 
     def __init__(self, fh, alg, extras=None):
-        if alg == 'UN':
-            alg = None # get more modern without breaking too much
-        if not alg in util.decompressors:
+        if alg is None:
+            alg = 'UN'
+        if alg not in util.compengines.supportedbundletypes:
             raise error.Abort(_('unknown stream compression type: %s')
                               % alg)
         if alg == 'BZ':
             alg = '_truncatedBZ'
-        self._stream = util.decompressors[alg](fh)
+
+        compengine = util.compengines.forbundletype(alg)
+        self._stream = compengine.decompressorreader(fh)
         self._type = alg
         self.extras = extras or {}
         self.callback = None
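To make the None versus 'UN' point concrete, here is a small sketch of a registry keyed by bundle type, with the caller normalizing ``None`` before the lookup as this patch does. The ``Engine`` and ``EngineRegistry`` classes, the ``lookup`` helper, and the engine names are simplified stand-ins invented for illustration; only ``supportedbundletypes`` and ``forbundletype`` are names taken from the series.

```python
class Engine:
    """Hypothetical stand-in for a compression engine."""

    def __init__(self, name, bundletype):
        self.name = name
        self.bundletype = bundletype


class EngineRegistry:
    """Simplified sketch of a registry indexed by internal bundle type."""

    def __init__(self):
        self._engines = {}
        self._bundletypes = {}  # bundle type -> engine name

    def register(self, engine):
        if engine.name in self._engines:
            raise ValueError('engine %s already registered' % engine.name)
        if engine.bundletype in self._bundletypes:
            raise ValueError('bundle type %s already registered'
                             % engine.bundletype)
        self._bundletypes[engine.bundletype] = engine.name
        self._engines[engine.name] = engine

    @property
    def supportedbundletypes(self):
        return set(self._bundletypes)

    def forbundletype(self, bundletype):
        # Raises KeyError for unknown types, so callers check membership.
        return self._engines[self._bundletypes[bundletype]]


def lookup(registry, alg):
    """Normalize None to 'UN', then resolve the engine (illustrative)."""
    if alg is None:
        alg = 'UN'
    if alg not in registry.supportedbundletypes:
        raise ValueError('unknown stream compression type: %s' % alg)
    return registry.forbundletype(alg)


registry = EngineRegistry()
for name, btype in [('none', 'UN'), ('zlib', 'GZ'), ('bz2', 'BZ')]:
    registry.register(Engine(name, btype))
```

With a single key per engine, the "no compression" case has exactly one spelling inside the registry, and legacy callers that still pass ``None`` are handled at the boundary.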
[PATCH 02 of 10 V2] bundle2: use new compression engine API for compression
# HG changeset patch # User Gregory Szorc# Date 1478572543 28800 # Mon Nov 07 18:35:43 2016 -0800 # Node ID 9c4c59fa0b44412bd59170f850463c68497b43da # Parent f3c9da54ff5e23becaa4d0e90a20c9de704a70ba bundle2: use new compression engine API for compression Now that we have a new API to define compression engines, let's put it to use! The new code stores a reference to the compression engine instead of a low-level compressor object. This will allow us to more easily transition to different APIs on the compression engine interface once we implement them. As part of this, we change the registration in bundletypes to use 'UN' instead of None. Previously, util.compressors had the no-op compressor registered under both the 'UN' and None keys. Since we're switching to a new API, I don't see the point in carrying this dual registration forward. diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py --- a/mercurial/bundle2.py +++ b/mercurial/bundle2.py @@ -485,11 +485,11 @@ def encodecaps(caps): return '\n'.join(chunks) bundletypes = { -"": ("", None), # only when using unbundle on ssh and old http servers +"": ("", 'UN'), # only when using unbundle on ssh and old http servers # since the unification ssh accepts a header but there # is no capability signaling it. 
"HG20": (), # special-cased below -"HG10UN": ("HG10UN", None), +"HG10UN": ("HG10UN", 'UN'), "HG10BZ": ("HG10", 'BZ'), "HG10GZ": ("HG10GZ", 'GZ'), } @@ -511,7 +511,7 @@ class bundle20(object): self._params = [] self._parts = [] self.capabilities = dict(capabilities) -self._compressor = util.compressors[None]() +self._compengine = util.compengines.forbundletype('UN') def setcompression(self, alg): """setup core part compression to """ @@ -519,7 +519,7 @@ class bundle20(object): return assert not any(n.lower() == 'Compression' for n, v in self._params) self.addparam('Compression', alg) -self._compressor = util.compressors[alg]() +self._compengine = util.compengines.forbundletype(alg) @property def nbparts(self): @@ -572,11 +572,12 @@ class bundle20(object): if param: yield param # starting compression +compressor = self._compengine.compressorobj() for chunk in self._getcorechunk(): -data = self._compressor.compress(chunk) +data = compressor.compress(chunk) if data: yield data -yield self._compressor.flush() +yield compressor.flush() def _paramchunk(self): """return a encoded version of all stream parameters""" @@ -1318,18 +1319,19 @@ def writebundle(ui, cg, filename, bundle raise error.Abort(_('old bundle types only supports v1 ' 'changegroups')) header, comp = bundletypes[bundletype] -if comp not in util.compressors: +if comp not in util.compengines.supportedbundletypes: raise error.Abort(_('unknown stream compression type: %s') % comp) -z = util.compressors[comp]() +compengine = util.compengines.forbundletype(comp) +compressor = compengine.compressorobj() subchunkiter = cg.getchunks() def chunkiter(): yield header for chunk in subchunkiter: -data = z.compress(chunk) +data = compressor.compress(chunk) if data: yield data -yield z.flush() +yield compressor.flush() chunkiter = chunkiter() # parse the changegroup data, otherwise we will block ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 01 of 10 V2] util: create new abstraction for compression engines
# HG changeset patch # User Gregory Szorc# Date 1478572299 28800 # Mon Nov 07 18:31:39 2016 -0800 # Node ID f3c9da54ff5e23becaa4d0e90a20c9de704a70ba # Parent 0911191dc4c97cbc8334c8b83782e8134bf621f0 util: create new abstraction for compression engines Currently, util.py has "compressors" and "decompressors" dicts mapping compression algorithms to callables returning objects that perform well-defined operations. In addition, revlog.py has code for calling into a compressor or decompressor explicitly. And, there is code in the wire protocol for performing zlib compression. The 3rd party lz4revlog extension has demonstrated the utility of supporting alternative compression formats for revlog storage. But it stops short of supporting lz4 for bundles and the wire protocol. There are also plans to support zstd as a general compression replacement. So, there appears to be a market for a unified API for registering compression engines. This commit starts the process of establishing one. This commit establishes a base class/interface for defining compression engines and how they will be used. A collection class to hold references to registered compression engines has also been introduced. The built-in zlib, bz2, truncated bz2, and no-op compression engines are registered with a singleton instance of the collection class. The compression engine API will change once consumers are ported to the new API and some common patterns can be simplified at the engine API level. So don't get too attached to the API... diff --git a/mercurial/util.py b/mercurial/util.py --- a/mercurial/util.py +++ b/mercurial/util.py @@ -2856,13 +2856,219 @@ class ctxmanager(object): raise exc_val return received and suppressed -# compression utility +# compression code + +class compressormanager(object): +"""Holds registrations of various compression engines. 
+ +This class essentially abstracts the differences between compression +engines to allow new compression formats to be added easily, possibly from +extensions. + +Compressors are registered against the global instance by calling its +``register()`` method. +""" +def __init__(self): +self._engines = {} +# Bundle spec human name to engine name. +self._bundlenames = {} +# Internal bundle identifier to engine name. +self._bundletypes = {} + +def __getitem__(self, key): +return self._engines[key] + +def __contains__(self, key): +return key in self._engines + +def __iter__(self): +return iter(self._engines.keys()) + +def register(self, engine): +"""Register a compression engine with the manager. + +The argument must be a ``compressionengine`` instance. +""" +if not isinstance(engine, compressionengine): +raise ValueError(_('argument must be a compressionengine')) + +name = engine.name() + +if name in self._engines: +raise error.Abort(_('compression engine %s already registered') % + name) + +bundleinfo = engine.bundletype() +if bundleinfo: +bundlename, bundletype = bundleinfo + +if bundlename in self._bundlenames: +raise error.Abort(_('bundle name %s already registered') % + bundlename) +if bundletype in self._bundletypes: +raise error.Abort(_('bundle type %s already registered by %s') % + (bundletype, self._bundletypes[bundletype])) + +# No external facing name declared. +if bundlename: +self._bundlenames[bundlename] = name + +self._bundletypes[bundletype] = name + +self._engines[name] = engine + +@property +def supportedbundlenames(self): +return set(self._bundlenames.keys()) + +@property +def supportedbundletypes(self): +return set(self._bundletypes.keys()) + +def forbundlename(self, bundlename): +"""Obtain a compression engine registered to a bundle name. + +Will raise KeyError if the bundle type isn't registered. 
+""" +return self._engines[self._bundlenames[bundlename]] + +def forbundletype(self, bundletype): +"""Obtain a compression engine registered to a bundle type. + +Will raise KeyError if the bundle type isn't registered. +""" +return self._engines[self._bundletypes[bundletype]] + +compengines = compressormanager() + +class compressionengine(object): +"""Base class for compression engines. + +Compression engines must implement the interface defined by this class. +""" +def name(self): +"""Returns the name of the compression engine. + +This is the key the engine is registered under. + +This method must be implemented. +""" +raise
[PATCH 08 of 10 V2] bundle2: use compressstream compression engine API
# HG changeset patch
# User Gregory Szorc
# Date 1478573197 28800
#      Mon Nov 07 18:46:37 2016 -0800
# Node ID fc931794a250e605717cc066f26512c0dcc81224
# Parent  fa24595b79b603ff7be6f32b849c07ddfdee3da4
bundle2: use compressstream compression engine API

Compression engines now have an API for compressing a stream of
chunks. Switch to it and make low-level compression code disappear.

diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
--- a/mercurial/bundle2.py
+++ b/mercurial/bundle2.py
@@ -571,13 +571,8 @@ class bundle20(object):
         yield _pack(_fstreamparamsize, len(param))
         if param:
             yield param
-        # starting compression
-        compressor = self._compengine.compressorobj()
-        for chunk in self._getcorechunk():
-            data = compressor.compress(chunk)
-            if data:
-                yield data
-        yield compressor.flush()
+        for chunk in self._compengine.compressstream(self._getcorechunk()):
+            yield chunk
 
     def _paramchunk(self):
         """return a encoded version of all stream parameters"""
@@ -1323,15 +1318,10 @@ def writebundle(ui, cg, filename, bundle
             raise error.Abort(_('unknown stream compression type: %s')
                               % comp)
         compengine = util.compengines.forbundletype(comp)
-        compressor = compengine.compressorobj()
-        subchunkiter = cg.getchunks()
         def chunkiter():
             yield header
-            for chunk in subchunkiter:
-                data = compressor.compress(chunk)
-                if data:
-                    yield data
-            yield compressor.flush()
+            for chunk in compengine.compressstream(cg.getchunks()):
+                yield chunk
         chunkiter = chunkiter()
 
         # parse the changegroup data, otherwise we will block
Re: [PATCH 09 of 11] bundle2: use compressstream compression engine API
On Mon, Nov 7, 2016 at 6:13 AM, Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:

>
>
> On 11/02/2016 01:08 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc
>> # Date 1477160145 25200
>> #      Sat Oct 22 11:15:45 2016 -0700
>> # Node ID 03555032b7e3bc7192fd8bebf6af3f05b1e70516
>> # Parent  1d4d111b644453acc4893478528a5f2ecd7ca023
>> bundle2: use compressstream compression engine API
>>
>> Compression engines now have an API for compressing a stream of
>> chunks. Switch to it and make low-level compression code disappear.
>>
>
> Do we get any performance benefit from this? I know you have spent a lot
> of time tracking performance gains in bundle creation/application, and
> this likely has some effect.
>
> Talking about performance, Philippe Pépiot has a patch to set up an
> official performance tracking tool. If you could help review it, we could
> include these operations in it and have an easy, standard way to get
> these numbers.

From this patch, most likely not. The reason is that the code is nearly
identical, and I expect any performance changes due to how functions are
called to be dwarfed by the time spent inside the compressor.

>
>
> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
>> --- a/mercurial/bundle2.py
>> +++ b/mercurial/bundle2.py
>> @@ -566,23 +566,18 @@ class bundle20(object):
>>              self.ui.debug(''.join(msg))
>>          outdebug(self.ui, 'start emission of %s stream' % self._magicstring)
>>          yield self._magicstring
>>          param = self._paramchunk()
>>          outdebug(self.ui, 'bundle parameter: %s' % param)
>>          yield _pack(_fstreamparamsize, len(param))
>>          if param:
>>              yield param
>> -        # starting compression
>> -        compressor = self._compengine.compressorobj()
>> -        for chunk in self._getcorechunk():
>> -            data = compressor.compress(chunk)
>> -            if data:
>> -                yield data
>> -        yield compressor.flush()
>> +        for chunk in self._compengine.compressstream(self._getcorechunk()):
>> +            yield chunk
>>
>>      def _paramchunk(self):
>>          """return a encoded version of all stream parameters"""
>>          blocks = []
>>          for par, value in self._params:
>>              par = urlreq.quote(par)
>>              if value is not None:
>>                  value = urlreq.quote(value)
>> @@ -1318,25 +1313,20 @@ def writebundle(ui, cg, filename, bundle
>>          if cg.version != '01':
>>              raise error.Abort(_('old bundle types only supports v1 '
>>                                  'changegroups'))
>>          header, comp = bundletypes[bundletype]
>>          if comp not in util.compressionengines.supportedbundletypes:
>>              raise error.Abort(_('unknown stream compression type: %s')
>>                                % comp)
>>          compengine = util.compressionengines.forbundletype(comp)
>> -        compressor = compengine.compressorobj()
>> -        subchunkiter = cg.getchunks()
>>          def chunkiter():
>>              yield header
>> -            for chunk in subchunkiter:
>> -                data = compressor.compress(chunk)
>> -                if data:
>> -                    yield data
>> -            yield compressor.flush()
>> +            for chunk in compengine.compressstream(cg.getchunks()):
>> +                yield chunk
>>          chunkiter = chunkiter()
>>
>>          # parse the changegroup data, otherwise we will block
>>          # in case of sshrepo because we don't know the end of the stream
>>          return changegroup.writechunks(ui, chunkiter, filename, vfs=vfs)
>>
>>  @parthandler('changegroup', ('version', 'nbchanges', 'treemanifest'))
>>  def handlechangegroup(op, inpart):
>>
>
>
> --
> Pierre-Yves David
>
Re: [PATCH 02 of 11] util: create new abstraction for compression engines
On Mon, Nov 7, 2016 at 5:36 AM, Pierre-Yves David < pierre-yves.da...@ens-lyon.org> wrote: > > > On 11/02/2016 01:08 AM, Gregory Szorc wrote: > >> # HG changeset patch >> # User Gregory Szorc>> # Date 1477966026 25200 >> # Mon Oct 31 19:07:06 2016 -0700 >> # Node ID 4015d575d311cd7ebc923d1320e55a76c655c485 >> # Parent 60f180c9a030ebcee6c6f4f8584fdb94c73ac337 >> util: create new abstraction for compression engines >> >> Currently, util.py has "compressors" and "decompressors" dicts >> mapping compression algorithms to callables returning object that >> perform well-defined operations. In addition, revlog.py has code >> for calling into a compressor or decompressor explicitly. And, there >> is code in the wire protocol for performing zlib compression. >> >> The 3rd party lz4revlog extension has demonstrated the utility of >> supporting alternative compression formats for revlog storage. But >> it stops short of supporting lz4 for bundles and the wire protocol. >> >> There are also plans to support zstd as a general compression >> replacement. >> >> So, there appears to be a market for a unified API for registering >> compression engines. This commit starts the process of establishing >> one. It establishes a new container class for holding registered >> compression engine objects. Each object declares and supports common >> operations via attributes. >> >> The built-in zlib, bz2, truncated bz2, and no-op compression engines >> are registered with a singleton instance of this class. >> >> It's worth stating that I'm no fan of the "decompressorreader" API. >> But this is what existing consumers expect. My plans are to get >> consumers using the new "engines" API then transition them to a >> better decompression primitive. This partially explains why I don't >> care about the duplicated code pattern used for decompressors >> (it is abstracted into _makedecompressor in the existing code). >> > > The plan seems overall good, I've some suggestion on the implementation. 
> > > diff --git a/mercurial/util.py b/mercurial/util.py >> --- a/mercurial/util.py >> +++ b/mercurial/util.py >> @@ -2851,21 +2851,156 @@ class ctxmanager(object): >> exc_type, exc_val, exc_tb = pending = sys.exc_info() >> del self._atexit >> if pending: >> raise exc_val >> return received and suppressed >> >> # compression utility >> >> +class compressormanager(object): >> +"""Holds registrations of various compression engines. >> + >> +This class essentially abstracts the differences between compression >> +engines to allow new compression formats to be added easily, >> possibly from >> +extensions. >> + >> +Compressors are registered against the global instance by calling its >> +``register()`` method. >> +""" >> +def __init__(self): >> +self._engines = {} >> +self._bundletypes = {} >> + >> +def __getitem__(self, key): >> +return self._engines[key] >> + >> +def __contains__(self, key): >> +return key in self._engines >> + >> +def __iter__(self): >> +return iter(self._engines.keys()) >> + >> +def register(self, name, engine): >> +"""Register a compression format with the manager. >> + >> +The passed compression engine is an object with attributes >> describing >> +behavior and methods performing well-defined actions. The >> following >> +attributes are recognized (all are optional): >> + >> +* bundletype -- Attribute containing the identifier of this >> compression >> + format as used by bundles. >> + >> +* compressorobj -- Method returning an object with >> ``compress(data)`` >> + and ``flush()`` methods. This object and these methods are >> used to >> + incrementally feed data (presumably uncompressed) chunks into a >> + compressor. Calls to these methods return compressed bytes, >> which >> + may be 0-length if there is no output for the operation. >> + >> +* decompressorreader -- Method that is used to perform >> decompression >> + on a file object. Argument is an object with a ``read(size)`` >> method >> + that returns compressed data. 
Return value is an object with a >> + ``read(size)`` that returns uncompressed data. >> +""" >> > > This method would be a great decorator candidate. Could we get the name > from the object (as we do for the other property?) or have it declared as > part of a decorator (but I think the property approach is more consistent > with the other bits). > > Being a decorator probably means to move away from > > +bundletype = getattr(engine, 'bundletype', None) >> > > Apparently the 'bundletype' can be None but there is not mention of it in > the documentation. Can the documentation be updated? > Also, I'm not sure why the bundletype attribut is optional. Could we just > have it mandatory > > +
[Bug 5420] New: rebase -b should calculate ancestors separately
https://bz.mercurial-scm.org/show_bug.cgi?id=5420

Bug ID: 5420
Summary: rebase -b should calculate ancestors separately
Product: Mercurial
Version: default branch
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: feature
Priority: wish
Component: rebase
Assignee: bugzi...@selenic.com
Reporter: arcppzju+hg...@gmail.com
CC: mercurial-de...@selenic.com

Given the following graph:

  5
  | 4
  |/
  3
  | 2
  |/
  1

rebase -b 2+4 -d 5 will use the revset

  (ancestor(2+4)::(2+4) - ancestor(2+4))::

as the source, which is (1::(2+4) - 1)::, and it's finally 2+3+4+5.

So it may be better if we calculate ancestors for each revision -b
specifies:

  (ancestor(4,5):: - ancestor(4,5)):: + (ancestor(2,5):: - ancestor(2,5))::

and that resolves to 2+4 as expected.

--
You are receiving this mail because:
You are on the CC list for the bug.
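The two revsets in this report can be checked by hand with a small graph model. The sketch below is not Mercurial code: the parent map is an assumption read off the graph in the report (5 and 4 are children of 3; 3 and 2 are children of 1), and the helper names are invented. It also assumes the proposed per-revision range is bounded at each revision, i.e. ``ancestor(r, 5)::r``, since that is the reading under which the result is 2+4.

```python
# Assumed parent map for the graph in the report.
parents = {1: [], 2: [1], 3: [1], 4: [3], 5: [3]}


def ancestors(revs):
    """Inclusive ancestors of a set of revisions."""
    seen, stack = set(), list(revs)
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def descendants(revs):
    """Inclusive descendants of a set of revisions (the ``x::`` form)."""
    revs = set(revs)
    return {r for r in parents if ancestors([r]) & revs}


def dagrange(x, y):
    """``x::y``: descendants of x that are also ancestors of y."""
    return descendants(x) & ancestors(y)


def gca(revs):
    """Greatest common ancestor; max() works because numbering here
    follows topological order."""
    return max(set.intersection(*(ancestors([r]) for r in revs)))


def current_source(branch):
    base = {gca(branch)}
    return descendants(dagrange(base, branch) - base)


def proposed_source(branch, dest):
    result = set()
    for r in branch:
        base = {gca([r, dest])}
        result |= descendants(dagrange(base, {r}) - base)
    return result


current = current_source({2, 4})
proposed = proposed_source({2, 4}, 5)
```

Under these assumptions ``current`` comes out as revisions 2, 3, 4 and 5, while ``proposed`` contains only 2 and 4, which matches the behavior the report asks for.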
[PATCH] keyword: handle filectx _customcmp
# HG changeset patch
# User Christian Ebert
# Date 1476718966 -7200
#      Mon Oct 17 17:42:46 2016 +0200
# Node ID 94e42c8808cdd96891a9f375f02a0760670e33d8
# Parent  d06c049695e6ad3219e7479c65ce98a2f123e878
keyword: handle filectx _customcmp

Suggested by Yuya Nishihara:
https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/089461.html

Related to issue5364.

diff --git a/hgext/keyword.py b/hgext/keyword.py
--- a/hgext/keyword.py
+++ b/hgext/keyword.py
@@ -737,6 +737,8 @@ def reposetup(ui, repo):
         return ret
 
     def kwfilectx_cmp(orig, self, fctx):
+        if fctx._customcmp:
+            return fctx.cmp(self)
         # keyword affects data size, comparing wdir and filelog size does
         # not make sense
         if (fctx._filenode is None and
[Bug 5419] New: hg revert crashes with multiple renames and --rev
https://bz.mercurial-scm.org/show_bug.cgi?id=5419

Bug ID: 5419
Summary: hg revert crashes with multiple renames and --rev
Product: Mercurial
Version: default branch
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: bug
Priority: wish
Component: Mercurial
Assignee: bugzi...@selenic.com
Reporter: arcppzju+hg...@gmail.com
CC: mercurial-de...@selenic.com

The following commands will crash "hg revert":

  $ hg init repo
  $ cd repo
  $ touch a
  $ hg commit -A a -m a
  $ hg mv a a1
  $ hg commit -m a1
  $ hg mv a1 a2
  $ hg revert -a -r 0

--
You are receiving this mail because:
You are on the CC list for the bug.
Re: [PATCH 10 of 11] hgweb: use compression engine API for zlib compression
On Mon, Nov 7, 2016 at 6:25 AM, Pierre-Yves David <pierre-yves.da...@ens-lyon.org> wrote:

> On 11/02/2016 01:08 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc
>> # Date 1477160356 25200
>> #      Sat Oct 22 11:19:16 2016 -0700
>> # Node ID fc426af4f25c3403703e913ccb4a6865865fcb02
>> # Parent  03555032b7e3bc7192fd8bebf6af3f05b1e70516
>> hgweb: use compression engine API for zlib compression
>>
>> More low-level compression code elimination because we now have nice
>> APIs.
>>
>> diff --git a/mercurial/hgweb/protocol.py b/mercurial/hgweb/protocol.py
>> --- a/mercurial/hgweb/protocol.py
>> +++ b/mercurial/hgweb/protocol.py
>> @@ -83,24 +83,18 @@ class webproto(wireproto.abstractserverp
>>                  yield chunk
>>
>>          return self.compresschunks(getchunks())
>>
>>      def compresschunks(self, chunks):
>>          # Don't allow untrusted settings because disabling compression or
>>          # setting a very high compression level could lead to flooding
>>          # the server's network or CPU.
>> -        z = zlib.compressobj(self.ui.configint('server', 'zliblevel', -1))
>> -        for chunk in chunks:
>> -            data = z.compress(chunk)
>> -            # Not all calls to compress() emit data. It is cheaper to inspect
>> -            # that here than to send it via the generator.
>> -            if data:
>> -                yield data
>> -        yield z.flush()
>> +        opts = {'level': self.ui.configint('server', 'zliblevel', -1)}
>> +        return util.compressionengines['zlib'].compressstream(chunks, opts)
>
> Out of curiosity, what is the long term plan for this zliblevel option
> here?
> * Having some special case for each compressors in the code,
> * Having a generic callback to set this up,
> * Pass ui to the compressors for auto configuration,
> * something else?

I haven't fully solved this problem for all cases.

For bundles, I plan on extending the "bundle spec" mechanism to allow
defining compression parameters. See
https://hg.mozilla.org/users/gszorc_mozilla.com/hg/rev/04f0144c9142.

For the wire protocol, I was tentatively planning on reusing [server].
For revlogs, we could reuse [format].

In many cases, yes, we'd need to pass a ui or have the caller pass in
options read from a ui.

The project survived for years without having any configuration knobs for
zlib. So I think we make sane default choices for new engines and add the
knobs later.

>>      def _client(self):
>>          return 'remote:%s:%s:%s' % (
>>              self.req.env.get('wsgi.url_scheme') or 'http',
>>              urlreq.quote(self.req.env.get('REMOTE_HOST', '')),
>>              urlreq.quote(self.req.env.get('REMOTE_USER', '')))
>>
>>  def iscmd(cmd):
>
> Cheers,
>
> --
> Pierre-Yves David

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH STABLE V2] hgweb: cache fctx.parents() in annotate command (issue5414)
Excerpts from Gregory Szorc's message of 2016-11-07 09:29:40 -0800:
> Could we change basefilectx.annotate() to return a rich data structure
> instead of a list of tuples? That data structure could have the cached
> parents and other reusable cache data (which could be passed into
> subsequent calls if needed).

It's already returning "fctx", which is rich...
[PATCH RESEND] revert: do not reverse hunks in interactive when REV is not parent (issue5096)
# HG changeset patch
# User Denis Laxalde
# Date 1475237490 -7200
#      Fri Sep 30 14:11:30 2016 +0200
# Node ID 4f7b7750403ce48e78d0f361236f65ac03584c3c
# Parent  d06c049695e6ad3219e7479c65ce98a2f123e878
revert: do not reverse hunks in interactive when REV is not parent (issue5096)

And introduce a new "apply" operation verb for this case as suggested in
issue5096. This replaces the no longer used "revert" operation.

In interactive revert, when reverting to something other than the parent
revision, display an "apply this change" message with a diff that is not
reversed.

The rationale is that `hg revert -i -r REV` will show hunks of the diff from
the working directory to REV and prompt the user to select them for applying
(to working directory).

This somehow contradicts dcc56e10c23b in which it was decided to have the
"direction" of prompted hunks reversed...

Drop the no longer used "experimental.revertalternateinteractivemode"
configuration option. (Keeping it would lead to an inconsistent prompt
message vs. hunks display.)

diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py --- a/mercurial/cmdutil.py +++ b/mercurial/cmdutil.py @@ -3291,15 +3291,17 @@ def _performrevert(repo, parents, ctx, a diffopts = patch.difffeatureopts(repo.ui, whitespace=True) diffopts.nodates = True diffopts.git = True -reversehunks = repo.ui.configbool('experimental', - 'revertalternateinteractivemode', - True) +if node == parent: +operation = 'discard' +reversehunks = True +else: +operation = 'apply' +reversehunks = False if reversehunks: diff = patch.diff(repo, ctx.node(), None, m, opts=diffopts) else: diff = patch.diff(repo, None, ctx.node(), m, opts=diffopts) originalchunks = patch.parsepatch(diff) -operation = 'discard' if node == parent else 'revert' try: diff --git a/mercurial/patch.py b/mercurial/patch.py --- a/mercurial/patch.py +++ b/mercurial/patch.py @@ -980,14 +980,14 @@ def filterpatch(ui, headers, operation=N operation = 'record' messages = { 'multiple': { +'apply': _("apply change %d/%d to '%s'?"), 'discard': _("discard change %d/%d to '%s'?"), 'record': _("record change %d/%d to '%s'?"), -'revert': _("revert change %d/%d to '%s'?"), }[operation], 'single': { +'apply': _("apply this change to '%s'?"), 'discard': _("discard this change to '%s'?"), 'record': _("record this change to '%s'?"), -'revert': _("revert this change to '%s'?"), }[operation], } diff --git a/tests/test-revert-interactive.t b/tests/test-revert-interactive.t --- a/tests/test-revert-interactive.t +++ b/tests/test-revert-interactive.t @@ -57,45 +57,45 @@ 10 run the same test than 8 from within 2 hunks, 2 lines changed examine changes to 'f'? [Ynesfdaq?] y - @@ -1,5 +1,6 @@ - +a - 1 - 2 - 3 - 4 - 5 - revert change 1/6 to 'f'? [Ynesfdaq?] y - - @@ -1,5 +2,6 @@ + @@ -1,6 +1,5 @@ + -a 1 2 3 4 5 - +b - revert change 2/6 to 'f'? [Ynesfdaq?] y + apply change 1/6 to 'f'? [Ynesfdaq?] y + + @@ -2,6 +1,5 @@ + 1 + 2 + 3 + 4 + 5 + -b + apply change 2/6 to 'f'? [Ynesfdaq?] 
y diff --git a/folder1/g b/folder1/g 2 hunks, 2 lines changed examine changes to 'folder1/g'? [Ynesfdaq?] y - @@ -1,5 +1,6 @@ - +c + @@ -1,6 +1,5 @@ + -c 1 2 3 4 5 - revert change 3/6 to 'folder1/g'? [Ynesfdaq?] y + apply change 3/6 to 'folder1/g'? [Ynesfdaq?] y - @@ -1,5 +2,6 @@ + @@ -2,6 +1,5 @@ 1 2 3 4 5 - +d - revert change 4/6 to 'folder1/g'? [Ynesfdaq?] n + -d + apply change 4/6 to 'folder1/g'? [Ynesfdaq?] n diff --git a/folder2/h b/folder2/h 2 hunks, 2 lines changed @@ -143,12 +143,12 @@ Test that a noop revert doesn't do an un 1 hunks, 1 lines changed examine changes to 'folder1/g'? [Ynesfdaq?] y - @@ -3,3 +3,4 @@ + @@ -3,4 +3,3 @@ 3 4 5 - +d - revert this change to 'folder1/g'? [Ynesfdaq?] n + -d + apply this change to 'folder1/g'? [Ynesfdaq?] n $ ls folder1/ g @@ -159,12 +159,12 @@ Test --no-backup 1 hunks, 1 lines changed examine changes to 'folder1/g'? [Ynesfdaq?] y - @@ -3,3 +3,4 @@ + @@ -3,4 +3,3 @@ 3 4 5 - +d - revert this change to 'folder1/g'? [Ynesfdaq?] y + -d + apply this change to 'folder1/g'? [Ynesfdaq?] y $ ls folder1/ g @@ -190,45 +190,45 @@ Test --no-backup 2 hunks, 2 lines changed examine changes to 'f'? [Ynesfdaq?] y - @@ -1,5 +1,6 @@ - +a - 1 - 2 - 3 - 4 - 5 - revert change 1/6 to 'f'? [Ynesfdaq?] y - - @@ -1,5 +2,6 @@ + @@ -1,6 +1,5 @@ + -a 1 2 3 4 5 - +b -
[Bug 5418] New: Operation not permitted for utime
https://bz.mercurial-scm.org/show_bug.cgi?id=5418

    Bug ID: 5418
   Summary: Operation not permitted for utime
   Product: Mercurial
   Version: 3.9
  Hardware: PC
        OS: Linux
    Status: UNCONFIRMED
  Severity: feature
  Priority: wish
 Component: Mercurial
  Assignee: bugzi...@selenic.com
  Reporter: m...@kiilerich.com
        CC: mercurial-de...@selenic.com

When using a repo owned by another user but where I have group suid and
proper umask, I get 'Operation not permitted' from os.utime.

From linux utime man page:

    Changing timestamps is permitted when: either the process has
    appropriate privileges, or the effective user ID equals the user ID of
    the file, or times is NULL and the process has write permission for the
    file.

    If times is NULL, then the access and modification times of the file
    are set to the current time.

The utime usage in 731ced087a4b does thus apparently not work with use cases
that Mercurial "always" has supported.

--
You are receiving this mail because:
You are on the CC list for the bug.
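
To make the man page rule above concrete, here is a minimal sketch of how a caller might cope with it. The helper name `touch` and its return convention are assumptions made for illustration; this is not what 731ced087a4b does, nor a proposed fix. It tries to set an explicit timestamp and, if the kernel refuses with EPERM because we are not the file's owner, falls back to "now", which only needs write permission.

```python
import errno
import os


def touch(path, when=None):
    """Hypothetical helper: set atime/mtime of ``path``.

    Returns True if the requested timestamp was applied, False if we had
    to fall back to the current time because we do not own the file.
    """
    try:
        if when is None:
            os.utime(path, None)
        else:
            os.utime(path, (when, when))
        return True
    except OSError as exc:
        # Only the "explicit times on a file we do not own" case is
        # recoverable; anything else is re-raised unchanged.
        if exc.errno != errno.EPERM or when is None:
            raise
    # times=None only requires write permission on the file.
    os.utime(path, None)
    return False
```

Whether falling back to the current time is acceptable depends on why the caller wanted a specific mtime in the first place, so treat this as a description of the permission rule rather than a recommendation.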
Re: [PATCH 6 of 7] py3: use try/except to check for basestring
On Mon, Nov 7, 2016 at 5:55 PM, Yuya Nishihara wrote:
> On Mon, 7 Nov 2016 00:15:21 +0530, Pulkit Goyal wrote:
>> This
>> https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/089099.html
>> is a better version of what I want to do, since this didn't go
>> through I will be using this.
>
> I'm okay with that pycompat.basestring stuff, but I'm pretty sure most of
> our basestring uses are moot since we avoid using unicodes except for very
> specific string manipulations.
>
>> >> @@ -520,7 +520,12 @@
>> >>          result = self.config(section, name, untrusted=untrusted)
>> >>          if result is None:
>> >>              result = default or []
>> >> -        if isinstance(result, basestring):
>> >> +        checkunicode = False
>> >> +        try:
>> >> +            checkunicode = isinstance(result, basestring)
>> >> +        except NameError:
>> >> +            checkunicode = isinstance(result, str)
>> >> +        if checkunicode:
>> >>              result = _configlist(result.lstrip(' ,\n'))
>
> And with this change, ui.configlist() would look as if it supports unicodes,
> which seems confusing.

Can you cherry pick some commits from that series?
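
For readers following the basestring discussion, a minimal sketch of the "define it once" alternative to repeating try/except at every call site. The names `string_types` and `isstringlike` are hypothetical and are not the pycompat API from the linked series.

```python
# Resolve the name once at import time instead of at every call site.
try:
    string_types = (basestring,)  # Python 2: covers str and unicode
except NameError:
    string_types = (str,)         # Python 3: basestring no longer exists


def isstringlike(value):
    """Hypothetical helper: True if ``value`` is a native string type."""
    return isinstance(value, string_types)
```

Note that on Python 3 this check is false for bytes, which ties into the point made above: if the code base works on bytes almost everywhere, a check against the native string type may not be the right test at all.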
Re: hglib uses distutils that is being deprecated
On 11/07/2016 03:37 PM, Barry Scott wrote:
> On Monday, 7 November 2016 15:28:33 GMT Pierre-Yves David wrote:
>> On 11/07/2016 03:23 PM, Barry Scott wrote:
>>> So that I could use the recent improvements to python-hglib I built a
>>> wheel.
>>>
>>> I needed to patch setup.py to do this (s/distutils/setuptools/) so that
>>> I could create the wheel with
>>>
>>>     python3 setup.py sdist bdist_wheel
>>>
>>> Then when I installed my wheel I got this:
>>>
>>> $ pip3.5 install --upgrade /home/barry/wc/hg/hglib/dist/python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
>>> Processing /home/barry/wc/hg/hglib/dist/python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
>>> Installing collected packages: python-hglib
>>>   Found existing installation: python-hglib 2.0
>>>     DEPRECATION: Uninstalling a distutils installed project
>>>     (python-hglib) has been deprecated and will be removed in a future
>>>     version. This is due to the fact that uninstalling a distutils
>>>     project will only partially uninstall the project.
>>>     Uninstalling python-hglib-2.0:
>>>       Successfully uninstalled python-hglib-2.0
>>> Successfully installed python-hglib-2.2-6-0f81ed8e147b-20161107
>>>
>>> Do you have a plan to update to setuptools?
>>>
>>> I'm guessing that you want to use distutils to support very old python
>>> versions. If that is true then I'd guess that the setup.py would need
>>> to do something like:
>>>
>>>     try:
>>>         from setuptools import setup
>>>     except ImportError:
>>>         from distutils.core import setup
>>
>> I know that Gregory Szorc is building wheels for Mercurial itself. We can
>> probably use the same approach used by Mercurial in hglib (whatever
>> this approach is). Can you send a patch for hglib?
>
> I don't think a wheel is created for mercurial.

The internet disagrees: https://pypi.python.org/pypi/Mercurial

> On Windows it's a .exe and on Fedora the site-specific/mercurial is
> installed from the RPM.
>
> Looking a bit closer at hglib I only see PyPI with a .tar.gz source file.
> I guess you do not use wheels at all and pip will do the setup.py install
> dance for the user.
>
> I could patch to change from distutils to setuptools. But someone that
> knows hglib's packaging strategy needs to speak to what is sensible to do.

Cheers,

--
Pierre-Yves David
Re: [PATCH 5 of 5 rfc] bdiff: make sure we append to repeated lines instead of inserting into range
On 11/03/2016 10:34 PM, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich# Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID be83c5f4ec8931cb7e771a80a3ef5e2042a005c1 # Parent 3e0216b2a0995cb21946bc13fb21391013332c57 bdiff: make sure we append to repeated lines instead of inserting into range This will mitigate the symptoms that tests exposed in the previous changeset. small nits Would it be possible to have something patch-5-ish before patch 4 then to reduce the line change in patch 5? Arguably, we need similar handling for longer sequences of repeated lines... But also, we already have examples of how the heuristics handle other cases in a similar way. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -187,7 +187,7 @@ static int longest_match(struct bdiff_li } else if (i == mi && findbetterb) { /* better j in first upper half */ mj = j; - if (j <= bhalf) + if (j <= bhalf && !(j > 0 && k == 1 && b[j - 1].e == b[j].e)) findbetterb = 0; } } diff --git a/tests/test-annotate.t b/tests/test-annotate.t --- a/tests/test-annotate.t +++ b/tests/test-annotate.t @@ -91,8 +91,8 @@ annotate (JSON) annotate -n b $ hg annotate -n b + 0: a 1: a - 0: a 1: a 3: b4 3: b5 @@ -111,8 +111,8 @@ annotate --no-follow b annotate -nl b $ hg annotate -nl b - 1:1: a 0:1: a + 1:2: a 1:3: a 3:4: b4 3:5: b5 @@ -121,8 +121,8 @@ annotate -nl b annotate -nf b $ hg annotate -nf b + 0 a: a 1 a: a - 0 a: a 1 a: a 3 b: b4 3 b: b5 @@ -131,8 +131,8 @@ annotate -nf b annotate -nlf b $ hg annotate -nlf b - 1 a:1: a 0 a:1: a + 1 a:2: a 1 a:3: a 3 b:4: b4 3 b:5: b5 @@ -156,8 +156,8 @@ annotate -nlf b annotate after merge $ hg annotate -nf b + 0 a: a 1 a: a - 0 a: a 1 a: a 3 b: b4 4 b: c @@ -166,8 +166,8 @@ annotate after merge annotate after merge with -l $ hg annotate -nlf b - 1 a:1: a 0 a:1: a + 1 a:2: a 1 a:3: a 3 b:4: b4 4 b:5: c @@ -198,7 +198,7 @@ annotate after merge with -l annotate after rename merge $ hg annotate -nf b 
- 1 a: a + 0 a: a 6 b: z 1 a: a 3 b: b4 @@ -209,7 +209,7 @@ annotate after rename merge annotate after rename merge with -l $ hg annotate -nlf b - 1 a:1: a + 0 a:1: a 6 b:2: z 1 a:3: a 3 b:4: b4 @@ -226,7 +226,7 @@ Issue2807: alignment of line numbers wit $ echo more >> b $ hg ci -mmore -d '7 0' $ hg annotate -nlf b - 1 a: 1: a + 0 a: 1: a 6 b: 2: z 1 a: 3: a 3 b: 4: b4 @@ -240,15 +240,15 @@ Issue2807: alignment of line numbers wit linkrev vs rev $ hg annotate -r tip -n a + 0: a 1: a - 0: a 1: a linkrev vs rev with -l $ hg annotate -r tip -nl a - 1:1: a 0:1: a + 1:2: a 1:3: a Issue589: "undelete" sequence leads to crash diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t --- a/tests/test-bhalf.t +++ b/tests/test-bhalf.t @@ -105,8 +105,8 @@ Explore some bdiff implementation edge c --- a/x +++ b/x @@ -1,1 +1,3 @@ + a +a - a +a diff --git a/y b/y --- a/y diff --git a/tests/test-commit-amend.t b/tests/test-commit-amend.t --- a/tests/test-commit-amend.t +++ b/tests/test-commit-amend.t @@ -47,8 +47,8 @@ Amending changeset with changes in worki --- a/a Thu Jan 01 00:00:00 1970 + +++ b/a Thu Jan 01 00:00:00 1970 + @@ -1,1 +1,3 @@ + a +a - a +a $ hg log changeset: 1:43f1ba15f28a @@ -122,13 +122,13 @@ No changes, just a different message: uncompressed size of bundle content: 254 (changelog) 163 (manifests) - 141 a + 129 a saved backup bundle to $TESTTMP/.hg/strip-backup/74609c7f506e-1bfde511-amend-backup.hg (glob) 1 changesets found uncompressed size of bundle content: 250 (changelog) 163 (manifests) - 141 a + 129 a adding branch adding changesets adding manifests @@ -140,8 +140,8 @@ No changes, just a different message: --- a/a Thu Jan 01 00:00:00 1970 + +++ b/a Thu Jan 01 00:00:00 1970 + @@ -1,1 +1,3 @@ + a +a - a +a $ hg log changeset: 1:1cd866679df8 @@ -266,13 +266,13 @@ then, test editing custom commit message uncompressed size of bundle content: 249 (changelog) 163 (manifests) - 143 a + 131 a saved backup bundle to 
$TESTTMP/.hg/strip-backup/5f357c7560ab-e7c84ade-amend-backup.hg (glob) 1 changesets found uncompressed size of bundle content: 257 (changelog) 163 (manifests) - 143 a + 131 a adding branch adding changesets adding manifests @@ -309,13
Re: hglib uses distutils that is being deprecated
On Monday, 7 November 2016 15:28:33 GMT Pierre-Yves David wrote:
> On 11/07/2016 03:23 PM, Barry Scott wrote:
> > So that I could use the recent improvements to python-hglib I built a
> > wheel.
> >
> > I needed to patch setup.py to do this (s/distutils/setuptools/) so that
> > I could create the wheel with
> >
> >     python3 setup.py sdist bdist_wheel
> >
> > Then when I installed my wheel I got this:
> >
> > $ pip3.5 install --upgrade /home/barry/wc/hg/hglib/dist/python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
> > Processing /home/barry/wc/hg/hglib/dist/python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
> > Installing collected packages: python-hglib
> >   Found existing installation: python-hglib 2.0
> >     DEPRECATION: Uninstalling a distutils installed project
> >     (python-hglib) has been deprecated and will be removed in a future
> >     version. This is due to the fact that uninstalling a distutils
> >     project will only partially uninstall the project.
> >     Uninstalling python-hglib-2.0:
> >       Successfully uninstalled python-hglib-2.0
> > Successfully installed python-hglib-2.2-6-0f81ed8e147b-20161107
> >
> > Do you have a plan to update to setuptools?
> >
> > I'm guessing that you want to use distutils to support very old python
> > versions. If that is true then I'd guess that the setup.py would need to
> > do something like:
> >
> >     try:
> >         from setuptools import setup
> >     except ImportError:
> >         from distutils.core import setup
>
> I know that Gregory Szorc is building wheels for Mercurial itself. We can
> probably use the same approach used by Mercurial in hglib (whatever
> this approach is). Can you send a patch for hglib?

I don't think a wheel is created for mercurial.

On Windows it's a .exe and on Fedora the site-specific/mercurial is
installed from the RPM.

Looking a bit closer at hglib I only see PyPI with a .tar.gz source file.
I guess you do not use wheels at all and pip will do the setup.py install
dance for the user.

I could patch to change from distutils to setuptools. But someone that
knows hglib's packaging strategy needs to speak to what is sensible to do.

Barry
Re: [PATCH 11 of 11] util: remove compressorobj API from compression engines
On 11/02/2016 01:08 AM, Gregory Szorc wrote: # HG changeset patch # User Gregory Szorc# Date 1477160459 25200 # Sat Oct 22 11:20:59 2016 -0700 # Node ID 4f491f7958229b370c5929d2e2599b9ed69d8254 # Parent fc426af4f25c3403703e913ccb4a6865865fcb02 util: remove compressorobj API from compression engines It was quite low-level and there are no callers of it now that everyone is using compressstream() Wait what ‽‽ plot twist!! You should probably mention upfront that eventually killing this method is one of your goal. diff --git a/mercurial/util.py b/mercurial/util.py --- a/mercurial/util.py +++ b/mercurial/util.py @@ -2884,22 +2884,16 @@ class compressormanager(object): The passed compression engine is an object with attributes describing behavior and methods performing well-defined actions. The following attributes are recognized (all are optional): * bundletype -- Attribute containing the identifier of this compression format as used by bundles. -* compressorobj -- Method returning an object with ``compress(data)`` - and ``flush()`` methods. This object and these methods are used to - incrementally feed data (presumably uncompressed) chunks into a - compressor. Calls to these methods return compressed bytes, which - may be 0-length if there is no output for the operation. - * compressstream -- Compress an iterator of chunks and return an iterator of compressed chunks. Optionally accepts an argument defining how to perform compression. Each engine treats this argument differently. * decompressorreader -- Method that is used to perform decompression on a file object. 
Argument is an object with a ``read(size)`` method @@ -2928,19 +2922,16 @@ class compressormanager(object): compressionengines = compressormanager() class _zlibengine(object): @property def bundletype(self): return 'GZ' -def compressorobj(self): -return zlib.compressobj() - def compressstream(self, it, opts=None): opts = opts or {} z = zlib.compressobj(opts.get('level', -1)) for chunk in it: data = z.compress(chunk) # Not all calls to compress emit data. It is cheaper to inspect # here than to feed empty chunks through generator. @@ -2959,19 +2950,16 @@ class _zlibengine(object): compressionengines.register('zlib', _zlibengine()) class _bz2engine(object): @property def bundletype(self): return 'BZ' -def compressorobj(self): -return bz2.BZ2Compressor() - def compressstream(self, it, opts=None): opts = opts or {} z = bz2.BZ2Compressor(opts.get('level', 9)) for chunk in it: data = z.compress(chunk) if data: yield data @@ -2987,45 +2975,35 @@ class _bz2engine(object): compressionengines.register('bz2', _bz2engine()) class _truncatedbz2engine(object): @property def bundletype(self): return '_truncatedBZ' -# We don't implement compressorobj because it is hackily handled elsewhere. +# We don't implement compressstream because it is hackily handled elsewhere. def decompressorreader(self, fh): def gen(): # The input stream doesn't have the 'BZ' header. So add it back. 
d = bz2.BZ2Decompressor() d.decompress('BZ') for chunk in filechunkiter(fh): yield d.decompress(chunk) return chunkbuffer(gen()) compressionengines.register('bz2truncated', _truncatedbz2engine()) -class nocompress(object): -def compress(self, x): -return x - -def flush(self): -return '' - class _noopengine(object): @property def bundletype(self): return 'UN' -def compressorobj(self): -return nocompress() - def compressstream(self, it, opts=None): return it def decompressorreader(self, fh): return fh compressionengines.register('none', _noopengine()) ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel -- Pierre-Yves David ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
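
The zlib and bz2 compressstream() implementations quoted in this patch differ only in how the compressor object is constructed. A minimal sketch of one way that shared loop could be factored out. The class names and the `_compressor` hook are hypothetical and are not part of mercurial.util.

```python
import bz2
import zlib


class streamengine(object):
    """Hypothetical base class holding the shared chunk loop."""

    def _compressor(self, opts):
        # Subclasses return an object with compress(data) and flush().
        raise NotImplementedError()

    def compressstream(self, it, opts=None):
        z = self._compressor(opts or {})
        for chunk in it:
            data = z.compress(chunk)
            # Not every compress() call emits output; skip empty chunks.
            if data:
                yield data
        yield z.flush()


class zlibengine(streamengine):
    def _compressor(self, opts):
        return zlib.compressobj(opts.get('level', -1))


class bz2engine(streamengine):
    def _compressor(self, opts):
        return bz2.BZ2Compressor(opts.get('level', 9))
```

A caller would consume it like any other generator, for example `b''.join(zlibengine().compressstream(chunks, {'level': 6}))`. The no-op engine would not fit this base class, since it returns the input iterator unchanged.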
hglib uses distutils that is being deprecated
So that I could use the recent improvements to python-hglib I built a wheel.

I needed to patch setup.py to do this (s/distutils/setuptools/) so that I
could create the wheel with

    python3 setup.py sdist bdist_wheel

Then when I installed my wheel I got this:

$ pip3.5 install --upgrade /home/barry/wc/hg/hglib/dist/python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
Processing /home/barry/wc/hg/hglib/dist/python_hglib-2.2_6_0f81ed8e147b_20161107-py3-none-any.whl
Installing collected packages: python-hglib
  Found existing installation: python-hglib 2.0
    DEPRECATION: Uninstalling a distutils installed project (python-hglib)
    has been deprecated and will be removed in a future version. This is due
    to the fact that uninstalling a distutils project will only partially
    uninstall the project.
    Uninstalling python-hglib-2.0:
      Successfully uninstalled python-hglib-2.0
Successfully installed python-hglib-2.2-6-0f81ed8e147b-20161107

Do you have a plan to update to setuptools?

I'm guessing that you want to use distutils to support very old python
versions. If that is true then I'd guess that the setup.py would need to do
something like:

    try:
        from setuptools import setup
    except ImportError:
        # setup() lives in distutils.core, not in the distutils package root
        from distutils.core import setup

Barry
Re: [PATCH 08 of 11] util: add a stream compression API to compression engines
On 11/02/2016 01:08 AM, Gregory Szorc wrote: # HG changeset patch # User Gregory Szorc# Date 1477159930 25200 # Sat Oct 22 11:12:10 2016 -0700 # Node ID 1d4d111b644453acc4893478528a5f2ecd7ca023 # Parent 289da69280d95f1b983fdf9216739411a9953fb6 util: add a stream compression API to compression engines It is a common pattern throughout the code to perform compression on an iterator of chunks, yielding an iterator of compressed chunks. Let's formalize that as part of the compression engine API. The basic compression implementation for stream compression will be similar. We should maybe have a base class for these object? diff --git a/mercurial/util.py b/mercurial/util.py --- a/mercurial/util.py +++ b/mercurial/util.py @@ -2890,16 +2890,22 @@ class compressormanager(object): format as used by bundles. * compressorobj -- Method returning an object with ``compress(data)`` and ``flush()`` methods. This object and these methods are used to incrementally feed data (presumably uncompressed) chunks into a compressor. Calls to these methods return compressed bytes, which may be 0-length if there is no output for the operation. +* compressstream -- Compress an iterator of chunks and return an + iterator of compressed chunks. + + Optionally accepts an argument defining how to perform compression. + Each engine treats this argument differently. + * decompressorreader -- Method that is used to perform decompression on a file object. Argument is an object with a ``read(size)`` method that returns compressed data. Return value is an object with a ``read(size)`` that returns uncompressed data. 
""" bundletype = getattr(engine, 'bundletype', None) if bundletype and bundletype in self._bundletypes: raise error.Abort(_('bundle type %s is already registered') % @@ -2925,16 +2931,29 @@ compressionengines = compressormanager() class _zlibengine(object): @property def bundletype(self): return 'GZ' def compressorobj(self): return zlib.compressobj() +def compressstream(self, it, opts=None): +opts = opts or {} + +z = zlib.compressobj(opts.get('level', -1)) +for chunk in it: +data = z.compress(chunk) +# Not all calls to compress emit data. It is cheaper to inspect +# here than to feed empty chunks through generator. +if data: +yield data + +yield z.flush() + def decompressorreader(self, fh): def gen(): d = zlib.decompressobj() for chunk in filechunkiter(fh): yield d.decompress(chunk) return chunkbuffer(gen()) @@ -2943,16 +2962,26 @@ compressionengines.register('zlib', _zli class _bz2engine(object): @property def bundletype(self): return 'BZ' def compressorobj(self): return bz2.BZ2Compressor() +def compressstream(self, it, opts=None): +opts = opts or {} +z = bz2.BZ2Compressor(opts.get('level', 9)) +for chunk in it: +data = z.compress(chunk) +if data: +yield data + +yield z.flush() + def decompressorreader(self, fh): def gen(): d = bz2.BZ2Decompressor() for chunk in filechunkiter(fh): yield d.decompress(chunk) return chunkbuffer(gen()) @@ -2987,15 +3016,18 @@ class nocompress(object): class _noopengine(object): @property def bundletype(self): return 'UN' def compressorobj(self): return nocompress() +def compressstream(self, it, opts=None): +return it + def decompressorreader(self, fh): return fh compressionengines.register('none', _noopengine()) # convenient shortcut dst = debugstacktrace ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel -- Pierre-Yves David ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 02 of 11] util: create new abstraction for compression engines
On 11/02/2016 01:08 AM, Gregory Szorc wrote: # HG changeset patch # User Gregory Szorc# Date 1477966026 25200 # Mon Oct 31 19:07:06 2016 -0700 # Node ID 4015d575d311cd7ebc923d1320e55a76c655c485 # Parent 60f180c9a030ebcee6c6f4f8584fdb94c73ac337 util: create new abstraction for compression engines Currently, util.py has "compressors" and "decompressors" dicts mapping compression algorithms to callables returning object that perform well-defined operations. In addition, revlog.py has code for calling into a compressor or decompressor explicitly. And, there is code in the wire protocol for performing zlib compression. The 3rd party lz4revlog extension has demonstrated the utility of supporting alternative compression formats for revlog storage. But it stops short of supporting lz4 for bundles and the wire protocol. There are also plans to support zstd as a general compression replacement. So, there appears to be a market for a unified API for registering compression engines. This commit starts the process of establishing one. It establishes a new container class for holding registered compression engine objects. Each object declares and supports common operations via attributes. The built-in zlib, bz2, truncated bz2, and no-op compression engines are registered with a singleton instance of this class. It's worth stating that I'm no fan of the "decompressorreader" API. But this is what existing consumers expect. My plans are to get consumers using the new "engines" API then transition them to a better decompression primitive. This partially explains why I don't care about the duplicated code pattern used for decompressors (it is abstracted into _makedecompressor in the existing code). The plan seems overall good, I've some suggestion on the implementation. 
diff --git a/mercurial/util.py b/mercurial/util.py --- a/mercurial/util.py +++ b/mercurial/util.py @@ -2851,21 +2851,156 @@ class ctxmanager(object): exc_type, exc_val, exc_tb = pending = sys.exc_info() del self._atexit if pending: raise exc_val return received and suppressed # compression utility +class compressormanager(object): +"""Holds registrations of various compression engines. + +This class essentially abstracts the differences between compression +engines to allow new compression formats to be added easily, possibly from +extensions. + +Compressors are registered against the global instance by calling its +``register()`` method. +""" +def __init__(self): +self._engines = {} +self._bundletypes = {} + +def __getitem__(self, key): +return self._engines[key] + +def __contains__(self, key): +return key in self._engines + +def __iter__(self): +return iter(self._engines.keys()) + +def register(self, name, engine): +"""Register a compression format with the manager. + +The passed compression engine is an object with attributes describing +behavior and methods performing well-defined actions. The following +attributes are recognized (all are optional): + +* bundletype -- Attribute containing the identifier of this compression + format as used by bundles. + +* compressorobj -- Method returning an object with ``compress(data)`` + and ``flush()`` methods. This object and these methods are used to + incrementally feed data (presumably uncompressed) chunks into a + compressor. Calls to these methods return compressed bytes, which + may be 0-length if there is no output for the operation. + +* decompressorreader -- Method that is used to perform decompression + on a file object. Argument is an object with a ``read(size)`` method + that returns compressed data. Return value is an object with a + ``read(size)`` that returns uncompressed data. +""" This method would be a great decorator candidate. Could we get the name from the object (as we do for the other property?) 
or have it declared as part of a decorator (but I think the property approach is more consistent with the other bits). Being a decorator probably means to move away from +bundletype = getattr(engine, 'bundletype', None) Apparently the 'bundletype' can be None but there is not mention of it in the documentation. Can the documentation be updated? Also, I'm not sure why the bundletype attribut is optional. Could we just have it mandatory +if bundletype and bundletype in self._bundletypes: +raise error.Abort(_('bundle type %s is already registered') % + bundletype) note: Having the name on the object would allow us to provide a better error message here. "bundle type X provided by Y is already provided by Z" This piece of code is also tickling the idea of a ProgrammingError of some sort. +
Re: [PATCH STABLE V2] hgweb: cache fctx.parents() in annotate command (issue5414)
On Sun, 6 Nov 2016 17:01:05 +, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-11-06 11:31:04 +0900:
> > Perhaps fctx.parents() can be property-cached, but we'll need to drop
> > uninteresting chains of parents in fctx.annotate().
>
> If we go the property-cache approach, I think it's better to cache
> "_adjustedlinkrev". It's at a lower level and covers both "parents"
> and "introrev". Caching "parents" may increase memory usage unintentionally.
>
> I don't fully get what "uninteresting chains of parents" means here.
> In the annotate case, let's say f1, f2 = f0.parents().
> Both f1 and f2 have _descendantrev set to f0's adjusted linkrev.

As you said, what's in my mind was the memory usage. Caching fctx.parents()
would mean annotate() builds a full link from self to root nodes. Some of
these intermediate nodes aren't useful for hgweb.

> Suppose there is a global cache dict: {(path, filenode, srcrev): linkrev}, I
> think if srcrev=_descendantrev (it's true for f1, f2) and _descendantrev is
> adjusted from the direct child (f0), then it is "interesting" and can be
> cached. This is similar to what marmoute said during the sprint - for the
> log -f or annotate case, once the first fctx's introrev is known, the cache
> can be used to calculate the ancestors' adjusted linkrevs.

Given we have ugly hacks to pass ancestry data around fctx objects, a global
cache might be useful.
Re: [PATCH 8 of 8] py3: have bytes version of sys.argv
On Sun, 6 Nov 2016 23:55:50 +0530, Pulkit Goyal wrote:
> On Sun, Nov 6, 2016 at 9:04 AM, Yuya Nishihara wrote:
> > On Sun, 06 Nov 2016 04:46:25 +0530, Pulkit Goyal wrote:
> >> # HG changeset patch
> >> # User Pulkit Goyal <7895pul...@gmail.com>
> >> # Date 1478387186 -19800
> >> # Sun Nov 06 04:36:26 2016 +0530
> >> # Node ID b5fc4e71286dd4f6e4f38e0b9fb17f51f1e3
> >> # Parent 6eed3ee0df425da61d03bfe024dd082f3176ce5d
> >> py3: have bytes version of sys.argv
> >>
> >> sys.argv returns unicodes on Python 3. We need a bytes version for us.
> >> There was also a python bug/feature request which wanted them to implement
> >> one. They rejected it, and it is quoted in one of the comments that we can use
> >> fsencode() to get a bytes version of sys.argv. Though not sure about its
> >> correctness.
> >>
> >> Link to the comment: http://bugs.python.org/issue8776#msg217416
> >>
> >> After this patch we will have pycompat.sysargv which will return us bytes
> >> version of sys.argv. If this patch goes in, I would like to make transformer
> >> rewrite sys.argv with pycompat.sysargv because there are a lot of occurrences.
> >>
> >> diff -r 6eed3ee0df42 -r b5fc4e71286d mercurial/pycompat.py
> >> --- a/mercurial/pycompat.py Sun Nov 06 04:17:19 2016 +0530
> >> +++ b/mercurial/pycompat.py Sun Nov 06 04:36:26 2016 +0530
> >> @@ -41,6 +41,7 @@
> >> osname = os.name.encode('ascii')
> >> ospathsep = os.pathsep.encode('ascii')
> >> ossep = os.sep.encode('ascii')
> >> +sysargv = list(map(os.fsencode, sys.argv))
> >
> > Looks good to me. Can you add a comment why we can use os.fsencode() here
> > (and the weirdness of Python 3 on Unix.) We might need a Windows workaround
> > because the situation is slightly different, but we wouldn't want to care
> > for now.
>
> Well I will resend this patch because I am not sure about its
> correctness still. I followed that issue where Victor Stinner, one who
> wrote os.environb commented this. There are few doubts/confusions or
> maybe I want to just confirm it with MJ once.

It's generally wrong to assume argv is in filesystem encoding, but I think that's okay for Python 3 on Unix. They build the "wchar_t argv" from the "char argv" with Py_DecodeLocale(), which would be identical to fsdecode() on Unix.

https://hg.python.org/cpython/file/v3.5.1/Programs/python.c#l55

On Windows, the native argv appears to be wchar_t, so we'll need a different hack to simulate the Python 2 (i.e. ANSI Win32 API) behavior.
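As a small illustration of the approach in the quoted patch, the sketch below converts an argument list to bytes with `os.fsencode()`. The helper name `bytesargv` is hypothetical (the patch itself assigns the result to `pycompat.sysargv`), and, as noted above, the result is only expected to match the original bytes on Unix.

```python
import os
import sys


def bytesargv(argv=None):
    """Return argv as a list of bytes (illustrative helper, Python 3 only).

    On Unix, Python 3 decodes the native char argv with the locale
    encoding and the surrogateescape error handler; os.fsencode()
    reverses that decoding. Windows would need a different approach.
    """
    if argv is None:
        argv = sys.argv
    return [os.fsencode(arg) for arg in argv]
```

Passing the argument list explicitly keeps the helper easy to exercise without touching the real `sys.argv`.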
Re: [PATCH 1 of 6 RFC] manifest: introduce an accessor class for manifests
On 11/05/2016 02:05 PM, Yuya Nishihara wrote:

On Thu, 3 Nov 2016 15:27:37 -0700, Durham Goode wrote:

# HG changeset patch
# User Durham Goode
# Date 1478208817 25200
# Thu Nov 03 14:33:37 2016 -0700
# Branch stable
# Node ID 1788ee9e1df92ac94b9be84eac6d16e3bad903a9
# Parent b9f7b0c10027764cee77f9c6d61877fcffea837f
manifest: introduce an accessor class for manifests

This introduces a revlogaccessor class which can be used to allow multiple objects to hold an auto-invalidating reference to a revlog, without having to hold a reference to the actual repo object. Future patches will switch repo.manifest and repo.manifestlog to access the manifest through this accessor. This will fix the circular reference caused by manifestlog and manifestctx holding a reference to the repo

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -514,6 +514,11 @@ class localrepository(object):
 # manifest creation.
 return manifest.manifest(self.svfs)
+@unfilteredpropertycache
+def manifestaccessor(self):
+return revlogaccessor('00manifest.i', self.svfs,
+ self._constructmanifest)

Honestly I don't get why manifestlog and manifestctxs have to live longer than the other repo properties.

I'm also a bit curious about that.

But suppose that is necessary, I agree we'll need this kind of a wrapper.

Any reason why we are using the wrapper approach over using a weak reference? Weakrefs are not great, but we use them in multiple spots when needed. This seems like it would be simpler than the current approach, but I might be missing something.

Maybe we can move the accessor to the manifestlog, but still the accessor will have to be shared (and updated transparently) by the manifestlog and its cachable manifestctxs.

+def revlogaccessor(filename, opener, constructor):
+"""Creates an accessor that provides cached and invalidated access to a
+revlog, via instance.revlog. This is useful for letting multiple objects
+hold a reference to the revlog, without having to hold a possibly-circular
+reference to the actual repository. """
+
+# We have to use a runtime type here, because the only way to create a
+# property is to put it on a class itself, and the property is dynamically
+# defined by the filename parameter.
+class accessor(object):

Perhaps we could refactor filecache to avoid dynamically creating a class, but that would be a minor issue of this RFC series.

-- 
Pierre-Yves David
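For comparison with the accessor approach, the weak reference alternative raised in this reply could be sketched as follows. The classes `Repo` and `ManifestLog` are simplified stand-ins invented for this example, not the real `localrepository` or `manifestlog` classes.

```python
import weakref


class ManifestLog(object):
    """Stand-in for an object that needs to reach back to the repo."""

    def __init__(self, repo):
        # Store only a weak reference so that repo -> manifestlog -> repo
        # does not form a strong reference cycle.
        self._reporef = weakref.ref(repo)

    @property
    def repo(self):
        repo = self._reporef()
        if repo is None:
            raise ReferenceError('repository has been garbage collected')
        return repo


class Repo(object):
    """Stand-in for a repository object that owns a manifestlog."""

    def __init__(self):
        self.manifestlog = ManifestLog(self)
```

The trade-off is that a `ManifestLog` that outlives its repository stops working, whereas the accessor in the patch keeps its own opener and can keep serving the revlog.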
Re: [PATCH 3 of 4] tests: merge 'test-push-validation.t' into 'test-push.t'
On 11/04/2016 08:15 PM, timeless wrote:

Pierre-Yves David wrote:

tests: merge 'test-push-validation.t' into 'test-push.t'

That test file is very small and is merged into the new 'test-push.t'. No logic is changed, but repository names are updated to avoid collisions. We don't register this as a copy because it is actually a "ypoc" merging two files together without replacing the destination, and Mercurial cannot express that.

Actually, it can:

0: a b
0->1: rename a->d
0->2: rename b->d
1+2->3: merge d

This should give you the history you want in `hg ann`.

Hu, good point, but the UI only offers it through merge. We could make copy able to record them, but I'm not sure it's a good idea.

Cheers,

-- 
Pierre-Yves David
Re: [PATCH 2 of 2] commands: introduce `hg display`
Gregory Szorc wrote:

For the command name, we would have preferred `hg show` because it is shorter and not ambiguous with any other core command. However, a number of people have created `hg show` as effectively an alias to `hg export`. And, some were concerned that Git users used to `git show` being equivalent to `hg export` would be confused by a `hg show` doing something different.

`git show` is not equivalent to `hg export`, quoting git-show(1):

Shows one or more objects (blobs, trees, tags and commits).

For commits it shows the log message and textual diff. It also presents the merge commit in a special format as produced by git diff-tree --cc.

For tags, it shows the tag message and the referenced objects.

For trees, it shows the names (equivalent to git ls-tree with --name-only).

For plain blobs, it shows the plain contents.

So only the first case is equivalent to `hg export` (or probably more `hg log -vpr`). Other cases are quite close to the "view" concept introduced here, as far as I understand. Then if a revision can be registered as a view, `hg show` could just be a plain replacement to the aforementioned alias I guess.

Given this and the conflict with `hg diff`, could we reconsider the command name?

-- 
Denis Laxalde
Logilab http://www.logilab.fr
Re: [PATCH 1 of 2] commands: add "di" alias for "diff"
Jun Wu wrote:

Therefore I'm very sensitive about this. I think we should always make sure "d" = "diff" (although the complaint was only about "di").

I'm also quite used to `hg d`, for what it's worth.
Re: [PATCH] bdiff: replace hash algorithm
> On Nov 6, 2016, at 19:06, Gregory Szorc wrote:
>
> # HG changeset patch
> # User Gregory Szorc
> # Date 1478487117 28800
> # Sun Nov 06 18:51:57 2016 -0800
> # Node ID bb7c6d6f4a10e80ff4bdf88919692f08497d2d66
> # Parent 1c7269484883804b6f960e87309169ef4ae85043
> bdiff: replace hash algorithm
>
> This patch replaces lyhash with the hash algorithm used by diffutils.
> The algorithm has its origins in Git commit 2e9d1410, which is all the
> way back from 1992. The license header in the code at that revision
> is GPL v2.
>
> I have not performed an extensive analysis of the distribution
> (and therefore buckets) of hash output. However, `hg perfbdiff`
> gives some clear wins. I'd like to think that if it is good enough
> for diffutils it is good enough for us?

Searching the Internets seems to reveal that xxHash is the state of the art for fast string hashing with great distribution. We'll have a copy of xxHash vendored as part of zstd and it should be relatively easy to plug in then. Honestly, I'm not sure if we should take the quick win or hold out for xxHash in a few weeks (assuming my compression engine series and zstd vendoring move forward...).

>
> From the mozilla-unified repository:
>
> $ perfbdiff -m 3041e4d59df2
> ! wall 0.053271 comb 0.06 user 0.06 sys 0.00 (best of 100)
> ! wall 0.035827 comb 0.04 user 0.04 sys 0.00 (best of 100)
>
> $ perfbdiff 0e9928989e9c --alldata --count 100
> ! wall 6.204277 comb 6.20 user 6.20 sys 0.00 (best of 3)
> ! wall 4.309710 comb 4.30 user 4.30 sys 0.00 (best of 3)
>
> From the hg repo:
>
> $ perfbdiff 35000 --alldata --count 1000
> ! wall 0.660358 comb 0.66 user 0.66 sys 0.00 (best of 15)
> ! wall 0.534092 comb 0.53 user 0.53 sys 0.00 (best of 19)
>
> Looking at the generated assembly and statistical profiler output
> from the kernel level, I believe there is room to make this function
> even faster. Namely, we're still consuming data character by character
> instead of at the word level. This translates to more loop iterations
> and more instructions.
>
> At this juncture though, the real performance killer is that we're
> hashing every line. We should get a significant speedup if we change
> the algorithm to find the longest prefix, longest suffix, treat those
> as single "lines" and then only do the line splitting and hashing on
> the parts that are different. That will require a lot of C code,
> however. I'm optimistic this approach could result in a ~2x speedup.
>
> diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
> --- a/mercurial/bdiff.c
> +++ b/mercurial/bdiff.c
> @@ -17,6 +17,10 @@
> #include "bitmanipulation.h"
> #include "bdiff.h"
>
> +/* Hash implementation from diffutils */
> +#define ROL(v, n) ((v) << (n) | (v) >> (sizeof(v) * CHAR_BIT - (n)))
> +#define HASH(h, c) ((c) + ROL(h ,7))
> +
> struct pos {
>int pos, len;
> };
> @@ -44,8 +48,7 @@ int bdiff_splitlines(const char *a, ssiz
>/* build the line array and calculate hashes */
>hash = 0;
>for (p = a; p < a + len; p++) {
> -/* Leonid Yuriev's hash */
> -hash = (hash * 1664525) + (unsigned char)*p + 1013904223;
> +hash = HASH(hash, *p);
>
>if (*p == '\n' || p == plast) {
>l->hash = hash;
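To compare the two per-line hashes from the diff side by side, here is a rough Python transcription. It assumes a 32-bit unsigned hash value and treats every input byte as unsigned; the C code passes `*p` as a plain `char`, so bytes above 0x7f may behave differently there. The function names are made up for this sketch.

```python
MASK32 = 0xFFFFFFFF  # assumption: the C hash variable is 32 bits wide


def rol32(value, count):
    """Rotate a 32-bit value left by count bits (0 < count < 32)."""
    return ((value << count) | (value >> (32 - count))) & MASK32


def hash_diffutils(line):
    """Rotate-and-add hash, following the ROL/HASH macros in the patch."""
    h = 0
    for byte in bytearray(line):
        h = (byte + rol32(h, 7)) & MASK32
    return h


def hash_lyhash(line):
    """The multiply-and-add hash that the patch removes."""
    h = 0
    for byte in bytearray(line):
        h = (h * 1664525 + byte + 1013904223) & MASK32
    return h
```

The rotate-and-add variant needs no multiplication per byte, which may account for part of the timing difference reported above; the sketch says nothing about how well either hash distributes real input.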