[PATCH 2 of 2] procutil: don't allow the main 'hg' script to be treated as the Windows exe

2018-11-23 Thread Matt Harbison
# HG changeset patch
# User Matt Harbison 
# Date 1543030077 18000
#  Fri Nov 23 22:27:57 2018 -0500
# Node ID 1f9de5636e5f7f4bfe2d3fb8c5dde543a1870161
# Parent  2abf33243bea3e4679ac944315d82fce21918d8f
procutil: don't allow the main 'hg' script to be treated as the Windows exe

Previously, there were a handful of errors like this:

 $ hg prefetch --repack
 (running background incremental repack)
  +  abort: %1 is not a valid Win32 application
  +  [255]

CreateProcess() doesn't append .exe when `lpApplicationName` contains a path,
and a python script isn't directly executable.
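
A minimal sketch of the selection rule this patch changes, assuming a
simplified stand-in for hgexecutable(). The helper name pick_hg_launcher and
its parameters are illustrative only and are not part of Mercurial:

```python
import os


def pick_hg_launcher(main_file, is_windows, fallback):
    """Return the path to spawn for background 'hg' commands.

    main_file  -- value of __main__.__file__ ('' if unavailable)
    is_windows -- True when running on Windows
    fallback   -- what a PATH lookup would return (hypothetical input)
    """
    # The main script is only usable as a launcher where the OS can execute
    # it directly (through its shebang line), which is not the case on
    # Windows.
    if not is_windows and os.path.basename(main_file) == 'hg':
        return main_file
    return fallback
```

With is_windows=True the script path is never returned, so the caller falls
back to whatever executable lookup it would otherwise use.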

diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py
--- a/mercurial/utils/procutil.py
+++ b/mercurial/utils/procutil.py
@@ -241,7 +241,7 @@ def hgexecutable():
                 _sethgexecutable(encoding.environ['EXECUTABLEPATH'])
             else:
                 _sethgexecutable(pycompat.sysexecutable)
-        elif (os.path.basename(
+        elif (not pycompat.iswindows and os.path.basename(
             pycompat.fsencode(getattr(mainmod, '__file__', ''))) == 'hg'):
             _sethgexecutable(pycompat.fsencode(mainmod.__file__))
         else:
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 1 of 2] remotefilelog: drop some compatibility cruft for finding the hg executable

2018-11-23 Thread Matt Harbison
# HG changeset patch
# User Matt Harbison 
# Date 1543029536 18000
#  Fri Nov 23 22:18:56 2018 -0500
# Node ID 2abf33243bea3e4679ac944315d82fce21918d8f
# Parent  197f7eebf5f89fb2b9d0e117157b4a040dde0a89
remotefilelog: drop some compatibility cruft for finding the hg executable

diff --git a/hgext/remotefilelog/repack.py b/hgext/remotefilelog/repack.py
--- a/hgext/remotefilelog/repack.py
+++ b/hgext/remotefilelog/repack.py
@@ -34,15 +34,8 @@ osutil = policy.importmod(r'osutil')
 class RepackAlreadyRunning(error.Abort):
     pass
 
-if util.safehasattr(util, '_hgexecutable'):
-    # Before 5be286db
-    _hgexecutable = util.hgexecutable
-else:
-    from mercurial.utils import procutil
-    _hgexecutable = procutil.hgexecutable
-
 def backgroundrepack(repo, incremental=True, packsonly=False):
-    cmd = [_hgexecutable(), '-R', repo.origroot, 'repack']
+    cmd = [procutil.hgexecutable(), '-R', repo.origroot, 'repack']
     msg = _("(running background repack)\n")
     if incremental:
         cmd.append('--incremental')


[PATCH 3 of 3 RESEND] ui: manage logger instances and event filtering by core ui

2018-11-23 Thread Yuya Nishihara
# HG changeset patch
# User Yuya Nishihara 
# Date 1541927313 -32400
#  Sun Nov 11 18:08:33 2018 +0900
# Node ID 35fa768c3fbc6f5fa5f5a6108f10bf782cc20393
# Parent  c178d702bd35d0e4f3f3903c9a50f48221359d08
ui: manage logger instances and event filtering by core ui

The setup code in blackbox needs more tweaks since it relies on a lot of
black magic. I'll fix that in follow-up patches.

To be clear, the goal of this series is to provide a proper way for the
command server to install its own logger. I need it to debug the in-memory
repository cache.
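
A rough sketch of the division of labour described above, assuming the core
ui keeps a name-to-logger registry and checks tracked() before calling log().
SketchUi and ListLogger are hypothetical stand-ins, since the ui.py part of
this patch is cut off in this archive:

```python
class SketchUi(object):
    """Hypothetical stand-in for the logger bookkeeping in the core ui."""

    def __init__(self, src=None):
        # A copied ui shares the logger objects but owns its registry,
        # mirroring the `src._loggers.copy()` line in the patch.
        self._loggers = src._loggers.copy() if src is not None else {}

    def setlogger(self, name, logger):
        self._loggers[name] = logger

    def log(self, event, msg, **opts):
        # Event filtering happens here, so a logger's log() may assume
        # that the event is one it tracks.
        for logger in self._loggers.values():
            if logger.tracked(event):
                logger.log(self, event, msg, opts)


class ListLogger(object):
    """Toy logger that records the events it is asked to log."""

    def __init__(self, events):
        self._events = set(events)
        self.records = []

    def tracked(self, event):
        return event in self._events

    def log(self, ui, event, msg, opts):
        self.records.append((event, msg))
```

This is why blackboxlogger.log() can drop its own tracked() check and
processlogger.log() can index self._scripts[event] directly in the diff below.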

diff --git a/hgext/blackbox.py b/hgext/blackbox.py
--- a/hgext/blackbox.py
+++ b/hgext/blackbox.py
@@ -53,7 +53,6 @@ from mercurial import (
 pycompat,
 registrar,
 ui as uimod,
-util,
 )
 from mercurial.utils import (
 dateutil,
@@ -147,9 +146,6 @@ class blackboxlogger(object):
 
 def log(self, ui, event, msg, opts):
 global _lastlogger
-if not self.tracked(event):
-return
-
 if self._bbvfs:
 _lastlogger = self
 elif _lastlogger and _lastlogger._bbvfs:
@@ -201,33 +197,20 @@ class blackboxlogger(object):
 
 def wrapui(ui):
 class blackboxui(ui.__class__):
-def __init__(self, src=None):
-super(blackboxui, self).__init__(src)
-if src and r'_bblogger' in src.__dict__:
-self._bblogger = src._bblogger
-
-# trick to initialize logger after configuration is loaded, which
-# can be replaced later with blackboxlogger(ui) in uisetup(), where
-# both user and repo configurations should be available.
-@util.propertycache
-def _bblogger(self):
-return blackboxlogger(self)
-
 def debug(self, *msg, **opts):
 super(blackboxui, self).debug(*msg, **opts)
 if self.debugflag:
 self.log('debug', '%s', ''.join(msg))
 
-def log(self, event, *msg, **opts):
-super(blackboxui, self).log(event, *msg, **opts)
-self._bblogger.log(self, event, msg, opts)
-
 ui.__class__ = blackboxui
 uimod.ui = blackboxui
 
 def uisetup(ui):
 wrapui(ui)
 
+def uipopulate(ui):
+ui.setlogger(b'blackbox', blackboxlogger(ui))
+
 def reposetup(ui, repo):
 # During 'hg pull' a httppeer repo is created to represent the remote repo.
 # It doesn't have a .hg directory to put a blackbox in, so we don't do
@@ -235,7 +218,10 @@ def reposetup(ui, repo):
 if not repo.local():
 return
 
-logger = getattr(ui, '_bblogger', None)
+# Since blackbox.log is stored in the repo directory, the logger should be
+# instantiated per repository.
+logger = blackboxlogger(ui)
+ui.setlogger(b'blackbox', logger)
 if logger:
 logger.setrepo(repo)
 
diff --git a/hgext/logtoprocess.py b/hgext/logtoprocess.py
--- a/hgext/logtoprocess.py
+++ b/hgext/logtoprocess.py
@@ -38,7 +38,6 @@ import os
 
 from mercurial import (
 pycompat,
-util,
 )
 from mercurial.utils import (
 procutil,
@@ -63,9 +62,7 @@ class processlogger(object):
 return bool(self._scripts.get(event))
 
 def log(self, ui, event, msg, opts):
-script = self._scripts.get(event)
-if not script:
-return
+script = self._scripts[event]
 env = {
 b'EVENT': event,
 b'HGPID': os.getpid(),
@@ -77,24 +74,5 @@ class processlogger(object):
 fullenv = procutil.shellenviron(env)
 procutil.runbgcommand(script, fullenv, shell=True)
 
-def uisetup(ui):
-
-class logtoprocessui(ui.__class__):
-def __init__(self, src=None):
-super(logtoprocessui, self).__init__(src)
-if src and r'_ltplogger' in src.__dict__:
-self._ltplogger = src._ltplogger
-
-# trick to initialize logger after configuration is loaded, which
-# can be replaced later with processlogger(ui) in uisetup(), where
-# both user and repo configurations should be available.
-@util.propertycache
-def _ltplogger(self):
-return processlogger(self)
-
-def log(self, event, *msg, **opts):
-self._ltplogger.log(self, event, msg, opts)
-return super(logtoprocessui, self).log(event, *msg, **opts)
-
-# Replace the class for this instance and all clones created from it:
-ui.__class__ = logtoprocessui
+def uipopulate(ui):
+ui.setlogger(b'logtoprocess', processlogger(ui))
diff --git a/mercurial/ui.py b/mercurial/ui.py
--- a/mercurial/ui.py
+++ b/mercurial/ui.py
@@ -235,6 +235,7 @@ class ui(object):
 self._fmsgout = src._fmsgout
 self._fmsgerr = src._fmsgerr
 self._finoutredirected = src._finoutredirected
+self._loggers = src._loggers.copy()
 self.pageractive = src.pageractive
 self._disablepager = src._disablepager
 self._tweaked = src._tweaked
@@ -263,6 +264,7 @@ class ui(object):
 self._fmsgout 

[PATCH 2 of 3 RESEND] extensions: add "uipopulate" hook, called per instance, not per process

2018-11-23 Thread Yuya Nishihara
# HG changeset patch
# User Yuya Nishihara 
# Date 1542024651 -32400
#  Mon Nov 12 21:10:51 2018 +0900
# Node ID c178d702bd35d0e4f3f3903c9a50f48221359d08
# Parent  2ca38e8c9fe7464596073de8c6c35a291c1e09c4
extensions: add "uipopulate" hook, called per instance, not per process

In short, this is the "reposetup" function for ui. It allows us to modify
ui attributes without extending ui.__class__. Before, the only way to do
that was to abuse the config dictionary, which is copied across ui instances.

See the next patch for a usage example.
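
As a self-contained illustration of the hook dispatch (not the actual
implementation, which is in the diff below), the following sketch calls an
optional uipopulate() on each module and isolates failures. FakeUi, populate
and the SimpleNamespace "extensions" are stand-ins invented for this example:

```python
import types


def populate(ui, modules, warn):
    """Call each module's optional uipopulate(ui); report failures via warn."""
    for name, mod in modules:
        hook = getattr(mod, 'uipopulate', None)
        if not hook:
            continue
        try:
            hook(ui)
        except Exception as inst:
            # One failing extension must not prevent the others from running.
            warn('failed to populate ui by extension %s: %s' % (name, inst))


class FakeUi(object):
    """Hypothetical ui stand-in that just stores attributes."""


def _good(ui):
    ui.marker = 'set by extension'


def _bad(ui):
    raise ValueError('boom')


good = types.SimpleNamespace(uipopulate=_good)
bad = types.SimpleNamespace(uipopulate=_bad)
plain = types.SimpleNamespace()  # extension that does not define the hook

ui = FakeUi()
warnings = []
populate(ui, [('bad', bad), ('plain', plain), ('good', good)], warnings.append)
```

Because the hook may run more than once for the same ui, a real uipopulate()
should be written so that repeated calls are harmless.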

diff --git a/mercurial/chgserver.py b/mercurial/chgserver.py
--- a/mercurial/chgserver.py
+++ b/mercurial/chgserver.py
@@ -246,6 +246,10 @@ def _loadnewui(srcui, args):
 rpath = options['repository']
 path, newlui = dispatch._getlocal(newui, rpath, wd=cwd)
 
+extensions.populateui(newui)
+if newui is not newlui:
+extensions.populateui(newlui)
+
 return (newui, newlui)
 
 class channeledsystem(object):
diff --git a/mercurial/dispatch.py b/mercurial/dispatch.py
--- a/mercurial/dispatch.py
+++ b/mercurial/dispatch.py
@@ -866,6 +866,9 @@ def _dispatch(req):
 # Check abbreviation/ambiguity of shell alias.
 shellaliasfn = _checkshellalias(lui, ui, args)
 if shellaliasfn:
+# no additional configs will be set, set up the ui instances
+for ui_ in uis:
+extensions.populateui(ui_)
 return shellaliasfn()
 
 # check for fallback encoding
@@ -948,6 +951,10 @@ def _dispatch(req):
 for ui_ in uis:
 ui_.disablepager()
 
+# configs are fully loaded, set up the ui instances
+for ui_ in uis:
+extensions.populateui(ui_)
+
 if options['version']:
 return commands.version_(ui)
 if options['help']:
diff --git a/mercurial/extensions.py b/mercurial/extensions.py
--- a/mercurial/extensions.py
+++ b/mercurial/extensions.py
@@ -405,6 +405,25 @@ def afterloaded(extension, callback):
     else:
         _aftercallbacks.setdefault(extension, []).append(callback)
 
+def populateui(ui):
+    """Run extension hooks on the given ui to populate additional members,
+    extend the class dynamically, etc.
+
+    This will be called after the configuration is loaded, and/or extensions
+    are loaded. In general, it's once per ui instance, but in command-server
+    and hgweb, this may be called more than once with the same ui.
+    """
+    for name, mod in extensions(ui):
+        hook = getattr(mod, 'uipopulate', None)
+        if not hook:
+            continue
+        try:
+            hook(ui)
+        except Exception as inst:
+            ui.traceback(force=True)
+            ui.warn(_('*** failed to populate ui by extension %s: %s\n')
+                    % (name, stringutil.forcebytestr(inst)))
+
 def bind(func, *args):
     '''Partial function application
 
diff --git a/mercurial/help/internals/extensions.txt b/mercurial/help/internals/extensions.txt
--- a/mercurial/help/internals/extensions.txt
+++ b/mercurial/help/internals/extensions.txt
@@ -183,6 +183,29 @@ Command table setup
 After ``extsetup``, the ``cmdtable`` is copied into the global command table
 in Mercurial.
 
+Ui instance setup
+-----------------
+
+The optional ``uipopulate`` is called for each ``ui`` instance after
+configuration is loaded, where extensions can set up additional ui members,
+update configuration by ``ui.setconfig()``, and extend the class dynamically.
+
+Typically there are three ``ui`` instances involved in command execution:
+
+``req.ui`` (or ``repo.baseui``)
+Only system and user configurations are loaded into it.
+``lui``
+Local repository configuration is loaded as well. This will be used at
+early dispatching stage where a repository isn't available.
+``repo.ui``
+The fully-loaded ``ui`` used after a repository is instantiated. This
+will be created from the ``req.ui`` per repository.
+
+In command server and hgweb, this may be called more than once for the same
+``ui`` instance.
+
+(New in Mercurial 4.9)
+
 Repository setup
 ----------------
 
@@ -304,7 +327,8 @@ uisetup
   a change made here will be visible by other extensions during ``extsetup``.
 * Monkeypatches or function wraps (``extensions.wrapfunction``) of ``dispatch``
   module members
-* Setup of ``pre-*`` and ``post-*`` hooks
+* Set up ``pre-*`` and ``post-*`` hooks. (DEPRECATED. ``uipopulate`` is
+  preferred on Mercurial 4.9 and later.)
 * ``pushkey`` setup
 
 extsetup
@@ -314,9 +338,17 @@ extsetup
 * Add a global option to all commands
 * Extend revsets
 
+uipopulate
+----------
+
+* Modify ``ui`` instance attributes and configuration variables.
+* Changes to ``ui.__class__`` per instance.
+* Set up all hooks per scoped configuration.
+
 reposetup
 ---------
 
-* All hooks but ``pre-*`` and ``post-*``
+* Set up all hooks but ``pre-*`` and ``post-*``. (DEPRECATED. ``uipopulate`` is
+  preferred on Mercurial 4.9 and later.)
 * 

[PATCH 1 of 3 RESEND] hgweb: load globally-enabled extensions explicitly

2018-11-23 Thread Yuya Nishihara
# HG changeset patch
# User Yuya Nishihara 
# Date 1542449505 -32400
#  Sat Nov 17 19:11:45 2018 +0900
# Node ID 2ca38e8c9fe7464596073de8c6c35a291c1e09c4
# Parent  50a64c321c1e74b98ec1fa959bdc92efdc6f4ee7
hgweb: load globally-enabled extensions explicitly

Before, extensions were loaded as a side effect of hg.repository() if the
hgweb was executed as a CGI/WSGI. I want to make it explicit so that another
ui hook can be inserted after extensions.loadall().

diff --git a/mercurial/hgweb/hgweb_mod.py b/mercurial/hgweb/hgweb_mod.py
--- a/mercurial/hgweb/hgweb_mod.py
+++ b/mercurial/hgweb/hgweb_mod.py
@@ -22,6 +22,7 @@ from .common import (
 from .. import (
 encoding,
 error,
+extensions,
 formatter,
 hg,
 hook,
@@ -212,6 +213,7 @@ class hgweb(object):
 u = baseui.copy()
 else:
 u = uimod.ui.load()
+extensions.loadall(u)
 r = hg.repository(u, repo)
 else:
 # we trust caller to give us a private copy
diff --git a/mercurial/hgweb/hgwebdir_mod.py b/mercurial/hgweb/hgwebdir_mod.py
--- a/mercurial/hgweb/hgwebdir_mod.py
+++ b/mercurial/hgweb/hgwebdir_mod.py
@@ -30,6 +30,7 @@ from .. import (
 configitems,
 encoding,
 error,
+extensions,
 hg,
 profiling,
 pycompat,
@@ -268,6 +269,9 @@ class hgwebdir(object):
 self.lastrefresh = 0
 self.motd = None
 self.refresh()
+if not baseui:
+# set up environment for new ui
+extensions.loadall(self.ui)
 
 def refresh(self):
 if self.ui:


D5291: branchmap: build the revbranchcache._namesreverse() only when required

2018-11-23 Thread pulkit (Pulkit Goyal)
This revision was automatically updated to reflect the committed changes.
Closed by commit rHG50a64c321c1e: branchmap: build the 
revbranchcache._namesreverse() only when required (authored by pulkit, 
committed by ).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D5291?vs=12586&id=12592

REVISION DETAIL
  https://phab.mercurial-scm.org/D5291

AFFECTED FILES
  mercurial/branchmap.py

CHANGE DETAILS

diff --git a/mercurial/branchmap.py b/mercurial/branchmap.py
--- a/mercurial/branchmap.py
+++ b/mercurial/branchmap.py
@@ -397,15 +397,18 @@
             self._names = []
         self._rbcnamescount = len(self._names) # number of names read at
                                                # _rbcsnameslen
-        self._namesreverse = dict((b, r) for r, b in enumerate(self._names))
 
     def _clear(self):
         self._rbcsnameslen = 0
         del self._names[:]
         self._rbcnamescount = 0
-        self._namesreverse.clear()
         self._rbcrevslen = len(self._repo.changelog)
         self._rbcrevs = bytearray(self._rbcrevslen * _rbcrecsize)
+        util.clearcachedproperty(self, '_namesreverse')
+
+    @util.propertycache
+    def _namesreverse(self):
+        return dict((b, r) for r, b in enumerate(self._names))
 
     def branchinfo(self, rev):
         """Return branch name and close flag for rev, using and updating



To: pulkit, #hg-reviewers
Cc: yuja, mercurial-devel
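
To illustrate the "build only when required" pattern this change relies on,
here is a simplified stand-in for a caching property and its reset helper.
The two helpers below are written for this example and only approximate
Mercurial's util.propertycache and util.clearcachedproperty; NameTable is a
hypothetical class:

```python
class propertycache(object):
    """Simplified stand-in: compute on first access, then cache on the
    instance so later lookups bypass the descriptor."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Non-data descriptor: the instance attribute now shadows it.
        obj.__dict__[self.name] = value
        return value


def clearcachedproperty(obj, prop):
    """Drop the cached value so that the next access recomputes it."""
    obj.__dict__.pop(prop, None)


class NameTable(object):
    def __init__(self, names):
        self._names = list(names)
        self.builds = 0  # counts how often the reverse map is built

    @propertycache
    def _namesreverse(self):
        self.builds += 1
        return dict((b, r) for r, b in enumerate(self._names))

    def clear(self):
        del self._names[:]
        clearcachedproperty(self, '_namesreverse')
```

The reverse mapping is never built for callers that only read names, and
clearing the table invalidates the cache instead of mutating a dict that may
not exist yet.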


D5290: branchmap: refactor for better encapsulation

2018-11-23 Thread yuja (Yuya Nishihara)
yuja added a comment.


  Can you split this to a couple of patches?
  
  The idea sounds good, but it isn't easy to review formatting changes,
  code moves, and interface improvements as a single patch.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5290

To: mjpieters, #hg-reviewers
Cc: yuja, mercurial-devel




D5300: py3: replace str() with pycompat.bytestr() or ('%d' % int)

2018-11-23 Thread yuja (Yuya Nishihara)
yuja added a comment.


  > -    revenc = lambda x: wrev if x is None else str(x) + wrevpad
  > -    csetenc = lambda x: wnode if x is None else str(x) + ' '
  > +    revenc = lambda x: wrev if x is None else ('%d' % x) + wrevpad
  > +    csetenc = lambda x: wnode if x is None else pycompat.bytestr(x) + ' '
  
  check-code complains that the line is too long. Maybe rewrite as a function?
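
  One possible shape for that rewrite, as a standalone sketch. The factory
  name makeencoders and the tobytes parameter (standing in for
  pycompat.bytestr) are assumptions made so the example runs outside
  Mercurial:

```python
def makeencoders(wrev, wrevpad, wnode, tobytes=bytes):
    """Return (revenc, csetenc) as named functions instead of lambdas.

    tobytes stands in for pycompat.bytestr in this sketch.
    """
    def revenc(x):
        if x is None:
            return wrev
        return b'%d' % x + wrevpad

    def csetenc(x):
        if x is None:
            return wnode
        return tobytes(x) + b' '

    return revenc, csetenc
```

  Named functions keep every line short and leave room for a docstring.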

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5300

To: pulkit, #hg-reviewers
Cc: yuja, mercurial-devel




D5292: branchmap: make it easier for extensions not to break branchcache

2018-11-23 Thread yuja (Yuya Nishihara)
yuja added a comment.


  >   The _branchcache global gives us a reference for super() to use even if an
  >   extension subclasses branchmap.branchcache then replaces the class in the
  >   module.
  
  Can't we instead add a factory function which can be easily hooked by
  extensions?
  
  It should be discouraged to replace a class globally.
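
  A sketch of what such a factory hook could look like. Every name here
  (the branchmap namespace object, makebranchcache, tracingcache) is
  hypothetical and only illustrates the idea of wrapping a function instead
  of replacing a class:

```python
import types

# Hypothetical stand-in for the branchmap module.
branchmap = types.SimpleNamespace()


class branchcache(dict):
    """Stand-in for the real class; only here to make the sketch runnable."""


def makebranchcache(*args, **kwargs):
    """Hypothetical factory that core code would call instead of the class."""
    return branchcache(*args, **kwargs)


branchmap.branchcache = branchcache
branchmap.makebranchcache = makebranchcache


# Extension side: subclass freely, but hook the factory only.
class tracingcache(branchcache):
    hooked = True


def extsetup():
    orig = branchmap.makebranchcache

    def wrapped(*args, **kwargs):
        return tracingcache(orig(*args, **kwargs))

    branchmap.makebranchcache = wrapped
```

  The class object in the module stays untouched, so super() in core code
  keeps resolving against the original class without an extra global.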

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5292

To: mjpieters, #hg-reviewers
Cc: yuja, mercurial-devel




Re: [PATCH 1 of 8 V6] perf: add a `perftracecopies` command to benchmark copy tracking logic

2018-11-23 Thread Yuya Nishihara
On Sat, 24 Nov 2018 11:29:48 +0900, Yuya Nishihara wrote:
> On Mon, 19 Nov 2018 17:49:40 +0100, Boris Feld wrote:
> > # HG changeset patch
> > # User Boris Feld 
> > # Date 1542628825 0
> > #  Mon Nov 19 12:00:25 2018 +0000
> > # Node ID 40c285f3b12012727bcdfd11984d81fe56386316
> > # Parent  dba590f27c7abacbd7e9b27f3e06822bb0b339cb
> > # EXP-Topic copy-perf
> > # Available At https://bitbucket.org/octobus/mercurial-devel/
> > #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 40c285f3b120
> > perf: add a `perftracecopies` command to benchmark copy tracking logic
> 
> > diff --git a/contrib/perf.py b/contrib/perf.py
> > --- a/contrib/perf.py
> > +++ b/contrib/perf.py
> > @@ -1146,6 +1146,24 @@ def perftemplating(ui, repo, testedtempl
> >  timer(format)
> >  fm.end()
> >  
> > +@command(b'perftracecopies', formatteropts +
> > + [
> > +  (b's', b'source', b'', b'copy tracing source'),
> > +  (b'd', b'destination', b'', b'copy tracing destination'),
> > + ])
> > +def perftracecopies(ui, repo, source, destination, **opts):
> > +"""measure time necessary to trace copy between <source> and <destination>
> > +"""
> > +opts = _byteskwargs(opts)
> > +timer, fm = gettimer(ui, opts)
> > +src = scmutil.revsingle(repo, source)
> > +dst = scmutil.revsingle(repo, destination)
> > +
> > +def runone():
> > +copies.pathcopies(repo[src], repo[dst])
> > +timer(runone)
> > +fm.end()
> 
> I just found there's perfpathcopies.

Dropped this as I can rebase the descendants right now.


Re: [PATCH 8 of 8 V6] context: floor adjustlinkrev graph walk during copy tracing

2018-11-23 Thread Yuya Nishihara
On Mon, 19 Nov 2018 17:49:47 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld 
> # Date 1539125437 -7200
> #  Wed Oct 10 00:50:37 2018 +0200
> # Node ID 62fe8adca90eeba238fa313827f4714fed6c34a5
> # Parent  3b75faab24d72c1e5689d352ff13b87b9f9faa51
> # EXP-Topic copy-perf
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 62fe8adca90e
> context: floor adjustlinkrev graph walk during copy tracing

> @@ -750,6 +754,8 @@ class basefilectx(object):
>  fnode = self._filenode
>  path = self._path
>  for a in iteranc:
> +if stoprev is not None and a < stoprev:
> +return None

So, we rely on the fact that iteranc is sorted by revision number. Can you
update the docstring of revlog.ancestors? It only says "reverse topological
order", which is a weaker constraint.
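
A small sketch of why the ordering guarantee matters for the early return.
Plain lists of revision numbers stand in for iteranc, and the helper name is
illustrative:

```python
def walk_until_floor(iteranc, stoprev=None):
    """Collect ancestors until one falls below stoprev, then stop."""
    seen = []
    for a in iteranc:
        if stoprev is not None and a < stoprev:
            break
        seen.append(a)
    return seen


# History: 0 is the root, 1 and 2 are both children of 0, 3 merges 1 and 2.
by_revnum = [3, 2, 1, 0]          # descending revision numbers
topo_only = [3, 1, 2, 0]          # also a valid reverse topological order

complete = walk_until_floor(by_revnum, stoprev=2)
truncated = walk_until_floor(topo_only, stoprev=2)
```

With descending revision numbers the walk sees both 3 and 2. With an order
that is merely reverse topological it stops at 1 and never visits 2, even
though 2 is not below the floor.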


Re: [PATCH 1 of 8 V6] perf: add a `perftracecopies` command to benchmark copy tracking logic

2018-11-23 Thread Yuya Nishihara
On Mon, 19 Nov 2018 17:49:40 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld 
> # Date 1542628825 0
> #  Mon Nov 19 12:00:25 2018 +0000
> # Node ID 40c285f3b12012727bcdfd11984d81fe56386316
> # Parent  dba590f27c7abacbd7e9b27f3e06822bb0b339cb
> # EXP-Topic copy-perf
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 40c285f3b120
> perf: add a `perftracecopies` command to benchmark copy tracking logic

> diff --git a/contrib/perf.py b/contrib/perf.py
> --- a/contrib/perf.py
> +++ b/contrib/perf.py
> @@ -1146,6 +1146,24 @@ def perftemplating(ui, repo, testedtempl
>  timer(format)
>  fm.end()
>  
> +@command(b'perftracecopies', formatteropts +
> + [
> +  (b's', b'source', b'', b'copy tracing source'),
> +  (b'd', b'destination', b'', b'copy tracing destination'),
> + ])
> +def perftracecopies(ui, repo, source, destination, **opts):
> +"""measure time necessary to trace copy between <source> and <destination>
> +"""
> +opts = _byteskwargs(opts)
> +timer, fm = gettimer(ui, opts)
> +src = scmutil.revsingle(repo, source)
> +dst = scmutil.revsingle(repo, destination)
> +
> +def runone():
> +copies.pathcopies(repo[src], repo[dst])
> +timer(runone)
> +fm.end()

I just found there's perfpathcopies.


Re: [PATCH 5 of 6 V2] match: avoid translating glob to matcher multiple times for large sets

2018-11-23 Thread Yuya Nishihara
On Fri, 23 Nov 2018 15:51:58 -0800, Martin von Zweigbergk wrote:
> On Fri, Nov 23, 2018 at 9:20 AM Boris FELD  wrote:
> > So I feel like it is fine to just rely on the size limit.
> > >> Perhaps it's been fixed since 2.7.4. The regexp code width is extended
> > >> from 16bit to 32bit (or Py_UCS4) integer. That should be large enough to
> > >> handle practical patterns.
> > >>
> > >> https://bugs.python.org/issue1160
> >
> > Thanks for digging this out. It looks like we may be able to drop this
> > limit altogether. However, I would like to make it a change distinct
> > from this series.
> >
> > The current code is very problematic for some people (to the point where
> > the majority of `hg status` time is spent in that function). I would
> > like to get fast code for the same semantic first. Then look into
> > changing the semantic.
> >
> 
> Is your concern that you might regress in performance of something by
> changing how large the groups are? Or that it would be more work?
> 
> I tried creating a regex for *every* pattern and that actually seemed
> faster (to my surprise), both when creating the matcher and when evaluating
> it. I tried it on the mozilla-unified repo both with 1k files and with 10k
> files in the hgignores. I used the following patch on top of your series.

Wow. If we don't need to combine patterns into one, numbered groups should
just work.
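
For reference, the two strategies being compared can be sketched as follows.
The function names and sample patterns are made up for illustration, and the
sketch makes no claim about relative speed:

```python
import re


def combined_matcher(patterns):
    """Join all patterns into one alternation and compile it once."""
    regex = re.compile('|'.join('(?:%s)' % p for p in patterns))
    return lambda path: regex.match(path) is not None


def separate_matcher(patterns):
    """Compile one regex per pattern and try them in turn."""
    regexes = [re.compile(p) for p in patterns]
    return lambda path: any(r.match(path) for r in regexes)


patterns = [r'.*\.pyc$', r'build/', r'(\w+)\.orig$']
combined = combined_matcher(patterns)
separate = separate_matcher(patterns)
```

Both matchers accept the same paths. With one regex per pattern, each
compiled object only contains its own groups, so no pattern has to be merged
into a shared group numbering.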


Re: [PATCH 5 of 7 V5] sparse-revlog: introduce native (C) implementation of slicechunktodensity

2018-11-23 Thread Yuya Nishihara
On Thu, 22 Nov 2018 19:08:07 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld 
> # Date 1542276598 -3600
> #  Thu Nov 15 11:09:58 2018 +0100
> # Node ID 606a1aa722602ac9d76e4ae6d0ea54b088b50414
> # Parent  cc7132133f0e391b53985b4d08072304032fd444
> # EXP-Topic sparse-perf
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 606a1aa72260
> sparse-revlog: introduce native (C) implementation of slicechunktodensity

> diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c
> --- a/mercurial/cext/revlog.c
> +++ b/mercurial/cext/revlog.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "bitmanipulation.h"
> @@ -1095,6 +1096,231 @@ static Py_ssize_t trim_endidx(indexObjec
>   return endidx;
>  }
>  
> +struct Gap {
> + int64_t size;
> + Py_ssize_t idx;
> +};
> +
> +static int gap_compare(const void *left, const void *right)
> +{
> + const struct Gap *l_left = ((const struct Gap *)left);
> + const struct Gap *l_right = ((const struct Gap *)right);
> + if (l_left->size < l_right->size) {
> + return -1;
> + } else if (l_left->size > l_right->size) {
> + return 1;
> + }
> + return 0;
> +}
> +static int Py_ssize_t_compare(const void *left, const void *right)
> +{
> + const Py_ssize_t l_left = *(const Py_ssize_t *)left;
> + const Py_ssize_t l_right = *(const Py_ssize_t *)right;
> + if (l_left < l_right) {
> + return -1;
> + } else if (l_left > l_right) {
> + return 1;
> + }
> + return 0;
> +}
> +
> +static PyObject *index_slicechunktodensity(indexObject *self, PyObject *args)
> +{
> + /* method arguments */
> + PyObject *list_revs = NULL; /* revisions in the chain */
> + double targetdensity = 0;   /* min density to achieve */
> + Py_ssize_t mingapsize = 0;  /* threshold to ignore gaps */
> +
> + /* other core variables */
> + Py_ssize_t i;/* used for various iteration */
> + PyObject *result = NULL; /* the final return of the function */
> +
> + /* generic information about the delta chain being slice */
> + Py_ssize_t num_revs = 0;/* size of the full delta chain */
> + Py_ssize_t *revs = NULL;/* native array of revision in the chain */
> + int64_t chainpayload = 0;   /* sum of all delta in the chain */
> + int64_t deltachainspan = 0; /* distance from first byte to last byte */
> +
> + /* variable used for slicing the delta chain */
> + int64_t readdata = 0; /* amount of data currently planned to be read */
> + double density = 0;   /* ration of payload data compared to read ones */
> + struct Gap *gaps = NULL; /* array of notable gap in the chain */
> + Py_ssize_t num_gaps =
> + 0; /* total number of notable gap recorded so far */
> + Py_ssize_t *selected_indices = NULL; /* indices of gap skipped over */
> + Py_ssize_t num_selected = 0; /* number of gaps skipped */
> + PyObject *chunk = NULL;  /* individual slice */
> + PyObject *allchunks = NULL;  /* all slices */
> +
> + /* parsing argument */
> + if (!PyArg_ParseTuple(args, "O!dl", &PyList_Type, &list_revs,
> +   &targetdensity, &mingapsize)) {

Fixed format notation of mingapsize to "n". It's Py_ssize_t.

> + goto bail;
> + }
> +
> + /* If the delta chain contains a single element, we do not need slicing
> +  */
> + num_revs = PyList_GET_SIZE(list_revs);
> + if (num_revs <= 1) {
> + result = PyTuple_Pack(1, list_revs);
> + goto done;
> + }
> +
> + /* Turn the python list into a native integer array (for efficiency) */
> + revs = (Py_ssize_t *)calloc(num_revs, sizeof(Py_ssize_t));
> + if (revs == NULL) {
> + PyErr_NoMemory();
> + goto bail;
> + }
> + Py_ssize_t idxlen = index_length(self);

Moved this declaration to top.

> + for (i = 0; i < num_revs; i++) {
> + Py_ssize_t revnum;
> + if (!pylong_to_long(PyList_GET_ITEM(list_revs, i), &revnum)) {
> + goto bail;
> + }

Replaced with PyInt_AsLong() to fix pointer type mismatch. The return value
is checked by the code below.

> + if (revnum == -1 && PyErr_Occurred()) {
> + goto bail;
> + }
> + if (revnum < 0 || revnum >= idxlen) {
> + PyErr_SetString(PyExc_IndexError, "index out of range");
> + goto bail;
> + }
> + revs[i] = revnum;
> + }
> +
> + /* Compute and check various property of the unsliced delta chain */
> + deltachainspan = index_segment_span(self, revs[0], revs[num_revs - 1]);
> + if (deltachainspan < 0) {
> + goto bail;
> + }
> +
> + if (deltachainspan <= mingapsize) {
> + 

Re: [PATCH 4 of 7 V5] sparse-revlog: add a `trim_endidx` function in C

2018-11-23 Thread Yuya Nishihara
On Thu, 22 Nov 2018 19:08:06 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld 
> # Date 1541785523 -3600
> #  Fri Nov 09 18:45:23 2018 +0100
> # Node ID cc7132133f0e391b53985b4d08072304032fd444
> # Parent  4ad3891a07ed83cda837db8d0cbe285ebb377869
> # EXP-Topic sparse-perf
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r cc7132133f0e
> sparse-revlog: add a `trim_endidx` function in C
> 
> We are about to implement a native version of `slicechunktodensity`. For
> clarity, we introduce the helper functions first.
> 
> This function implement a subpart of the python function `_trimchunk` in
> `mercurial/revlogutils/deltas.py`. Handling of actual Python objects is left
> to the caller function.
> 
> diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c
> --- a/mercurial/cext/revlog.c
> +++ b/mercurial/cext/revlog.c
> @@ -1077,6 +1077,24 @@ index_segment_span(indexObject *self, Py
>   return (end_offset - start_offset) + (int64_t)end_size;
>  }
>  
> +/* returns revs[startidx:endidx] without empty trailing revs */

The function doc isn't correct. Can you update it in a follow-up?

> +static Py_ssize_t trim_endidx(indexObject *self, Py_ssize_t *revs,
> +  Py_ssize_t startidx, Py_ssize_t endidx)

I've changed *revs to const * to make it clear revs isn't an output variable.

> +{
> + int length;
> + while (endidx > 1 && endidx > startidx) {
> + length = index_get_length(self, revs[endidx - 1]);
> + if (length < 0) {
> + return -1;
> + }
> + if (length != 0) {
> + break;
> + }
> + endidx -= 1;
> + }
> + return endidx;
> +}
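
For comparison, the same loop can be written in Python. This is a sketch that
assumes a plain mapping of revision to stored length in place of
index_get_length() and leaves out its error path:

```python
def trim_endidx(lengths, revs, startidx, endidx):
    """Return the end index for revs[startidx:endidx] once trailing empty
    revisions are dropped.

    lengths maps a revision to its stored length (hypothetical stand-in for
    index_get_length).
    """
    while endidx > 1 and endidx > startidx:
        if lengths[revs[endidx - 1]] != 0:
            break
        endidx -= 1
    return endidx
```

The result is an index, not a slice: the caller is expected to take
revs[startidx:trimmed] itself.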


Re: [PATCH 3 of 7 V5] sparse-revlog: add a `index_segment_span` function in C

2018-11-23 Thread Yuya Nishihara
On Thu, 22 Nov 2018 19:08:05 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld 
> # Date 1541785396 -3600
> #  Fri Nov 09 18:43:16 2018 +0100
> # Node ID 4ad3891a07ed83cda837db8d0cbe285ebb377869
> # Parent  18864760091a1622d0404e9a87923cf2b1b82082
> # EXP-Topic sparse-perf
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 4ad3891a07ed
> sparse-revlog: add a `index_segment_span` function in C

> + PyErr_Format(PyExc_ValueError,
> +  "corrupted revlog index: inconsistent offset "
> +  "between revisions (%lld) and (%lld)",
> +  (long long)start_rev, (long long)end_offset);

s/%lld/%zd/g, and s/end_offset/end_rev/.

The doc says size_t format notations are supported.

https://docs.python.org/2/c-api/string.html#c.PyString_FromFormat


Re: [PATCH 2 of 7 V5] sparse-revlog: add a `index_get_length` function in C

2018-11-23 Thread Yuya Nishihara
On Thu, 22 Nov 2018 19:08:04 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld 
> # Date 1541785378 -3600
> #  Fri Nov 09 18:42:58 2018 +0100
> # Node ID 18864760091a1622d0404e9a87923cf2b1b82082
> # Parent  b6fff7b07488608fe8ea86ffb69a74037ed15cbe
> # EXP-Topic sparse-perf
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
> 18864760091a
> sparse-revlog: add a `index_get_length` function in C

> diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c
> --- a/mercurial/cext/revlog.c
> +++ b/mercurial/cext/revlog.c
> @@ -218,6 +218,31 @@ static inline int64_t index_get_start(in
>   return (int64_t)(offset >> 16);
>  }
>  
> +static inline int index_get_length(indexObject *self, Py_ssize_t rev)
> +{
> + if (rev >= self->length) {
> + PyObject *tuple;
> + PyObject *pylong;
> + long ret;
> + tuple = PyList_GET_ITEM(self->added, rev - self->length);
> + pylong = PyTuple_GET_ITEM(tuple, 1);
> + ret = PyInt_AsLong(pylong);
> + if (ret == -1 && PyErr_Occurred()) {
> + return -1;
> + }
> + if (ret < 0 || ret > (long)INT_MAX) {
> + PyErr_Format(PyExc_OverflowError,
> +  "revlog entry size out of bound (%llu)",
> +  (long long)ret);

Changed this to %ld ret as well.

> + return -1;
> + }
> + return (int)ret;
> + } else {
> + const char *data = index_deref(self, rev);
> + return (int)getbe32(data + 8);

Here, (int)getbe32(data + 8) may be negative. We have to check for underflow
so that the Python interpreter isn't confused by a bad NULL return.

Can you send a follow up?
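
To illustrate the concern, here is a small sketch in Python rather than in the
C of the patch. `read_length` is a hypothetical helper, not part of Mercurial;
the field offset of 8 simply mirrors the `data + 8` in the quoted code. It
shows that a 4-byte big-endian field with its high bit set becomes negative
when viewed as a signed 32-bit integer, and one way a reader could reject such
a value explicitly instead of returning it.

```python
import struct

INT_MAX = 2 ** 31 - 1  # largest value a 32-bit C int can hold


def read_length(entry):
    """Read the 4-byte big-endian field at offset 8 as unsigned and
    reject anything that would turn negative once cast to a C int."""
    (length,) = struct.unpack_from('>I', entry, 8)
    if length > INT_MAX:
        raise OverflowError('revlog entry size out of bound (%d)' % length)
    return length


# A field with the high bit set reads as -1 through a signed view.
bad = b'\x00' * 8 + b'\xff\xff\xff\xff'
signed_view = struct.unpack_from('>i', bad, 8)[0]
```

Raising a dedicated error at the point of the read keeps a negative length
from being mistaken for the function's own error return.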


Re: [PATCH 1 of 7 V5] sparse-revlog: add a `index_get_start` function in C

2018-11-23 Thread Yuya Nishihara
On Thu, 22 Nov 2018 19:08:03 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld 
> # Date 1542725358 0
> #  Tue Nov 20 14:49:18 2018 +
> # Node ID b6fff7b07488608fe8ea86ffb69a74037ed15cbe
> # Parent  4369c00a8ee168565fba97112283bbc00be8ce44
> # EXP-Topic sparse-perf
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
> b6fff7b07488
> sparse-revlog: add a `index_get_start` function in C

Queued the series, many thanks.

> +static inline int64_t index_get_start(indexObject *self, Py_ssize_t rev)
> +{
> + uint64_t offset;
> + if (rev >= self->length) {
> + PyObject *tuple;
> + PyObject *pylong;
> + PY_LONG_LONG tmp;
> + tuple = PyList_GET_ITEM(self->added, rev - self->length);
> + pylong = PyTuple_GET_ITEM(tuple, 0);
> + tmp = PyLong_AsLongLong(pylong);
> + if (tmp == -1 && PyErr_Occurred()) {
> + return -1;
> + }
> + if (tmp < 0) {
> + PyErr_Format(PyExc_OverflowError,
> +  "revlog entry size out of bound (%llu)",
> +  (unsigned long long)tmp);

Changed this to %lld (long long)tmp. If PyLong_AsLongLong() returned a
negative integer, it really would be a negative Python long integer.


Re: [PATCH 5 of 6 V2] match: avoid translating glob to matcher multiple times for large sets

2018-11-23 Thread Martin von Zweigbergk via Mercurial-devel
On Fri, Nov 23, 2018 at 9:20 AM Boris FELD  wrote:

>
> On 23/11/2018 10:24, Yuya Nishihara wrote:
> > On Fri, 23 Nov 2018 18:00:36 +0900, Yuya Nishihara wrote:
> >> On Fri, 23 Nov 2018 00:00:36 -0800, Martin von Zweigbergk via
> Mercurial-devel wrote:
> >>> On Thu, Nov 22, 2018 at 11:44 PM Martin von Zweigbergk <
> >>> martinv...@google.com> wrote:
>  On Thu, Nov 22, 2018 at 2:26 PM Boris Feld 
> wrote:
> 
> > # HG changeset patch
> > # User Boris Feld 
> > # Date 1542916922 -3600
> > #  Thu Nov 22 21:02:02 2018 +0100
> > # Node ID 018578f3ab597d5ea573107e7310470de76a3907
> > # Parent  4628c3cf1fc1052ca25296c8c1a42c4502b59dc9
> > # EXP-Topic perf-ignore-2
> > # Available At https://bitbucket.org/octobus/mercurial-devel/
> > #  hg pull
> https://bitbucket.org/octobus/mercurial-devel/ -r
> > 018578f3ab59
> > match: avoid translating glob to matcher multiple times for large
> sets
> >
> > For hgignore with many globs, the resulting regexp might not fit
> under
> > the 20K
> > length limit. So the patterns need to be broken up in smaller pieces.
> >
>  Did you see 0f6a1bdf89fb (match: handle large regexes, 2007-08-19)
>  and 59a9dc9562e2 (ignore: split up huge patterns, 2008-02-11)? It
> might be
>  worth trying to figure out what Python versions those commits are
> talking
>  about. Maybe we've dropped support for those versions and we can
> simplify
>  this code.
> 
> >>> Oh, and what made me do the archaeology there was that you seem to have
> >>> lost the handling of OverflowError from the regex engine. As I said
> above, I
> >>> suspect that's fine because we no longer support some very old Python
> >>> versions (but please try to figure out what version that refers to).
> Still,
> >>> if we decide to drop that OverflowError handling, I'd prefer to see
> that in
> >>> an explicit commit early in this series.
> To me, 0f6a1bdf89fb (catching error from engine) is superseded by
> 59a9dc9562e2 (cannot trust the engine, preemptively raise our own error).
>

Yes, perhaps (if it was only expressions longer than 20k that raised
OverflowError). My point was that if that was the case, we should rewrite
to avoid using an internal exception for flow control, i.e. change from
this:

try:
    regex = # create regex
    if len(regex) > MAX_RE_SIZE:
        raise OverflowError
    return regex, _rematcher(regex)
except OverflowError:
    # break up into smaller

to this:

regex = # create regex
if len(regex) <= MAX_RE_SIZE:
    return regex, _rematcher(regex)
# break up into smaller
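
As a rough, self-contained sketch of that second shape (not the code from the
series): the helper names `joinregexes`, `splitgroups` and `buildmatcher` are
made up for illustration, the 20000 value is an assumption taken from the "20K"
figure mentioned in this thread, and a single pattern longer than the limit is
simply compiled on its own rather than aborting.

```python
import re

MAX_RE_SIZE = 20000  # assumed value, from the "20K" limit discussed above
_BASE_SIZE = len('(?:)')


def joinregexes(regexps):
    """Gather several regular expressions into a single alternation."""
    return '(?:%s)' % '|'.join(regexps)


def splitgroups(regexps, maxsize=MAX_RE_SIZE):
    """Split regexps into consecutive groups whose joined form stays
    within maxsize (a lone oversized pattern gets its own group)."""
    groups, current, size = [], [], _BASE_SIZE
    for r in regexps:
        piece = len(r) + 1  # +1 accounts for the '|' separator
        if current and size + piece > maxsize:
            groups.append(current)
            current, size = [], _BASE_SIZE
        current.append(r)
        size += piece
    if current:
        groups.append(current)
    return groups


def buildmatcher(regexps, maxsize=MAX_RE_SIZE):
    """Return a predicate; no exception is used for flow control."""
    if len(joinregexes(regexps)) <= maxsize:
        groups = [regexps]
    else:
        groups = splitgroups(regexps, maxsize)
    matchers = [re.compile(joinregexes(g)).match for g in groups]
    return lambda s: any(m(s) for m in matchers)
```

The size check is an ordinary branch here, so the only errors that can escape
are genuine ones raised by the regex engine.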



>
> So I feel like it is fine to just rely on the size limit.
> >> Perhaps it's been fixed since 2.7.4. The regexp code width is extended
> >> from 16bit to 32bit (or Py_UCS4) integer. That should be large enough to
> >> handle practical patterns.
> >>
> >> https://bugs.python.org/issue1160
>
> Thanks for digging this out. It looks like we may be able to drop this
> limit altogether. However, I would like to make it a change distinct
> from this series.
>
> The current code is very problematic for some people (to the point where
> the majority of `hg status` time is spent in that function). I would
> like to get fast code for the same semantic first. Then look into
> changing the semantic.
>

Is your concern that you might regress in performance of something by
changing how large the groups are? Or that it would be more work?

I tried creating a regex for *every* pattern and that actually seemed
faster (to my surprise), both when creating the matcher and when evaluating
it. I tried it on the mozilla-unified repo both with 1k files and with 10k
files in the hgignores. I used the following patch on top of your series.

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -1184,51 +1184,15 @@ def _buildmatch(kindpats, globsuffix, li
 else:
 return regex, lambda f: any(mf(f) for mf in matchfuncs)

-MAX_RE_SIZE = 2
-_BASE_SIZE = len('(?:)') - 1
-
-def _joinregexes(regexps):
-"""gather multiple regular expressions into a single one"""
-return '(?:%s)' % '|'.join(regexps)
-
 def _buildregexmatch(kindpats, globsuffix):
 """Build a match function from a list of kinds and kindpats,
 return regexp string and a matcher function.
-
-Test too large input
->>> _buildregexmatch([
-... ('relglob', '?' * MAX_RE_SIZE, '')
-... ], '$')
-Traceback (most recent call last):
-...
-Abort: matcher pattern is too long (20009 bytes)
 """
 try:
-allgroups = []
 regexps = [_regex(k, p, globsuffix) for (k, p, s) in kindpats]
-fullregexp = _joinregexes(regexps)
-
-startidx = 0
-groupsize = _BASE_SIZE
-for idx, r in enumerate(regexps):
-piecesize = len(r)
-if (piecesize + 4) > MAX_RE_SIZE:
-msg = _("matcher 

Re: [PATCH 2 of 6 V2] match: extract a literal constant into a symbolic one

2018-11-23 Thread Martin von Zweigbergk via Mercurial-devel
On Fri, Nov 23, 2018 at 9:17 AM Boris FELD  wrote:

> On 23/11/2018 08:17, Martin von Zweigbergk via Mercurial-devel wrote:
>
>
>
> On Thu, Nov 22, 2018 at 2:21 PM Boris Feld  wrote:
>
>> # HG changeset patch
>> # User Boris Feld 
>> # Date 1542903632 -3600
>> #  Thu Nov 22 17:20:32 2018 +0100
>> # Node ID 7540e746d44775c7098d5fa473be9968317616f1
>> # Parent  98300756a74d424fcd1510b0bb98f07b9b0f8663
>> # EXP-Topic perf-ignore-2
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
>> 7540e746d447
>> match: extract a literal constant into a symbolic one
>>
>> diff --git a/mercurial/match.py b/mercurial/match.py
>> --- a/mercurial/match.py
>> +++ b/mercurial/match.py
>> @@ -1184,13 +1184,15 @@ def _buildmatch(kindpats, globsuffix, li
>>  else:
>>  return regex, lambda f: any(mf(f) for mf in matchfuncs)
>>
>> +MAXRESIZE = 2
>
>
> Would be clearer as MAX_RE_SIZE or MAX_REGEX_SIZE (it's very easy to parse
> the current name as "max resize")
>
> We would be more than happy to use '_' here. What're the new rules
> regarding '_'? Where can we use them and where can we not?
>

I don't know, but there are many instances of constants with this style in
revlog.py, so I think it should be fine. I'd happily queue it anyway, and
if someone objects, we can drop the underscores in a follow-up patch (seems
unlikely to happen).




D5300: py3: replace str() with pycompat.bytestr() or ('%d' % int)

2018-11-23 Thread pulkit (Pulkit Goyal)
pulkit created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  After this patch, test-fastannotate-diffopts.t is close to passing. There are
  still some extra newlines in the output.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5300

AFFECTED FILES
  hgext/fastannotate/formatter.py

CHANGE DETAILS

diff --git a/hgext/fastannotate/formatter.py b/hgext/fastannotate/formatter.py
--- a/hgext/fastannotate/formatter.py
+++ b/hgext/fastannotate/formatter.py
@@ -39,23 +39,23 @@
 orig = hexfunc
 hexfunc = lambda x: None if x is None else orig(x)
 wnode = hexfunc(repo[None].p1().node()) + '+'
-wrev = str(repo[None].p1().rev())
+wrev = '%d' % repo[None].p1().rev()
 wrevpad = ''
 if not opts.get('changeset'): # only show + if changeset is hidden
 wrev += '+'
 wrevpad = ' '
-revenc = lambda x: wrev if x is None else str(x) + wrevpad
-csetenc = lambda x: wnode if x is None else str(x) + ' '
+revenc = lambda x: wrev if x is None else ('%d' % x) + wrevpad
+csetenc = lambda x: wnode if x is None else pycompat.bytestr(x) + ' '
 else:
-revenc = csetenc = str
+revenc = csetenc = pycompat.bytestr
 
 # opt name, separator, raw value (for json/plain), encoder (for plain)
 opmap = [('user', ' ', lambda x: getctx(x).user(), ui.shortuser),
  ('number', ' ', lambda x: getctx(x).rev(), revenc),
  ('changeset', ' ', lambda x: hexfunc(x[0]), csetenc),
  ('date', ' ', lambda x: getctx(x).date(), datefunc),
- ('file', ' ', lambda x: x[2], str),
- ('line_number', ':', lambda x: x[1] + 1, str)]
+ ('file', ' ', lambda x: x[2], pycompat.bytestr),
+ ('line_number', ':', lambda x: x[1] + 1, pycompat.bytestr)]
 fieldnamemap = {'number': 'rev', 'changeset': 'node'}
 funcmap = [(get, sep, fieldnamemap.get(op, op), enc)
for op, sep, get, enc in opmap



To: pulkit, #hg-reviewers
Cc: mercurial-devel


[PATCH 4 of 4 V2] perf: disable revlogs clearing in `perftags` by default

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542735864 0
#  Tue Nov 20 17:44:24 2018 +
# Node ID 7677f117fbecdb04e993ad84af1591b395762f00
# Parent  438718ecc3058c55b0d5a4a9742b3325b83e78cc
# EXP-Topic perf-tags
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
7677f117fbec
perf: disable revlogs clearing in `perftags` by default

This aligns things with what `perfbookmarks` does. I decided to disable the
revlogs clearing by default to focus on the core logic by default, ignoring
side effects.

If we prefer to emphasize the side effect, we can instead keep this on in
`perftags` and enable it by default in `perfbookmarks`.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -530,7 +530,7 @@ def perfheads(ui, repo, **opts):
 
 @command(b'perftags', formatteropts+
 [
-(b'', b'clear-revlogs', True, 'refresh changelog and manifest'),
+(b'', b'clear-revlogs', False, 'refresh changelog and manifest'),
 ])
 def perftags(ui, repo, **opts):
 import mercurial.changelog


[PATCH 2 of 4 V2] tags: cache `repo.changelog` access when checking tags nodes

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542710295 0
#  Tue Nov 20 10:38:15 2018 +
# Node ID 832048aabff97aa43cd306cd70cea00227f5e19e
# Parent  2e15140b7b18f40ebbcf71e82c99acf8edadb69b
# EXP-Topic perf-tags
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
832048aabff9
tags: cache `repo.changelog` access when checking tags nodes

The tags reading process checks if the nodes referenced in tags exist. Caching
the access to `repo.changelog` provides a large speedup for repositories with
many tags.

running `hg perftags` in a large private repository
before: ! wall 0.393464 comb 0.39 user 0.33 sys 0.06 (median of 25)
after:  ! wall 0.267711 comb 0.27 user 0.21 sys 0.06 (median of 38)

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1416,13 +1416,11 @@ class localrepository(object):
 tags, tt = self._findtags()
 else:
 tags = self._tagscache.tags
+rev = self.changelog.nodemap.get
 for k, v in tags.iteritems():
-try:
-# ignore tags to unknown nodes
-self.changelog.rev(v)
+# ignore tags to unknown nodes
+if rev(v) is not None:
 t[k] = v
-except (error.LookupError, ValueError):
-pass
 return t
 
 def _findtags(self):


[PATCH 1 of 4 V2] perf: add a `clear-revlogs` flag to `perftags`

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542710780 0
#  Tue Nov 20 10:46:20 2018 +
# Node ID 2e15140b7b18f40ebbcf71e82c99acf8edadb69b
# Parent  4369c00a8ee168565fba97112283bbc00be8ce44
# EXP-Topic perf-tags
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
2e15140b7b18
perf: add a `clear-revlogs` flag to `perftags`

This flag (on by default) makes it possible to disable the refresh of the
changelog and manifest. This is useful to check the time spent in the core
tags logic without the associated side effects. Usually, these side effects
are shared with other logic (e.g. bookmarks).

Example output in my Mercurial repository

$ hg perftags
! wall 0.017919 comb 0.02 user 0.02 sys 0.00 (best of 141)
$ hg perftags --no-clear-revlogs
! wall 0.012982 comb 0.01 user 0.01 sys 0.00 (best of 207)

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -528,7 +528,10 @@ def perfheads(ui, repo, **opts):
 timer(d)
 fm.end()
 
-@command(b'perftags', formatteropts)
+@command(b'perftags', formatteropts+
+[
+(b'', b'clear-revlogs', True, 'refresh changelog and manifest'),
+])
 def perftags(ui, repo, **opts):
 import mercurial.changelog
 import mercurial.manifest
@@ -537,11 +540,13 @@ def perftags(ui, repo, **opts):
 timer, fm = gettimer(ui, opts)
 svfs = getsvfs(repo)
 repocleartagscache = repocleartagscachefunc(repo)
+clearrevlogs = opts['clear_revlogs']
 def s():
-repo.changelog = mercurial.changelog.changelog(svfs)
-rootmanifest = mercurial.manifest.manifestrevlog(svfs)
-repo.manifestlog = mercurial.manifest.manifestlog(svfs, repo,
-  rootmanifest)
+if clearrevlogs:
+repo.changelog = mercurial.changelog.changelog(svfs)
+rootmanifest = mercurial.manifest.manifestrevlog(svfs)
+repo.manifestlog = mercurial.manifest.manifestlog(svfs, repo,
+  rootmanifest)
 repocleartagscache()
 def t():
 return len(repo.tags())


[PATCH 3 of 4 V2] perf: add a `clear-revlogs` flag to `perfbookmarks`

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542711320 0
#  Tue Nov 20 10:55:20 2018 +
# Node ID 438718ecc3058c55b0d5a4a9742b3325b83e78cc
# Parent  832048aabff97aa43cd306cd70cea00227f5e19e
# EXP-Topic perf-tags
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
438718ecc305
perf: add a `clear-revlogs` flag to `perfbookmarks`

This flag (off by default) makes it possible to enable the refresh of the
changelog and revlog. This is useful to check for costly side effects of
bookmark loading.

Usually, these side effects are shared with other logic (e.g. tags).

example output in my mercurial repo (with 1 bookmark, so not a great example):
$ hg perfbookmarks
! wall 0.44
$ hg perfbookmarks --clear-revlogs
! wall 0.001380

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -577,13 +577,20 @@ def perfancestorset(ui, repo, revset, **
 timer(d)
 fm.end()
 
-@command(b'perfbookmarks', formatteropts)
+@command(b'perfbookmarks', formatteropts +
+[
+(b'', b'clear-revlogs', False, 'refresh changelog and manifest'),
+])
 def perfbookmarks(ui, repo, **opts):
 """benchmark parsing bookmarks from disk to memory"""
 opts = _byteskwargs(opts)
 timer, fm = gettimer(ui, opts)
 
+svfs = getsvfs(repo)
+clearrevlogs = opts['clear_revlogs']
 def s():
+if clearrevlogs:
+repo.changelog = mercurial.changelog.changelog(svfs)
 clearfilecache(repo, b'_bookmarks')
 def d():
 repo._bookmarks


Re: [PATCH 06 of 13] perf: prewarm the branchmap in perfbranchmapload

2018-11-23 Thread Boris FELD
On 23/11/2018 15:49, Pulkit Goyal wrote:
>
>
> On Fri, Nov 23, 2018 at 5:19 PM Boris Feld wrote:
>
> # HG changeset patch
> # User Boris Feld
> # Date 1542935471 -3600
> #      Fri Nov 23 02:11:11 2018 +0100
> # Node ID 9f543638d909768a0db0aa779d37817c4b8878ab
> # Parent  e72da9d014ba91ee4f2fe620a9646404a64d7484
> # EXP-Topic perf-branchmap
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull
> https://bitbucket.org/octobus/mercurial-devel/ -r 9f543638d909
> perf: prewarm the branchmap in perfbranchmapload
>
> It is not very interesting to have the command randomly failing
> because the
> branchmap for the tested filter happens to be cold. So we make
> sure to have a
> valid up to date branchmap before going further.
>
> The data might still be missing from disk if a subset was
> equivalent. See next
> changeset for details and fix.
>
> diff --git a/contrib/perf.py b/contrib/perf.py
> --- a/contrib/perf.py
> +++ b/contrib/perf.py
> @@ -2203,6 +2203,9 @@ def perfbranchmapload(ui, repo, filter=b
>          repo = repoview.repoview(repo, filter)
>      else:
>          repo = repo.unfiltered()
> +
> +    repo.branchmap # make sure we have a relevant, up to date
> branchmap
>
>
> Do you want to call this function? According to my understanding it
> won't load the branchmap or do something useful.
Oh yeah, good catch.
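
A tiny stand-alone sketch of the difference, using a made-up `FakeRepo` class
rather than a real Mercurial repository: merely naming a method evaluates to
the bound method object and runs nothing, while adding the parentheses
performs the work.

```python
class FakeRepo(object):
    """Hypothetical stand-in; only records whether branchmap() ran."""

    def __init__(self):
        self.loaded = False

    def branchmap(self):
        self.loaded = True
        return {}


repo = FakeRepo()

repo.branchmap            # attribute lookup only, nothing is executed
loaded_without_call = repo.loaded

repo.branchmap()          # the call is what actually does the loading
loaded_with_call = repo.loaded
```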
>


Re: [PATCH 5 of 6 V2] match: avoid translating glob to matcher multiple times for large sets

2018-11-23 Thread Boris FELD

On 23/11/2018 10:24, Yuya Nishihara wrote:
> On Fri, 23 Nov 2018 18:00:36 +0900, Yuya Nishihara wrote:
>> On Fri, 23 Nov 2018 00:00:36 -0800, Martin von Zweigbergk via 
>> Mercurial-devel wrote:
>>> On Thu, Nov 22, 2018 at 11:44 PM Martin von Zweigbergk <
>>> martinv...@google.com> wrote:
 On Thu, Nov 22, 2018 at 2:26 PM Boris Feld  wrote:

> # HG changeset patch
> # User Boris Feld 
> # Date 1542916922 -3600
> #  Thu Nov 22 21:02:02 2018 +0100
> # Node ID 018578f3ab597d5ea573107e7310470de76a3907
> # Parent  4628c3cf1fc1052ca25296c8c1a42c4502b59dc9
> # EXP-Topic perf-ignore-2
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> 018578f3ab59
> match: avoid translating glob to matcher multiple times for large sets
>
> For hgignore with many globs, the resulting regexp might not fit under
> the 20K
> length limit. So the patterns need to be broken up in smaller pieces.
>
 Did you see 0f6a1bdf89fb (match: handle large regexes, 2007-08-19)
 and 59a9dc9562e2 (ignore: split up huge patterns, 2008-02-11)? It might be
 worth trying to figure out what Python versions those commits are talking
 about. Maybe we've dropped support for those versions and we can simplify
 this code.

>>> Oh, and what made me do the archaeology there was that you seem to have
>>> lost the handling of OverflowError from the regex engine. As I said above, I
>>> suspect that's fine because we no longer support some very old Python
>>> versions (but please try to figure out what version that refers to). Still,
>>> if we decide to drop that OverflowError handling, I'd prefer to see that in
>>> an explicit commit early in this series.
To me, 0f6a1bdf89fb (catching error from engine) is superseded by
59a9dc9562e2 (cannot trust the engine, preemptively raise our own error).

So I feel like it is fine to just rely on the size limit.
>> Perhaps it's been fixed since 2.7.4. The regexp code width is extended
>> from 16bit to 32bit (or Py_UCS4) integer. That should be large enough to
>> handle practical patterns.
>>
>> https://bugs.python.org/issue1160

Thanks for digging this out. It looks like we may be able to drop this
limit altogether. However, I would like to make it a change distinct
from this series.

The current code is very problematic for some people (to the point where
the majority of `hg status` time is spent in that function). I would
like to get fast code for the same semantics first, then look into
changing the semantics.

> That said, combining more chunks of regex patterns might be likely to
> lead to another funny problem.
>
> % python -c 'import re; re.compile("(a)" * 100)'
> Traceback (most recent call last):
> >   File "<string>", line 1, in <module>
>   File "/usr/lib/python2.7/re.py", line 194, in compile
> return _compile(pattern, flags)
>   File "/usr/lib/python2.7/re.py", line 249, in _compile
> p = sre_compile.compile(pattern, flags)
>   File "/usr/lib/python2.7/sre_compile.py", line 583, in compile
> "sorry, but this version only supports 100 named groups"
> AssertionError: sorry, but this version only supports 100 named groups
>
> It's unrelated to the OverflowError issue, but splitting patterns could
> help avoiding the 100-named-group problem.

By chance, my current gigantic use case does not involve named groups.

Catching AssertionError will be fun. I wish there were a clean API
to expose and check the engine's limitations.



Re: [PATCH 2 of 6 V2] match: extract a literal constant into a symbolic one

2018-11-23 Thread Boris FELD
On 23/11/2018 08:17, Martin von Zweigbergk via Mercurial-devel wrote:
>
>
> On Thu, Nov 22, 2018 at 2:21 PM Boris Feld wrote:
>
> # HG changeset patch
> # User Boris Feld
> # Date 1542903632 -3600
> #      Thu Nov 22 17:20:32 2018 +0100
> # Node ID 7540e746d44775c7098d5fa473be9968317616f1
> # Parent  98300756a74d424fcd1510b0bb98f07b9b0f8663
> # EXP-Topic perf-ignore-2
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull
> https://bitbucket.org/octobus/mercurial-devel/ -r 7540e746d447
> match: extract a literal constant into a symbolic one
>
> diff --git a/mercurial/match.py b/mercurial/match.py
> --- a/mercurial/match.py
> +++ b/mercurial/match.py
> @@ -1184,13 +1184,15 @@ def _buildmatch(kindpats, globsuffix, li
>      else:
>          return regex, lambda f: any(mf(f) for mf in matchfuncs)
>
> +MAXRESIZE = 2
>
>
> Would be clearer as MAX_RE_SIZE or MAX_REGEX_SIZE (it's very easy to
> parse the current name as "max resize")
We would be more than happy to use '_' here. What're the new rules
regarding '_'? Where can we use them and where can we not?
>


D5299: phabricator: fallback reading arcanist config files

2018-11-23 Thread philpep (Philippe Pepiot)
philpep added inline comments.

INLINE COMMENTS

> phabricator.py:175
> +def readarcconfig(repo):
> +"""Return url, token, callsign read from arcanist config files
> +

It'd be nice to cache the result of `readarcconfig`, but I don't know how to
implement this. Any suggestions?
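
One possible direction, offered only as a sketch: memoize the result per
repository root in a module-level dictionary. The names `_arcconfigcache` and
`cachedarcconfig` are hypothetical, and the `reader` argument stands in for
`readarcconfig`. Note the assumption that the arc config files do not change
during the life of the process; a long-running process would need some form of
invalidation.

```python
_arcconfigcache = {}  # hypothetical module-level cache, keyed by repo root


def cachedarcconfig(repo, reader):
    """Call reader(repo) at most once per repository root and reuse
    the (url, token, callsign) tuple on later calls."""
    key = repo.root
    if key not in _arcconfigcache:
        _arcconfigcache[key] = reader(repo)
    return _arcconfigcache[key]
```

Callers such as `readurltoken` could then go through the cached wrapper
instead of re-reading and re-parsing the JSON files each time.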

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5299

To: philpep, #hg-reviewers
Cc: mercurial-devel


D5299: phabricator: fallback reading arcanist config files

2018-11-23 Thread philpep (Philippe Pepiot)
philpep added inline comments.

INLINE COMMENTS

> phabricator.py:201
> +if conduit_uri is not None:
> +token = config.get('hosts', {}).get(conduit_uri, {}).get('token')
> +url = conduit_uri.rstrip('/api/')

HINT: This doesn't work for the current Mercurial config because "arc
install-certificates" adds a trailing "/" to conduit_uri and our .arcconfig
doesn't have this trailing slash.
Any idea how to handle this properly?
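
A minimal sketch of one way to handle it, assuming the `hosts` mapping has the
shape used in the patch (URI string mapped to a dict with a 'token' key). The
helper names `stripslash`, `baseurl` and `findtoken` are made up for this
example. It also sidesteps a pitfall in the quoted line: `str.rstrip('/api/')`
removes any trailing run of the characters '/', 'a', 'p' and 'i', not the
literal suffix, so a host name ending in those letters would be truncated.

```python
def stripslash(uri):
    """Drop trailing slashes so 'x/api' and 'x/api/' compare equal."""
    return uri.rstrip('/')


def baseurl(conduit_uri):
    """Remove a literal '/api' suffix rather than a set of characters."""
    uri = stripslash(conduit_uri)
    if uri.endswith('/api'):
        uri = uri[:-len('/api')]
    return uri


def findtoken(hosts, conduit_uri):
    """Look up the token for conduit_uri, ignoring trailing slashes."""
    wanted = stripslash(conduit_uri)
    for host, entry in hosts.items():
        if stripslash(host) == wanted:
            return entry.get('token')
    return None
```

Comparing normalized forms on both sides means neither the tracked .arcconfig
nor the per-user file has to follow a particular trailing-slash convention.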

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5299

To: philpep, #hg-reviewers
Cc: mercurial-devel


D5299: phabricator: fallback reading arcanist config files

2018-11-23 Thread philpep (Philippe Pepiot)
philpep created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This allows the phabricator extension to read arc config files to
  auto-configure url, token and callsign.
  
  We use it as a fallback when phabricator.url or phabricator.callsign aren't
  defined.
  
  This allows configuring conduit_uri and repository.callsign in a tracked
  .arcconfig JSON file in the root of the repository, so users with lots of
  small repositories in Phabricator don't need to configure .hg/hgrc after a
  fresh clone.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5299

AFFECTED FILES
  hgext/phabricator.py

CHANGE DETAILS

diff --git a/hgext/phabricator.py b/hgext/phabricator.py
--- a/hgext/phabricator.py
+++ b/hgext/phabricator.py
@@ -37,14 +37,18 @@
 
 # API token. Get it from https://$HOST/conduit/login/
 example.phabtoken = cli-
+
+As a fallback, read config from arc config files (.arcconfig, ~/.arcrc and
+/etc/arcconfig)
 """
 
 from __future__ import absolute_import
 
 import itertools
 import json
 import operator
 import re
+import os
 
 from mercurial.node import bin, nullid
 from mercurial.i18n import _
@@ -167,16 +171,51 @@
 process(b'', params)
 return util.urlreq.urlencode(flatparams)
 
+def readarcconfig(repo):
+"""Return url, token, callsign read from arcanist config files
+
+This reads and merges the content of /etc/arcconfig, ~/.arcrc and .arcconfig.
+"""
+if os.name == 'nt':
+paths = [
+os.path.join(os.environ['ProgramData'],
+ 'Phabricator',
+ 'Arcanist',
+ 'config'),
+os.path.join(os.environ['AppData'], '.arcrc'),
+]
+else:
+paths = [
+os.path.join('/etc', 'arcconfig'),
+os.path.join(os.path.expanduser('~'), '.arcrc'),
+os.path.join(repo.root, '.arcconfig'),
+]
+config = {}
+for path in paths:
+if os.path.exists(path):
+with open(path, 'rb') as f:
+config.update(json.load(f))
+callsign = config.get('repository.callsign')
+conduit_uri = config.get('conduit_uri', config.get('config', {}).get('default'))
+if conduit_uri is not None:
+token = config.get('hosts', {}).get(conduit_uri, {}).get('token')
+url = conduit_uri.rstrip('/api/')
+return url, token, callsign
+
 def readurltoken(repo):
 """return conduit url, token and make sure they exist
 
-Currently read from [auth] config section. In the future, it might
-make sense to read from .arcconfig and .arcrc as well.
+Currently read from [auth] config section and fallback to reading arc
+config files.
 """
 url = repo.ui.config(b'phabricator', b'url')
 if not url:
-raise error.Abort(_(b'config %s.%s is required')
-  % (b'phabricator', b'url'))
+url, token, __ = readarcconfig(repo)
+if not url or not token:
+raise error.Abort(_(b'unable to read phabricator conduit url and '
+b'token from config %s.%s or from arc config '
+b'files') % (b'phabricator', b'url'))
+return url, token
 
 res = httpconnectionmod.readauthforuri(repo.ui, url, util.url(url).user)
 token = None
@@ -241,7 +280,9 @@
 return repophid
 callsign = repo.ui.config(b'phabricator', b'callsign')
 if not callsign:
-return None
+__, __, callsign = readarcconfig(repo)
+if not callsign:
+return callsign
 query = callconduit(repo, b'diffusion.repository.search',
 {b'constraints': {b'callsigns': [callsign]}})
 if len(query[r'data']) == 0:



To: philpep, #hg-reviewers
Cc: mercurial-devel


Re: [PATCH 5 of 6 V2] match: avoid translating glob to matcher multiple times for large sets

2018-11-23 Thread Martin von Zweigbergk via Mercurial-devel
On Fri, Nov 23, 2018, 01:24 Yuya Nishihara wrote:

> On Fri, 23 Nov 2018 18:00:36 +0900, Yuya Nishihara wrote:
> > On Fri, 23 Nov 2018 00:00:36 -0800, Martin von Zweigbergk via
> Mercurial-devel wrote:
> > > On Thu, Nov 22, 2018 at 11:44 PM Martin von Zweigbergk <
> > > martinv...@google.com> wrote:
> > > > On Thu, Nov 22, 2018 at 2:26 PM Boris Feld 
> wrote:
> > > >
> > > >> # HG changeset patch
> > > >> # User Boris Feld 
> > > >> # Date 1542916922 -3600
> > > >> #  Thu Nov 22 21:02:02 2018 +0100
> > > >> # Node ID 018578f3ab597d5ea573107e7310470de76a3907
> > > >> # Parent  4628c3cf1fc1052ca25296c8c1a42c4502b59dc9
> > > >> # EXP-Topic perf-ignore-2
> > > >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> > > >> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 018578f3ab59
> > > >> match: avoid translating glob to matcher multiple times for large sets
> > > >>
> > > >> For hgignore with many globs, the resulting regexp might not fit under
> > > >> the 20K length limit. So the patterns need to be broken up in smaller
> > > >> pieces.
> > > >>
> > > >
> > > > Did you see 0f6a1bdf89fb (match: handle large regexes, 2007-08-19)
> > > > and 59a9dc9562e2 (ignore: split up huge patterns, 2008-02-11)? It might
> > > > be worth trying to figure out what Python versions those commits are
> > > > talking about. Maybe we've dropped support for those versions and we
> > > > can simplify this code.
> > > >
> > >
> > > Oh, and what made me do the archaeology there was that you seem to have
> > > lost the handling of OverflowError from the regex engine. As I said
> > > above, I suspect that's fine because we no longer support some very old
> > > Python versions (but please try to figure out what version that refers
> > > to). Still, if we decide to drop that OverflowError handling, I'd prefer
> > > to see that in an explicit commit early in this series.
> >
> > Perhaps it's been fixed since 2.7.4. The regexp code width is extended
> > from 16bit to 32bit (or Py_UCS4) integer. That should be large enough to
> > handle practical patterns.
> >
> > https://bugs.python.org/issue1160
>
> That said, combining more chunks of regex patterns might be likely to
> lead to another funny problem.
>
> % python -c 'import re; re.compile("(a)" * 100)'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/usr/lib/python2.7/re.py", line 194, in compile
> return _compile(pattern, flags)
>   File "/usr/lib/python2.7/re.py", line 249, in _compile
> p = sre_compile.compile(pattern, flags)
>   File "/usr/lib/python2.7/sre_compile.py", line 583, in compile
> "sorry, but this version only supports 100 named groups"
> AssertionError: sorry, but this version only supports 100 named groups
>
> It's unrelated to the OverflowError issue, but splitting patterns could
> help avoiding the 100-named-group problem.
>

Another solution to that problem seems to be to use unnamed groups, of
course. I'd expect using a single regex to be faster at least with re2 (I
know too little about how Python's regex engine works).
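
To make the unnamed-group suggestion concrete, here is a minimal sketch. It assumes a recent Python 3 interpreter (where, as far as I know, the 100-group cap shown in the traceback above no longer applies), and the pattern list is made up for illustration. The point is only that non-capturing groups do not count as groups at all.

```python
import re

# Hypothetical ignore patterns: 150 alternatives would exceed the old
# 100-group cap if each one were wrapped in a capturing group.
patterns = [r"dir%d/.*\.o" % i for i in range(150)]

# Non-capturing groups, (?:...), do not consume a group slot.
combined = "(?:%s)" % "|".join("(?:%s)" % p for p in patterns)
matcher = re.compile(combined)

print(matcher.groups)                       # number of capturing groups
print(bool(matcher.match("dir42/foo.o")))
```

Whether a single large alternation is actually faster than several smaller ones depends on the regex engine, so this says nothing about performance.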

>


Re: [PATCH 05 of 13] perf: add --clear-revlog flag to branchmapload

2018-11-23 Thread Pulkit Goyal
On Fri, Nov 23, 2018 at 5:18 PM Boris Feld  wrote:

> # HG changeset patch
> # User Boris Feld 
> # Date 1542951152 -3600
> #  Fri Nov 23 06:32:32 2018 +0100
> # Node ID e72da9d014ba91ee4f2fe620a9646404a64d7484
> # Parent  ba101026c80452a9f60b3d574011c7e8773b5a4e
> # EXP-Topic perf-branchmap
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r e72da9d014ba
> perf: add --clear-revlog flag to branchmapload
>
> Having the changelog index already loaded when loading the branchmap can
> have a large impact on performance.
>
> Example runs (large private repository):
>
> hg perfbranchmapload -f base
> ! wall 0.116722 comb 0.12 user 0.11 sys 0.01 (best of 59)
> hg perfbranchmapload -f base --clear-revlogs
> ! wall 0.258246 comb 0.23 user 0.22 sys 0.01 (best of 31)
>

Queued 1-5, many thanks!

>
> diff --git a/contrib/perf.py b/contrib/perf.py
> --- a/contrib/perf.py
> +++ b/contrib/perf.py
> @@ -2184,10 +2184,13 @@ def perfbranchmap(ui, repo, *filternames
>  @command(b'perfbranchmapload', [
>   (b'f', b'filter', b'', b'Specify repoview filter'),
>   (b'', b'list', False, b'List brachmap filter caches'),
> + (b'', b'clear-revlogs', False, 'refresh changelog and manifest'),
>

Added b'' prefix to 'refresh changelog ...' for py3 compatibility.
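
For readers unfamiliar with why the prefix matters, a small sketch of the Python 3 behavior follows. The `opts` dictionary is a hypothetical stand-in for an option table and is not taken from perf.py.

```python
# On Python 3, bytes and str never compare equal, so mixing them in an
# option table silently produces keys and values that do not match.
opts = {b'clear_revlogs': True}   # hypothetical option dictionary

print(b'refresh' == 'refresh')    # bytes vs str comparison
print('clear_revlogs' in opts)    # lookup with a str key
print(b'clear_revlogs' in opts)   # lookup with a bytes key
```

On Python 2 the first two expressions would be true, which is why such mismatches only show up when porting.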


Re: [PATCH 06 of 13] perf: prewarm the branchmap in perfbranchmapload

2018-11-23 Thread Pulkit Goyal
On Fri, Nov 23, 2018 at 5:19 PM Boris Feld  wrote:

> # HG changeset patch
> # User Boris Feld 
> # Date 1542935471 -3600
> #  Fri Nov 23 02:11:11 2018 +0100
> # Node ID 9f543638d909768a0db0aa779d37817c4b8878ab
> # Parent  e72da9d014ba91ee4f2fe620a9646404a64d7484
> # EXP-Topic perf-branchmap
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 9f543638d909
> perf: prewarm the branchmap in perfbranchmapload
>
> It is not very interesting to have the command randomly failing because the
> branchmap for the tested filter happens to be cold. So we make sure to have
> a valid up to date branchmap before going further.
>
> The data might still be missing from disk if a subset was equivalent. See
> the next changeset for details and fix.
>
> diff --git a/contrib/perf.py b/contrib/perf.py
> --- a/contrib/perf.py
> +++ b/contrib/perf.py
> @@ -2203,6 +2203,9 @@ def perfbranchmapload(ui, repo, filter=b
>  repo = repoview.repoview(repo, filter)
>  else:
>  repo = repo.unfiltered()
> +
> +repo.branchmap # make sure we have a relevant, up to date branchmap
>

Did you mean to call this function? As far as I understand, the bare
attribute access won't load the branchmap or do anything useful.
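
A minimal sketch of the concern, assuming `branchmap` is a plain method rather than a cached property. `FakeRepo` is a hypothetical stand-in and not Mercurial's repository class.

```python
class FakeRepo(object):
    """Hypothetical stand-in for a repository object."""

    def __init__(self):
        self.loaded = False

    def branchmap(self):
        # Pretend this computes and caches the branchmap.
        self.loaded = True
        return {}

repo = FakeRepo()
repo.branchmap            # attribute lookup only; the method body never runs
after_lookup = repo.loaded
repo.branchmap()          # the call is what triggers the work
after_call = repo.loaded
print(after_lookup, after_call)
```

If the attribute were a property or a property cache instead, the bare access would be enough, so the answer depends on how `branchmap` is defined.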


[PATCH 6 of 6 V3] match: raise an Abort error instead of OverflowError

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542904870 -3600
#  Thu Nov 22 17:41:10 2018 +0100
# Node ID 0aed4a6bdbfb848a8a6d10581721a40bd5d7d508
# Parent  b41a4db20b0898e57f9fa0bb9ebdabb30a53df75
# EXP-Topic perf-ignore
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
0aed4a6bdbfb
match: raise an Abort error instead of OverflowError

This case of OverflowError (one single pattern being too large) has never been
properly caught in the past.

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -1201,7 +1201,7 @@ def _buildregexmatch(kindpats, globsuffi
 ... ], '$')
 Traceback (most recent call last):
 ...
-OverflowError
+Abort: matcher pattern is too long (20009 bytes)
 """
 try:
 allgroups = []
@@ -1213,7 +1213,8 @@ def _buildregexmatch(kindpats, globsuffi
 for idx, r in enumerate(regexps):
 piecesize = len(r)
 if (piecesize + 4) > MAX_RE_SIZE:
-raise OverflowError
+msg = _("matcher pattern is too long (%d bytes)") % piecesize
+raise error.Abort(msg)
 elif (groupsize + 1 + piecesize) > MAX_RE_SIZE:
 group = regexps[startidx:idx]
 allgroups.append(_joinregexes(group))


[PATCH 5 of 6 V3] match: avoid translating glob to matcher multiple times for large sets

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542916922 -3600
#  Thu Nov 22 21:02:02 2018 +0100
# Node ID b41a4db20b0898e57f9fa0bb9ebdabb30a53df75
# Parent  4fa131d8a4d7d2aea452994458eb088e9e997df4
# EXP-Topic perf-ignore
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
b41a4db20b08
match: avoid translating glob to matcher multiple times for large sets

For hgignore with many globs, the resulting regexp might not fit under the 20K
length limit. So the patterns need to be broken up in smaller pieces.

Before this change, the logic restarted the full process from scratch for
each smaller piece, including the translation of globs into regexps,
effectively doing the same work over and over.

If the 20K limit is reached, we are likely in a case where there are many
such globs, so translating them is especially expensive and we should be
careful not to do that work more than once.

To work around this, we now translate each glob to a regexp exactly once.
Then, we assemble the resulting individual regexps into valid blocks.

This yields a very significant performance win for large `.hgignore` files:

Before: ! wall 0.153153 comb 0.15 user 0.15 sys 0.00 (median of 66)
After:  ! wall 0.059793 comb 0.06 user 0.06 sys 0.00 (median of 100)
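
The grouping step described above can be sketched in isolation as follows. This is a simplified illustration and not the patch itself: `group_regexes` and its `limit` parameter are hypothetical names, and a plain `ValueError` stands in for the error raised by the real code.

```python
import re

MAX_RE_SIZE = 20000
BASE_SIZE = len('(?:)') - 1   # wrapper overhead, minus one '|' separator

def join_regexes(regexps):
    """Gather several regular expressions into a single alternation."""
    return '(?:%s)' % '|'.join(regexps)

def group_regexes(regexps, limit=MAX_RE_SIZE):
    """Split already-translated regexps into joined groups of at most `limit`."""
    groups = []
    start = 0
    size = BASE_SIZE
    for idx, r in enumerate(regexps):
        piece = len(r)
        if piece + 4 > limit:
            # A single pattern that cannot fit even on its own.
            raise ValueError('pattern too long (%d bytes)' % piece)
        if size + 1 + piece > limit:
            # Close the current group and start a new one at this pattern.
            groups.append(join_regexes(regexps[start:idx]))
            start = idx
            size = BASE_SIZE
        size += piece + 1
    groups.append(join_regexes(regexps[start:]))
    return groups

# Tiny limit so the split is visible with short patterns.
groups = group_regexes(['a' * 10] * 5, limit=30)
matchers = [re.compile(g) for g in groups]
print(len(groups), any(m.match('aaaaaaaaaa') for m in matchers))
```

Each input pattern is visited once, so the cost of translating globs is not repeated when a split is needed.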

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -1185,6 +1185,7 @@ def _buildmatch(kindpats, globsuffix, li
 return regex, lambda f: any(mf(f) for mf in matchfuncs)
 
 MAX_RE_SIZE = 20000
+_BASE_SIZE = len('(?:)') - 1
 
 def _joinregexes(regexps):
 """gather multiple regular expressions into a single one"""
@@ -1203,21 +1204,31 @@ def _buildregexmatch(kindpats, globsuffi
 OverflowError
 """
 try:
-regex = _joinregexes([_regex(k, p, globsuffix)
-   for (k, p, s) in kindpats])
-if len(regex) > MAX_RE_SIZE:
-raise OverflowError
-return regex, _rematcher(regex)
-except OverflowError:
-# We're using a Python with a tiny regex engine and we
-# made it explode, so we'll divide the pattern list in two
-# until it works
-l = len(kindpats)
-if l < 2:
-raise
-regexa, a = _buildregexmatch(kindpats[:l//2], globsuffix)
-regexb, b = _buildregexmatch(kindpats[l//2:], globsuffix)
-return regex, lambda s: a(s) or b(s)
+allgroups = []
+regexps = [_regex(k, p, globsuffix) for (k, p, s) in kindpats]
+fullregexp = _joinregexes(regexps)
+
+startidx = 0
+groupsize = _BASE_SIZE
+for idx, r in enumerate(regexps):
+piecesize = len(r)
+if (piecesize + 4) > MAX_RE_SIZE:
+raise OverflowError
+elif (groupsize + 1 + piecesize) > MAX_RE_SIZE:
+group = regexps[startidx:idx]
+allgroups.append(_joinregexes(group))
+startidx = idx
+groupsize = _BASE_SIZE
+groupsize += piecesize + 1
+
+if startidx == 0:
+func = _rematcher(fullregexp)
+else:
+group = regexps[startidx:]
+allgroups.append(_joinregexes(group))
+allmatchers = [_rematcher(g) for g in allgroups]
+func = lambda s: any(m(s) for m in allmatchers)
+return fullregexp, func
 except re.error:
 for k, p, s in kindpats:
 try:


[PATCH 4 of 6 V3] match: extract function that group regexps

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542903949 -3600
#  Thu Nov 22 17:25:49 2018 +0100
# Node ID 4fa131d8a4d7d2aea452994458eb088e9e997df4
# Parent  1fccf5fa1c8a65654083c85107649124f0721ffe
# EXP-Topic perf-ignore
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
4fa131d8a4d7
match: extract function that group regexps

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -1186,6 +1186,10 @@ def _buildmatch(kindpats, globsuffix, li
 
 MAX_RE_SIZE = 20000
 
+def _joinregexes(regexps):
+"""gather multiple regular expressions into a single one"""
+return '(?:%s)' % '|'.join(regexps)
+
 def _buildregexmatch(kindpats, globsuffix):
 """Build a match function from a list of kinds and kindpats,
 return regexp string and a matcher function.
@@ -1199,8 +1203,8 @@ def _buildregexmatch(kindpats, globsuffi
 OverflowError
 """
 try:
-regex = '(?:%s)' % '|'.join([_regex(k, p, globsuffix)
- for (k, p, s) in kindpats])
+regex = _joinregexes([_regex(k, p, globsuffix)
+   for (k, p, s) in kindpats])
 if len(regex) > MAX_RE_SIZE:
 raise OverflowError
 return regex, _rematcher(regex)


[PATCH 3 of 6 V3] match: test for overflow error in pattern

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542903365 -3600
#  Thu Nov 22 17:16:05 2018 +0100
# Node ID 1fccf5fa1c8a65654083c85107649124f0721ffe
# Parent  8847fda442975010d004dcb0293a40ba70434070
# EXP-Topic perf-ignore
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
1fccf5fa1c8a
match: test for overflow error in pattern

If a single pattern is too large to handle, we raise an exception. This case is
now doctested.
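
For reference, a doctest that expects an exception looks like the sketch below. `check_size` and `MAX_SIZE` are hypothetical and much smaller than the real limit. The docstring is run through the `doctest` parser and runner explicitly so that the example does not depend on how the module is loaded.

```python
import doctest

MAX_SIZE = 20   # hypothetical, tiny limit so the example stays readable

def check_size(pattern):
    """Return the pattern unchanged, or raise if it is too large.

    >>> check_size('abc')
    'abc'
    >>> check_size('?' * (MAX_SIZE + 1))
    Traceback (most recent call last):
    ...
    OverflowError
    """
    if len(pattern) > MAX_SIZE:
        raise OverflowError
    return pattern

# Parse and run the docstring examples with explicit globals.
globs = {'check_size': check_size, 'MAX_SIZE': MAX_SIZE}
test = doctest.DocTestParser().get_doctest(
    check_size.__doc__, globs, 'check_size', None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, results.attempted)
```

Only the traceback header and the final exception line are compared; the stack in between is ignored, which is why `...` is enough.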

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -1188,7 +1188,16 @@ MAX_RE_SIZE = 20000
 
 def _buildregexmatch(kindpats, globsuffix):
 """Build a match function from a list of kinds and kindpats,
-return regexp string and a matcher function."""
+return regexp string and a matcher function.
+
+Test too large input
+>>> _buildregexmatch([
+... ('relglob', '?' * MAX_RE_SIZE, '')
+... ], '$')
+Traceback (most recent call last):
+...
+OverflowError
+"""
 try:
 regex = '(?:%s)' % '|'.join([_regex(k, p, globsuffix)
  for (k, p, s) in kindpats])


[PATCH 2 of 6 V3] match: extract a literal constant into a symbolic one

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542903632 -3600
#  Thu Nov 22 17:20:32 2018 +0100
# Node ID 8847fda442975010d004dcb0293a40ba70434070
# Parent  917fa088dd67d1f56c751de59acc91abc799d83a
# EXP-Topic perf-ignore
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
8847fda44297
match: extract a literal constant into a symbolic one

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -1184,13 +1184,15 @@ def _buildmatch(kindpats, globsuffix, li
 else:
 return regex, lambda f: any(mf(f) for mf in matchfuncs)
 
+MAX_RE_SIZE = 20000
+
 def _buildregexmatch(kindpats, globsuffix):
 """Build a match function from a list of kinds and kindpats,
 return regexp string and a matcher function."""
 try:
 regex = '(?:%s)' % '|'.join([_regex(k, p, globsuffix)
  for (k, p, s) in kindpats])
-if len(regex) > 20000:
+if len(regex) > MAX_RE_SIZE:
 raise OverflowError
 return regex, _rematcher(regex)
 except OverflowError:


[PATCH 1 of 6 V3] perf: add a perfignore command

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542916813 -3600
#  Thu Nov 22 21:00:13 2018 +0100
# Node ID 917fa088dd67d1f56c751de59acc91abc799d83a
# Parent  efd0f79246e3e6633dfd06226464a48584f69b19
# EXP-Topic perf-ignore
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
917fa088dd67
perf: add a perfignore command

The command is meant to benchmark operations related to hgignore. Right now the
command is benchmarking the loading time of the hgignore rules.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -961,6 +961,23 @@ def perfchangeset(ui, repo, rev, **opts)
 timer(d)
 fm.end()
 
+@command(b'perfignore', formatteropts)
+def perfignore(ui, repo, **opts):
+"""benchmark operation related to computing ignore"""
+opts = _byteskwargs(opts)
+timer, fm = gettimer(ui, opts)
+dirstate = repo.dirstate
+
+def setupone():
+dirstate.invalidate()
+clearfilecache(dirstate, b'_ignore')
+
+def runone():
+dirstate._ignore
+
+timer(runone, setup=setupone, title="load")
+fm.end()
+
 @command(b'perfindex', formatteropts)
 def perfindex(ui, repo, **opts):
 import mercurial.revlog
diff --git a/tests/test-contrib-perf.t b/tests/test-contrib-perf.t
--- a/tests/test-contrib-perf.t
+++ b/tests/test-contrib-perf.t
@@ -83,6 +83,7 @@ perfstatus
perffncachewrite
  (no help text available)
perfheads (no help text available)
+   perfignore benchmark operation related to computing ignore
perfindex (no help text available)
perflinelogedits
  (no help text available)
@@ -161,6 +162,7 @@ perfstatus
   fncache already up to date
 #endif
   $ hg perfheads
+  $ hg perfignore
   $ hg perfindex
   $ hg perflinelogedits -n 1
   $ hg perfloadmarkers


[PATCH 13 of 13] perf: add a `--clear-caches` to `perfbranchmapupdate`

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542931777 -3600
#  Fri Nov 23 01:09:37 2018 +0100
# Node ID fb12c8325cea684942d949ff5ee913d90bf2d6e0
# Parent  ef17ac4ed0534a809c978c5677355d436ce90db9
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
fb12c8325cea
perf: add a `--clear-caches` to `perfbranchmapupdate`

This flag will help to measure the time we spend loading the various caches
that support the branchmap update.

Example for a 500 000 revisions repository:

hg perfbranchmapupdate --base 'not tip' --target 'tip'
! wall 0.000860 comb 0.00 user 0.00 sys 0.00 (best of 336)
hg perfbranchmapupdate --base 'not tip' --target 'tip' --clear-caches
! wall 0.029494 comb 0.03 user 0.03 sys 0.00 (best of 100)

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2186,10 +2186,16 @@ def perfbranchmap(ui, repo, *filternames
 @command(b'perfbranchmapupdate', [
  (b'', b'base', [], b'subset of revision to start from'),
  (b'', b'target', [], b'subset of revision to end with'),
+ (b'', b'clear-caches', False, b'clear cache between each runs')
 ] + formatteropts)
 def perfbranchmapupdate(ui, repo, base=(), target=(), **opts):
 """benchmark branchmap update from for <base> revs to <target> revs
 
+if `--clear-caches` is passed, the following items will be reset before
+each update:
+* the changelog instance and associated indexes
+* the rev-branch-cache instance
+
 Examples:
 
# update for the one last revision
@@ -2202,6 +2208,7 @@ def perfbranchmapupdate(ui, repo, base=(
 from mercurial import repoview
 opts = _byteskwargs(opts)
 timer, fm = gettimer(ui, opts)
+clearcaches = opts[b'clear_caches']
 unfi = repo.unfiltered()
 x = [None] # used to pass data between closure
 
@@ -2267,6 +2274,9 @@ def perfbranchmapupdate(ui, repo, base=(
 
 def setup():
 x[0] = base.copy()
+if clearcaches:
+unfi._revbranchcache = None
+clearchangelog(repo)
 
 def bench():
 x[0].update(targetrepo, newrevs)


[PATCH 12 of 13] perf: start from an existing branchmap if possible

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542834707 0
#  Wed Nov 21 21:11:47 2018 +0000
# Node ID ef17ac4ed0534a809c978c5677355d436ce90db9
# Parent  b0c7cead9b4e1c872b7bc8983a122cfb630045b6
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
ef17ac4ed053
perf: start from an existing branchmap if possible

If the --base set is a superset of one of the cached branchmaps, we should use
that branchmap as a starting point. This greatly helps the overall runtime of
`hg perfbranchmapupdate`.

For example, for a repository with about 500 000 revisions, using this trick
moves the command runtime from about 200 seconds to about 10 seconds. A 20x
gain.
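
The search for a reusable starting point is a walk down a chain of ever smaller subsets. The sketch below shows that walk on its own. The table contents and the `is_valid` callback are assumptions made for illustration and are not read from Mercurial.

```python
# Hypothetical subset table: each filter maps to the next smaller subset.
subsettable = {None: 'visible', 'visible': 'served',
               'served': 'immutable', 'immutable': 'base'}

def find_reusable(subsettable, is_valid):
    """Return the first filter in the chain accepted by `is_valid`, else None."""
    candidate = subsettable.get(None)
    while candidate is not None:
        if is_valid(candidate):
            break
        candidate = subsettable.get(candidate)
    else:
        # The chain was exhausted without a break: nothing to reuse,
        # so the caller has to build the branchmap from scratch.
        return None
    return candidate

found = find_reusable(subsettable, lambda name: name == 'immutable')
missing = find_reusable(subsettable, lambda name: False)
print(found, missing)
```

The `while ... else` form keeps the from-scratch fallback next to the loop that failed to find a candidate.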

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2248,8 +2248,22 @@ def perfbranchmapupdate(ui, repo, base=(
 baserepo = repo.filtered('__perf_branchmap_update_base')
 targetrepo = repo.filtered('__perf_branchmap_update_target')
 
-base = branchmap.branchcache()
-base.update(baserepo, allbaserevs)
+# try to find an existing branchmap to reuse
+subsettable = getbranchmapsubsettable()
+candidatefilter = subsettable.get(None)
+while candidatefilter is not None:
+candidatebm = repo.filtered(candidatefilter).branchmap()
+if candidatebm.validfor(baserepo):
+filtered = repoview.filterrevs(repo, candidatefilter)
+missing = [r for r in allbaserevs if r in filtered]
+base = candidatebm.copy()
+base.update(baserepo, missing)
+break
+candidatefilter = subsettable.get(candidatefilter)
+else:
+# no suitable subset was found
+base = branchmap.branchcache()
+base.update(baserepo, allbaserevs)
 
 def setup():
 x[0] = base.copy()


[PATCH 11 of 13] perf: rely on repoview for perfbranchmapupdate

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542832522 0
#  Wed Nov 21 20:35:22 2018 +0000
# Node ID b0c7cead9b4e1c872b7bc8983a122cfb630045b6
# Parent  7db38417670e442bb34e0a047cfd6bed795a98f3
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
b0c7cead9b4e
perf: rely on repoview for perfbranchmapupdate

Using 'repoview' objects matching the base and target subsets makes the
benchmark more realistic. It also unlocks optimizations that make the command
initialization faster.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2199,8 +2199,10 @@ def perfbranchmapupdate(ui, repo, base=(
$ hg perfbranchmapupdate --base 'stable' --target 'default'
 """
 from mercurial import branchmap
+from mercurial import repoview
 opts = _byteskwargs(opts)
 timer, fm = gettimer(ui, opts)
+unfi = repo.unfiltered()
 x = [None] # used to pass data between closure
 
 # we use a `list` here to avoid possible side effect from smartset
@@ -2223,21 +2225,43 @@ def perfbranchmapupdate(ui, repo, base=(
 newrevs = list(alltargetrevs.difference(allbaserevs))
 newrevs.sort()
 
+allrevs = frozenset(unfi.changelog.revs())
+basefilterrevs = frozenset(allrevs.difference(allbaserevs))
+targetfilterrevs = frozenset(allrevs.difference(alltargetrevs))
+
+def basefilter(repo, visibilityexceptions=None):
+return basefilterrevs
+
+def targetfilter(repo, visibilityexceptions=None):
+return targetfilterrevs
+
 msg = 'benchmark of branchmap with %d revisions with %d new ones\n'
 ui.status(msg % (len(allbaserevs), len(newrevs)))
+if targetfilterrevs:
+msg = '(%d revisions still filtered)\n'
+ui.status(msg % len(targetfilterrevs))
 
-if True:
+try:
+repoview.filtertable['__perf_branchmap_update_base'] = basefilter
+repoview.filtertable['__perf_branchmap_update_target'] = targetfilter
+
+baserepo = repo.filtered('__perf_branchmap_update_base')
+targetrepo = repo.filtered('__perf_branchmap_update_target')
+
 base = branchmap.branchcache()
-base.update(repo, allbaserevs)
+base.update(baserepo, allbaserevs)
 
 def setup():
 x[0] = base.copy()
 
 def bench():
-x[0].update(repo, newrevs)
+x[0].update(targetrepo, newrevs)
 
 timer(bench, setup=setup)
 fm.end()
+finally:
+repoview.filtertable.pop('__perf_branchmap_update_base', None)
+repoview.filtertable.pop('__perf_branchmap_update_target', None)
 
 @command(b'perfbranchmapload', [
  (b'f', b'filter', b'', b'Specify repoview filter'),


[PATCH 10 of 13] perf: pre-indent some code in `perfbranchmapupdate`

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542837366 -3600
#  Wed Nov 21 22:56:06 2018 +0100
# Node ID 7db38417670e442bb34e0a047cfd6bed795a98f3
# Parent  d9447d83339527ab8225b13e7a518fe4118ba947
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
7db38417670e
perf: pre-indent some code in `perfbranchmapupdate`

This makes the next patch easier to read.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2226,17 +2226,18 @@ def perfbranchmapupdate(ui, repo, base=(
 msg = 'benchmark of branchmap with %d revisions with %d new ones\n'
 ui.status(msg % (len(allbaserevs), len(newrevs)))
 
-base = branchmap.branchcache()
-base.update(repo, allbaserevs)
-
-def setup():
-x[0] = base.copy()
+if True:
+base = branchmap.branchcache()
+base.update(repo, allbaserevs)
 
-def bench():
-x[0].update(repo, newrevs)
+def setup():
+x[0] = base.copy()
 
-timer(bench, setup=setup)
-fm.end()
+def bench():
+x[0].update(repo, newrevs)
+
+timer(bench, setup=setup)
+fm.end()
 
 @command(b'perfbranchmapload', [
  (b'f', b'filter', b'', b'Specify repoview filter'),


[PATCH 09 of 13] perf: add a `perfbranchmapupdate` command

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542801745 0
#  Wed Nov 21 12:02:25 2018 +0000
# Node ID d9447d83339527ab8225b13e7a518fe4118ba947
# Parent  1ab87863dc79a8e3139c723e83bef8c1ce107016
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
d9447d833395
perf: add a `perfbranchmapupdate` command

This command benchmarks the time necessary to update the branchmap between two
sets of revisions. This changeset introduces a first version, doing nothing
fancy regarding caches or other internal details.
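
The revision bookkeeping behind the benchmark (ancestors of the base, ancestors of the target, and the difference between the two) can be sketched on a toy history. The `parents` table and the `ancestors` helper are made up for illustration and do not use the changelog API.

```python
# Toy history: revision number -> parent revisions (made up for illustration).
parents = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3]}

def ancestors(revs):
    """Return `revs` plus all of their ancestors (inclusive)."""
    seen = set()
    stack = list(revs)
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen

baserevs = [2]
# The target set always contains the base set as well.
targetrevs = sorted(set(baserevs) | {4})

allbaserevs = ancestors(baserevs)
alltargetrevs = ancestors(targetrevs)
# Revisions the benchmarked update has to process.
newrevs = sorted(alltargetrevs - allbaserevs)
print(sorted(allbaserevs), newrevs)
```

Only `newrevs` is fed to the timed update; the base branchmap is built once, outside the timer.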

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2183,6 +2183,61 @@ def perfbranchmap(ui, repo, *filternames
 branchcachewrite.restore()
 fm.end()
 
+@command(b'perfbranchmapupdate', [
+ (b'', b'base', [], b'subset of revision to start from'),
+ (b'', b'target', [], b'subset of revision to end with'),
+] + formatteropts)
+def perfbranchmapupdate(ui, repo, base=(), target=(), **opts):
+"""benchmark branchmap update from for <base> revs to <target> revs
+
+Examples:
+
+   # update for the one last revision
+   $ hg perfbranchmapupdate --base 'not tip' --target 'tip'
+
+   # update for change coming with a new branch
+   $ hg perfbranchmapupdate --base 'stable' --target 'default'
+"""
+from mercurial import branchmap
+opts = _byteskwargs(opts)
+timer, fm = gettimer(ui, opts)
+x = [None] # used to pass data between closure
+
+# we use a `list` here to avoid possible side effect from smartset
+baserevs = list(scmutil.revrange(repo, base))
+targetrevs = list(scmutil.revrange(repo, target))
+if not baserevs:
+raise error.Abort('no revisions selected for --base')
+if not targetrevs:
+raise error.Abort('no revisions selected for --target')
+
+# make sure the target branchmap also contains the one in the base
+targetrevs = list(set(baserevs) | set(targetrevs))
+targetrevs.sort()
+
+cl = repo.changelog
+allbaserevs = list(cl.ancestors(baserevs, inclusive=True))
+allbaserevs.sort()
+alltargetrevs = frozenset(cl.ancestors(targetrevs, inclusive=True))
+
+newrevs = list(alltargetrevs.difference(allbaserevs))
+newrevs.sort()
+
+msg = 'benchmark of branchmap with %d revisions with %d new ones\n'
+ui.status(msg % (len(allbaserevs), len(newrevs)))
+
+base = branchmap.branchcache()
+base.update(repo, allbaserevs)
+
+def setup():
+x[0] = base.copy()
+
+def bench():
+x[0].update(repo, newrevs)
+
+timer(bench, setup=setup)
+fm.end()
+
 @command(b'perfbranchmapload', [
  (b'f', b'filter', b'', b'Specify repoview filter'),
  (b'', b'list', False, b'List brachmap filter caches'),
diff --git a/tests/test-contrib-perf.t b/tests/test-contrib-perf.t
--- a/tests/test-contrib-perf.t
+++ b/tests/test-contrib-perf.t
@@ -57,6 +57,9 @@ perfstatus
  benchmark the update of a branchmap
perfbranchmapload
  benchmark reading the branchmap
+   perfbranchmapupdate
+ benchmark branchmap update from for <base> revs to <target>
+ revs
perfbundleread
  Benchmark reading of bundle files.
perfcca   (no help text available)
@@ -141,6 +144,8 @@ perfstatus
   $ hg perfbookmarks
   $ hg perfbranchmap
   $ hg perfbranchmapload
+  $ hg perfbranchmapupdate --base "not tip" --target "tip"
+  benchmark of branchmap with 3 revisions with 1 new ones
   $ hg perfcca
   $ hg perfchangegroupchangelog
   $ hg perfchangeset 2


[PATCH 08 of 13] perf: run 'setup' function during stub run

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542926924 -3600
#  Thu Nov 22 23:48:44 2018 +0100
# Node ID 1ab87863dc79a8e3139c723e83bef8c1ce107016
# Parent  e2599c3fba4223ced20a7a253e67631851cab700
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
1ab87863dc79
perf: run 'setup' function during stub run

The benchmarked function might need the setup function to have run in order
to work properly.
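
A small sketch of the failure mode this avoids. The timer here is a simplified, hypothetical version without the formatter argument, and `state`, `setup` and `bench` are illustrative names.

```python
def stub_timer(func, setup=None):
    """Run `func` once, preceded by `setup` when one is given."""
    if setup is not None:
        setup()
    func()

state = [None]   # shared slot, filled in by setup

def setup():
    state[0] = {'branch': 'default'}

def bench():
    # Raises TypeError if setup has not populated the slot.
    return state[0]['branch']

stub_timer(bench, setup=setup)
print(state[0])
```

Without the `setup()` call in the stub, `bench` would index into `None` and the stub run would fail even though the real timer works.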

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -276,6 +276,8 @@ def gettimer(ui, opts=None):
 return functools.partial(_timer, fm, displayall=displayall), fm
 
 def stub_timer(fm, func, setup=None, title=None):
+if setup is not None:
+setup()
 func()
 
 @contextlib.contextmanager


[PATCH 07 of 13] perf: fallback to subset if ondisk cache is missing in perfbranchmapload

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542935281 -3600
#  Fri Nov 23 02:08:01 2018 +0100
# Node ID e2599c3fba4223ced20a7a253e67631851cab700
# Parent  9f543638d909768a0db0aa779d37817c4b8878ab
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
e2599c3fba42
perf: fallback to subset if ondisk cache is missing in perfbranchmapload

If there is no branchmap on disk for that filter, it means that the cache from
some subset's filter is relevant for this one. We look for it instead of
aborting.

That way it is much simpler to run the command in an automated way. We can now
add it to `test-contrib-perf.t`.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2199,17 +2199,25 @@ def perfbranchmapload(ui, repo, filter=b
 ui.status(b'%s - %s\n'
   % (filtername, util.bytecount(st.st_size)))
 return
-if filter:
+if not filter:
+filter = None
+subsettable = getbranchmapsubsettable()
+if filter is None:
+repo = repo.unfiltered()
+else:
 repo = repoview.repoview(repo, filter)
-else:
-repo = repo.unfiltered()
 
 repo.branchmap # make sure we have a relevant, up to date branchmap
 
+
+currentfilter = filter
 # try once without timer, the filter may not be cached
-if branchmap.read(repo) is None:
-raise error.Abort(b'No branchmap cached for %s repo'
-  % (filter or b'unfiltered'))
+while branchmap.read(repo) is None:
+currentfilter = subsettable.get(currentfilter)
+if currentfilter is None:
+raise error.Abort(b'No branchmap cached for %s repo'
+  % (filter or b'unfiltered'))
+repo = repo.filtered(currentfilter)
 timer, fm = gettimer(ui, opts)
 def setup():
 if clearrevlogs:
diff --git a/tests/test-contrib-perf.t b/tests/test-contrib-perf.t
--- a/tests/test-contrib-perf.t
+++ b/tests/test-contrib-perf.t
@@ -140,6 +140,7 @@ perfstatus
   $ hg perfunidiff --alldata 1
   $ hg perfbookmarks
   $ hg perfbranchmap
+  $ hg perfbranchmapload
   $ hg perfcca
   $ hg perfchangegroupchangelog
   $ hg perfchangeset 2


[PATCH 06 of 13] perf: prewarm the branchmap in perfbranchmapload

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542935471 -3600
#  Fri Nov 23 02:11:11 2018 +0100
# Node ID 9f543638d909768a0db0aa779d37817c4b8878ab
# Parent  e72da9d014ba91ee4f2fe620a9646404a64d7484
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 
9f543638d909
perf: prewarm the branchmap in perfbranchmapload

It is not very interesting to have the command randomly failing because the
branchmap for the tested filter happens to be cold. So we make sure to have a
valid up to date branchmap before going further.

The data might still be missing from disk if a subset was equivalent. See next
changeset for details and fix.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2203,6 +2203,9 @@ def perfbranchmapload(ui, repo, filter=b
 repo = repoview.repoview(repo, filter)
 else:
 repo = repo.unfiltered()
+
+repo.branchmap # make sure we have a relevant, up to date branchmap
+
 # try once without timer, the filter may not be cached
 if branchmap.read(repo) is None:
 raise error.Abort(b'No branchmap cached for %s repo'


[PATCH 05 of 13] perf: add --clear-revlogs flag to branchmapload

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542951152 -3600
#  Fri Nov 23 06:32:32 2018 +0100
# Node ID e72da9d014ba91ee4f2fe620a9646404a64d7484
# Parent  ba101026c80452a9f60b3d574011c7e8773b5a4e
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r e72da9d014ba
perf: add --clear-revlogs flag to branchmapload

Having the changelog index already loaded when loading the branchmap can have a
large impact on performance.

Example runs (large private repository):

hg perfbranchmapload -f base
! wall 0.116722 comb 0.12 user 0.11 sys 0.01 (best of 59)
hg perfbranchmapload -f base --clear-revlogs
! wall 0.258246 comb 0.23 user 0.22 sys 0.01 (best of 31)

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2184,10 +2184,13 @@ def perfbranchmap(ui, repo, *filternames
 @command(b'perfbranchmapload', [
  (b'f', b'filter', b'', b'Specify repoview filter'),
  (b'', b'list', False, b'List brachmap filter caches'),
+ (b'', b'clear-revlogs', False, b'refresh changelog and manifest'),
+
 ] + formatteropts)
 def perfbranchmapload(ui, repo, filter=b'', list=False, **opts):
 """benchmark reading the branchmap"""
 opts = _byteskwargs(opts)
+clearrevlogs = opts[b'clear_revlogs']
 
 if list:
 for name, kind, st in repo.cachevfs.readdir(stat=True):
@@ -2205,9 +2208,12 @@ def perfbranchmapload(ui, repo, filter=b
 raise error.Abort(b'No branchmap cached for %s repo'
   % (filter or b'unfiltered'))
 timer, fm = gettimer(ui, opts)
+def setup():
+if clearrevlogs:
+clearchangelog(repo)
 def bench():
 branchmap.read(repo)
-timer(bench)
+timer(bench, setup=setup)
 fm.end()
 
 @command(b'perfloadmarkers')
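
The effect of passing `setup=setup` to the timer can be shown with a toy benchmark helper. This is a sketch, assuming a simplified `timer(func, setup=None, repeat=5)` signature that is hypothetical and much smaller than the real helper in contrib/perf.py; the `cache` dict stands in for the loaded changelog.

```python
import time


def timer(func, setup=None, repeat=5):
    """Return the best wall-clock time of `func` over `repeat` runs.

    `setup`, when given, runs before every measurement but outside the
    timed region, so its own cost never shows up in the result.
    """
    best = None
    for _ in range(repeat):
        if setup is not None:
            setup()
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


if __name__ == "__main__":
    cache = {}

    def setup():
        cache.clear()  # drop the warm state before each run

    def bench():
        # The first access after setup() pays the "load" cost again.
        if "index" not in cache:
            cache["index"] = sorted(range(50000), reverse=True)
        return cache["index"][0]

    print("warm: %.6f" % timer(bench))
    print("cold: %.6f" % timer(bench, setup=setup))
```

Running the setup step outside the timed region is what lets the cold and warm numbers be compared directly, as in the example runs quoted in the commit message.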


[PATCH 04 of 13] perf: introduce a function to fully "unload" a changelog

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542951148 -3600
#  Fri Nov 23 06:32:28 2018 +0100
# Node ID ba101026c80452a9f60b3d574011c7e8773b5a4e
# Parent  56efcdd74aee7b45d2b85a7d414033cae6b465c7
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r ba101026c804
perf: introduce a function to fully "unload" a changelog

The function removes various attributes and caches related to the changelog.

This is becoming a common requirement.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -465,6 +465,12 @@ def clearfilecache(obj, attrname):
 delattr(obj, attrname)
 obj._filecache.pop(attrname, None)
 
+def clearchangelog(repo):
+if repo is not repo.unfiltered():
+object.__setattr__(repo, r'_clcachekey', None)
+object.__setattr__(repo, r'_clcache', None)
+clearfilecache(repo.unfiltered(), 'changelog')
+
 # perf commands
 
 @command(b'perfwalk', formatteropts)
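
The idea of fully "unloading" a cached object can be sketched without Mercurial. The example below is an assumption-laden toy: `ToyRepo`, its `loads` counter, and `clear_changelog` are hypothetical, and only the pattern (drop the cached attribute and the cache-table entry so the next access reloads) follows the patch.

```python
class ToyRepo:
    """Hypothetical stand-in for a repository with a cached changelog."""

    def __init__(self):
        self._filecache = {}
        self.loads = 0  # counts how many times the changelog was rebuilt

    @property
    def changelog(self):
        # Cache the value in the instance dict; rebuild only when missing.
        if "_changelog" not in vars(self):
            self.loads += 1
            self._changelog = ["rev%d" % i for i in range(3)]
            self._filecache["changelog"] = self._changelog
        return self._changelog


def clear_changelog(repo):
    """Forget every cached trace of the changelog on `repo`."""
    vars(repo).pop("_changelog", None)
    repo._filecache.pop("changelog", None)


if __name__ == "__main__":
    repo = ToyRepo()
    repo.changelog
    repo.changelog           # served from the cache
    clear_changelog(repo)
    repo.changelog           # rebuilt after the cache was cleared
    print(repo.loads)
```

Clearing both places matters: leaving either the attribute or the cache-table entry behind would let a later access return the stale object.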


[PATCH 02 of 13] perf: update function name to match `perfbranchmapload` command

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542932418 -3600
#  Fri Nov 23 01:20:18 2018 +0100
# Node ID 9f29b499e0adb22a3fd23f8e88ea281d1073aa3f
# Parent  7fd3c1f11ea10875fe81828c4298e1b2bd79c9fd
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 9f29b499e0ad
perf: update function name to match `perfbranchmapload` command

Having a function with the same name as the command is simpler.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2179,7 +2179,7 @@ def perfbranchmap(ui, repo, *filternames
  (b'f', b'filter', b'', b'Specify repoview filter'),
  (b'', b'list', False, b'List brachmap filter caches'),
 ] + formatteropts)
-def perfbranchmapread(ui, repo, filter=b'', list=False, **opts):
+def perfbranchmapload(ui, repo, filter=b'', list=False, **opts):
 """benchmark reading the branchmap"""
 opts = _byteskwargs(opts)
 


[PATCH 03 of 13] perf: use an explicit function in perfbranchmapload

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542933177 -3600
#  Fri Nov 23 01:32:57 2018 +0100
# Node ID 56efcdd74aee7b45d2b85a7d414033cae6b465c7
# Parent  9f29b499e0adb22a3fd23f8e88ea281d1073aa3f
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 56efcdd74aee
perf: use an explicit function in perfbranchmapload

This makes things clearer.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2199,7 +2199,9 @@ def perfbranchmapload(ui, repo, filter=b
 raise error.Abort(b'No branchmap cached for %s repo'
   % (filter or b'unfiltered'))
 timer, fm = gettimer(ui, opts)
-timer(lambda: branchmap.read(repo) and None)
+def bench():
+branchmap.read(repo)
+timer(bench)
 fm.end()
 
 @command(b'perfloadmarkers')


[PATCH 01 of 13] perf: fix a minor typo in perfbranchmapload

2018-11-23 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1542800807 0
#  Wed Nov 21 11:46:47 2018 +0000
# Node ID 7fd3c1f11ea10875fe81828c4298e1b2bd79c9fd
# Parent  d7936a9dad471d0bfe99c5ddf873fa566df6e28b
# EXP-Topic perf-branchmap
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 7fd3c1f11ea1
perf: fix a minor typo in perfbranchmapload

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2196,7 +2196,7 @@ def perfbranchmapread(ui, repo, filter=b
 repo = repo.unfiltered()
 # try once without timer, the filter may not be cached
 if branchmap.read(repo) is None:
-raise error.Abort(b'No brachmap cached for %s repo'
+raise error.Abort(b'No branchmap cached for %s repo'
   % (filter or b'unfiltered'))
 timer, fm = gettimer(ui, opts)
 timer(lambda: branchmap.read(repo) and None)


Re: [PATCH 5 of 6 V2] match: avoid translating glob to matcher multiple times for large sets

2018-11-23 Thread Yuya Nishihara
On Fri, 23 Nov 2018 18:00:36 +0900, Yuya Nishihara wrote:
> On Fri, 23 Nov 2018 00:00:36 -0800, Martin von Zweigbergk via Mercurial-devel 
> wrote:
> > On Thu, Nov 22, 2018 at 11:44 PM Martin von Zweigbergk <
> > martinv...@google.com> wrote:
> > > On Thu, Nov 22, 2018 at 2:26 PM Boris Feld  wrote:
> > >
> > >> # HG changeset patch
> > >> # User Boris Feld 
> > >> # Date 1542916922 -3600
> > >> #  Thu Nov 22 21:02:02 2018 +0100
> > >> # Node ID 018578f3ab597d5ea573107e7310470de76a3907
> > >> # Parent  4628c3cf1fc1052ca25296c8c1a42c4502b59dc9
> > >> # EXP-Topic perf-ignore-2
> > >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> > >> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> > >> 018578f3ab59
> > >> match: avoid translating glob to matcher multiple times for large sets
> > >>
> > >> For hgignore with many globs, the resulting regexp might not fit under
> > >> the 20K
> > >> length limit. So the patterns need to be broken up in smaller pieces.
> > >>
> > >
> > > Did you see 0f6a1bdf89fb (match: handle large regexes, 2007-08-19)
> > > and 59a9dc9562e2 (ignore: split up huge patterns, 2008-02-11)? It might be
> > > worth trying to figure out what Python versions those commits are talking
> > > about. Maybe we've dropped support for those versions and we can simplify
> > > this code.
> > >
> > 
> > Oh, and what made me do the archaeology there was that you seem to have
> > lost the handling of OverlowError from the regex engine. As I said above, I
> > suspect that's fine because we no longer support some very old Python
> > versions (but please try to figure out what version that refers to). Still,
> > if we decide to drop that OverflowError handling, I'd prefer to see that in
> > an explicit commit early in this series.
> 
> Perhaps it's been fixed since 2.7.4. The regexp code width is extended
> from 16bit to 32bit (or Py_UCS4) integer. That should be large enough to
> handle practical patterns.
> 
> https://bugs.python.org/issue1160

That said, combining more chunks of regex patterns may well lead to another
funny problem.

% python -c 'import re; re.compile("(a)" * 100)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 249, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python2.7/sre_compile.py", line 583, in compile
    "sorry, but this version only supports 100 named groups"
AssertionError: sorry, but this version only supports 100 named groups

It's unrelated to the OverflowError issue, but splitting patterns could
help avoid the 100-named-group problem.
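
A rough sketch of that kind of splitting, under stated assumptions: `compile_chunks` and `match_any` are hypothetical helpers, not code from mercurial/match.py, and the default of 99 groups per chunk is only inferred from the traceback above, where 100 capturing groups already fail on Python 2.7.

```python
import re


def compile_chunks(patterns, max_groups=99):
    """Compile `patterns` into several smaller alternations.

    Each pattern is wrapped in one capturing group, so a chunk never
    holds more than `max_groups` groups (assuming the patterns
    themselves contain no capturing groups).
    """
    chunks = []
    for start in range(0, len(patterns), max_groups):
        part = patterns[start:start + max_groups]
        chunks.append(re.compile("|".join("(%s)" % p for p in part)))
    return chunks


def match_any(chunks, text):
    """Return True if any compiled chunk matches at the start of `text`."""
    return any(regex.match(text) for regex in chunks)


if __name__ == "__main__":
    patterns = ["a%d$" % i for i in range(250)]
    chunks = compile_chunks(patterns)
    print(len(chunks), match_any(chunks, "a249"), match_any(chunks, "b1"))
```

The cost is that a lookup may have to try several compiled regexes instead of one, so the chunk size is a trade-off between staying under the engine's limits and keeping the number of regexes small.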


Re: [PATCH 5 of 6 V2] match: avoid translating glob to matcher multiple times for large sets

2018-11-23 Thread Yuya Nishihara
On Fri, 23 Nov 2018 00:00:36 -0800, Martin von Zweigbergk via Mercurial-devel 
wrote:
> On Thu, Nov 22, 2018 at 11:44 PM Martin von Zweigbergk <
> martinv...@google.com> wrote:
> > On Thu, Nov 22, 2018 at 2:26 PM Boris Feld  wrote:
> >
> >> # HG changeset patch
> >> # User Boris Feld 
> >> # Date 1542916922 -3600
> >> #  Thu Nov 22 21:02:02 2018 +0100
> >> # Node ID 018578f3ab597d5ea573107e7310470de76a3907
> >> # Parent  4628c3cf1fc1052ca25296c8c1a42c4502b59dc9
> >> # EXP-Topic perf-ignore-2
> >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> >> 018578f3ab59
> >> match: avoid translating glob to matcher multiple times for large sets
> >>
> >> For hgignore with many globs, the resulting regexp might not fit under
> >> the 20K
> >> length limit. So the patterns need to be broken up in smaller pieces.
> >>
> >
> > Did you see 0f6a1bdf89fb (match: handle large regexes, 2007-08-19)
> > and 59a9dc9562e2 (ignore: split up huge patterns, 2008-02-11)? It might be
> > worth trying to figure out what Python versions those commits are talking
> > about. Maybe we've dropped support for those versions and we can simplify
> > this code.
> >
> 
> Oh, and what made me do the archaeology there was that you seem to have
> lost the handling of OverlowError from the regex engine. As I said above, I
> suspect that's fine because we no longer support some very old Python
> versions (but please try to figure out what version that refers to). Still,
> if we decide to drop that OverflowError handling, I'd prefer to see that in
> an explicit commit early in this series.

Perhaps it's been fixed since 2.7.4. The regexp code width is extended
from 16bit to 32bit (or Py_UCS4) integer. That should be large enough to
handle practical patterns.

https://bugs.python.org/issue1160


Re: [PATCH 5 of 6 V2] match: avoid translating glob to matcher multiple times for large sets

2018-11-23 Thread Martin von Zweigbergk via Mercurial-devel
On Thu, Nov 22, 2018 at 11:44 PM Martin von Zweigbergk <
martinv...@google.com> wrote:

>
>
> On Thu, Nov 22, 2018 at 2:26 PM Boris Feld  wrote:
>
>> # HG changeset patch
>> # User Boris Feld 
>> # Date 1542916922 -3600
>> #  Thu Nov 22 21:02:02 2018 +0100
>> # Node ID 018578f3ab597d5ea573107e7310470de76a3907
>> # Parent  4628c3cf1fc1052ca25296c8c1a42c4502b59dc9
>> # EXP-Topic perf-ignore-2
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
>> 018578f3ab59
>> match: avoid translating glob to matcher multiple times for large sets
>>
>> For hgignore with many globs, the resulting regexp might not fit under
>> the 20K
>> length limit. So the patterns need to be broken up in smaller pieces.
>>
>
> Did you see 0f6a1bdf89fb (match: handle large regexes, 2007-08-19)
> and 59a9dc9562e2 (ignore: split up huge patterns, 2008-02-11)? It might be
> worth trying to figure out what Python versions those commits are talking
> about. Maybe we've dropped support for those versions and we can simplify
> this code.
>

Oh, and what made me do the archaeology there was that you seem to have
lost the handling of OverflowError from the regex engine. As I said above, I
suspect that's fine because we no longer support some very old Python
versions (but please try to figure out what version that refers to). Still,
if we decide to drop that OverflowError handling, I'd prefer to see that in
an explicit commit early in this series.
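
For readers unfamiliar with the handling being discussed, here is a minimal sketch of the general technique: try one big alternation and split the pattern list only when the regex engine refuses it. `compile_with_fallback` is a hypothetical name and this is not the code from the commits cited above; catching AssertionError alongside OverflowError is likewise an assumption of the sketch.

```python
import re


def compile_with_fallback(patterns, compile_func=re.compile):
    """Try one big alternation; on failure, split the list in half.

    Returns a list of compiled regexes. `compile_func` is injectable
    only so that the failure path can be exercised without an old
    Python version.
    """
    try:
        return [compile_func("|".join("(?:%s)" % p for p in patterns))]
    except (OverflowError, AssertionError):
        if len(patterns) <= 1:
            raise  # a single pattern that cannot compile is a real error
        mid = len(patterns) // 2
        return (compile_with_fallback(patterns[:mid], compile_func)
                + compile_with_fallback(patterns[mid:], compile_func))


if __name__ == "__main__":
    def limited(pattern):
        # Simulate an engine that rejects long patterns.
        if len(pattern) > 40:
            raise OverflowError("regular expression code size limit exceeded")
        return re.compile(pattern)

    regexes = compile_with_fallback(["foo%d" % i for i in range(20)], limited)
    print(len(regexes))
```

Dropping this path is only safe if no supported Python version can still raise from `re.compile` on large inputs, which is the question raised above.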