Re: [PATCH] parsers: use buffer to store revlog index

2016-12-12 Thread Gregory Szorc
On Tue, Dec 6, 2016 at 3:46 AM, Jun Wu  wrote:

> # HG changeset patch
> # User Jun Wu 
> # Date 1481024689 0
> #  Tue Dec 06 11:44:49 2016 +
> # Node ID 36b4288ea10a9a76c1d993d47e002baaf7705ffb
> # Parent  2463b16bbd3710b619e0e948651db9948341f990
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #  hg pull https://bitbucket.org/quark-zju/hg-draft -r
> 36b4288ea10a
> parsers: use buffer to store revlog index
>

I know this is queued already, but I think C code should have more eyes
than Python code.

This patch LGTM. Only thing I would have done differently is store a
Py_buffer* instead of a Py_buffer to ensure alignment of indexObject struct
entries. But I don't think this struct is that performance sensitive to
care. Feel free to address in a follow-up if you disagree.


>
> Previously, the revlog index passed to parse_index2 must be a "string",
> which means we have to read the whole revlog index into memory. This patch
> makes the code accept a generic Py_buffer, to be more flexible - it could
> be
> a "string", or anything that implements the buffer interface, like a
> mmap-ed
> region.
>
> Note: ideally we want to remove the "data" field. However, it is still used
> in parse_index2:
>
> if (idx->inlined) {
> cache = Py_BuildValue("iO", 0, idx->data);
> 
> }
> 
> tuple = Py_BuildValue("NN", idx, cache);
> 
> return tuple;
>
> Its only users are revlogio.parseindex and revlog.__init__:
>
> # revlogio.parseindex
> index, cache = parsers.parse_index2(data, inline)
> return index, getattr(index, 'nodemap', None), cache
>
> # revlog.__init__
> d = self._io.parseindex(indexdata, self._inline)
> self.index, nodemap, self._chunkcache = d
>
> Maybe we could move the logic (testing inline and returnning "data" object)
> to revlog.py. But that should be a separate patch.
>
> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
> --- a/mercurial/parsers.c
> +++ b/mercurial/parsers.c
> @@ -754,4 +754,5 @@ typedef struct {
> /* Type-specific fields go here. */
> PyObject *data;/* raw bytes of index */
> +   Py_buffer buf; /* buffer of data */
> PyObject **cache;  /* cached tuples */
> const char **offsets;  /* populated on demand */
> @@ -809,5 +810,5 @@ static const char *index_deref(indexObje
> }
>
> -   return PyBytes_AS_STRING(self->data) + pos * v1_hdrsize;
> +   return (const char *)(self->buf.buf) + pos * v1_hdrsize;
>  }
>
> @@ -2390,7 +2391,7 @@ static int index_assign_subscript(indexO
>  static Py_ssize_t inline_scan(indexObject *self, const char **offsets)
>  {
> -   const char *data = PyBytes_AS_STRING(self->data);
> +   const char *data = (const char *)self->buf.buf;
> Py_ssize_t pos = 0;
> -   Py_ssize_t end = PyBytes_GET_SIZE(self->data);
> +   Py_ssize_t end = self->buf.len;
> long incr = v1_hdrsize;
> Py_ssize_t len = 0;
> @@ -2426,4 +2427,5 @@ static int index_init(indexObject *self,
> self->cache = NULL;
> self->data = NULL;
> +   memset(>buf, 0, sizeof(self->buf));
> self->headrevs = NULL;
> self->filteredrevs = Py_None;
> @@ -2434,9 +2436,13 @@ static int index_init(indexObject *self,
> if (!PyArg_ParseTuple(args, "OO", _obj, _obj))
> return -1;
> -   if (!PyBytes_Check(data_obj)) {
> -   PyErr_SetString(PyExc_TypeError, "data is not a string");
> +   if (!PyObject_CheckBuffer(data_obj)) {
> +   PyErr_SetString(PyExc_TypeError,
> +   "data does not support buffer interface");
> return -1;
> }
> -   size = PyBytes_GET_SIZE(data_obj);
> +
> +   if (PyObject_GetBuffer(data_obj, >buf, PyBUF_SIMPLE) == -1)
> +   return -1;
> +   size = self->buf.len;
>
> self->inlined = inlined_obj && PyObject_IsTrue(inlined_obj);
> @@ -2479,4 +2485,8 @@ static void index_dealloc(indexObject *s
> _index_clearcaches(self);
> Py_XDECREF(self->filteredrevs);
> +   if (self->buf.buf) {
> +   PyBuffer_Release(>buf);
> +   memset(>buf, 0, sizeof(self->buf));
> +   }
> Py_XDECREF(self->data);
> Py_XDECREF(self->added);
> @@ -2578,5 +2588,6 @@ static PyTypeObject indexType = {
>   *
>   * index: an index object that lazily parses RevlogNG records
> - * cache: if data is inlined, a tuple (index_file_content, 0), else None
> + * cache: if data is inlined, a tuple (0, index_file_content), else None
> + *index_file_content could be a string, or a buffer
>   *
>   * added complications are for backwards compatibility
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>

Re: [PATCH 2 of 7 upgraderepo V2] debugcommands: stub for debugupgraderepo command

2016-12-12 Thread Gregory Szorc
On Mon, Dec 12, 2016 at 2:17 AM, Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:

>
>
> On 11/25/2016 04:28 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc 
>> # Date 1480033449 28800
>> #  Thu Nov 24 16:24:09 2016 -0800
>> # Node ID 4fffbdc2bca22e666482b55562c0e592c8a68027
>> # Parent  c60995fce14b0e34bd1416dab3712a6c33bb29bb
>> debugcommands: stub for debugupgraderepo command
>>
>> Currently, if Mercurial introduces a new repository/store feature or
>> changes behavior of an existing feature, users must perform an
>> `hg clone` to create a new repository with hopefully the
>> correct/optimal settings. Unfortunately, even `hg clone` may not
>> give the correct results. For example, if you do a local `hg clone`,
>> you may get hardlinks to revlog files that inherit the old state.
>> If you `hg clone` from a remote or `hg clone --pull`, changegroup
>> application may bypass some optimization, such as converting to
>> generaldelta.
>>
>> Optimizing a repository is harder than it seems and requires more
>> than a simple `hg` command invocation.
>>
>> This commit starts the process of changing that. We introduce
>> `hg debugupgraderepo`, a command that performs an in-place upgrade
>> of a repository to use new, optimal features. The command is just
>> a stub right now. Features will be added in subsequent commits.
>>
>
> I like the idea of this commit, this should probably be the first one in
> the stack. The revlog cloning feature can be delayed later in the stack
> until we actually start performing upgrade.
>
> This commit does foreshadow some of the behavior of the new command,
>> notably that it doesn't do anything by default and that it takes
>> arguments that influence what actions it performs. These will be
>> explained more in subsequent commits.
>>
>> diff --git a/mercurial/debugcommands.py b/mercurial/debugcommands.py
>> --- a/mercurial/debugcommands.py
>> +++ b/mercurial/debugcommands.py
>> @@ -849,3 +849,28 @@ def debugdeltachain(ui, repo, file_=None
>>   extradist=extradist, extraratio=extraratio)
>>
>>  fm.end()
>> +
>> +@command('debugupgraderepo', [
>> +('o', 'optimize', [], _('extra optimization to perform'), _('NAME')),
>> +('', 'run', False, _('performs an upgrade')),
>>
>
> We should drop these option from the command until we actually use them
> this will allow to discuss them with their actual behavior.
>
> +])
>> +def debugupgraderepo(ui, repo, run=False, optimize=None):
>> +"""upgrade a repository to use different features
>> +
>> +If no arguments are specified, the repository is evaluated for
>> upgrade
>> +and a list of problems and potential optimizations is printed.
>> +
>> +With ``--run``, a repository upgrade is performed. Behavior of the
>> upgrade
>> +can be influenced via additional arguments. More details will be
>> provided
>> +by the command output when run without ``--run``.
>>
>
> (I would also delayed the help about --run until we actually run something)
>
> +
>> +During the upgrade, the repository will be locked and no writes will
>> be
>> +allowed.
>> +
>> +At the end of the upgrade, the repository may not be readable while
>> new
>> +repository data is swapped in. This window will be as long as it
>> takes to
>> +rename some directories inside the ``.hg`` directory. On most
>> machines, this
>> +should complete almost instantaneously and the chances of a consumer
>> being
>> +unable to access the repository should be low.
>>
>
> (I would also delay that doc until we actually behave that way)
> I know it would bot solve "everything" but we discussed the option of
> having a "repositorybeingupgraded" (or similar) in .hg/required to reduce
> the change for a reader to access the repository at that time.
>

I added all this code and documentation to foreshadow the feature so a
reviewer isn't guessing where things are going. It also cuts down on the
amount of patches and/or churn in future patches, making code review
easier, IMO. I think it is busy work to change it at this stage. But if you
insist, I will.


>
> +"""
>> +raise error.Abort(_('not yet implemented'))
>> diff --git a/tests/test-completion.t b/tests/test-completion.t
>> --- a/tests/test-completion.t
>> +++ b/tests/test-completion.t
>> @@ -109,6 +109,7 @@ Show debug commands if there are no othe
>>debugsub
>>debugsuccessorssets
>>debugtemplate
>> +  debugupgraderepo
>>debugwalk
>>debugwireargs
>>
>> @@ -274,6 +275,7 @@ Show all commands + options
>>debugsub: rev
>>debugsuccessorssets:
>>debugtemplate: rev, define
>> +  debugupgraderepo: optimize, run
>>debugwalk: include, exclude
>>debugwireargs: three, four, five, ssh, remotecmd, insecure
>>files: rev, print0, include, exclude, template, subrepos
>> diff --git a/tests/test-help.t b/tests/test-help.t
>> --- a/tests/test-help.t
>> +++ b/tests/test-help.t
>> @@ 

Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method

2016-12-12 Thread Gregory Szorc
On Mon, Dec 12, 2016 at 2:04 AM, Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:

>
>
> On 11/25/2016 04:28 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc 
>> # Date 1480031508 28800
>> #  Thu Nov 24 15:51:48 2016 -0800
>> # Node ID c60995fce14b0e34bd1416dab3712a6c33bb29bb
>> # Parent  906a7d8e969552536fffe0df7a5e63bf5d79b34b
>> revlog: add clone method
>>
>> Upcoming patches will introduce functionality for in-place
>> repository/store "upgrades." Copying the contents of a revlog
>> feels sufficiently low-level to warrant being in the revlog
>> class. So this commit implements that functionality.
>>
>> Because full delta recomputation can be *very* expensive (we're
>> talking several hours on the Firefox repository), we support
>> multiple modes of execution with regards to delta (re)use. This
>> will allow repository upgrades to choose the "level" of
>> processing/optimization they wish to perform when converting
>> revlogs.
>>
>> It's not obvious from this commit, but "addrevisioncb" will be
>> used for progress reporting.
>>
>> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
>> --- a/mercurial/revlog.py
>> +++ b/mercurial/revlog.py
>> @@ -1818,3 +1818,95 @@ class revlog(object):
>>  if not self._inline:
>>  res.append(self.datafile)
>>  return res
>> +
>> +DELTAREUSEALWAYS = 'always'
>> +DELTAREUSESAMEREVS = 'samerevs'
>> +DELTAREUSENEVER = 'never'
>>
>
> We don't use cap in our names, even for constant, these should be lower
> case.
>

Line 50 of revlog.py disagrees with you and I dislike not using UPPERCASE
for constants in the standard. But I'll change these to lowercase.


>
> I personnaly find "delta reuse" a bit obscure and would probably go for
> "rediff" or something like that instead. I'm fine with keeping "delta
> reuse" is you are not convinced.
>

"diff" is kinda overloaded and this functionality is related to *delta*
chains, so I think "delta" in the name is more appropriate.


>
> On a broader topic, we should probably get a config option for that. This
> is the kind of behavior we might what to be able to control on the
> client/server level. We want a config section dedicated to these (format
> does not seems appropriate, and "server" (used for validate) is not really
> really accurate either)
>
> (Yes, this is outside of the scope of this patch but a good spot to
> mention it)
>

Yes, that could be useful. But it is out of scope.


>
> +def clone(self, tr, destrevlog, addrevisioncb=None,
>> +  deltareuse=DELTAREUSESAMEREVS, aggressivemergedeltas=None):
>> +"""Copy this revlog to another, possibly with format changes.
>> +
>> +The destination revlog will contain the same revisions and nodes.
>> +However, it may not be bit-for-bit identical due to e.g. delta
>> encoding
>> +differences.
>>
>
> I'm curious about why we pass a destrevlog here instead of creating a new
> one. Is this because the creation of new revlog is going to be deferred to
> a "cloned stored" or something? Then why picking old.clone(new) direction
> instead of new.fromold(old) one?
>

It was an arbitrary choice.


>
> +The ``deltareuse`` argument control how deltas from the existing
>> revlog
>> +are preserved in the destination revlog. The argument can have
>> the
>> +following values:
>> +
>> +DELTAREUSEALWAYS
>> +   Deltas will always be reused (if possible), even if the
>> destination
>> +   revlog would not select the same revisions for the delta.
>> This is the
>> +   fastest mode of operation.
>> +DELTAREUSESAMEREVS
>> +   Deltas will be reused if the destination revlog would pick
>> the same
>> +   revisions for the delta. This mode strikes a balance between
>> speed
>> +   and optimization.
>> +DELTAREUSENEVER
>> +   Deltas will never be reused. This is the slowest mode of
>> execution.
>>
>
> Maybe mention that we are improving the diff algorithm and therefor it is
> useful to rediff sometime ?
>
> +Delta computation can be slow, so the choice of delta reuse
>> policy can
>> +significantly affect run time.
>> +
>> +The default policy (``DELTAREUSESAMEREVS``) strikes a balance
>> between
>> +two extremes. Deltas will be reused if they are appropriate. But
>> if the
>> +delta could choose a better revision, it will do so. This means
>> if you
>> +are converting a non-generaldelta revlog to a generaldelta
>> revlog,
>> +deltas will be recomputed if the delta's parent isn't a parent
>> of the
>> +revision.
>> +
>> +In addition to the delta policy, the ``aggressivemergedeltas``
>> argument
>> +controls whether to compute deltas against both parents for
>> merges.
>> +By default, the current default is used.
>>
>
> This is unrelated to the current patch, but 

Re: [PATCH] error: make it clear that ProgrammingError is for mercurial developers

2016-12-12 Thread Pierre-Yves David



On 12/12/2016 09:03 AM, Jun Wu wrote:

# HG changeset patch
# User Jun Wu 
# Date 1481529712 0
#  Mon Dec 12 08:01:52 2016 +
# Node ID 5efb46608ea75d45f1f4a73c33983c61177134b9
# Parent  3cec78a0556e35becc138801917b1d11a612ce60
# Available At https://bitbucket.org/quark-zju/hg-draft
#  hg pull https://bitbucket.org/quark-zju/hg-draft -r 5efb46608ea7
error: make it clear that ProgrammingError is for mercurial developers

The word "developer" could refer to users - people using hg are likely to be
developers. Add adjectives to make it refer to mercurial developers only.


That one is pushed, thanks. There is probably a small cohort of 'assert' 
waiting to be converted into ProgrammingError.



diff --git a/mercurial/error.py b/mercurial/error.py
--- a/mercurial/error.py
+++ b/mercurial/error.py
@@ -170,5 +170,5 @@ class PushRaced(RuntimeError):

 class ProgrammingError(RuntimeError):
-"""Raised if a developer has made some mistake"""
+"""Raised if a mercurial (core or extension) developer made a mistake"""

 # bundle2 related errors


--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[Bug 5443] New: Something broke after upgrading Ubuntu

2016-12-12 Thread bugzilla
https://bz.mercurial-scm.org/show_bug.cgi?id=5443

Bug ID: 5443
   Summary: Something broke after upgrading Ubuntu
   Product: Mercurial
   Version: unspecified
  Hardware: PC
OS: Linux
Status: UNCONFIRMED
  Severity: bug
  Priority: wish
 Component: Mercurial
  Assignee: bugzi...@selenic.com
  Reporter: teo8...@gmail.com
CC: mercurial-de...@selenic.com

I had Mercurial installed and working on Ubuntu 15.10.

After upgrading to Ubuntu 16.04, creating a branch in a repo produces this
error:

  $ hg branch test
  *** failed to import extension mercurial_keyring: No module named builtins
  marked working directory as branch test

Whatever this means, this would not happen before upgrading Ubuntu.

The branch is created nonetheless.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Constant naming convention (was: Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method)

2016-12-12 Thread Augie Fackler
On Mon, Dec 12, 2016 at 11:04:07AM +0100, Pierre-Yves David wrote:
>
>
> On 11/25/2016 04:28 AM, Gregory Szorc wrote:
> >diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> >--- a/mercurial/revlog.py
> >+++ b/mercurial/revlog.py
> >@@ -1818,3 +1818,95 @@ class revlog(object):
> > if not self._inline:
> > res.append(self.datafile)
> > return res
> >+
> >+DELTAREUSEALWAYS = 'always'
> >+DELTAREUSESAMEREVS = 'samerevs'
> >+DELTAREUSENEVER = 'never'
>
> We don't use cap in our names, even for constant, these should be lower
> case.

Can we just align with prevailing Python style here and start using
all caps for constants? This comes up roughly monthly, and I think
it'd be a good clarity win.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 5 v4] revlog: add flagprocessor

2016-12-12 Thread Pierre-Yves David



On 12/12/2016 11:00 AM, Rémi Chaintron wrote:



On Sat, 3 Dec 2016 at 02:49 Pierre-Yves David
>
wrote:

On 11/29/2016 06:16 PM, Rémi Chaintron wrote:
> Thanks for the awesome review.
>
> > +
> > +def register(self, transformmap):
> > +"""Register transforms to be applied for flags.
> > +
> > +``transformmap`` - a map of flag to transform
> > +"""
> > +for flag in transformmap:
> > +if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> > +self.transformmap[flag] = transformmap[flag]
>
> * unknown flag seems to be silently ignored. It would be
better to just
> fail here.
>
> * we have a mechanism to load extra elements (like flag
processors are).
> Have a look at 'mercurial.dispatch.extraloaders' (or use a
registrar
> API, but that one is probably overkill here)
>
> * we do not handle collision at all, if multiple extensions try to
> register a process for the same flag, we should not let the
last one
> just overwrite the other one. (Abort is the simplest handling).
>
>
> As part of moving the flagprocessor out, I now have a single
> 'processflags(self, text, flags, reverse=True)' method in revlog,
which
> I updated to take care of unknown flags (which was previously done in
> 'revlog.revision()').
>
> Collisions were explicitly not handled, as we were until now
relying on
> a flag per extension, but this is something I'm happy to discuss.

We cannot exclude the change for a collision, especially because
multiple extension could use the same flag without having registered it
with Mercurial ever. In addition, as discussed in Patch4 it would make
sense to have multiple extensions implement different variant of the
same flag/behavior.


If I understand correctly, what you would like to see is moving towards
a generic flag (for example REVIDX_ISSTOREDEXTERNALLY), that several
extensions can register transforms for.

* My assumption is that we want only one extension to be able to
register transforms on a specific flag. (cf. last point on challenging
this).

* An extension that registered a transform needs to be able to change
the transform it registered in order to cover all workflows. Some code
paths require both a read AND write transforms (e.g 'commit' calls
revision() followed by addrevision() and these need different transforms).

* Notwithstanding the fact any extension could force an update of the
transforms dictionary anyway, I'm not sure how to prevent collisions
here. Wouldn't it make sense to let users be responsible for which
extensions they enable?

* If not, I'm a bit unsure as how to enable extension A to register its
transforms while preventing extension B from doing so. I can think of
ways of how to detect genuine errors (i.e a developer enabling two of
these extensions by mistake) but I'm not sure how to keep the
whitelisting of a single extension going as from the point of view of
the  flag transforms dictionary, extensions are anonymous. Note that
this would cover only the aspect of genuine mistakes, not intentional
tempering with the dictionary.

>

[…]

>

Could you elaborate on what you'd like to see here?


(small initial note: we talk a lot about extension, but some flag will 
end up handled by core too (some of censors for example)).


If you provide a small function in charge of registering a new flag 
processors, you can add logic to thing function to make sure there is 
not an existing processors that we are about to silently overwrite.
That function could be a decorator. You can find example of this here: 
https://www.mercurial-scm.org/repo/hg/file/tip/mercurial/exchange.py#l697


Otherwise, you cannot really rely on the extension coder to do the right 
thing, if the obvious API to add a transformer is to directly add it to 
the dictionary you can expect them to not check for an existing one 
before adding their own.


Of course they could still decide to ignore the official API and 
directly temper with the dictionary and all other kind of bad things, 
but providing an official API to register something should be a good 
enough effort to catch most of it.


In the same way, I would really no rely on actual user knowing what they 
doing regarding extension. They will enable incompatible extension 
without checking.


>
> * We could also enable several extensions to register different
> transforms on the same flag (which, ultimately we might have to in order
> to avoid running out of flag bits) but that would require
> "de-anonymizing" extensions from the point of view of
> REVIDX_FLAGS_TRANSFORMS and keeping an ordering of such extensions to
> ensure non-commutative operations are handled gracefully (relying on a
> 

Re: [PATCH 2 of 7 upgraderepo V2] debugcommands: stub for debugupgraderepo command

2016-12-12 Thread Pierre-Yves David



On 11/25/2016 04:28 AM, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc 
# Date 1480033449 28800
#  Thu Nov 24 16:24:09 2016 -0800
# Node ID 4fffbdc2bca22e666482b55562c0e592c8a68027
# Parent  c60995fce14b0e34bd1416dab3712a6c33bb29bb
debugcommands: stub for debugupgraderepo command

Currently, if Mercurial introduces a new repository/store feature or
changes behavior of an existing feature, users must perform an
`hg clone` to create a new repository with hopefully the
correct/optimal settings. Unfortunately, even `hg clone` may not
give the correct results. For example, if you do a local `hg clone`,
you may get hardlinks to revlog files that inherit the old state.
If you `hg clone` from a remote or `hg clone --pull`, changegroup
application may bypass some optimization, such as converting to
generaldelta.

Optimizing a repository is harder than it seems and requires more
than a simple `hg` command invocation.

This commit starts the process of changing that. We introduce
`hg debugupgraderepo`, a command that performs an in-place upgrade
of a repository to use new, optimal features. The command is just
a stub right now. Features will be added in subsequent commits.


I like the idea of this commit, this should probably be the first one in 
the stack. The revlog cloning feature can be delayed later in the stack 
until we actually start performing upgrade.



This commit does foreshadow some of the behavior of the new command,
notably that it doesn't do anything by default and that it takes
arguments that influence what actions it performs. These will be
explained more in subsequent commits.

diff --git a/mercurial/debugcommands.py b/mercurial/debugcommands.py
--- a/mercurial/debugcommands.py
+++ b/mercurial/debugcommands.py
@@ -849,3 +849,28 @@ def debugdeltachain(ui, repo, file_=None
  extradist=extradist, extraratio=extraratio)

 fm.end()
+
+@command('debugupgraderepo', [
+('o', 'optimize', [], _('extra optimization to perform'), _('NAME')),
+('', 'run', False, _('performs an upgrade')),


We should drop these option from the command until we actually use them 
this will allow to discuss them with their actual behavior.



+])
+def debugupgraderepo(ui, repo, run=False, optimize=None):
+"""upgrade a repository to use different features
+
+If no arguments are specified, the repository is evaluated for upgrade
+and a list of problems and potential optimizations is printed.
+
+With ``--run``, a repository upgrade is performed. Behavior of the upgrade
+can be influenced via additional arguments. More details will be provided
+by the command output when run without ``--run``.


(I would also delayed the help about --run until we actually run something)


+
+During the upgrade, the repository will be locked and no writes will be
+allowed.
+
+At the end of the upgrade, the repository may not be readable while new
+repository data is swapped in. This window will be as long as it takes to
+rename some directories inside the ``.hg`` directory. On most machines, 
this
+should complete almost instantaneously and the chances of a consumer being
+unable to access the repository should be low.


(I would also delay that doc until we actually behave that way)
I know it would bot solve "everything" but we discussed the option of 
having a "repositorybeingupgraded" (or similar) in .hg/required to 
reduce the change for a reader to access the repository at that time.



+"""
+raise error.Abort(_('not yet implemented'))
diff --git a/tests/test-completion.t b/tests/test-completion.t
--- a/tests/test-completion.t
+++ b/tests/test-completion.t
@@ -109,6 +109,7 @@ Show debug commands if there are no othe
   debugsub
   debugsuccessorssets
   debugtemplate
+  debugupgraderepo
   debugwalk
   debugwireargs

@@ -274,6 +275,7 @@ Show all commands + options
   debugsub: rev
   debugsuccessorssets:
   debugtemplate: rev, define
+  debugupgraderepo: optimize, run
   debugwalk: include, exclude
   debugwireargs: three, four, five, ssh, remotecmd, insecure
   files: rev, print0, include, exclude, template, subrepos
diff --git a/tests/test-help.t b/tests/test-help.t
--- a/tests/test-help.t
+++ b/tests/test-help.t
@@ -916,6 +916,8 @@ Test list of internal help commands
  show set of successors for revision
debugtemplate
  parse and apply a template
+   debugupgraderepo
+ upgrade a repository to use different features
debugwalk show how files match on given patterns
debugwireargs
  (no help text available)
diff --git a/tests/test-upgrade-repo.t b/tests/test-upgrade-repo.t
new file mode 100644
--- /dev/null
+++ b/tests/test-upgrade-repo.t
@@ -0,0 +1,5 @@
+  $ hg init empty
+  $ cd empty
+  $ hg debugupgraderepo
+  abort: not yet implemented
+  [255]
___
Mercurial-devel mailing list

Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method

2016-12-12 Thread Pierre-Yves David



On 11/25/2016 04:28 AM, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc 
# Date 1480031508 28800
#  Thu Nov 24 15:51:48 2016 -0800
# Node ID c60995fce14b0e34bd1416dab3712a6c33bb29bb
# Parent  906a7d8e969552536fffe0df7a5e63bf5d79b34b
revlog: add clone method

Upcoming patches will introduce functionality for in-place
repository/store "upgrades." Copying the contents of a revlog
feels sufficiently low-level to warrant being in the revlog
class. So this commit implements that functionality.

Because full delta recomputation can be *very* expensive (we're
talking several hours on the Firefox repository), we support
multiple modes of execution with regards to delta (re)use. This
will allow repository upgrades to choose the "level" of
processing/optimization they wish to perform when converting
revlogs.

It's not obvious from this commit, but "addrevisioncb" will be
used for progress reporting.

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1818,3 +1818,95 @@ class revlog(object):
 if not self._inline:
 res.append(self.datafile)
 return res
+
+DELTAREUSEALWAYS = 'always'
+DELTAREUSESAMEREVS = 'samerevs'
+DELTAREUSENEVER = 'never'


We don't use cap in our names, even for constant, these should be lower 
case.


I personnaly find "delta reuse" a bit obscure and would probably go for 
"rediff" or something like that instead. I'm fine with keeping "delta 
reuse" is you are not convinced.


On a broader topic, we should probably get a config option for that. 
This is the kind of behavior we might what to be able to control on the 
client/server level. We want a config section dedicated to these (format 
does not seems appropriate, and "server" (used for validate) is not 
really really accurate either)


(Yes, this is outside of the scope of this patch but a good spot to 
mention it)



+def clone(self, tr, destrevlog, addrevisioncb=None,
+  deltareuse=DELTAREUSESAMEREVS, aggressivemergedeltas=None):
+"""Copy this revlog to another, possibly with format changes.
+
+The destination revlog will contain the same revisions and nodes.
+However, it may not be bit-for-bit identical due to e.g. delta encoding
+differences.


I'm curious about why we pass a destrevlog here instead of creating a 
new one. Is this because the creation of new revlog is going to be 
deferred to a "cloned stored" or something? Then why picking 
old.clone(new) direction instead of new.fromold(old) one?



+The ``deltareuse`` argument control how deltas from the existing revlog
+are preserved in the destination revlog. The argument can have the
+following values:
+
+DELTAREUSEALWAYS
+   Deltas will always be reused (if possible), even if the destination
+   revlog would not select the same revisions for the delta. This is 
the
+   fastest mode of operation.
+DELTAREUSESAMEREVS
+   Deltas will be reused if the destination revlog would pick the same
+   revisions for the delta. This mode strikes a balance between speed
+   and optimization.
+DELTAREUSENEVER
+   Deltas will never be reused. This is the slowest mode of execution.


Maybe mention that we are improving the diff algorithm and therefor it 
is useful to rediff sometime ?



+Delta computation can be slow, so the choice of delta reuse policy can
+significantly affect run time.
+
+The default policy (``DELTAREUSESAMEREVS``) strikes a balance between
+two extremes. Deltas will be reused if they are appropriate. But if the
+delta could choose a better revision, it will do so. This means if you
+are converting a non-generaldelta revlog to a generaldelta revlog,
+deltas will be recomputed if the delta's parent isn't a parent of the
+revision.
+
+In addition to the delta policy, the ``aggressivemergedeltas`` argument
+controls whether to compute deltas against both parents for merges.
+By default, the current default is used.


This is unrelated to the current patch, but "aggressivemergedeltas" is a 
very obscure name for "consider both parent for delta". In addition the 
"official" option seems to live in the "format" config section while 
this is not a format option. So if we could eventually clean that up at 
some point that would be great.



+"""
+if deltareuse not in (self.DELTAREUSEALWAYS, self.DELTAREUSESAMEREVS,
+self.DELTAREUSENEVER):


We should get a set of valid flag on the class itself, that would make 
thing simpler:


  if deltause not in self._validdeltastrategies


+raise ValueError(_('value for deltareuse invalid: %s') % 
deltareuse)
+
+if len(destrevlog):
+raise ValueError(_('destination revlog is not empty'))
+
+# 

Re: [PATCH 2 of 5 v4] revlog: add flagprocessor

2016-12-12 Thread Rémi Chaintron
On Sat, 3 Dec 2016 at 02:49 Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:

> On 11/29/2016 06:16 PM, Rémi Chaintron wrote:
> > Thanks for the awesome review.
> >
> > > +
> > > +def register(self, transformmap):
> > > +"""Register transforms to be applied for flags.
> > > +
> > > +``transformmap`` - a map of flag to transform
> > > +"""
> > > +for flag in transformmap:
> > > +if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> > > +self.transformmap[flag] = transformmap[flag]
> >
> > * unknown flag seems to be silently ignored. It would be better to
> just
> > fail here.
> >
> > * we have a mechanism to load extra elements (like flag processors
> are).
> > Have a look at 'mercurial.dispatch.extraloaders' (or use a registrar
> > API, but that one is probably overkill here)
> >
> > * we do not handle collision at all, if multiple extensions try to
> > register a process for the same flag, we should not let the last one
> > just overwrite the other one. (Abort is the simplest handling).
> >
> >
> > As part of moving the flagprocessor out, I now have a single
> > 'processflags(self, text, flags, reverse=True)' method in revlog, which
> > I updated to take care of unknown flags (which was previously done in
> > 'revlog.revision()').
> >
> > Collisions were explicitly not handled, as we were until now relying on
> > a flag per extension, but this is something I'm happy to discuss.
>
> We cannot exclude the change for a collision, especially because
> multiple extension could use the same flag without having registered it
> with Mercurial ever. In addition, as discussed in Patch4 it would make
> sense to have multiple extensions implement different variant of the
> same flag/behavior.
>

If I understand correctly, what you would like to see is moving towards a
generic flag (for example REVIDX_ISSTOREDEXTERNALLY), that several
extensions can register transforms for.

* My assumption is that we want only one extension to be able to register
transforms on a specific flag. (cf. last point on challenging this).

* An extension that registered a transform needs to be able to change the
transform it registered in order to cover all workflows. Some code paths
require both a read AND write transforms (e.g 'commit' calls revision()
followed by addrevision() and these need different transforms).

* Notwithstanding the fact any extension could force an update of the
transforms dictionary anyway, I'm not sure how to prevent collisions here.
Wouldn't it make sense to let users be responsible for which extensions
they enable?

* If not, I'm a bit unsure as how to enable extension A to register its
transforms while preventing extension B from doing so. I can think of ways
of how to detect genuine errors (i.e a developer enabling two of these
extensions by mistake) but I'm not sure how to keep the whitelisting of a
single extension going as from the point of view of the  flag transforms
dictionary, extensions are anonymous. Note that this would cover only the
aspect of genuine mistakes, not intentional tempering with the dictionary.

* We could also enable several extensions to register different transforms
on the same flag (which, ultimately we might have to in order to avoid
running out of flag bits) but that would require "de-anonymizing"
extensions from the point of view of REVIDX_FLAGS_TRANSFORMS and keeping an
ordering of such extensions to ensure non-commutative operations are
handled gracefully (relying on a double ordering of which flags handle
first and which extensions apply in which order for each flag).

Could you elaborate on what you'd like to see here?




> > The current updated version I'm working on simply relies on the
> > following constant:
> > REVIDX_FLAGS_TRANSFORMS = collections.OrderedDict({
> > REVIDX_ISCENSORED: None,
> > REVIDX_ISLFS: None,
> > })
> > This takes care of both the ordering and a way to register transforms,
> > but this enforces using only one transform per flag.
>
> Not that in the example you provide, You are feeding a dict to
> OrderedDict, so the ordering is lost before reaching the OrderedDict.
>
> In addition, using an ordered dict makes is hard to third party
> extension to add and use any non-declared flag.
>
> > > +
> > > +def unregister(self, transformmap):
> > > +"""Unregister transforms for flags."""
> > > +for flag in transformmap:
> > > +if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> > > +self.transformmap[flag] = None
> >
> > What is the usecase for unregistering? Since we do not call the
> process
> > if the flag is not set, I can't think of a case were we need to
> > unregister. Am I missing something?
> >
> >
> > I will get rid of this, as we're moving away from non-flag based
> > transforms (cf. below).
> >
> >
> > > +def 

[Bug 5442] New: Leap seconds not handled properly in commit time

2016-12-12 Thread bugzilla
https://bz.mercurial-scm.org/show_bug.cgi?id=5442

Bug ID: 5442
   Summary: Leap seconds not handled properly in commit time
   Product: Mercurial
   Version: 4.0.1
  Hardware: PC
OS: Linux
Status: UNCONFIRMED
  Severity: bug
  Priority: wish
 Component: Mercurial
  Assignee: bugzi...@selenic.com
  Reporter: f...@tuttu.info
CC: mercurial-de...@selenic.com

Hello,

The last minute of 2016 will have 61 seconds, this means that I should be able
to commit at that time: 

hg commit --date  '2016-12-31 23:59:60'

Actually, I can, but the date is wrong by 1 second:

changeset:   ...
tag: ...
user: ...
date:Sun Jan 01 00:00:00 2017 +0100
summary: ...


I should not be able to commit the day before at 23:59:60 :

hg commit --date  '2016-12-30 23:59:60' -> should yield an error !

changeset:   ...
tag: ...
user:...
date:Sat Dec 31 00:00:00 2016 +0100
summary: ...


Cheers,

Feth

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH] error: make it clear that ProgrammingError is for mercurial developers

2016-12-12 Thread Jun Wu
# HG changeset patch
# User Jun Wu 
# Date 1481529712 0
#  Mon Dec 12 08:01:52 2016 +
# Node ID 5efb46608ea75d45f1f4a73c33983c61177134b9
# Parent  3cec78a0556e35becc138801917b1d11a612ce60
# Available At https://bitbucket.org/quark-zju/hg-draft
#  hg pull https://bitbucket.org/quark-zju/hg-draft -r 5efb46608ea7
error: make it clear that ProgrammingError is for mercurial developers

The word "developer" could refer to users - people using hg are likely to be
developers. Add adjectives to make it refer to mercurial developers only.

diff --git a/mercurial/error.py b/mercurial/error.py
--- a/mercurial/error.py
+++ b/mercurial/error.py
@@ -170,5 +170,5 @@ class PushRaced(RuntimeError):
 
 class ProgrammingError(RuntimeError):
-"""Raised if a developer has made some mistake"""
+"""Raised if a mercurial (core or extension) developer made a mistake"""
 
 # bundle2 related errors
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel