Re: Constant naming convention (was: Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method)

2016-12-24 Thread Gregory Szorc
On Mon, Dec 12, 2016 at 8:12 AM, Augie Fackler  wrote:

> On Mon, Dec 12, 2016 at 11:04:07AM +0100, Pierre-Yves David wrote:
> >
> >
> > On 11/25/2016 04:28 AM, Gregory Szorc wrote:
> > >diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > >--- a/mercurial/revlog.py
> > >+++ b/mercurial/revlog.py
> > >@@ -1818,3 +1818,95 @@ class revlog(object):
> > > if not self._inline:
> > > res.append(self.datafile)
> > > return res
> > >+
> > >+DELTAREUSEALWAYS = 'always'
> > >+DELTAREUSESAMEREVS = 'samerevs'
> > >+DELTAREUSENEVER = 'never'
> >
> > We don't use cap in our names, even for constant, these should be lower
> > case.
>
> Can we just align with prevailing Python style here and start using
> all caps for constants? This comes up roughly monthly, and I think
> it'd be a good clarity win.
>

+1

I like trending towards consistency with the majority of the Python
community. That will make Mercurial hacking and contributing more
approachable to newcomers, which is good for The Project.

Our use of casing for constants is inconsistent today. And, there is no
check-code rule enforcing one way or the other. So in my mind that means we
can do whatever we want. One could make the argument that since Python has
no concept of constants, there's no way to tell whether a variable
assignment is a constant or not and that's why there is no check-code rule.
This is true. However, in my mind this is justification for distinguishing
between variables used as constants with a different naming scheme: the
UPPERCASING says "use this variable as a constant" - it gives a meaningful
signal to code authors and reviewers as to how a variable should be used.

FWIW, I also strongly disagree with our style convention that use lowercase
words without underscores for everything. IMO that makes things much harder
to read than they should be (especially to non-English native speakers).
But that rule is encoded in check-style and I'll fight on that hill another
day :)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Constant naming convention (was: Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method)

2016-12-13 Thread Yuya Nishihara
On Mon, 12 Dec 2016 10:12:48 -0500, Augie Fackler wrote:
> On Mon, Dec 12, 2016 at 11:04:07AM +0100, Pierre-Yves David wrote:
> >
> >
> > On 11/25/2016 04:28 AM, Gregory Szorc wrote:
> > >diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > >--- a/mercurial/revlog.py
> > >+++ b/mercurial/revlog.py
> > >@@ -1818,3 +1818,95 @@ class revlog(object):
> > > if not self._inline:
> > > res.append(self.datafile)
> > > return res
> > >+
> > >+DELTAREUSEALWAYS = 'always'
> > >+DELTAREUSESAMEREVS = 'samerevs'
> > >+DELTAREUSENEVER = 'never'
> >
> > We don't use cap in our names, even for constant, these should be lower
> > case.
> 
> Can we just align with prevailing Python style here and start using
> all caps for constants? This comes up roughly monthly, and I think
> it'd be a good clarity win.

I'm fine with ALLCAPS constants.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method

2016-12-12 Thread Gregory Szorc
On Mon, Dec 12, 2016 at 2:04 AM, Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:

>
>
> On 11/25/2016 04:28 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc 
>> # Date 1480031508 28800
>> #  Thu Nov 24 15:51:48 2016 -0800
>> # Node ID c60995fce14b0e34bd1416dab3712a6c33bb29bb
>> # Parent  906a7d8e969552536fffe0df7a5e63bf5d79b34b
>> revlog: add clone method
>>
>> Upcoming patches will introduce functionality for in-place
>> repository/store "upgrades." Copying the contents of a revlog
>> feels sufficiently low-level to warrant being in the revlog
>> class. So this commit implements that functionality.
>>
>> Because full delta recomputation can be *very* expensive (we're
>> talking several hours on the Firefox repository), we support
>> multiple modes of execution with regards to delta (re)use. This
>> will allow repository upgrades to choose the "level" of
>> processing/optimization they wish to perform when converting
>> revlogs.
>>
>> It's not obvious from this commit, but "addrevisioncb" will be
>> used for progress reporting.
>>
>> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
>> --- a/mercurial/revlog.py
>> +++ b/mercurial/revlog.py
>> @@ -1818,3 +1818,95 @@ class revlog(object):
>>  if not self._inline:
>>  res.append(self.datafile)
>>  return res
>> +
>> +DELTAREUSEALWAYS = 'always'
>> +DELTAREUSESAMEREVS = 'samerevs'
>> +DELTAREUSENEVER = 'never'
>>
>
> We don't use cap in our names, even for constant, these should be lower
> case.
>

Line 50 of revlog.py disagrees with you and I dislike not using UPPERCASE
for constants in the standard. But I'll change these to lowercase.


>
> I personnaly find "delta reuse" a bit obscure and would probably go for
> "rediff" or something like that instead. I'm fine with keeping "delta
> reuse" is you are not convinced.
>

"diff" is kinda overloaded and this functionality is related to *delta*
chains, so I think "delta" in the name is more appropriate.


>
> On a broader topic, we should probably get a config option for that. This
> is the kind of behavior we might what to be able to control on the
> client/server level. We want a config section dedicated to these (format
> does not seems appropriate, and "server" (used for validate) is not really
> really accurate either)
>
> (Yes, this is outside of the scope of this patch but a good spot to
> mention it)
>

Yes, that could be useful. But it is out of scope.


>
> +def clone(self, tr, destrevlog, addrevisioncb=None,
>> +  deltareuse=DELTAREUSESAMEREVS, aggressivemergedeltas=None):
>> +"""Copy this revlog to another, possibly with format changes.
>> +
>> +The destination revlog will contain the same revisions and nodes.
>> +However, it may not be bit-for-bit identical due to e.g. delta
>> encoding
>> +differences.
>>
>
> I'm curious about why we pass a destrevlog here instead of creating a new
> one. Is this because the creation of new revlog is going to be deferred to
> a "cloned stored" or something? Then why picking old.clone(new) direction
> instead of new.fromold(old) one?
>

It was an arbitrary choice.


>
> +The ``deltareuse`` argument control how deltas from the existing
>> revlog
>> +are preserved in the destination revlog. The argument can have
>> the
>> +following values:
>> +
>> +DELTAREUSEALWAYS
>> +   Deltas will always be reused (if possible), even if the
>> destination
>> +   revlog would not select the same revisions for the delta.
>> This is the
>> +   fastest mode of operation.
>> +DELTAREUSESAMEREVS
>> +   Deltas will be reused if the destination revlog would pick
>> the same
>> +   revisions for the delta. This mode strikes a balance between
>> speed
>> +   and optimization.
>> +DELTAREUSENEVER
>> +   Deltas will never be reused. This is the slowest mode of
>> execution.
>>
>
> Maybe mention that we are improving the diff algorithm and therefor it is
> useful to rediff sometime ?
>
> +Delta computation can be slow, so the choice of delta reuse
>> policy can
>> +significantly affect run time.
>> +
>> +The default policy (``DELTAREUSESAMEREVS``) strikes a balance
>> between
>> +two extremes. Deltas will be reused if they are appropriate. But
>> if the
>> +delta could choose a better revision, it will do so. This means
>> if you
>> +are converting a non-generaldelta revlog to a generaldelta
>> revlog,
>> +deltas will be recomputed if the delta's parent isn't a parent
>> of the
>> +revision.
>> +
>> +In addition to the delta policy, the ``aggressivemergedeltas``
>> argument
>> +controls whether to compute deltas against both parents for
>> merges.
>> +By default, the current default is used.
>>
>
> This is unrelated to the current patch, but 

Constant naming convention (was: Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method)

2016-12-12 Thread Augie Fackler
On Mon, Dec 12, 2016 at 11:04:07AM +0100, Pierre-Yves David wrote:
>
>
> On 11/25/2016 04:28 AM, Gregory Szorc wrote:
> >diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> >--- a/mercurial/revlog.py
> >+++ b/mercurial/revlog.py
> >@@ -1818,3 +1818,95 @@ class revlog(object):
> > if not self._inline:
> > res.append(self.datafile)
> > return res
> >+
> >+DELTAREUSEALWAYS = 'always'
> >+DELTAREUSESAMEREVS = 'samerevs'
> >+DELTAREUSENEVER = 'never'
>
> We don't use cap in our names, even for constant, these should be lower
> case.

Can we just align with prevailing Python style here and start using
all caps for constants? This comes up roughly monthly, and I think
it'd be a good clarity win.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method

2016-12-12 Thread Pierre-Yves David



On 11/25/2016 04:28 AM, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc 
# Date 1480031508 28800
#  Thu Nov 24 15:51:48 2016 -0800
# Node ID c60995fce14b0e34bd1416dab3712a6c33bb29bb
# Parent  906a7d8e969552536fffe0df7a5e63bf5d79b34b
revlog: add clone method

Upcoming patches will introduce functionality for in-place
repository/store "upgrades." Copying the contents of a revlog
feels sufficiently low-level to warrant being in the revlog
class. So this commit implements that functionality.

Because full delta recomputation can be *very* expensive (we're
talking several hours on the Firefox repository), we support
multiple modes of execution with regards to delta (re)use. This
will allow repository upgrades to choose the "level" of
processing/optimization they wish to perform when converting
revlogs.

It's not obvious from this commit, but "addrevisioncb" will be
used for progress reporting.

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1818,3 +1818,95 @@ class revlog(object):
 if not self._inline:
 res.append(self.datafile)
 return res
+
+DELTAREUSEALWAYS = 'always'
+DELTAREUSESAMEREVS = 'samerevs'
+DELTAREUSENEVER = 'never'


We don't use cap in our names, even for constant, these should be lower 
case.


I personnaly find "delta reuse" a bit obscure and would probably go for 
"rediff" or something like that instead. I'm fine with keeping "delta 
reuse" is you are not convinced.


On a broader topic, we should probably get a config option for that. 
This is the kind of behavior we might what to be able to control on the 
client/server level. We want a config section dedicated to these (format 
does not seems appropriate, and "server" (used for validate) is not 
really really accurate either)


(Yes, this is outside of the scope of this patch but a good spot to 
mention it)



+def clone(self, tr, destrevlog, addrevisioncb=None,
+  deltareuse=DELTAREUSESAMEREVS, aggressivemergedeltas=None):
+"""Copy this revlog to another, possibly with format changes.
+
+The destination revlog will contain the same revisions and nodes.
+However, it may not be bit-for-bit identical due to e.g. delta encoding
+differences.


I'm curious about why we pass a destrevlog here instead of creating a 
new one. Is this because the creation of new revlog is going to be 
deferred to a "cloned stored" or something? Then why picking 
old.clone(new) direction instead of new.fromold(old) one?



+The ``deltareuse`` argument control how deltas from the existing revlog
+are preserved in the destination revlog. The argument can have the
+following values:
+
+DELTAREUSEALWAYS
+   Deltas will always be reused (if possible), even if the destination
+   revlog would not select the same revisions for the delta. This is 
the
+   fastest mode of operation.
+DELTAREUSESAMEREVS
+   Deltas will be reused if the destination revlog would pick the same
+   revisions for the delta. This mode strikes a balance between speed
+   and optimization.
+DELTAREUSENEVER
+   Deltas will never be reused. This is the slowest mode of execution.


Maybe mention that we are improving the diff algorithm and therefor it 
is useful to rediff sometime ?



+Delta computation can be slow, so the choice of delta reuse policy can
+significantly affect run time.
+
+The default policy (``DELTAREUSESAMEREVS``) strikes a balance between
+two extremes. Deltas will be reused if they are appropriate. But if the
+delta could choose a better revision, it will do so. This means if you
+are converting a non-generaldelta revlog to a generaldelta revlog,
+deltas will be recomputed if the delta's parent isn't a parent of the
+revision.
+
+In addition to the delta policy, the ``aggressivemergedeltas`` argument
+controls whether to compute deltas against both parents for merges.
+By default, the current default is used.


This is unrelated to the current patch, but "aggressivemergedeltas" is a 
very obscure name for "consider both parent for delta". In addition the 
"official" option seems to live in the "format" config section while 
this is not a format option. So if we could eventually clean that up at 
some point that would be great.



+"""
+if deltareuse not in (self.DELTAREUSEALWAYS, self.DELTAREUSESAMEREVS,
+self.DELTAREUSENEVER):


We should get a set of valid flag on the class itself, that would make 
thing simpler:


  if deltause not in self._validdeltastrategies


+raise ValueError(_('value for deltareuse invalid: %s') % 
deltareuse)
+
+if len(destrevlog):
+raise ValueError(_('destination revlog is not empty'))
+
+# 

Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method

2016-11-24 Thread Gregory Szorc
On Thu, Nov 24, 2016 at 7:28 PM, Gregory Szorc 
wrote:

> # HG changeset patch
> # User Gregory Szorc 
> # Date 1480031508 28800
> #  Thu Nov 24 15:51:48 2016 -0800
> # Node ID c60995fce14b0e34bd1416dab3712a6c33bb29bb
> # Parent  906a7d8e969552536fffe0df7a5e63bf5d79b34b
> revlog: add clone method
>

There were a ton of comments on version 1 of this series. I can't promise I
addressed them all.

There were a ton of other changes that went into version 2 of this series.

So if you are reviewing this series, please treat this as a version 1 and
look at everything in depth, as the delta since version 1 is very
significant.


>
> Upcoming patches will introduce functionality for in-place
> repository/store "upgrades." Copying the contents of a revlog
> feels sufficiently low-level to warrant being in the revlog
> class. So this commit implements that functionality.
>
> Because full delta recomputation can be *very* expensive (we're
> talking several hours on the Firefox repository), we support
> multiple modes of execution with regards to delta (re)use. This
> will allow repository upgrades to choose the "level" of
> processing/optimization they wish to perform when converting
> revlogs.
>
> It's not obvious from this commit, but "addrevisioncb" will be
> used for progress reporting.
>
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -1818,3 +1818,95 @@ class revlog(object):
>  if not self._inline:
>  res.append(self.datafile)
>  return res
> +
> +DELTAREUSEALWAYS = 'always'
> +DELTAREUSESAMEREVS = 'samerevs'
> +DELTAREUSENEVER = 'never'
> +
> +def clone(self, tr, destrevlog, addrevisioncb=None,
> +  deltareuse=DELTAREUSESAMEREVS, aggressivemergedeltas=None):
> +"""Copy this revlog to another, possibly with format changes.
> +
> +The destination revlog will contain the same revisions and nodes.
> +However, it may not be bit-for-bit identical due to e.g. delta
> encoding
> +differences.
> +
> +The ``deltareuse`` argument control how deltas from the existing
> revlog
> +are preserved in the destination revlog. The argument can have the
> +following values:
> +
> +DELTAREUSEALWAYS
> +   Deltas will always be reused (if possible), even if the
> destination
> +   revlog would not select the same revisions for the delta. This
> is the
> +   fastest mode of operation.
> +DELTAREUSESAMEREVS
> +   Deltas will be reused if the destination revlog would pick the
> same
> +   revisions for the delta. This mode strikes a balance between
> speed
> +   and optimization.
> +DELTAREUSENEVER
> +   Deltas will never be reused. This is the slowest mode of
> execution.
> +
> +Delta computation can be slow, so the choice of delta reuse
> policy can
> +significantly affect run time.
> +
> +The default policy (``DELTAREUSESAMEREVS``) strikes a balance
> between
> +two extremes. Deltas will be reused if they are appropriate. But
> if the
> +delta could choose a better revision, it will do so. This means
> if you
> +are converting a non-generaldelta revlog to a generaldelta revlog,
> +deltas will be recomputed if the delta's parent isn't a parent of
> the
> +revision.
> +
> +In addition to the delta policy, the ``aggressivemergedeltas``
> argument
> +controls whether to compute deltas against both parents for
> merges.
> +By default, the current default is used.
> +"""
> +if deltareuse not in (self.DELTAREUSEALWAYS,
> self.DELTAREUSESAMEREVS,
> +self.DELTAREUSENEVER):
> +raise ValueError(_('value for deltareuse invalid: %s') %
> deltareuse)
> +
> +if len(destrevlog):
> +raise ValueError(_('destination revlog is not empty'))
> +
> +# lazydeltabase controls whether to reuse a cached delta, if
> possible.
> +oldlazydeltabase = destrevlog._lazydeltabase
> +oldamd = destrevlog._aggressivemergedeltas
> +
> +if deltareuse == self.DELTAREUSEALWAYS:
> +destrevlog._lazydeltabase = True
> +elif deltareuse == self.DELTAREUSESAMEREVS:
> +destrevlog._lazydeltabase = False
> +
> +destrevlog._aggressivemergedeltas = aggressivemergedeltas or
> oldamd
> +
> +populatecachedelta = deltareuse in (self.DELTAREUSEALWAYS,
> +self.DELTAREUSESAMEREVS)
> +
> +try:
> +index = self.index
> +for rev in self:
> +entry = index[rev]
> +
> +# Some classes override linkrev to take filtered revs into
> +# account. Use raw entry from index.
> +linkrev = entry[4]
> +p1 =