[PATCH] bdiff: replace hash algorithm

2016-11-06 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478487117 28800
#  Sun Nov 06 18:51:57 2016 -0800
# Node ID bb7c6d6f4a10e80ff4bdf88919692f08497d2d66
# Parent  1c7269484883804b6f960e87309169ef4ae85043
bdiff: replace hash algorithm

This patch replaces lyhash with the hash algorithm used by diffutils.
The algorithm has its origins in Git commit 2e9d1410, which is all the
way back from 1992. The license header in the code at that revision
in GPL v2.

I have not performed an extensive analysis of the distribution
(and therefore buckets) of hash output. However, `hg perfbdiff`
gives some clear wins. I'd like to think that if it is good enough
for diffutils it is good enough for us?

From the mozilla-unified repository:

$ perfbdiff -m 3041e4d59df2
! wall 0.053271 comb 0.06 user 0.06 sys 0.00 (best of 100)
! wall 0.035827 comb 0.04 user 0.04 sys 0.00 (best of 100)

$ perfbdiff 0e9928989e9c --alldata --count 100
! wall 6.204277 comb 6.20 user 6.20 sys 0.00 (best of 3)
! wall 4.309710 comb 4.30 user 4.30 sys 0.00 (best of 3)

From the hg repo:

$ perfbdiff 35000 --alldata --count 1000
! wall 0.660358 comb 0.66 user 0.66 sys 0.00 (best of 15)
! wall 0.534092 comb 0.53 user 0.53 sys 0.00 (best of 19)

Looking at the generated assembly and statistical profiler output
from the kernel level, I believe there is room to make this function
even faster. Namely, we're still consuming data character by character
instead of at the word level. This translates to more loop iterations
and more instructions.

At this juncture though, the real performance killer is that we're
hashing every line. We should get a significant speedup if we change
the algorithm to find the longest prefix, longest suffix, treat those
as single "lines" and then only do the line splitting and hashing on
the parts that are different. That will require a lot of C code,
however. I'm optimistic this approach could result in a ~2x speedup.

diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
--- a/mercurial/bdiff.c
+++ b/mercurial/bdiff.c
@@ -17,6 +17,10 @@
 #include "bitmanipulation.h"
 #include "bdiff.h"
 
+/* Hash implementation from diffutils */
+#define ROL(v, n) ((v) << (n) | (v) >> (sizeof(v) * CHAR_BIT - (n)))
+#define HASH(h, c) ((c) + ROL(h ,7))
+
 struct pos {
int pos, len;
 };
@@ -44,8 +48,7 @@ int bdiff_splitlines(const char *a, ssiz
/* build the line array and calculate hashes */
hash = 0;
for (p = a; p < a + len; p++) {
-   /* Leonid Yuriev's hash */
-   hash = (hash * 1664525) + (unsigned char)*p + 1013904223;
+   hash = HASH(hash, *p);
 
if (*p == '\n' || p == plast) {
l->hash = hash;
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: to abort / to continue (UX)

2016-11-06 Thread Pierre-Yves David



On 11/04/2016 07:13 PM, timeless wrote:

timeless wrote:

technically this isn't wonderful, if you use bookmarks, `hg up -C .`
isn't equivalent to `hg up -C {currentbookmark}`, and if you're using
hg-git or something that puts significant weight on your active
bookmark, this can lead you into some very unfortunate weeds.


Martin von Zweigbergk wrote:

Good point. There's just no correct way of aborting yet, is there? I suppose
we'll have "hg abort" some day. "hg update -C ." always felt a little
git-like to me in the it reuses a command that happens to be closest
implementation-wise.


Yeah, `hg abort` is sort of the right pair to `hg continue`. And we're
only slowly approaching `hg continue` (actually, we're probably
moderately close).

I think we could probably at least write a helper to make the "to
abort" case work on average. Bookmarks could register to replace `.`
with `{current bookmark}`.


We can probably make a dirty patch in evolve just for that.
However, I would be very happy if someone would work on an 
abort/continue facility.



So, how about half a plan:
1. add code to enable hookable suggestions for `to abort`
2. someone should survey the hg world to figure out what other things
might be wanted beyond `hg up -C .` and `hg up -C {currentbookmark}`
3. we review the current `to continue`
4. we look at wiring `hg continue` and `hg abort`


Cheers,

--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 5] adjustlinkrev: use C implementation to test ancestor if possible

2016-11-06 Thread Pierre-Yves David



On 11/02/2016 03:05 AM, Jun Wu wrote:

Excerpts from Pierre-Yves David's message of 2016-11-02 00:29:45 +0100:


On 11/01/2016 09:51 AM, Jun Wu wrote:

# HG changeset patch
# User Jun Wu 
# Date 1477989396 0
#  Tue Nov 01 08:36:36 2016 +
# Node ID 93e073edc7f673d76d1113f5830ec46210707c25
# Parent  ac063914b3a3c01f1d7ed253c73113903fccb7a9
# Available At https://bitbucket.org/quark-zju/hg-draft
#  hg pull https://bitbucket.org/quark-zju/hg-draft  -r 93e073edc7f6
adjustlinkrev: use C implementation to test ancestor if possible

The C ancestors implementation is about 10-100x faster than the Python
lazyancestors implementation (and it has room to improve: 1. we only need
isancestor, not common ancestor; 2. we can shrink revs[p:q] to a single
ranged revision if all(revs[i].parentrevs() == [i-1] for i in range(p,q)),
the state of known ranged revisions can be stored in the C index object).

In the real world, the following command:

  HGRCPATH=/dev/null hg log -f mercurial/c*.py -T '{rev}\n' > /dev/null

Takes 2.3 to 2.5 seconds user-space CPU time before the patch,
  and 1.7 to 1.9 after.


I'd prefer commenting on details directly. Things like "this is a complex
issue, you should try to do more homework before trying to solve it" are
useful to new people, while not very helpful to me, who had some experience
here. I think you can save your time by simplifying most part of this email
to:

  - Could you include runtime data for the Mozilla graft case?
  - Could you include runtime data for pure and pypy?
  - How do you think about a C implementation for lazyancestors?


So, this not just an usual "This is non trivial - please include more 
details" feedback. From experience, I know this piece of code is 
especially full of monsters and dragons. When this code happened we had 
to multiple severe performance regressions reported during the freeze 
and after it, and fixing one of them usually made another one appears in 
its wake. This was bad for the project because it impacted user and had 
to issue bugfix releases and this was bad for the contributors because I 
remember having to context switch at 1 am on a release night attending 
FOSDEM conference to fix yet another regression at was introduced by the 
previous set batch of fixes, I wasn't happy. I want the project to avoid 
this again and I don't wish you to have to battle with unexpected 
consequence.


But thanks for expressing this general feeling anyway, I'll try to adapt 
and adjust feedback in a bit more direct way for other series.



There is an history of introducing large performance regression while
touching that code. That explain a bit its complexity, for example there
is case were we carry the direct parent in _descendantrev and adjust
linkrev as we go while there is some case where we just set the top most
revision in _descendantrev and adjust link on demand much later. These
serve different purposes for different top level use case.


I think I fully understand what the current _adjustlinkrev code is doing and
why it is the form today.


I don't doubt you have are getting a good understanding of the current 
code, but message like this one 
https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-November/089815.html 
show that you do not have a full grasp on the best yet.



Instead of running tests blindly, I'd like to measure things more
scientifically, like analyzing the code flow, the time complexity etc.
For this patch, if C code works, it should be faster in all cases. Reasons:

  - If _ancestrycontext exists, the code path is the same for both the new
and the old code, so there should be no difference. This covers cases
where copies.py sets _ancestrycontext explicitly.
  - If _ancestrycontext does not exist, "memberanc" will be created on the
fly by "_adjustlinkrev" and dropped when returning. Therefore no state
about which ancestor revisions are already visited is stored. It's
walking from _descendantrev or self.rev each time. The C code is doing a
same stateless thing but is 10-100x faster.

I can update the commit message to include the above reasons.


Yes please at least update the commit message next time. This was 
helpful to get a better grasp on what your changeset is doing.



As I'm lacking of time, probably like everyone else here, I'd defend my time
to be used on something more meaningful (than unnecessary tests here). So I
can end up completing more things here, which is a good thing.


As pointed earlier in this reply, last time we installed this code, this 
ended up much more impactful and timewasting than anticipated. I'm 
trying to get everybody to eventually spend less time on this here. This 
means moving carefully and having a lot of initial check so that we 
don't have to come back on this.



That said, pure / pypy is a valid concern. The problem is, is
"ancestor.commonancestorsheads" slower than "ancestor.lazyancestors"?
I will answer this 

[PATCH 3 of 3] perf: support measuring bdiff for all changeset related data

2016-11-06 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478458015 28800
#  Sun Nov 06 10:46:55 2016 -0800
# Node ID 1c7269484883804b6f960e87309169ef4ae85043
# Parent  91a510590fa6f4003f596bf661cb17da8800ecde
perf: support measuring bdiff for all changeset related data

The --all argument changes the behavior of `perfbdiff` to pull
in fulltext revision pairs for all changes related to a changeset.
The p1 and p2 manifests will be bdiffed against current. Every file
that changed between p1 and current will have its file revisions
loaded and bdiffed.

This mode of operation effectively measured the bdiff time required
for `hg commit`.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -748,7 +748,8 @@ def perffncacheencode(ui, repo, **opts):
 fm.end()
 
 @command('perfbdiff', revlogopts + formatteropts + [
-('', 'count', 1, 'number of revisions to test (when using --startrev)')],
+('', 'count', 1, 'number of revisions to test (when using --startrev)'),
+('', 'alldata', False, 'test bdiffs for all associated revisions')],
 '-c|-m|FILE REV')
 def perfbdiff(ui, repo, file_, rev=None, count=None, **opts):
 """benchmark a bdiff between revisions
@@ -757,7 +758,14 @@ def perfbdiff(ui, repo, file_, rev=None,
 
 With ``--count``, benchmark bdiffs between delta parents and self for N
 revisions starting at the specified revision.
+
+With ``--alldata``, assume the requested revision is a changeset and
+measure bdiffs for all changes related to that changeset (manifest
+and filelogs).
 """
+if opts['alldata']:
+opts['changelog'] = True
+
 if opts.get('changelog') or opts.get('manifest'):
 file_, rev = None, file_
 elif rev is None:
@@ -769,8 +777,25 @@ def perfbdiff(ui, repo, file_, rev=None,
 
 startrev = r.rev(r.lookup(rev))
 for rev in range(startrev, min(startrev + count, len(r) - 1)):
-dp = r.deltaparent(rev)
-textpairs.append((r.revision(dp), r.revision(rev)))
+if opts['alldata']:
+# Load revisions associated with changeset.
+ctx = repo[rev]
+mtext = repo.manifest.revision(ctx.manifestnode())
+for pctx in ctx.parents():
+pman = repo.manifest.revision(pctx.manifestnode())
+textpairs.append((pman, mtext))
+
+# Load filelog revisions by iterating manifest delta.
+man = ctx.manifest()
+pman = ctx.p1().manifest()
+for filename, change in pman.diff(man).items():
+fctx = repo.file(filename)
+f1 = fctx.revision(change[0][0] or -1)
+f2 = fctx.revision(change[1][0] or -1)
+textpairs.append((f1, f2))
+else:
+dp = r.deltaparent(rev)
+textpairs.append((r.revision(dp), r.revision(rev)))
 
 def d():
 for pair in textpairs:
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 2 of 3] perf: support bdiffing multiple revisions in a single revlog

2016-11-06 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478458885 28800
#  Sun Nov 06 11:01:25 2016 -0800
# Node ID 91a510590fa6f4003f596bf661cb17da8800ecde
# Parent  b8907ec6c08b8215ea8937ecd5f801006d433604
perf: support bdiffing multiple revisions in a single revlog

This is useful for testing bdiff performance on several revision
pairs at a time.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -747,9 +747,17 @@ def perffncacheencode(ui, repo, **opts):
 timer(d)
 fm.end()
 
-@command('perfbdiff', revlogopts + formatteropts, '-c|-m|FILE REV')
-def perfbdiff(ui, repo, file_, rev=None, **opts):
-"""benchmark a bdiff between a revision and its delta parent"""
+@command('perfbdiff', revlogopts + formatteropts + [
+('', 'count', 1, 'number of revisions to test (when using --startrev)')],
+'-c|-m|FILE REV')
+def perfbdiff(ui, repo, file_, rev=None, count=None, **opts):
+"""benchmark a bdiff between revisions
+
+By default, benchmark a bdiff between its delta parent and itself.
+
+With ``--count``, benchmark bdiffs between delta parents and self for N
+revisions starting at the specified revision.
+"""
 if opts.get('changelog') or opts.get('manifest'):
 file_, rev = None, file_
 elif rev is None:
@@ -759,10 +767,10 @@ def perfbdiff(ui, repo, file_, rev=None,
 
 r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts)
 
-node = r.lookup(rev)
-rev = r.rev(node)
-dp = r.deltaparent(rev)
-textpairs.append((r.revision(dp), r.revision(node)))
+startrev = r.rev(r.lookup(rev))
+for rev in range(startrev, min(startrev + count, len(r) - 1)):
+dp = r.deltaparent(rev)
+textpairs.append((r.revision(dp), r.revision(rev)))
 
 def d():
 for pair in textpairs:
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 1 of 3] perf: prepare to handle multiple pairs in perfbdiff

2016-11-06 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478454674 28800
#  Sun Nov 06 09:51:14 2016 -0800
# Node ID b8907ec6c08b8215ea8937ecd5f801006d433604
# Parent  38f7d48d4a8399dc14e95250697fc98bfeafb1fc
perf: prepare to handle multiple pairs in perfbdiff

Before, we only supported benchmarking a single pair of texts
with bdiff. We want to enable feeding larger corpora into this
benchmark. So rewrite the code to support that.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -755,17 +755,18 @@ def perfbdiff(ui, repo, file_, rev=None,
 elif rev is None:
 raise error.CommandError('perfbdiff', 'invalid arguments')
 
+textpairs = []
+
 r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts)
 
 node = r.lookup(rev)
 rev = r.rev(node)
 dp = r.deltaparent(rev)
-
-text1 = r.revision(dp)
-text2 = r.revision(node)
+textpairs.append((r.revision(dp), r.revision(node)))
 
 def d():
-bdiff.bdiff(text1, text2)
+for pair in textpairs:
+bdiff.bdiff(*pair)
 
 timer, fm = gettimer(ui, opts)
 timer(d)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 3] revlog: optimize _chunkraw when startrev==endrev

2016-11-06 Thread Pierre-Yves David

okay, I've fixed this in flight.

On 11/06/2016 06:31 PM, Gregory Szorc wrote:

On Sun, Nov 6, 2016 at 7:16 AM, Pierre-Yves David
>
wrote:



On 11/04/2016 09:27 AM, Yuya Nishihara wrote:

On Tue, 01 Nov 2016 18:25:47 -0700, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc >
# Date 1477244433 25200
#  Sun Oct 23 10:40:33 2016 -0700
# Node ID 4c96a177e4ff6a63156ec8cf49483a78e4437756
# Parent  0c41c0cb9b1ef7df40a30672927229ac195b1c92
revlog: optimize _chunkraw when startrev==endrev


LGTM, queued the series, thanks.

In many cases, _chunkraw() is called with startrev==endrev. When
this is true, we can avoid an extra index lookup and some other
minor operations.

On the mozilla-unified repo, `hg perfrevlogchunks -c` says this
has the following impact:

! read w/ reused fd
! wall 0.371846 comb 0.37 user 0.35 sys 0.02
(best of 27)
! wall 0.337930 comb 0.33 user 0.30 sys 0.03
(best of 30)

! read batch w/ reused fd
! wall 0.014952 comb 0.02 user 0.00 sys 0.02
(best of 197)
! wall 0.014866 comb 0.01 user 0.00 sys 0.01
(best of 196)

So, we've gone from ~25x slower than batch to ~22.5% slower.


I'm double reviewing this and I do not understand these number. I
see where you 25x number come from, but I fail to get what this
~22.5% slower one is about. Can you clarify this ?


That "22.5%" is supposed to be "22.5x."



At this point, there's probably not much else we can do except
implement an optimized function in the index itself,
including in C.


+1, if we get a significant high level performance boost it make
sense to move more of the revlog code into C (or other compiled
language).

Cheers,

--
Pierre-Yves David




--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 6 of 7] py3: use try/except to check for basestring

2016-11-06 Thread Pulkit Goyal
This 
https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/089099.html
is a better version of what I want to do, since this didn't went
through I will be using this.

On Fri, Nov 4, 2016 at 9:21 AM, Yuya Nishihara  wrote:
> On Thu, 03 Nov 2016 03:53:11 +0530, Pulkit Goyal wrote:
>> # HG changeset patch
>> # User Pulkit Goyal <7895pul...@gmail.com>
>> # Date 1478121825 -19800
>> #  Thu Nov 03 02:53:45 2016 +0530
>> # Node ID aef03902a5c1a13e9775059a5efdeb2466399ada
>> # Parent  9e259e7b59b6358eb842eabbc99f4c18a4cc5009
>> py3: use try/except to check for basestring
>>
>> The term basestring don't exist in Python 3.5 and throws a NameError.
>> It used to refer to unicodes in Python 2. Used try/expect to handle this.
>>
>> diff -r 9e259e7b59b6 -r aef03902a5c1 mercurial/ui.py
>> --- a/mercurial/ui.py Thu Nov 03 02:27:46 2016 +0530
>> +++ b/mercurial/ui.py Thu Nov 03 02:53:45 2016 +0530
>> @@ -520,7 +520,12 @@
>>  result = self.config(section, name, untrusted=untrusted)
>>  if result is None:
>>  result = default or []
>> -if isinstance(result, basestring):
>> +checkunicode = False
>> +try:
>> +checkunicode = isinstance(result, basestring)
>> +except NameError:
>> +checkunicode = isinstance(result, str)
>> +if checkunicode:
>>  result = _configlist(result.lstrip(' ,\n'))
>
> Given "result" is a source variable of a list to be returned, it shouldn't
> be a unicode. So we can simply use bytes instead of basestring here.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH STABLE V2] hgweb: cache fctx.parents() in annotate command (issue5414)

2016-11-06 Thread Jun Wu
Excerpts from Yuya Nishihara's message of 2016-11-06 11:31:04 +0900:
> Perhaps fctx.parents() can be property-cached, but we'll need to drop
> uninteresting chains of parents in fctx.annotate().

If we go the property-cache approach, I think it's better to cache
"_adjustedlinkrev". It's at a lower level and covers both "parents"
and "introrev". Caching "parents" may increase memory usage unintentionally.

I don't fully get what "uninteresting chains of parents" means here.
In the annotate case, let's say f1, f2 = f0.parents().
Both f1 and f2 have _descendantrev set to f0's adjusted linkrev.

Suppose there is a global cache dict: {(path, filenode, srcrev): linkrev}, I
think if srcrev=_descendantrev (it's true for f1, f2) and _descendantrev is
adjusted from the direct child (f0), then it is "interesting" and can be
cached. This is similar to what marmoute said during the sprint - for the
log -f or annotate case, once the first fctx's introrev is known, the cache
can be used to calculate the ancestors' adjusted linkrevs.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2] bdiff: don't check border condition in loop

2016-11-06 Thread Pierre-Yves David

These two are pushed thanks.

On 11/06/2016 08:42 AM, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc 
# Date 1478417870 25200
#  Sun Nov 06 00:37:50 2016 -0700
# Node ID 430a4c8f9fa6992afa47d459d592f6c58e686be2
# Parent  e65ef545a3dcb46fe0f1798e76ea17f9f9323452
bdiff: don't check border condition in loop

`plast = a + len - 1`. So, this "for" loop iterates from "a" to "plast",
inclusive. So, `p == plast` can only be true on the final iteration
of the loop. So checking for it on every loop iteration is wasteful.

This patch simply decreases the upper bound of the loop by 1 and
adds an explicit check after iteration for the `p == plast` case.
We can't simply add 1 to the initial value for "i" because that
doesn't do the correct thing on empty input strings.

`perfbdiff -m 3041e4d59df2` on the Firefox repo becomes significantly
faster:

! wall 0.072763 comb 0.07 user 0.07 sys 0.00 (best of 100)
! wall 0.053221 comb 0.06 user 0.06 sys 0.00 (best of 100)

For the curious, this code has its origins in 8b067bde6679, which is
the changeset that introduced bdiff.c in 2005.

Also, GNU diffutils is able to perform a similar line-based diff in
under 20ms. So there's likely more perf wins to be found in this code.
One of them is the hashing algorithm. But it looks like mpm spent
some time testing hash collisions in d0c48891dd4a. I'd like to do the
same before switching away from lyhash, just to be on the safe side.

diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
--- a/mercurial/bdiff.c
+++ b/mercurial/bdiff.c
@@ -31,9 +31,11 @@ int bdiff_splitlines(const char *a, ssiz

/* count the lines */
i = 1; /* extra line for sentinel */
-   for (p = a; p < a + len; p++)
-   if (*p == '\n' || p == plast)
+   for (p = a; p < plast; p++)
+   if (*p == '\n')
i++;
+   if (p == plast)
+   i++;

*lr = l = (struct bdiff_line *)malloc(sizeof(struct bdiff_line) * i);
if (!l)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel



--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 2] perf: add perfbdiff

2016-11-06 Thread Mads Kiilerich

On 11/06/2016 08:42 AM, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc 
# Date 1478414512 25200
#  Sat Nov 05 23:41:52 2016 -0700
# Node ID e65ef545a3dcb46fe0f1798e76ea17f9f9323452
# Parent  f01367faa792635ad2f7a6b175ae3252292b5121
perf: add perfbdiff

bdiff shows up a lot in profiling. I think it would be useful to have
a perf command that runs bdiff over and over so we can find hot spots.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -25,6 +25,7 @@ import random
  import sys
  import time
  from mercurial import (
+bdiff,
  changegroup,
  cmdutil,
  commands,
@@ -746,6 +747,30 @@ def perffncacheencode(ui, repo, **opts):
  timer(d)
  fm.end()
  
+@command('perfbdiff', revlogopts + formatteropts, '-c|-m|FILE REV')


I would prefer to keep it simple and consistently use -r for specifying 
revisions.



+def perfbdiff(ui, repo, file_, rev=None, **opts):
+"""benchmark a bdiff between a revision and its delta parent"""
+if opts.get('changelog') or opts.get('manifest'):
+file_, rev = None, file_
+elif rev is None:
+raise error.CommandError('perfbdiff', 'invalid arguments')
+
+r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts)
+
+node = r.lookup(rev)
+rev = r.rev(node)


This might be a stupid lazy question that essential is a request for 
more clarity in code and docstring:

Why this back and forth between rev and node?
Must rev always be a filelog revision ... or is it a changelog revision 
which then is changed to revlog revision while reusing the variable 
name? Perhaps reduce confusion by using different names.



+dp = r.deltaparent(rev)


Should it also consider aggressivemergedeltas?

Or perhaps more generally: should it be possible to compare any two 
revisions - especially for filelogs, to also cover the use case of diff?



More important: patch 2 LGTM.

/Mads

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5 rfc] tests: explore some bdiff cases

2016-11-06 Thread Mads Kiilerich

On 11/06/2016 10:07 AM, Yuya Nishihara wrote:

On Thu, 03 Nov 2016 22:34:11 +0100, Mads Kiilerich wrote:

# HG changeset patch
# User Mads Kiilerich 
# Date 1478208837 -3600
#  Thu Nov 03 22:33:57 2016 +0100
# Node ID f6408efe0d0f4179fe6cc2b967164c1b4567f3d6
# Parent  d06c049695e6ad3219e7479c65ce98a2f123e878
tests: explore some bdiff cases

diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t
new file mode 100644
--- /dev/null
+++ b/tests/test-bhalf.t

'#require no-pure' is necessary since we use difflib in pure.

The other changes in this series look good to me, but it's bdiff.c so I
don't queue them.


Thanks for reviewing and the positive feedback. I will try to polish it 
for "real" submission.


For the last patch, I wonder if it would be better to add a post 
processing step that - given all the chunks - try to shift/rotate all 
match sequences to be as early as possible (and thus deltas to be as 
late and "appending" as possible). That could give more readable diffs, 
especially when combined with heuristics for preferring chunks starting 
with the lowest amount of indentation.


One lesson from these changes seems to be that it is a problem that we 
use the same low level diff algorithm for revlog delta storage and 
bundles and for readable patch diffs. One idea that got mentioned at the 
latest sprint was to use zstandard for storage and "just" seed it with 
the "a" version of the file as dictionary and let it compress the "b" 
side. That might be a better long term solution.


More short term, I wonder how much we could gain from somehow teaching 
bdiff to consider both parents for each chunk instead of just using 
deltas from one side and store chunks from the other verbatim. I think 
that could make a significant difference for repositories with a lot of 
big merges in files or the manifest.


/Mads

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 3] revlog: optimize _chunkraw when startrev==endrev

2016-11-06 Thread Pierre-Yves David



On 11/04/2016 09:27 AM, Yuya Nishihara wrote:

On Tue, 01 Nov 2016 18:25:47 -0700, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc 
# Date 1477244433 25200
#  Sun Oct 23 10:40:33 2016 -0700
# Node ID 4c96a177e4ff6a63156ec8cf49483a78e4437756
# Parent  0c41c0cb9b1ef7df40a30672927229ac195b1c92
revlog: optimize _chunkraw when startrev==endrev


LGTM, queued the series, thanks.


In many cases, _chunkraw() is called with startrev==endrev. When
this is true, we can avoid an extra index lookup and some other
minor operations.

On the mozilla-unified repo, `hg perfrevlogchunks -c` says this
has the following impact:

! read w/ reused fd
! wall 0.371846 comb 0.37 user 0.35 sys 0.02 (best of 27)
! wall 0.337930 comb 0.33 user 0.30 sys 0.03 (best of 30)

! read batch w/ reused fd
! wall 0.014952 comb 0.02 user 0.00 sys 0.02 (best of 197)
! wall 0.014866 comb 0.01 user 0.00 sys 0.01 (best of 196)

So, we've gone from ~25x slower than batch to ~22.5% slower.


I'm double reviewing this and I do not understand these number. I see 
where you 25x number come from, but I fail to get what this ~22.5% 
slower one is about. Can you clarify this ?



At this point, there's probably not much else we can do except
implement an optimized function in the index itself, including in C.


+1, if we get a significant high level performance boost it make sense 
to move more of the revlog code into C (or other compiled language).


Cheers,

--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] help: show help for disabled extensions(issue5228)

2016-11-06 Thread Yuya Nishihara
On Sun, 6 Nov 2016 04:57:03 -0500, timeless wrote:
> Pulkit Goyal wrote:
> > This patch does not exactly solve issue5228 but it results in a better
> 
> If it doesn't solve the issue, should the issue be tagged in the commit 
> summary?

I think this fix is enough to say issue5228 is solved.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 8] py3: add os.fsdecode() as pycompat.fsdecode()

2016-11-06 Thread Yuya Nishihara
On Sun, 6 Nov 2016 04:57:40 -0500, timeless wrote:
> Pulkit Goyal wrote:
> > +# In Python 2, fsdecode() have a very chances to recieve bytes. So it's
> 
> receive

Fixed as:

# In Python 2, fsdecode() has a very chance to receive bytes. So it's
# better not to touch Python 2 part as it's already working fine.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[Bug 5417] New: Cannot clone repo

2016-11-06 Thread bugzilla
https://bz.mercurial-scm.org/show_bug.cgi?id=5417

Bug ID: 5417
   Summary: Cannot clone repo
   Product: Mercurial
   Version: 3.7.3
  Hardware: PC
OS: Linux
Status: UNCONFIRMED
  Severity: feature
  Priority: wish
 Component: Mercurial
  Assignee: bugzi...@selenic.com
  Reporter: edhillm...@yahoo.com
CC: mercurial-de...@selenic.com

Hi.  I am running Mercurial on Ubuntu 16.04.  I am trying to clone
http://hg.netbeans.org/main.  It starts fast, but the slows down and then
eventually crashes...

ed@Animal:~/netbeans-src$ hg clone http://hg.netbeans.org/main
destination directory: main
requesting all changes
adding changesets
adding manifests
transaction abort!  
rollback completed  
** unknown exception encountered, please report by visiting
** https://mercurial-scm.org/wiki/BugTracker
** Python 2.7.12 (default, Jul  1 2016, 15:12:24) [GCC 5.4.0 20160609]
** Mercurial Distributed SCM (version 3.7.3)
** Extensions loaded: 
Traceback (most recent call last):
  File "/usr/bin/hg", line 43, in 
mercurial.dispatch.run()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 54, in
run
sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 120, in
dispatch
ret = _runcatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 191, in
_runcatch
return _dispatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 924, in
_dispatch
cmdpats, cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 681, in
runcommand
ret = _runcommand(ui, options, cmd, d)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 1055, in
_runcommand
return checkargs()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 1015, in
checkargs
return cmdfunc()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 921, in

d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 993, in check
return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/commands.py", line 1563, in
clone
shareopts=opts.get('shareopts'))
  File "/usr/lib/python2.7/dist-packages/mercurial/hg.py", line 595, in clone
streamclonerequested=stream)
  File "/usr/lib/python2.7/dist-packages/mercurial/exchange.py", line 1189, in
pull
_pullchangeset(pullop)
  File "/usr/lib/python2.7/dist-packages/mercurial/exchange.py", line 1386, in
_pullchangeset
pullop.cgresult = cg.apply(pullop.repo, 'pull', pullop.remote.url())
  File "/usr/lib/python2.7/dist-packages/mercurial/changegroup.py", line 381,
in apply
self._unpackmanifests(repo, revmap, trp, prog, changesets)
  File "/usr/lib/python2.7/dist-packages/mercurial/changegroup.py", line 307,
in _unpackmanifests
repo.manifest.addgroup(self, revmap, trp)
  File "/usr/lib/python2.7/dist-packages/mercurial/revlog.py", line 1636, in
addgroup
alwayscache=bool(addrevisioncb))
  File "/usr/lib/python2.7/dist-packages/mercurial/revlog.py", line 1458, in
_addrevision
cachedelta[1])
mpatch.mpatchError: patch cannot be decoded


I have tried to see if any more recent version is available, but apt-get
doesn't show a later version.  Can anyone help me with this?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 08 of 10] repair: migrate revlogs during upgrade

2016-11-06 Thread timeless
Gregory Szorc wrote:
> +return changelog.changelog(repo.svfs)
> +elif path.endswith('00manifest.i'):

Does mercurial use `else` after `return` or does it omit them?

> +  store replacement complete; repository was inconsistent for 0.0s

Do we want to allow tests to run a bit slower than instantaneously?
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 05 of 10] repair: obtain and validate requirements for upgraded repo

2016-11-06 Thread timeless
Gregory Szorc wrote:
> +'missing: %s') % ', '.join(sorted(missingreqs)))

I'd argue ', ' should be _(', '), although we probably don't manage
this well or consistently...

Also, is it too early to start talking about -T support? -- See `hg
debuginstall`...
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 04 of 10] repair: identify repository deficiencies

2016-11-06 Thread timeless
Gregory Szorc wrote:
> A command that upgrades repositories but doesn't say what it is doing

shouldn't it still mention that it isn't doing the work?
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 8] py3: add os.fsdecode() as pycompat.fsdecode()

2016-11-06 Thread timeless
Pulkit Goyal wrote:
> +# In Python 2, fsdecode() have a very chances to recieve bytes. So it's

receive
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] help: show help for disabled extensions(issue5228)

2016-11-06 Thread timeless
Pulkit Goyal wrote:
> This patch does not exactly solve issue5228 but it results in a better

If it doesn't solve the issue, should the issue be tagged in the commit summary?
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2] commands: introduce `hg display`

2016-11-06 Thread timeless
Gregory Szorc wrote:
> @@ -2019,6 +2026,13 @@ Dish up an empty repo; serve it cold.
>diff repository (or selected files)
>
>
> +  
> +  display
> +  
> +  
> +  show various repository information
> +  
> +  
>
>export
>

Will /help/display list the views it supports?
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5 rfc] tests: explore some bdiff cases

2016-11-06 Thread Yuya Nishihara
On Thu, 03 Nov 2016 22:34:11 +0100, Mads Kiilerich wrote:
> # HG changeset patch
> # User Mads Kiilerich 
> # Date 1478208837 -3600
> #  Thu Nov 03 22:33:57 2016 +0100
> # Node ID f6408efe0d0f4179fe6cc2b967164c1b4567f3d6
> # Parent  d06c049695e6ad3219e7479c65ce98a2f123e878
> tests: explore some bdiff cases
> 
> diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t
> new file mode 100644
> --- /dev/null
> +++ b/tests/test-bhalf.t

'#require no-pure' is necessary since we use difflib in pure.

The other changes in this series look good to me, but it's bdiff.c so I
don't queue them.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH V2] help: show help for disabled extensions(issue5228)

2016-11-06 Thread Yuya Nishihara
On Sun, 06 Nov 2016 09:10:18 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pul...@gmail.com>
> # Date 1478395471 -19800
> #  Sun Nov 06 06:54:31 2016 +0530
> # Node ID 6d27c42a7e01adcc0eabf88c623390c85c4a5943
> # Parent  b5fc4e71286dd4f6e4f38e0b9fb17f51f1e3
> help: show help for disabled extensions(issue5228)

Queued this, thanks.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 2 of 2] bdiff: don't check border condition in loop

2016-11-06 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478417870 25200
#  Sun Nov 06 00:37:50 2016 -0700
# Node ID 430a4c8f9fa6992afa47d459d592f6c58e686be2
# Parent  e65ef545a3dcb46fe0f1798e76ea17f9f9323452
bdiff: don't check border condition in loop

`plast = a + len - 1`. So, this "for" loop iterates from "a" to "plast",
inclusive. So, `p == plast` can only be true on the final iteration
of the loop. So checking for it on every loop iteration is wasteful.

This patch simply decreases the upper bound of the loop by 1 and
adds an explicit check after iteration for the `p == plast` case.
We can't simply add 1 to the initial value for "i" because that
doesn't do the correct thing on empty input strings.

`perfbdiff -m 3041e4d59df2` on the Firefox repo becomes significantly
faster:

! wall 0.072763 comb 0.07 user 0.07 sys 0.00 (best of 100)
! wall 0.053221 comb 0.06 user 0.06 sys 0.00 (best of 100)

For the curious, this code has its origins in 8b067bde6679, which is
the changeset that introduced bdiff.c in 2005.

Also, GNU diffutils is able to perform a similar line-based diff in
under 20ms. So there's likely more perf wins to be found in this code.
One of them is the hashing algorithm. But it looks like mpm spent
some time testing hash collisions in d0c48891dd4a. I'd like to do the
same before switching away from lyhash, just to be on the safe side.

diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
--- a/mercurial/bdiff.c
+++ b/mercurial/bdiff.c
@@ -31,9 +31,11 @@ int bdiff_splitlines(const char *a, ssiz
 
/* count the lines */
i = 1; /* extra line for sentinel */
-   for (p = a; p < a + len; p++)
-   if (*p == '\n' || p == plast)
+   for (p = a; p < plast; p++)
+   if (*p == '\n')
i++;
+   if (p == plast)
+   i++;
 
*lr = l = (struct bdiff_line *)malloc(sizeof(struct bdiff_line) * i);
if (!l)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 1 of 2] perf: add perfbdiff

2016-11-06 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1478414512 25200
#  Sat Nov 05 23:41:52 2016 -0700
# Node ID e65ef545a3dcb46fe0f1798e76ea17f9f9323452
# Parent  f01367faa792635ad2f7a6b175ae3252292b5121
perf: add perfbdiff

bdiff shows up a lot in profiling. I think it would be useful to have
a perf command that runs bdiff over and over so we can find hot spots.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -25,6 +25,7 @@ import random
 import sys
 import time
 from mercurial import (
+bdiff,
 changegroup,
 cmdutil,
 commands,
@@ -746,6 +747,30 @@ def perffncacheencode(ui, repo, **opts):
 timer(d)
 fm.end()
 
+@command('perfbdiff', revlogopts + formatteropts, '-c|-m|FILE REV')
+def perfbdiff(ui, repo, file_, rev=None, **opts):
+"""benchmark a bdiff between a revision and its delta parent"""
+if opts.get('changelog') or opts.get('manifest'):
+file_, rev = None, file_
+elif rev is None:
+raise error.CommandError('perfbdiff', 'invalid arguments')
+
+r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts)
+
+node = r.lookup(rev)
+rev = r.rev(node)
+dp = r.deltaparent(rev)
+
+text1 = r.revision(dp)
+text2 = r.revision(node)
+
+def d():
+bdiff.bdiff(text1, text2)
+
+timer, fm = gettimer(ui, opts)
+timer(d)
+fm.end()
+
 @command('perfdiffwd', formatteropts)
 def perfdiffwd(ui, repo, **opts):
 """Profile diff of working directory changes"""
diff --git a/tests/test-contrib-perf.t b/tests/test-contrib-perf.t
--- a/tests/test-contrib-perf.t
+++ b/tests/test-contrib-perf.t
@@ -50,6 +50,7 @@ perfstatus
perfancestorset
  (no help text available)
perfannotate  (no help text available)
+   perfbdiff benchmark a bdiff between a revision and its delta parent
perfbranchmap
  benchmark the update of a branchmap
perfcca   (no help text available)
@@ -112,6 +113,7 @@ perfstatus
   $ hg perfancestors
   $ hg perfancestorset 2
   $ hg perfannotate a
+  $ hg perfbdiff -c 1
   $ hg perfbranchmap
   $ hg perfcca
   $ hg perfchangegroupchangelog
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel