[PATCH] bdiff: replace hash algorithm
# HG changeset patch # User Gregory Szorc# Date 1478487117 28800 # Sun Nov 06 18:51:57 2016 -0800 # Node ID bb7c6d6f4a10e80ff4bdf88919692f08497d2d66 # Parent 1c7269484883804b6f960e87309169ef4ae85043 bdiff: replace hash algorithm This patch replaces lyhash with the hash algorithm used by diffutils. The algorithm has its origins in Git commit 2e9d1410, which is all the way back from 1992. The license header in the code at that revision in GPL v2. I have not performed an extensive analysis of the distribution (and therefore buckets) of hash output. However, `hg perfbdiff` gives some clear wins. I'd like to think that if it is good enough for diffutils it is good enough for us? From the mozilla-unified repository: $ perfbdiff -m 3041e4d59df2 ! wall 0.053271 comb 0.06 user 0.06 sys 0.00 (best of 100) ! wall 0.035827 comb 0.04 user 0.04 sys 0.00 (best of 100) $ perfbdiff 0e9928989e9c --alldata --count 100 ! wall 6.204277 comb 6.20 user 6.20 sys 0.00 (best of 3) ! wall 4.309710 comb 4.30 user 4.30 sys 0.00 (best of 3) From the hg repo: $ perfbdiff 35000 --alldata --count 1000 ! wall 0.660358 comb 0.66 user 0.66 sys 0.00 (best of 15) ! wall 0.534092 comb 0.53 user 0.53 sys 0.00 (best of 19) Looking at the generated assembly and statistical profiler output from the kernel level, I believe there is room to make this function even faster. Namely, we're still consuming data character by character instead of at the word level. This translates to more loop iterations and more instructions. At this juncture though, the real performance killer is that we're hashing every line. We should get a significant speedup if we change the algorithm to find the longest prefix, longest suffix, treat those as single "lines" and then only do the line splitting and hashing on the parts that are different. That will require a lot of C code, however. I'm optimistic this approach could result in a ~2x speedup. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -17,6 +17,10 @@ #include "bitmanipulation.h" #include "bdiff.h" +/* Hash implementation from diffutils */ +#define ROL(v, n) ((v) << (n) | (v) >> (sizeof(v) * CHAR_BIT - (n))) +#define HASH(h, c) ((c) + ROL(h ,7)) + struct pos { int pos, len; }; @@ -44,8 +48,7 @@ int bdiff_splitlines(const char *a, ssiz /* build the line array and calculate hashes */ hash = 0; for (p = a; p < a + len; p++) { - /* Leonid Yuriev's hash */ - hash = (hash * 1664525) + (unsigned char)*p + 1013904223; + hash = HASH(hash, *p); if (*p == '\n' || p == plast) { l->hash = hash; ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: to abort / to continue (UX)
On 11/04/2016 07:13 PM, timeless wrote: timeless wrote: technically this isn't wonderful, if you use bookmarks, `hg up -C .` isn't equivalent to `hg up -C {currentbookmark}`, and if you're using hg-git or something that puts significant weight on your active bookmark, this can lead you into some very unfortunate weeds. Martin von Zweigbergk wrote: Good point. There's just no correct way of aborting yet, is there? I suppose we'll have "hg abort" some day. "hg update -C ." always felt a little git-like to me in the it reuses a command that happens to be closest implementation-wise. Yeah, `hg abort` is sort of the right pair to `hg continue`. And we're only slowly approaching `hg continue` (actually, we're probably moderately close). I think we could probably at least write a helper to make the "to abort" case work on average. Bookmarks could register to replace `.` with `{current bookmark}`. We can probably make a dirty patch in evolve just for that. However, I would be very happy if someone would work on an abort/continue facility. So, how about half a plan: 1. add code to enable hookable suggestions for `to abort` 2. someone should survey the hg world to figure out what other things might be wanted beyond `hg up -C .` and `hg up -C {currentbookmark}` 3. we review the current `to continue` 4. we look at wiring `hg continue` and `hg abort` Cheers, -- Pierre-Yves David ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 2 of 5] adjustlinkrev: use C implementation to test ancestor if possible
On 11/02/2016 03:05 AM, Jun Wu wrote: Excerpts from Pierre-Yves David's message of 2016-11-02 00:29:45 +0100: On 11/01/2016 09:51 AM, Jun Wu wrote: # HG changeset patch # User Jun Wu# Date 1477989396 0 # Tue Nov 01 08:36:36 2016 + # Node ID 93e073edc7f673d76d1113f5830ec46210707c25 # Parent ac063914b3a3c01f1d7ed253c73113903fccb7a9 # Available At https://bitbucket.org/quark-zju/hg-draft # hg pull https://bitbucket.org/quark-zju/hg-draft -r 93e073edc7f6 adjustlinkrev: use C implementation to test ancestor if possible The C ancestors implementation is about 10-100x faster than the Python lazyancestors implementation (and it has room to improve: 1. we only need isancestor, not common ancestor; 2. we can shrink revs[p:q] to a single ranged revision if all(revs[i].parentrevs() == [i-1] for i in range(p,q)), the state of known ranged revisions can be stored in the C index object). In the real world, the following command: HGRCPATH=/dev/null hg log -f mercurial/c*.py -T '{rev}\n' > /dev/null Takes 2.3 to 2.5 seconds user-space CPU time before the patch, and 1.7 to 1.9 after. I'd prefer commenting on details directly. Things like "this is a complex issue, you should try to do more homework before trying to solve it" are useful to new people, while not very helpful to me, who had some experience here. I think you can save your time by simplifying most part of this email to: - Could you include runtime data for the Mozilla graft case? - Could you include runtime data for pure and pypy? - How do you think about a C implementation for lazyancestors? So, this not just an usual "This is non trivial - please include more details" feedback. From experience, I know this piece of code is especially full of monsters and dragons. When this code happened we had to multiple severe performance regressions reported during the freeze and after it, and fixing one of them usually made another one appears in its wake. This was bad for the project because it impacted user and had to issue bugfix releases and this was bad for the contributors because I remember having to context switch at 1 am on a release night attending FOSDEM conference to fix yet another regression at was introduced by the previous set batch of fixes, I wasn't happy. I want the project to avoid this again and I don't wish you to have to battle with unexpected consequence. But thanks for expressing this general feeling anyway, I'll try to adapt and adjust feedback in a bit more direct way for other series. There is an history of introducing large performance regression while touching that code. That explain a bit its complexity, for example there is case were we carry the direct parent in _descendantrev and adjust linkrev as we go while there is some case where we just set the top most revision in _descendantrev and adjust link on demand much later. These serve different purposes for different top level use case. I think I fully understand what the current _adjustlinkrev code is doing and why it is the form today. I don't doubt you have are getting a good understanding of the current code, but message like this one https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-November/089815.html show that you do not have a full grasp on the best yet. Instead of running tests blindly, I'd like to measure things more scientifically, like analyzing the code flow, the time complexity etc. For this patch, if C code works, it should be faster in all cases. Reasons: - If _ancestrycontext exists, the code path is the same for both the new and the old code, so there should be no difference. This covers cases where copies.py sets _ancestrycontext explicitly. - If _ancestrycontext does not exist, "memberanc" will be created on the fly by "_adjustlinkrev" and dropped when returning. Therefore no state about which ancestor revisions are already visited is stored. It's walking from _descendantrev or self.rev each time. The C code is doing a same stateless thing but is 10-100x faster. I can update the commit message to include the above reasons. Yes please at least update the commit message next time. This was helpful to get a better grasp on what your changeset is doing. As I'm lacking of time, probably like everyone else here, I'd defend my time to be used on something more meaningful (than unnecessary tests here). So I can end up completing more things here, which is a good thing. As pointed earlier in this reply, last time we installed this code, this ended up much more impactful and timewasting than anticipated. I'm trying to get everybody to eventually spend less time on this here. This means moving carefully and having a lot of initial check so that we don't have to come back on this. That said, pure / pypy is a valid concern. The problem is, is "ancestor.commonancestorsheads" slower than "ancestor.lazyancestors"? I will answer this
[PATCH 3 of 3] perf: support measuring bdiff for all changeset related data
# HG changeset patch # User Gregory Szorc# Date 1478458015 28800 # Sun Nov 06 10:46:55 2016 -0800 # Node ID 1c7269484883804b6f960e87309169ef4ae85043 # Parent 91a510590fa6f4003f596bf661cb17da8800ecde perf: support measuring bdiff for all changeset related data The --all argument changes the behavior of `perfbdiff` to pull in fulltext revision pairs for all changes related to a changeset. The p1 and p2 manifests will be bdiffed against current. Every file that changed between p1 and current will have its file revisions loaded and bdiffed. This mode of operation effectively measured the bdiff time required for `hg commit`. diff --git a/contrib/perf.py b/contrib/perf.py --- a/contrib/perf.py +++ b/contrib/perf.py @@ -748,7 +748,8 @@ def perffncacheencode(ui, repo, **opts): fm.end() @command('perfbdiff', revlogopts + formatteropts + [ -('', 'count', 1, 'number of revisions to test (when using --startrev)')], +('', 'count', 1, 'number of revisions to test (when using --startrev)'), +('', 'alldata', False, 'test bdiffs for all associated revisions')], '-c|-m|FILE REV') def perfbdiff(ui, repo, file_, rev=None, count=None, **opts): """benchmark a bdiff between revisions @@ -757,7 +758,14 @@ def perfbdiff(ui, repo, file_, rev=None, With ``--count``, benchmark bdiffs between delta parents and self for N revisions starting at the specified revision. + +With ``--alldata``, assume the requested revision is a changeset and +measure bdiffs for all changes related to that changeset (manifest +and filelogs). """ +if opts['alldata']: +opts['changelog'] = True + if opts.get('changelog') or opts.get('manifest'): file_, rev = None, file_ elif rev is None: @@ -769,8 +777,25 @@ def perfbdiff(ui, repo, file_, rev=None, startrev = r.rev(r.lookup(rev)) for rev in range(startrev, min(startrev + count, len(r) - 1)): -dp = r.deltaparent(rev) -textpairs.append((r.revision(dp), r.revision(rev))) +if opts['alldata']: +# Load revisions associated with changeset. +ctx = repo[rev] +mtext = repo.manifest.revision(ctx.manifestnode()) +for pctx in ctx.parents(): +pman = repo.manifest.revision(pctx.manifestnode()) +textpairs.append((pman, mtext)) + +# Load filelog revisions by iterating manifest delta. +man = ctx.manifest() +pman = ctx.p1().manifest() +for filename, change in pman.diff(man).items(): +fctx = repo.file(filename) +f1 = fctx.revision(change[0][0] or -1) +f2 = fctx.revision(change[1][0] or -1) +textpairs.append((f1, f2)) +else: +dp = r.deltaparent(rev) +textpairs.append((r.revision(dp), r.revision(rev))) def d(): for pair in textpairs: ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 3] perf: support bdiffing multiple revisions in a single revlog
# HG changeset patch # User Gregory Szorc# Date 1478458885 28800 # Sun Nov 06 11:01:25 2016 -0800 # Node ID 91a510590fa6f4003f596bf661cb17da8800ecde # Parent b8907ec6c08b8215ea8937ecd5f801006d433604 perf: support bdiffing multiple revisions in a single revlog This is useful for testing bdiff performance on several revision pairs at a time. diff --git a/contrib/perf.py b/contrib/perf.py --- a/contrib/perf.py +++ b/contrib/perf.py @@ -747,9 +747,17 @@ def perffncacheencode(ui, repo, **opts): timer(d) fm.end() -@command('perfbdiff', revlogopts + formatteropts, '-c|-m|FILE REV') -def perfbdiff(ui, repo, file_, rev=None, **opts): -"""benchmark a bdiff between a revision and its delta parent""" +@command('perfbdiff', revlogopts + formatteropts + [ +('', 'count', 1, 'number of revisions to test (when using --startrev)')], +'-c|-m|FILE REV') +def perfbdiff(ui, repo, file_, rev=None, count=None, **opts): +"""benchmark a bdiff between revisions + +By default, benchmark a bdiff between its delta parent and itself. + +With ``--count``, benchmark bdiffs between delta parents and self for N +revisions starting at the specified revision. +""" if opts.get('changelog') or opts.get('manifest'): file_, rev = None, file_ elif rev is None: @@ -759,10 +767,10 @@ def perfbdiff(ui, repo, file_, rev=None, r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts) -node = r.lookup(rev) -rev = r.rev(node) -dp = r.deltaparent(rev) -textpairs.append((r.revision(dp), r.revision(node))) +startrev = r.rev(r.lookup(rev)) +for rev in range(startrev, min(startrev + count, len(r) - 1)): +dp = r.deltaparent(rev) +textpairs.append((r.revision(dp), r.revision(rev))) def d(): for pair in textpairs: ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 3] perf: prepare to handle multiple pairs in perfbdiff
# HG changeset patch # User Gregory Szorc# Date 1478454674 28800 # Sun Nov 06 09:51:14 2016 -0800 # Node ID b8907ec6c08b8215ea8937ecd5f801006d433604 # Parent 38f7d48d4a8399dc14e95250697fc98bfeafb1fc perf: prepare to handle multiple pairs in perfbdiff Before, we only supported benchmarking a single pair of texts with bdiff. We want to enable feeding larger corpora into this benchmark. So rewrite the code to support that. diff --git a/contrib/perf.py b/contrib/perf.py --- a/contrib/perf.py +++ b/contrib/perf.py @@ -755,17 +755,18 @@ def perfbdiff(ui, repo, file_, rev=None, elif rev is None: raise error.CommandError('perfbdiff', 'invalid arguments') +textpairs = [] + r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts) node = r.lookup(rev) rev = r.rev(node) dp = r.deltaparent(rev) - -text1 = r.revision(dp) -text2 = r.revision(node) +textpairs.append((r.revision(dp), r.revision(node))) def d(): -bdiff.bdiff(text1, text2) +for pair in textpairs: +bdiff.bdiff(*pair) timer, fm = gettimer(ui, opts) timer(d) ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 3 of 3] revlog: optimize _chunkraw when startrev==endrev
okay, I've fixed this in flight. On 11/06/2016 06:31 PM, Gregory Szorc wrote: On Sun, Nov 6, 2016 at 7:16 AM, Pierre-Yves David> wrote: On 11/04/2016 09:27 AM, Yuya Nishihara wrote: On Tue, 01 Nov 2016 18:25:47 -0700, Gregory Szorc wrote: # HG changeset patch # User Gregory Szorc > # Date 1477244433 25200 # Sun Oct 23 10:40:33 2016 -0700 # Node ID 4c96a177e4ff6a63156ec8cf49483a78e4437756 # Parent 0c41c0cb9b1ef7df40a30672927229ac195b1c92 revlog: optimize _chunkraw when startrev==endrev LGTM, queued the series, thanks. In many cases, _chunkraw() is called with startrev==endrev. When this is true, we can avoid an extra index lookup and some other minor operations. On the mozilla-unified repo, `hg perfrevlogchunks -c` says this has the following impact: ! read w/ reused fd ! wall 0.371846 comb 0.37 user 0.35 sys 0.02 (best of 27) ! wall 0.337930 comb 0.33 user 0.30 sys 0.03 (best of 30) ! read batch w/ reused fd ! wall 0.014952 comb 0.02 user 0.00 sys 0.02 (best of 197) ! wall 0.014866 comb 0.01 user 0.00 sys 0.01 (best of 196) So, we've gone from ~25x slower than batch to ~22.5% slower. I'm double reviewing this and I do not understand these number. I see where you 25x number come from, but I fail to get what this ~22.5% slower one is about. Can you clarify this ? That "22.5%" is supposed to be "22.5x." At this point, there's probably not much else we can do except implement an optimized function in the index itself, including in C. +1, if we get a significant high level performance boost it make sense to move more of the revlog code into C (or other compiled language). Cheers, -- Pierre-Yves David -- Pierre-Yves David ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 6 of 7] py3: use try/except to check for basestring
This https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/089099.html is a better version of what I want to do, since this didn't went through I will be using this. On Fri, Nov 4, 2016 at 9:21 AM, Yuya Nishiharawrote: > On Thu, 03 Nov 2016 03:53:11 +0530, Pulkit Goyal wrote: >> # HG changeset patch >> # User Pulkit Goyal <7895pul...@gmail.com> >> # Date 1478121825 -19800 >> # Thu Nov 03 02:53:45 2016 +0530 >> # Node ID aef03902a5c1a13e9775059a5efdeb2466399ada >> # Parent 9e259e7b59b6358eb842eabbc99f4c18a4cc5009 >> py3: use try/except to check for basestring >> >> The term basestring don't exist in Python 3.5 and throws a NameError. >> It used to refer to unicodes in Python 2. Used try/expect to handle this. >> >> diff -r 9e259e7b59b6 -r aef03902a5c1 mercurial/ui.py >> --- a/mercurial/ui.py Thu Nov 03 02:27:46 2016 +0530 >> +++ b/mercurial/ui.py Thu Nov 03 02:53:45 2016 +0530 >> @@ -520,7 +520,12 @@ >> result = self.config(section, name, untrusted=untrusted) >> if result is None: >> result = default or [] >> -if isinstance(result, basestring): >> +checkunicode = False >> +try: >> +checkunicode = isinstance(result, basestring) >> +except NameError: >> +checkunicode = isinstance(result, str) >> +if checkunicode: >> result = _configlist(result.lstrip(' ,\n')) > > Given "result" is a source variable of a list to be returned, it shouldn't > be a unicode. So we can simply use bytes instead of basestring here. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH STABLE V2] hgweb: cache fctx.parents() in annotate command (issue5414)
Excerpts from Yuya Nishihara's message of 2016-11-06 11:31:04 +0900: > Perhaps fctx.parents() can be property-cached, but we'll need to drop > uninteresting chains of parents in fctx.annotate(). If we go the property-cache approach, I think it's better to cache "_adjustedlinkrev". It's at a lower level and covers both "parents" and "introrev". Caching "parents" may increase memory usage unintentionally. I don't fully get what "uninteresting chains of parents" means here. In the annotate case, let's say f1, f2 = f0.parents(). Both f1 and f2 have _descendantrev set to f0's adjusted linkrev. Suppose there is a global cache dict: {(path, filenode, srcrev): linkrev}, I think if srcrev=_descendantrev (it's true for f1, f2) and _descendantrev is adjusted from the direct child (f0), then it is "interesting" and can be cached. This is similar to what marmoute said during the sprint - for the log -f or annotate case, once the first fctx's introrev is known, the cache can be used to calculate the ancestors' adjusted linkrevs. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 2 of 2] bdiff: don't check border condition in loop
These two are pushed thanks. On 11/06/2016 08:42 AM, Gregory Szorc wrote: # HG changeset patch # User Gregory Szorc# Date 1478417870 25200 # Sun Nov 06 00:37:50 2016 -0700 # Node ID 430a4c8f9fa6992afa47d459d592f6c58e686be2 # Parent e65ef545a3dcb46fe0f1798e76ea17f9f9323452 bdiff: don't check border condition in loop `plast = a + len - 1`. So, this "for" loop iterates from "a" to "plast", inclusive. So, `p == plast` can only be true on the final iteration of the loop. So checking for it on every loop iteration is wasteful. This patch simply decreases the upper bound of the loop by 1 and adds an explicit check after iteration for the `p == plast` case. We can't simply add 1 to the initial value for "i" because that doesn't do the correct thing on empty input strings. `perfbdiff -m 3041e4d59df2` on the Firefox repo becomes significantly faster: ! wall 0.072763 comb 0.07 user 0.07 sys 0.00 (best of 100) ! wall 0.053221 comb 0.06 user 0.06 sys 0.00 (best of 100) For the curious, this code has its origins in 8b067bde6679, which is the changeset that introduced bdiff.c in 2005. Also, GNU diffutils is able to perform a similar line-based diff in under 20ms. So there's likely more perf wins to be found in this code. One of them is the hashing algorithm. But it looks like mpm spent some time testing hash collisions in d0c48891dd4a. I'd like to do the same before switching away from lyhash, just to be on the safe side. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -31,9 +31,11 @@ int bdiff_splitlines(const char *a, ssiz /* count the lines */ i = 1; /* extra line for sentinel */ - for (p = a; p < a + len; p++) - if (*p == '\n' || p == plast) + for (p = a; p < plast; p++) + if (*p == '\n') i++; + if (p == plast) + i++; *lr = l = (struct bdiff_line *)malloc(sizeof(struct bdiff_line) * i); if (!l) ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel -- Pierre-Yves David ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 2] perf: add perfbdiff
On 11/06/2016 08:42 AM, Gregory Szorc wrote: # HG changeset patch # User Gregory Szorc# Date 1478414512 25200 # Sat Nov 05 23:41:52 2016 -0700 # Node ID e65ef545a3dcb46fe0f1798e76ea17f9f9323452 # Parent f01367faa792635ad2f7a6b175ae3252292b5121 perf: add perfbdiff bdiff shows up a lot in profiling. I think it would be useful to have a perf command that runs bdiff over and over so we can find hot spots. diff --git a/contrib/perf.py b/contrib/perf.py --- a/contrib/perf.py +++ b/contrib/perf.py @@ -25,6 +25,7 @@ import random import sys import time from mercurial import ( +bdiff, changegroup, cmdutil, commands, @@ -746,6 +747,30 @@ def perffncacheencode(ui, repo, **opts): timer(d) fm.end() +@command('perfbdiff', revlogopts + formatteropts, '-c|-m|FILE REV') I would prefer to keep it simple and consistently use -r for specifying revisions. +def perfbdiff(ui, repo, file_, rev=None, **opts): +"""benchmark a bdiff between a revision and its delta parent""" +if opts.get('changelog') or opts.get('manifest'): +file_, rev = None, file_ +elif rev is None: +raise error.CommandError('perfbdiff', 'invalid arguments') + +r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts) + +node = r.lookup(rev) +rev = r.rev(node) This might be a stupid lazy question that essential is a request for more clarity in code and docstring: Why this back and forth between rev and node? Must rev always be a filelog revision ... or is it a changelog revision which then is changed to revlog revision while reusing the variable name? Perhaps reduce confusion by using different names. +dp = r.deltaparent(rev) Should it also consider aggressivemergedeltas? Or perhaps more generally: should it be possible to compare any two revisions - especially for filelogs, to also cover the use case of diff? More important: patch 2 LGTM. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 5 rfc] tests: explore some bdiff cases
On 11/06/2016 10:07 AM, Yuya Nishihara wrote: On Thu, 03 Nov 2016 22:34:11 +0100, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich# Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID f6408efe0d0f4179fe6cc2b967164c1b4567f3d6 # Parent d06c049695e6ad3219e7479c65ce98a2f123e878 tests: explore some bdiff cases diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t new file mode 100644 --- /dev/null +++ b/tests/test-bhalf.t '#require no-pure' is necessary since we use difflib in pure. The other changes in this series look good to me, but it's bdiff.c so I don't queue them. Thanks for reviewing and the positive feedback. I will try to polish it for "real" submission. For the last patch, I wonder if it would be better to add a post processing step that - given all the chunks - try to shift/rotate all match sequences to be as early as possible (and thus deltas to be as late and "appending" as possible). That could give more readable diffs, especially when combined with heuristics for preferring chunks starting with the lowest amount of indentation. One lesson from these changes seems to be that it is a problem that we use the same low level diff algorithm for revlog delta storage and bundles and for readable patch diffs. One idea that got mentioned at the latest sprint was to use zstandard for storage and "just" seed it with the "a" version of the file as dictionary and let it compress the "b" side. That might be a better long term solution. More short term, I wonder how much we could gain from somehow teaching bdiff to consider both parents for each chunk instead of just using deltas from one side and store chunks from the other verbatim. I think that could make a significant difference for repositories with a lot of big merges in files or the manifest. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 3 of 3] revlog: optimize _chunkraw when startrev==endrev
On 11/04/2016 09:27 AM, Yuya Nishihara wrote: On Tue, 01 Nov 2016 18:25:47 -0700, Gregory Szorc wrote: # HG changeset patch # User Gregory Szorc# Date 1477244433 25200 # Sun Oct 23 10:40:33 2016 -0700 # Node ID 4c96a177e4ff6a63156ec8cf49483a78e4437756 # Parent 0c41c0cb9b1ef7df40a30672927229ac195b1c92 revlog: optimize _chunkraw when startrev==endrev LGTM, queued the series, thanks. In many cases, _chunkraw() is called with startrev==endrev. When this is true, we can avoid an extra index lookup and some other minor operations. On the mozilla-unified repo, `hg perfrevlogchunks -c` says this has the following impact: ! read w/ reused fd ! wall 0.371846 comb 0.37 user 0.35 sys 0.02 (best of 27) ! wall 0.337930 comb 0.33 user 0.30 sys 0.03 (best of 30) ! read batch w/ reused fd ! wall 0.014952 comb 0.02 user 0.00 sys 0.02 (best of 197) ! wall 0.014866 comb 0.01 user 0.00 sys 0.01 (best of 196) So, we've gone from ~25x slower than batch to ~22.5% slower. I'm double reviewing this and I do not understand these number. I see where you 25x number come from, but I fail to get what this ~22.5% slower one is about. Can you clarify this ? At this point, there's probably not much else we can do except implement an optimized function in the index itself, including in C. +1, if we get a significant high level performance boost it make sense to move more of the revlog code into C (or other compiled language). Cheers, -- Pierre-Yves David ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] help: show help for disabled extensions(issue5228)
On Sun, 6 Nov 2016 04:57:03 -0500, timeless wrote: > Pulkit Goyal wrote: > > This patch does not exactly solve issue5228 but it results in a better > > If it doesn't solve the issue, should the issue be tagged in the commit > summary? I think this fix is enough to say issue5228 is solved. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 2 of 8] py3: add os.fsdecode() as pycompat.fsdecode()
On Sun, 6 Nov 2016 04:57:40 -0500, timeless wrote: > Pulkit Goyal wrote: > > +# In Python 2, fsdecode() have a very chances to recieve bytes. So it's > > receive Fixed as: # In Python 2, fsdecode() has a very chance to receive bytes. So it's # better not to touch Python 2 part as it's already working fine. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[Bug 5417] New: Cannot clone repo
https://bz.mercurial-scm.org/show_bug.cgi?id=5417 Bug ID: 5417 Summary: Cannot clone repo Product: Mercurial Version: 3.7.3 Hardware: PC OS: Linux Status: UNCONFIRMED Severity: feature Priority: wish Component: Mercurial Assignee: bugzi...@selenic.com Reporter: edhillm...@yahoo.com CC: mercurial-de...@selenic.com Hi. I am running Mercurial on Ubuntu 16.04. I am trying to clone http://hg.netbeans.org/main. It starts fast, but the slows down and then eventually crashes... ed@Animal:~/netbeans-src$ hg clone http://hg.netbeans.org/main destination directory: main requesting all changes adding changesets adding manifests transaction abort! rollback completed ** unknown exception encountered, please report by visiting ** https://mercurial-scm.org/wiki/BugTracker ** Python 2.7.12 (default, Jul 1 2016, 15:12:24) [GCC 5.4.0 20160609] ** Mercurial Distributed SCM (version 3.7.3) ** Extensions loaded: Traceback (most recent call last): File "/usr/bin/hg", line 43, in mercurial.dispatch.run() File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 54, in run sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 120, in dispatch ret = _runcatch(req) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 191, in _runcatch return _dispatch(req) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 924, in _dispatch cmdpats, cmdoptions) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 681, in runcommand ret = _runcommand(ui, options, cmd, d) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 1055, in _runcommand return checkargs() File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 1015, in checkargs return cmdfunc() File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 921, in d = lambda: util.checksignature(func)(ui, *args, **cmdoptions) File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 993, in check return func(*args, **kwargs) File "/usr/lib/python2.7/dist-packages/mercurial/commands.py", line 1563, in clone shareopts=opts.get('shareopts')) File "/usr/lib/python2.7/dist-packages/mercurial/hg.py", line 595, in clone streamclonerequested=stream) File "/usr/lib/python2.7/dist-packages/mercurial/exchange.py", line 1189, in pull _pullchangeset(pullop) File "/usr/lib/python2.7/dist-packages/mercurial/exchange.py", line 1386, in _pullchangeset pullop.cgresult = cg.apply(pullop.repo, 'pull', pullop.remote.url()) File "/usr/lib/python2.7/dist-packages/mercurial/changegroup.py", line 381, in apply self._unpackmanifests(repo, revmap, trp, prog, changesets) File "/usr/lib/python2.7/dist-packages/mercurial/changegroup.py", line 307, in _unpackmanifests repo.manifest.addgroup(self, revmap, trp) File "/usr/lib/python2.7/dist-packages/mercurial/revlog.py", line 1636, in addgroup alwayscache=bool(addrevisioncb)) File "/usr/lib/python2.7/dist-packages/mercurial/revlog.py", line 1458, in _addrevision cachedelta[1]) mpatch.mpatchError: patch cannot be decoded I have tried to see if any more recent version is available, but apt-get doesn't show a later version. Can anyone help me with this? -- You are receiving this mail because: You are on the CC list for the bug. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 08 of 10] repair: migrate revlogs during upgrade
Gregory Szorc wrote: > +return changelog.changelog(repo.svfs) > +elif path.endswith('00manifest.i'): Does mercurial use `else` after `return` or does it omit them? > + store replacement complete; repository was inconsistent for 0.0s Do we want to allow tests to run a bit slower than instantaneously? ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 05 of 10] repair: obtain and validate requirements for upgraded repo
Gregory Szorc wrote: > +'missing: %s') % ', '.join(sorted(missingreqs))) I'd argue ', ' should be _(', '), although we probably don't manage this well or consistently... Also, is it too early to start talking about -T support? -- See `hg debuginstall`... ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 04 of 10] repair: identify repository deficiencies
Gregory Szorc wrote: > A command that upgrades repositories but doesn't say what it is doing shouldn't it still mention that it isn't doing the work? ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 2 of 8] py3: add os.fsdecode() as pycompat.fsdecode()
Pulkit Goyal wrote: > +# In Python 2, fsdecode() have a very chances to recieve bytes. So it's receive ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] help: show help for disabled extensions(issue5228)
Pulkit Goyal wrote: > This patch does not exactly solve issue5228 but it results in a better If it doesn't solve the issue, should the issue be tagged in the commit summary? ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 2 of 2] commands: introduce `hg display`
Gregory Szorc wrote: > @@ -2019,6 +2026,13 @@ Dish up an empty repo; serve it cold. >diff repository (or selected files) > > > + > + display > + > + > + show various repository information > + > + > >export > Will /help/display list the views it supports? ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 5 rfc] tests: explore some bdiff cases
On Thu, 03 Nov 2016 22:34:11 +0100, Mads Kiilerich wrote: > # HG changeset patch > # User Mads Kiilerich> # Date 1478208837 -3600 > # Thu Nov 03 22:33:57 2016 +0100 > # Node ID f6408efe0d0f4179fe6cc2b967164c1b4567f3d6 > # Parent d06c049695e6ad3219e7479c65ce98a2f123e878 > tests: explore some bdiff cases > > diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t > new file mode 100644 > --- /dev/null > +++ b/tests/test-bhalf.t '#require no-pure' is necessary since we use difflib in pure. The other changes in this series look good to me, but it's bdiff.c so I don't queue them. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH V2] help: show help for disabled extensions(issue5228)
On Sun, 06 Nov 2016 09:10:18 +0530, Pulkit Goyal wrote: > # HG changeset patch > # User Pulkit Goyal <7895pul...@gmail.com> > # Date 1478395471 -19800 > # Sun Nov 06 06:54:31 2016 +0530 > # Node ID 6d27c42a7e01adcc0eabf88c623390c85c4a5943 > # Parent b5fc4e71286dd4f6e4f38e0b9fb17f51f1e3 > help: show help for disabled extensions(issue5228) Queued this, thanks. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 2] bdiff: don't check border condition in loop
# HG changeset patch # User Gregory Szorc# Date 1478417870 25200 # Sun Nov 06 00:37:50 2016 -0700 # Node ID 430a4c8f9fa6992afa47d459d592f6c58e686be2 # Parent e65ef545a3dcb46fe0f1798e76ea17f9f9323452 bdiff: don't check border condition in loop `plast = a + len - 1`. So, this "for" loop iterates from "a" to "plast", inclusive. So, `p == plast` can only be true on the final iteration of the loop. So checking for it on every loop iteration is wasteful. This patch simply decreases the upper bound of the loop by 1 and adds an explicit check after iteration for the `p == plast` case. We can't simply add 1 to the initial value for "i" because that doesn't do the correct thing on empty input strings. `perfbdiff -m 3041e4d59df2` on the Firefox repo becomes significantly faster: ! wall 0.072763 comb 0.07 user 0.07 sys 0.00 (best of 100) ! wall 0.053221 comb 0.06 user 0.06 sys 0.00 (best of 100) For the curious, this code has its origins in 8b067bde6679, which is the changeset that introduced bdiff.c in 2005. Also, GNU diffutils is able to perform a similar line-based diff in under 20ms. So there's likely more perf wins to be found in this code. One of them is the hashing algorithm. But it looks like mpm spent some time testing hash collisions in d0c48891dd4a. I'd like to do the same before switching away from lyhash, just to be on the safe side. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -31,9 +31,11 @@ int bdiff_splitlines(const char *a, ssiz /* count the lines */ i = 1; /* extra line for sentinel */ - for (p = a; p < a + len; p++) - if (*p == '\n' || p == plast) + for (p = a; p < plast; p++) + if (*p == '\n') i++; + if (p == plast) + i++; *lr = l = (struct bdiff_line *)malloc(sizeof(struct bdiff_line) * i); if (!l) ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 2] perf: add perfbdiff
# HG changeset patch # User Gregory Szorc# Date 1478414512 25200 # Sat Nov 05 23:41:52 2016 -0700 # Node ID e65ef545a3dcb46fe0f1798e76ea17f9f9323452 # Parent f01367faa792635ad2f7a6b175ae3252292b5121 perf: add perfbdiff bdiff shows up a lot in profiling. I think it would be useful to have a perf command that runs bdiff over and over so we can find hot spots. diff --git a/contrib/perf.py b/contrib/perf.py --- a/contrib/perf.py +++ b/contrib/perf.py @@ -25,6 +25,7 @@ import random import sys import time from mercurial import ( +bdiff, changegroup, cmdutil, commands, @@ -746,6 +747,30 @@ def perffncacheencode(ui, repo, **opts): timer(d) fm.end() +@command('perfbdiff', revlogopts + formatteropts, '-c|-m|FILE REV') +def perfbdiff(ui, repo, file_, rev=None, **opts): +"""benchmark a bdiff between a revision and its delta parent""" +if opts.get('changelog') or opts.get('manifest'): +file_, rev = None, file_ +elif rev is None: +raise error.CommandError('perfbdiff', 'invalid arguments') + +r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts) + +node = r.lookup(rev) +rev = r.rev(node) +dp = r.deltaparent(rev) + +text1 = r.revision(dp) +text2 = r.revision(node) + +def d(): +bdiff.bdiff(text1, text2) + +timer, fm = gettimer(ui, opts) +timer(d) +fm.end() + @command('perfdiffwd', formatteropts) def perfdiffwd(ui, repo, **opts): """Profile diff of working directory changes""" diff --git a/tests/test-contrib-perf.t b/tests/test-contrib-perf.t --- a/tests/test-contrib-perf.t +++ b/tests/test-contrib-perf.t @@ -50,6 +50,7 @@ perfstatus perfancestorset (no help text available) perfannotate (no help text available) + perfbdiff benchmark a bdiff between a revision and its delta parent perfbranchmap benchmark the update of a branchmap perfcca (no help text available) @@ -112,6 +113,7 @@ perfstatus $ hg perfancestors $ hg perfancestorset 2 $ hg perfannotate a + $ hg perfbdiff -c 1 $ hg perfbranchmap $ hg perfcca $ hg perfchangegroupchangelog ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel