Re: [PATCH 1 of 9 V3] perf: introduce a perfrevlogwrite command

2018-11-07 Thread Yuya Nishihara
On Tue, 06 Nov 2018 12:27:29 +0100, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld 
> # Date 1538556809 -7200
> #  Wed Oct 03 10:53:29 2018 +0200
> # Node ID 1d1bc06187e9296430045aa39c3d3e2d12f61875
> # Parent  909c31805f54628ab5bf22cd92418c8ac9c09277
> # EXP-Topic revlog-perf
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 1d1bc06187e9
> perf: introduce a perfrevlogwrite command

Queued, thanks.

Can you add some tests as a follow-up? This hooks into various revlog
internals, which are likely to break when the API changes. Also, the current
code wouldn't work on Python 3.
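
The review does not say which lines are affected, so the following is only an illustration of one kind of Python 3 breakage, under an assumption: option dictionaries are re-keyed with bytes (as the `_byteskwargs` call in the patch suggests), after which a native `str` key no longer matches. `bytes_kwargs` is a hypothetical stand-in, not Mercurial's helper.

```python
def bytes_kwargs(opts):
    """Hypothetical stand-in: return a copy of opts keyed by bytes."""
    return {k.encode('latin-1'): v for k, v in opts.items()}

opts = bytes_kwargs({'count': 3})

# A bytes key finds the value.
found_with_bytes = opts[b'count']

# On Python 3, str and bytes never compare equal, so a str key misses.
try:
    opts['count']
    str_key_works = True
except KeyError:
    str_key_works = False
```

On Python 2 both lookups succeed because `'count'` and `b'count'` are the same object type, which is why this class of problem tends to go unnoticed until the code runs on Python 3.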
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 1 of 9 V3] perf: introduce a perfrevlogwrite command

2018-11-06 Thread Boris Feld
# HG changeset patch
# User Boris Feld 
# Date 1538556809 -7200
#  Wed Oct 03 10:53:29 2018 +0200
# Node ID 1d1bc06187e9296430045aa39c3d3e2d12f61875
# Parent  909c31805f54628ab5bf22cd92418c8ac9c09277
# EXP-Topic revlog-perf
# Available At https://bitbucket.org/octobus/mercurial-devel/
#  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 1d1bc06187e9
perf: introduce a perfrevlogwrite command

The command records the time taken to add many revisions to a revlog, timing
each addition individually. The "added revisions" are recreations of the
original ones.
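
A minimal sketch of per-addition timing, assuming a context manager in the spirit of the `timeone` helper used in the diff. The names `time_one` and `time_each` are hypothetical, and only wall-clock time is recorded here.

```python
import contextlib
import time

@contextlib.contextmanager
def time_one():
    """Yield a list that receives one wall-clock duration on exit."""
    result = []
    start = time.perf_counter()
    try:
        yield result
    finally:
        result.append(time.perf_counter() - start)

def time_each(items, operation):
    """Return (item, seconds) pairs, timing every call separately."""
    timings = []
    for item in items:
        # Only the operation itself sits inside the timed block, so any
        # preparation done before the "with" is excluded from the result.
        with time_one() as r:
            operation(item)
        timings.append((item, r[0]))
    return timings

# Stand-in workload: the "revisions" are plain integers here.
timings = time_each([10, 11, 12], lambda rev: sum(range(1000)))
```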

To time each addition individually, we have to handle the timing and the
reporting ourselves.
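
The reporting side can be sketched as follows, under simplifying assumptions: each run is a list of `(rev, seconds)` pairs with a single float per revision, and the helper names are hypothetical. Runs are merged per revision, revisions are ordered by their median time, and a few percentile positions are picked by integer index.

```python
def consolidate(allresults):
    """Merge several runs of (rev, seconds) into (rev, [seconds, ...])."""
    results = []
    for idx, (rev, t) in enumerate(allresults[0]):
        ts = [t]
        for other in allresults[1:]:
            orev, ot = other[idx]
            assert orev == rev  # every run must cover the same revisions
            ts.append(ot)
        results.append((rev, ts))
    return results

def median(values):
    """Middle element of the sorted values (upper middle if even)."""
    return sorted(values)[len(values) // 2]

def percentile_rows(results, percents=(10, 25, 50, 75, 90, 95, 99)):
    """Return (label, (rev, timings)) rows ordered from min to max."""
    ordered = sorted(results, key=lambda x: median(x[1]))
    count = len(ordered)
    rows = [("min", ordered[0])]
    for p in percents:
        rows.append(("%d%%" % p, ordered[count * p // 100]))
    rows.append(("max", ordered[-1]))
    return rows

# Three runs over four revisions, with made-up timings.
runs = [
    [(0, 0.30), (1, 0.10), (2, 0.20), (3, 0.40)],
    [(0, 0.32), (1, 0.12), (2, 0.22), (3, 0.42)],
    [(0, 0.31), (1, 0.11), (2, 0.21), (3, 0.41)],
]
rows = dict(percentile_rows(consolidate(runs)))
```

With this made-up input, the "min" row is revision 1 and the "max" row is revision 3, since those have the smallest and largest median times.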

This command is introduced to track the impact of the sparse-revlog format on
delta computations at initial storage time. It starts with the full text, a
situation similar to a "commit". Additions from an existing delta are better
timed with bundles.

The complaints from `check-perf-code.py` are not relevant. We are accessing
a "revlog" opener, not a repository opener.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -24,8 +24,10 @@ import functools
 import gc
 import os
 import random
+import shutil
 import struct
 import sys
+import tempfile
 import threading
 import time
 from mercurial import (
@@ -1565,6 +1567,161 @@ def perfrevlogrevisions(ui, repo, file_=
     timer(d)
     fm.end()
 
+@command(b'perfrevlogwrite', revlogopts + formatteropts +
+         [(b's', b'startrev', 1000, b'revision to start writing at'),
+          (b'', b'stoprev', -1, b'last revision to write'),
+          (b'', b'count', 3, b'number of passes to perform'),
+         ],
+         b'-c|-m|FILE')
+def perfrevlogwrite(ui, repo, file_=None, startrev=1000, stoprev=-1, **opts):
+    """Benchmark writing a series of revisions to a revlog.
+    """
+    opts = _byteskwargs(opts)
+
+    rl = cmdutil.openrevlog(repo, b'perfrevlogwrite', file_, opts)
+    rllen = getlen(ui)(rl)
+    if startrev < 0:
+        startrev = rllen + startrev
+    if stoprev < 0:
+        stoprev = rllen + stoprev

+    ### actually gather results
+    # opts is keyed by bytes after _byteskwargs()
+    count = opts[b'count']
+    if count <= 0:
+        raise error.Abort(b'invalid run count: %d' % count)
+    allresults = []
+    for c in range(count):
+        allresults.append(_timeonewrite(ui, rl, startrev, stoprev, c + 1))

+    ### consolidate the results in a single list
+    results = []
+    for idx, (rev, t) in enumerate(allresults[0]):
+        ts = [t]
+        for other in allresults[1:]:
+            orev, ot = other[idx]
+            assert orev == rev
+            ts.append(ot)
+        results.append((rev, ts))
+    resultcount = len(results)

+    ### Compute and display relevant statistics

+    # get a formatter
+    fm = ui.formatter(b'perf', opts)
+    displayall = ui.configbool(b"perf", b"all-timing", False)

+    # sort results by median time
+    results.sort(key=lambda x: sorted(x[1])[len(x[1]) // 2])
+    # list of (name, index) to display
+    relevants = [
+        ("min", 0),
+        ("10%", resultcount * 10 // 100),
+        ("25%", resultcount * 25 // 100),
+        ("50%", resultcount * 50 // 100),
+        ("75%", resultcount * 75 // 100),
+        ("90%", resultcount * 90 // 100),
+        ("95%", resultcount * 95 // 100),
+        ("99%", resultcount * 99 // 100),
+        ("max", -1),
+    ]
+    for name, idx in relevants:
+        data = results[idx]
+        title = '%s of %d, rev %d' % (name, resultcount, data[0])
+        formatone(fm, data[1], title=title, displayall=displayall)

+    # XXX summing that many floats will not be very precise, we ignore this
+    # fact for now
+    totaltime = []
+    for item in allresults:
+        totaltime.append((sum(x[1][0] for x in item),
+                          sum(x[1][1] for x in item),
+                          sum(x[1][2] for x in item),)
+        )
+    formatone(fm, totaltime, title="total time (%d revs)" % resultcount,
+              displayall=displayall)
+    fm.end()
+
+class _faketr(object):
+    def add(s, x, y, z=None):
+        return None
+
+def _timeonewrite(ui, orig, startrev, stoprev, runidx=None):
+    timings = []
+    tr = _faketr()
+    with _temprevlog(ui, orig, startrev) as dest:
+        revs = list(orig.revs(startrev, stoprev))
+        total = len(revs)
+        topic = 'adding'
+        if runidx is not None:
+            topic += ' (run #%d)' % runidx
+        for idx, rev in enumerate(revs):
+            ui.progress(topic, idx, unit='revs', total=total)
+            addargs, addkwargs = _getrevisionseed(orig, rev, tr)
+            with timeone() as r:
+                dest.addrawrevision(*addargs, **addkwargs)
+            timings.append((rev, r[0]))
+        ui.progress(topic, total, unit='revs', total=total)
+        ui.progress(topic, None, unit='revs', total=total)
+    return timings
+
+def _getrevisionseed(orig, rev, tr):
+    linkrev = orig.linkrev(rev)
+    node = orig.no