# HG changeset patch
# User Gregory Szorc
# Date 1476355827 -7200
# Thu Oct 13 12:50:27 2016 +0200
# Node ID 231e6b5206857a809198f5535fac32a004686bf1
# Parent aed87a8bed99b373eec5fb09dd2f76d166af59e8
changelog: disable delta chains
This patch disables delta chains on changelogs. After this patch, new
entries on changelogs - including existing changelogs - will be stored
as the fulltext of that data (likely compressed). No delta computation
will be performed.
An overview of delta chains and data justifying this change follows.
Revlogs try to store entries as a delta against a previous entry (either
a parent revision in the case of generaldelta or the previous physical
revision when not using generaldelta). Most of the time this is the
correct thing to do: it frequently results in less CPU usage and smaller
storage.
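To make the idea of a delta chain concrete, here is a minimal Python sketch of resolving a fulltext by walking a chain back to its full entry. This is a toy model, not Mercurial's revlog code: the `("full", ...)` / `("delta", ...)` tuples, the `(start, end, replacement)` hunk layout, and the names `apply_delta` and `resolve` are all assumptions made for illustration.

```python
import zlib


def apply_delta(base, hunks):
    """Apply (start, end, replacement) hunks, sorted by start, to base."""
    out = []
    pos = 0
    for start, end, data in hunks:
        out.append(base[pos:start])  # unchanged bytes before the hunk
        out.append(data)             # replacement for base[start:end]
        pos = end
    out.append(base[pos:])           # unchanged tail
    return b"".join(out)


def resolve(entries, rev):
    """Return the fulltext of rev by walking its delta chain."""
    chain = []
    # Collect deltas until we reach an entry stored as a fulltext.
    while entries[rev][0] == "delta":
        _, base_rev, hunks = entries[rev]
        chain.append(hunks)
        rev = base_rev
    text = zlib.decompress(entries[rev][1])
    # Apply the collected deltas oldest first.
    for hunks in reversed(chain):
        text = apply_delta(text, hunks)
    return text


# Hypothetical three-entry store: one compressed fulltext, two deltas.
entries = [
    ("full", zlib.compress(b"a\nb\nc\n")),
    ("delta", 0, [(2, 4, b"B\n")]),   # replace the second line
    ("delta", 1, [(6, 6, b"d\n")]),   # append a fourth line
]

print(resolve(entries, 2))
```

In this model, reading revision 2 costs one decompression plus two delta applications, while a store without delta chains would pay one (larger) decompression per revision and nothing else.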
Delta chains are most effective when the base revision a delta is
computed against is similar to the current data. This tends to occur naturally
for manifests and file data, since only small parts of each tend to
change with each revision. Changelogs, however, are a different story.
Changelog entries represent changesets/commits. And unless commits in a
repository are homogeneous (same author, changing same files, similar
commit messages, etc.), a delta from one entry to the next tends to be
relatively large compared to the size of the entry. This means that
delta chains tend to be short. How short? Here is the full vs delta
revision breakdown on some real world repos:
Repo              % Full   % Delta   Max Length
hg                  45.8      54.2            6
mozilla-central     42.4      57.6            8
mozilla-unified     42.5      57.5           17
pypy                46.1      53.9            6
python-zstandard    46.1      53.9            3
(I threw in python-zstandard as an example of a repo that is homogeneous.
It contains a small Python project with changes all from the same
author.)
Contrast this with the manifest revlog for these repos, where 99+% of
revisions are deltas and delta chains run into the thousands.
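The full vs delta breakdown and the maximum chain length can be derived from nothing more than each revision's delta base. The following Python sketch shows one way to do that, assuming a simplified model in which a revision whose base is itself is a fulltext and chain length counts the deltas back to that fulltext. The function name `chain_stats` and the sample `bases` list are hypothetical and are not taken from any of the repositories above.

```python
def chain_stats(bases):
    """Summarize delta usage from a list mapping rev -> delta base rev.

    A revision whose base equals its own number is a fulltext. Every base
    is assumed to be an earlier (or the same) revision.
    """
    lengths = []
    full = 0
    for rev, base in enumerate(bases):
        if base == rev:
            full += 1
            lengths.append(0)                  # fulltext: no deltas to apply
        else:
            lengths.append(lengths[base] + 1)  # one more delta than the base
    total = len(bases)
    return {
        "pct_full": 100.0 * full / total,
        "pct_delta": 100.0 * (total - full) / total,
        "max_length": max(lengths),
    }


# Hypothetical six-revision revlog: revs 0 and 3 are fulltexts.
bases = [0, 0, 1, 3, 3, 4]
print(chain_stats(bases))
```

A changelog with no delta chains corresponds to `bases == list(range(n))`, for which this reports 100% full revisions and a maximum length of 0.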
So delta chains aren't as useful on changelogs. But even a short delta
chain may provide benefits. Let's measure that.
Delta chains may require less CPU to read revisions if the CPU time
spent reading smaller deltas is less than the CPU time used to
decompress larger individual entries. We can measure this via
`hg perfrevlog -c -d 1`, which iterates a revlog and resolves each
revision's fulltext. Here are the results of that command on a repo using
delta chains in its changelog (first line of each pair) and on a repo
without delta chains (second line of each pair):
hg (forward)
! wall 0.407008 comb 0.41 user 0.41 sys 0.00 (best of 25)
! wall 0.390061 comb 0.39 user 0.39 sys 0.00 (best of 26)
hg (reverse)
! wall 0.515221 comb 0.52 user 0.52 sys 0.00 (best of 19)
! wall 0.400018 comb 0.40 user 0.39 sys 0.01 (best of 25)
mozilla-central (forward)
! wall 4.508296 comb 4.49 user 4.49 sys 0.00 (best of 3)
! wall 4.370222 comb 4.37 user 4.35 sys 0.02 (best of 3)
mozilla-central (reverse)
! wall 5.758995 comb 5.76 user 5.72 sys 0.04 (best of 3)
! wall 4.346503 comb 4.34 user 4.32 sys 0.02 (best of 3)
mozilla-unified (forward)
! wall 4.957088 comb 4.95 user 4.94 sys 0.01 (best of 3)
! wall 4.660528 comb 4.65 user 4.63 sys 0.02 (best of 3)
mozilla-unified (reverse)
! wall 6.119827 comb 6.11 user 6.09 sys 0.02 (best of 3)
! wall 4.675136 comb 4.67 user 4.67 sys 0.00 (best of 3)
pypy (forward)
! wall 1.231122 comb 1.24 user 1.23 sys 0.01 (best of 8)
! wall 1.164896 comb 1.16 user 1.16 sys 0.00 (best of 9)
pypy (reverse)
! wall 1.467049 comb 1.46 user 1.46 sys 0.00 (best of 7)
! wall 1.160200 comb 1.17 user 1.16 sys 0.01 (best of 9)
The data clearly shows that it takes less wall and CPU time to resolve
revisions when there are no delta chains in the changelogs, regardless
of the direction of traversal. Furthermore, not using a delta chain
means that fulltext resolution in reverse is as fast as iterating
forward. So not using delta chains on the changelog is a clear CPU win
for reading operations.
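One plausible reason reverse traversal suffers more with delta chains is that a reader which remembers the last resolved fulltext can reuse it going forward but not going backward. The Python sketch below is a toy cost model of that idea, not a measurement and not Mercurial's implementation: the single-entry cache, the function name `delta_applications`, and the eight-revision example are all assumptions.

```python
def delta_applications(bases, order):
    """Count delta applications needed to resolve revisions in `order`.

    bases maps rev -> delta base rev (base == rev means fulltext). The
    reader is assumed to cache only the most recently resolved fulltext.
    """
    cached = None
    total = 0
    for rev in order:
        cur = rev
        # Walk back until we hit a fulltext or the cached revision.
        while bases[cur] != cur and cur != cached:
            total += 1
            cur = bases[cur]
        cached = rev
    return total


# Hypothetical revlog: one fulltext followed by a chain of seven deltas.
chained = [0, 0, 1, 2, 3, 4, 5, 6]
# Same revisions stored without delta chains.
unchained = list(range(8))

forward = range(8)
reverse = range(7, -1, -1)

print(delta_applications(chained, forward))
print(delta_applications(chained, reverse))
print(delta_applications(unchained, reverse))
```

In this model the forward walk applies one delta per revision, the reverse walk has to rebuild most of the chain each time, and the store without chains applies no deltas in either direction, which matches the direction of the timings above.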
An example of a user-visible operation showing this speed-up is revset
evaluation. Here are results for
`hg perfrevset 'author(gps) or author(mpm)'`:
hg
! wall 1.655506 comb 1.66 user 1.65 sys 0.01 (best of 6)
! wall 1.612723 comb 1.61 user 1.60 sys 0.01 (best of 7)
mozilla-central
! wall 17.629826 comb 17.64 user 17.60 sys 0.04 (best of 3)
! wall 17.311033 comb 17.30 user 17.26 sys 0.04 (best of 3)
What about 00changelog.i size?
Repo              Delta Chains   No Delta Chains
hg                   7,033,250         6,976,771
mozilla-central     82,978,748        81,574,623
mozilla-unified     88,112,349        86,702,162
pypy                20,740,699        20,659,741
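For a relative view of the same figures, the short Python sketch below computes the difference and percentage change per repo. The numbers are copied from the table above; the helper name `percent_saved` is hypothetical.

```python
# 00changelog.i sizes from the table above: (delta chains, no delta chains).
sizes = {
    "hg": (7_033_250, 6_976_771),
    "mozilla-central": (82_978_748, 81_574_623),
    "mozilla-unified": (88_112_349, 86_702_162),
    "pypy": (20_740_699, 20_659_741),
}


def percent_saved(with_chains, without_chains):
    """Size reduction from dropping delta chains, as a percentage."""
    return 100.0 * (with_chains - without_chains) / with_chains


for repo, (with_chains, without_chains) in sizes.items():
    diff = with_chains - without_chains
    pct = percent_saved(with_chains, without_chains)
    print(f"{repo:16} {diff:>10,} smaller ({pct:.2f}%)")
```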
The data shows that removing delta chains f