# HG changeset patch
# User Mads Kiilerich
# Date 1570804632 -7200
# Fri Oct 11 16:37:12 2019 +0200
# Node ID 72d12ee773795edc163f73b9160e5d29022878dd
# Parent 52781d57313d512efb7150603104bea3ca11d0eb
changegroup: leave out all parent file revisions when creating bundles
When creating a bundle, we get the revisions to bundle, and also the set of
revisions that have been discovered to be common between sender and receiver of
the bundle - which by definition will include all the ancestor revisions. File
revisions where linkrev points at common revisions will be pruned from the
bundle. The common revisions can be represented lazy and efficiently.
That is making the optimistic assumption that the linkrev revision is the
common one. Sometimes it isn't. In that case, the bundle will contain file
revisions that already are common. The increased size of the bundle might be a
problem, but it also makes the bundle size a useless approximation of
information content, and of how much it actually will increase repository size.
The linkrev based pruning does, however, have the advantage that it can leave
out common file revisions, even without being ancestors. Such aggressive
pruning will create bundles that only apply in a specific context, and
effectively make them depend on additional parents than the actual bundle
parents. It is debatable if that is a good feature and how much it actually
gains, but let's leave that out of scope and keep it for now.
We want to avoid that bundles contain filelog entries that are present in
parent revisions of the bundle content. We will do that by enumerating the
files that seem to be changed anywhere in the bundled set, find all parent
revisions of the bundled changesets, for the changed files find the set of file
revisions in the parent changesets, and leave these out when bundling.
To avoid visiting all base revisions for each file, we start by collecting
revisions of all touched files in all parents and keep that in memory. That
will be O(files_changed_in_set * revisions) ... but while bundled changesets
might be complex internally and all could have two unique parents outside the
bundle, they rarely have a lot of parents outside the set. It is also rare to
change a lot of files ... but even then it seems reasonable to be able to hold
the names of all files and some hashes in memory.) (Alternatively, we could
keep all manifests around and look files up on demand ... but manifests might
be big and changes are usually sparse, so it seems more efficient to compute
the sets upfront.)
Testing with the current verbose logging shows where this change makes a
difference ... including a few "because parents, not linkrev" places where this
change will have a positive impact. The changes generally seem reasonable to me
...
TODO:
gather feedback
more testing
investigate the
test-remotefilelog-bgprefetch.t instability
reduce the current logging
diff --git a/hgext/remotefilelog/shallowbundle.py
b/hgext/remotefilelog/shallowbundle.py
--- a/hgext/remotefilelog/shallowbundle.py
+++ b/hgext/remotefilelog/shallowbundle.py
@@ -71,7 +71,8 @@ class shallowcg1packer(changegroup.cgpac
try:
linknodes, commonrevs, source = args
except ValueError:
-commonrevs, source, mfdicts, fastpathlinkrev, fnodes, clrevs = args
+(commonrevs, source, mfdicts, fastpathlinkrev, fnodes, clrevs,
+ setparents) = args
if shallowutil.isenabled(self._repo):
repo = self._repo
if isinstance(repo, bundlerepo.bundlerepository):
diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -931,6 +931,9 @@ class cgpacker(object):
clrevorder = clstate[b'clrevorder']
manifests = clstate[b'manifests']
changedfiles = clstate[b'changedfiles']
+setparents = sorted(set(clstate[b'parentrevs'])
+.difference(clstate[b'changerevs'])
+.difference([-1]))
# We need to make sure that the linkrev in the changegroup refers to
# the first changeset that introduced the manifest or file revision.
@@ -1005,6 +1008,7 @@ class cgpacker(object):
fastpathlinkrev,
fnodes,
clrevs,
+setparents,
)
for path, deltas in it:
@@ -1044,12 +1048,16 @@ class cgpacker(object):
mfl = self._repo.manifestlog
changedfiles = set()
clrevtomanifestrev = {}
+changerevs = set()
+parentrevs = set()
state = {
b'clrevorder': clrevorder,
b'manifests': manifests,
b'changedfiles': changedfiles,
b'clrevtomanifestrev': clrevtomanifestrev,
+b'changerevs': changerevs,
+b'parentrevs': parentrevs,
}
if not (generate or self._ellipses):
@@ -1106,6 +1114,9 @@ class cgpacker(object):