D1351: changegroup: use any node, not min(), in treemanifest's generatemanifests
This revision was automatically updated to reflect the committed changes. Closed by commit rHGd80380ba8e7d: changegroup: use any node, not min(), in treemanifest's generatemanifests (authored by spectral, committed by ). REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D1351?vs=3388&id=3389 REVISION DETAIL https://phab.mercurial-scm.org/D1351 AFFECTED FILES mercurial/changegroup.py CHANGE DETAILS diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py --- a/mercurial/changegroup.py +++ b/mercurial/changegroup.py @@ -692,7 +692,7 @@ # Callback for the manifest, used to collect linkrevs for filelog # revisions. # Returns the linkrev node (collected in lookupcl). -def makelookupmflinknode(dir): +def makelookupmflinknode(dir, nodes): if fastpathlinkrev: assert not dir return mfs.__getitem__ @@ -713,7 +713,7 @@ the client before you can trust the list of files and treemanifests to send. """ -clnode = tmfnodes[dir][x] +clnode = nodes[x] mdata = mfl.get(dir, x).readfast(shallow=True) for p, n, fl in mdata.iterentries(): if fl == 't': # subdirectory manifest @@ -733,15 +733,13 @@ size = 0 while tmfnodes: -dir = min(tmfnodes) -nodes = tmfnodes[dir] +dir, nodes = tmfnodes.popitem() prunednodes = self.prune(dirlog(dir), nodes, commonrevs) if not dir or prunednodes: for x in self._packmanifests(dir, prunednodes, - makelookupmflinknode(dir)): + makelookupmflinknode(dir, nodes)): size += len(x) yield x -del tmfnodes[dir] self._verbosenote(_('%8.i (manifests)\n') % size) yield self._manifestsdone() To: spectral, #hg-reviewers, indygreg Cc: indygreg, martinvonz, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D1351: changegroup: use any node, not min(), in treemanifest's generatemanifests
spectral marked 4 inline comments as done. spectral added a comment. run-tests.py found no issues with this version, my manual testing (using PYTHONHASHSEED to adjust the ordering) also encountered no issues INLINE COMMENTS > indygreg wrote in changegroup.py:738 > Can we use ``dict.popitem()`` instead? That will pop a random key-value pair. > I just don't know if the key needs to remain in the dict until later in the > function... profiling seems to indicate this is a little faster as well as being a bit cleaner, so thanks for making me check :) REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1351 To: spectral, #hg-reviewers Cc: indygreg, martinvonz, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D1351: changegroup: use any node, not min(), in treemanifest's generatemanifests
spectral updated this revision to Diff 3388. REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D1351?vs=3358&id=3388 REVISION DETAIL https://phab.mercurial-scm.org/D1351 AFFECTED FILES mercurial/changegroup.py CHANGE DETAILS diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py --- a/mercurial/changegroup.py +++ b/mercurial/changegroup.py @@ -692,7 +692,7 @@ # Callback for the manifest, used to collect linkrevs for filelog # revisions. # Returns the linkrev node (collected in lookupcl). -def makelookupmflinknode(dir): +def makelookupmflinknode(dir, nodes): if fastpathlinkrev: assert not dir return mfs.__getitem__ @@ -713,7 +713,7 @@ the client before you can trust the list of files and treemanifests to send. """ -clnode = tmfnodes[dir][x] +clnode = nodes[x] mdata = mfl.get(dir, x).readfast(shallow=True) for p, n, fl in mdata.iterentries(): if fl == 't': # subdirectory manifest @@ -733,15 +733,13 @@ size = 0 while tmfnodes: -dir = min(tmfnodes) -nodes = tmfnodes[dir] +dir, nodes = tmfnodes.popitem() prunednodes = self.prune(dirlog(dir), nodes, commonrevs) if not dir or prunednodes: for x in self._packmanifests(dir, prunednodes, - makelookupmflinknode(dir)): + makelookupmflinknode(dir, nodes)): size += len(x) yield x -del tmfnodes[dir] self._verbosenote(_('%8.i (manifests)\n') % size) yield self._manifestsdone() To: spectral, #hg-reviewers Cc: indygreg, martinvonz, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D1351: changegroup: use any node, not min(), in treemanifest's generatemanifests
martinvonz added inline comments. INLINE COMMENTS > spectral wrote in changegroup.py:738 > I think that it needs to remain, makelookupmflinknode(dir) relies on it > (L716). I haven't attempted to popitem and pass that to makelookupmflinknode > instead, let me try that out now... Oh, yuck. Can perhaps pass "nodes" into makelookupmflinknode() along with "dir"? It will still update it on line 721, but at least one of those unclear sideeffects go away. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1351 To: spectral, #hg-reviewers Cc: indygreg, martinvonz, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D1351: changegroup: use any node, not min(), in treemanifest's generatemanifests
spectral added inline comments. INLINE COMMENTS > indygreg wrote in changegroup.py:738 > Can we use ``dict.popitem()`` instead? That will pop a random key-value pair. > I just don't know if the key needs to remain in the dict until later in the > function... I think that it needs to remain, makelookupmflinknode(dir) relies on it (L716). I haven't attempted to popitem and pass that to makelookupmflinknode instead, let me try that out now... REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1351 To: spectral, #hg-reviewers Cc: indygreg, martinvonz, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D1351: changegroup: use any node, not min(), in treemanifest's generatemanifests
martinvonz added a comment. In https://phab.mercurial-scm.org/D1351#22473, @indygreg wrote: > Should I be concerned about the lack of test fallout? This new behavior is non-deterministic. Do we not have testing for this code or is existing testing not low-level enough to uncover behavior changes resulting from this change? I'm not worried about it. It only makes a difference for treemanifests, and the change is just about the order in changegroup and (therefore) the order in which we write revlogs. We don't have much testing of treemanifests and we rarely check the exact format in a changegroup. We do print out a debug message on line 482 that could be used to see a difference before and after this patch, but I just checked that test-treemanifest.t doesn't pass --verbose. Still, I wouldn't mind if some Facebooker tried to run their treemanifest tests (in hgexperimental) against a version with this patch applied. INLINE COMMENTS > indygreg wrote in changegroup.py:738 > Can we use ``dict.popitem()`` instead? That will pop a random key-value pair. > I just don't know if the key needs to remain in the dict until later in the > function... Good idea. I'm pretty sure that should be safe (but perhaps tests will prove me wrong). REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1351 To: spectral, #hg-reviewers Cc: indygreg, martinvonz, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D1351: changegroup: use any node, not min(), in treemanifest's generatemanifests
indygreg added a comment. Should I be concerned about the lack of test fallout? This new behavior is non-deterministic. Do we not have testing for this code or is existing testing not low-level enough to uncover behavior changes resulting from this change? INLINE COMMENTS > changegroup.py:738 > +# element. > +dir = next(iter(tmfnodes)) > nodes = tmfnodes[dir] Can we use ``dict.popitem()`` instead? That will pop a random key-value pair. I just don't know if the key needs to remain in the dict until later in the function... REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1351 To: spectral, #hg-reviewers Cc: indygreg, martinvonz, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D1351: changegroup: use any node, not min(), in treemanifest's generatemanifests
martinvonz added a comment. As the author of that line of code, this looks good to me. I'll let someone else queue it since I was involved in the internal discussion and I don't want to feel like I'm queuing my own code (even though I wasn't involved in the actual coding). REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1351 To: spectral, #hg-reviewers Cc: martinvonz, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D1351: changegroup: use any node, not min(), in treemanifest's generatemanifests
spectral created this revision. Herald added a subscriber: mercurial-devel. Herald added a reviewer: hg-reviewers. REVISION SUMMARY This is fixing quadratic behavior, which is probably not noticeable in the common case, but if a very large directory gets added here, it can get pretty bad. This was noticed because we had some pushes that spent >25s in changegroup generation calling min() here, according to profiling. The original reasoning for min() being used in https://phab.mercurial-scm.org/rHG829d369fc5a89f4c290013271c6e5dff2aea63de was that, at that point in the series, we were adding almost everything to tmfnodes during the first iteration through the loop , so we needed to avoid sending child directories before parents. Later changes made it so that the child directories were added only when we visited the parent directory (not all of them on the first iteration), so this is no longer necessary - there won't be any child directories in tmfnodes before the parents have been sent. This does mean that the manifests are now exchanged unordered, whereas previously we would essentially do [a, b, b/c, b/c/d, e], we now can send a, b, and e in any order; b/c must still follow b, and b/c/d must still follow b/c. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D1351 AFFECTED FILES mercurial/changegroup.py CHANGE DETAILS diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py --- a/mercurial/changegroup.py +++ b/mercurial/changegroup.py @@ -733,7 +733,9 @@ size = 0 while tmfnodes: -dir = min(tmfnodes) +# Pick some element from tmfnodes, this is not necessarily the 'min' +# element. +dir = next(iter(tmfnodes)) nodes = tmfnodes[dir] prunednodes = self.prune(dirlog(dir), nodes, commonrevs) if not dir or prunednodes: To: spectral, #hg-reviewers Cc: mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel