D5370: rust: core implementation of missingancestors (no bindings)
gracinet abandoned this revision. gracinet marked an inline comment as done. gracinet added a comment. This Differential has been superseded by https://phab.mercurial-scm.org/D5414 through https://phab.mercurial-scm.org/D5417 REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D5370 To: gracinet, #hg-reviewers Cc: yuja, durin42, kevincox, mjpieters, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D5370: rust: core implementation of missingancestors (no bindings)
gracinet marked 11 inline comments as done. gracinet added a comment. @yuja trying to submit the v2 with phabsend instead of arcanist… hope it'll be linked properly INLINE COMMENTS > kevincox wrote in ancestors.rs:201 > Isn't this loop redundant with the retain call above? indeed, must have been a leftover > kevincox wrote in ancestors.rs:235 > I would prefer `let (p0, p1) = ...`. This makes it obvious exactly what data > you are getting. In the next version, I'm simply doing a for loop on `.iter().cloned()`, it's even better imho. > kevincox wrote in lib.rs:26 > You can add `.cloned()` to remove the references. I would be surprised if > this has poor code generation. > > As for HashSet vs bit array I would stick to HashSet for now. HashSet also > has the advantage of being sparse. I guess after the initial implementation > we could benchmark the two to see what is better. Thanks for the tip with `.cloned()` that's exactly what I was missing. Out of curiosity, I took a look at the generated optimized assembly code: zero cost abstraction, indeed. About HashSet, yes that's exactly what I'm thinking. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D5370 To: gracinet, #hg-reviewers Cc: yuja, durin42, kevincox, mjpieters, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D5370: rust: core implementation of missingancestors (no bindings)
yuja added a comment. > @yuja: do you mean one of those Differential Revisions of this system for each commit, sure I can do. I don't know how Phabricator is working, but probably yes. I meant we want https://phab.mercurial-scm.org/D5370, https://phab.mercurial-scm.org/D5371, and https://phab.mercurial-scm.org/D5372 instead of a folded https://phab.mercurial-scm.org/D5370. I think phabsend will do that way. > With respect to rust-cpython bindings, I'm currently waiting for feedback on https://github.com/dgrunwald/rust-cpython/issues/164 > Perhaps you'd have an idea ? No, but it looks interesting. I'll take a look this weekend. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D5370 To: gracinet, #hg-reviewers Cc: yuja, durin42, kevincox, mjpieters, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: D5370: rust: core implementation of missingancestors (no bindings)
> @yuja: do you mean one of those Differential Revisions of this system for > each commit, sure I can do. I don't know how Phabricator is working, but probably yes. I meant we want D5370, D5371, and D5372 instead of a folded D5370. I think phabsend will do that way. > With respect to rust-cpython bindings, I'm currently waiting for feedback > on https://github.com/dgrunwald/rust-cpython/issues/164 > Perhaps you'd have an idea ? No, but it looks interesting. I'll take a look this weekend. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D5370: rust: core implementation of missingancestors (no bindings)
gracinet added a comment. @yuja: do you mean one of those Differential Revisions of this system for each commit, sure I can do. With respect to rust-cpython bindings, I'm currently waiting for feedback on https://github.com/dgrunwald/rust-cpython/issues/164 Perhaps you'd have an idea ? Short summary: sporadic segfaults that I can for now reproduce only by running the whole test suite. That being said, I do have rust-cpython bindings for AncestorsIterator and MissingAncestors (plus the full lazyancestors class). Unless I'm mistaken, that means that together with the existing C implementations, there's a potential for a full native version of `mercurial/ancestor.py`. I can send them to the mailing-list or here. What would be the correct flag for that ? RFC ? I'm also working on a perfmissingancestors (started a few hours ago). For now it confirms measurements I'd observed earlier: about x5 performance boost. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D5370 To: gracinet, #hg-reviewers Cc: yuja, durin42, kevincox, mjpieters, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D5370: rust: core implementation of missingancestors (no bindings)
yuja added a comment. Quickly scanned, and looks generally good to me. > rust: iterator version of Graph.parents > rust: translation of missingancestors > rust: translated random test of missingancestors Can you send these as separate patches? > An alternative would have been to expose to Python > MissingAncestors but that would have meant > > - pollution of the release build used from Python, whereas we do it in this changeset within the tests submodule > - waiting on rust-cpython bindings to be ready or doing the cumbersome direct-ffi (more pollution with unsafe C code) Still I want some CPython interface to measure the perf win. Are there lots of work remaining to bring rust-cpython to us? > although one could argue that actually parents() should return an > array instead of a tuple, giving us a similar iterator for free (but on > references instead of values, unless we also use the arrayvec crate > could help). Notably, the current C-backed parents() internally uses an > array for communication with C code, so that currently, we'd get less memory > copy and less code using an array. I prefer changing parents() to return `[Revision; 2]`. Then, we can write a simple utility function that drops NULL_REVISION, `&[Revision; 2] -> &[Revision]`. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D5370 To: gracinet, #hg-reviewers Cc: yuja, durin42, kevincox, mjpieters, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: D5370: rust: core implementation of missingancestors (no bindings)
Quickly scanned, and looks generally good to me. > rust: iterator version of Graph.parents > rust: translation of missingancestors > rust: translated random test of missingancestors Can you send these as separate patches? > An alternative would have been to expose to Python > MissingAncestors but that would have meant > > - pollution of the release build used from Python, whereas we do it in > this changeset within the tests submodule > - waiting on rust-cpython bindings to be ready or doing the cumbersome > direct-ffi (more pollution with unsafe C code) Still I want some CPython interface to measure the perf win. Are there lots of work remaining to bring rust-cpython to us? > although one could argue that actually parents() should return an > array instead of a tuple, giving us a similar iterator for free (but on > references instead of values, unless we also use the arrayvec crate > could help). Notably, the current C-backed parents() internally uses an > array for communication with C code, so that currently, we'd get less memory > copy and less code using an array. I prefer changing parents() to return `[Revision; 2]`. Then, we can write a simple utility function that drops NULL_REVISION, `&[Revision; 2] -> &[Revision]`. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D5370: rust: core implementation of missingancestors (no bindings)
kevincox added a comment. I think sticking close to the python version makes sense for the initial version. Then improvements can be made in follow-ups. INLINE COMMENTS > gracinet wrote in lib.rs:26 > The reason I've preferred to implement it directly is that `into_iter()` > iterates on references, which I found an unnecessary overhead,. > > But maybe that's a case of premature optimization ? In the same vein, I don't > think `HashSet` is the best choice for a set of `i32`, unless it becomes > really big, but I don't know where the threshold would be, compared to, say, > a bit array. You can add `.cloned()` to remove the references. I would be surprised if this has poor code generation. As for HashSet vs bit array I would stick to HashSet for now. HashSet also has the advantage of being sparse. I guess after the initial implementation we could benchmark the two to see what is better. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D5370 To: gracinet, #hg-reviewers Cc: durin42, kevincox, mjpieters, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D5370: rust: core implementation of missingancestors (no bindings)
gracinet added a comment. I'll include your remark in the upcoming v2, thanks again ! INLINE COMMENTS > kevincox wrote in ancestors.rs:159 > Is always having an item in the set actually saving many corner cases? It > seems like you usually check for empty sets anyways. Not sure about this, so for now, I've been playing safe, ie, exactly as the Python version. > kevincox wrote in lib.rs:20 > I think the better solution here is to make `parents` return `[Revision; 2]` > (or `&[Revision]` and not return nulls. I agree, it was a pythonism to use a tuple, but it has impact on `AncestorsIterators`, as well as the implementation in `hg-direct-ffi`, and in more code I've not submitted yet, so I'd prefer to do that change independently in a follow-up. > kevincox wrote in lib.rs:26 > Would something like the following be simpler? > > fn parents_iter( > , > r: Revision, > ) -> Result, GraphError> { > let (p0, p1) = self.parents(r)?; > let iter = [p0, p1].into_iter() > .map(|r| if r == NULL_REVISION { None} else { Some(r) }); > Ok(iter) > } The reason I've preferred to implement it directly is that `into_iter()` iterates on references, which I found an unnecessary overhead,. But maybe that's a case of premature optimization ? In the same vein, I don't think `HashSet` is the best choice for a set of `i32`, unless it becomes really big, but I don't know where the threshold would be, compared to, say, a bit array. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D5370 To: gracinet, #hg-reviewers Cc: durin42, kevincox, mjpieters, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D5370: rust: core implementation of missingancestors (no bindings)
kevincox added a comment. Overall looks good to me. Just a couple of points. - Using random numbers for tests without a seed that is logged will create failures which are basically impossible to reproduce. - A lot of the comments are comparing to the python, I don't know how useful this is to most readers. INLINE COMMENTS > ancestors.rs:155 > +where > +I: IntoIterator, > +{ I prefer doing `bases: impl IntoIterator` instead of using a where clause. > ancestors.rs:157 > +{ > +let mut bset: HashSet = bases.into_iter().collect(); > +if bset.is_empty() { If this is going to be stored in the `bases` field I would just call it `bases`. This will also allow you to do `MissingAncestors { graph, bases }` if you prefer. > ancestors.rs:159 > +if bset.is_empty() { > +bset.insert(NULL_REVISION); > +} Is always having an item in the set actually saving many corner cases? It seems like you usually check for empty sets anyways. > ancestors.rs:170 > +0 => false, // shouldn't happen > +1 => *(self.bases.iter().next().unwrap()) != -1, > +_ => true, s/-1/NULL_REVISISON > ancestors.rs:195 > +) -> Result<(), GraphError> { > +// using difference() or intersection() would not satisfy borrow > rules > +revs.retain(|r| !self.bases.contains(r)); I don't think this comment is helpful to the reader. > ancestors.rs:201 > +revs.remove(); > +} > +if revs.is_empty() { Isn't this loop redundant with the retain call above? > ancestors.rs:219 > +// going all the way to the root > +let keepcount = revs.iter().filter(|| *r > start).count(); > + I find the pattern match and deref slightly confusing. I would prefer a `**` to make it obvious you are doing a double deref. > ancestors.rs:235 > +fn add_parents( self, rev: Revision) -> Result<(), GraphError> { > +let parents = self.graph.parents(rev)?; > +// No need to bother the set with inserting NULL_REVISION over and I would prefer `let (p0, p1) = ...`. This makes it obvious exactly what data you are getting. > ancestors.rs:262 > +let bases_visit = self.bases; > +let mut revs_as_set: HashSet = revs > +.into_iter() I would simply call it `revs`. > ancestors.rs:281 > +None => NULL_REVISION, > +}; > +let start = max(max_bases, max_revs); Replace the match with `.unwrap_or(NULL_REVISION)` > ancestors.rs:286 > +let mut missing: Vec = Vec::new(); > +for curr in (0..start + 1).map(|i| start - i) { > +if revs_visit.is_empty() { If an inclusive range use `0..=start`. > ancestors.rs:293 > +// another path > +// TODO optim: Rust's HashSet.remove returns a boolean > telling > +// if it happened. This will spare us one set lookup You may as well do this now. Just move the remove into the if. > ancestors.rs:309 > +missing.push(curr); > +revs_visit.remove(); > +true Do this remove in the if as well. > ancestors.rs:484 > +fn test_missing_bases() { > +let mut missanc = MissingAncestors::new(Stub, vec![5, 3, 1, 3]); > +let mut as_vec: Vec; Since you have used IntoIter you shouldn't need the `vec!`s. > ancestors.rs:486 > +let mut as_vec: Vec; > +as_vec = missanc.get_bases().iter().map(|| r).collect(); > +as_vec.sort(); Do this assignment on the same line as the declaration. > lib.rs:20 > + > +fn parents_iter( > +, I think the better solution here is to make `parents` return `[Revision; 2]` (or `&[Revision]` and not return nulls. > lib.rs:26 > +Ok(ParentsIterator::new(parents)) > +} > +} Would something like the following be simpler? fn parents_iter( , r: Revision, ) -> Result, GraphError> { let (p0, p1) = self.parents(r)?; let iter = [p0, p1].into_iter() .map(|r| if r == NULL_REVISION { None} else { Some(r) }); Ok(iter) } > lib.rs:36 > +/// Instead of NULL_REVISION, None is returned > +/// Here a macro would probably be more in order for performance > +impl ParentsIterator { I don't think this comment is necessary. Also this should be quite transparent to the optimizer. > lib.rs:52 > +self.cur += 1; > +match match self.cur { > +0 => self.parents.0, I would prefer putting the result in a variable and then doing the second match later. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D5370 To: gracinet, #hg-reviewers Cc: durin42, kevincox, mjpieters, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D5370: rust: core implementation of missingancestors (no bindings)
gracinet created this revision. Herald added subscribers: mercurial-devel, mjpieters, kevincox, durin42. Herald added a reviewer: hg-reviewers. REVISION SUMMARY rust: iterator version of Graph.parents This is handy for callers that want to simply do: for p in graph.parents_iter(rev) although one could argue that actually parents() should return an array instead of a tuple, giving us a similar iterator for free (but on references instead of values, unless we also use the arrayvec crate could help). Notably, the current C-backed parents() internally uses an array for communication with C code, so that currently, we'd get less memory copy and less code using an array. rust: translation of missingancestors This is as direct as possible a translation of the ancestor.missingancestors Python class in pure Rust. The goal for this changeset is to make it easy to compare with the Python version. We also add to Python tests the cases that helped us develop and debug this implementation. Some possible optimizations are marked along the way as TODO comments rust: translated random test of missingancestors An alternative would have been to expose to Python MissingAncestors but that would have meant - pollution of the release build used from Python, whereas we do it in this changeset within the tests submodule - waiting on rust-cpython bindings to be ready or doing the cumbersome direct-ffi (more pollution with unsafe C code) REPOSITORY rHG Mercurial BRANCH default REVISION DETAIL https://phab.mercurial-scm.org/D5370 AFFECTED FILES rust/Cargo.lock rust/hg-core/Cargo.toml rust/hg-core/src/ancestors.rs rust/hg-core/src/lib.rs tests/test-ancestor.py tests/test-ancestor.py.out CHANGE DETAILS diff --git a/tests/test-ancestor.py.out b/tests/test-ancestor.py.out --- a/tests/test-ancestor.py.out +++ b/tests/test-ancestor.py.out @@ -1,3 +1,17 @@ +% removeancestorsfrom(), example 1 +remaining (sorted): [5, 6, 8, 9] +% removeancestorsfrom(), example 2 +remaining (sorted): [11, 12, 13, 14] +% removeancestorsfrom(), example 3 +remaining (sorted): [3, 5] +% missingancestors(), example 1 +return [3, 7, 11] +% missingancestors(), example 2 +return [5, 10] +% missingancestors(), example 3 +return [3, 6, 9, 11] +% removeancestorsfrom(), bigger graph +Ok % lazy ancestor set for [], stoprev = 0, inclusive = False membership: [] iteration: [] diff --git a/tests/test-ancestor.py b/tests/test-ancestor.py --- a/tests/test-ancestor.py +++ b/tests/test-ancestor.py @@ -182,6 +182,64 @@ 5: [4, -1], 6: [4, -1], 7: [4, -1], 8: [-1, -1], 9: [6, 7], 10: [5, -1], 11: [3, 7], 12: [9, -1], 13: [8, -1]} +def test_missingancestors_explicit(): +"""A few explicit cases, easier to check for catching errors in refactors. + +The bigger graph at the end has been produced by the random generator +above, and we have some evidence that the other tests don't cover it. +""" +for i, (bases, revs) in enumerate((({1, 2, 3, 4, 7}, set(xrange(10))), + ({10}, set({11, 12, 13, 14})), + ({7}, set({1, 2, 3, 4, 5})), + )): +print("%% removeancestorsfrom(), example %d" % (i + 1)) +missanc = ancestor.incrementalmissingancestors(graph.get, bases) +missanc.removeancestorsfrom(revs) +print("remaining (sorted): %s" % sorted(list(revs))) + +for i, (bases, revs) in enumerate((({10}, {11}), + ({11}, {10}), + ({7}, {9, 11}), + )): +print("%% missingancestors(), example %d" % (i + 1)) +missanc = ancestor.incrementalmissingancestors(graph.get, bases) +print("return %s" % missanc.missingancestors(revs)) + +print("% removeancestorsfrom(), bigger graph") +vecgraph = [ +[-1, -1], [0, -1], [1, 0], [2, 1], [3, -1], [4, -1], [5, 1], +[2, -1], [7, -1], [8, -1], [9, -1], [10, 1], [3, -1], [12, -1], +[13, -1], [14, -1], [4, -1], [16, -1], [17, -1], [18, -1], +[19, 11], [20, -1], [21, -1], [22, -1], [23, -1], [2, -1], +[3, -1], [26, 24], [27, -1], [28, -1], [12, -1], [1, -1], [1, 9], +[32, -1], [33, -1], [34, 31], [35, -1], [36, 26], [37, -1], +[38, -1], [39, -1], [40, -1], [41, -1], [42, 26], [0, -1], +[44, -1], [45, 4], [40, -1], [47, -1], [36, 0], [49, -1], +[-1, -1], [51, -1], [52, -1], [53, -1], [14, -1], +[55, -1], [15, -1], [23, -1], [58, -1], [59, -1], [2, -1], +[61, 59], [62, -1], [63, -1], [-1, -1], [65, -1], +[66, -1], [67, -1], [68, -1], [37, 28], [69, 25], +[71, -1], [72, -1], [50, 2], [74, -1], [12, -1], +[18, -1], [77, -1], [78, -1], [79, -1], [43, 33], +[81, -1], [82, -1], [83, -1], [84, 45], [85, -1], +[86, -1], [-1, -1],