Re: [PATCH 1 of 5 RFC] rust: pure Rust lazyancestors iterator
On Sun, 30 Sep 2018 15:47:43 +0200, Georges Racinet wrote: > On 9/30/18 5:46 AM, Yuya Nishihara wrote: > > On Fri, 28 Sep 2018 15:31:08 +0200, Georges Racinet wrote: > >> # HG changeset patch > >> # User Georges Racinet > >> # Date 1538060596 -7200 > >> # Thu Sep 27 17:03:16 2018 +0200 > >> # Node ID d8c9571755a64e1fc3429587dfd3949b9862eceb > >> # Parent d3d4b4b5f725124ef9e93cf74d779a7a05aa11b7 > >> # EXP-Topic rustancestors-rfc > >> rust: pure Rust lazyancestors iterator > > First, thanks for tackling on the oxidization plain. I've quickly scanned > > a first few patches, and I'm generally positive to these. > Thanks for your interest and the positive feedback ! > > > > Here's random thoughts regarding the project/module layout: > > > > - place pure Rust code under rust/ ? > > > >I think that's the intention why we have the rust/ tree. Perhaps, the > >pure Rust modules can be packaged into a crate (e.g. hg-core), and the > >Rust FFI part will depend on it (e.g. hg-ffi). > Ah, I was thinking that the rust/ tree was dedicated to the executable. > > > >And the compiled CPython module (and maybe source) will be placed at > >mercurial/rust. > > > > - HGMODULEPOLICY=rust > > > >Perhaps, we'll want a separate policy to enable the Rust bindings. > >Most of the unimplemented functions will be forwarded to cext, just like > >the cffi -> pure fallback. > I have yet to understand a bit more how way policy build and runtime > works, but this sounds nice too. Would you see this as part of the > patchset or in a subsequent one ? That can be addressed later. But my concern is that this patch series hooks into the cext sources. I'm afraid of doing that because the cext is, say, uneasy to review, whereas it should be stable. I hope the Rust cext part will be in a separate module even if it partially depends on the current CPython extension. > >> +#[inline] > >> +fn conditionally_push_rev( self, rev: i32) { > >> +if self.stoprev <= rev && !self.seen.contains() { > >> +self.seen.insert(rev); > >> +self.visit.push(rev); > > We could eliminate the seen set by skipping duplicated revisions in next(). > > It's theoretically slower and was actually slower in Python, but might be > > worth trying in Rust. > > What would be the procedure for submission here ? Maybe as a subsequent > patch, so that performance comparison would be easier to make, or do you > suggest we'd go in that direction right away ? That's just an idea. No need to try out right way. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 5 RFC] rust: pure Rust lazyancestors iterator
On 9/30/18 5:46 AM, Yuya Nishihara wrote: > On Fri, 28 Sep 2018 15:31:08 +0200, Georges Racinet wrote: >> # HG changeset patch >> # User Georges Racinet >> # Date 1538060596 -7200 >> # Thu Sep 27 17:03:16 2018 +0200 >> # Node ID d8c9571755a64e1fc3429587dfd3949b9862eceb >> # Parent d3d4b4b5f725124ef9e93cf74d779a7a05aa11b7 >> # EXP-Topic rustancestors-rfc >> rust: pure Rust lazyancestors iterator > First, thanks for tackling on the oxidization plain. I've quickly scanned > a first few patches, and I'm generally positive to these. Thanks for your interest and the positive feedback ! > > Here's random thoughts regarding the project/module layout: > > - place pure Rust code under rust/ ? > >I think that's the intention why we have the rust/ tree. Perhaps, the >pure Rust modules can be packaged into a crate (e.g. hg-core), and the >Rust FFI part will depend on it (e.g. hg-ffi). Ah, I was thinking that the rust/ tree was dedicated to the executable. > >And the compiled CPython module (and maybe source) will be placed at >mercurial/rust. > > - HGMODULEPOLICY=rust > >Perhaps, we'll want a separate policy to enable the Rust bindings. >Most of the unimplemented functions will be forwarded to cext, just like >the cffi -> pure fallback. I have yet to understand a bit more how way policy build and runtime works, but this sounds nice too. Would you see this as part of the patchset or in a subsequent one ? > >> +impl AncestorsIterator { >> +/// Constructor. >> +/// >> +/// We actually expect initrevs to be i64, and we coerce them to the >> +/// i32 of our internal representations. The reason is that from C >> +/// code, they actually come as such. >> +/// Here it would be better to take any iterable over our internal >> +/// revision numbers, and have the conversion be made from the caller. >> +pub fn new( >> +graph: G, >> +initrevs: , >> +stoprev: i64, >> +inclusive: bool, >> +) -> Self { >> +let stop32 = stoprev as i32; >> +// because in iteration, contrary to what happens in ancestor.py, we >> +// compare with stoprev before pushing, we have to prefilter in the >> +// constructor too. >> +let deref_filter = initrevs.iter() >> +.map(|| x as i32).filter(|x| stop32 <= *x); > Nit: I think i64->i32 conversion should be done by the caller. The initrevs > argument can be IntoIterator instead. Yes, that's what I had in mind with the comment. I also have to get rid of these i64 to support 32 bit architectures. > >> +#[inline] >> +fn conditionally_push_rev( self, rev: i32) { >> +if self.stoprev <= rev && !self.seen.contains() { >> +self.seen.insert(rev); >> +self.visit.push(rev); > We could eliminate the seen set by skipping duplicated revisions in next(). > It's theoretically slower and was actually slower in Python, but might be > worth trying in Rust. What would be the procedure for submission here ? Maybe as a subsequent patch, so that performance comparison would be easier to make, or do you suggest we'd go in that direction right away ? The seen set can also be used by a future (although I do have experimental code) Rust implementation of __contains__, but this would have positive impact on performance only if the lazyiterator object gets used for iteration (growing a seen set) and then for membership (reusing the existing seen set). I suppose this doesn't happen a lot. Regards, -- Georges Racinet Anybox SAS, http://anybox.fr Téléphone: +33 6 51 32 07 27 GPG: B59E 22AB B842 CAED 77F7 7A7F C34F A519 33AB 0A35, sur serveurs publics ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 5 RFC] rust: pure Rust lazyancestors iterator
On Fri, 28 Sep 2018 15:31:08 +0200, Georges Racinet wrote: > # HG changeset patch > # User Georges Racinet > # Date 1538060596 -7200 > # Thu Sep 27 17:03:16 2018 +0200 > # Node ID d8c9571755a64e1fc3429587dfd3949b9862eceb > # Parent d3d4b4b5f725124ef9e93cf74d779a7a05aa11b7 > # EXP-Topic rustancestors-rfc > rust: pure Rust lazyancestors iterator First, thanks for tackling on the oxidization plain. I've quickly scanned a first few patches, and I'm generally positive to these. Here's random thoughts regarding the project/module layout: - place pure Rust code under rust/ ? I think that's the intention why we have the rust/ tree. Perhaps, the pure Rust modules can be packaged into a crate (e.g. hg-core), and the Rust FFI part will depend on it (e.g. hg-ffi). And the compiled CPython module (and maybe source) will be placed at mercurial/rust. - HGMODULEPOLICY=rust Perhaps, we'll want a separate policy to enable the Rust bindings. Most of the unimplemented functions will be forwarded to cext, just like the cffi -> pure fallback. > +impl AncestorsIterator { > +/// Constructor. > +/// > +/// We actually expect initrevs to be i64, and we coerce them to the > +/// i32 of our internal representations. The reason is that from C > +/// code, they actually come as such. > +/// Here it would be better to take any iterable over our internal > +/// revision numbers, and have the conversion be made from the caller. > +pub fn new( > +graph: G, > +initrevs: , > +stoprev: i64, > +inclusive: bool, > +) -> Self { > +let stop32 = stoprev as i32; > +// because in iteration, contrary to what happens in ancestor.py, we > +// compare with stoprev before pushing, we have to prefilter in the > +// constructor too. > +let deref_filter = initrevs.iter() > +.map(|| x as i32).filter(|x| stop32 <= *x); Nit: I think i64->i32 conversion should be done by the caller. The initrevs argument can be IntoIterator instead. > +#[inline] > +fn conditionally_push_rev( self, rev: i32) { > +if self.stoprev <= rev && !self.seen.contains() { > +self.seen.insert(rev); > +self.visit.push(rev); We could eliminate the seen set by skipping duplicated revisions in next(). It's theoretically slower and was actually slower in Python, but might be worth trying in Rust. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 5 RFC] rust: pure Rust lazyancestors iterator
On 9/28/18 3:31 PM, Georges Racinet wrote: > # HG changeset patch > # User Georges Racinet > # Date 1538060596 -7200 > # Thu Sep 27 17:03:16 2018 +0200 > # Node ID d8c9571755a64e1fc3429587dfd3949b9862eceb > # Parent d3d4b4b5f725124ef9e93cf74d779a7a05aa11b7 > # EXP-Topic rustancestors-rfc > rust: pure Rust lazyancestors iterator Hi, since this is my first attempt to contribute to Mercurial, I feel it's appropriate to introduce myself. I've been using Mercurial professionally for more than ten years, mostly within small teams of developers, so I guess it's time to give back to the community and thank everyone that's been working hard to make it happen. In the past few years, I've grown a big fondness for the Rust programming language, and been looking for opportunities to use it besides some personal toy projects. I'm aware that there's probably much to be said about this patch series, at least with code organization, choice of names etc. I read the general guidelines, tried and respect them, but it can't be perfect. I'm looking forward to comments on this RFC, especially from the authors of the OxidationPlan wiki page, hoping the approach taken in this patch series would fit enough with that. Regards, -- Georges Racinet Anybox SAS, http://anybox.fr Téléphone: +33 6 51 32 07 27 GPG: B59E 22AB B842 CAED 77F7 7A7F C34F A519 33AB 0A35, sur serveurs publics ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 5 RFC] rust: pure Rust lazyancestors iterator
# HG changeset patch # User Georges Racinet # Date 1538060596 -7200 # Thu Sep 27 17:03:16 2018 +0200 # Node ID d8c9571755a64e1fc3429587dfd3949b9862eceb # Parent d3d4b4b5f725124ef9e93cf74d779a7a05aa11b7 # EXP-Topic rustancestors-rfc rust: pure Rust lazyancestors iterator This is the first of a patch series aiming to provide an alternative implementation in the Rust programming language of the _lazyancestorsiter from the ancestor module. This iterator has been brought to our attention by the people at Octobus, as a potential good candidate for incremental "oxydation" (rewriting in Rust), because it has shown performance issues lately and it merely deals with ints (revision numbers) obtained by calling the index, whih should be directly callable from Rust code, being itself implemented as a C extension. The idea behind this series is to provide a minimal example of Rust code collaborating with existing C and Python code. To open the way to gradually rewriting more of Mercurial's python code in Rust, without being forced to pay a large initial cost of rewriting the existing fast core into Rust. This patch does not introduce any bindings to other Mercurial code yet, but someone with a rustc/cargo installation may chdir into mercurial/rust and run the tests by issuing: cargo test The algorithm is a bit simplified (see details in docstrings), and at its simplest becomes rather trivial, showcasing that Rust has batteries included too: BinaryHeap, the Rust analog of Python's heapq does actually all the work. The implementation can be further optimized and be made more idiomatic Rust. diff -r d3d4b4b5f725 -r d8c9571755a6 mercurial/rust/Cargo.lock --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/mercurial/rust/Cargo.lock Thu Sep 27 17:03:16 2018 +0200 @@ -0,0 +1,4 @@ +[[package]] +name = "hgancestors" +version = "0.1.0" + diff -r d3d4b4b5f725 -r d8c9571755a6 mercurial/rust/Cargo.toml --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/mercurial/rust/Cargo.toml Thu Sep 27 17:03:16 2018 +0200 @@ -0,0 +1,4 @@ +[package] +name = "hgancestors" +version = "0.1.0" +authors = ["Georges Racinet "] diff -r d3d4b4b5f725 -r d8c9571755a6 mercurial/rust/rustfmt.toml --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/mercurial/rust/rustfmt.toml Thu Sep 27 17:03:16 2018 +0200 @@ -0,0 +1,3 @@ +max_width = 79 +wrap_comments = true +error_on_line_overflow = true diff -r d3d4b4b5f725 -r d8c9571755a6 mercurial/rust/src/ancestors.rs --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/mercurial/rust/src/ancestors.rs Thu Sep 27 17:03:16 2018 +0200 @@ -0,0 +1,203 @@ +// ancestors.rs +// +// Copyright 2018 Georges Racinet +// +// This software may be used and distributed according to the terms of the +// GNU General Public License version 2 or any later version. + +//! Rust versions of generic DAG ancestors algorithms for Mercurial + +use std::iter::Iterator; +use std::collections::{BinaryHeap, HashSet}; + +/// The simplest expression of what we need of Mercurial DAGs. +/// +/// As noted in revlog.c, revision numbers are actually encoded in +/// 4 bytes, and are liberally converted to ints, whence the i32. It would +/// probably be a good idea to make this trait generic over a suitable integer +/// type. +pub trait Graph { +fn parents(, i32) -> (i32, i32); +} + +/// Iterator over the ancestors of a given list of revisions +/// This is a generic type, defined and implemented for any Graph, so that +/// it's easy to +/// +/// - unit test in pure Rust +/// - bind to main Mercurial code, potentially in several ways and have these +/// bindings evolve over time +pub struct AncestorsIterator { +graph: G, +visit: BinaryHeap, +seen: HashSet, +stoprev: i32, +} + +impl AncestorsIterator { +/// Constructor. +/// +/// We actually expect initrevs to be i64, and we coerce them to the +/// i32 of our internal representations. The reason is that from C +/// code, they actually come as such. +/// Here it would be better to take any iterable over our internal +/// revision numbers, and have the conversion be made from the caller. +pub fn new( +graph: G, +initrevs: , +stoprev: i64, +inclusive: bool, +) -> Self { +let stop32 = stoprev as i32; +// because in iteration, contrary to what happens in ancestor.py, we +// compare with stoprev before pushing, we have to prefilter in the +// constructor too. +let deref_filter = initrevs.iter() +.map(|| x as i32).filter(|x| stop32 <= *x); + +if inclusive { +return AncestorsIterator { +visit: deref_filter.clone().collect(), +seen: deref_filter.collect(), +stoprev: stop32, +graph: graph, +}; +} +let mut this = AncestorsIterator { +visit: BinaryHeap::new(), +seen: HashSet::new(), +stoprev: stop32, +