Re: [PATCH 1 of 5 RFC] rust: pure Rust lazyancestors iterator

2018-09-30 Thread Yuya Nishihara
On Sun, 30 Sep 2018 15:47:43 +0200, Georges Racinet wrote:
> On 9/30/18 5:46 AM, Yuya Nishihara wrote:
> > On Fri, 28 Sep 2018 15:31:08 +0200, Georges Racinet wrote:
> >> # HG changeset patch
> >> # User Georges Racinet 
> >> # Date 1538060596 -7200
> >> #  Thu Sep 27 17:03:16 2018 +0200
> >> # Node ID d8c9571755a64e1fc3429587dfd3949b9862eceb
> >> # Parent  d3d4b4b5f725124ef9e93cf74d779a7a05aa11b7
> >> # EXP-Topic rustancestors-rfc
> >> rust: pure Rust lazyancestors iterator
> > First, thanks for tackling on the oxidization plain. I've quickly scanned
> > a first few patches, and I'm generally positive to these.
> Thanks for your interest and the positive feedback !
> >
> > Here's random thoughts regarding the project/module layout:
> >
> >  - place pure Rust code under rust/ ?
> >
> >I think that's the intention why we have the rust/ tree. Perhaps, the
> >pure Rust modules can be packaged into a crate (e.g. hg-core), and the
> >Rust FFI part will depend on it (e.g. hg-ffi).
> Ah, I was thinking that the rust/ tree was dedicated to the executable.
> >
> >And the compiled CPython module (and maybe source) will be placed at
> >mercurial/rust.
> >
> >  - HGMODULEPOLICY=rust
> >
> >Perhaps, we'll want a separate policy to enable the Rust bindings.
> >Most of the unimplemented functions will be forwarded to cext, just like
> >the cffi -> pure fallback.
> I have yet to understand a bit more how way policy build and runtime
> works, but this sounds nice too. Would you see this as part of the
> patchset or in a subsequent one ?

That can be addressed later. But my concern is that this patch series hooks
into the cext sources. I'm afraid of doing that because the cext is, say,
uneasy to review, whereas it should be stable.

I hope the Rust cext part will be in a separate module even if it partially
depends on the current CPython extension.

> >> +#[inline]
> >> +fn conditionally_push_rev( self, rev: i32) {
> >> +if self.stoprev <= rev && !self.seen.contains() {
> >> +self.seen.insert(rev);
> >> +self.visit.push(rev);
> > We could eliminate the seen set by skipping duplicated revisions in next().
> > It's theoretically slower and was actually slower in Python, but might be
> > worth trying in Rust.
> 
> What would be the procedure for submission here ? Maybe as a subsequent
> patch, so that performance comparison would be easier to make, or do you
> suggest we'd go in that direction right away ?

That's just an idea. No need to try out right way.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5 RFC] rust: pure Rust lazyancestors iterator

2018-09-30 Thread Georges Racinet
On 9/30/18 5:46 AM, Yuya Nishihara wrote:
> On Fri, 28 Sep 2018 15:31:08 +0200, Georges Racinet wrote:
>> # HG changeset patch
>> # User Georges Racinet 
>> # Date 1538060596 -7200
>> #  Thu Sep 27 17:03:16 2018 +0200
>> # Node ID d8c9571755a64e1fc3429587dfd3949b9862eceb
>> # Parent  d3d4b4b5f725124ef9e93cf74d779a7a05aa11b7
>> # EXP-Topic rustancestors-rfc
>> rust: pure Rust lazyancestors iterator
> First, thanks for tackling on the oxidization plain. I've quickly scanned
> a first few patches, and I'm generally positive to these.
Thanks for your interest and the positive feedback !
>
> Here's random thoughts regarding the project/module layout:
>
>  - place pure Rust code under rust/ ?
>
>I think that's the intention why we have the rust/ tree. Perhaps, the
>pure Rust modules can be packaged into a crate (e.g. hg-core), and the
>Rust FFI part will depend on it (e.g. hg-ffi).
Ah, I was thinking that the rust/ tree was dedicated to the executable.
>
>And the compiled CPython module (and maybe source) will be placed at
>mercurial/rust.
>
>  - HGMODULEPOLICY=rust
>
>Perhaps, we'll want a separate policy to enable the Rust bindings.
>Most of the unimplemented functions will be forwarded to cext, just like
>the cffi -> pure fallback.
I have yet to understand a bit more how way policy build and runtime
works, but this sounds nice too. Would you see this as part of the
patchset or in a subsequent one ?
>
>> +impl AncestorsIterator {
>> +/// Constructor.
>> +///
>> +/// We actually expect initrevs to be i64, and we coerce them to the
>> +/// i32 of our internal representations. The reason is that from C
>> +/// code, they actually come as such.
>> +/// Here it would be better to take any iterable over our internal
>> +/// revision numbers, and have the conversion be made from the caller.
>> +pub fn new(
>> +graph: G,
>> +initrevs: ,
>> +stoprev: i64,
>> +inclusive: bool,
>> +) -> Self {
>> +let stop32 = stoprev as i32;
>> +// because in iteration, contrary to what happens in ancestor.py, we
>> +// compare with stoprev before pushing, we have to prefilter in the
>> +// constructor too.
>> +let deref_filter = initrevs.iter()
>> +.map(|| x as i32).filter(|x| stop32 <= *x);
> Nit: I think i64->i32 conversion should be done by the caller. The initrevs
> argument can be IntoIterator instead.
Yes, that's what I had in mind with the comment. I also have to get rid
of these i64 to support 32 bit architectures.
>
>> +#[inline]
>> +fn conditionally_push_rev( self, rev: i32) {
>> +if self.stoprev <= rev && !self.seen.contains() {
>> +self.seen.insert(rev);
>> +self.visit.push(rev);
> We could eliminate the seen set by skipping duplicated revisions in next().
> It's theoretically slower and was actually slower in Python, but might be
> worth trying in Rust.

What would be the procedure for submission here ? Maybe as a subsequent
patch, so that performance comparison would be easier to make, or do you
suggest we'd go in that direction right away ?

The seen set can also be used by a future (although I do have
experimental code) Rust implementation of __contains__, but this would
have positive impact on performance only if the lazyiterator object gets
used for iteration (growing a seen set) and then for membership (reusing
the existing seen set). I suppose this doesn't happen a lot.

Regards,

-- 
Georges Racinet
Anybox SAS, http://anybox.fr
Téléphone: +33 6 51 32 07 27
GPG: B59E 22AB B842 CAED 77F7 7A7F C34F A519 33AB 0A35, sur serveurs publics


___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5 RFC] rust: pure Rust lazyancestors iterator

2018-09-29 Thread Yuya Nishihara
On Fri, 28 Sep 2018 15:31:08 +0200, Georges Racinet wrote:
> # HG changeset patch
> # User Georges Racinet 
> # Date 1538060596 -7200
> #  Thu Sep 27 17:03:16 2018 +0200
> # Node ID d8c9571755a64e1fc3429587dfd3949b9862eceb
> # Parent  d3d4b4b5f725124ef9e93cf74d779a7a05aa11b7
> # EXP-Topic rustancestors-rfc
> rust: pure Rust lazyancestors iterator

First, thanks for tackling on the oxidization plain. I've quickly scanned
a first few patches, and I'm generally positive to these.

Here's random thoughts regarding the project/module layout:

 - place pure Rust code under rust/ ?

   I think that's the intention why we have the rust/ tree. Perhaps, the
   pure Rust modules can be packaged into a crate (e.g. hg-core), and the
   Rust FFI part will depend on it (e.g. hg-ffi).

   And the compiled CPython module (and maybe source) will be placed at
   mercurial/rust.

 - HGMODULEPOLICY=rust

   Perhaps, we'll want a separate policy to enable the Rust bindings.
   Most of the unimplemented functions will be forwarded to cext, just like
   the cffi -> pure fallback.

> +impl AncestorsIterator {
> +/// Constructor.
> +///
> +/// We actually expect initrevs to be i64, and we coerce them to the
> +/// i32 of our internal representations. The reason is that from C
> +/// code, they actually come as such.
> +/// Here it would be better to take any iterable over our internal
> +/// revision numbers, and have the conversion be made from the caller.
> +pub fn new(
> +graph: G,
> +initrevs: ,
> +stoprev: i64,
> +inclusive: bool,
> +) -> Self {
> +let stop32 = stoprev as i32;
> +// because in iteration, contrary to what happens in ancestor.py, we
> +// compare with stoprev before pushing, we have to prefilter in the
> +// constructor too.
> +let deref_filter = initrevs.iter()
> +.map(|| x as i32).filter(|x| stop32 <= *x);

Nit: I think i64->i32 conversion should be done by the caller. The initrevs
argument can be IntoIterator instead.

> +#[inline]
> +fn conditionally_push_rev( self, rev: i32) {
> +if self.stoprev <= rev && !self.seen.contains() {
> +self.seen.insert(rev);
> +self.visit.push(rev);

We could eliminate the seen set by skipping duplicated revisions in next().
It's theoretically slower and was actually slower in Python, but might be
worth trying in Rust.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 5 RFC] rust: pure Rust lazyancestors iterator

2018-09-28 Thread Georges Racinet
On 9/28/18 3:31 PM, Georges Racinet wrote:
> # HG changeset patch
> # User Georges Racinet 
> # Date 1538060596 -7200
> #  Thu Sep 27 17:03:16 2018 +0200
> # Node ID d8c9571755a64e1fc3429587dfd3949b9862eceb
> # Parent  d3d4b4b5f725124ef9e93cf74d779a7a05aa11b7
> # EXP-Topic rustancestors-rfc
> rust: pure Rust lazyancestors iterator

Hi,

since this is my first attempt to contribute to Mercurial, I feel it's
appropriate to introduce myself.

I've been using Mercurial professionally for more than ten years, mostly
within small teams of developers, so I guess it's time to give back to
the community and thank everyone that's been working hard to make it
happen. In the past few years, I've grown a big fondness for the Rust
programming language, and been looking for opportunities to use it
besides some personal toy projects.

I'm aware that there's probably much to be said about this patch series,
at least with code organization, choice of names etc. I read the general
guidelines, tried and respect them, but it can't be perfect.

I'm looking forward to comments on this RFC, especially from the authors
of the OxidationPlan wiki page, hoping the approach taken in this patch
series would fit enough with that.

Regards,

-- 
Georges Racinet
Anybox SAS, http://anybox.fr
Téléphone: +33 6 51 32 07 27
GPG: B59E 22AB B842 CAED 77F7 7A7F C34F A519 33AB 0A35, sur serveurs publics


___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 1 of 5 RFC] rust: pure Rust lazyancestors iterator

2018-09-28 Thread Georges Racinet
# HG changeset patch
# User Georges Racinet 
# Date 1538060596 -7200
#  Thu Sep 27 17:03:16 2018 +0200
# Node ID d8c9571755a64e1fc3429587dfd3949b9862eceb
# Parent  d3d4b4b5f725124ef9e93cf74d779a7a05aa11b7
# EXP-Topic rustancestors-rfc
rust: pure Rust lazyancestors iterator

This is the first of a patch series aiming to provide an
alternative implementation in the Rust programming language
of the _lazyancestorsiter from the ancestor module.

This iterator has been brought to our attention by the people at
Octobus, as a potential good candidate for incremental "oxydation"
(rewriting in Rust), because it has shown performance issues lately
and it merely deals with ints (revision numbers) obtained by calling
the index, whih should be directly callable from Rust code,
being itself implemented as a C extension.

The idea behind this series is to provide a minimal example of Rust code
collaborating with existing C and Python code. To open the way to gradually
rewriting more of Mercurial's python code in Rust, without being forced to pay
a large initial cost of rewriting the existing fast core into Rust.

This patch does not introduce any bindings to other Mercurial code
yet, but someone with a rustc/cargo installation may chdir into
mercurial/rust and run the tests by issuing:

   cargo test

The algorithm is a bit simplified (see details in docstrings),
and at its simplest becomes rather trivial, showcasing that Rust has
batteries included too: BinaryHeap, the Rust analog of Python's heapq
does actually all the work.

The implementation can be further optimized and be made more idiomatic
Rust.

diff -r d3d4b4b5f725 -r d8c9571755a6 mercurial/rust/Cargo.lock
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/mercurial/rust/Cargo.lock Thu Sep 27 17:03:16 2018 +0200
@@ -0,0 +1,4 @@
+[[package]]
+name = "hgancestors"
+version = "0.1.0"
+
diff -r d3d4b4b5f725 -r d8c9571755a6 mercurial/rust/Cargo.toml
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/mercurial/rust/Cargo.toml Thu Sep 27 17:03:16 2018 +0200
@@ -0,0 +1,4 @@
+[package]
+name = "hgancestors"
+version = "0.1.0"
+authors = ["Georges Racinet "]
diff -r d3d4b4b5f725 -r d8c9571755a6 mercurial/rust/rustfmt.toml
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/mercurial/rust/rustfmt.toml   Thu Sep 27 17:03:16 2018 +0200
@@ -0,0 +1,3 @@
+max_width = 79
+wrap_comments = true
+error_on_line_overflow = true
diff -r d3d4b4b5f725 -r d8c9571755a6 mercurial/rust/src/ancestors.rs
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/mercurial/rust/src/ancestors.rs   Thu Sep 27 17:03:16 2018 +0200
@@ -0,0 +1,203 @@
+// ancestors.rs
+//
+// Copyright 2018 Georges Racinet 
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+//! Rust versions of generic DAG ancestors algorithms for Mercurial
+
+use std::iter::Iterator;
+use std::collections::{BinaryHeap, HashSet};
+
+/// The simplest expression of what we need of Mercurial DAGs.
+///
+/// As noted in revlog.c, revision numbers are actually encoded in
+/// 4 bytes, and are liberally converted to ints, whence the i32. It would
+/// probably be a good idea to make this trait generic over a suitable integer
+/// type.
+pub trait Graph {
+fn parents(, i32) -> (i32, i32);
+}
+
+/// Iterator over the ancestors of a given list of revisions
+/// This is a generic type, defined and implemented for any Graph, so that
+/// it's easy to
+///
+/// - unit test in pure Rust
+/// - bind to main Mercurial code, potentially in several ways and have these
+///   bindings evolve over time
+pub struct AncestorsIterator {
+graph: G,
+visit: BinaryHeap,
+seen: HashSet,
+stoprev: i32,
+}
+
+impl AncestorsIterator {
+/// Constructor.
+///
+/// We actually expect initrevs to be i64, and we coerce them to the
+/// i32 of our internal representations. The reason is that from C
+/// code, they actually come as such.
+/// Here it would be better to take any iterable over our internal
+/// revision numbers, and have the conversion be made from the caller.
+pub fn new(
+graph: G,
+initrevs: ,
+stoprev: i64,
+inclusive: bool,
+) -> Self {
+let stop32 = stoprev as i32;
+// because in iteration, contrary to what happens in ancestor.py, we
+// compare with stoprev before pushing, we have to prefilter in the
+// constructor too.
+let deref_filter = initrevs.iter()
+.map(|| x as i32).filter(|x| stop32 <= *x);
+
+if inclusive {
+return AncestorsIterator {
+visit: deref_filter.clone().collect(),
+seen: deref_filter.collect(),
+stoprev: stop32,
+graph: graph,
+};
+}
+let mut this = AncestorsIterator {
+visit: BinaryHeap::new(),
+seen: HashSet::new(),
+stoprev: stop32,
+