D5370: rust: core implementation of missingancestors (no bindings)

2018-12-15 Thread gracinet (Georges Racinet)
gracinet abandoned this revision.
gracinet marked an inline comment as done.
gracinet added a comment.


  This Differential has been superseded by https://phab.mercurial-scm.org/D5414 
through  https://phab.mercurial-scm.org/D5417

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5370

To: gracinet, #hg-reviewers
Cc: yuja, durin42, kevincox, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D5370: rust: core implementation of missingancestors (no bindings)

2018-12-12 Thread gracinet (Georges Racinet)
gracinet marked 11 inline comments as done.
gracinet added a comment.


  @yuja trying to submit the v2 with phabsend instead of arcanist… hope it'll 
be linked properly

INLINE COMMENTS

> kevincox wrote in ancestors.rs:201
> Isn't this loop redundant with the retain call above?

indeed, must have been a leftover

> kevincox wrote in ancestors.rs:235
> I would prefer `let (p0, p1) = ...`. This makes it obvious exactly what data 
> you are getting.

In the next version, I'm simply doing a for loop on `.iter().cloned()`, it's 
even better imho.

> kevincox wrote in lib.rs:26
> You can add `.cloned()` to remove the references. I would be surprised if 
> this has poor code generation.
> 
> As for HashSet vs bit array I would stick to HashSet for now. HashSet also 
> has the advantage of being sparse. I guess after the initial implementation 
> we could benchmark the two to see what is better.

Thanks for the tip with `.cloned()` that's exactly what I was missing.

Out of curiosity, I took a look at the generated optimized assembly code: zero 
cost abstraction, indeed.

About HashSet, yes that's exactly what I'm thinking.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5370

To: gracinet, #hg-reviewers
Cc: yuja, durin42, kevincox, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D5370: rust: core implementation of missingancestors (no bindings)

2018-12-06 Thread yuja (Yuya Nishihara)
yuja added a comment.


  >   @yuja: do you mean one of those Differential Revisions of this system for 
each commit, sure I can do.
  
  I don't know how Phabricator is working, but probably yes. I meant we want
  https://phab.mercurial-scm.org/D5370, https://phab.mercurial-scm.org/D5371, 
and https://phab.mercurial-scm.org/D5372 instead of a folded 
https://phab.mercurial-scm.org/D5370. I think phabsend will
  do that way.
  
  >   With respect to rust-cpython bindings, I'm currently waiting for feedback 
on https://github.com/dgrunwald/rust-cpython/issues/164
  >   Perhaps you'd have an idea ?
  
  No, but it looks interesting. I'll take a look this weekend.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5370

To: gracinet, #hg-reviewers
Cc: yuja, durin42, kevincox, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: D5370: rust: core implementation of missingancestors (no bindings)

2018-12-06 Thread Yuya Nishihara
>   @yuja: do you mean one of those Differential Revisions of this system for 
> each commit, sure I can do.

I don't know how Phabricator is working, but probably yes. I meant we want
D5370, D5371, and D5372 instead of a folded D5370. I think phabsend will
do that way.

>   With respect to rust-cpython bindings, I'm currently waiting for feedback 
> on https://github.com/dgrunwald/rust-cpython/issues/164
>   Perhaps you'd have an idea ?

No, but it looks interesting. I'll take a look this weekend.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D5370: rust: core implementation of missingancestors (no bindings)

2018-12-06 Thread gracinet (Georges Racinet)
gracinet added a comment.


  @yuja: do you mean one of those Differential Revisions of this system for 
each commit, sure I can do.
  
  With respect to rust-cpython bindings, I'm currently waiting for feedback on 
https://github.com/dgrunwald/rust-cpython/issues/164
  Perhaps you'd have an idea ?
  Short summary: sporadic segfaults that I can for now reproduce only by 
running the whole test suite.
  
  That being said, I do have rust-cpython bindings for AncestorsIterator and 
MissingAncestors (plus the full lazyancestors class). Unless I'm mistaken, that 
means that together with the existing C implementations, there's a potential 
for a full native version of `mercurial/ancestor.py`.
  I can send them to the mailing-list or here. What would be the correct flag 
for that ? RFC ?
  
  I'm also working on a perfmissingancestors (started a few hours ago). For now 
it confirms measurements I'd observed earlier: about x5 performance boost.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5370

To: gracinet, #hg-reviewers
Cc: yuja, durin42, kevincox, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D5370: rust: core implementation of missingancestors (no bindings)

2018-12-06 Thread yuja (Yuya Nishihara)
yuja added a comment.


  Quickly scanned, and looks generally good to me.
  
  >   rust: iterator version of Graph.parents
  >   rust: translation of missingancestors
  >   rust: translated random test of missingancestors
  
  Can you send these as separate patches?
  
  >   An alternative would have been to expose to Python
  >   MissingAncestors but that would have meant
  >   
  >   - pollution of the release build used from Python,  whereas we do it in 
this changeset within the tests submodule
  >   - waiting on rust-cpython bindings to be ready or doing the cumbersome 
direct-ffi (more pollution with unsafe C code)
  
  Still I want some CPython interface to measure the perf win. Are there
  lots of work remaining to bring rust-cpython to us?
  
  >   although one could argue that actually parents() should return an
  >   array instead of a tuple, giving us a similar iterator for free (but on
  >   references instead of values, unless we also use the arrayvec crate
  >   could help). Notably, the current C-backed parents() internally uses an
  >   array for communication with C code, so that currently, we'd get less 
memory
  >   copy and less code using an array.
  
  I prefer changing parents() to return `[Revision; 2]`. Then, we can write a
  simple utility function that drops NULL_REVISION, `&[Revision; 2] -> 
&[Revision]`.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5370

To: gracinet, #hg-reviewers
Cc: yuja, durin42, kevincox, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: D5370: rust: core implementation of missingancestors (no bindings)

2018-12-06 Thread Yuya Nishihara
Quickly scanned, and looks generally good to me.

>   rust: iterator version of Graph.parents
>   rust: translation of missingancestors
>   rust: translated random test of missingancestors

Can you send these as separate patches?

>   An alternative would have been to expose to Python
>   MissingAncestors but that would have meant
>   
>   - pollution of the release build used from Python,  whereas we do it in 
> this changeset within the tests submodule
>   - waiting on rust-cpython bindings to be ready or doing the cumbersome 
> direct-ffi (more pollution with unsafe C code)

Still I want some CPython interface to measure the perf win. Are there
lots of work remaining to bring rust-cpython to us?

>   although one could argue that actually parents() should return an
>   array instead of a tuple, giving us a similar iterator for free (but on
>   references instead of values, unless we also use the arrayvec crate
>   could help). Notably, the current C-backed parents() internally uses an
>   array for communication with C code, so that currently, we'd get less memory
>   copy and less code using an array.

I prefer changing parents() to return `[Revision; 2]`. Then, we can write a
simple utility function that drops NULL_REVISION, `&[Revision; 2] -> 
&[Revision]`.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D5370: rust: core implementation of missingancestors (no bindings)

2018-12-06 Thread kevincox (Kevin Cox)
kevincox added a comment.


  I think sticking close to the python version makes sense for the initial 
version. Then improvements can be made in follow-ups.

INLINE COMMENTS

> gracinet wrote in lib.rs:26
> The reason I've preferred to implement it directly is that `into_iter()` 
> iterates on references, which I found an unnecessary overhead,.
> 
> But maybe that's a case of premature optimization ? In the same vein, I don't 
> think `HashSet` is the best choice for a set of `i32`, unless it becomes 
> really big, but I don't know where the  threshold would be, compared to, say, 
> a bit array.

You can add `.cloned()` to remove the references. I would be surprised if this 
has poor code generation.

As for HashSet vs bit array I would stick to HashSet for now. HashSet also has 
the advantage of being sparse. I guess after the initial implementation we 
could benchmark the two to see what is better.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5370

To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D5370: rust: core implementation of missingancestors (no bindings)

2018-12-05 Thread gracinet (Georges Racinet)
gracinet added a comment.


  I'll include your remark in the upcoming v2, thanks again !

INLINE COMMENTS

> kevincox wrote in ancestors.rs:159
> Is always having an item in the set actually saving many corner cases? It 
> seems like you usually check for empty sets anyways.

Not sure about this, so for now, I've been playing safe, ie, exactly as the 
Python version.

> kevincox wrote in lib.rs:20
> I think the better solution here is to make `parents` return `[Revision; 2]` 
> (or `&[Revision]` and not return nulls.

I agree, it was a pythonism to use a tuple, but it has impact on 
`AncestorsIterators`, as well as the implementation in `hg-direct-ffi`, and in 
more code I've not submitted yet, so I'd prefer to do that change independently 
in a follow-up.

> kevincox wrote in lib.rs:26
> Would something like the following be simpler?
> 
>   fn parents_iter(
>   ,
>   r: Revision,
>   ) -> Result, GraphError> {
>   let (p0, p1) = self.parents(r)?;
>   let iter = [p0, p1].into_iter()
>   .map(|r| if r == NULL_REVISION { None} else { Some(r) });
>   Ok(iter)
>   }

The reason I've preferred to implement it directly is that `into_iter()` 
iterates on references, which I found an unnecessary overhead,.

But maybe that's a case of premature optimization ? In the same vein, I don't 
think `HashSet` is the best choice for a set of `i32`, unless it becomes really 
big, but I don't know where the  threshold would be, compared to, say, a bit 
array.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5370

To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D5370: rust: core implementation of missingancestors (no bindings)

2018-12-04 Thread kevincox (Kevin Cox)
kevincox added a comment.


  Overall looks good to me. Just a couple of points.
  
  - Using random numbers for tests without a seed that is logged will create 
failures which are basically impossible to reproduce.
  - A lot of the comments are comparing to the python, I don't know how useful 
this is to most readers.

INLINE COMMENTS

> ancestors.rs:155
> +where
> +I: IntoIterator,
> +{

I prefer doing `bases: impl  IntoIterator` instead of using a 
where clause.

> ancestors.rs:157
> +{
> +let mut bset: HashSet = bases.into_iter().collect();
> +if bset.is_empty() {

If this is going to be stored in the `bases` field I would just call it `bases`.

This will also allow you to do `MissingAncestors { graph, bases }` if you 
prefer.

> ancestors.rs:159
> +if bset.is_empty() {
> +bset.insert(NULL_REVISION);
> +}

Is always having an item in the set actually saving many corner cases? It seems 
like you usually check for empty sets anyways.

> ancestors.rs:170
> +0 => false, // shouldn't happen
> +1 => *(self.bases.iter().next().unwrap()) != -1,
> +_ => true,

s/-1/NULL_REVISISON

> ancestors.rs:195
> +) -> Result<(), GraphError> {
> +// using difference() or intersection() would not satisfy borrow 
> rules
> +revs.retain(|r| !self.bases.contains(r));

I don't think this comment is helpful to the reader.

> ancestors.rs:201
> +revs.remove();
> +}
> +if revs.is_empty() {

Isn't this loop redundant with the retain call above?

> ancestors.rs:219
> +// going all the way to the root
> +let keepcount = revs.iter().filter(|| *r > start).count();
> +

I find the pattern match and deref slightly confusing. I would prefer a `**` to 
make it obvious you are doing a double deref.

> ancestors.rs:235
> +fn add_parents( self, rev: Revision) -> Result<(), GraphError> {
> +let parents = self.graph.parents(rev)?;
> +// No need to bother the set with inserting NULL_REVISION over and

I would prefer `let (p0, p1) = ...`. This makes it obvious exactly what data 
you are getting.

> ancestors.rs:262
> +let bases_visit =  self.bases;
> +let mut revs_as_set: HashSet = revs
> +.into_iter()

I would simply call it `revs`.

> ancestors.rs:281
> +None => NULL_REVISION,
> +};
> +let start = max(max_bases, max_revs);

Replace the match with `.unwrap_or(NULL_REVISION)`

> ancestors.rs:286
> +let mut missing: Vec = Vec::new();
> +for curr in (0..start + 1).map(|i| start - i) {
> +if revs_visit.is_empty() {

If an inclusive range use `0..=start`.

> ancestors.rs:293
> +// another path
> +// TODO optim: Rust's HashSet.remove returns a boolean 
> telling
> +// if it happened. This will spare us one set lookup

You may as well do this now. Just move the remove into the if.

> ancestors.rs:309
> +missing.push(curr);
> +revs_visit.remove();
> +true

Do this remove in the if as well.

> ancestors.rs:484
> +fn test_missing_bases() {
> +let mut missanc = MissingAncestors::new(Stub, vec![5, 3, 1, 3]);
> +let mut as_vec: Vec;

Since you have used IntoIter you shouldn't need the `vec!`s.

> ancestors.rs:486
> +let mut as_vec: Vec;
> +as_vec = missanc.get_bases().iter().map(|| r).collect();
> +as_vec.sort();

Do this assignment on the same line as the declaration.

> lib.rs:20
> +
> +fn parents_iter(
> +,

I think the better solution here is to make `parents` return `[Revision; 2]` 
(or `&[Revision]` and not return nulls.

> lib.rs:26
> +Ok(ParentsIterator::new(parents))
> +}
> +}

Would something like the following be simpler?

  fn parents_iter(
  ,
  r: Revision,
  ) -> Result, GraphError> {
  let (p0, p1) = self.parents(r)?;
  let iter = [p0, p1].into_iter()
  .map(|r| if r == NULL_REVISION { None} else { Some(r) });
  Ok(iter)
  }

> lib.rs:36
> +/// Instead of NULL_REVISION, None is returned
> +/// Here a macro would probably be more in order for performance
> +impl ParentsIterator {

I don't think this comment is necessary. Also this should be quite transparent 
to the optimizer.

> lib.rs:52
> +self.cur += 1;
> +match match self.cur {
> +0 => self.parents.0,

I would prefer putting the result in a variable and then doing the second match 
later.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5370

To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D5370: rust: core implementation of missingancestors (no bindings)

2018-12-04 Thread gracinet (Georges Racinet)
gracinet created this revision.
Herald added subscribers: mercurial-devel, mjpieters, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  rust: iterator version of Graph.parents
  
  This is handy for callers that want to simply do:
  
for p in graph.parents_iter(rev)
  
  although one could argue that actually parents() should return an
  array instead of a tuple, giving us a similar iterator for free (but on
  references instead of values, unless we also use the arrayvec crate
  could help). Notably, the current C-backed parents() internally uses an
  array for communication with C code, so that currently, we'd get less memory
  copy and less code using an array.
  
  rust: translation of missingancestors
  
  This is as direct as possible a translation of the ancestor.missingancestors
  Python class in pure Rust. The goal for this changeset is to make it easy
  to compare with the Python version.
  
  We also add to Python tests the cases that helped us develop and debug
  this implementation.
  
  Some possible optimizations are marked along the way as TODO comments
  
  rust: translated random test of missingancestors
  
  An alternative would have been to expose to Python
  MissingAncestors but that would have meant
  
  - pollution of the release build used from Python,  whereas we do it in this 
changeset within the tests submodule
  - waiting on rust-cpython bindings to be ready or doing the cumbersome 
direct-ffi (more pollution with unsafe C code)

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D5370

AFFECTED FILES
  rust/Cargo.lock
  rust/hg-core/Cargo.toml
  rust/hg-core/src/ancestors.rs
  rust/hg-core/src/lib.rs
  tests/test-ancestor.py
  tests/test-ancestor.py.out

CHANGE DETAILS

diff --git a/tests/test-ancestor.py.out b/tests/test-ancestor.py.out
--- a/tests/test-ancestor.py.out
+++ b/tests/test-ancestor.py.out
@@ -1,3 +1,17 @@
+% removeancestorsfrom(), example 1
+remaining (sorted): [5, 6, 8, 9]
+% removeancestorsfrom(), example 2
+remaining (sorted): [11, 12, 13, 14]
+% removeancestorsfrom(), example 3
+remaining (sorted): [3, 5]
+% missingancestors(), example 1
+return [3, 7, 11]
+% missingancestors(), example 2
+return [5, 10]
+% missingancestors(), example 3
+return [3, 6, 9, 11]
+% removeancestorsfrom(), bigger graph
+Ok
 % lazy ancestor set for [], stoprev = 0, inclusive = False
 membership: []
 iteration:  []
diff --git a/tests/test-ancestor.py b/tests/test-ancestor.py
--- a/tests/test-ancestor.py
+++ b/tests/test-ancestor.py
@@ -182,6 +182,64 @@
  5: [4, -1], 6: [4, -1], 7: [4, -1], 8: [-1, -1], 9: [6, 7],
  10: [5, -1], 11: [3, 7], 12: [9, -1], 13: [8, -1]}
 
+def test_missingancestors_explicit():
+"""A few explicit cases, easier to check for catching errors in refactors.
+
+The bigger graph at the end has been produced by the random generator
+above, and we have some evidence that the other tests don't cover it.
+"""
+for i, (bases, revs) in enumerate((({1, 2, 3, 4, 7}, set(xrange(10))),
+   ({10}, set({11, 12, 13, 14})),
+   ({7}, set({1, 2, 3, 4, 5})),
+   )):
+print("%% removeancestorsfrom(), example %d" % (i + 1))
+missanc = ancestor.incrementalmissingancestors(graph.get, bases)
+missanc.removeancestorsfrom(revs)
+print("remaining (sorted): %s" % sorted(list(revs)))
+
+for i, (bases, revs) in enumerate((({10}, {11}),
+   ({11}, {10}),
+   ({7}, {9, 11}),
+   )):
+print("%% missingancestors(), example %d" % (i + 1))
+missanc = ancestor.incrementalmissingancestors(graph.get, bases)
+print("return %s" % missanc.missingancestors(revs))
+
+print("% removeancestorsfrom(), bigger graph")
+vecgraph = [
+[-1, -1], [0, -1], [1, 0], [2, 1], [3, -1], [4, -1], [5, 1],
+[2, -1], [7, -1], [8, -1], [9, -1], [10, 1], [3, -1], [12, -1],
+[13, -1], [14, -1], [4, -1], [16, -1], [17, -1], [18, -1],
+[19, 11], [20, -1], [21, -1], [22, -1], [23, -1], [2, -1],
+[3, -1], [26, 24], [27, -1], [28, -1], [12, -1], [1, -1], [1, 9],
+[32, -1], [33, -1], [34, 31], [35, -1], [36, 26], [37, -1],
+[38, -1], [39, -1], [40, -1], [41, -1], [42, 26], [0, -1],
+[44, -1], [45, 4], [40, -1], [47, -1], [36, 0], [49, -1],
+[-1, -1], [51, -1], [52, -1], [53, -1], [14, -1],
+[55, -1], [15, -1], [23, -1], [58, -1], [59, -1], [2, -1],
+[61, 59], [62, -1], [63, -1], [-1, -1], [65, -1],
+[66, -1], [67, -1], [68, -1], [37, 28], [69, 25],
+[71, -1], [72, -1], [50, 2], [74, -1], [12, -1],
+[18, -1], [77, -1], [78, -1], [79, -1], [43, 33],
+[81, -1], [82, -1], [83, -1], [84, 45], [85, -1],
+[86, -1], [-1, -1],