Re: Testing an RC of evolve 11.0.0 and topic 1.0.0

2023-02-13 Thread Georges Racinet

Hi Anton,

On 1/30/23 16:24, Anton Shestakov wrote:
We're planning to release a feature release candidate of evolve and 
topic extensions that contains an implementation of a concept that was 
in the plans for quite some time: topic namespaces (the name could 
change, if we find something better) 
https://www.mercurial-scm.org/wiki/TopicPlan#sub_branches.2C_namespacing_and_representation


I'm glad to see topic namespaces going forward, since they will be an 
important part of user experience improvements in Heptapod (drive-by 
contributons, notably). I have a few questions.




In short, topic namespaces are trying to fix one of our major UX 
flaws: branches can continue to be used for release management, and 
topics can still be used for implementing features, but now there 
would be something for organizing topics into wider concepts. Topic 
namespaces, just like topics, are designed to go away when you publish 
your commits, and we tried to make them as unobtrusive as possible, to 
not get in the way of current workflows. There is a default topic 
namespace that will not clutter the UI, and users can keep using only 
branches and topics as usual.


But we did change some things in the UI. Remember seeing 
"branch:topic" format in `hg branches`? Well, it was a temporary 
implementation detail. With the implementation of topic namespaces we 
introduce a more thought-out way to represent branches plus topics. In 
this new format, topic namespaces become sort of a glue for all things 
related to (named) branching, e.g. this is what you'd see in hg log:


  branch//namespace/topic

or, if you don't set topic namespace:

  branch//topic


Internally, Heptapod relies on the `branch:topic` syntax, more 
specifically as keys in the branchmap. Well, it relies on lots of 
Mercurial and evolve internals (I dare not say APIs).


The good news here is that simply replacing `:` with `//` makes all the 
tests pass. Given the full coverage, I'm fairly confident that there are 
no other problems, as far as we're only talking about the compatibility 
itself with (server-side) Heptapod.


Overall it looks like an unnecessary complication to try and support 
both syntaxes in Heptapod. I'm more comfortable keeping my compatibility 
fixes around and making the switch once Evolve 11.0.0 is released.


My questions:

- will there be in 11.0 a way for clients to specify topic namespaces? 
I'd like to test what will happen if some client sends namespaced topics 
to the server before it is fully ready to interpret them (i.e. map them 
to the permissions model).


- what will happen when topic namespaces are fully implemented and users 
with, say hg-evolve 12.0, push to a server with hg-evolve 10.5? Can I 
simulate that from a Python interpreter already?


- will 11.0 be required to support the upcoming Mercurial 6.4?

Best,

--
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@lists.mercurial-scm.org
https://lists.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: call for Emacs testers (and opinion) native support of hg ci -i

2022-10-11 Thread Georges Racinet

Hi Uwe,


On 10/10/2022 22:17, Uwe Brauer wrote:

Hi

I am not sure how many mercurial users, also use GNU emacs for their
work. I do, and I am mostly happy with vc-hg.el provided by GNU emacs.

There is however one feature I really miss,

  hg ci -i

So that you can just commit certain hunks.



Be it as it may, right now, GNU emacs starts to support that feature
natively!
It is already implemented for git on the master branch for the developer
version.

And just today Dmitry Gutov kindly sent me a patch.

Sounds really great.



Dmitry unfortunately uses only git and is not very acquainted with
mercurial so he asked me to test it. I did and found a bit strange
behavior that I would like to discuss below.
So I am looking for mercurial+emacs users who are interested in this
issue and would like to test the patch and send their comments or
recommendations, best would be to the emacs dev list or to me.


This is interesting and I'd be happy to test it, if I get the time (that 
is the hard part as you can imagine). Such a feature could even get me 
to actually use `vc-hg`, especially if we can get the variant for `hg 
amend -i`.




Let me describe how is this is supposed to work


 1. In a file of your hg repository you start vc-diff (or
diff-hl-diff-goto-hunk which is also provided Dmitry and I can
only recommend).

 2. You visit the diff buffer and manipulates the hunks as you wish:
delete some, split others etc

 3. You call  vc-next-action a windows appears for the commit message
(if you use commit-patch, you would use bound to   C-c C-c)

 4. It seems to work fine, the only problem I faced is that contrary
to  commit-patch
@ is *not* updated, that is hg up tip has not been run. I think
it should, since this is the standard behavior. What do others think


Can you please clarify this last point? Do you mean that the working 
directory parent is not the changeset that has just been created, or is 
that a matter of bookmarks?





How to test it?

 1. Clone or pull git://git.savannah.gnu.org/emacs.git (I recommend
to use git, since hg with hg-git needs hours)

 2. Apply the attached patch with git apply

 3. Configure and compiles

 4. Either install it or  cd $HOME/src/emacs-git/src (or wherever you
have your source code) and run./emacs


Believe it or not, I've never compiled Emacs that I can remember. Could 
you please point me to quick instructions to compile it in text mode 
only, like `emacs -nw` does ? I suppose it's much simpler and faster, as 
GUI dependencies tend to be terrible, and it will be remind me of VTs 
time ;-)


Best,



--
Georges Racinet
https://octobus.net, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@lists.mercurial-scm.org
https://lists.mercurial-scm.org/mailman/listinfo/mercurial-devel


Heptapod CI: OSUOSL Linux runners transition and sharding

2022-06-18 Thread Georges Racinet
Dear Mercurial developer

I want to point out that the `osuosl-xy-docker-x86_64` shared runners of
foss.heptapod.net are in the process of being replaced by vastly
different new systems at OSUOSL. I won't delve too much in detail here,
you are invited to follow the tracking issue [1]

The Mercurial project is a big user of these runners, and perhaps the
most delicate to handle, as its test suite requires many cores / virtual
CPUs for the tests to conclude in acceptable time. We are doing our
best, but we won't know how it fares until we try the new runners for
good. It is possible that the sharding strategy outlined below will
become necessary.

Performance evaluation should occur within the next week.

Please don't be surprised if there are slow runs and check what is
happening on the tracking issue [1].

= CI sharding =

The general trends in continuous integration since the likes of Travis
arose has been to favor lots of smaller running systems rather than a
few big ones. This allows for a higher density and less locking of
resources. Many projects with large test suites have implemented a
sharding strategy, in which each tests run is split in as many as
necessary to run on small systems. The industry seems to have settled
onto 2vCPUs per job (this is the standard in GitHub actions, gitlab.com
shared runners, BitBucket, etc.).

For instance, I am running the Ruby tests of Heptapod with up to 28 jobs
and its functional tests with 3 jobs per variant (e.g, native Mercurial,
hg-git based…).

I highly recommend that the Mercurial project implements such a sharding
strategy. Without it, we are forced to provision runners with many
vCPUs, and that means that any single-threaded job of other projects
will needlessly block lots of cores [3].

Some test runners [2] actually gather the execution times at the finest
possible granularity possible, in order to compute shards that are
expected to run in roughly equal times. There is much more to gain in my
humble opinion by starting with a simple modulo on a numeric test ID,
because the statistics-based balancing means is a time-consuming effort
and doesn't work so well on heterogeneous swarms of runners like ours.

Best,

[1] https://foss.heptapod.net/heptapod/foss.heptapod.net/-/issues/137.

[2] Notably, in Ruby realm, this is what Knapsack does. Advanced CI
definitions like GitLab's own pipeline have stages to update the
statistics automatically. Setting this up is a serious effort, even if a
test runner that implements it is available. By contrast, pytest-shard
is much more primitive (a simple modulo), but that is good enough for
Heptapod's functional tests.

[3] AS of today, we have a maximum concurrency of 2 jobs per runner on
foss.h.n shared runners. On the machines that are about to be
decommisioned, that means reserving a whole CPU with 8 cores, even for
single-threaded jobs.

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Moving away from Phabricator

2022-02-28 Thread Georges Racinet
orkflow resembled a more modern one, and I can't say I can fault
them for that. We are not in a position where we are swarmed by
contributions where filtering the ones that are not quite
up-to-par is a real issue. (The only other Mercurial forge that
I've had some experience with is Sourcehut⁵ and while its
absolutely stellar performance and accessibility are great
advantages over Heptapod, the email-only contribution system is a
definite no-go for me.)

The current default workflow (but not the only supported one) on
Heptapod is:
- People contribute using topics (from the `topic` extension)
which can be asked for review through Merge Requests(MR)
- The CI is automatically run on the MR
- MR is approved and/or merged by maintainers
- What happens upon merge depends on the project policy
(fast-forward or merge changeset), but the result is *currently*
always made public.

This workflow has worked quite well for the vast majority of
Heptapod users (others use named branches), and I've been
personally quite satisfied with it both as a contribution tool and
a review tool in other projects.

There are plenty of small adjustments that can be made as to MR
policy, which brings us to a good point in favor of Heptapod: we
have knowledge and control within the Mercurial community. If we
want to keep the `hg-committed` workflow that does not publish
automatically for example, it's something that can be developed
(note that this not my opinion on the current `hg-committed`
workflow, just being thorough). We are currently working on
enabling any authenticated user to submit a MR, which should help
a lot especially with drive-by contributors, who currently have to
ask for permission on their first submission⁶.

This email started with the mention of our current VM going out of
support; moving to `foss.heptapod.net <http://foss.heptapod.net>`
will remove the maintenance burden from Mercurial itself with no
administrative overhead (and it's free!). The current de-facto CI
for Mercurial is already hosted on `foss.heptapod.net
<http://foss.heptapod.net>`, so this will only bring more
attention to the CI and make its use more widespread and simple.

The VCS parts of Heptapod/GitLab represent a pretty small part of
the entire feature set, basically all other Gitlab Community
features will be available to us should we use them. One such
feature is the issue tracker, which is actually quite good. There
is a bugzilla integration plugin which would allow us to
progressively migrate the current bugzilla to further reduce
sysadmin pressure. Also, like in Phabricator, dealing with spam is
basically impossible in bugzilla, while it's very much possible in
Heptapod as we have done in the past on other `foss.heptapod.net
<http://foss.heptapod.net>` projects. This particular issue of
migrating bugzilla is probably less urgent and can be discussed
separately. The same goes for further automation of the release
process which is currently a bit too manual and too bus-factor-y.

From a sysadmin standpoint, it would be much easier to simply have
moved to Heptapod before June, removing the need to migrate Phab
to another VM. From Heptapod's standpoint it's not really
advisable to try the migration until at least 6.1 is out and used
in Heptapod (which should be around mid-March) since it fixes a
*massive* performance issue with exchange. Other roadblocks (like
workflow, access rights, etc.) will have to be addressed by this
point and will determine the course of action: they will have to
be brought up by people that have been contributing and reviewing
recently.

Finally, `phab.mercurial-scm.org <http://phab.mercurial-scm.org>`
will have to become a static archive of the relevant parts of
Phabricator in its final state so that the links to discussion
around previous patches still work.

What do you all think?

Raphaël

[1] https://phab.mercurial-scm.org
[2]

https://admin.phacility.com/phame/post/view/11/phacility_is_winding_down_operations/
[3] https://heptapod.net/
[4] https://octobus.net/
[5] https://sourcehut.org
[6]
https://www.mercurial-scm.org/wiki/TopicPlan#Use_cases_for_topic_namespaces
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel



--
Georges Racinet
https://octobus.net,https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics
___
Mercurial

Heptapod mini-sprint on Nov 5th

2021-10-28 Thread Georges Racinet
Dear users and developers,

The next occurrence of monthly remote mini-sprints organized by Octobus,
will be about Heptapod (GitLab + Mercurial).

The rallying point will be https://mattermost.heptapod.net, our chat system.

There is no fixed topic, you may bring your own. Of course I have a few
ideas, but I don't want to force anything.

Our timezone is, or rather will be, CET, or if you prefer UTC+1. Of
course people from other timezones are welcome to join when they can!

I'll be on leave until Nov 3rd, watching the mailing-lists only very
loosely.

Best,

-- 
Georges Racinet
https://octobus.net, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D11703: rust-nodemap: backed out mitigation for issue 6554

2021-10-19 Thread gracinet (Georges Racinet)
gracinet created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  This is a backout of changeset 3fffb48539ee 
.
  
  Issue 6554 is now considered solved, hence its mitigation
  has to be removed, if only for its performance cost.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D11703

AFFECTED FILES
  rust/hg-cpython/src/revlog.rs
  tests/test-persistent-nodemap.t

CHANGE DETAILS

diff --git a/tests/test-persistent-nodemap.t b/tests/test-persistent-nodemap.t
--- a/tests/test-persistent-nodemap.t
+++ b/tests/test-persistent-nodemap.t
@@ -435,46 +435,6 @@
   data-length: 121088
   data-unused: 0
   data-unused: 0.000%
-
-Sub-case: fallback for corrupted data file
---
-
-Sabotaging the data file so that nodemap resolutions fail, triggering fallback 
to
-(non-persistent) C implementation.
-
-
-  $ UUID=`hg debugnodemap --metadata| grep 'uid:' | \
-  > sed 's/uid: //'`
-  $ FILE=.hg/store/00changelog-"${UUID}".nd
-  $ python -c "fobj = open('$FILE', 'r+b'); fobj.write(b'\xff' * 121088); 
fobj.close()"
-
-The nodemap data file is still considered in sync with the docket. This
-would fail without the fallback to the (non-persistent) C implementation:
-
-  $ hg log -r b355ef8adce0949b8bdf6afc72ca853740d65944 -T '{rev}\n' --traceback
-  5002
-
-The nodemap data file hasn't been fixed, more tests can be inserted:
-
-  $ hg debugnodemap --dump-disk | f --bytes=256 --hexdump --size
-  size=121088
-  : ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  0010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  0040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  0050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  0060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  0070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  0080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  0090: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  00a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  00b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  00c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  00d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  00e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-  00f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
-
   $ mv ../tmp-data-file $FILE
   $ mv ../tmp-docket .hg/store/00changelog.n
 
diff --git a/rust/hg-cpython/src/revlog.rs b/rust/hg-cpython/src/revlog.rs
--- a/rust/hg-cpython/src/revlog.rs
+++ b/rust/hg-cpython/src/revlog.rs
@@ -59,22 +59,12 @@
 
 /// Return Revision if found, raises a bare `error.RevlogError`
 /// in case of ambiguity, same as C version does
-def get_rev(, pynode: PyBytes) -> PyResult> {
+def get_rev(, node: PyBytes) -> PyResult> {
 let opt = self.get_nodetree(py)?.borrow();
 let nt = opt.as_ref().unwrap();
 let idx = &*self.cindex(py).borrow();
-let node = node_from_py_bytes(py, )?;
-match nt.find_bin(idx, node.into())
-{
-Ok(None) =>
-// fallback to C implementation, remove once
-// https://bz.mercurial-scm.org/show_bug.cgi?id=6554
-// is fixed (a simple backout should do)
-self.call_cindex(py, "get_rev", ::new(py, 
&[pynode.into_object()]), None)?
-.extract(py),
-Ok(Some(rev)) => Ok(Some(rev)),
-Err(e) => Err(nodemap_error(py, e)),
-}
+let node = node_from_py_bytes(py, )?;
+nt.find_bin(idx, node.into()).map_err(|e| nodemap_error(py, e))
 }
 
 /// same as `get_rev()` but raises a bare `error.RevlogError` if node
@@ -104,34 +94,27 @@
 }
 }
 
-def partialmatch(, pynode: PyObject) -> PyResult> {
+def partialmatch(, node: PyObject) -> PyResult> {
 let opt = self.get_nodetree(py)?.borrow();
 let nt = opt.as_ref().unwrap();
 let idx = &*self.cindex(py).borrow();
 
 let node_as_string = if cfg!(feature = "python3-sys") {
-pynode.cast_as::(py)?.to_string(py)?.to_string()
+node.cast_as::(py)?.to_string(py)?.to_string()
 }
 else {
-let node = pynode.extract::(py)?;
+let node = node.extract::(py)?;
 String::from_utf8_lossy(node.data(py)).to_string()
 };
 
 let prefix = NodePrefix::from_hex(_as_string).map_err(|_| 
PyErr::new::(py, "Invalid 

D11238: rust-nodemap: falling back to C impl as mitigation

2021-08-02 Thread gracinet (Georges Racinet)
gracinet created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  This is a mitigation for https://bz.mercurial-scm.org/show_bug.cgi?id=6554
  
  We see sometimes almost all data except the most recent revisions
  removed from the persistent nodemap, but we don't know how to
  reproduce yet.
  
  This has sadly repercussions beyond just needing to reconstruct the
  persistent nodemap: for instance, this automatically filters out
  all bookmarks pointing to revisions that the nodemap cannot resolve.
  If such filtering happens in a transaction, the update of the
  bookmarks file that happens at the end of transaction loses all
  bookmarks that have been affected. There may be similar consequences
  for other data.
  
  So this is a data loss, something that we have to prevent as soon as
  possible.
  
  As a mitigation measure, we will now fallback to the C implementation
  in case nodemap lookups failed. This will add some latency, e.g., in
  discovery, yet less than disabling the persistent nodemap entirely.
  
  We considered implementing the fallback directly on the Python
  side, but `revlog.get_rev()` is not systematically used, there are
  also several direct calls to the index method (`self.index.rev()` for
  a `revlog` instance). It is therefore more direct to implement the
  mitigation in the rust-cpython wrapper.

REPOSITORY
  rHG Mercurial

BRANCH
  stable

REVISION DETAIL
  https://phab.mercurial-scm.org/D11238

AFFECTED FILES
  rust/hg-cpython/src/revlog.rs
  tests/test-persistent-nodemap.t

CHANGE DETAILS

diff --git a/tests/test-persistent-nodemap.t b/tests/test-persistent-nodemap.t
--- a/tests/test-persistent-nodemap.t
+++ b/tests/test-persistent-nodemap.t
@@ -428,6 +428,46 @@
   data-length: 121088
   data-unused: 0
   data-unused: 0.000%
+
+Sub-case: fallback for corrupted data file
+--
+
+Sabotaging the data file so that nodemap resolutions fail, triggering fallback 
to
+(non-persistent) C implementation.
+
+
+  $ UUID=`hg debugnodemap --metadata| grep 'uid:' | \
+  > sed 's/uid: //'`
+  $ FILE=.hg/store/00changelog-"${UUID}".nd
+  $ python -c "fobj = open('$FILE', 'r+b'); fobj.write(b'\xff' * 121088); 
fobj.close()"
+
+The nodemap data file is still considered in sync with the docket. This
+would fail without the fallback to the (non-persistent) C implementation:
+
+  $ hg log -r b355ef8adce0949b8bdf6afc72ca853740d65944 -T '{rev}\n' --traceback
+  5002
+
+The nodemap data file hasn't been fixed, more tests can be inserted:
+
+  $ hg debugnodemap --dump-disk | f --bytes=256 --hexdump --size
+  size=121088
+  : ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  0010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  0040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  0050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  0060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  0070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  0080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  0090: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  00a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  00b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  00c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  00d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  00e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+  00f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
+
   $ mv ../tmp-data-file $FILE
   $ mv ../tmp-docket .hg/store/00changelog.n
 
diff --git a/rust/hg-cpython/src/revlog.rs b/rust/hg-cpython/src/revlog.rs
--- a/rust/hg-cpython/src/revlog.rs
+++ b/rust/hg-cpython/src/revlog.rs
@@ -59,12 +59,19 @@
 
 /// Return Revision if found, raises a bare `error.RevlogError`
 /// in case of ambiguity, same as C version does
-def get_rev(, node: PyBytes) -> PyResult> {
+def get_rev(, pynode: PyBytes) -> PyResult> {
 let opt = self.get_nodetree(py)?.borrow();
 let nt = opt.as_ref().unwrap();
 let idx = &*self.cindex(py).borrow();
-let node = node_from_py_bytes(py, )?;
-nt.find_bin(idx, node.into()).map_err(|e| nodemap_error(py, e))
+let node = node_from_py_bytes(py, )?;
+match nt.find_bin(idx, node.into())
+{
+Ok(None) =>
+self.call_cindex(py, "get_rev", ::new(py, 
&[pynode.into_object()]), None)?
+.extract(py),
+Ok(Some(rev)) => Ok(Some(rev)),
+Err(e) => Err(nodemap_error(py, e)),
+}
 }
 
  

D11204: hgwebdir: avoid systematic full garbage collection

2021-07-20 Thread gracinet (Georges Racinet)
gracinet created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  Forcing a systematic full garbage collection upon each request
  can serioulsy harm performance. This is reported as
  https://bz.mercurial-scm.org/show_bug.cgi?id=6075
  
  With this change we're performing the full collection according
  to a new setting, `web.full-garbage-collection-rate`.
  The default value is 0, which means not to force any
  full garbage collection.
  
  In my limited and biased towards Python3 experience chasing memory
  leaks in Mercurial servers, the full collection never
  made a difference in terms of memory footprint. It is possible that
  the condition that it was mitigating when it was introduced is not
  there anymore, or that it is specific to some repositories.
  
  On the other hand, as explained in the Python developer docs [1],
  frequent full collections are very harmful in terms of performance if
  lots of objects survive the collection, and hence stay in the
  oldest generation. Note that `gc.collect()` is indeed trying to
  collect the oldest generation [2]. This happens usually in two cases:
  
  - unwanted lingering objects (i.e., an actual memory leak that the GC cannot 
do anything about). Sadly, we have lots of those these days.
  - desireable long-term objects, typically in caches. This is an area of 
interest of mine.
  
  In short, the flat rate that this change still permits is
  probably a bad idea in most cases, which is why it is set to zero
  by default. It is possible, though, that a value like 100 or 1000
  could be a good trade-off if someone has a repository for which the GC
  can actually mitigate excessive memory usage.
  
  The test is inspired from test-hgwebdir-paths.py
  
  [1] 
https://devguide.python.org/garbage_collector/#collecting-the-oldest-generation
  [2] https://docs.python.org/3/library/gc.html#gc.collect

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D11204

AFFECTED FILES
  mercurial/configitems.py
  mercurial/hgweb/hgwebdir_mod.py
  tests/test-hgwebdir-gc.py

CHANGE DETAILS

diff --git a/tests/test-hgwebdir-gc.py b/tests/test-hgwebdir-gc.py
new file mode 100644
--- /dev/null
+++ b/tests/test-hgwebdir-gc.py
@@ -0,0 +1,44 @@
+from __future__ import absolute_import
+
+import os
+from mercurial.hgweb import hgwebdir_mod
+
+hgwebdir = hgwebdir_mod.hgwebdir
+
+os.mkdir(b'webdir')
+os.chdir(b'webdir')
+
+webdir = os.path.realpath(b'.')
+
+
+def trivial_response(req, res):
+return []
+
+
+def make_hgwebdir(gc_rate):
+config = os.path.join(webdir, b'hgwebdir.conf')
+with open(config, 'wb') as configfile:
+configfile.write(b'[web]\n')
+configfile.write(b'full-garbage-collection-rate=%d\n' % gc_rate)
+
+hg_wd = hgwebdir(config)
+hg_wd._runwsgi = trivial_response
+return hg_wd
+
+
+def process_requests(webdir_instance, number):
+# we don't care for now about passing realistic arguments
+for _ in range(number):
+for chunk in webdir_instance.run_wsgi(None, None):
+pass
+
+
+without_gc = make_hgwebdir(gc_rate=0)
+process_requests(without_gc, 5)
+assert without_gc.requests_count == 5
+assert without_gc.gc_full_collections_done == 0
+
+with_gc = make_hgwebdir(gc_rate=2)
+process_requests(with_gc, 5)
+assert with_gc.requests_count == 5
+assert with_gc.gc_full_collections_done == 2
diff --git a/mercurial/hgweb/hgwebdir_mod.py b/mercurial/hgweb/hgwebdir_mod.py
--- a/mercurial/hgweb/hgwebdir_mod.py
+++ b/mercurial/hgweb/hgwebdir_mod.py
@@ -285,6 +285,7 @@
 self.lastrefresh = 0
 self.motd = None
 self.refresh()
+self.requests_count = 0
 if not baseui:
 # set up environment for new ui
 extensions.loadall(self.ui)
@@ -341,6 +342,10 @@
 
 self.repos = repos
 self.ui = u
+self.gc_full_collect_rate = self.ui.configint(
+b'web', b'full-garbage-collection-rate'
+)
+self.gc_full_collections_done = 0
 encoding.encoding = self.ui.config(b'web', b'encoding')
 self.style = self.ui.config(b'web', b'style')
 self.templatepath = self.ui.config(
@@ -383,12 +388,25 @@
 finally:
 # There are known cycles in localrepository that prevent
 # those objects (and tons of held references) from being
-# collected through normal refcounting. We mitigate those
-# leaks by performing an explicit GC on every request.
-# TODO remove this once leaks are fixed.
-# TODO only run this on requests that create localrepository
-# instances instead of every request.
-gc.collect()
+# collected through normal refcounting.
+# In some cases, the resulting memory consumption can
+# be tamed by performing explicit garbage 

Re: [PATCH STABLE] cext: fix memory leak in phases computation

2021-06-06 Thread Georges Racinet
On 6/7/21 12:29 AM, Georges Racinet wrote:
> # HG changeset patch
> # User Georges Racinet 
> # Date 1622935470 -7200
> #  Sun Jun 06 01:24:30 2021 +0200
> # Branch stable
> # Node ID be560b55eb7cfe25c68fb6fab5417fab6688cf84
> # Parent  5ac0f2a8ba7205266a206ad8da89a79173e8efea
> # EXP-Topic memleak-phases
> cext: fix memory leak in phases computation

Can also directly be pulled from
https://foss.heptapod.net/octobus/mercurial-devel

Corresponding CI run :
https://foss.heptapod.net/octobus/mercurial-devel/-/pipelines/22818

Thanks!

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH STABLE] cext: fix memory leak in phases computation

2021-06-06 Thread Georges Racinet
# HG changeset patch
# User Georges Racinet 
# Date 1622935470 -7200
#  Sun Jun 06 01:24:30 2021 +0200
# Branch stable
# Node ID be560b55eb7cfe25c68fb6fab5417fab6688cf84
# Parent  5ac0f2a8ba7205266a206ad8da89a79173e8efea
# EXP-Topic memleak-phases
cext: fix memory leak in phases computation

Without this a buffer whose size in bytes is the number of
changesets in the repository is leaked each time the repository is
opened and changeset phases are computed.

Impact: the current code in hgwebdir creates a new `localrepository`
instance for each HTTP request. Since any pull or push is made of several
requests, a team of 100 people can easily produce thousands of such
requests per day.

Being a low-level malloc, this leak can't be seen with the gc module and
tools relying on that, but was spotted by valgrind immediately.

Reproduction


  for i in range(cl_args.iterations):
  repo = hg.repository(baseui, repo_path)
  rev = repo.revs(rev).first()
  ctx = repo[rev]

  del ctx
  del repo
  # avoid any pollution by other type of leak
  # (that should be fixed in 5.8)
  repoview._filteredrepotypes.clear()

  gc.collect()

Measurements


Resident Set Size (RSS), taken on a clone of
mozilla-central for performance analysis (440 000
changesets).

before:
  5.8+hg19.5ac0f2a8ba72  1000 iterations: 1606MB
  5.8+hg19.5ac0f2a8ba72 1 iterations: 5723MB
after:
  5.8+hg20.e2084d39e145  1000 iterations:  555MB
  5.8+hg20.e2084d39e145 1 iterations:  555MB
 (double checked, not a copy/paste error)

(e2084d39e14 is the present changeset, before amendment
of the message to add the measurements)

diff -r 5ac0f2a8ba72 -r be560b55eb7c mercurial/cext/revlog.c
--- a/mercurial/cext/revlog.c   Thu May 20 14:20:39 2021 -0400
+++ b/mercurial/cext/revlog.c   Sun Jun 06 01:24:30 2021 +0200
@@ -919,6 +919,7 @@
phasesets[i] = NULL;
}
 
+   free(phases);
return Py_BuildValue("nN", len, phasesetsdict);
 
 release:

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Octobus monthly mini-sprints

2021-06-01 Thread Georges Racinet
Hi everybody,

In order to keep on improving our coordination with the community,
Octobus will organize periodic mini-sprints, starting next month.

We'll dedicate one day per month to work with whomever is interested on
one of the three areas we're active in: Mercurial core, Evolve and
Heptapod. Of course, in the current context, these will be held entirely
in virtual.

The sprints will happen every first Friday of the month.

The whole thing will start next month. Here's our schedule for the next
three mini-sprints:

- 2021-07-02: Heptapod [1]

- 2021-08-06: Mercurial core (early in development cycle after 5.9)

- 2021-09-03: Evolve

Since we're trying to start a routine, we'll try and keep the same
rotation for a while after that.

The backbone of communication should be provided by the chat systems
[2], and of course we can add videoconferencing on the fly when needed,
but we can work out the details until then.

Hope to "see" you there, we're looking forward for your feedback, ideas
of things to discuss etc.

Best,

[1] I'll be organizing that one, more about it soon to come.

[2] should be
https://mattermost.heptapod.net/heptapod/channels/development for
Heptapod and IRC channels in other cases.

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 3 STABLE] repoview: separate concerns in _filteredrepotypes comment

2021-04-26 Thread Georges Racinet
Full CI run of this series can be seen here:

    https://foss.heptapod.net/octobus/mercurial-devel/-/pipelines/21207

Submitting for stable because I think it's relevant that 5.8 does not
have this leak. One user interaction with hgwebdir typically entails
several requests, notably one for capabilities, one for each iteration
of discovery etc, so this can pile up in a few days for a moderately
busy server.

Heptapod ≥ 0.20.5 has a direct mitigation.

Best,

On 4/26/21 12:19 PM, Georges Racinet wrote:
> # HG changeset patch
> # User Georges Racinet 
> # Date 1619274605 -7200
> #  Sat Apr 24 16:30:05 2021 +0200
> # Branch stable
> # Node ID 11011fd630710cce991c9deb949ae85a85651b9a
> # Parent  46db86edf8f279063d6f9d5dd673dc88da8b83cf
> # EXP-Topic memleak-repo-class
> repoview: separate concerns in _filteredrepotypes comment


-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 3 of 3 STABLE] repoview: separate concerns in _filteredrepotypes comment

2021-04-26 Thread Georges Racinet
# HG changeset patch
# User Georges Racinet 
# Date 1619274605 -7200
#  Sat Apr 24 16:30:05 2021 +0200
# Branch stable
# Node ID 11011fd630710cce991c9deb949ae85a85651b9a
# Parent  46db86edf8f279063d6f9d5dd673dc88da8b83cf
# EXP-Topic memleak-repo-class
repoview: separate concerns in _filteredrepotypes comment

The cited issue in Python bugtracker is closed, but hasn't been
fixed. We've been able to use the attached example and reproduce
it with Python 3.9.

The point where it turns from needless stress on the GC to
the an actual leak is when one factors in the fact that the GC
was before Python 3.4 unable to collect some types (see PEP 442).

Note that even with Python 2.7, the simple example of cycles
due to __mro__ are collectable. This was seen again with the
example attached on the CPython issue.

diff -r 46db86edf8f2 -r 11011fd63071 mercurial/repoview.py
--- a/mercurial/repoview.py Fri Apr 23 18:30:53 2021 +0200
+++ b/mercurial/repoview.py Sat Apr 24 16:30:05 2021 +0200
@@ -464,9 +464,10 @@
 return delattr(self._unfilteredrepo, attr)
 
 
-# Python <3.4 easily leaks types via __mro__. See
-# https://bugs.python.org/issue17950. We cache dynamically created types
-# so they won't be leaked on every invocation of repo.filtered().
+# Dynamically created classes introduce memory cycles via __mro__. See
+# https://bugs.python.org/issue17950.
+# This need of the garbage collector can turn into memory leak in
+# Python <3.4, which is the first version released with PEP 442.
 _filteredrepotypes = weakref.WeakKeyDictionary()
 
 

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 1 of 3 STABLE] repoview: style change in newtype() cache handling

2021-04-26 Thread Georges Racinet
# HG changeset patch
# User Georges Racinet 
# Date 1619271999 -7200
#  Sat Apr 24 15:46:39 2021 +0200
# Branch stable
# Node ID 125fe190f484d396abc93428e3bd58b0b7455c24
# Parent  77e73827a02db25dc675b0afe552f24c807c741d
# EXP-Topic memleak-repo-class
repoview: style change in newtype() cache handling

This way of writing it does not change the logic at all,
but is more fit for the change we want to make in the
next changeset.

If anything, that's one dict lookup less in the hot path,
but that should be non measurable.

diff -r 77e73827a02d -r 125fe190f484 mercurial/repoview.py
--- a/mercurial/repoview.py Thu Apr 22 02:57:30 2021 +0200
+++ b/mercurial/repoview.py Sat Apr 24 15:46:39 2021 +0200
@@ -472,10 +472,12 @@
 
 def newtype(base):
 """Create a new type with the repoview mixin and the given base class"""
-if base not in _filteredrepotypes:
+cls = _filteredrepotypes.get(base)
+if cls is not None:
+return cls
 
-class filteredrepo(repoview, base):
-pass
+class filteredrepo(repoview, base):
+pass
 
-_filteredrepotypes[base] = filteredrepo
-return _filteredrepotypes[base]
+_filteredrepotypes[base] = filteredrepo
+return filteredrepo

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 2 of 3 STABLE] repoview: fix memory leak of filtered repo classes

2021-04-26 Thread Georges Racinet
# HG changeset patch
# User Georges Racinet 
# Date 1619195453 -7200
#  Fri Apr 23 18:30:53 2021 +0200
# Branch stable
# Node ID 46db86edf8f279063d6f9d5dd673dc88da8b83cf
# Parent  125fe190f484d396abc93428e3bd58b0b7455c24
# EXP-Topic memleak-repo-class
repoview: fix memory leak of filtered repo classes

The leak occurs in long-running server processes with
extensions, and is measured at 110kB per request.

Before this change, the contents of the `_filteredrepotypes`
cache are not properly garbage collected, despite it begin
a `WeakKeyDictionary`.

Extensions have a tendency to generate a new repository class
for each `localrepo` instantiation. Server processes based
on `hgwebdir_mod` will instantiate a new `localrepo` for each
HTTP request that involves a repository.

As a result, with a testing process that repeatedly opens a
repository with several extensions activated
(`topic` notably among them), we see a steady increase in
resident memory of 110kB per repository instantiation before this
change. This is also true, if we call `gc.collect()` at each
instantiation, like `hgwebdir_mod` does, or not.

The cause of the leak is that the *values* aren't weak references.
This change uses `weakref.ref` for the values, and this makes
in our measurements the resident size increase drop to 5kB per
repository instantiation, with no explicit call of `gc.collect()`
at all.
There is currently no reason to believe that this remaining leak
of 5kB is related to or even due to Mercurial core.

We've also seen evidence that `ui.ui` instances weren't properly
garbage collected before the change (with the change, they are).
This could explain why the figures are relatively high.

In theory, the collection of weak references could lead to
much more misses in the cache, so we measured the impact on
the original case that was motivation for introducing that cache
in 7e89bd0cfb86 (see also issue5043): `hg convert` of the
mozilla-central repository. The bad news here is that there is a
major memory leak there, both with and without the present changeset.
There were no more cache misses, and we could see no
more memory leak with this change: the resident size after importing
roughly 10 changesets was at 12.4GB before, and 12.5GB after.
The small increase is mentioned for completeness only, and we
believe that it should be ignored, at least as long as the main
leak isn't fixed. At less than 1% of the main leak, even finding out
whether it is merely noise would be wasteful.

Original context where this was spotted and first mitigated:
  https://foss.heptapod.net/heptapod/heptapod/-/issues/466

The leak reduction was also obtained in Heptapod inner HTTP server,
which amounts to the same as `hgwebdir_mod` for these questions.

The measurements done with Python 3.9, similar figures seen with 3.8.
More work on our side would be needed to give measurements with 2.7,
because of testing server process does not support it.

diff -r 125fe190f484 -r 46db86edf8f2 mercurial/repoview.py
--- a/mercurial/repoview.py Sat Apr 24 15:46:39 2021 +0200
+++ b/mercurial/repoview.py Fri Apr 23 18:30:53 2021 +0200
@@ -472,12 +472,15 @@
 
 def newtype(base):
 """Create a new type with the repoview mixin and the given base class"""
-cls = _filteredrepotypes.get(base)
-if cls is not None:
-return cls
+ref = _filteredrepotypes.get(base)
+if ref is not None:
+cls = ref()
+if cls is not None:
+return cls
 
 class filteredrepo(repoview, base):
 pass
 
-_filteredrepotypes[base] = filteredrepo
+_filteredrepotypes[base] = weakref.ref(filteredrepo)
+# do not reread from weakref to be 100% sure not to return None
 return filteredrepo

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Getting the revisions of .hgtags programatically

2021-01-24 Thread Georges Racinet
On 1/24/21 11:16 PM, Roy Marples via Mercurial-devel wrote:
> Hi Joerg
>
> On 24/01/2021 22:03, Joerg Sonnenberger wrote:
>> Hello Roy,
>> isn't repo.tagslist() (and maybe filtering out local tags) already doing
>> what you want?
>
> It is sort of. I use it to get a list of tag and yes, I filter out non
> globals already.
> However, it only returns the revision tagged with the name, not which
> revision created the tag.
>
For the record, we have a similar problem in Heptapod, which exposes to
a machinery heavily tailored for Git, namely GitLab. It would make sense
to consider the details (author, description) of the tagging changesets
to be the equivalent of annotated Git tags.

At the time I looked into it, I couldn't find any internal API to
retrieve that information in an efficient way. But I didn't spend more
than half an hour on this and went on to more pressing matters. Maybe
would something like that counter-balance the heaviness of our global
tags by giving it more applications? In any case, it's still low on my
personal list of problems to tackle.

For reference, here's our tracking issue for this:
https://foss.heptapod.net/heptapod/hgitaly/-/issues/5

Best,


-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Heptapod 0.18.0 released

2020-12-21 Thread Georges Racinet
Dear all,

Heptapod 0.18.0 got released today.

Apart that we stay on supported GitLab versions, the main highlights for
the Mercurial mailing-lists are that:

- it comes with Mercurial 5.6.1

- it can optionally run the Rust code. The Docker image actually comes
with a Mercurial built with Rust, but HGMODULEPOLICY is set to "c" by
default in all processes. That can be changed with a single setting.

- foss.heptapod.net is running with the Rust code enabled.

All the details are available in the release notes [2] or, sparing you
an indirection, [3] for the Rust options.

Also, of special interest to Mercurial core developers will be that the
Mercurial clone onto which Heptapod CI jobs are attached [1] is now
updated automatically from `hg-committed`, hence providing more
immediate feedback. For now, it's a time-based system, but we are close
to the point where we can have a hook on mercurial-scm.org trigger it as
soon as something lands.

Augie (speaking to you with your infra hat on): what has to be done is
just an HTTP request with the proper token. I'm currently doing it with
`curl`, but that means registering a new secret. Perhaps we can nail
this in January, after my vacation time?

I wish you all a nice holiday season, take care, and see you in 2021!


[1] https://foss.heptapod.net/octobus/mercurial-devel

[2] https://heptapod.net/heptapod-0.18.0.html#heptapod-0.18.0

[3]
https://heptapod.net/heptapod-0180rc2-released-with-rust-in-mercurial.html#heptapod-0180rc2-released-with-rust-in-mercurial#rust

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Test notifications | Pipeline #358 has failed for branch/default | c086d56d

2020-12-02 Thread Georges Racinet
Hi,

That was me testing that upgrade tomorrow should really manage to notify
the mailing-list upon build failures.

Cheers,

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Integration tests: mercurial-testhelpers

2020-11-23 Thread Georges Racinet
Hi,

I've updated recently the Integration Tests Plan [1] with what has been
said during the 5.6 sprint (from the pad) and more ideas and realizations.

## mercurial-testhelpers

Then, I've extracted my helpers as the `mercurial-testhelpers` separate
package [2]. The main driving reason is that even if it went straight to
the core right now, extension authors wouldn't be able to use it until
they drop support for hg 5.6. For some, that would mean at least a few
years…

mercurial-testhelpers is usable right now by extensions and other
downstreams. It comes with examples for the core [3], hg-evolve and
hg-git and is of course entirely tested using itself, with CI etc.

It is compatible down to at least hg 4.3 (I didn't try further).

Just to say how committed I am about it, all the Python testing of
Heptapod relies on it. I must add that its introduction was a tipping
point in development.

## core submission coming soon

That extraction already had me remove some specifics – and provide means
for projects that need them to add them back.

As a result, I feel now ready to start submitting code to the core. I've
explained some of the choices in the README [4].

Even if it's quite small, I'm not sure how to cut it in smaller
reviewable chunks, but I wouldn't probably start by hooking it in
`run-tests.py`, that's not what I'm the most at ease with. I'd simply
tell people to install and run pytest manually in the first drafts [5]
if that's ok. Guidance and feedback would be much appreciated!

I've had a chat with Jörg today about the dreadful
`test-http-bad-server.t`, will experiment later this week or the next to
see if I can make something equivalent and more maintainable out of it.

Best,


[1] https://www.mercurial-scm.org/wiki/IntegrationTestsPlan

[2] https://pypi.org/project/mercurial-testhelpers  and 
https://foss.heptapod.net/mercurial/testhelpers

[3] the core example is the exact same translation of test-revset.t I've
prepared for the sprint (needs to be split into separate concerns and
completed), plus a Python test copied over to check that launching it
with pytest works (it does).

[4]
https://foss.heptapod.net/mercurial/testhelpers/-/blob/branch/default/README.md

[5] of course, that means not removing any existing case until it's run
by the main runner automatically.

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Heptapod 0.17 released

2020-11-23 Thread Georges Racinet
On 11/23/20 8:16 PM, Augie Fackler wrote:
>
>
>> On Nov 23, 2020, at 09:31, Georges Racinet
>> mailto:georges.raci...@octobus.net>> wrote:
>>
>> - Mercurial native mode (very experimental, don't activate at all on
>> production servers).
>>
>>     It's what we've always wanted to do. We hope it can be made the
>> default mode by the end of the year.
>
> That sounds like a huge milestone. Congrats!

Thank you, much appreciated!

Now I must insist that the native mode is not even ready enough to put a
test project on a production instance. It's disabled by default, of course.

But we've started to dogfood it. Here's a native project:
https://foss.heptapod.net/mercurial/testhelpers

Also, in this first incarnation, the inner conversion to Git still
happens, but it's completely contained to the lower layers. I could
elaborate later on that elsewhere.

Cheers,

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Heptapod 0.17 released

2020-11-23 Thread Georges Racinet
Dear Mercurial users and developers,

we're happy to announce the release of Heptapod 0.17.0 last Friday [1].
The flagship instances, https://foss.heptapod.net and
https://heptapod.host were upgraded on the same day.

This version drops support for Python 2 on the server side, is based on
GitLab 13.4, Mercurial 5.5.2 and hg-evolve 10.1.0, and has several
"technology previews", notably:

- Git repositories (don't host production projects yet).

    Of course, the point of Heptapod is to host Mercurial repos, but
that should make life simpler for the many among our users that *also*
have Git repositories.

- Mercurial native mode (very experimental, don't activate at all on
production servers).

    It's what we've always wanted to do. We hope it can be made the
default mode by the end of the year.

Many more details, including some sreenshots are provided by the
0.17.0rc1 announcement [2].

Overall an important release for us, we hope you'll enjoy it.

Best,


[1] https://heptapod.net/heptapod-0.17.0.html#heptapod-0.17.0

[2]
https://heptapod.net/heptapod-0170rc1-released-with-3-tech-previews.html#heptapod-0170rc1-released-with-3-tech-previews



-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Improving the CI experience

2020-09-29 Thread Georges Racinet
ck
tests about 4 minutes). I don't know yet if it would be easy to get fast
feedback about them, just saying there's room for follow-up improvements.

[1] https://buildbot.mercurial-scm.org/builders/hg%20tests/builds/3078

[2] https://foss.heptapod.net/octobus/mercurial-devel/-/pipelines/10794

> However, the Buildbot still has value since it covers cases that our
> CI still does not cover and not everyone is using Heptapod, so let's
> keep it around for a while.
>
Yes these are complementary. However, if we get to the point where the
Heptapod pipelines run after each push to hg-committed and it works
smoothly, we could consider lightening the weight on the buildbot by
removing redundant builds.

> To further drive the point home, I'm not comfortable recommending
> Heptapod for the project or anything of that sort yet, even ruling out
> personal preferences for workflow. Maybe we can have that discussion
> in a few months, who knows.

I broadly agree. Nevertheless, as the maintainer of Heptapod, I'm always
interested in use cases and thought experiments, as they could turn out
to be useful for other projects.
>
> The fact that the Buildbot runs *after* the patches have landed makes
> me sad, but that's an issue for another day probably!
>
Yes, pre-landing CI has tremendous value, and I think it had something
to do with the success of the GitHub platform. Still, there are uses for
post-landing CI : very expensive and long stuff (fuzzing, load tests…),
exotic hardware you won't be allowed to control yourself, etc. Better to
be able to automate them than not at all.

Best

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics



signature.asc
Description: OpenPGP digital signature
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Heptapod news: doc sprint, versions 0.14 and 0.15

2020-07-17 Thread Georges Racinet
Hello everybody,

Here are some news about Heptapod, the friendly fork of GitLab Community
Edition supporting Mercurial.

== Documentation virtual sprint ==

    We're organizing an informal single day sprint about documentation
and user messages. It will happen on the last week of July and doesn't
require a full developer setup.

    There is a poll to choose the exact date at
https://framadate.org/heptapod-vsprint-2020-07

    Everybody is welcome to join us on https://mattermost.heptapod.net.
The event will be on an European time zone, but don't hesitate to attend
even if you can only be there for a couple of hours.

    We intend to make more such virtual sprints in the future and we're
open to set them in other time zones if there are enough interested
attendees.

== Current Heptapod 0.14 is based on a supported GitLab version ==

    Since version 0.14.0 (released on June, 30th), Heptapod has been
running on top of the currently supported GitLab 12.10. With Python 3.8,
Mercurial 5.4 and hg-evolve 10.0, I'm glad to say there is currently
only supported software in Heptapod by default.

    More details at
https://heptapod.net/heptapod-0.14.0.html#heptapod-0.14.0

    Today saw the release of Heptapod 0.14.2, featuring in particular
Mercurial 5.4.2 and GitLab 12.10.14.

    The community instance for Free and Open Source Software
(foss.heptapod.net) and the commercial service (heptapod.host) are both
running 0.14.2,

== Heptapod 0.15 around the corner ==

    GitLab 12.10 will reach its end of life on 2020-07-22 (next
Wednesday), so we have to hurry up for next version: Heptapod 0.15 will
be based on GitLab 13.1, whose end of life is 2020-09-22.

    In fact, 0.15.0rc1 was scheduled for today, but I've decided to play
safe and postpone it to the beginning of next week.

    We'll probably stay a few days on GitLab 12.10 after it becomes
obsolete. Once Heptapod 0.15 is out, we'll have almost two months of
stability, during which we'll turn ourselves a bit more towards
Mercurial features and user experience improvements.

Best,

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics




signature.asc
Description: OpenPGP digital signature
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Heptapod 0.13.0 "dual python" released

2020-06-02 Thread Georges Racinet
Dear Mercurial users and developers.

we're happy to announce the release of Heptapod 0.13.0, the adaptation
of GitLab CE for Mercurial. This new version ships Mercurial 5.4,
supports both Python 2 and 3, brings in support for wikis and more.

The full announcement is at
https://heptapod.net/heptapod-0.13.0.html#heptapod-0.13.0.

We now have an indicative roadmap page [1] that covers the next few
versions up to Heptapod 1.0. As always, feedback is very much welcomed.

We encourage people managing their own instances to try Python 3 as much
as they can. There's an easy way to fallback to Python 2 if needed [2].

The flagship instances, https://foss.heptapod.net and
https://heptapod.host, are running Heptapod 0.13.0 already.

Cheers,

[1] https://heptapod.net/pages/roadmap.html#roadmap

[2] detailed page about Python 3:
https://heptapod.net/py3-heptapod-0-13.html#py3-heptapod-0-13.


-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics




signature.asc
Description: OpenPGP digital signature
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Extensions: py3 compatibility metadata

2020-05-13 Thread Georges Racinet
Hi all,

I don't think we have anything for third party extensions to claim their
compatibility with Mercurial on Python 3.

Would it be a good idea to add new metadata, such as `py3testedwith` or
`py3minimumhgversion`?

I personally would find the latter to be enough.

Cheers,

-- 
Georges Racinet
https://octobus.net, https://about.heptapod.host, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics




signature.asc
Description: OpenPGP digital signature
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Proposed PEP on Python C API

2020-04-14 Thread Georges Racinet
On 4/10/20 9:04 PM, Gregory Szorc wrote:
> https://mail.python.org/archives/list/python-...@python.org/thread/HKM774XKU7DPJNLUTYHUB5U6VR6EQMJF/
>
>
> This is very relevant to Mercurial’s interests, since some of the
> changes might adversely affect Mercurial performance, as we rely on
> various internal APIs not part of the stable ABI.

For the record, I'm also following that from the rust-cpython side. Do
you have something specific in mind about our current C extensions ?

I pointed Victor to your (merged) PR for PEP-587 initialization APIs
[1], gave a bit of context about PyOxydizer and how it's going to be
central in our Windows packaging.

Long term, there's the perspective that our Rust extensions would be
naturally compatible with PyPy thanks to the HPy project [2]. It's quite
experimental for now,  but I'm very excited about this. 

Best,

[1] https://github.com/dgrunwald/rust-cpython/pull/211

[2] https://github.com/pyhandle

-- 
Georges Racinet
https://octobus.net, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics




signature.asc
Description: OpenPGP digital signature
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: mercurial.host, the Heptapod commercial service

2020-04-11 Thread Georges Racinet
Hi,

On 3/23/20 3:23 PM, Augie Fackler wrote:
> (-mercurial@)
>
>> On Mar 22, 2020, at 17:29, Georges Racinet  
>> wrote:
>>
>> Hi everybody,
>>
>> Heptapod [1] has made really good progress lately. As a result, we're
>> almost ready to launch the long awaited public commercial service [2].
>>
>> We're planning to launch it as https://mercurial.host around the middle
>> of *this week*.
>>
>> We've already bought that domain name, but as a courtesy to the
>> community, we wanted to ask if there were strong objections to us using
>> it for these purposes.
>>
>> We'll be giving more details both about the upcoming new Heptapod
>> version and the commercial service in separate messages. In the present
>> one, I wanted to focus on the domain name.
>>
>> So, what do you think ?
> I guess I'm kind of -0: it kind of makes it sound like it's "blessed" by the 
> upstream community as _the_ hosting provider to me. I suppose if 
> mercurial.host redirected to some other name for the service that doesn't 
> really bug me, since the name wouldn't just be "mercurial host"?

Thanks for the feedback, we settled for heptapod.host for the permanent
domain name, and we did exactly as you were writing there:
https://mercurial.host is a simple redirection. We'll see how this goes.

Take care,


-- 
Georges Racinet
https://octobus.net, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics




signature.asc
Description: OpenPGP digital signature
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Free public beta of heptapod.host opening now.

2020-04-11 Thread Georges Racinet
Hi everybody.

Octobus and Clever Cloud are launching a public Mercurial hosting
service based on Heptapod. This service is now entering a public beta
phase that will last until June 1st 2020. It will be entirely free of
charge while in beta.

You may check it out at https://about.heptapod.host (don't miss the
detailed registration instructions). The service itself runs on
https://heptapod.host

There is a draft of the pricing policy [1] that we have in mind for when
the beta phase will be over. It's there because we like to do things in
the open, and because we need to get your feedback. So, please take it
as a request for comments, and come talk to us [2] about it. I'm
confident we can build a successful and fair priced platform for all use
cases.

Of course, those of you that need to leave Bitbucket ASAP are especially
close to our mind. Therefore, heptapod.host is ready from the onset to
import your repositories from Bitbucket. The tutorial about that on
Heptapod's website [3] will guide you through that process.

Don't forget also that Free and Open Source Software projects are
welcome on https://foss.heptapod.net, which is a non commercial and
community managed instance.

As far as the underlying software is concerned, heptapod.host is running
Heptapod 0.12.0, which has been released on April 6th [4] and is
arguably the largest improvement of Heptapod in a single release.

All the best,


[1] https://about.heptapod.host/pricing.html

[2] contact at heptapod.host (see also
https://heptapod.net/pages/contact-us.html#contact-us)

[3] https://heptapod.net/pages/tuto-bb-import.html

[4]
https://heptapod.net/heptapod-0120-released-featuring-gitlab-12.html#heptapod-0120-released-featuring-gitlab-12


-- 
Georges Racinet
https://octobus.net, https://heptapod.net, https://about.heptapod.host
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics




signature.asc
Description: OpenPGP digital signature
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: mercurial.host, the Heptapod commercial service

2020-03-24 Thread Georges Racinet
Hi,

On 3/23/20 3:53 PM, Anton Shestakov wrote:
> On Mon, 23 Mar 2020 10:23:05 -0400
> Augie Fackler  wrote:
>
>>> On Mar 22, 2020, at 17:29, Georges Racinet  
>>> wrote:
>>>
>>> We've already bought that domain name, but as a courtesy to the
>>> community, we wanted to ask if there were strong objections to us using
>>> it for these purposes.
>> I guess I'm kind of -0: it kind of makes it sound like it's "blessed" by the 
>> upstream community as _the_ hosting provider to me. I suppose if 
>> mercurial.host redirected to some other name for the service that doesn't 
>> really bug me, since the name wouldn't just be "mercurial host"?
> Or an "index" page, having a list of hosting options from
> MercurialHosting wiki page (and/or having a link to said wiki page).

Thanks for these quick replies. Is it fair to summarize that each of you
isn't really in favor to it, while not very strongly opposed either?

-- 
Georges Racinet
https://octobus.net, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics




signature.asc
Description: OpenPGP digital signature
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


mercurial.host, the Heptapod commercial service

2020-03-22 Thread Georges Racinet
Hi everybody,

Heptapod [1] has made really good progress lately. As a result, we're
almost ready to launch the long awaited public commercial service [2].

We're planning to launch it as https://mercurial.host around the middle
of *this week*.

We've already bought that domain name, but as a courtesy to the
community, we wanted to ask if there were strong objections to us using
it for these purposes.

We'll be giving more details both about the upcoming new Heptapod
version and the commercial service in separate messages. In the present
one, I wanted to focus on the domain name.

So, what do you think ?

Take care,


[1] for those who wouldn't know what I'm talking about, see
https://heptapod.net

[2] see https://clever-cloud.com/en/heptapod

-- 
Georges Racinet
https://octobus.net, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics




signature.asc
Description: OpenPGP digital signature
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8161: rust-nodemap: also clear Rust data in `clearcaches`

2020-03-11 Thread gracinet (Georges Racinet)
Closed by commit rHGcadcc8c20860: rust-nodemap: also clear Rust data in 
`clearcaches` (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8161?vs=20660=20699

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8161/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8161

AFFECTED FILES
  rust/hg-cpython/src/revlog.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/revlog.rs b/rust/hg-cpython/src/revlog.rs
--- a/rust/hg-cpython/src/revlog.rs
+++ b/rust/hg-cpython/src/revlog.rs
@@ -161,8 +161,12 @@
 self.call_cindex(py, "commonancestorsheads", args, kw)
 }
 
-/// clear the index caches
+/// Clear the index caches and inner py_class data.
+/// It is Python's responsibility to call `update_nodemap_data` again.
 def clearcaches(, *args, **kw) -> PyResult {
+self.nt(py).borrow_mut().take();
+self.docket(py).borrow_mut().take();
+self.mmap(py).borrow_mut().take();
 self.call_cindex(py, "clearcaches", args, kw)
 }
 
@@ -231,7 +235,7 @@
 // `index_getitem` does not handle conversion from PyLong,
 // which expressions such as [e for e in index] internally use.
 // Note that we don't seem to have a direct way to call
-// PySequence_GetItem (does the job), which would be better for
+// PySequence_GetItem (does the job), which would possibly be better
 // for performance
 let key = match key.extract::(py) {
 Ok(rev) => rev.to_py_object(py).into_object(),



To: Alphare, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8160: rust-nodemap: add binding to `nodemap_update_data`

2020-03-11 Thread gracinet (Georges Racinet)
Closed by commit rHG15febf99a9c6: rust-nodemap: add binding to 
`nodemap_update_data` (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8160?vs=20659=20698

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8160/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8160

AFFECTED FILES
  rust/hg-cpython/src/revlog.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/revlog.rs b/rust/hg-cpython/src/revlog.rs
--- a/rust/hg-cpython/src/revlog.rs
+++ b/rust/hg-cpython/src/revlog.rs
@@ -10,6 +10,7 @@
 utils::{node_from_py_bytes, node_from_py_object},
 };
 use cpython::{
+buffer::{Element, PyBuffer},
 exc::{IndexError, ValueError},
 ObjectProtocol, PyBytes, PyClone, PyDict, PyErr, PyModule, PyObject,
 PyResult, PyString, PyTuple, Python, PythonObject, ToPyObject,
@@ -36,6 +37,8 @@
 data cindex: RefCell;
 data nt: RefCell>;
 data docket: RefCell>;
+// Holds a reference to the mmap'ed persistent nodemap data
+data mmap: RefCell>;
 
 def __new__(_cls, cindex: PyObject) -> PyResult {
 Self::new(py, cindex)
@@ -268,6 +271,14 @@
 def nodemap_data_incremental() -> PyResult {
 self.inner_nodemap_data_incremental(py)
 }
+def update_nodemap_data(
+,
+docket: PyObject,
+nm_data: PyObject
+) -> PyResult {
+self.inner_update_nodemap_data(py, docket, nm_data)
+}
+
 
 });
 
@@ -278,6 +289,7 @@
 RefCell::new(cindex::Index::new(py, cindex)?),
 RefCell::new(None),
 RefCell::new(None),
+RefCell::new(None),
 )
 }
 
@@ -374,6 +386,56 @@
 .to_py_object(py)
 .into_object())
 }
+
+/// Update the nodemap from the new (mmaped) data.
+/// The docket is kept as a reference for later incremental calls.
+fn inner_update_nodemap_data(
+,
+py: Python,
+docket: PyObject,
+nm_data: PyObject,
+) -> PyResult {
+let buf = PyBuffer::get(py, _data)?;
+let len = buf.item_count();
+
+// Build a slice from the mmap'ed buffer data
+let cbuf = buf.buf_ptr();
+let bytes = if std::mem::size_of::() == buf.item_size()
+&& buf.is_c_contiguous()
+&& u8::is_compatible_format(buf.format())
+{
+unsafe { std::slice::from_raw_parts(cbuf as *const u8, len) }
+} else {
+return Err(PyErr::new::(
+py,
+"Nodemap data buffer has an invalid memory representation"
+.to_string(),
+));
+};
+
+// Keep a reference to the mmap'ed buffer, otherwise we get a dangling
+// pointer.
+self.mmap(py).borrow_mut().replace(buf);
+
+let mut nt = NodeTree::load_bytes(Box::new(bytes), len);
+
+let data_tip =
+docket.getattr(py, "tip_rev")?.extract::(py)?;
+self.docket(py).borrow_mut().replace(docket.clone_ref(py));
+let idx = self.cindex(py).borrow();
+let current_tip = idx.len();
+
+for r in (data_tip + 1)..current_tip as Revision {
+let rev = r as Revision;
+// in this case node() won't ever return None
+nt.insert(&*idx, idx.node(rev).unwrap(), rev)
+.map_err(|e| nodemap_error(py, e))?
+}
+
+*self.nt(py).borrow_mut() = Some(nt);
+
+Ok(py.None())
+}
 }
 
 fn revlog_error(py: Python) -> PyErr {



To: Alphare, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8159: rust-nodemap: add binding for `nodemap_data_incremental`

2020-03-11 Thread gracinet (Georges Racinet)
Closed by commit rHG5bbf887275b0: rust-nodemap: add binding for 
`nodemap_data_incremental` (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8159?vs=20658=20697

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8159/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8159

AFFECTED FILES
  rust/hg-cpython/src/revlog.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/revlog.rs b/rust/hg-cpython/src/revlog.rs
--- a/rust/hg-cpython/src/revlog.rs
+++ b/rust/hg-cpython/src/revlog.rs
@@ -15,7 +15,7 @@
 PyResult, PyString, PyTuple, Python, PythonObject, ToPyObject,
 };
 use hg::{
-nodemap::{NodeMapError, NodeTree},
+nodemap::{Block, NodeMapError, NodeTree},
 revlog::{nodemap::NodeMap, RevlogIndex},
 NodeError, Revision,
 };
@@ -35,6 +35,7 @@
 py_class!(pub class MixedIndex |py| {
 data cindex: RefCell;
 data nt: RefCell>;
+data docket: RefCell>;
 
 def __new__(_cls, cindex: PyObject) -> PyResult {
 Self::new(py, cindex)
@@ -264,6 +265,9 @@
 self.inner_nodemap_data_all(py)
 }
 
+def nodemap_data_incremental() -> PyResult {
+self.inner_nodemap_data_incremental(py)
+}
 
 });
 
@@ -273,6 +277,7 @@
 py,
 RefCell::new(cindex::Index::new(py, cindex)?),
 RefCell::new(None),
+RefCell::new(None),
 )
 }
 
@@ -347,6 +352,28 @@
 let bytes = PyBytes::new(py, );
 Ok(bytes)
 }
+
+/// Returns the last saved docket along with the size of any changed data
+/// (in number of blocks), and said data as bytes.
+fn inner_nodemap_data_incremental(
+,
+py: Python,
+) -> PyResult {
+let docket = self.docket(py).borrow();
+let docket = match docket.as_ref() {
+Some(d) => d,
+None => return Ok(py.None()),
+};
+
+let node_tree = self.get_nodetree(py)?.borrow_mut().take().unwrap();
+let masked_blocks = node_tree.masked_readonly_blocks();
+let (_, data) = node_tree.into_readonly_and_added_bytes();
+let changed = masked_blocks * std::mem::size_of::();
+
+Ok((docket, changed, PyBytes::new(py, ))
+.to_py_object(py)
+.into_object())
+}
 }
 
 fn revlog_error(py: Python) -> PyErr {



To: Alphare, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8158: rust-nodemap: add binding for `nodemap_data_all`

2020-03-11 Thread gracinet (Georges Racinet)
Closed by commit rHGb581231ae9d1: rust-nodemap: add binding for 
`nodemap_data_all` (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8158?vs=20657=20696

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8158/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8158

AFFECTED FILES
  rust/hg-cpython/src/revlog.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/revlog.rs b/rust/hg-cpython/src/revlog.rs
--- a/rust/hg-cpython/src/revlog.rs
+++ b/rust/hg-cpython/src/revlog.rs
@@ -260,6 +260,10 @@
 }
 }
 
+def nodemap_data_all() -> PyResult {
+self.inner_nodemap_data_all(py)
+}
+
 
 });
 
@@ -320,6 +324,29 @@
 pub fn clone_cindex(, py: Python) -> cindex::Index {
 self.cindex(py).borrow().clone_ref(py)
 }
+
+/// Returns the full nodemap bytes to be written as-is to disk
+fn inner_nodemap_data_all(, py: Python) -> PyResult {
+let nodemap = self.get_nodetree(py)?.borrow_mut().take().unwrap();
+let (readonly, bytes) = nodemap.into_readonly_and_added_bytes();
+
+// If there's anything readonly, we need to build the data again from
+// scratch
+let bytes = if readonly.len() > 0 {
+let mut nt = NodeTree::load_bytes(Box::new(vec![]), 0);
+self.fill_nodemap(py,  nt)?;
+
+let (readonly, bytes) = nt.into_readonly_and_added_bytes();
+assert_eq!(readonly.len(), 0);
+
+bytes
+} else {
+bytes
+};
+
+let bytes = PyBytes::new(py, );
+Ok(bytes)
+}
 }
 
 fn revlog_error(py: Python) -> PyErr {



To: Alphare, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8152: revlog: using two new functions in C capsule from Rust code

2020-03-11 Thread gracinet (Georges Racinet)
Closed by commit rHG166349510398: revlog: using two new functions in C capsule 
from Rust code (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8152?vs=20678=20690

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8152/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8152

AFFECTED FILES
  mercurial/cext/revlog.c
  rust/hg-core/src/revlog/node.rs
  rust/hg-cpython/src/cindex.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/cindex.rs b/rust/hg-cpython/src/cindex.rs
--- a/rust/hg-cpython/src/cindex.rs
+++ b/rust/hg-cpython/src/cindex.rs
@@ -11,14 +11,21 @@
 //! but this will take some time to get there.
 
 use cpython::{exc::ImportError, PyClone, PyErr, PyObject, PyResult, Python};
+use hg::revlog::{Node, RevlogIndex};
 use hg::{Graph, GraphError, Revision, WORKING_DIRECTORY_REVISION};
 use libc::c_int;
 
-const REVLOG_CABI_VERSION: c_int = 1;
+const REVLOG_CABI_VERSION: c_int = 2;
 
 #[repr(C)]
 pub struct Revlog_CAPI {
 abi_version: c_int,
+index_length:
+unsafe extern "C" fn(index: *mut revlog_capi::RawPyObject) -> c_int,
+index_node: unsafe extern "C" fn(
+index: *mut revlog_capi::RawPyObject,
+rev: c_int,
+) -> *const Node,
 index_parents: unsafe extern "C" fn(
 index: *mut revlog_capi::RawPyObject,
 rev: c_int,
@@ -131,3 +138,30 @@
 }
 }
 }
+
+impl RevlogIndex for Index {
+/// Note C return type is Py_ssize_t (hence signed), but we shall
+/// force it to unsigned, because it's a length
+fn len() -> usize {
+unsafe { (self.capi.index_length)(self.index.as_ptr()) as usize }
+}
+
+fn node<'a>(&'a self, rev: Revision) -> Option<&'a Node> {
+let raw = unsafe {
+(self.capi.index_node)(self.index.as_ptr(), rev as c_int)
+};
+if raw.is_null() {
+None
+} else {
+// TODO it would be much better for the C layer to give us
+// a length, since the hash length will change in the near
+// future, but that's probably out of scope for the nodemap
+// patch series.
+//
+// The root of that unsafety relies in the signature of
+// `capi.index_node()` itself: returning a `Node` pointer
+// whereas it's a `char *` in the C counterpart.
+Some(unsafe { &*raw })
+}
+}
+}
diff --git a/rust/hg-core/src/revlog/node.rs b/rust/hg-core/src/revlog/node.rs
--- a/rust/hg-core/src/revlog/node.rs
+++ b/rust/hg-core/src/revlog/node.rs
@@ -44,6 +44,7 @@
 /// [`nybbles_len`]: #method.nybbles_len
 /// [`ExactLengthRequired`]: struct.NodeError#variant.ExactLengthRequired
 #[derive(Clone, Debug, PartialEq)]
+#[repr(transparent)]
 pub struct Node {
 data: NodeData,
 }
diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c
--- a/mercurial/cext/revlog.c
+++ b/mercurial/cext/revlog.c
@@ -39,6 +39,8 @@
 
 typedef struct {
int abi_version;
+   Py_ssize_t (*index_length)(const indexObject *);
+   const char *(*index_node)(indexObject *, Py_ssize_t);
int (*index_parents)(PyObject *, int, int *);
 } Revlog_CAPI;
 
@@ -2877,7 +2879,9 @@
 static Revlog_CAPI CAPI = {
 /* increment the abi_version field upon each change in the Revlog_CAPI
struct or in the ABI of the listed functions */
-1,
+2,
+index_length,
+index_node,
 HgRevlogIndex_GetParents,
 };
 



To: Alphare, #hg-reviewers
Cc: marmoute, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8156: rust-nodemap: add utils for propagating errors

2020-03-11 Thread gracinet (Georges Racinet)
Closed by commit rHG26dd35ac59b8: rust-nodemap: add utils for propagating 
errors (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8156?vs=20655=20694

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8156/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8156

AFFECTED FILES
  rust/hg-cpython/src/revlog.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/revlog.rs b/rust/hg-cpython/src/revlog.rs
--- a/rust/hg-cpython/src/revlog.rs
+++ b/rust/hg-cpython/src/revlog.rs
@@ -1,16 +1,16 @@
 // revlog.rs
 //
-// Copyright 2019 Georges Racinet 
+// Copyright 2019-2020 Georges Racinet 
 //
 // This software may be used and distributed according to the terms of the
 // GNU General Public License version 2 or any later version.
 
 use crate::cindex;
 use cpython::{
-ObjectProtocol, PyClone, PyDict, PyModule, PyObject, PyResult, PyTuple,
-Python, PythonObject, ToPyObject,
+exc::ValueError, ObjectProtocol, PyClone, PyDict, PyErr, PyModule,
+PyObject, PyResult, PyTuple, Python, PythonObject, ToPyObject,
 };
-use hg::Revision;
+use hg::{nodemap::NodeMapError, NodeError, Revision};
 use std::cell::RefCell;
 
 /// Return a Struct implementing the Graph trait
@@ -224,6 +224,43 @@
 }
 }
 
+fn revlog_error(py: Python) -> PyErr {
+match py
+.import("mercurial.error")
+.and_then(|m| m.get(py, "RevlogError"))
+{
+Err(e) => e,
+Ok(cls) => PyErr::from_instance(py, cls),
+}
+}
+
+fn rev_not_in_index(py: Python, rev: Revision) -> PyErr {
+PyErr::new::(
+py,
+format!(
+"Inconsistency: Revision {} found in nodemap \
+ is not in revlog index",
+rev
+),
+)
+}
+
+/// Standard treatment of NodeMapError
+fn nodemap_error(py: Python, err: NodeMapError) -> PyErr {
+match err {
+NodeMapError::MultipleResults => revlog_error(py),
+NodeMapError::RevisionNotInIndex(r) => rev_not_in_index(py, r),
+NodeMapError::InvalidNodePrefix(s) => invalid_node_prefix(py, ),
+}
+}
+
+fn invalid_node_prefix(py: Python, ne: ) -> PyErr {
+PyErr::new::(
+py,
+format!("Invalid node or prefix: {:?}", ne),
+)
+}
+
 /// Create the module, with __package__ given from parent
 pub fn init_module(py: Python, package: ) -> PyResult {
 let dotted_name = !("{}.revlog", package);



To: Alphare, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8155: rust-nodemap: add utils to create `Node`s from Python objects

2020-03-11 Thread gracinet (Georges Racinet)
Closed by commit rHGd738b7a18438: rust-nodemap: add utils to create `Node`s 
from Python objects (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8155?vs=20654=20693

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8155/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8155

AFFECTED FILES
  rust/hg-cpython/src/utils.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/utils.rs b/rust/hg-cpython/src/utils.rs
--- a/rust/hg-cpython/src/utils.rs
+++ b/rust/hg-cpython/src/utils.rs
@@ -1,4 +1,7 @@
-use cpython::{PyDict, PyObject, PyResult, PyTuple, Python};
+use cpython::exc::ValueError;
+use cpython::{PyBytes, PyDict, PyErr, PyObject, PyResult, PyTuple, Python};
+use hg::revlog::Node;
+use std::convert::TryFrom;
 
 #[allow(unused)]
 pub fn print_python_trace(py: Python) -> PyResult {
@@ -11,3 +14,34 @@
 kwargs.set_item(py, "file", sys.get(py, "stderr")?)?;
 traceback.call(py, "print_stack", PyTuple::new(py, &[]), Some())
 }
+
+// Necessary evil for the time being, could maybe be moved to
+// a TryFrom in Node itself
+const NODE_BYTES_LENGTH: usize = 20;
+type NodeData = [u8; NODE_BYTES_LENGTH];
+
+/// Copy incoming Python bytes given as `PyObject` into `Node`,
+/// doing the necessary checks
+pub fn node_from_py_object<'a>(
+py: Python,
+bytes: &'a PyObject,
+) -> PyResult {
+let as_py_bytes: &'a PyBytes = bytes.extract(py)?;
+node_from_py_bytes(py, as_py_bytes)
+}
+
+/// Clone incoming Python bytes given as `PyBytes` as a `Node`,
+/// doing the necessary checks.
+pub fn node_from_py_bytes<'a>(
+py: Python,
+bytes: &'a PyBytes,
+) -> PyResult {
+::try_from(bytes.data(py))
+.map_err(|_| {
+PyErr::new::(
+py,
+format!("{}-byte hash required", NODE_BYTES_LENGTH),
+)
+})
+.map(|n| n.into())
+}



To: Alphare, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8154: rust-index: add `append` method to cindex/Index

2020-03-11 Thread gracinet (Georges Racinet)
Closed by commit rHGcefd130c98be: rust-index: add `append` method to 
cindex/Index (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8154?vs=20653=20692

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8154/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8154

AFFECTED FILES
  rust/hg-cpython/src/cindex.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/cindex.rs b/rust/hg-cpython/src/cindex.rs
--- a/rust/hg-cpython/src/cindex.rs
+++ b/rust/hg-cpython/src/cindex.rs
@@ -10,7 +10,10 @@
 //! Ideally, we should use an Index entirely implemented in Rust,
 //! but this will take some time to get there.
 
-use cpython::{exc::ImportError, PyClone, PyErr, PyObject, PyResult, Python};
+use cpython::{
+exc::ImportError, ObjectProtocol, PyClone, PyErr, PyObject, PyResult,
+PyTuple, Python, PythonObject,
+};
 use hg::revlog::{Node, RevlogIndex};
 use hg::{Graph, GraphError, Revision, WORKING_DIRECTORY_REVISION};
 use libc::c_int;
@@ -97,6 +100,15 @@
 pub fn inner() ->  {
 
 }
+
+pub fn append( self, py: Python, tup: PyTuple) -> PyResult {
+self.index.call_method(
+py,
+"append",
+PyTuple::new(py, &[tup.into_object()]),
+None,
+)
+}
 }
 
 impl Clone for Index {



To: Alphare, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8153: rust-index: moved constructor in separate impl block

2020-03-11 Thread gracinet (Georges Racinet)
Closed by commit rHG887d0f921b34: rust-index: moved constructor in separate 
impl block (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8153?vs=20652=20691

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8153/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8153

AFFECTED FILES
  rust/hg-cpython/src/revlog.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/revlog.rs b/rust/hg-cpython/src/revlog.rs
--- a/rust/hg-cpython/src/revlog.rs
+++ b/rust/hg-cpython/src/revlog.rs
@@ -28,8 +28,7 @@
 data cindex: RefCell;
 
 def __new__(_cls, cindex: PyObject) -> PyResult {
-Self::create_instance(py, RefCell::new(
-cindex::Index::new(py, cindex)?))
+Self::new(py, cindex)
 }
 
 /// Compatibility layer used for Python consumers needing access to the C 
index
@@ -199,6 +198,13 @@
 });
 
 impl MixedIndex {
+fn new(py: Python, cindex: PyObject) -> PyResult {
+Self::create_instance(
+py,
+RefCell::new(cindex::Index::new(py, cindex)?),
+)
+}
+
 /// forward a method call to the underlying C index
 fn call_cindex(
 ,



To: Alphare, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7819: rust-nodemap: core implementation for shortest

2020-02-26 Thread gracinet (Georges Racinet)
Closed by commit rHG0e8a9c0fbc8c: rust-nodemap: core implementation for 
shortest (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7819?vs=20283=20313

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7819/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7819

AFFECTED FILES
  rust/hg-core/src/revlog/node.rs
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,6 +17,7 @@
 RevlogIndex, NULL_REVISION,
 };
 
+use std::cmp::max;
 use std::fmt;
 use std::mem;
 use std::ops::Deref;
@@ -98,6 +99,42 @@
 ) -> Result, NodeMapError> {
 self.find_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
 }
+
+/// Give the size of the shortest node prefix that determines
+/// the revision uniquely.
+///
+/// From a binary node prefix, if it is matched in the node map, this
+/// returns the number of hexadecimal digits that would had sufficed
+/// to find the revision uniquely.
+///
+/// Returns `None` if no `Revision` could be found for the prefix.
+///
+/// If several Revisions match the given prefix, a [`MultipleResults`]
+/// error is returned.
+fn unique_prefix_len_bin<'a>(
+,
+idx:  RevlogIndex,
+node_prefix: NodePrefixRef<'a>,
+) -> Result, NodeMapError>;
+
+/// Same as `unique_prefix_len_bin`, with the hexadecimal representation
+/// of the prefix as input.
+fn unique_prefix_len_hex(
+,
+idx:  RevlogIndex,
+prefix: ,
+) -> Result, NodeMapError> {
+self.unique_prefix_len_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
+}
+
+/// Same as `unique_prefix_len_bin`, with a full `Node` as input
+fn unique_prefix_len_node(
+,
+idx:  RevlogIndex,
+node: ,
+) -> Result, NodeMapError> {
+self.unique_prefix_len_bin(idx, node.into())
+}
 }
 
 pub trait MutableNodeMap: NodeMap {
@@ -279,20 +316,24 @@
 fn validate_candidate(
 idx:  RevlogIndex,
 prefix: NodePrefixRef,
-rev: Option,
-) -> Result, NodeMapError> {
-if prefix.is_prefix_of(_NODE) {
-// NULL_REVISION always matches a prefix made only of zeros
+candidate: (Option, usize),
+) -> Result<(Option, usize), NodeMapError> {
+let (rev, steps) = candidate;
+if let Some(nz_nybble) = prefix.first_different_nybble(_NODE) {
+rev.map_or(Ok((None, steps)), |r| {
+has_prefix_or_none(idx, prefix, r)
+.map(|opt| (opt, max(steps, nz_nybble + 1)))
+})
+} else {
+// the prefix is only made of zeros; NULL_REVISION always matches it
 // and any other *valid* result is an ambiguity
 match rev {
-None => Ok(Some(NULL_REVISION)),
+None => Ok((Some(NULL_REVISION), steps + 1)),
 Some(r) => match has_prefix_or_none(idx, prefix, r)? {
-None => Ok(Some(NULL_REVISION)),
+None => Ok((Some(NULL_REVISION), steps + 1)),
 _ => Err(NodeMapError::MultipleResults),
 },
 }
-} else {
-rev.map_or(Ok(None), |r| has_prefix_or_none(idx, prefix, r))
 }
 }
 
@@ -387,13 +428,26 @@
 }
 
 /// Main working method for `NodeTree` searches
-fn lookup<'p>(
+///
+/// The first returned value is the result of analysing `NodeTree` data
+/// *alone*: whereas `None` guarantees that the given prefix is absent
+/// from the `NodeTree` data (but still could match `NULL_NODE`), with
+/// `Some(rev)`, it is to be understood that `rev` is the unique `Revision`
+/// that could match the prefix. Actually, all that can be inferred from
+/// the `NodeTree` data is that `rev` is the revision with the longest
+/// common node prefix with the given prefix.
+///
+/// The second returned value is the size of the smallest subprefix
+/// of `prefix` that would give the same result, i.e. not the
+/// `MultipleResults` error variant (again, using only the data of the
+/// `NodeTree`).
+fn lookup(
 ,
-prefix: NodePrefixRef<'p>,
-) -> Result, NodeMapError> {
-for visit_item in self.visit(prefix) {
+prefix: NodePrefixRef,
+) -> Result<(Option, usize), NodeMapError> {
+for (i, visit_item) in self.visit(prefix).enumerate() {
 if let Some(opt) = visit_item.final_revision() {
-return Ok(opt);
+return Ok((opt, i + 1));
 }
 }
 Err(NodeMapError::MultipleResults)
@@ -638,6 +692,16 @@
 prefix: NodePrefixRef<'a>,
 ) -> Result, NodeMapError> {
 

D8097: rust-nodemap: accounting for dead blocks

2020-02-26 Thread gracinet (Georges Racinet)
Closed by commit rHG706c1bd7f71c: rust-nodemap: accounting for dead blocks 
(authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8097?vs=20284=20315

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8097/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8097

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -274,6 +274,7 @@
 readonly: Box + Send>,
 growable: Vec,
 root: Block,
+masked_inner_blocks: usize,
 }
 
 impl Index for NodeTree {
@@ -350,6 +351,7 @@
 readonly: readonly,
 growable: Vec::new(),
 root: root,
+masked_inner_blocks: 0,
 }
 }
 
@@ -483,6 +485,7 @@
 let ro_len = ro_blocks.len();
 let glen = self.growable.len();
 if idx < ro_len {
+self.masked_inner_blocks += 1;
 // TODO OPTIM I think this makes two copies
 self.growable.push(ro_blocks[idx].clone());
 (glen + ro_len,  self.growable[glen], glen + 1)
@@ -571,6 +574,22 @@
 }
 Ok(())
 }
+
+/// Return the number of blocks in the readonly part that are currently
+/// masked in the mutable part.
+///
+/// The `NodeTree` structure has no efficient way to know how many blocks
+/// are already unreachable in the readonly part.
+pub fn masked_readonly_blocks() -> usize {
+if let Some(readonly_root) = self.readonly.last() {
+if readonly_root ==  {
+return 0;
+}
+} else {
+return 0;
+}
+self.masked_inner_blocks + 1
+}
 }
 
 pub struct NodeTreeBytes {
@@ -853,6 +872,7 @@
 readonly: sample_nodetree().readonly,
 growable: vec![block![0: Rev(1), 5: Rev(3)]],
 root: block![0: Block(1), 1:Block(3), 12: Rev(2)],
+masked_inner_blocks: 1,
 };
 assert_eq!(nt.find_hex(, "10")?, Some(1));
 assert_eq!(nt.find_hex(, "c")?, Some(2));
@@ -861,6 +881,7 @@
 assert_eq!(nt.find_hex(, "000")?, Some(NULL_REVISION));
 assert_eq!(nt.unique_prefix_len_hex(, "000")?, Some(3));
 assert_eq!(nt.find_hex(, "01")?, Some(9));
+assert_eq!(nt.masked_readonly_blocks(), 2);
 Ok(())
 }
 
@@ -950,6 +971,8 @@
 assert_eq!(idx.find_hex("1a345")?, Some(3));
 assert_eq!(idx.find_hex("1a341")?, None);
 
+// there's no readonly block to mask
+assert_eq!(idx.nt.masked_readonly_blocks(), 0);
 Ok(())
 }
 
@@ -1011,6 +1034,8 @@
 assert_eq!(idx.find_hex("1235")?, Some(1));
 assert_eq!(idx.find_hex("131")?, Some(2));
 assert_eq!(idx.find_hex("cafe")?, Some(3));
+// we did not add anything since init from readonly
+assert_eq!(idx.nt.masked_readonly_blocks(), 0);
 
 idx.insert(4, "123A")?;
 assert_eq!(idx.find_hex("1234")?, Some(0));
@@ -1018,12 +1043,18 @@
 assert_eq!(idx.find_hex("131")?, Some(2));
 assert_eq!(idx.find_hex("cafe")?, Some(3));
 assert_eq!(idx.find_hex("123A")?, Some(4));
+// we masked blocks for all prefixes of "123", including the root
+assert_eq!(idx.nt.masked_readonly_blocks(), 4);
 
+eprintln!("{:?}", idx.nt);
 idx.insert(5, "c0")?;
 assert_eq!(idx.find_hex("cafe")?, Some(3));
 assert_eq!(idx.find_hex("c0")?, Some(5));
 assert_eq!(idx.find_hex("c1")?, None);
 assert_eq!(idx.find_hex("1234")?, Some(0));
+// inserting "c0" is just splitting the 'c' slot of the mutable root,
+// it doesn't mask anything
+assert_eq!(idx.nt.masked_readonly_blocks(), 4);
 
 Ok(())
 }



To: marmoute, #hg-reviewers, kevincox
Cc: kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8098: rust-nodemap: a method for full invalidation

2020-02-26 Thread gracinet (Georges Racinet)
Closed by commit rHGbbc61f36733c: rust-nodemap: a method for full invalidation 
(authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8098?vs=20285=20314

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8098/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8098

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -575,11 +575,25 @@
 Ok(())
 }
 
+/// Make the whole `NodeTree` logically empty, without touching the
+/// immutable part.
+pub fn invalidate_all( self) {
+self.root = Block::new();
+self.growable = Vec::new();
+self.masked_inner_blocks = self.readonly.len();
+}
+
 /// Return the number of blocks in the readonly part that are currently
 /// masked in the mutable part.
 ///
 /// The `NodeTree` structure has no efficient way to know how many blocks
 /// are already unreachable in the readonly part.
+///
+/// After a call to `invalidate_all()`, the returned number can be actually
+/// bigger than the whole readonly part, a conventional way to mean that
+/// all the readonly blocks have been masked. This is what is really
+/// useful to the caller and does not require to know how many were
+/// actually unreachable to begin with.
 pub fn masked_readonly_blocks() -> usize {
 if let Some(readonly_root) = self.readonly.last() {
 if readonly_root ==  {
@@ -1060,6 +1074,27 @@
 }
 
 #[test]
+fn test_invalidate_all() -> Result<(), NodeMapError> {
+let mut idx = TestNtIndex::new();
+idx.insert(0, "1234")?;
+idx.insert(1, "1235")?;
+idx.insert(2, "131")?;
+idx.insert(3, "cafe")?;
+let mut idx = idx.commit();
+
+idx.nt.invalidate_all();
+
+assert_eq!(idx.find_hex("1234")?, None);
+assert_eq!(idx.find_hex("1235")?, None);
+assert_eq!(idx.find_hex("131")?, None);
+assert_eq!(idx.find_hex("cafe")?, None);
+// all the readonly blocks have been masked, this is the
+// conventional expected response
+assert_eq!(idx.nt.masked_readonly_blocks(), idx.nt.readonly.len() + 1);
+Ok(())
+}
+
+#[test]
 fn test_into_added_empty() {
 assert!(sample_nodetree().into_readonly_and_added().1.is_empty());
 assert!(sample_nodetree()



To: marmoute, #hg-reviewers, kevincox
Cc: kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7797: rust-nodemap: pure Rust example

2020-02-26 Thread gracinet (Georges Racinet)
Closed by commit rHG57d5b0fb9740: rust-nodemap: pure Rust example (authored by 
gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7797?vs=20277=20311

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7797/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7797

AFFECTED FILES
  rust/Cargo.lock
  rust/hg-core/Cargo.toml
  rust/hg-core/examples/nodemap/index.rs
  rust/hg-core/examples/nodemap/main.rs

CHANGE DETAILS

diff --git a/rust/hg-core/examples/nodemap/main.rs 
b/rust/hg-core/examples/nodemap/main.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-core/examples/nodemap/main.rs
@@ -0,0 +1,146 @@
+// Copyright 2019-2020 Georges Racinet 
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+use clap::*;
+use hg::revlog::node::*;
+use hg::revlog::nodemap::*;
+use hg::revlog::*;
+use memmap::MmapOptions;
+use rand::Rng;
+use std::fs::File;
+use std::io;
+use std::io::Write;
+use std::path::{Path, PathBuf};
+use std::str::FromStr;
+use std::time::Instant;
+
+mod index;
+use index::Index;
+
+fn mmap_index(repo_path: ) -> Index {
+let mut path = PathBuf::from(repo_path);
+path.extend([".hg", "store", "00changelog.i"].iter());
+Index::load_mmap(path)
+}
+
+fn mmap_nodemap(path: ) -> NodeTree {
+let file = File::open(path).unwrap();
+let mmap = unsafe { MmapOptions::new().map().unwrap() };
+let len = mmap.len();
+NodeTree::load_bytes(Box::new(mmap), len)
+}
+
+/// Scan the whole index and create the corresponding nodemap file at `path`
+fn create(index: , path: ) -> io::Result<()> {
+let mut file = File::create(path)?;
+let start = Instant::now();
+let mut nm = NodeTree::default();
+for rev in 0..index.len() {
+let rev = rev as Revision;
+nm.insert(index, index.node(rev).unwrap(), rev).unwrap();
+}
+eprintln!("Nodemap constructed in RAM in {:?}", start.elapsed());
+file.write(_readonly_and_added_bytes().1)?;
+eprintln!("Nodemap written to disk");
+Ok(())
+}
+
+fn query(index: , nm: , prefix: ) {
+let start = Instant::now();
+let res = nm.find_hex(index, prefix);
+println!("Result found in {:?}: {:?}", start.elapsed(), res);
+}
+
+fn bench(index: , nm: , queries: usize) {
+let len = index.len() as u32;
+let mut rng = rand::thread_rng();
+let nodes: Vec = (0..queries)
+.map(|_| {
+index
+.node((rng.gen::() % len) as Revision)
+.unwrap()
+.clone()
+})
+.collect();
+if queries < 10 {
+let nodes_hex: Vec =
+nodes.iter().map(|n| n.encode_hex()).collect();
+println!("Nodes: {:?}", nodes_hex);
+}
+let mut last: Option = None;
+let start = Instant::now();
+for node in nodes.iter() {
+last = nm.find_bin(index, node.into()).unwrap();
+}
+let elapsed = start.elapsed();
+println!(
+"Did {} queries in {:?} (mean {:?}), last was {:?} with result {:?}",
+queries,
+elapsed,
+elapsed / (queries as u32),
+nodes.last().unwrap().encode_hex(),
+last
+);
+}
+
+fn main() {
+let matches = App::new("Nodemap pure Rust example")
+.arg(
+Arg::with_name("REPOSITORY")
+.help("Path to the repository, always necessary for its index")
+.required(true),
+)
+.arg(
+Arg::with_name("NODEMAP_FILE")
+.help("Path to the nodemap file, independent of REPOSITORY")
+.required(true),
+)
+.subcommand(
+SubCommand::with_name("create")
+.about("Create NODEMAP_FILE by scanning repository index"),
+)
+.subcommand(
+SubCommand::with_name("query")
+.about("Query NODEMAP_FILE for PREFIX")
+.arg(Arg::with_name("PREFIX").required(true)),
+)
+.subcommand(
+SubCommand::with_name("bench")
+.about(
+"Perform #QUERIES random successful queries on 
NODEMAP_FILE")
+.arg(Arg::with_name("QUERIES").required(true)),
+)
+.get_matches();
+
+let repo = matches.value_of("REPOSITORY").unwrap();
+let nm_path = matches.value_of("NODEMAP_FILE").unwrap();
+
+let index = mmap_index(::new(repo));
+
+if let Some(_) = matches.subcommand_matches("create") {
+println!("Creati

D7798: rust-nodemap: special case for prefixes of NULL_NODE

2020-02-26 Thread gracinet (Georges Racinet)
Closed by commit rHG895934342631: rust-nodemap: special case for prefixes of 
NULL_NODE (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7798?vs=20282=20312

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7798/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7798

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -13,7 +13,8 @@
 //! is used in a more abstract context.
 
 use super::{
-Node, NodeError, NodePrefix, NodePrefixRef, Revision, RevlogIndex,
+node::NULL_NODE, Node, NodeError, NodePrefix, NodePrefixRef, Revision,
+RevlogIndex, NULL_REVISION,
 };
 
 use std::fmt;
@@ -270,6 +271,31 @@
 })
 }
 
+/// validate that the candidate's node starts indeed with given prefix,
+/// and treat ambiguities related to `NULL_REVISION`.
+///
+/// From the data in the NodeTree, one can only conclude that some
+/// revision is the only one for a *subprefix* of the one being looked up.
+fn validate_candidate(
+idx:  RevlogIndex,
+prefix: NodePrefixRef,
+rev: Option,
+) -> Result, NodeMapError> {
+if prefix.is_prefix_of(_NODE) {
+// NULL_REVISION always matches a prefix made only of zeros
+// and any other *valid* result is an ambiguity
+match rev {
+None => Ok(Some(NULL_REVISION)),
+Some(r) => match has_prefix_or_none(idx, prefix, r)? {
+None => Ok(Some(NULL_REVISION)),
+_ => Err(NodeMapError::MultipleResults),
+},
+}
+} else {
+rev.map_or(Ok(None), |r| has_prefix_or_none(idx, prefix, r))
+}
+}
+
 impl NodeTree {
 /// Initiate a NodeTree from an immutable slice-like of `Block`
 ///
@@ -361,8 +387,6 @@
 }
 
 /// Main working method for `NodeTree` searches
-///
-/// This partial implementation lacks special cases for NULL_REVISION
 fn lookup<'p>(
 ,
 prefix: NodePrefixRef<'p>,
@@ -613,9 +637,7 @@
 idx:  RevlogIndex,
 prefix: NodePrefixRef<'a>,
 ) -> Result, NodeMapError> {
-self.lookup(prefix.clone()).and_then(|opt| {
-opt.map_or(Ok(None), |rev| has_prefix_or_none(idx, prefix, rev))
-})
+validate_candidate(idx, prefix.clone(), self.lookup(prefix)?)
 }
 }
 
@@ -748,8 +770,9 @@
 
 assert_eq!(nt.find_hex(, "0"), Err(MultipleResults));
 assert_eq!(nt.find_hex(, "01"), Ok(Some(9)));
-assert_eq!(nt.find_hex(, "00"), Ok(Some(0)));
+assert_eq!(nt.find_hex(, "00"), Err(MultipleResults));
 assert_eq!(nt.find_hex(, "00a"), Ok(Some(0)));
+assert_eq!(nt.find_hex(, "000"), Ok(Some(NULL_REVISION)));
 }
 
 #[test]
@@ -768,7 +791,8 @@
 };
 assert_eq!(nt.find_hex(, "10")?, Some(1));
 assert_eq!(nt.find_hex(, "c")?, Some(2));
-assert_eq!(nt.find_hex(, "00")?, Some(0));
+assert_eq!(nt.find_hex(, "00"), Err(MultipleResults));
+assert_eq!(nt.find_hex(, "000")?, Some(NULL_REVISION));
 assert_eq!(nt.find_hex(, "01")?, Some(9));
 Ok(())
 }



To: gracinet, #hg-reviewers, kevincox, marmoute
Cc: marmoute, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7796: rust-nodemap: input/output primitives

2020-02-26 Thread gracinet (Georges Racinet)
Closed by commit rHGc3cc881aaa17: rust-nodemap: input/output primitives 
(authored by gracinet).
gracinet marked an inline comment as done.
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7796?vs=20281=20310

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,8 +17,10 @@
 };
 
 use std::fmt;
+use std::mem;
 use std::ops::Deref;
 use std::ops::Index;
+use std::slice;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -172,20 +174,42 @@
 /// represented at all, because we want an immutable empty nodetree
 /// to be valid.
 
-#[derive(Clone, PartialEq)]
-pub struct Block([RawElement; 16]);
+#[derive(Copy, Clone)]
+pub struct Block([u8; BLOCK_SIZE]);
+
+/// Not derivable for arrays of length >32 until const generics are stable
+impl PartialEq for Block {
+fn eq(, other: ) -> bool {
+[..] == [..]
+}
+}
+
+pub const BLOCK_SIZE: usize = 64;
 
 impl Block {
 fn new() -> Self {
-Block([-1; 16])
+// -1 in 2's complement to create an absent node
+let byte: u8 = 255;
+Block([byte; BLOCK_SIZE])
 }
 
 fn get(, nybble: u8) -> Element {
-Element::from(RawElement::from_be(self.0[nybble as usize]))
+let index = nybble as usize * mem::size_of::();
+Element::from(RawElement::from_be_bytes([
+self.0[index],
+self.0[index + 1],
+self.0[index + 2],
+self.0[index + 3],
+]))
 }
 
 fn set( self, nybble: u8, element: Element) {
-self.0[nybble as usize] = RawElement::to_be(element.into())
+let values = RawElement::to_be_bytes(element.into());
+let index = nybble as usize * mem::size_of::();
+self.0[index] = values[0];
+self.0[index + 1] = values[1];
+self.0[index + 2] = values[2];
+self.0[index + 3] = values[3];
 }
 }
 
@@ -230,9 +254,9 @@
 }
 
 /// Return `None` unless the `Node` for `rev` has given prefix in `index`.
-fn has_prefix_or_none<'p>(
+fn has_prefix_or_none(
 idx:  RevlogIndex,
-prefix: NodePrefixRef<'p>,
+prefix: NodePrefixRef,
 rev: Revision,
 ) -> Result, NodeMapError> {
 idx.node(rev)
@@ -262,6 +286,67 @@
 }
 }
 
+/// Create from an opaque bunch of bytes
+///
+/// The created `NodeTreeBytes` from `buffer`,
+/// of which exactly `amount` bytes are used.
+///
+/// - `buffer` could be derived from `PyBuffer` and `Mmap` objects.
+/// - `offset` allows for the final file format to include fixed data
+///   (generation number, behavioural flags)
+/// - `amount` is expressed in bytes, and is not automatically derived from
+///   `bytes`, so that a caller that manages them atomically can perform
+///   temporary disk serializations and still rollback easily if needed.
+///   First use-case for this would be to support Mercurial shell hooks.
+///
+/// panics if `buffer` is smaller than `amount`
+pub fn load_bytes(
+bytes: Box + Send>,
+amount: usize,
+) -> Self {
+NodeTree::new(Box::new(NodeTreeBytes::new(bytes, amount)))
+}
+
+/// Retrieve added `Block` and the original immutable data
+pub fn into_readonly_and_added(
+self,
+) -> (Box + Send>, Vec) {
+let mut vec = self.growable;
+let readonly = self.readonly;
+if readonly.last() != Some() {
+vec.push(self.root);
+}
+(readonly, vec)
+}
+
+/// Retrieve added `Blocks` as bytes, ready to be written to persistent
+/// storage
+pub fn into_readonly_and_added_bytes(
+self,
+) -> (Box + Send>, Vec) {
+let (readonly, vec) = self.into_readonly_and_added();
+// Prevent running `v`'s destructor so we are in complete control
+// of the allocation.
+let vec = mem::ManuallyDrop::new(vec);
+
+// Transmute the `Vec` to a `Vec`. Blocks are contiguous
+// bytes, so this is perfectly safe.
+let bytes = unsafe {
+// Assert that `Block` hasn't been changed and has no padding
+let _: [u8; 4 * BLOCK_SIZE] =
+std::mem::transmute([Block::new(); 4]);
+
+// /!\ Any use of `vec` after this is use-after-free.
+// TODO: use `into_raw_parts` once stabilized
+Vec::from_raw_parts(
+vec.as_ptr() as *mut u8,
+vec.len() * BLOCK_SIZE,
+vec.capacity() * BLOCK_SIZE,
+)
+};
+

D7795: rust-nodemap: insert method

2020-02-12 Thread gracinet (Georges Racinet)
Closed by commit rHGd2da8667125b: rust-nodemap: insert method (authored by 
gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

CHANGED PRIOR TO COMMIT
  https://phab.mercurial-scm.org/D7795?vs=20024=20169#toc

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7795?vs=20024=20169

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7795/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7795

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -15,6 +15,7 @@
 use super::{
 Node, NodeError, NodePrefix, NodePrefixRef, Revision, RevlogIndex,
 };
+
 use std::fmt;
 use std::ops::Deref;
 use std::ops::Index;
@@ -96,6 +97,15 @@
 }
 }
 
+pub trait MutableNodeMap: NodeMap {
+fn insert(
+ self,
+index: ,
+node: ,
+rev: Revision,
+) -> Result<(), NodeMapError>;
+}
+
 /// Low level NodeTree [`Blocks`] elements
 ///
 /// These are exactly as for instance on persistent storage.
@@ -292,6 +302,112 @@
 done: false,
 }
 }
+/// Return a mutable reference for `Block` at index `idx`.
+///
+/// If `idx` lies in the immutable area, then the reference is to
+/// a newly appended copy.
+///
+/// Returns (new_idx, glen, mut_ref) where
+///
+/// - `new_idx` is the index of the mutable `Block`
+/// - `mut_ref` is a mutable reference to the mutable Block.
+/// - `glen` is the new length of `self.growable`
+///
+/// Note: the caller wouldn't be allowed to query `self.growable.len()`
+/// itself because of the mutable borrow taken with the returned `Block`
+fn mutable_block( self, idx: usize) -> (usize,  Block, usize) {
+let ro_blocks = 
+let ro_len = ro_blocks.len();
+let glen = self.growable.len();
+if idx < ro_len {
+// TODO OPTIM I think this makes two copies
+self.growable.push(ro_blocks[idx].clone());
+(glen + ro_len,  self.growable[glen], glen + 1)
+} else if glen + ro_len == idx {
+(idx,  self.root, glen)
+} else {
+(idx,  self.growable[idx - ro_len], glen)
+}
+}
+
+/// Main insertion method
+///
+/// This will dive in the node tree to find the deepest `Block` for
+/// `node`, split it as much as needed and record `node` in there.
+/// The method then backtracks, updating references in all the visited
+/// blocks from the root.
+///
+/// All the mutated `Block` are copied first to the growable part if
+/// needed. That happens for those in the immutable part except the root.
+pub fn insert(
+ self,
+index: ,
+node: ,
+rev: Revision,
+) -> Result<(), NodeMapError> {
+let ro_len = ();
+
+let mut visit_steps: Vec<_> = self.visit(node.into()).collect();
+let read_nybbles = visit_steps.len();
+// visit_steps cannot be empty, since we always visit the root block
+let deepest = visit_steps.pop().unwrap();
+
+let (mut block_idx, mut block, mut glen) =
+self.mutable_block(deepest.block_idx);
+
+if let Element::Rev(old_rev) = deepest.element {
+let old_node = index
+.node(old_rev)
+.ok_or_else(|| NodeMapError::RevisionNotInIndex(old_rev))?;
+if old_node == node {
+return Ok(()); // avoid creating lots of useless blocks
+}
+
+// Looping over the tail of nybbles in both nodes, creating
+// new blocks until we find the difference
+let mut new_block_idx = ro_len + glen;
+let mut nybble = deepest.nybble;
+for nybble_pos in read_nybbles..node.nybbles_len() {
+block.set(nybble, Element::Block(new_block_idx));
+
+let new_nybble = node.get_nybble(nybble_pos);
+let old_nybble = old_node.get_nybble(nybble_pos);
+
+if old_nybble == new_nybble {
+self.growable.push(Block::new());
+block =  self.growable[glen];
+glen += 1;
+new_block_idx += 1;
+nybble = new_nybble;
+} else {
+let mut new_block = Block::new();
+new_block.set(old_nybble, Element::Rev(old_rev));
+new_block.set(new_nybble, Element::Rev(rev));
+self.growable.push(new_block);
+break;
+}
+}
+} else {
+// Free slot in the deepest block: no splitting has to be done
+

D8019: rust-node: avoid meaningless read at the end of odd prefix

2020-01-28 Thread gracinet (Georges Racinet)
Closed by commit rHGbe52b7372ec2: rust-node: avoid meaningless read at the end 
of odd prefix (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8019?vs=19648=19658

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8019/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8019

AFFECTED FILES
  rust/hg-core/src/revlog/node.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/node.rs b/rust/hg-core/src/revlog/node.rs
--- a/rust/hg-core/src/revlog/node.rs
+++ b/rust/hg-core/src/revlog/node.rs
@@ -223,6 +223,7 @@
 /// This is also the `i`th hexadecimal digit in numeric form,
 /// also called a [nybble](https://en.wikipedia.org/wiki/Nibble).
 pub fn get_nybble(, i: usize) -> u8 {
+assert!(i < self.len());
 get_nybble(self.buf, i)
 }
 }



To: gracinet, #hg-reviewers
Cc: kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8019: rust-node: avoid meaningless read at the end of odd prefix

2020-01-27 Thread gracinet (Georges Racinet)
gracinet created this revision.
Herald added subscribers: mercurial-devel, kevincox.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This should be heavily factored out by the CPU branch predictor
  anyway.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D8019

AFFECTED FILES
  rust/hg-core/src/revlog/node.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/node.rs b/rust/hg-core/src/revlog/node.rs
--- a/rust/hg-core/src/revlog/node.rs
+++ b/rust/hg-core/src/revlog/node.rs
@@ -223,6 +223,7 @@
 /// This is also the `i`th hexadecimal digit in numeric form,
 /// also called a [nybble](https://en.wikipedia.org/wiki/Nibble).
 pub fn get_nybble(, i: usize) -> u8 {
+assert!(i < self.len());
 get_nybble(self.buf, i)
 }
 }



To: gracinet, #hg-reviewers
Cc: kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7794: rust-nodemap: generic NodeTreeVisitor

2020-01-27 Thread gracinet (Georges Racinet)
Closed by commit rHG796d05f3fa84: rust-nodemap: generic NodeTreeVisitor 
(authored by gracinet).
gracinet marked an inline comment as done.
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7794?vs=19634=19647

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7794/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7794

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -272,17 +272,82 @@
 ,
 prefix: NodePrefixRef<'p>,
 ) -> Result, NodeMapError> {
-let mut visit = self.len() - 1;
-for i in 0..prefix.len() {
-let nybble = prefix.get_nybble(i);
-match self[visit].get(nybble) {
-Element::None => return Ok(None),
-Element::Rev(r) => return Ok(Some(r)),
-Element::Block(idx) => visit = idx,
+for visit_item in self.visit(prefix) {
+if let Some(opt) = visit_item.final_revision() {
+return Ok(opt);
 }
 }
 Err(NodeMapError::MultipleResults)
 }
+
+fn visit<'n, 'p>(
+&'n self,
+prefix: NodePrefixRef<'p>,
+) -> NodeTreeVisitor<'n, 'p> {
+NodeTreeVisitor {
+nt: self,
+prefix: prefix,
+visit: self.len() - 1,
+nybble_idx: 0,
+done: false,
+}
+}
+}
+
+struct NodeTreeVisitor<'n, 'p> {
+nt: &'n NodeTree,
+prefix: NodePrefixRef<'p>,
+visit: usize,
+nybble_idx: usize,
+done: bool,
+}
+
+#[derive(Debug, PartialEq, Clone)]
+struct NodeTreeVisitItem {
+block_idx: usize,
+nybble: u8,
+element: Element,
+}
+
+impl<'n, 'p> Iterator for NodeTreeVisitor<'n, 'p> {
+type Item = NodeTreeVisitItem;
+
+fn next( self) -> Option {
+if self.done || self.nybble_idx >= self.prefix.len() {
+return None;
+}
+
+let nybble = self.prefix.get_nybble(self.nybble_idx);
+self.nybble_idx += 1;
+
+let visit = self.visit;
+let element = self.nt[visit].get(nybble);
+if let Element::Block(idx) = element {
+self.visit = idx;
+} else {
+self.done = true;
+}
+
+Some(NodeTreeVisitItem {
+block_idx: visit,
+nybble: nybble,
+element: element,
+})
+}
+}
+
+impl NodeTreeVisitItem {
+// Return `Some(opt)` if this item is final, with `opt` being the
+// `Revision` that it may represent.
+//
+// If the item is not terminal, return `None`
+fn final_revision() -> Option> {
+match self.element {
+Element::Block(_) => None,
+Element::Rev(r) => Some(Some(r)),
+Element::None => Some(None),
+}
+}
 }
 
 impl From> for NodeTree {



To: gracinet, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7791: rust-nodemap: NodeMap trait with simplest implementation

2020-01-27 Thread gracinet (Georges Racinet)
Closed by commit rHGe52401a95b94: rust-nodemap: NodeMap trait with simplest 
implementation (authored by gracinet).
gracinet marked 2 inline comments as done.
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7791?vs=19631=19644

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

AFFECTED FILES
  rust/hg-core/src/revlog/node.rs
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -12,8 +12,88 @@
 //! Following existing implicit conventions, the "nodemap" terminology
 //! is used in a more abstract context.
 
-use super::Revision;
+use super::{
+Node, NodeError, NodePrefix, NodePrefixRef, Revision, RevlogIndex,
+};
 use std::fmt;
+use std::ops::Deref;
+
+#[derive(Debug, PartialEq)]
+pub enum NodeMapError {
+MultipleResults,
+InvalidNodePrefix(NodeError),
+/// A `Revision` stored in the nodemap could not be found in the index
+RevisionNotInIndex(Revision),
+}
+
+impl From for NodeMapError {
+fn from(err: NodeError) -> Self {
+NodeMapError::InvalidNodePrefix(err)
+}
+}
+
+/// Mapping system from Mercurial nodes to revision numbers.
+///
+/// ## `RevlogIndex` and `NodeMap`
+///
+/// One way to think about their relationship is that
+/// the `NodeMap` is a prefix-oriented reverse index of the `Node` information
+/// carried by a [`RevlogIndex`].
+///
+/// Many of the methods in this trait take a `RevlogIndex` argument
+/// which is used for validation of their results. This index must naturally
+/// be the one the `NodeMap` is about, and it must be consistent.
+///
+/// Notably, the `NodeMap` must not store
+/// information about more `Revision` values than there are in the index.
+/// In these methods, an encountered `Revision` is not in the index, a
+/// [`RevisionNotInIndex`] error is returned.
+///
+/// In insert operations, the rule is thus that the `NodeMap` must always
+/// be updated after the `RevlogIndex`
+/// be updated first, and the `NodeMap` second.
+///
+/// [`RevisionNotInIndex`]: enum.NodeMapError.html#variant.RevisionNotInIndex
+/// [`RevlogIndex`]: ../trait.RevlogIndex.html
+pub trait NodeMap {
+/// Find the unique `Revision` having the given `Node`
+///
+/// If no Revision matches the given `Node`, `Ok(None)` is returned.
+fn find_node(
+,
+index:  RevlogIndex,
+node: ,
+) -> Result, NodeMapError> {
+self.find_bin(index, node.into())
+}
+
+/// Find the unique Revision whose `Node` starts with a given binary prefix
+///
+/// If no Revision matches the given prefix, `Ok(None)` is returned.
+///
+/// If several Revisions match the given prefix, a [`MultipleResults`]
+/// error is returned.
+fn find_bin<'a>(
+,
+idx:  RevlogIndex,
+prefix: NodePrefixRef<'a>,
+) -> Result, NodeMapError>;
+
+/// Find the unique Revision whose `Node` hexadecimal string representation
+/// starts with a given prefix
+///
+/// If no Revision matches the given prefix, `Ok(None)` is returned.
+///
+/// If several Revisions match the given prefix, a [`MultipleResults`]
+/// error is returned.
+fn find_hex(
+,
+idx:  RevlogIndex,
+prefix: ,
+) -> Result, NodeMapError> {
+self.find_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
+}
+}
 
 /// Low level NodeTree [`Blocks`] elements
 ///
@@ -110,9 +190,86 @@
 }
 }
 
+/// A 16-radix tree with the root block at the end
+pub struct NodeTree {
+readonly: Box + Send>,
+}
+
+/// Return `None` unless the `Node` for `rev` has given prefix in `index`.
+fn has_prefix_or_none<'p>(
+idx:  RevlogIndex,
+prefix: NodePrefixRef<'p>,
+rev: Revision,
+) -> Result, NodeMapError> {
+idx.node(rev)
+.ok_or_else(|| NodeMapError::RevisionNotInIndex(rev))
+.map(|node| {
+if prefix.is_prefix_of(node) {
+Some(rev)
+} else {
+None
+}
+})
+}
+
+impl NodeTree {
+/// Main working method for `NodeTree` searches
+///
+/// This partial implementation lacks special cases for NULL_REVISION
+fn lookup<'p>(
+,
+prefix: NodePrefixRef<'p>,
+) -> Result, NodeMapError> {
+let blocks: &[Block] = &*self.readonly;
+if blocks.is_empty() {
+return Ok(None);
+}
+let mut visit = blocks.len() - 1;
+for i in 0..prefix.len() {
+let nybble = prefix.get_nybble(i);
+match blocks[visit].get(nybble) {
+Element::None => return Ok(None),
+  

D7793: rust-nodemap: mutable NodeTree data structure

2020-01-27 Thread gracinet (Georges Racinet)
Closed by commit rHGa19331456d48: rust-nodemap: mutable NodeTree data structure 
(authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7793?vs=19633=19646

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7793/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7793

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -191,16 +191,31 @@
 }
 }
 
-/// A 16-radix tree with the root block at the end
+/// A mutable 16-radix tree with the root block logically at the end
+///
+/// Because of the append only nature of our node trees, we need to
+/// keep the original untouched and store new blocks separately.
+///
+/// The mutable root `Block` is kept apart so that we don't have to rebump
+/// it on each insertion.
 pub struct NodeTree {
 readonly: Box + Send>,
+growable: Vec,
+root: Block,
 }
 
 impl Index for NodeTree {
 type Output = Block;
 
 fn index(, i: usize) ->  {
-[i]
+let ro_len = self.readonly.len();
+if i < ro_len {
+[i]
+} else if i == ro_len + self.growable.len() {
+
+} else {
+[i - ro_len]
+}
 }
 }
 
@@ -222,12 +237,32 @@
 }
 
 impl NodeTree {
-fn len() -> usize {
-self.readonly.len()
+/// Initiate a NodeTree from an immutable slice-like of `Block`
+///
+/// We keep `readonly` and clone its root block if it isn't empty.
+fn new(readonly: Box + Send>) -> Self {
+let root = readonly
+.last()
+.map(|b| b.clone())
+.unwrap_or_else(|| Block::new());
+NodeTree {
+readonly: readonly,
+growable: Vec::new(),
+root: root,
+}
 }
 
+/// Total number of blocks
+fn len() -> usize {
+self.readonly.len() + self.growable.len() + 1
+}
+
+/// Implemented for completeness
+///
+/// A `NodeTree` always has at least the mutable root block.
+#[allow(dead_code)]
 fn is_empty() -> bool {
-self.len() == 0
+false
 }
 
 /// Main working method for `NodeTree` searches
@@ -237,9 +272,6 @@
 ,
 prefix: NodePrefixRef<'p>,
 ) -> Result, NodeMapError> {
-if self.is_empty() {
-return Ok(None);
-}
 let mut visit = self.len() - 1;
 for i in 0..prefix.len() {
 let nybble = prefix.get_nybble(i);
@@ -255,16 +287,18 @@
 
 impl From> for NodeTree {
 fn from(vec: Vec) -> Self {
-NodeTree {
-readonly: Box::new(vec),
-}
+Self::new(Box::new(vec))
 }
 }
 
 impl fmt::Debug for NodeTree {
 fn fmt(, f:  fmt::Formatter<'_>) -> fmt::Result {
-let blocks: &[Block] = &*self.readonly;
-write!(f, "readonly: {:?}", blocks)
+let readonly: &[Block] = &*self.readonly;
+write!(
+f,
+"readonly: {:?}, growable: {:?}, root: {:?}",
+readonly, self.growable, self.root
+)
 }
 }
 
@@ -365,7 +399,9 @@
 assert_eq!(
 format!("{:?}", nt),
 "readonly: \
- [{0: Rev(9)}, {0: Rev(0), 1: Rev(9)}, {0: Block(1), 1: Rev(1)}]"
+ [{0: Rev(9)}, {0: Rev(0), 1: Rev(9)}, {0: Block(1), 1: Rev(1)}], \
+ growable: [], \
+ root: {0: Block(1), 1: Rev(1)}",
 );
 }
 
@@ -374,7 +410,7 @@
 let mut idx: TestIndex = HashMap::new();
 pad_insert( idx, 1, "1234deadcafe");
 
-let nt = NodeTree::from(vec![block![1: Rev(1)]]);
+let nt = NodeTree::from(vec![block! {1: Rev(1)}]);
 assert_eq!(nt.find_hex(, "1")?, Some(1));
 assert_eq!(nt.find_hex(, "12")?, Some(1));
 assert_eq!(nt.find_hex(, "1234de")?, Some(1));
@@ -401,4 +437,25 @@
 assert_eq!(nt.find_hex(, "00"), Ok(Some(0)));
 assert_eq!(nt.find_hex(, "00a"), Ok(Some(0)));
 }
+
+#[test]
+fn test_mutated_find() -> Result<(), NodeMapError> {
+let mut idx = TestIndex::new();
+pad_insert( idx, 9, "012");
+pad_insert( idx, 0, "00a");
+pad_insert( idx, 2, "cafe");
+pad_insert( idx, 3, "15");
+pad_insert( idx, 1, "10");
+
+let nt = NodeTree {
+readonly: sample_nodetree().readonly,
+growable: vec![block![0: Rev(1), 5: Rev(3)]],
+root: block![0: Block(1), 1:Block(3), 12: Rev(2)],
+};
+assert_eq!(nt.find_hex(, "10")?, Some(1));
+assert_eq!(nt.find_hex(, "c")?, Some(2));
+assert_eq!(nt.find_hex(, "00")?, Some(0));
+assert_eq!(nt.find_hex(, "01")?, 

D7792: rust-nodemap: abstracting the indexing

2020-01-27 Thread gracinet (Georges Racinet)
Closed by commit rHG220d4d2e3185: rust-nodemap: abstracting the indexing 
(authored by gracinet).
gracinet marked an inline comment as done.
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7792?vs=19632=19645

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7792/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7792

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,6 +17,7 @@
 };
 use std::fmt;
 use std::ops::Deref;
+use std::ops::Index;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -195,6 +196,14 @@
 readonly: Box + Send>,
 }
 
+impl Index for NodeTree {
+type Output = Block;
+
+fn index(, i: usize) ->  {
+[i]
+}
+}
+
 /// Return `None` unless the `Node` for `rev` has given prefix in `index`.
 fn has_prefix_or_none<'p>(
 idx:  RevlogIndex,
@@ -213,6 +222,14 @@
 }
 
 impl NodeTree {
+fn len() -> usize {
+self.readonly.len()
+}
+
+fn is_empty() -> bool {
+self.len() == 0
+}
+
 /// Main working method for `NodeTree` searches
 ///
 /// This partial implementation lacks special cases for NULL_REVISION
@@ -220,14 +237,13 @@
 ,
 prefix: NodePrefixRef<'p>,
 ) -> Result, NodeMapError> {
-let blocks: &[Block] = &*self.readonly;
-if blocks.is_empty() {
+if self.is_empty() {
 return Ok(None);
 }
-let mut visit = blocks.len() - 1;
+let mut visit = self.len() - 1;
 for i in 0..prefix.len() {
 let nybble = prefix.get_nybble(i);
-match blocks[visit].get(nybble) {
+match self[visit].get(nybble) {
 Element::None => return Ok(None),
 Element::Rev(r) => return Ok(Some(r)),
 Element::Block(idx) => visit = idx,



To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7794: rust-nodemap: generic NodeTreeVisitor

2020-01-27 Thread gracinet (Georges Racinet)
gracinet added inline comments.
gracinet marked 2 inline comments as done.

INLINE COMMENTS

> martinvonz wrote in nodemap.rs:300
> would something like `block_index` be clearer?

I found it to be clearer for the emitter, but in the iterator implementation, 
it expresses what's to be done next, same as in many cases I've seeen in 
Mercurial

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7794/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7794

To: gracinet, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7790: rust-node: handling binary Node prefix

2020-01-27 Thread gracinet (Georges Racinet)
gracinet added inline comments.

INLINE COMMENTS

> martinvonz wrote in node.rs:226
> Should we check here that `i < self.len()`? I'm especially thinking of the 
> case of an odd-length prefix where we would otherwise silently return 0.

yes, indeed.

current callers from `nodemap` work either on full Nodes or (the visitor) by 
explicitely bounding with `prefix.len()`, but it's safer to protect it 
inconditonally.

I think a simple `assert!` is enough though: that's already what slicing does 
anyway.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7790/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7790

To: gracinet, #hg-reviewers, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7790: rust-node: handling binary Node prefix

2020-01-27 Thread gracinet (Georges Racinet)
Closed by commit rHG9896a8d0d3d2: rust-node: handling binary Node prefix 
(authored by gracinet).
This revision was automatically updated to reflect the committed changes.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7790?vs=19630=19643

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7790/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7790

AFFECTED FILES
  rust/hg-core/src/revlog.rs
  rust/hg-core/src/revlog/node.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/node.rs b/rust/hg-core/src/revlog/node.rs
--- a/rust/hg-core/src/revlog/node.rs
+++ b/rust/hg-core/src/revlog/node.rs
@@ -62,6 +62,7 @@
 #[derive(Debug, PartialEq)]
 pub enum NodeError {
 ExactLengthRequired(usize, String),
+PrefixTooLong(String),
 HexError(FromHexError, String),
 }
 
@@ -119,17 +120,119 @@
 }
 }
 
-impl From<(FromHexError, )> for NodeError {
-fn from(err_offender: (FromHexError, )) -> Self {
+impl> From<(FromHexError, T)> for NodeError {
+fn from(err_offender: (FromHexError, T)) -> Self {
 let (err, offender) = err_offender;
 match err {
 FromHexError::InvalidStringLength => {
 NodeError::ExactLengthRequired(
 NODE_NYBBLES_LENGTH,
-offender.to_string(),
+offender.as_ref().to_owned(),
 )
 }
-_ => NodeError::HexError(err, offender.to_string()),
+_ => NodeError::HexError(err, offender.as_ref().to_owned()),
+}
+}
+}
+
+/// The beginning of a binary revision SHA.
+///
+/// Since it can potentially come from an hexadecimal representation with
+/// odd length, it needs to carry around whether the last 4 bits are relevant
+/// or not.
+#[derive(Debug, PartialEq)]
+pub struct NodePrefix {
+buf: Vec,
+is_odd: bool,
+}
+
+impl NodePrefix {
+/// Convert from hexadecimal string representation
+///
+/// Similarly to `hex::decode`, can be used with Unicode string types
+/// (`String`, ``) as well as bytes.
+///
+/// To be used in FFI and I/O only, in order to facilitate future
+/// changes of hash format.
+pub fn from_hex(hex: impl AsRef<[u8]>) -> Result {
+let hex = hex.as_ref();
+let len = hex.len();
+if len > NODE_NYBBLES_LENGTH {
+return Err(NodeError::PrefixTooLong(
+String::from_utf8_lossy(hex).to_owned().to_string(),
+));
+}
+
+let is_odd = len % 2 == 1;
+let even_part = if is_odd { [..len - 1] } else { hex };
+let mut buf: Vec = Vec::from_hex(_part)
+.map_err(|e| (e, String::from_utf8_lossy(hex)))?;
+
+if is_odd {
+let latest_char = char::from(hex[len - 1]);
+let latest_nybble = latest_char.to_digit(16).ok_or_else(|| {
+(
+FromHexError::InvalidHexCharacter {
+c: latest_char,
+index: len - 1,
+},
+String::from_utf8_lossy(hex),
+)
+})? as u8;
+buf.push(latest_nybble << 4);
+}
+Ok(NodePrefix { buf, is_odd })
+}
+
+pub fn borrow() -> NodePrefixRef {
+NodePrefixRef {
+buf: ,
+is_odd: self.is_odd,
+}
+}
+}
+
+#[derive(Clone, Debug, PartialEq)]
+pub struct NodePrefixRef<'a> {
+buf: &'a [u8],
+is_odd: bool,
+}
+
+impl<'a> NodePrefixRef<'a> {
+pub fn len() -> usize {
+if self.is_odd {
+self.buf.len() * 2 - 1
+} else {
+self.buf.len() * 2
+}
+}
+
+pub fn is_prefix_of(, node: ) -> bool {
+if self.is_odd {
+let buf = self.buf;
+let last_pos = buf.len() - 1;
+node.data.starts_with(buf.split_at(last_pos).0)
+&& node.data[last_pos] >> 4 == buf[last_pos] >> 4
+} else {
+node.data.starts_with(self.buf)
+}
+}
+
+/// Retrieve the `i`th half-byte from the prefix.
+///
+/// This is also the `i`th hexadecimal digit in numeric form,
+/// also called a [nybble](https://en.wikipedia.org/wiki/Nibble).
+pub fn get_nybble(, i: usize) -> u8 {
+get_nybble(self.buf, i)
+}
+}
+
+/// A shortcut for full `Node` references
+impl<'a> From<&'a Node> for NodePrefixRef<'a> {
+fn from(node: &'a Node) -> Self {
+NodePrefixRef {
+buf: ,
+is_odd: false,
 }
 }
 }
@@ -188,4 +291,74 @@
 fn test_node_encode_hex() {
 assert_eq!(sample_node().encode_hex(), sample_node_hex());
 }
+
+#[test]
+fn test_prefix_from_hex() -> Result<(), NodeError> {
+assert_eq!(
+NodePrefix::from_hex("0e1")?,
+NodePrefix {
+buf: vec![14, 16],
+is_odd: true
+}
+);
+assert_eq!(
+

D7789: rust-revlog: a trait for the revlog index

2020-01-27 Thread gracinet (Georges Racinet)
Closed by commit rHG3fb39dc2e356: rust-revlog: a trait for the revlog index 
(authored by gracinet).
This revision was automatically updated to reflect the committed changes.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7789?vs=19629=19642

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7789/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7789

AFFECTED FILES
  rust/hg-core/src/revlog.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog.rs b/rust/hg-core/src/revlog.rs
--- a/rust/hg-core/src/revlog.rs
+++ b/rust/hg-core/src/revlog.rs
@@ -40,3 +40,17 @@
 ParentOutOfRange(Revision),
 WorkingDirectoryUnsupported,
 }
+
+/// The Mercurial Revlog Index
+///
+/// This is currently limited to the minimal interface that is needed for
+/// the [`nodemap`](nodemap/index.html) module
+pub trait RevlogIndex {
+/// Total number of Revisions referenced in this index
+fn len() -> usize;
+
+/// Return a reference to the Node or `None` if rev is out of bounds
+///
+/// `NULL_REVISION` is not considered to be out of bounds.
+fn node(, rev: Revision) -> Option<>;
+}



To: gracinet, #hg-reviewers, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7819: rust-nodemap: core implementation for shortest

2020-01-27 Thread gracinet (Georges Racinet)
gracinet marked an inline comment as done.
gracinet updated this revision to Diff 19640.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7819?vs=19639=19640

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7819/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7819

AFFECTED FILES
  rust/hg-core/src/revlog/node.rs
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,6 +17,7 @@
 RevlogIndex, NULL_REVISION,
 };
 
+use std::cmp::max;
 use std::fmt;
 use std::mem;
 use std::ops::Deref;
@@ -98,6 +99,42 @@
 ) -> Result, NodeMapError> {
 self.find_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
 }
+
+/// Give the size of the shortest node prefix that determines
+/// the revision uniquely.
+///
+/// From a binary node prefix, if it is matched in the node map, this
+/// returns the number of hexadecimal digits that would had sufficed
+/// to find the revision uniquely.
+///
+/// Returns `None` if no `Revision` could be found for the prefix.
+///
+/// If several Revisions match the given prefix, a [`MultipleResults`]
+/// error is returned.
+fn unique_prefix_len_bin<'a>(
+,
+idx:  RevlogIndex,
+node_prefix: NodePrefixRef<'a>,
+) -> Result, NodeMapError>;
+
+/// Same as `unique_prefix_len_bin`, with the hexadecimal representation
+/// of the prefix as input.
+fn unique_prefix_len_hex(
+,
+idx:  RevlogIndex,
+prefix: ,
+) -> Result, NodeMapError> {
+self.unique_prefix_len_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
+}
+
+/// Same as `unique_prefix_len_bin`, with a full `Node` as input
+fn unique_prefix_len_node(
+,
+idx:  RevlogIndex,
+node: ,
+) -> Result, NodeMapError> {
+self.unique_prefix_len_bin(idx, node.into())
+}
 }
 
 pub trait MutableNodeMap: NodeMap {
@@ -259,20 +296,24 @@
 fn validate_candidate<'p>(
 idx:  RevlogIndex,
 prefix: NodePrefixRef<'p>,
-rev: Option,
-) -> Result, NodeMapError> {
-if prefix.is_prefix_of(_NODE) {
-// NULL_REVISION always matches a prefix made only of zeros
+candidate: (Option, usize),
+) -> Result<(Option, usize), NodeMapError> {
+let (rev, steps) = candidate;
+if let Some(nz_nybble) = prefix.first_different_nybble(_NODE) {
+rev.map_or(Ok((None, steps)), |r| {
+has_prefix_or_none(idx, prefix, r)
+.map(|opt| (opt, max(steps, nz_nybble + 1)))
+})
+} else {
+// the prefix is only made of zeros; NULL_REVISION always matches it
 // and any other *valid* result is an ambiguity
 match rev {
-None => Ok(Some(NULL_REVISION)),
+None => Ok((Some(NULL_REVISION), steps + 1)),
 Some(r) => match has_prefix_or_none(idx, prefix, r)? {
-None => Ok(Some(NULL_REVISION)),
+None => Ok((Some(NULL_REVISION), steps + 1)),
 _ => Err(NodeMapError::MultipleResults),
 },
 }
-} else {
-rev.map_or(Ok(None), |r| has_prefix_or_none(idx, prefix, r))
 }
 }
 
@@ -356,13 +397,26 @@
 }
 
 /// Main working method for `NodeTree` searches
+///
+/// The first returned value is the result of analysing `NodeTree` data
+/// *alone*: whereas `None` guarantees that the given prefix is absent
+/// from the `NodeTree` data (but still could match `NULL_NODE`), with
+/// `Some(rev)`, it is to be understood that `rev` is the unique `Revision`
+/// that could match the prefix. Actually, all that can be inferred from
+/// the `NodeTree` data is that `rev` is the revision with the longest
+/// common node prefix with the given prefix.
+///
+/// The second returned value is the size of the smallest subprefix
+/// of `prefix` that would give the same result, i.e. not the
+/// `MultipleResults` error variant (again, using only the data of the
+/// `NodeTree`).
 fn lookup<'p>(
 ,
 prefix: NodePrefixRef<'p>,
-) -> Result, NodeMapError> {
-for visit_item in self.visit(prefix) {
+) -> Result<(Option, usize), NodeMapError> {
+for (i, visit_item) in self.visit(prefix).enumerate() {
 if let Some(opt) = visit_item.final_revision() {
-return Ok(opt);
+return Ok((opt, i + 1));
 }
 }
 Err(NodeMapError::MultipleResults)
@@ -607,6 +661,16 @@
 prefix: NodePrefixRef<'a>,
 ) -> Result, NodeMapError> {
 validate_candidate(idx, prefix.clone(), self.lookup(prefix)?)
+.map(|(opt, _shortest)| opt)
+}
+
+fn unique_prefix_len_bin<'a>(
+,
+idx:  RevlogIndex,
+

D7819: rust-nodemap: core implementation for shortest

2020-01-27 Thread gracinet (Georges Racinet)
gracinet added inline comments.
gracinet marked 3 inline comments as done.

INLINE COMMENTS

> kevincox wrote in nodemap.rs:337
> Can you describe the return value in the doc comment.

Done

> martinvonz wrote in nodemap.rs:575
> I'm fine with either. We'll expose it as "shortest" in the python API anyway, 
> I assume. Feel free to change as far as I'm concerned.

Okay, I've settled with `unique_prefix_len_bin` etc, keeping the convention 
that it's `something_bin`, `something_hex`, etc.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7819/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7819

To: gracinet, #hg-reviewers, kevincox
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7796: rust-nodemap: input/output primitives

2020-01-27 Thread gracinet (Georges Racinet)
gracinet added inline comments.
gracinet marked 2 inline comments as done.

INLINE COMMENTS

> kevincox wrote in nodemap.rs:268
> I thought about this more and I think I am okay doing it this way. It seems 
> like this should be well defined as long as there is no padding. However on 
> that note I would want to add a check that there is no padding as expected. I 
> would also like to ensure that this fails to compile if there is ever 
> padding, even in release mode. I think this can be accomplished by something 
> like:
> 
>   let _: [u8; 4 * BLOCK_SIZE] = std::mem::transmute([Block::new(); 4]);
> 
> I would probably want to repeat this check near any code that is relying on 
> this invariant.

About a method to output to a writer: for the time being, we're avoiding doing 
I/O directly in Rust because most of it is shared with Python through layers 
that perform various sanitizations, lock management etc. That's why the 
simplest is to transfer bytes out.

Context note: the data structure (`Block`) is directly taken from the C 
version, which is non-persistent, but I believe that these 64 bytes are just 
some low-level programmer reflex. It's probably not a coincidence that IIRC 
that's also the length of cache lines on most current CPUs.

Anyway, the point of all of this is to get on disk without padding, so yes, we 
could implement our own non-padding by changing the definition of `Block` to 
`[u8; 64]`. In the end, we are forced to trust what's on disk anyway.

> kevincox wrote in nodemap.rs:272
> If someone panics between the `from_raw_parts` and `mem::forget` this is a 
> double free. Right now I think this can't happen but it is completely 
> possible that the code changes between then and now. I would prefer using 
> `Vec::from_raw_parts` to make sure that there is only one `Vec` alive with 
> that pointer at a time. This risks a leak in the face of panic but I think 
> that is much better.

Not sure to understand what you suggest here, since I'm already using 
`Vec::from_raw_parts` in that method. I couldn't find an `into_raw_parts`.
Anyway, the official example 
  has the 
`mem::forget` before  `Vec::from_raw_parts`. Would that be sufficient?

I agree that a leak upon some exceptional-cannot-happen condition is better 
than a double free.

Also, forgot to came back to that one in the latest phab update

> kevincox wrote in nodemap.rs:433
> This is a weird API. Why does new take a buffer and an offset? Can it take 
> either a slice or just remove the offset parameter and always have the buffer 
> start at 0? It is very strange to be passing in data that we own but isn't 
> ever used.

Owning some memory and not using the whole of it is anyway what happens when 
that memory region is just obtained from a mmap (which is the whole point of 
carrying theses `Deref` all around).
Technically, in a mmap, I suppose we actually only own the file descriptor and 
the resulting pointer, but we can't just convert it to an inert slice.

Anyway it's now confirmed that we won't be needing the offset finally, so I've 
removed it.  Some data at the end of the bytes slice may be ignored, too : it 
would be the result of aborted transactions.

Note: the Python layer will maintain the amount of validated blocks in a 
separate small file which is updated atomically. Any future reimplementation of 
this in Rust would have to do the same.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7794: rust-nodemap: generic NodeTreeVisitor

2020-01-27 Thread gracinet (Georges Racinet)
gracinet added inline comments.
gracinet marked 2 inline comments as done.

INLINE COMMENTS

> martinvonz wrote in nodemap.rs:264-269
> There will only be a valid result if this is a leaf, so it might be better to 
> combine `leaf` and `opt` into one `Option>` so the caller 
> cannot mistake a `opt=None` for "missing" when `leaf=false`.

I understand your concern, but `>` would defeat the 
purpose of not having a dead match arm to fill with a "can't happen" comment in 
the future `insert()`.

Fortunately, @kevincox suggestion of having `NodeTreeVisitor` emit  a `struct` 
provides us with the best of both worlds, since it can have methods.

It should be fully transparent performance-wise, I just hope that is really 
true.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7794/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7794

To: gracinet, #hg-reviewers, kevincox
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7793: rust-nodemap: mutable NodeTree data structure

2020-01-27 Thread gracinet (Georges Racinet)
gracinet added inline comments.

INLINE COMMENTS

> kevincox wrote in nodemap.rs:158
> This strikes me as a weird name. The fact that it is an adjective rather than 
> a noun is a hint. Can you rename to answer "Growable what?"

In a previous version, I was calling it `mutable` (also an adjective, btw), but 
that became just wrong with the introduction of the separate root block. I 
don't have many ideas:

- `added_inner_blocks` ? actually, these are added on top of the readonly part 
and often mask some of the readonly blocks
- `added_non_root_blocks` ? Eww
- `mutable_inner_blocks` ?

I'm no native speaker, but the adjective thing doesn't deter me, I would expect 
it to be rather clear that it refers to blocks, given the type.
Open to suggestions, leaving as is for the time being

> kevincox wrote in nodemap.rs:249
> I would use 
> https://doc.rust-lang.org/std/fmt/struct.Formatter.html#method.debug_struct 
> for consistency unless you really want to avoid printing the struct name.

sadly, `Formatter.debug_struct` doesn't tale `?Sized` arguments (at least in 
our target 1.34 version):

 --> hg-core/src/revlog/nodemap.rs:298:32
  |
  298 | .field("readonly", readonly)
  | doesn't have a size known at 
compile-time
  |
  = help: the trait `std::marker::Sized` is not implemented for 
`[revlog::nodemap::Block]`

Kept it as it was for the time being, just dropped the unneeded  anonymous 
lifetime

Omitting the struct name was indeed on purpose, to make it more manageable in 
test data. That was useful in early versions. I would have been willing to drop 
it for the final form. Anyone using this for real data wouldn't care about just 
a few bytes at the beginning.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7793/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7793

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7792: rust-nodemap: abstracting the indexing

2020-01-27 Thread gracinet (Georges Racinet)
gracinet added inline comments.
gracinet marked 2 inline comments as done.

INLINE COMMENTS

> kevincox wrote in nodemap.rs:193
> I would add a `self.is_empty()` helper. It's good practice for anything that 
> has a `.len()`.

Sure

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7792/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7792

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7791: rust-nodemap: NodeMap trait with simplest implementation

2020-01-27 Thread gracinet (Georges Racinet)
gracinet added inline comments.
gracinet marked 3 inline comments as done.

INLINE COMMENTS

> kevincox wrote in nodemap.rs:37
> Can you please add doc-comments for this? I find that documenting trait 
> methods is especially important.

Sure, indeed it's more important than with the `impl`.

> martinvonz wrote in nodemap.rs:150
> How about something like this then?
> 
>   type BlockSource = Box + Send>;
>   type ByteSource = Box + Send>;
> 
> I won't insist, so up to you if you think it helps.

Missed the type (not trait) alias suggestion. Maybe for the next update, then

> martinvonz wrote in nodemap.rs:153
> I understand that you picked this interface because it works well for the 
> caller, but it feel a little weird to always return `None` or `Some(rev)` 
> where `rev` is one of the function inputs. It's practically a boolean-valued 
> function. Do the callers get much harder to read if you actually make it 
> boolean-valued?

Perhaps a better name would be better than this `has_` that indeed feels 
boolean?

`check_prefix`?  `confirm` ?

Previous naming was `validate_candidate`, but that very same name is used at 
the end of the series when it becomes involved with `NULL_NODE` etc. Then this 
`has_prefix_or_none` becomes one step in the final verification.

Returning a `bool` would mean adding a closure to the callers. making it less 
obvious that they are just chaining the maturation of the response.

I'm leaving unchanged for now, not sure what the best is. Still note that this 
is a private function, there's no risk an external caller would be confused by 
what it does.

> kevincox wrote in nodemap.rs:205
> I don't think the `<'_>` is necessary.

Removed

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

To: gracinet, #hg-reviewers, kevincox
Cc: Alphare, martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7790: rust-node: handling binary Node prefix

2020-01-27 Thread gracinet (Georges Racinet)
gracinet added a comment.


  > Depends on the definition of NodePrefixRef. I don't think there would be 
any extra allocation if you define it like this:
  
  
  
pub enum NodePrefixRef<'a> {
Full(&'a Node),
Prefix(&'a NodePrefix),
}
  
  That's an interesting idea, another possibility would be to define a trait 
(essentially for `get_nybble`) and implement it for `` and ``. 
We'll see when we're pass preliminary tests, thanks.

INLINE COMMENTS

> martinvonz wrote in node.rs:79
> Why not use `(len + 1) / 2` as capacity?

Just thought 20 would be the simplest one-size-fit-all. With the preparations 
for different hash length (and potentially even somewhat dynamic), it's really 
obsolete now (switched to actual len based, yes).

> martinvonz wrote in node.rs:89
> Is this lifetime parameter needed?

ah yes indeed the compiler seems to be able to infer that `` and 
`Nodeprefix.buf` have the same lifteime, and that it must then be equal to the 
lifetime parameter of `NodePrefix` itself.

> martinvonz wrote in node.rs:136
> What does the `&*` do? Specifically, what's different if you drop that part?

These are conversions that need to be explicit when the compiler doesn't insert 
them magically. Usually, I try to avoid them, but it can happen that they 
become non necessary in the final form after some changes.

Not needed in the new form with a real struct.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7790/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7790

To: gracinet, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7819: rust-nodemap: core implementation for shortest

2020-01-27 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19639.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7819?vs=19331=19639

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7819/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7819

AFFECTED FILES
  rust/hg-core/src/revlog/node.rs
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,6 +17,7 @@
 RevlogIndex, NULL_REVISION,
 };
 
+use std::cmp::max;
 use std::fmt;
 use std::mem;
 use std::ops::Deref;
@@ -98,6 +99,42 @@
 ) -> Result, NodeMapError> {
 self.find_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
 }
+
+/// Give the size of the shortest node prefix that determines
+/// the revision uniquely.
+///
+/// From a binary node prefix, if it is matched in the node map, this
+/// returns the number of hexadecimal digits that would had sufficed
+/// to find the revision uniquely.
+///
+/// Returns `None` if no `Revision` could be found for the prefix.
+///
+/// If several Revisions match the given prefix, a [`MultipleResults`]
+/// error is returned.
+fn unique_prefix_len_bin<'a>(
+,
+idx:  RevlogIndex,
+node_prefix: NodePrefixRef<'a>,
+) -> Result, NodeMapError>;
+
+/// Same as `unique_prefix_len_bin`, with the hexadecimal representation
+/// of the prefix as input.
+fn unique_prefix_len_hex(
+,
+idx:  RevlogIndex,
+prefix: ,
+) -> Result, NodeMapError> {
+self.unique_prefix_len_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
+}
+
+/// Same as `unique_prefix_len_bin`, with a full `Node` as input
+fn unique_prefix_len_node(
+,
+idx:  RevlogIndex,
+node: ,
+) -> Result, NodeMapError> {
+self.unique_prefix_len_bin(idx, node.into())
+}
 }
 
 pub trait MutableNodeMap: NodeMap {
@@ -259,20 +296,24 @@
 fn validate_candidate<'p>(
 idx:  RevlogIndex,
 prefix: NodePrefixRef<'p>,
-rev: Option,
-) -> Result, NodeMapError> {
-if prefix.is_prefix_of(_NODE) {
-// NULL_REVISION always matches a prefix made only of zeros
+cand: (Option, usize),
+) -> Result<(Option, usize), NodeMapError> {
+let (rev, steps) = cand;
+if let Some(nz_nybble) = prefix.first_different_nybble(_NODE) {
+rev.map_or(Ok((None, steps)), |r| {
+has_prefix_or_none(idx, prefix, r)
+.map(|opt| (opt, max(steps, nz_nybble + 1)))
+})
+} else {
+// the prefix is only made of zeros; NULL_REVISION always matches it
 // and any other *valid* result is an ambiguity
 match rev {
-None => Ok(Some(NULL_REVISION)),
+None => Ok((Some(NULL_REVISION), steps + 1)),
 Some(r) => match has_prefix_or_none(idx, prefix, r)? {
-None => Ok(Some(NULL_REVISION)),
+None => Ok((Some(NULL_REVISION), steps + 1)),
 _ => Err(NodeMapError::MultipleResults),
 },
 }
-} else {
-rev.map_or(Ok(None), |r| has_prefix_or_none(idx, prefix, r))
 }
 }
 
@@ -356,13 +397,26 @@
 }
 
 /// Main working method for `NodeTree` searches
+///
+/// The first returned value is the result of analysing `NodeTree` data
+/// *alone*: whereas `None` guarantees that the given prefix is absent
+/// from the `NodeTree` data (but still could match `NULL_NODE`), with
+/// `Some(rev)`, it is to be understood that `rev` is the unique `Revision`
+/// that could match the prefix. Actually, all that can be inferred from
+/// the `NodeTree` data is that `rev` is the revision with the longest
+/// common node prefix with the given prefix.
+///
+/// The second returned value is the size of the smallest subprefix
+/// of `prefix` that would give the same result, i.e. not the
+/// `MultipleResults` error variant (again, using only the data of the
+/// `NodeTree`).
 fn lookup<'p>(
 ,
 prefix: NodePrefixRef<'p>,
-) -> Result, NodeMapError> {
-for visit_item in self.visit(prefix) {
+) -> Result<(Option, usize), NodeMapError> {
+for (i, visit_item) in self.visit(prefix).enumerate() {
 if let Some(opt) = visit_item.final_revision() {
-return Ok(opt);
+return Ok((opt, i + 1));
 }
 }
 Err(NodeMapError::MultipleResults)
@@ -607,6 +661,16 @@
 prefix: NodePrefixRef<'a>,
 ) -> Result, NodeMapError> {
 validate_candidate(idx, prefix.clone(), self.lookup(prefix)?)
+.map(|(opt, _shortest)| opt)
+}
+
+fn unique_prefix_len_bin<'a>(
+,
+idx:  RevlogIndex,
+prefix: NodePrefixRef<'a>,
+) -> Result, 

D7797: rust-nodemap: pure Rust example

2020-01-27 Thread gracinet (Georges Racinet)
gracinet edited the summary of this revision.
gracinet updated this revision to Diff 19637.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7797?vs=19046=19637

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7797/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7797

AFFECTED FILES
  rust/Cargo.lock
  rust/hg-core/Cargo.toml
  rust/hg-core/examples/nodemap/index.rs
  rust/hg-core/examples/nodemap/main.rs

CHANGE DETAILS

diff --git a/rust/hg-core/examples/nodemap/main.rs 
b/rust/hg-core/examples/nodemap/main.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-core/examples/nodemap/main.rs
@@ -0,0 +1,150 @@
+// Copyright 2019-2020 Georges Racinet 
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+extern crate clap;
+extern crate hg;
+extern crate memmap;
+
+use clap::*;
+use hg::revlog::node::*;
+use hg::revlog::nodemap::*;
+use hg::revlog::*;
+use memmap::MmapOptions;
+use rand::Rng;
+use std::fs::File;
+use std::io;
+use std::io::Write;
+use std::path::{Path, PathBuf};
+use std::str::FromStr;
+use std::time::Instant;
+
+mod index;
+use index::Index;
+
+fn mmap_index(repo_path: ) -> Index {
+let mut path = PathBuf::from(repo_path);
+path.extend([".hg", "store", "00changelog.i"].iter());
+Index::load_mmap(path)
+}
+
+fn mmap_nodemap(path: ) -> NodeTree {
+let file = File::open(path).unwrap();
+let mmap = unsafe { MmapOptions::new().map().unwrap() };
+let len = mmap.len();
+NodeTree::load_bytes(Box::new(mmap), len)
+}
+
+/// Scan the whole index and create the corresponding nodemap file at `path`
+fn create(index: , path: ) -> io::Result<()> {
+let mut file = File::create(path)?;
+let start = Instant::now();
+let mut nm = NodeTree::default();
+for rev in 0..index.len() {
+let rev = rev as Revision;
+nm.insert(index, index.node(rev).unwrap(), rev).unwrap();
+}
+eprintln!("Nodemap constructed in RAM in {:?}", start.elapsed());
+file.write(_readonly_and_added_bytes().1)?;
+eprintln!("Nodemap written to disk");
+Ok(())
+}
+
+fn query(index: , nm: , prefix: ) {
+let start = Instant::now();
+let res = nm.find_hex(index, prefix);
+println!("Result found in {:?}: {:?}", start.elapsed(), res);
+}
+
+fn bench(index: , nm: , queries: usize) {
+let len = index.len() as u32;
+let mut rng = rand::thread_rng();
+let nodes: Vec = (0..queries)
+.map(|_| {
+index
+.node((rng.gen::() % len) as Revision)
+.unwrap()
+.clone()
+})
+.collect();
+if queries < 10 {
+let nodes_hex: Vec =
+nodes.iter().map(|n| n.encode_hex()).collect();
+println!("Nodes: {:?}", nodes_hex);
+}
+let mut last: Option = None;
+let start = Instant::now();
+for node in nodes.iter() {
+last = nm.find_bin(index, node.into()).unwrap();
+}
+let elapsed = start.elapsed();
+println!(
+"Did {} queries in {:?} (mean {:?}), last was {:?} with result {:?}",
+queries,
+elapsed,
+elapsed / (queries as u32),
+nodes.last().unwrap().encode_hex(),
+last
+);
+}
+
+fn main() {
+let matches = App::new("Nodemap pure Rust example")
+.arg(
+Arg::with_name("REPOSITORY")
+.help("Path to the repository, always necessary for its index")
+.required(true),
+)
+.arg(
+Arg::with_name("NODEMAP_FILE")
+.help("Path to the nodemap file, independent of REPOSITORY")
+.required(true),
+)
+.subcommand(
+SubCommand::with_name("create")
+.about("Create NODEMAP_FILE by scanning repository index"),
+)
+.subcommand(
+SubCommand::with_name("query")
+.about("Query NODEMAP_FILE for PREFIX")
+.arg(Arg::with_name("PREFIX").required(true)),
+)
+.subcommand(
+SubCommand::with_name("bench")
+.about(
+"Perform #QUERIES random successful queries on 
NODEMAP_FILE")
+.arg(Arg::with_name("QUERIES").required(true)),
+)
+.get_matches();
+
+let repo = matches.value_of("REPOSITORY").unwrap();
+let nm_path = matches.value_of("NODEMAP_FILE").unwrap();
+
+let index = mmap_index(::new(repo));
+
+if let Some(_) = matches.subcommand_matches("create") {
+println!("Creating nodemap file {} for repository {}", nm_path, repo);
+create(, ::new(nm

D7798: rust-nodemap: special case for prefixes of NULL_NODE

2020-01-27 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19638.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7798?vs=19330=19638

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7798/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7798

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -13,7 +13,8 @@
 //! is used in a more abstract context.
 
 use super::{
-Node, NodeError, NodePrefix, NodePrefixRef, Revision, RevlogIndex,
+node::NULL_NODE, Node, NodeError, NodePrefix, NodePrefixRef, Revision,
+RevlogIndex, NULL_REVISION,
 };
 
 use std::fmt;
@@ -250,6 +251,31 @@
 })
 }
 
+/// validate that the candidate's node starts indeed with given prefix,
+/// and treat ambiguities related to `NULL_REVISION`.
+///
+/// From the data in the NodeTree, one can only conclude that some
+/// revision is the only one for a *subprefix* of the one being looked up.
+fn validate_candidate<'p>(
+idx:  RevlogIndex,
+prefix: NodePrefixRef<'p>,
+rev: Option,
+) -> Result, NodeMapError> {
+if prefix.is_prefix_of(_NODE) {
+// NULL_REVISION always matches a prefix made only of zeros
+// and any other *valid* result is an ambiguity
+match rev {
+None => Ok(Some(NULL_REVISION)),
+Some(r) => match has_prefix_or_none(idx, prefix, r)? {
+None => Ok(Some(NULL_REVISION)),
+_ => Err(NodeMapError::MultipleResults),
+},
+}
+} else {
+rev.map_or(Ok(None), |r| has_prefix_or_none(idx, prefix, r))
+}
+}
+
 impl NodeTree {
 /// Initiate a NodeTree from an immutable slice-like of `Block`
 ///
@@ -330,8 +356,6 @@
 }
 
 /// Main working method for `NodeTree` searches
-///
-/// This partial implementation lacks special cases for NULL_REVISION
 fn lookup<'p>(
 ,
 prefix: NodePrefixRef<'p>,
@@ -582,9 +606,7 @@
 idx:  RevlogIndex,
 prefix: NodePrefixRef<'a>,
 ) -> Result, NodeMapError> {
-self.lookup(prefix.clone()).and_then(|opt| {
-opt.map_or(Ok(None), |rev| has_prefix_or_none(idx, prefix, rev))
-})
+validate_candidate(idx, prefix.clone(), self.lookup(prefix)?)
 }
 }
 
@@ -713,8 +735,9 @@
 
 assert_eq!(nt.find_hex(, "0"), Err(MultipleResults));
 assert_eq!(nt.find_hex(, "01"), Ok(Some(9)));
-assert_eq!(nt.find_hex(, "00"), Ok(Some(0)));
+assert_eq!(nt.find_hex(, "00"), Err(MultipleResults));
 assert_eq!(nt.find_hex(, "00a"), Ok(Some(0)));
+assert_eq!(nt.find_hex(, "000"), Ok(Some(NULL_REVISION)));
 }
 
 #[test]
@@ -733,7 +756,8 @@
 };
 assert_eq!(nt.find_hex(, "10")?, Some(1));
 assert_eq!(nt.find_hex(, "c")?, Some(2));
-assert_eq!(nt.find_hex(, "00")?, Some(0));
+assert_eq!(nt.find_hex(, "00"), Err(MultipleResults));
+assert_eq!(nt.find_hex(, "000")?, Some(NULL_REVISION));
 assert_eq!(nt.find_hex(, "01")?, Some(9));
 Ok(())
 }



To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7796: rust-nodemap: input/output primitives

2020-01-27 Thread gracinet (Georges Racinet)
gracinet edited the summary of this revision.
gracinet updated this revision to Diff 19636.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7796?vs=19329=19636

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,8 +17,10 @@
 };
 
 use std::fmt;
+use std::mem;
 use std::ops::Deref;
 use std::ops::Index;
+use std::slice;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -175,6 +177,8 @@
 #[derive(Clone, PartialEq)]
 pub struct Block([RawElement; 16]);
 
+pub const BLOCK_SIZE: usize = mem::size_of::();
+
 impl Block {
 fn new() -> Self {
 Block([-1; 16])
@@ -262,6 +266,56 @@
 }
 }
 
+/// Create from an opaque bunch of bytes
+///
+/// The created `NodeTreeBytes` from `buffer`,
+/// of which exactly `amount` bytes are used.
+///
+/// - `buffer` could be derived from `PyBuffer` and `Mmap` objects.
+/// - `offset` allows for the final file format to include fixed data
+///   (generation number, behavioural flags)
+/// - `amount` is expressed in bytes, and is not automatically derived from
+///   `bytes`, so that a caller that manages them atomically can perform
+///   temporary disk serializations and still rollback easily if needed.
+///   First use-case for this would be to support Mercurial shell hooks.
+///
+/// panics if `buffer` is smaller than `amount`
+pub fn load_bytes(
+bytes: Box + Send>,
+amount: usize,
+) -> Self {
+NodeTree::new(Box::new(NodeTreeBytes::new(bytes, amount)))
+}
+
+/// Retrieve added `Block` and the original immutable data
+pub fn into_readonly_and_added(
+self,
+) -> (Box + Send>, Vec) {
+let mut vec = self.growable;
+let readonly = self.readonly;
+if readonly.last() != Some() {
+vec.push(self.root);
+}
+(readonly, vec)
+}
+
+/// Retrieve added `Blocks` as bytes, ready to be written to persistent
+/// storage
+pub fn into_readonly_and_added_bytes(
+self,
+) -> (Box + Send>, Vec) {
+let (readonly, vec) = self.into_readonly_and_added();
+let bytes = unsafe {
+Vec::from_raw_parts(
+vec.as_ptr() as *mut u8,
+vec.len() * BLOCK_SIZE,
+vec.capacity() * BLOCK_SIZE,
+)
+};
+mem::forget(vec);
+(readonly, bytes)
+}
+
 /// Total number of blocks
 fn len() -> usize {
 self.readonly.len() + self.growable.len() + 1
@@ -410,6 +464,38 @@
 }
 }
 
+pub struct NodeTreeBytes {
+buffer: Box + Send>,
+len_in_blocks: usize,
+}
+
+impl NodeTreeBytes {
+fn new(
+buffer: Box + Send>,
+amount: usize,
+) -> Self {
+assert!(buffer.len() >= amount);
+let len_in_blocks = amount / BLOCK_SIZE;
+NodeTreeBytes {
+buffer,
+len_in_blocks,
+}
+}
+}
+
+impl Deref for NodeTreeBytes {
+type Target = [Block];
+
+fn deref() -> &[Block] {
+unsafe {
+slice::from_raw_parts(
+().as_ptr() as *const Block,
+self.len_in_blocks,
+)
+}
+}
+}
+
 struct NodeTreeVisitor<'n, 'p> {
 nt: &'n NodeTree,
 prefix: NodePrefixRef<'p>,
@@ -786,4 +872,30 @@
 
 Ok(())
 }
+
+#[test]
+fn test_into_added_empty() {
+assert!(sample_nodetree().into_readonly_and_added().1.is_empty());
+assert!(sample_nodetree()
+.into_readonly_and_added_bytes()
+.1
+.is_empty());
+}
+
+#[test]
+fn test_into_added_bytes() -> Result<(), NodeMapError> {
+let mut idx = TestNtIndex::new();
+idx.insert(0, "1234")?;
+let mut idx = idx.commit();
+idx.insert(4, "cafe")?;
+let (_, bytes) = idx.nt.into_readonly_and_added_bytes();
+
+// only the root block has been changed
+assert_eq!(bytes.len(), BLOCK_SIZE);
+// big endian for -2
+assert_eq!([4..2 * 4], [255, 255, 255, 254]);
+// big endian for -6
+assert_eq!([12 * 4..13 * 4], [255, 255, 255, 250]);
+Ok(())
+}
 }



To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7795: rust-nodemap: insert method

2020-01-27 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19635.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7795?vs=19328=19635

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7795/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7795

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -15,6 +15,7 @@
 use super::{
 Node, NodeError, NodePrefix, NodePrefixRef, Revision, RevlogIndex,
 };
+
 use std::fmt;
 use std::ops::Deref;
 use std::ops::Index;
@@ -96,6 +97,15 @@
 }
 }
 
+pub trait MutableNodeMap: NodeMap {
+fn insert(
+ self,
+index: ,
+node: ,
+rev: Revision,
+) -> Result<(), NodeMapError>;
+}
+
 /// Low level NodeTree [`Blocks`] elements
 ///
 /// These are exactly as for instance on persistent storage.
@@ -292,6 +302,112 @@
 done: false,
 }
 }
+/// Return a mutable reference for `Block` at index `idx`.
+///
+/// If `idx` lies in the immutable area, then the reference is to
+/// a newly appended copy.
+///
+/// Returns (new_idx, glen, mut_ref) where
+///
+/// - `new_idx` is the index of the mutable `Block`
+/// - `mut_ref` is a mutable reference to the mutable Block.
+/// - `glen` is the new length of `self.growable`
+///
+/// Note: the caller wouldn't be allowed to query `self.growable.len()`
+/// itself because of the mutable borrow taken with the returned `Block`
+fn mutable_block( self, idx: usize) -> (usize,  Block, usize) {
+let ro_blocks = 
+let ro_len = ro_blocks.len();
+let glen = self.growable.len();
+if idx < ro_len {
+// TODO OPTIM I think this makes two copies
+self.growable.push(ro_blocks[idx].clone());
+(glen + ro_len,  self.growable[glen], glen + 1)
+} else if glen + ro_len == idx {
+(idx,  self.root, glen)
+} else {
+(idx,  self.growable[idx - ro_len], glen)
+}
+}
+
+/// Main insertion method
+///
+/// This will dive in the node tree to find the deepest `Block` for
+/// `node`, split it as much as needed and record `node` in there.
+/// The method then backtracks, updating references in all the visited
+/// blocks from the root.
+///
+/// All the mutated `Block` are copied first to the growable part if
+/// needed. That happens for those in the immutable part except the root.
+pub fn insert(
+ self,
+index: ,
+node: ,
+rev: Revision,
+) -> Result<(), NodeMapError> {
+let ro_len = ();
+
+let mut visit_steps: Vec<_> = self.visit(node.into()).collect();
+let read_nybbles = visit_steps.len();
+// visit_steps cannot be empty, since we always visit the root block
+let deepest = visit_steps.pop().unwrap();
+
+let (mut block_idx, mut block, mut glen) =
+self.mutable_block(deepest.block_idx);
+
+if let Element::Rev(old_rev) = deepest.element {
+let old_node = index
+.node(old_rev)
+.ok_or_else(|| NodeMapError::RevisionNotInIndex(old_rev))?;
+if old_node == node {
+return Ok(()); // avoid creating lots of useless blocks
+}
+
+// Looping over the tail of nybbles in both nodes, creating
+// new blocks until we find the difference
+let mut new_block_idx = ro_len + glen;
+let mut nybble = deepest.nybble;
+for nybble_pos in read_nybbles..node.nybbles_len() {
+block.set(nybble, Element::Block(new_block_idx));
+
+let new_nybble = node.get_nybble(nybble_pos);
+let old_nybble = old_node.get_nybble(nybble_pos);
+
+if old_nybble == new_nybble {
+self.growable.push(Block::new());
+block =  self.growable[glen];
+glen += 1;
+new_block_idx += 1;
+nybble = new_nybble;
+} else {
+let mut new_block = Block::new();
+new_block.set(old_nybble, Element::Rev(old_rev));
+new_block.set(new_nybble, Element::Rev(rev));
+self.growable.push(new_block);
+break;
+}
+}
+} else {
+// Free slot in the deepest block: no splitting has to be done
+block.set(deepest.nybble, Element::Rev(rev));
+}
+
+// Backtrack over visit steps to update references
+while let Some(visited) = visit_steps.pop() {
+let to_write = Element::Block(block_idx);
+if visit_steps.is_empty() {
+   

D7794: rust-nodemap: generic NodeTreeVisitor

2020-01-27 Thread gracinet (Georges Racinet)
gracinet edited the summary of this revision.
gracinet updated this revision to Diff 19634.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7794?vs=19327=19634

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7794/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7794

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -272,17 +272,82 @@
 ,
 prefix: NodePrefixRef<'p>,
 ) -> Result, NodeMapError> {
-let mut visit = self.len() - 1;
-for i in 0..prefix.len() {
-let nybble = prefix.get_nybble(i);
-match self[visit].get(nybble) {
-Element::None => return Ok(None),
-Element::Rev(r) => return Ok(Some(r)),
-Element::Block(idx) => visit = idx,
+for visit_item in self.visit(prefix) {
+if let Some(opt) = visit_item.final_revision() {
+return Ok(opt);
 }
 }
 Err(NodeMapError::MultipleResults)
 }
+
+fn visit<'n, 'p>(
+&'n self,
+prefix: NodePrefixRef<'p>,
+) -> NodeTreeVisitor<'n, 'p> {
+NodeTreeVisitor {
+nt: self,
+prefix: prefix,
+visit: self.len() - 1,
+nybble_idx: 0,
+done: false,
+}
+}
+}
+
+struct NodeTreeVisitor<'n, 'p> {
+nt: &'n NodeTree,
+prefix: NodePrefixRef<'p>,
+visit: usize,
+nybble_idx: usize,
+done: bool,
+}
+
+#[derive(Debug, PartialEq, Clone)]
+struct NodeTreeVisitItem {
+block_idx: usize,
+nybble: u8,
+element: Element,
+}
+
+impl<'n, 'p> Iterator for NodeTreeVisitor<'n, 'p> {
+type Item = NodeTreeVisitItem;
+
+fn next( self) -> Option {
+if self.done || self.nybble_idx >= self.prefix.len() {
+return None;
+}
+
+let nybble = self.prefix.get_nybble(self.nybble_idx);
+self.nybble_idx += 1;
+
+let visit = self.visit;
+let element = self.nt[visit].get(nybble);
+if let Element::Block(idx) = element {
+self.visit = idx;
+} else {
+self.done = true;
+}
+
+Some(NodeTreeVisitItem {
+block_idx: visit,
+nybble: nybble,
+element: element,
+})
+}
+}
+
+impl NodeTreeVisitItem {
+// Return `Some(opt)` if this item is final, with `opt` being the
+// `Revision` that it may represent.
+//
+// If the item is not terminal, return `None`
+fn final_revision() -> Option> {
+match self.element {
+Element::Block(_) => None,
+Element::Rev(r) => Some(Some(r)),
+Element::None => Some(None),
+}
+}
 }
 
 impl From> for NodeTree {



To: gracinet, #hg-reviewers, kevincox
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7793: rust-nodemap: mutable NodeTree data structure

2020-01-27 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19633.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7793?vs=19326=19633

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7793/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7793

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -191,16 +191,31 @@
 }
 }
 
-/// A 16-radix tree with the root block at the end
+/// A mutable 16-radix tree with the root block logically at the end
+///
+/// Because of the append only nature of our node trees, we need to
+/// keep the original untouched and store new blocks separately.
+///
+/// The mutable root `Block` is kept apart so that we don't have to rebump
+/// it on each insertion.
 pub struct NodeTree {
 readonly: Box + Send>,
+growable: Vec,
+root: Block,
 }
 
 impl Index for NodeTree {
 type Output = Block;
 
 fn index(, i: usize) ->  {
-[i]
+let ro_len = self.readonly.len();
+if i < ro_len {
+[i]
+} else if i == ro_len + self.growable.len() {
+
+} else {
+[i - ro_len]
+}
 }
 }
 
@@ -222,12 +237,32 @@
 }
 
 impl NodeTree {
-fn len() -> usize {
-self.readonly.len()
+/// Initiate a NodeTree from an immutable slice-like of `Block`
+///
+/// We keep `readonly` and clone its root block if it isn't empty.
+fn new(readonly: Box + Send>) -> Self {
+let root = readonly
+.last()
+.map(|b| b.clone())
+.unwrap_or_else(|| Block::new());
+NodeTree {
+readonly: readonly,
+growable: Vec::new(),
+root: root,
+}
 }
 
+/// Total number of blocks
+fn len() -> usize {
+self.readonly.len() + self.growable.len() + 1
+}
+
+/// Implemented for completeness
+///
+/// A `NodeTree` always has at least the mutable root block.
+#[allow(dead_code)]
 fn is_empty() -> bool {
-self.len() == 0
+false
 }
 
 /// Main working method for `NodeTree` searches
@@ -237,9 +272,6 @@
 ,
 prefix: NodePrefixRef<'p>,
 ) -> Result, NodeMapError> {
-if self.is_empty() {
-return Ok(None);
-}
 let mut visit = self.len() - 1;
 for i in 0..prefix.len() {
 let nybble = prefix.get_nybble(i);
@@ -255,16 +287,18 @@
 
 impl From> for NodeTree {
 fn from(vec: Vec) -> Self {
-NodeTree {
-readonly: Box::new(vec),
-}
+Self::new(Box::new(vec))
 }
 }
 
 impl fmt::Debug for NodeTree {
 fn fmt(, f:  fmt::Formatter<'_>) -> fmt::Result {
-let blocks: &[Block] = &*self.readonly;
-write!(f, "readonly: {:?}", blocks)
+let readonly: &[Block] = &*self.readonly;
+write!(
+f,
+"readonly: {:?}, growable: {:?}, root: {:?}",
+readonly, self.growable, self.root
+)
 }
 }
 
@@ -365,7 +399,9 @@
 assert_eq!(
 format!("{:?}", nt),
 "readonly: \
- [{0: Rev(9)}, {0: Rev(0), 1: Rev(9)}, {0: Block(1), 1: Rev(1)}]"
+ [{0: Rev(9)}, {0: Rev(0), 1: Rev(9)}, {0: Block(1), 1: Rev(1)}], \
+ growable: [], \
+ root: {0: Block(1), 1: Rev(1)}",
 );
 }
 
@@ -374,7 +410,7 @@
 let mut idx: TestIndex = HashMap::new();
 pad_insert( idx, 1, "1234deadcafe");
 
-let nt = NodeTree::from(vec![block![1: Rev(1)]]);
+let nt = NodeTree::from(vec![block! {1: Rev(1)}]);
 assert_eq!(nt.find_hex(, "1")?, Some(1));
 assert_eq!(nt.find_hex(, "12")?, Some(1));
 assert_eq!(nt.find_hex(, "1234de")?, Some(1));
@@ -401,4 +437,25 @@
 assert_eq!(nt.find_hex(, "00"), Ok(Some(0)));
 assert_eq!(nt.find_hex(, "00a"), Ok(Some(0)));
 }
+
+#[test]
+fn test_mutated_find() -> Result<(), NodeMapError> {
+let mut idx = TestIndex::new();
+pad_insert( idx, 9, "012");
+pad_insert( idx, 0, "00a");
+pad_insert( idx, 2, "cafe");
+pad_insert( idx, 3, "15");
+pad_insert( idx, 1, "10");
+
+let nt = NodeTree {
+readonly: sample_nodetree().readonly,
+growable: vec![block![0: Rev(1), 5: Rev(3)]],
+root: block![0: Block(1), 1:Block(3), 12: Rev(2)],
+};
+assert_eq!(nt.find_hex(, "10")?, Some(1));
+assert_eq!(nt.find_hex(, "c")?, Some(2));
+assert_eq!(nt.find_hex(, "00")?, Some(0));
+assert_eq!(nt.find_hex(, "01")?, Some(9));
+Ok(())
+}
 }



To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list

D7791: rust-nodemap: NodeMap trait with simplest implementation

2020-01-27 Thread gracinet (Georges Racinet)
gracinet retitled this revision from "rust-nodemap: NodeMap trait with simplest 
implementor" to "rust-nodemap: NodeMap trait with simplest implementation".
gracinet updated this revision to Diff 19631.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7791?vs=19324=19631

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

AFFECTED FILES
  rust/hg-core/src/revlog/node.rs
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -12,8 +12,88 @@
 //! Following existing implicit conventions, the "nodemap" terminology
 //! is used in a more abstract context.
 
-use super::Revision;
+use super::{
+Node, NodeError, NodePrefix, NodePrefixRef, Revision, RevlogIndex,
+};
 use std::fmt;
+use std::ops::Deref;
+
+#[derive(Debug, PartialEq)]
+pub enum NodeMapError {
+MultipleResults,
+InvalidNodePrefix(NodeError),
+/// A `Revision` stored in the nodemap could not be found in the index
+RevisionNotInIndex(Revision),
+}
+
+impl From for NodeMapError {
+fn from(err: NodeError) -> Self {
+NodeMapError::InvalidNodePrefix(err)
+}
+}
+
+/// Mapping system from Mercurial nodes to revision numbers.
+///
+/// ## `RevlogIndex` and `NodeMap`
+///
+/// One way to think about their relationship is that
+/// the `NodeMap` is a prefix-oriented reverse index of the `Node` information
+/// carried by a [`RevlogIndex`].
+///
+/// Many of the methods in this trait take a `RevlogIndex` argument
+/// which is used for validation of their results. This index must naturally
+/// be the one the `NodeMap` is about, and it must be consistent.
+///
+/// Notably, the `NodeMap` must not store
+/// information about more `Revision` values than there are in the index.
+/// In these methods, an encountered `Revision` is not in the index, a
+/// [`RevisionNotInIndex`] error is returned.
+///
+/// In insert operations, the rule is thus that the `NodeMap` must always
+/// be updated after the `RevlogIndex`
+/// be updated first, and the `NodeMap` second.
+///
+/// [`RevisionNotInIndex`]: enum.NodeMapError.html#variant.RevisionNotInIndex
+/// [`RevlogIndex`]: ../trait.RevlogIndex.html
+pub trait NodeMap {
+/// Find the unique `Revision` having the given `Node`
+///
+/// If no Revision matches the given `Node`, `Ok(None)` is returned.
+fn find_node(
+,
+index:  RevlogIndex,
+node: ,
+) -> Result, NodeMapError> {
+self.find_bin(index, node.into())
+}
+
+/// Find the unique Revision whose `Node` starts with a given binary prefix
+///
+/// If no Revision matches the given prefix, `Ok(None)` is returned.
+///
+/// If several Revisions match the given prefix, a [`MultipleResults`]
+/// error is returned.
+fn find_bin<'a>(
+,
+idx:  RevlogIndex,
+prefix: NodePrefixRef<'a>,
+) -> Result, NodeMapError>;
+
+/// Find the unique Revision whose `Node` hexadecimal string representation
+/// starts with a given prefix
+///
+/// If no Revision matches the given prefix, `Ok(None)` is returned.
+///
+/// If several Revisions match the given prefix, a [`MultipleResults`]
+/// error is returned.
+fn find_hex(
+,
+idx:  RevlogIndex,
+prefix: ,
+) -> Result, NodeMapError> {
+self.find_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
+}
+}
 
 /// Low level NodeTree [`Blocks`] elements
 ///
@@ -110,9 +190,86 @@
 }
 }
 
+/// A 16-radix tree with the root block at the end
+pub struct NodeTree {
+readonly: Box + Send>,
+}
+
+/// Return `None` unless the `Node` for `rev` has given prefix in `index`.
+fn has_prefix_or_none<'p>(
+idx:  RevlogIndex,
+prefix: NodePrefixRef<'p>,
+rev: Revision,
+) -> Result, NodeMapError> {
+idx.node(rev)
+.ok_or_else(|| NodeMapError::RevisionNotInIndex(rev))
+.map(|node| {
+if prefix.is_prefix_of(node) {
+Some(rev)
+} else {
+None
+}
+})
+}
+
+impl NodeTree {
+/// Main working method for `NodeTree` searches
+///
+/// This partial implementation lacks special cases for NULL_REVISION
+fn lookup<'p>(
+,
+prefix: NodePrefixRef<'p>,
+) -> Result, NodeMapError> {
+let blocks: &[Block] = &*self.readonly;
+if blocks.is_empty() {
+return Ok(None);
+}
+let mut visit = blocks.len() - 1;
+for i in 0..prefix.len() {
+let nybble = prefix.get_nybble(i);
+match blocks[visit].get(nybble) {
+Element::None => return Ok(None),
+Element::Rev(r) => return Ok(Some(r)),
+Element::Block(idx) => 

D7792: rust-nodemap: abstracting the indexing

2020-01-27 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19632.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7792?vs=19325=19632

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7792/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7792

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,6 +17,7 @@
 };
 use std::fmt;
 use std::ops::Deref;
+use std::ops::Index;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -195,6 +196,14 @@
 readonly: Box + Send>,
 }
 
+impl Index for NodeTree {
+type Output = Block;
+
+fn index(, i: usize) ->  {
+[i]
+}
+}
+
 /// Return `None` unless the `Node` for `rev` has given prefix in `index`.
 fn has_prefix_or_none<'p>(
 idx:  RevlogIndex,
@@ -213,6 +222,14 @@
 }
 
 impl NodeTree {
+fn len() -> usize {
+self.readonly.len()
+}
+
+fn is_empty() -> bool {
+self.len() == 0
+}
+
 /// Main working method for `NodeTree` searches
 ///
 /// This partial implementation lacks special cases for NULL_REVISION
@@ -220,14 +237,13 @@
 ,
 prefix: NodePrefixRef<'p>,
 ) -> Result, NodeMapError> {
-let blocks: &[Block] = &*self.readonly;
-if blocks.is_empty() {
+if self.is_empty() {
 return Ok(None);
 }
-let mut visit = blocks.len() - 1;
+let mut visit = self.len() - 1;
 for i in 0..prefix.len() {
 let nybble = prefix.get_nybble(i);
-match blocks[visit].get(nybble) {
+match self[visit].get(nybble) {
 Element::None => return Ok(None),
 Element::Rev(r) => return Ok(Some(r)),
 Element::Block(idx) => visit = idx,



To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7790: rust-node: handling binary Node prefix

2020-01-27 Thread gracinet (Georges Racinet)
gracinet edited the summary of this revision.
gracinet updated this revision to Diff 19630.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7790?vs=19039=19630

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7790/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7790

AFFECTED FILES
  rust/hg-core/src/revlog.rs
  rust/hg-core/src/revlog/node.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/node.rs b/rust/hg-core/src/revlog/node.rs
--- a/rust/hg-core/src/revlog/node.rs
+++ b/rust/hg-core/src/revlog/node.rs
@@ -62,6 +62,7 @@
 #[derive(Debug, PartialEq)]
 pub enum NodeError {
 ExactLengthRequired(usize, String),
+PrefixTooLong(String),
 HexError(FromHexError, String),
 }
 
@@ -119,17 +120,119 @@
 }
 }
 
-impl From<(FromHexError, )> for NodeError {
-fn from(err_offender: (FromHexError, )) -> Self {
+impl> From<(FromHexError, T)> for NodeError {
+fn from(err_offender: (FromHexError, T)) -> Self {
 let (err, offender) = err_offender;
 match err {
 FromHexError::InvalidStringLength => {
 NodeError::ExactLengthRequired(
 NODE_NYBBLES_LENGTH,
-offender.to_string(),
+offender.as_ref().to_owned(),
 )
 }
-_ => NodeError::HexError(err, offender.to_string()),
+_ => NodeError::HexError(err, offender.as_ref().to_owned()),
+}
+}
+}
+
+/// The beginning of a binary revision SHA.
+///
+/// Since it can potentially come from an hexadecimal representation with
+/// odd length, it needs to carry around whether the last 4 bits are relevant
+/// or not.
+#[derive(Debug, PartialEq)]
+pub struct NodePrefix {
+buf: Vec,
+is_odd: bool,
+}
+
+impl NodePrefix {
+/// Convert from hexadecimal string representation
+///
+/// Similarly to `hex::decode`, can be used with Unicode string types
+/// (`String`, ``) as well as bytes.
+///
+/// To be used in FFI and I/O only, in order to facilitate future
+/// changes of hash format.
+pub fn from_hex(hex: impl AsRef<[u8]>) -> Result {
+let hex = hex.as_ref();
+let len = hex.len();
+if len > NODE_NYBBLES_LENGTH {
+return Err(NodeError::PrefixTooLong(
+String::from_utf8_lossy(hex).to_owned().to_string(),
+));
+}
+
+let is_odd = len % 2 == 1;
+let even_part = if is_odd { [..len - 1] } else { hex };
+let mut buf: Vec = Vec::from_hex(_part)
+.map_err(|e| (e, String::from_utf8_lossy(hex)))?;
+
+if is_odd {
+let latest_char = char::from(hex[len - 1]);
+let latest_nybble = latest_char.to_digit(16).ok_or_else(|| {
+(
+FromHexError::InvalidHexCharacter {
+c: latest_char,
+index: len - 1,
+},
+String::from_utf8_lossy(hex),
+)
+})? as u8;
+buf.push(latest_nybble << 4);
+}
+Ok(NodePrefix { buf, is_odd })
+}
+
+pub fn borrow() -> NodePrefixRef {
+NodePrefixRef {
+buf: ,
+is_odd: self.is_odd,
+}
+}
+}
+
+#[derive(Clone, Debug, PartialEq)]
+pub struct NodePrefixRef<'a> {
+buf: &'a [u8],
+is_odd: bool,
+}
+
+impl<'a> NodePrefixRef<'a> {
+pub fn len() -> usize {
+if self.is_odd {
+self.buf.len() * 2 - 1
+} else {
+self.buf.len() * 2
+}
+}
+
+pub fn is_prefix_of(, node: ) -> bool {
+if self.is_odd {
+let buf = self.buf;
+let last_pos = buf.len() - 1;
+node.data.starts_with(buf.split_at(last_pos).0)
+&& node.data[last_pos] >> 4 == buf[last_pos] >> 4
+} else {
+node.data.starts_with(self.buf)
+}
+}
+
+/// Retrieve the `i`th half-byte from the prefix.
+///
+/// This is also the `i`th hexadecimal digit in numeric form,
+/// also called a [nybble](https://en.wikipedia.org/wiki/Nibble).
+pub fn get_nybble(, i: usize) -> u8 {
+get_nybble(self.buf, i)
+}
+}
+
+/// A shortcut for full `Node` references
+impl<'a> From<&'a Node> for NodePrefixRef<'a> {
+fn from(node: &'a Node) -> Self {
+NodePrefixRef {
+buf: ,
+is_odd: false,
 }
 }
 }
@@ -188,4 +291,74 @@
 fn test_node_encode_hex() {
 assert_eq!(sample_node().encode_hex(), sample_node_hex());
 }
+
+#[test]
+fn test_prefix_from_hex() -> Result<(), NodeError> {
+assert_eq!(
+NodePrefix::from_hex("0e1")?,
+NodePrefix {
+buf: vec![14, 16],
+is_odd: true
+}
+);
+assert_eq!(
+NodePrefix::from_hex("0e1a")?,
+NodePrefix {
+  

D7789: rust-revlog: a trait for the revlog index

2020-01-27 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19629.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7789?vs=19038=19629

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7789/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7789

AFFECTED FILES
  rust/hg-core/src/revlog.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog.rs b/rust/hg-core/src/revlog.rs
--- a/rust/hg-core/src/revlog.rs
+++ b/rust/hg-core/src/revlog.rs
@@ -40,3 +40,17 @@
 ParentOutOfRange(Revision),
 WorkingDirectoryUnsupported,
 }
+
+/// The Mercurial Revlog Index
+///
+/// This is currently limited to the minimal interface that is needed for
+/// the [`nodemap`](nodemap/index.html) module
+pub trait RevlogIndex {
+/// Total number of Revisions referenced in this index
+fn len() -> usize;
+
+/// Return a reference to the Node or `None` if rev is out of bounds
+///
+/// `NULL_REVISION` is not considered to be out of bounds.
+fn node(, rev: Revision) -> Option<>;
+}



To: gracinet, #hg-reviewers, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7789: rust-revlog: a trait for the revlog index

2020-01-23 Thread gracinet (Georges Racinet)
gracinet added a comment.


  @martinvonz yeah, probably wrote the doc-comment too fast
  
  Of course it's meant to match what revlog.c does, since the first 
implementation willl actually be re-exposure of the C version

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7789/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7789

To: gracinet, #hg-reviewers, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7788: rust-node: binary Node ID and conversion utilities

2020-01-22 Thread gracinet (Georges Racinet)
Closed by commit rHG7f86426fdd2c: rust-node: binary Node ID and conversion 
utilities (authored by gracinet).
gracinet marked 3 inline comments as done.
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7788?vs=19521=19523

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7788/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7788

AFFECTED FILES
  rust/Cargo.lock
  rust/hg-core/Cargo.toml
  rust/hg-core/src/revlog.rs
  rust/hg-core/src/revlog/node.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/node.rs b/rust/hg-core/src/revlog/node.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-core/src/revlog/node.rs
@@ -0,0 +1,191 @@
+// Copyright 2019-2020 Georges Racinet 
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+//! Definitions and utilities for Revision nodes
+//!
+//! In Mercurial code base, it is customary to call "a node" the binary SHA
+//! of a revision.
+
+use hex::{self, FromHex, FromHexError};
+
+/// The length in bytes of a `Node`
+///
+/// This constant is meant to ease refactors of this module, and
+/// are private so that calling code does not expect all nodes have
+/// the same size, should we support several formats concurrently in
+/// the future.
+const NODE_BYTES_LENGTH: usize = 20;
+
+/// The length in bytes of a `Node`
+///
+/// see also `NODES_BYTES_LENGTH` about it being private.
+const NODE_NYBBLES_LENGTH: usize = 2 * NODE_BYTES_LENGTH;
+
+/// Private alias for readability and to ease future change
+type NodeData = [u8; NODE_BYTES_LENGTH];
+
+/// Binary revision SHA
+///
+/// ## Future changes of hash size
+///
+/// To accomodate future changes of hash size, Rust callers
+/// should use the conversion methods at the boundaries (FFI, actual
+/// computation of hashes and I/O) only, and only if required.
+///
+/// All other callers outside of unit tests should just handle `Node` values
+/// and never make any assumption on the actual length, using [`nybbles_len`]
+/// if they need a loop boundary.
+///
+/// All methods that create a `Node` either take a type that enforces
+/// the size or fail immediately at runtime with [`ExactLengthRequired`].
+///
+/// [`nybbles_len`]: #method.nybbles_len
+/// [`ExactLengthRequired`]: struct.NodeError#variant.ExactLengthRequired
+#[derive(Clone, Debug, PartialEq)]
+pub struct Node {
+data: NodeData,
+}
+
+/// The node value for NULL_REVISION
+pub const NULL_NODE: Node = Node {
+data: [0; NODE_BYTES_LENGTH],
+};
+
+impl From for Node {
+fn from(data: NodeData) -> Node {
+Node { data }
+}
+}
+
+#[derive(Debug, PartialEq)]
+pub enum NodeError {
+ExactLengthRequired(usize, String),
+HexError(FromHexError, String),
+}
+
+/// Low level utility function, also for prefixes
+fn get_nybble(s: &[u8], i: usize) -> u8 {
+if i % 2 == 0 {
+s[i / 2] >> 4
+} else {
+s[i / 2] & 0x0f
+}
+}
+
+impl Node {
+/// Retrieve the `i`th half-byte of the binary data.
+///
+/// This is also the `i`th hexadecimal digit in numeric form,
+/// also called a [nybble](https://en.wikipedia.org/wiki/Nibble).
+pub fn get_nybble(, i: usize) -> u8 {
+get_nybble(, i)
+}
+
+/// Length of the data, in nybbles
+pub fn nybbles_len() -> usize {
+// public exposure as an instance method only, so that we can
+// easily support several sizes of hashes if needed in the future.
+NODE_NYBBLES_LENGTH
+}
+
+/// Convert from hexadecimal string representation
+///
+/// Exact length is required.
+///
+/// To be used in FFI and I/O only, in order to facilitate future
+/// changes of hash format.
+pub fn from_hex(hex: ) -> Result {
+Ok(NodeData::from_hex(hex)
+.map_err(|e| NodeError::from((e, hex)))?
+.into())
+}
+
+/// Convert to hexadecimal string representation
+///
+/// To be used in FFI and I/O only, in order to facilitate future
+/// changes of hash format.
+pub fn encode_hex() -> String {
+hex::encode(self.data)
+}
+
+/// Provide access to binary data
+///
+/// This is needed by FFI layers, for instance to return expected
+/// binary values to Python.
+pub fn as_bytes() -> &[u8] {
+
+}
+}
+
+impl From<(FromHexError, )> for NodeError {
+fn from(err_offender: (FromHexError, )) -> Self {
+let (err, offender) = err_offender;
+match err {
+FromHexError::InvalidStringLength => {
+NodeError::ExactLengthRequired(
+NODE_NYBBLES_LENGTH,
+offender.to_string(),
+)
+

D7787: rust-nodemap: building blocks for nodetree structures

2020-01-22 Thread gracinet (Georges Racinet)
Closed by commit rHG63db6657d280: rust-nodemap: building blocks for nodetree 
structures (authored by gracinet).
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state "Needs 
Review".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7787?vs=19504=19522

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7787/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7787

AFFECTED FILES
  rust/hg-core/src/revlog.rs
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -0,0 +1,160 @@
+// Copyright 2018-2020 Georges Racinet 
+//   and Mercurial contributors
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+//! Indexing facilities for fast retrieval of `Revision` from `Node`
+//!
+//! This provides a variation on the 16-ary radix tree that is
+//! provided as "nodetree" in revlog.c, ready for append-only persistence
+//! on disk.
+//!
+//! Following existing implicit conventions, the "nodemap" terminology
+//! is used in a more abstract context.
+
+use super::Revision;
+use std::fmt;
+
+/// Low level NodeTree [`Blocks`] elements
+///
+/// These are exactly as for instance on persistent storage.
+type RawElement = i32;
+
+/// High level representation of values in NodeTree
+/// [`Blocks`](struct.Block.html)
+///
+/// This is the high level representation that most algorithms should
+/// use.
+#[derive(Clone, Debug, Eq, PartialEq)]
+enum Element {
+Rev(Revision),
+Block(usize),
+None,
+}
+
+impl From for Element {
+/// Conversion from low level representation, after endianness conversion.
+///
+/// See [`Block`](struct.Block.html) for explanation about the encoding.
+fn from(raw: RawElement) -> Element {
+if raw >= 0 {
+Element::Block(raw as usize)
+} else if raw == -1 {
+Element::None
+} else {
+Element::Rev(-raw - 2)
+}
+}
+}
+
+impl From for RawElement {
+fn from(element: Element) -> RawElement {
+match element {
+Element::None => 0,
+Element::Block(i) => i as RawElement,
+Element::Rev(rev) => -rev - 2,
+}
+}
+}
+
+/// A logical block of the `NodeTree`, packed with a fixed size.
+///
+/// These are always used in container types implementing `Index`,
+/// such as ``
+///
+/// As an array of integers, its ith element encodes that the
+/// ith potential edge from the block, representing the ith hexadecimal digit
+/// (nybble) `i` is either:
+///
+/// - absent (value -1)
+/// - another `Block` in the same indexable container (value ≥ 0)
+///  - a `Revision` leaf (value ≤ -2)
+///
+/// Endianness has to be fixed for consistency on shared storage across
+/// different architectures.
+///
+/// A key difference with the C `nodetree` is that we need to be
+/// able to represent the [`Block`] at index 0, hence -1 is the empty marker
+/// rather than 0 and the `Revision` range upper limit of -2 instead of -1.
+///
+/// Another related difference is that `NULL_REVISION` (-1) is not
+/// represented at all, because we want an immutable empty nodetree
+/// to be valid.
+
+#[derive(Clone, PartialEq)]
+pub struct Block([RawElement; 16]);
+
+impl Block {
+fn new() -> Self {
+Block([-1; 16])
+}
+
+fn get(, nybble: u8) -> Element {
+Element::from(RawElement::from_be(self.0[nybble as usize]))
+}
+
+fn set( self, nybble: u8, element: Element) {
+self.0[nybble as usize] = RawElement::to_be(element.into())
+}
+}
+
+impl fmt::Debug for Block {
+/// sparse representation for testing and debugging purposes
+fn fmt(, f:  fmt::Formatter) -> fmt::Result {
+f.debug_map()
+.entries((0..16).filter_map(|i| match self.get(i) {
+Element::None => None,
+element => Some((i, element)),
+}))
+.finish()
+}
+}
+
+#[cfg(test)]
+mod tests {
+use super::*;
+
+/// Creates a `Block` using a syntax close to the `Debug` output
+macro_rules! block {
+{$($nybble:tt : $variant:ident($val:tt)),*} => (
+{
+let mut block = Block::new();
+$(block.set($nybble, Element::$variant($val)));*;
+block
+}
+)
+}
+
+#[test]
+fn test_block_debug() {
+let mut block = Block::new();
+block.set(1, Element::Rev(3));
+block.set(10, Element::Block(0));
+assert_eq!(format!("{:?}", block), "{1: Rev(3), 10: Block(0)}");
+}
+
+#[test]
+fn test_block_macro() 

D7788: rust-node: binary Node ID and conversion utilities

2020-01-22 Thread gracinet (Georges Racinet)
gracinet added a comment.
gracinet marked 6 inline comments as done.


  As the new commit description explains, I've done all I could to make the 
change of hash format easier
  
  I should add that my other colleagues at Octobus seem also to be involved in 
the improvement of hashing, there's no risk we would forget this one.

INLINE COMMENTS

> martinvonz wrote in dirs_multiset.rs:111
> please revert unrelated change

ah yes, sorry must have been a rebase artifact

> martinvonz wrote in node.rs:31
> Or even use `hex::decode()` from the `hex` crate? Can also use 
> `hex::encode()` in `node_to_hex` if we're okay with that.

Yes, `hex` is nice, it's able to produce arrays without further copies and does 
length checks. Its error type does not repeat the input string, though, which 
in my experience is really very useful to understand problems when they occur.

`hex` does not support hexadecimal string representations with odd numbers of
digits, which will be easy to work around in the next patch.

> kevincox wrote in node.rs:39
> If we want to avoid a number of allocations we can do:
> 
>   pub fn node_to_hex(n: ) -> String {
>   let mut r = String::with_capacity(n.len() * 2);
>   for byte in n {
>   write!(r, "{:02x}", byte).unwrap();
>   }
>   debug_assert_eq!(r.len(), n.len() * 2);
>   r
>   }
> 
> The generated code for `write!()` doesn't look great but if hex printing 
> shows up in benchmarks it would be trivial to write a custom hex-formatter.

Finally used the `hex` crate here too.

> martinvonz wrote in node.rs:41-51
> This feels like something that will only be used in nodemap.rs, so put it 
> there instead? Or do you think (or know) that it will be used elsewhere soon?

Well it's become a public method in the new `Node` struct.
`NodePrefix` will be another user.

I tend to prefer encapsulation over where it's used for these choices. This 
way, there's no tempation for nodemap.rs to play with the binary data directly.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7788/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7788

To: gracinet, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7788: rust-node: binary Node ID and conversion utilities

2020-01-22 Thread gracinet (Georges Racinet)
gracinet edited the summary of this revision.
gracinet retitled this revision from "rust-node: binary Node and conversion 
utilities" to "rust-node: binary Node ID and conversion utilities".
gracinet updated this revision to Diff 19521.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7788?vs=19323=19521

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7788/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7788

AFFECTED FILES
  rust/Cargo.lock
  rust/hg-core/Cargo.toml
  rust/hg-core/src/revlog.rs
  rust/hg-core/src/revlog/node.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/node.rs b/rust/hg-core/src/revlog/node.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-core/src/revlog/node.rs
@@ -0,0 +1,191 @@
+// Copyright 2019-2020 Georges Racinet 
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+//! Definitions and utilities for Revision nodes
+//!
+//! In Mercurial code base, it is customary to call "a node" the binary SHA
+//! of a revision.
+
+use hex::{self, FromHex, FromHexError};
+
+/// The length in bytes of a `Node`
+///
+/// This constant is meant to ease refactors of this module, and
+/// are private so that calling code does not expect all nodes have
+/// the same size, should we support several formats concurrently in
+/// the future.
+const NODE_BYTES_LENGTH: usize = 20;
+
+/// The length in bytes of a `Node`
+///
+/// see also `NODES_BYTES_LENGTH` about it being private.
+const NODE_NYBBLES_LENGTH: usize = 2 * NODE_BYTES_LENGTH;
+
+/// Private alias for readability and to ease future change
+type NodeData = [u8; NODE_BYTES_LENGTH];
+
+/// Binary revision SHA
+///
+/// ## Future changes of hash size
+///
+/// To accomodate future changes of hash size, Rust callers
+/// should use the conversion methods at the boundaries (FFI, actual
+/// computation of hashes and I/O) only, and only if required.
+///
+/// All other callers outside of unit tests should just handle `Node` values
+/// and never make any assumption on the actual length, using [`nybbles_len`]
+/// if they need a loop boundary.
+///
+/// All methods that create a `Node` either take a type that enforces
+/// the size or fail immediately at runtime with [`ExactLengthRequired`].
+///
+/// [`nybbles_len`]: #method.nybbles_len
+/// [`ExactLengthRequired`]: struct.NodeError#variant.ExactLengthRequired
+#[derive(Clone, Debug, PartialEq)]
+pub struct Node {
+data: NodeData,
+}
+
+/// The node value for NULL_REVISION
+pub const NULL_NODE: Node = Node {
+data: [0; NODE_BYTES_LENGTH],
+};
+
+impl From for Node {
+fn from(data: NodeData) -> Node {
+Node { data }
+}
+}
+
+#[derive(Debug, PartialEq)]
+pub enum NodeError {
+ExactLengthRequired(usize, String),
+HexError(FromHexError, String),
+}
+
+/// Low level utility function, also for prefixes
+fn get_nybble(s: &[u8], i: usize) -> u8 {
+if i % 2 == 0 {
+s[i / 2] >> 4
+} else {
+s[i / 2] & 0x0f
+}
+}
+
+impl Node {
+/// Retrieve the `i`th half-byte of the binary data.
+///
+/// This is also the `i`th hexadecimal digit in numeric form,
+/// also called a [nybble](https://en.wikipedia.org/wiki/Nibble).
+pub fn get_nybble(, i: usize) -> u8 {
+get_nybble(, i)
+}
+
+/// Length of the data, in nybbles
+pub fn nybbles_len() -> usize {
+// public exposure as an instance method only, so that we can
+// easily support several sizes of hashes if needed in the future.
+NODE_NYBBLES_LENGTH
+}
+
+/// Convert from hexadecimal string representation
+///
+/// Exact length is required.
+///
+/// To be used in FFI and I/O only, in order to facilitate future
+/// changes of hash format.
+pub fn from_hex(hex: ) -> Result {
+Ok(NodeData::from_hex(hex)
+.map_err(|e| NodeError::from((e, hex)))?
+.into())
+}
+
+/// Convert to hexadecimal string representation
+///
+/// To be used in FFI and I/O only, in order to facilitate future
+/// changes of hash format.
+pub fn encode_hex() -> String {
+hex::encode(self.data)
+}
+
+/// Provide access to binary data
+///
+/// This is needed by FFI layers, for instance to return expected
+/// binary values to Python.
+pub fn as_bytes() -> &[u8] {
+
+}
+}
+
+impl From<(FromHexError, )> for NodeError {
+fn from(err_offender: (FromHexError, )) -> Self {
+let (err, offender) = err_offender;
+match err {
+FromHexError::InvalidStringLength => {
+NodeError::ExactLengthRequired(
+NODE_NYBBLES_LENGTH,
+offender.to_string(),
+)
+}
+_ => NodeError::

D7819: rust-nodemap: core implementation for shortest

2020-01-22 Thread gracinet (Georges Racinet)
gracinet added inline comments.

INLINE COMMENTS

> kevincox wrote in nodemap.rs:575
> How about `unique_bin_prefix_len`?

I have personally nothing against this, and actually find this `shortest` not 
to be clear, but… it's also the name in the C implementation and the Python 
API, IIRC.

@martinvonz I see you subscribed, what do you think ?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7819/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7819

To: gracinet, #hg-reviewers, kevincox
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7787: rust-nodemap: building blocks for nodetree structures

2020-01-22 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19504.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7787?vs=19503=19504

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7787/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7787

AFFECTED FILES
  rust/hg-core/src/revlog.rs
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -0,0 +1,160 @@
+// Copyright 2018-2020 Georges Racinet 
+//   and Mercurial contributors
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+//! Indexing facilities for fast retrieval of `Revision` from `Node`
+//!
+//! This provides a variation on the 16-ary radix tree that is
+//! provided as "nodetree" in revlog.c, ready for append-only persistence
+//! on disk.
+//!
+//! Following existing implicit conventions, the "nodemap" terminology
+//! is used in a more abstract context.
+
+use super::Revision;
+use std::fmt;
+
+/// Low level NodeTree [`Blocks`] elements
+///
+/// These are exactly as for instance on persistent storage.
+type RawElement = i32;
+
+/// High level representation of values in NodeTree
+/// [`Blocks`](struct.Block.html)
+///
+/// This is the high level representation that most algorithms should
+/// use.
+#[derive(Clone, Debug, Eq, PartialEq)]
+enum Element {
+Rev(Revision),
+Block(usize),
+None,
+}
+
+impl From for Element {
+/// Conversion from low level representation, after endianness conversion.
+///
+/// See [`Block`](struct.Block.html) for explanation about the encoding.
+fn from(raw: RawElement) -> Element {
+if raw >= 0 {
+Element::Block(raw as usize)
+} else if raw == -1 {
+Element::None
+} else {
+Element::Rev(-raw - 2)
+}
+}
+}
+
+impl From for RawElement {
+fn from(element: Element) -> RawElement {
+match element {
+Element::None => 0,
+Element::Block(i) => i as RawElement,
+Element::Rev(rev) => -rev - 2,
+}
+}
+}
+
+/// A logical block of the `NodeTree`, packed with a fixed size.
+///
+/// These are always used in container types implementing `Index`,
+/// such as ``
+///
+/// As an array of integers, its ith element encodes that the
+/// ith potential edge from the block, representing the ith hexadecimal digit
+/// (nybble) `i` is either:
+///
+/// - absent (value -1)
+/// - another `Block` in the same indexable container (value ≥ 0)
+///  - a `Revision` leaf (value ≤ -2)
+///
+/// Endianness has to be fixed for consistency on shared storage across
+/// different architectures.
+///
+/// A key difference with the C `nodetree` is that we need to be
+/// able to represent the [`Block`] at index 0, hence -1 is the empty marker
+/// rather than 0 and the `Revision` range upper limit of -2 instead of -1.
+///
+/// Another related difference is that `NULL_REVISION` (-1) is not
+/// represented at all, because we want an immutable empty nodetree
+/// to be valid.
+
+#[derive(Clone, PartialEq)]
+pub struct Block([RawElement; 16]);
+
+impl Block {
+fn new() -> Self {
+Block([-1; 16])
+}
+
+fn get(, nybble: u8) -> Element {
+Element::from(RawElement::from_be(self.0[nybble as usize]))
+}
+
+fn set( self, nybble: u8, element: Element) {
+self.0[nybble as usize] = RawElement::to_be(element.into())
+}
+}
+
+impl fmt::Debug for Block {
+/// sparse representation for testing and debugging purposes
+fn fmt(, f:  fmt::Formatter) -> fmt::Result {
+f.debug_map()
+.entries((0..16).filter_map(|i| match self.get(i) {
+Element::None => None,
+element => Some((i, element)),
+}))
+.finish()
+}
+}
+
+#[cfg(test)]
+mod tests {
+use super::*;
+
+/// Creates a `Block` using a syntax close to the `Debug` output
+macro_rules! block {
+{$($nybble:tt : $variant:ident($val:tt)),*} => (
+{
+let mut block = Block::new();
+$(block.set($nybble, Element::$variant($val)));*;
+block
+}
+)
+}
+
+#[test]
+fn test_block_debug() {
+let mut block = Block::new();
+block.set(1, Element::Rev(3));
+block.set(10, Element::Block(0));
+assert_eq!(format!("{:?}", block), "{1: Rev(3), 10: Block(0)}");
+}
+
+#[test]
+fn test_block_macro() {
+let block = block! {5: Block(2)};
+assert_eq!(format!("{:?}", block), "{5: Block(2)}");
+
+let block = block! {13: Rev(15), 5: Block(2)};
+assert_eq!(for

D7787: rust-nodemap: building blocks for nodetree structures

2020-01-22 Thread gracinet (Georges Racinet)
gracinet added a comment.


  These are done, thanks for the remarks.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7787/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7787

To: gracinet, #hg-reviewers, kevincox
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7787: rust-nodemap: building blocks for nodetree structures

2020-01-22 Thread gracinet (Georges Racinet)
gracinet marked 3 inline comments as done.
gracinet updated this revision to Diff 19503.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7787?vs=19322=19503

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7787/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7787

AFFECTED FILES
  rust/hg-core/src/revlog.rs
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -0,0 +1,160 @@
+// Copyright 2018-2020 Georges Racinet 
+//   and Mercurial contributors
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+//! Indexing facilities for fast retrieval of `Revision` from `Node`
+//!
+//! This provides a variation on the radix tree with valency 16 that is
+//! provided as "nodetree" in revlog.c, ready for append-only persistence
+//! on disk.
+//!
+//! Following existing implicit conventions, the "nodemap" terminology
+//! is used in a more abstract context.
+
+use super::Revision;
+use std::fmt;
+
+/// Low level NodeTree [`Blocks`] elements
+///
+/// These are exactly as for instance on persistent storage.
+type RawElement = i32;
+
+/// High level representation of values in NodeTree
+/// [`Blocks`](struct.Block.html)
+///
+/// This is the high level representation that most algorithms should
+/// use.
+#[derive(Clone, Debug, Eq, PartialEq)]
+enum Element {
+Rev(Revision),
+Block(usize),
+None,
+}
+
+impl From for Element {
+/// Conversion from low level representation, after endianity conversion.
+///
+/// See [`Block`](struct.Block.html) for explanation about the encoding.
+fn from(raw: RawElement) -> Element {
+if raw >= 0 {
+Element::Block(raw as usize)
+} else if raw == -1 {
+Element::None
+} else {
+Element::Rev(-raw - 2)
+}
+}
+}
+
+impl From for RawElement {
+fn from(elt: Element) -> RawElement {
+match elt {
+Element::None => 0,
+Element::Block(i) => i as RawElement,
+Element::Rev(rev) => -rev - 2,
+}
+}
+}
+
+/// A logical block of the `NodeTree`, packed with a fixed size.
+///
+/// These are always used in container types implementing `Index`,
+/// such as ``
+///
+/// As an array of integers, its ith element encodes that the
+/// ith potential edge from the block, representing the ith hexadecimal digit
+/// (nybble) `i` is either:
+///
+/// - absent (value -1)
+/// - another `Block` in the same indexable container (value ≥ 0)
+///  - a `Revision` leaf (value ≤ -2)
+///
+/// Endianity has to be fixed for consistency on shared storage across
+/// different architectures.
+///
+/// A key difference with the C `nodetree` is that we need to be
+/// able to represent the [`Block`] at index 0, hence -1 is the empty marker
+/// rather than 0 and the `Revision` range upper limit of -2 instead of -1.
+///
+/// Another related difference is that `NULL_REVISION` (-1) is not
+/// represented at all, because we want an immutable empty nodetree
+/// to be valid.
+
+#[derive(Clone, PartialEq)]
+pub struct Block([RawElement; 16]);
+
+impl Block {
+fn new() -> Self {
+Block([-1; 16])
+}
+
+fn get(, nybble: u8) -> Element {
+Element::from(RawElement::from_be(self.0[nybble as usize]))
+}
+
+fn set( self, nybble: u8, elt: Element) {
+self.0[nybble as usize] = RawElement::to_be(elt.into())
+}
+}
+
+impl fmt::Debug for Block {
+/// sparse representation for testing and debugging purposes
+fn fmt(, f:  fmt::Formatter<'_>) -> fmt::Result {
+f.debug_map()
+.entries((0..16).filter_map(|i| match self.get(i) {
+Element::None => None,
+elt => Some((i, elt)),
+}))
+.finish()
+}
+}
+
+#[cfg(test)]
+mod tests {
+use super::*;
+
+/// Creates a `Block` using a syntax close to the `Debug` output
+macro_rules! block {
+{$($nybble:tt : $variant:ident($val:tt)),*} => (
+{
+let mut block = Block::new();
+$(block.set($nybble, Element::$variant($val)));*;
+block
+}
+)
+}
+
+#[test]
+fn test_block_debug() {
+let mut block = Block::new();
+block.set(1, Element::Rev(3));
+block.set(10, Element::Block(0));
+assert_eq!(format!("{:?}", block), "{1: Rev(3), 10: Block(0)}");
+}
+
+#[test]
+fn test_block_macro() {
+let block = block! {5: Block(2)};
+assert_eq!(format!("{:?}", block), "{5: Block(2)}");
+
+let block = block! {13: Rev(15), 5: Bloc

D7791: rust-nodemap: NodeMap trait with simplest implementor

2020-01-22 Thread gracinet (Georges Racinet)
gracinet added inline comments.

INLINE COMMENTS

> martinvonz wrote in nodemap.rs:150
> I find `Deref + Send` hard to understand. I think it would 
> be helpful to name it. Could we at least define an alias for it?
> 
> Perhaps it's even better to define a trait for it and add named methods on 
> that, but if those methods would just be `len()` and `index()` it probably 
> doesn't make any sense to define the trait.

I've looked into trait alias while working on this, for the exact same reason, 
and went through most you guys are saying about that.

Seems that this is complicated for the language because they'd like to have the
possibility to `impl` it.

> martinvonz wrote in nodemap.rs:150
> Do we need `Send`? Maybe it later when you need it (unless you actually need 
> it now and I'm just mistaken, of course).

About `Send`, without it we can't expose `NodeTree` to the Python layer, IIRC

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

To: gracinet, #hg-reviewers, kevincox
Cc: Alphare, martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7787: rust-nodemap: building blocks for nodetree structures

2020-01-22 Thread gracinet (Georges Racinet)
gracinet added inline comments.

INLINE COMMENTS

> martinvonz wrote in nodemap.rs:8
> i think s/valency/arity/ is a more common term for it

yes, and valency is actually almost incorrect actually in that case, it turns 
out

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7787/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7787

To: gracinet, #hg-reviewers, kevincox
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7787: rust-nodemap: building blocks for nodetree structures

2020-01-22 Thread gracinet (Georges Racinet)
gracinet added a comment.


  @martinvonz
  
  > Just a few nits here, but it looks like we're expecting an update on this 
series anyway, so maybe you can address them.
  
  No problem with these. I'll be doing a swipe on the whole series today and 
will include them.

INLINE COMMENTS

> martinvonz wrote in nodemap.rs:96
> Could you call it `element` instead? I think it's a little obvious what that 
> means. `elt` is at least not an abbreviation I'm familiar with.

right, I've must have spent too much of my life using XML libraries, I suppose 
;-)

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7787/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7787

To: gracinet, #hg-reviewers, kevincox
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7952: rust: add a README

2020-01-21 Thread gracinet (Georges Racinet)
gracinet added inline comments.

INLINE COMMENTS

> valentin.gatienbaron wrote in README.rst:25
> When I tested, setting HGMODULEPOLICY didn't work, perhaps because of what 
> Raphael said.

Both Raphaël and I were saying the same thin, perhaps a bit partially.

Here's the whole picture:

- options to `make` or `setup.py` govern what is built (`HGWITHRUSTEXT` also).
- among the built extensions, what is actually used at runtime depends on the 
module policy, which is read from `mercurial/__modulepolicy__`.py and can be 
overridden by HGMODULEPOLICY ,
- at build time, a default policy value is set in 
`mercurial/__modulepolicy__.py`. For the record, this is `rust+c-allow` for 
local builds with Rust and `rust+c` for other Rust builds.

This policy system is actuallly not specific to Rust, it's also in use  with 
the C extensions, the major difference being that C extensions are built by 
default.

For example, if one does not build the C extensions and use `HGMODULEPOLICY=c`, 
that's a hard error.  With `HGMODULEPOLICY=allow`, one gets a fallback to the 
pure implementation.

Same with Rust: `rust+c` gives a hard error is Rust isn't built, while 
`rust+c-allow` fallbacks to C extensions only

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7952/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7952

To: valentin.gatienbaron, #hg-reviewers, Alphare
Cc: pulkit, gracinet, Alphare, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7952: rust: add a README

2020-01-21 Thread gracinet (Georges Racinet)
gracinet added a comment.


  "deprecate" was a bit too strong. Actually I'd like to downgrade 
`HGWITHRUSTEXT` to be something that'll be useful to play with alternative ways 
of building the Rust extension(s). Example: if someone wants to experiment with 
PyO3.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7952/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7952

To: valentin.gatienbaron, #hg-reviewers, Alphare
Cc: pulkit, gracinet, Alphare, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7952: rust: add a README

2020-01-21 Thread gracinet (Georges Racinet)
gracinet added a comment.


  My general feeling is that I'd like to deprecate `HGWITHRUSTEXT`.
  
  We now have `python setup.py --rust [build|install|etc]` and the coincidental 
`make local PURE=--rust` (works because `$PURE`, while meant to be for `--pure` 
is inserted blindly at the right place in the `setup.py` command.
  
  Maybe this is a good time to decide of a better make variable ? 
EXTENSION_TYPE ? COMPILE_OPTIONS ? BUILD_OPTIONS ?
  
  It's obviously a good thing to have official instructions, thank you Valentin 
for raising the subject.

INLINE COMMENTS

> pulkit wrote in README.rst:25
> IIRC, it's now possible to use `HGMODULEPOLICY=rust`. @gracinet that's 
> possible now right?

`HGMODULEPOLICY` is for runtime, it can be `rust+c` or `rust+c-allow`. 
In the first form, not having the Rust extension is a failure, in the second, 
there's a fallback.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7952/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7952

To: valentin.gatienbaron, #hg-reviewers, Alphare
Cc: pulkit, gracinet, Alphare, mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7790: rust-node: handling binary Node prefix

2020-01-20 Thread gracinet (Georges Racinet)
gracinet added a comment.


  @martinvonz in this code, we're in competition with the C implementation, 
which does something similar.
  
  Switching to a full u8 per nybble is more than a few minutes, it means 
changing the ownership model completely. Also, it introduces a new allocation 
and a copy instead of merely taking a reference.
  
  So, it's more work and has performance impact that we have no means to 
measure.
  
  The odd/even thing is not *that* complicated. It's very bad for readability 
in the C code, but it's nicely encapsulated in the Rust case and fully tested. 
I'd be more comfortable trying what you're suggesting once we have real-life 
measurements.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7790/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7790

To: gracinet, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7788: rust-node: binary Node and conversion utilities

2020-01-20 Thread gracinet (Georges Racinet)
gracinet added a comment.


  @durin42 @martinvonz we might have a nice path forward with this, with the 
suggestion of @kevincox to make `Node` a truly defined type, not just an alias, 
with a private member for the bytes.
  
  Indeed I find it important to know if a node information is complete or 
merely a prefix. With a truly defined type, the assumption that it's 20 bytes 
long would be completely encapsulated in the `Node` type. It would then be very 
easy (much easier than with Python code) to change it to something else, 
whether that'd be a longer bag of bytes, or a struct with two values, an enum 
with alternatives for sha-1 or sha-2 etc.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7788/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7788

To: gracinet, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Heptapod 0.8.0 (SSH) released

2020-01-17 Thread Georges Racinet
We're glad to announce version 0.8.0 of Heptapod, the project to
integrate Mercurial in GitLab.

What follows is an abridged version of the online announcement published
earlier today [1].

It's been a crowded development cycle, as we are readying ourselves for
public hosting at Clever Cloud [2], thanks to a partnership with Octobus.

Building on the groundwork layed down in Heptapod 0.7.0, several major
targets have been attained in this new version:

- SSH support
- Install from source
- Full import of Bitbucket Pull Requests

Further important foundational work has been done: we're hoping to offer
contributors a much better experience in the near future, and finally,
work on the HGitaly project has started for good.

It's a perfect time to start contributing - well, can it be ever a bad
time?

More on what's going on and upgrade notes are available from the full
announcement [1].

Cheers, and happy 2020 everyone.


[1] https://heptapod.net/heptapod-080-released.html

[2] https://www.clever-cloud.com/en/heptapod

-- 
Georges Racinet
https://octobus.net, https://heptapod.net
GPG: BF5456F4DC625443849B6E58EE20CA44EF691D39, sur serveurs publics




signature.asc
Description: OpenPGP digital signature
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7796: rust-nodemap: input/output primitives

2020-01-16 Thread gracinet (Georges Racinet)
gracinet added a comment.


  @kevincox (replying hastily because I'm also finishing something unrelated 
today).
  Doesn't `mem::size_of` guarantee to take any padding into account? At least 
that's what the doc seems to say: 
https://doc.rust-lang.org/std/mem/fn.size_of.html

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7787: rust-nodemap: building blocks for nodetree structures

2020-01-15 Thread gracinet (Georges Racinet)
gracinet added inline comments.
gracinet marked 2 inline comments as done.

INLINE COMMENTS

> gracinet wrote in nodemap.rs:111
> Nice, thanks for the tip

So, that gives formatting with braces, hence for consistency I changed the 
`block!` macro, too.

I didn't keep the hexadecimal formatting, because it'd now lead to lots of `\"` 
making the tests less readable.

An upside of this is that it's now really consistent with `block!`. A downside 
is that someone using it for real debugging with input given in hexadecimal 
would presumably have to mentally convert hexadecimal nybbles to their decimal 
form.
It would have been a bit of a drag in the intiial development effort, but I 
don't think
that'll be a problem in the future: : either it'll be on small data or with 
diffrent tools anyway.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7787/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7787

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7819: rust-nodemap: core implementation for shortest

2020-01-15 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19331.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7819?vs=19139=19331

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7819/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7819

AFFECTED FILES
  rust/hg-core/src/revlog/node.rs
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -16,6 +16,7 @@
 node::get_nybble, node::NULL_NODE, Node, NodeError, NodePrefix,
 NodePrefixRef, Revision, RevlogIndex, NULL_REVISION,
 };
+use std::cmp::max;
 use std::fmt;
 use std::mem;
 use std::ops::Deref;
@@ -47,6 +48,20 @@
 prefix: NodePrefixRef<'a>,
 ) -> Result, NodeMapError>;
 
+/// Give the size of the shortest node prefix that determines
+/// the revision uniquely.
+///
+/// From a binary node prefix, if it is matched in the node map, this
+/// returns the number of hexadecimal digits that would had sufficed
+/// to find the revision uniquely.
+///
+/// Returns `None` if no `Revision` could be found for the prefix.
+fn shortest_bin<'a>(
+,
+idx:  RevlogIndex,
+node_prefix: NodePrefixRef<'a>,
+) -> Result, NodeMapError>;
+
 fn find_hex(
 ,
 idx:  RevlogIndex,
@@ -54,6 +69,16 @@
 ) -> Result, NodeMapError> {
 self.find_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
 }
+
+/// Same as `shortest_bin`, with the hexadecimal representation of the
+/// prefix as input.
+fn shortest_hex(
+,
+idx:  RevlogIndex,
+prefix: ,
+) -> Result, NodeMapError> {
+self.shortest_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
+}
 }
 
 pub trait MutableNodeMap: NodeMap {
@@ -215,20 +240,24 @@
 fn validate_candidate<'p>(
 idx:  RevlogIndex,
 prefix: NodePrefixRef<'p>,
-rev: Option,
-) -> Result, NodeMapError> {
-if prefix.is_prefix_of(_NODE) {
-// NULL_REVISION always matches a prefix made only of zeros
+cand: (Option, usize),
+) -> Result<(Option, usize), NodeMapError> {
+let (rev, steps) = cand;
+if let Some(nz_nybble) = prefix.first_different_nybble(_NODE) {
+rev.map_or(Ok((None, steps)), |r| {
+has_prefix_or_none(idx, prefix, r)
+.map(|opt| (opt, max(steps, nz_nybble + 1)))
+})
+} else {
+// the prefix is only made of zeros; NULL_REVISION always matches it
 // and any other *valid* result is an ambiguity
 match rev {
-None => Ok(Some(NULL_REVISION)),
+None => Ok((Some(NULL_REVISION), steps + 1)),
 Some(r) => match has_prefix_or_none(idx, prefix, r)? {
-None => Ok(Some(NULL_REVISION)),
+None => Ok((Some(NULL_REVISION), steps + 1)),
 _ => Err(NodeMapError::MultipleResults),
 },
 }
-} else {
-rev.map_or(Ok(None), |r| has_prefix_or_none(idx, prefix, r))
 }
 }
 
@@ -308,10 +337,10 @@
 fn lookup<'p>(
 ,
 prefix: NodePrefixRef<'p>,
-) -> Result, NodeMapError> {
-for (leaf, _, _, opt) in self.visit(prefix) {
+) -> Result<(Option, usize), NodeMapError> {
+for (i, (leaf, _, _, opt)) in self.visit(prefix).enumerate() {
 if leaf {
-return Ok(opt);
+return Ok((opt, i + 1));
 }
 }
 Err(NodeMapError::MultipleResults)
@@ -540,6 +569,16 @@
 prefix: NodePrefixRef<'a>,
 ) -> Result, NodeMapError> {
 validate_candidate(idx, prefix.clone(), self.lookup(prefix)?)
+.map(|(opt, _shortest)| opt)
+}
+
+fn shortest_bin<'a>(
+,
+idx:  RevlogIndex,
+prefix: NodePrefixRef<'a>,
+) -> Result, NodeMapError> {
+validate_candidate(idx, prefix.clone(), self.lookup(prefix)?)
+.map(|(opt, shortest)| opt.map(|_rev| shortest))
 }
 }
 
@@ -665,6 +704,7 @@
 assert_eq!(nt.find_hex(, "01"), Ok(Some(9)));
 assert_eq!(nt.find_hex(, "00"), Err(MultipleResults));
 assert_eq!(nt.find_hex(, "00a"), Ok(Some(0)));
+assert_eq!(nt.shortest_hex(, "00a"), Ok(Some(3)));
 assert_eq!(nt.find_hex(, "000"), Ok(Some(NULL_REVISION)));
 }
 
@@ -684,8 +724,10 @@
 };
 assert_eq!(nt.find_hex(, "10")?, Some(1));
 assert_eq!(nt.find_hex(, "c")?, Some(2));
+assert_eq!(nt.shortest_hex(, "c")?, Some(1));
 assert_eq!(nt.find_hex(, "00"), Err(MultipleResults));
 assert_eq!(nt.find_hex(, "000")?, Some(NULL_REVISION));
+assert_eq!(nt.shortest_hex(, "000")?, Some(3));
 assert_eq!(nt.find_hex(, "01")?, Some(9));
 Ok(())
 }
@@ -721,6 +763,13 @@
 self.nt.find_hex(, prefix)
 }
 
+fn 

D7798: rust-nodemap: special case for prefixes of NULL_NODE

2020-01-15 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19330.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7798?vs=19138=19330

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7798/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7798

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -13,8 +13,8 @@
 //! is used in a more abstract context.
 
 use super::{
-node::get_nybble, Node, NodeError, NodePrefix, NodePrefixRef, Revision,
-RevlogIndex,
+node::get_nybble, node::NULL_NODE, Node, NodeError, NodePrefix,
+NodePrefixRef, Revision, RevlogIndex, NULL_REVISION,
 };
 use std::fmt;
 use std::mem;
@@ -207,6 +207,31 @@
 })
 }
 
+/// validate that the candidate's node starts indeed with given prefix,
+/// and treat ambiguities related to `NULL_REVISION`.
+///
+/// From the data in the NodeTree, one can only conclude that some
+/// revision is the only one for a *subprefix* of the one being looked up.
+fn validate_candidate<'p>(
+idx:  RevlogIndex,
+prefix: NodePrefixRef<'p>,
+rev: Option,
+) -> Result, NodeMapError> {
+if prefix.is_prefix_of(_NODE) {
+// NULL_REVISION always matches a prefix made only of zeros
+// and any other *valid* result is an ambiguity
+match rev {
+None => Ok(Some(NULL_REVISION)),
+Some(r) => match has_prefix_or_none(idx, prefix, r)? {
+None => Ok(Some(NULL_REVISION)),
+_ => Err(NodeMapError::MultipleResults),
+},
+}
+} else {
+rev.map_or(Ok(None), |r| has_prefix_or_none(idx, prefix, r))
+}
+}
+
 impl NodeTree {
 /// Initiate a NodeTree from an immutable slice-like of `Block`
 ///
@@ -280,9 +305,6 @@
 }
 
 /// Main working method for `NodeTree` searches
-///
-/// This partial implementation lacks
-/// - special cases for NULL_REVISION
 fn lookup<'p>(
 ,
 prefix: NodePrefixRef<'p>,
@@ -517,9 +539,7 @@
 idx:  RevlogIndex,
 prefix: NodePrefixRef<'a>,
 ) -> Result, NodeMapError> {
-self.lookup(prefix.clone()).and_then(|opt| {
-opt.map_or(Ok(None), |rev| has_prefix_or_none(idx, prefix, rev))
-})
+validate_candidate(idx, prefix.clone(), self.lookup(prefix)?)
 }
 }
 
@@ -643,8 +663,9 @@
 
 assert_eq!(nt.find_hex(, "0"), Err(MultipleResults));
 assert_eq!(nt.find_hex(, "01"), Ok(Some(9)));
-assert_eq!(nt.find_hex(, "00"), Ok(Some(0)));
+assert_eq!(nt.find_hex(, "00"), Err(MultipleResults));
 assert_eq!(nt.find_hex(, "00a"), Ok(Some(0)));
+assert_eq!(nt.find_hex(, "000"), Ok(Some(NULL_REVISION)));
 }
 
 #[test]
@@ -663,7 +684,8 @@
 };
 assert_eq!(nt.find_hex(, "10")?, Some(1));
 assert_eq!(nt.find_hex(, "c")?, Some(2));
-assert_eq!(nt.find_hex(, "00")?, Some(0));
+assert_eq!(nt.find_hex(, "00"), Err(MultipleResults));
+assert_eq!(nt.find_hex(, "000")?, Some(NULL_REVISION));
 assert_eq!(nt.find_hex(, "01")?, Some(9));
 Ok(())
 }



To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7796: rust-nodemap: input/output primitives

2020-01-15 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19329.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7796?vs=19137=19329

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7796/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7796

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -17,8 +17,10 @@
 RevlogIndex,
 };
 use std::fmt;
+use std::mem;
 use std::ops::Deref;
 use std::ops::Index;
+use std::slice;
 
 #[derive(Debug, PartialEq)]
 pub enum NodeMapError {
@@ -132,6 +134,8 @@
 #[derive(Clone, PartialEq)]
 pub struct Block([RawElement; 16]);
 
+pub const BLOCK_SIZE: usize = mem::size_of::();
+
 impl Block {
 fn new() -> Self {
 Block([-1; 16])
@@ -219,6 +223,57 @@
 }
 }
 
+/// Create from an opaque bunch of bytes
+///
+/// The created `NodeTreeBytes` is taken after the fixed `offset` from
+/// `buffer`, of which exactly `amount` bytes are used.
+///
+/// - `buffer` could be derived from `PyBuffer` and `Mmap` objects.
+/// - `offset` allows for the final file format to include fixed data
+///   (generation number, behavioural flags)
+/// - `amount` is expressed in bytes, and is not automatically derived from
+///   `bytes`, so that a caller that manages them atomically can perform
+///   temporary disk serializations and still rollback easily if needed.
+///   First use-case for this would be to support Mercurial shell hooks.
+///
+/// panics if `buffer` is smaller than `offset + amount`
+pub fn load_bytes(
+bytes: Box + Send>,
+offset: usize,
+amount: usize,
+) -> Self {
+NodeTree::new(Box::new(NodeTreeBytes::new(bytes, offset, amount)))
+}
+
+/// Retrieve added `Block` and the original immutable data
+pub fn into_readonly_and_added(
+self,
+) -> (Box + Send>, Vec) {
+let mut vec = self.growable;
+let readonly = self.readonly;
+if readonly.last() != Some() {
+vec.push(self.root);
+}
+(readonly, vec)
+}
+
+/// Retrieve added `Blocks` as bytes, ready to be written to persistent
+/// storage
+pub fn into_readonly_and_added_bytes(
+self,
+) -> (Box + Send>, Vec) {
+let (readonly, vec) = self.into_readonly_and_added();
+let bytes = unsafe {
+Vec::from_raw_parts(
+vec.as_ptr() as *mut u8,
+vec.len() * BLOCK_SIZE,
+vec.capacity() * BLOCK_SIZE,
+)
+};
+mem::forget(vec);
+(readonly, bytes)
+}
+
 /// Total number of blocks
 fn len() -> usize {
 self.readonly.len() + self.growable.len() + 1
@@ -364,6 +419,42 @@
 }
 }
 
+pub struct NodeTreeBytes {
+buffer: Box + Send>,
+offset: usize,
+len_in_blocks: usize,
+}
+
+impl NodeTreeBytes {
+fn new(
+buffer: Box + Send>,
+offset: usize,
+amount: usize,
+) -> Self {
+assert!(buffer.len() >= offset + amount);
+let len_in_blocks = amount / BLOCK_SIZE;
+NodeTreeBytes {
+buffer,
+offset,
+len_in_blocks,
+}
+}
+}
+
+impl Deref for NodeTreeBytes {
+type Target = [Block];
+
+fn deref() -> &[Block] {
+unsafe {
+slice::from_raw_parts(
+().as_ptr().offset(self.offset as isize)
+as *const Block,
+self.len_in_blocks,
+)
+}
+}
+}
+
 struct NodeTreeVisitor<'n, 'p> {
 nt: &'n NodeTree,
 prefix: NodePrefixRef<'p>,
@@ -708,4 +799,30 @@
 
 Ok(())
 }
+
+#[test]
+fn test_into_added_empty() {
+assert!(sample_nodetree().into_readonly_and_added().1.is_empty());
+assert!(sample_nodetree()
+.into_readonly_and_added_bytes()
+.1
+.is_empty());
+}
+
+#[test]
+fn test_into_added_bytes() -> Result<(), NodeMapError> {
+let mut idx = TestNtIndex::new();
+idx.insert(0, "1234")?;
+let mut idx = idx.commit();
+idx.insert(4, "cafe")?;
+let (_, bytes) = idx.nt.into_readonly_and_added_bytes();
+
+// only the root block has been changed
+assert_eq!(bytes.len(), BLOCK_SIZE);
+// big endian for -2
+assert_eq!([4..2 * 4], [255, 255, 255, 254]);
+// big endian for -6
+assert_eq!([12 * 4..13 * 4], [255, 255, 255, 250]);
+Ok(())
+}
 }



To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D7795: rust-nodemap: insert method

2020-01-15 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19328.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7795?vs=19044=19328

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7795/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7795

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/revlog/nodemap.rs 
b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -12,7 +12,10 @@
 //! Following existing implicit conventions, the "nodemap" terminology
 //! is used in a more abstract context.
 
-use super::{NodeError, NodePrefix, NodePrefixRef, Revision, RevlogIndex};
+use super::{
+node::get_nybble, Node, NodeError, NodePrefix, NodePrefixRef, Revision,
+RevlogIndex,
+};
 use std::fmt;
 use std::ops::Deref;
 use std::ops::Index;
@@ -51,6 +54,15 @@
 }
 }
 
+pub trait MutableNodeMap: NodeMap {
+fn insert(
+ self,
+index: ,
+node: ,
+rev: Revision,
+) -> Result<(), NodeMapError>;
+}
+
 /// Low level NodeTree [`Blocks`] elements
 ///
 /// These are exactly as for instance on persistent storage.
@@ -240,6 +252,116 @@
 done: false,
 }
 }
+/// Return a mutable reference for `Block` at index `idx`.
+///
+/// If `idx` lies in the immutable area, then the reference is to
+/// a newly appended copy.
+///
+/// Returns (new_idx, glen, mut_ref) where
+///
+/// - `new_idx` is the index of the mutable `Block`
+/// - `mut_ref` is a mutable reference to the mutable Block.
+/// - `glen` is the new length of `self.growable`
+///
+/// Note: the caller wouldn't be allowed to query `self.growable.len()`
+/// itself because of the mutable borrow taken with the returned `Block`
+fn mutable_block( self, idx: usize) -> (usize,  Block, usize) {
+let ro_blocks = 
+let ro_len = ro_blocks.len();
+let glen = self.growable.len();
+if idx < ro_len {
+// TODO OPTIM I think this makes two copies
+self.growable.push(ro_blocks[idx].clone());
+(glen + ro_len,  self.growable[glen], glen + 1)
+} else if glen + ro_len == idx {
+(idx,  self.root, glen)
+} else {
+(idx,  self.growable[idx - ro_len], glen)
+}
+}
+
+/// Main insertion method
+///
+/// This will dive in the node tree to find the deepest `Block` for
+/// `node`, split it as much as needed and record `node` in there.
+/// The method then backtracks, updating references in all the visited
+/// blocks from the root.
+///
+/// All the mutated `Block` are copied first to the growable part if
+/// needed. That happens for those in the immutable part except the root.
+pub fn insert(
+ self,
+index: ,
+node: ,
+rev: Revision,
+) -> Result<(), NodeMapError> {
+let ro_len = ();
+
+let mut visit_steps: Vec<(usize, u8, Option)> = self
+.visit(node.into())
+.map(|(_leaf, visit, nybble, rev_opt)| (visit, nybble, rev_opt))
+.collect();
+let read_nybbles = visit_steps.len();
+// visit_steps cannot be empty, since we always visit the root block
+let (deepest_idx, mut nybble, rev_opt) = visit_steps.pop().unwrap();
+let (mut block_idx, mut block, mut glen) =
+self.mutable_block(deepest_idx);
+
+match rev_opt {
+None => {
+// Free slot in the deepest block: no splitting has to be done
+block.set(nybble, Element::Rev(rev));
+}
+Some(old_rev) => {
+let old_node = index.node(old_rev).ok_or_else(|| {
+NodeMapError::RevisionNotInIndex(old_rev)
+})?;
+if old_node == node {
+return Ok(()); // avoid creating lots of useless blocks
+}
+
+// Looping over the tail of nybbles in both nodes, creating
+// new blocks until we find the difference
+let mut new_block_idx = ro_len + glen;
+for nybble_pos in read_nybbles..40 {
+block.set(nybble, Element::Block(new_block_idx));
+
+let new_nybble = get_nybble(nybble_pos, node);
+let old_nybble = get_nybble(nybble_pos, old_node);
+
+if old_nybble == new_nybble {
+self.growable.push(Block::new());
+block =  self.growable[glen];
+glen += 1;
+new_block_idx += 1;
+nybble = new_nybble;
+} else {
+let mut new_block = Block::new();
+new_block.set(old_nybble, Element::Rev(old_rev));
+

D7788: rust-node: binary Node and conversion utilities

2020-01-15 Thread gracinet (Georges Racinet)
gracinet updated this revision to Diff 19323.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D7788?vs=19037=19323

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7788/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7788

AFFECTED FILES
  rust/hg-core/src/dirstate/dirs_multiset.rs
  rust/hg-core/src/matchers.rs
  rust/hg-core/src/revlog.rs
  rust/hg-core/src/revlog/node.rs
  rust/hg-core/src/utils.rs
  rust/hg-core/src/utils/hg_path.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/utils/hg_path.rs 
b/rust/hg-core/src/utils/hg_path.rs
--- a/rust/hg-core/src/utils/hg_path.rs
+++ b/rust/hg-core/src/utils/hg_path.rs
@@ -157,7 +157,7 @@
 return Err(HgPathError::ContainsNullByte(
 bytes.to_vec(),
 index,
-))
+));
 }
 b'/' => {
 if previous_byte.is_some() && previous_byte == Some(b'/') {
diff --git a/rust/hg-core/src/utils.rs b/rust/hg-core/src/utils.rs
--- a/rust/hg-core/src/utils.rs
+++ b/rust/hg-core/src/utils.rs
@@ -38,10 +38,7 @@
 /// use crate::hg::utils::replace_slice;
 /// let mut line = b"I hate writing tests!".to_vec();
 /// replace_slice( line, b"hate", b"love");
-/// assert_eq!(
-/// line,
-/// b"I love writing tests!".to_vec()
-/// );
+/// assert_eq!(line, b"I love writing tests!".to_vec());
 /// ```
 pub fn replace_slice(buf:  [T], from: &[T], to: &[T])
 where
@@ -86,18 +83,9 @@
 
 /// ```
 /// use hg::utils::SliceExt;
-/// assert_eq!(
-/// b"  to trim  ".trim(),
-/// b"to trim"
-/// );
-/// assert_eq!(
-/// b"to trim  ".trim(),
-/// b"to trim"
-/// );
-/// assert_eq!(
-/// b"  to trim".trim(),
-/// b"to trim"
-/// );
+/// assert_eq!(b"  to trim  ".trim(), b"to trim");
+/// assert_eq!(b"to trim  ".trim(), b"to trim");
+/// assert_eq!(b"  to trim".trim(), b"to trim");
 /// ```
 fn trim() -> &[u8] {
 self.trim_start().trim_end()
diff --git a/rust/hg-core/src/revlog/node.rs b/rust/hg-core/src/revlog/node.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-core/src/revlog/node.rs
@@ -0,0 +1,91 @@
+// Copyright 2019-2020 Georges Racinet 
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+//! Definitions and utilities for Revision nodes
+//!
+//! In Mercurial code base, it is customary to call "a node" the binary SHA
+//! of a revision.
+
+use std::num::ParseIntError;
+
+/// Binary revisions SHA
+pub type Node = [u8; 20];
+
+/// The node value for NULL_REVISION
+pub const NULL_NODE: Node = [0; 20];
+
+#[derive(Debug, PartialEq)]
+pub enum NodeError {
+ExactLengthRequired(String),
+NotHexadecimal,
+}
+
+pub fn node_from_hex(hex: ) -> Result {
+if hex.len() != 40 {
+return Err(NodeError::ExactLengthRequired(hex.to_string()));
+}
+let mut node = [0; 20];
+for i in 0..20 {
+node[i] = u8::from_str_radix([i * 2..i * 2 + 2], 16)?
+}
+Ok(node)
+}
+
+pub fn node_to_hex(n: ) -> String {
+let as_vec: Vec = n.iter().map(|b| format!("{:02x}", b)).collect();
+as_vec.join("")
+}
+
+/// Retrieve the `i`th half-byte from a bytes slice
+///
+/// This is also the `i`th hexadecimal digit in numeric form,
+/// also called a [nybble](https://en.wikipedia.org/wiki/Nibble).
+pub fn get_nybble(i: usize, s: &[u8]) -> u8 {
+if i % 2 == 0 {
+s[i / 2] >> 4
+} else {
+s[i / 2] & 0x0f
+}
+}
+
+impl From for NodeError {
+fn from(_: ParseIntError) -> Self {
+NodeError::NotHexadecimal
+}
+}
+
+#[cfg(test)]
+mod tests {
+use super::*;
+
+const SAMPLE_NODE_HEX:  = "0123456789abcdeffedcba9876543210deadbeef";
+
+#[test]
+fn test_node_from_hex() {
+assert_eq!(
+node_from_hex(SAMPLE_NODE_HEX),
+Ok([
+0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc,
+0xba, 0x98, 0x76, 0x54, 0x32, 0x10, 0xde, 0xad, 0xbe, 0xef
+])
+);
+let short = "0123456789abcdeffedcba9876543210";
+assert_eq!(
+node_from_hex(short),
+Err(NodeError::ExactLengthRequired(short.to_string())),
+);
+}
+
+#[test]
+fn test_node_to_hex() {
+assert_eq!(
+node_to_hex(&[
+0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc,
+0xba, 0x98, 0x76, 0x54, 0x32, 0x10, 0xde, 0xad, 0xbe, 0xef
+]),
+SAMPLE_NODE_HEX
+);
+}
+}
di

  1   2   3   4   5   6   >