Re: New revlog format, plan page

2021-01-06 Thread Joerg Sonnenberger
On Thu, Jan 07, 2021 at 01:05:34AM +, Johannes Totz wrote:
> Have we ever addressed the file duplication on hg-mv? .hg/store/data/ will
> end up with lots of duplicated data. That has always been my biggest gripe.

That's the unified revlog issue.

Joerg
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: New revlog format, plan page

2021-01-06 Thread Johannes Totz

On 05/01/2021 15:38, Raphaël Gomès wrote:

Hi all,

During the last (virtual) sprint, a lot of us spoke about the need for a 
format change of the revlog to overcome some of its limitations.


I've opened a very much draft plan page [1] to try to list all the 
things we want to do in that version and try to figure out an efficient 
new format.


I'm aware that the v2 is already planned, but I figured that we can just 
merge that (seemingly) paused effort and this new one.


I wish you all a nice 2021!
Raphaël

[1] https://www.mercurial-scm.org/wiki/RevlogV2Plan


I haven't kept up to date with hg dev... sorry if this is a stupid question:

Have we ever addressed the file duplication on hg-mv? .hg/store/data/ 
will end up with lots of duplicated data. That has always been my 
biggest gripe.


Random idea I had:
store the *.d files as usual but add a layer of "segments" like *.d.0 
and *.d.1 and so on. So when one does a hg-mv, the initial oldname.d.0 
is hardlinked to newname.d.0 and any appends to oldname.d.0 would 
instead go to a new segment oldname.d.1 (same for newname.d.1) because 
that segment *.d.0 has >1 link count. And so on for subsequent segments.
That should work for local clones, moves, copies. But will prob make a 
mess of the wire protocol.


___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D9687: contrib: py3 compat for perfnodemap

2021-01-06 Thread joerg.sonnenberger (Joerg Sonnenberger)
joerg.sonnenberger created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D9687

AFFECTED FILES
  contrib/perf.py

CHANGE DETAILS

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -1627,12 +1627,12 @@
 mercurial.revlog._prereadsize = 2 ** 24  # disable lazy parser in old hg
 
 unfi = repo.unfiltered()
-clearcaches = opts['clear_caches']
+clearcaches = opts[b'clear_caches']
 # find the filecache func directly
 # This avoid polluting the benchmark with the filecache logic
 makecl = unfi.__class__.changelog.func
 if not opts[b'rev']:
-raise error.Abort('use --rev to specify revisions to look up')
+raise error.Abort(b'use --rev to specify revisions to look up')
 revs = scmutil.revrange(repo, opts[b'rev'])
 cl = repo.changelog
 nodes = [cl.node(r) for r in revs]



To: joerg.sonnenberger, #hg-reviewers
Cc: mercurial-patches, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D9684: copies-rust: move CPU-heavy Rust processing into a child thread

2021-01-06 Thread SimonSapin
SimonSapin created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  … that runs in parallel with the parent thread fetching data.
  This can be disabled through a new config. CLI example:
  
hg --config=devel.copy-tracing.multi-thread=no
  
  For now both threads use the GIL, later commits will reduce this.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D9684

AFFECTED FILES
  mercurial/configitems.py
  mercurial/copies.py
  rust/Cargo.lock
  rust/hg-cpython/Cargo.toml
  rust/hg-cpython/src/copy_tracing.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/copy_tracing.rs 
b/rust/hg-cpython/src/copy_tracing.rs
--- a/rust/hg-cpython/src/copy_tracing.rs
+++ b/rust/hg-cpython/src/copy_tracing.rs
@@ -22,6 +22,7 @@
 children_count: PyDict,
 target_rev: Revision,
 rev_info: PyObject,
+multi_thread: bool,
 ) -> PyResult {
 let children_count = children_count
 .items(py)
@@ -42,20 +43,81 @@
 Ok((rev, p1, p2, opt_bytes))
 });
 
-let mut combine_changeset_copies =
-CombineChangesetCopies::new(children_count);
+let path_copies = if !multi_thread {
+let mut combine_changeset_copies =
+CombineChangesetCopies::new(children_count);
+
+for rev_info in revs_info {
+let (rev, p1, p2, opt_bytes) = rev_info?;
+let files = match _bytes {
+Some(bytes) => ChangedFiles::new(bytes.data(py)),
+// Python None was extracted to Option::None,
+// meaning there was no copy data.
+None => ChangedFiles::new_empty(),
+};
+
+combine_changeset_copies.add_revision(rev, p1, p2, files)
+}
+combine_changeset_copies.finish(target_rev)
+} else {
+// Use a bounded channel to provide back-pressure:
+// if the child thread is slower to process revisions than this thread
+// is to gather data for them, an unbounded channel would keep
+// growing and eat memory.
+//
+// TODO: tweak the bound?
+let (rev_info_sender, rev_info_receiver) =
+crossbeam_channel::bounded::(1000);
 
-for rev_info in revs_info {
-let (rev, p1, p2, opt_bytes) = rev_info?;
-let files = match _bytes {
-Some(bytes) => ChangedFiles::new(bytes.data(py)),
-// value was presumably None, meaning they was no copy data.
-None => ChangedFiles::new_empty(),
-};
+// Start a thread that does CPU-heavy processing in parallel with the
+// loop below.
+//
+// If the parent thread panics, `rev_info_sender` will be dropped and
+// “disconnected”. `rev_info_receiver` will be notified of this and
+// exit its own loop.
+let thread = std::thread::spawn(move || {
+let mut combine_changeset_copies =
+CombineChangesetCopies::new(children_count);
+for (rev, p1, p2, opt_bytes) in rev_info_receiver {
+let gil = Python::acquire_gil();
+let py = gil.python();
+let files = match _bytes {
+Some(raw) => ChangedFiles::new(raw.data(py)),
+// Python None was extracted to Option::None,
+// meaning there was no copy data.
+None => ChangedFiles::new_empty(),
+};
+combine_changeset_copies.add_revision(rev, p1, p2, files)
+}
+
+combine_changeset_copies.finish(target_rev)
+});
 
-combine_changeset_copies.add_revision(rev, p1, p2, files)
-}
-let path_copies = combine_changeset_copies.finish(target_rev);
+for rev_info in revs_info {
+let (rev, p1, p2, opt_bytes) = rev_info?;
+
+// We’d prefer to avoid the child thread calling into Python code,
+// but this avoids a potential deadlock on the GIL if it does:
+py.allow_threads(|| {
+rev_info_sender.send((rev, p1, p2, opt_bytes)).expect(
+"combine_changeset_copies: channel is disconnected",
+);
+});
+}
+// We’d prefer to avoid the child thread calling into Python code,
+// but this avoids a potential deadlock on the GIL if it does:
+py.allow_threads(|| {
+// Disconnect the channel to signal the child thread to stop:
+// the `for … in rev_info_receiver` loop will end.
+drop(rev_info_sender);
+
+// Wait for the child thread to stop, and propagate any panic.
+thread.join().unwrap_or_else(|panic_payload| {
+std::panic::resume_unwind(panic_payload)
+})
+})
+};
+
 let out = PyDict::new(py);
 for (dest, source) in path_copies.into_iter() {
 out.set_item(
@@ 

D9685: copies-rust: introduce PyBytesWithData to reduce GIL requirement

2021-01-06 Thread SimonSapin
SimonSapin created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  See explanations in new doc-comments.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D9685

AFFECTED FILES
  rust/hg-cpython/src/copy_tracing.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/copy_tracing.rs 
b/rust/hg-cpython/src/copy_tracing.rs
--- a/rust/hg-cpython/src/copy_tracing.rs
+++ b/rust/hg-cpython/src/copy_tracing.rs
@@ -12,6 +12,55 @@
 use hg::copy_tracing::CombineChangesetCopies;
 use hg::Revision;
 
+use self::pybytes_with_data::PyBytesWithData;
+
+// Module to encapsulate private fields
+mod pybytes_with_data {
+use cpython::{PyBytes, Python};
+
+/// Safe abstraction over a `PyBytes` together with the `&[u8]` slice
+/// that borrows it.
+///
+/// Calling `PyBytes::data` requires a GIL marker but we want to access the
+/// data in a thread that (ideally) does not need to acquire the GIL.
+/// This type allows separating the call an the use.
+pub(super) struct PyBytesWithData {
+#[allow(unused)]
+keep_alive: PyBytes,
+
+/// Borrows the buffer inside `self.keep_alive`,
+/// but the borrow-checker cannot express self-referential structs.
+data: *const [u8],
+}
+
+fn require_send() {}
+
+#[allow(unused)]
+fn static_assert_pybytes_is_send() {
+require_send::;
+}
+
+// Safety: PyBytes is Send. Raw pointers are not by default,
+// but here sending one to another thread is fine since we ensure it stays
+// valid.
+unsafe impl Send for PyBytesWithData {}
+
+impl PyBytesWithData {
+pub fn new(py: Python, bytes: PyBytes) -> Self {
+Self {
+data: bytes.data(py),
+keep_alive: bytes,
+}
+}
+
+pub fn data() -> &[u8] {
+// Safety: the raw pointer is valid as long as the PyBytes is still
+// alive, and the returned slice borrows `self`.
+unsafe { &*self.data }
+}
+}
+}
+
 /// Combines copies information contained into revision `revs` to build a copy
 /// map.
 ///
@@ -31,17 +80,18 @@
 .collect::>()?;
 
 /// (Revision number, parent 1, parent 2, copy data for this revision)
-type RevInfo = (Revision, Revision, Revision, Option);
+type RevInfo = (Revision, Revision, Revision, Option);
 
-let revs_info = revs.iter(py).map(|rev_py| -> PyResult {
-let rev = rev_py.extract(py)?;
-let tuple: PyTuple =
-rev_info.call(py, (rev_py,), None)?.cast_into(py)?;
-let p1 = tuple.get_item(py, 0).extract(py)?;
-let p2 = tuple.get_item(py, 1).extract(py)?;
-let opt_bytes = tuple.get_item(py, 2).extract(py)?;
-Ok((rev, p1, p2, opt_bytes))
-});
+let revs_info =
+revs.iter(py).map(|rev_py| -> PyResult> {
+let rev = rev_py.extract(py)?;
+let tuple: PyTuple =
+rev_info.call(py, (rev_py,), None)?.cast_into(py)?;
+let p1 = tuple.get_item(py, 0).extract(py)?;
+let p2 = tuple.get_item(py, 1).extract(py)?;
+let opt_bytes = tuple.get_item(py, 2).extract(py)?;
+Ok((rev, p1, p2, opt_bytes))
+});
 
 let path_copies = if !multi_thread {
 let mut combine_changeset_copies =
@@ -67,7 +117,7 @@
 //
 // TODO: tweak the bound?
 let (rev_info_sender, rev_info_receiver) =
-crossbeam_channel::bounded::(1000);
+crossbeam_channel::bounded::>(1000);
 
 // Start a thread that does CPU-heavy processing in parallel with the
 // loop below.
@@ -79,15 +129,16 @@
 let mut combine_changeset_copies =
 CombineChangesetCopies::new(children_count);
 for (rev, p1, p2, opt_bytes) in rev_info_receiver {
-let gil = Python::acquire_gil();
-let py = gil.python();
 let files = match _bytes {
-Some(raw) => ChangedFiles::new(raw.data(py)),
+Some(raw) => ChangedFiles::new(raw.data()),
 // Python None was extracted to Option::None,
 // meaning there was no copy data.
 None => ChangedFiles::new_empty(),
 };
 combine_changeset_copies.add_revision(rev, p1, p2, files)
+
+// The GIL is (still) implicitly acquired here through
+// `impl Drop for PyBytes`.
 }
 
 combine_changeset_copies.finish(target_rev)
@@ -95,6 +146,7 @@
 
 for rev_info in revs_info {
 let (rev, p1, p2, opt_bytes) = rev_info?;
+let opt_bytes = opt_bytes.map(|b| PyBytesWithData::new(py, b));
 
 // We’d prefer to avoid the child thread calling into Python code,
 // but this avoids a 

D9686: copies-rust: send PyBytes values back be dropped ino the parent thread

2021-01-06 Thread SimonSapin
SimonSapin created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  … instead of acquiring the GIL in the Rust thread in the Drop impl
  
  This commit is based on the premise that crossbeam-channel
  with unbounded send and non-blocking receive is faster than
  a contended GIL, but that remains to be measured.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D9686

AFFECTED FILES
  rust/hg-cpython/src/copy_tracing.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/copy_tracing.rs 
b/rust/hg-cpython/src/copy_tracing.rs
--- a/rust/hg-cpython/src/copy_tracing.rs
+++ b/rust/hg-cpython/src/copy_tracing.rs
@@ -1,6 +1,7 @@
 use cpython::ObjectProtocol;
 use cpython::PyBytes;
 use cpython::PyDict;
+use cpython::PyDrop;
 use cpython::PyList;
 use cpython::PyModule;
 use cpython::PyObject;
@@ -58,6 +59,10 @@
 // alive, and the returned slice borrows `self`.
 unsafe { &*self.data }
 }
+
+pub fn unwrap(self) -> PyBytes {
+self.keep_alive
+}
 }
 }
 
@@ -93,7 +98,8 @@
 Ok((rev, p1, p2, opt_bytes))
 });
 
-let path_copies = if !multi_thread {
+let path_copies;
+if !multi_thread {
 let mut combine_changeset_copies =
 CombineChangesetCopies::new(children_count);
 
@@ -108,7 +114,7 @@
 
 combine_changeset_copies.add_revision(rev, p1, p2, files)
 }
-combine_changeset_copies.finish(target_rev)
+path_copies = combine_changeset_copies.finish(target_rev)
 } else {
 // Use a bounded channel to provide back-pressure:
 // if the child thread is slower to process revisions than this thread
@@ -119,6 +125,13 @@
 let (rev_info_sender, rev_info_receiver) =
 crossbeam_channel::bounded::>(1000);
 
+// This channel (going the other way around) however is unbounded.
+// If they were both bounded, there might potentially be deadlocks
+// where both channels are full and both threads are waiting on each
+// other.
+let (pybytes_sender, pybytes_receiver) =
+crossbeam_channel::unbounded();
+
 // Start a thread that does CPU-heavy processing in parallel with the
 // loop below.
 //
@@ -135,10 +148,20 @@
 // meaning there was no copy data.
 None => ChangedFiles::new_empty(),
 };
-combine_changeset_copies.add_revision(rev, p1, p2, files)
+combine_changeset_copies.add_revision(rev, p1, p2, files);
 
-// The GIL is (still) implicitly acquired here through
-// `impl Drop for PyBytes`.
+// Send `PyBytes` back to the parent thread so the parent
+// thread can drop it. Otherwise the GIL would be implicitly
+// acquired here through `impl Drop for PyBytes`.
+if let Some(bytes) = opt_bytes {
+if let Err(_) = pybytes_sender.send(bytes.unwrap()) {
+// The channel is disconnected, meaning the parent
+// thread panicked or returned
+// early through
+// `?` to propagate a Python exception.
+break;
+}
+}
 }
 
 combine_changeset_copies.finish(target_rev)
@@ -155,10 +178,15 @@
 "combine_changeset_copies: channel is disconnected",
 );
 });
+
+// Drop anything in the channel, without blocking
+for pybytes in pybytes_receiver.try_iter() {
+pybytes.release_ref(py)
+}
 }
 // We’d prefer to avoid the child thread calling into Python code,
 // but this avoids a potential deadlock on the GIL if it does:
-py.allow_threads(|| {
+path_copies = py.allow_threads(|| {
 // Disconnect the channel to signal the child thread to stop:
 // the `for … in rev_info_receiver` loop will end.
 drop(rev_info_sender);
@@ -167,7 +195,12 @@
 thread.join().unwrap_or_else(|panic_payload| {
 std::panic::resume_unwind(panic_payload)
 })
-})
+});
+
+// Drop anything left in the channel
+for pybytes in pybytes_receiver.iter() {
+pybytes.release_ref(py)
+}
 };
 
 let out = PyDict::new(py);



To: SimonSapin, #hg-reviewers
Cc: mercurial-patches, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D9683: copies-rust: split up combine_changeset_copies function into a struct

2021-01-06 Thread SimonSapin
SimonSapin created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  … such that each iteration of its former loop is now a method call,
  with the caller driving the loop.
  
  This entirely removes the need for the `DataHolder` hack:
  the method now takes a `ChangedFiles<'_>` parameter that borrows
  a bytes buffer that can be owned by the caller’s stack frame,
  just for the duration of that call.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D9683

AFFECTED FILES
  rust/hg-core/src/copy_tracing.rs
  rust/hg-cpython/src/copy_tracing.rs

CHANGE DETAILS

diff --git a/rust/hg-cpython/src/copy_tracing.rs 
b/rust/hg-cpython/src/copy_tracing.rs
--- a/rust/hg-cpython/src/copy_tracing.rs
+++ b/rust/hg-cpython/src/copy_tracing.rs
@@ -8,11 +8,8 @@
 use cpython::PyTuple;
 use cpython::Python;
 
-use hg::copy_tracing::combine_changeset_copies;
 use hg::copy_tracing::ChangedFiles;
-use hg::copy_tracing::DataHolder;
-use hg::copy_tracing::RevInfo;
-use hg::copy_tracing::RevInfoMaker;
+use hg::copy_tracing::CombineChangesetCopies;
 use hg::Revision;
 
 /// Combines copies information contained into revision `revs` to build a copy
@@ -26,64 +23,41 @@
 target_rev: Revision,
 rev_info: PyObject,
 ) -> PyResult {
-let revs: PyResult<_> =
-revs.iter(py).map(|r| Ok(r.extract(py)?)).collect();
-
-// Wrap the `rev_info_maker` python callback as a Rust closure
-//
-// No errors are expected from the Python side, and they will should only
-// happens in case of programing error or severe data corruption. Such
-// errors will raise panic and the rust-cpython harness will turn them into
-// Python exception.
-let rev_info_maker: RevInfoMaker =
-Box::new(|rev: Revision, d:  DataHolder| -> RevInfo {
-let res: PyTuple = rev_info
-.call(py, (rev,), None)
-.expect("rust-copy-tracing: python call to `rev_info` failed")
-.cast_into(py)
-.expect(
-"rust-copy_tracing: python call to `rev_info` returned \
-unexpected non-Tuple value",
-);
-let p1 = res.get_item(py, 0).extract(py).expect(
-"rust-copy-tracing: rev_info return is invalid, first item \
-is a not a revision",
-);
-let p2 = res.get_item(py, 1).extract(py).expect(
-"rust-copy-tracing: rev_info return is invalid, first item \
-is a not a revision",
-);
-
-let files = match res.get_item(py, 2).extract::(py) {
-Ok(raw) => {
-// Give responsability for the raw bytes lifetime to
-// hg-core
-d.data = Some(raw);
-let addrs = d.data.as_ref().expect(
-"rust-copy-tracing: failed to get a reference to the \
-raw bytes for copy data").data(py);
-ChangedFiles::new(addrs)
-}
-// value was presumably None, meaning they was no copy data.
-Err(_) => ChangedFiles::new_empty(),
-};
-
-(p1, p2, files)
-});
-let children_count: PyResult<_> = children_count
+let children_count = children_count
 .items(py)
 .iter()
 .map(|(k, v)| Ok((k.extract(py)?, v.extract(py)?)))
-.collect();
+.collect::>()?;
+
+/// (Revision number, parent 1, parent 2, copy data for this revision)
+type RevInfo = (Revision, Revision, Revision, Option);
+
+let revs_info = revs.iter(py).map(|rev_py| -> PyResult {
+let rev = rev_py.extract(py)?;
+let tuple: PyTuple =
+rev_info.call(py, (rev_py,), None)?.cast_into(py)?;
+let p1 = tuple.get_item(py, 0).extract(py)?;
+let p2 = tuple.get_item(py, 1).extract(py)?;
+let opt_bytes = tuple.get_item(py, 2).extract(py)?;
+Ok((rev, p1, p2, opt_bytes))
+});
 
-let res = combine_changeset_copies(
-revs?,
-children_count?,
-target_rev,
-rev_info_maker,
-);
+let mut combine_changeset_copies =
+CombineChangesetCopies::new(children_count);
+
+for rev_info in revs_info {
+let (rev, p1, p2, opt_bytes) = rev_info?;
+let files = match _bytes {
+Some(bytes) => ChangedFiles::new(bytes.data(py)),
+// value was presumably None, meaning they was no copy data.
+None => ChangedFiles::new_empty(),
+};
+
+combine_changeset_copies.add_revision(rev, p1, p2, files)
+}
+let path_copies = combine_changeset_copies.finish(target_rev);
 let out = PyDict::new(py);
-for (dest, source) in res.into_iter() {
+for (dest, source) in path_copies.into_iter() {
 out.set_item(
 py,
  

D9682: copies-rust: extract generic map merge logic from merge_copies_dict

2021-01-06 Thread SimonSapin
SimonSapin created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  This deduplicates the copy-tracing-specific logic

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D9682

AFFECTED FILES
  rust/hg-core/src/copy_tracing.rs
  rust/hg-core/src/utils.rs

CHANGE DETAILS

diff --git a/rust/hg-core/src/utils.rs b/rust/hg-core/src/utils.rs
--- a/rust/hg-core/src/utils.rs
+++ b/rust/hg-core/src/utils.rs
@@ -8,6 +8,8 @@
 //! Contains useful functions, traits, structs, etc. for use in core.
 
 use crate::utils::hg_path::HgPath;
+use im_rc::ordmap::DiffItem;
+use im_rc::ordmap::OrdMap;
 use std::{io::Write, ops::Deref};
 
 pub mod files;
@@ -176,3 +178,151 @@
 None
 }
 }
+
+pub(crate) enum MergeResult {
+UseLeftValue,
+UseRightValue,
+UseNewValue(V),
+}
+
+/// Return the union of the two given maps,
+/// calling `merge(key, left_value, right_value)` to resolve keys that exist in
+/// both.
+///
+/// CC https://github.com/bodil/im-rs/issues/166
+pub(crate) fn ordmap_union_with_merge(
+left: OrdMap,
+right: OrdMap,
+mut merge: impl FnMut(, , ) -> MergeResult,
+) -> OrdMap
+where
+K: Clone + Ord,
+V: Clone + PartialEq,
+{
+if left.ptr_eq() {
+// One of the two maps is an unmodified clone of the other
+left
+} else if left.len() / 2 > right.len() {
+// When two maps have different sizes,
+// their size difference is a lower bound on
+// how many keys of the larger map are not also in the smaller map.
+// This in turn is a lower bound on the number of differences in
+// `OrdMap::diff` and the "amount of work" that would be done
+// by `ordmap_union_with_merge_by_diff`.
+//
+// Here `left` is more than twice the size of `right`,
+// so the number of differences is more than the total size of
+// `right`. Therefore an algorithm based on iterating `right`
+// is more efficient.
+//
+// This helps a lot when a tiny (or empty) map is merged
+// with a large one.
+ordmap_union_with_merge_by_iter(left, right, merge)
+} else if left.len() < right.len() / 2 {
+// Same as above but with `left` and `right` swapped
+ordmap_union_with_merge_by_iter(right, left, |key, a, b| {
+// Also swapped in `merge` arguments:
+match merge(key, b, a) {
+MergeResult::UseNewValue(v) => MergeResult::UseNewValue(v),
+// … and swap back in `merge` result:
+MergeResult::UseLeftValue => MergeResult::UseRightValue,
+MergeResult::UseRightValue => MergeResult::UseLeftValue,
+}
+})
+} else {
+// For maps of similar size, use the algorithm based on `OrdMap::diff`
+ordmap_union_with_merge_by_diff(left, right, merge)
+}
+}
+
+/// Efficient if `right` is much smaller than `left`
+fn ordmap_union_with_merge_by_iter(
+mut left: OrdMap,
+right: OrdMap,
+mut merge: impl FnMut(, , ) -> MergeResult,
+) -> OrdMap
+where
+K: Clone + Ord,
+V: Clone,
+{
+for (key, right_value) in right {
+match left.get() {
+None => {
+left.insert(key, right_value);
+}
+Some(left_value) => match merge(, left_value, _value) {
+MergeResult::UseLeftValue => {}
+MergeResult::UseRightValue => {
+left.insert(key, right_value);
+}
+MergeResult::UseNewValue(new_value) => {
+left.insert(key, new_value);
+}
+},
+}
+}
+left
+}
+
+/// Fallback when both maps are of similar size
+fn ordmap_union_with_merge_by_diff(
+mut left: OrdMap,
+mut right: OrdMap,
+mut merge: impl FnMut(, , ) -> MergeResult,
+) -> OrdMap
+where
+K: Clone + Ord,
+V: Clone + PartialEq,
+{
+// (key, value) pairs that would need to be inserted in either map
+// in order to turn it into the union.
+//
+// TODO: if/when https://github.com/bodil/im-rs/pull/168 is accepted,
+// change these from `Vec<(K, V)>` to `Vec<(, Cow)>`
+// with `left_updates` only borrowing from `right` and `right_updates` from
+// `left`, and with `Cow::Owned` used for `MergeResult::UseNewValue`.
+//
+// This would allow moving all `.clone()` calls to after we’ve decided
+// which of `right_updates` or `left_updates` to use
+// (value ones becoming `Cow::into_owned`),
+// and avoid making clones we don’t end up using.
+let mut left_updates = Vec::new();
+let mut right_updates = Vec::new();
+
+for difference in left.diff() {
+match difference {
+DiffItem::Add(key, value) => {
+left_updates.push((key.clone(), value.clone()))
+}
+

D9681: localrepo: move storevfs calculation out of if statement

2021-01-06 Thread pulkit (Pulkit Goyal)
pulkit created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  In next patch, we will need this variable in else statement too. So, let's 
take
  it out.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D9681

AFFECTED FILES
  mercurial/localrepo.py

CHANGE DETAILS

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -549,8 +549,13 @@
 requirementsmod.SHARED_REQUIREMENT in requirements
 or requirementsmod.RELATIVE_SHARED_REQUIREMENT in requirements
 )
+storevfs = None
 if shared:
+# This is a shared repo
 sharedvfs = _getsharedvfs(hgvfs, requirements)
+storevfs = vfsmod.vfs(sharedvfs.join(b'store'))
+else:
+storevfs = vfsmod.vfs(hgvfs.join(b'store'))
 
 # if .hg/requires contains the sharesafe requirement, it means
 # there exists a `.hg/store/requires` too and we should read it
@@ -573,12 +578,6 @@
 _(b"share source does not support exp-sharesafe requirement")
 )
 
-if shared:
-# This is a shared repo
-storevfs = vfsmod.vfs(sharedvfs.join(b'store'))
-else:
-storevfs = vfsmod.vfs(hgvfs.join(b'store'))
-
 requirements |= _readrequires(storevfs, False)
 elif shared:
 sourcerequires = _readrequires(sharedvfs, False)



To: pulkit, #hg-reviewers
Cc: mercurial-patches, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D9680: sharesafe: add functionality to automatically downgrade shares

2021-01-06 Thread pulkit (Pulkit Goyal)
pulkit created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  Reasoning is same as previous patch which adds automatic upgrade support.
  
  Downgrade is required as if automatic upgrade is enabled, all shares upgrade 
and
  then source repository downgrades, shares won't work. We need to downgrade 
them.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D9680

AFFECTED FILES
  mercurial/configitems.py
  mercurial/localrepo.py
  mercurial/scmutil.py
  tests/test-share-safe.t

CHANGE DETAILS

diff --git a/tests/test-share-safe.t b/tests/test-share-safe.t
--- a/tests/test-share-safe.t
+++ b/tests/test-share-safe.t
@@ -484,6 +484,20 @@
   abort: share source does not support exp-sharesafe requirement
   [255]
 
+Testing automatic downgrade of shares when config is set
+  $ hg log -GT "{node}: {desc}\n" -R ../ss-share --config 
experimental.sharesafe-auto-downgrade-shares=true
+  repository downgraded to not use share-safe mode
+  @  f63db81e6dde1d9c78814167f77fb1fb49283f4f: added bar
+  |
+  o  f3ba8b99bb6f897c87bbc1c07b75c6ddf43a4f77: added foo
+  
+
+  $ hg log -GT "{node}: {desc}\n" -R ../ss-share
+  @  f63db81e6dde1d9c78814167f77fb1fb49283f4f: added bar
+  |
+  o  f3ba8b99bb6f897c87bbc1c07b75c6ddf43a4f77: added foo
+  
+
 
 Testing automatic upgrade of shares when config is set
 
diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
--- a/mercurial/scmutil.py
+++ b/mercurial/scmutil.py
@@ -1613,6 +1613,23 @@
 return current_requirements
 
 
+def downgrade_share_to_non_safe(
+hgvfs, current_requirements, source_requirements
+):
+"""Downgrades a share which use share-safe to not use it
+
+Returns the set of new repository requirements
+"""
+# we cannot be 100% sure on which requirements were present in store when
+# the source supported share-safe. However, we do know that working
+# directory requirements were not there. Hence we remove them
+source_requirements -= requirementsmod.WORKING_DIR_REQUIREMENTS
+current_requirements |= source_requirements
+current_requirements.remove(requirementsmod.SHARESAFE_REQUIREMENT)
+writerequires(hgvfs, current_requirements)
+return current_requirements
+
+
 class filecachesubentry(object):
 def __init__(self, path, stat):
 self.path = path
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -574,11 +574,24 @@
 and requirementsmod.SHARESAFE_REQUIREMENT
 not in _readrequires(sharedvfs, True)
 ):
-raise error.Abort(
-_(b"share source does not support exp-sharesafe requirement")
-)
-
-requirements |= _readrequires(storevfs, False)
+if ui.configbool(
+b'experimental', b'sharesafe-auto-downgrade-shares'
+):
+sourcerequires = _readrequires(sharedvfs, True)
+requirements = scmutil.downgrade_share_to_non_safe(
+hgvfs, requirements, sourcerequires
+)
+ui.warn(
+_(b'repository downgraded to not use share-safe mode\n')
+)
+else:
+raise error.Abort(
+_(
+b"share source does not support exp-sharesafe 
requirement"
+)
+)
+else:
+requirements |= _readrequires(storevfs, False)
 elif shared:
 sourcerequires = _readrequires(sharedvfs, False)
 if requirementsmod.SHARESAFE_REQUIREMENT in sourcerequires:
diff --git a/mercurial/configitems.py b/mercurial/configitems.py
--- a/mercurial/configitems.py
+++ b/mercurial/configitems.py
@@ -1074,6 +1074,11 @@
 )
 coreconfigitem(
 b'experimental',
+b'sharesafe-auto-downgrade-shares',
+default=False,
+)
+coreconfigitem(
+b'experimental',
 b'sharesafe-auto-upgrade-shares',
 default=False,
 )



To: pulkit, #hg-reviewers
Cc: mercurial-patches, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D9679: sharesafe: introduce functionality to automatically upgrade shares

2021-01-06 Thread pulkit (Pulkit Goyal)
pulkit created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  In past few months, we have developed a `share-safe` mode for sharing 
repository
  in which share source requirements and config values are shared with the 
shares.
  
  To get it rolling, an important task is to get these shares automatically
  upgraded. We are focusing on an installation where shares are created by 
scripts
  and test jobs. It will be difficult to manually upgrade these and we need some
  functionality to do so automatically.
  
  This patch introduces a config option to deal with it. If all of the following
  conditions are met, we upgrade the share repository automatically:
  
  - If the config option is enabled
  - Share source repository is share-safe enabled
  - Share is not share-safe enabled
  - Any command is run in the share
  
  Upgrading the share is pretty easy as it involves only editing the 
requirements
  file.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D9679

AFFECTED FILES
  mercurial/configitems.py
  mercurial/localrepo.py
  mercurial/scmutil.py
  tests/test-share-safe.t

CHANGE DETAILS

diff --git a/tests/test-share-safe.t b/tests/test-share-safe.t
--- a/tests/test-share-safe.t
+++ b/tests/test-share-safe.t
@@ -479,8 +479,53 @@
   |
   o  f3ba8b99bb6f897c87bbc1c07b75c6ddf43a4f77: added foo
   
-  $ hg unshare -R ../nss-share
 
   $ hg log -GT "{node}: {desc}\n" -R ../ss-share
   abort: share source does not support exp-sharesafe requirement
   [255]
+
+
+Testing automatic upgrade of shares when config is set
+
+  $ hg debugupgraderepo -q --run --config format.exp-share-safe=True
+  upgrade will perform the following actions:
+  
+  requirements
+ preserved: dotencode, fncache, generaldelta, revlogv1, sparserevlog, store
+ added: exp-sharesafe
+  
+  processed revlogs:
+- all-filelogs
+- changelog
+- manifest
+  
+  repository upgraded to share safe mode, existing shares will still work in 
old non-safe mode. Re-share existing shares to use them in safe mode New shares 
will be created in safe mode.
+  $ hg debugrequirements
+  dotencode
+  exp-sharesafe
+  fncache
+  generaldelta
+  revlogv1
+  sparserevlog
+  store
+  $ hg log -GT "{node}: {desc}\n" -R ../nss-share
+  warning: source repository supports share-safe functionality. Reshare to 
upgrade.
+  @  f63db81e6dde1d9c78814167f77fb1fb49283f4f: added bar
+  |
+  o  f3ba8b99bb6f897c87bbc1c07b75c6ddf43a4f77: added foo
+  
+  $ hg log -GT "{node}: {desc}\n" -R ../nss-share --config 
experimental.sharesafe-auto-upgrade-shares=true
+  repository upgraded to use share-safe mode
+  @  f63db81e6dde1d9c78814167f77fb1fb49283f4f: added bar
+  |
+  o  f3ba8b99bb6f897c87bbc1c07b75c6ddf43a4f77: added foo
+  
+
+Test that unshare works
+
+  $ hg unshare -R ../nss-share
+  $ hg log -GT "{node}: {desc}\n" -R ../nss-share
+  @  f63db81e6dde1d9c78814167f77fb1fb49283f4f: added bar
+  |
+  o  f3ba8b99bb6f897c87bbc1c07b75c6ddf43a4f77: added foo
+  
diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
--- a/mercurial/scmutil.py
+++ b/mercurial/scmutil.py
@@ -1597,6 +1597,22 @@
 fp.write(b"%s\n" % r)
 
 
+def upgrade_share_to_safe(hgvfs, current_requirements, store_requirements):
+"""Upgrades a share to use share-safe mechanism
+
+Returns the set of new requirements for the repository
+"""
+# after upgrade, store requires will be shared, so lets find
+# the requirements which are not present in store and
+# write them to share's .hg/requires
+diffrequires = current_requirements - store_requirements
+# add share-safe requirement as it will mark the share as share-safe
+diffrequires.add(requirementsmod.SHARESAFE_REQUIREMENT)
+writerequires(hgvfs, diffrequires)
+current_requirements.add(requirementsmod.SHARESAFE_REQUIREMENT)
+return current_requirements
+
+
 class filecachesubentry(object):
 def __init__(self, path, stat):
 self.path = path
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -582,12 +582,19 @@
 elif shared:
 sourcerequires = _readrequires(sharedvfs, False)
 if requirementsmod.SHARESAFE_REQUIREMENT in sourcerequires:
-ui.warn(
-_(
-b'warning: source repository supports share-safe 
functionality.'
-b' Reshare to upgrade.\n'
+if ui.configbool(b'experimental', 
b'sharesafe-auto-upgrade-shares'):
+storerequires = _readrequires(storevfs, False)
+requirements = scmutil.upgrade_share_to_safe(
+hgvfs, requirements, storerequires
 )
-)
+ui.warn(_(b'repository upgraded to use share-safe mode\n'))
+else:
+ui.warn(
+_(
+