hgext/git: octopus merge work-in-progress code

2024-05-01 Thread Josef 'Jeff' Sipek
This is a very rough work-in-progress that I've been sitting on for far too
long.  Since I don't know when I'll have time to hack more on it, I thought
I'd share it.

The idea is to convert each octopus merge into a series of commits that pull
in one branch at a time.  These intermediate commits use made up hashes that
are a simple counter printed as a 40-digit hex number.
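
To make the scheme concrete, here is a minimal Python sketch of the splitting step. It assumes that each intermediate commit becomes p1 of the next commit in the chain, which is a guess about the part of the patch that is not shown. The name split_octopus and the row layout (rev, node, p1, p2, synthetic), modelled on the changelog table in the patch below, are illustrative only.

```python
def split_octopus(node, parents, next_rev):
    """Turn one N-parent merge into a chain of 2-parent rows.

    node and parents are 40-character hex strings.  Returns (rows, next_rev),
    where each row is (rev, node, p1, p2, synthetic).
    """
    rows = []
    remaining = list(parents)
    p1 = remaining.pop(0)
    while remaining:
        p2 = remaining.pop(0)
        if remaining:
            # Intermediate commit: made-up hash (the counter as 40 hex
            # digits) that remembers which real commit it belongs to.
            this, synthetic = "%040x" % next_rev, node
        else:
            # Last step: this row carries the real commit hash.
            this, synthetic = node, None
        rows.append((next_rev, this, p1, p2, synthetic))
        # Assumption: the row just emitted is p1 of the next one.
        p1 = this
        next_rev += 1
    return rows, next_rev
```

For a three-parent merge this yields two rows: one synthetic row whose node is the counter, followed by a row carrying the real commit hash.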

If someone picks it up and makes it actually work - great.  If nobody does,
I'll probably end up hacking on it some more in the next few months/years.

Jeff.

diff --git a/hgext/git/gitlog.py b/hgext/git/gitlog.py
--- a/hgext/git/gitlog.py
+++ b/hgext/git/gitlog.py
@@ -61,6 +61,14 @@ class baselog:  # revlog.revlog):
 raise error.LookupError(r, b'00changelog.i', _(b'no node'))
 return bin(t[0])
 
+def synthetic(self, n):
+t = self._db.execute(
+'SELECT synthetic FROM changelog WHERE node = ?', (gitutil.togitnode(n),)
+).fetchone()
+if t is None or t[0] is None:
+return n
+return bin(t[0])
+
 def hasnode(self, n):
 t = self._db.execute(
 'SELECT node FROM changelog WHERE node = ?',
@@ -222,6 +230,7 @@ class changelog(baselog):
 return hgchangelog._changelogrevision(
 extra=extra, manifest=sha1nodeconstants.nullid
 )
+n = self.synthetic(n)
 hn = gitutil.togitnode(n)
 # We've got a real commit!
 files = [
@@ -367,20 +376,13 @@ class changelog(baselog):
 return bool(self.reachableroots(a, [b], [a], includepath=False))
 
 def parentrevs(self, rev):
-n = self.node(rev)
-hn = gitutil.togitnode(n)
-if hn != gitutil.nullgit:
-c = self.gitrepo[hn]
-else:
-return nullrev, nullrev
-p1 = p2 = nullrev
-if c.parents:
-p1 = self.rev(c.parents[0].id.raw)
-if len(c.parents) > 2:
-raise error.Abort(b'TODO octopus merge handling')
-if len(c.parents) == 2:
-p2 = self.rev(c.parents[1].id.raw)
-return p1, p2
+assert rev >= 0, rev
+t = self._db.execute(
+'SELECT p1, p2 FROM changelog WHERE rev = ?', (rev,)
+).fetchone()
+if t is None:
+raise error.LookupError(rev, b'00changelog.i', _(b'no rev %d'))
+return self.rev(bin(t[0])), self.rev(bin(t[1]))
 
 # Private method is used at least by the tags code.
 _uncheckedparentrevs = parentrevs
@@ -459,6 +461,7 @@ class manifestlog(baselog):
 if node == sha1nodeconstants.nullid:
 # TODO: this should almost certainly be a memgittreemanifestctx
 return manifest.memtreemanifestctx(self, relpath)
+node = self.synthetic(node)
 commit = self.gitrepo[gitutil.togitnode(node)]
 t = commit.tree
 if relpath:
diff --git a/hgext/git/index.py b/hgext/git/index.py
--- a/hgext/git/index.py
+++ b/hgext/git/index.py
@@ -43,7 +43,8 @@ CREATE TABLE changelog (
   rev INTEGER NOT NULL PRIMARY KEY,
   node TEXT NOT NULL,
   p1 TEXT,
-  p2 TEXT
+  p2 TEXT,
+  synthetic TEXT
 );
 
 CREATE UNIQUE INDEX changelog_node_idx ON changelog(node);
@@ -273,26 +274,43 @@ def _index_repo(
 prog = progress_factory(b'commits')
 # This walker is sure to visit all the revisions in history, but
 # only once.
-for pos, commit in enumerate(walker):
+pos = -1
+for commit in walker:
 if prog is not None:
 prog.update(pos)
 p1 = p2 = gitutil.nullgit
-if len(commit.parents) > 2:
-raise error.ProgrammingError(
-(
-b"git support can't handle octopus merges, "
-b"found a commit with %d parents :("
-)
-% len(commit.parents)
+if len(commit.parents) <= 2:
+if commit.parents:
+p1 = commit.parents[0].id.hex
+if len(commit.parents) == 2:
+p2 = commit.parents[1].id.hex
+pos += 1
+db.execute(
+'INSERT INTO changelog (rev, node, p1, p2, synthetic) VALUES(?, ?, ?, ?, NULL)',
+(pos, commit.id.hex, p1, p2),
 )
-if commit.parents:
-p1 = commit.parents[0].id.hex
-if len(commit.parents) == 2:
-p2 = commit.parents[1].id.hex
-db.execute(
-'INSERT INTO changelog (rev, node, p1, p2) VALUES(?, ?, ?, ?)',
-(pos, commit.id.hex, p1, p2),
-)
+else:
+parents = list(commit.parents)
+
+p1 = parents.pop(0).id.hex
+while parents:
+pos += 1
+
+if len(parents) == 1:
+this = commit.id.hex
+synth = None
+else:
+this = "%040x" % pos
+synth = commit.id.hex
+
+p2 = parents.pop(0).id.hex
+

Re: New revlog format, plan page

2021-01-07 Thread Josef 'Jeff' Sipek
On Thu, Jan 07, 2021 at 19:22:23 +0100, Joerg Sonnenberger wrote:
> On Thu, Jan 07, 2021 at 12:04:06PM -0500, Josef 'Jeff' Sipek wrote:
> > On Tue, Jan 05, 2021 at 19:33:36 +0100, Joerg Sonnenberger wrote:

... snip things I agree with and have nothing to add to ...

> > > "No support for sidedata"
> > > 
> > > My big design level concern is that revlog ATM is optimized for fast
> > > integer indexing and append-only storage.
> > 
> > This is an interesting point.  What *are* the most common revlog operations?
> > It probably varies between repos, but I suspect that they are mostly reads
> > rather than writes.  As a consequence, a good revlog format would optimize
> > for the common case (without making the less common cases completely suck).
> 
> The problem is that anything that needs inplace writes is a lot more
> difficult to get right for on-disk consistency and for concurrent
> read-access.

Yes, it definitely is harder.  Depending on the expected workloads and the
exact design, it may or may not be worth the effort.

> Normal revision data does not change, by design. That's
> quite different from any unversioned metadata. This can include
> signatures for example, it could include obsolescence data etc.
> Separating mutable and immutable data is a natural design choice.

Yes, this is a variant of the age old "separate hot and cold data".

If mutable and immutable data is considered independently, each type can be
stored in a different format - each optimized in its own way for the common
case.  This however will likely still require some form of transaction and
synchronization code to guarantee sufficiently atomic updates in face of
errors.

> > hg already makes use of CBOR, so it'd be reasonable to use here - either for
> > the whole entry or just for parts of it.  For example, CBOR's integers are
> > encoded as 1 byte type, followed by 0, 1, 2, 4, or 8 byte integer.  Smaller
> > values use less space.  For example, values less than 2^32 use 1-5 bytes.
> 
> Needing a separate index from the index for efficient access would
> defeat the point of revlog being an index format in the first place...

Variable length encoding can be constrained.  If for example, each entry is
padded to be a multiple of 16B, then access can still be relatively
efficient.  If having sparse revnums is ok, then it shouldn't even require
many (any?) changes to the "core" code.

(My favorite example from CPU instruction sets is the s390 instruction set -
it is variable length, but the length can only be 2, 4, or 6 bytes.)
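
A minimal Python sketch of the padding idea, assuming that "sparse revnums" means a revnum is simply the entry's byte offset divided by 16 (that reading, and the names pad_entry, append_entry and entry_offset, are assumptions for illustration, not an existing hg API):

```python
ENTRY_ALIGN = 16  # assumed alignment, taken from the 16B example above


def pad_entry(encoded: bytes) -> bytes:
    """Pad a variable-length entry with zero bytes to a multiple of 16."""
    return encoded + b"\0" * (-len(encoded) % ENTRY_ALIGN)


def append_entry(index: bytearray, encoded: bytes) -> int:
    """Append a padded entry and return its (possibly sparse) revnum."""
    assert len(index) % ENTRY_ALIGN == 0
    revnum = len(index) // ENTRY_ALIGN
    index.extend(pad_entry(encoded))
    return revnum


def entry_offset(revnum: int) -> int:
    """Locating an entry stays a single multiplication."""
    return revnum * ENTRY_ALIGN
```

An entry longer than 16 bytes consumes more than one slot, so the next revnum skips ahead; lookup by revnum is still O(1).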

Jeff.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: New revlog format, plan page

2021-01-07 Thread Josef 'Jeff' Sipek
On Tue, Jan 05, 2021 at 19:33:36 +0100, Joerg Sonnenberger wrote:
> On Tue, Jan 05, 2021 at 04:38:20PM +0100, Raphaël Gomès wrote:
> > I've opened a very much draft plan page [1] to try to list all the things we
> > want to do in that version and try to figure out an efficient new format.
> 
> "No support for hash version"
> 
> I don't think that point really matters. The plan for the hash
> migration allows them in theory to coexist fully on the revlog layer and
> the main problems for mixing them are on the changeset/manifest layer
> anyway. That is, any migration strategy will IMO rewrite all revlogs to
> the newer hash anyway and only keep a secondary index for changesets and
> maybe manifests.

At the same time, I think it is sensible (and very useful when looking at a
revlog without repo-level info) for revlogs to identify which hash they
contain.  Either in some sort of revlog header or in each entry (if hash can
vary between entries).

> "No support for sidedata"
> 
> My big design level concern is that revlog ATM is optimized for fast
> integer indexing and append-only storage.

This is an interesting point.  What *are* the most common revlog operations?
It probably varies between repos, but I suspect that they are mostly reads
rather than writes.  As a consequence, a good revlog format would optimize
for the common case (without making the less common cases completely suck).

> At least for some sidedata use cases I have, that is an ill fit.

I actually have no idea what sidedata is, but I don't think it changes my
point about picking formats that match the workload :)

> "No support for unified revlog"
> 
> IMO this should be the driving feature.

Agreed (assuming that 'unified revlog' is just a placeholder name for 'a
storage scheme that uses less than O(n) files to store revision data').  I
always think twice before I move a file in a hg repo because I don't like
wasting disk space.  It's a stupid feeling, I know.

> The biggest issue for me is that
> it creates two challenges that didn't exist so far:
> (1) Inter-file patches and how they interact with the wire protocol
> (2) Identical revisions stored in different places.
> 
> "No support for larger files"
> 
> Supporting large revlog files is sensible and having a store for
> design-challenged file systems might be necessary. Microsoft, I'm
> looking at you. Otherwise the concern is space use in the revlog file
> and RAM use during operations. I don't think the latter is as big an
> issue now as it was 15 years ago, but the former is real. But it might
> be a good point in time to just go for 64bit offsets by default...

I'd *strongly* advocate for 64-bit offsets.  They pretty much let you forget
that there is a limit.  Storage is cheap.

If revlog entry size is a concern (e.g., it takes more than 1% of the size
of the data it is tracking), then maybe a variable encoding would be the way
to go.

hg already makes use of CBOR, so it'd be reasonable to use here - either for
the whole entry or just for parts of it.  For example, CBOR's integers are
encoded as 1 byte type, followed by 0, 1, 2, 4, or 8 byte integer.  Smaller
values use less space.  For example, values less than 2^32 use 1-5 bytes.
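
To illustrate the sizes, here is a small Python sketch of CBOR unsigned integer encoding (major type 0).  The function name cbor_encode_uint is made up for this example and is not the encoder hg ships.

```python
import struct


def cbor_encode_uint(value: int) -> bytes:
    """Encode a non-negative integer as a CBOR unsigned int (major type 0)."""
    if value < 0 or value >= 1 << 64:
        raise ValueError("value out of range for a CBOR unsigned int")
    if value < 24:
        # Small values live inside the initial byte itself.
        return bytes([value])
    if value < 1 << 8:
        return b"\x18" + struct.pack(">B", value)
    if value < 1 << 16:
        return b"\x19" + struct.pack(">H", value)
    if value < 1 << 32:
        return b"\x1a" + struct.pack(">I", value)
    return b"\x1b" + struct.pack(">Q", value)
```

The encoded length is 1, 2, 3, 5, or 9 bytes, depending only on the magnitude of the value.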

A common alternative is LEB128 [1], which IIRC is used by git for something
internally.  It is however a bit more expensive to pack/unpack.
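
For comparison, a sketch of unsigned LEB128 as described at [1].  The function names are made up for this example; the per-byte loop is where the extra pack/unpack cost comes from.

```python
def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            # More bits follow: set the continuation bit.
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def uleb128_decode(data: bytes, offset: int = 0):
    """Decode one value; return (value, offset just past the value)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

Unlike the CBOR form, the length is not known from the first byte, so a reader has to inspect every byte to find the end of the value.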

Jeff.

[1] https://en.wikipedia.org/wiki/LEB128


Re: [PATCH] who: remove OpenJDK

2020-07-31 Thread Josef 'Jeff' Sipek
On Fri, Jul 31, 2020 at 18:30:57 +0200, Antonio Muci wrote:
> > On 31/07/2020 17:55, Pierre-Yves David wrote:
...
> > Moving to a modern Mercurial version, using sparse revlog for storage 
> > and recomputing delta gave a massive boost to storage size and clone 
> > performance.
> 
> At least this reassures that performance-wise mercurial has not fallen
> behind so much.
> The tests performed by Josef and Joerg confirm that a performance
> disadvantage exists indeed, but it's not massive.

Keep in mind that I did only clone testing.  I use both hg and git (hg
because I want to, git because I have to), and I have to admit that
something as simple as 'hg log' / 'git log' feels completely different.
git's log output appears on the screen instantaneously, while hg's takes a
fraction of a second.  It is a small fraction, but it "feels" slower.  I
think this has been diagnosed over and over as slow python startup.

...
> What concerns me the most are two things:
> 
> 1. scripta manent: when in some years people will google for "mercurial
> performance" they will stumble upon JDK considerations, and take them for
> granted. What will remain in a potential user's head is "mercurial is
> slow, go for git. JDK guys have done the same". There is no other written
> material counterweighting these moves (except for very interesting blog
> entries by Gregory Szorc, possibly), and so the collective mindset slowly
> slips away.

Around 2010, I messed quite a bit with the xfs file system in linux.  It was
really annoying that users found "tuning guide" slashdot posts from
2001-2003 that were completely wrong but they still kept finding them and
using them.  Often, this resulted in worse performance but the users were
also bad at benchmarking so they didn't notice until it was too late and
their file systems had a lot of data.  (I think it has gotten better, but
those horrid guides are still out there.)  In other words, it takes a *lot*
of effort to make sure people on the internet don't find misinformation.  I
don't really know how, but I think it needs to be a concentrated effort to
be "louder" than the misinformation.  (I consider outdated information
misinformation.)

Jeff.

-- 
All science is either physics or stamp collecting.
- Ernest Rutherford


Re: The SHA1 replacement plan

2020-07-28 Thread Josef 'Jeff' Sipek
On Tue, Jul 28, 2020 at 22:31:51 +0200, Joerg Sonnenberger wrote:
> On Tue, Jul 28, 2020 at 03:55:55PM -0400, Josef 'Jeff' Sipek wrote:
> > On Tue, Jul 28, 2020 at 21:35:51 +0200, Joerg Sonnenberger wrote:
> > > On Tue, Jul 28, 2020 at 03:02:46PM -0400, Josef 'Jeff' Sipek wrote:
> > > > On Sun, Jul 26, 2020 at 18:26:51 +0200, Joerg Sonnenberger wrote:
> > > > ...
> > > > > I've attached basic benchmark numbers below. The asm variant is using
> > > > > whatever my Threadripper supports in terms of low-level primitives, e.g.
> > > > > AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2, SHA3) or
> > > > > the reference implementations (K12, BLAKE3, BLAKE3*). Test case was
> > > > > hashing a large file (~7GB).
> > > > 
> > > > While these performance measurements are important, it is also important to
> > > > make sure that older (or less "top of the line") hardware isn't completely
> > > > terrible.  For example, it is completely reasonable (at least IMO) to still
> > > > use Sandy Bridge-era CPUs.  Likewise, it is reasonable to run hg on an
> > > > embedded system (although those tend to have wimpy I/O as well).
> > > 
> > > Yeah, that's why I included the C variant.
> > 
> > I don't trust compilers *not* to do some massive amount of optimization
> > unless they are told to target an older CPU.  Also, newer CPUs like to do a
> > lot of "magic" to speed things up and keep security researchers employed ;)
> 
> The core of the hash functions isn't by itself very friendly to compiler
> optimisations. It's more a case of how bad the automatic code generation
> will be.

In general, yes you are correct.  However, it is a big loop and a compiler
can do a decent amount of pipelining.

> > > Just to establish that baseline:
> > > 
> > > SHA1 (asm) 4.8s
> > > SHA1 (C)   10.7s
> > >
> > > So K12 is somewhat slower on a Threadripper, but should be somewhat
> > > faster than hardware without specific acceleration. SHA support on the
> > > Zen1 Threadripper is quite fast.
> > 
> > I think we're in agreement.  The new algo shouldn't be much worse than the
> > existing SHA1.
> > 
> > For the record: when the time comes, I'm willing to collect some hash perf
> > data on slightly older/weaker hw as a sanity check.
> 
> If you have a modern OpenSSL version, you can get the numbers for
> sha256, sha3-256, blake2b512 and blake2s256 easily. K12 for non-vector
> CPUs or short messages can be reasonably approximated as "half of
> sha3-256 time" as that's the primary difference. BLAKE3 is the only one
> that would be more tricky.

I couldn't help myself and I ran it on 3 of my systems.  The results
are...interesting.


Thinkpad T520 (2011 vintage Sandy Bridge i7) with FreeBSD 12:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1            38552.35k   112781.34k   256808.02k   375173.76k   433724.00k   438686.07k
sha256          26402.90k    67908.11k   131459.19k   171880.98k   188168.04k   189348.31k
sha3-256        13828.62k    55332.70k   130879.40k   152721.79k   169003.69k   170909.80k
blake2b512      23551.38k    94267.47k   240451.58k   311282.76k   342076.92k   343736.32k
blake2s256      27319.90k   107783.29k   162617.17k   187314.17k   195500.87k   196625.70k

Server (2009 vintage Xeon) running OmniOS:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1            34924.79k   109450.16k   269382.74k   425380.18k   508193.55k   517892.78k
sha256          25028.82k    67870.78k   142345.81k   195099.15k   218278.57k   220632.41k
sha3-256        14281.78k    57115.29k   140363.97k   172130.99k   195810.65k   199223.98k
blake2b512      24261.26k   101366.74k   286062.42k   433886.55k   514296.49k   519682.56k
blake2s256      28521.81k   113023.49k   212133.77k   275259.33k   301708.63k   304365.57k

Raspberry Pi 1B+ running FreeBSD 12:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1             1084.15k     3711.04k    10173.97k    18059.62k    23704.69k    23738.60k
sha256            942.90k     3105.90k     7843.36k    12770.46k    16318.84k    16399.04k
sha3-256          456.77k     1805.49k     4432.56k     5607.01k     6652.73k     6715.47k
blake2b512        173.38k      694.93k     1646.80k     2107.62k     2320.81k     2328.24k
blake2s256        861.31k     3420.77k     7963.73k    12459.78k    15746.41k    15978.57k
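
For anyone who wants a rough sanity check on other hardware without OpenSSL's speed tool, the following Python sketch times the same five algorithms over the same block sizes using hashlib.  It is only an approximation: the names throughput and run are made up here, the timing window is an arbitrary choice, and interpreter overhead dominates at small block sizes, so the absolute numbers are not comparable with the tables above.

```python
import hashlib
import time

# hashlib's blake2b/blake2s default to 64- and 32-byte digests, which
# correspond to blake2b512 and blake2s256 in the tables above.
ALGORITHMS = ("sha1", "sha256", "sha3_256", "blake2b", "blake2s")
BLOCK_SIZES = (16, 64, 256, 1024, 8192, 16384)


def throughput(name, block_size, duration=0.05):
    """Return hashed bytes per second for one algorithm and block size."""
    block = b"\0" * block_size
    count = 0
    start = time.perf_counter()
    deadline = start + duration
    while True:
        hashlib.new(name, block).digest()
        count += 1
        if time.perf_counter() >= deadline:
            break
    elapsed = time.perf_counter() - start
    return count * block_size / elapsed


def run(duration=0.05):
    """Measure every algorithm at every block size."""
    return {
        name: [throughput(name, size, duration) for size in BLOCK_SIZES]
        for name in ALGORITHMS
    }


if __name__ == "__main__":
    print("%-10s" % "type" + "".join("%14d" % s for s in BLOCK_SIZES))
    for name, rates in run().items():
        # Report thousands of bytes per second, like the tables above.
        print("%-10s" % name + "".join("%13.2fk" % (r / 1000) for r in rates))
```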


The differ

Re: The SHA1 replacement plan

2020-07-28 Thread Josef 'Jeff' Sipek
On Tue, Jul 28, 2020 at 21:35:51 +0200, Joerg Sonnenberger wrote:
> On Tue, Jul 28, 2020 at 03:02:46PM -0400, Josef 'Jeff' Sipek wrote:
> > On Sun, Jul 26, 2020 at 18:26:51 +0200, Joerg Sonnenberger wrote:
> > ...
> > > I've attached basic benchmark numbers below. The asm variant is using
> > > whatever my Threadripper supports in terms of low-level primitives, e.g.
> > > AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2, SHA3) or
> > > the reference implementations (K12, BLAKE3, BLAKE3*). Test case was
> > > hashing a large file (~7GB). 
> > 
> > While these performance measurements are important, it is also important to
> > make sure that older (or less "top of the line") hardware isn't completely
> > terrible.  For example, it is completely reasonable (at least IMO) to still
> > use Sandy Bridge-era CPUs.  Likewise, it is reasonable to run hg on an
> > embedded system (although those tend to have wimpy I/O as well).
> 
> Yeah, that's why I included the C variant.

I don't trust compilers *not* to do some massive amount of optimization
unless they are told to target an older CPU.  Also, newer CPUs like to do a
lot of "magic" to speed things up and keep security researchers employed ;)

> Those are essentially
> straight forward implementations used in pkgsrc's digest tool. Ideally,
> we can pick a function that is not too much worse than SHA1.

Agreed.

> Just to establish that baseline:
> 
> SHA1 (asm) 4.8s
> SHA1 (C)   10.7s
>
> So K12 is somewhat slower on a Threadripper, but should be somewhat
> faster than hardware without specific acceleration. SHA support on the
> Zen1 Threadripper is quite fast.

I think we're in agreement.  The new algo shouldn't be much worse than the
existing SHA1.

For the record: when the time comes, I'm willing to collect some hash perf
data on slightly older/weaker hw as a sanity check.

Jeff.

-- 
A CRAY is the only computer that runs an endless loop in just 4 hours...


Re: The SHA1 replacement plan

2020-07-28 Thread Josef 'Jeff' Sipek
On Sun, Jul 26, 2020 at 18:26:51 +0200, Joerg Sonnenberger wrote:
...
> I've attached basic benchmark numbers below. The asm variant is using
> whatever my Threadripper supports in terms of low-level primitives, e.g.
> AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2, SHA3) or
> the reference implementations (K12, BLAKE3, BLAKE3*). Test case was
> hashing a large file (~7GB). 

While these performance measurements are important, it is also important to
make sure that older (or less "top of the line") hardware isn't completely
terrible.  For example, it is completely reasonable (at least IMO) to still
use Sandy Bridge-era CPUs.  Likewise, it is reasonable to run hg on an
embedded system (although those tend to have wimpy I/O as well).

Anyway, thanks for spending time on this!

Jeff.

-- 
All parts should go together without forcing.  You must remember that the
parts you are reassembling were disassembled by you.  Therefore, if you
can’t get them together again, there must be a reason.  By all means, do not
use a hammer.
— IBM Manual, 1925


Re: [PATCH] who: remove OpenJDK

2020-07-26 Thread Josef 'Jeff' Sipek
On Sun, Jul 26, 2020 at 18:35:03 +0200, Joerg Sonnenberger wrote:
> On Sun, Jul 26, 2020 at 11:12:25AM -0400, Josef 'Jeff' Sipek wrote:
> > > > I'm guessing that they would have benefited from treemanifest.
> > > 
> > > From my testing, treemanifests don't help at all.
> > 
> > They seemed to help with the jdk repo.  I'm guessing that jdk has more deeply
> > nested directories with longer file names because the conversion certainly
> > seemed to help (tm == treemanifest):
> 
> Can you run "hg debugupgraderepo -o re-delta-all" once? IIRC the
> original repository doesn't use generaldelta and this would also affect
> the manifest. 

$ hg debugupgraderepo -o re-delta-all --run --no-backup
...
beginning upgrade...
repository locked and read-only
creating temporary repository to stage migrated data: /ws/tmp/jdk-hg/.hg/upgrade.UaS6Ss
(it is safe to interrupt this process any time before data migration completes)
migrating 637431 total revisions (516970 in filelogs, 60143 in manifests, 60318 in changelog)
migrating 1.07 GB in store; 298 GB tracked data
migrating 187542 filelogs containing 516970 revisions (625 MB in store; 11.9 GB tracked data)
finished migrating 516970 filelog revisions across 187542 filelogs; change in size: -2.14 MB
migrating 1 manifests containing 60143 revisions (438 MB in store; 286 GB tracked data)
finished migrating 60143 manifest revisions across 1 manifests; change in size: -382 MB
migrating changelog containing 60318 revisions (28.8 MB in store; 175 MB tracked data)
finished migrating 60318 changelog revisions; change in size: 0 bytes
finished migrating 637431 total revisions; total change in store size: -384 MB
copying phaseroots
...

Wow, that's a massive change to the manifest size!

-rw-r--r--   1 jeffpc   jeffpc 25.2M Jul 26 13:55 00changelog.d
-rw-r--r--   1 jeffpc   jeffpc 3.68M Jul 26 13:55 00changelog.i
-rw-r--r--   1 jeffpc   jeffpc 52.3M Jul 26 13:54 00manifest.d
-rw-r--r--   1 jeffpc   jeffpc 3.67M Jul 26 13:54 00manifest.i

After the repo upgrade, I ran 'hg serve' and cloned it (non-streaming).  The
clone's manifest is somewhat larger but still reasonably sized:

-rw-r--r--   1 jeffpc   jeffpc   25M Jul 26 14:23 00changelog.d
-rw-r--r--   1 jeffpc   jeffpc  3.7M Jul 26 14:16 00changelog.i
-rw-r--r--   1 jeffpc   jeffpc   61M Jul 26 14:17 00manifest.d
-rw-r--r--   1 jeffpc   jeffpc  3.7M Jul 26 14:17 00manifest.i

Jeff.

-- 
mainframe, n.:
  An obsolete device still used by thousands of obsolete companies serving
  billions of obsolete customers and making huge obsolete profits for their
  obsolete shareholders. And this year's run twice as fast as last year's.


Re: [PATCH] who: remove OpenJDK

2020-07-26 Thread Josef 'Jeff' Sipek
On Sun, Jul 26, 2020 at 04:11:06 +0200, Joerg Sonnenberger wrote:
> On Sat, Jul 25, 2020 at 01:36:32PM -0400, Josef 'Jeff' Sipek wrote:
> > First off, the clone itself.  I cloned it from the official upstream repos.
> > My internet connection is 150 Mbit/s, the storage is a 3-way ZFS mirror.  I
> > used hg 4.9.1 (py27), and git 2.21.0.  (I know, I need to update both.  This
> > is on a box that has a solid network connection but is harder to update.  If
> > there is interest I can spend the effort to update them and re-run it with
> > newer versions.)
> 
> It should be noted that for all intents and purposes, a git clone is
> much more comparable to hg clone --stream.

I don't know if this is a temporary error or if the java.net server
disallows it, but:

$ hg clone --stream https://hg.openjdk.java.net/jdk/jdk jdk-stream
streaming all changes
abort: locking the remote repository failed

It'd make sense for this to be disabled by policy, because you don't want
someone doing a slow streaming pull to lock the server's repo for hours
preventing other pushes (assuming that's the same lock).


Doing the clone over the LAN (gigabit ethernet) took 1m26s total (including
the checkout):

$ hg clone --stream http://server-host:8000 test-hg
streaming all changes
187754 files to transfer, 1.07 GB of data
transferred 1.07 GB in 45.5 seconds (24.0 MB/sec)
updating to branch default
65415 files updated, 0 files merged, 0 files removed, 0 files unresolved

The client host was running at 99% CPU while receiving the data, while the
server was at around 80-90%.  So, I'm concluding that in this local case I
was CPU bound on the client, but the server wasn't exactly lightly loaded.

For comparison, git cloning (including checkout) over the same LAN took 60
seconds.  So, faster than hg streaming clone, but only by ~26 seconds.

> > Now, hg specifics.  It looks like the manifest is huge.  This corresponds to
> > how long it took to download.
> > 
> > -rw-r--r--   1 jeffpc   jeffpc 25.2M Jul 25 12:16 00changelog.d
> > -rw-r--r--   1 jeffpc   jeffpc 3.68M Jul 25 12:01 00changelog.i
> > -rw-r--r--   1 jeffpc   jeffpc  434M Jul 25 12:09 00manifest.d
> > -rw-r--r--   1 jeffpc   jeffpc 3.67M Jul 25 12:09 00manifest.i
> 
> I have similar reservations about the way manifests are handled for the
> NetBSD repository. It's been a topic of discussion recently on IRC. The
> manifest processing itself currently takes nearly half of the total
> clone time and that looks ...suspicious at best.

Indeed.  I don't have the knowledge/experience to suggest improvements, but
I can run benchmarks :)

> > I'm guessing that they would have benefited from treemanifest.
> 
> From my testing, treemanifests don't help at all.

They seemed to help with the jdk repo.  I'm guessing that jdk has more deeply
nested directories with longer file names because the conversion certainly
seemed to help (tm == treemanifest):

$ hg --config extensions.convert= convert ../jdk-hg . ../tm-map
$ cd ..
$ du -sAh */.{git,hg}
452M    jdk-git/.git
1.11G   jdk-hg/.hg
784M    jdk-tm/.hg

Not amazing, but it is about 70% of the "monolithic" manifest repo.  The
manifest part itself:

$ ls -lh 00*
-rw-r--r--   1 jeffpc   jeffpc 25.2M Jul 25 20:46 00changelog.d
-rw-r--r--   1 jeffpc   jeffpc 3.68M Jul 25 20:47 00changelog.i
-rw-r--r--   1 jeffpc   jeffpc 4.08M Jul 25 20:46 00manifest.d
-rw-r--r--   1 jeffpc   jeffpc 3.67M Jul 25 20:47 00manifest.i

$ du -sAh meta
89.4M   meta

So, the (treemanifest) manifest data is about 97M total vs. 437MB total with
the monolithic manifest.  This equates to 22% of the original manifest size.

...
> > I just kicked off a conversion to treemanifest.  It'll take a while.
> 
> Did you convert to generaldelta and etc already?

'hg clone' produced a reasonable repo without conversion.  The only
requirement added during the conversion was treemanifest.

$ cat jdk-hg/.hg/requires
dotencode
fncache
generaldelta
revlogv1
sparserevlog
store
$ diff jdk-{hg,tm}/.hg/requires
6a7
> treemanifest

I can try other requirements, but I think the manifest problem jdk people
saw was the huge size due to data duplication inside the manifest data -
duplication that went away by manifest subtree "dedup" between revisions.

Jeff.

-- 
UNIX is user-friendly ... it's just selective about who its friends are
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] who: remove OpenJDK

2020-07-25 Thread Josef 'Jeff' Sipek
On Sat, Jul 25, 2020 at 12:27:42 +0200, Antonio Muci via Mercurial-devel wrote:
> That's sad.

Yeah.

This motivated me enough to clone the repos (hg and git) and collect some
data.  Maybe people here will find it useful.

First off, the clone itself.  I cloned it from the official upstream repos.
My internet connection is 150 Mbit/s, the storage is a 3-way ZFS mirror.  I
used hg 4.9.1 (py27), and git 2.21.0.  (I know, I need to update both.  This
is on a box that has a solid network connection but is harder to update.  If
there is interest I can spend the effort to update them and re-run it with
newer versions.)

$ hg clone https://hg.openjdk.java.net/jdk/jdk
destination directory: jdk
requesting all changes
adding changesets
adding manifests
adding file changes
added 60318 changesets with 516970 changes to 187542 files
new changesets fd16c54261b3:227cd01f15fa
updating to branch default
65415 files updated, 0 files merged, 0 files removed, 0 files unresolved

This took a total of ~16.3 mins (978 seconds), of which:

 1) ~30 seconds were used by "adding changesets"
 2) ~8 mins were used by "adding manifests"
 3) ~7 mins were used by "adding files"

The adding of manifests and files was receiving ~1.0-1.2 MB/s (bytes
received on the NIC, *not* actual payload inside TCP and hg specific
framing).

My box still had plenty of CPU, RAM, and I/O left so I don't know if the 1.0
MB/s was a result of hg being sub-optimal or if the hg server or the network
connection were the bottleneck.

To rule out internet slowness, I ran 'hg serve' on the clone and did a clone
on my laptop (5.5rc0+25-fbc53c5853b0, py3) on the same subnet (wifi
connected).  It took 495 seconds (2x faster), and I saw slightly higher
network utilization (~1.7 MB/s) and the laptop CPU pegged at 100% for pretty
much the entire duration of the "adding file changes" portion.  (The laptop
has an SSD, so that probably helped eliminate some of the slowness - it is a
bit of an apples and oranges comparison, but interesting none the less.)

Cloning directly from java.net on my laptop took 1400 seconds - so, about
50% slower.  This could be because of the wifi, py3 vs. py27, hg version
difference, etc., etc.


$ git clone https://github.com/openjdk/jdk.git jdk-git
Cloning into 'jdk-git'...
remote: Enumerating objects: 819, done.
remote: Counting objects: 100% (819/819), done.
remote: Compressing objects: 100% (577/577), done.
remote: Total 1072595 (delta 356), reused 423 (delta 199), pack-reused 1071776
Receiving objects: 100% (1072595/1072595), 414.42 MiB | 6.17 MiB/s, done.
Resolving deltas: 100% (800673/800673), done.
Checking out files: 100% (65415/65415), done.

This took a total of 1 min 49 secs (109 seconds), of which:

 1) 1 min 8 secs were used by "receiving objects"
 2) 25 seconds were used by "resolving deltas"

The receiving of objects was pulling in 6.8 MB/s.

Cloning directly on my laptop took 99 seconds with git version 2.26.2.

...
> About .hg size (1a): is it really true that .hg is 1.2GB and the 
> corresponding .git version is 300 MB? Verifying it should not be too 
> difficult. If it's true (I doubt it), something has to be done.

$ du -shA jdk-*/.{hg,git}
1.10G   jdk-hg/.hg
452M    jdk-git/.git

So, both numbers seem to be tweaked to justify migration - at least on a
fresh clone - but I'd say hg is worse by 2-3x.

The whole checkout in case anyone cares:

$ du -shA *
1014M   jdk-git
1.65G   jdk-hg

Now, hg specifics.  It looks like the manifest is huge.  This corresponds to
how long it took to download.

-rw-r--r--   1 jeffpc   jeffpc 25.2M Jul 25 12:16 00changelog.d
-rw-r--r--   1 jeffpc   jeffpc 3.68M Jul 25 12:01 00changelog.i
-rw-r--r--   1 jeffpc   jeffpc  434M Jul 25 12:09 00manifest.d
-rw-r--r--   1 jeffpc   jeffpc 3.67M Jul 25 12:09 00manifest.i

Not a complete surprise given that there are a lot of files (~65k) tracked
and many use the super-long file paths (e.g.,
test/hotspot/jtreg/runtime/exceptionMsgs/AbstractMethodError/AbstractMethodErrorTest.java).
That adds up.  Just the paths in the manifest itself add up to almost 4.7MB.

$ hg manifest | wc
   65415   65415 4694467
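
A quick sanity check of those numbers, as a sketch (the variable names are
made up for illustration; the two figures are copied from the wc output
above): the average manifest entry comes out to roughly 72 bytes including
its newline.

```python
# Figures copied from the `hg manifest | wc` output above.
num_paths = 65415       # one line per tracked file
total_bytes = 4694467   # includes one newline per path

avg_with_newline = total_bytes / num_paths
avg_path_len = (total_bytes - num_paths) / num_paths

print(f"average entry: {avg_with_newline:.1f} bytes")
print(f"average path:  {avg_path_len:.1f} bytes")
```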

I'm guessing that they would have benefited from treemanifest.


I also tried to clone locally to see what sort of thing a user would see.

$ hg clone jdk-hg test
$ git clone jdk-git test-git

hg took 60 seconds (with hot cache, ~120 secs cold cache), git took 13
seconds.  Git hardlinked the one big pack file, while hg hardlinked each of
the files in .hg/store.  Obviously, hardlinking 2 files is much faster than
hardlinking ~180k.  (treemanifest would have made this even worse for hg.)
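
For a rough idea of how many hardlinks a local clone has to create, one can
count the files under the store directory.  This is only a sketch:
count_files is a hypothetical helper, and which directory to point it at
(for example jdk-hg/.hg/store versus jdk-git/.git/objects) is an assumption
about the on-disk layout.

```python
import os


def count_files(root):
    """Return the number of non-directory entries under root, recursively."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


if __name__ == "__main__":
    import sys

    # Print one count per directory given on the command line.
    for path in sys.argv[1:]:
        print(path, count_files(path))
```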


I just kicked off a conversion to treemanifest.  It'll take a while.

Jeff.

-- 
Intellectuals solve problems; geniuses prevent them
- Albert Einstein
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH] git: implement diff manifest method

2020-06-01 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1591024345 14400
#  Mon Jun 01 11:12:25 2020 -0400
# Node ID 8b5e8dc4163b540bf7d0ce6b989090f751081583
# Parent  c8ee7f58b11b835918e1e83b89741999f8809866
git: implement diff manifest method

This makes 'hg diff' work.

diff --git a/hgext/git/manifest.py b/hgext/git/manifest.py
--- a/hgext/git/manifest.py
+++ b/hgext/git/manifest.py
@@ -126,9 +126,79 @@ class gittreemanifest(object):
 def hasdir(self, dir):
 return dir in self._dirs
 
-def diff(self, other, match=None, clean=False):
-# TODO
-assert False
+def diff(self, other, match=lambda x: True, clean=False):
+'''Finds changes between the current manifest and m2.
+
+The result is returned as a dict with filename as key and
+values of the form ((n1,fl1),(n2,fl2)), where n1/n2 is the
+nodeid in the current/other manifest and fl1/fl2 is the flag
+in the current/other manifest. Where the file does not exist,
+the nodeid will be None and the flags will be the empty
+string.
+'''
+result = {}
+
+def _iterativediff(t1, t2, subdir):
+"""compares two trees and appends new tree nodes to examine to
+the stack"""
+if t1 is None:
+t1 = {}
+if t2 is None:
+t2 = {}
+
+for e1 in t1:
+realname = subdir + pycompat.fsencode(e1.name)
+
+if e1.type == pygit2.GIT_OBJ_TREE:
+try:
+e2 = t2[e1.name]
+if e2.type != pygit2.GIT_OBJ_TREE:
+e2 = None
+except KeyError:
+e2 = None
+
+stack.append((realname + b'/', e1, e2))
+else:
+n1, fl1 = self.find(realname)
+
+try:
+e2 = t2[e1.name]
+n2, fl2 = other.find(realname)
+except KeyError:
+e2 = None
+n2, fl2 = (None, b'')
+
+if e2 is not None and e2.type == pygit2.GIT_OBJ_TREE:
+stack.append((realname + b'/', None, e2))
+
+if not match(realname):
+continue
+
+if n1 != n2 or fl1 != fl2:
+result[realname] = ((n1, fl1), (n2, fl2))
+elif clean:
+result[realname] = None
+
+for e2 in t2:
+if e2.name in t1:
+continue
+
+realname = subdir + pycompat.fsencode(e2.name)
+
+if e2.type == pygit2.GIT_OBJ_TREE:
+stack.append((realname + b'/', None, e2))
+elif match(realname):
+n2, fl2 = other.find(realname)
+result[realname] = ((None, b''), (n2, fl2))
+
+stack = []
+_iterativediff(self._tree, other._tree, b'')
+while stack:
+subdir, t1, t2 = stack.pop()
+# stack is populated in the function call
+_iterativediff(t1, t2, subdir)
+
+return result
 
 def setflag(self, path, flag):
 node, unused_flag = self._resolve_entry(path)
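
The result format described in the docstring, and the stack-driven traversal,
can be illustrated without pygit2.  The following is a minimal sketch and not
the extension's code: it assumes trees are plain nested dicts (a dict is a
directory, a (node, flags) tuple is a file), and tree_diff is a hypothetical
name.

```python
def tree_diff(t1, t2, match=lambda path: True, clean=False):
    """Diff two nested-dict trees using an explicit stack.

    Returns {path: ((n1, fl1), (n2, fl2))}; a side where the file does
    not exist is reported as (None, b'').  With clean=True, unchanged
    files are reported with the value None.
    """
    missing = (None, b'')
    result = {}
    stack = [(b'', t1, t2)]
    while stack:
        subdir, d1, d2 = stack.pop()
        d1 = d1 or {}
        d2 = d2 or {}
        for name in sorted(set(d1) | set(d2)):
            path = subdir + name
            e1, e2 = d1.get(name), d2.get(name)
            sub1 = e1 if isinstance(e1, dict) else None
            sub2 = e2 if isinstance(e2, dict) else None
            if sub1 is not None or sub2 is not None:
                # At least one side is a directory: examine it later.
                stack.append((path + b'/', sub1, sub2))
            # Only non-directory entries count as files.
            f1 = e1 if e1 is not None and sub1 is None else missing
            f2 = e2 if e2 is not None and sub2 is None else missing
            if f1 == missing and f2 == missing:
                continue  # a directory on both sides, no file to report
            if not match(path):
                continue
            if f1 != f2:
                result[path] = (f1, f2)
            elif clean:
                result[path] = None
    return result
```

Using one work stack instead of recursion keeps the traversal depth
independent of Python's recursion limit, which is the same reason the patch
pushes subtrees onto a stack.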



Re: [PATCH 5 of 5] git: implement diff manifest method

2020-06-01 Thread Josef 'Jeff' Sipek
On Mon, Jun 01, 2020 at 11:32:49 -0400, Josef 'Jeff' Sipek wrote:
> # HG changeset patch
> # User Josef 'Jeff' Sipek 
> # Date 1591024345 14400
> #  Mon Jun 01 11:12:25 2020 -0400
> # Node ID c9d3c553309f1b6662659e94dbd3fb358e84bdfe
> # Parent  c8ee7f58b11b835918e1e83b89741999f8809866
> git: implement diff manifest method
> 
> This makes 'hg diff' work.
> 
> diff --git a/hgext/git/manifest.py b/hgext/git/manifest.py
> --- a/hgext/git/manifest.py
> +++ b/hgext/git/manifest.py
> @@ -127,8 +127,78 @@ class gittreemanifest(object):
>  return dir in self._dirs
>  
>  def diff(self, other, match=None, clean=False):

Bah.  I completely forgot to deal with matches - so this always produces an
unfiltered diff.

> -# TODO
> -assert False
> +'''Finds changes between the current manifest and m2.
> +
> +The result is returned as a dict with filename as key and
> +values of the form ((n1,fl1),(n2,fl2)), where n1/n2 is the
> +nodeid in the current/other manifest and fl1/fl2 is the flag
> +in the current/other manifest. Where the file does not exist,
> +the nodeid will be None and the flags will be the empty
> +string.
> +'''
> +result = {}
> +
> +def _iterativediff(t1, t2, subdir):
> +"""compares two trees and appends new tree nodes to examine to
> +the stack"""
> +if t1 is not None and t2 is not None and t1.id == t2.id: # TODO: check dirtyness

And about this. :|

But the other 4 commits are fine, IMO :)

Jeff.


[PATCH 5 of 5] git: implement diff manifest method

2020-06-01 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1591024345 14400
#  Mon Jun 01 11:12:25 2020 -0400
# Node ID c9d3c553309f1b6662659e94dbd3fb358e84bdfe
# Parent  c8ee7f58b11b835918e1e83b89741999f8809866
git: implement diff manifest method

This makes 'hg diff' work.

diff --git a/hgext/git/manifest.py b/hgext/git/manifest.py
--- a/hgext/git/manifest.py
+++ b/hgext/git/manifest.py
@@ -127,8 +127,78 @@ class gittreemanifest(object):
 return dir in self._dirs
 
 def diff(self, other, match=None, clean=False):
-# TODO
-assert False
+'''Finds changes between the current manifest and m2.
+
+The result is returned as a dict with filename as key and
+values of the form ((n1,fl1),(n2,fl2)), where n1/n2 is the
+nodeid in the current/other manifest and fl1/fl2 is the flag
+in the current/other manifest. Where the file does not exist,
+the nodeid will be None and the flags will be the empty
+string.
+'''
+result = {}
+
+def _iterativediff(t1, t2, subdir):
+"""compares two trees and appends new tree nodes to examine to
+the stack"""
+if t1 is not None and t2 is not None and t1.id == t2.id: # TODO: check dirtyness
+return
+if t1 is None:
+t1 = ()
+if t2 is None:
+t2 = ()
+
+for e1 in t1:
+realname = subdir + pycompat.fsencode(e1.name)
+
+if e1.type == pygit2.GIT_OBJ_TREE:
+try:
+e2 = t2[e1.name]
+except KeyError:
+e2 = None
+
+if e2.type != pygit2.GIT_OBJ_TREE:
+e2 = None
+
+stack.append((realname + b'/', e1, e2))
+else:
+n1, fl1 = self.find(realname)
+
+try:
+e2 = t2[e1.name]
+n2, fl2 = other.find(realname)
+except KeyError:
+e2 = None
+n2, fl2 = (None, b'')
+
+if e2 is not None and e2.type == pygit2.GIT_OBJ_TREE:
+stack.append((realname + b'/', None, e2))
+
+if n1 != n2 or fl1 != fl2:
+result[realname] = ((n1, fl1), (n2, fl2))
+elif clean:
+result[realname] = None
+
+for e2 in t2:
+if e2.name in t1:
+continue
+
+realname = subdir + pycompat.fsencode(e2.name)
+
+if e2.type == pygit2.GIT_OBJ_TREE:
+stack.append((realname + b'/', None, e2))
+else:
+n2, fl2 = other.find(realname)
+result[realname] = ((None, b''), (n2, fl2))
+
+stack = []
+_iterativediff(self._tree, other._tree, b'')
+while stack:
+subdir, t1, t2 = stack.pop()
+# stack is populated in the function call
+_iterativediff(t1, t2, subdir)
+
+return result
 
 def setflag(self, path, flag):
 node, unused_flag = self._resolve_entry(path)



[PATCH 1 of 5] git: implement stub prefetch_parents dirstate method

2020-06-01 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1591016388 14400
#  Mon Jun 01 08:59:48 2020 -0400
# Node ID 1a21d199a8b00e5fb2566ca99fab417253b13b19
# Parent  1537ce87e3ba3759470812143ee972ef9eb8
git: implement stub prefetch_parents dirstate method

A recent change (35b255e474d9) introduced this new required dirstate method
but didn't update the git extension.

diff --git a/hgext/git/dirstate.py b/hgext/git/dirstate.py
--- a/hgext/git/dirstate.py
+++ b/hgext/git/dirstate.py
@@ -288,6 +288,10 @@ class gitdirstate(object):
 # TODO: track copies?
 return None
 
+def prefetch_parents(self):
+# TODO
+pass
+
 @contextlib.contextmanager
 def parentchange(self):
 # TODO: track this maybe?



[PATCH 4 of 5] git: properly visit child tree objects when resolving a path

2020-06-01 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1591019387 14400
#  Mon Jun 01 09:49:47 2020 -0400
# Node ID c8ee7f58b11b835918e1e83b89741999f8809866
# Parent  e05b94be1ba0cd6e2ccd9c1688a82ff3a8103a7e
git: properly visit child tree objects when resolving a path

diff --git a/hgext/git/manifest.py b/hgext/git/manifest.py
--- a/hgext/git/manifest.py
+++ b/hgext/git/manifest.py
@@ -56,8 +56,9 @@ class gittreemanifest(object):
 return val
 t = self._tree
 comps = upath.split('/')
+te = self._tree
 for comp in comps[:-1]:
-te = self._tree[comp]
+te = te[comp]
 t = self._git_repo[te.id]
 ent = t[comps[-1]]
 if ent.filemode == pygit2.GIT_FILEMODE_BLOB:
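
The fix is easiest to see on a toy tree.  This is a sketch under the
assumption that trees are plain nested dicts; resolve is a hypothetical
stand-in for the extension's path lookup, not its actual code.

```python
def resolve(root, path):
    """Walk a nested-dict tree one path component at a time."""
    comps = path.split('/')
    tree = root
    for comp in comps[:-1]:
        # Descend from the *current* tree.  Indexing the root on every
        # iteration only works for paths that are at most two levels deep.
        tree = tree[comp]
    return tree[comps[-1]]


root = {'a': {'b': {'c.txt': 'blob'}}}
print(resolve(root, 'a/b/c.txt'))
```

With the pre-patch behaviour (always indexing the root), the second component
'b' would be looked up in the root tree and fail with a KeyError.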



[PATCH 3 of 5] git: don't yield paths for directories when walking

2020-06-01 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1591018818 14400
#  Mon Jun 01 09:40:18 2020 -0400
# Node ID e05b94be1ba0cd6e2ccd9c1688a82ff3a8103a7e
# Parent  cb9f077123e7e204587b2b9146736ddcfb8d677d
git: don't yield paths for directories when walking

diff --git a/hgext/git/manifest.py b/hgext/git/manifest.py
--- a/hgext/git/manifest.py
+++ b/hgext/git/manifest.py
@@ -173,9 +173,8 @@ class gittreemanifest(object):
 self._git_repo[te.id], match, realname + b'/'
 ):
 yield inner
-if not match(realname):
-continue
-yield pycompat.fsencode(realname)
+elif match(realname):
+yield pycompat.fsencode(realname)
 
 def walk(self, match):
 # TODO: this is a very lazy way to merge in the pending
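
The effect of the change above, sketched on plain nested dicts (an
assumption; the real code walks pygit2 tree objects, and walk_files is a
hypothetical name): directories are descended into but never yielded, and
only files are run through the matcher.

```python
def walk_files(tree, match=lambda path: True, subdir=b''):
    """Yield matching file paths from a nested-dict tree, depth-first."""
    for name, entry in sorted(tree.items()):
        realname = subdir + name
        if isinstance(entry, dict):
            # Directory: recurse, but never yield the directory itself.
            yield from walk_files(entry, match, realname + b'/')
        elif match(realname):
            yield realname


tree = {b'README': 1, b'src': {b'main.c': 2, b'lib': {}}}
print(list(walk_files(tree)))
```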



[PATCH 2 of 5] git: correctly check for type of object when walking

2020-06-01 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1591017773 14400
#  Mon Jun 01 09:22:53 2020 -0400
# Node ID cb9f077123e7e204587b2b9146736ddcfb8d677d
# Parent  1a21d199a8b00e5fb2566ca99fab417253b13b19
git: correctly check for type of object when walking

diff --git a/hgext/git/manifest.py b/hgext/git/manifest.py
--- a/hgext/git/manifest.py
+++ b/hgext/git/manifest.py
@@ -168,7 +168,7 @@ class gittreemanifest(object):
 for te in tree:
 # TODO: can we prune dir walks with the matcher?
 realname = subdir + pycompat.fsencode(te.name)
-if te.type == r'tree':
+if te.type == pygit2.GIT_OBJ_TREE:
 for inner in self._walkonetree(
 self._git_repo[te.id], match, realname + b'/'
 ):



Re: Moving patches traffic to a another list?

2020-04-21 Thread Josef 'Jeff' Sipek
> On Apr 21, 2020, at 18:49, Augie Fackler  wrote:
> 
> In the admin interface for the individual list? Or is there some 
> mailman-global interface I don't yet know about?

Per list.  I’m not aware of any global interface either (global changes can be 
done by editing the mailman config).

Jeff.


> 
>> On Tue, Apr 21, 2020, 18:36 Josef 'Jeff' Sipek  wrote:
>> On Tue, Apr 21, 2020 at 18:09:37 -0400, Augie Fackler wrote:
>> > Presumably, but I honestly have no idea where to look. Anyone?
>> 
>> I'm guessing you want in the admin interface:
>> 
>> "Privacy options..." and then "advertised" = yes
>> 
>> Jeff.
>> 
>> > 
>> > > On Apr 21, 2020, at 13:00, Gregory Szorc  wrote:
>> > > 
>> > > I don't see a new list at https://www.mercurial-scm.org/mailman/listinfo 
>> > > <https://www.mercurial-scm.org/mailman/listinfo>. Are we missing a 
>> > > setting in Mailman to make it public?
>> > > 
>> > > On Tue, Apr 21, 2020 at 12:39 AM Pierre-Yves David 
>> > > mailto:pierre-yves.da...@ens-lyon.org>> 
>> > > wrote:
>> > > I think the list is ready to receive visitor now.
>> > > 
>> > > On 4/20/20 5:56 PM, Augie Fackler wrote:
>> > > > I think this is all squared away via IRC, please let me know if 
>> > > > something looks amiss.
>> > > > 
>> > > >> On Apr 20, 2020, at 06:16, Pierre-Yves David 
>> > > >> > > > >> <mailto:pierre-yves.da...@ens-lyon.org>> wrote:
>> > > >>
>> > > >> I set the mailing list side up, but I am not sure of how do to thing 
>> > > >> on the phabricator side. Does someone knows hwo the emailing to 
>> > > >> mercurial-devel@mercurial-scm.org 
>> > > >> <mailto:mercurial-devel@mercurial-scm.org> was made? Do I have the 
>> > > >> necessary access level to do something similar ?
>> > > >>
>> > > >> On 4/6/20 9:04 PM, Augie Fackler wrote:
>> > > >>>> On Apr 5, 2020, at 19:38, Pierre-Yves David 
>> > > >>>> > > > >>>> <mailto:pierre-yves.da...@ens-lyon.org>> wrote:
>> > > >>>>
>> > > >>>> A couple of day ago, you asked on IRC for help on how to create a 
>> > > >>>> new list. It looks like the `newlist` utility is meant for that. A 
>> > > >>>> small blog post give some more details here:
>> > > >>>>
>> > > >>>> http://data.agaric.com/create-mailman-list-through-command-line 
>> > > >>>> <http://data.agaric.com/create-mailman-list-through-command-line>
>> > > >>>>
>> > > >>>> And the utility seems to have a manpage
>> > > >>>>
>> > > >>>> http://manpages.ubuntu.com/manpages/bionic/man8/newlist.8.html 
>> > > >>>> <http://manpages.ubuntu.com/manpages/bionic/man8/newlist.8.html>
>> > > >>>>
>> > > >>>> Does this matches what we have installed ?
>> > > >>> That was a huge help! Okay, I think I've done the right thing. Can 
>> > > >>> you go check?
>> > > >>> Also, I have an admin password here for list moderation. Do you want 
>> > > >>> that so you can mod through the initial phab messages etc?
>> > > >>> I'm also happy to temporarily make you a phab admin to set the right 
>> > > >>> knobs...I'm assuming we can't ansible-ize that?
>> > > >>> ___
>> > > >>> Mercurial-devel mailing list
>> > > >>> Mercurial-devel@mercurial-scm.org 
>> > > >>> <mailto:Mercurial-devel@mercurial-scm.org>
>> > > >>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel 
>> > > >>> <https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel>
>> > > >>
>> > > >> -- 
>> > > >> Pierre-Yves David
>> > > > 
>> > > 
>> > > -- 
>> > > Pierre-Yves David
>> > > ___
>> > > Mercurial-devel mailing list
>> > > Mercurial-devel@mercurial-scm.org 
>> > > <mailto:Mercurial-devel@mercurial-scm.org>
>> > > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel 
>> > > <https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel>
>> > 
>> 
>> > ___
>> > Mercurial-devel mailing list
>> > Mercurial-devel@mercurial-scm.org
>> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>> 
>> 
>> -- 
>> Computer Science is no more about computers than astronomy is about
>> telescopes.
>> - Edsger Dijkstra


Re: Moving patches traffic to a another list?

2020-04-21 Thread Josef 'Jeff' Sipek
On Tue, Apr 21, 2020 at 18:09:37 -0400, Augie Fackler wrote:
> Presumably, but I honestly have no idea where to look. Anyone?

I'm guessing you want in the admin interface:

"Privacy options..." and then "advertised" = yes

Jeff.

> 
> > On Apr 21, 2020, at 13:00, Gregory Szorc  wrote:
> > 
> > I don't see a new list at https://www.mercurial-scm.org/mailman/listinfo.
> > Are we missing a setting in Mailman to make it public?
> > 
> > On Tue, Apr 21, 2020 at 12:39 AM Pierre-Yves David 
> > mailto:pierre-yves.da...@ens-lyon.org>> 
> > wrote:
> > I think the list is ready to receive visitor now.
> > 
> > On 4/20/20 5:56 PM, Augie Fackler wrote:
> > > I think this is all squared away via IRC, please let me know if something 
> > > looks amiss.
> > > 
> > >> On Apr 20, 2020, at 06:16, Pierre-Yves David 
> > >> mailto:pierre-yves.da...@ens-lyon.org>> 
> > >> wrote:
> > >>
> > >> I set the mailing list side up, but I am not sure of how do to thing on 
> > >> the phabricator side. Does someone knows hwo the emailing to 
> > >> mercurial-devel@mercurial-scm.org 
> > >>  was made? Do I have the 
> > >> necessary access level to do something similar ?
> > >>
> > >> On 4/6/20 9:04 PM, Augie Fackler wrote:
> > >>>> On Apr 5, 2020, at 19:38, Pierre-Yves David wrote:
> > >>>>
> > >>>> A couple of day ago, you asked on IRC for help on how to create a new
> > >>>> list. It looks like the `newlist` utility is meant for that. A small
> > >>>> blog post give some more details here:
> > >>>>
> > >>>> http://data.agaric.com/create-mailman-list-through-command-line
> > >>>>
> > >>>> And the utility seems to have a manpage
> > >>>>
> > >>>> http://manpages.ubuntu.com/manpages/bionic/man8/newlist.8.html
> > >>>>
> > >>>> Does this matches what we have installed ?
> > >>> That was a huge help! Okay, I think I've done the right thing. Can you 
> > >>> go check?
> > >>> Also, I have an admin password here for list moderation. Do you want 
> > >>> that so you can mod through the initial phab messages etc?
> > >>> I'm also happy to temporarily make you a phab admin to set the right 
> > >>> knobs...I'm assuming we can't ansible-ize that?
> > >>> ___
> > >>> Mercurial-devel mailing list
> > >>> Mercurial-devel@mercurial-scm.org 
> > >>> 
> > >>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel 
> > >>> 
> > >>
> > >> -- 
> > >> Pierre-Yves David
> > > 
> > 
> > -- 
> > Pierre-Yves David
> > ___
> > Mercurial-devel mailing list
> > Mercurial-devel@mercurial-scm.org 
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel 
> > 
> 

> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


-- 
Computer Science is no more about computers than astronomy is about
telescopes.
- Edsger Dijkstra


Re: [PATCH 2 of 2] pathutil: document that dirs map type implies manifest/dirstate processing

2020-04-01 Thread Josef 'Jeff' Sipek
On Fri, Mar 27, 2020 at 12:21:50 -0400, Augie Fackler wrote:
> On Mar 27, 2020, at 10:56, Josef 'Jeff' Sipek  wrote:
> > On Fri, Mar 27, 2020 at 10:48:58 -0400, Josef 'Jeff' Sipek wrote:
> > > # HG changeset patch
> > > # User Josef 'Jeff' Sipek 
> > > # Date 158531 14400
> > > #  Fri Mar 27 10:39:59 2020 -0400
> > > # Node ID f313b33e0341724093d866f0bf5478a28cad092a
> > > # Parent  4f4c2622ec748ce806c6577c30d4f1ae8660a0c2
> > > pathutil: document that dirs map type implies manifest/dirstate processing
> > 
> > FWIW, this "argument type implies manifest or dirstate" seems like a hack.
> > I'm not familiar enough with python to know if I'm wrong here, but I'm open
> > to replacing this patch with something that adds a "type" argument.  Then,
> > __init__ can assert/abort/etc. if it gets a mismatch.  I imagine something
> > like:
> > 
> > def __init__(self, map, is_dirstate, skip=None):
> >     if is_dirstate:
> >         assert isinstance(map, dict)
> >     else:
> >         assert isinstance(map, list)
> >     ...code more or less as before...
> > 
> > Would this be a good change?
> 
> I'm conflicted. What we've got is akin to a type-overload in C++, but
> there's (obviously) no function overloading in Python. It might be clearer
> to convert this to three methods, eg (PEP484 hints included for clarity):
...
> Thoughts?

Agreed.

I already shared this on IRC last week, but I should share it here as well.
It is a work-in-progress.  I haven't run the tests, but normal usage seems
to work WITHOUT the C or Rust code.  Those still need to be changed and I
haven't even looked at how to implement class methods in those.
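
To make the shape of the rewrite concrete, here is a simplified sketch of a
dirs-like multiset with two named constructors.  Only the names from_manifest
and from_dirstate come from the patch below; the class body, the
(state, ...) tuple layout of dirstate entries, and the omission of the root
directory are assumptions made for illustration, not Mercurial's
implementation.

```python
class dirs:
    """A multiset of directory names (simplified illustration)."""

    def __init__(self):
        self._dirs = {}

    @classmethod
    def from_manifest(cls, paths):
        """Build from any iterable of file paths."""
        self = cls()
        for path in paths:
            self.addpath(path)
        return self

    @classmethod
    def from_dirstate(cls, dmap, skip=None):
        """Build from a {path: entry} dict, skipping entries whose
        state (entry[0]) equals skip."""
        self = cls()
        for path, entry in dmap.items():
            if skip is None or entry[0] != skip:
                self.addpath(path)
        return self

    def addpath(self, path):
        # Count every ancestor directory: b'a/b/c' -> b'a/b', then b'a'.
        pos = path.rfind(b'/')
        while pos != -1:
            base = path[:pos]
            self._dirs[base] = self._dirs.get(base, 0) + 1
            pos = path.rfind(b'/', 0, pos)

    def __contains__(self, d):
        return d in self._dirs
```

With named constructors the caller states which kind of input it has, so
nothing has to be inferred from the argument's type.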

Jeff.


# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1585358782 14400
#  Fri Mar 27 21:26:22 2020 -0400
# Node ID c594db96ec9d119655d30305b5ac8d22e9d4a3bb
# Parent  f313b33e0341724093d866f0bf5478a28cad092a
WIP: rewrite pathutils.dirs

diff --git a/hgext/git/manifest.py b/hgext/git/manifest.py
--- a/hgext/git/manifest.py
+++ b/hgext/git/manifest.py
@@ -120,7 +120,7 @@ class gittreemanifest(object):
 
 @util.propertycache
 def _dirs(self):
-return pathutil.dirs(self)
+return pathutil.dirs.from_manifest(self)
 
 def hasdir(self, dir):
 return dir in self._dirs
@@ -232,7 +232,7 @@ class memgittreemanifestctx(object):
 # just narrow?
 assert not match or isinstance(match, matchmod.alwaysmatcher)
 
-touched_dirs = pathutil.dirs(list(self._pending_changes))
+touched_dirs = pathutil.dirs.from_manifest(self._pending_changes)
 trees = {
 b'': self._tree,
 }
diff --git a/hgext/narrow/narrowcommands.py b/hgext/narrow/narrowcommands.py
--- a/hgext/narrow/narrowcommands.py
+++ b/hgext/narrow/narrowcommands.py
@@ -278,7 +278,7 @@ def _narrow(
 todelete.append(f)
 elif f.startswith(b'meta/'):
 dir = f[5:-13]
-dirs = sorted(pathutil.dirs({dir})) + [dir]
+dirs = sorted(pathutil.dirs.from_manifest({dir})) + [dir]
 include = True
 for d in dirs:
 visit = newmatch.visitdir(d)
diff --git a/hgext/uncommit.py b/hgext/uncommit.py
--- a/hgext/uncommit.py
+++ b/hgext/uncommit.py
@@ -186,7 +186,7 @@ def uncommit(ui, repo, *pats, **opts):
 # if not everything tracked in that directory can be
 # uncommitted.
 if badfiles:
-badfiles -= {f for f in pathutil.dirs(eligible)}
+badfiles -= {f for f in pathutil.dirs.from_manifest(eligible)}
 
 for f in sorted(badfiles):
 if f in s.clean:
diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -2823,7 +2823,7 @@ def remove(
 progress.complete()
 
 # warn about failure to delete explicit files/dirs
-deleteddirs = pathutil.dirs(deleted)
+deleteddirs = pathutil.dirs.from_manifest(deleted)
 files = m.files()
 progress = ui.makeprogress(
 _(b'deleting'), total=len(files), unit=_(b'files')
diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -1577,11 +1577,11 @@ class dirstatemap(object):
 
 @propertycache
 def _dirs(self):
-return pathutil.dirs(self._map, b'r')
+return pathutil.dirs.from_dirstate(self._map, b'r')
 
 @propertycache
 def _alldirs(self):
-return pathutil.dirs(self._map)
+return pathutil.dirs.from_dirstate(self._map)
 
 def _opendirstatefile(self):
 fp, mode = txnutil.trypending(self._root, self._opener, self._filename)
diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@

Re: [PATCH 2 of 2] pathutil: document that dirs map type implies manifest/dirstate processing

2020-03-27 Thread Josef 'Jeff' Sipek
On Fri, Mar 27, 2020 at 10:48:58 -0400, Josef 'Jeff' Sipek wrote:
> # HG changeset patch
> # User Josef 'Jeff' Sipek 
> # Date 158531 14400
> #  Fri Mar 27 10:39:59 2020 -0400
> # Node ID f313b33e0341724093d866f0bf5478a28cad092a
> # Parent  4f4c2622ec748ce806c6577c30d4f1ae8660a0c2
> pathutil: document that dirs map type implies manifest/dirstate processing

FWIW, this "argument type implies manifest or dirstate" seems like a hack.
I'm not familiar enough with python to know if I'm wrong here, but I'm open
to replacing this patch with something that adds a "type" argument.  Then,
__init__ can assert/abort/etc. if it gets a mismatch.  I imagine something
like:

def __init__(self, map, is_dirstate, skip=None):
    if is_dirstate:
        assert isinstance(map, dict)
    else:
        assert isinstance(map, list)
    ...code more or less as before...

Would this be a good change?

Thanks,

Jeff.

> 
> diff --git a/mercurial/pathutil.py b/mercurial/pathutil.py
> --- a/mercurial/pathutil.py
> +++ b/mercurial/pathutil.py
> @@ -286,6 +286,9 @@ class dirs(object):
>  '''a multiset of directory names from a set of file paths'''
>  
>  def __init__(self, map, skip=None):
> +'''
> +a dict map indicates a dirstate while a list indicates a manifest
> +'''
>  self._dirs = {}
>  addpath = self.addpath
>  if isinstance(map, dict) and skip is not None:
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

-- 
You measure democracy by the freedom it gives its dissidents, not the
freedom it gives its assimilated conformists.
- Abbie Hoffman


[PATCH 1 of 2] git: pass a list to pathutil.dirs to indicate that it is a manifest

2020-03-27 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1585319920 14400
#  Fri Mar 27 10:38:40 2020 -0400
# Node ID 4f4c2622ec748ce806c6577c30d4f1ae8660a0c2
# Parent  5a77ab1704526629c316ebd93ca355d3439eb0b7
git: pass a list to pathutil.dirs to indicate that it is a manifest

The python implementation of pathutil.dirs just uses a for loop which
happens to work the same on both dicts and lists.  The rust implementation
actually figures out which of the two types it is, and directs the execution
to either dirstate or manifest processing.
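
A small illustration of why the pure-Python path does not care (a sketch;
collect and the sample data are made up): iterating a dict yields its keys,
so a plain for loop sees the same paths either way, while a type-dispatching
implementation has to tell the two apart explicitly.

```python
def collect(paths):
    """Gather paths with a plain for loop; works for lists and dicts."""
    return sorted(f for f in paths)


as_manifest = [b'a/b.txt', b'c.txt']               # list of paths
as_dirstate = {b'a/b.txt': b'n', b'c.txt': b'n'}   # path -> state

# Same result from both containers, because dict iteration yields keys.
assert collect(as_manifest) == collect(as_dirstate)

# Dispatching on type distinguishes them, which is why passing a list
# matters once the implementation branches on the container type.
assert isinstance(as_dirstate, dict)
assert not isinstance(as_manifest, dict)
```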

diff --git a/hgext/git/manifest.py b/hgext/git/manifest.py
--- a/hgext/git/manifest.py
+++ b/hgext/git/manifest.py
@@ -232,7 +232,7 @@ class memgittreemanifestctx(object):
 # just narrow?
 assert not match or isinstance(match, matchmod.alwaysmatcher)
 
-touched_dirs = pathutil.dirs(self._pending_changes)
+touched_dirs = pathutil.dirs(list(self._pending_changes))
 trees = {
 b'': self._tree,
 }


[PATCH 2 of 2] pathutil: document that dirs map type implies manifest/dirstate processing

2020-03-27 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 158531 14400
#  Fri Mar 27 10:39:59 2020 -0400
# Node ID f313b33e0341724093d866f0bf5478a28cad092a
# Parent  4f4c2622ec748ce806c6577c30d4f1ae8660a0c2
pathutil: document that dirs map type implies manifest/dirstate processing

diff --git a/mercurial/pathutil.py b/mercurial/pathutil.py
--- a/mercurial/pathutil.py
+++ b/mercurial/pathutil.py
@@ -286,6 +286,9 @@ class dirs(object):
 '''a multiset of directory names from a set of file paths'''
 
 def __init__(self, map, skip=None):
+'''
+a dict map indicates a dirstate while a list indicates a manifest
+'''
 self._dirs = {}
 addpath = self.addpath
 if isinstance(map, dict) and skip is not None:


[PATCH 1 of 4] git: remove obsolete todo item

2020-03-26 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1585254234 14400
#  Thu Mar 26 16:23:54 2020 -0400
# Node ID 6d0a988e6a5d4189f111a6dc2ce2c1fa5fd29422
# Parent  50843409eac03e9e31d39ad4539e7a39d03d17eb
git: remove obsolete todo item

The changes in 02c47b74366c cleaned up the requirement check.

diff --git a/hgext/git/TODO.md b/hgext/git/TODO.md
--- a/hgext/git/TODO.md
+++ b/hgext/git/TODO.md
@@ -28,12 +28,3 @@ We should spend some time thinking hard 
 repository, but may not have enough locking correctness in places
 where hg does locking that git isn't aware of (notably the working
 copy, which I believe Git does not lock.)
-
-Clean up requirements
-=
-
-Right now (for historical reasons, mainly) hgext.git uses a
-.hg/this-is-git file to detect repositories that should be treated as
-git. We should look in the .hg/requires for the "git" requirement
-instead (we already set this requirement, so it's mostly keying off
-that instead of using an empty file.)


[PATCH 2 of 4] git: abort when attempting to set a branch

2020-03-26 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1585256974 14400
#  Thu Mar 26 17:09:34 2020 -0400
# Node ID c1b5ff87d07b7029f5df5345dbbc097a5f01d9b4
# Parent  6d0a988e6a5d4189f111a6dc2ce2c1fa5fd29422
git: abort when attempting to set a branch

Given the mapping we use (namely, a git head is a bookmark), it is better to
error out with a hint.

diff --git a/hgext/git/dirstate.py b/hgext/git/dirstate.py
--- a/hgext/git/dirstate.py
+++ b/hgext/git/dirstate.py
@@ -300,3 +300,6 @@ class gitdirstate(object):
 def clearbackup(self, tr, backupname):
 # TODO
 pass
+
+def setbranch(self, branch):
+raise error.Abort(b'git repos do not support branches. try using bookmarks')


[PATCH 4 of 4] git: implement basic bookmark activation

2020-03-26 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1585259370 14400
#  Thu Mar 26 17:49:30 2020 -0400
# Node ID 5a77ab1704526629c316ebd93ca355d3439eb0b7
# Parent  917cd2c4073cf69b2c366dccbfccc3b28ff4fca2
git: implement basic bookmark activation

This is very limited, but it allows 'hg update foo' when already on foo.

The caching is based on bmstore's caching.
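
The caching pattern amounts to a cached value plus a "clean" flag, with the
write-back skipped when nothing changed.  A minimal sketch, assuming a plain
dict as a stand-in for gitrepo.references; ActiveRef and its writes counter
are hypothetical and exist only for this example.

```python
class ActiveRef:
    """Cache an 'active' value and write it back only when it changed."""

    def __init__(self, refs):
        self._refs = refs                  # stand-in for gitrepo.references
        self._active = refs.get('HEAD')
        self._clean = True
        self.writes = 0                    # instrumentation for the example

    @property
    def active(self):
        return self._active

    @active.setter
    def active(self, name):
        if name is not None and name not in self._refs:
            raise KeyError(name)
        self._active = name
        self._clean = False                # remember that a write is pending

    def write(self):
        if self._clean:
            return                         # nothing changed, skip the write
        self._refs['HEAD'] = self._active
        self.writes += 1
        self._clean = True
```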

diff --git a/hgext/git/__init__.py b/hgext/git/__init__.py
--- a/hgext/git/__init__.py
+++ b/hgext/git/__init__.py
@@ -143,6 +143,8 @@ def _setupdothg(ui, path):
 class gitbmstore(object):
 def __init__(self, gitrepo):
 self.gitrepo = gitrepo
+self._aclean = True
+self._active = gitrepo.references['HEAD'] # git head, not mark
 
 def __contains__(self, name):
 return (
@@ -180,7 +182,18 @@ class gitbmstore(object):
 
 @active.setter
 def active(self, mark):
-raise NotImplementedError
+githead = mark is not None and (_BMS_PREFIX + mark) or None
+if githead is not None and githead not in self.gitrepo.references:
+raise AssertionError(b'bookmark %s does not exist!' % mark)
+
+self._active = githead
+self._aclean = False
+
+def _writeactive(self):
+if self._aclean:
+return
+self.gitrepo.references.create('HEAD', self._active, True)
+self._aclean = True
 
 def names(self, node):
 r = []


[PATCH 3 of 4] git: implement a basic checkconflict bookmark store method

2020-03-26 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1585257894 14400
#  Thu Mar 26 17:24:54 2020 -0400
# Node ID 917cd2c4073cf69b2c366dccbfccc3b28ff4fca2
# Parent  c1b5ff87d07b7029f5df5345dbbc097a5f01d9b4
git: implement a basic checkconflict bookmark store method

It is heavily based on bmstore's own checkconflict.

diff --git a/hgext/git/__init__.py b/hgext/git/__init__.py
--- a/hgext/git/__init__.py
+++ b/hgext/git/__init__.py
@@ -219,6 +219,34 @@ class gitbmstore(object):
 force=True,
 )
 
+def checkconflict(self, mark, force=False, target=None):
+githead = _BMS_PREFIX + mark
+cur = self.gitrepo.references['HEAD']
+if githead in self.gitrepo.references and not force:
+if target:
+if self.gitrepo.references[githead] == target and target == cur:
+# re-activating a bookmark
+return []
+# moving a bookmark - forward?
+raise NotImplementedError
+raise error.Abort(
+_(b"bookmark '%s' already exists (use -f to force)") % mark
+)
+if len(mark) > 3 and not force:
+try:
+shadowhash = scmutil.isrevsymbol(self._repo, mark)
+except error.LookupError:  # ambiguous identifier
+shadowhash = False
+if shadowhash:
+self._repo.ui.warn(
+_(
+b"bookmark %s matches a changeset hash\n"
+b"(did you leave a -r out of an 'hg bookmark' "
+b"command?)\n"
+)
+% mark
+)
+return []
 
 def init(orig, ui, dest=b'.', **opts):
 if opts.get('git', False):


Re: Interest in integrating hg-git into Mercurial

2019-08-19 Thread Josef 'Jeff' Sipek
On Fri, Aug 16, 2019 at 16:58:56 -0400, Augie Fackler wrote:
> On Aug 1, 2019, at 22:30, Gregory Szorc  wrote:
...
> > I would prefer we interface with Git repositories using the storage
> > abstraction and not have to maintain a shadow Mercurial repository as
> > well. And I do think that is technically viable.
> 
> For your consideration, a (crude) prototype: phab.mercurial-scm.org/D6734
> 
> It's a hack, but that's under a week of hacking, even when I try to
> account for the reuse of code I had laying around. You're right that
> octopus merges will be a pain, but it looks like rationalizing the
> dirstate interface is the real big hassle at the moment. I suspect we
> could dummy up octopus merges with some weird hash tricks...

I certainly haven't done a proper survey, but AFAIK virtually all git repos
have *zero* octopus merges.  So, I think it would be acceptable (at least
initially) if this code just refuses to work on repos with octopuses.

Jeff.

-- 
*NOTE: This message is ROT-13 encrypted twice for extra protection*


Re: [PATCH 7 of 8 "] compression: introduce an official `zstd-revlog` requirement

2019-04-02 Thread Josef 'Jeff' Sipek
On Sun, Mar 31, 2019 at 17:36:23 +0200, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David 
> # Date 1553707623 -3600
> #  Wed Mar 27 18:27:03 2019 +0100
> # Node ID 2cfe9983fa92313d58f0420ec62f2341a810343e
> # Parent  108e26fa0a97fe5342a1ce246cc4e4c185803454
> # EXP-Topic zstd-revlog
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #  hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 2cfe9983fa92
> compression: introduce an official `zstd-revlog` requirement

Is the requirement for the compression algo or for the compression algo's
use in revlog?

If the former, something like 'compression-' makes more sense.

If the latter, would it be better to call it 'revlog-compression-' or
something to that effect?

Either way, while a *human* knows that zstd is a compression algo, could it
make sense to make it easily parsable?  I'm imagining slightly better
error messages when requirements fail, or just the ability to
programmatically identify the algo.  For example, instead of the current:

  abort: repository requires features unknown to this Mercurial: foobar-revlog!

hg could emit:

  abort: repository requires a compression algo unknown to this Mercurial: foobar!
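
A minimal sketch of the parsable-requirement idea, assuming a hypothetical
`revlog-compression-` prefix followed by the algorithm name (the thread does
not settle on a spelling).  The function name and the sets of known features
and algorithms are illustrative only, not Mercurial's actual API:

```python
# Hypothetical prefix; the exact requirement spelling is an assumption.
COMPRESSION_PREFIX = "revlog-compression-"


def describe_unknown(requirement, known_features, known_algos):
    """Return an abort message for an unsupported requirement, else None."""
    if requirement.startswith(COMPRESSION_PREFIX):
        # The algorithm name can be recovered programmatically.
        algo = requirement[len(COMPRESSION_PREFIX):]
        if algo in known_algos:
            return None
        return (
            "abort: repository requires a compression algo unknown "
            "to this Mercurial: %s!" % algo
        )
    if requirement in known_features:
        return None
    # Anything else falls back to the generic message.
    return (
        "abort: repository requires features unknown "
        "to this Mercurial: %s!" % requirement
    )


features = {"store", "fncache"}
algos = {"zlib", "zstd"}
for req in ("revlog-compression-zstd", "revlog-compression-foobar", "foobar-revlog"):
    print(req, "->", describe_unknown(req, features, algos))
```

With a fixed prefix, the same parse also lets tooling list which algorithm a
repository needs without a hard-coded table of requirement names.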

> 
> This requirement supersede `exp-compression-zstd`. However, we keep support for

s/supersede/supersedes/ :)

Jeff.

-- 
What is the difference between Mechanical Engineers and Civil Engineers?
Mechanical Engineers build weapons, Civil Engineers build targets.


Re: [PATCH 6 of 8 "] compression: introduce an official `format.revlog-compression` option

2019-04-02 Thread Josef 'Jeff' Sipek
On Sun, Mar 31, 2019 at 17:36:22 +0200, Pierre-Yves David wrote:
...
> compression: introduce an official `format.revlog-compression` option
> 
> This option superseed the `experiment.format.compression` option. The value

s/superseed/supersedes/ :)

> currently supported are zlib (default) and zstd (if Mercurial was compiled with
> zstd support).
> 
> The option gained an explicite reference to `revlog` since this is the target

s/explicite/explicit/

> usage here. Different storage methods might requires different compression
> strategies.

s/requires/require/

> 
> In our tests, using zstd give a significant CPU usage improvement (both
> compression and decompressing) while keeping similar repository size.
> 
> Zstd as other interresting mode (dictionnaly, pre-text, etc…) that are probably

I'm guessing here: s/dictionnaly/dictionary/ ?

> worth exploring. However, just play switching from zlib to zstd provide a large
> benefit.

s/play/plain/

...
> diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt
> --- a/mercurial/help/config.txt
> +++ b/mercurial/help/config.txt
> @@ -866,6 +866,13 @@ https://www.mercurial-scm.org/wiki/Missi
>  Repositories with this on-disk format require Mercurial version 4.7
>  
>  Enabled by default.
> +``revlog-compression``
> +Compression algorithm used by revlog. Supported value are `zlib` and `zstd`.
> +The `zlib` engine is the historical default of Mercurial. `zstd` is a newer
> +format that is usually a net win over `zlib` operating faster at better
> +compression rate. Use `zstd` to reduce CPU usage.
> +
> +On some system, Mercurial installation may lack `zstd` supports. Default is `zlib`.

This says that 'zlib' is the default - twice.  Should it repeat itself like
this?

Jeff.

-- 
Don't drink and derive. Alcohol and algebra don't mix.


Re: [PATCH 3 of 8 "] compression: introduce a `storage.revlog.zlib.level` configuration

2019-04-02 Thread Josef 'Jeff' Sipek
On Sun, Mar 31, 2019 at 17:36:19 +0200, Pierre-Yves David wrote:
...
> compression: introduce a `storage.revlog.zlib.level` configuration
> 
> This option control the zlib compression level used when compression revlog
> chunk.
> 
> This is also a good excuse to pave the way for a similar configuration option
> for the zstd compression engine. Having a dedicated option for each compression
> algorithm is useful because they don't support the same range of values.
> 
> Using a higher zlib compression impact CPU consumption at compression time, but
> does not directly affected decompression time. However dealing with small
> compressed chunk can directly help decompression and indirectly help other
> revlog logic.
> 
> I ran some basic test on repositories using different level. I am user the

s/user/using/ ?

...
> I also made some basic timing measurement. The "read" timing are gathered using
> simple run of `hg perfrevlogrevisions`, the "write" measurement using `hg
> perfrevlogwrite` (restricted to the last 5000 revisions for netbeans and
> mozilla central). The timing are gathered on a generic machine, (not one  of
> our performance locked machine), so small variation might not be meaningful.

You did more than one measurement, so measurement -> measurements, and
timing -> timings?  Alternatively, keep the singular but then make the verbs
match: are -> is.

Sorry to nit-pick, but since this text will end up in the commit messages...
:)

> However large trend remains relevant.
> 
> Keep in mind that these number are not pure compression/decompression time.

s/number/numbers/

> They also involve the full revlog logic. In particular the difference in chunk
> size has an impact on the delta chain structure, affecting performance when
> writing or reading them.
> 
> On read/write performance, the compression level has a bigger impact.
> Counter-intuitively, higher compression level raise better "write" performance

s/raise better/increase/ ?

This actually confuses me a bit.  Based on the table below, it looks like
a higher compression level has a non-linear effect on read/write performance.
Maybe I'm not understanding what you meant by 'raise "better"'.

While I expect to see a "hump" in *write* performance (because high zlib
compression levels are such cpu hogs), I didn't expect to see one for *read*
performance.  I suppose the read hump could be explained by the shape of the
DAG, as you point out.
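
The hump can be checked mechanically against the figures in the quoted table.
A small sketch, with the numbers copied verbatim from that table; `has_hump`
is a hypothetical helper, and "hump" is taken here to mean that level 6 is
slower than both level 1 and level 9:

```python
# (read, write) timings per zlib level, copied from the quoted table.
TIMINGS = {
    "mercurial": {1: (0.170159, 53.250304),
                  6: (0.182820, 56.264320),
                  9: (0.189219, 56.293612)},
    "pypy": {1: (2.679217, 460.721984),
             6: (2.768691, 467.537158),
             9: (2.763495, 476.589918)},
    "netbeans": {1: (122.477027, 520.560316),
                 6: (139.876147, 715.930400),
                 9: (141.620281, 678.297064)},
    "mozilla": {1: (147.867662, 751.263721),
                6: (170.572118, 987.056093),
                9: (163.622338, 739.803002)},
}


def has_hump(repo, column):
    """True when the level 6 timing exceeds both level 1 and level 9."""
    idx = {"read": 0, "write": 1}[column]
    low, mid, high = (TIMINGS[repo][level][idx] for level in (1, 6, 9))
    return mid > low and mid > high


for repo in TIMINGS:
    print(repo,
          "read hump:", has_hump(repo, "read"),
          "write hump:", has_hump(repo, "write"))
```

This only compares three data points per repository, so it says nothing about
why the shape appears; it just makes the non-linearity easy to see.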

> for the large repositories in our tested setting. Maybe because the last 5000
> delta chain end up having a very different shape in this specific spot? Or maybe
> because of a more general trend of better delta chains thanks to the smaller
> chunk and snapshot.
> 
> This series does not intend to change the default compression level. However,
> these result call for a deeper analysis of this performance difference in the
> future.
> 
> Full data
> =========
> 
> repo       level  .hg/store size  00manifest.d        read       write
> 
> mercurial      1      49,402,813     5,963,475    0.170159   53.250304
> mercurial      6      47,197,397     5,875,730    0.182820   56.264320
> mercurial      9      47,121,596     5,849,781    0.189219   56.293612
> 
> pypy           1     370,830,572    28,462,425    2.679217  460.721984
> pypy           6     340,112,317    27,648,747    2.768691  467.537158
> pypy           9     338,360,736    27,639,003    2.763495  476.589918
> 
> netbeans       1   1,281,847,810   165,495,457  122.477027  520.560316
> netbeans       6   1,205,284,353   159,161,207  139.876147  715.930400
> netbeans       9   1,197,135,671   155,034,586  141.620281  678.297064
> 
> mozilla        1   2,775,497,186   298,527,987  147.867662  751.263721
> mozilla        6   2,596,856,420   286,597,671  170.572118  987.056093
> mozilla        9   2,587,542,494   287,018,264  163.622338  739.803002
...
> diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt
> --- a/mercurial/help/config.txt
> +++ b/mercurial/help/config.txt
> @@ -1881,6 +1881,11 @@ category impact performance and reposito
>  This option is enabled by default. When disabled, it also disables the
>  related ``storage.revlog.reuse-external-delta-parent`` option.
>  
> +``revlog.zlib.level``
> +Zlib compression level used when storing data into the repository. Accepted
> +Value range from 1 (lowest compression) to 9 (highest compression). Zlib
> +default value is 6.

I know this is very unlikely to change, but does it make sense to say what
an external library's defaults are?
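
One way to avoid restating an external library's default is to defer to it at
run time.  A minimal sketch using Python's standard `zlib` module;
`compress_chunk` is a hypothetical helper, not Mercurial code, and the 1 to 9
range is taken from the quoted help text:

```python
import zlib


def compress_chunk(data, level=None):
    """Compress with an explicit level, or defer to the library default."""
    if level is None:
        # Z_DEFAULT_COMPRESSION lets zlib choose, so no number is hard-coded.
        level = zlib.Z_DEFAULT_COMPRESSION
    elif not 1 <= level <= 9:
        raise ValueError("zlib level must be between 1 and 9")
    return zlib.compress(data, level)


sample = b"revlog chunk " * 1000
for lvl in (None, 1, 9):
    print(lvl, len(compress_chunk(sample, lvl)))
```

Written this way, the help text could say "the zlib default" without naming a
value that belongs to another project.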


Thanks for doing this! :)

Jeff.

-- 
Reality is merely an illusion, albeit a very persistent one.
- Albert Einstein


Re: [PATCH 1 of 8 "] util: extract compression code in `mercurial.utils.compression`

2019-04-02 Thread Josef 'Jeff' Sipek
On Sun, Mar 31, 2019 at 17:36:17 +0200, Pierre-Yves David wrote:
...
> diff --git a/mercurial/util.py b/mercurial/utils/compression.py
> copy from mercurial/util.py
> copy to mercurial/utils/compression.py
> --- a/mercurial/util.py
> +++ b/mercurial/utils/compression.py
> @@ -1,1555 +1,37 @@
> -# util.py - Mercurial utility functions and platform specific implementations
> -#
> -#  Copyright 2005 K. Thananchayan 
> -#  Copyright 2005-2007 Matt Mackall 
> -#  Copyright 2006 Vadim Gelfer 
> +# util.py - Mercurial utility functions for compression

Should this say compression.py instead?

Jeff.

-- 
I think there is a world market for maybe five computers.
- Thomas Watson, chairman of IBM, 1943.


Re: [PATCH] uncommit: abort if an explicitly given file is uncommittable

2019-03-30 Thread Josef 'Jeff' Sipek
On Sat, Mar 30, 2019 at 00:25:48 -0400, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison 
> # Date 1553910795 14400
> #  Fri Mar 29 21:53:15 2019 -0400
> # Node ID 25c902e210625bf71459da835ef4e5558fbab7dc
> # Parent  eec20025ada33889233e553c5825aac36b708f6c
> uncommit: abort if an explicitly given file is uncommittable

Shouldn't this technically say "... is non-uncommittable" or something like
that?  :)

Jeff.

> 
> I've gotten burned several times by this in the last few days.  The former tests
> look simple enough, but if a good file and a bad file are given, the bad files
> are silently ignored.  Some commands like `forget` will warn about bogus files,
> but that would likely get lost in the noise of an interactive uncommit.  The
> commit command aborts if a bad file is given, so this seems more consistent for
> commands that alter the repository.
> 
> diff --git a/hgext/uncommit.py b/hgext/uncommit.py
> --- a/hgext/uncommit.py
> +++ b/hgext/uncommit.py
> @@ -133,8 +133,28 @@ def uncommit(ui, repo, *pats, **opts):
>          if len(old.parents()) > 1:
>              raise error.Abort(_("cannot uncommit merge changeset"))
>  
> +        match = scmutil.match(old, pats, opts)
> +
> +        # Check all explicitly given files; abort if there's a problem.
> +        if match.files():
> +            s = old.status(old.p1(), match, listclean=True)
> +            eligible = set(s.added) | set(s.modified) | set(s.removed)
> +
> +            for f in match.files():
> +                if f not in eligible:
> +                    if f in s.clean:
> +                        hint = _(
> +                            b"file was not changed in working directory parent")
> +                    elif repo.wvfs.exists(f):
> +                        hint = _(
> +                            b"file was untracked in working directory parent")
> +                    else:
> +                        hint = _(b"file does not exist")
> +
> +                    raise error.Abort(_(b'cannot uncommit "%s"')
> +                                      % scmutil.getuipathfn(repo)(f), hint=hint)
> +
>          with repo.transaction('uncommit'):
> -            match = scmutil.match(old, pats, opts)
>              keepcommit = pats
>              if not keepcommit:
>                  if opts.get('keep') is not None:
> diff --git a/tests/test-uncommit.t b/tests/test-uncommit.t
> --- a/tests/test-uncommit.t
> +++ b/tests/test-uncommit.t
> @@ -102,14 +102,16 @@ Recommit
>$ hg heads -T '{rev}:{node} {desc}'
>5:0c07a3ccda771b25f1cb1edbd02e683723344ef1 new change abcde (no-eol)
>  
> -Uncommit of non-existent and unchanged files has no effect
> +Uncommit of non-existent and unchanged files aborts
>$ hg uncommit nothinghere
> -  nothing to uncommit
> -  [1]
> +  abort: cannot uncommit "nothinghere"
> +  (file does not exist)
> +  [255]
>$ hg status
>$ hg uncommit file-abc
> -  nothing to uncommit
> -  [1]
> +  abort: cannot uncommit "file-abc"
> +  (file was not changed in working directory parent)
> +  [255]
>$ hg status
>  
>  Try partial uncommit, also moves bookmark
> @@ -513,3 +515,12 @@ Copy a->b1 and a->b2, then rename b1->c 
>date:Thu Jan 01 00:00:00 1970 +
>summary: add a
>
> +Removes can be uncommitted
> +
> +  $ hg ci -m 'modified b'
> +  $ hg rm b
> +  $ hg ci -m 'remove b'
> +  $ hg uncommit b
> +  note: keeping empty commit
> +  $ hg status
> +  R b

-- 
Computer Science is no more about computers than astronomy is about
telescopes.
- Edsger Dijkstra


Re: D6140: revset: add new contiguous(x) function for "x::x"

2019-03-20 Thread Josef 'Jeff' Sipek
On Wed, Mar 20, 2019 at 17:28:46 +, martinvonz (Martin von Zweigbergk) wrote:
> martinvonz added a comment.
> 
> 
>   In https://phab.mercurial-scm.org/D6140#89770, @martinvonz wrote:
>   
>   > Josef 'Jeff' Sipek  sent this to mercurial-devel. I'm adding it here for reference.
>   >
>   > >   "x::x" is a useful trick for making a range contiguous, but it gets
>   > >   annoying if "x" is a long expression. Let's provide a simple function
>   > >   that helps with that. It also makes it the trick more discoverable.
>   >
>   > ...
>   >
>   > > +@predicate('contiguous(set)', safe=True, takeorder=True)
>   > >  +def contiguous(repo, subset, x, order):
>   > >  +"""Changesets that have both ancestors and descendants in the set. This
>   > >  +effectively fills in gaps in the set to make it contiguous, without adding
>   > >  +new common ancestors or common descendants.
>   > >  +
>   > >  + "contiguous(x)" is identical to "x::x".
>   >
>   > I read this doc string and the patch intro several times, and every time I
>   >  concluded that this function was useless.  Only after reading some of the
>   >  other replies, did I realize that "x" here can be a set.
>   
>   
>   The docstring does say "in the set" :)

Technically true :)

> But I agree that it's not very
>   clear. I copied the pattern from other functions. I would probably have
>   said "in the input set" otherwise.  Do you think that would have been
>   clearer?

That would make it clearer.  My brain connected the word set with the whole
expression "x::x" (which is *obviously* a set), not with the input - even
though the string in the @predicate does say that the input is a set.

> We could make that change to all the existing cases of plain
>   "set" referring to the input.

In general, I'm always for docs being accessible.  If a doc makes the user
feel like they need a degree in mathematics, the doc is bad.  With that
said, I have not looked at the other doc strings so I don't know where they
place on the good/bad scale.

Jeff.

> 
> REPOSITORY
>   rHG Mercurial
> 
> REVISION DETAIL
>   https://phab.mercurial-scm.org/D6140
> 
> To: martinvonz, #hg-reviewers
> Cc: mharbison72, yuja, av6, spectral, gracinet, marmoute, mercurial-devel

-- 
I have always wished for my computer to be as easy to use as my telephone;
my wish has come true because I can no longer figure out how to use my
telephone.
- Bjarne Stroustrup


Re: D6140: revset: add new contiguous(x) function for "x::x"

2019-03-20 Thread Josef 'Jeff' Sipek
On Fri, Mar 15, 2019 at 05:27:46 +, martinvonz (Martin von Zweigbergk) wrote:
...
>   "x::x" is a useful trick for making a range contiguous, but it gets
>   annoying if "x" is a long expression. Let's provide a simple function
>   that helps with that. It also makes it the trick more discoverable.
...
> +@predicate('contiguous(set)', safe=True, takeorder=True)
> +def contiguous(repo, subset, x, order):
> +"""Changesets that have both ancestors and descendants in the set. This
> +effectively fills in gaps in the set to make it contiguous, without adding
> +new common ancestors or common descendants.
> +
> + "contiguous(x)" is identical to "x::x".

I read this doc string and the patch intro several times, and every time I
concluded that this function was useless.  Only after reading some of the
other replies, did I realize that "x" here can be a set.

Therefore, technically, the above is correct.  Practically, if you are a
mere mortal and you aren't used to very advanced revsets, the doc string
will make you go "huh?".  I don't have a good suggestion how to improve it,
I just thought I'd point out that it's a bit...opaque to more novice users.
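
A toy model may make the set behaviour less opaque.  This is only a sketch on
a made-up six-revision DAG, treating "x::x" as "ancestors of x" intersected
with "descendants of x" (both inclusive); none of these helpers are
Mercurial's real revset code:

```python
# Hypothetical history: revision -> list of parent revisions.
# 0 - 1 - 2 - 3 - 5
#      \         /
#       4 ------
PARENTS = {0: [], 1: [0], 2: [1], 3: [2], 4: [1], 5: [3, 4]}


def ancestors(revs):
    """Revisions reachable by following parents, including revs themselves."""
    seen = set()
    stack = list(revs)
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(PARENTS[rev])
    return seen


def descendants(revs):
    """Revisions that have at least one of revs among their ancestors."""
    revs = set(revs)
    return {rev for rev in PARENTS if ancestors([rev]) & revs}


def contiguous(revs):
    """Toy equivalent of the revset "x::x" for a set x."""
    return ancestors(revs) & descendants(revs)


print(sorted(contiguous({2})))     # a single revision gains nothing
print(sorted(contiguous({1, 3})))  # the gap at revision 2 is filled in
print(sorted(contiguous({2, 4})))  # unrelated branches gain nothing
```

With a single revision as input the function returns that revision unchanged,
which is why the docstring reads as a no-op until "x" is understood as a set.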

I agree that the name isn't the best, but I'll stay away from this bike shed
:)

Jeff.

> +"""
> +return dagrange(repo, subset, x, x, order)
> +
>  @predicate('converted([id])', safe=True)
>  def converted(repo, subset, x):
>  """Changesets converted from the given identifier in the old repository 
> if
> 
> 
> 
> To: martinvonz, #hg-reviewers
> Cc: mercurial-devel

-- 
Only two things are infinite, the universe and human stupidity, and I'm not
sure about the former.
- Albert Einstein


Re: [PATCH 1 of 2] template: add CBOR output format

2019-03-10 Thread Josef 'Jeff' Sipek
On Sun, Mar 10, 2019 at 14:56:48 +0900, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara 
> # Date 1552190244 -32400
> #  Sun Mar 10 12:57:24 2019 +0900
> # Node ID 343027851edc0337e4058feb6f51d67dc584bc0b
> # Parent  9d4ae5044b4c96bdfb2bbb33fa696908b664a1d7
> template: add CBOR output format
> 
> The whole output is wrapped as an array just like the other serialization
> formats. It's an indefinite-length array since the size is unknown while
> encoding. Maybe we can add 'cbor-stream' (and 'pickle-stream') as needed.

FWIW, cbor sequences are what I think you have in mind for cbor-stream:

https://mailarchive.ietf.org/arch/msg/cbor/3MMQdOMd6ESrMQPFPzSORlsLfiY
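
To make the difference concrete: an indefinite-length array wraps the items
between a start byte (0x9f) and a break byte (0xff), while a CBOR sequence is
just the encoded items back to back.  A self-contained sketch that hand-encodes
only small unsigned integers (0 to 23, which CBOR stores in a single byte); the
helper names are made up and this does not use Mercurial's cborutil:

```python
BEGIN_INDEFINITE_ARRAY = b"\x9f"  # major type 4, indefinite length
BREAK = b"\xff"                   # terminates an indefinite-length item


def encode_small_uint(value):
    """Encode an unsigned integer in the range 0..23 as one CBOR byte."""
    if not 0 <= value <= 23:
        raise ValueError("this sketch only handles 0..23")
    return bytes([value])


def as_indefinite_array(items):
    """Wrap the encoded items in an indefinite-length array."""
    body = b"".join(encode_small_uint(i) for i in items)
    return BEGIN_INDEFINITE_ARRAY + body + BREAK


def as_sequence(items):
    """Concatenate the encoded items with no surrounding framing."""
    return b"".join(encode_small_uint(i) for i in items)


print(as_indefinite_array([1, 2, 3]).hex())
print(as_sequence([1, 2, 3]).hex())
```

A reader of the sequence form can decode one item at a time and stop at end of
input, with no outer array to open or close.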

Jeff.

> 
> diff --git a/mercurial/formatter.py b/mercurial/formatter.py
> --- a/mercurial/formatter.py
> +++ b/mercurial/formatter.py
> @@ -130,6 +130,7 @@ from . import (
>      util,
>  )
>  from .utils import (
> +    cborutil,
>      dateutil,
>      stringutil,
>  )
> @@ -341,6 +342,18 @@ class pickleformatter(baseformatter):
>          baseformatter.end(self)
>          self._out.write(pickle.dumps(self._data))
>  
> +class cborformatter(baseformatter):
> +    '''serialize items as an indefinite-length CBOR array'''
> +    def __init__(self, ui, out, topic, opts):
> +        baseformatter.__init__(self, ui, topic, opts, _nullconverter)
> +        self._out = out
> +        self._out.write(cborutil.BEGIN_INDEFINITE_ARRAY)
> +    def _showitem(self):
> +        self._out.write(b''.join(cborutil.streamencode(self._item)))
> +    def end(self):
> +        baseformatter.end(self)
> +        self._out.write(cborutil.BREAK)
> +
>  class jsonformatter(baseformatter):
>      def __init__(self, ui, out, topic, opts):
>          baseformatter.__init__(self, ui, topic, opts, _nullconverter)
> @@ -617,7 +630,9 @@ class templateresources(templater.resour
>  
>  def formatter(ui, out, topic, opts):
>      template = opts.get("template", "")
> -    if template == "json":
> +    if template == "cbor":
> +        return cborformatter(ui, out, topic, opts)
> +    elif template == "json":
>          return jsonformatter(ui, out, topic, opts)
>      elif template == "pickle":
>          return pickleformatter(ui, out, topic, opts)
> diff --git a/mercurial/help/scripting.txt b/mercurial/help/scripting.txt
> --- a/mercurial/help/scripting.txt
> +++ b/mercurial/help/scripting.txt
> @@ -142,9 +142,11 @@ output containing authors, dates, descri
> using templates to make your life easier.
>  
>  The ``-T/--template`` argument allows specifying pre-defined styles.
> -Mercurial ships with the machine-readable styles ``json`` and ``xml``,
> -which provide JSON and XML output, respectively. These are useful for
> -producing output that is machine readable as-is.
> +Mercurial ships with the machine-readable styles ``cbor``, ``json``,
> +and ``xml``, which provide CBOR, JSON, and XML output, respectively.
> +These are useful for producing output that is machine readable as-is.
> +
> +(Mercurial 5.0 is required for CBOR style.)
>  
>  .. important::
>  
> diff --git a/mercurial/logcmdutil.py b/mercurial/logcmdutil.py
> --- a/mercurial/logcmdutil.py
> +++ b/mercurial/logcmdutil.py
> @@ -542,7 +542,7 @@ def changesetdisplayer(ui, repo, opts, d
>  regular display via changesetprinter() is done.
>  """
>  postargs = (differ, opts, buffered)
> -if opts.get('template') == 'json':
> +if opts.get('template') in {'cbor', 'json'}:
>  fm = ui.formatter('log', opts)
>  return changesetformatter(ui, repo, fm, *postargs)
>  
> diff --git a/tests/test-template-map.t b/tests/test-template-map.t
> --- a/tests/test-template-map.t
> +++ b/tests/test-template-map.t
> @@ -669,6 +669,70 @@ Test xml styles:
>
>  
>  
> +test CBOR style:
> +
> +  $ cat <<'EOF' > "$TESTTMP/decodecborarray.py"
> +  > from __future__ import absolute_import
> +  > from mercurial import pycompat
> +  > from mercurial.utils import (
> +  > cborutil,
> +  > stringutil,
> +  > )
> +  > data = pycompat.stdin.read()
> +  > # our CBOR decoder doesn't support parsing indefinite-length arrays,
> +  > # but the log output is indefinite stream by nature.
> +  > assert data[:1] == cborutil.BEGIN_INDEFINITE_ARRAY
> +  > assert data[-1:] == cborutil.BREAK
> +  > items = cborutil.decodeall(data[1:-1])
> +  > pycompat.stdout.write(stringutil.pprint(items, indent=1) + b'\n')
> +  > EOF
> +
> +  $ hg log -k nosuch -Tcbor | "$PYTHON" "$TESTTMP/decodecborarray.py"
> +  []
> +
> +  $ hg log -qr0:1 -Tcbor | "$PYTHON" "$TESTTMP/decodecborarray.py"
> +  [
> +   {
> +'node': '1e4e1b8f71e05681d422154f5421e385fec3454f',
> +'rev': 0
> +   },
> +   {
> +'node': 'b608e9d1a3f0273ccf70fb85fd6866b3482bf965',
> +'rev': 1
> +   }
> +  ]
> +
> +  $ hg log -vpr . -Tcbor --stat | "$PYTHON" "$TESTTMP/decodecborarray.py"
> +  [
> +   {
> +'bookmarks': [],
> +'branch': 'default',
> +'date': [
> + 1577872860,
> + 0
> +],
> +'desc': 'third',
> 

Re: Stop bugzilla bot from marking issues as "RESOLVED ARCHIVED".

2019-03-01 Thread Josef 'Jeff' Sipek
On Thu, Feb 28, 2019 at 21:33:31 -0500, Augie Fackler wrote:
> On Wed, Feb 27, 2019 at 03:31:11AM +0530, Faheem Mitha wrote:
...
> > practice was a good idea, anyway?
> 
> Without the bot, the bug backlog grows without bound. Nobody actually
> checks to see if old bugs are fixed, and eventually the bugtracker
> becomes a disaster and not useful.

I have to say that at the beginning of this thread, I 100% agreed with
Faheem but you managed to change my mind.

I think that it'd be a *huge* improvement if the message the bot used to
archive the bugs included "kinder" words.  That is, explain to whoever filed
the bug why the bug was closed and that reopening it is ok.  The current
message certainly evokes a "why did I even bother?" feeling which only
discourages reporting of new issues - and some of those issues are probably
serious bugs and not just minor nits.

Jeff.

-- 
It used to be said [...] that AIX looks like one space alien discovered
Unix, and described it to another different space alien who then implemented
AIX. But their universal translators were broken and they'd had to gesture a
lot.
- Paul Tomblin 


Re: [PATCH 3 of 3 STABLE] rust-cpython: raising error.WdirUnsupported

2019-01-24 Thread Josef 'Jeff' Sipek
On Wed, Jan 23, 2019 at 23:23:53 -0500, Georges Racinet wrote:
...
> diff -r a35cfd592a90 -r 47d5458a4c32 tests/test-rust-ancestor.py
> --- a/tests/test-rust-ancestor.py Wed Jan 23 07:47:04 2019 -0500
> +++ b/tests/test-rust-ancestor.py Wed Jan 23 07:49:36 2019 -0500
> @@ -1,6 +1,10 @@
>  from __future__ import absolute_import
>  import sys
>  import unittest
> +from mercurial import (
> +error,
> +node,
> +)

Is there supposed to be whitespace before the ) ?  I'm not all that familiar
with the python coding style.

I know, such important feedback... :)

Jeff.


>  
>  try:
>  from mercurial import rustext
> @@ -153,6 +157,12 @@
>  # rust-cpython issues appropriate str instances for Python 2 and 3
>  self.assertEqual(exc.args, ('ParentOutOfRange', 1))
>  
> +def testwdirunsupported(self):
> +# trying to access ancestors of the working directory raises
> +# WdirUnsupported directly
> +idx = self.parseindex()
> +with self.assertRaises(error.WdirUnsupported):
> +list(AncestorsIterator(idx, [node.wdirrev], -1, False))
>  
>  if __name__ == '__main__':
>  import silenttestrunner

-- 
All parts should go together without forcing.  You must remember that the
parts you are reassembling were disassembled by you.  Therefore, if you
can’t get them together again, there must be a reason.  By all means, do not
use a hammer.
— IBM Manual, 1925


Re: D5269: tests: sniff for /usr/local/bin/gmake and use it in test-fuzz-targets.t

2018-11-14 Thread Josef 'Jeff' Sipek
On Wed, Nov 14, 2018 at 15:13:11 +, durin42 (Augie Fackler) wrote:
> durin42 created this revision.
> Herald added a subscriber: mercurial-devel.
> Herald added a reviewer: hg-reviewers.
> 
> REVISION SUMMARY
>   This isn't as robust as it probably should be, but for now it'll get
>   the job done on the buildbots.
> 
> REPOSITORY
>   rHG Mercurial
> 
> REVISION DETAIL
>   https://phab.mercurial-scm.org/D5269
> 
> AFFECTED FILES
>   tests/test-fuzz-targets.t
> 
> CHANGE DETAILS
> 
> diff --git a/tests/test-fuzz-targets.t b/tests/test-fuzz-targets.t
> --- a/tests/test-fuzz-targets.t
> +++ b/tests/test-fuzz-targets.t
> @@ -2,11 +2,17 @@
>  
>$ cd $TESTDIR/../contrib/fuzz
>  
> +  $ if [ -x /usr/local/bin/gmake ] ; then
> +  > MAKE=gmake

Isn't this assuming that /usr/local/bin is in $PATH?  IOW, shouldn't this
assignment be:

MAKE=/usr/local/bin/gmake

?

Jeff.

> +  > else
> +  > MAKE=make
> +  > fi
> +
>  #if clang-libfuzzer
> -  $ make -s clean all
> +  $ $MAKE -s clean all
>  #endif
>  #if no-clang-libfuzzer clang-6.0
> -  $ make -s clean all CC=clang-6.0 CXX=clang++-6.0
> +  $ $MAKE -s clean all CC=clang-6.0 CXX=clang++-6.0
>  #endif
>  #if no-clang-libfuzzer no-clang-6.0
>$ exit 80
> 
> 
> 
> To: durin42, #hg-reviewers
> Cc: mercurial-devel

-- 
mainframe, n.:
  An obsolete device still used by thousands of obsolete companies serving
  billions of obsolete customers and making huge obsolete profits for their
  obsolete shareholders. And this year's run twice as fast as last year's.


Re: Rust extensions: the next step

2018-10-18 Thread Josef 'Jeff' Sipek
On Thu, Oct 18, 2018 at 12:22:16 +0200, Gregory Szorc wrote:
...
> Something else we may want to consider is a single Python module exposing
> the Rust code instead of N. Rust’s more aggressive cross function
> compilation optimization could result in better performance if everything
> is linked/exposed in a single shared library/module/extension. Maybe this
> is what you are proposing? It is unclear if Rust code is linked into the
> Python extension or loaded from a shared library.

(Warning: I suck at python, am not an expert on rust, but have more
knowledge about ELF linking/loading/etc. than is healthy.)

Isn't there also a distinction between code layout (separate crates) and the
actual binary that cargo/rustc builds?  IOW, the code could (and probably
should) be nicely separated but rustc can combine all the crates' code into
one big binary for loading into python.  Since it would see all the code, it
can do its fancy optimizations without impacting code readability.

Jeff.

-- 
C is quirky, flawed, and an enormous success.
- Dennis M. Ritchie.


Re: Wire protocol futures

2018-09-05 Thread Josef 'Jeff' Sipek
There is a lot of info here, thanks for the write up!

On Fri, Aug 31, 2018 at 15:47:34 -0700, Gregory Szorc wrote:
...
> Assuming you only have primitive data retrieval commands, you are now
> issuing a lot more commands.

While I'm all for allowing simpler servers (and hopefully clients too), I'm
worried about the chattiness of such a protocol - specifically the number of
network round-trips that depend on previous commands completing.

Over the years, I've seen plenty of protocols evolve to reduce chattiness.
For example, NFSv4 added compounds - a way to pack up several RPCs and send
them as a unit, SMB/CIFS reduced the number of RPCs, and so on.  I realize
that both those examples are file systems, but I'd argue that their lessons
apply here as well.
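
A back-of-the-envelope sketch of why dependent round-trips matter.  The
command count and round-trip times below are made-up illustration values, and
`wait_time_ms` is a hypothetical helper; it only models time spent waiting on
the network, ignoring bandwidth and server processing:

```python
def wait_time_ms(commands, rtt_ms, per_round_trip=1):
    """Network wait for `commands` commands that each depend on the last.

    per_round_trip is how many commands can be packed into one request,
    in the spirit of NFSv4 compounds.
    """
    if commands < 0 or rtt_ms < 0 or per_round_trip < 1:
        raise ValueError("invalid arguments")
    round_trips = -(-commands // per_round_trip)  # ceiling division
    return round_trips * rtt_ms


# Assumed workload: 200 dependent commands over links of varying latency.
for rtt in (1, 50, 300):
    unbatched = wait_time_ms(200, rtt)
    batched = wait_time_ms(200, rtt, per_round_trip=20)
    print("rtt=%dms unbatched=%dms batched=%dms" % (rtt, unbatched, batched))
```

On a low-latency link the difference is negligible; on a high-latency link the
unbatched wait grows linearly with the number of dependent commands.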

Somewhat relatedly:  The jmap IETF working group [1] is working on a new way
to access email - ideally replacing IMAP.  The interesting thing here is
that the entire design is visibly targeting high latency links.
(Personally, I think this is because the authors are from Australia and
therefore they are very sensitive to latency.)  I don't know if there are
any lessons in jmap that would apply here, but I would certainly encourage
testing on high-latency & high-bandwidth links if there is any concern of
chattiness in the new protocol.

[1] https://datatracker.ietf.org/group/jmap/about/

...
> At the end of the day, the wire protocol command set will be driven by
> practical needs, not by ivory tower architecting. We'll see what shortcuts
> we need to employ in the name of performance and we'll implement them.

That's good to hear.  I just hope that these "bonus" commands will fit more
or less nicely into the new protocol design.  It'd be rather unfortunate if
in the process of adding these bonus commands you reinvented getbundle.

...
> Since we are effectively talking about a new VCS at the wire protocol
> level, let's talk about other crazy ideas. As Augie likes to say, once we
> decide to incur a backwards compatibility break, we can drive a truck
> through it.
> 
> Let's talk about hashes.
> 
> Mercurial uses SHA-1 for content indexing. We know we want to transition
> off of SHA-1 eventually due to security weaknesses.
...
> In addition, Mercurial has 2 ways to store manifests: flat and tree.
...
> 
> One of the ideas I'm exploring in the new wire protocol is the idea of
> "hash namespaces." Essentially, the server's capabilities will advertise
> which hash flavors are supported. Example hash flavors could be
> "hg-sha1-flat" for flat manifests using SHA-1 and "hg-blake2b-tree" for
> tree manifests using blake2b. When a client makes a request, that request
> will be associated with a "hash namespace" such that any nodes referenced
> by that command are in the requested "hash namespace."

While this idea is intriguing, it also means AFAICT that a changeset no
longer has one globally unique ID.  E.g., consider the world where there
are:

hg-sha256-flat
hg-blake2b-flat

or:

hg-blake2b-flat
hg-blake2b-tree

In both cases, the node id will be 32 bytes/64 hex chars long.  I can no
longer paste at you a hash I see in 'hg log' and (1) know what hash function
generated it, and (2) be certain that you can grep your 'hg log' output for
it and find it.  This whole thing gets even more fun when you share
abbreviated hashes - e.g., abc may be the shortest unique node prefix in
both namespaces, but may map to completely different revisions.
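
A small sketch of the ambiguity, under loud assumptions: node_id() below
is hypothetical and hashes only a payload (real node ids also cover the
parents), and the two namespace names are just the examples from this
mail.

```python
import hashlib


def node_id(namespace, data):
    """Hypothetical 32-byte node id for `data` in the given namespace."""
    if namespace == "hg-sha256-flat":
        return hashlib.sha256(data).hexdigest()
    if namespace == "hg-blake2b-flat":
        return hashlib.blake2b(data, digest_size=32).hexdigest()
    raise ValueError("unknown namespace: %s" % namespace)


def build_index(namespace, revisions):
    """Map node id -> revision number for one namespace."""
    return {node_id(namespace, data): rev
            for rev, data in enumerate(revisions)}


def lookup_prefix(index, prefix):
    """Return the revisions whose node id starts with `prefix`."""
    return sorted(rev for node, rev in index.items()
                  if node.startswith(prefix))


revisions = [b"rev %d" % i for i in range(1000)]
sha_index = build_index("hg-sha256-flat", revisions)
blake_index = build_index("hg-blake2b-flat", revisions)

# Same changeset, two different ids of the same length.
a = node_id("hg-sha256-flat", revisions[0])
b = node_id("hg-blake2b-flat", revisions[0])
print(len(a), len(b), a == b)  # 64 64 False

# A short prefix copied from one namespace may match unrelated
# revisions (or nothing at all) in the other one.
prefix = a[:2]
print(lookup_prefix(sha_index, prefix))
print(lookup_prefix(blake_index, prefix))
```

Nothing in the hex string itself says which of the two indexes it should
be looked up in.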

As a side note, wouldn't it be possible to deal with flat<->tree transitions
by making a "dummy" commit that rewrites the manifest to the new format and
sets some flag in .hg/requires?

Anyway, as intriguing as this idea is, I'm skeptical that the resulting UX
will be good.  It is also possible that I'm not fully understanding your idea
here :)

> This feature, if implemented, would allow a server/repository to index and
> serve data under multiple hashing methodologies simultaneously. For
> example, pushes to the repository would be indexed under SHA-1 flat, SHA-1
> tree, blake2b flat, and blake2b tree. Assuming the server operator opts
> into this feature, new clones would use whatever format is
> supported/recommended at that time. Existing clones would continue to
> receive SHA-1 flat manifests. New clones would receive blake2b tree
> manifests.

See above about UX.

Regardless, it is certainly something to experiment with and either keep or
throw away.

Thanks for all the work you've put in,

Jeff.

-- 
Once you have their hardware. Never give it back.
(The First Rule of Hardware Acquisition)


Re: [PATCH 2 of 2] lazymanifest: don't crash when out of memory (issue5916)

2018-06-14 Thread Josef 'Jeff' Sipek
On Thu, Jun 14, 2018 at 20:14:24 +0900, Yuya Nishihara wrote:
> On Wed, 13 Jun 2018 10:55:12 -0400, Josef 'Jeff' Sipek wrote:
> > # HG changeset patch
> > # User Josef 'Jeff' Sipek 
> > # Date 1528900880 14400
> > #  Wed Jun 13 10:41:20 2018 -0400
> > # Branch stable
> > # Node ID d591c80025ee7316b77235b2d71c4b0f01c03123
> > # Parent  cbb47a946bc0e0346bfc9f9ba505f9475de43606
> > lazymanifest: don't crash when out of memory (issue5916)
> > 
> > self->lines can be NULL if we failed to allocate memory for it.
> > 
> > diff --git a/mercurial/cext/manifest.c b/mercurial/cext/manifest.c
> > --- a/mercurial/cext/manifest.c
> > +++ b/mercurial/cext/manifest.c
> > @@ -185,7 +185,7 @@ static void lazymanifest_dealloc(lazyman
> >  {
> > /* free any extra lines we had to allocate */
> > int i;
> > -   for (i = 0; i < self->numlines; i++) {
> > +   for (i = 0; self->lines && (i < self->numlines); i++) {
> 
> Perhaps realloc_if_full() shouldn't overwrite self->lines and maxlines on
> failure. I think that's the source of the inconsistency.

That is one possible place, but not the one I've hit in production.  I've
seen lazymanifest_copy() fail to allocate memory for ->lines.  I'm not
familiar with all the python goo, but I'm guessing:

1. lazymanifest_copy() is called
2. a new python object is allocated (copy)
3. copy->lines = malloc(...) = NULL because of ENOMEM
4. goto nomem
5. Py_XDECREF(copy)
6. lazymanifest_dealloc() is called by python goo

All in all, it looks like there are 4 places that allocate memory for
->lines.

Jeff.

-- 
*NOTE: This message is ROT-13 encrypted twice for extra protection*


[PATCH 2 of 2] lazymanifest: don't crash when out of memory (issue5916)

2018-06-13 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1528900880 14400
#  Wed Jun 13 10:41:20 2018 -0400
# Branch stable
# Node ID d591c80025ee7316b77235b2d71c4b0f01c03123
# Parent  cbb47a946bc0e0346bfc9f9ba505f9475de43606
lazymanifest: don't crash when out of memory (issue5916)

self->lines can be NULL if we failed to allocate memory for it.

diff --git a/mercurial/cext/manifest.c b/mercurial/cext/manifest.c
--- a/mercurial/cext/manifest.c
+++ b/mercurial/cext/manifest.c
@@ -185,7 +185,7 @@ static void lazymanifest_dealloc(lazyman
 {
/* free any extra lines we had to allocate */
int i;
-   for (i = 0; i < self->numlines; i++) {
+   for (i = 0; self->lines && (i < self->numlines); i++) {
if (self->lines[i].from_malloc) {
free(self->lines[i].start);
}



[PATCH 1 of 2] cext: stop worrying and love the free(NULL)

2018-06-13 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek 
# Date 1528900659 14400
#  Wed Jun 13 10:37:39 2018 -0400
# Branch stable
# Node ID cbb47a946bc0e0346bfc9f9ba505f9475de43606
# Parent  3c84493556db3bffcff2fa2f24bb6738dde9fc58
cext: stop worrying and love the free(NULL)

There is no need to check for a NULL pointer before calling free since
free(NULL) is defined by C standards as a no-op.  Lots of software relies on
this behavior so it is completely safe to call even on the most obscure of
systems.

diff --git a/mercurial/cext/bdiff.c b/mercurial/cext/bdiff.c
--- a/mercurial/cext/bdiff.c
+++ b/mercurial/cext/bdiff.c
@@ -155,12 +155,8 @@ cleanup:
PyEval_RestoreThread(_save);
PyBuffer_Release();
PyBuffer_Release();
-   if (al) {
-   free(al);
-   }
-   if (bl) {
-   free(bl);
-   }
+   free(al);
+   free(bl);
if (l.next) {
bdiff_freehunks(l.next);
}
diff --git a/mercurial/cext/manifest.c b/mercurial/cext/manifest.c
--- a/mercurial/cext/manifest.c
+++ b/mercurial/cext/manifest.c
@@ -190,10 +190,8 @@ static void lazymanifest_dealloc(lazyman
free(self->lines[i].start);
}
}
-   if (self->lines) {
-   free(self->lines);
-   self->lines = NULL;
-   }
+   free(self->lines);
+   self->lines = NULL;
if (self->pydata) {
Py_DECREF(self->pydata);
self->pydata = NULL;
diff --git a/mercurial/cext/revlog.c b/mercurial/cext/revlog.c
--- a/mercurial/cext/revlog.c
+++ b/mercurial/cext/revlog.c
@@ -319,10 +319,8 @@ static void _index_clearcaches(indexObje
PyMem_Free(self->offsets);
self->offsets = NULL;
}
-   if (self->nt) {
-   free(self->nt);
-   self->nt = NULL;
-   }
+   free(self->nt);
+   self->nt = NULL;
Py_CLEAR(self->headrevs);
 }
 



Re: D3667: graft: fix the help text to say `graft reapplies previous options`

2018-06-12 Thread Josef 'Jeff' Sipek
On Tue, Jun 12, 2018 at 12:27:04 +, pulkit (Pulkit Goyal) wrote:
> This revision was automatically updated to reflect the committed changes.
> Closed by commit rHGad4c5c0af6b5: graft: fix the help text to say `graft 
> reapplies previous options` (authored by pulkit, committed by ).
> 
> REPOSITORY
>   rHG Mercurial
> 
> CHANGES SINCE LAST UPDATE
>   https://phab.mercurial-scm.org/D3667?vs=9014&id=9021
> 
> REVISION DETAIL
>   https://phab.mercurial-scm.org/D3667
> 
> AFFECTED FILES
>   mercurial/commands.py
> 
> CHANGE DETAILS
> 
> diff --git a/mercurial/commands.py b/mercurial/commands.py
> --- a/mercurial/commands.py
> +++ b/mercurial/commands.py
> @@ -2161,10 +2161,7 @@
>  Once all conflicts are addressed, the graft process can be
>  continued with the -c/--continue option.
>  
> -.. note::
> -
> -   The -c/--continue option does not reapply earlier options, except
> -   for --force, --user and --date.
> +The -c/--continue option does reapply all the earlier options.

Minor nit:

s/does reapply/reapplies/

Jeff.

>  
>  .. container:: verbose
>  
> 
> 
> 
> To: pulkit, #hg-reviewers
> Cc: mercurial-devel
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

-- 
Si hoc legere scis nimium eruditionis habes.


Re: D3177: wireproto: crude support for version 2 HTTP peer

2018-04-07 Thread Josef 'Jeff' Sipek
On Sat, Apr 07, 2018 at 05:53:50 +, indygreg (Gregory Szorc) wrote:
> indygreg created this revision.
> Herald added a subscriber: mercurial-devel.
> Herald added a reviewer: hg-reviewers.
> 
> REVISION SUMMARY
>   As part of implementing the server-side bits of the wire protocol
>   command handlers for version 2, we want a way to easily test those

This seems a bit confusing... is this wireproto2 or HTTP2?

(There are places in this diff with similar ambiguity.)

Jeff.

>   commands. Currently, we use the "httprequest" action of `hg
>   debugwireproto`. But this requires explicitly specifying the HTTP
>   request headers, low-level frame details, and the data structure
>   to encode with CBOR. That's a lot of boilerplate and a lot of it can
>   change as the wire protocol evolves.
>   
>   `hg debugwireproto` has a mechanism to issue commands via the peer
>   interface. That is *much* easier to use and we prefer to test with
>   that going forward.
>   
>   This commit implements enough parts of the peer API to send basic
>   requests via the HTTP version 2 transport.
>   
>   The peer code is super hacky. Again, the goal is to facilitate
>   server testing, not robustly implement a client. The client code
>   will receive love at a later time.
> 
> REPOSITORY
>   rHG Mercurial
> 
> REVISION DETAIL
>   https://phab.mercurial-scm.org/D3177
> 
> AFFECTED FILES
>   mercurial/debugcommands.py
>   mercurial/httppeer.py
>   tests/test-http-api-httpv2.t
>   tests/test-http-protocol.t
>   tests/wireprotohelpers.sh
> 
> CHANGE DETAILS
> 
> diff --git a/tests/wireprotohelpers.sh b/tests/wireprotohelpers.sh
> --- a/tests/wireprotohelpers.sh
> +++ b/tests/wireprotohelpers.sh
> @@ -5,6 +5,10 @@
>hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT/
>  }
>  
> +sendhttpv2peer() {
> +  hg --verbose debugwireproto --peer http2 http://$LOCALIP:$HGPORT/

-- 
Only two things are infinite, the universe and human stupidity, and I'm not
sure about the former.
- Albert Einstein


Re: D2948: wireproto: syntax for encoding CBOR into frames

2018-03-29 Thread Josef 'Jeff' Sipek
On Wed, Mar 28, 2018 at 22:06:17 +, indygreg (Gregory Szorc) wrote:
> indygreg updated this revision to Diff 7352.
> 
> REPOSITORY
>   rHG Mercurial
> 
> CHANGES SINCE LAST UPDATE
>   https://phab.mercurial-scm.org/D2948?vs=7327&id=7352
> 
> REVISION DETAIL
>   https://phab.mercurial-scm.org/D2948
> 
> AFFECTED FILES
>   mercurial/debugcommands.py
>   mercurial/utils/stringutil.py
>   mercurial/wireprotoframing.py
>   tests/test-wireproto-serverreactor.py
> 
> CHANGE DETAILS
> 
...
> diff --git a/mercurial/wireprotoframing.py b/mercurial/wireprotoframing.py
> --- a/mercurial/wireprotoframing.py
> +++ b/mercurial/wireprotoframing.py
> @@ -16,6 +16,7 @@
>  from .i18n import _
>  from .thirdparty import (
>  attr,
> +cbor,
>  )
>  from . import (
>  error,
> @@ -156,6 +157,9 @@
>  def makeframefromhumanstring(s):
>  """Create a frame from a human readable string
>  
> +DANGER: NOT SAFE TO USE WITH UNTRUSTED INPUT BECAUSE OF POTENTIAL
> +eval() USAGE. DO NOT USE IN CORE.
> +
>  Strings have the form:
>  
>   
> @@ -169,6 +173,11 @@
>  named constant.
>  
>  Flags can be delimited by `|` to bitwise OR them together.
> +
> +If the payload begins with ``cbor:``, the following string will be
> +evaluated as Python code and the resulting object will be fed into
> +a CBOR encoder. Otherwise, the payload is interpreted as a Python
> +byte string literal.
>  """

Um, why?  Why not just *always* wrap byte strings in CBOR?  The overhead
will be 1-9 bytes depending on the length (short strings will use 1-2 bytes
of overhead), and then there's no need to prefix anything with "cbor:".  Any
string <4GB in size will incur at most 5 bytes of overhead, which is (IMO)
acceptable.
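
For reference, the overhead figures can be worked out from the CBOR
encoding rules for definite-length byte strings (RFC 7049).  The function
name below is made up for illustration; it computes the header size only
and does not encode anything.

```python
def cbor_bytestring_overhead(length):
    """Header bytes CBOR puts in front of a definite-length byte string
    holding `length` bytes of payload."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length < 24:
        return 1  # length fits in the initial byte
    if length < 2 ** 8:
        return 2  # initial byte + 1-byte length
    if length < 2 ** 16:
        return 3  # initial byte + 2-byte length
    if length < 2 ** 32:
        return 5  # initial byte + 4-byte length
    if length < 2 ** 64:
        return 9  # initial byte + 8-byte length
    raise ValueError("too long for a definite-length byte string")


for n in (0, 23, 24, 255, 256, 65535, 65536, 2 ** 32 - 1, 2 ** 32):
    print(n, cbor_bytestring_overhead(n))
```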

Jeff.

>  fields = s.split(b' ', 5)
>  requestid, streamid, streamflags, frametype, frameflags, payload = fields
> @@ -196,7 +205,11 @@
>  else:
>  finalflags |= int(flag)
>  
> -payload = stringutil.unescapestr(payload)
> +if payload.startswith(b'cbor:'):
> +payload = cbor.dumps(stringutil.evalpython(payload[5:]), 
> canonical=True)
> +
> +else:
> +payload = stringutil.unescapestr(payload)
>  
>  return makeframe(requestid=requestid, streamid=streamid,
>   streamflags=finalstreamflags, typeid=frametype,
> diff --git a/mercurial/utils/stringutil.py b/mercurial/utils/stringutil.py
> --- a/mercurial/utils/stringutil.py
> +++ b/mercurial/utils/stringutil.py
> @@ -9,6 +9,7 @@
>  
>  from __future__ import absolute_import
>  
> +import __future__
>  import codecs
>  import re as remod
>  import textwrap
> @@ -286,3 +287,29 @@
>  If s is not a valid boolean, returns None.
>  """
>  return _booleans.get(s.lower(), None)
> +
> +def evalpython(s):
> +"""Evaluate a string containing a Python expression.
> +
> +THIS FUNCTION IS NOT SAFE TO USE ON UNTRUSTED INPUT. IT'S USE SHOULD BE
> +LIMITED TO DEVELOPER-FACING FUNCTIONALITY.
> +"""
> +globs = {
> +r'__builtins__': {
> +r'None': None,
> +r'False': False,
> +r'True': True,
> +r'int': int,
> +r'set': set,
> +r'tuple': tuple,
> +# Don't need to expose dict and list because we can use
> +# literals.
> +},
> +}
> +
> +# We can't use eval() directly because it inherits compiler
> +# flags from this module and we need unicode literals for Python 3
> +# compatibility.
> +code = compile(s, r'', r'eval',
> +   __future__.unicode_literals.compiler_flag, True)
> +return eval(code, globs, {})
> diff --git a/mercurial/debugcommands.py b/mercurial/debugcommands.py
> --- a/mercurial/debugcommands.py
> +++ b/mercurial/debugcommands.py
> @@ -2785,7 +2785,10 @@
>  or a flag name for stream flags or frame flags, respectively. Values are
>  resolved to integers and then bitwise OR'd together.
>  
> -``payload`` is is evaluated as a Python byte string literal.
> +``payload`` represents the raw frame payload. If it begins with
> +``cbor:``, the following string is evaluated as Python code and the
> +resulting object is fed into a CBOR encoder. Otherwise it is interpreted
> +as a Python byte string literal.
>  """
>  opts = pycompat.byteskwargs(opts)
>  
> 
> 
> 
> To: indygreg, #hg-reviewers
> Cc: mercurial-devel
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

-- 
Si hoc legere scis nimium eruditionis habes.


Re: [PATCH] diff: do not split function name if character encoding is unknown

2018-02-23 Thread Josef 'Jeff' Sipek
On Fri, Feb 23, 2018 at 23:53:18 +0900, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara 
> # Date 1519394998 -32400
> #  Fri Feb 23 23:09:58 2018 +0900
> # Node ID 98cfd7926442dc0a649e0359455ad6962815bd13
> # Parent  b8d0761a85c7421071750de23228415306852d69
> diff: do not split function name if character encoding is unknown
> 
> Only ASCII characters can be split reliably at any byte positions, so let's
> just leave long multi-byte sequence long. It's probably less bad than putting
> an invalid byte sequence into a diff.
> 
> This doesn't try to split the first ASCII slice from multi-byte sequence
> because a combining character may follow.

I like it!

Thanks,

Jeff.

> 
> diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
> --- a/mercurial/mdiff.py
> +++ b/mercurial/mdiff.py
> @@ -13,6 +13,7 @@ import zlib
>  
>  from .i18n import _
>  from . import (
> +encoding,
>  error,
>  policy,
>  pycompat,
> @@ -348,7 +349,11 @@ def _unidiff(t1, t2, opts=defaultopts):
>  # alphanumeric char.
>  for i in xrange(astart - 1, lastpos - 1, -1):
>  if l1[i][0:1].isalnum():
> -func = ' ' + l1[i].rstrip()[:40]
> +func = b' ' + l1[i].rstrip()
> +# split long function name if ASCII. otherwise we have no
> +# idea where the multi-byte boundary is, so just leave it.
> +if encoding.isasciistr(func):
> +func = func[:41]
>  lastfunc[1] = func
>  break
>  # by recording this hunk's starting point as the next place to
> diff --git a/tests/test-diff-unified.t b/tests/test-diff-unified.t
> --- a/tests/test-diff-unified.t
> +++ b/tests/test-diff-unified.t
> @@ -386,3 +386,73 @@ If [diff] git is set to true, but the us
> }
>  
>$ cd ..
> +
> +Long function names should be abbreviated, but multi-byte character shouldn't
> +be broken up
> +
> +  $ hg init longfunc
> +  $ cd longfunc
> +
> +  >>> with open('a', 'wb') as f:
> +  ... f.write(b'a' * 39 + b'bb' + b'\n')
> +  ... f.write(b' .\n' * 3)
> +  ... f.write(b' 0 b\n')
> +  ... f.write(b' .\n' * 3)
> +  ... f.write(b'a' * 39 + b'\xc3\xa0' + b'\n')
> +  ... f.write(b' .\n' * 3)
> +  ... f.write(b' 0 a with grave (single code point)\n')
> +  ... f.write(b' .\n' * 3)
> +  ... f.write(b'a' * 39 + b'a\xcc\x80' + b'\n')
> +  ... f.write(b' .\n' * 3)
> +  ... f.write(b' 0 a with grave (composition)\n')
> +  ... f.write(b' .\n' * 3)
> +  $ hg ci -qAm0
> +
> +  >>> with open('a', 'wb') as f:
> +  ... f.write(b'a' * 39 + b'bb' + b'\n')
> +  ... f.write(b' .\n' * 3)
> +  ... f.write(b' 1 b\n')
> +  ... f.write(b' .\n' * 3)
> +  ... f.write(b'a' * 39 + b'\xc3\xa0' + b'\n')
> +  ... f.write(b' .\n' * 3)
> +  ... f.write(b' 1 a with grave (single code point)\n')
> +  ... f.write(b' .\n' * 3)
> +  ... f.write(b'a' * 39 + b'a\xcc\x80' + b'\n')
> +  ... f.write(b' .\n' * 3)
> +  ... f.write(b' 1 a with grave (composition)\n')
> +  ... f.write(b' .\n' * 3)
> +  $ hg ci -m1
> +
> +  $ hg diff -c1 --nodates --show-function
> +  diff -r 3e92dd6fa812 -r a256341606cb a
> +  --- a/a
> +  +++ b/a
> +  @@ -2,7 +2,7 @@ aaab
> +.
> +.
> +.
> +  - 0 b
> +  + 1 b
> +.
> +.
> +.
> +  @@ -10,7 +10,7 @@ aaa\xc3\xa0 (esc)
> +.
> +.
> +.
> +  - 0 a with grave (single code point)
> +  + 1 a with grave (single code point)
> +.
> +.
> +.
> +  @@ -18,7 +18,7 @@ \xcc\x80 (esc)
> +.
> +.
> +.
> +  - 0 a with grave (composition)
> +  + 1 a with grave (composition)
> +.
> +.
> +.
> +
> +  $ cd ..

-- 
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like
that.
- Linus Torvalds


Re: [PATCH] mdiff: split on unicode character boundaries when shortening function name

2018-02-22 Thread Josef 'Jeff' Sipek
On Fri, Feb 23, 2018 at 01:06:28 +0900, Yuya Nishihara wrote:
> On Thu, 22 Feb 2018 10:01:00 -0500, Josef 'Jeff' Sipek wrote:
...
> > Yeah... I thought that might be an issue.  The code in the 'except' is meant
> > as best-effort -
> 
> Ok, I didn't notice that. It's indeed better to catch the UnicodeError.
> 
> That said, UTF-8 is well designed encoding, we can easily find the nearest
> multi-byte boundary by looking back a couple of bytes.
> 
> https://en.wikipedia.org/wiki/UTF-8#Description

Right, but isn't this code required to handle any-to-any situation?  That
is, the versioned data can be in any encoding, and the terminal can be in
any encoding.  Currently, the code "handles" it by just copying bytes.  This
obviously breaks down the moment multi-byte characters show up.

UTF-8 being resilient is a good thing, but IMO that justifies leaving the
code alone.

I don't know if there is some weird variable length encoding (other than
UTF-8) out there that hg needs to handle.

> > if there is any UTF-8 issue decoding/encoding, just fall
> > back to previous method.  That of course wouldn't help if the input happened
> > to be valid UTF-8 but wasn't actually UTF-8.
> > 
> > I had to do the encode step, otherwise I got a giant stack trace saying that
> > unicode strings cannot be encoded using ascii encoder.
> > (Leaving it un-encoded would also mean that this for loop would output
> > either a unicode string or a raw string - which seems unclean.)
> > 
> > I'm not really sure how to proceed.  Most UTF-8 decoders should handle the
> > illegal byte sequence ok, but it still feels wrong to let it make a mess of
> > valid data.  The answer might be to just ignore this issue.  :|
> 
> As an old Linux user, I would say yeah, don't bother about non-ascii
> characters, it's just bytes. Alternatively, maybe we could take it as a
> UTF-8 sequence and find a possible boundary, but I'm not sure if it's a
> good idea.

As in: implement a UTF-8 decoder to "seek" to the right place?  Eh.
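
For what it's worth, the "look back a couple of bytes" idea does not need
a full decoder.  A minimal sketch, assuming the input really is UTF-8
(which is the whole problem) and using a made-up function name:

```python
def truncate_utf8(data, limit):
    """Cut `data` to at most `limit` bytes without splitting a UTF-8
    sequence.  Nothing is decoded; only bit patterns are inspected."""
    if len(data) <= limit:
        return data
    end = limit
    # Continuation bytes look like 0b10xxxxxx.  If the first dropped
    # byte is one, the cut is inside a character, so back up until the
    # cut sits in front of that character's lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


s = b"ab" + "\u308b".encode("utf-8")   # b'ab\xe3\x82\x8b'
print(truncate_utf8(s, 3))             # b'ab'
print(truncate_utf8(s, 5))             # b'ab\xe3\x82\x8b'
```

This keeps code points whole but knows nothing about combining
characters, and on data in some other multi-byte encoding it will happily
cut in the wrong place.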

I'm looking forward to the day when everything is only Unicode, but that'll
be a while...

Jeff.

-- 
Only two things are infinite, the universe and human stupidity, and I'm not
sure about the former.
- Albert Einstein


Re: [PATCH] mdiff: split on unicode character boundaries when shortening function name

2018-02-22 Thread Josef 'Jeff' Sipek
On Thu, Feb 22, 2018 at 21:01:44 +0900, Yuya Nishihara wrote:
> On Thu, 22 Feb 2018 09:01:14 +0100, Denis Laxalde wrote:
> > Josef 'Jeff' Sipek wrote:
> > > # HG changeset patch
> > > # User Josef 'Jeff' Sipek <jef...@josefsipek.net>
> > > # Date 1519251311 18000
> > > #  Wed Feb 21 17:15:11 2018 -0500
> > > # Node ID b99df94fdd4813e0ce538a8caa682802da4a6cb2
> > > # Parent  106872aa15af9919220705ed72c78459774e1575
> > > mdiff: split on unicode character boundaries when shortening function name
> > > 
> > > Splitting the raw bytes may lead to truncating the string in the middle of a
> > > UTF-8 character which would lead to the generated diff containing an invalid
> > > byte sequence even though the original data had none.  For example, the
> > > Unicode codepoint U+308B (る) gets represented as \xe3\x82\x8b in UTF-8.
> > > Before this change a diff on i18n/ja.po would yield:
> > > 
> > >   @@ -28953,7 +28953,7 @@ msgstr "Mercurial と SSH を併用す<82>
> > > 
> > > After this change, the output is cleaner:
> > > 
> > >   @@ -28953,7 +28953,7 @@ msgstr "Mercurial と SSH を併用する場合の注意点:"
> > > 
> > > diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
> > > --- a/mercurial/mdiff.py
> > > +++ b/mercurial/mdiff.py
> > > @@ -348,7 +348,12 @@ def _unidiff(t1, t2, opts=defaultopts):
> > >  # alphanumeric char.
> > >  for i in xrange(astart - 1, lastpos - 1, -1):
> > >  if l1[i][0:1].isalnum():
> > > -func = ' ' + l1[i].rstrip()[:40]
> > > +func = l1[i].rstrip()
> > > +try:
> > > +func = func.decode("utf-8")[:40].encode("utf-8")
> > > +except:
> > > +func = func[:40]
> > 
> > I'd suggest catching exception types explicitly (UnicodeDecodeError and
> > UnicodeEncodeError I guess) and avoid "bare" except.
> > 
> > (No idea if the change itself is correct by the way.)
> 
> Nah, it's wrong to assume external world is always UTF-8. It might be okayish
> to split func as a UTF-8 as best effort, but which shouldn't involve encoding
> conversion.

Yeah... I thought that might be an issue.  The code in the 'except' is meant
as best-effort - if there is any UTF-8 issue decoding/encoding, just fall
back to previous method.  That of course wouldn't help if the input happened
to be valid UTF-8 but wasn't actually UTF-8.
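
Both cases can be shown in a few lines.  This is a Python 3 sketch with a
made-up function name, not the patch itself (the patch targets Python 2
str); it catches the decode error explicitly instead of using a bare
except.

```python
def shorten(func, limit=40):
    """Best effort: truncate by code points if `func` decodes as UTF-8,
    otherwise fall back to truncating the raw bytes."""
    try:
        return func.decode("utf-8")[:limit].encode("utf-8")
    except UnicodeDecodeError:
        return func[:limit]


# Invalid UTF-8 (a lone Latin-1 e-acute, 0xE9): the fallback is taken.
print(shorten(b"caf\xe9", 3))       # b'caf'

# Two Latin-1 characters whose bytes happen to form one valid UTF-8
# character: the decode succeeds and they are counted as one.
latin1 = "\u00c3\u00a9".encode("latin-1")   # b'\xc3\xa9'
print(shorten(b"x" + latin1, 2))    # b'x\xc3\xa9'
```

In the second call a Latin-1 reader sees three characters even though the
limit was two, which is the "valid UTF-8 but not actually UTF-8" case.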

I had to do the encode step, otherwise I got a giant stack trace saying that
unicode strings cannot be encoded using ascii encoder.
(Leaving it un-encoded would also mean that this for loop would output
either a unicode string or a raw string - which seems unclean.)

I'm not really sure how to proceed.  Most UTF-8 decoders should handle the
illegal byte sequence ok, but it still feels wrong to let it make a mess of
valid data.  The answer might be to just ignore this issue.  :|

Jeff.

-- 
All science is either physics or stamp collecting.
- Ernest Rutherford


[PATCH] mdiff: split on unicode character boundaries when shortening function name

2018-02-21 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek <jef...@josefsipek.net>
# Date 1519251311 18000
#  Wed Feb 21 17:15:11 2018 -0500
# Node ID b99df94fdd4813e0ce538a8caa682802da4a6cb2
# Parent  106872aa15af9919220705ed72c78459774e1575
mdiff: split on unicode character boundaries when shortening function name

Splitting the raw bytes may lead to truncating the string in the middle of a
UTF-8 character which would lead to the generated diff containing an invalid
byte sequence even though the original data had none.  For example, the
Unicode codepoint U+308B (る) gets represented as \xe3\x82\x8b in UTF-8.
Before this change a diff on i18n/ja.po would yield:

@@ -28953,7 +28953,7 @@ msgstr "Mercurial と SSH を併用す<82>

After this change, the output is cleaner:

@@ -28953,7 +28953,7 @@ msgstr "Mercurial と SSH を併用する場合の注意点:"

diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
--- a/mercurial/mdiff.py
+++ b/mercurial/mdiff.py
@@ -348,7 +348,12 @@ def _unidiff(t1, t2, opts=defaultopts):
 # alphanumeric char.
 for i in xrange(astart - 1, lastpos - 1, -1):
 if l1[i][0:1].isalnum():
-func = ' ' + l1[i].rstrip()[:40]
+func = l1[i].rstrip()
+try:
+func = func.decode("utf-8")[:40].encode("utf-8")
+except:
+func = func[:40]
+func = ' ' + func
 lastfunc[1] = func
 break
 # by recording this hunk's starting point as the next place to


[PATCH] help: fix wording describing SSH requirements

2018-02-21 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek <jef...@josefsipek.net>
# Date 1519249869 18000
#  Wed Feb 21 16:51:09 2018 -0500
# Node ID 106872aa15af9919220705ed72c78459774e1575
# Parent  c8891cc3fa9ec855a3bdefd3dd759d19927c6b85
help: fix wording describing SSH requirements

diff --git a/i18n/da.po b/i18n/da.po
--- a/i18n/da.po
+++ b/i18n/da.po
@@ -15949,7 +15949,7 @@ msgstr ""
 
 msgid ""
 "- SSH requires an accessible shell account on the destination machine\n"
-"  and a copy of hg in the remote path or specified with as remotecmd.\n"
+"  and a copy of hg in the remote path or specified with remotecmd.\n"
 "- path is relative to the remote user's home directory by default. Use\n"
 "  an extra slash at the start of a path to specify an absolute path::"
 msgstr ""
diff --git a/i18n/de.po b/i18n/de.po
--- a/i18n/de.po
+++ b/i18n/de.po
@@ -20637,7 +20637,7 @@ msgstr "Einige Hinweise zur Nutzung von 
 
 msgid ""
 "- SSH requires an accessible shell account on the destination machine\n"
-"  and a copy of hg in the remote path or specified with as remotecmd.\n"
+"  and a copy of hg in the remote path or specified with remotecmd.\n"
 "- path is relative to the remote user's home directory by default. Use\n"
 "  an extra slash at the start of a path to specify an absolute path::"
 msgstr ""
diff --git a/i18n/el.po b/i18n/el.po
--- a/i18n/el.po
+++ b/i18n/el.po
@@ -631,7 +631,7 @@ msgstr ""
 
 msgid ""
 "- SSH requires an accessible shell account on the destination machine\n"
-"  and a copy of hg in the remote path or specified with as remotecmd.\n"
+"  and a copy of hg in the remote path or specified with remotecmd.\n"
 "- path is relative to the remote user's home directory by default. Use\n"
 "  an extra slash at the start of a path to specify an absolute path::"
 msgstr ""
diff --git a/i18n/fr.po b/i18n/fr.po
--- a/i18n/fr.po
+++ b/i18n/fr.po
@@ -691,7 +691,7 @@ msgstr ""
 
 msgid ""
 "- SSH requires an accessible shell account on the destination machine\n"
-"  and a copy of hg in the remote path or specified with as remotecmd.\n"
+"  and a copy of hg in the remote path or specified with remotecmd.\n"
 "- path is relative to the remote user's home directory by default. Use\n"
 "  an extra slash at the start of a path to specify an absolute path::"
 msgstr ""
diff --git a/i18n/it.po b/i18n/it.po
--- a/i18n/it.po
+++ b/i18n/it.po
@@ -12053,7 +12053,7 @@ msgstr ""
 
 msgid ""
 "- SSH requires an accessible shell account on the destination machine\n"
-"  and a copy of hg in the remote path or specified with as remotecmd.\n"
+"  and a copy of hg in the remote path or specified with remotecmd.\n"
 "- path is relative to the remote user's home directory by default. Use\n"
 "  an extra slash at the start of a path to specify an absolute path::"
 msgstr ""
diff --git a/i18n/ja.po b/i18n/ja.po
--- a/i18n/ja.po
+++ b/i18n/ja.po
@@ -28953,7 +28953,7 @@ msgstr "Mercurial と SSH を併用すã‚
 
 msgid ""
 "- SSH requires an accessible shell account on the destination machine\n"
-"  and a copy of hg in the remote path or specified with as remotecmd.\n"
+"  and a copy of hg in the remote path or specified with remotecmd.\n"
 "- path is relative to the remote user's home directory by default. Use\n"
 "  an extra slash at the start of a path to specify an absolute path::"
 msgstr ""
diff --git a/i18n/pt_BR.po b/i18n/pt_BR.po
--- a/i18n/pt_BR.po
+++ b/i18n/pt_BR.po
@@ -33277,7 +33277,7 @@ msgstr "Algumas notas sobre o uso de SSH
 
 msgid ""
 "- SSH requires an accessible shell account on the destination machine\n"
-"  and a copy of hg in the remote path or specified with as remotecmd.\n"
+"  and a copy of hg in the remote path or specified with remotecmd.\n"
 "- path is relative to the remote user's home directory by default. Use\n"
 "  an extra slash at the start of a path to specify an absolute path::"
 msgstr ""
diff --git a/i18n/ro.po b/i18n/ro.po
--- a/i18n/ro.po
+++ b/i18n/ro.po
@@ -14141,7 +14141,7 @@ msgstr ""
 
 msgid ""
 "- SSH requires an accessible shell account on the destination machine\n"
-"  and a copy of hg in the remote path or specified with as remotecmd.\n"
+"  and a copy of hg in the remote path or specified with remotecmd.\n"
 "- path is relative to the remote user's home directory by default. Use\n"
 "  an extra slash at the start of a path to specify an absolute path::"
 msgstr ""

[PATCH] gpg: print unknown key IDs in their entirety

2018-02-11 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek <jef...@josefsipek.net>
# Date 1518391957 18000
#  Sun Feb 11 18:32:37 2018 -0500
# Node ID 0c3e67adde02590c1d8882ba7050d19ff41ba7ff
# Parent  f91b7f26c68ac87961aa6ef883ba96e5a2822ad3
gpg: print unknown key IDs in their entirety

Shortening the key is nice in theory but it results in ambiguity which can
be exploited.  Therefore, when encountering an unknown key ID we should
print the whole ID returned by gpg.  This may or may not be the whole key,
however it will match the user preference set in gpg configuration.

Furthermore, the key ID shortening had a couple of issues:

(1) it truncated the key ID (dropping the last digit and outputting only 15
hex digits) making it very hard to find the correct key on a key server

(2) since only 15 digits were fed into shortkey(), it always emitted the
ui.debug() warning
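
The two issues above can be reproduced outside of Mercurial with a small
sketch.  The shortkey() body mirrors the helper removed by this patch; the
`debug` callable standing in for ui.debug() and the sample key ID are
assumptions made up for illustration, not output from gpg.

```python
def shortkey(debug, key):
    """Stand-in for the removed hgext/gpg.py helper.

    `debug` is a hypothetical replacement for ui.debug() so the sketch
    runs without a Mercurial ui object.
    """
    if len(key) != 16:
        debug('key ID "%s" format error\n' % key)
        return key

    return key[-8:]


messages = []
keyid = "0123456789ABCDEF"  # hypothetical 16-digit key ID

# Old call site: the [:15] slice drops the last digit before shortkey()
# runs, so the length check fails and the truncated ID is returned as-is.
old = shortkey(messages.append, keyid[:15])

# With all 16 digits, the helper would have returned the last 8 digits.
short = shortkey(messages.append, keyid)
```

In this sketch `old` ends up as the 15-digit string and exactly one debug
message is recorded, which corresponds to issues (1) and (2).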

diff --git a/hgext/gpg.py b/hgext/gpg.py
--- a/hgext/gpg.py
+++ b/hgext/gpg.py
@@ -153,8 +153,7 @@ def getkeys(ui, repo, mygpg, sigdata, co
     # warn for expired key and/or sigs
     for key in keys:
         if key[0] == "ERRSIG":
-            ui.write(_("%s Unknown key ID \"%s\"\n")
-                     % (prefix, shortkey(ui, key[1][:15])))
+            ui.write(_("%s Unknown key ID \"%s\"\n") % (prefix, key[1]))
             continue
         if key[0] == "BADSIG":
             ui.write(_("%s Bad signature from \"%s\"\n") % (prefix, key[2]))
@@ -320,13 +319,6 @@ def _dosign(ui, repo, *revs, **opts):
     except ValueError as inst:
         raise error.Abort(str(inst))
 
-def shortkey(ui, key):
-    if len(key) != 16:
-        ui.debug("key ID \"%s\" format error\n" % key)
-        return key
-
-    return key[-8:]
-
 def node2txt(repo, node, ver):
     """map a manifest into some text"""
     if ver == "0":

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 2 of 3 topic-experiment] tutorial: word wrap long lines

2017-07-09 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek <jef...@josefsipek.net>
# Date 1499599407 -10800
#  Sun Jul 09 14:23:27 2017 +0300
# Branch stable
# Node ID c497d62fdcea9861369a15896a1f59d6f3787765
# Parent  db3830646e34220cfcac0837a33f9a8503dea5d2
tutorial: word wrap long lines

diff --git a/tests/test-topic-tutorial.t b/tests/test-topic-tutorial.t
--- a/tests/test-topic-tutorial.t
+++ b/tests/test-topic-tutorial.t
@@ -57,7 +57,8 @@ a topic. Creating a new topic is done us
 
   $ hg topic food
 
-As for named branch, our topic is active but it does not contains any changesets yet::
+As for named branch, our topic is active but it does not contains any
+changesets yet::
 
   $ hg topic
* food
@@ -117,7 +118,8 @@ And future commit will be part of that t
  summary: adding condiments
   
 
-We can get a compact view of the content of our topic using the ``stack`` command::
+We can get a compact view of the content of our topic using the ``stack``
+command::
 
   $ hg stack
   ### topic: food
@@ -133,7 +135,8 @@ The topic desactivate when we update awa
   $ hg topic
  food
 
-Note that ``default`` (name of the branch) now refers to the tipmost changeset of default without a topic::
+Note that ``default`` (name of the branch) now refers to the tipmost
+changeset of default without a topic::
 
   $ hg log --graph
   o  changeset:   2:287de11b401f
@@ -163,7 +166,8 @@ And updating back to the topic reactivat
   $ hg topic
* food
 
-The name used for updating does not affect the activation of the topic, updating to a revision part of a topic will activate it in all cases::
+The name used for updating does not affect the activation of the topic,
+updating to a revision part of a topic will activate it in all cases::
 
   $ hg up default
   1 files updated, 0 files merged, 0 files removed, 0 files unresolved
@@ -190,7 +194,8 @@ The name used for updating does not affe
   $ hg commit -A -m "Adding clothes"
   $ cd ../client
 
-Topic will also affect rebase and merge destination. Let's pull the latest update from the main server::
+Topic will also affect rebase and merge destination. Let's pull the latest
+update from the main server::
 
   $ hg pull
   pulling from $TESTTMP/server (glob)
@@ -226,7 +231,8 @@ Topic will also affect rebase and merge 
  summary: Shopping list
   
 
-The topic head will not be considered when merge from the new head of the branch::
+The topic head will not be considered when merge from the new head of the
+branch::
 
   $ hg up default
   1 files updated, 0 files merged, 0 files removed, 0 files unresolved
@@ -367,7 +373,8 @@ The information ``hg stack`` command ada
   t1: Adding hammer
 ^ adding fruits
 
-They are seen as independant branch by Mercurial. No rebase or merge betwen them will be attempted by default::
+They are seen as independant branch by Mercurial. No rebase or merge betwen
+them will be attempted by default::
 
   $ hg rebase
   nothing to rebase
@@ -400,7 +407,8 @@ Lets see what other people did in the me
   added 2 changesets with 2 changes to 1 files (+1 heads)
   (run 'hg heads' to see heads)
 
-There is new changes! We can simply use ``hg rebase`` to update our changeset on top of the latest::
+There is new changes! We can simply use ``hg rebase`` to update our
+changeset on top of the latest::
 
   $ hg rebase
   rebasing 6:183984ef46d1 "Adding hammer"
@@ -411,13 +419,15 @@ There is new changes! We can simply use 
   rebasing 8:34255b455dac "Adding drill"
   merging shopping
 
-But what about the other topic? You can use 'hg topic --verbose' to see information about them::
+But what about the other topic? You can use 'hg topic --verbose' to see
+information about them::
 
   $ hg topic --verbose
  drinks (on branch: default, 2 changesets, 2 behind)
* tools  (on branch: default, 3 changesets)
 
-The "2 behind" is telling you that there is 2 new changesets on the named branch of the topic. You need to merge or rebase to incorporate them.
+The "2 behind" is telling you that there is 2 new changesets on the named
+branch of the topic. You need to merge or rebase to incorporate them.
 
 Pushing that topic would create a new heads will be prevented::
 
@@ -429,7 +439,8 @@ Pushing that topic would create a new he
   [255]
 
 
-Even after a rebase Pushing all active topics at the same time will complains about the multiple heads it would create on that branch::
+Even after a rebase Pushing all active topics at the same time will
+complains about the multiple heads it would create on that branch::
 
   $ hg rebase -b drinks
   rebasing 9:8dfa45bd5e0c "Adding apple juice"
@@ -445,7 +456,8 @@ Even after a rebase Pushing all active t
   (merge or see 'hg help push' for details about pushing new heads)
   [255]
 
-Publishing only one of them is allowed (as long as it does not create a new branch head has we just saw in the previous case)::
+Publishing only o

[PATCH 3 of 3 topic-experiment] tutorial: fix grammar and spelling

2017-07-09 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek <jef...@josefsipek.net>
# Date 1499601692 -10800
#  Sun Jul 09 15:01:32 2017 +0300
# Branch stable
# Node ID d3297cb2c810432906d8cf4b45e412f2f87241b0
# Parent  c497d62fdcea9861369a15896a1f59d6f3787765
tutorial: fix grammar and spelling

diff --git a/tests/test-topic-tutorial.t b/tests/test-topic-tutorial.t
--- a/tests/test-topic-tutorial.t
+++ b/tests/test-topic-tutorial.t
@@ -42,7 +42,7 @@ their unfinished work.
 Topic Basics
 
 
-Let's says we use Mercurial to manage our shopping list::
+Let's say we use Mercurial to manage our shopping list::
 
   $ hg log --graph
   @  changeset:   0:38da43f0a2ea
@@ -52,12 +52,12 @@ Let's says we use Mercurial to manage ou
  summary: Shopping list
   
 
-We are about to do some edition to this list and would like to do them within
-a topic. Creating a new topic is done using the ``topic`` command::
+We are about to make some additions to this list and would like to do them
+within a topic. Creating a new topic is done using the ``topic`` command::
 
   $ hg topic food
 
-As for named branch, our topic is active but it does not contains any
+Much like a named branch, our topic is active but it does not contain any
 changesets yet::
 
   $ hg topic
@@ -95,7 +95,7 @@ Our next commit will be part of the acti
  summary: adding condiments
   
 
-And future commit will be part of that topic too::
+And future commits will be part of that topic too::
 
   $ cat >> shopping << EOF
   > Bananas
@@ -128,7 +128,7 @@ command::
   t1: adding condiments
 ^ Shopping list
 
-The topic desactivate when we update away from it::
+The topic deactivates when we update away from it::
 
   $ hg up default
   1 files updated, 0 files merged, 0 files removed, 0 files unresolved
@@ -158,7 +158,7 @@ changeset of default without a topic::
  summary: Shopping list
   
 
-And updating back to the topic reactivate it::
+And updating back to the topic reactivates it::
 
   $ hg up food
   switching to topic food
@@ -166,8 +166,8 @@ And updating back to the topic reactivat
   $ hg topic
* food
 
-The name used for updating does not affect the activation of the topic,
-updating to a revision part of a topic will activate it in all cases::
+Updating to any changeset that is part of a topic activates the topic
+regardless of how the revision was specified::
 
   $ hg up default
   1 files updated, 0 files merged, 0 files removed, 0 files unresolved
@@ -194,8 +194,8 @@ updating to a revision part of a topic w
   $ hg commit -A -m "Adding clothes"
   $ cd ../client
 
-Topic will also affect rebase and merge destination. Let's pull the latest
-update from the main server::
+The topic will also affect the rebase and the merge destinations. Let's pull
+the latest update from the main server::
 
   $ hg pull
   pulling from $TESTTMP/server (glob)
@@ -231,7 +231,7 @@ update from the main server::
  summary: Shopping list
   
 
-The topic head will not be considered when merge from the new head of the
+The topic head will not be considered when merging from the new head of the
 branch::
 
   $ hg up default
@@ -278,7 +278,7 @@ But the topic will see that branch head 
  summary: Shopping list
   
 
-The topic information will fade out when we publish the changesets::
+The topic information will disappear when we publish the changesets::
 
   $ hg topic
* food
@@ -321,12 +321,12 @@ The topic information will fade out when
 Working with Multiple Topics
 
 
-In the above example, topic are not bring much benefit since you only have one
-line of developement. Topic start to be more useful when you have to work on
-multiple features are the same time.
+In the above example, topics do not bring much benefit since you only have one
+line of development. Topics start to be more useful when you have to work on
+multiple features at the same time.
 
 We might go shopping in a hardware store in the same go, so let's add some
-tools to the shopping list withing a new topic::
+tools to the shopping list within a new topic::
 
   $ hg topic tools
   $ echo hammer >> shopping
@@ -336,9 +336,9 @@ tools to the shopping list withing a new
   $ echo drill >> shopping
   $ hg ci -m 'Adding drill'
 
-But are not sure to actually go in the hardward store, so in the meantime, we
-want to extend the list with drinks. We go back to the official default branch
-and start a new topic::
+But we are not sure we will actually go to the hardware store, so in the
+meantime, we want to extend the list with drinks. We go back to the official
+default branch and start a new topic::
 
   $ hg up default
   1 files updated, 0 files merged, 0 files removed, 0 files unresolved
@@ -354,7 +354,7 @@ We now have two topics::
* drinks
  tools
 
-The information ``hg stack`` command adapt to the active topic::
+The information displayed by ``hg stack`` adapts to the 

[PATCH 1 of 3 topic-experiment] tutorial: use rm instead of 'hg rm' for an untracked temporary file

2017-07-09 Thread Josef 'Jeff' Sipek
# HG changeset patch
# User Josef 'Jeff' Sipek <jef...@josefsipek.net>
# Date 1499599224 -10800
#  Sun Jul 09 14:20:24 2017 +0300
# Branch stable
# Node ID db3830646e34220cfcac0837a33f9a8503dea5d2
# Parent  61e73c8fe169717105e832b23086683848a9ef53
tutorial: use rm instead of 'hg rm' for an untracked temporary file

diff --git a/tests/test-topic-tutorial.t b/tests/test-topic-tutorial.t
--- a/tests/test-topic-tutorial.t
+++ b/tests/test-topic-tutorial.t
@@ -385,9 +385,7 @@ They are seen as independant branch by M
   $ echo 'Coat' > shopping
   $ echo 'Shoes' >> shopping
   $ cat foo >> shopping
-  $ hg rm foo
-  not removing foo: file is untracked
-  [1]
+  $ rm foo
   $ hg ci -m 'add a pair of shoes'
   $ cd ../client
 


Re: [PATCH 3 of 3] color: also enable by default on windowso

2017-04-17 Thread Josef 'Jeff' Sipek
On Sun, Apr 16, 2017 at 02:55:27 +0200, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David 
> # Date 1492302848 -7200
> #  Sun Apr 16 02:34:08 2017 +0200
> # Node ID 9d4f7de0a25c91103f6419c4675bcb21ad7c5098
> # Parent  7ec1415a4de91a5c56e549cd4a40cb8e9411f8d9
> # EXP-Topic color
> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
> #  hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 9d4f7de0a25c
> color: also enable by default on windowso

Typo in "windows".

Jeff.

> 
> I've not found anything related to color + windows on the bug tracker. So I'm
> suggesting we get bolder and turn it on for windows too in the release
> candidate. We can always backout that changeset if we find serious issue on
> windows.
> 
> diff --git a/mercurial/color.py b/mercurial/color.py
> --- a/mercurial/color.py
> +++ b/mercurial/color.py
> @@ -45,7 +45,7 @@ except ImportError:
>      curses = None
>      _baseterminfoparams = {}
>  
> -_enabledbydefault = pycompat.osname != 'nt'
> +_enabledbydefault = True
>  
>  # start and stop parameters for effects
>  _effects = {

-- 
I think there is a world market for maybe five computers.
- Thomas Watson, chairman of IBM, 1943.