How can you end up with a discrepancy between branchmap and heads?

2024-05-04 Thread Mike Hommey
Hi,

I'm trying to reproduce a peculiarity of the (former) pypy mercurial
repository (https://foss.heptapod.net/pypy/pypy), but I can't manage to
reproduce it locally, even if I stream-clone the original repository (with
hg --config experimental.evolution=true clone --stream $url).

So, what's peculiar about the repository is that the following changesets
aren't listed in its heads (https://foss.heptapod.net/pypy/pypy?cmd=heads):
  96b962f458200a4757ea4a5ddf00b0ef73f267a5
  dd73cf12b2d2d1b0c8420fe5d64b3a372db20a29
  9787f52124b1bf6086b1286bb16fcbd2cca4ed80
  d5ab4175e409e094208483118daa5a7bffd37141
  2eb83a0fb3e9c1b5712aea9d767c3a775b3aaf85

but they appear in the branchmap
(https://foss.heptapod.net/pypy/pypy?cmd=branchmap).

If you clone the repository, you don't end up with them locally, but if you
hg pull -r $sha1, you can get them.

So far, I've ruled out obsolescence markers and phases as possible
sources. I'm trying to generate a minimal reproducer to write a testcase
for git-cinnabar. I guess, worst case scenario, I could change the listed
heads via an extension? But I'm also curious what's up with the original
repository. Bad cache?

I know of other cases where that can happen, namely, when the head of a
named branch is merged into another branch and is thus not a topological
head. But that's not the case here.
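
To make that known case concrete, here is a minimal sketch in Python. The toy
DAG, the revision numbers and the helper names are all made up for
illustration, and this is not how Mercurial computes either list internally.
It only shows how a revision can be a branch head without being a topological
head.

```python
# Toy DAG: rev -> (parents, branch name). All values are made up.
REVS = {
    0: ((), "default"),
    1: ((0,), "feature"),    # last commit on the "feature" named branch
    2: ((0, 1), "default"),  # merge of "feature" into "default"
}

def topological_heads(revs):
    """Revisions that no other revision has as a parent."""
    has_child = {p for parents, _ in revs.values() for p in parents}
    return sorted(r for r in revs if r not in has_child)

def branch_heads(revs):
    """Per branch, revisions with no child on the same branch."""
    result = {}
    for rev, (_, branch) in revs.items():
        same_branch_child = any(
            rev in parents and child_branch == branch
            for parents, child_branch in revs.values()
        )
        if not same_branch_child:
            result.setdefault(branch, []).append(rev)
    return result

print(topological_heads(REVS))  # [2]
print(branch_heads(REVS))       # {'feature': [1], 'default': [2]}
```

Revision 1 shows up as the head of "feature" but is absent from the
topological heads because the merge (revision 2) is its child.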

Mike
___
Mercurial-devel mailing list
Mercurial-devel@lists.mercurial-scm.org
https://lists.mercurial-scm.org/mailman/listinfo/mercurial-devel


Outdated homepage and repository

2024-05-04 Thread Mike Hommey
Hi,

As of writing, the www.mercurial-scm.org homepage still offers version
6.6.3 for download, while the latest release is 6.7.2. The latter is
listed on https://www.mercurial-scm.org/downloads, though (I guess it could
be related to mac and windows builds not being available?).

Similarly, the main repository (https://repo.mercurial-scm.org/hg/) is
outdated. Well, it's not /really/ outdated, but when you `hg clone` it,
your work tree ends up on a changeset from two years ago (290c29df1915)
because that's where the @ bookmark is pointing. You should probably either
update the bookmark, or remove it.

While I'm here, the mailing list archives are returning 403 Forbidden
errors (they were working a couple days ago).

Mike



D3964: macosx: fixing macOS version generation after db9d1dd01bf0

2018-07-18 Thread glandium (Mike Hommey)
glandium added a comment.


  This is python code, why is it parsing the version file instead of importing it?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3964

To: rdamazio, #hg-reviewers
Cc: glandium, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3958: Allow to run setup.py with python 3 without a mercurial checkout

2018-07-16 Thread glandium (Mike Hommey)
glandium created this revision.
glandium added a reviewer: indygreg.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Some people may want to test mercurial in a python 3 environment through e.g.
  pip, in which case setup.py doesn't run in a mercurial checkout, so the hack
  in setup.py that allows python 3 only from a source checkout doesn't kick in.
  
  This change allows a manual override with the HGPYTHON3 environment variable.
  
  Additionally, when for some reason the version is unknown (for crazy people
  like me, who have a git checkout of the mercurial repo), the version variable
  ends up being a unicode string, which fails the `isinstance(version, bytes)`
  assertion. So fix that at the same time.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3958

AFFECTED FILES
  setup.py

CHANGE DETAILS

diff --git a/setup.py b/setup.py
--- a/setup.py
+++ b/setup.py
@@ -74,7 +74,7 @@
     badpython = True
 
     # Allow Python 3 from source checkouts.
-    if os.path.isdir('.hg'):
+    if os.path.isdir('.hg') or 'HGPYTHON3' in os.environ:
         badpython = False
 
     if badpython:
@@ -369,7 +369,7 @@
     from mercurial import __version__
     version = __version__.version
 except ImportError:
-    version = 'unknown'
+    version = b'unknown'
 finally:
     if oldpolicy is None:
         del os.environ['HGMODULEPOLICY']



To: glandium, indygreg, #hg-reviewers
Cc: mercurial-devel


D2057: rust implementation of hg status

2018-03-07 Thread glandium (Mike Hommey)
glandium added a comment.


  Doesn't mononoke have code to read revlogs already?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: glandium, krbullock, indygreg, durin42, kevincox, mercurial-devel


D1846: rust: avoid redundant 'static lifetime

2018-01-11 Thread glandium (Mike Hommey)
glandium added a comment.


  > but it's okay to depend on backports or rustup to /build/ packages
  
  The only exception where it's okay is, essentially, Firefox: a backport
  of the rust compiler lands in stable about once a year, and even then it
  doesn't replace the version in stable. See the gcc-mozilla package in Debian
  wheezy.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1846

To: indygreg, #hg-reviewers, durin42
Cc: glandium, durin42, yuja, mercurial-devel


D1846: rust: avoid redundant 'static lifetime

2018-01-11 Thread glandium (Mike Hommey)
glandium added a comment.


  >   To my understanding, as long as we're only using the stable channel, we 
should be fine for the binaries we're building being packageable even on 
slower-moving distros like Debian.
  
  Slow-moving distros like Debian don't update the rust compiler. Debian stable 
is stuck on rustc 1.14 until Debian buster (next year?).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1846

To: indygreg, #hg-reviewers, durin42
Cc: glandium, durin42, yuja, mercurial-devel


D329: setup: Fix installing in a mingw environment

2017-08-12 Thread glandium (Mike Hommey)
This revision was automatically updated to reflect the committed changes.
Closed by commit rHG7686cbb0ba41: setup: fix installing in a mingw environment 
(authored by glandium).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D329?vs=751&id=828

REVISION DETAIL
  https://phab.mercurial-scm.org/D329

AFFECTED FILES
  setup.py

CHANGE DETAILS

diff --git a/setup.py b/setup.py
--- a/setup.py
+++ b/setup.py
@@ -784,11 +784,11 @@
     from distutils import cygwinccompiler
 
     # the -mno-cygwin option has been deprecated for years
-    compiler = cygwinccompiler.Mingw32CCompiler
+    mingw32compilerclass = cygwinccompiler.Mingw32CCompiler
 
     class HackedMingw32CCompiler(cygwinccompiler.Mingw32CCompiler):
         def __init__(self, *args, **kwargs):
-            compiler.__init__(self, *args, **kwargs)
+            mingw32compilerclass.__init__(self, *args, **kwargs)
             for i in 'compiler compiler_so linker_exe linker_so'.split():
                 try:
                     getattr(self, i).remove('-mno-cygwin')
@@ -809,11 +809,11 @@
     # effect.
     from distutils import msvccompiler
 
-    compiler = msvccompiler.MSVCCompiler
+    msvccompilerclass = msvccompiler.MSVCCompiler
 
     class HackedMSVCCompiler(msvccompiler.MSVCCompiler):
         def initialize(self):
-            compiler.initialize(self)
+            msvccompilerclass.initialize(self)
             # "warning LNK4197: export 'func' specified multiple times"
             self.ldflags_shared.append('/ignore:4197')
             self.ldflags_shared_debug.append('/ignore:4197')



To: glandium, #hg-reviewers, quark
Cc: durin42, mercurial-devel


D329: setup: Fix installing in a mingw environment

2017-08-11 Thread glandium (Mike Hommey)
glandium added a comment.


  4.3.1 currently can't be installed with MINGW64 python because of this, so
  I'd say yes.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D329

To: glandium, #hg-reviewers, quark
Cc: durin42, mercurial-devel


D330: Backed out changeset c34532365b38

2017-08-10 Thread glandium (Mike Hommey)
This revision was automatically updated to reflect the committed changes.
Closed by commit rHG1814ca418b30: branchmap: revert c34532365b38 for Python 2.7 
compatibility (authored by glandium).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D330?vs=752&id=755

REVISION DETAIL
  https://phab.mercurial-scm.org/D330

AFFECTED FILES
  mercurial/branchmap.py

CHANGE DETAILS

diff --git a/mercurial/branchmap.py b/mercurial/branchmap.py
--- a/mercurial/branchmap.py
+++ b/mercurial/branchmap.py
@@ -406,7 +406,8 @@
 
         # fast path: extract data from cache, use it if node is matching
         reponode = changelog.node(rev)[:_rbcnodelen]
-        cachenode, branchidx = unpack_from(_rbcrecfmt, self._rbcrevs, rbcrevidx)
+        cachenode, branchidx = unpack_from(
+            _rbcrecfmt, util.buffer(self._rbcrevs), rbcrevidx)
         close = bool(branchidx & _rbccloseflag)
         if close:
             branchidx &= _rbcbranchidxmask



To: glandium, #hg-reviewers, quark, indygreg
Cc: indygreg, quark, mercurial-devel


D330: Backed out changeset c34532365b38

2017-08-10 Thread glandium (Mike Hommey)
glandium created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Old versions of python 2.7 don't like that the second argument to
  struct.unpack_from is a bytearray, so the change removing the util.buffer
  around that argument in branchmap broke running on older versions of python
  2.7.
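
  As a rough illustration of the pattern the backout restores (handing
  struct.unpack_from a buffer view instead of the bytearray itself), here is a
  small Python 3 sketch. The record layout, the constants and the helper name
  are assumptions made for this example, and memoryview merely stands in for
  util.buffer; none of this is the actual branchmap code.

```python
import struct

# Assumed record layout for this sketch: a 4-byte node prefix followed by a
# big-endian 32-bit branch index whose top bit marks a closed branch.
RECORD_FMT = ">4sI"
RECORD_SIZE = struct.calcsize(RECORD_FMT)
CLOSE_FLAG = 0x80000000
BRANCH_IDX_MASK = 0x7FFFFFFF

def read_record(cache, index):
    """Decode record number `index` from a bytearray-backed cache."""
    # Unpack through a buffer view rather than from the bytearray directly.
    with memoryview(cache) as view:
        node, branchidx = struct.unpack_from(
            RECORD_FMT, view, index * RECORD_SIZE)
    closed = bool(branchidx & CLOSE_FLAG)
    return node, branchidx & BRANCH_IDX_MASK, closed

cache = bytearray()
cache += struct.pack(RECORD_FMT, b"\x12\x34\x56\x78", 3)
cache += struct.pack(RECORD_FMT, b"\xab\xcd\xef\x01", 5 | CLOSE_FLAG)

print(read_record(cache, 1))  # (b'\xab\xcd\xef\x01', 5, True)
```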

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D330

AFFECTED FILES
  mercurial/branchmap.py

CHANGE DETAILS

diff --git a/mercurial/branchmap.py b/mercurial/branchmap.py
--- a/mercurial/branchmap.py
+++ b/mercurial/branchmap.py
@@ -406,7 +406,8 @@
 
         # fast path: extract data from cache, use it if node is matching
         reponode = changelog.node(rev)[:_rbcnodelen]
-        cachenode, branchidx = unpack_from(_rbcrecfmt, self._rbcrevs, rbcrevidx)
+        cachenode, branchidx = unpack_from(
+            _rbcrecfmt, util.buffer(self._rbcrevs), rbcrevidx)
         close = bool(branchidx & _rbccloseflag)
         if close:
             branchidx &= _rbcbranchidxmask



To: glandium, #hg-reviewers
Cc: mercurial-devel


D328: setup: Fix installing in a mingw environment

2017-08-10 Thread glandium (Mike Hommey)
glandium added a comment.


  Sorry, I rebased to stable, and that created a new differential: 
https://phab.mercurial-scm.org/D329, even though I updated the local tag.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D328

To: glandium, #hg-reviewers, quark
Cc: quark, mercurial-devel


D329: setup: Fix installing in a mingw environment

2017-08-10 Thread glandium (Mike Hommey)
glandium created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  The addition, in 
https://phab.mercurial-scm.org/rHG9a4adc76c88a1a217983f051766b3009c0bca3aa, of 
a hack for the MSVC compiler class was
  overwriting the original class for the Mingw32CCompiler class, leading to an
  error when the HackedMingw32CCompiler is instantiated.
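
  The failure mode can be sketched in a few lines of Python. The class names
  below are stand-ins invented for this example (they are not the distutils
  classes), and the exact exception differs from what setup.py hit, but the
  mechanism is the same: a module-level name is looked up when the method
  runs, so rebinding it later changes which class gets called.

```python
class MingwBase:
    def __init__(self):
        self.flags = ["-mno-cygwin"]

class MsvcBase:
    def initialize(self):
        self.ldflags = []

# Buggy pattern: one module-level name reused for two different classes.
compiler = MingwBase

class HackedMingw(MingwBase):
    def __init__(self):
        # "compiler" is resolved when this runs, not when the class is defined.
        compiler.__init__(self)
        self.flags.remove("-mno-cygwin")

compiler = MsvcBase  # the later hack rebinds the shared name

try:
    HackedMingw()
    outcome = "ok"
except (TypeError, AttributeError) as exc:
    outcome = type(exc).__name__
print(outcome)  # AttributeError: MingwBase.__init__ never ran

# Fixed pattern: each hack keeps its own, distinct name.
mingw_base = MingwBase

class FixedMingw(MingwBase):
    def __init__(self):
        mingw_base.__init__(self)
        self.flags.remove("-mno-cygwin")

print(FixedMingw().flags)  # []
```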

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D329

AFFECTED FILES
  setup.py

CHANGE DETAILS

diff --git a/setup.py b/setup.py
--- a/setup.py
+++ b/setup.py
@@ -784,11 +784,11 @@
     from distutils import cygwinccompiler
 
     # the -mno-cygwin option has been deprecated for years
-    compiler = cygwinccompiler.Mingw32CCompiler
+    mingw32compilerclass = cygwinccompiler.Mingw32CCompiler
 
     class HackedMingw32CCompiler(cygwinccompiler.Mingw32CCompiler):
         def __init__(self, *args, **kwargs):
-            compiler.__init__(self, *args, **kwargs)
+            mingw32compilerclass.__init__(self, *args, **kwargs)
             for i in 'compiler compiler_so linker_exe linker_so'.split():
                 try:
                     getattr(self, i).remove('-mno-cygwin')
@@ -809,11 +809,11 @@
     # effect.
     from distutils import msvccompiler
 
-    compiler = msvccompiler.MSVCCompiler
+    msvccompilerclass = msvccompiler.MSVCCompiler
 
     class HackedMSVCCompiler(msvccompiler.MSVCCompiler):
         def initialize(self):
-            compiler.initialize(self)
+            msvccompilerclass.initialize(self)
             # "warning LNK4197: export 'func' specified multiple times"
             self.ldflags_shared.append('/ignore:4197')
             self.ldflags_shared_debug.append('/ignore:4197')



To: glandium, #hg-reviewers
Cc: mercurial-devel


D328: setup: Fix installing in a mingw environment

2017-08-10 Thread glandium (Mike Hommey)
glandium created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  The addition, in 
https://phab.mercurial-scm.org/rHG9a4adc76c88a1a217983f051766b3009c0bca3aa, of 
a hack for the MSVC compiler class was
  overwriting the original class for the Mingw32CCompiler class, leading to an
  error when the HackedMingw32CCompiler is instantiated.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D328

AFFECTED FILES
  setup.py

CHANGE DETAILS

diff --git a/setup.py b/setup.py
--- a/setup.py
+++ b/setup.py
@@ -784,11 +784,11 @@
     from distutils import cygwinccompiler
 
     # the -mno-cygwin option has been deprecated for years
-    compiler = cygwinccompiler.Mingw32CCompiler
+    mingw32_compiler_class = cygwinccompiler.Mingw32CCompiler
 
     class HackedMingw32CCompiler(cygwinccompiler.Mingw32CCompiler):
         def __init__(self, *args, **kwargs):
-            compiler.__init__(self, *args, **kwargs)
+            mingw32_compiler_class.__init__(self, *args, **kwargs)
             for i in 'compiler compiler_so linker_exe linker_so'.split():
                 try:
                     getattr(self, i).remove('-mno-cygwin')
@@ -809,11 +809,11 @@
     # effect.
     from distutils import msvccompiler
 
-    compiler = msvccompiler.MSVCCompiler
+    msvc_compiler_class = msvccompiler.MSVCCompiler
 
     class HackedMSVCCompiler(msvccompiler.MSVCCompiler):
         def initialize(self):
-            compiler.initialize(self)
+            msvc_compiler_class.initialize(self)
             # "warning LNK4197: export 'func' specified multiple times"
             self.ldflags_shared.append('/ignore:4197')
             self.ldflags_shared_debug.append('/ignore:4197')



To: glandium, #hg-reviewers
Cc: mercurial-devel


Re: A possible explanation for random "stream ended unexpectedly (got m bytes, expected n)"

2017-03-25 Thread Mike Hommey
On Sat, Mar 25, 2017 at 06:24:26PM -0700, Jun Wu wrote:
> May I ask what version of Python are you using? If it's < 2.7.4, the EINTR
> issue is expected.

2.7.13.

> 
> Excerpts from Mike Hommey's message of 2017-03-26 09:08:25 +0900:
> > On Sat, Mar 25, 2017 at 10:34:02AM -0700, Gregory Szorc wrote:
> > > On Sat, Mar 25, 2017 at 4:19 AM, Mike Hommey <m...@glandium.org> wrote:
> > > 
> > > > Hi,
> > > >
> > > > I don't know about you, but occasionally, I've hit "stream ended
> > > > unexpectedly (got m bytes, expected n)" errors that didn't make sense.
> > > > Retrying would always work.
> > > >
> > > > Recently, I was trying to use signal.setitimer and a signal handler for
> > > > some memory profiling on git-cinnabar, which uses mercurial as a
> > > > library, and got "stream ended unexpectedly (got m bytes, expected n)"
> > > > *very* reproducibly. Like, with an interval timer firing every second,
> > > > it took only a few seconds to hit the error during a clone.
> > > >
> > > > I'm pretty sure this can be reproduced with a similar setup in mercurial
> > > > itself.
> > > >
> > > > Now, the reason this happens in this case is that, the code that fails
> > > > does:
> > > >
> > > > def readexactly(stream, n):
> > > >     '''read n bytes from stream.read and abort if less was available'''
> > > >     s = stream.read(n)
> > > >     if len(s) < n:
> > > >         raise error.Abort(_("stream ended unexpectedly"
> > > >                             " (got %d bytes, expected %d)")
> > > >                           % (len(s), n))
> > > >     return s
> > > >
> > > > ... and thanks to POSIX, interrupted reads can lead to short reads. So,
> > > > you request n bytes, and get less, just because something interrupted
> > > > the process.  The problem then is that python doesn't let you know why
> > > > you just got a short read, and you have to figure that out on your own.
> > > >
> > > > The same kind of problem is also true to some extent on writes.
> > > >
> > > > Now, the problem is that this piece of code is likely the most visible
> > > > place where the issue exists, but there are many other places in the
> > > > mercurial code base that are likely affected.
> > > >
> > > > And while the signal.setitimer case is a corner case (and I ended up
> > > > using a separate thread to work around the problem ; my code wasn't
> > > > interruption safe either anyways), I wonder if those random "stream
> > > > ended unexpectedly (got m bytes, expected n)" errors I was getting under
> > > > normal circumstances are not just a manifestation of the same underlying
> > > > issue, which is that the code doesn't like interrupted reads.
> > > >
> > > > Disclaimer: I'm not going to work on fixing this issue, but I figured
> > > > I'd let you know, in case someone wants to look into it more deeply.
> > > >
> > > 
> > > Thank you for writing this up. This "stream ended unexpectedly" has been a
> > > thorn in my side for a while, as it comes up frequently in Mozilla's CI
> > > with a frequency somewhere between 1 in 100-1000. Even retrying failed
> > > operations multiple times isn't enough to overcome it
> > > 
> > > I have long suspected interrupted system calls as a likely culprit.
> > > However, when I initially investigated this a few months ago, I found that
> > > Python's I/O APIs retry automatically for EINTR. See
> > > https://hg.python.org/cpython/file/54c93e0fe79b/Lib/socket.py#l365  for
> > > example. This /should/ make e.g. socket._fileobject.read() resilient
> > > against signal interruption. (If Python's I/O APIs didn't do this, tons of
> > > programs would break. Also, the semantics of .read() are such that it is
> > > always supposed to retrieve all available bytes until EOF - at least for
> > > some io ABCs. read1() exists to perform at most 1 system call.)
> > 
> > Note that EINTR is not the only way read() can end from interruption:
> > 
> >If a read() is interrupted by a signal before it reads any data, it
> >shall return -1 with errno set to [EINTR].
> > 
> >If a read() is interrupted by a signal after it has successfully read
> >some dat

Re: A possible explanation for random "stream ended unexpectedly (got m bytes, expected n)"

2017-03-25 Thread Mike Hommey
On Sat, Mar 25, 2017 at 10:34:02AM -0700, Gregory Szorc wrote:
> On Sat, Mar 25, 2017 at 4:19 AM, Mike Hommey <m...@glandium.org> wrote:
> 
> > Hi,
> >
> > I don't know about you, but occasionally, I've hit "stream ended
> > unexpectedly (got m bytes, expected n)" errors that didn't make sense.
> > Retrying would always work.
> >
> > Recently, I was trying to use signal.setitimer and a signal handler for
> > some memory profiling on git-cinnabar, which uses mercurial as a
> > library, and got "stream ended unexpectedly (got m bytes, expected n)"
> > *very* reproducibly. Like, with an interval timer firing every second,
> > it took only a few seconds to hit the error during a clone.
> >
> > I'm pretty sure this can be reproduced with a similar setup in mercurial
> > itself.
> >
> > Now, the reason this happens in this case is that, the code that fails
> > does:
> >
> > def readexactly(stream, n):
> >     '''read n bytes from stream.read and abort if less was available'''
> >     s = stream.read(n)
> >     if len(s) < n:
> >         raise error.Abort(_("stream ended unexpectedly"
> >                             " (got %d bytes, expected %d)")
> >                           % (len(s), n))
> >     return s
> >
> > ... and thanks to POSIX, interrupted reads can lead to short reads. So,
> > you request n bytes, and get less, just because something interrupted
> > the process.  The problem then is that python doesn't let you know why
> > you just got a short read, and you have to figure that out on your own.
> >
> > The same kind of problem is also true to some extent on writes.
> >
> > Now, the problem is that this piece of code is likely the most visible
> > place where the issue exists, but there are many other places in the
> > mercurial code base that are likely affected.
> >
> > And while the signal.setitimer case is a corner case (and I ended up
> > using a separate thread to work around the problem ; my code wasn't
> > interruption safe either anyways), I wonder if those random "stream
> > ended unexpectedly (got m bytes, expected n)" errors I was getting under
> > normal circumstances are not just a manifestation of the same underlying
> > issue, which is that the code doesn't like interrupted reads.
> >
> > Disclaimer: I'm not going to work on fixing this issue, but I figured
> > I'd let you know, in case someone wants to look into it more deeply.
> >
> 
> Thank you for writing this up. This "stream ended unexpectedly" has been a
> thorn in my side for a while, as it comes up frequently in Mozilla's CI
> with a frequency somewhere between 1 in 100-1000. Even retrying failed
> operations multiple times isn't enough to overcome it
> 
> I have long suspected interrupted system calls as a likely culprit.
> However, when I initially investigated this a few months ago, I found that
> Python's I/O APIs retry automatically for EINTR. See
> https://hg.python.org/cpython/file/54c93e0fe79b/Lib/socket.py#l365 for
> example. This /should/ make e.g. socket._fileobject.read() resilient
> against signal interruption. (If Python's I/O APIs didn't do this, tons of
> programs would break. Also, the semantics of .read() are such that it is
> always supposed to retrieve all available bytes until EOF - at least for
> some io ABCs. read1() exists to perform at most 1 system call.)

Note that EINTR is not the only way read() can end from interruption:

   If a read() is interrupted by a signal before it reads any data, it
   shall return -1 with errno set to [EINTR].

   If a read() is interrupted by a signal after it has successfully read
   some data, it shall return the number of bytes read.

   From http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

But that's POSIX, Windows is another story. Recv is different too.

On Sat, Mar 25, 2017 at 12:00:42PM -0700, Gregory Szorc wrote:
> Can you please provide more detailed steps to reproduce?
> 
> I added the following code at the top of exchange.pull:
> 
> def sighandler(sig, stack):
>     pass
> 
> import signal
> signal.signal(signal.SIGALRM, sighandler)
> signal.setitimer(signal.ITIMER_REAL, 1.0, 1.0)
> 
> However, I was unable to reproduce the "stream ended unexpectedly" failure
> when cloning a Firefox repo from hg.mozilla.org. And I even tried with the
> interval set to 1ms.

So, I tried to reproduce again with my original testcase, and failed to.
In fact, instead I was getting urllib2.URLError:  errors. Delaying the initial SIGALRM to go past
the start of the HTTP request didn't fai

A possible explanation for random "stream ended unexpectedly (got m bytes, expected n)"

2017-03-25 Thread Mike Hommey
Hi,

I don't know about you, but occasionally, I've hit "stream ended
unexpectedly (got m bytes, expected n)" errors that didn't make sense.
Retrying would always work.

Recently, I was trying to use signal.setitimer and a signal handler for
some memory profiling on git-cinnabar, which uses mercurial as a
library, and got "stream ended unexpectedly (got m bytes, expected n)"
*very* reproducibly. Like, with an interval timer firing every second,
it took only a few seconds to hit the error during a clone.

I'm pretty sure this can be reproduced with a similar setup in mercurial
itself.

Now, the reason this happens in this case is that the code that fails
does:

def readexactly(stream, n):
    '''read n bytes from stream.read and abort if less was available'''
    s = stream.read(n)
    if len(s) < n:
        raise error.Abort(_("stream ended unexpectedly"
                            " (got %d bytes, expected %d)")
                          % (len(s), n))
    return s

... and thanks to POSIX, interrupted reads can lead to short reads. So,
you request n bytes, and get less, just because something interrupted
the process.  The problem then is that python doesn't let you know why
you just got a short read, and you have to figure that out on your own.
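
For illustration only, here is a sketch of what a short-read-tolerant variant
could look like: keep reading until n bytes have arrived and treat only an
empty read as the end of the stream. The names read_exactly and ShortReader
are invented for this example, it raises EOFError rather than error.Abort to
stay self-contained, and it assumes a blocking stream (where an empty read
means EOF). It is not a proposed patch.

```python
import io

def read_exactly(stream, n):
    """Read n bytes, looping over short reads; fail only on a real EOF."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # empty read: the stream really ended
            got = n - remaining
            raise EOFError(
                "stream ended unexpectedly (got %d bytes, expected %d)"
                % (got, n))
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

class ShortReader:
    """Hypothetical stream that returns at most 3 bytes per read() call."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, n):
        return self._buf.read(min(n, 3))

print(read_exactly(ShortReader(b"0123456789"), 8))  # b'01234567'
```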

The same kind of problem is also true to some extent on writes.

Now, the problem is that this piece of code is likely the most visible
place where the issue exists, but there are many other places in the
mercurial code base that are likely affected.

And while the signal.setitimer case is a corner case (and I ended up
using a separate thread to work around the problem; my code wasn't
interruption-safe either anyway), I wonder if those random "stream
ended unexpectedly (got m bytes, expected n)" errors I was getting under
normal circumstances are not just a manifestation of the same underlying
issue, which is that the code doesn't like interrupted reads.

Disclaimer: I'm not going to work on fixing this issue, but I figured
I'd let you know, in case someone wants to look into it more deeply.

Cheers,

Mike


Re: hg-git and round-tripping (and file copies?)

2017-03-16 Thread Mike Hommey
On Thu, Mar 16, 2017 at 01:38:18PM -0700, Gregory Szorc wrote:
> On Thu, Mar 16, 2017 at 1:05 PM, Danek Duvall 
> wrote:
> 
> > In trying to convert
> >
> > https://hg.java.net/hg/solaris-userland~gate
> >
> > to a git repo and back, I'm seeing issues at changeset 34, where the hash
> > changes for reasons I can't see.  If I do a diff of the debug log, I see
> > it's due to the manifest:
> >
> > $ diff -u =(hg log -R userland-more --debug -r 34) =(hg log -R
> > userland-more.hgagain --debug -r 34 | grep -v "^phase:")
> > --- /tmp/zshhHyEIb  2017-03-16 11:37:57.601340643 -0700
> > +++ /tmp/zshlyqHbd  2017-03-16 11:37:57.793642372 -0700
> > @@ -1,12 +1,10 @@
> >  no terminfo entry for sitm
> > -changeset:   34:d20b10eba31725ad8954aa6d20374da512f0e636
> > -tag: build-149
> > +changeset:   34:2ccb817b85926f410df2a6bd23000265805088df
> >  parent:  33:371c8e56136d19872ae7db8d273f9de78c8fa783
> >  parent:  -1:
> > -manifest:34:e031f26e68549dadb3dfb4705d429c75622a58b4
> > +manifest:34:5a12a2a1bf3e7c0f7c30d01bd09a2e37185bcfb6
> >  user:Norm Jacobs 
> >  date:Sun Sep 19 13:50:53 2010 -0700
> > -phase:   public
> >  files:
> > components/Makefile
> > make-rules/prep.mk
> >
> > and if I use debugdata to look at the manifest at changeset 34, I see:
> >
> > $ gdiff -a -u =(hg -R userland-more debugdata -m 34) =(hg -R
> > userland-more.hgagain debugdata -m 34)
> > --- /tmp/zshOdnjza  2017-03-16 11:53:16.971130878 +
> > +++ /tmp/zshzoTzmc  2017-03-16 11:53:17.118194061 +
> > @@ -24,12 +24,12 @@
> >  make-rules/setup.py.mk302733d738cc7c6cceb63457442f24f931867472
> >  make-rules/shared-macros.mk03dd5df583b6e39a17ba66fc6ed6205df7f6be49
> >  tools/Makefilecc964766028e3b963b4a321c88815d211415006b
> > -tools/bass-o-matica618ef38ceda467b9a09680dd8b94debcd303037x
> > +tools/bass-o-matic349f9611499fddf1a110f9488a84fb110c90b7bfx
> >  tools/build-watch.df69b9a2b6a265c06268733430bbf3f9aa7d5e160x
> >  tools/build-watch.pl5e23340c7a84ac555e630a5ccdc28eceda95f4b6x
> >  tools/time.ca0a1f64ff8ac947ce9d045e0448f8ee72f9fd273
> > -tools/userland-fetch851170bb5cebf2648c53d4909eac26ac2055cdd3x
> > -tools/userland-unpack0977e35fa356d4cfab889b93613dc75d90d89b6bx
> > +tools/userland-fetchbae023e70db29fd07f6f989aaa858cfaed09238ax
> > +tools/userland-unpackb3800b9db86df38a644a653b3095805b269b6ac6x
> >  transforms/actuatorsc9d84677229efde5f89b1d985de5cd1b09267b56
> > >  transforms/archive-libraries-drop5b346a0133242f460ff66f6689790da094ce27f6
> >  transforms/comparison-cleanupde1288c586594a171d43a3da5234cb920be408cc
> >
> > Now, those three files were copied in that changeset, but they're not the
> > first to be copied, so it's not that, strictly.  But it is the first
> > changeset in which files were copied without being modified.
> >
> > The index data is off-by-one, if that makes any difference:
> >
> > $ hg -R userland-more debugrevlog -d tools/bass-o-matic
> > > # rev p1rev p2rev start   end deltastart base   p1   p2 rawsize totalsize compression heads chainlen
> > >     0    -1    -1     0  2175          0    0    0    0    6005      6005           2     1        0
> > >     1     0    -1  2175  2228          0    0    0    0    5929     11934           5     1        1
> >
> > $ hg -R userland-more.hgagain debugrevlog -d tools/bass-o-matic
> > > # rev p1rev p2rev start   end deltastart base   p1   p2 rawsize totalsize compression heads chainlen
> > >     0    -1    -1     0  2174          0    0    0    0    6005      6005           2     1        0
> > >     1     0    -1  2174  2227          0    0    0    0    5929     11934           5     1        1
> >
> > Any thoughts on how to further debug this?
> >
> > Or is this just
> >
> > https://bitbucket.org/durin42/hg-git/issues/46

Note that bug is about git->hg conversion where the original repository
is git.

> >
> > and I'm out of luck?
> >
> 
> It is effectively impossible to round-trip between Git and Mercurial when
> file copies are involved. This is because Mercurial's filelog hashes
> include copy metadata and the parent nodes. Git's blob hashes, by contrast,
> are effectively content only. When you convert from Mercurial to Git, it
> will drop copy metadata (because Git doesn't track it explicitly). Then
> when you convert back to Mercurial, the copies have to be detected "just
> right" by hg-git for the hashes to align. Furthermore, the files have to be
> reintroduced in the same order, or the filelog parents may not align and
> the hashes may diverge. If a repo isn't linear, there's a non-zero chance
> of that happening.
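
To illustrate the difference described above, here is a small Python sketch of
the two hashing schemes. The function names, the sample file content and the
copy source path are made up for this example, and the Mercurial side ignores
details such as the escaping applied when a file itself starts with the
metadata marker.

```python
import hashlib

NULL_ID = b"\0" * 20

def git_blob_id(content):
    """Git blob id: SHA-1 over a 'blob <size>' header, a NUL, and the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def hg_file_node(content, p1=NULL_ID, p2=NULL_ID,
                 copy_source=None, copy_rev=None):
    """Mercurial filelog node: SHA-1 over the sorted parents plus stored text.

    When a copy is recorded, the stored text starts with a metadata block
    delimited by b"\\x01\\n" markers.
    """
    text = content
    if copy_source is not None:
        meta = b"copy: %s\ncopyrev: %s\n" % (copy_source, copy_rev)
        text = b"\x01\n" + meta + b"\x01\n" + content
    low, high = sorted([p1, p2])
    return hashlib.sha1(low + high + text).hexdigest()

data = b"#!/bin/sh\n"  # made-up file content
plain = hg_file_node(data)
copied = hg_file_node(data, copy_source=b"tools/old-name", copy_rev=b"0" * 40)

print(git_blob_id(data))  # depends on the content only
print(plain != copied)    # True: copy metadata changes the Mercurial node
```

The same content always maps to one Git blob id, whereas the Mercurial node
also moves with the parents and with the recorded copy source.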

hg-git actually "stores" copy/rename in the commit messages, but that's
assuming the commit was done in mercurial and 

Re: [PATCH RFC] similar: allow similarity detection to use sha256 for digesting file contents

2017-03-01 Thread Mike Hommey
On Wed, Mar 01, 2017 at 04:34:43PM -0800, Gregory Szorc wrote:
> On Wed, Mar 1, 2017 at 7:02 AM, FUJIWARA Katsunori 
> wrote:
> 
> > # HG changeset patch
> > # User FUJIWARA Katsunori 
> > # Date 1488380487 -32400
> > #  Thu Mar 02 00:01:27 2017 +0900
> > # Node ID 018d9759cb93f116007d4640341a82db6cf2d45c
> > # Parent  0bb3089fe73527c64f1afc40b86ecb8dfe7fd7aa
> > similar: allow similarity detection to use sha256 for digesting file
> > contents
> >
> > Before this patch, similarity detection logic (used for addremove and
> > automv) uses SHA-1 digesting. But this cause incorrect rename
> > detection, if:
> >
> >   - removing file A and adding file B occur at same committing, and
> >   - SHA-1 hash values of file A and B are same
> >
> > This may prevent security experts from managing sample files for the
> > SHAttered issue in a Mercurial repository, for example.
> >
> >   https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
> >   https://shattered.it/
> >
> > A hash collision itself isn't so serious for the core repository
> > functionality of Mercurial, though, as described by mpm below.
> >
> >   https://www.mercurial-scm.org/wiki/mpm/SHA1
> >
> > HOW ABOUT:
> >
> >   - which algorithm should we use by default: SHA-1, SHA-256 or SHA-512?
> >
> 
> SHA-512 should be faster than SHA-256 on 64-bit hardware. So, there's
> likely no good reason to use SHA-256 for simple identity checks.
> 
> 
> >
> > ease (handling problematic files safely by default) or
> > performance?
> >
> 
> 
> On my Skylake at 4.0 GHz, SHA-1 is capable of running at ~975 MB/s and
> SHA-512 at ~700 MB/s. Both are fast enough that for simple one-time content
> identity checks, hashing shouldn't be a bottleneck, at least not for most
> repos.
> 
> So I think it is fine to change this function from SHA-1 to SHA-512
> assuming the hashes don't "leak" into storage. If they end up being stored
> or used for something other than identity checks, then we need to bloat
> scope to discuss our general hashing future. And that needs its own thread
> ;)
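
The throughput figures quoted above depend on the machine, so here is a minimal sketch for measuring them locally with Python's hashlib. The function name, buffer size and round count are arbitrary choices for illustration, and the numbers it prints will not match the ones quoted above.

```python
import hashlib
import time


def throughput_mb_s(name: str, size: int = 8 * 1024 * 1024,
                    rounds: int = 5) -> float:
    """Return the best observed hashing throughput in MB/s for one algorithm."""
    data = b"\0" * size
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return size / best / 1e6


if __name__ == "__main__":
    for name in ("sha1", "sha256", "sha512"):
        print(f"{name}: {throughput_mb_s(name):.0f} MB/s")
```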

With hashing, there is *always* the risk of collision. It might be tiny,
but it still exists. Why not just compare the contents when the hashes
match? Then it doesn't really matter what the hash is. The hash is just
a shortcut to avoid comparing full contents in an O(n^2) fashion.

There aren't going to be that many hash matches anyway, so comparing the
contents at that point should not make a significant difference in speed,
but would guarantee that the "similarity" is real.
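
A minimal sketch of that idea, not the code in Mercurial's similar module: bucket the removed files by digest, then confirm each candidate with a byte-for-byte comparison. The function name, the dict-based inputs and the pluggable digest parameter are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def find_identical(removed, added,
                   digest=lambda data: hashlib.sha1(data).digest()):
    """Pair each added file with a removed file that has identical content.

    removed and added map paths to bytes. The digest only selects
    candidates; a full comparison decides, so a collision cannot
    produce a false match.
    """
    by_digest = defaultdict(list)
    for path, data in removed.items():
        by_digest[digest(data)].append(path)

    matches = []
    for new_path, data in added.items():
        for old_path in by_digest.get(digest(data), ()):
            if removed[old_path] == data:  # verify, don't trust the hash
                matches.append((old_path, new_path))
                break
    return matches


if __name__ == "__main__":
    print(find_identical({"a": b"x", "b": b"y"}, {"c": b"x"}))
```

With the verification step in place, even a deliberately bad digest that maps every file to the same value only costs extra comparisons; it cannot report a wrong rename.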

(BTW, interestingly, in terms of similarity detection, while the
SHAttered PDFs are not 100% identical, they are 80%+ similar)

Mike
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 7 of 7 v2] bdiff: give slight preference to removing trailing lines

2016-11-24 Thread Mike Hommey
On Thu, Nov 24, 2016 at 05:52:29PM +, Jun Wu wrote:
> Excerpts from Augie Fackler's message of 2016-11-17 12:42:26 -0500:
> > My own cursory perfbdiff runs suggest this is a perf wash (using
> > `perfbdiff -m 3041e4d59df2` in the mozilla repo). Queued. Thanks!
> 
> I'd mention this series changes the behavior of the diff output. The
> difference was caught by fastannotate test.
> 
> See the below table (old: e1d6aa0e4c3a, new: 8836f13e3c5b):
> 
>    a | b | old | new
>   ---+---+-----+-----
>    a | a |  a  | -a
>    a | z | +z  |  a
>    a | a |  a  | +z
>      |   | -a  |  a
>   ---+---+-----+-----
>    a | a |     a
>    a | a |     a
>    a |   |    -a
> 
> I think we would always prefer putting deletions at the end, to be consistent.

Wouldn't
 a
-a
+z
 a

be preferable to both old and new? That's what plain diff does, by the
way.
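
For what it's worth, all three outputs are valid edit scripts for the same pair of files; they differ only in how the unchanged "a" lines are aligned. A small sketch to check that, with the hunks written as lists of diff lines (the helper name is made up for this example):

```python
def sides(hunk):
    """Rebuild the old and new file from a list of unified-diff body lines."""
    old = [line[1:] for line in hunk if line[0] in " -"]
    new = [line[1:] for line in hunk if line[0] in " +"]
    return old, new


old_bdiff = [" a", "+z", " a", "-a"]   # "old" column of the table
new_bdiff = ["-a", " a", "+z", " a"]   # "new" column of the table
plain_diff = [" a", "-a", "+z", " a"]  # the alternative suggested above

if __name__ == "__main__":
    for hunk in (old_bdiff, new_bdiff, plain_diff):
        print(sides(hunk))
```

Each hunk reconstructs a, a, a on the old side and a, z, a on the new side, so the choice between them is purely about readability.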

Mike


Re: [PATCH 8 of 9 RFC] wireproto: introduce listkeys2 command

2016-08-14 Thread Mike Hommey
On Sun, Aug 14, 2016 at 04:59:50PM -0700, Gregory Szorc wrote:
> On Sun, Aug 14, 2016 at 3:12 PM, Mike Hommey <m...@glandium.org> wrote:
> 
> > On Sun, Aug 14, 2016 at 02:10:07PM -0700, Gregory Szorc wrote:
> > > # HG changeset patch
> > > # User Gregory Szorc <gregory.sz...@gmail.com>
> > > # Date 1471208237 25200
> > > #  Sun Aug 14 13:57:17 2016 -0700
> > > # Node ID d2870bcbc43041909e9f637b294cb889f7ed4933
> > > # Parent  eb2bc1ac7869ad255965d16004524a95cea83c9d
> > > wireproto: introduce listkeys2 command
> > >
> > > The command behaves exactly like "listkeys" except it uses a more
> > > efficient and more robust binary encoding mechanism.
> >
> > Nowhere in the patch queue do I see it mentioned why you want this. I'm
> > not saying that this shouldn't be done, but it's really not clear what the
> > expected benefit is of all this refactoring and this new command.
> 
> 
> I said it concisely in the commit message you just quoted: the wire
> encoding is smaller and is able to represent all binary values. I offer
> more detail at
> https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-August/087243.html.

... and a large part of that message should be in this commit message.
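
To illustrate the "able to represent all binary values" part, here is a minimal sketch. It assumes the existing listkeys response is tab-separated key/value pairs joined by newlines; the length-prefixed framing is a hypothetical stand-in and is not the encoding from the patch.

```python
import struct


def encode_text(pairs):
    """Tab/newline-delimited encoding (assumed shape of the old format)."""
    return b"\n".join(k + b"\t" + v for k, v in pairs)


def decode_text(data):
    if not data:
        return []
    return [tuple(line.split(b"\t", 1)) for line in data.split(b"\n")]


def encode_framed(pairs):
    """Hypothetical binary framing: two 32-bit lengths, then key and value."""
    out = []
    for k, v in pairs:
        out.append(struct.pack(">II", len(k), len(v)))
        out.append(k)
        out.append(v)
    return b"".join(out)


def decode_framed(data):
    pairs, pos = [], 0
    while pos < len(data):
        klen, vlen = struct.unpack_from(">II", data, pos)
        pos += 8
        key = data[pos:pos + klen]
        pos += klen
        value = data[pos:pos + vlen]
        pos += vlen
        pairs.append((key, value))
    return pairs


if __name__ == "__main__":
    pairs = [(b"key", b"val\nue\twith\x00bytes")]
    print(decode_text(encode_text(pairs)))      # value is split apart
    print(decode_framed(encode_framed(pairs)))  # value survives intact
```

A value containing a newline or tab cannot survive the delimited form, while any framing that carries explicit lengths round-trips it unchanged.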

Mike