[PATCH stable v2] worker: avoid potential partial write of pickled data

2022-05-21 Thread Manuel Jacob
# HG changeset patch
# User Manuel Jacob 
# Date 1653184234 -7200
#  Sun May 22 03:50:34 2022 +0200
# Branch stable
# Node ID b475b0ea695438f6b2994eba0ddb3189d8b4fd05
# Parent  477b5145e1a02715f846ce017b460858a58e03b1
# EXP-Topic worker-pickle-fix_partial_write
worker: avoid potential partial write of pickled data

Previously, the code wrote the pickled data using os.write(). However,
os.write() can write less bytes than passed to it. To trigger the problem, the
pickled data had to be larger than 2147479552 bytes on my system.

Instead, open a file object and pass it to pickle.dump(). This also has the
advantage that it doesn’t buffer the whole pickled data in memory.

Note that the opened file must be buffered because pickle doesn’t support
unbuffered streams because unbuffered streams’ write() method might write less
bytes than passed to it (like os.write()) but pickle.dump() relies on that all
bytes are written (see https://github.com/python/cpython/issues/93050).

The side effect of using a file object and a with statement is that wfd is
explicitly closed now while it seems like before it was implicitly closed by
process exit.

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -255,8 +255,10 @@
 os.close(r)
 os.close(w)
 os.close(rfd)
-for result in func(*(staticargs + (pargs,))):
-os.write(wfd, util.pickle.dumps(result))
+with os.fdopen(wfd, 'wb') as wf:
+for result in func(*(staticargs + (pargs,))):
+util.pickle.dump(result, wf)
+wf.flush()
 return 0
 
 ret = scmutil.callcatch(ui, workerfunc)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH stable] worker: avoid potential partial write of pickled data

2022-05-21 Thread Manuel Jacob

On 22/05/2022 05.44, Yuya Nishihara wrote:

On Sun, 22 May 2022 04:34:58 +0200, Manuel Jacob wrote:

# HG changeset patch
# User Manuel Jacob 
# Date 1653184234 -7200
#  Sun May 22 03:50:34 2022 +0200
# Branch stable
# Node ID beebf9c4b8ed6257c8f8bfeb5e9fcae6f54268d7
# Parent  477b5145e1a02715f846ce017b460858a58e03b1
# EXP-Topic worker-pickle-fix_partial_write
worker: avoid potential partial write of pickled data

Previously, the code wrote the pickled data using os.write(). However,
os.write() can write less bytes than passed to it. To trigger the problem, the
pickled data had to be larger than 2147479552 bytes on my system.

Instead, open a file object and pass it to pickle.dump(). This also has the
advantage that it doesn’t buffer the whole pickled data in memory.

Note that the opened file must be buffered because pickle doesn’t support
unbuffered streams because unbuffered streams’ write() method might write less
bytes than passed to it (like os.write()) but pickle.dump() relies on that all
bytes are written (see https://github.com/python/cpython/issues/93050).

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -255,8 +255,10 @@
  os.close(r)
  os.close(w)
  os.close(rfd)
+wf = os.fdopen(wfd, 'wb')
  for result in func(*(staticargs + (pargs,))):
-os.write(wfd, util.pickle.dumps(result))
+util.pickle.dump(result, wf)
+wf.flush()


It's probably better to write "with os.fdopen(wfd, 'wb') as wf:" to clarify
that the wf and wfd are closed there.


wfd was not explicitly closed in the child process before. But I agree 
that this would be a good idea.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 4] worker: explain why pickle reading stream has to be unbuffered

2022-05-21 Thread Yuya Nishihara
On Sun, 22 May 2022 12:03:02 +0900, Yuya Nishihara wrote:
> On Sun, 22 May 2022 02:37:11 +0200, Manuel Jacob wrote:
> > # HG changeset patch
> > # User Manuel Jacob 
> > # Date 1653164539 -7200
> > #  Sat May 21 22:22:19 2022 +0200
> > # Node ID 5dfd26ed9c4eeb51bbf910ca7450bdb65f06af81
> > # Parent  a17ffde1e71b1c9de5899ae8d5078e67c54ef1d3
> > # EXP-Topic worker-improvements
> > worker: explain why pickle reading stream has to be unbuffered  
> 
> Queued the series,

Oops.

Alphare, can you handle this series?
I don't know how to deal with emailed patches these days.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH stable] worker: avoid potential partial write of pickled data

2022-05-21 Thread Yuya Nishihara
On Sun, 22 May 2022 04:34:58 +0200, Manuel Jacob wrote:
> # HG changeset patch
> # User Manuel Jacob 
> # Date 1653184234 -7200
> #  Sun May 22 03:50:34 2022 +0200
> # Branch stable
> # Node ID beebf9c4b8ed6257c8f8bfeb5e9fcae6f54268d7
> # Parent  477b5145e1a02715f846ce017b460858a58e03b1
> # EXP-Topic worker-pickle-fix_partial_write
> worker: avoid potential partial write of pickled data
> 
> Previously, the code wrote the pickled data using os.write(). However,
> os.write() can write less bytes than passed to it. To trigger the problem, the
> pickled data had to be larger than 2147479552 bytes on my system.
> 
> Instead, open a file object and pass it to pickle.dump(). This also has the
> advantage that it doesn’t buffer the whole pickled data in memory.
> 
> Note that the opened file must be buffered because pickle doesn’t support
> unbuffered streams because unbuffered streams’ write() method might write less
> bytes than passed to it (like os.write()) but pickle.dump() relies on that all
> bytes are written (see https://github.com/python/cpython/issues/93050).
> 
> diff --git a/mercurial/worker.py b/mercurial/worker.py
> --- a/mercurial/worker.py
> +++ b/mercurial/worker.py
> @@ -255,8 +255,10 @@
>  os.close(r)
>  os.close(w)
>  os.close(rfd)
> +wf = os.fdopen(wfd, 'wb')
>  for result in func(*(staticargs + (pargs,))):
> -os.write(wfd, util.pickle.dumps(result))
> +util.pickle.dump(result, wf)
> +wf.flush()

It's probably better to write "with os.fdopen(wfd, 'wb') as wf:" to clarify
that the wf and wfd are closed there.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 4] worker: explain why pickle reading stream has to be unbuffered

2022-05-21 Thread Yuya Nishihara
On Sun, 22 May 2022 02:37:11 +0200, Manuel Jacob wrote:
> # HG changeset patch
> # User Manuel Jacob 
> # Date 1653164539 -7200
> #  Sat May 21 22:22:19 2022 +0200
> # Node ID 5dfd26ed9c4eeb51bbf910ca7450bdb65f06af81
> # Parent  a17ffde1e71b1c9de5899ae8d5078e67c54ef1d3
> # EXP-Topic worker-improvements
> worker: explain why pickle reading stream has to be unbuffered

Queued the series, thanks. I didn't know that memoryview can be
a context manager.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH stable] hghave: make black version regex work with newer versions of black

2022-05-21 Thread Yuya Nishihara
On Sun, 22 May 2022 02:43:05 +0200, Joerg Sonnenberger wrote:
> Am Sun, May 22, 2022 at 02:13:03AM +0200 schrieb Manuel Jacob:
> > # HG changeset patch
> > # User Manuel Jacob 
> > # Date 1653176900 -7200
> > #  Sun May 22 01:48:20 2022 +0200
> > # Branch stable
> > # Node ID 29f2716c5c54c7e0f7aa6d91979893f5d2078862
> > # Parent  477b5145e1a02715f846ce017b460858a58e03b1
> > # EXP-Topic black_version_regex
> > hghave: make black version regex work with newer versions of black

This works for me, but I hesitated to queue since test-check-format.t starts
detecting formatting errors with

  % black --version
  black, 22.3.0 (compiled: no)

> > Black commit 117891878e5be4d6b771ae5de299e51b679cea27 (included in black >=
> > 21.11b0) dropped the string "version " from the output of "black 
> > --version". To
> > make the regex work with newer black versions, make matching of "version "
> > optional.  
> 
> I had a patch like this locally, but newer black versions insist on
> incompatible output and that's where I stopped. There is also the issue
> that the regex itself seems wrong, e.g. the unescaped "." in the [].

[.] should be okay. "." has no special meaning in character set.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH stable] worker: avoid potential partial write of pickled data

2022-05-21 Thread Manuel Jacob
# HG changeset patch
# User Manuel Jacob 
# Date 1653184234 -7200
#  Sun May 22 03:50:34 2022 +0200
# Branch stable
# Node ID beebf9c4b8ed6257c8f8bfeb5e9fcae6f54268d7
# Parent  477b5145e1a02715f846ce017b460858a58e03b1
# EXP-Topic worker-pickle-fix_partial_write
worker: avoid potential partial write of pickled data

Previously, the code wrote the pickled data using os.write(). However,
os.write() can write less bytes than passed to it. To trigger the problem, the
pickled data had to be larger than 2147479552 bytes on my system.

Instead, open a file object and pass it to pickle.dump(). This also has the
advantage that it doesn’t buffer the whole pickled data in memory.

Note that the opened file must be buffered because pickle doesn’t support
unbuffered streams because unbuffered streams’ write() method might write less
bytes than passed to it (like os.write()) but pickle.dump() relies on that all
bytes are written (see https://github.com/python/cpython/issues/93050).

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -255,8 +255,10 @@
 os.close(r)
 os.close(w)
 os.close(rfd)
+wf = os.fdopen(wfd, 'wb')
 for result in func(*(staticargs + (pargs,))):
-os.write(wfd, util.pickle.dumps(result))
+util.pickle.dump(result, wf)
+wf.flush()
 return 0
 
 ret = scmutil.callcatch(ui, workerfunc)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH stable] hghave: make black version regex work with newer versions of black

2022-05-21 Thread Manuel Jacob

On 22/05/2022 02.43, Joerg Sonnenberger wrote:

Am Sun, May 22, 2022 at 02:13:03AM +0200 schrieb Manuel Jacob:

# HG changeset patch
# User Manuel Jacob 
# Date 1653176900 -7200
#  Sun May 22 01:48:20 2022 +0200
# Branch stable
# Node ID 29f2716c5c54c7e0f7aa6d91979893f5d2078862
# Parent  477b5145e1a02715f846ce017b460858a58e03b1
# EXP-Topic black_version_regex
hghave: make black version regex work with newer versions of black

Black commit 117891878e5be4d6b771ae5de299e51b679cea27 (included in black >=
21.11b0) dropped the string "version " from the output of "black --version". To
make the regex work with newer black versions, make matching of "version "
optional.


I had a patch like this locally, but newer black versions insist on
incompatible output and that's where I stopped.


It’s right that the test requiring black don’t pass anyway with some 
newer black versions. However, there are black versions with the new 
version string which seem to format the Mercurial source code as it is 
currently (e.g. black 21.11b0). Also, if we want to re-format the source 
code with a newer black version, we would need a change like this anyway.



There is also the issue
that the regex itself seems wrong, e.g. the unescaped "." in the [].


According to the documentation of the re module: “Special characters 
lose their special meaning inside sets.”.



Joerg
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH stable] hghave: make black version regex work with newer versions of black

2022-05-21 Thread Joerg Sonnenberger
Am Sun, May 22, 2022 at 02:13:03AM +0200 schrieb Manuel Jacob:
> # HG changeset patch
> # User Manuel Jacob 
> # Date 1653176900 -7200
> #  Sun May 22 01:48:20 2022 +0200
> # Branch stable
> # Node ID 29f2716c5c54c7e0f7aa6d91979893f5d2078862
> # Parent  477b5145e1a02715f846ce017b460858a58e03b1
> # EXP-Topic black_version_regex
> hghave: make black version regex work with newer versions of black
> 
> Black commit 117891878e5be4d6b771ae5de299e51b679cea27 (included in black >=
> 21.11b0) dropped the string "version " from the output of "black --version". 
> To
> make the regex work with newer black versions, make matching of "version "
> optional.

I had a patch like this locally, but newer black versions insist on
incompatible output and that's where I stopped. There is also the issue
that the regex itself seems wrong, e.g. the unescaped "." in the [].

Joerg
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 4 of 4] worker: implement _blockingreader.readinto() (issue6444)

2022-05-21 Thread Manuel Jacob
# HG changeset patch
# User Manuel Jacob 
# Date 1653171058 -7200
#  Sun May 22 00:10:58 2022 +0200
# Node ID 643d03eb9cdf48513c0bc639915f2c763d0635c5
# Parent  f8b3781ee6a2241fe40d5420f5d0293947f94022
# EXP-Topic worker-improvements
worker: implement _blockingreader.readinto() (issue6444)

The core logic for readinto() was already implemented in read(), so this is
mostly extracting that code into its own method.

Another fix for issue6444 was sent for the stable branch: 
https://foss.heptapod.net/mercurial/mercurial-devel/-/merge_requests/115
That is a minimal fix that implements readinto() only on Python versions that
require readinto() (3.8.0 and 3.8.1), which is the right approach for the
stable branch. However, I think that this patch has its value. It improves
performance in cases when pickle can use readinto(), it reduces code
duplication compared to the other patch, and by defining readinto() on all
Python versions, it makes behavior more consistent across all Python versions.
When merging stable into default, the other change can be dropped.

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -79,20 +79,12 @@
 def __init__(self, wrapped):
 self._wrapped = wrapped
 
-# Do NOT implement readinto() by making it delegate to
-# _wrapped.readinto(), since that is unbuffered. The unpickler is fine
-# with just read() and readline(), so we don't need to implement it.
-
 def readline(self):
 return self._wrapped.readline()
 
-# issue multiple reads until size is fulfilled (or EOF is encountered)
-def read(self, size=-1):
-if size < 0:
-return self._wrapped.readall()
-
-buf = bytearray(size)
+def readinto(self, buf):
 pos = 0
+size = len(buf)
 
 with memoryview(buf) as view:
 while pos < size:
@@ -102,7 +94,16 @@
 break
 pos += ret
 
-del buf[pos:]
+return pos
+
+# issue multiple reads until size is fulfilled (or EOF is encountered)
+def read(self, size=-1):
+if size < 0:
+return self._wrapped.readall()
+
+buf = bytearray(size)
+n_read = self.readinto(buf)
+del buf[n_read:]
 return bytes(buf)
 
 

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 3 of 4] worker: stop relying on garbage collection to release memoryview

2022-05-21 Thread Manuel Jacob
# HG changeset patch
# User Manuel Jacob 
# Date 1653168690 -7200
#  Sat May 21 23:31:30 2022 +0200
# Node ID f8b3781ee6a2241fe40d5420f5d0293947f94022
# Parent  ff98395f2742f27fa1e2e7ddbe31a3b810e470bc
# EXP-Topic worker-improvements
worker: stop relying on garbage collection to release memoryview

On CPython, before resizing the bytearray, all memoryviews referencing it must
be released. Before this change, we ensured that all references to them were
deleted. On CPython, this was enough to set the reference count to zero, which
results in garbage collecting and releasing them.

On PyPy, releasing the memoryviews is not necessary because they are implemented
differently. If it would be necessary however, ensuring that all references are
deleted would not be suffient because PyPy doesn’t use reference counting.

By using with statements that take care of releasing the memoryviews, we ensure
that the bytearray is resizable without relying on implementation details. So
while this doesn’t fix any observable bug, it increases compatiblity with other
and future Python implementations.

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -92,16 +92,16 @@
 return self._wrapped.readall()
 
 buf = bytearray(size)
-view = memoryview(buf)
 pos = 0
 
-while pos < size:
-ret = self._wrapped.readinto(view[pos:])
-if not ret:
-break
-pos += ret
+with memoryview(buf) as view:
+while pos < size:
+with view[pos:] as subview:
+ret = self._wrapped.readinto(subview)
+if not ret:
+break
+pos += ret
 
-del view
 del buf[pos:]
 return bytes(buf)
 
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 2 of 4] worker: add docstring to _blockingreader

2022-05-21 Thread Manuel Jacob
# HG changeset patch
# User Manuel Jacob 
# Date 1653164642 -7200
#  Sat May 21 22:24:02 2022 +0200
# Node ID ff98395f2742f27fa1e2e7ddbe31a3b810e470bc
# Parent  5dfd26ed9c4eeb51bbf910ca7450bdb65f06af81
# EXP-Topic worker-improvements
worker: add docstring to _blockingreader

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -69,6 +69,13 @@
 
 
 class _blockingreader:
+"""Wrap unbuffered stream such that pickle.load() works with it.
+
+pickle.load() expects that calls to read() and readinto() read as many
+bytes as requested. On EOF, it is fine to read fewer bytes. In this case,
+pickle.load() raises an EOFError.
+"""
+
 def __init__(self, wrapped):
 self._wrapped = wrapped
 
@@ -79,7 +86,7 @@
 def readline(self):
 return self._wrapped.readline()
 
-# issue multiple reads until size is fulfilled
+# issue multiple reads until size is fulfilled (or EOF is encountered)
 def read(self, size=-1):
 if size < 0:
 return self._wrapped.readall()

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 1 of 4] worker: explain why pickle reading stream has to be unbuffered

2022-05-21 Thread Manuel Jacob
# HG changeset patch
# User Manuel Jacob 
# Date 1653164539 -7200
#  Sat May 21 22:22:19 2022 +0200
# Node ID 5dfd26ed9c4eeb51bbf910ca7450bdb65f06af81
# Parent  a17ffde1e71b1c9de5899ae8d5078e67c54ef1d3
# EXP-Topic worker-improvements
worker: explain why pickle reading stream has to be unbuffered

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -265,6 +265,10 @@
 selector = selectors.DefaultSelector()
 for rfd, wfd in pipes:
 os.close(wfd)
+# The stream has to be unbuffered. Otherwise, if all data is read from
+# the raw file into the buffer, the selector thinks that the FD is not
+# ready to read while pickle.load() could read from the buffer. This
+# would delay the processing of readable items.
 selector.register(os.fdopen(rfd, 'rb', 0), selectors.EVENT_READ)
 
 def cleanup():

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH stable] hghave: make black version regex work with newer versions of black

2022-05-21 Thread Manuel Jacob
# HG changeset patch
# User Manuel Jacob 
# Date 1653176900 -7200
#  Sun May 22 01:48:20 2022 +0200
# Branch stable
# Node ID 29f2716c5c54c7e0f7aa6d91979893f5d2078862
# Parent  477b5145e1a02715f846ce017b460858a58e03b1
# EXP-Topic black_version_regex
hghave: make black version regex work with newer versions of black

Black commit 117891878e5be4d6b771ae5de299e51b679cea27 (included in black >=
21.11b0) dropped the string "version " from the output of "black --version". To
make the regex work with newer black versions, make matching of "version "
optional.

diff --git a/tests/hghave.py b/tests/hghave.py
--- a/tests/hghave.py
+++ b/tests/hghave.py
@@ -1123,7 +1123,7 @@
 @check('black', 'the black formatter for python (>= 20.8b1)')
 def has_black():
 blackcmd = 'black --version'
-version_regex = b'black, version ([0-9a-b.]+)'
+version_regex = b'black, (?:version )?([0-9a-b.]+)'
 version = matchoutput(blackcmd, version_regex)
 sv = distutils.version.StrictVersion
 return version and sv(_bytes2sys(version.group(1))) >= sv('20.8b1')

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel