[issue39298] add BLAKE3 to hashlib

2022-03-24 Thread Jack O'Connor


Jack O'Connor  added the comment:

I did reply to that point above with some baseless speculation, but now I can 
back up my baseless speculation with unscientific data :)

https://gist.github.com/oconnor663/aed7016c9dbe5507510fc50faceaaa07

According to whatever `powerstat -R` measures on my laptop, running 
hardware-accelerated SHA-256 in a loop for a minute or so takes 26.86 Watts on 
average. Doing the same with AVX-512 BLAKE3 takes 29.53 Watts, 10% more. 
Factoring in the 4.69x difference in throughput reported by those loops, the 
overall energy/byte for BLAKE3 is 4.27x lower than SHA-256. This is my first 
time running a power benchmark, so if this sounds implausible hopefully someone 
can catch my mistakes.
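
For anyone who wants to check the arithmetic, here it is spelled out (a tiny sketch, numbers copied from above):

```
# Ratios derived from the powerstat numbers above.
sha256_watts = 26.86
blake3_watts = 29.53
throughput_ratio = 4.69  # BLAKE3 bytes/sec over SHA-256 bytes/sec

power_ratio = blake3_watts / sha256_watts        # ~1.10, i.e. "10% more"
energy_ratio = throughput_ratio / power_ratio    # energy/byte advantage
print(round(energy_ratio, 2))                    # 4.27
```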

--




[issue39298] add BLAKE3 to hashlib

2022-03-24 Thread Jack O'Connor


Jack O'Connor  added the comment:

> Truncated sha512 (sha512-256) typically performs 40% faster than sha256 on 
> X86_64.

Without hardware acceleration, yes. But because SHA-NI includes only SHA-1 and 
SHA-256, and not SHA-512, it's no longer a level playing field. OpenSSL's 
SHA-512 and SHA-512/256 both get about 797 MB/s on my machine.

--




[issue39298] add BLAKE3 to hashlib

2022-03-24 Thread Jack O'Connor


Jack O'Connor  added the comment:

> Hardware accelerated SHAs are likely faster than blake3 single core.

Surprisingly, they're not. Here's a quick measurement on my recent ThinkPad 
laptop (64 KiB of input, single-threaded, TurboBoost left on), which supports 
both AVX-512 and the SHA extensions:

OpenSSL SHA-256: 1816 MB/s
OpenSSL SHA-1:   2103 MB/s
BLAKE3 SSE2: 2109 MB/s
BLAKE3 SSE4.1:   2474 MB/s
BLAKE3 AVX2: 4898 MB/s
BLAKE3 AVX-512:  8754 MB/s

The main reason SHA-1 and SHA-256 don't do better is that they're fundamentally 
serial algorithms. Hardware acceleration can speed up a single instance of 
their compression functions, but there's just no way for it to run more than 
one instance per message at a time. In contrast, AES-CTR can easily parallelize 
its blocks, and hardware accelerated AES does beat BLAKE3.
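
For reference, a rough sketch of how to reproduce this kind of single-threaded throughput number from Python itself (hashlib's SHA-256 is typically OpenSSL-backed; the BLAKE3 rows would need the blake3-py module instead):

```
import time
from hashlib import sha256

buf = bytes(64 * 1024)  # 64 KiB of input, as above
h = sha256()
start = time.perf_counter()
hashed = 0
while time.perf_counter() - start < 1.0:
    h.update(buf)
    hashed += len(buf)
elapsed = time.perf_counter() - start
print("%.0f MB/s" % (hashed / elapsed / 1e6))
```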

> And certainly more efficient in terms of watt-secs/byte.

I don't have any experience measuring power myself, so take this with a grain 
of salt: I think the difference in throughput shown above is large enough that, 
even accounting for the famously high power draw of AVX-512, BLAKE3 comes out 
ahead in terms of energy/byte. Probably not on ARM though.

--




[issue39298] add BLAKE3 to hashlib

2022-03-22 Thread Jack O'Connor


Jack O'Connor  added the comment:

> maintaining a complicated build process in-tree

For what it's worth, if you have any sort of "in a perfect world" vision for 
what the upstream BLAKE3 project could do to make it trivially easy for you to 
integrate, I'd be very interested in getting that done. Making integration easy 
would benefit all callers. We have some issues on the backburner about 
committing CMake build files, but I assume those would be useless for us here. 
Is there anything that would be more useful? If we provided autotools build 
files, could you call into them?

Fundamentally, BLAKE3 wants to build some code on x86 and some other code on 
ARM, and also some code on Unix and some other code on Windows. Currently we 
just ask the caller to do that for us, for lack of a common standard. (And if 
we're building intrinsics rather than assembly, we also need the compiler flags 
that enable our intrinsics.) But maybe we could handle more of that upstream, 
using the preprocessor? If the build instructions said "compile this one giant 
file on all platforms and don't worry about what it does", would that be 
better? Or would that be gross? Is a header-only library the gold standard? Or 
too C++-ish? Has anyone ever done a really good job of this?

--




[issue39298] add BLAKE3 to hashlib

2022-03-04 Thread Jack O'Connor


Jack O'Connor  added the comment:

Thanks Larry! Was any of the experimental C extension code under 
https://github.com/oconnor663/blake3-py/tree/master/c_impl useful to you? I was 
wondering if it could be possible to copy blake3module.c from there verbatim. 
The setup.py build there also has working Windows and NEON support.

I've patched the blake3-py test suite (which both the production Rust-based 
extension and that experimental C-based extension currently pass) to invoke the 
new hashlib implementation from your branch. You can find the full test output, 
and the procedure I used to run the tests, in this Gist 
https://gist.github.com/oconnor663/533048580b1c0f4a01d1d55f57f92792. Here are 
some differences:

- My derive_key_context parameter requires a string and refuses to accept 
bytes. This is consistent with our Rust and C APIs (though the C API does 
include a _raw version specifically for bindings, which we're using here). For 
a long discussion of why we prefer to do things this way, see 
https://github.com/BLAKE3-team/BLAKE3/issues/13. The short version is that any 
use case that requires arbitrary bytes for the context string is almost 
certainly violating the documented security requirement that the context string 
must be hardcoded. (There's a usage sketch after this list.)

- I've included an `AUTO` constant that provides a special value (-1) for the 
`max_threads` argument, and I explicitly don't support `max_threads=0`.

- I enforce that the `data` arguments are positional-only and that the other 
keyword arguments are keyword-only. I think this is consistent with the rest of 
hashlib.

- I include a `.reset()` method. This isn't particularly useful in the default 
case, where you might as well create a new hasher. But when `max_threads` is 
greater than 1 in the Rust implementation, the hasher owns a thread pool, and 
`.reset()` is currently the only way to reuse that pool. (A BLAKE3 hasher is 
also ~2 KB, somewhat larger than other hashers, so callers who are pinching 
pennies with their allocator traffic might prefer to reuse the object.)

- Unrelated to tests: I haven't made any attempt to zero memory in my `dealloc` 
function. But if that's what other hashlib functions do, then I'm certainly in 
favor of doing it here too.
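
To make those differences concrete, here's a small usage sketch against the blake3-py API described above (the context string and inputs are illustrative only):

```
from blake3 import blake3

# derive_key_context must be str, not bytes, and should be a hardcoded,
# application-specific constant.
derived = blake3(b"input key material",
                 derive_key_context="example.com 2022-03-04 session keys").digest()

# max_threads is keyword-only; blake3.AUTO (-1) lets the implementation decide.
hasher = blake3(max_threads=blake3.AUTO)
hasher.update(b"some long input")
digest = hasher.digest()

# reset() reuses the hasher (and, in the Rust implementation, its thread pool).
hasher.reset()
```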

--




[issue39298] add BLAKE3 to hashlib

2022-02-19 Thread Jack O'Connor


Jack O'Connor  added the comment:

Yes, everything in https://github.com/BLAKE3-team/BLAKE3 and 
https://github.com/oconnor663/blake3-py is public domain via CC0, and dual 
licensed under Apache for good measure. Hopefully that makes it easy to use it 
anywhere.

--




[issue39298] add BLAKE3 to hashlib

2022-02-17 Thread Jack O'Connor


Jack O'Connor  added the comment:

What's the best way for me to help with the next steps of this?

--




[issue39298] add BLAKE3 to hashlib

2022-01-12 Thread Jack O'Connor

Jack O'Connor  added the comment:

Yeah by intrinsics I mean stuff like _mm256_add_epi32(). All of that stuff is 
in these vendored files:

blake3_avx2.c
blake3_avx512.c
blake3_neon.c
blake3_sse2.c
blake3_sse41.c

Also to Michał's question above, I'm not necessarily opposed to publishing 
something like "blake3-c" on PyPI once things stabilize. Even if we get BLAKE3 
into hashlib in 3.11, PyPI modules will be useful to folks running older 
versions, and not everyone wants to install the Rust toolchain.

--




[issue39298] add BLAKE3 to hashlib

2022-01-12 Thread Jack O'Connor


Jack O'Connor  added the comment:

> As a first pass I say we merge the reference C implementation.

Do you mean portable-only C code, or portable + intrinsics? If the assembly 
files are out, I'd advocate for the latter. The intrinsics implementations are 
nearly as fast as the assembly code, and both of those are several times faster 
than the portable code. You can test this configuration with my current 
setup.py by setting the env var FORCE_INTRINSICS=1.

--




[issue39298] add BLAKE3 to hashlib

2022-01-12 Thread Jack O'Connor


Jack O'Connor  added the comment:

I was about to say the only missing feature was docstrings, but then I realized 
I hadn't included releasing the GIL. I've added that and pushed an update just 
now. Fingers crossed there's nothing else I've missed. I think it's in 
reasonably good shape, and I'd like to propose it for inclusion in 3.11.

However, I'm not very experienced with setup.py or the Python C API, so I might 
not be the best judge of shape. Here are some highlights for reviewers, where I 
think the implementation (mostly the build) could be shaky:

- Platform detection. In setup.py I assume that the target platform of the 
build is the same as the current interpreter's platform. So for example, if the 
current interpreter has sys.maxsize==2**31-1, I assume we're compiling for 
32-bit. This clearly doesn't allow for any sort of cross-compilation, and in 
general I need feedback about whether there's a more correct way to obtain the 
target platform. (There's a condensed sketch of this point, and of the per-file compiler point below, after this list.)

- Compiling assembly files. On Unix this is easy, because we can supply *.S 
files as `extra_objects` and GCC/Clang will do the right thing. But on Windows 
this isn't easy. Currently I shell out to vswhere.exe, ask it for the path to 
the latest version of the ml64.exe assembler, and then shell out to that to 
build .obj files. Then I pass those assembled .obj files as `extra_objects`. 
This feels awfully manual, and there's a good chance I've missed some 
better-supported way to do it. I assume we don't want to check in the .obj 
files?

- Does Python support the GNU ABI on Windows? We have assembly files for this 
in vendor/, but I'm not currently building them.

- Compiling intrinsics files for 32-bit x86. In this case, I create a 
`ccompiler.new_compiler()` for each intrinsics file, so that I can set the 
appropriate flags for each. This is relatively clean, but it leads to things 
getting rebuilt every single time, rather than participating in `setup.py 
build` caching. Maybe nobody cares about this, but again it makes me think 
there might be a better-supported way to do it.

- blake3module.c contains an awful lot of gotos to handle allocation failure 
cases. Is this still considered a best practice? These are bug-prone, and I'm 
not sure how to test them.

- Existing hashlib implementations include an optimization where they avoid 
allocating an internal mutex until they see a long input and want to release 
the GIL. As a quirky side effect of this, they handle allocation failures for 
that mutex by falling back to the do-not-release-the-GIL codepath. That feels 
kind of complicated to me, and in my code I'm always allocating the mutex 
during initialization. This doesn't seem to make much of a performance 
difference when I measure it, but there might be use cases I haven't considered.
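
For what it's worth, here's a condensed sketch of the two build patterns flagged above (flags and filenames are illustrative, not the exact setup.py code):

```
import sys
from distutils import ccompiler

# Platform detection: assume the build target matches the running interpreter.
is_32bit = sys.maxsize == 2**31 - 1

# Per-file compilation: a fresh compiler per intrinsics file, so each file can
# get its own flags. (This is what bypasses `setup.py build` caching.)
compiler = ccompiler.new_compiler()
sse41_objects = compiler.compile(["vendor/blake3_sse41.c"],
                                 extra_preargs=["-msse4.1"])
```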

Here are some other API details that might be worth bikeshedding:

- The `max_threads` parameter is currently defined to take a special value, 
`blake3.AUTO`, to indicate that the implementation may use as many threads as 
it likes. (The C implementation doesn't include multithreading support, but 
it's API-compatible with the Rust implementation.) `blake3.AUTO` is currently a 
class attribute equal to -1. We may want to bikeshed this name or propose some 
other representation.

- BLAKE3 has three modes: regular hashing, keyed hashing, and key derivation. 
The keyed hashing mode takes a 32-byte key, and the key derivation mode takes a 
context string. Calling the 32-byte key `key` seems good. Calling the context 
string `context` seems less good. Larry has pointed out before that lots of 
random things are called `context`, and readers might not understand what 
they're seeing. I currently call it `derive_key_context` instead, but we could 
bikeshed this.

- I check `itemsize` on input Py_buffers and throw an exception if it's 
anything other than 1. My test suite exercises this, see 
`test_int_array_fails`. However, some (all?) standard hashes don't do this 
check. For example:

>>> from hashlib import sha256
>>> import array
>>> a = array.array("i", [255])
>>> sha256(a).hexdigest()
'81ff65efc4487853bdb4625559e69ab44f19e0f5efbd6d5b2af5e3ab267c8e06'
>>> sha256(bytes([0xff, 0, 0, 0])).hexdigest()
'81ff65efc4487853bdb4625559e69ab44f19e0f5efbd6d5b2af5e3ab267c8e06'

Here we can see sha256() hashing an array of int. On my machine, an int is 4 
little-endian bytes, but of course this sort of thing isn't portable. The same 
array will result in a different SHA-256 output on a big-endian machine, or on 
a machine with ints of a different size. This seems undesirable, and I'm 
surprised that hashlib allows it. However, if there's some known compatibility 
reason why we have to allow it, I could remove this check.
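
For comparison, a caller who wants a portable hash of an int array has to pick the serialization explicitly; a small sketch:

```
import struct
from hashlib import sha256

values = [255, 256]
# Fix the width and byte order, so every platform hashes the same bytes:
# here, 4-byte little-endian unsigned ints.
digest = sha256(struct.pack("<%dI" % len(values), *values)).hexdigest()
```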

--
versions: +Python 3.11 -Python 3.10


[issue39298] add BLAKE3 to hashlib

2022-01-11 Thread Jack O'Connor


Jack O'Connor  added the comment:

Ah, good idea. I've published the new C implementation as: 
https://test.pypi.org/project/blake3-experimental-c/

You can install it with: pip install -i https://test.pypi.org/simple/ 
blake3-experimental-c

Despite the package name change, the extension module is still "blake3", so we 
still "import blake3" to get at it. For example:

$ pip install -i https://test.pypi.org/simple/ blake3-experimental-c
$ python
>>> from blake3 import blake3
>>> blake3(b"foo").hexdigest()
'04e0bb39f30b1a3feb89f536c93be15055482df748674b00d26e5a7502e9'
>>> blake3(b"foo", max_threads=blake3.AUTO).hexdigest()
'04e0bb39f30b1a3feb89f536c93be15055482df748674b00d26e5a7502e9'

To run the Rust implementation's test suite against this implementation, you 
could then:

$ pip install pytest numpy
$ git clone https://github.com/oconnor663/blake3-py
$ python -m pytest blake3-py/tests/test_blake3.py
= test session starts =
platform linux -- Python 3.10.1, pytest-6.2.5, py-1.11.0, pluggy-0.13.1
rootdir: /tmp
collected 24 items

blake3-py/tests/test_blake3.py  [100%]

= 24 passed in 0.30s ==

--




[issue39298] add BLAKE3 to hashlib

2022-01-10 Thread Jack O'Connor


Jack O'Connor  added the comment:

Update: There is now a C version of the `blake3` Python module available at 
https://github.com/oconnor663/blake3-py/tree/master/c_impl. It's completely 
API-compatible with the Rust version, and it passes the same test suite. 
Multithreading (which is implemented in upstream Rust but not in upstream C) is 
exposed through a "max_threads" argument, as Larry Hastings suggested. The C 
implementation allows this argument but ignores it.

Unlike my previous attempt, this setup.py build handles the full range of 
target platforms and optimized flavors: x86-64 assembly on Windows-MSVC and 
Unix, 32-bit x86 intrinsics on Windows-MSVC and Unix, NEON intrinsics on 
AArch64, and portable C for everyone else. I'm new to distutils/setuptools and 
not particularly familiar with the MSVC toolchain either, so there's a good 
chance that others can suggest better/cleaner/more robust approaches than what 
I've got, but it's at least working on my machines and on GitHub CI. (I haven't 
tried to get any cross-compilation working though; is that a thing?)

I haven't published this module to PyPI, partly to avoid confusion with the 
Rust-based implementation, which I think most applications should prefer. But 
if it would make a big difference to anyone who wants to review this code, we 
could certainly put it up as `experimental_blake3_c` or something? Let me know 
what's best.

--




[issue39298] add BLAKE3 to hashlib

2021-09-05 Thread Jack O'Connor

Jack O'Connor  added the comment:

Hi Michał, no I haven't done any more work on this since my comments back in 
April. If you wanted to get started on a PyPI implementation, I think that 
would be fantastic. I'd be happy to collaborate over email: 
oconnor...@gmail.com. The branches I linked are still up, but I'm not sure my 
code will be very useful to someone who actually knows what they're doing :) 
Larry also had several ideas about how multithreading could fit in (which would 
be API changes in the Rust case, and forward-looking design work in the C 
case), and if I get permission from Larry I'll forward those emails.

--




[issue25782] CPython hangs on error __context__ set to the error itself

2021-08-06 Thread Jack O'Connor


Change by Jack O'Connor :


--
nosy:  -oconnor663




[issue39298] add BLAKE3 to hashlib

2021-04-19 Thread Jack O'Connor


Jack O'Connor  added the comment:

Hey Christian, yes these are new bindings, and also incomplete. See comments in 
https://github.com/oconnor663/cpython/commit/dc6f6163ad9754c9ad53e9e3f3613ca3891a77ba,
 but in short only x86-64 Unix is in working order. If 3.10 doesn't seem 
realistic, I'm happy to go the PyPI route. That said, this is my first time 
using the Python C API. (My code in that branch is going to make that pretty 
obvious.) Could you recommend any existing packages that I might be able to use 
as a model?

For OpenSSL, I'm very interested in the abstract but less familiar with their 
project and their schedules. Who might be a good person to get in touch with?

> I assume there's a completely generic platform-agnostic C implementation, for 
> build environments where the assembly won't work, yes?

Yes, that's the vendored file blake3_portable.c. One TODO for my branch here is 
convincing the Python build system not to try to compile the x86-64-specific 
stuff on other platforms. The vendored file blake3_dispatch.c abstracts over 
all the different implementations and takes care of #ifdef'ing 
platform-specific function calls. (It also does runtime CPU feature detection 
on x86.)

> written using the Rust implementation, which I understand is even more 
> performant

A few details here: The upstream Rust and C implementations have been matched 
in single threaded performance for a while now. They share the same assembly 
files, and the rest is a direct port. The big difference is that Rust also 
includes multithreading support, using the Rayon work-stealing runtime. The 
blake3-py module based on the Rust crate exposes this with a simple boolean 
flag, though we've been thinking about ways to give the caller more control 
over the number of threads used.

--




[issue39298] add BLAKE3 to hashlib

2021-04-18 Thread Jack O'Connor


Jack O'Connor  added the comment:

An update a year later: I have a proof-of-concept branch that adds BLAKE3 
support to hashlib: https://github.com/oconnor663/cpython/tree/blake3. That 
branch is API compatible with the current master branch of 
https://github.com/oconnor663/blake3-py. Both that module and the upstream 
BLAKE3 repo are ready to be tagged 1.0, just waiting to see whether any 
integrations like this one end up requesting changes.

Would anyone be interested in moving ahead with this? One of the open questions 
would be whether CPython would vendor the BLAKE3 optimized assembly files, or 
whether we'd prefer to stick to C intrinsics.

--




[issue42558] waitpid/waitid race caused by change to Popen.send_signal in Python 3.9

2020-12-04 Thread Jack O'Connor


Jack O'Connor  added the comment:

Right, the example above is contrived to demonstrate the race and the crash.

In real life code, the good reason I know of to write code like this is to use 
os.waidid(WNOWAIT) to solve the wait/kill race properly. This is what Duct has 
been doing, and Nathaniel Smith also described this strategy in 
https://bugs.python.org/issue38630. The idea is that a waiting thread follows 
these steps:

1. waitid() with WNOWAIT set, without locking the child
2. after waitid returns, indicating the child has exited, lock the child
3. waitid() without WNOWAIT, or just waitpid(), to reap the zombie child
4. stash the exit status and unlock

Meanwhile a killing thread follows these steps:

1. lock the child
2. check the stashed exit status, and unlock and exit early if it's set
3. otherwise, signal the child and unlock

This strategy solves the race. The killing thread is free to signal while the 
waiting thread is blocked in step 1. If the killing thread happens to race in 
between when waitid() returns and when the waiting thread acquires the child 
lock, the child is a zombie and the kill signal has no effect. This is safe 
even if other threads (or e.g. the OOM killer) can randomly kill our child: 
*they* might have to worry about PID reuse, but their signals can never cause 
*us* to kill an unrelated process. What breaks this scheme is if some thread 
calls waitpid() and reaps the child outside of the lock, but normally that'd be 
a pretty unreasonable thing to do, especially since it can only be done by 
other threads in the parent process. (There's also some complexity around 
whether multiple threads are allowed to call waitid(WNOWAIT) on the same PID at 
the same time. I've just had one thread call it, and had other blocking waiters 
block on a second lock, but maybe that's overcautious.)
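
Here's a minimal single-child sketch of that strategy (Unix-only; error handling and that second waiters' lock omitted):

```
import os
import signal
import subprocess
import threading

child = subprocess.Popen(["sleep", "10"])
child_lock = threading.Lock()
stashed_status = None

def waiter():
    global stashed_status
    # Step 1: block until the child exits, without reaping it.
    os.waitid(os.P_PID, child.pid, os.WEXITED | os.WNOWAIT)
    with child_lock:
        # Steps 2-4: reap the zombie and stash the status, under the lock.
        _, status = os.waitpid(child.pid, 0)
        stashed_status = status

def killer():
    with child_lock:
        # Exit early if the waiting thread has already reaped the child.
        if stashed_status is None:
            # Safe: the child is either running or an unreaped zombie, so
            # its PID is still reserved for us.
            os.kill(child.pid, signal.SIGKILL)

t = threading.Thread(target=waiter)
t.start()
killer()
t.join()
```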

So anyway, if you use the strategy above -- precisely because you care about 
the PID reuse race and want to solve it properly -- and you also happen to use 
Popen.kill(), then changing Popen.send_signal to reap the child can break you.

I don't think this is a bug per se, but it's a behavior change, which matters 
to a small set of (abnormally) correct programs. But then again, if Duct is the 
only project that hits this in practice, maybe I'm making a mountain out of a 
molehill :)

--




[issue38630] subprocess.Popen.send_signal() should poll the process

2020-12-03 Thread Jack O'Connor


Jack O'Connor  added the comment:

Filed separately as https://bugs.python.org/issue42558.

--




[issue42558] waitpid/waitid race caused by change to Popen.send_signal in Python 3.9

2020-12-03 Thread Jack O'Connor


New submission from Jack O'Connor :

In Python 3.9, Popen.send_signal() was changed to call Popen.poll() internally 
before signaling. (Tracking bug: https://bugs.python.org/issue38630.) This is a 
best-effort check for the famous kill/wait race condition. However, because 
this can now reap an already-exited child process as a side effect, it can 
cause previously working programs to crash. Here's a simple example:

```
import os
import subprocess
import time

child = subprocess.Popen(["true"])
time.sleep(1)
child.kill()
os.waitpid(child.pid, 0)
```

The program above exits cleanly in Python 3.8 but crashes with 
ChildProcessError in Python 3.9. It's a race against child process exit, so in 
practice (without the sleep) it's a heisenbug.

There's a deeper race here that's harder to demonstrate in an example, but 
parallel to the original wait/kill issue: If the child PID happens to be reused 
by another parent thread in this same process, the call to waitpid might 
*succeed* and reap the unrelated child process. That would export the crash to 
that thread, and possibly create more kill/wait races.

In short, the question of when exactly a child process is reaped is important 
for correct signaling on Unix, and changing that behavior can break programs in 
confusing ways. This change affected the Duct library, and I might not've 
caught it if not for a lucky failing doctest: 
https://github.com/oconnor663/duct.py/commit/5dfae70cc9481051c5e53da0c48d9efa8ff71507

I haven't searched for more instances of this bug in the wild, but one way to 
find them would be to look for code that calls both os.waitpid/waitid and also 
Popen.send_signal/kill/terminate. Duct found itself in this position because it 
was using waitid(WNOWAIT) on Unix only, to solve this same race condition, and 
also using Popen.kill on both Unix and Windows.

--
messages: 382429
nosy: oconnor663
priority: normal
severity: normal
status: open
title: waitpid/waitid race caused by change to Popen.send_signal in Python 3.9
type: crash
versions: Python 3.9




[issue40550] Popen.terminate fails with ProcessLookupError under certain conditions

2020-12-03 Thread Jack O'Connor


Jack O'Connor  added the comment:

I'm late to the party, but I want to explain what's going on here in case it's 
helpful to folks. The issue you're seeing here has to do with whether a child 
process has been "reaped". (Windows is different from Unix here, because the 
parent keeps an open handle to the child, so this is mostly a Unix thing.) In 
short, when a child exits, it leaves a "zombie" process whose only job is to 
hold some metadata and keep the child's PID reserved.  When the parent calls 
wait/waitpid/waitid or similar, that zombie process is cleaned up. That means 
that waiting has important correctness properties apart from just blocking the 
parent -- signaling after wait returns is unsafe, and forgetting to wait also 
leaks kernel resources.

Here's a short example demonstrating this:

```
import os
import signal
import subprocess
import time

# Start a child process and sleep a little bit so that we know it's exited.
child = subprocess.Popen(["true"])
time.sleep(1)

# Signal it. Even though it's definitely exited, this is not an error.
os.kill(child.pid, signal.SIGKILL)
print("signaling before waiting works fine")

# Now wait on it. We could also use os.waitpid or os.waitid here. This reaps
# the zombie child.
child.wait()

# Try to signal it again. This raises ProcessLookupError, because the child's
# PID has been freed. But note that Popen.kill() would be a no-op here,
# because it knows the child has already been waited on.
os.kill(child.pid, signal.SIGKILL)
```

With that in mind, the original behavior with communicate() that started this 
bug is expected. The docs say that communicate() "waits for process to 
terminate and sets the returncode attribute." That means internally it calls 
waitpid, so your terminate() thread is racing against process exit. Catching 
the exc

[issue38630] subprocess.Popen.send_signal() should poll the process

2020-12-02 Thread Jack O'Connor


Jack O'Connor  added the comment:

This change caused a crash in the Duct library in Python 3.9. Duct uses the 
waitid(NOWAIT) strategy that Nathaniel Smith has described in this thread. With 
this change, the child process is reaped by kill() in a spot we didn't expect, 
and a subsequent call to os.waitid() raises a ChildProcessError. This is a race 
condition that only triggers if the child happens to exit before kill() is 
called.

I just pushed 
https://github.com/oconnor663/duct.py/commit/5dfae70cc9481051c5e53da0c48d9efa8ff71507
 to work around this, which I'll release shortly as Duct version 0.6.4.

Broadly speaking, this change could break any program that uses Popen.kill() 
together with os.waitpid() or os.waitid(). Checking Popen.returncode before 
calling the os functions is a good workaround for the single-threaded case, but 
it doesn't fix the multithreaded case. Duct is going to avoid calling 
Popen.kill() entirely. There are some longer comments about race conditions in 
that commit I linked.

I don't feel strongly one way or the other about keeping this new behavior, but we 
should probably document it clearly in Popen.send_signal() and all of its 
callers that these functions might reap the child.

--
nosy: +oconnor663




[issue39988] Remove AugLoad and AugStore expression context from AST

2020-03-20 Thread Jack O'Connor


Jack O'Connor  added the comment:

Ah never mind, it looks like that's covered by 
https://bugs.python.org/issue3

--




[issue39988] Remove AugLoad and AugStore expression context from AST

2020-03-20 Thread Jack O'Connor


Jack O'Connor  added the comment:

This change may have broken pyflakes on nightly Python. Is that expected or 
problematic?

Error message:

"pyflakes" failed during execution due to "module 'ast' has no attribute 
'AugLoad'"

Seen at: https://travis-ci.org/github/buildinspace/peru/jobs/665005023

--
nosy: +oconnor663




[issue39298] add BLAKE3 to hashlib

2020-03-04 Thread Jack O'Connor


Jack O'Connor  added the comment:

I've just published some Python bindings for the Rust implementation on PyPI: 
https://pypi.org/project/blake3

> I'm guessing Python is gonna hold off until BLAKE3 reaches 1.0.

That's very fair. The spec and test vectors are set in stone at this point, but 
the implementations are new, and I don't see any reason to rush things out. 
(Especially since early adopters can now use the library above.) That said, 
there aren't really any expected implementation changes that would be a natural 
moment for the implementations to tag 1.0. I'll probably end up tagging 1.0 as 
soon as a caller appears who needs it to be tagged to meet their own stability 
requirements.

--




[issue39298] add BLAKE3 to hashlib

2020-02-12 Thread Jack O'Connor


Jack O'Connor  added the comment:

Version 0.2.0 of the BLAKE3 repo includes optimized assembly implementations. 
These are behind the "c" Cargo feature for the `blake3` Rust crate, but 
included by default for the internal bindings crate. So the easiest way to 
rerun our favorite benchmark is:

git clone https://github.com/BLAKE3-team/BLAKE3
cd BLAKE3
git fetch
# I rebased this branch on top of version 0.2.0 today.
git checkout origin/bench_406668786
cd c/blake3_c_rust_bindings
# Nightly is currently broken for unrelated reasons, so
# we use stable with this internal bootstrapping flag.
RUSTC_BOOTSTRAP=1 cargo bench 406668786

Running the above on my machine, I get 2888 MB/s, up another 12% from the 0.1.3 
numbers. As a bonus, we don't need to worry about the difference between GCC 
and Clang.

These new assembly files are essentially drop-in replacements for the 
instruction-set-specific C files we had before, which are also still supported. 
The updated C README has more details: 
https://github.com/BLAKE3-team/BLAKE3/blob/master/c/README.md

--




[issue39298] add BLAKE3 to hashlib

2020-01-22 Thread Jack O'Connor


Jack O'Connor  added the comment:

Version 0.1.3 of the official BLAKE3 repo includes some significant performance 
improvements:

- The x86 implementations include explicit prefetch instructions, which helps 
with long inputs. (commit b8c33e1)
- The C implementation now uses the same parallel parent hashing strategy that 
the Rust implementation uses. (commit 163f522)

When I repeat the benchmarks above with TurboBoost on, here's what I see now:

BLAKE3 Rust  2578 MB/s
BLAKE3 C (clang -O3) 2502 MB/s
BLAKE3 C (gcc -O2)   2223 MB/s
K12 C (gcc -O2)  2175 MB/s

Larry, if you have time to repeat your benchmarks with the latest C code, I'd 
be curious to see if you get similar results.

--




[issue39298] add BLAKE3 to hashlib

2020-01-17 Thread Jack O'Connor


Jack O'Connor  added the comment:

I plan to bring the C code up to speed with the Rust code this week. As part of 
that, I'll probably remove comments like the one above :) Otherwise, is there 
anything else we can do on our end to help with this?

--




[issue39298] add BLAKE3 to hashlib

2020-01-16 Thread Jack O'Connor


Jack O'Connor  added the comment:

Ok, I've added Rust bindings to the BLAKE3 C implementation, so that I can 
benchmark it in a vaguely consistent way. My laptop is an i5-8250U, which 
should be very similar to yours. (Both are "Kaby Lake Refresh".) My end results 
do look similar to yours with TurboBoost on, but pretty different with 
TurboBoost off:

with TurboBoost on
--
K12 GCC| 2159 MB/s
BLAKE3 Rust| 1787 MB/s
BLAKE3 C Clang | 1588 MB/s
BLAKE3 C GCC   | 1453 MB/s

with TurboBoost off
---
BLAKE3 Rust| 1288 MB/s
K12 GCC| 1060 MB/s
BLAKE3 C Clang | 1094 MB/s
BLAKE3 C GCC   |  943 MB/s

The difference seems to be that with TurboBoost on, the BLAKE3 benchmarks have 
my CPU sitting around 2.4 GHz, while for the K12 benchmarks it's more like 2.9 
GHz. With TurboBoost off, both benchmarks run at 1.6 GHz, and BLAKE3 does 
better. I'm not sure what causes that frequency difference. Perhaps some 
high-power instruction that the BLAKE3 implementation is emitting?

To reproduce these numbers you can clone these two repos (the latter is where I 
happen to have a K12 benchmark):

https://github.com/BLAKE3-team/BLAKE3
https://github.com/oconnor663/blake2_simd

Then in both cases checkout the "bench_406668786" branch, where I've put some 
benchmarks with the same input length you used.

For Rust BLAKE3, at the root of the BLAKE3 repo, run: cargo +nightly bench 
406668786

For C BLAKE3, the command is the same, but run it in the 
"./c/blake3_c_rust_bindings" directory. The build defaults to GCC, and you can 
"export CC=clang" to switch it.

For my K12 benchmark, at the root of the blake2_simd repo, run: cargo +nightly 
bench --features=kangarootwelve 406668786

--




[issue39298] add BLAKE3 to hashlib

2020-01-13 Thread Jack O'Connor


Jack O'Connor  added the comment:

I'm in the middle of adding some Rust bindings to the C implementation in 
github.com/BLAKE3-team/BLAKE3, so that `cargo test` and `cargo bench` can cover 
both. Once that's done, I'll follow up with benchmark numbers from my laptop 
(Kaby Lake i5-8250U, also AVX2 with no AVX-512). For benchmark numbers with 
AVX-512 support, see the Performance section of the BLAKE3 paper 
(https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf). Larry, 
what processor did you run your benchmarks on?

Also, is there anything currently in CPython that does dispatch based on 
runtime CPU feature detection? Is this something that BLAKE3 should do for 
itself, or is there existing machinery that we'd want to integrate with?

--
nosy: +oconnor663




[issue31933] some Blake2 parameters are encoded backwards on big-endian platforms

2017-11-03 Thread Jack O'Connor

New submission from Jack O'Connor <oconnor...@gmail.com>:

See https://github.com/BLAKE2/libb2/issues/12.

All Blake2 params have to be encoded in little-endian byte order. For the two 
multi-byte integer params, leaf_length and node_offset, that means that 
assigning a native-endian integer to them appears to work on little-endian 
platforms, but gives the wrong result on big-endian. The current libb2 API 
doesn't make that very clear, and @sneves is working on new API functions in 
the GH issue above. In the meantime, we can work around the problem by 
explicitly assigning little-endian values to the parameter block.
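
In Python terms, the workaround amounts to serializing those params with an explicit little-endian layout rather than native byte order; an illustrative sketch (assuming blake2b-style widths, a 4-byte leaf_length and an 8-byte node_offset):

```
import struct

leaf_length = 4096
node_offset = 1

# Native byte order gives the wrong bytes on big-endian platforms;
# "<" pins the encoding to little-endian everywhere.
packed = struct.pack("<IQ", leaf_length, node_offset)
```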

--
messages: 305473
nosy: oconnor663
priority: normal
severity: normal
status: open
title: some Blake2 parameters are encoded backwards on big-endian platforms
type: behavior
versions: Python 3.6, Python 3.7, Python 3.8




[issue26848] asyncio.subprocess's communicate() method mishandles empty input bytes

2016-04-25 Thread Jack O'Connor

Jack O'Connor added the comment:

Related: The asyncio communicate() method differs from standard subprocess in 
how it treats input bytes when stdin is (probably mistakenly) not set to PIPE. 
Like this:

proc = await create_subprocess_shell("sleep 5")
await proc.communicate(b"foo")  # Oops, I forgot stdin=PIPE above!

In the standard, non-async version of this example, communicate would ignore the 
input bytes entirely. But here in the asyncio version, communicate will try to 
write those bytes to stdin, which is None, and the result is an AttributeError.

Since the user probably only hits this case by mistake, I think raising an 
exception is preferable. But it would be nice to raise an exception that 
explicitly said "you've forgotten stdin=PIPE" instead of the unhelpful 
"'NoneType' object has no attribute 'write'". Maybe it would be worth cleaning 
this up while we're here?

--




[issue26848] asyncio.subprocess's communicate() method mishandles empty input bytes

2016-04-25 Thread Jack O'Connor

Jack O'Connor added the comment:

Thanks for the heads up, Berker, I've re-submitted the PR as 
https://github.com/python/asyncio/pull/335.

--




[issue26848] asyncio.subprocess's communicate() method mishandles empty input bytes

2016-04-25 Thread Jack O'Connor

New submission from Jack O'Connor:

Setting stdin=PIPE and then calling communicate(b"") should close the child's 
stdin immediately, similar to stdin=DEVNULL. Instead, communicate() treats b"" 
like None and leaves the child's stdin open, which makes the child hang forever 
if it tries to read anything.
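
A minimal repro sketch (without the fix, the child blocks forever reading its stdin):

```
import asyncio
from asyncio.subprocess import PIPE

@asyncio.coroutine
def main():
    proc = yield from asyncio.create_subprocess_exec(
        "cat", stdin=PIPE, stdout=PIPE)
    # With the fix, b"" closes the child's stdin immediately, so cat sees
    # EOF and exits. Without it, stdin stays open and this hangs.
    out, _ = yield from proc.communicate(b"")
    return out

asyncio.get_event_loop().run_until_complete(main())
```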

I have a PR open with a fix and a test: 
https://github.com/python/cpython/pull/33

--
components: asyncio
messages: 264212
nosy: gvanrossum, haypo, oconnor663, yselivanov
priority: normal
severity: normal
status: open
title: asyncio.subprocess's communicate() method mishandles empty input bytes
type: behavior
versions: Python 3.5, Python 3.6




[issue24909] Windows: subprocess.Popen: race condition for leaking inheritable handles

2016-03-04 Thread Jack O'Connor

Changes by Jack O'Connor <oconnor...@gmail.com>:


--
nosy: +oconnor663




[issue25779] deadlock with asyncio+contextmanager+ExitStack

2015-12-02 Thread Jack O'Connor

New submission from Jack O'Connor:

The following hangs at 100% CPU on Python 3.5, though not on Python 3.4:

1) Start an asyncio coroutine with run_until_complete().
2) Inside the coroutine, enter an ExitStack using a with-statement.
3) Inside the with-statement, call ExitStack.enter_context() with a generator 
context manager. It doesn't matter what the generator yields.
4) After the enter_context() call, raise an exception.

Here's an example script that does all of this and repros the hang: 
https://gist.github.com/oconnor663/483db2820bb5f877c9ed

--
components: asyncio
messages: 255719
nosy: gvanrossum, haypo, oconnor663, yselivanov
priority: normal
severity: normal
status: open
title: deadlock with asyncio+contextmanager+ExitStack
type: behavior
versions: Python 3.5




[issue25779] deadlock with asyncio+contextmanager+ExitStack

2015-12-02 Thread Jack O'Connor

Jack O'Connor added the comment:

Thanks for chasing this down. Yury, can you suggest a workaround?

--




[issue25781] infinite loop in reprlib

2015-12-02 Thread Jack O'Connor

Changes by Jack O'Connor <oconnor...@gmail.com>:


--
nosy: +oconnor663




[issue25782] CPython hangs on error __context__ set to the error itself

2015-12-02 Thread Jack O'Connor

Jack O'Connor added the comment:

Yury, do we need to handle more complicated infinite loops, where "self" 
doesn't actually show up in the loop? Here's an example:

try:
    raise Exception
except Exception as ex:
    loop1 = Exception()
    loop2 = Exception()
    loop1.__context__ = loop2
    loop2.__context__ = loop1
    ex.__context__ = loop1
    hasattr(1, 'aa')

I'm unfamiliar with CPython, so I don't know whether full-blown loop detection 
belongs here. Maybe we could add a hardcoded limit like "fail if we loop more 
than X times"?

--
nosy: +oconnor663




[issue25786] contextlib.ExitStack introduces a cycle in exception __context__

2015-12-02 Thread Jack O'Connor

Changes by Jack O'Connor <oconnor...@gmail.com>:


--
nosy: +oconnor663




[issue25779] deadlock with asyncio+contextmanager+ExitStack

2015-12-02 Thread Jack O'Connor

Jack O'Connor added the comment:

Yury, can you help me understand why `hasattr("foo", "bar")` triggers the 
infinite loop there, but not `print("foo")`?

--




[issue25565] subprocess.Popen creates inheritable file descriptors on Windows, can leak to other child processes

2015-11-06 Thread Jack O'Connor

Jack O'Connor added the comment:

Definitely a duplicate, thanks for pointing me to the original. Should I mark 
it resolved, or let someone from the project do that?

--





[issue25565] subprocess.Popen creates inheritable file descriptors on Windows, can leak to other child processes

2015-11-05 Thread Jack O'Connor

New submission from Jack O'Connor:

The Windows implementation of Popen calls _make_inheritable(), which creates 
inheritable copies of Popen's file descriptors. If two Popen calls happen at 
the same time on different threads, these descriptors can leak to both child 
processes. Here's a demonstration of a deadlock caused by this bug:

https://gist.github.com/oconnor663/b1d39d58b232fc627d84

Victor Stinner also wrote up a summary of the security issues associated with 
leaking file descriptors in PEP 0446.

A workaround for this issue is to protect all Popen calls with a lock. Calls to 
wait() and communicate() don't need to be protected, so you can release the 
lock before you make those blocking calls. I don't see a way to safely use 
run() or the other convenience functions, if you're using pipes and multiple 
threads. Unfortunately close_fds=True is not allowed on Windows when any of 
stdin/stdout/stderr are set, which is going to be the case here.
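
Here's the shape of that workaround (a sketch, not a complete fix):

```
import subprocess
import threading

popen_lock = threading.Lock()

def locked_popen(*args, **kwargs):
    # Serialize Popen() itself, so the inheritable handle copies it makes
    # can't leak into a concurrently spawned child on another thread.
    with popen_lock:
        return subprocess.Popen(*args, **kwargs)

# Blocking calls like wait() and communicate() happen outside the lock:
proc = locked_popen(["cmd", "/c", "echo hi"], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
```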

Would it be feasible for Popen.__init__() to automatically protect the 
inheritable copies it creates, with a lock around that section? We already have 
the _waitpid_lock for POSIX, so it seems like thread safety is a goal.

--
components: Library (Lib)
messages: 254168
nosy: oconnor663
priority: normal
severity: normal
status: open
title: subprocess.Popen creates inheritable file descriptors on Windows, can 
leak to other child processes
type: security
versions: Python 3.5




[issue23548] TypeError in event loop finalizer, new in Python 3.4.3

2015-03-05 Thread Jack O'Connor

Jack O'Connor added the comment:

Got it, thanks for the heads up.

--




[issue23548] TypeError in event loop finalizer, new in Python 3.4.3

2015-03-04 Thread Jack O'Connor

Jack O'Connor added the comment:

@gvanrossum, the last two lines you suggested don't give any error, as 
expected. Not sure why we're getting that error message in the toy example.

@haypo, closing the event loop explicitly works fine for me. But since you 
mentioned it, I don't see any warning when I leave out closing the event loop 
and turn on PYTHONASYNCIODEBUG=1. What should I be seeing?

--




[issue23548] TypeError in event loop finalizer, new in Python 3.4.3

2015-02-28 Thread Jack O'Connor

Jack O'Connor added the comment:

`close()` fixes it; thanks for the workaround! When I throw a print statement 
inside `remove_signal_handler`, it says that sig is 17 and handler is 0. 17 
looks to be SIGCHLD, presumably from the little echo subprocess exiting in this 
example.

--




[issue23548] TypeError in event loop finalizer, new in Python 3.4.3

2015-02-28 Thread Jack O'Connor

New submission from Jack O'Connor:

This toy program:

import asyncio
@asyncio.coroutine
def main():
    p = yield from asyncio.create_subprocess_shell('echo hi')
    yield from p.wait()
asyncio.get_event_loop().run_until_complete(main())

Produces this output on Arch Linux under Python 3.4.3 (but not 3.4.2):

hi
Exception ignored in: <bound method _UnixSelectorEventLoop.__del__ of <_UnixSelectorEventLoop running=False closed=True debug=False>>
Traceback (most recent call last):
  File "/home/jacko/Downloads/Python-3.4.3/Lib/asyncio/base_events.py", line 361, in __del__
  File "/home/jacko/Downloads/Python-3.4.3/Lib/asyncio/unix_events.py", line 57, in close
  File "/home/jacko/Downloads/Python-3.4.3/Lib/asyncio/unix_events.py", line 138, in remove_signal_handler
TypeError: signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object

--
components: asyncio
messages: 236892
nosy: gvanrossum, haypo, oconnor663, yselivanov
priority: normal
severity: normal
status: open
title: TypeError in event loop finalizer, new in Python 3.4.3
type: crash
versions: Python 3.4




[issue22428] KeyboardInterrupt inside a coroutine causes AttributeError

2014-09-17 Thread Jack O'Connor

New submission from Jack O'Connor:

The following test script prints a KeyboardInterrupt traceback (expected), but 
also an AttributeError traceback (unexpected):


import asyncio
@asyncio.coroutine
def main():
    raise KeyboardInterrupt
asyncio.get_event_loop().run_until_complete(main())


Traceback (most recent call last):
  File "test.py", line 9, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "/usr/lib/python3.4/asyncio/base_events.py", line 203, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.4/asyncio/base_events.py", line 184, in run_forever
    self._run_once()
  File "/usr/lib/python3.4/asyncio/base_events.py", line 817, in _run_once
    handle._run()
  File "/usr/lib/python3.4/asyncio/events.py", line 39, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.4/asyncio/tasks.py", line 321, in _step
    result = next(coro)
  File "/usr/lib/python3.4/asyncio/tasks.py", line 103, in coro
    res = func(*args, **kw)
  File "test.py", line 6, in main
    raise KeyboardInterrupt
KeyboardInterrupt
--- Logging error ---
Traceback (most recent call last):
--- Logging error ---
Traceback (most recent call last):
Exception ignored in: <bound method Task.__del__ of Task(<coro>)<exception=KeyboardInterrupt()>>
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/futures.py", line 184, in __del__
  File "/usr/lib/python3.4/asyncio/base_events.py", line 725, in call_exception_handler
  File "/usr/lib/python3.4/logging/__init__.py", line 1296, in error
  File "/usr/lib/python3.4/logging/__init__.py", line 1402, in _log
  File "/usr/lib/python3.4/logging/__init__.py", line 1412, in handle
  File "/usr/lib/python3.4/logging/__init__.py", line 1482, in callHandlers
  File "/usr/lib/python3.4/logging/__init__.py", line 846, in handle
  File "/usr/lib/python3.4/logging/__init__.py", line 977, in emit
  File "/usr/lib/python3.4/logging/__init__.py", line 899, in handleError
  File "/usr/lib/python3.4/traceback.py", line 169, in print_exception
  File "/usr/lib/python3.4/traceback.py", line 153, in _format_exception_iter
  File "/usr/lib/python3.4/traceback.py", line 18, in _format_list_iter
  File "/usr/lib/python3.4/traceback.py", line 65, in _extract_tb_or_stack_iter
  File "/usr/lib/python3.4/linecache.py", line 15, in getline
  File "/usr/lib/python3.4/linecache.py", line 41, in getlines
  File "/usr/lib/python3.4/linecache.py", line 126, in updatecache
  File "/usr/lib/python3.4/tokenize.py", line 437, in open
AttributeError: 'module' object has no attribute 'open'


The issue is that Task._step() calls self.set_exception() for both Exceptions 
and BaseExceptions, but it reraises BaseExceptions. That means that the 
BaseEventLoop never gets to call future.result() to clear the exception when 
it's a BaseException. Future.__del__ eventually tries to log this uncleared 
exception, but something about the circumstances of program exit (caused by the 
same exception) has already cleaned up the builtins needed for logging.

Is that __del__ violating some best practices for finalizers by calling such a 
complicated function? Either way, I'm not sure this case should be considered 
an unretrieved exception at all. It's propagating outside the event loop 
after all. Should Task._step() be setting it in the first place, if it's going 
to reraise it? Should it be set without _log_traceback=True somehow?

--
components: asyncio
messages: 226985
nosy: gvanrossum, haypo, oconnor663, yselivanov
priority: normal
severity: normal
status: open
title: KeyboardInterrupt inside a coroutine causes AttributeError
type: behavior
versions: Python 3.4




[issue22427] TemporaryDirectory attempts to clean up twice

2014-09-17 Thread Jack O'Connor

Jack O'Connor added the comment:

My example script is definitely a corner case, but where this actually came up 
for me was in asyncio. Using a TemporaryDirectory inside a coroutine creates a 
similar situation.

--




[issue22427] TemporaryDirectory attempts to clean up twice

2014-09-16 Thread Jack O'Connor

New submission from Jack O'Connor:

The following little script prints (but ignores) a FileNotFoundError.

import tempfile
def generator():
    with tempfile.TemporaryDirectory():
        yield
g = generator()
next(g)

Exception ignored in: <generator object generator at 0x7fb319fe2c60>
Traceback (most recent call last):
  File "gen.py", line 6, in generator
  File "/usr/lib/python3.4/tempfile.py", line 691, in __exit__
  File "/usr/lib/python3.4/tempfile.py", line 697, in cleanup
  File "/usr/lib/python3.4/shutil.py", line 454, in rmtree
  File "/usr/lib/python3.4/shutil.py", line 452, in rmtree
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp7wek4xhy'

Putting print statements in the TemporaryDirectory class shows what's happening 
(confirming Guido's theory from 
https://groups.google.com/forum/#!topic/python-tulip/QXgWH32P2uM): As the 
program exits, the TemporaryDirectory object is finalized. This actually 
rmtree's the directory. After that, the generator is finalized, which raises a 
GeneratorExit inside of it. That exception causes the with statement to call 
__exit__ on the already-finalized TemporaryDirectory, which tries to rmtree 
again and throws the FileNotFoundError.

The main downside of this bug is just garbage on stderr. I suppose in 
exceptionally unlikely circumstances, a new temp dir by the same name could be 
created between the two calls, and we might actually delete something we 
shouldn't. The simple fix would be to store a _was_cleaned flag or something on 
the object. Is there any black magic in finalizers or multithreading that 
demands something more complicated?

--
components: Library (Lib)
messages: 226979
nosy: oconnor663
priority: normal
severity: normal
status: open
title: TemporaryDirectory attempts to clean up twice
type: behavior
versions: Python 3.4




[issue22279] read() vs read1() in asyncio.StreamReader documentation

2014-08-27 Thread Jack O'Connor

Jack O'Connor added the comment:

Agreed that changing read() would probably break tons of people. I don't think 
a naming inconsistency meets the "serious flaws are uncovered" bar for breaking 
a provisional package. If we actually prefer the asyncio way of doing things, 
all the better.

That said, another thing I'm noticing is that in asyncio, read is basically two 
different functions. This is clear in the code, 
http://hg.python.org/cpython/file/fb3aee1cff59/Lib/asyncio/streams.py#l433, 
where the n0 case goes off on its own branch and never comes back. 
(Incidentally there's another n0 check at line 453 there that I think always 
returns false.) We have a read function that makes very different guarantees 
depending on the value of n. Contrast this with the read function from regular 
io, where read(n) and read() are effectively the same if n is large enough. 
Maybe just another point that's worth clarifying in the docs.

Thanks for the quick replies!

--




[issue22279] read() vs read1() in asyncio.StreamReader documentation

2014-08-26 Thread Jack O'Connor

New submission from Jack O'Connor:

BufferedIOBase and related classes have a read(n) and read1(n). The first will 
wait until n bytes are available (or EOF), while the second will return as soon 
as any bytes are available. In asyncio.StreamReader, there is no read1 method, 
but the read method seems to behave like BufferedIOBase.read1, and there is a 
readexactly method that behaves like BufferedIOBase.read.
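
In code, the correspondence looks like this (inside a coroutine, given an asyncio.StreamReader named reader):

```
# Like BufferedIOBase.read1(100): returns as soon as any bytes arrive.
chunk = yield from reader.read(100)

# Like BufferedIOBase.read(100): waits for all 100 bytes, or raises
# IncompleteReadError at EOF.
block = yield from reader.readexactly(100)

# And read with a negative n reads everything until EOF.
everything = yield from reader.read(-1)
```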

Is there a reason for this naming change? Either way, could the documentation 
for asyncio.StreamReader.read be clarified?

--
components: asyncio
messages: 225924
nosy: gvanrossum, haypo, oconnor663, yselivanov
priority: normal
severity: normal
status: open
title: read() vs read1() in asyncio.StreamReader documentation
type: behavior
versions: Python 3.4
