[issue43451] pydoc terminal suboptimal rendering of complex annotations

2021-03-09 Thread David Wilson


Change by David Wilson :


--
keywords: +patch
pull_requests: +23570
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24802

___
Python tracker 
<https://bugs.python.org/issue43451>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43451] pydoc terminal suboptimal rendering of complex annotations

2021-03-09 Thread David Wilson


New submission from David Wilson :

When viewing docs for classes that use annotations, pydoc's rendering of 
argument lists is regularly truncated at the terminal edge (if using `less 
-S`), or wrapped in a manner where quickly scanning the output is next to 
impossible.

My 'classic' example is 'pydoc aiohttp.client.ClientSession', where the 
__init__ argument list wraps to 24 lines.

The pull request attached to this issue works around the problem by formatting 
arguments one-per-line when the combined width of the arguments would exceed a 
hard-coded limit of 150 characters. It is more of an RFC than a suggested fix. 
It produces acceptable formatting, but I'm not sure where the correct fix 
should be made -- in the inspect module, or somehow in the pydoc module -- or 
even what the correct fix should be.
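A minimal sketch of that workaround (a hypothetical helper, not pydoc's actual code; the 150-column threshold mirrors the attached PR): fall back to one parameter per line only when the single-line rendering is too wide. The multi-line form drops the return annotation for brevity.

```python
import inspect

def format_args(sig: inspect.Signature, width: int = 150) -> str:
    """Render a signature one-parameter-per-line when the single-line
    form would exceed `width` columns."""
    text = str(sig)
    if len(text) <= width:
        return text
    params = ",\n    ".join(str(p) for p in sig.parameters.values())
    return "(\n    %s\n)" % params

def f(a: int, b: str = "x", c: float = 1.0) -> None:
    pass

sig = inspect.signature(f)
print(format_args(sig))            # fits in 150 columns: one line
print(format_args(sig, width=10))  # too wide: one parameter per line
```

The open question in the message above still applies: this could live in pydoc's renderer, or inspect.Signature could grow a width-aware formatter.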

I will attach before/after screenshots to the pull request.

--
components: Library (Lib)
messages: 388373
nosy: dw
priority: normal
severity: normal
status: open
title: pydoc terminal suboptimal rendering of complex annotations
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue43451>
___



[issue37696] FileIO.read() on a closed TTY throws an exception prematurely

2019-08-08 Thread David Wilson


David Wilson  added the comment:

A real example of where returning the partial buffer is dangerous would be 
EBADF.

- Repeated reading succeeds, building up a partial buffer. Another thread runs, 
and unrelated buggy code causes the fd belonging to the file object to be closed.

- Original thread resumes, calls read(), which now returns EBADF.

If the partial buffer is returned and EBADF is masked, and the user only ever 
calls readall() once, a potentially deadly IO corruption bug is completely 
hidden in their code.

I think the correct behaviour in the case of a 'bad' errno must remain that the 
partial buffer is discarded; the interface does not seem to make room for any 
safer option.


So I think to reach the desired outcome of this ticket, the suggested approach 
is to add special handling for a small list of errnos generally accepted to 
unambiguously mean EOF, and in that special case, allow returning the 'partial' 
(actually complete) buffer.
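A sketch of those semantics at the os level (not the actual fileio.c change; the errno whitelist here is an assumption for illustration):

```python
import errno
import os

# Candidate whitelist of errnos generally accepted to unambiguously
# mean EOF (assumption: EIO covers pty hangup, EPIPE and ECONNRESET
# cover pipes and sockets).
_EOF_ERRNOS = {errno.EIO, errno.EPIPE, errno.ECONNRESET}

def readall(fd, chunk=4096):
    """Read until EOF. An 'EOF-like' errno terminates the read and the
    accumulated buffer is returned; any other error still discards the
    buffer and propagates."""
    buf = bytearray()
    while True:
        try:
            data = os.read(fd, chunk)
        except InterruptedError:
            continue                  # EINTR: retry, as fileio.c does
        except OSError as exc:
            if exc.errno in _EOF_ERRNOS:
                return bytes(buf)     # treat as EOF: keep partial data
            raise                     # 'bad' errno (e.g. EBADF): propagate
        if not data:                  # clean EOF
            return bytes(buf)
        buf += data
```

With this shape, the EBADF scenario above still raises, while a pty whose slave has hung up returns the buffered bytes.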

--

___
Python tracker 
<https://bugs.python.org/issue37696>
___



[issue37696] FileIO.read() on a closed TTY throws an exception prematurely

2019-08-08 Thread David Wilson


David Wilson  added the comment:

If we treat different errnos specially, the list of 'okay to silently fail' 
errors seems quite succinct. In another project I treat EIO, EPIPE and 
ECONNRESET as EOF, and raise all others -- 
https://github.com/dw/mitogen/blob/c6de090f083a58344e91ab97847bf7ae3feb9134/mitogen/core.py#L501-L532

But even in this case, if readall() returns the partial buffer given a 'good' 
errno, does it still discard the partial buffer given a 'bad' one? That still 
doesn't feel right.

--

___
Python tracker 
<https://bugs.python.org/issue37696>
___



[issue37696] FileIO.read() on a closed TTY throws an exception prematurely

2019-08-08 Thread David Wilson


David Wilson  added the comment:

Interesting, this immediately turns into a little rabbit hole :)

The reason read() is failing in this case is that Argument Clinic defaults 
the size parameter to -1, which redirects the call to readall(). So this issue 
is actually about readall().

Calling read(8) in the previous reproduction returns the buffer up to the point 
of EIO as expected.

readall() is somewhat awkward. It is not really expected that a user would call 
it twice, and so the old semantic of the second read() returning EIO doesn't 
seem to apply cleanly.

So there are at least three issues:

- readall() is discarding the partial buffer, which seems unavoidably like a bug.
- readall() does not intuitively feel like a function you might want to call 
twice.
- nothing in fileio.c or fileutils.c makes any attempt to understand errno 
except for handling EINTR.

I'm guessing the 2.x behaviour in this case was that no parsing of errno was 
done either: any error was silently discarded when a partial buffer existed.

But that leaves the awkward possibility that some real scary error occurred, 
for example EBADF or EFAULT, and the single call by the user to readall() never 
flagged it.

Opinions?

--

___
Python tracker 
<https://bugs.python.org/issue37696>
___



[issue37696] FileIO.read() on a closed TTY throws an exception prematurely

2019-07-28 Thread David Wilson


David Wilson  added the comment:

Happy to send a patch for this if we can agree on the semantic being incorrect, 
and more importantly, someone is happy to review the patch once it reaches 
GitHub ;)

--

___
Python tracker 
<https://bugs.python.org/issue37696>
___



[issue37696] FileIO.read() on a closed TTY throws an exception prematurely

2019-07-28 Thread David Wilson


New submission from David Wilson :

Given:

$ cat tty-failure.py 

import pty
import os

master, slave = pty.openpty()
master = os.fdopen(master, 'r+b', 0)
slave = os.fdopen(slave, 'r+b', 0)
slave.write(b'foo')
slave.close()
print(master.read())

On Python 2, read() would return b'foo', with subsequent calls raising IOError, 
whereas on Python 3 an OSError is raised due to the underlying file descriptor 
returning EIO.

In the case of a PTY, EIO indicates the remote side has hung up and more or 
less can be treated as an EOF indicator.

On Python 3 the partial buffer should not be discarded when a subsequent read() 
syscall returns an error.
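For illustration, a userspace loop that restores the Python 2 semantics on top of os.read(); EIO-on-hangup is Linux pty behaviour, and some platforms report a plain EOF instead:

```python
import errno
import os
import pty

master, slave = pty.openpty()
os.write(slave, b'foo')
os.close(slave)                 # remote side hangs up

buf = b''
try:
    while True:
        data = os.read(master, 4096)
        if not data:            # some platforms report plain EOF
            break
        buf += data
except OSError as exc:
    if exc.errno != errno.EIO:  # Linux reports pty hangup as EIO
        raise
os.close(master)
print(buf)                      # the data written before hangup survives
```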

Secondarily, the change from IOError to OSError looks wrong. Does anyone know 
what's going on there? I would never expect to see OSError raised by a builtin.

--
components: IO
messages: 348578
nosy: dw
priority: normal
severity: normal
status: open
title: FileIO.read() on a closed TTY throws an exception prematurely
type: behavior
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue37696>
___



[issue36279] os.wait3() leaks some uninitialized stack when no processes exist

2019-07-26 Thread David Wilson


David Wilson  added the comment:

The original diff is attached here (per the old process) so others can find it, 
and the PR+fork are closed, as carrying a fork in my GitHub for 4 months has 
non-zero cost. I'm presently more interested in having a clean GH account than 
carrying around the baggage of forgotten old patches.

--
resolution:  -> out of date
stage: patch review -> resolved
status: open -> closed
Added file: 
https://bugs.python.org/file48507/c805d2764e6a0ee4d22a338c5f4fef6154df8687.diff

___
Python tracker 
<https://bugs.python.org/issue36279>
___



[issue36279] os.wait3() leaks some uninitialized stack when no processes exist

2019-03-12 Thread David Wilson


New submission from David Wilson :

Not sure if this is worth reporting...

p = os.popen('sleep 1234')
os.wait3(os.WNOHANG)
os.wait3(os.WNOHANG)
os.wait3(os.WNOHANG)

Notice the struct rusage return value. When wait3() succeeds on Linux but no 
child was waiting to be reaped, the rusage struct is not updated by the kernel, 
and is therefore passed back to the user uninitialized, essentially leaking a 
little bit of stack memory.
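An expanded version of the reproduction above, runnable as a script (assumes a POSIX `sleep` binary; patched interpreters zero-fill the struct when no child is reaped):

```python
import os
import subprocess

# Keep one running (unreaped) child around so that wait3() succeeds
# with pid == 0 rather than raising ChildProcessError.
child = subprocess.Popen(['sleep', '30'])
pid, status, rusage = os.wait3(os.WNOHANG)
assert pid == 0                 # no child was actually reaped
# On affected builds `rusage` now echoes uninitialized C stack memory;
# a fixed interpreter zero-fills the struct before the syscall.
print(rusage.ru_maxrss)
child.kill()
child.wait()
```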

--
messages: 337833
nosy: dw
priority: normal
severity: normal
status: open
title: os.wait3() leaks some uninitialized stack when no processes exist

___
Python tracker 
<https://bugs.python.org/issue36279>
___



[issue35486] subprocess module import hooks breaks back compatibility

2019-01-08 Thread David Wilson


David Wilson  added the comment:

Hi Nick,

The purpose of ModuleNotFoundError is clear and unrelated to the problem 
documented here. The problem is that due to the introduction of 
ModuleNotFoundError, ImportError's semantics have been changed within a minor 
release in a breaking, backwards-incompatible manner.

As of Python 3.5, ImportError meant both "ImportError" and 
"ModuleNotFoundError". As of 3.6, the senses are distinct, and thus code 
written against ImportError as it existed in 3.5 no longer works correctly as 
of 3.6.
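A small demonstration of the incompatibility, using a hypothetical finder written against the 3.5-era contract (the module name `lazy_pkg` is made up):

```python
import sys

class LegacyFinder:
    """Hypothetical import hook written against PEP 302/451, which
    document ImportError as the failure exception."""
    def find_spec(self, name, path=None, target=None):
        if name == "lazy_pkg":          # made-up module name
            raise ImportError("cannot satisfy import")
        return None

sys.meta_path.insert(0, LegacyFinder())
try:
    try:
        import lazy_pkg
    except ModuleNotFoundError:
        outcome = "subclass"            # never reached: the hook raised
                                        # the parent class
    except ImportError:
        outcome = "parent"
finally:
    sys.meta_path.pop(0)

print(outcome)                          # "parent" on 3.6+
```

Code narrowed to `except ModuleNotFoundError:` silently misses the hook's plain ImportError, which is the breakage described above.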

--

___
Python tracker 
<https://bugs.python.org/issue35486>
___



[issue35486] subprocess module breaks backwards compatibility with import hooks

2018-12-13 Thread David Wilson


David Wilson  added the comment:

Having considered this for a few hours, it seems the following is clear:

- 3.6 introduces ModuleNotFoundError
- 3.7 begins using it within importlib
- 3.8 contains the first instance of stdlib code catching it

- PEP-302 and PEP-451 are the definitive specifications for how sys.meta_path 
is supposed to work, and neither makes mention of ModuleNotFoundError. It seems 
at least this should have been flagged during review of the original change, 
but apparently the name of the exception was more important.

- The recent work done to move most of the import machinery onto sys.meta_path 
has exposed a set of import hooks that don't quite comply with the documented 
import hook interface.

- The newly advertised ModuleNotFoundError appearing in stack traces for 
everyone means that more people will continue to write new cases of "except 
ModuleNotFoundError:", which, while following best practice (catch the most 
specific relevant exception), at present amounts to relying on an 
implementation detail of the default importer. A GitHub search reveals this to 
be an accurate reading: 
https://github.com/search?q=%22except+ModuleNotFoundError%22=Code

Therefore we are in a position where:

- Natural developer habit will cause much more protocol-violating code to exist 
over time; there is no way to stop this process.
- New import hooks written against the existing documentation will break in the 
face of developer habit.
- Existing import hooks were broken in Python 3.6, and this is not documented 
anywhere.
- Python's own import machinery contravenes its specification.

Options:

- Updating PEP-302 to mention the introduction of the new exception type.
- Updating the ModuleNotFoundError/ImportError documentation to somehow capture 
the compatibility issue.

--

___
Python tracker 
<https://bugs.python.org/issue35486>
___



[issue35486] subprocess module breaks backwards compatibility with import hooks

2018-12-13 Thread David Wilson


Change by David Wilson :


--
title: subprocess module breaks backwards compatibility with older import hooks 
-> subprocess module breaks backwards compatibility with import hooks

___
Python tracker 
<https://bugs.python.org/issue35486>
___



[issue35486] subprocess module breaks backwards compatibility with older import hooks

2018-12-13 Thread David Wilson


David Wilson  added the comment:

Corrected GitHub link for the commit: 
https://github.com/python/cpython/commit/880d42a3b24

--

___
Python tracker 
<https://bugs.python.org/issue35486>
___



[issue35486] subprocess module breaks backwards compatibility with older import hooks

2018-12-13 Thread David Wilson


New submission from David Wilson :

The subprocess package since 880d42a3b24 / September 2018 has begun using this 
idiom:

try:
    import _foo
except ModuleNotFoundError:
    bar

However, ModuleNotFoundError may not be thrown by older import hook 
implementations, since that subclass was only introduced in Python 3.6 -- and 
so the except clause above will never match. PEP-302 continues to document 
ImportError as the appropriate exception that should be raised.

https://mitogen.readthedocs.io/en/stable/ is one such import hook that lazily 
loads packages over the network when they aren't available locally. Current 
Python subprocess master breaks with Mitogen because when it discovers the 
master cannot satisfy the import, it throws ImportError.

The new exception subtype was introduced in https://bugs.python.org/issue15767 
, however very little in the way of rationale was included, and so it's unclear 
to me what need the new subtype is addressing, whether this is a problem with 
the subprocess module or the subtype as a whole, or indeed whether any of this 
should be considered a bug.

It seems clear that some kind of regression is in the process of occurring 
during a minor release, and it also seems clear the new subtype will 
potentially spawn a whole variety of similar new regressions.

I will be updating Mitogen to throw the new subtype if it is available, but I 
felt it was important to report the regression to see what others think.

--
components: Library (Lib)
messages: 331774
nosy: dw
priority: normal
severity: normal
status: open
title: subprocess module breaks backwards compatibility with older import hooks
versions: Python 3.8

___
Python tracker 
<https://bugs.python.org/issue35486>
___



[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-21 Thread David Wilson

David Wilson added the comment:

@Julian: note that ARENA_SIZE is double the threshold after which at least 
glibc resorts to calling mmap directly, so using malloc in place of mmap, on 
Linux at least, would have zero effect.
--
nosy: +dw

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26601>
___



[issue22003] BytesIO copy-on-write

2015-03-04 Thread David Wilson

David Wilson added the comment:

Hi Piotr,

There wasn't an obvious fix that didn't involve changing the buffer interface 
itself. There is presently ambiguity in the interface regarding the difference 
between a read-only buffer and an immutable buffer, which is crucial for 
its use in this case.

Fixing the interface, followed by every buffer interface user, is a 
significantly more complicated task than simply optimizing for the most common 
case, as done here. FWIW I still think this work is worth doing, though I 
personally don't have time to approach it just now.

We could (and possibly should) approach fixing e.g. mmap.mmap() hashability, 
possibly causing user code regressions, but even if such cases were fixed it 
still wouldn't be enough to rely on for the optimization implemented here.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22003
___



[issue22003] BytesIO copy-on-write

2015-02-09 Thread David Wilson

David Wilson added the comment:

Attached trivial patch for whatsnew.rst.

--
Added file: http://bugs.python.org/file38058/whatsnew.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22003
___



[issue14099] ZipFile.open() should not reopen the underlying file

2014-12-02 Thread David Wilson

David Wilson added the comment:

Could we also make a small tweak to zipfile.rst indicating the new behaviour? I 
had made an initial attempt in my patch but wasn't particularly happy with the 
wording.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14099
___



[issue14099] ZipFile.open() should not reopen the underlying file

2014-12-02 Thread David Wilson

David Wilson added the comment:

Sounds great :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14099
___



[issue14099] ZipFile.open() should not reopen the underlying file

2014-11-21 Thread David Wilson

David Wilson added the comment:

While in spirit this is a bug fix, it's reasonably complex and affects a 
popular module -- I'm not sure it should be applied to 2.x, and probably not in 
a minor release of 3.x either. Would it make sense to include as part of 3.5?

(That said, I'd love to see this fixed in 2.x ;))

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14099
___



[issue14099] ZipFile.open() should not reopen the underlying file

2014-11-14 Thread David Wilson

David Wilson added the comment:

Hi Serhiy,

Thanks for the new patch, it looks better than my attempt. :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14099
___



[issue14099] ZipFile.open() should not reopen the underlying file

2014-11-13 Thread David Wilson

David Wilson added the comment:

Per my comment on issue16569, the overhead of performing one seek before each 
(raw file data) read is quite minimal. I have attached a new (but incomplete) 
patch, on which the following microbenchmarks are based. The patch is 
essentially identical to Stepan's 2012 patch, except I haven't yet decided how 
best to preserve the semantics of ZipFile.close().

my.zip is the same my.zip from issue22842. It contains 10,000 files each 
containing 10 bytes over 2 lines.

my2.zip contains 8,000 files each containing the same copy of 64kb of 
/dev/urandom output. The resulting ZIP is 500mb.

For each test, the first run is the existing zipfile module, and the second run 
is with the patch. In summary:

* There is a 35% perf increase in str mode when handling many small files (on 
OS X at least)
* There is a 6% perf decrease in file mode when handling small sequential reads.
* There is a 2.4% perf decrease in file mode when handling large sequential 
reads.


From my reading of zipfile.py, it is clear there are _many_ ways to improve 
its performance (probably starting with readline()), and rejection of a 
functional fix should almost certainly be at the bottom of that list.


For each of the tests below, the functions used were:

def a():
    """Test concurrent line reads to a str mode ZipFile."""
    zf = zipfile.ZipFile('my2.zip')
    members = [zf.open(n) for n in zf.namelist()]
    for m in members:
        m.readline()
    for m in members:
        m.readline()

def c():
    """Test sequential small reads to a str mode ZipFile."""
    zf = zipfile.ZipFile('my2.zip')
    for name in zf.namelist():
        with zf.open(name) as zfp:
            zfp.read(1000)

def d():
    """Test sequential small reads to a file mode ZipFile."""
    fp = open('my2.zip', 'rb')
    zf = zipfile.ZipFile(fp)
    for name in zf.namelist():
        with zf.open(name) as zfp:
            zfp.read(1000)

def e():
    """Test sequential large reads to a file mode ZipFile."""
    fp = open('my2.zip', 'rb')
    zf = zipfile.ZipFile(fp)
    for name in zf.namelist():
        with zf.open(name) as zfp:
            zfp.read()


 my.zip 

$ python3.4 -m timeit -s 'import my' 'my.a()'
10 loops, best of 3: 1.47 sec per loop

$ python3.4 -m timeit -s 'import my' 'my.a()'
10 loops, best of 3: 950 msec per loop

---

$ python3.4 -m timeit -s 'import my' 'my.c()'
10 loops, best of 3: 1.3 sec per loop

$ python3.4 -m timeit -s 'import my' 'my.c()'
10 loops, best of 3: 865 msec per loop

---

$ python3.4 -m timeit -s 'import my' 'my.d()'
10 loops, best of 3: 800 msec per loop

$ python3.4 -m timeit -s 'import my' 'my.d()'
10 loops, best of 3: 851 msec per loop


 my2.zip 

$ python3.4 -m timeit -s 'import my' 'my.a()'
10 loops, best of 3: 1.46 sec per loop

$ python3.4 -m timeit -s 'import my' 'my.a()'
10 loops, best of 3: 1.16 sec per loop

---

$ python3.4 -m timeit -s 'import my' 'my.c()'
10 loops, best of 3: 1.13 sec per loop

$ python3.4 -m timeit -s 'import my' 'my.c()'
10 loops, best of 3: 892 msec per loop

---

$ python3.4 -m timeit -s 'import my' 'my.d()'
10 loops, best of 3: 842 msec per loop

$ python3.4 -m timeit -s 'import my' 'my.d()'
10 loops, best of 3: 882 msec per loop

---

$ python3.4 -m timeit -s 'import my' 'my.e()'
10 loops, best of 3: 1.65 sec per loop

$ python3.4 -m timeit -s 'import my' 'my.e()'
10 loops, best of 3: 1.69 sec per loop

--
nosy: +dw
Added file: http://bugs.python.org/file37191/zipfile_concurrent_read_1.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14099
___



[issue22842] zipfile simultaneous open broken and/or needlessly(?) consumes unreasonable number of file descriptors

2014-11-10 Thread David Wilson

New submission from David Wilson:

There is some really funky behaviour in the zipfile module, where, depending on 
whether zipfile.ZipFile() is passed a string filename or a file-like object, 
one of two things happens:

a) Given a file-like object, zipfile does not (since it cannot) consume excess 
file descriptors on each call to '.open()', however simultaneous calls to 
.open() the zip file's members (from the same thread) will produce file-like 
objects for each member that appear intertwingled in some unfortunate manner:

Traceback (most recent call last):
  File "my.py", line 23, in <module>
    b()
  File "my.py", line 18, in b
    m.readline()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 689, in readline
    return io.BufferedIOBase.readline(self, limit)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 727, in peek
    chunk = self.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 763, in read
    data = self._read1(n)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 839, in _read1
    data = self._decompressor.decompress(data, n)
zlib.error: Error -3 while decompressing data: invalid stored block lengths



b) Given a string filename, simultaneous use of .open() produces a new file 
descriptor for each opened member, which does not result in the above error, 
but triggers an even worse one: file descriptor exhaustion given a sufficiently 
large zip file.


This tripped me up rather badly last week during consulting work, and I'd like 
to see both these behaviours fixed somehow. The ticket is more an RFC to see if 
anyone has thoughts on how this fix should happen; it seems to me a no-brainer 
that, since the ZIP file format fundamentally always requires a seekable file, 
in both the "constructed from a file-like object" case and the "constructed 
from a filename" case we should somehow reuse the sole file object passed to us 
to satisfy all reads of compressed member data.

It seems the problems can be fixed in both cases without damaging interface 
semantics by simply tracking the expected 'current' read offset in each 
ZipExtFile instance, and calling .seek() on the file object before performing 
any .read().
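The idea can be sketched with a hypothetical reader class (not the actual zipfile patch): each reader remembers its own absolute offset and re-seeks the shared file object before every read.

```python
import io

class SharedFileReader(io.RawIOBase):
    """Several readers share one underlying seekable file object, each
    tracking its own logical offset and re-seeking before every read.
    Not thread-safe, matching the current ZipExtFile semantics."""
    def __init__(self, fileobj, start, length):
        self._f = fileobj
        self._pos = start          # this reader's current absolute offset
        self._end = start + length
    def read(self, n=-1):
        remaining = self._end - self._pos
        if n < 0 or n > remaining:
            n = remaining
        self._f.seek(self._pos)    # restore our position before reading
        data = self._f.read(n)
        self._pos += len(data)
        return data

fp = io.BytesIO(b'aaabbbccc')
r1 = SharedFileReader(fp, 0, 3)    # first member's data
r2 = SharedFileReader(fp, 3, 3)    # second member's data
assert r1.read(2) == b'aa'         # interleaved reads stay independent
assert r2.read(2) == b'bb'
assert r1.read() == b'a'
```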

Of course the result would not be thread safe, but at least in the current 
code, a ZipExtFile from a "constructed from a file-like object" zipfile is 
already not thread-safe. With some additional work, we could make the module 
thread-safe in both cases, however this is not the current semantic and doesn't 
appear to be guaranteed by the module documentation.

---

Finally as to why you'd want to simultaneously open huge numbers of ZIP 
members, well, ZIP itself easily supports streamy reads, and ZIP files can be 
quite large, even larger than RAM. So it should be possible, as I needed last 
week, to read streamily from a large number of members.

---

The attached my.zip is sufficient to demonstrate both problems.

The attached my.py has function a() to demonstrate the FD leak and b() to 
demonstrate the intertwingled state.

--
components: Library (Lib)
files: mymy.zip
messages: 230987
nosy: dw
priority: normal
severity: normal
status: open
title: zipfile simultaneous open broken and/or needlessly(?) consumes 
unreasonable number of file descriptors
versions: Python 3.4
Added file: http://bugs.python.org/file37171/mymy.zip

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22842
___



[issue22842] zipfile simultaneous open broken and/or needlessly(?) consumes unreasonable number of file descriptors

2014-11-10 Thread David Wilson

David Wilson added the comment:

As a random side-note, this is another case where I really wish Python had a 
.pread() function. It's uniquely valuable for coordinating positioned reads in 
a threaded app without synchronization (at user level anyway) or extraneous 
system calls.
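For what it's worth, the os module has exposed exactly this at the file-descriptor level since Python 3.3, just not on file objects:

```python
import os
import tempfile

# os.pread(fd, n, offset) reads at an absolute offset without moving
# the fd's shared file position, so concurrent positioned reads need
# no locking around seek()+read() pairs.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'aaabbbccc')
    assert os.pread(fd, 3, 3) == b'bbb'   # middle third
    assert os.pread(fd, 3, 0) == b'aaa'   # earlier offset, unaffected
finally:
    os.close(fd)
    os.unlink(path)
```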

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22842
___



[issue16569] Preventing errors of simultaneous access in zipfile

2014-11-10 Thread David Wilson

David Wilson added the comment:

Compared to the cost of everything else ZipExtFile must do (e.g. 4kb string 
concatenation in a loop, zlib), it's surprising that lseek() would be 
measurable at all.

The attached file 'patch' is the minimal change I tested. It represents, in 
terms of computation and system call overhead, all that is required to 
implement the seek-before-read solution to simultaneous access. On OSX, 
churning over every member of every ZIP in my downloads directory (about 400mb 
worth), this change results in around 0.9% overhead compared to the original 
module.

Consequently I'm strongly against the patch here. It is in effect papering over 
an implementation deficiency of the current zipfile module, one that could 
easily and cheaply be addressed.

(My comment on this ticket is in the context of the now-marked-duplicate 
issue22842).

--
nosy: +dw
Added file: http://bugs.python.org/file37172/patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16569
___



[issue15381] Optimize BytesIO to do less reallocations when written, similarly to StringIO

2014-08-13 Thread David Wilson

David Wilson added the comment:

Hey Serhiy,

The implementation for your readline optimization seems less contentious (and 
less risky) than the remainder of the patch -- it could perhaps be easily split 
off into a separate patch, which may be far more easily committed.

I love the concept of this patch, although from my last reading (weeks ago), 
it's slightly scary that it relies on Py_REFCNT() to know whether to mutate a 
string or not. In principle this should never break, in practice, however, it 
is uncertain that there are no strange edge cases that aren't immediately 
obvious.

The _PyBytes_Resize doc is quite clear: "Only use this to build up a brand new 
bytes object; don’t use this if the bytes may already be known in other parts 
of the code."

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15381
___



[issue22125] Cure signedness warnings introduced by #22003

2014-08-02 Thread David Wilson

New submission from David Wilson:

The attached patch (hopefully) silences the signedness warnings generated by 
Visual Studio and reported on python-dev in 
https://mail.python.org/pipermail/python-dev/2014-July/135603.html. 

This was sloppiness on my part; I even noted the problem in the original ticket 
and never fixed it. :)

I don't have a local dev environment setup for MSVC and Python, but at least 
the attached patch cures the signedness errors on Clang. They don't seem to 
occur at all with GCC on my Mac.

The added casts ensure comparisons uniformly compare in the unsigned domain. It 
seems size_t buf_size is pretty redundant in the original struct; it just 
introduces lots of casting when it only appears to be required during 
write_bytes() to avoid signed overflow (undefined behaviour).

--
components: Library (Lib)
files: cow-sign.patch
keywords: patch
messages: 224593
nosy: dw, pitrou, zach.ware
priority: normal
severity: normal
status: open
title: Cure signedness warnings introduced by #22003
type: compile error
versions: Python 3.5
Added file: http://bugs.python.org/file36217/cow-sign.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22125
___



[issue22003] BytesIO copy-on-write

2014-07-29 Thread David Wilson

David Wilson added the comment:

I suspect it's all covered now, but is there anything else I can help with to 
get this patch pushed along its merry way?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22003
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22003] BytesIO copy-on-write

2014-07-28 Thread David Wilson

David Wilson added the comment:

Newest patch incorporates Antoine's review comments. The final benchmark 
results are below. Just curious: what causes e.g. telco to differ up to 7% 
between runs? That's really huge.

Report on Linux k2 3.14-1-amd64 #1 SMP Debian 3.14.9-1 (2014-06-30) x86_64
Total CPU cores: 4

### call_method_slots ###
Min: 0.329869 - 0.340487: 1.03x slower
Avg: 0.330512 - 0.341786: 1.03x slower
Significant (t=-216.69)
Stddev: 0.00067 - 0.00060: 1.x smaller

### call_method_unknown ###
Min: 0.351167 - 0.343961: 1.02x faster
Avg: 0.351731 - 0.344580: 1.02x faster
Significant (t=238.89)
Stddev: 0.00033 - 0.00040: 1.2271x larger

### call_simple ###
Min: 0.257487 - 0.277366: 1.08x slower
Avg: 0.257942 - 0.277809: 1.08x slower
Significant (t=-845.64)
Stddev: 0.00029 - 0.00029: 1.0126x smaller

### etree_generate ###
Min: 0.377985 - 0.365952: 1.03x faster
Avg: 0.381797 - 0.369452: 1.03x faster
Significant (t=31.15)
Stddev: 0.00314 - 0.00241: 1.3017x smaller

### etree_iterparse ###
Min: 0.545668 - 0.565437: 1.04x slower
Avg: 0.554188 - 0.576807: 1.04x slower
Significant (t=-17.00)
Stddev: 0.00925 - 0.00956: 1.0340x larger

### etree_process ###
Min: 0.294158 - 0.286617: 1.03x faster
Avg: 0.296354 - 0.288877: 1.03x faster
Significant (t=36.22)
Stddev: 0.00149 - 0.00143: 1.0435x smaller

### fastpickle ###
Min: 0.458961 - 0.475828: 1.04x slower
Avg: 0.460226 - 0.481228: 1.05x slower
Significant (t=-109.38)
Stddev: 0.00082 - 0.00173: 2.1051x larger

### nqueens ###
Min: 0.305883 - 0.295858: 1.03x faster
Avg: 0.308085 - 0.297755: 1.03x faster
Significant (t=90.22)
Stddev: 0.00077 - 0.00085: 1.0942x larger

### silent_logging ###
Min: 0.074152 - 0.075818: 1.02x slower
Avg: 0.074345 - 0.076005: 1.02x slower
Significant (t=-96.29)
Stddev: 0.00013 - 0.00012: 1.0975x smaller

### spectral_norm ###
Min: 0.355738 - 0.364419: 1.02x slower
Avg: 0.356691 - 0.365764: 1.03x slower
Significant (t=-126.23)
Stddev: 0.00054 - 0.00047: 1.1533x smaller

### telco ###
Min: 0.012152 - 0.013038: 1.07x slower
Avg: 0.012264 - 0.013157: 1.07x slower
Significant (t=-83.98)
Stddev: 0.8 - 0.7: 1.0653x smaller

The following not significant results are hidden, use -v to show them:
2to3, call_method, chaos, django_v2, etree_parse, fannkuch, fastunpickle, 
float, formatted_logging, go, hexiom2, iterative_count, json_dump, 
json_dump_v2, json_load, mako, mako_v2, meteor_contest, nbody, normal_startup, 
pathlib, pickle_dict, pickle_list, pidigits, raytrace, regex_compile, 
regex_effbot, regex_v8, richards, simple_logging, startup_nosite, 
threaded_count, tornado_http, unpack_sequence, unpickle_list.

--
Added file: http://bugs.python.org/file36137/cow6.patch




[issue22003] BytesIO copy-on-write

2014-07-27 Thread David Wilson

David Wilson added the comment:

Hey Antoine,

Thanks for the link. I'm having trouble getting reproducible results at 
present, and running out of ideas as to what might be causing it. Even after 
totally isolating a CPU for e.g. django_v2 and with frequency scaling disabled, 
numbers still jump around for the same binary by as much as 3%.

I could not detect any significant change between runs of the old and new 
binary that could not be described as noise, given the isolation issues above.

--




[issue22003] BytesIO copy-on-write

2014-07-24 Thread David Wilson

David Wilson added the comment:

This new patch abandons the buffer interface and specializes for Bytes per the 
comments on this issue.

Anyone care to glance at least at the general structure?

Tests could probably use a little more work.

Microbenchmark seems fine, at least for construction. It doesn't seem likely 
this patch would introduce severe performance troubles elsewhere, but I'd like 
to trying it out with some example heavy BytesIO consumers (any suggestions? 
Some popular template engine?)

cpython] ./python.exe -m timeit -s 'import i' 'i.readlines()'
lines: 54471
100 loops, best of 3: 13.3 msec per loop

[23:52:55 eldil!58 cpython] ./python-nocow -m timeit -s 'import i' 
'i.readlines()'
lines: 54471
10 loops, best of 3: 19.6 msec per loop

[23:52:59 eldil!59 cpython] cat i.py
import io
word = b'word'
line = (word * int(79/len(word))) + b'\n'
ar = line * int((4 * 1048576) / len(line))
def readlines():
    return len(list(io.BytesIO(ar)))
print('lines: %s' % (readlines(),))

--
Added file: http://bugs.python.org/file36078/cow5.patch




[issue22003] BytesIO copy-on-write

2014-07-22 Thread David Wilson

David Wilson added the comment:

Stefan, I like your new idea. If there isn't some backwards compatibility 
argument about mmap.mmap being hashable, then it could be considered a bug, and 
fixed in the same hypothetical future release that includes this BytesIO 
change. The only cost now is that to test for hashability, we must hash the 
object, which causes every byte in it to be touched (i.e. almost 50% of the 
cost of a copy).

If however we can't fix mmap.mmap due to the interface change (I think that's a 
silly idea -- Python has never been about letting the user shoot themselves in 
the foot), then the specialized-for-Bytes approach is almost as good (and 
perhaps even better, since the resulting concept and structure layout is more 
aligned with Serhiy's patch in issue15381).

tl;dr: 

a) mmap.mmap can be fixed - use hashability as a strong test for immutability 
(instead of an ugly heuristic involving buffer flags)
   - undecided: is calling hash(obj) to check for immutability too costly?

b) mmap.mmap can't be fixed - use the Bytes specialization approach.
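For what it's worth, the hashability probe in option (a) can be sketched in a few lines of Python (is_immutable is a hypothetical helper name, not anything from the patch; note that the probe itself touches every byte):

```python
def is_immutable(obj):
    # Probe immutability by hashing: hashable buffer exporters (bytes,
    # read-only memoryviews over them) would be safe to share, while
    # mutable ones (bytearray, writable memoryviews) refuse to hash.
    try:
        hash(obj)  # touches every byte -- roughly half the cost of a copy
        return True
    except (TypeError, ValueError):
        # bytearray raises TypeError; a writable memoryview raises ValueError
        return False

print(is_immutable(b"data"), is_immutable(bytearray(b"data")))  # True False
```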

--




[issue22003] BytesIO copy-on-write

2014-07-21 Thread David Wilson

David Wilson added the comment:

Stefan,

Thanks for digging here. As much as I'd love to follow this interpretation, it 
simply doesn't match existing buffer implementations, including within the 
standard library.

For example, mmap.mmap(..., flags=mmap.MAP_SHARED, prot=mmap.PROT_READ) will 
produce a read-only buffer, yet mutability is entirely at the whim of the 
operating system. In this case, immutability may be apparent for years, until 
some machine has memory pressure, causing the shared mapping to be flushed and 
refreshed from (say) incoherent NFS storage on next access.
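To make that concrete, here is a small demonstration (assuming a platform with a coherent page cache, e.g. Linux) that a read-only mapping exports a buffer that claims to be read-only even though a writer can still change the bytes beneath it:

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello world")
    with open(path, "r+b") as f:
        # ACCESS_READ maps with PROT_READ + MAP_SHARED: the exported
        # buffer is read-only, but the backing file is not frozen.
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        view = memoryview(m)
        readonly = view.readonly      # True: the buffer claims immutability
        f.seek(0)
        f.write(b"HELLO")             # mutate the backing file anyway
        f.flush()
        after = bytes(view[:5])       # the shared mapping sees the change
        view.release()
        m.close()
finally:
    os.close(fd)
    os.unlink(path)

print(readonly, after)
```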

I thought it would be worth auditing some of the most popular types of buffer 
just to check your interpretation, and this was the first, most obvious 
candidate.

Any thoughts? I'm leaning heavily toward the Bytes specialization approach

--




[issue22003] BytesIO copy-on-write

2014-07-21 Thread David Wilson

David Wilson added the comment:

I'm not sure how much work it would be, or even if it could be made sufficient 
to solve our problem, but what about extending the buffers interface to include 
an int stable flag, defaulting to 0?

It seems, though, that it would just be making the already arcane buffers 
interface even more arcane simply for the benefit of our specific use case.

--




[issue22003] BytesIO copy-on-write

2014-07-21 Thread David Wilson

Changes by David Wilson d...@botanicus.net:


Added file: http://bugs.python.org/file36016/cow4.patch




[issue22003] BytesIO copy-on-write

2014-07-21 Thread David Wilson

David Wilson added the comment:

Hi Stefan,

How does this approach in reinit() look? We first ask for a writable buffer, 
and if the object obliges, immediately copy it. Otherwise, if it refuses, we 
ask for a read-only buffer, and this time expect that it will never change.
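In pure-Python terms the probing order looks roughly like this (names are hypothetical, with memoryview standing in for the C-level buffer request):

```python
def acquire_buffer(obj):
    # Sketch of the reinit() ordering described above: probe writability
    # first; a writable exporter may mutate later, so copy immediately,
    # while a read-only exporter is assumed stable and shared as-is.
    view = memoryview(obj)
    if view.readonly:
        return view, True                  # share the exporter's buffer
    private = memoryview(bytes(view))      # immediate private copy
    view.release()
    return private, False

buf, shared = acquire_buffer(b"immutable source")   # shared, no copy
src = bytearray(b"mutable source")
buf2, shared2 = acquire_buffer(src)                 # copied up front
src[:7] = b"MUTATED"                                # later mutation...
print(shared, shared2, bytes(buf2[:7]))             # ...copy is unaffected
```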

This still does not catch the case of mmap.mmap. I am not sure how to deal with 
mmap.mmap. There is no way for it to export a PROT_READ mapping as a read-only 
buffer whose contents are guaranteed never to mutate, so the only options seem 
to be either a) remove buffer support from mmap, or b) blacklist it in 
bytesio(!).


Antoine, I have padded out the unit tests a little. test_memoryio.py seems the 
best place for them. Also modified test_sizeof(), although the way this test 
is designed seems inherently brittle to begin with. Now it is also sensitive to 
changes in the Py_buffer struct.


Various other changes:

* __new__ once again returns a valid, open, empty BytesIO, since the 
alternative breaks pickling.

* reinit() preserves existing BytesIO state until it knows it can succeed, 
which fixes another of the pickle tests.

* setstate() had CHECK_CLOSED() re-added, again for the pickle tests.


Probably the patch guts could be rearranged again, since the definition of the 
functions is no longer as clear as it was in cow3.patch.

--




[issue22003] BytesIO copy-on-write

2014-07-20 Thread David Wilson

David Wilson added the comment:

This version is tidied up enough that I think it could be reviewed.

Changes are:

 * Defer `buf' allocation until __init__, rather than __new__ as was previously 
done. Now upon completion, BytesIO.__new__ returns a valid, closed BytesIO, 
whereas previously a valid, empty, open BytesIO was returned. Is this interface 
change permissible?

 * Move the __init__ guts into a new reinit(), shared with __setstate__, which 
also previously caused an unnecessary copy. Additionally, gather up the various 
buffer-deallocation paths into a single reset() function, called by 
reinit(), _dealloc(), and _close().

 * Per Stefan's suggested approach, reinit() now explicitly checks for a 
read-only buffer, falling back to silently performing a copy if the returned 
buffer is read-write. That seems vastly preferable to throwing an exception, 
which would probably be another interface change.

 * Call `unshare()` any place the buffer is about to be modified. If the buffer 
needs to be made private, it also accepts a size hint indicating how much 
less/more space the subsequent operation needs, to avoid a redundant 
reallocation after the unsharing.
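A toy model of that unshare()-with-hint idea (pure Python with hypothetical names; the real patch works on raw C buffers):

```python
def unshare(buf, shared, size_hint=0):
    # If the buffer is shared, take a private copy sized for the upcoming
    # operation: len(buf) + size_hint bytes, so the copy and the resize the
    # next write would otherwise trigger happen as a single allocation.
    if not shared:
        return buf                        # already private, nothing to do
    new_len = max(len(buf) + size_hint, 0)
    private = bytearray(new_len)
    n = min(len(buf), new_len)
    private[:n] = buf[:n]                 # preserve the overlapping prefix
    return private

grown = unshare(b"abcd", shared=True, size_hint=4)  # room for 4 more bytes
print(len(grown), bytes(grown[:4]))
```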


Outstanding issues:

 * I don't understand why buf_size is a size_t, and I'm certain the casting in 
unshare() is incorrect somehow. Is it simply to avoid signed overflow?

--
Added file: http://bugs.python.org/file36004/cow2.patch




[issue22003] BytesIO copy-on-write

2014-07-20 Thread David Wilson

David Wilson added the comment:

New patch also calls unshare() during getbuffer().

--
Added file: http://bugs.python.org/file36005/cow3.patch




[issue22003] BytesIO copy-on-write

2014-07-20 Thread David Wilson

David Wilson added the comment:

I'm not sure the read only buffer test is strong enough: having a readonly 
view is not a guarantee that the data in the view cannot be changed through 
some other means, i.e. it is read-only, not immutable.

Pretty sure this approach is broken. What about the alternative approach of 
specializing for Bytes?

--




[issue22003] BytesIO copy-on-write

2014-07-18 Thread David Wilson

David Wilson added the comment:

Good catch :( There doesn't seem to be a way to ask for an immutable buffer, so 
perhaps it could just be a little more selective. I think the majority of use 
cases would still be covered if the sharing behaviour was restricted only to 
BytesType.

In that case Py_buffer initialdata could become a PyObject*, saving a small 
amount of memory, and allowing reuse of the struct member if BytesIO was also 
modified to directly write into a private BytesObject.

--




[issue22003] BytesIO copy-on-write

2014-07-17 Thread David Wilson

New submission from David Wilson:

This is a followup to the thread at 
https://mail.python.org/pipermail/python-dev/2014-July/135543.html , discussing 
the existing behaviour of BytesIO copying its source object, and how this 
regresses compared to cStringIO.StringI.

The goal of posting the patch on list was to try and stimulate discussion 
around the approach. The patch itself obviously isn't ready for review, and I'm 
not in a position to dedicate time to it just now (although in a few weeks I'd 
love to give it full attention!).

Ignoring this quick implementation, are there any general comments around the 
approach?

My only concern is that it might keep large objects alive in a non-intuitive 
way in certain circumstances, though I can't think of any obvious ones 
immediately.

Also interested in comments on the second half of that thread: a natural 
extension of this is to do something very similar on the write side: instead of 
generating a temporary private heap allocation, generate (and freely resize) a 
private PyBytes object until it is exposed to the user, at which point 
getvalue() returns it and converts it into an IO_SHARED buffer.

There are quite a few interactions with making that work correctly, in 
particular:

* How BytesIO would implement the buffers interface without causing the 
under-construction Bytes to become readonly

 * Avoiding redundant copies and resizes -- we can't simply tack 25% slack on 
the end of the Bytes and then truncate it during getvalue() without likely 
triggering a copy and move; however, with careful measurement of allocator 
behaviour there are various trade-offs that could be made - e.g. obmalloc won't 
move a 500 byte allocation if it shrinks by 25%. glibc malloc's rules are a 
bit more complex, though.

Could also add a private _PyBytes_SetSize() API to allow truncation to the 
final size during getvalue() without informing the allocator. Then we'd simply 
overallocate by up to 10% or 1-2kb, and write off the loss of the slack space.
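As a toy model of the write-side idea (pure Python and purely illustrative, not the proposed C implementation): writes land in a private mutable buffer, getvalue() freezes it into a bytes object that becomes the shared backing, and any later write unshares by copying first.

```python
class CowWriter:
    # Hypothetical illustration only: the real proposal would manipulate a
    # private PyBytes object in C and avoid these Python-level copies.
    def __init__(self):
        self._buf = bytearray()   # private, freely resizable
        self._frozen = None       # bytes object exposed via getvalue()

    def write(self, data):
        if self._frozen is not None:
            self._buf = bytearray(self._frozen)  # unshare before mutating
            self._frozen = None
        self._buf += data

    def getvalue(self):
        if self._frozen is None:
            self._frozen = bytes(self._buf)      # freeze and share
        return self._frozen

w = CowWriter()
w.write(b"hello")
v1 = w.getvalue()
v1_again = w.getvalue()          # same object: no copy on repeated calls
w.write(b" world")               # triggers unshare; v1 is unaffected
print(v1, w.getvalue(), v1 is v1_again)
```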

Notably, this approach completely differs from the one documented in 
http://bugs.python.org/issue15381 .. it's not clear to me which is better.

--
components: Library (Lib)
files: cow.patch
keywords: patch
messages: 223383
nosy: dw
priority: normal
severity: normal
status: open
title: BytesIO copy-on-write
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file35988/cow.patch




[issue22003] BytesIO copy-on-write

2014-07-17 Thread David Wilson

David Wilson added the comment:

Submitted contributor agreement. Please consider the demo patch licensed under 
the Apache 2 licence.

--
