Antoine Pitrou added the comment:
He accepted it already:
A small last-minute optimization is not a release-blocker.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18003
___
Serhiy Storchaka added the comment:
Larry, do you accept the patch for 3.5?
Serhiy Storchaka added the comment:
The patch is not so harmless. First, my change in BZ2File is not correct,
because reading every line should be guarded with a lock (BZ2File is
thread-safe). Second, for now all three compressed file classes are not only
iterables, but iterators. iter(f)
Larry Hastings added the comment:
Sounds good to me.
___
Python-bugs-list mailing list
Martin Panter added the comment:
This patch adds an entry to the What’s New for 3.5 (though maybe it will have
to be 3.6), and adds three tests to check that next() raises ValueError when
the files have been closed.
--
Added file: http://bugs.python.org/file39662/decomp-optim.v4.patch
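As a rough illustration of the behaviour those tests check, the sketch below opens small in-memory gzip, bz2 and lzma streams, closes them, and then calls next(). The helper name and the sample data are made up for this example and are not taken from the patch.

```python
import bz2
import gzip
import io
import lzma

SAMPLE = b"first line\nsecond line\n"  # made-up test data


def next_raises_after_close(module):
    """Return True if next() on a closed file from `module` raises ValueError."""
    raw = io.BytesIO(module.compress(SAMPLE))
    f = module.open(raw, "rb")
    # Reading works while the file is still open.
    assert next(f) == b"first line\n"
    f.close()
    try:
        next(f)
    except ValueError:
        return True
    return False


results = {m.__name__: next_raises_after_close(m) for m in (gzip, bz2, lzma)}
print(results)
```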
Martin Panter added the comment:
The BufferedReader class is documented as being thread safe:
https://docs.python.org/dev/library/io.html#multi-threading. Some
experimentation suggests that checking the “raw.closed” property is not
actually serialized, but that raw.readinto() calls are
Antoine Pitrou added the comment:
This looks good to me.
--
stage: patch review -> commit review
Serhiy Storchaka added the comment:
Perhaps this change is worth mentioning in What's New. Could you add this,
Martin? It would also be nice to add tests to ensure that next() after closing
the file always raises ValueError.
Changes by Antoine Pitrou pit...@free.fr:
--
priority: normal -> release blocker
Serhiy Storchaka added the comment:
bz2 would benefit greatly from such an optimization too.
Microbenchmark results:
$ ./python -m timeit -s "import gzip" -- "f=gzip.GzipFile('words.gz', 'r')" "for line in f: pass"
2.7: 10 loops, best of 3: 374 msec per loop
3.2: 10
Martin Panter added the comment:
Looking at https://bugs.python.org/file39586/decomp-optim.patch, the “closed”
property is the first of the three hunks:
1. Adds @property / def closed(self) to Lib/_compression.py
2. Adds def __iter__(self) to Lib/gzip.py
3. Adds def __iter__(self) to
Martin Panter added the comment:
New patch just fixes the spelling error in the comment.
--
stage: needs patch -> patch review
Added file: http://bugs.python.org/file39604/decomp-optim.v2.patch
Larry Hastings added the comment:
I don't see anything about closed in the patch you posted.
Antoine Pitrou added the comment:
Yes, this is right.
Martin Panter added the comment:
Yes, that’s basically right, Larry. The __iter__() was previously inherited; now
I am overriding it with a custom version. Similarly for the “closed” property,
but that one is only a member of objects internal to the gzip, lzma and bz2
modules.
--
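A minimal sketch of the kind of override described above, assuming a wrapper that keeps an io.BufferedReader in a private attribute. The class name LineStream and its layout are hypothetical and are not the actual gzip, lzma or bz2 code; the point is only to show __iter__() and "closed" delegating to an internal buffer.

```python
import io


class LineStream(io.BufferedIOBase):
    """Hypothetical wrapper; not the real gzip/lzma/bz2 implementation."""

    def __init__(self, raw):
        # Internal buffered object that does the actual line splitting.
        self._buffer = io.BufferedReader(raw)

    def readable(self):
        return True

    def read(self, size=-1):
        return self._buffer.read(size)

    def readline(self, size=-1):
        return self._buffer.readline(size)

    @property
    def closed(self):
        return self._buffer.closed

    def close(self):
        self._buffer.close()

    def __iter__(self):
        # Overrides the inherited __iter__(): hand iteration straight to the
        # internal buffer instead of going through readline() on this object.
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return iter(self._buffer)


stream = LineStream(io.BytesIO(b"alpha\nbeta\n"))
lines = list(stream)
returns_self = iter(stream) is stream
stream.close()
```

With an override like this, iter(stream) hands back the internal buffer rather than the stream itself, which is the iterable-versus-iterator difference raised elsewhere in this thread.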
Antoine Pitrou added the comment:
We were saying that you would probably have to validate this change, but that
we might also be able to sneak it quietly into the code base, seeing as you
aren't reading these messages.
Larry Hastings added the comment:
What? I only understand French.
Larry Hastings added the comment:
If I understand this correctly, I can ignore everything up to May 2015, as it
has to do with line-reading a compressed binary file (!) being slow.
Then, Martin Panter proposes a new optimization in May 2015, which is to simply
add __iter__ methods to
Antoine Pitrou added the comment:
This looks good to me. Larry would probably have to validate it for 3.5,
although we may try to sneak it in (he isn't reading :-D).
--
nosy: +larry
Martin Panter added the comment:
This bug was originally raised against Python 3.3, and the speed has improved a
lot since then. Perhaps this bug can be closed as it is, or maybe people would
like to consider my decomp-optim.patch which squeezes a bit more speed out. I
don’t actually have a
Martin Panter added the comment:
I haven’t done any tests, but my LZMAFile patch to Issue 15955 uses
BufferedReader, so it might satisfy this issue
--
nosy: +vadmium
Serhiy Storchaka added the comment:
See issue19051. Even a preliminary Python implementation noticeably speeds up
the reading of short lines.
$ ./python -m timeit -s "import lzma, io" "f=lzma.LZMAFile('words.xz', 'r')" "for line in f: pass"
Unpatched: 1.44 sec per loop
Patched: 1.06 sec per loop
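For anyone without a words.xz file at hand, a comparable measurement can be sketched with the timeit module and in-memory data. The payload below is synthetic and the function name is made up for this example, so the timings it prints are not comparable with the figures quoted above.

```python
import io
import lzma
import timeit

# Synthetic stand-in for words.xz: many short lines, compressed in memory.
LINE_COUNT = 20000
DATA = lzma.compress(b"".join(b"word%d\n" % i for i in range(LINE_COUNT)))


def count_lines():
    """Iterate over the compressed stream line by line and count the lines."""
    with lzma.LZMAFile(io.BytesIO(DATA)) as f:
        return sum(1 for _ in f)


# Best of three single runs, in the spirit of "best of 3" from timeit's CLI.
elapsed = min(timeit.repeat(count_lines, number=1, repeat=3))
print("best of 3: %.3f sec per loop" % elapsed)
```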
Antoine Pitrou added the comment:
> With C implementation it should be as fast as with BufferedReader.

So why not simply use BufferedReader?
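A small sketch of what wrapping in a BufferedReader can look like, using synthetic in-memory data. Note that from Python 3.5 onwards LZMAFile already reads through an internal BufferedReader, so on current versions the extra wrapper mostly adds a second layer of buffering; treat this as an illustration of the suggestion rather than a recommendation.

```python
import io
import lzma

# Synthetic payload: 1000 short lines, compressed in memory.
payload = b"".join(b"word%d\n" % i for i in range(1000))
compressed = lzma.compress(payload)

with lzma.LZMAFile(io.BytesIO(compressed)) as raw:
    # BufferedReader supplies readline() and line iteration on top of raw.
    reader = io.BufferedReader(raw)
    lines = list(reader)

print(len(lines))
```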
Antoine Pitrou added the comment:
> > So why not simply use BufferedReader?
> Because we want good performance of LZMAFile and compatibility with older
> versions.
You're reading me wrong. I'm simply suggesting that users interested in
readline() performance wrap LZMAFile in a BufferedReader. The
Serhiy Storchaka added the comment:
> So why not simply use BufferedReader?
Because we want good performance of LZMAFile and compatibility with older
versions. And I guess that it will be even faster than wrapping in
BufferedReader (because it avoids double buffering).
--
Éric Araujo added the comment:
A higher-level interface to abstract differences between gzip, xz and others is
actually provided in the tarfile module. (zipfile is left out and its file
objects have different methods, but that’s another issue. shutil provides even
higher-level functions to