Antoine Pitrou pit...@free.fr added the comment:
The patches have been committed. Thank you!
--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
___
Python tracker rep...@bugs.python.org
Nir Aides n...@winpdb.org added the comment:
isatty() and __iter__() of io.BufferedIOBase raise on closed file and
__enter__() raises ValueError with different (generic) message.
Should we keep the original GzipFile methods or prefer the implementation
of io.BufferedIOBase?
--
Antoine Pitrou pit...@free.fr added the comment:
> isatty() and __iter__() of io.BufferedIOBase raise on closed file and
> __enter__() raises ValueError with different (generic) message.
> Should we keep the original GzipFile methods or prefer the implementation
> of io.BufferedIOBase?
It's
Changes by Nir Aides n...@winpdb.org:
Removed file: http://bugs.python.org/file15589/gzip_7471_py27.diff
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7471
___
Nir Aides n...@winpdb.org added the comment:
Uploaded updated patch for Python 2.7.
--
Added file: http://bugs.python.org/file15619/gzip_7471_py27.diff
Nir Aides n...@winpdb.org added the comment:
Uploaded patch for Python 3.2.
--
Added file: http://bugs.python.org/file15620/gzip_7471_py32.diff
Nir Aides n...@winpdb.org added the comment:
Submitted combined patch for Python 2.7.
If it's good, I'll send one for Python 3.2.
--
Added file: http://bugs.python.org/file15589/gzip_7471_py27.diff
Brian Curtin cur...@acm.org added the comment:
In the test, should you verify that the correct data comes back from
io.BufferedReader? After the r.close() on line 90 of test_gzip.py,
adding something like self.assertEqual("".join(lines), data1*50) would
do the trick.
Looks good.
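
A rough sketch of the kind of check suggested above, written for a current Python 3 where the lines come back as bytes (hence b"".join). The class name and the data1 payload here are placeholders for illustration; the real test_gzip.py defines its own data1 and its own test layout.

```python
import gzip
import io
import unittest

# Placeholder payload (assumption); test_gzip.py has its own data1.
data1 = b"a line of sample text\n"


class BufferedReaderRoundTrip(unittest.TestCase):
    def test_lines_match_written_data(self):
        # Write the payload 50 times into an in-memory gzip stream.
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode="wb") as f:
            f.write(data1 * 50)
        buf.seek(0)

        # Read it back line by line through io.BufferedReader.
        r = io.BufferedReader(gzip.GzipFile(fileobj=buf, mode="rb"))
        lines = [line for line in r]
        r.close()

        # Verify that the data coming back is exactly what was written.
        self.assertEqual(b"".join(lines), data1 * 50)
```

Comparing the joined lines against the original payload catches both lost data and lines split in the wrong place, which a simple line count would not.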
--
Antoine Pitrou pit...@free.fr added the comment:
Two things:
- since it implements common IO operations, the GzipFile class could
inherit io.BufferedIOBase. It would also alleviate the need to
reimplement readinto(): BufferedIOBase has a default implementation
which should be sufficient.
-
Antoine Pitrou pit...@free.fr added the comment:
Thanks for the new patch. The problem with inheriting from
BufferedRandom, though, is that if you call e.g. write() on a read-only
gzipfile, it will appear to succeed because the bytes are buffered
internally.
I think the solution would be to use
Nir Aides n...@winpdb.org added the comment:
How about using the first patch with the slicing optimization and
additionally enhancing GzipFile with the methods required to make it
play nice as a raw stream to an io.BufferedReader object (readable(),
writable(), readinto(), etc...).
This way
Antoine Pitrou pit...@free.fr added the comment:
> How about using the first patch with the slicing optimization and
> additionally enhancing GzipFile with the methods required to make it
> play nice as a raw stream to an io.BufferedReader object (readable(),
> writable(), readinto(), etc...).
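
To illustrate what "playing nice as a raw stream" involves, here is a minimal sketch of an object that offers readable(), writable() and readinto() and is then wrapped in io.BufferedReader. ChunkedSource is a hypothetical stand-in that serves bytes from memory; it does no decompression and is not the GzipFile code from any of the patches.

```python
import io


class ChunkedSource(io.RawIOBase):
    """Hypothetical raw stream that serves bytes from an in-memory source."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def writable(self):
        return False

    def readinto(self, buffer):
        # Copy as many bytes as fit into the caller's buffer.
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        # Returning 0 signals end of stream to the buffered layer.
        return len(chunk)


# BufferedReader supplies readline() and iteration on top of readinto().
reader = io.BufferedReader(ChunkedSource(b"first\nsecond\nthird\n"))
lines = list(reader)
reader.close()
```

The buffered layer fetches large chunks through readinto() and does the line splitting itself, so the raw object never has to implement readline().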
Nir Aides n...@winpdb.org added the comment:
Right, using the io module makes GzipFile as fast as zcat.
I am submitting a new patch, this time for Python 2.7; however, it is not
a module rewrite but, again, a minimal refactoring.
GzipFile is now derived from io.BufferedRandom, and as a result the
Antoine Pitrou pit...@free.fr added the comment:
Ah, a patch. Now we're talking :)
--
resolution: wont fix ->
stage: -> patch review
status: closed -> open
versions: +Python 2.7, Python 3.2 -Python 2.6
Antoine Pitrou pit...@free.fr added the comment:
The patch doesn't apply against the SVN trunk (some parts are rejected).
I suppose it was done against 2.6 or earlier, but those versions are in
bug fixing-only mode (which excludes performance improvements), so
you'll have to regenerate it
Antoine Pitrou pit...@free.fr added the comment:
Ah, my bad, I hadn't seen that the patch is for 3.2. Sorry for the
confusion.
--
Antoine Pitrou pit...@free.fr added the comment:
I confirm that the patch gives good speedups. It would be nice if there
was a comment explaining what extrabuf, extrastart and extrasize are.
In 3.x, a better but more involved approach would be to rewrite the
gzip module so as to take
Nir n...@winpdb.org added the comment:
First patch, please forgive long comment :)
I submit a small patch which speeds up readline() on my data set - a
74MB (5MB .gz) log file with 600K lines.
The speedup is 350%.
Source of slowness is that (~20KB) extrabuf is allocated/deallocated in
Jack Diederich jackd...@gmail.com added the comment:
I tried passing a size to readline to see if increasing the chunk helps
(test file was 120meg with 700k lines). For values 1k-10k all took
around 30 seconds, with a value of 100 it took 80 seconds, with a value
of 100k it ran for several
Antoine Pitrou pit...@free.fr added the comment:
(GZipFile.readline() is implemented in pure Python, which explains why
it's rather slow)
--
priority: -> normal
title: gzip module too slow -> GZipFile.readline too slow
Changes by Brian Curtin cur...@acm.org:
--
nosy: +brian.curtin
asnakelover a3277...@uggsrock.com added the comment:
Hope this reply works right, the python bug interface is a bit confusing
for this newbie, it doesn't say Reply anywhere - sorry if it goes FUBAR.
I tried the splitlines() version you suggested, it thrashed my machine
so badly I pressed
Antoine Pitrou pit...@free.fr added the comment:
> I tried the splitlines() version you suggested, it thrashed my machine
> so badly I pressed alt+sysrq+f (which invokes kernel oom_kill) after
> about 1 minute so I didn't lose anything important.
This sounds very weird. How much memory do you
asnakelover a3277...@uggsrock.com added the comment:
The gz in question is 17mb compressed and 247mb uncompressed. Calling
zcat the python process uses between 250 and 260 mb with the whole
string in memory using zcat as a fork. Numbers for the gzip module
aren't obtainable except for
Antoine Pitrou pit...@free.fr added the comment:
> The gz in question is 17mb compressed and 247mb uncompressed. Calling
> zcat the python process uses between 250 and 260 mb with the whole
> string in memory using zcat as a fork. Numbers for the gzip module
> aren't obtainable except for
asnakelover a3277...@uggsrock.com added the comment:
Yes, subprocess works fine and was the quickest to implement and
probably the fastest to run too.
How can I put this without being an ass? Hell, I'm no good at diplomacy
- the gzip module blows chunks - if I can shell out to a standard unix
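
For reference, the subprocess approach mentioned above could look roughly like the sketch below. It streams lines from the tool's stdout instead of holding the whole output in one string. The function name iter_lines_via_command is made up for this illustration, and the default command assumes a zcat binary is available on PATH.

```python
import subprocess


def iter_lines_via_command(path, command=("zcat",)):
    """Yield decompressed lines by streaming the stdout of an external tool.

    Assumption: `command` followed by `path` writes the decompressed
    data to stdout, as zcat does.
    """
    argv = [*command, path]
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        # Iterating the pipe reads line by line, so memory use stays small.
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, argv)
```

A caller would loop over iter_lines_via_command("some.log.gz") and handle each line as it arrives; the file name here is only an example.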
Antoine Pitrou pit...@free.fr added the comment:
> How can I put this without being an ass? Hell, I'm no good at diplomacy
> - the gzip module blows chunks - if I can shell out to a standard unix
> util and it uses a tiny fraction of the memory to do the same job the
> module is inherently broken no