[issue7471] GZipFile.readline too slow

2010-01-03 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: The patches have been committed. Thank you!
--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed

[issue7471] GZipFile.readline too slow

2009-12-19 Thread Nir Aides
Nir Aides n...@winpdb.org added the comment: isatty() and __iter__() of io.BufferedIOBase raise on a closed file, and __enter__() raises ValueError with a different (generic) message. Should we keep the original GzipFile methods or prefer the implementation of io.BufferedIOBase?

[issue7471] GZipFile.readline too slow

2009-12-19 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> isatty() and __iter__() of io.BufferedIOBase raise on closed file and __enter__() raises ValueError with different (generic) message. Should we keep the original GzipFile methods or prefer the implementation of io.BufferedIOBase?
It's

[issue7471] GZipFile.readline too slow

2009-12-19 Thread Nir Aides
Changes by Nir Aides n...@winpdb.org: Removed file: http://bugs.python.org/file15589/gzip_7471_py27.diff

[issue7471] GZipFile.readline too slow

2009-12-19 Thread Nir Aides
Nir Aides n...@winpdb.org added the comment: Uploaded updated patch for Python 2.7. -- Added file: http://bugs.python.org/file15619/gzip_7471_py27.diff

[issue7471] GZipFile.readline too slow

2009-12-19 Thread Nir Aides
Nir Aides n...@winpdb.org added the comment: Uploaded patch for Python 3.2. -- Added file: http://bugs.python.org/file15620/gzip_7471_py32.diff

[issue7471] GZipFile.readline too slow

2009-12-18 Thread Nir Aides
Nir Aides n...@winpdb.org added the comment: Submitted combined patch for Python 2.7. If it's good I'll send one for Python 3.2. -- Added file: http://bugs.python.org/file15589/gzip_7471_py27.diff

[issue7471] GZipFile.readline too slow

2009-12-18 Thread Brian Curtin
Brian Curtin cur...@acm.org added the comment: In the test, should you verify that the correct data comes back from io.BufferedReader? After the r.close() on line 90 of test_gzip.py, adding something like self.assertEqual(''.join(lines), data1*50) would do the trick. Looks good.
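The check suggested above can be sketched roughly as follows. This is not the code from test_gzip.py: the `data1` payload and the `roundtrip_lines` helper are made up for illustration, and the sketch assumes a current Python 3, where the joined value must be bytes (`b"".join`).

```python
import gzip
import io

# Hypothetical payload; the real test module defines its own data1.
data1 = b"a made-up line of test data\n" * 3


def roundtrip_lines(payload):
    """Compress payload in memory, then read it back line by line
    through an io.BufferedReader wrapped around a GzipFile."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode="wb") as f:
        f.write(payload)
    raw.seek(0)
    f = gzip.GzipFile(fileobj=raw, mode="rb")
    # Closing the BufferedReader also closes the GzipFile it wraps.
    with io.BufferedReader(f) as r:
        return list(r)


lines = roundtrip_lines(data1 * 50)
# The assertion proposed in the comment, adapted to bytes.
assert b"".join(lines) == data1 * 50
```

Comparing the joined lines against the original payload verifies the content as well as the fact that iteration completes.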

[issue7471] GZipFile.readline too slow

2009-12-18 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Two things:
- Since it implements common IO operations, the GzipFile class could inherit from io.BufferedIOBase. It would also alleviate the need to reimplement readinto(): BufferedIOBase has a default implementation which should be sufficient.
-

[issue7471] GZipFile.readline too slow

2009-12-17 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Thanks for the new patch. The problem with inheriting from BufferedRandom, though, is that if you call e.g. write() on a read-only gzipfile, it will appear to succeed because the bytes are buffered internally. I think the solution would be to use
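The behaviour one would want from a read-only GzipFile can be checked with a small sketch like the one below. It assumes a current Python 3 gzip module rather than the patch under review, and `write_is_rejected` is a hypothetical helper name.

```python
import gzip
import io


def write_is_rejected(payload=b"hello\n"):
    """Open a gzip stream read-only and report (writable, rejected)."""
    raw = io.BytesIO(gzip.compress(payload))
    with gzip.GzipFile(fileobj=raw, mode="rb") as f:
        writable = f.writable()
        try:
            f.write(b"more data")
        except OSError:
            # io.UnsupportedOperation is a subclass of OSError,
            # so either kind of refusal is caught here.
            rejected = True
        else:
            # This is the silent-success case described in the comment.
            rejected = False
    return writable, rejected


print(write_is_rejected())
```

If write() were buffered silently, as described for the BufferedRandom-based patch, `rejected` would come back False.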

[issue7471] GZipFile.readline too slow

2009-12-17 Thread Nir Aides
Nir Aides n...@winpdb.org added the comment: How about using the first patch with the slicing optimization and additionally enhancing GzipFile with the methods required to make it play nice as a raw stream to an io.BufferedReader object (readable(), writable(), readinto(), etc...). This way

[issue7471] GZipFile.readline too slow

2009-12-17 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> How about using the first patch with the slicing optimization and additionally enhancing GzipFile with the methods required to make it play nice as a raw stream to an io.BufferedReader object (readable(), writable(), readinto(), etc...).

[issue7471] GZipFile.readline too slow

2009-12-16 Thread Nir Aides
Nir Aides n...@winpdb.org added the comment: Right, using the io module makes GzipFile as fast as zcat. I am submitting a new patch, this time for Python 2.7; however, it is not a module rewrite but, again, a minimal refactoring. GzipFile is now derived from io.BufferedRandom, and as a result the

[issue7471] GZipFile.readline too slow

2009-12-14 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Ah, a patch. Now we're talking :)
--
resolution: wont fix ->
stage: -> patch review
status: closed -> open
versions: +Python 2.7, Python 3.2 -Python 2.6

[issue7471] GZipFile.readline too slow

2009-12-14 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: The patch doesn't apply against the SVN trunk (some parts are rejected). I suppose it was done against 2.6 or earlier, but those versions are in bug-fix-only mode (which excludes performance improvements), so you'll have to regenerate it

[issue7471] GZipFile.readline too slow

2009-12-14 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Ah, my bad, I hadn't seen that the patch is for 3.2. Sorry for the confusion.

[issue7471] GZipFile.readline too slow

2009-12-14 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: I confirm that the patch gives good speedups. It would be nice if there was a comment explaining what extrabuf, extrastart and extrasize are. In 3.x, a better but more involved approach would be to rewrite the gzip module so as to take

[issue7471] GZipFile.readline too slow

2009-12-13 Thread Nir
Nir n...@winpdb.org added the comment: First patch, please forgive long comment :) I submit a small patch which speeds up readline() on my data set - a 74MB (5MB .gz) log file with 600K lines. The speedup is 350%. Source of slowness is that (~20KB) extrabuf is allocated/deallocated in

[issue7471] GZipFile.readline too slow

2009-12-11 Thread Jack Diederich
Jack Diederich jackd...@gmail.com added the comment: I tried passing a size to readline to see if increasing the chunk helps (test file was 120meg with 700k lines). For values 1k-10k all took around 30 seconds, with a value of 100 it took 80 seconds, with a value of 100k it ran for several
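An experiment of this kind can be sketched as below. The sample data, the `count_readline_calls` helper and the chosen sizes are all made up for illustration, and the timings will not match the figures in the thread, which were measured against the 2009 implementation.

```python
import gzip
import io
import time


def count_readline_calls(gz_bytes, size=-1):
    """Read a gzip stream with readline(size); return how many
    non-empty results came back."""
    calls = 0
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes), mode="rb") as f:
        while f.readline(size):
            calls += 1
    return calls


# Hypothetical sample: 1000 short lines, compressed in memory.
sample = gzip.compress(
    b"".join(b"line %d of a made-up log\n" % i for i in range(1000))
)

for size in (-1, 100, 1024, 10240):
    start = time.perf_counter()
    n = count_readline_calls(sample, size)
    elapsed = time.perf_counter() - start
    print(size, n, round(elapsed, 4))
```

Note that the size argument is an upper bound on the bytes returned per call, so a value smaller than the line length splits lines across several calls.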

[issue7471] GZipFile.readline too slow

2009-12-10 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: (GZipFile.readline() is implemented in pure Python, which explains why it's rather slow)
--
priority: -> normal
title: gzip module too slow -> GZipFile.readline too slow

[issue7471] GZipFile.readline too slow

2009-12-10 Thread Brian Curtin
Changes by Brian Curtin cur...@acm.org: -- nosy: +brian.curtin

[issue7471] GZipFile.readline too slow

2009-12-10 Thread asnakelover
asnakelover a3277...@uggsrock.com added the comment: Hope this reply works right, the python bug interface is a bit confusing for this newbie, it doesn't say Reply anywhere - sorry if it goes FUBAR. I tried the splitlines() version you suggested, it thrashed my machine so badly I pressed

[issue7471] GZipFile.readline too slow

2009-12-10 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> I tried the splitlines() version you suggested, it thrashed my machine so badly I pressed alt+sysrq+f (which invokes kernel oom_kill) after about 1 minute so I didn't lose anything important.
This sounds very weird. How much memory do you
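The exact splitlines() snippet that was suggested is not shown in this excerpt. A plausible reading, given as an assumption, is to decompress everything in one read() and split afterwards. The sketch below contrasts that with streaming iteration; both helper names are hypothetical.

```python
import gzip
import io


def lines_all_at_once(fileobj):
    """Decompress the whole stream into memory, then split it.
    Peak memory is at least the full uncompressed size."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as f:
        return f.read().splitlines()


def lines_streaming(fileobj):
    """Yield one line at a time; only a small buffer is kept around."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as f:
        for line in f:
            yield line.rstrip(b"\n")


# Made-up sample data, compressed in memory.
sample = gzip.compress(b"first\nsecond\nthird\n")

print(lines_all_at_once(io.BytesIO(sample)))
print(list(lines_streaming(io.BytesIO(sample))))
```

The first variant holds the decompressed data and the list of lines at the same time, which is where memory pressure would come from on a large file.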

[issue7471] GZipFile.readline too slow

2009-12-10 Thread asnakelover
asnakelover a3277...@uggsrock.com added the comment: The gz in question is 17mb compressed and 247mb uncompressed. Calling zcat the python process uses between 250 and 260 mb with the whole string in memory using zcat as a fork. Numbers for the gzip module aren't obtainable except for

[issue7471] GZipFile.readline too slow

2009-12-10 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> The gz in question is 17mb compressed and 247mb uncompressed. Calling zcat the python process uses between 250 and 260 mb with the whole string in memory using zcat as a fork. Numbers for the gzip module aren't obtainable except for

[issue7471] GZipFile.readline too slow

2009-12-10 Thread asnakelover
asnakelover a3277...@uggsrock.com added the comment: Yes, subprocess works fine and was the quickest to implement and probably the fastest to run too. How can I put this without being an ass? Hell, I'm no good at diplomacy - the gzip module blows chunks - if I can shell out to a standard unix
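The subprocess workaround mentioned here might look roughly like the sketch below. The poster's actual code is not shown; this version assumes a `gzip` executable on PATH and uses `gzip -dc`, which behaves like zcat. The helper names are hypothetical.

```python
import gzip
import os
import shutil
import subprocess
import tempfile


def iter_lines_via_gzip(path):
    """Yield decompressed lines by streaming the output of an
    external `gzip -dc` process (equivalent to zcat)."""
    proc = subprocess.Popen(["gzip", "-dc", path], stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        proc.wait()


def demo():
    """Round-trip a tiny made-up file; returns None when no gzip
    executable is available."""
    if shutil.which("gzip") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.gz")
        with gzip.open(path, "wb") as f:
            f.write(b"alpha\nbeta\n")
        return list(iter_lines_via_gzip(path))


print(demo())
```

Decompression happens in the child process, so the Python side only holds one line at a time plus the pipe buffer.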

[issue7471] GZipFile.readline too slow

2009-12-10 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> How can I put this without being an ass? Hell, I'm no good at diplomacy - the gzip module blows chunks - if I can shell out to a standard unix util and it uses a tiny fraction of the memory to do the same job the module is inherently broken no