[issue3873] Unpickling is really slow

2011-03-18 Thread Jesús Cea Avión
Changes by Jesús Cea Avión j...@jcea.es: -- nosy: +jcea ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue3873 ___ ___ Python-bugs-list mailing list

[issue3873] Unpickling is really slow

2010-10-12 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Patch committed in r85384.
--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed

[issue3873] Unpickling is really slow

2010-09-27 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: One problem with the seek() approach is that some file-like objects have expensive seeks. One example is GzipFile, where seek(n) is O(n) (it first rewinds to the start of file, then reads n decompressed bytes). In the end, unpickling from a
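The seek cost described in this comment can be reproduced with a small sketch. It assumes an in-memory gzip stream with a made-up payload; it is not taken from the patch or the benchmark script. A backward seek on a GzipFile opened for reading still returns correct data, but only by rewinding and decompressing again from the start.

```python
import gzip
import io

# Hypothetical payload, compressed into an in-memory gzip stream.
payload = bytes(range(256)) * 1024
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
    gz.write(payload)

raw.seek(0)
with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
    head = gz.read(1000)   # decompress the first 1000 bytes
    gz.seek(10)            # backward seek: rewind, then re-read 10 bytes
    again = gz.read(5)
    position = gz.tell()
```

The larger the offset, the more data has to be decompressed a second time, which is why a reader that seeks backward often is a poor fit for compressed files.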

[issue3873] Unpickling is really slow

2010-09-27 Thread Alexandre Vassalotti
Alexandre Vassalotti alexan...@peadrop.com added the comment: Didn't Victor say that only one seek at the end is necessary per pickle? If this is the case, I don't think expensive seeks will be an issue.

[issue3873] Unpickling is really slow

2010-09-27 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> Didn't Victor say that only one seek at the end is necessary per pickle? If this is the case, I don't think expensive seeks will be an issue.
If you are unpickling from a multi-megabyte gzip file and the seek at the end makes you uncompress

[issue3873] Unpickling is really slow

2010-09-27 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Here is an updated bench_pickle which also makes the file unpeekable. -- Added file: http://bugs.python.org/file19033/bench_pickle.py

[issue3873] Unpickling is really slow

2010-09-27 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: Removed file: http://bugs.python.org/file18241/bench_pickle.py

[issue3873] Unpickling is really slow

2010-09-27 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: Removed file: http://bugs.python.org/file18983/bench_pickle.py

[issue3873] Unpickling is really slow

2010-09-27 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Here is a patch using peek() rather than seek(). There are some inefficiencies around (such as using read() to skip the consumed prefetched bytes), but the benchmark results are still as good as with seek():
Protocol 0
- dump: 142.5 ms
- load
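The idea of looking ahead with peek() and then skipping the consumed bytes with read() can be sketched in Python. This is only an illustration, not the C code from the patch; the helper name readline_via_peek and the sample data are hypothetical.

```python
import io

def readline_via_peek(stream, chunk=4096):
    """Read one newline-terminated line using peek() for lookahead.

    Hypothetical helper: `stream` is assumed to be a buffered reader
    that provides peek(), such as io.BufferedReader.
    """
    line = bytearray()
    while True:
        window = stream.peek(chunk)      # look ahead without consuming
        if not window:                   # end of file
            break
        end = window.find(b"\n")
        take = len(window) if end < 0 else end + 1
        line += stream.read(take)        # skip the bytes we actually used
        if end >= 0:
            break
    return bytes(line)

stream = io.BufferedReader(io.BytesIO(b"I0\nI1\nrest"))
first = readline_via_peek(stream)
remaining = stream.read()
```

Because only the used bytes are consumed, the stream is left positioned right after the line and no backward seek is needed.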

[issue3873] Unpickling is really slow

2010-09-24 Thread Alexandre Vassalotti
Alexandre Vassalotti alexan...@peadrop.com added the comment: I get this error with the patch:
python: /home/alex/src/python.org/py3k/Modules/_pickle.c:908: _Unpickler_ReadFromFile: Assertion `self->next_read_idx == 0' failed.
Aborted

[issue3873] Unpickling is really slow

2010-09-24 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Ah, thank you. I hadn't tested in debug mode and there was a wrong assert from the previous code. Here is a patch with the assert removed. -- versions: +Python 3.2 -Python 3.1 Added file:

[issue3873] Unpickling is really slow

2010-09-23 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Here is a fixed version of Victor's bench (didn't work on 2.x). -- Added file: http://bugs.python.org/file18983/bench_pickle.py

[issue3873] Unpickling is really slow

2010-09-23 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: And here is a new performance patch (Victor's patch was outdated because of heavy changes incorporated from Unladen Swallow). Results of bench_pickle.py are as follows:
* Python 2.7 (cPickle):
Protocol 0
- dump: 189.8 ms
- load (seekable=False):

[issue3873] Unpickling is really slow

2010-07-29 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment:
> Victor, have you tried using peek() instead of seek()? I mentioned this previously in msg85780.
In a file encoded in protocol 0, backward seek are needed to each call to unpickler_readline... and this function is called to read

[issue3873] Unpickling is really slow

2010-07-29 Thread Alexander Belopolsky
Changes by Alexander Belopolsky belopol...@users.sourceforge.net: -- nosy: +belopolsky

[issue3873] Unpickling is really slow

2010-07-28 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: New version of my patch:
- add "used" attribute to UnpicklerBuffer structure: disable the read buffer for not seekable file and for protocol 0 (at the first call to unpickle_readline)
- check if PyObject_GetAttrString(file, "seek")

[issue3873] Unpickling is really slow

2010-07-28 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Same benchmark with Python 2.6.5+, so without the patch, but compiled with maximum compiler optimization (whereas pydebug means no optimization):
Protocol 0
- dump: 517.3 ms
- load: 876.6 ms
=> because of the new I/O library,

[issue3873] Unpickling is really slow

2010-07-28 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: bench_pickle.py: script used to produce last benchmarks. -- Added file: http://bugs.python.org/file18241/bench_pickle.py

[issue3873] Unpickling is really slow

2010-07-28 Thread Alexandre Vassalotti
Alexandre Vassalotti alexan...@peadrop.com added the comment: Victor, have you tried using peek() instead of seek()? I mentioned this previously in msg85780.

[issue3873] Unpickling is really slow

2010-07-18 Thread Mark Lawrence
Mark Lawrence breamore...@yahoo.co.uk added the comment: Has this slipped under the radar? I believe that one way or the other any performance issue should be resolved if at all possible. -- nosy: +BreamoreBoy

[issue3873] Unpickling is really slow

2009-04-16 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr:
--
priority: -> normal
stage: -> patch review
versions: +Python 3.1 -Python 3.0

[issue3873] Unpickling is really slow

2009-04-16 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: By the way, the patch won't work with unseekable files, which is probably bad.
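A reader that wants to support unseekable files can probe for seek support first and rely only on read() and readline() otherwise. The sketch below is an assumption-laden illustration: can_seek and ReadOnlyStream are hypothetical names, and the example runs against the current Python 3 pickle module rather than the code discussed in this thread.

```python
import io
import pickle

def can_seek(fileobj):
    """Return True if fileobj appears to support seek() (hypothetical helper)."""
    seekable = getattr(fileobj, "seekable", None)
    if seekable is not None:
        try:
            return bool(seekable())
        except Exception:
            return False
    return hasattr(fileobj, "seek")

class ReadOnlyStream:
    """Hypothetical file-like object exposing only read() and readline()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, n=-1):
        return self._buf.read(n)

    def readline(self):
        return self._buf.readline()

source = ReadOnlyStream(pickle.dumps([0] * 10))
loaded = pickle.load(source)   # works without seek() or peek()
```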

[issue3873] Unpickling is really slow

2009-04-08 Thread Alexandre Vassalotti
Alexandre Vassalotti alexan...@peadrop.com added the comment: Victor, Unpickler shouldn't raise an error if the given file object does not support seek(); it should gracefully fall back to using only read() and readline(). Also, I think you could get a greater performance improvement by using peek()

[issue3873] Unpickling is really slow

2009-04-06 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Create a read buffer (4096 bytes) in unpickler class. Using [0]*10**7 or [1000]*10**7, load() is 6 to 8 times faster. I removed the last_string attribute because it wasn't used. If there are tail bytes, seek backward.
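The buffering scheme described here (read in 4096-byte chunks, then seek backward over any unread tail) can be sketched in Python. This is a simplified illustration, not the C patch; the class name BufferedInput and its methods are hypothetical.

```python
import io

class BufferedInput:
    """Hypothetical read buffer in front of a seekable file object."""

    def __init__(self, fileobj, size=4096):
        self._file = fileobj
        self._size = size
        self._data = b""
        self._pos = 0

    def read(self, n):
        # Refill in large chunks so most reads are served from memory.
        while len(self._data) - self._pos < n:
            chunk = self._file.read(max(self._size, n))
            if not chunk:
                break
            # Drop the consumed prefix so the buffer does not keep growing.
            self._data = self._data[self._pos:] + chunk
            self._pos = 0
        out = self._data[self._pos:self._pos + n]
        self._pos += len(out)
        return out

    def finish(self):
        # Give back unread tail bytes by seeking backward.
        tail = len(self._data) - self._pos
        if tail:
            self._file.seek(-tail, io.SEEK_CUR)
        self._data, self._pos = b"", 0

f = io.BytesIO(b"abcdefghij" * 1000)
buf = BufferedInput(f)
a = buf.read(3)
b = buf.read(4)
buf.finish()
```

After finish(), the underlying file is positioned just past the bytes that were actually consumed, so a second object stored in the same file can still be read. The backward seek is also what makes this approach unsuitable for unseekable files.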

[issue3873] Unpickling is really slow

2009-04-06 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: I don't know why, but python-trunk is *much* slower than py3k (eg. with dump: 1000 ms vs 24 ms for py3k, or with load: 1500ms vs 186ms).

[issue3873] Unpickling is really slow

2009-04-06 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: My version of pickletest.py:
- make sure that file position is correct after the load()
- some benchmark. most interesting numbers:

without the patch:
version | data   | dump ms | load ms |
py3k    | 0,10^6 | 230     |

[issue3873] Unpickling is really slow

2009-04-06 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Note about my patch: the buffer should be truncated after PyBytes_Concat(self->buffer.pybytes, data) to avoid very long buffer. Something like:
self->buffer.pybytes += data;
self->buffer.pybytes = self->buffer.pybytes[index:];

[issue3873] Unpickling is really slow

2009-04-06 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> I don't know why, but python-trunk is *much* slower than py3k (eg. with dump: 1000 ms vs 24 ms for py3k, or with load: 1500ms vs 186ms).
Perhaps you tried with the pure Python version (pickle) rather than the C one (cPickle)?

[issue3873] Unpickling is really slow

2009-04-06 Thread Collin Winter
Changes by Collin Winter coll...@gmail.com: -- nosy: +collinwinter

[issue3873] Unpickling is really slow

2009-04-04 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment:
alexandre.vassalotti wrote:
> The solution is to add a read buffer to Unpickler (...) would mitigate much of the (quite large) Python function call overhead. (...) cPickle has a performance hack to make it uses cStringIO and

[issue3873] Unpickling is really slow

2009-04-04 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Unladen Swallow has a project to optimize pickle. Currently, it uses 3 benchmarks:
pickle - use the cPickle module to pickle a variety of datasets.
pickle_dict - microbenchmark; use the cPickle module to pickle a lot of

[issue3873] Unpickling is really slow

2009-04-04 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: gprof (--enable-profiler) results:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 18.18      0.16     0.16  2011055

[issue3873] Unpickling is really slow

2009-01-18 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Making this a duplicate of #4565 (Rewrite the IO stack in C). If anyone disagrees, please reopen!
--
resolution: -> duplicate
status: open -> closed
superseder: -> Rewrite the IO stack in C

[issue3873] Unpickling is really slow

2009-01-18 Thread Hagen Fürstenau
Hagen Fürstenau hfuerste...@gmx.net added the comment: With the io-c branch I see much better unpickling performance than before. But it still seems to be around 2 or 3 times slower than with cPickle in 2.6. Is this expected at this point of io-c development? Otherwise perhaps this issue should

[issue3873] Unpickling is really slow

2009-01-18 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Hello,
> With the io-c branch I see much better unpickling performance than before. But it still seems to be around 2 or 3 times slower than with cPickle in 2.6.
It's much closer here. With 2.7 (trunk) and cPickle:
0.439934968948

[issue3873] Unpickling is really slow

2009-01-18 Thread Hagen Fürstenau
Changes by Hagen Fürstenau hfuerste...@gmx.net: Removed file: http://bugs.python.org/file11497/pickletst.py

[issue3873] Unpickling is really slow

2009-01-18 Thread Hagen Fürstenau
Hagen Fürstenau hfuerste...@gmx.net added the comment: I uploaded a new pickletst.py which specifies protocol 2, otherwise we're comparing apples with oranges. With this I get:
0.211881160736
0.322369813919
for Python 2.6 and
0.158488035202
1.21621990204
on the io-c branch. Can you confirm
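Pinning the protocol matters because the default differs between Python versions, so an unpinned benchmark may compare different wire formats. The sketch below is not the attached pickletst.py; it is a minimal timing harness written for Python 3, with the hypothetical helper time_roundtrip and a made-up data size.

```python
import io
import pickle
import time

def time_roundtrip(obj, protocol):
    """Time dump and load of obj with an explicit protocol (hypothetical helper)."""
    buf = io.BytesIO()
    t0 = time.perf_counter()
    pickle.dump(obj, buf, protocol)
    t1 = time.perf_counter()
    buf.seek(0)
    loaded = pickle.load(buf)
    t2 = time.perf_counter()
    return loaded, t1 - t0, t2 - t1

data = [0] * 100000
loaded, dump_seconds, load_seconds = time_roundtrip(data, 2)
print("dump: %.6f s, load: %.6f s" % (dump_seconds, load_seconds))

# A protocol 2 pickle starts with the PROTO opcode followed by the version.
header = pickle.dumps(data, 2)[:2]
```

Timings from such a harness depend on the machine and interpreter build, so only runs made with the same protocol and the same data should be compared.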

[issue3873] Unpickling is really slow

2009-01-18 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Nice catch! I can confirm your figures with protocol=2 (and protocol=-1 as well).
--
resolution: duplicate ->
status: closed -> open
superseder: Rewrite the IO stack in C ->

[issue3873] Unpickling is really slow

2008-10-07 Thread Alexandre Vassalotti
Alexandre Vassalotti [EMAIL PROTECTED] added the comment: The solution is to add a read buffer to Unpickler (Pickler already has a write buffer, so that's why it is unaffected). I believe this would mitigate much of the (quite large) Python function call overhead. cPickle has a performance hack

[issue3873] Unpickling is really slow

2008-09-15 Thread Antoine Pitrou
Antoine Pitrou [EMAIL PROTECTED] added the comment: Do the numbers vary if you read the whole file at once and then unpickle the resulting bytes string? Large parts of the IO library are written in Python in 3.0, which might explain the discrepancy. -- nosy: +pitrou

[issue3873] Unpickling is really slow

2008-09-15 Thread Hagen Fürstenau
Hagen Fürstenau [EMAIL PROTECTED] added the comment: Yes, it gets much better, but even so (first reading file and timing only loads) unpickling takes four times as long in Python 3.0 as with the old cPickle module:
[EMAIL PROTECTED] hagenf]$ python pickletst2.py
0.0744678974152
0.0514161586761

[issue3873] Unpickling is really slow

2008-09-15 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc [EMAIL PROTECTED] added the comment: Indeed. If I replace the file with
f = io.BytesIO(open("tst", "rb").read())
timings are divided by 20... After quick profiling, it seems that PyLong_New would benefit from a free list. len(bytearray) is called very often. To stay
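The substitution described above can be sketched as follows. The temporary file, data size, and variable names are assumptions made for illustration; the original test script is not shown in this thread. The point is only that reading the whole file into memory first gives the same result while replacing many small file reads with one large read.

```python
import io
import os
import pickle
import tempfile

data = [0] * 100000

# Write a pickle to a temporary file (hypothetical stand-in for the test file).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    pickle.dump(data, f, 2)

try:
    # Unpickle directly from the file object.
    with open(path, "rb") as f:
        from_file = pickle.load(f)

    # Read everything at once, then unpickle from memory.
    with open(path, "rb") as f:
        from_memory = pickle.load(io.BytesIO(f.read()))

    # Equivalent shortcut when the bytes are already in memory.
    with open(path, "rb") as f:
        from_bytes = pickle.loads(f.read())
finally:
    os.remove(path)
```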

[issue3873] Unpickling is really slow

2008-09-15 Thread Hagen Fürstenau
New submission from Hagen Fürstenau [EMAIL PROTECTED]: Unpickling e.g. a large list seems to be really slow in Python 3.0. The attached test script gives the following results for pickling and unpickling a list of 1M zeros, showing that although the C implementation seems to be used in Python

[issue3873] Unpickling is really slow

2008-09-15 Thread Antoine Pitrou
Antoine Pitrou [EMAIL PROTECTED] added the comment: Gregory had patches for a freelist of long objects in #2013.