Changes by Jesús Cea Avión j...@jcea.es:
--
nosy: +jcea
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3873
___
Antoine Pitrou pit...@free.fr added the comment:
Patch committed in r85384.
--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed
Antoine Pitrou pit...@free.fr added the comment:
One problem with the seek() approach is that some file-like objects have
expensive seeks. One example is GzipFile, where seek(n) is O(n) (it first
rewinds to the start of file, then reads n decompressed bytes). In the end,
unpickling from a
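The cost asymmetry Antoine describes can be seen with a small sketch (illustrative only; the 1 MB size and offsets are assumptions, not figures from this thread):

```python
import gzip
import io

# Build an in-memory gzip stream holding 1 MB of zeros.
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as f:
    f.write(b"\x00" * (1 << 20))
raw.seek(0)

g = gzip.GzipFile(fileobj=raw, mode="rb")
g.read(1 << 20)   # consume all decompressed data

# A backward seek on GzipFile rewinds to offset 0 and re-reads
# (re-decompresses) everything up to the target offset: O(n).
g.seek(1 << 19)
assert g.tell() == 1 << 19
```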
Alexandre Vassalotti alexan...@peadrop.com added the comment:
Didn't Victor say that only one seek at the end is necessary per
pickle? If this is the case, I don't think expensive seeks will be an
issue.
--
Antoine Pitrou pit...@free.fr added the comment:
> Didn't Victor say that only one seek at the end is necessary per
> pickle? If this is the case, I don't think expensive seeks will be an
> issue.
If you are unpickling from a multi-megabyte gzip file and the seek at
the end makes you uncompress
Antoine Pitrou pit...@free.fr added the comment:
Here is an updated bench_pickle which also makes the file unpeekable.
--
Added file: http://bugs.python.org/file19033/bench_pickle.py
Changes by Antoine Pitrou pit...@free.fr:
Removed file: http://bugs.python.org/file18241/bench_pickle.py
Changes by Antoine Pitrou pit...@free.fr:
Removed file: http://bugs.python.org/file18983/bench_pickle.py
Antoine Pitrou pit...@free.fr added the comment:
Here is a patch using peek() rather than seek(). There are some inefficiencies
around (such as using read() to skip the consumed prefetched bytes), but the
benchmark results are still as good as with seek():
Protocol 0
- dump: 142.5 ms
- load
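For reference, the peek() method that the patch relies on returns upcoming bytes without consuming them, as BufferedReader implements it. A minimal illustration (not the patch itself):

```python
import io

buf = io.BufferedReader(io.BytesIO(b"pickle-data"))

head = buf.peek(6)        # look at buffered bytes without advancing
assert head[:6] == b"pickle"

data = buf.read(6)        # the same bytes are still unread
assert data == b"pickle"
```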
Alexandre Vassalotti alexan...@peadrop.com added the comment:
I get this error with the patch:
python: /home/alex/src/python.org/py3k/Modules/_pickle.c:908:
_Unpickler_ReadFromFile: Assertion `self->next_read_idx == 0' failed.
Aborted
--
Antoine Pitrou pit...@free.fr added the comment:
Ah, thank you. I hadn't tested in debug mode and there was a wrong assert from
the previous code.
Here is a patch with the assert removed.
--
versions: +Python 3.2 -Python 3.1
Added file:
Antoine Pitrou pit...@free.fr added the comment:
Here is a fixed version of Victor's bench (didn't work on 2.x).
--
Added file: http://bugs.python.org/file18983/bench_pickle.py
Antoine Pitrou pit...@free.fr added the comment:
And here is a new performance patch (Victor's patch was outdated because of
heavy changes incorporated from Unladen Swallow). Results of bench_pickle.py
are as follows:
* Python 2.7 (cPickle):
Protocol 0
- dump: 189.8 ms
- load (seekable=False):
STINNER Victor victor.stin...@haypocalc.com added the comment:
> Victor, have you tried using peek() instead of seek()?
> I mentioned this previously in msg85780.
In a file encoded in protocol 0, backward seeks are needed for each call to
unpickler_readline... and this function is called to read
Changes by Alexander Belopolsky belopol...@users.sourceforge.net:
--
nosy: +belopolsky
STINNER Victor victor.stin...@haypocalc.com added the comment:
New version of my patch:
- add a "used" attribute to the UnpicklerBuffer structure: disable the read
buffer for non-seekable files and for protocol 0 (at the first call to
unpickle_readline)
- check if PyObject_GetAttrString(file, "seek")
STINNER Victor victor.stin...@haypocalc.com added the comment:
Same benchmark with Python 2.6.5+, so without the patch, but compiled with
maximum compiler optimization (whereas pydebug means no optimization):
Protocol 0
- dump: 517.3 ms
- load: 876.6 ms = because of the new I/O library,
STINNER Victor victor.stin...@haypocalc.com added the comment:
bench_pickle.py: script used to produce last benchmarks.
--
Added file: http://bugs.python.org/file18241/bench_pickle.py
Alexandre Vassalotti alexan...@peadrop.com added the comment:
Victor, have you tried using peek() instead of seek()? I mentioned this
previously in msg85780.
--
Mark Lawrence breamore...@yahoo.co.uk added the comment:
Has this slipped under the radar? I believe that one way or the other any
performance issue should be resolved if at all possible.
--
nosy: +BreamoreBoy
Changes by Antoine Pitrou pit...@free.fr:
--
priority:  -> normal
stage:  -> patch review
versions: +Python 3.1 -Python 3.0
Antoine Pitrou pit...@free.fr added the comment:
By the way, the patch won't work with unseekable files, which is
probably bad.
--
Alexandre Vassalotti alexan...@peadrop.com added the comment:
Victor, Unpickler shouldn't raise an error if the given file object does
not support seek(); it should gracefully fall back to using only read() and
readline(). Also, I think you could get a greater performance
improvement by using peek()
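That fallback logic can be sketched at the Python level like this (a hypothetical helper with illustrative names; the real check happens in C via PyObject_GetAttrString):

```python
import io

def choose_strategy(f):
    # Prefer peek(): we can buffer freely, since unread bytes stay
    # inside the file object itself.
    if callable(getattr(f, "peek", None)):
        return "peek"
    # Next best: seek() lets us rewind over prefetched-but-unused bytes.
    if callable(getattr(f, "seek", None)):
        return "seek"
    # Last resort: no read-ahead at all, only read()/readline().
    return "plain"

assert choose_strategy(io.BufferedReader(io.BytesIO(b""))) == "peek"
assert choose_strategy(io.BytesIO(b"")) == "seek"   # BytesIO has no peek()
```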
STINNER Victor victor.stin...@haypocalc.com added the comment:
Create a read buffer (4096 bytes) in the Unpickler class. Using [0]*10**7 or
[1000]*10**7, load() is 6 to 8 times faster.
I removed the last_string attribute because it wasn't used.
If there are tail bytes, seek backward.
--
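The scheme Victor describes can be sketched in Python as follows (assumed details: 4096-byte read-ahead, one backward seek over the unconsumed tail; the actual patch is C code in _pickle.c):

```python
import io

class BufferedReadSketch:
    """Read ahead in 4096-byte chunks; seek backward over tail bytes."""

    CHUNK = 4096

    def __init__(self, file):
        self.file = file
        self.buf = b""
        self.pos = 0

    def read(self, n):
        # Refill the buffer in CHUNK-sized steps until n bytes are available.
        while len(self.buf) - self.pos < n:
            chunk = self.file.read(self.CHUNK)
            if not chunk:
                break
            self.buf += chunk
        data = self.buf[self.pos:self.pos + n]
        self.pos += len(data)
        return data

    def finish(self):
        # Seek backward over bytes that were prefetched but never consumed,
        # so the caller sees the file positioned right after the pickle.
        tail = len(self.buf) - self.pos
        if tail:
            self.file.seek(-tail, io.SEEK_CUR)

f = io.BytesIO(b"x" * 10000)
r = BufferedReadSketch(f)
r.read(10)
r.finish()
assert f.tell() == 10   # position matches the consumed bytes exactly
```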
STINNER Victor victor.stin...@haypocalc.com added the comment:
I don't know why, but python-trunk is *much* slower than py3k (eg. with
dump: 1000 ms vs 24 ms for py3k, or with load: 1500ms vs 186ms).
--
STINNER Victor victor.stin...@haypocalc.com added the comment:
My version of pickletest.py:
- make sure that file position is correct after the load()
- some benchmark. most interesting numbers:
without the patch:
version | data | dump ms | load ms |
py3k | 0,10^6 | 230 |
STINNER Victor victor.stin...@haypocalc.com added the comment:
Note about my patch: the buffer should be truncated after
PyBytes_Concat(self->buffer.pybytes, data) to avoid a very long buffer.
Something like: self->buffer.pybytes += data; self->buffer.pybytes =
self->buffer.pybytes[index:];
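At the Python level, the suggested truncation amounts to something like this (an illustrative sketch of the idea, not the C patch):

```python
class GrowBufferSketch:
    """Append incoming chunks, but drop the already-consumed prefix on
    each feed so the buffer cannot grow without bound."""

    def __init__(self):
        self.buf = b""
        self.index = 0   # number of bytes already consumed

    def feed(self, data):
        self.buf += data
        # Truncate: discard the consumed prefix and reset the index.
        self.buf = self.buf[self.index:]
        self.index = 0

    def consume(self, n):
        data = self.buf[self.index:self.index + n]
        self.index += n
        return data

b = GrowBufferSketch()
b.feed(b"abcdef")
assert b.consume(4) == b"abcd"
b.feed(b"ghij")
assert b.buf == b"efghij"   # the consumed "abcd" prefix was dropped
```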
Antoine Pitrou pit...@free.fr added the comment:
> I don't know why, but python-trunk is *much* slower than py3k (eg. with
> dump: 1000 ms vs 24 ms for py3k, or with load: 1500ms vs 186ms).
Perhaps you tried with the pure Python version (pickle) rather than the
C one (cPickle)?
--
Changes by Collin Winter coll...@gmail.com:
--
nosy: +collinwinter
STINNER Victor victor.stin...@haypocalc.com added the comment:
alexandre.vassalotti wrote:
> The solution is to add a read buffer to Unpickler (...)
> would mitigate much of the (quite large) Python function
> call overhead. (...) cPickle has a performance hack to make it
> uses cStringIO and
STINNER Victor victor.stin...@haypocalc.com added the comment:
Unladen Swallow has a project to optimize pickle. Currently, it uses 3
benchmarks:
pickle - use the cPickle module to pickle a variety of datasets.
pickle_dict - microbenchmark; use the cPickle module to pickle a lot
of
STINNER Victor victor.stin...@haypocalc.com added the comment:
gprof (--enable-profiler) results:
Each sample counts as 0.01 seconds.
  %   cumulative    self               self     total
 time   seconds    seconds    calls  ms/call  ms/call  name
18.18      0.16       0.16  2011055
Antoine Pitrou pit...@free.fr added the comment:
Making this a duplicate of #4565 (Rewrite the IO stack in C).
If anyone disagrees, please reopen!
--
resolution:  -> duplicate
status: open -> closed
superseder:  -> Rewrite the IO stack in C
Hagen Fürstenau hfuerste...@gmx.net added the comment:
With the io-c branch I see much better unpickling performance than
before. But it still seems to be around 2 or 3 times slower than with
cPickle in 2.6.
Is this expected at this point of io-c development? Otherwise perhaps
this issue should
Antoine Pitrou pit...@free.fr added the comment:
Hello,
> With the io-c branch I see much better unpickling performance than
> before. But it still seems to be around 2 or 3 times slower than with
> cPickle in 2.6.
It's much closer here.
With 2.7 (trunk) and cPickle:
0.439934968948
Changes by Hagen Fürstenau hfuerste...@gmx.net:
Removed file: http://bugs.python.org/file11497/pickletst.py
Hagen Fürstenau hfuerste...@gmx.net added the comment:
I uploaded a new pickletst.py which specifies protocol 2, otherwise
we're comparing apples with oranges. With this I get:
0.211881160736
0.322369813919
for Python 2.6 and
0.158488035202
1.21621990204
on the io-c branch. Can you confirm
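The mismatch matters because the pickle protocol changes both the wire format and the work per opcode, so benchmarks must pin it explicitly. A small illustration (sizes are indicative, not figures from this thread):

```python
import pickle

obj = [0] * 10**5
p0 = pickle.dumps(obj, protocol=0)   # text-based protocol
p2 = pickle.dumps(obj, protocol=2)   # compact binary protocol

# The same object round-trips either way, but size (and speed) differ
# widely between protocols.
assert pickle.loads(p0) == pickle.loads(p2) == obj
assert len(p2) < len(p0)
```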
Antoine Pitrou pit...@free.fr added the comment:
Nice catch! I can confirm your figures with protocol=2 (and protocol=-1
as well).
--
resolution: duplicate ->
status: closed -> open
superseder: Rewrite the IO stack in C ->
Alexandre Vassalotti [EMAIL PROTECTED] added the comment:
The solution is to add a read buffer to Unpickler (Pickler already has a
write buffer, so that's why it is unaffected). I believe this would
mitigate much of the (quite large) Python function call overhead.
cPickle has a performance hack
Antoine Pitrou [EMAIL PROTECTED] added the comment:
Do the numbers vary if you read the whole file at once and then unpickle
the resulting bytes string?
Large parts of the IO library are written in Python in 3.0, which might
explain the discrepancy.
--
nosy: +pitrou
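The experiment Antoine suggests is easy to phrase with pickle.load versus pickle.loads (an illustrative sketch, not the original test script):

```python
import io
import pickle

payload = pickle.dumps([0] * 10**5, protocol=2)

# Variant 1: unpickle through a file object (many small reads go
# through the I/O stack).
from_file = pickle.load(io.BytesIO(payload))

# Variant 2: read everything first, then unpickle from the bytes
# string in memory.
from_bytes = pickle.loads(payload)

assert from_file == from_bytes
```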
Hagen Fürstenau [EMAIL PROTECTED] added the comment:
Yes, it gets much better, but even so (first reading the file and timing
only the loads) unpickling takes four times as long in Python 3.0 as with
the old cPickle module:
[EMAIL PROTECTED] hagenf]$ python pickletst2.py
0.0744678974152
0.0514161586761
Amaury Forgeot d'Arc [EMAIL PROTECTED] added the comment:
Indeed. If I replace the file with
f = io.BytesIO(open("tst", "rb").read())
timings are divided by 20...
After quick profiling, it seems that PyLong_New would benefit from a
free list. len(bytearray) is called very often.
To stay
New submission from Hagen Fürstenau [EMAIL PROTECTED]:
Unpickling e.g. a large list seems to be really slow in Python 3.0.
The attached test script gives the following results for pickling and
unpickling a list of 1M zeros, showing that although the C
implementation seems to be used in Python
Antoine Pitrou [EMAIL PROTECTED] added the comment:
Gregory had patches for a freelist of long objects in #2013.