New submission from stw <sil...@googlemail.com>:

I've found that unpickling a certain kind of dictionary is substantially slower 
in python 2.7 than in python 2.6. The dictionary has keys that are tuples 
of strings - a 1-tuple is enough to see the effect. The problem seems to be 
caused by garbage collection, as turning it off eliminates the slowdown. Both 
pickle and cPickle modules are affected.


I've attached two files to demonstrate this. The file 'make_file.py'
creates a dictionary of specified size, with keys containing 1-tuples of random 
strings. It then dumps the dictionary to a pickle file using a specified pickle 
module.

The file 'load_file.py' unpickles the file created by 'make_file.py', using a 
specified pickle module, and prints the time taken. The code can be run with 
garbage collection either on or off.
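
A minimal sketch of what the two scripts do, combined into one file. This is not the attached code: the function names (make_dict, timed_load), the integer values, the key length and the in-memory round trip instead of a pickle file are all assumptions made for illustration. It only shows the shape of the measurement. Reproducing the reported numbers would need python 2.6 and 2.7, with cPickle swapped in through the loads argument.

```python
import gc
import pickle
import random
import time


def make_dict(n, seed=0):
    """Build a dict with n entries whose keys are 1-tuples of random strings."""
    rng = random.Random(seed)
    d = {}
    while len(d) < n:
        # 10 hex digits per key; the key length is an arbitrary choice.
        s = "%010x" % rng.getrandbits(40)
        d[(s,)] = len(d)  # values are placeholders (assumption)
    return d


def timed_load(data, use_gc=True, loads=pickle.loads):
    """Unpickle data and return (seconds, obj), with the collector on or off."""
    was_enabled = gc.isenabled()
    if use_gc:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.time()
        obj = loads(data)
        elapsed = time.time() - start
    finally:
        # Restore the collector to the state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return elapsed, obj


if __name__ == "__main__":
    d = make_dict(200000)
    data = pickle.dumps(d, 2)  # protocol 2 is one of the affected protocols
    for use_gc in (True, False):
        seconds, _ = timed_load(data, use_gc)
        print("gc %s: %.3f s" % ("on" if use_gc else "off", seconds))
```

Timing only the loads call keeps dictionary construction and pickling out of the measurement, which matches what load_file.py is described as reporting.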

The results below are for a dictionary of 200000 entries. Each entry is the 
time taken in seconds with garbage collection on / garbage collection off. The 
row headings are the module used to pickle the data, the column headings the 
module used to unpickle it.


python 2.6, n = 200000

               size    pickle      cPickle
    pickle     4.3M    3.02/2.65   0.786/0.559 
    cPickle    3.4M    2.27/2.04   0.66/0.443 


python 2.7, n = 200000

               size    pickle      cPickle
    pickle     4.3M    10.5/2.67   6.62/0.563 
    cPickle    2.4M    1.45/1.39   0.362/0.325 


When pickle is used to pickle the data, there is a significant slowdown in 
python 2.7 compared to python 2.6 with garbage collection on: roughly 3.5x for 
the pickle unpickler (10.5 vs 3.02) and roughly 8x for the cPickle unpickler 
(6.62 vs 0.786). With garbage collection off the times in python 2.7 are 
essentially identical to those in python 2.6.
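
One way to read the two tables above is as the ratio of the time with garbage collection on to the time with it off, for data pickled with pickle. The short sketch below does that arithmetic using only the figures already listed. The dictionary layout and the gc_overhead helper are illustrative names, not part of the attached scripts.

```python
# (gc on, gc off) times in seconds, copied from the tables above,
# for data pickled with the pickle module.
timings = {
    ("2.6", "pickle"): (3.02, 2.65),
    ("2.6", "cPickle"): (0.786, 0.559),
    ("2.7", "pickle"): (10.5, 2.67),
    ("2.7", "cPickle"): (6.62, 0.563),
}


def gc_overhead(gc_on, gc_off):
    """Return how many times slower unpickling is with the collector on."""
    return gc_on / gc_off


for (version, unpickler), (gc_on, gc_off) in sorted(timings.items()):
    ratio = gc_overhead(gc_on, gc_off)
    print("python %s, %-7s: %.2fx" % (version, unpickler, ratio))
```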

When cPickle is used to pickle the data, both unpicklers are faster in python 
2.7 than in python 2.6. Presumably the speedup is due to the dictionary 
optimizations introduced in issue #5670.


Both pickle and cPickle show a slowdown when data pickled in python 2.6 is 
unpickled in python 2.7:


pickled in python 2.6, unpickled in python 2.7, n = 200000

                      size    pickle (2.7)    cPickle (2.7)
    pickle (2.6)      4.3M    10.4/2.66       6.64/0.56 
    cPickle (2.6)     3.4M    8.73/2.08       6.1/0.452 


I don't know enough about the internals of the pickle modules or garbage 
collector to offer an explanation/fix. The list of optimizations for python 2.7 
indicates changes to both pickle modules (issues #5670 and #5084) and the 
garbage collector (issues #4074 and #4688). It seems possible that the slowdown 
is the result of some interaction between these changes.


Further notes:

1. System details: python 2.6.5 and python 2.7.3 on Ubuntu 10.04, 1.73GHz 
Pentium M processor.

2. Only pickle files created with protocols 1 and 2 are affected. Pickling with 
protocol 0 gives similar timings on python 2.6 and 2.7.

3. The fact that the dictionary's keys are tuples is relevant, although the 
length of the tuple is not. Unpickling a dictionary whose keys are strings does 
not show any slowdown.
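
Notes 2 and 3 can be checked with a comparison along these lines. It is a rough sketch rather than the attached scripts: the helper names (build, time_unpickle) and the non-random "key%d" strings are assumptions, and the effect described in this report would only be expected under python 2.7 with garbage collection on.

```python
import pickle
import time


def build(n, tuple_keys):
    """Return a dict of n entries keyed by strings or by 1-tuples of strings."""
    keys = ("key%d" % i for i in range(n))
    if tuple_keys:
        return dict(((k,), i) for i, k in enumerate(keys))
    return dict((k, i) for i, k in enumerate(keys))


def time_unpickle(obj, protocol):
    """Pickle obj with the given protocol and time only the unpickling step."""
    data = pickle.dumps(obj, protocol)
    start = time.time()
    result = pickle.loads(data)
    return time.time() - start, result


if __name__ == "__main__":
    for tuple_keys in (False, True):
        obj = build(200000, tuple_keys)
        kind = "1-tuple keys" if tuple_keys else "string keys"
        for protocol in (0, 1, 2):
            seconds, _ = time_unpickle(obj, protocol)
            print("%-12s protocol %d: %.3f s" % (kind, protocol, seconds))
```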

----------
files: make_file.py
messages: 160368
nosy: stw
priority: normal
severity: normal
status: open
title: Slow unpickling of certain dictionaries in python 2.7 vs python 2.6
type: performance
versions: Python 2.7
Added file: http://bugs.python.org/file25524/make_file.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14775>
_______________________________________