Bugs item #849662, was opened at 2003-11-26 09:06
Message generated for change (Settings changed) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=849662&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Extension Modules
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gottfried Ganßauge (ganssauge)
>Assigned to: Nobody/Anonymous (nobody)
Summary: reading shelves is really slow

Initial Comment:
My application uses a shelve file that is created by another process using the same Python version.
Before Python 2.3, using this shelve with the exact same application was almost twice as fast as using a binary pickle containing the same data.
Now, with Python 2.3, the same application is suddenly about 150 times slower than with the binary pickle.

The usage is as follows:

    idx_dict = shelve.open (idx_dict_name, "r")
    ...
    while not infile.eof:
        index = get_index_from_somewhere_else()
        if not idx_dict.has_key (index):
            do_something(index)
        else:
            do_something_else(index)
    idx_dict.close()

Profiling revealed that most of the time is spent within UserDict.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-12-07 06:55

Message:
Logged In: YES
user_id=80475

I fixed up your particular problem for Py2.3.3 and Py2.4.
Leaving the report open because there are other calls which have performance issues.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-11-28 16:57

Message:
Logged In: YES
user_id=80475

Yes, that was the culprit. I'll look for a way to make __cmp__ a bit smarter.
In the meantime, the proper way to check for None is always: if dict is None.

----------------------------------------------------------------------

Comment By: Gottfried Ganßauge (ganssauge)
Date: 2003-11-28 11:01

Message:
Logged In: YES
user_id=792746

I think I found the answer: apart from has_key(), I'm also using "dict != None".
If I leave that out of my test program, both Python versions run at the same speed.
The dict != None condition seems to trigger len(dict.keys()), and that seems to be far slower than before.

I definitely didn't time different scripts: the script is part of our CDROM production system, and the only variables I had during my tests were Python itself and the Python path.

Find my test script attached...

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-11-27 12:55

Message:
Logged In: YES
user_id=80475

The fragment in the original posting showed that the only inner-loop shelve access was through has_key().
The tracebacks show that UserDict is nowhere in the traceback chain.
I conclude that the fragment does not represent what is really going on in the problematic script.
So, please attach the profiled script, Konvertierung/entsch_pass2.py

The attached profile indicates that somewhere, there is a line like: for k,v in idx_dict.iteritems().
This is surprising because shelves did not support iteritems() in Py2.2.
That would mean that you've timed and compared two different pieces of code.

Please show the shortest script with data that runs at radically different speeds on Py2.2 vs Py2.3.

----------------------------------------------------------------------

Comment By: Gottfried Ganßauge (ganssauge)
Date: 2003-11-27 05:42

Message:
Logged In: YES
user_id=792746

What the heck ... here is the shelve in question

----------------------------------------------------------------------

Comment By: Gottfried Ganßauge (ganssauge)
Date: 2003-11-27 05:32

Message:
Logged In: YES
user_id=792746

I uploaded my profiling data, maybe it will help you ...

Here is the information you requested:

----------------><------------------------><------------
([EMAIL PROTECTED] 534) PYTHONPATH=../../../COMMON.DEVEL/Tools/python/lib.linux-i686-2.3 python Konvertierung/entsch_pass2.py HI69228 x HR all_idx2.shelve <hi69228.sgml
Traceback (most recent call last):
  File "Konvertierung/entsch_pass2.py", line 1026, in ?
    init_idx_dict (idx_dict_name)
  File "../../COMMON/lib/EDB.py", line 54, in init_idx_dict
    idx_dict.has_key([])
  File "/usr/lib/python2.3/shelve.py", line 104, in has_key
    return self.dict.has_key(key)
  File "/usr/lib/python2.3/bsddb/__init__.py", line 142, in has_key
    return self.db.has_key(key)
TypeError: String or Integer object expected for key, list found

([EMAIL PROTECTED] 535) PYTHONPATH=../../../COMMON.DEVEL/Tools/python/lib.linux-i686-2.2 python2.2 Konvertierung/entsch_pass2.py HI69228 x HR all_idx2.shelve <hi69228.sgml
Traceback (most recent call last):
  File "Konvertierung/entsch_pass2.py", line 1026, in ?
    init_idx_dict (idx_dict_name)
  File "../../COMMON/lib/EDB.py", line 54, in init_idx_dict
    idx_dict.has_key([])
  File "/usr/lib/python2.2/shelve.py", line 62, in has_key
    return self.dict.has_key(key)
TypeError: key type must be string

([EMAIL PROTECTED] 536) python -V
Python 2.3.2
([EMAIL PROTECTED] 537) python2.2 -V
Python 2.2.3
([EMAIL PROTECTED] 538) uname -a
Linux gglinux 2.4.22 #1 SMP Mon Nov 3 11:40:28 CET 2003 i686 unknown unknown GNU/Linux
([EMAIL PROTECTED] 538) cat /etc/debian_version
testing/unstable

([EMAIL PROTECTED] 539) python2.2 -c 'import shelve ; d = shelve.open("all_idx2.shelve", "r"); print len (d.keys()) ; print d.keys()[0], d [d.keys()[0]]'
34983
HI568817 None

([EMAIL PROTECTED] 540) python2.3 -c 'import shelve ; d = shelve.open("all_idx2.shelve", "r"); print "# items in shelve:", len (d.keys()) ; print "Items look like: index", d.keys() [0], "value", d [d.keys()[0]]'
# items in shelve: 34983
Items look like: index HI568817 value None

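
For readers following the "dict != None" discussion in the 2003-11-28 comments, the sketch below illustrates why an equality test against None can be expensive on a mapping while an identity test is not. It is written for current Python 3 and uses a hypothetical CountingMapping class with a deliberately naive __eq__. It is a stand-in built on assumptions, not the actual UserDict or shelve code from Py2.3.

```python
class CountingMapping:
    """Hypothetical stand-in for a disk-backed mapping (not the real shelve code)."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0  # counts simulated record reads

    def items(self):
        for key, value in self._data.items():
            self.reads += 1  # each yielded item stands for one database read
            yield key, value

    def __eq__(self, other):
        # Naive comparison (an assumption for illustration): materialise
        # every record before comparing, even when `other` is None.
        if isinstance(other, CountingMapping):
            other = dict(other.items())
        return dict(self.items()) == other

    def __ne__(self, other):
        return not self.__eq__(other)


index = CountingMapping(("HI%06d" % i, None) for i in range(1000))

index.reads = 0
result_ne = index != None      # dispatches to __ne__, which scans every record
reads_for_ne = index.reads

index.reads = 0
result_is = index is not None  # identity test: no method call, no scan
reads_for_is = index.reads

print("reads for '!= None':", reads_for_ne)
print("reads for 'is not None':", reads_for_is)
```

Both tests give the same truth value here. Only the equality form pays for a full pass over the stored records, which matches the advice in the thread to write "if dict is None".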
----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-11-27 04:17

Message:
Logged In: YES
user_id=80475

I can reproduce a four-fold slowdown that persists even after the UserDict.DictMixin lines are commented out of shelve.py and bsddb.__init__.py.
For me, the only thing that has changed is the underlying bsddb implementation.

Let's see if your system is going somewhere else to get its shelving done.
After the first line, add:

    idx_dict.has_key ([])

Then post the traceback here. Do that for both Py2.2 and Py2.3. Thank you.

Also, post what a typical record in the index looks like and tell me how many entries are typically in idx_dict.
That way, I can try to reproduce your timings with greater fidelity.

Which OS are you using, and what are the minor bugfix version numbers of the Py2.2 and Py2.3 you are using?

----------------------------------------------------------------------
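
A reproduction along the lines requested in this thread might look like the following minimal sketch. It assumes current Python 3, so "key in shelf" replaces has_key(). The helper names build_shelf and time_lookups, the demo file name, and the key count are all hypothetical. The key format and the None values mirror the record reported earlier in the thread (HI568817 mapped to None). The absolute timings depend on whichever dbm backend the local Python selects, so the sketch prints no expected figure.

```python
import os
import shelve
import tempfile
import time


def build_shelf(path, n_keys):
    """Create a new shelf with n_keys entries that all map to None."""
    with shelve.open(path, "n") as shelf:
        for i in range(n_keys):
            shelf["HI%06d" % i] = None


def time_lookups(path, probes):
    """Return (hits, seconds) for membership tests on a read-only shelf."""
    with shelve.open(path, "r") as shelf:
        start = time.perf_counter()
        hits = sum(1 for key in probes if key in shelf)
        elapsed = time.perf_counter() - start
    return hits, elapsed


# The reporter's shelf held 34983 keys; a smaller count keeps this run short.
N_KEYS = 2000

with tempfile.TemporaryDirectory() as tmp:
    shelf_path = os.path.join(tmp, "all_idx2_demo")
    build_shelf(shelf_path, N_KEYS)
    # Even-numbered probes up to 2 * N_KEYS: half exist in the shelf, half do not.
    probes = ["HI%06d" % i for i in range(0, 2 * N_KEYS, 2)]
    hits, seconds = time_lookups(shelf_path, probes)

print("hits: %d of %d probes in %.4f s" % (hits, len(probes), seconds))
```

Running the same script under two interpreter versions and comparing the printed times would give the kind of side-by-side measurement the thread asks for.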