> You could simulate all this with random strings, I think. You could
> reduce the size of the test to ten million records to make the test
> run faster...
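Random test records of the kind suggested above can be produced with a generator along these lines. This is only a sketch; the function name, key format, and seed are my assumptions, chosen to match the benchmark's shape of 100-character string values:

```python
import random
import string

def random_records(n, length=100, seed=0):
    """Yield n (key, value) pairs with pseudo-random fixed-length values.

    A fixed seed keeps runs reproducible; the key format (stringified
    counter) is an assumption for illustration.
    """
    rng = random.Random(seed)
    chars = string.ascii_letters + string.digits
    for i in range(n):
        yield str(i), "".join(rng.choice(chars) for _ in range(length))

# A small sample in place of the full ten million records:
sample = list(random_records(3))
```

Because it is a generator, it can feed a bulk loader without materializing all records in memory first.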
I have now run comparison benchmarks on several implementations of file-based dictionary storage; see the stats below. The btopen and hashopen functions ship with bsddb; the sq_dict_open function comes from an sqlite-based shelve implementation by Josiah Carlson (see http://bugs.python.org/file11470/sq_dict.py ), while indexed_open is my own implementation. It differs from the previous one by having explicit index management: one needs to call the create_index method after loading the data. I expected that not having an index would speed up the initial data loading.

I have also experimented with a different loading strategy for this last variant (see fastload): it loads data from iterators via the executemany() method. In total each test loads 1 million 100-character strings.

In a nutshell, it does not seem likely that the btree-based bsddb can be beaten at any task. Explicit index management and fast loading can bring data insertion for sqlite down to 30 sec instead of 86, but that is still just about the same as the 31 sec with btopen. The picture is less promising for the 'reverse_iter' and 'update' tests, which are 2 to 10 times slower. These tests visit all keys in the database and access each one individually. The 'forward_iter' test operates on the keys in the order they are returned from the database, whereas the 'reverse_iter' test iterates in reverse. Access times for random items will fall somewhere between the limits set by the 'forward_iter' and 'reverse_iter' methods.

I believe that the sqlite-based implementation could be sped up substantially whenever multiple records need to be updated or retrieved at once. It is not clear whether that could make it faster than the btree-based bsddb, though.
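The explicit-index-plus-fastload approach described above can be sketched roughly as follows. This is an illustration of the idea, not the actual indexed_open or sq_dict code; the class name, table schema, and method names are my assumptions:

```python
import sqlite3

class IndexedDict:
    """Minimal sketch of a dict-like sqlite store with explicit index
    management: load first without an index, then call create_index()."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # No index yet, so inserts stay cheap until create_index() runs.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf (key TEXT, value TEXT)")

    def fastload(self, items):
        # Bulk-load (key, value) pairs from an iterator via executemany(),
        # as in the 'fastloading' test.
        self.conn.executemany(
            "INSERT INTO shelf (key, value) VALUES (?, ?)", items)
        self.conn.commit()

    def create_index(self):
        # Build the key index in one pass after loading, as in the
        # 'indexing' test.
        self.conn.execute(
            "CREATE UNIQUE INDEX IF NOT EXISTS shelf_key ON shelf (key)")
        self.conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

# Usage: load a small batch, then index it.
db = IndexedDict()
db.fastload((str(i), "x" * 100) for i in range(1000))
db.create_index()
```

Deferring index creation means sqlite sorts the keys once in bulk rather than maintaining the btree incrementally on every insert, which is where the loading speedup comes from.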
Results:

elapsed= 14.2s, test=loading,      func=btopen
elapsed= 43.9s, test=loading,      func=hashopen
elapsed= 32.4s, test=loading,      func=sq_dict_open
elapsed= 15.4s, test=loading,      func=indexed_open
----------
elapsed= 15.6s, test=fastloading,  func=btopen
elapsed= 48.0s, test=fastloading,  func=hashopen
elapsed= 33.5s, test=fastloading,  func=sq_dict_open
elapsed=  9.1s, test=fastloading,  func=indexed_open
----------
elapsed=  0.0s, test=indexing,     func=btopen
elapsed=  0.0s, test=indexing,     func=hashopen
elapsed=  0.0s, test=indexing,     func=sq_dict_open
elapsed= 20.4s, test=indexing,     func=indexed_open
----------
elapsed= 22.0s, test=forward_iter, func=btopen
elapsed= 26.4s, test=forward_iter, func=hashopen
elapsed= 42.2s, test=forward_iter, func=sq_dict_open
elapsed= 34.2s, test=forward_iter, func=indexed_open
----------
elapsed= 13.8s, test=reverse_iter, func=btopen
elapsed= 35.0s, test=reverse_iter, func=hashopen
elapsed=180.3s, test=reverse_iter, func=sq_dict_open
elapsed=171.3s, test=reverse_iter, func=indexed_open
----------
elapsed= 38.2s, test=update,       func=btopen
elapsed= 76.5s, test=update,       func=hashopen
elapsed=104.2s, test=update,       func=sq_dict_open
elapsed= 91.9s, test=update,       func=indexed_open
----------

You received this message because you are subscribed to the Google Groups "pygr-dev" group.
To post to this group, send email to pygr-dev@googlegroups.com
To unsubscribe from this group, send email to pygr-dev+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/pygr-dev?hl=en