> You could simulate all this with random strings, I think.  You could
> reduce the size of the test to ten million records to make the test
> run faster...

I have now run comparisons on several implementations of file-based
dictionary storage; the stats are below. btopen and hashopen are the
functions that ship with bsddb, the sq_dict_open function comes from
an sqlite-based shelve implementation by Josiah Carlson (see
http://bugs.python.org/file11470/sq_dict.py ), while indexed_open is
my own implementation, which differs from the previous one in that it
manages its index explicitly: one needs to call the create_index
method after loading the data. I expected that not maintaining an
index would speed up the initial data load. For this last variant I
also experimented with a different loading strategy (see fastload),
which loads data from iterators via the executemany() method. In
total each test loads one million 100-character strings.
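Roughly, the fastload strategy looks like this; a minimal sketch with a hypothetical `shelf` table (the real sq_dict.py schema may differ), showing the bulk insert via executemany() and the index being built only afterwards, as in indexed_open:

```python
import sqlite3

def fastload(conn, items):
    # items is any iterable of (key, value) pairs; executemany() consumes
    # it directly, avoiding one Python-level execute() call per record.
    conn.execute("CREATE TABLE IF NOT EXISTS shelf (key TEXT, value TEXT)")
    conn.executemany("INSERT INTO shelf (key, value) VALUES (?, ?)", items)
    conn.commit()

def create_index(conn):
    # Deferred index creation: building the index once after the bulk
    # load is cheaper than maintaining it on every insert.
    conn.execute("CREATE INDEX IF NOT EXISTS shelf_key ON shelf (key)")
    conn.commit()
```

The point of the split is that inserts into an unindexed table are append-only, while the single CREATE INDEX pass can sort the keys in bulk.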

In a nutshell, it does not seem likely that the btree-based bsddb can
be beaten at any task. Explicit index management and fast loading can
cut the sqlite insertion time from 86 seconds to 30, but that is
still only about the same as the 31 seconds with btopen.

The picture is less promising for the 'reverse_iter' and 'update'
functions, which are 2 to 10 times slower. These functions visit all
keys in the database and access each one individually. The
'forward_iter' function operates on the keys in the order the
database returns them, whereas 'reverse_iter' iterates over them in
reverse. Access times for random items should fall somewhere between
the limits set by 'forward_iter' and 'reverse_iter'.
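In pseudocode terms (a sketch against any dict-like store, not the exact benchmark code), the two access patterns are:

```python
def forward_iter(db):
    # Visit keys in whatever order the backend hands them out, then
    # fetch each value individually; this follows the on-disk layout.
    for key in list(db.keys()):
        yield key, db[key]

def reverse_iter(db):
    # Visit the same keys in reverse order; this defeats any natural
    # on-disk ordering, so each lookup behaves more like random access.
    for key in reversed(list(db.keys())):
        yield key, db[key]
```

Reversing the key order is what turns the sequential scan into scattered lookups, which is why sqlite falls from 42 to 180 seconds between the two tests.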

I believe the sqlite-based implementation could be sped up
substantially whenever multiple records need to be updated or
retrieved at once. It is not clear whether it could be made faster
than the btree-based bsddb, though.

Results:

elapsed= 14.2s, test=loading, func=btopen
elapsed= 43.9s, test=loading, func=hashopen
elapsed= 32.4s, test=loading, func=sq_dict_open
elapsed= 15.4s, test=loading, func=indexed_open
----------
elapsed= 15.6s, test=fastloading, func=btopen
elapsed= 48.0s, test=fastloading, func=hashopen
elapsed= 33.5s, test=fastloading, func=sq_dict_open
elapsed=  9.1s, test=fastloading, func=indexed_open
----------
elapsed=  0.0s, test=indexing, func=btopen
elapsed=  0.0s, test=indexing, func=hashopen
elapsed=  0.0s, test=indexing, func=sq_dict_open
elapsed= 20.4s, test=indexing, func=indexed_open
----------
elapsed= 22.0s, test=forward_iter, func=btopen
elapsed= 26.4s, test=forward_iter, func=hashopen
elapsed= 42.2s, test=forward_iter, func=sq_dict_open
elapsed= 34.2s, test=forward_iter, func=indexed_open
----------
elapsed= 13.8s, test=reverse_iter, func=btopen
elapsed= 35.0s, test=reverse_iter, func=hashopen
elapsed=180.3s, test=reverse_iter, func=sq_dict_open
elapsed=171.3s, test=reverse_iter, func=indexed_open
----------
elapsed= 38.2s, test=update, func=btopen
elapsed= 76.5s, test=update, func=hashopen
elapsed=104.2s, test=update, func=sq_dict_open
elapsed= 91.9s, test=update, func=indexed_open
----------
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pygr-dev" group.
To post to this group, send email to pygr-dev@googlegroups.com
To unsubscribe from this group, send email to 
pygr-dev+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/pygr-dev?hl=en
-~----------~----~----~----~------~----~------~--~---
