Steve Howell <showel...@yahoo.com> writes:
> Thanks. That's definitely in the spirit of what I'm looking for,
> although the non-64 bit version is obviously geared toward a slightly
> smaller data set. My reading of cdb is that it has essentially 64k
> hash buckets, so for 3 million keys, you're still scanning through an
> average of 45 records per read, which is about 90k of data for my
> record size. That seems actually inferior to a btree-based file
> system, unless I'm missing something.
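For reference, the arithmetic behind that estimate can be sketched as below. It takes the 64k-bucket reading of cdb at face value and assumes a record size of about 2 KB, which is only inferred from "45 records is about 90k"; the function name is made up for illustration.

```python
def avg_scan(num_keys, num_buckets, record_bytes):
    """Return (average records per bucket, average bytes scanned per read).

    Assumes keys spread evenly over the buckets and that a read walks
    every record in its bucket, as in the estimate quoted above.
    """
    records = num_keys / num_buckets
    return records, records * record_bytes


# 3 million keys, 64k buckets, ~2 KB per record (assumed, not stated).
records, nbytes = avg_scan(3_000_000, 64 * 1024, 2 * 1024)
print(f"{records:.1f} records, {nbytes / 1024:.0f} KiB scanned per read")
```

With those inputs it comes out just under 46 records and a little over 90 KiB per read, in line with the figures quoted.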
1) presumably you can use more buckets in a 64 bit version; 2) scanning
90k probably still takes far less time than a disk seek, even a "seek"
(several microseconds in practice) with a solid state disk.

> http://thomas.mangin.com/data/source/cdb.py
> Unfortunately, it looks like you have to first build the whole thing
> in memory.

It's probably fixable, but I'd guess you could just use Bernstein's
cdbmake program instead. Alternatively maybe you could use one of the
*dbm libraries, which burn a little more disk space, but support online
update.
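A minimal sketch of what "online update" looks like with a dbm-style store, using Python 3's standard dbm package. It uses dbm.dumb only because that backend is always available; it is slow, and for millions of keys you would more likely want dbm.gnu or dbm.ndbm where they are installed. The function name, file name and keys are made up for illustration.

```python
import dbm.dumb
import os
import tempfile


def demo_online_update(path):
    """Create a small dbm file, then modify it in place without a rebuild."""
    # Initial load: "c" creates the database if it does not exist yet.
    with dbm.dumb.open(path, "c") as db:
        db[b"alpha"] = b"first"
        db[b"beta"] = b"second"

    # Later: reopen the same file and change individual records.
    # A cdb file, by contrast, is written once and then only read.
    with dbm.dumb.open(path, "c") as db:
        db[b"alpha"] = b"updated"   # overwrite an existing key
        del db[b"beta"]             # delete a key
        db[b"gamma"] = b"third"     # add a new key

    # Read everything back to show the final state.
    with dbm.dumb.open(path, "r") as db:
        return {key: db[key] for key in db.keys()}


with tempfile.TemporaryDirectory() as tmp:
    result = demo_online_update(os.path.join(tmp, "records"))
    print(result)
```

Keys and values are stored as bytes, so anything structured has to be serialized before it goes in.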