On May 2, 7:46 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Steve Howell <showel...@yahoo.com> writes:
> > keys are file paths
> > directories are 2 levels deep (30 dirs w/100k files each)
> > values are file contents
> > The current solution isn't horrible,
>
> Yes it is ;-)
>
> > As I mention up top, I'm mostly hoping folks can point me toward
> > sources they trust, whether it be other mailing lists, good tools,
>
> cdb sounds reasonable for your purposes. I'm sure there are python
> bindings for it.
>
> http://cr.yp.to/cdb.html mentions a 4gb limit (2**32) but I
> half-remember something about a 64 bit version.

Thanks. That's definitely in the spirit of what I'm looking for,
although the non-64-bit version is obviously geared toward a slightly
smaller data set.

My reading of cdb is that it has essentially 64k hash buckets, so for
3 million keys you're still scanning through an average of about 45
records per read (3,000,000 / 65,536), which is about 90k of data for
my record size. That actually seems inferior to a btree-based file
system, unless I'm missing something.

I did find this as a follow-up to your lead:

http://thomas.mangin.com/data/source/cdb.py

Unfortunately, it looks like you have to build the whole thing in
memory first.

--
http://mail.python.org/mailman/listinfo/python-list
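
For anyone who wants to check the bucket arithmetic in the message above, here is a rough sketch. It takes the post's reading of cdb at face value (keys spread evenly over 64k buckets, each bucket scanned in full on a read), which may not match how cdb actually lays out its tables. The 2 KiB average record size is an assumption inferred from the "about 90k for 45 records" figure, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures in the post.
# Assumptions (taken from the post, not verified against the cdb format):
#   - keys are distributed evenly across 64k buckets
#   - a read scans every record in the matching bucket

NUM_KEYS = 3_000_000         # 30 dirs x 100k files each
NUM_BUCKETS = 64 * 1024      # "essentially 64k hash buckets"
AVG_RECORD_BYTES = 2 * 1024  # hypothetical: implied by ~90k / 45 records


def records_per_bucket(num_keys, num_buckets):
    """Average number of records sharing one bucket."""
    return num_keys / num_buckets


def bytes_scanned_per_read(num_keys, num_buckets, record_bytes):
    """Average bytes touched per lookup if the whole bucket is scanned."""
    return records_per_bucket(num_keys, num_buckets) * record_bytes


avg_records = records_per_bucket(NUM_KEYS, NUM_BUCKETS)
avg_bytes = bytes_scanned_per_read(NUM_KEYS, NUM_BUCKETS, AVG_RECORD_BYTES)

print(f"{avg_records:.1f} records per bucket")
print(f"{avg_bytes / 1024:.0f} KiB scanned per read")
```

Under those assumptions this comes out to roughly 45.8 records per bucket and a little over 90 KiB per read, in line with the estimate in the post. Changing NUM_BUCKETS or AVG_RECORD_BYTES shows how sensitive the estimate is to either guess.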