Re: key/value store optimized for disk storage

Paul Rubin Wed, 02 May 2012 20:33:13 -0700

Steve Howell <showel...@yahoo.com> writes:
> Thanks.  That's definitely in the spirit of what I'm looking for,
> although the non-64 bit version is obviously geared toward a slightly
> smaller data set.  My reading of cdb is that it has essentially 64k
> hash buckets, so for 3 million keys, you're still scanning through an
> average of 45 records per read, which is about 90k of data for my
> record size.  That seems actually inferior to a btree-based file
> system, unless I'm missing something.


1) Presumably you can use more buckets in a 64-bit version; 2) scanning
90k probably still takes far less time than a disk seek, even the "seek"
of a solid state disk (several microseconds in practice).
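
For reference, here is the arithmetic behind the quoted estimate, as a
rough sketch. The 2000-byte record size is an assumption inferred from
the quoted figures (90k / 45 records); the function names are made up
for illustration, and a uniform hash is assumed.

```python
def records_per_bucket(num_keys, num_buckets):
    """Average number of records sharing a bucket, assuming a uniform hash."""
    return num_keys / num_buckets


def bytes_scanned(num_keys, num_buckets, record_size):
    """Bytes read if a lookup scans one whole bucket of fixed-size records."""
    return records_per_bucket(num_keys, num_buckets) * record_size


NUM_KEYS = 3_000_000
RECORD_SIZE = 2000  # assumed: inferred from "90k for 45 records"

# Compare the 64k buckets from the quoted estimate with a larger table.
for buckets in (64 * 1024, 1024 * 1024):
    avg = records_per_bucket(NUM_KEYS, buckets)
    scanned = bytes_scanned(NUM_KEYS, buckets, RECORD_SIZE)
    print(f"{buckets:>8} buckets: {avg:5.1f} records, {scanned / 1000:5.1f} kB scanned")
```

With sixteen times as many buckets, the amount scanned per lookup drops
by the same factor, which is the point of 1) above.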

> http://thomas.mangin.com/data/source/cdb.py
> Unfortunately, it looks like you have to first build the whole thing
> in memory.

It's probably fixable, but I'd guess you could just use Bernstein's
cdbmake program instead, since it builds the file from records fed to it
on standard input.
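
A minimal sketch of that approach: Bernstein's cdb package builds a
database with cdbmake, which reads records of the form
"+klen,dlen:key->data" (one per line, lengths in bytes) and expects a
blank line at the end. Writing that format from a generator means the
Python side never holds the whole data set in memory. The function
names and the sample records are hypothetical.

```python
import io


def cdbmake_record(key: bytes, data: bytes) -> bytes:
    """Encode one record in cdbmake's input format."""
    return b"+%d,%d:%s->%s\n" % (len(key), len(data), key, data)


def write_cdbmake_input(pairs, out):
    """Stream (key, data) pairs to a binary file object, one record at a time."""
    for key, data in pairs:
        out.write(cdbmake_record(key, data))
    out.write(b"\n")  # an extra newline marks the end of the input


# Demo with an in-memory buffer; in practice `out` would be sys.stdout.buffer
# and `pairs` a generator reading from wherever the records live.
buf = io.BytesIO()
write_cdbmake_input([(b"abc", b"hello"), (b"k2", b"")], buf)
print(buf.getvalue())
```

The output would then be piped into something like
"cdbmake data.cdb data.tmp", where both file names are placeholders.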

Alternatively maybe you could use one of the *dbm libraries,
which burn a little more disk space, but support online update.
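
As a rough illustration of what "online update" buys you, here is a
sketch using the dbm module from the Python 3 standard library (the
Python 2 equivalent was anydbm). Which backend gets used depends on
what is installed; the keys, values and temporary path are made up.

```python
import dbm
import os
import tempfile

# Hypothetical location for the database file(s).
path = os.path.join(tempfile.mkdtemp(), "store")

# "c" creates the database if it does not exist yet.
with dbm.open(path, "c") as db:
    db[b"user:1"] = b"alice"
    db[b"user:2"] = b"bob"

# Reopen for read/write and change it in place, with no rebuild step.
with dbm.open(path, "w") as db:
    db[b"user:1"] = b"alice2"   # overwrite an existing key
    del db[b"user:2"]           # remove a key
    value = db[b"user:1"]
    has_user_2 = b"user:2" in db

print(value, has_user_2)
```

A cdb file, by contrast, is written once and has to be regenerated to
pick up changes.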
-- 
http://mail.python.org/mailman/listinfo/python-list
