On Mon, May 4, 2009 at 8:22 PM, Greg Landrum <greg.land...@gmail.com> wrote:
>
> The devil would come down to how much overhead would be required to do this.
>
I think I might have dampened my own enthusiasm for this by doing a quick experiment.

For 10K molecules from the PubChem HTS deck, here is the size of the binary file formed by directly writing the results of MolPickler::pickleMol (mol.ToBinary() from Python):

  -rw-r--r--  1 landrgr1  staff  5317844 May  4 20:31 20k.bin

Add to this another 40-80K for an index mapping molId -> offset into the file.

Writing the same pickle information to a blob column in a sqlite table gets me:

  -rw-r--r--  1 landrgr1  staff  8140800 May  4 20:37 20k.sqlt

Disk space is cheap, but roughly +50% (about 2.8 MB extra on 5.3 MB) is rather a lot of overhead.

-greg
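For anyone who wants to repeat the comparison, here is a minimal sketch of the two layouts, assuming a flat data file plus a separate index of fixed-width offsets versus one sqlite table with a BLOB column. Everything here is illustrative: the file names, the `mols` table, the helper functions and the end-of-file sentinel offset are hypothetical, and the payloads are stand-in byte strings rather than real `mol.ToBinary()` pickles. With 8-byte offsets, 10K molecules need about 80 KB of index; with 4-byte offsets, about 40 KB, which is presumably where the 40-80K estimate comes from.

```python
import os
import sqlite3
import struct
import tempfile


def write_flat(blobs, data_path, index_path, offset_fmt="<Q"):
    """Append every pickle to one file and write a packed offset index.

    Hypothetical layout: entry i of the index is the start of pickle i;
    one extra end-of-file offset is stored so the last length is known.
    """
    offsets = []
    with open(data_path, "wb") as data:
        for blob in blobs:
            offsets.append(data.tell())
            data.write(blob)
        offsets.append(data.tell())  # sentinel: end of the last pickle
    with open(index_path, "wb") as idx:
        idx.write(b"".join(struct.pack(offset_fmt, o) for o in offsets))


def read_flat(data_path, index_path, mol_id, offset_fmt="<Q"):
    """Look up one pickle by seeking into the index, then the data file."""
    width = struct.calcsize(offset_fmt)
    with open(index_path, "rb") as idx:
        idx.seek(mol_id * width)
        (start,) = struct.unpack(offset_fmt, idx.read(width))
        (end,) = struct.unpack(offset_fmt, idx.read(width))
    with open(data_path, "rb") as data:
        data.seek(start)
        return data.read(end - start)


def write_sqlite(blobs, path):
    """Store the same pickles as rows of a (hypothetical) sqlite table."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE mols (id INTEGER PRIMARY KEY, pkl BLOB)")
        conn.executemany("INSERT INTO mols VALUES (?, ?)", enumerate(blobs))
        conn.commit()
    finally:
        conn.close()


def read_sqlite(path, mol_id):
    conn = sqlite3.connect(path)
    try:
        row = conn.execute("SELECT pkl FROM mols WHERE id = ?", (mol_id,)).fetchone()
    finally:
        conn.close()
    return row[0]


def compare_sizes(blobs, directory=None, offset_fmt="<Q"):
    """Write both layouts into a fresh directory and report on-disk sizes."""
    directory = directory or tempfile.mkdtemp()
    paths = {
        "flat": os.path.join(directory, "mols.bin"),
        "index": os.path.join(directory, "mols.idx"),
        "sqlite": os.path.join(directory, "mols.sqlt"),
    }
    write_flat(blobs, paths["flat"], paths["index"], offset_fmt)
    write_sqlite(blobs, paths["sqlite"])
    sizes = {name: os.path.getsize(p) for name, p in paths.items()}
    return paths, sizes
```

With RDKit installed, `blobs` would be something like `[m.ToBinary() for m in mols]`. The flat file is exactly the sum of the pickle lengths; sqlite stores rows in fixed-size pages with per-row and b-tree bookkeeping, so some of the extra size is presumably that structure plus unused space in partly filled pages.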