On Mon, May 4, 2009 at 8:22 PM, Greg Landrum <greg.land...@gmail.com> wrote:
>
> The devil would come down to how much overhead would be required to do this.
>

I think I might have damped my own enthusiasm for this by doing a
quick experiment.

For 10K molecules from the PubChem HTS deck, this is the size of the
binary file formed by directly writing the results of
MolPickler::pickleMol (mol.ToBinary() from Python):
-rw-r--r--   1 landrgr1  staff  5317844 May  4 20:31 20k.bin
Add to this another 40-80K for an index mapping molId -> offset into the file.

Writing the same pickle information to a blob column in a sqlite table gets me:
-rw-r--r--   1 landrgr1  staff  8140800 May  4 20:37 20k.sqlt

Disk space is cheap, but +50% is rather a lot of overhead.
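
For anyone who wants to repeat this kind of comparison, here is a rough
sketch of the two layouts using only the Python standard library. It is
not the script used for the numbers above: random byte strings stand in
for the mol.ToBinary() output, the function names, table name, and
column names are made up for illustration, and the index is assumed to
be a plain array of 8-byte offsets. The sizes it prints will not match
the ones reported here.

```python
import os
import sqlite3
import struct
import tempfile


def write_flat(blobs, data_path, index_path):
    """Write blobs back to back, plus an index of 8-byte file offsets."""
    offsets = []
    with open(data_path, "wb") as data:
        for blob in blobs:
            offsets.append(data.tell())
            data.write(blob)
        # Trailing sentinel so the length of the last blob is recoverable.
        offsets.append(data.tell())
    with open(index_path, "wb") as index:
        index.write(struct.pack("<%dQ" % len(offsets), *offsets))


def read_flat(mol_id, data_path, index_path):
    """Fetch one blob by position: index lookup, then seek into the data file."""
    with open(index_path, "rb") as index:
        index.seek(8 * mol_id)
        start, end = struct.unpack("<2Q", index.read(16))
    with open(data_path, "rb") as data:
        data.seek(start)
        return data.read(end - start)


def write_sqlite(blobs, db_path):
    """Store the same blobs in a single BLOB column keyed by an integer id."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE mols (id INTEGER PRIMARY KEY, pkl BLOB)")
    conn.executemany("INSERT INTO mols VALUES (?, ?)", enumerate(blobs))
    conn.commit()
    conn.close()


def read_sqlite(mol_id, db_path):
    """Fetch one blob by id from the sqlite table."""
    conn = sqlite3.connect(db_path)
    row = conn.execute("SELECT pkl FROM mols WHERE id = ?", (mol_id,)).fetchone()
    conn.close()
    return bytes(row[0])


def compare_sizes(blobs, workdir):
    """Return (flat data + index size, sqlite file size) in bytes."""
    data_path = os.path.join(workdir, "mols.bin")
    index_path = os.path.join(workdir, "mols.idx")
    db_path = os.path.join(workdir, "mols.sqlt")
    write_flat(blobs, data_path, index_path)
    write_sqlite(blobs, db_path)
    flat = os.path.getsize(data_path) + os.path.getsize(index_path)
    return flat, os.path.getsize(db_path)


# Stand-in data: random bytes in place of real molecule pickles.
demo_blobs = [os.urandom(400 + (i % 300)) for i in range(1000)]
with tempfile.TemporaryDirectory() as tmp:
    flat_size, sqlite_size = compare_sizes(demo_blobs, tmp)
    print("flat file + index: %d bytes" % flat_size)
    print("sqlite:            %d bytes" % sqlite_size)
    print("sqlite / flat:     %.2f" % (sqlite_size / flat_size))
```

With real pickles the ratio will depend on the blob sizes and on
sqlite's page size, so treat the output as a way to measure, not as a
prediction.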

-greg
