Hi Jean-Paul,
On Dec 6, 2011, at 5:00 PM, JP wrote:
> RDKit - v2011.09.01 - chokes on massive SDF files when using
> Chem.SDMolSupplier(input_file)
  ...
> Has anyone else noticed this? Are there any known limitations? (buffer sizes
> etc maybe)

This came up a couple of weeks ago on the list. The current reader does
tell()/seek() operations on the file, with a 32-bit integer. This can't
handle files larger than 2**32-1 bytes long.

If you want a solution now, Greg wrote:

On Nov 21, 2011, at 9:00 PM, Greg Landrum wrote:
> If you're willing to live on the bleeding edge for a bit, there's an
> RDKit branch that contains a new way of working with SD files that is
> well suited to dealing with large files:
> https://rdkit.svn.sourceforge.net/svnroot/rdkit/branches/StreambufSupport_18Nov2011
>
> The new feature is the ForwardSDMolSupplier, this can be initialized
> from a filename:
> In [3]: suppl = Chem.ForwardSDMolSupplier('PubChemBackground.sdf')
>
> or a python file-like object:
> In [4]: suppl2 = Chem.ForwardSDMolSupplier(file('PubChemBackground.sdf'))
>
> You can read out molecules by looping over the supplier:
> In [5]: for mol in suppl2:
>    ...:     if mol is None: continue
>    ...:     print mol.GetNumAtoms()
>    ...:
> 24
> 17
> ....
>
> Since these work using file-like objects, you can directly read from
> compressed files:
>
> In [6]: suppl3 = Chem.ForwardSDMolSupplier(gzip.open('bigfile.sdf.gz'))
>
> The differences to the standard SDMolSupplier :
>   - the ForwardSDMolSupplier is not random access; you cannot ask for
>     a particular item
>   - there's no reset method, if you want to go through the molecules
>     more than once, you have to create the supplier from scratch.
>
> Coincidentally, this was inspired by some suggestions Andrew has made
> in the last week or so.
>
> I will be merging this branch back into the trunk sometime in the next
> week, but the code is there, mostly tested, and usable now.
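
If you can't switch to the branch, the same forward-only idea can be sketched in plain Python: read the file line by line, split on the "$$$$" record delimiter, and never call tell() or seek(), so the 32-bit offset never comes into play. This is only a rough illustration and not how RDKit does it internally. The name iter_sd_records and the toy records are made up for the example, and handing each record to something like Chem.MolFromMolBlock(record) is an assumption I have not tried against this release.

```python
import gzip
import io


def iter_sd_records(fileobj):
    """Yield one SD record (as text, without the "$$$$" line) at a time.

    Hypothetical helper, not part of RDKit. It only iterates forward over
    the file-like object and never calls tell()/seek(), so the size of the
    file does not matter.
    """
    lines = []
    for line in fileobj:
        # gzip.open() in its default binary mode yields bytes; normalize to text.
        if isinstance(line, bytes):
            line = line.decode("utf-8", errors="replace")
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(lines)
            lines = []
        else:
            lines.append(line)
    # Emit a final record that is missing its "$$$$" terminator, if any.
    if any(l.strip() for l in lines):
        yield "".join(lines)


# Toy stand-in for an SD file: two fake records, each ended by "$$$$".
demo = "mol one\n...\nM  END\n$$$$\nmol two\n...\nM  END\n$$$$\n"

# Plain text file-like object.
records = list(iter_sd_records(io.StringIO(demo)))

# The same data read through gzip, as with gzip.open('bigfile.sdf.gz').
packed = gzip.compress(demo.encode("utf-8"))
gz_records = list(iter_sd_records(gzip.open(io.BytesIO(packed))))
```

Like the ForwardSDMolSupplier, this gives up random access and reset: to go through the records again you have to reopen the file.
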

Andrew
[email protected]

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

