Hi Jean-Paul,

On Dec 6, 2011, at 5:00 PM, JP wrote:
> RDKit - v2011.09.01 - chokes on massive SDF files when using 
> Chem.SDMolSupplier(input_file)
  ...
> Has anyone else noticed this?  Are there any known limitations? (buffer sizes 
> etc maybe)

This came up a couple of weeks ago on the list. The current reader calls
tell()/seek() on the file and keeps the offsets in a 32-bit integer, so it
can't handle files longer than 2**32-1 bytes (just under 4 GiB).
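
If you want a quick check of whether a given file is past that limit, here
is a minimal sketch that compares the file size against 2**32-1. The names
MAX_32BIT_OFFSET and exceeds_32bit_offset are made up for this example and
are not part of RDKit, and the threshold is simply the figure quoted above.

```python
import os

# Largest value an unsigned 32-bit integer can hold. Assumed here to be the
# reader's limit, based on the 2**32-1 figure mentioned above.
MAX_32BIT_OFFSET = 2**32 - 1


def exceeds_32bit_offset(path):
    """Return True if the file at `path` is longer than 2**32-1 bytes."""
    return os.path.getsize(path) > MAX_32BIT_OFFSET
```

Calling exceeds_32bit_offset(input_file) before building the supplier would
at least tell you whether the file size is the likely culprit.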

If you want a solution now, Greg wrote:

On Nov 21, 2011, at 9:00 PM, Greg Landrum wrote:
> If you're willing to live on the bleeding edge for a bit, there's an
> RDKit branch that contains a new way of working with SD files that is
> well suited to dealing with large files:
> https://rdkit.svn.sourceforge.net/svnroot/rdkit/branches/StreambufSupport_18Nov2011
> 
> The new feature is the ForwardSDMolSupplier, this can be initialized
> from a filename:
> In [3]: suppl = Chem.ForwardSDMolSupplier('PubChemBackground.sdf')
> 
> or a python file-like object:
> In [4]: suppl2 = Chem.ForwardSDMolSupplier(file('PubChemBackground.sdf'))
> 
> You can read out molecules by looping over the supplier:
> In [5]: for mol in suppl2:
>   ...:     if mol is None: continue
>   ...:     print mol.GetNumAtoms()
>   ...:
> 24
> 17
> ....
> 
> Since these work using file-like objects, you can directly read from
> compressed files:
> 
> In [6]: suppl3  = Chem.ForwardSDMolSupplier(gzip.open('bigfile.sdf.gz'))
> 
> The differences from the standard SDMolSupplier:
>  - the ForwardSDMolSupplier is not random access; you cannot ask for
> a particular item
>  - there's no reset method; if you want to go through the molecules
> more than once, you have to create the supplier from scratch.
> 
> Coincidentally, this was inspired by some suggestions Andrew has made
> in the last week or so.
> 
> I will be merging this branch back into the trunk sometime in the next
> week, but the code is there, mostly tested, and usable now.
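
To make the "forward only, no reset" behaviour concrete, here is a rough
sketch in plain Python (3.3 or later) that walks the records of an SD file
front to back without any tell()/seek() calls. It is not how
ForwardSDMolSupplier is implemented and it does not parse molecules; the
helpers iter_sd_records and open_sd are hypothetical names for this example
only. It relies on the SD convention that a line containing "$$$$" ends a
record.

```python
import gzip
import io


def iter_sd_records(fileobj):
    """Yield each SD record as text from a text-mode file-like object.

    Records are read strictly front to back: no seeking, and no way to
    ask for record N directly. Trailing data without a closing "$$$$"
    line is ignored.
    """
    lines = []
    for line in fileobj:
        lines.append(line)
        # A line holding only "$$$$" terminates a record in the SD format.
        if line.rstrip() == "$$$$":
            yield "".join(lines)
            lines = []


def open_sd(path):
    """Open a plain or gzip-compressed SD file in text mode (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


# Tiny in-memory stand-in for a real file, just to show the one-pass behaviour.
demo = io.StringIO("mol1\n$$$$\nmol2\n$$$$\n")
records = iter_sd_records(demo)
print(sum(1 for _ in records))  # first pass sees both records: prints 2
print(sum(1 for _ in records))  # iterator is exhausted, nothing left: prints 0
```

As with the supplier, a second pass means starting over, for example by
calling iter_sd_records(open_sd(path)) again.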





                                Andrew
                                [email protected]



_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
