Dear all,

The changes adding a ForwardSDMolSupplier, which can work with very
large files and read directly from gzipped SD files, have now been
merged into the trunk.
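To show why a forward-only reader copes with huge and gzipped inputs,
here is a minimal pure-Python sketch of the idea. It is not RDKit's
implementation; `iter_sd_records` is a hypothetical helper that splits a
stream into SD records at the `$$$$` terminator while only ever reading
forward, so it needs no tell()/seek() and accepts any line-iterable
binary stream, including a gzip stream. The records in the demo are
schematic, not valid molfiles.

```python
import gzip
import io
from typing import BinaryIO, Iterator


def iter_sd_records(stream: BinaryIO) -> Iterator[str]:
    """Yield SD records one at a time, reading the stream strictly forward.

    Hypothetical helper for illustration only. It never calls tell() or
    seek(), so it works on any binary stream that can be iterated line by
    line (plain files, gzip streams, pipes), and it holds at most one
    record in memory.
    """
    lines = []
    for raw in stream:
        line = raw.decode("utf-8")
        lines.append(line)
        if line.rstrip("\r\n") == "$$$$":  # SD record terminator
            yield "".join(lines)
            lines = []
    # Yield a trailing record that is missing its final "$$$$" line.
    if any(ln.strip() for ln in lines):
        yield "".join(lines)


# Two schematic records (real molfile blocks omitted for brevity),
# compressed in memory to stand in for a .sdf.gz file.
SD_TEXT = "mol1\n\n\nM  END\n$$$$\nmol2\n\n\nM  END\n$$$$\n"
compressed = gzip.compress(SD_TEXT.encode("utf-8"))

with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as fh:
    names = [rec.splitlines()[0] for rec in iter_sd_records(fh)]
print(names)  # ['mol1', 'mol2']
```

Once the generator is exhausted, a second pass needs a fresh stream,
which is the same trade-off as the missing reset method on the
ForwardSDMolSupplier.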

I also checked in modifications to the SDWriter, SmilesWriter, and
TDTWriter classes so that they can now write to file-like objects as
well as named files. This means you can directly generate gzipped SD
files or SD text from within Python.
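A minimal sketch of both uses, assuming a current RDKit under Python 3
(the 2011 trunk targeted Python 2, so details may differ there). It
assumes the writer wants a text-mode stream and ForwardSDMolSupplier a
binary one, which matches the RDKit documentation as far as I know; the
output name `example.sdf.gz` is hypothetical.

```python
import gzip
import io
import os
import tempfile

from rdkit import Chem

mols = [Chem.MolFromSmiles(smi) for smi in ("CCO", "c1ccccc1")]

# 1. SD text in memory: hand the writer an io.StringIO instead of a filename.
sio = io.StringIO()
writer = Chem.SDWriter(sio)
for mol in mols:
    writer.write(mol)
writer.flush()  # push anything still buffered into sio
sd_text = sio.getvalue()
writer.close()

with tempfile.TemporaryDirectory() as out_dir:
    gz_path = os.path.join(out_dir, "example.sdf.gz")  # hypothetical name

    # 2. A gzipped SD file: hand the writer a text-mode gzip stream.
    with gzip.open(gz_path, "wt") as fh:
        writer = Chem.SDWriter(fh)
        for mol in mols:
            writer.write(mol)
        writer.close()  # flush before the gzip stream is closed

    # Read it back, forward-only, straight from the compressed (binary) stream.
    with gzip.open(gz_path, "rb") as fh:
        atom_counts = [m.GetNumAtoms()
                       for m in Chem.ForwardSDMolSupplier(fh) if m is not None]

print(sd_text.count("$$$$"), atom_counts)  # 2 [3, 6]
```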

There is currently no ForwardSmilesMolSupplier, since that turns out
to be more work than the ForwardSDMolSupplier, but if there is demand
it can be added in the future.

Best,
-greg

On Tue, Dec 6, 2011 at 5:05 PM, Andrew Dalke <[email protected]> wrote:
> Hi Jean-Paul,
>
>
> On Dec 6, 2011, at 5:00 PM, JP wrote:
>> RDKit - v2011.09.01 - chokes on massive SDF files when using 
>> Chem.SDMolSupplier(input_file)
>  ...
>> Has anyone else noticed this?  Are there any known limitations (buffer
>> sizes, etc.)?
>
> This came up a couple of weeks ago on the list. The current reader does
> tell()/seek() operations on the file using a 32-bit integer for the
> offset, so it can't handle files larger than 2**32-1 bytes.
>
> If you want a solution now, Greg wrote:
>
> On Nov 21, 2011, at 9:00 PM, Greg Landrum wrote:
>> If you're willing to live on the bleeding edge for a bit, there's an
>> RDKit branch that contains a new way of working with SD files that is
>> well suited to dealing with large files:
>> https://rdkit.svn.sourceforge.net/svnroot/rdkit/branches/StreambufSupport_18Nov2011
>>
>> The new feature is the ForwardSDMolSupplier; it can be initialized
>> from a filename:
>> In [3]: suppl = Chem.ForwardSDMolSupplier('PubChemBackground.sdf')
>>
>> or a python file-like object:
>> In [4]: suppl2 = Chem.ForwardSDMolSupplier(open('PubChemBackground.sdf', 'rb'))
>>
>> You can read out molecules by looping over the supplier:
>> In [5]: for mol in suppl2:
>>   ...:     if mol is None: continue
>>   ...:     print(mol.GetNumAtoms())
>>   ...:
>> 24
>> 17
>> ....
>>
>> Since these work using file-like objects, you can directly read from
>> compressed files:
>>
>> In [6]: suppl3 = Chem.ForwardSDMolSupplier(gzip.open('bigfile.sdf.gz'))
>>
>> The differences from the standard SDMolSupplier:
>>  - the ForwardSDMolSupplier is not random access; you cannot ask for
>> a particular item
>>  - there's no reset method; if you want to go through the molecules
>> more than once, you have to create the supplier from scratch.
>>
>> Coincidentally, this was inspired by some suggestions Andrew has made
>> in the last week or so.
>>
>> I will be merging this branch back into the trunk sometime in the next
>> week, but the code is there, mostly tested, and usable now.
>
>                                Andrew
>                                [email protected]
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
