Hello,
I am using RDKit to read data from SD files.

My goal is to extract both the molecules and their associated properties (which 
for our purposes are separate entities) from the SDF.
[For 100% clarity: by 'properties' I don't mean calculated properties or atom 
or bond properties, but the text properties that were saved in the SDF with 
each molecule, i.e. those that you get when you do mol.GetPropsAsDict() ].

After several tests I found that Chem.ForwardSDMolSupplier does what I need.

But there is an issue.
When Chem.ForwardSDMolSupplier decides that a molecule is not OK, i.e. when it 
returns None, the SDF record is lost:
I cannot access its Props, and I cannot save the failed SDF record for later 
inspection.
[Or at least, I don't know how to do it, hence this question].
At most I can collect the indices of the records that fail.

> Would anyone be able to suggest how to save to a text file (which an SDF 
> essentially already is) the SDF records for which Chem.ForwardSDMolSupplier 
> returns None?
> Even better, could the properties associated with the failed molecules be read 
> independently? In theory the properties are stored as data items after the 
> molblock (after the 'M  END' line), not in the CTAB itself, so even when the 
> atoms, bonds, etc. have a problem, the properties might still be OK.
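
To make the question concrete, here is a minimal sketch of the kind of thing I have in mind; it is not an existing RDKit facility. It assumes that records are terminated by a '$$$$' line and that data items start with a '>' header line containing '<NAME>' and end at a blank line. The function names (iter_sdf_records, parse_sdf_props, read_sdf) and the parse_mol argument are hypothetical, introduced only for illustration. The text handling is plain Python, so it does not depend on the molecule being parseable.

```python
import io
import re
from typing import Callable, Dict, Iterator, Optional, TextIO, Tuple

# Data-item header, e.g. ">  <ID>  (1)": a leading '>' and a name in angle brackets.
_DATA_HEADER = re.compile(r"^>.*?<([^<>]+)>")


def iter_sdf_records(handle: TextIO) -> Iterator[str]:
    """Yield each raw SDF record as text, including its '$$$$' line."""
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(lines)
            lines = []
    # Assumption: a trailing record without a terminator is still of interest.
    if any(line.strip() for line in lines):
        yield "".join(lines)


def parse_sdf_props(record: str) -> Dict[str, str]:
    """Read the data items of one record without looking at atoms or bonds."""
    lines = record.splitlines()
    start = 0
    for i, line in enumerate(lines):
        if line.startswith("M  END"):
            start = i + 1  # data items follow the molblock
            break
    props: Dict[str, list] = {}
    name: Optional[str] = None
    for line in lines[start:]:
        if line.strip() == "$$$$":
            break
        header = _DATA_HEADER.match(line)
        if header:
            name = header.group(1)
            props[name] = []
        elif name is not None:
            if line.strip() == "":
                name = None  # a blank line closes the current data item
            else:
                props[name].append(line)
    return {key: "\n".join(value) for key, value in props.items()}


def read_sdf(
    handle: TextIO, parse_mol: Callable[[str], object]
) -> Iterator[Tuple[int, object, Dict[str, str], str]]:
    """Yield (index, mol_or_None, props, raw_record) one record at a time."""
    for index, record in enumerate(iter_sdf_records(handle)):
        yield index, parse_mol(record), parse_sdf_props(record), record
```

With something like this, I would expect to pass Chem.MolFromMolBlock as parse_mol (as far as I know it returns None when the molblock cannot be parsed), write the raw record to a "failed" text file whenever the molecule is None, and pickle the molecule otherwise. Because read_sdf is a generator, only one record is held in memory at a time.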

(Note: PandasTools.LoadSDF has the same issue: it does not even store in the 
DataFrame the records for which the molecule is None. In any case it cannot 
be used with the kind of SDFs I am handling, as it uses an enormous amount of 
memory for the molecules; hence the decision to use Chem.ForwardSDMolSupplier 
and pickle the molecules as soon as they are read.)

Thanks
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
