Dear all,

This sounded routine enough that I thought I'd seek guidance to save myself 
hours of hacking and potential misunderstanding.

My objective is to generate a canonical SMILES for each compound in an sdf 
file. The sdf file was downloaded from ChEMBL and contains more than 10,000 drugs. 
I've had a brief look at the RDKit API and I noticed 
rdkit.Chem.PandasTools.LoadSDF.

Unfortunately, there was no function argument documentation, so I'm unsure 
whether this function yields canonical SMILES data. However, the RDKit website 
includes the following example which suggests "something" concerning SMILES is 
being processed:


>>> import os
>>> from rdkit import RDConfig
>>> from rdkit.Chem import PandasTools
>>> sdfFile = os.path.join(RDConfig.RDDataDir,'NCI/first_200.props.sdf')
>>> frame = PandasTools.LoadSDF(sdfFile,smilesName='SMILES',molColName='Molecule',
...            includeFingerprints=True, removeHs=False, strictParsing=True)
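
For concreteness, what I have in mind is roughly the following. It is only a minimal sketch: the helper name sdf_to_canonical_smiles is my own invention, and I am assuming that Chem.MolToSmiles returns canonical SMILES by default and that Chem.SDMolSupplier yields None for records it cannot parse.

```python
from rdkit import Chem


def sdf_to_canonical_smiles(sdf_path):
    """Return (name, SMILES) pairs, one per record RDKit can parse.

    Hypothetical helper for illustration; not part of RDKit itself.
    """
    results = []
    supplier = Chem.SDMolSupplier(sdf_path)
    for mol in supplier:
        # Assumption: unparseable records come back as None, so skip them.
        if mol is None:
            continue
        # "_Name" holds the title line of the SDF record, if one was set.
        name = mol.GetProp("_Name") if mol.HasProp("_Name") else ""
        # Assumption: MolToSmiles is canonical unless told otherwise.
        results.append((name, Chem.MolToSmiles(mol)))
    return results
```

I would then call it as sdf_to_canonical_smiles('my_chembl_drugs.sdf') (the file name is a placeholder) and write the pairs to a text file. If LoadSDF with smilesName='SMILES' already does the equivalent, I would rather use that.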

Any guidance is hugely appreciated.

Alternatively, if anyone can suggest a one-stop source of SMILES files, 
e.g. for experimental drugs, FDA-approved drugs, or a set "representative" of 
chemical space, that would also be appreciated.


Thanks
Anthony
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
