[Rdkit-discuss] SMILES from sdf file

Anthony Nash Sun, 12 Sep 2021 06:20:09 -0700

Dear all,

This sounded routine enough that I thought I'd seek guidance to save myself 
hours of hacking and potential misunderstanding.


My objective is to generate a canonical SMILES string for each compound in an
SDF file. The file was downloaded from ChEMBL and contains more than 10,000
drugs. I've had a brief look at the RDKit API and noticed
rdkit.Chem.PandasTools.LoadSDF.

Unfortunately, I could not find any documentation for the function's
arguments, so I'm unsure whether it yields canonical SMILES. However, the
RDKit website includes the following example, which suggests that "something"
concerning SMILES is being generated:


>>> import os
>>> from rdkit import RDConfig
>>> from rdkit.Chem import PandasTools
>>> sdfFile = os.path.join(RDConfig.RDDataDir,'NCI/first_200.props.sdf')
>>> frame = PandasTools.LoadSDF(sdfFile,smilesName='SMILES',molColName='Molecule',
...            includeFingerprints=True, removeHs=False, strictParsing=True)

Any guidance is hugely appreciated.
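
For comparison, here is a minimal sketch of one way to get canonical SMILES
from an SDF file without going through pandas. It assumes RDKit is installed
and uses Chem.SDMolSupplier to read the records and Chem.MolToSmiles to write
them out. The function name canonical_smiles_from_sdf and the "record_N"
fallback label are hypothetical, chosen only for illustration. Note that the
result is canonical with respect to RDKit only; other toolkits may produce a
different canonical string for the same molecule.

```python
from rdkit import Chem


def canonical_smiles_from_sdf(sdf_path):
    """Return a list of (name, canonical SMILES) pairs for the records in an SDF file.

    Hypothetical helper for illustration; not part of RDKit itself.
    """
    results = []
    # SDMolSupplier yields None for records that RDKit cannot parse or sanitize.
    supplier = Chem.SDMolSupplier(sdf_path)
    for index, mol in enumerate(supplier):
        if mol is None:
            continue  # skip unreadable records instead of failing
        # The title line of each SDF record is exposed as the "_Name" property.
        name = mol.GetProp("_Name") if mol.HasProp("_Name") else ""
        # MolToSmiles returns RDKit's canonical SMILES by default.
        results.append((name or f"record_{index}", Chem.MolToSmiles(mol)))
    return results
```

Calling canonical_smiles_from_sdf("drugs.sdf") (the file name is a
placeholder) would return one pair per readable record; records that fail to
parse are silently skipped here, so a real script would probably want to log
or count them.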

On the other hand, if anyone can suggest a one-stop source of SMILES files
for, e.g., experimental drugs, FDA-approved drugs, or a "representative"
sample of chemical space, that would also be appreciated.


Thanks
Anthony

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
