Dear all,

This sounded routine enough that I thought I'd ask for guidance and save myself hours of hacking and potential misunderstanding.
My objective is to generate a canonical SMILES string for each compound in an SDF file. The SDF file was downloaded from ChEMBL and contains over 10,000 drugs. I've had a brief look at the RDKit API and noticed rdkit.Chem.PandasTools.LoadSDF. Unfortunately, there is no documentation for the function's arguments, so I'm unsure whether this function yields canonical SMILES. However, the RDKit website includes the following example, which suggests that "something" concerning SMILES is being processed:

    >>> sdfFile = os.path.join(RDConfig.RDDataDir,'NCI/first_200.props.sdf')
    >>> frame = PandasTools.LoadSDF(sdfFile,smilesName='SMILES',molColName='Molecule',
    ...     includeFingerprints=True, removeHs=False, strictParsing=True)

Any guidance is hugely appreciated.

On the other hand, if anyone can suggest a one-stop file of SMILES for, e.g., experimental drugs, FDA-approved drugs, or a set "representative" of chemical space, that would also be appreciated.

Thanks,
Anthony
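For reference, here is a minimal sketch of the same task done directly with RDKit's Chem.SDMolSupplier and Chem.MolToSmiles. It avoids depending on what LoadSDF does internally. Chem.MolToSmiles canonicalizes by default (canonical=True), and SDMolSupplier yields None for records it cannot parse. The function name canonical_smiles_from_sdf is only illustrative. The sketch also assumes that each record is identified by its title line (the _Name property) and that unparseable records should be reported rather than silently dropped.

```python
import sys

from rdkit import Chem


def canonical_smiles_from_sdf(sdf_path):
    """Return a list of (name, canonical SMILES) pairs, one per SDF record.

    Illustrative helper (hypothetical name). Records RDKit fails to parse
    are kept with SMILES None so that nothing disappears silently.
    """
    results = []
    supplier = Chem.SDMolSupplier(sdf_path)
    for idx, mol in enumerate(supplier):
        fallback = f"record_{idx}"
        if mol is None:
            # SDMolSupplier returns None for records it cannot sanitize/parse.
            results.append((fallback, None))
            continue
        # Assumption: the SDF title line (_Name) identifies the compound.
        name = mol.GetProp("_Name") if mol.HasProp("_Name") else fallback
        # MolToSmiles produces canonical SMILES by default (canonical=True);
        # isomericSmiles=True keeps stereochemistry and isotope labels.
        results.append((name, Chem.MolToSmiles(mol, isomericSmiles=True)))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py input.sdf > output.tsv
    for name, smi in canonical_smiles_from_sdf(sys.argv[1]):
        print(f"{name}\t{smi}")
```

Writing SMILES out explicitly this way makes the canonicalization step visible. The same molecule entered in different atom orders (e.g. OCC and CCO) comes out as one string, so duplicates across the file can be found by simple string comparison.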
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss