Hi Anthony,

This is pretty easy, and you don't need PandasTools (although PandasTools is very cool).

#!/usr/bin/env python

import sys
from rdkit import Chem

suppl = Chem.SDMolSupplier(sys.argv[1])
for mol in suppl:
    if mol:
        print(Chem.MolToSmiles(mol),mol.GetProp("_Name"))

By default, Chem.MolToSmiles produces canonical isomeric SMILES.

Here's the query I use to get drugs from ChEMBL.

select distinct canonical_smiles, chembl_id
from compound_structures cs
join formulations f on cs.molregno = f.molregno
join products p on p.product_id = f.product_id
join compound_properties cp on cp.molregno = cs.molregno
join molecule_dictionary md on cp.molregno = md.molregno
where p.oral = 1 and cp.mw_freebase < 1000

If you just want the data, I have it here.
https://github.com/PatWalters/datafiles/blob/main/chembl_drugs.smi

Pat

On Sun, Sep 12, 2021 at 9:20 AM Anthony Nash <anthony.n...@ndcn.ox.ac.uk> wrote:

> Dear all,
>
> This sounded routine enough that I thought I'd seek guidance to save
> myself hours of hacking and potential misunderstanding.
>
> My objective is to generate a canonical SMILES for each compound in an sdf
> file. The sdf file was downloaded from ChEMBL and contains some +10,000
> drugs. I've had a brief look at the RDKit API and I noticed
> rdkit.Chem.PandasTools.LoadSDF.
>
> Unfortunately, there was no function argument documentation, so I'm unsure
> whether this function yields canonical SMILES data. However, the RDKit
> website includes the following example which suggests "something"
> concerning SMILES is being processed:
>
> >>> sdfFile = os.path.join(RDConfig.RDDataDir,'NCI/first_200.props.sdf')
> >>> frame = PandasTools.LoadSDF(sdfFile,smilesName='SMILES',molColName='Molecule',
> ...            includeFingerprints=True, removeHs=False, strictParsing=True)
>
> Any guidance is hugely appreciated.
>
> On the other hand, if anyone can suggest a one-shop list of SMILES in a
> file for e.g., experimental drugs, FDA approved drugs, "representative" of
> chemical space, etc., that would also be appreciated.
>
> Thanks
> Anthony
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
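
If you go with the .smi file linked above, something along these lines should read it back in. This is only a minimal sketch: it assumes the usual .smi layout of one record per line (a SMILES string, whitespace, then an identifier such as the ChEMBL ID) with no header line, which I have not checked against the actual file. The function name read_smi and the toy input are made up for illustration.

```python
import io


def read_smi(handle):
    """Yield (smiles, name) pairs from an open .smi file.

    Assumes one record per line: SMILES, whitespace, then an identifier.
    Blank lines are skipped; a missing identifier becomes an empty string.
    """
    for line in handle:
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace only, so names with spaces survive.
        parts = line.split(None, 1)
        smiles = parts[0]
        name = parts[1] if len(parts) > 1 else ""
        yield smiles, name


# Toy input for illustration only (not taken from the real file).
example = io.StringIO("CCO ethanol\n\nc1ccccc1 benzene\nCC(=O)O\n")
records = list(read_smi(example))
for smiles, name in records:
    print(smiles, name)
```

With a downloaded copy of the real file, you would pass an open file handle instead of the StringIO object, ideally inside a with block.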