Hi Anthony,

Another way of accessing all the ChEMBL drugs without having to rely on a local install of the database is to use the ChEMBL API:
https://www.ebi.ac.uk/chembl/api/data/drug
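As a rough sketch of how that endpoint might be used from Python: the code below pages through the drug resource and collects SMILES with their ChEMBL IDs. The `.json` suffix, the `limit` parameter and the JSON keys (`drugs`, `molecule_structures`, `canonical_smiles`, `molecule_chembl_id`, `page_meta`, `next`) are assumptions about the response layout and should be checked against the API documentation. The function names are made up for illustration.

```python
import json
import urllib.request

BASE = "https://www.ebi.ac.uk"
# The endpoint from the message, with an assumed .json suffix and page size.
FIRST_PAGE = "/chembl/api/data/drug.json?limit=100"


def extract_smiles(page):
    """Return (smiles, chembl_id) pairs from one decoded page of results.

    The key names used here are assumptions about the response layout.
    Records without a structure are skipped.
    """
    pairs = []
    for drug in page.get("drugs", []):
        structures = drug.get("molecule_structures") or {}
        smiles = structures.get("canonical_smiles")
        if smiles:
            pairs.append((smiles, drug.get("molecule_chembl_id")))
    return pairs


def next_page_url(page):
    """Return the absolute URL of the next page, or None on the last page."""
    nxt = (page.get("page_meta") or {}).get("next")
    return BASE + nxt if nxt else None


def iter_drug_smiles(url=BASE + FIRST_PAGE):
    """Yield (smiles, chembl_id) pairs, following the paging links."""
    while url:
        with urllib.request.urlopen(url) as response:
            page = json.load(response)
        yield from extract_smiles(page)
        url = next_page_url(page)
```

Looping over `iter_drug_smiles()` makes one HTTP request per page, so a larger `limit` means fewer requests.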
More details on the API:
https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services

Hope it helps,
Nicolas

-----------------------------------------------
Dr Nicolas Bosc
Data Mining and Analysis Scientist
ChEMBL group
EMBL-EBI
Wellcome Genome Campus
Hinxton, Cambridge, CB10 1SD
United Kingdom
nb...@ebi.ac.uk
+44 1223 492519

> On 13 Sep 2021, at 16:07, Anthony Nash <anthony.n...@ndcn.ox.ac.uk> wrote:
>
> Hi Patrick,
>
> Thank you for the code and the links, both are very helpful and exactly what
> I needed.
>
> Many thanks
> Anthony
>
> Kind regards
> Dr Anthony Nash PhD MRSC
>
> Senior Research Scientist
> Nuffield Department of Clinical Neurosciences
> RMCR Kellogg College
> University of Oxford
> http://www.kellogg.ox.ac.uk/
>
> From: Patrick Walters <wpwalt...@gmail.com>
> Sent: 12 September 2021 15:27
> To: Anthony Nash <anthony.n...@ndcn.ox.ac.uk>
> Cc: rdkit-discuss@lists.sourceforge.net
> Subject: Re: [Rdkit-discuss] SMILES from sdf file
>
> Hi Anthony,
>
> This is pretty easy and you don't need to use PandasTools (although
> PandasTools are very cool).
>
> #!/usr/bin/env python
>
> import sys
> from rdkit import Chem
>
> suppl = Chem.SDMolSupplier(sys.argv[1])
> for mol in suppl:
>     if mol:
>         print(Chem.MolToSmiles(mol),mol.GetProp("_Name"))
>
> By default, Chem.MolToSmiles produces canonical isomeric SMILES.
>
> Here's the query I use to get drugs from ChEMBL.
>
> select distinct canonical_smiles, chembl_id from compound_structures cs
> join formulations f on cs.molregno = f.molregno
> join products p on p.product_id = f.product_id
> join compound_properties cp on cp.molregno = cs.molregno
> join molecule_dictionary md on cp.molregno = md.molregno
> where p.oral = 1
> and cp.mw_freebase < 1000
>
> If you just want the data, I have it here.
>
> https://github.com/PatWalters/datafiles/blob/main/chembl_drugs.smi
>
> Pat
>
>
> On Sun, Sep 12, 2021 at 9:20 AM Anthony Nash <anthony.n...@ndcn.ox.ac.uk> wrote:
> Dear all,
>
> This sounded routine enough that I thought I'd seek guidance to save myself
> hours of hacking and potential misunderstanding.
>
> My objective is to generate a canonical SMILES for each compound in an sdf
> file. The sdf file was downloaded from ChEMBL and contains some +10,000
> drugs. I've had a brief look at the RDKit API and I noticed
> rdkit.Chem.PandasTools.LoadSDF.
>
> Unfortunately, there was no function argument documentation, so I'm unsure
> whether this function yields canonical SMILES data. However, the RDKit
> website includes the following example which suggests "something" concerning
> SMILES is being processed:
>
> sdfFile = os.path.join(RDConfig.RDDataDir,'NCI/first_200.props.sdf')
> >>> frame = PandasTools.LoadSDF(sdfFile,smilesName='SMILES',molColName='Molecule',
> ... includeFingerprints=True, removeHs=False, strictParsing=True)
>
> Any guidance is hugely appreciated.
>
> On the other hand, if anyone can suggest a one-shop list of SMILES in a file
> for e.g., experimental drugs, FDA approved drugs, "representative" of
> chemical space, etc., that would also be appreciated.
>
>
> Thanks
> Anthony
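Pat's query in the quoted message can be tried from Python with the standard-library sqlite3 module, assuming a SQLite build of ChEMBL is available locally. The sketch below runs the query unchanged. `make_toy_db` is a hypothetical helper that builds a tiny in-memory stand-in holding only the columns the query touches; its rows are invented for illustration and are not ChEMBL data.

```python
import sqlite3

# The query from the thread, unchanged apart from layout.
DRUG_QUERY = """
select distinct canonical_smiles, chembl_id from compound_structures cs
join formulations f on cs.molregno = f.molregno
join products p on p.product_id = f.product_id
join compound_properties cp on cp.molregno = cs.molregno
join molecule_dictionary md on cp.molregno = md.molregno
where p.oral = 1
and cp.mw_freebase < 1000
"""


def oral_drugs(conn):
    """Run the query and return a list of (canonical_smiles, chembl_id) tuples."""
    return conn.execute(DRUG_QUERY).fetchall()


def make_toy_db():
    """Build an in-memory database with invented rows (not ChEMBL data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table compound_structures (molregno integer, canonical_smiles text);
        create table formulations (product_id text, molregno integer);
        create table products (product_id text, oral integer);
        create table compound_properties (molregno integer, mw_freebase real);
        create table molecule_dictionary (molregno integer, chembl_id text);
        """
    )
    # Compound 1 appears in two oral products, compound 2 only in a non-oral
    # product, and compound 3 is oral but heavier than the 1000 cutoff.
    conn.executemany("insert into compound_structures values (?, ?)",
                     [(1, "CCO"), (2, "c1ccccc1"), (3, "C" * 110)])
    conn.executemany("insert into formulations values (?, ?)",
                     [("P1", 1), ("P2", 1), ("P3", 2), ("P4", 3)])
    conn.executemany("insert into products values (?, ?)",
                     [("P1", 1), ("P2", 1), ("P3", 0), ("P4", 1)])
    # Molecular weights are approximate and only there to exercise the filter.
    conn.executemany("insert into compound_properties values (?, ?)",
                     [(1, 46.07), (2, 78.11), (3, 1545.0)])
    conn.executemany("insert into molecule_dictionary values (?, ?)",
                     [(1, "TOY1"), (2, "TOY2"), (3, "TOY3")])
    return conn
```

On the toy data, `oral_drugs(make_toy_db())` keeps only compound 1, and `distinct` collapses its two oral products into a single row. Against a real dump the call would be along the lines of `oral_drugs(sqlite3.connect("chembl.db"))`, where `chembl.db` is a placeholder path.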
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss