Hi Anthony,

Another way of accessing all the ChEMBL drugs without having to rely on a local 
install of the database is to use the ChEMBL API: 
https://www.ebi.ac.uk/chembl/api/data/drug

More details on the API: 
https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services
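
As a rough illustration of how the drug endpoint above could be paged through from Python, here is a minimal sketch using only the standard library. The `.json` suffix, the `limit`/`offset` parameters, and the `drugs`, `molecule_structures`, `canonical_smiles` and `molecule_chembl_id` field names are assumptions about the response layout and should be checked against a live response; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data/drug"


def build_drug_url(limit=100, offset=0):
    """Build the URL for one page of drug records (JSON format assumed)."""
    return f"{BASE_URL}.json?{urlencode({'limit': limit, 'offset': offset})}"


def extract_smiles(page):
    """Return (smiles, chembl_id) pairs from one decoded page.

    The key names used here are assumptions about the response layout.
    Records without a structure are skipped.
    """
    rows = []
    for drug in page.get("drugs", []):
        structures = drug.get("molecule_structures") or {}
        smiles = structures.get("canonical_smiles")
        if smiles:
            rows.append((smiles, drug.get("molecule_chembl_id")))
    return rows


def fetch_page(limit=100, offset=0):
    """Download and decode one page of drug records (needs network access)."""
    with urlopen(build_drug_url(limit, offset)) as response:
        return json.load(response)
```

Something like `for smiles, chembl_id in extract_smiles(fetch_page(limit=20)): print(smiles, chembl_id)` would then print the first few records; increasing `offset` in steps of `limit` walks through the rest.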

Hope it helps,
Nicolas
-----------------------------------------------
Dr Nicolas Bosc
Data Mining and Analysis Scientist
ChEMBL group
EMBL-EBI
Wellcome Genome Campus
Hinxton, Cambridge, CB10 1SD
United Kingdom

nb...@ebi.ac.uk
+44 1223 492519


> On 13 Sep 2021, at 16:07, Anthony Nash <anthony.n...@ndcn.ox.ac.uk> wrote:
> 
> Hi Patrick, 
> 
> Thank you for the code and the links, both are very helpful and exactly what 
> I needed. 
> 
> Many thanks
> Anthony
> 
> Kind regards
> Dr Anthony Nash PhD MRSC
> 
> Senior Research Scientist
> Nuffield Department of Clinical Neurosciences
> RMCR Kellogg College 
> University of Oxford
> http://www.kellogg.ox.ac.uk/
> 
> From: Patrick Walters <wpwalt...@gmail.com>
> Sent: 12 September 2021 15:27
> To: Anthony Nash <anthony.n...@ndcn.ox.ac.uk>
> Cc: rdkit-discuss@lists.sourceforge.net <rdkit-discuss@lists.sourceforge.net>
> Subject: Re: [Rdkit-discuss] SMILES from sdf file
>  
> Hi Anthony,
> 
> This is pretty easy and you don't need to use PandasTools (although 
> PandasTools are very cool).  
> 
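> #!/usr/bin/env python
> 
> import sys
> from rdkit import Chem
> 
> # Read molecules from the SD file given as the first command-line argument
> suppl = Chem.SDMolSupplier(sys.argv[1])
> for mol in suppl:
>     # The supplier yields None for records RDKit cannot parse; skip those
>     if mol is not None:
>         print(Chem.MolToSmiles(mol), mol.GetProp("_Name"))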
> 
> By default, Chem.MolToSmiles produces canonical isomeric SMILES.   
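
To make the canonicalization point concrete, here is a small sketch (the helper name `to_canonical_smiles` is hypothetical) showing that different input SMILES for the same molecule come out as one string, and that stereo information is kept by default. It assumes RDKit is installed.

```python
from rdkit import Chem


def to_canonical_smiles(smiles):
    """Parse a SMILES string and write it back in RDKit's canonical form.

    Returns None when RDKit cannot parse the input.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # Default settings: canonical atom ordering, stereochemistry included
    return Chem.MolToSmiles(mol)


# Two different ways of writing ethanol give the same canonical string
same = to_canonical_smiles("OCC") == to_canonical_smiles("C(O)C")

# The chiral centre of L-alanine is preserved in the output
alanine = to_canonical_smiles("C[C@H](N)C(=O)O")
```

Note that the canonical form is specific to RDKit (and can change between RDKit versions), so it is best used for comparing strings generated by the same toolkit.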
> 
> Here's the query I use to get drugs from ChEMBL.
> 
> select distinct canonical_smiles, chembl_id from compound_structures cs
> join formulations f on cs.molregno = f.molregno
> join products p on p.product_id = f.product_id
> join compound_properties cp on cp.molregno = cs.molregno
> join molecule_dictionary md on cp.molregno = md.molregno
> where p.oral = 1
> and cp.mw_freebase < 1000
> 
> If you just want the data, I have it here. 
> 
> https://github.com/PatWalters/datafiles/blob/main/chembl_drugs.smi
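
For anyone who does have a local copy of the database, the SQL query quoted above could be run from Python roughly as follows. This is only a sketch: it assumes a SQLite build of ChEMBL, the file name in the usage note is a placeholder, and the function name is hypothetical. The query text itself is the one from the message, with table aliases added to the selected columns.

```python
import sqlite3

# Oral products with a free-base molecular weight below 1000
DRUG_QUERY = """
select distinct cs.canonical_smiles, md.chembl_id
from compound_structures cs
join formulations f on cs.molregno = f.molregno
join products p on p.product_id = f.product_id
join compound_properties cp on cp.molregno = cs.molregno
join molecule_dictionary md on cp.molregno = md.molregno
where p.oral = 1
and cp.mw_freebase < 1000
"""


def fetch_oral_drugs(conn):
    """Run the query on an open connection; returns (smiles, chembl_id) rows."""
    return conn.execute(DRUG_QUERY).fetchall()
```

With a real database this would be called as `fetch_oral_drugs(sqlite3.connect("chembl.db"))`, where `chembl.db` stands in for wherever the local SQLite file lives.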
> 
> Pat
> 
> 
> On Sun, Sep 12, 2021 at 9:20 AM Anthony Nash <anthony.n...@ndcn.ox.ac.uk> wrote:
> Dear all, 
> 
> This sounded routine enough that I thought I'd seek guidance to save myself 
> hours of hacking and potential misunderstanding. 
> 
> My objective is to generate a canonical SMILES for each compound in an sdf 
> file. The sdf file was downloaded from ChEMBL and contains some 10,000+ 
> drugs. I've had a brief look at the RDKit API and I noticed 
> rdkit.Chem.PandasTools.LoadSDF. 
> 
> Unfortunately, there was no function argument documentation, so I'm unsure 
> whether this function yields canonical SMILES data. However, the RDKit 
> website includes the following example which suggests "something" concerning 
> SMILES is being processed:
> 
> >>> sdfFile = os.path.join(RDConfig.RDDataDir,'NCI/first_200.props.sdf')
> >>> frame = PandasTools.LoadSDF(sdfFile,smilesName='SMILES',molColName='Molecule',
> ...            includeFingerprints=True, removeHs=False, strictParsing=True)
> 
> Any guidance is hugely appreciated. 
> 
> On the other hand, if anyone can suggest a one-stop list of SMILES in a file 
> for e.g., experimental drugs, FDA approved drugs, "representative" of 
> chemical space, etc., that would also be appreciated. 
> 
> 
> Thanks
> Anthony
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss