Hi Anthony,

This is pretty easy and you don't need to use PandasTools (although
PandasTools are very cool).

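#!/usr/bin/env python
# Usage: python this_script.py input.sdf > output.smi

import sys
from rdkit import Chem

suppl = Chem.SDMolSupplier(sys.argv[1])
for mol in suppl:
    # The supplier yields None for records RDKit cannot parse; skip them.
    if mol:
        # Print the canonical SMILES, then the record's title line (_Name).
        print(Chem.MolToSmiles(mol), mol.GetProp("_Name"))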
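To answer the PandasTools part of the question: as far as I know, when you pass smilesName to PandasTools.LoadSDF, that column is filled by calling Chem.MolToSmiles on each molecule, so it holds the same canonical SMILES the script prints. Here is a minimal sketch, assuming pandas is installed; the helper name load_sdf_with_smiles is made up for illustration, and the "ID" column is LoadSDF's default idName, which takes the title line.

```python
import sys

from rdkit.Chem import PandasTools


def load_sdf_with_smiles(path):
    """Read an SD file into a DataFrame that includes a SMILES column.

    smilesName asks LoadSDF to add a column filled from Chem.MolToSmiles;
    molColName names the column holding the RDKit Mol objects. The title
    line (_Name) goes into the default "ID" column.
    """
    return PandasTools.LoadSDF(path, smilesName="SMILES", molColName="Molecule")


if __name__ == "__main__" and len(sys.argv) > 1:
    frame = load_sdf_with_smiles(sys.argv[1])
    # Same two-column layout as the script: SMILES, then name.
    print(frame[["SMILES", "ID"]].to_string(index=False, header=False))
```

The DataFrame route is handy if you want to filter or merge afterwards; for a straight SDF-to-SMILES conversion the plain supplier loop is lighter.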

By default, Chem.MolToSmiles produces canonical isomeric SMILES.
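A quick way to see what "canonical" and "isomeric" mean in practice: different ways of writing the same molecule come back as one string, and stereo marks are kept unless you pass isomericSmiles=False. A small sketch; the canonical_smiles helper is just for illustration.

```python
from rdkit import Chem


def canonical_smiles(smiles: str, isomeric: bool = True) -> str:
    """Parse a SMILES string and re-emit it in RDKit's canonical form."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol, isomericSmiles=isomeric)


# Three spellings of ethanol collapse to the same canonical string (CCO).
for s in ["OCC", "C(O)C", "CCO"]:
    print(s, "->", canonical_smiles(s))

# Two spellings of the same alanine stereoisomer also agree; the @ marks
# survive by default and are dropped with isomeric=False.
ala = "N[C@@H](C)C(=O)O"
print(canonical_smiles(ala), canonical_smiles("C[C@H](N)C(=O)O"))
print(canonical_smiles(ala, isomeric=False))
```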

Here's the query I use to get drugs from ChEMBL.

select distinct canonical_smiles, chembl_id from compound_structures cs
join formulations f on cs.molregno = f.molregno
join products p on p.product_id = f.product_id
join compound_properties cp on cp.molregno = cs.molregno
join molecule_dictionary md on cp.molregno = md.molregno
where p.oral = 1
and cp.mw_freebase < 1000
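If you have downloaded ChEMBL's SQLite release, a minimal sketch for running the query from Python and writing the same "SMILES name" layout as the script above. The table and column names come straight from the query; the helper names and the command-line argument are assumptions for illustration, not part of ChEMBL.

```python
import sqlite3
import sys

# The query above, unchanged: distinct oral drugs with free-base MW < 1000.
ORAL_DRUGS_SQL = """
select distinct canonical_smiles, chembl_id from compound_structures cs
join formulations f on cs.molregno = f.molregno
join products p on p.product_id = f.product_id
join compound_properties cp on cp.molregno = cs.molregno
join molecule_dictionary md on cp.molregno = md.molregno
where p.oral = 1
and cp.mw_freebase < 1000
"""


def fetch_oral_drugs(conn):
    """Run the query and return a list of (SMILES, ChEMBL ID) tuples."""
    return conn.execute(ORAL_DRUGS_SQL).fetchall()


def write_smi(rows, out):
    """Write one 'SMILES ID' line per row to a text stream."""
    for smiles, chembl_id in rows:
        out.write(f"{smiles} {chembl_id}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1]: path to a local ChEMBL SQLite file (the file name
    # depends on the release you download; none is assumed here).
    conn = sqlite3.connect(sys.argv[1])
    try:
        write_smi(fetch_oral_drugs(conn), sys.stdout)
    finally:
        conn.close()
```

The distinct matters because a compound appearing in several oral products would otherwise show up once per formulation.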

If you just want the data, I have it here.

https://github.com/PatWalters/datafiles/blob/main/chembl_drugs.smi

Pat


On Sun, Sep 12, 2021 at 9:20 AM Anthony Nash <anthony.n...@ndcn.ox.ac.uk>
wrote:

> Dear all,
>
> This sounded routine enough that I thought I'd seek guidance to save
> myself hours of hacking and potential misunderstanding.
>
> My objective is to generate a canonical SMILES for each compound in an sdf
> file. The sdf file was downloaded from ChEMBL and contains over 10,000
> drugs. I've had a brief look at the RDKit API and I noticed
> rdkit.Chem.PandasTools.LoadSDF.
>
> Unfortunately, there was no function argument documentation, so I'm unsure
> whether this function yields canonical SMILES data. However, the RDKit
> website includes the following example which suggests "something"
> concerning SMILES is being processed:
>
> >>> sdfFile = os.path.join(RDConfig.RDDataDir,'NCI/first_200.props.sdf')
> >>> frame = PandasTools.LoadSDF(sdfFile,smilesName='SMILES',molColName='Molecule',
> ...            includeFingerprints=True, removeHs=False, strictParsing=True)
>
>
> Any guidance is hugely appreciated.
>
> On the other hand, if anyone can suggest a one-stop list of SMILES in a
> file of, e.g., experimental drugs, FDA-approved drugs, "representative" of
> chemical space, etc., that would also be appreciated.
>
>
> Thanks
> Anthony
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>