Re: [Rdkit-discuss] kekulize AllChem.CanonSmiles error and workaround

Richard Hall Tue, 01 Aug 2017 02:04:15 -0700

When I cut the bonds in the Daylight implementation, I add new single bonds to 
xenon atoms, so the 'imidazole' fragment becomes [Xe]n1ccnc1, which *is* a 
valid SMILES.  The following Python code can then be used to convert these Xe 
atoms to hydrogens:


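from rdkit import Chem

HYDROGEN = 1
XENON = 54

# Parse the xenon-capped fragment produced by the bond-cutting step.
m = Chem.MolFromSmiles('[Xe]n1ccnc1')
# Turn each Xe cap into an explicit hydrogen atom.
for a in m.GetAtoms():
    if a.GetAtomicNum() == XENON:
        a.SetAtomicNum(HYDROGEN)
# Fold the explicit hydrogens back into their heavy-atom neighbours.
n = Chem.RemoveHs(m)
print(Chem.MolToSmiles(n))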
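
For context, the bond-cutting step that produces such xenon-capped fragments might be sketched in RDKit roughly as follows. This is only an illustration under assumptions, not the Daylight implementation or the published algorithm: the helper name cut_ring_attachments and the SMARTS pattern are hypothetical, every acyclic bond attached to a ring is cut in a single pass rather than recursively, and the dummy atoms that Chem.FragmentOnBonds adds are simply relabelled as Xe.

```python
from rdkit import Chem

XENON = 54

# Assumed pattern: a ring atom joined to any atom through a non-ring bond.
RING_ATTACHMENT = Chem.MolFromSmarts('[R]!@*')


def cut_ring_attachments(smiles):
    """Cut every acyclic bond attached to a ring and cap the ends with Xe.

    Hypothetical helper for illustration; returns a sorted list of
    fragment SMILES.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError('could not parse SMILES: %r' % smiles)
    # Collect the index of each matching bond once.
    bond_ids = sorted({
        mol.GetBondBetweenAtoms(a, b).GetIdx()
        for a, b in mol.GetSubstructMatches(RING_ATTACHMENT)
    })
    if not bond_ids:
        return [Chem.MolToSmiles(mol)]
    # FragmentOnBonds breaks the bonds and adds a dummy atom at each cut end.
    pieces = Chem.FragmentOnBonds(mol, bond_ids)
    for atom in pieces.GetAtoms():
        if atom.GetAtomicNum() == 0:
            atom.SetAtomicNum(XENON)
            atom.SetIsotope(0)  # drop the label FragmentOnBonds puts on dummies
    pieces.UpdatePropertyCache(strict=False)
    return sorted(Chem.MolToSmiles(pieces).split('.'))


if __name__ == '__main__':
    # 1-methylimidazole: the N-CH3 bond is the only acyclic bond on the ring.
    for fragment in cut_ring_attachments('Cn1ccnc1'):
        print(fragment)
```

Because the ring nitrogen keeps a third connection (to the cap), each fragment stays unambiguous until the caps are converted to hydrogens as shown above.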

Hopefully that is useful.  Please fire away if you have any further questions 
or would like help with any other aspects of this; it will be great to have an 
RDKit version available for people to use ☺
Rich

From: Konrad Koehler [mailto:[email protected]]
Sent: 01 August 2017 05:29
To: [email protected]
Subject: [Rdkit-discuss] kekulize AllChem.CanonSmiles error and workaround

Hi,

I am having trouble canonicalizing SMILES with ambiguous heteroaromatic 
tautomers, such as imidazole. For example:

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> smiles = 'n1cncc1'
>>> AllChem.CanonSmiles(smiles)
[21:42:52] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4


As a workaround, one can first canonicalize with Open Babel's pybel to remove 
the ambiguity and then canonicalize the result with RDKit:

>>> import pybel
>>> pybel.readstring("smi", "n1cncc1").write("can")
'c1ncc[nH]1\t\n'
>>> AllChem.CanonSmiles('c1ncc[nH]1\t\n')
'c1c[nH]cn1'


or in one line:


>>> AllChem.CanonSmiles(pybel.readstring("smi", "n1cncc1").write("can"))
'c1c[nH]cn1'

It would be nice if RDKit could do this without the assistance of pybel.
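
One possible RDKit-only approach, sketched here under assumptions, is to parse the SMILES without sanitization and then try placing a hydrogen on each two-connected aromatic nitrogen in turn until the molecule kekulizes. The function name canon_with_guessed_nh is hypothetical, the sketch only handles the case where a single [nH] is missing, and it returns the first placement that works, which need not be the tautomer Open Babel would pick when the candidate positions are not equivalent.

```python
from rdkit import Chem


def canon_with_guessed_nh(smiles):
    """Return a canonical SMILES, guessing one missing aromatic [nH] if needed.

    Hypothetical helper for illustration; returns None if nothing kekulizes.
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return None

    # First see whether the input is already fine as written.
    trial = Chem.Mol(mol)
    try:
        Chem.SanitizeMol(trial)
        return Chem.MolToSmiles(trial)
    except Exception:
        pass

    # Otherwise try each candidate nitrogen as the [nH] position.
    for atom in mol.GetAtoms():
        is_candidate = (
            atom.GetAtomicNum() == 7
            and atom.GetIsAromatic()
            and atom.GetDegree() == 2
            and atom.GetNumExplicitHs() == 0
            and atom.GetFormalCharge() == 0
        )
        if not is_candidate:
            continue
        trial = Chem.Mol(mol)  # work on a fresh copy for each attempt
        trial_atom = trial.GetAtomWithIdx(atom.GetIdx())
        trial_atom.SetNumExplicitHs(1)
        trial_atom.SetNoImplicit(True)
        try:
            Chem.SanitizeMol(trial)
        except Exception:
            continue  # this placement does not kekulize; try the next one
        return Chem.MolToSmiles(trial)
    return None


if __name__ == '__main__':
    print(canon_with_guessed_nh('n1cncc1'))
```

For imidazole both nitrogen positions describe the same molecule, so the result should match what AllChem.CanonSmiles gives for the disambiguated SMILES. RDKit will still log a kekulization error for each failed attempt.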



This problem arose when implementing the algorithm described in the following 
paper:

Hall RJ, Murray CW, Verdonk ML. The Fragment Network: A Chemistry 
Recommendation Engine Built Using a Graph Database. J Med Chem. 2017; 
60(14):6440-50. PMID: 28712298, doi: 10.1021/acs.jmedchem.7b00809
Details of the algorithm are contained in the supporting information:
http://pubs.acs.org/doi/suppl/10.1021/acs.jmedchem.7b00809/suppl_file/jm7b00809_si_001.pdf

The algorithm fragments the molecule at acyclic bonds connected to rings, and 
it is necessary to canonicalize both the parent and child fragments. The 
algorithm is recursive and, fortunately, the SMILES can be processed 
repeatedly by AllChem.CanonSmiles once it has been disambiguated:

>>> AllChem.CanonSmiles('c1c[nH]cn1')
'c1c[nH]cn1'

I eventually plan to donate the RDKit Fragment Network script to the community 
after testing and optimization.

Best,

Konrad


