I am having trouble canonicalizing SMILES with ambiguous heteroaromatic 
tautomers such as imidazole. For example:

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> smiles = 'n1cncc1'
>>> AllChem.CanonSmiles(smiles)
[21:42:52] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4

As a workaround, one can first canonicalize with Open Babel pybel to remove the 
ambiguity and then canonicalize with RDKit:

>>> import pybel
>>> pybel.readstring("smi", "n1cncc1").write("can")
>>> AllChem.CanonSmiles('c1ncc[nH]1\t\n')

or in one line:

>>> AllChem.CanonSmiles(pybel.readstring("smi", "n1cncc1").write("can"))

It would be nice if RDKit could do this without the assistance of pybel.
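
One RDKit-only approach is sketched below, as an assumption-laden illustration rather than an established RDKit feature: parse without sanitization, then try placing a hydrogen on each neutral, two-connected aromatic nitrogen until sanitization succeeds. The function name canon_smiles_with_nh_guess is hypothetical. It adds at most one hydrogen, so it will not rescue molecules that need several, and for unsymmetrical rings it returns whichever tautomer it finds first, which may differ from the one Open Babel picks.

```python
from rdkit import Chem


def canon_smiles_with_nh_guess(smiles):
    """Return a canonical SMILES, guessing one aromatic [nH] if needed.

    Hypothetical helper: returns None if no single-hydrogen placement
    yields a molecule that RDKit can sanitize.
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return None

    # First try the input unchanged; most SMILES need no repair.
    trial = Chem.Mol(mol)
    try:
        Chem.SanitizeMol(trial)
        return Chem.MolToSmiles(trial)
    except Exception:
        pass

    # Otherwise try each candidate aromatic nitrogen in atom order.
    for atom in mol.GetAtoms():
        is_candidate = (
            atom.GetSymbol() == "N"
            and atom.GetIsAromatic()
            and atom.GetNumExplicitHs() == 0
            and atom.GetFormalCharge() == 0
            and atom.GetDegree() == 2
        )
        if not is_candidate:
            continue
        trial = Chem.RWMol(mol)
        trial_atom = trial.GetAtomWithIdx(atom.GetIdx())
        trial_atom.SetNumExplicitHs(1)
        trial_atom.SetNoImplicit(True)
        try:
            Chem.SanitizeMol(trial)
        except Exception:
            continue  # this placement cannot be kekulized; try the next
        return Chem.MolToSmiles(trial)
    return None


print(canon_smiles_with_nh_guess("n1cncc1"))
```

Failed attempts still print RDKit's kekulization errors to the log, which is expected here.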

This problem arose when implementing the algorithm described in the following 
paper:

Hall RJ, Murray CW, Verdonk ML. The Fragment Network: A Chemistry 
Recommendation Engine Built Using a Graph Database. J Med Chem. 2017; 
60(14):6440-50. PMID: 28712298, doi: 10.1021/acs.jmedchem.7b00809

Details of the algorithm are contained in supporting information:

The algorithm fragments the molecule at acyclic bonds connected to rings, and it 
is necessary to canonicalize both the parent and child fragments. The algorithm 
is recursive, and fortunately the SMILES can be recursively processed by 
AllChem.CanonSmiles once they have been disambiguated:

>>> AllChem.CanonSmiles('c1c[nH]cn1')
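
The fragmentation step described above could be sketched as follows. This is only my reading of "acyclic bonds connected to rings", not the procedure from the paper's supporting information: the SMARTS pattern [R]!@* and the helper names are assumptions, and this version cuts every qualifying bond in a single pass instead of recursing one fragment at a time.

```python
from rdkit import Chem


def ring_attached_acyclic_bonds(mol):
    """Indices of non-ring bonds that have at least one ring atom."""
    pattern = Chem.MolFromSmarts("[R]!@*")
    bond_ids = set()
    for begin_idx, end_idx in mol.GetSubstructMatches(pattern):
        bond_ids.add(mol.GetBondBetweenAtoms(begin_idx, end_idx).GetIdx())
    return sorted(bond_ids)


def fragment_once(smiles):
    """Cut all ring-attached acyclic bonds; return canonical fragment SMILES.

    Cut points are marked with dummy atoms ("*") by FragmentOnBonds.
    """
    mol = Chem.MolFromSmiles(smiles)
    bond_ids = ring_attached_acyclic_bonds(mol)
    if not bond_ids:
        return []
    pieces = Chem.FragmentOnBonds(mol, bond_ids)
    fragments = Chem.GetMolFrags(pieces, asMols=True)
    return sorted(Chem.MolToSmiles(frag) for frag in fragments)


print(fragment_once("Cc1c[nH]cn1"))
```

Each fragment SMILES comes out of Chem.MolToSmiles already canonical, so parent and child fragments can be compared as strings.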

I eventually plan to donate the RDKit Fragment Network script to the community 
after testing and optimization.

