Hi Paolo,

Yes, that is very helpful indeed, so thank you!

Perhaps something of this nature could be added either to the cookbook or the
docs, as this API is rather unclear?

Cheers,
Jeff

From: Paolo Tosco <paolo.tosco.m...@gmail.com>
Date: Friday, August 14, 2020 at 1:34 PM
To: Jeffrey Van santen <jeffrey_van_san...@sfu.ca>
Cc: "rdkit-discuss@lists.sourceforge.net" <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Atom Order Canonicalization

Hi Jeffrey,

This gist shows how to achieve what you need:

https://gist.github.com/ptosco/36574d7f025a932bc1b8db221903a8d2

i.e., how to reorder atoms based on the result of Chem.CanonicalRankAtoms().
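A minimal sketch of that idea follows. It is not necessarily identical to the gist, whose contents are not reproduced here. The key point is that `Chem.CanonicalRankAtoms()` returns, for each input atom index `i`, the canonical rank of that atom, whereas `Chem.RenumberAtoms()` expects a list whose `k`-th entry is the old index of the atom that should become atom `k`. The rank list therefore has to be inverted (argsorted) before renumbering. `atom_order` is copied from the question; `canonical_reorder` and `bond_list` are hypothetical helpers introduced for illustration, and shuffling the atom indices of one molecule stands in for reading the same structure from a different format.

```python
import random

from rdkit import Chem


def canonical_reorder(mol):
    """Hypothetical helper: return a copy of mol with atoms in canonical-rank order."""
    # ranks[i] is the canonical rank of input atom i (ties are broken by default).
    ranks = list(Chem.CanonicalRankAtoms(mol))
    # RenumberAtoms wants new_order[k] = old index of the atom that becomes atom k,
    # i.e. the inverse permutation of the rank list, not the ranks themselves.
    new_order = sorted(range(mol.GetNumAtoms()), key=lambda i: ranks[i])
    return Chem.RenumberAtoms(mol, new_order)


def atom_order(m):
    return [(x.GetIdx(), x.GetAtomicNum(), x.GetDegree()) for x in m.GetAtoms()]


def bond_list(m):
    # Hypothetical helper: (lower index, higher index, bond type) for every bond, sorted.
    return sorted(
        (min(b.GetBeginAtomIdx(), b.GetEndAtomIdx()),
         max(b.GetBeginAtomIdx(), b.GetEndAtomIdx()),
         str(b.GetBondType()))
        for b in m.GetBonds()
    )


m = Chem.AddHs(Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C"))

# Simulate "the same molecule from another input" by shuffling the atom indices.
perm = list(range(m.GetNumAtoms()))
random.Random(0).shuffle(perm)
m_shuffled = Chem.RenumberAtoms(m, perm)

a = canonical_reorder(m)
b = canonical_reorder(m_shuffled)
print(atom_order(a) == atom_order(b), bond_list(a) == bond_list(b))
```

The same function should apply unchanged to a molecule built with `Chem.MolFromInchi()` as in the question, assuming an RDKit build with InChI support. No SMILES round trip is involved, and explicit hydrogens added with `AddHs` are kept.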

HTH, cheers
p.

On Fri, Aug 14, 2020 at 8:36 PM Jeffrey Van santen
<jeffrey_van_san...@sfu.ca> wrote:
Hello all,

I realize that this topic has been discussed in some detail
(https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/76909664-2C16-4B61-8BEE-2196B3721EA1%40gmail.com/#msg34923617),
but I remain somewhat confused. Let me lay out what I am trying to achieve:

I would like a method for creating a canonical order of the atoms in a 
molecule, independent of the input order. For example, given 
(R)-1-(sec-butyl)naphthalene (see attached image)

if you start with the SMILES string "CC[C@H](C1=CC=CC2=C1C=CC=C2)C"
versus the InChI string
"InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1",
you obviously get two different atom orders. I have tried to apply the
`CanonicalRankAtoms` method to each of the molecules, as in the following
example code:

```
from rdkit import Chem

def atom_order(m):
    return [(x.GetIdx(), x.GetAtomicNum(), x.GetDegree()) for x in m.GetAtoms()]

m = Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C")
m = Chem.AddHs(m)
m1 = Chem.MolFromInchi("InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1")
m1 = Chem.AddHs(m1)
# Some simple comparison of atom ordering
atom_order(m) == atom_order(m1) # returns False
m_order = list(Chem.CanonicalRankAtoms(m))
m1_order = list(Chem.CanonicalRankAtoms(m1))
m_order == m1_order # returns False
# For completeness
m_ordered = Chem.RenumberAtoms(m, m_order)
m1_ordered = Chem.RenumberAtoms(m1, m1_order)
atom_order(m_ordered) == atom_order(m1_ordered) # returns False
```
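Part of the confusion in the snippet above is that `CanonicalRankAtoms()` returns one rank per input atom, listed in input-atom order: element `i` is the canonical rank of atom `i`. Two inputs that number the atoms differently therefore give lists that generally differ position by position, even though they assign the same rank to the same chemical atom. The sketch below illustrates this with a copy of the molecule whose numbering is reversed via `RenumberAtoms`, standing in for a second input format; `by_rank` is a hypothetical helper introduced only for this illustration.

```python
from rdkit import Chem

m = Chem.AddHs(Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C"))

# A copy of the same molecule with its atom numbering reversed, standing in
# for the same structure read from a different input format.
n = m.GetNumAtoms()
m_rev = Chem.RenumberAtoms(m, list(range(n - 1, -1, -1)))

ranks = list(Chem.CanonicalRankAtoms(m))          # ranks[i] = rank of atom i of m
ranks_rev = list(Chem.CanonicalRankAtoms(m_rev))  # ranks_rev[i] = rank of atom i of m_rev

# Compared position by position, the two lists refer to different atoms.
print(ranks == ranks_rev)


def by_rank(mol, rks):
    """Hypothetical helper: describe each atom by its rank instead of its index."""
    return {
        rks[a.GetIdx()]: (a.GetAtomicNum(),
                          sorted(rks[nb.GetIdx()] for nb in a.GetNeighbors()))
        for a in mol.GetAtoms()
    }


# Keyed by canonical rank, both inputs describe the same atoms and connectivity.
print(by_rank(m, ranks) == by_rank(m_rev, ranks_rev))
```

Note that symmetric atoms (for example the three hydrogens of a methyl group) may receive their tie-broken ranks in a different order in each input, so even `ranks_rev` reversed need not equal `ranks`; only the rank-keyed description is expected to match.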

One plausible solution that seems to work is the following extension:

```
m_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m))
m1_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m1))
atom_order(m_canon) == atom_order(m1_canon) # returns True
```

I believe this works because `MolToSmiles` uses `canonical=True` by default.
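That explanation fits how the round trip behaves: atoms of a parsed SMILES are numbered in the order they appear in the string, and with the default `canonical=True` that order is canonical. One side effect worth checking is that `Chem.MolFromSmiles()` removes explicit hydrogens by default, so although `m` has 30 atoms after `AddHs`, the re-parsed molecule holds only the 14 heavy atoms. A small check, using the SMILES from the question:

```python
from rdkit import Chem

m = Chem.AddHs(Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C"))

smi = Chem.MolToSmiles(m)          # canonical=True is the default; [H] atoms are written
m_canon = Chem.MolFromSmiles(smi)  # parsing removes the explicit [H] atoms by default

print(m.GetNumAtoms(), m_canon.GetNumAtoms())

# AddHs appends the new hydrogens after the existing heavy atoms,
# so the heavy atoms keep the order in which they appeared in the SMILES.
m_canon_h = Chem.AddHs(m_canon)
print(m_canon_h.GetNumAtoms())
```

So `atom_order(m_canon) == atom_order(m1_canon)` in the question compares heavy atoms only, not the hydrogen-expanded molecules.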

What I would like to know is:

  1.  Why does `CanonicalRankAtoms` not return the same result for different
      inputs of the same molecule? I understand that it has something to do
      with the underlying molecular graph. In particular, in the linked mailing
      list discussion Greg says (https://sourceforge.net/p/rdkit/mailman/message/34923647/):
      "If you just want a canonical ordering of the atoms, there is no reason to
      generate the SMILES. You can just use Chem.CanonicalRankAtoms()."
  2.  Is there a better solution than round-tripping: import format X -> export
      canonical SMILES -> import canonical SMILES -> export canonical molecule
      (MOL file or similar)?
  3.  In a related but tangential question, is there a way to get canonical
      SMILES without the lowercase aromaticity notation?
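For question 3, one option is to kekulize a copy of the molecule with `Chem.Kekulize(..., clearAromaticFlags=True)` before writing it: with the aromatic flags cleared, `MolToSmiles` writes uppercase atom symbols and explicit alternating double bonds. (`MolToSmiles` also has a `kekuleSmiles` argument.) This is a sketch under one caveat: it is not established here that the particular Kekulé structure chosen is independent of the input atom order, so for identity comparisons the aromatic canonical SMILES is probably the safer key.

```python
from rdkit import Chem

m = Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C")

# Default canonical SMILES: aromatic atoms are written in lowercase.
aromatic_smi = Chem.MolToSmiles(m)

# Kekulize a copy and clear the aromatic flags, so the writer sees only
# single/double bonds and non-aromatic (uppercase) atoms.
m_kek = Chem.Mol(m)
Chem.Kekulize(m_kek, clearAromaticFlags=True)
kekule_smi = Chem.MolToSmiles(m_kek)

print(aromatic_smi)
print(kekule_smi)
```

Parsing the Kekulé SMILES again re-perceives aromaticity, so it should round-trip back to the same aromatic canonical SMILES.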

Thank you very much,

Jeff van Santen
The Natural Products Atlas (www.npatlas.org)
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss