Hi Jeffrey, this gist shows how to achieve what you need:
https://gist.github.com/ptosco/36574d7f025a932bc1b8db221903a8d2
i.e., how to reorder atoms based on the result of Chem.CanonicalRankAtoms().

HTH, cheers
p.

On Fri, Aug 14, 2020 at 8:36 PM Jeffrey Van santen <jeffrey_van_san...@sfu.ca> wrote:
> Hello all,
>
> I realize that this topic has been discussed in some detail
> (https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/76909664-2C16-4B61-8BEE-2196B3721EA1%40gmail.com/#msg34923617),
> but I remain somewhat confused. Let me lay out what I am trying to achieve:
>
> I would like a method for creating a canonical order of the atoms in a
> molecule, independent of the input order. For example, given
> (R)-1-(sec-butyl)naphthalene (see attached image),
> if you start with the SMILES string "CC[C@H](C1=CC=CC2=C1C=CC=C2)C"
> versus the InChI string
> "InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1",
> you obviously get two different atom orders.
>
> I have tried to apply the `CanonicalRankAtoms` method to each of the
> molecules, as in the following example code:
>
> ```
> from rdkit import Chem
>
> def atom_order(m):
>     return [(x.GetIdx(), x.GetAtomicNum(), x.GetDegree()) for x in m.GetAtoms()]
>
> m = Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C")
> m = Chem.AddHs(m)
> m1 = Chem.MolFromInchi("InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1")
> m1 = Chem.AddHs(m1)
>
> # Some simple comparison of atom ordering
> atom_order(m) == atom_order(m1)  # returns False
>
> m_order = list(Chem.CanonicalRankAtoms(m))
> m1_order = list(Chem.CanonicalRankAtoms(m1))
> m_order == m1_order  # returns False
>
> # For completeness
> m_ordered = Chem.RenumberAtoms(m, m_order)
> m1_ordered = Chem.RenumberAtoms(m1, m1_order)
> atom_order(m_ordered) == atom_order(m1_ordered)  # returns False
> ```
>
> One plausible solution that seems to work is the following extension:
>
> ```
> m_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m))
> m1_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m1))
> atom_order(m_canon) == atom_order(m1_canon)  # returns True
> ```
>
> I believe this works because `MolToSmiles` uses `canonical=True` by default.
>
> I suppose what I would like to know is:
>
> 1. Why does `CanonicalRankAtoms` not return the same result for different
>    inputs of the same molecule? I understand that it has something to do with
>    the underlying molecular graph. In particular, in the linked mailing list
>    discussion Greg says
>    (https://sourceforge.net/p/rdkit/mailman/message/34923647/):
>    "If you just want a canonical ordering of the atoms, there is no
>    reason to generate the SMILES. You can just use Chem.CanonicalRankAtoms()."
> 2. Is there a better solution than round-tripping from import X format
>    -> export canonical SMILES -> import canonical SMILES -> export canonical
>    mol (mol file or similar)?
> 3. In a related but tangential question, is there a way to get canonical
>    SMILES without the lowercase aromaticity notation?
>
> Thank you very much,
>
> Jeff van Santen
> The Natural Products Atlas (www.npatlas.org)
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
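For readers who cannot open the gist, here is a minimal sketch of the idea (a reconstruction, not necessarily the gist's exact code; `canonical_renumber` is a hypothetical helper name). `Chem.CanonicalRankAtoms()` returns, for each atom in input order, its canonical rank, whereas `Chem.RenumberAtoms()` expects a list whose j-th entry is the current index of the atom that should become atom j. The two are inverse permutations, so passing the ranks straight to `RenumberAtoms()`, as in the quoted code, does not give a canonical order. Comparing the two raw rank lists fails for the same reason: each list is indexed by its own input order.

```python
from rdkit import Chem


def atom_order(m):
    # Same helper as in the question: (index, atomic number, degree) per atom
    return [(a.GetIdx(), a.GetAtomicNum(), a.GetDegree()) for a in m.GetAtoms()]


def canonical_renumber(mol):
    """Hypothetical helper: return a copy of mol with its atoms in canonical order."""
    # ranks[i] is the canonical rank of the atom whose current index is i
    ranks = list(Chem.CanonicalRankAtoms(mol))
    # RenumberAtoms expects new_order[j] = current index of the atom that
    # should become atom j, i.e. the inverse permutation of ranks
    new_order = sorted(range(mol.GetNumAtoms()), key=lambda i: ranks[i])
    return Chem.RenumberAtoms(mol, new_order)


m = Chem.AddHs(Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C"))
m1 = Chem.AddHs(Chem.MolFromInchi(
    "InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1"))

m_canon = canonical_renumber(m)
m1_canon = canonical_renumber(m1)
print(atom_order(m_canon) == atom_order(m1_canon))
```

Unlike the SMILES round trip, this only permutes the atom indices of the existing molecule, so there is no write/parse cycle (which, with default `MolFromSmiles` settings, would also turn the explicit hydrogens added by `AddHs` back into implicit ones).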
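On question 3, a sketch assuming the standard RDKit Python API (`kekule_canonical_smiles` is a hypothetical helper, not an RDKit function). `Chem.Kekulize(..., clearAromaticFlags=True)` followed by `Chem.MolToSmiles(..., kekuleSmiles=True)` writes ring bonds as explicit single/double bonds with uppercase atoms. Kekulizing the input directly might pick a Kekulé structure that depends on the input atom order (naphthalene has more than one), so the sketch first goes through the canonical aromatic SMILES and kekulizes the re-parsed molecule, which is the same for any input order.

```python
from rdkit import Chem


def kekule_canonical_smiles(mol):
    """Hypothetical helper: canonical SMILES without lowercase aromatic atoms."""
    # 1. Canonical (aromatic) SMILES; canonical=True is the default
    canon = Chem.MolToSmiles(mol)
    # 2. Re-parse it: the new molecule depends only on the canonical string,
    #    not on the atom order of the input
    canon_mol = Chem.MolFromSmiles(canon)
    # 3. Assign alternating single/double bonds and clear the aromatic flags,
    #    so the writer has no aromatic atoms to emit in lowercase
    Chem.Kekulize(canon_mol, clearAromaticFlags=True)
    # 4. Write the Kekule form
    return Chem.MolToSmiles(canon_mol, kekuleSmiles=True)


smiles_mol = Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C")
# Same molecule with the atom order reversed, to mimic a different input order
reversed_mol = Chem.RenumberAtoms(
    smiles_mol, list(range(smiles_mol.GetNumAtoms()))[::-1])

print(kekule_canonical_smiles(smiles_mol))
print(kekule_canonical_smiles(reversed_mol))
```

Both calls should print the same string, since the Kekulé assignment is made on the re-parsed canonical molecule rather than on each input.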