Hi Paolo, Yes, that is very helpful indeed, so thank you!
Perhaps something of this nature could be added to either the cookbook or the docs, as this API is rather unclear?

Cheers,
Jeff

From: Paolo Tosco <paolo.tosco.m...@gmail.com>
Date: Friday, August 14, 2020 at 1:34 PM
To: Jeffrey Van santen <jeffrey_van_san...@sfu.ca>
Cc: "rdkit-discuss@lists.sourceforge.net" <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Atom Order Canonicalization

Hi Jeffrey,

this gist shows how to achieve what you need:

https://gist.github.com/ptosco/36574d7f025a932bc1b8db221903a8d2

i.e., how to reorder atoms based on the result of Chem.CanonicalRankAtoms().

HTH, cheers
p.

On Fri, Aug 14, 2020 at 8:36 PM Jeffrey Van santen <jeffrey_van_san...@sfu.ca> wrote:

Hello all,

I realize that this topic has been discussed in some detail (https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/76909664-2C16-4B61-8BEE-2196B3721EA1%40gmail.com/#msg34923617), but I remain somewhat confused. Let me lay out what I am trying to achieve: I would like a method for creating a canonical order of the atoms in a molecule, independent of the input order.

For example, given (R)-1-(sec-butyl)naphthalene (see attached image), if you start with the SMILES string `CC[C@H](C1=CC=CC2=C1C=CC=C2)C` versus the InChI string `InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1`, you obviously get two different atom orders.
I have tried to apply the `CanonicalRankAtoms` method to each of the molecules, as in the following example code:

```
from rdkit import Chem


def atom_order(m):
    # Summarize each atom as (index, atomic number, degree)
    return [(x.GetIdx(), x.GetAtomicNum(), x.GetDegree()) for x in m.GetAtoms()]


m = Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C")
m = Chem.AddHs(m)
m1 = Chem.MolFromInchi("InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1")
m1 = Chem.AddHs(m1)

# Some simple comparison of atom ordering
atom_order(m) == atom_order(m1)  # returns False

m_order = list(Chem.CanonicalRankAtoms(m))
m1_order = list(Chem.CanonicalRankAtoms(m1))
m_order == m1_order  # returns False

# For completeness
m_ordered = Chem.RenumberAtoms(m, m_order)
m1_ordered = Chem.RenumberAtoms(m1, m1_order)
atom_order(m_ordered) == atom_order(m1_ordered)  # returns False
```

One plausible solution that seems to work is the following extension:

```
m_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m))
m1_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m1))
atom_order(m_canon) == atom_order(m1_canon)  # returns True
```

I believe this works because `MolToSmiles` defaults to `canonical=True`.

I suppose what I would like to know is:

1. Why does `CanonicalRankAtoms` not return the same result for different inputs of the same molecule? I understand that it has something to do with the underlying molecular graph. In particular, in the linked mailing-list discussion Greg says (https://sourceforge.net/p/rdkit/mailman/message/34923647/): "If you just want a canonical ordering of the atoms, there is no reason to generate the SMILES. You can just use Chem.CanonicalRankAtoms()."
2. Is there a better solution than round-tripping from import X format -> export canonical SMILES -> import canonical SMILES -> export canonical mol (mol file or similar)?
3. On a related but tangential question, is there a way to have canonical SMILES without the lowercase aromaticity notation?
Thank you very much,
Jeff van Santen
The Natural Products Atlas (www.npatlas.org)
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
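For anyone finding this thread later, here is a minimal sketch of the reordering idea Paolo points to. It is not a copy of his gist, and the helper names `ranks_to_new_order` and `canonical_renumber` are made up for illustration. The point it tries to show: `Chem.CanonicalRankAtoms()` returns, for each atom index, that atom's rank, whereas `Chem.RenumberAtoms()` expects a list giving, for each new position, the current index of the atom to put there. Passing the ranks straight through therefore applies the wrong permutation; sorting the atom indices by rank first gives the list `RenumberAtoms` wants.

```python
def ranks_to_new_order(ranks):
    """Turn a per-atom rank list into the ordering RenumberAtoms expects.

    ranks[i] is the canonical rank of atom i. The returned list holds, at
    position r, the current index of the atom with the r-th smallest rank.
    (Hypothetical helper, named here for illustration only.)
    """
    return sorted(range(len(ranks)), key=lambda i: ranks[i])


def canonical_renumber(mol):
    """Return a copy of `mol` with its atoms renumbered by canonical rank.

    (Hypothetical helper; assumes RDKit is installed.)
    """
    # Imported here so the pure-Python helper above stays usable without RDKit.
    from rdkit import Chem

    ranks = list(Chem.CanonicalRankAtoms(mol))  # ties are broken by default
    return Chem.RenumberAtoms(mol, ranks_to_new_order(ranks))


# Made-up ranks for a 4-atom molecule: atom 0 has rank 2, atom 1 has rank 0,
# atom 2 has rank 3, atom 3 has rank 1.
example_ranks = [2, 0, 3, 1]
example_order = ranks_to_new_order(example_ranks)
print(example_order)  # [1, 3, 0, 2]
```

With the molecules from the original message, the check to run would be `atom_order(canonical_renumber(m)) == atom_order(canonical_renumber(m1))`. I have not included an expected result, since it depends on RDKit perceiving the SMILES- and InChI-derived molecules identically.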