Hi Ling,

If there are symmetries, then a substructure search like that will only give
you one mapping, and that might not be the canonical mapping.

What you're looking for is the special property _smilesAtomOutputOrder,
which MolToSmiles sets on the molecule:


>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C")
>>> Chem.MolToSmiles(mol)
'COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O'
>>> mol.GetProp("_smilesAtomOutputOrder")
'[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]'

Here are the atom indices of the original SMILES:

         ┌                 1 11  1111 1 1 1 2 2
    atoms│ 0 1 234 56 78 9 0 12  3456 7 8 9 0 1
         └ | | ||| || || | | ||  |||| | | | | |
   SMILES[ O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C


You can see the first atom of the output is a "C". The first entry of
_smilesAtomOutputOrder is 8, so that atom is atom index 8 of the original
molecule, which is the "...C)..." in the original SMILES, and so on for the
remaining entries.
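
If you want the mapping in both directions, the property string can be parsed and inverted. The following is only a sketch in plain Python: the helper name parse_output_order and the variable names are made up for illustration, and the string is copied from the session above rather than read from a molecule. It also assumes that the atoms of the canonical SMILES are numbered in the order in which they appear in the string.

```python
def parse_output_order(prop_value):
    """Parse a '_smilesAtomOutputOrder' string such as '[8,7,6,]' into a list of ints.

    Hypothetical helper; it tolerates the trailing comma seen in the output above.
    """
    return [int(tok) for tok in prop_value.strip("[]").split(",") if tok.strip()]


# Value copied from the session above. In practice it would come from
# mol.GetProp("_smilesAtomOutputOrder") after calling Chem.MolToSmiles(mol).
order_text = "[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]"

# canon_to_input[i] is the input atom index of the i-th atom written
# to the canonical SMILES.
canon_to_input = parse_output_order(order_text)

# Invert it: input_to_canon[j] is the position in the canonical SMILES
# of input atom j.
input_to_canon = [0] * len(canon_to_input)
for canon_pos, input_idx in enumerate(canon_to_input):
    input_to_canon[input_idx] = canon_pos

print(canon_to_input[0])   # input index of the first atom in the output
print(input_to_canon[10])  # output position of the hydroxyl O (input atom 10)
```

With this, an atom you marked by index before calling MolToSmiles can be looked up in input_to_canon to find where it ended up in the canonical SMILES.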


Cheers,


                                Andrew
                                da...@dalkescientific.com


> On Nov 3, 2021, at 00:18, Ling Chan <lingtrek...@gmail.com> wrote:
> 
> O.K. Problem solved. Sorry about the spam, folks.
> 
> I can use GetSubstructMatch, as follows.
> 
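> from rdkit import Chem
> 
> # sinput is the input SMILES
> # scanon is the canonical output SMILES
> 
> minput = Chem.MolFromSmiles(sinput)
> scanon = Chem.MolToSmiles(minput)
> mcanon = Chem.MolFromSmiles(scanon)
> # Each match tuple is ordered by query atom and holds atom indices
> # of the molecule being searched.
> map_forward = minput.GetSubstructMatch(mcanon)
> map_backward = mcanon.GetSubstructMatch(minput)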
> 
> 
> 
> 
> On Tue, Nov 2, 2021 at 3:55 PM, Ling Chan <lingtrek...@gmail.com> wrote:
> Dear colleagues,
> 
> I wonder whether I can obtain a mapping of the atom indices upon 
> canonicalization by MolToSmiles? I am aware that canonicalization (and hence 
> atom reordering) can be suppressed in MolToSmiles, but I do want to 
> canonicalize the output SMILES.
> 
> If you are interested, here are a few more details of my problem. For each 
> molecule, I want to delete one or two side chains and obtain a SMILES of 
> what is left. I also want to know which atoms were bonded to the 
> deleted side chains. I know that suppressing canonicalization would make 
> this work, but I would like to canonicalize the SMILES so that I can detect 
> duplicates.
> 
> I tried marking the atoms. But I believe that properties that get carried 
> over to the output SMILES, e.g. Isotope, affect the canonicalization, while 
> properties that do not affect canonicalization, e.g. IntProp, are lost upon 
> the conversion to SMILES.
> 
> Thank you for your insight.
> 
> Ling
> 



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
