Hi Ling,

If there are symmetries then a substructure search like that will only give you one mapping, and that might not be the canonical mapping.
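To make the symmetry point concrete, here is a minimal sketch, assuming RDKit is installed. Toluene is an arbitrary symmetric example of my own choosing, not something from your data. GetSubstructMatch returns a single mapping, while GetSubstructMatches with uniquify=False lists every symmetry-equivalent one:

```python
from rdkit import Chem

# Toluene: an arbitrary symmetric example, not taken from the thread.
mol = Chem.MolFromSmiles("Cc1ccccc1")
query = Chem.MolFromSmiles(Chem.MolToSmiles(mol))

# GetSubstructMatch returns one mapping: a tuple of atom indices in `mol`,
# ordered by the atom indices of `query`.
one_map = mol.GetSubstructMatch(query)

# uniquify=False keeps mappings that cover the same set of atoms in a
# different order, i.e. the symmetry-equivalent alternatives.
all_maps = mol.GetSubstructMatches(query, uniquify=False)

print(one_map)
print(len(all_maps))
```

For toluene there should be two mappings (the identity and the mirror image across the ring), and I would not rely on the single one from GetSubstructMatch being the one that follows the canonical output order.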
What you're looking for is the special property _smilesAtomOutputOrder:

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C")
>>> Chem.MolToSmiles(mol)
'COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O'
>>> mol.GetProp("_smilesAtomOutputOrder")
'[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]'

Here are the atom indices of the original SMILES:

      ┌                 1 11  1111 1 1 1 2 2
 atoms│ 0 1 234 56 78 9 0 12  3456 7 8 9 0 1
      └ | | ||| || || | | ||  |||| | | | | |
SMILES[ O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C

You can see the first atom of the output is a "C". The first entry in _smilesAtomOutputOrder is 8, which means that output atom is atom 8 of the original molecule, the "...C)..." in the original SMILES. The second entry, 7, is the "O" just before it, and so on.

Cheers,

Andrew
da...@dalkescientific.com

> On Nov 3, 2021, at 00:18, Ling Chan <lingtrek...@gmail.com> wrote:
>
> O.K. Problem solved. Sorry about the spam, folks.
>
> I can use GetSubstructMatch, as follows.
>
> # sinput is the input smiles
> # scanon is the output smiles
>
> minput = Chem.MolFromSmiles(sinput)
> scanon=Chem.MolToSmiles(minput)
> mcanon=Chem.MolFromSmiles(scanon)
> map_forward = minput.GetSubstructMatch(mcanon)
> map_backward = mcanon.GetSubstructMatch(minput)
>
>
> Ling Chan <lingtrek...@gmail.com> wrote on Tue, Nov 2, 2021 at 3:55 PM:
> Dear colleagues,
>
> Just wonder if I can obtain a mapping of the atom indices upon
> canonicalization by MolToSmiles? I am aware that canonicalization (and hence
> atom reordering) can be suppressed in MolToSmiles, but I do want to
> canonicalize the output smiles.
>
> If you are interested, here are a few more details of my problem. For each
> molecule, I want to delete one or two side chains, and obtain a smiles of
> what is left. Just that I want to know which atoms were bonded to the
> deleted side chains. I know, by suppressing canonicalization things will
> work. But I would like to canonicalize the smiles so that I can know if there
> are duplicates.
>
> I tried marking the atoms.
> But I believe that properties that got carried
> over to the output smiles, e.g. Isotope, affect the canonicalization, while
> properties that do not affect canonicalization, e.g. IntProp, are lost upon
> the conversion to smiles.
>
> Thank you for your insight.
>
> Ling
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss