Cool, good to know this special property. Thank you Andrew!

Ling

Andrew Dalke <da...@dalkescientific.com> wrote on Tue, Nov 2, 2021 at 10:36 PM:

> Hi Ling,
>
> If there are symmetries then a substructure search like that will only give
> you one mapping, and that might not be the canonical mapping.
>
> What you're looking for is the special property _smilesAtomOutputOrder:
>
> >>> from rdkit import Chem
> >>> mol = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C")
> >>> Chem.MolToSmiles(mol)
> 'COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O'
> >>> mol.GetProp("_smilesAtomOutputOrder")
> '[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]'
>
> Here are the atom indices of the original SMILES:
>
>       ┌                 1 11  1111 1 1 1 2 2
>  atoms│ 0 1 234 56 78 9 0 12  3456 7 8 9 0 1
>       └ | | ||| || || | | ||  |||| | | | | |
> SMILES[ O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C
>
> You can see the first atom of the output is a "C", which
> _smilesAtomOutputOrder maps to atom index 8 of the input, which is the
> "...C)..." in the original SMILES, etc.
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
>
> On Nov 3, 2021, at 00:18, Ling Chan <lingtrek...@gmail.com> wrote:
> >
> > O.K. Problem solved. Sorry about the spam, folks.
> >
> > I can use GetSubstructMatch, as follows.
> >
> > # sinput is the input smiles
> > # scanon is the output smiles
> >
> > from rdkit import Chem
> >
> > minput = Chem.MolFromSmiles(sinput)
> > scanon = Chem.MolToSmiles(minput)
> > mcanon = Chem.MolFromSmiles(scanon)
> > map_forward = minput.GetSubstructMatch(mcanon)
> > map_backward = mcanon.GetSubstructMatch(minput)
> >
> > Ling Chan <lingtrek...@gmail.com> wrote on Tue, Nov 2, 2021 at 3:55 PM:
> > Dear colleagues,
> >
> > I just wonder if I can obtain a mapping of the atom indices upon
> > canonicalization by MolToSmiles? I am aware that canonicalization (and
> > hence atom reordering) can be suppressed in MolToSmiles, but I do want to
> > canonicalize the output SMILES.
> >
> > If you are interested, here are a few more details of my problem. For
> > each molecule, I want to delete one or two side chains and obtain a SMILES
> > of what is left. I just want to know which atoms were bonded to the
> > deleted side chains. I know that suppressing canonicalization would make
> > this work, but I would like to canonicalize the SMILES so that I can tell
> > whether there are duplicates.
> >
> > I tried marking the atoms. But I believe that properties that get
> > carried over to the output SMILES, e.g. Isotope, affect the
> > canonicalization, while properties that do not affect canonicalization,
> > e.g. IntProp, are lost upon the conversion to SMILES.
> >
> > Thank you for your insight.
> >
> > Ling
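
For anyone reading this thread later, here is a minimal sketch of turning the _smilesAtomOutputOrder string into forward and backward index maps. The helper names (parse_output_order, invert_order, canonical_smiles_and_order) are made up for illustration and are not part of RDKit. The sketch assumes that entry k of the property is the input atom index written at position k of the output SMILES, as in Andrew's example, and it uses the property string quoted above. The string ends with a trailing comma, so it is split by hand rather than parsed as JSON.

```python
from typing import List, Tuple


def parse_output_order(prop: str) -> List[int]:
    """Parse a string like '[8,7,6,]' into [8, 7, 6].

    The trailing comma leaves an empty token after splitting, which is skipped.
    """
    tokens = prop.strip().strip("[]").split(",")
    return [int(tok) for tok in tokens if tok.strip()]


def invert_order(output_to_input: List[int]) -> List[int]:
    """Invert the permutation: result[input_idx] = position in the output SMILES."""
    input_to_output = [-1] * len(output_to_input)
    for out_idx, in_idx in enumerate(output_to_input):
        input_to_output[in_idx] = out_idx
    return input_to_output


def canonical_smiles_and_order(smiles: str) -> Tuple[str, List[int]]:
    """Hypothetical convenience wrapper around the calls shown in the thread.

    RDKit is imported here so the pure-Python helpers above work without it.
    """
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    canonical = Chem.MolToSmiles(mol)  # writing the SMILES sets the property
    return canonical, parse_output_order(mol.GetProp("_smilesAtomOutputOrder"))


# Property string copied from the example in this thread.
ORDER_STRING = "[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]"

output_to_input = parse_output_order(ORDER_STRING)
input_to_output = invert_order(output_to_input)

print(output_to_input[0])   # 8: the first output atom is input atom 8
print(input_to_output[0])   # 8: input atom 0 (the carbonyl O) is output atom 8
print(input_to_output[10])  # 21: input atom 10 (the phenol O) is written last
```

For the side-chain use case, input_to_output[i] would give the position in the canonical SMILES of an attachment atom i noted before writing. This relies on the molecule re-parsed from that SMILES numbering its atoms in the order they appear in the string.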
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss