Cool, good to know this special property. Thank you Andrew!
Ling

Andrew Dalke <da...@dalkescientific.com> wrote on Tue, Nov 2, 2021 at 10:36 PM:

> Hi Ling,
>
>   If there are symmetries then a substructure search like that will only
> give you one mapping, and that might not be the canonical mapping.
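To illustrate the symmetry point, here is a minimal sketch that matches a molecule against itself. It assumes RDKit is installed and reuses the SMILES from this thread; the variable names are only illustrative. GetSubstructMatch returns a single mapping, while GetSubstructMatches with uniquify=False lists every atom-to-atom mapping, and there is more than one here because the two methyl groups of the terminal isopropyl are interchangeable.

```python
from rdkit import Chem

smiles = "O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C"
mol = Chem.MolFromSmiles(smiles)
query = Chem.MolFromSmiles(smiles)

# A single mapping: one tuple of atom indices in mol, ordered by query atom.
one_map = mol.GetSubstructMatch(query)

# Every mapping, including those that differ only by a symmetry operation.
all_maps = mol.GetSubstructMatches(query, uniquify=False)

print(one_map)
for mapping in all_maps:
    print(mapping)
```

Which of these mappings GetSubstructMatch returns is not tied to the canonical atom order, so it should not be treated as the canonicalization mapping.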
>
> What you're looking for is the special property _smilesAtomOutputOrder
>
>
> >>> from rdkit import Chem
> >>> mol = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C")
> >>> Chem.MolToSmiles(mol)
> 'COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O'
> >>> mol.GetProp("_smilesAtomOutputOrder")
> '[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]'
>
> Here are the atom indices of the original SMILES:
>
>          ┌                 1 11  1111 1 1 1 2 2
>     atoms│ 0 1 234 56 78 9 0 12  3456 7 8 9 0 1
>          └ | | ||| || || | | ||  |||| | | | | |
>    SMILES[ O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C
>
>
> You can see the first atom of the output is a "C". The first entry of
> _smilesAtomOutputOrder is 8, so that "C" is atom 8 of the input, which is
> the "...C)..." in the original SMILES, etc.
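A minimal sketch of turning that property string into index maps, in plain Python. The string is copied from the session above; parse_output_order and the variable names are hypothetical helpers, not RDKit functions. It assumes the property has the '[i,j,k,]' form shown here, and that a molecule re-parsed from the output SMILES numbers its atoms in the order they appear in that SMILES.

```python
def parse_output_order(prop):
    """Turn a string such as '[8,7,6,]' into a list of ints."""
    return [int(tok) for tok in prop.strip("[]").split(",") if tok.strip()]


# Value of mol.GetProp("_smilesAtomOutputOrder") from the session above.
prop = "[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]"

# output_order[new] is the input atom index written at position new.
output_order = parse_output_order(prop)

# new_index_of[old] is the position of input atom old in the output SMILES.
new_index_of = {old: new for new, old in enumerate(output_order)}

print(output_order[0])   # input atom written first in the output
print(new_index_of[0])   # output position of input atom 0 (the carbonyl O)
```

With new_index_of in hand, any list of input atom indices (for example the atoms that were bonded to a deleted side chain) can be translated to positions in the canonical SMILES.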
>
>
> Cheers,
>
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
> > On Nov 3, 2021, at 00:18, Ling Chan <lingtrek...@gmail.com> wrote:
> >
> > O.K. Problem solved. Sorry about the spam, folks.
> >
> > I can use GetSubstructMatch, as follows.
> >
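> > from rdkit import Chem
> >
> > # sinput is the input smiles
> > # scanon is the output (canonical) smiles
> >
> > minput = Chem.MolFromSmiles(sinput)
> > scanon = Chem.MolToSmiles(minput)
> > mcanon = Chem.MolFromSmiles(scanon)
> > # map_forward[i]: index in minput of the atom matching atom i of mcanon
> > map_forward = minput.GetSubstructMatch(mcanon)
> > # map_backward[j]: index in mcanon of the atom matching atom j of minput
> > map_backward = mcanon.GetSubstructMatch(minput)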
> >
> >
> >
> >
> > Ling Chan <lingtrek...@gmail.com> wrote on Tue, Nov 2, 2021 at 3:55 PM:
> > Dear colleagues,
> >
> > I am wondering whether I can obtain a mapping of the atom indices upon
> > canonicalization by MolToSmiles. I am aware that canonicalization (and
> > hence atom reordering) can be suppressed in MolToSmiles, but I do want
> > to canonicalize the output SMILES.
> >
> > If you are interested, here are a few more details of my problem. For
> > each molecule, I want to delete one or two side chains and obtain a
> > SMILES of what is left. I also want to know which atoms were bonded to
> > the deleted side chains. I know that suppressing canonicalization would
> > make this work, but I would like to canonicalize the SMILES so that I
> > can detect duplicates.
> >
> > I tried marking the atoms. But I believe that properties that are
> > carried over to the output SMILES, e.g. Isotope, affect the
> > canonicalization, while properties that do not affect canonicalization,
> > e.g. IntProp, are lost upon conversion to SMILES.
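The two kinds of marking described above can be compared with a small sketch. It assumes RDKit is installed; ethanol is just a toy molecule and "mark" is a made-up property name. The isotope label is written into the SMILES, so the string no longer matches the unlabeled one and duplicate detection by string comparison breaks, whereas the integer property leaves the SMILES unchanged and simply does not appear in it.

```python
from rdkit import Chem

plain = Chem.MolFromSmiles("OCC")
plain_smiles = Chem.MolToSmiles(plain)

# Mark atom 2 with an isotope label: the label is part of the output SMILES.
iso = Chem.MolFromSmiles("OCC")
iso.GetAtomWithIdx(2).SetIsotope(13)
iso_smiles = Chem.MolToSmiles(iso)

# Mark atom 2 with an integer property ("mark" is a hypothetical name):
# the property is not written to the SMILES at all.
tagged = Chem.MolFromSmiles("OCC")
tagged.GetAtomWithIdx(2).SetIntProp("mark", 1)
tagged_smiles = Chem.MolToSmiles(tagged)

print(plain_smiles, iso_smiles, tagged_smiles)
```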
> >
> > Thank you for your insight.
> >
> > Ling
> >
>
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
