Hi Markus,

That's a documentation bug, and it also points to a possibly useful new feature.
Thanks for pointing it out.

The current behavior, by design, is that ReplaceSubstructs removes the
atoms that match the pattern, adds the atoms from the replacement molecule,
and then bonds the first atom of the replacement molecule to the remaining
neighbors of the first matched atom in the original molecule. Bonds from
the other matched atoms are not restored.
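
To illustrate the "first atom in the replacement molecule" part, here is a minimal sketch. It assumes the replacementConnectionPoint keyword of ReplaceSubstructs (default 0) picks which replacement atom receives the restored bonds; the n_degrees helper is purely illustrative and not part of the RDKit.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

repl = Chem.MolFromSmiles('NC')
patt = Chem.MolFromSmarts('OC')
m = Chem.MolFromSmiles('ClCCOC')


def n_degrees(mols):
    """Degree of the nitrogen atom in each result (illustrative helper)."""
    return [a.GetDegree() for mol in mols for a in mol.GetAtoms()
            if a.GetSymbol() == 'N']


# Default: restored bonds are attached to atom 0 of the replacement (the N).
default_rms = AllChem.ReplaceSubstructs(m, patt, repl)

# Assumption: with replacementConnectionPoint=1 they go to the C instead,
# so the N keeps only its bond to the replacement's own carbon.
alt_rms = AllChem.ReplaceSubstructs(m, patt, repl,
                                    replacementConnectionPoint=1)

print('N degree, default connection point:', n_degrees(default_rms))
print('N degree, connection point 1:     ', n_degrees(alt_rms))
```

Either way, only one atom of the replacement is reconnected, which is why the choice of matched atom matters.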

I'll make the example from the docs asymmetrical so that it's a bit easier
to see what's going on:

In [44]: repl = Chem.MolFromSmiles('NC')
    ...: patt = Chem.MolFromSmarts('OC')
    ...: m = Chem.MolFromSmiles('ClCCOC')
    ...: rms = AllChem.ReplaceSubstructs(m,patt,repl)

In [45]: Chem.MolToSmiles(rms[0])
Out[45]: 'CCl.CNC'

In [46]: Chem.MolToSmiles(rms[1])
Out[46]: 'CNCCCl'

The first result corresponds to the pattern matching atoms 3 and 2
(numbered from zero) in m. Since only the bonds from the first matching
atom (the O, atom 3) are restored, the bond between the pattern's C (atom 2)
and atom 1 of the molecule is not recreated, and we get disconnected
fragments.

The second result corresponds to the pattern matching atoms 3 and 4. Here
the bonds from the O are restored; this connects the N to atom 2 and we end
up with a single molecule.
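
If you want to see the two matches for yourself, something along these lines should work. It's a sketch using GetSubstructMatches and GetMolFrags; I'm not assuming anything about the order in which matches and results come back, only that there is one result per match.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

repl = Chem.MolFromSmiles('NC')
patt = Chem.MolFromSmarts('OC')
m = Chem.MolFromSmiles('ClCCOC')

# Each match is a tuple of atom indices in m, ordered like the pattern atoms,
# so the first index is the atom matched by the pattern's O.
matches = m.GetSubstructMatches(patt)
rms = AllChem.ReplaceSubstructs(m, patt, repl)

for match in matches:
    print('pattern atoms (O, C) match molecule atoms', match)

# Count the disconnected fragments in each replacement result.
for rm in rms:
    n_frags = len(Chem.GetMolFrags(rm))
    print(Chem.MolToSmiles(rm), '->', n_frags, 'fragment(s)')
```

The match whose C sits in the middle of the chain is the one that ends up as two fragments.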

It would be possible to add an option that changes the behavior so that
the bonds from all matched atoms in the molecule are restored, but in the
meantime the documentation definitely should be corrected.
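
In the meantime, one way to keep everything connected is to do the swap with a reaction instead. To be clear, this is a different tool and not the option described above. It's a rough sketch, assuming a reaction SMARTS that maps the O and C and only changes the element of the mapped O; the SMARTS itself is my illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

m = Chem.MolFromSmiles('ClCCOC')

# Mapped atoms keep their bonds to the rest of the molecule; only the
# element of atom :1 changes from O to N.
rxn = AllChem.ReactionFromSmarts('[O:1]-[C:2]>>[N:1]-[C:2]')

smiles = set()
for products in rxn.RunReactants((m,)):
    prod = products[0]
    Chem.SanitizeMol(prod)  # reaction products come back unsanitized
    smiles.add(Chem.MolToSmiles(prod))

print(smiles)
```

Because no atoms are removed here, both matches give the same connected product, so collecting the SMILES in a set removes the duplicate.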

Best,
-greg





On Fri, Mar 6, 2020 at 2:28 AM Markus Metz <metm...@gmail.com> wrote:

> Hello:
> I am puzzled by the output from ReplaceSubstructs as it can produce two
> fragments.
> So I went and tried the examples in the manual and I observed this:
> The example in the intro manual with the recursive smarts pattern works as
> expected.
> repl = Chem.MolFromSmiles('OC')
> patt = Chem.MolFromSmarts('[$(NC(=O))]')
> m = Chem.MolFromSmiles('CC(=O)N')
> rms = AllChem.ReplaceSubstructs(m,patt,repl)
> One expected product is formed.
>
> The example from the ReplaceSubstructs manual produces two solutions.
> I have used the following commands:
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> #ReplaceSubstructs(‘CCOC’,’OC’,’NC’) -> (‘CCNC’,)
> repl = Chem.MolFromSmiles('NC')
> patt = Chem.MolFromSmarts('OC')
> m = Chem.MolFromSmiles('CCOC')
> rms = AllChem.ReplaceSubstructs(m,patt,repl)
>
> for rm in rms:
>    print(Chem.MolToSmiles(rm))
>
> output is: C.CNC and CCNC
>
> Why is the first result produced?
> I checked the mailing list and could find this older thread
> https://sourceforge.net/p/rdkit/mailman/message/28777648/.
> But the answer to the related question is missing.
>
> rdkit version is 2020.03.1dev1
>
> Best wishes,
> Markus
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>