Hi Greg,
 
Thanks for your answer!
 
So my guess was correct that the atom mapping is not used in substructure search matching. Are there any plans to add that feature in the (near) future? The same question applies to the make/break bonds feature I mentioned in my earlier email.
 
The users of my search engine would need at least one of these features so that they can narrow their search by defining a center where the actual reaction takes place.
The move_unmmapped_reactants_to_agents is definitely not what I want in my search. So I'll simply disable it. Thanks for pointing me to the correct syntax to do so.
 
Best regards
Sebastian
 
Sent: Monday, 14 May 2018 at 08:58
From: "Greg Landrum" <greg.land...@gmail.com>
To: "Sebastian Wandernoth" <s_wandern...@gmx.de>
Cc: "RDKit Discuss" <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] atom mapping in reaction searches
Hi Sebastian,
 
The reason this had me confused is that information about atom mapping is not taken into account when doing substructure search matching. This is true both of molecules and reactions.
I also tried to reproduce your examples in Python and failed completely.
 
Your question about threshold_unmapped_reactant_atoms is what gave me the clue. 
 
Here's what's going on:
By default the cartridge code does an extra step after reading a reaction from SMILES/SMARTS: it looks at all the reactants and moves any that don't have a sufficient fraction of mapped atoms to the agents. We do this by default because the reactions that we found "in the wild" often have agents, solvents, etc. mixed in with the reactants. The key parameter used there is threshold_unmapped_reactant_atoms, which defaults to 0.2.
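
To make the arithmetic concrete, here is a minimal Python sketch of the rule described above. It is not the cartridge's code: the Reactant class and is_moved_to_agents function are hypothetical names, the atom counts are entered by hand, and I am assuming that the fraction is mapped atoms over heavy atoms and that "not sufficient" means strictly below the threshold. It only models reactants in a reaction that carries some atom mapping.

```python
# Sketch only: models the rule described in the text, not the cartridge's code.
from dataclasses import dataclass


@dataclass
class Reactant:
    smarts: str
    heavy_atoms: int   # counted by hand for this illustration
    mapped_atoms: int  # atoms carrying a :n map number


def is_moved_to_agents(reactant: Reactant, threshold: float = 0.2) -> bool:
    """Return True if the mapped fraction falls below the threshold.

    Assumptions: the denominator is the heavy-atom count, and the
    comparison is strict (a fraction equal to the threshold is kept).
    """
    if reactant.heavy_atoms <= 0:
        raise ValueError("reactant must have at least one heavy atom")
    return reactant.mapped_atoms / reactant.heavy_atoms < threshold


# The thiazole reactant has 6 heavy atoms: S, C, Cl, N, C, C.
one_mapped = Reactant("[S:1]1C(Cl)=NC=C1", heavy_atoms=6, mapped_atoms=1)
two_mapped = Reactant("[S:1]1C(Cl)=NC=[CH:2]1", heavy_atoms=6, mapped_atoms=2)

print(is_moved_to_agents(one_mapped))  # 1/6 is about 0.17, below 0.2
print(is_moved_to_agents(two_mapped))  # 2/6 is about 0.33, not below 0.2
```

Under these assumptions, a query reactant with one mapped atom out of six ends up among the agents, while one with two mapped atoms stays a reactant. That would fit the pattern of single-atom mappings returning FALSE and multi-atom mappings returning TRUE.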
 
Here you can see the difference for your example:
 
chembl_23=> set rdkit.move_unmmapped_reactants_to_agents=false;
SET
chembl_23=> SELECT reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1') @> reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
 ?column?
----------
 t
(1 row)
 
 
chembl_23=> set rdkit.move_unmmapped_reactants_to_agents=true;
SET
chembl_23=> SELECT reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1') @> reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
 ?column?
----------
 f
(1 row)
 
 
You can see in detail what's happening here:
chembl_23=> set rdkit.move_unmmapped_reactants_to_agents=true;
SET
chembl_23=> select reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
                reaction_from_smarts
----------------------------------------------------
 >ClC1=NC=C[S:1]1>O=S1(=O)N=C(C2=CN=C(Cl)[S:1]2)CO1
(1 row)
 
 
chembl_23=> set rdkit.move_unmmapped_reactants_to_agents=false;
SET
chembl_23=> select reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
                reaction_from_smarts
----------------------------------------------------
 ClC1=NC=C[S:1]1>>O=S1(=O)N=C(C2=CN=C(Cl)[S:1]2)CO1
(1 row)
 
 
I hope this helps,
-greg
 
 
 
On Tue, Apr 24, 2018 at 9:34 AM Sebastian Wandernoth <s_wandern...@gmx.de> wrote:
Hey guys,
 
I'm still working on my search engine for reactions and I'm a bit puzzled as to what RDKit does with atom mapping information.
I'm still working with the PostgreSQL cartridge version 0.73.0, which should correspond to the release 2017.9.3.
 
I'm starting off with this example reaction which is fully mapped ([S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1)
 
If I use a completely unmapped reaction as the query, I expect to find this one. So the following should return TRUE:
 
SELECT reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1') @> reaction_from_smarts('S1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2SC(Cl)=NC=2)=N1');
 
... which it does
 
 
Next step is to map one atom correctly in the query and try again. I still expect this to return TRUE:
 
SELECT reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1') @> reaction_from_smarts('[S:1]1C(Cl)=NC=C1>>S(=O)(=O)1OCC(C2[S:1]C(Cl)=NC=2)=N1');
 
... which it doesn't
 
 
With two atoms mapped correctly in the query, I wouldn't expect to get different results from the previous try:
 
SELECT reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1') @> reaction_from_smarts('[S:1]1C(Cl)=NC=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1]C(Cl)=NC=2)=N1');
 
... this one however returns TRUE again
 
 
My final try was to include a wrong mapping in the query. I would definitely expect to get back FALSE here (I'm mapping one sulfur atom to a carbon atom and a nitrogen to an oxygen):
 
SELECT reaction_from_smarts('[S:1]1[C:3]([Cl:4])=[N:5][CH:6]=[CH:2]1>>S(=O)(=O)1OCC([C:2]2[S:1][C:3]([Cl:4])=[N:5][CH:6]=2)=N1') @> reaction_from_smarts('[S:1]1C(Cl)=[N:2]C=C1>>S(=[O:2])(=O)1O[CH2:1]C(C2SC(Cl)=NC=2)=N1');
 
... however this returns TRUE yet again.
 
Playing around with it a bit more, I found that whatever single atom I map in the query, I always get back FALSE, and if I map more than one atom, I always get back TRUE...
Does this have something to do with the parameter 'rdkit.threshold_unmapped_reactant_atoms'? My suspicion is that RDKit only counts how many atoms are mapped and does not compare them to the correct mapping. Can you confirm this?
Is there any way at all to include atom mapping in the query to filter the reactions the way I want to?
 
 
I hope you guys can help me here. Sorry for the lengthy question, but I wanted to include as much information as possible for you to pinpoint the issue.
 
Best regards
Sebastian
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss