Hi Andrey,
On Thu, Nov 30, 2017 at 1:17 AM, Andrey <pti...@ua.fm> wrote:
> Dear RDKit community,
>
> I'm setting up a chemical search engine based on RDKit, and I have a
> question about how explicit hydrogens are taken into account.
> I'm using Ketcher and Marvin JS as molecular editors to draw structure
> queries for searching among ~100K compounds.
>
> Here are two example search queries:
>
> 1. C1=CC=NC(N)=C1
> 2. C1=CC=NC(N([H])[H])=C1
>
> Both queries are the same molecule (pyridin-2-amine), but query#2 has two
> explicitly indicated hydrogens in the NH2 group.
>
> In both cases, when I do substructure search I get the same list of
> compounds with substituted NH2 group, which is OK for query#1, but for
> query#2 the NH2 substitution should be avoided.
> It seems that the system (RDKit?) is not sensitive to explicitly indicated
> hydrogens which makes the substructure search not efficient enough for my
> needs.
>
The RDKit has a function called MergeQueryHs() that's intended to help out
in cases like this. Here's a quick demo of how this works:

Start by building the query molecules (removeHs=False keeps the explicit
Hs as atoms in the parsed molecule):
In [4]: from rdkit import Chem
In [5]: params = Chem.SmilesParserParams()
In [6]: params.removeHs=False
In [8]: p1 = Chem.MolFromSmiles('C1=CC=NC(N)=C1',params)
In [9]: p2 = Chem.MolFromSmiles('C1=CC=NC(N([H])[H])=C1',params)
In [10]: p3 = Chem.MergeQueryHs(p2)
Here are the two test molecules:
In [11]: m1 = Chem.MolFromSmiles('C1=CC=NC(N)=C1')
In [12]: m2 = Chem.MolFromSmiles('C1=CC=NC(N(C))=C1')
And the results:
In [13]: m1.HasSubstructMatch(p1)
Out[13]: True
In [14]: m1.HasSubstructMatch(p2)
Out[14]: False
In [15]: m1.HasSubstructMatch(p3)
Out[15]: True
In [16]: m2.HasSubstructMatch(p1)
Out[16]: True
In [17]: m2.HasSubstructMatch(p2)
Out[17]: False
In [18]: m2.HasSubstructMatch(p3)
Out[18]: False
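
If it's useful, the steps above could be wrapped up roughly like this for a
search backend. This is only a sketch: the helper name query_from_smiles and
the small targets dictionary are made up for illustration, and it assumes the
queries arrive from the editor as SMILES.

```python
from rdkit import Chem


def query_from_smiles(smiles):
    """Parse a SMILES query, keeping explicit Hs and merging them into
    H-count constraints on their heavy-atom neighbors.

    Hypothetical helper for illustration; not part of the RDKit API.
    """
    params = Chem.SmilesParserParams()
    params.removeHs = False  # keep [H] atoms so MergeQueryHs can see them
    mol = Chem.MolFromSmiles(smiles, params)
    if mol is None:
        raise ValueError("could not parse query SMILES: %r" % smiles)
    # Explicit Hs are removed and replaced by queries on the attached atom
    return Chem.MergeQueryHs(mol)


# Illustrative targets: the free amine and its N-methyl derivative
targets = {
    "pyridin-2-amine": Chem.MolFromSmiles("C1=CC=NC(N)=C1"),
    "N-methylpyridin-2-amine": Chem.MolFromSmiles("C1=CC=NC(N(C))=C1"),
}

# Query #1: no explicit Hs, so substitution on the NH2 is allowed
open_query = query_from_smiles("C1=CC=NC(N)=C1")
# Query #2: explicit Hs on the nitrogen, so both Hs must be present
strict_query = query_from_smiles("C1=CC=NC(N([H])[H])=C1")

open_hits = {name: m.HasSubstructMatch(open_query) for name, m in targets.items()}
strict_hits = {name: m.HasSubstructMatch(strict_query) for name, m in targets.items()}
```

Here open_hits should behave like p1 above and strict_hits like p3, so only
the strict query rejects the N-methyl compound.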
You may also find this blog post and the links therein helpful:
http://rdkit.blogspot.co.uk/2016/07/tuning-substructure-queries-ii.html
I hope this helps,
-greg
> I'm new to RDKit and I'd very much appreciate any thoughts on how this
> problem could be solved. Are there any settings in RDKit related to this?
>
> Thank you in advance,
>
> Andrew
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>