That's awesome, many thanks for your help Greg. The blog article is great too. I'll try this out and let you know if there's any success.
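A minimal sketch of how the MergeQueryHs() approach from the quoted demo below could be wrapped for a search over a list of compounds. The helper names `build_query` and `search_library` and the two-entry in-memory `library` are hypothetical and only for illustration; a real setup over ~100K compounds would more likely use a database cartridge or precomputed molecules.

```python
from rdkit import Chem


def build_query(smiles):
    """Parse a drawn query, keeping explicit Hs, then merge them into query features."""
    params = Chem.SmilesParserParams()
    params.removeHs = False  # keep the [H] atoms written out by the editor
    query = Chem.MolFromSmiles(smiles, params)
    if query is None:
        raise ValueError(f"could not parse query SMILES: {smiles!r}")
    # MergeQueryHs returns a new molecule in which explicit H atoms are
    # folded into hydrogen-count queries on their heavy-atom neighbours.
    return Chem.MergeQueryHs(query)


def search_library(query_smiles, library_smiles):
    """Return the library SMILES that contain the query as a substructure."""
    query = build_query(query_smiles)
    hits = []
    for smi in library_smiles:
        mol = Chem.MolFromSmiles(smi)
        if mol is not None and mol.HasSubstructMatch(query):
            hits.append(smi)
    return hits


# Hypothetical two-compound library: the same test molecules as in the demo.
library = ['C1=CC=NC(N)=C1', 'C1=CC=NC(N(C))=C1']
print(search_library('C1=CC=NC(N)=C1', library))
print(search_library('C1=CC=NC(N([H])[H])=C1', library))
```

Going by the results in the demo, query #1 should return both compounds and query #2 only the unsubstituted pyridin-2-amine.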
Kind regards,
Andrew

30.11.2017 08:15, Greg Landrum <greg.land...@gmail.com>
> Hi Andrey,
>
> On Thu, Nov 30, 2017 at 1:17 AM, Andrey <pti...@ua.fm> wrote:
>
> > Dear RDKit community,
> >
> > I'm setting up a chemical search engine based on RDKit, and I have a
> > question about accounting for explicit hydrogens.
> > I'm using Ketcher and Marvin JS as molecular editors to draw structure
> > queries for searching among ~100K compounds.
> >
> > Here are two example search queries:
> >
> > 1. C1=CC=NC(N)=C1
> > 2. C1=CC=NC(N([H])[H])=C1
> >
> > Both queries are the same molecule (pyridin-2-amine), but query #2 has two
> > explicitly indicated hydrogens in the NH2 group.
> >
> > In both cases, when I do a substructure search I get the same list of
> > compounds with a substituted NH2 group, which is OK for query #1, but for
> > query #2 the NH2 substitution should be avoided.
> > It seems that the system (RDKit?) is not sensitive to explicitly indicated
> > hydrogens, which makes the substructure search not efficient enough for my
> > needs.
> >
>
> The RDKit has a function called MergeQueryHs() that's intended to help out
> in cases like this.
> Here's a quick demo of how this works:
>
> Start by building the query molecules:
>
> In [1]: from rdkit import Chem
>
> In [5]: params = Chem.SmilesParserParams()
>
> In [6]: params.removeHs=False
>
> In [8]: p1 = Chem.MolFromSmiles('C1=CC=NC(N)=C1',params)
>
> In [9]: p2 = Chem.MolFromSmiles('C1=CC=NC(N([H])[H])=C1',params)
>
> In [10]: p3 = Chem.MergeQueryHs(p2)
>
> Here are the two test molecules:
>
> In [11]: m1 = Chem.MolFromSmiles('C1=CC=NC(N)=C1')
>
> In [12]: m2 = Chem.MolFromSmiles('C1=CC=NC(N(C))=C1')
>
> And the results:
>
> In [13]: m1.HasSubstructMatch(p1)
> Out[13]: True
>
> In [14]: m1.HasSubstructMatch(p2)
> Out[14]: False
>
> In [15]: m1.HasSubstructMatch(p3)
> Out[15]: True
>
> In [16]: m2.HasSubstructMatch(p1)
> Out[16]: True
>
> In [17]: m2.HasSubstructMatch(p2)
> Out[17]: False
>
> In [18]: m2.HasSubstructMatch(p3)
> Out[18]: False
>
> You may also find this blog post and the links therein helpful:
> http://rdkit.blogspot.co.uk/2016/07/tuning-substructure-queries-ii.html
>
> I hope this helps,
> -greg
>
> > I'm new to RDKit and I'd very much appreciate any thoughts on how this
> > problem could be solved. Are there any settings in RDKit related to this?
> >
> > Thank you in advance,
> >
> > Andrew

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss