Hi Greg,

Does this depend on the removeHs option? I mean, to make MergeQueryHs() work, should I first set removeHs=False for all compounds in my database, to preserve the implicit/explicit hydrogens in their structures?
Thank you!
Andrew

30.11.2017 22:26, Andrey <pti...@ua.fm>
> That's awesome, many thanks for your help Greg. The blog article is great too.
> I'll try this out and let you know if there's any success.
>
> Kind regards,
>
> Andrew
>
>
> 30.11.2017 08:15, Greg Landrum <greg.land...@gmail.com>
> > Hi Andrey,
> >
> > On Thu, Nov 30, 2017 at 1:17 AM, Andrey <pti...@ua.fm> wrote:
> >
> > > Dear RDKit community,
> > >
> > > I'm setting up a chemical search engine based on RDKit, and I have a
> > > question about accounting for explicit hydrogens.
> > > I'm using Ketcher and Marvin JS as molecular editors to draw structure
> > > queries for searching among ~100K compounds.
> > >
> > > Here are two example search queries:
> > >
> > > 1. C1=CC=NC(N)=C1
> > > 2. C1=CC=NC(N([H])[H])=C1
> > >
> > > Both queries are the same molecule (pyridin-2-amine), but query #2 has two
> > > explicitly indicated hydrogens in the NH2 group.
> > >
> > > In both cases, when I do a substructure search I get the same list of
> > > compounds with a substituted NH2 group, which is OK for query #1, but for
> > > query #2 hits with a substituted NH2 group should be excluded.
> > > It seems that the system (RDKit?) is not sensitive to explicitly indicated
> > > hydrogens, which makes the substructure search not efficient enough for my
> > > needs.
> > >
> >
> > The RDKit has a function called MergeQueryHs() that's intended to help out
> > in cases like this.
> > Here's a quick demo of how this works:
> >
> > Start by building the query molecules:
> >
> > In [5]: params = Chem.SmilesParserParams()
> >
> > In [6]: params.removeHs=False
> >
> > In [8]: p1 = Chem.MolFromSmiles('C1=CC=NC(N)=C1',params)
> >
> > In [9]: p2 = Chem.MolFromSmiles('C1=CC=NC(N([H])[H])=C1',params)
> >
> > In [10]: p3 = Chem.MergeQueryHs(p2)
> >
> >
> > Here are the two test molecules:
> >
> > In [11]: m1 = Chem.MolFromSmiles('C1=CC=NC(N)=C1')
> >
> > In [12]: m2 = Chem.MolFromSmiles('C1=CC=NC(N(C))=C1')
> >
> >
> > And the results:
> >
> > In [13]: m1.HasSubstructMatch(p1)
> > Out[13]: True
> >
> > In [14]: m1.HasSubstructMatch(p2)
> > Out[14]: False
> >
> > In [15]: m1.HasSubstructMatch(p3)
> > Out[15]: True
> >
> > In [16]: m2.HasSubstructMatch(p1)
> > Out[16]: True
> >
> > In [17]: m2.HasSubstructMatch(p2)
> > Out[17]: False
> >
> > In [18]: m2.HasSubstructMatch(p3)
> > Out[18]: False
> >
> > You may also find this blog post and the links therein helpful:
> > http://rdkit.blogspot.co.uk/2016/07/tuning-substructure-queries-ii.html
> >
> > I hope this helps,
> > -greg
> >
> >
> > > I'm new to RDKit and I'd very much appreciate any thoughts on how this
> > > problem could be solved. Are there any settings in RDKit related to this?
> > >
> > > Thank you in advance,
> > >
> > > Andrew

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
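
A minimal sketch of the workflow from Greg's demo, written as a script rather than an IPython session. It follows the demo in one respect that bears on the question at the top of the thread: only the drawn query is parsed with removeHs=False and passed through MergeQueryHs(), while the target molecules are parsed with the default SMILES settings. The helper build_query() and the names database and hits are hypothetical, used here for illustration only; they are not part of RDKit.

```python
from rdkit import Chem


def build_query(smiles):
    """Hypothetical helper: parse a drawn query while keeping its explicit
    hydrogens, then merge those hydrogens into their neighbouring atoms
    as queries with MergeQueryHs()."""
    params = Chem.SmilesParserParams()
    params.removeHs = False  # keep the [H] atoms written in the query
    query = Chem.MolFromSmiles(smiles, params)
    if query is None:
        raise ValueError("could not parse query SMILES: %s" % smiles)
    return Chem.MergeQueryHs(query)


# Query #2 from the thread: pyridin-2-amine with both N-H drawn explicitly.
query = build_query('C1=CC=NC(N([H])[H])=C1')

# Stand-in for the compound database (assumption: two SMILES strings taken
# from the demo). The targets are parsed with default settings, as m1 and
# m2 are in the demo.
database = {
    'pyridin-2-amine': 'C1=CC=NC(N)=C1',
    'N-methylpyridin-2-amine': 'C1=CC=NC(N(C))=C1',
}

hits = {
    name: Chem.MolFromSmiles(smiles).HasSubstructMatch(query)
    for name, smiles in database.items()
}

for name, matched in hits.items():
    print(name, matched)
```

This mirrors In [15] and In [18] of the demo, where the unsubstituted amine matches the merged query and the N-methyl compound does not.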