Hi Greg,

Does this depend on the removeHs() function? I mean, to make MergeQueryHs() work,
should I first set removeHs=False for all compounds in my database, to preserve
the implicit/explicit hydrogens in their structures?

Thank you!

Andrew
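A minimal sketch of that setup follows. It assumes a recent RDKit. A hypothetical three-compound list stands in for the real ~100K-compound database, and DATABASE_SMILES, make_query() and search() are illustrative names, not RDKit API. Greg's demo parses m1 and m2 with default settings. Following that, only the query keeps its [H] atoms (removeHs=False) before MergeQueryHs(), and the stored compounds are parsed normally.

```python
from rdkit import Chem

# Hypothetical stand-in for the compound database, parsed with default
# settings (explicit [H] atoms, if any, are removed on parsing).
DATABASE_SMILES = [
    "Nc1ccccn1",      # pyridin-2-amine (free NH2)
    "CNc1ccccn1",     # N-methylpyridin-2-amine (one H replaced)
    "CN(C)c1ccccn1",  # N,N-dimethylpyridin-2-amine (both Hs replaced)
]
database = [Chem.MolFromSmiles(smi) for smi in DATABASE_SMILES]


def make_query(smiles):
    """Hypothetical helper: keep drawn [H] atoms, then merge them into H-count queries."""
    params = Chem.SmilesParserParams()
    params.removeHs = False  # only the query needs this setting
    mol = Chem.MolFromSmiles(smiles, params)
    return Chem.MergeQueryHs(mol)


def search(query_smiles):
    """Hypothetical helper: return the database SMILES that contain the query."""
    query = make_query(query_smiles)
    return [smi for smi, mol in zip(DATABASE_SMILES, database)
            if mol.HasSubstructMatch(query)]


print(search("C1=CC=NC(N([H])[H])=C1"))  # query with explicit Hs
print(search("C1=CC=NC(N)=C1"))          # plain query
```

With the explicit-H query only the free amine should be returned, while the plain query returns all three entries, in line with Greg's results for p3 and p1.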


30.11.2017 22:26, Andrey <pti...@ua.fm>
>That's awesome, many thanks for your help Greg. The blog article is great too. 
>I'll try this out and let you know if there's any success.
> 
> Kind regards,
> 
> Andrew
> 
> 
> 30.11.2017 08:15, Greg Landrum <greg.land...@gmail.com>
> >Hi Andrey,
> > 
> > 
> > On Thu, Nov 30, 2017 at 1:17 AM, Andrey <pti...@ua.fm> wrote:
> > 
> > > Dear RDKit community,
> > >
> > > I'm setting up a chemical search engine based on RDKit, and I have a
> > > question about accounting for explicit hydrogens.
> > > I'm using Ketcher and Marvin JS as molecular editors to draw structure
> > > queries for searching among ~100K compounds.
> > >
> > > Here are some example search queries:
> > >
> > > 1. C1=CC=NC(N)=C1
> > > 2. C1=CC=NC(N([H])[H])=C1
> > >
> > > Both queries are the same molecule (pyridin-2-amine), but query #2 has two
> > > explicitly drawn hydrogens on the NH2 group.
> > >
> > > In both cases, a substructure search returns the same list of compounds,
> > > including ones with a substituted NH2 group. That is fine for query #1, but
> > > for query #2 the NH2 substitution should be excluded.
> > > It seems that the system (RDKit?) ignores explicitly drawn hydrogens,
> > > which makes the substructure search not selective enough for my needs.
> > >
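The behaviour described above can be reproduced with a short sketch. It assumes both queries reach RDKit as SMILES parsed with Chem.MolFromSmiles() default settings; how the Ketcher/Marvin JS output is actually converted isn't stated in the thread. Default parsing removes the [H] atoms, so query #2 collapses into query #1 and both match an N-substituted target.

```python
from rdkit import Chem

query1 = Chem.MolFromSmiles("C1=CC=NC(N)=C1")
# Default parsing (removeHs=True) drops the two [H] atoms drawn on the NH2.
query2 = Chem.MolFromSmiles("C1=CC=NC(N([H])[H])=C1")

# Both queries end up as the same molecule with 7 heavy atoms and no H atoms.
print(Chem.MolToSmiles(query1), Chem.MolToSmiles(query2))
print(query1.GetNumAtoms(), query2.GetNumAtoms())

# An N-methyl target (substituted NH2) therefore matches both queries.
target = Chem.MolFromSmiles("CNc1ccccn1")
print(target.HasSubstructMatch(query1), target.HasSubstructMatch(query2))
```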
> > 
> > The RDKit has a function called MergeQueryHs() that's intended to help out
> > in cases like this. Here's a quick demo of how this works:
> > 
> > Start by building the query molecules:
> > 
> > In [5]: params = Chem.SmilesParserParams()
> > 
> > In [6]: params.removeHs=False
> > 
> > In [8]: p1 = Chem.MolFromSmiles('C1=CC=NC(N)=C1',params)
> > 
> > In [9]: p2 = Chem.MolFromSmiles('C1=CC=NC(N([H])[H])=C1',params)
> > 
> > In [10]: p3 = Chem.MergeQueryHs(p2)
> > 
> > 
> > Here are the two test molecules:
> > 
> > In [11]: m1 = Chem.MolFromSmiles('C1=CC=NC(N)=C1')
> > 
> > In [12]: m2 = Chem.MolFromSmiles('C1=CC=NC(N(C))=C1')
> > 
> > 
> > 
> > And the results:
> > 
> > In [13]: m1.HasSubstructMatch(p1)
> > Out[13]: True
> > 
> > In [14]: m1.HasSubstructMatch(p2)
> > Out[14]: False
> > 
> > In [15]: m1.HasSubstructMatch(p3)
> > Out[15]: True
> > 
> > In [16]: m2.HasSubstructMatch(p1)
> > Out[16]: True
> > 
> > In [17]: m2.HasSubstructMatch(p2)
> > Out[17]: False
> > 
> > In [18]: m2.HasSubstructMatch(p3)
> > Out[18]: False
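The session above can also be written as one self-contained script. This is a sketch that adds only the `from rdkit import Chem` import the session leaves out, plus atom counts showing that MergeQueryHs() removes the H atoms from the query graph. The match results should reproduce Out[13] to Out[18].

```python
from rdkit import Chem

# Queries: keep explicit [H] atoms so MergeQueryHs() has something to merge.
params = Chem.SmilesParserParams()
params.removeHs = False
p1 = Chem.MolFromSmiles("C1=CC=NC(N)=C1", params)
p2 = Chem.MolFromSmiles("C1=CC=NC(N([H])[H])=C1", params)
p3 = Chem.MergeQueryHs(p2)  # the [H] atoms become H-count constraints on the N

# p2 still carries its two H atoms as graph nodes; p3 has only heavy atoms.
print(p2.GetNumAtoms(), p3.GetNumAtoms())

# Targets: parsed with default settings.
m1 = Chem.MolFromSmiles("C1=CC=NC(N)=C1")     # free NH2
m2 = Chem.MolFromSmiles("C1=CC=NC(N(C))=C1")  # N-methyl, substituted NH2

for name, target in (("m1", m1), ("m2", m2)):
    print(name, [target.HasSubstructMatch(q) for q in (p1, p2, p3)])
```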
> > 
> > You may also find this blog post and the links therein helpful:
> > http://rdkit.blogspot.co.uk/2016/07/tuning-substructure-queries-ii.html
> > 
> > I hope this helps,
> > -greg
> > 
> > 
> > 
> > > I'm new to RDKit and I'd really appreciate any thoughts on how this problem
> > > could be solved. Are there any settings in RDKit related to this?
> > >
> > > Thank you in advance,
> > >
> > > Andrew
> 
> 
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

