On Mon, Dec 3, 2012 at 4:55 PM, Greg Landrum <[email protected]> wrote:
>
> On Mon, Dec 3, 2012 at 3:40 PM, Andrew Dalke <[email protected]> wrote:
>
>>
>> However, that doesn't work like I hoped it would. Consider these:
>>
>> >>> mol = Chem.MolFromSmiles("c1c(C)c2c(N)cccc2[nH]1")
>> >>> mol.HasSubstructMatch(query)
>> True
>> >>> mol = Chem.MolFromSmiles("Fn1cc(Cl)c2ccccc21")
>> >>> mol.HasSubstructMatch(query)
>> False
>>
>> I expected the sketched structure to match both structures,
>> and not just the first. The failure appears to be because
>> of the explicit hydrogen in the '[nH]' term. If I remove the
>> 'H' then the match is fine.
>>
>> >>> for atom in query.GetAtoms():
>> ...     print atom.GetNumExplicitHs(),
>> ... else:
>> ...     print
>> ...
>> 0 0 0 0 0 0 0 0 1
>> >>> for atom in query.GetAtoms():
>> ...     atom.SetNumExplicitHs(0)
>> ...
>> >>> mol.HasSubstructMatch(query)
>> True
>>
>> Is a description of how atoms and bonds are matched, when the
>> given substructure comes from a molecule and not a SMARTS,
>> available somewhere?
>>
>
> Yes, it's here:
>
> http://www.rdkit.org/docs/RDKit_Book.html#atom-atom-matching-in-substructure-queries
>
>
>> Finally, am I missing anything else in what I need to do
>> in order to prepare an input substructure as a substructure
>> query?
>>
>
> The problem is that the aromatic N in the query has, according to the
> RDKit, an explicit H. So when the query executes, it uses that explicit H
> as part of the matching criteria (see the link above). This is plainly
> wrong, but the best fix to the problem is going to require some
> re-imagining of how the RDKit handles hydrogen atoms. I've been wanting to
> do this for a while, but it's potentially a code-breaking change, so I've
> been avoiding it. I'll start a thread on the rdkit-devel list about this,
> in case anyone wants to participate.
>
> In the meantime, your workaround of setting the number of explicit Hs to
> zero on atoms in the query molecule should solve the problem.
>
This particular problem -- "explicit" Hs on atoms in the query preventing
substructure matches -- has come up a few times recently in a couple of
different places. As a solution, at least for now, I've changed the
atom-atom matching code so that it ignores hydrogen counts. You can now do
this:

In [3]: p = Chem.MolFromSmiles('c1c[nH]cc1')
In [4]: m = Chem.MolFromSmiles('c1cn(C)cc1')
In [5]: m.HasSubstructMatch(p)
Out[5]: True

Note that this also means that the H in C[OH] is ignored, so C[OH] is now a
substructure of C[O-]. For finer-grained control over H specifications in
queries, you will need to use either SMARTS or molecules that have Hs added.
This look ok?
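
To make the three options concrete, here is a minimal sketch comparing a
SMILES-derived query, a SMARTS query, and a query with Hs added. It assumes
an RDKit build that includes the change described above; the variable names
and the choice of pyrrole and N-methylpyrrole as targets are just for
illustration.

```python
from rdkit import Chem

# Targets: pyrrole (N carries an H) and N-methylpyrrole (N carries no H).
pyrrole = Chem.MolFromSmiles('c1cc[nH]c1')
n_methylpyrrole = Chem.MolFromSmiles('c1cn(C)cc1')

# 1. Query built from SMILES: H counts on query atoms are ignored.
smiles_query = Chem.MolFromSmiles('c1c[nH]cc1')

# 2. Query built from SMARTS: [nH] requires one H on the nitrogen.
smarts_query = Chem.MolFromSmarts('c1c[nH]cc1')

# 3. Query with Hs added as real atoms. The target needs its Hs added
#    as well, otherwise there are no H atoms for the query Hs to map onto.
h_query = Chem.AddHs(smiles_query)

results = {}
for name, mol in [('pyrrole', pyrrole), ('n_methylpyrrole', n_methylpyrrole)]:
    results[name] = {
        'smiles': mol.HasSubstructMatch(smiles_query),
        'smarts': mol.HasSubstructMatch(smarts_query),
        'with_hs': Chem.AddHs(mol).HasSubstructMatch(h_query),
    }
    print(name, results[name])
```

The SMILES-derived query should hit both targets, while the SMARTS query and
the Hs-added query should only hit pyrrole, since both of those still require
the H on the nitrogen.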
Best,
-greg