Dear James,

On Tue, Sep 21, 2010 at 4:26 PM, James Davidson <[email protected]> wrote:
> Dear All,
>
> I have been struggling with a substructure search to remove compounds
> containing isotopes of hydrogen other than 1H...  I had hoped that the
> following would work: patt = Chem.MolFromSmarts('[2H,3H]') but this does not
> give a valid mol.

Yeah: inside a SMARTS bracket, "H" is not a hydrogen atom but a
hydrogen-count primitive, i.e. it describes an atom that has an H
attached. So [2H] doesn't work.

> I then tried patt = Chem.MolFromSmarts('[2#1,3#1]').  This at least gives a
> mol object, but when used for substructure querying does not behave as I
> would like (ie mol.HasSubstructMatch(patt) is false even for molecules
> containing 2H or 3H)

Hmm, that certainly should not be the case. Here's what I see:

[1]>>> from rdkit import Chem

[2]>>> smis = ['C','C[2H]','C[3H]']

[3]>>> ms = [Chem.MolFromSmiles(x) for x in smis]

[4]>>> p = Chem.MolFromSmarts('[2#1]')

[5]>>> [x.HasSubstructMatch(p) for x in ms]
Out[5] [False, True, False]

[6]>>> p = Chem.MolFromSmarts('[3#1]')

[7]>>> [x.HasSubstructMatch(p) for x in ms]
Out[7] [False, False, True]

[8]>>> p = Chem.MolFromSmarts('[2#1,3#1]')

[9]>>> [x.HasSubstructMatch(p) for x in ms]
Out[9] [False, True, True]

This looks correct to me.

If you see anything different, I would certainly like to know about
it. In that case, please let me know which version of the RDKit you
are using and on which platform.
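
For the original goal (dropping compounds that contain 2H or 3H), the
pattern above can be wrapped in a small filter. This is only a sketch:
the names HEAVY_H, has_heavy_hydrogen and drop_heavy_hydrogen are made
up for illustration, and skipping unparseable SMILES is an assumption
you may want to change.

```python
from rdkit import Chem

# Query atoms: atomic number 1 with isotope 2 (deuterium) or 3 (tritium).
# HEAVY_H and the function names below are hypothetical, not RDKit API.
HEAVY_H = Chem.MolFromSmarts('[2#1,3#1]')


def has_heavy_hydrogen(mol):
    """Return True if mol contains an explicit 2H or 3H atom."""
    return mol.HasSubstructMatch(HEAVY_H)


def drop_heavy_hydrogen(smiles_list):
    """Keep only the SMILES whose molecules contain no 2H or 3H."""
    kept = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Assumption: unparseable SMILES are silently skipped.
            continue
        if not has_heavy_hydrogen(mol):
            kept.append(smi)
    return kept


if __name__ == '__main__':
    # Same three molecules as in the session above.
    print(drop_heavy_hydrogen(['C', 'C[2H]', 'C[3H]']))
```

With the three test molecules from the session, only 'C' should
survive, which is consistent with Out[9].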

> Finally, I thought that I could run them one at a time with eg patt =
> Chem.MolFromSmiles('[2H]', True) for the deuteriums.  This does work for me
> (sort of), but identifies molecules containing 3H as well as 2H - which I
> guess is great, as this is originally what I wanted!  However, I'm not sure
> why it works - and what I should be doing to make it behave as expected.

As strange as it may seem, this is expected (though arguably
incorrect) behavior:

[10]>>> p = Chem.MolFromSmiles('[2H]')

[11]>>> [x.HasSubstructMatch(p) for x in ms]
Out[11] [False, True, True]

This may help to understand what is going on:

[13]>>> mh = Chem.AddHs(ms[0])

[14]>>> mh.HasSubstructMatch(p)
Out[14] True

The query basically ends up matching any atom with atomic number 1,
which is why it also hits the ordinary Hs that AddHs puts on methane.
In the RDKit substructure matching scheme, the only thing that is
compared when two non-query atoms are checked against each other is
their atomic number. Isotope specifications are not taken into account
unless you have query atoms, which is what MolFromSmarts gives you;
MolFromSmiles gives you ordinary, non-query atoms.
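
To make the rule concrete, here is a toy model in plain Python. It is
not RDKit code: ToyAtom and both functions are invented for this
illustration, and it only mimics the behavior described in this
message, so check it against the RDKit version you actually run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical stand-in for an atom; not an RDKit class."""
    atomic_num: int
    isotope: int = 0  # 0 means "no isotope specified"


def plain_atoms_match(pattern, target):
    # Non-query atoms (e.g. from SMILES): only the atomic number is compared.
    return pattern.atomic_num == target.atomic_num


def query_atom_matches(pattern, target):
    # Query atom like [2#1]: atomic number AND isotope must both agree.
    return (pattern.atomic_num == target.atomic_num
            and pattern.isotope == target.isotope)


deuterium = ToyAtom(1, 2)
targets = [ToyAtom(1), ToyAtom(1, 2), ToyAtom(1, 3)]  # H, 2H, 3H

plain_hits = [plain_atoms_match(deuterium, t) for t in targets]
query_hits = [query_atom_matches(deuterium, t) for t in targets]
print(plain_hits)  # [True, True, True]
print(query_hits)  # [False, True, False]
```

The first list mirrors the SMILES-derived [2H] pattern matching every
hydrogen; the second mirrors the '[2#1]' SMARTS query matching only
deuterium.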

One could argue that isotope information should be taken into account
when standard atoms are matched, but that would require a broader
discussion about what exactly should match what. For example, it's
"obvious" that [2H] should not match [H], but it's less obvious whether
[1H] should match [H].

> Any help/advice much appreciated!  And apologies for my high percentage of
> questions revolving around a lack of SMARTS experience!

SMARTS can be confusing, so asking questions about them doesn't
indicate a lack of smarts. ;-)

-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
