Bugs item #3310779, was opened at 2011-06-02 19:35
Message generated for change (Tracker Item Submitted) made by baoilleach
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=428740&aid=3310779&group_id=40728

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Noel O'Boyle (baoilleach)
Assigned to: Nobody/Anonymous (nobody)
Summary: Handling of Implicit H in Smarts 

Initial Comment:
From Andrew Dalke on list:

One of the RDKit MACCS key definitions is

   [!#6;!#1]~[!#6;!#1;!H0]

I'm working on my test suite for those definitions, as mentioned in my previous 
email.

Here's a test case

>>> mol = pybel.readstring("smi", "[U]S(C)C")
>>> matcher = pybel.Smarts("[!#6;H0]")
>>> matcher.findall(mol)
[(1,), (2,)]
>>> matcher = pybel.Smarts("[!#6;!#1]~[!#6;!#1;!H0]")
>>> matcher.findall(mol)
[]
>>>

RDKit, OEChem, and Daylight say that that pattern matches that structure. 
That's because all three programs say that the "S" has an implicit hydrogen on 
it.

Daylight says that sulfur has valence levels of "S (2,4,6)". The S here has
three single bonds, so it is filled up to valence 4 with one implicit hydrogen.

 http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

This looks to be a bug in the code which calculates the implicit hydrogen count.

Here's another case where the implicit h-count is wrong, this time with P.

Daylight says the valence levels for P in SMILES are (3,5)

Given  N=PPCC

The second atom (the first P) has a double bond and a single,
so its valences are filled. It should have no implicit hydrogens.
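
The two hand counts above (the S in [U]S(C)C and the first P in N=PPCC) can be
reproduced with a small sketch of the Daylight rule: take the lowest normal
valence that is at least the sum of the bond orders, and fill the difference
with hydrogens. The function name and the table layout are illustrative
assumptions, not code from OpenBabel, RDKit, or Daylight, and the sketch only
covers neutral, non-aromatic, unbracketed atoms.

```python
# Daylight "normal" valence levels for the SMILES organic subset,
# as listed on the SMILES theory page cited above.
DAYLIGHT_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_h_count(symbol, bond_orders):
    """Hypothetical helper: implicit H count for a neutral, unbracketed atom.

    bond_orders lists the orders of the explicit bonds on the atom.
    """
    used = sum(bond_orders)
    # Pick the lowest normal valence that accommodates the explicit bonds.
    for valence in DAYLIGHT_VALENCES[symbol]:
        if valence >= used:
            return valence - used
    # Explicit bonds already exceed every normal valence: no hydrogens added.
    return 0


s_h = implicit_h_count("S", [1, 1, 1])   # S in [U]S(C)C: three single bonds
p1_h = implicit_h_count("P", [2, 1])     # first P in N=PPCC: =N and -P
p2_h = implicit_h_count("P", [1, 1])     # second P in N=PPCC: -P and -C
n_h = implicit_h_count("N", [2])         # N in N=PPCC: one double bond
print(s_h, p1_h, p2_h, n_h)  # 1 0 1 1
```

Under this rule the S carries one implicit hydrogen and the first P carries
none, which is what the thread reports for RDKit, OEChem, and Daylight.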

However, here is the RDKit MACCS pattern which, unexpectedly, matched in
OpenBabel:

>>> mol = pybel.readstring("smi", "N=PPCC")
>>> matcher = pybel.Smarts("[!#6;!#1;!H0]~[!#6;!#1;!H0]")
>>> matcher.findall(mol)
[(1, 2), (2, 3)]
>>> Hmatcher = pybel.Smarts("[!H0]")
>>> Hmatcher.findall(mol)
[(1,), (2,), (3,), (4,), (5,)]
>>>

You can see it's because the matcher thinks all of the atoms have at least one 
implicit hydrogen.


Compare this to RDKit, which correctly has the P with no implicit hydrogens.

>>> mol = Chem.MolFromSmiles("N=PPCC")
>>> pat = Chem.MolFromSmarts("[!#6;!#1;!H0]~[!#6;!#1;!H0]")
>>> mol.GetSubstructMatches(pat)
()
>>> Hpat = Chem.MolFromSmarts("[!H0]")
>>> mol.GetSubstructMatches(Hpat)
((0,), (2,), (3,), (4,))
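
When comparing the two [!H0] results, note that the pybel output above counts
atoms from 1 while the RDKit output counts from 0. A minimal sketch of the
comparison follows; the helper name is an illustrative assumption, and the two
hit lists are copied from the sessions above rather than recomputed.

```python
def to_zero_based(matches):
    """Hypothetical helper: shift 1-based match tuples to 0-based indices."""
    return [tuple(i - 1 for i in match) for match in matches]


# [!H0] hits on N=PPCC, copied from the sessions above.
openbabel_hits = [(1,), (2,), (3,), (4,), (5,)]   # pybel, 1-based
rdkit_hits = ((0,), (2,), (3,), (4,))             # RDKit, 0-based

# Atoms OpenBabel reports with a hydrogen that RDKit does not.
extra = sorted(set(to_zero_based(openbabel_hits)) - set(rdkit_hits))
print(extra)  # [(1,)]
```

The only disagreement is 0-based atom 1, the first P, which is the atom whose
valences are already filled by the double bond and the single bond.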

----------------------------------------------------------------------

_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
