Dear Andrew,

On Mon, Dec 21, 2009 at 12:42 AM, Andrew Dalke wrote:
> The documentation for GetNumAtoms says it takes an optional parameter:
>
>   onlyHeavy: (optional) include only heavy atoms (not Hs) defaults to 1
>
>
> In testing it with the attached SD file I found that it included the four 
> hydrogens with mass difference of 1
>
[snip]
>
> My guess is that it treats the deuterium as explicit hydrogens but doesn't 
> take that into account when doing the heavy atom calculations.
>
> I'm assuming the problem is from ROMol::getNumAtoms

[snip]

>
> where the if (onlyHeavy) case does not remove the count of any vertices which
> are hydrogens.

Your diagnosis is mostly correct. The actual cause is that the code in
ROMol::getNumAtoms counts all vertices that are explicit in the graph
(i.e. it's assuming that Hs have been removed). You can see this
below:

[3] >>> m = Chem.MolFromSmiles('C')
[4] >>> m.GetNumAtoms(True)
Out[4]: 1
[5] >>> mh = Chem.AddHs(m)
[6] >>> mh.GetNumAtoms(True)
Out[6]: 5
[7] >>> m.GetNumAtoms(False)
Out[7]: 5

What's going on in your case is that the H removal code (by design)
doesn't remove any atoms with a mass difference other than 0. So 2H,
3H, etc. are not removed; they stay in the graph as explicit vertices
and are therefore counted by getNumAtoms(True).
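To make that rule concrete, here is a deliberately simplified Python model of it. It is not the RDKit's actual H-removal code: it encodes only the one condition described above (an H is removed only if its mass difference is 0), and the (atomic number, mass difference) tuples and the function names are hypothetical.

```python
def survives_h_removal(atomic_num, mass_diff):
    """Hypothetical, simplified model of the rule described above:
    only hydrogens (atomic number 1) with a mass difference of 0 are removed.
    """
    return not (atomic_num == 1 and mass_diff == 0)


def explicit_atom_count_after_h_removal(atoms):
    """Count the (atomic_num, mass_diff) pairs that stay in the graph."""
    return sum(1 for num, dm in atoms if survives_h_removal(num, dm))


# CH4 with all four Hs explicit: the plain Hs are removed, leaving 1 atom.
methane = [(6, 0), (1, 0), (1, 0), (1, 0), (1, 0)]
# CD4: the four deuteriums (mass difference 1) are kept, leaving 5 atoms,
# which is what a count of explicit vertices then reports.
deuteromethane = [(6, 0), (1, 1), (1, 1), (1, 1), (1, 1)]

print(explicit_atom_count_after_h_removal(methane))         # 1
print(explicit_atom_count_after_h_removal(deuteromethane))  # 5
```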

The reasoning behind this behavior is that one should be able to call
ROMol::getNumAtoms(True) and then loop over the molecule's atoms.
Something like:

for(unsigned int i=0;i<mol.getNumAtoms();++i){
  Atom *atom=mol.getAtomWithIdx(i);
  // ... do something with atom ...
}
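For comparison, a rough Python equivalent of that loop, assuming the RDKit Python wrappers (`GetNumAtoms`, `GetAtomWithIdx`, `GetSymbol`); the helper name `atom_symbols` is hypothetical. The point it illustrates is that `range(mol.GetNumAtoms())` with the default argument covers exactly the valid atom indices.

```python
def atom_symbols(mol):
    """Return the element symbol of every atom stored in the graph.

    range(mol.GetNumAtoms()) is used as the index range for GetAtomWithIdx,
    because the default call counts only the vertices actually in the graph.
    """
    symbols = []
    for i in range(mol.GetNumAtoms()):
        atom = mol.GetAtomWithIdx(i)
        symbols.append(atom.GetSymbol())  # the "do something" step
    return symbols
```

With the RDKit installed, `atom_symbols(Chem.AddHs(Chem.MolFromSmiles('C')))` should give `['C', 'H', 'H', 'H', 'H']`, since AddHs appends the Hs as explicit atoms after the carbon.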

It's clear that the name of the function doesn't match its behavior.
If you actually want the number of non-hydrogen atoms, you need to
loop over the atoms in the molecule and check the atomic numbers.
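A minimal sketch of that loop in Python, assuming the RDKit Python API (`GetAtoms`, `GetAtomicNum`); the helper name `count_heavy_atoms` is hypothetical. Because 2H and 3H still have atomic number 1, checking the atomic number skips them no matter what their mass is.

```python
def count_heavy_atoms(mol):
    """Count atoms whose atomic number is greater than 1.

    Works on any object exposing RDKit-style GetAtoms() / GetAtomicNum().
    Explicit H, D (2H) and T (3H) atoms all have atomic number 1 and are
    therefore skipped, regardless of their mass.
    """
    return sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() > 1)
```

For example, `count_heavy_atoms(Chem.AddHs(Chem.MolFromSmiles('C')))` should return 1, where `GetNumAtoms(True)` on the same molecule returns 5 as shown in the session above. Note that this only counts atoms present in the graph; implicit Hs are never counted, which is what you want for a heavy-atom count.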

The handling of Hs in the RDKit is, in general, a wart. As long as one
sticks to the "standard" workflow (working primarily with
hydrogen-suppressed graphs) things are fine, but if the molecules have
Hs (or 2Hs) the going can be a bit bumpy. Fixing the H handling is on
my list of things to do in the near(ish) future; it's going to be
time-consuming, but it will simplify a lot of other code, particularly
anything related to chirality.

-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
