Hi Michael,
On Tue, Aug 14, 2012 at 7:06 PM, Michael Palmer <[email protected]> wrote:
>
> it seems that nitrogen atoms in aromatic rings are being assigned
> faulty valences. See example below:
>
>>>> from rdkit import Chem
>>>> pyrrole = Chem.MolFromSmiles('C1=CNC=C1')
>>>> for atom in pyrrole.GetAtoms():
> ... print atom.GetSymbol(), atom.GetExplicitValence(),
> atom.GetNumImplicitHs()
> ...
> C 3 1
> C 3 1
> N 3 0
> C 3 1
> C 3 1
>
> The nitrogen should have a valence of two and one implicit hydrogen.
This is confusing because of a bad choice of terminology that I made a
long time ago.
First the terminology bit: valence has to do with bond orders, degree
has to do with neighbors. Most of the time (the bad exception is
explained below), explicit refers to atoms that are in the graph and
implicit refers to atoms that are not in the graph (i.e. Hs).
So, given that the ring is aromatic, the explicit valence of each of
the atoms (ignores the Hs that are not present in the graph) in
pyrrole is 3:
In [6]: for atom in pyrrole.GetAtoms():
print atom.GetSymbol(), atom.GetExplicitValence()
...:
C 3
C 3
N 3
C 3
C 3
If you want the H count, you should use GetTotalNumHs(); the total
number of Hs for each atom is one:
In [8]: for atom in pyrrole.GetAtoms():
print atom.GetSymbol(), atom.GetTotalNumHs()
...:
C 1
C 1
N 1
C 1
C 1
Unfortunately, due to a bad decision from >10 years ago, there's some
overlapping and confusing nomenclature around the use of the words
"explicit" and "implicit" when it comes to Hs. When you specify the Hs
for an atom inside of square brackets in the SMILES, it becomes an
"explicit" H as far as atom.GetNumExplicitHs() is concerned.
Here's an example:
In [14]: pyrrole = Chem.MolFromSmiles('C1=CNC=C1')
In [15]: mol1 = Chem.MolFromSmiles('C1=CNCC1')
In [16]: mol2 = Chem.MolFromSmiles('C1=C[NH]CC1')
In [17]: for atom in pyrrole.GetAtoms():
print atom.GetSymbol(), atom.GetNumExplicitHs(), atom.GetNumImplicitHs()
....:
C 0 1
C 0 1
N 1 0
C 0 1
C 0 1
In [18]: for atom in mol1.GetAtoms():
print atom.GetSymbol(), atom.GetNumExplicitHs(), atom.GetNumImplicitHs()
....:
C 0 1
C 0 1
N 0 1
C 0 2
C 0 2
In [19]: for atom in mol2.GetAtoms():
print atom.GetSymbol(), atom.GetNumExplicitHs(), atom.GetNumImplicitHs()
....:
C 0 1
C 0 1
N 1 0
C 0 2
C 0 2
This is really exposing an implementation detail as far as the end
user/client is concerned, it creates confusion, and it's not something
that should be there. It definitely could be fixed, but I am afraid
that it would break a lot of code.
Now that I've at least tried to clear up what is going on, maybe I can
be more helpful: was there a specific question you were trying to
answer that led you to your discovery that the RDKit behaves strangely
in this special case?
Does this help?
-greg
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss