Message: 6
Date: Wed, 15 Oct 2008 14:07:58 -0600
From: "Robert DeLisle" <[email protected]>
Subject: Re: [Rdkit-discuss] H-bond Acceptor problem
To: "Hans Purkey" <[email protected]>
Cc: RD-Kit <[email protected]>, Greg Landrum <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"
Good point, Hans.
I see that the available descriptors include NHOHCount and NOCount,
which I assume correspond to Lipinski's donors and acceptors. There are also
NumHAcceptors and NumHDonors, which I would expect to differ from the
Lipinski versions in some way.
-Kirk
On Wed, Oct 15, 2008 at 1:19 PM, Hans Purkey <[email protected]> wrote:
If the intention is to follow Lipinski's definition of H-bond acceptors,
then it should be a simple N+O count (look back at the original paper: that
is how he defined it, "for simplicity").
However, if the descriptor is intended to match a more intuitive, realistic
definition of an HBA, then N-H shouldn't be part of it.
Hans
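For reference, the "for simplicity" rule Hans describes needs no SMARTS at all. The helper below is a hypothetical, stdlib-only sketch (not RDKit's NOCount implementation) that tokenizes a SMILES string, assuming well-formed input, and counts every N and O atom, aromatic or not.

```python
import re

# Bracket atoms first, then two-letter organic-subset symbols, then one-letter ones.
_ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Element symbol at the start of a bracket atom, after an optional isotope number.
_BRACKET_SYMBOL = re.compile(r"\d*([A-Z][a-z]?|[a-z][a-z]?)")


def atom_symbols(smiles):
    """Return the element symbol of every atom in a (well-formed) SMILES string."""
    symbols = []
    for token in _ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            match = _BRACKET_SYMBOL.match(token[1:-1])
            if match is None:
                raise ValueError(f"cannot parse bracket atom {token!r}")
            symbols.append(match.group(1))
        else:
            symbols.append(token)
    return symbols


def lipinski_acceptor_count(smiles):
    """Lipinski's simple acceptor rule: count every N and O, aromatic or not."""
    return sum(1 for s in atom_symbols(smiles) if s.capitalize() in {"N", "O"})


print(lipinski_acceptor_count("CNC(=O)c1c[nH]cc1"))  # prints 3
```

On CNC(=O)c1c[nH]cc1, the simplified test molecule from this thread, the Lipinski-style count is also 3 (two N, one O), so the observed NumHAcceptors value only looks wrong against the more realistic definition.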
On Oct 15, 2008, at 11:50 AM, Greg Landrum wrote:
[heh, worse than sending a message without an attachment is hitting
send before the message is done and sending a message without text...
sorry]
On Wed, Oct 15, 2008 at 7:59 PM, Robert DeLisle <[email protected]> wrote:
As you know, I've been working with descriptors in RDKit, and I think I've
found a bug in the calculation of H-bond acceptors. Attached is an example
structure, N-methyl-1H-indole-6-carboxamide. When I calculate NumHAcceptors
for this structure, I get 3. I've looked at numerous other structures and it
seems that nitrogens are always counted. I went into the code and found the
definitions used for HAcceptors:
Here's a simpler case showing the same behavior:
[15] >>> m2 = Chem.MolFromSmiles('CNC(=O)c1c[nH]cc1')
[16] >>> Lipinski.NumHAcceptors(m2)
Out[16]: 3
so that confirms the wrong count
$([O,S;H1;v2]-[!$(*=[O,N,P,S])])
$([O,S;H0;v2])
$([O,S;-])
$([N&v3;H1,H2]-[!$(*=[O,N,P,S])])
$([N;v3;H0])
$([n,o,s;+0])
F
Unless I'm misinterpreting the SMARTS (a very good possibility), both NH
groups are being counted as acceptors because they match
$([N&v3;H1,H2]-[!$(*=[O,N,P,S])]), but shouldn't the amide NH be excluded
according to this same definition?
[20] >>> m2.GetSubstructMatches(Chem.MolFromSmarts('[$([N&v3;H1,H2]-[!$(*=[O,N,P,S])])]'))
Out[20]: ((1,),)
Only one nitrogen matches: the amide nitrogen. The aromatic N
matches the second-to-last definition:
[29] >>> m2.GetSubstructMatches(Chem.MolFromSmarts('[$([n,o,s;+0])]'))
Out[29]: ((6,),)
The problem is that the first definition matches an N that is single-bonded
to at least one atom that isn't doubly bonded to O, N, P, or S. It does not
exclude Ns that are also single-bonded to an atom that is doubly bonded to
O, N, P, or S. So your amide with a secondary N matches (via its methyl
carbon). The problem isn't the matcher, it's the definition.
Is that clear?
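The difference Greg describes is "some neighbour qualifies" versus "no neighbour disqualifies". The sketch below makes that explicit in plain Python on a hand-built graph of CNC(=O)c1c[nH]cc1 (indices follow the SMILES order, so the amide N is atom 1, as in Out[20]). old_rule mirrors N-[!$(*=[O,N,P,S])]; fixed_rule mirrors a negated recursive SMARTS on the N itself, !$(N-*=[O,N,P,S]), which is one possible corrected form and an assumption here, not necessarily the definition that was eventually committed. The graph encoding and both function names are hypothetical, and only the neighbour logic is modelled (the v3 and H1,H2 checks are assumed to have passed already).

```python
# Hand-built graph of CNC(=O)c1c[nH]cc1; indices follow the SMILES order.
SYMBOLS = ["C", "N", "C", "O", "c", "c", "n", "c", "c"]
BONDS = {  # (i, j) -> bond type; "aromatic" bonds never match SMARTS "-"
    (0, 1): "single", (1, 2): "single", (2, 3): "double", (2, 4): "single",
    (4, 5): "aromatic", (5, 6): "aromatic", (6, 7): "aromatic",
    (7, 8): "aromatic", (8, 4): "aromatic",
}
HETERO = {"O", "N", "P", "S"}  # aliphatic only, like [O,N,P,S] in SMARTS


def bond(i, j):
    return BONDS.get((i, j)) or BONDS.get((j, i))


def neighbors(i):
    return [b if a == i else a for (a, b) in BONDS if i in (a, b)]


def double_bonded_to_hetero(i):
    """Does atom i match *=[O,N,P,S]?"""
    return any(bond(i, j) == "double" and SYMBOLS[j] in HETERO for j in neighbors(i))


def old_rule(i):
    """N-[!$(*=[O,N,P,S])]: SOME single-bonded neighbour is not X=hetero."""
    return any(bond(i, j) == "single" and not double_bonded_to_hetero(j)
               for j in neighbors(i))


def fixed_rule(i):
    """!$(N-*=[O,N,P,S]): NO single-bonded neighbour is X=hetero."""
    return not any(bond(i, j) == "single" and double_bonded_to_hetero(j)
                   for j in neighbors(i))


amide_n = 1
print(old_rule(amide_n), fixed_rule(amide_n))  # True False
```

The methyl carbon is enough to satisfy the old pattern, while under the negated form the carbonyl carbon (double-bonded to O) is enough to reject the amide N.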
I agree that this is a bug in the definition and will fix it. Would
you mind entering the bug at sf.net or should I do it?
-greg
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
End of Rdkit-discuss Digest, Vol 13, Issue 3
********************************************
Hi there,
some time ago I had a similar concern about some of the features in
BaseFeatures.fdef.
I considered this largely a matter of personal preference and simply
changed the definitions to suit my needs.
The three changes worth noting were:
1) the amide (as discussed above),
2) the amidine group, which I added as a positive-ionizable feature (does it
interfere with the guanidine, greg??),
3) a carboxylic acid oxygen, which I additionally define as an acceptor
(I presume it to be deprotonated).
I've attached the customized file.
Maybe this helps,
Markus
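Markus's NDonor change relies on the fdef AtomType mechanism: repeated AtomType lines for the same name add alternatives, and {Name} references (including negated ones such as !{AmideN}) pull those definitions into later patterns. The helper below is a hypothetical sketch of that substitution, not RDKit's own fdef parser; it assumes one definition per line, treats the keyword case-insensitively (the file uses both AtomType and Atomtype), and wraps each reference in a single recursive SMARTS so that a leading ! negates the whole OR. The string RDKit actually builds may differ in form.

```python
import re
from collections import defaultdict

_REF = re.compile(r"\{(\w+)\}")  # matches {Name} references


def parse_atom_types(lines):
    """Collect AtomType definitions; repeated names accumulate alternatives."""
    types = defaultdict(list)
    for line in lines:
        parts = line.split()
        # Assumption: keywords are case-insensitive ("AtomType" vs "Atomtype").
        if len(parts) == 3 and parts[0].lower() == "atomtype":
            types[parts[1]].append(parts[2])
    return dict(types)


def expand(pattern, types, depth=0):
    """Replace every {Name} in a SMARTS pattern by the OR of its definitions."""
    if depth > 20:
        raise ValueError("AtomType references nest too deeply (cycle?)")

    def substitute(match):
        name = match.group(1)
        if name not in types:
            raise KeyError(f"undefined AtomType {name!r}")
        alternatives = [expand(d, types, depth + 1) for d in types[name]]
        # One recursive SMARTS around the whole OR, so "!{Name}" negates all of it.
        return "$([" + ",".join(f"$({a})" for a in alternatives) + "])"

    return _REF.sub(substitute, pattern)


fdef_lines = [
    "AtomType AmideN [$(N-C(=O))]",
    "AtomType NDonor [N&!H0&v3,N&!H0&+1&v4,n&H1&+0]",
    "AtomType NDonor [$([Nv3](-C)(-C)-C);!{AmideN}]",
]
types = parse_atom_types(fdef_lines)
print(expand("[{NDonor}]", types))
```

The second NDonor alternative ends up carrying a negated recursive query for the amide pattern, which is how the tertiary amine rule stops matching amide nitrogens.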
# $Id: BaseFeatures.fdef 662 2008-05-14 20:22:44Z glandrum $
#
# RDKit base fdef file.
# Created by Greg Landrum
# changes by M.Kossner:
# ChalcAcceptor in line 27: removed a v2 in order to include e.g. carboxylate O- as an acceptor
# NDonor in line 13: added !{AmideN} in order to exclude e.g. dimethylamides from NDonor (they won't be protonated!)
# Amidine as PosIonizable feature analogous to Guanidine
# Amidine as Pos ion Feature analogous to Guanidine
AtomType AmideN [$(N-C(=O))]
AtomType SulfonamideN [$([N;H0]S(=O)(=O))]
AtomType NDonor [N&!H0&v3,N&!H0&+1&v4,n&H1&+0]
AtomType NDonor [$([Nv3](-C)(-C)-C);!{AmideN}]
AtomType NDonor [$(n[n;H1]),$(nc[n;H1])]
AtomType AcidicHydroxyl [$([O]C(=[O,S,P]))]
AtomType ChalcDonor [O,S;H1;+0]
DefineFeature SingleAtomDonor [{NDonor},{ChalcDonor};!{AcidicHydroxyl}]
Family Donor
Weights 1
EndFeature
# aromatic N, but not indole or pyrrole or fusing two rings
AtomType NAcceptor [n;+0;!X3;!$([n;H1](cc)cc)]
AtomType NAcceptor [$([N;H0]#[C&v4])]
# tertiary nitrogen adjacent to aromatic carbon
AtomType NAcceptor [N&v3;H0;$(Nc)]
# removes thioether and nitro oxygen
AtomType ChalcAcceptor [O;H0;!$(O=N-*)]
Atomtype ChalcAcceptor [O;-;!$(*-N=O)]
# Removed aromatic sulfur from ChalcAcceptor definition
Atomtype ChalcAcceptor [o;+0]
# Hydroxyls and acids
AtomType Hydroxyl [O;H1;v2]
# F is an acceptor so long as the C has no other halogen neighbors. This is maybe
# a bit too general, but the idea is to eliminate things like CF3
AtomType HalogenAcceptor [F;$(F-[#6]);!$(FC[F,Cl,Br,I])]
DefineFeature SingleAtomAcceptor [{Hydroxyl},{ChalcAcceptor},{NAcceptor},{HalogenAcceptor}]
Family Acceptor
Weights 1
EndFeature
# this one is delightfully easy:
DefineFeature AcidicGroup [C,S](=[O,S,P])-[O;H1,H0&-1]
Family NegIonizable
Weights 1.0,1.0,1.0
EndFeature
AtomType Carbon_NotDouble [C;!$(C=*)]
AtomType BasicNH2 [$([N;H2&+0][{Carbon_NotDouble}])]
AtomType BasicNH1 [$([N;H1&+0]([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
AtomType PosNH3 [$([N;H3&+1][{Carbon_NotDouble}])]
AtomType PosNH2 [$([N;H2&+1]([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
AtomType PosNH1 [$([N;H1&+1]([{Carbon_NotDouble}])([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
AtomType BasicNH0 [$([N;H0&+0]([{Carbon_NotDouble}])([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
AtomType QuatN [$([N;H0&+1]([{Carbon_NotDouble}])([{Carbon_NotDouble}])([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
DefineFeature BasicGroup [{BasicNH2},{BasicNH1},{BasicNH0};!$(N[a])]
Family PosIonizable
Weights 1.0
EndFeature
# 14.11.2007 (GL): add !$([N+]-[O-]) constraint so we don't match
# nitro (or similar) groups
DefineFeature PosN [#7;+;!$([N+]-[O-])]
Family PosIonizable
Weights 1.0
EndFeature
# imidazole group can be positively charged (too promiscuous?)
DefineFeature Imidazole c1ncnc1
Family PosIonizable
Weights 1.0,1.0,1.0,1.0,1.0
EndFeature
# guanidine group is positively charged (too promiscuous?)
DefineFeature Guanidine NC(=N)N
Family PosIonizable
Weights 1.0,1.0,1.0,1.0
EndFeature
# amidine group is positively charged (too promiscuous?)
DefineFeature Amidine NC(=N)
Family PosIonizable
Weights 1.0,1.0,1.0
EndFeature
# the LigZn binder features were adapted from combichem.fdl
DefineFeature ZnBinder1 [S;D1]-[#6]
Family ZnBinder
Weights 1,0
EndFeature
DefineFeature ZnBinder2 [#6]-C(=O)-C-[S;D1]
Family ZnBinder
Weights 0,0,1,0,1
EndFeature
DefineFeature ZnBinder3 [#6]-C(=O)-C-C-[S;D1]
Family ZnBinder
Weights 0,0,1,0,0,1
EndFeature
DefineFeature ZnBinder4 [#6]-C(=O)-N-[O;D1]
Family ZnBinder
Weights 0,0,1,0,1
EndFeature
DefineFeature ZnBinder5 [#6]-C(=O)-[O;D1]
Family ZnBinder
Weights 0,0,1,1
EndFeature
DefineFeature ZnBinder6 [#6]-P(=O)(-O)-[C,O,N]-[C,H]
Family ZnBinder
Weights 0,0,1,1,0,0
EndFeature
# aromatic rings of various sizes:
#
# Note that with the aromatics, it's important to include the ring-size queries along with
# the aromaticity query for two reasons:
# 1) Much of the current feature-location code assumes that the feature point is
# equidistant from the atoms defining it. Larger definitions like: a1aaaaaaaa1 will actually
# match things like 'o1c2cccc2ccc1', which have an aromatic unit spread across multiple simple
# rings and so don't fit that requirement.
# 2) It's *way* faster.
#
#
# 21.1.2008 (GL): update ring membership tests to reflect corrected meaning of
# "r" in SMARTS parser
#
AtomType AromR4 [a;r4,!R1&r3]
DefineFeature Arom4 [{AromR4}]1:[{AromR4}]:[{AromR4}]:[{AromR4}]:1
Family Aromatic
Weights 1.0,1.0,1.0,1.0
EndFeature
AtomType AromR5 [a;r5,!R1&r4,!R1&r3]
DefineFeature Arom5 [{AromR5}]1:[{AromR5}]:[{AromR5}]:[{AromR5}]:[{AromR5}]:1
Family Aromatic
Weights 1.0,1.0,1.0,1.0,1.0
EndFeature
AtomType AromR6 [a;r6,!R1&r5,!R1&r4,!R1&r3]
DefineFeature Arom6 [{AromR6}]1:[{AromR6}]:[{AromR6}]:[{AromR6}]:[{AromR6}]:[{AromR6}]:1
Family Aromatic
Weights 1.0,1.0,1.0,1.0,1.0,1.0
EndFeature
AtomType AromR7 [a;r7,!R1&r6,!R1&r5,!R1&r4,!R1&r3]
DefineFeature Arom7 [{AromR7}]1:[{AromR7}]:[{AromR7}]:[{AromR7}]:[{AromR7}]:[{AromR7}]:[{AromR7}]:1
Family Aromatic
Weights 1.0,1.0,1.0,1.0,1.0,1.0,1.0
EndFeature
AtomType AromR8 [a;r8,!R1&r7,!R1&r6,!R1&r5,!R1&r4,!R1&r3]
DefineFeature Arom8 [{AromR8}]1:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:1
Family Aromatic
Weights 1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
EndFeature
# hydrophobic features
# any carbon that is not bonded to a polar atom is considered a hydrophobe
#
# 23.11.2007 (GL): match any bond (not just single bonds); add #6 at
# beginning to make it more efficient
AtomType Carbon_Polar [#6;$([#6]~[#7,#8,#9])]
# 23.11.2007 (GL): don't match charged carbon
AtomType Carbon_NonPolar [#6;+0;!{Carbon_Polar}]
DefineFeature ThreeWayAttach [D3,D4;{Carbon_NonPolar}]
Family Hydrophobe
Weights 1.0
EndFeature
DefineFeature ChainTwoWayAttach [R0;D2;{Carbon_NonPolar}]
Family Hydrophobe
Weights 1.0
EndFeature
# hydrophobic atom
AtomType Hphobe [c,s,S&H0&v2,Br,I,{Carbon_NonPolar}]
AtomType RingHphobe [R;{Hphobe}]
# nitro groups in the RD code are always: *-[N+](=O)[O-]
DefineFeature Nitro2 [N;D3;+](=O)[O-]
Family LumpedHydrophobe
Weights 1.0,1.0,1.0
EndFeature
#
# 21.1.2008 (GL): update ring membership tests to reflect corrected meaning of
# "r" in SMARTS parser
#
AtomType Ring6 [r6,!R1&r5,!R1&r4,!R1&r3]
DefineFeature RH6_6 [{Ring6};{RingHphobe}]1[{Ring6};{RingHphobe}][{Ring6};{RingHphobe}][{Ring6};{RingHphobe}][{Ring6};{RingHphobe}][{Ring6};{RingHphobe}]1
Family LumpedHydrophobe
Weights 1.0,1.0,1.0,1.0,1.0,1.0
EndFeature
AtomType Ring5 [r5,!R1&r4,!R1&r3]
DefineFeature RH5_5 [{Ring5};{RingHphobe}]1[{Ring5};{RingHphobe}][{Ring5};{RingHphobe}][{Ring5};{RingHphobe}][{Ring5};{RingHphobe}]1
Family LumpedHydrophobe
Weights 1.0,1.0,1.0,1.0,1.0
EndFeature
AtomType Ring4 [r4,!R1&r3]
DefineFeature RH4_4 [{Ring4};{RingHphobe}]1[{Ring4};{RingHphobe}][{Ring4};{RingHphobe}][{Ring4};{RingHphobe}]1
Family LumpedHydrophobe
Weights 1.0,1.0,1.0,1.0
EndFeature
AtomType Ring3 [r3]
DefineFeature RH3_3
[{Ring3};{RingHphobe}]1[{Ring3};{RingHphobe}][{Ring3};{RingHphobe}]1
Family LumpedHydrophobe
Weights 1.0,1.0,1.0
EndFeature
#DefineFeature tButyl [C;!R](-[CH3])(-[CH3])-[CH3]
# Family LumpedHydrophobe
# Weights 1.0,0.0,0.0,0.0
#EndFeature
#DefineFeature iPropyl [CH;!R](-[CH3])-[CH3]
# Family LumpedHydrophobe
# Weights 1.0,1.0,1.0
#EndFeature