Markus,

Thank you very much for the modified definitions file.  It will certainly go
to good use here.

-Kirk



On Thu, Oct 16, 2008 at 1:49 AM, markus <[email protected]> wrote:

>
>
>> Message: 6
>> Date: Wed, 15 Oct 2008 14:07:58 -0600
>> From: "Robert DeLisle" <[email protected]>
>> Subject: Re: [Rdkit-discuss] H-bond Acceptor problem
>> To: "Hans Purkey" <[email protected]>
>> Cc: RD-Kit <[email protected]>,       Greg Landrum
>>        <[email protected]>
>> Message-ID:
>>        <[email protected]>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Good point, Hans.
>>
>> I see that within the available descriptors there are NHOHCount and
>> NOCount, which I assume are equivalent to Lipinski's donors and acceptors.
>> There are also NumHAcceptors and NumHDonors, which I would expect to
>> differ from the Lipinski versions in some way.
>>
>> -Kirk
>>
>>
>>
>>
>> On Wed, Oct 15, 2008 at 1:19 PM, Hans Purkey <[email protected]>
>> wrote:
>>
>>
>>
>>> If the intention is to follow Lipinski's definition of H-bond acceptors,
>>> then it should be a simple N+O count (look back at the original paper;
>>> that is how he defined it "for simplicity").
>>>
>>> However, if the descriptor is intended to match a more
>>> intuitive/realistic definition of HBA, then N-H shouldn't be a part of it.
>>>
>>> Hans
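
For illustration only, the "simple N+O count" can be sketched in plain Python without any toolkit. This is a rough sketch, not RDKit's NOCount implementation: the function name `count_n_and_o` and the regular expressions are assumptions made here, and the tokenizer only looks at atom symbols in a SMILES string.

```python
import re

# One token per atom: a bracket atom, a two-letter halogen, or a one-letter
# organic-subset atom (upper case aliphatic, lower case aromatic).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Inside brackets: optional isotope digits, then the element symbol.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|se|as|[a-z])")


def count_n_and_o(smiles):
    """Count nitrogen and oxygen atoms named in a SMILES string."""
    count = 0
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_SYMBOL.match(token)
            symbol = match.group(1) if match else ""
        else:
            symbol = token
        # 'n' and 'o' are the aromatic forms; 'Na', 'Os', etc. do not match.
        if symbol in ("N", "O", "n", "o"):
            count += 1
    return count


print(count_n_and_o("CNC(=O)c1c[nH]cc1"))  # 3: the amide N, the carbonyl O, the ring n
```

For the pyrrole amide used later in this thread, the plain count is 3, which happens to equal the NumHAcceptors value reported there.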
>>>
>>>
>>> On Oct 15, 2008, at 11:50 AM, Greg Landrum wrote:
>>>
>>>> [heh, worse than sending a message without an attachment is hitting
>>>> send before the message is done and sending a message without text...
>>>> sorry]
>>>>
>>>> On Wed, Oct 15, 2008 at 7:59 PM, Robert DeLisle <[email protected]>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>> As you know, I've been working with descriptors in RDKit, and I think
>>>>> I've found a bug in the calculation of H-bond acceptors.  Attached is an
>>>>> example structure, N-methyl-1H-indole-6-carboxamide.  When I calculate
>>>>> NumHAcceptors for this structure, I get 3.  I've looked at numerous other
>>>>> structures, and it seems that nitrogens are always counted.  I went into
>>>>> the code and found the definitions used for HAcceptors:
>>>>>
>>>>>
>>>>>
>>>> Here's a simpler case showing the same behavior:
>>>> [15] >>> m2 = Chem.MolFromSmiles('CNC(=O)c1c[nH]cc1')
>>>>
>>>> [16] >>> Lipinski.NumHAcceptors(m2)
>>>> Out[16]: 3
>>>>
>>>> so that confirms the wrong count
>>>>
>>>>
>>>>
>>>>
>>>>> $([O,S;H1;v2]-[!$(*=[O,N,P,S])])
>>>>> $([O,S;H0;v2])
>>>>> $([O,S;-])
>>>>> $([N&v3;H1,H2]-[!$(*=[O,N,P,S])])
>>>>> $([N;v3;H0])
>>>>> $([n,o,s;+0])
>>>>> F
>>>>>
>>>>> Unless I'm misinterpreting the SMARTS (a very good possibility), both NH
>>>>> groups are being counted as acceptors because they match
>>>>> $([N&v3;H1,H2]-[!$(*=[O,N,P,S])]), but shouldn't the amide NH be
>>>>> excluded according to this same definition?
>>>>>
>>>>>
>>>>>
>>>> [20] >>> m2.GetSubstructMatches(Chem.MolFromSmarts('[$([N&v3;H1,H2]-[!$(*=[O,N,P,S])])]'))
>>>> Out[20]: ((1,),)
>>>>
>>>> That matches only one nitrogen: the amide nitrogen. The aromatic N
>>>> matches the second-to-last definition:
>>>> [29] >>> m2.GetSubstructMatches(Chem.MolFromSmarts('[$([n,o,s;+0])]'))
>>>> Out[29]: ((6,),)
>>>>
>>>> The problem is that the first definition matches an N that is single
>>>> bonded to an atom that isn't doubly bonded to O,N,P, or S. It does not
>>>> exclude Ns that are single bonded to an atom that is doubly bonded to
>>>> O,N,P, or S. So your amide with a secondary N matches. The problem
>>>> isn't the matcher, it's the definition.
>>>>
>>>> Is that clear?
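
The distinction above (some neighbor qualifies versus every neighbor qualifies) can be shown with a toy model in plain Python. This is only a sketch under stated assumptions: the bond table is written by hand for CNC(=O)c1c[nH]cc1 using the atom indices from the session above, the function names are hypothetical, and the valence and hydrogen-count parts of the SMARTS ([N&v3;H1,H2]) are ignored. It does not use RDKit's matcher.

```python
# Toy model only (not RDKit): hand-written bond table for CNC(=O)c1c[nH]cc1.
# Atom indices follow SMILES order: 0 C, 1 N, 2 C, 3 O, 4 c, 5 c, 6 n, 7 c, 8 c.
ELEMENTS = ["C", "N", "C", "O", "C", "C", "N", "C", "C"]
BONDS = [
    (0, 1, "single"), (1, 2, "single"), (2, 3, "double"), (2, 4, "single"),
    (4, 5, "aromatic"), (5, 6, "aromatic"), (6, 7, "aromatic"),
    (7, 8, "aromatic"), (8, 4, "aromatic"),
]
HETERO = {"O", "N", "P", "S"}


def neighbors(atom, bond_type):
    """Atoms joined to `atom` by a bond of the given type."""
    found = []
    for a, b, kind in BONDS:
        if kind == bond_type and atom in (a, b):
            found.append(b if a == atom else a)
    return found


def double_bonded_to_hetero(atom):
    """True if `atom` has a double bond to O, N, P or S."""
    return any(ELEMENTS[n] in HETERO for n in neighbors(atom, "double"))


def matches_as_written(n_atom):
    """Existential reading: SOME single-bonded neighbor lacks a =O/N/P/S."""
    return any(not double_bonded_to_hetero(x) for x in neighbors(n_atom, "single"))


def matches_stricter(n_atom):
    """Universal reading: EVERY single-bonded neighbor lacks a =O/N/P/S."""
    return all(not double_bonded_to_hetero(x) for x in neighbors(n_atom, "single"))


AMIDE_N = 1
print(matches_as_written(AMIDE_N))  # True: the methyl carbon (atom 0) satisfies it
print(matches_stricter(AMIDE_N))    # False: the carbonyl carbon (atom 2) is C=O
```

In SMARTS terms, one way to express the stricter reading might be [N&v3;H1,H2;!$(N-*=[O,N,P,S])]; that pattern is a suggestion made here, not necessarily the fix that went into RDKit.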
>>>>
>>>> I agree that this is a bug in the definition and will fix it. Would
>>>> you mind entering the bug at sf.net or should I do it?
>>>>
>>>> -greg
>>>>
>>>>
>>>> -------------------------------------------------------------------------
>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>>>> challenge
>>>> Build the coolest Linux based applications with Moblin SDK & win great
>>>> prizes
>>>> Grand prize is a trip for two to an Open Source event anywhere in the
>>>> world
>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>>>>
>>>>
>>>>
>>
>> End of Rdkit-discuss Digest, Vol 13, Issue 3
>> ********************************************
>>
>>
> Hi there,
> some time ago I had a similar concern about some features in
> BaseFeatures.fdef.
> I took this to be somewhat a matter of personal view and simply
> changed the definitions to suit my needs.
> The three changes worth noting were:
> the amide (as discussed above),
> the amidine group, which I added as positive ionizable (does it
> interfere with the guanidine, Greg??),
> and a carboxylic acid oxygen, which I additionally define as an acceptor
> (I presume it to be deprotonated).
> I attached the customized file.
> Maybe this helps,
> Markus
>
>
>
> # $Id: BaseFeatures.fdef 662 2008-05-14 20:22:44Z glandrum $
> #
> # RDKit base fdef file.
> # Created by Greg Landrum
> # changes by M.Kossner:
> # ChalcAcceptor in line 27: removed a v2; in order to include e.g.
> # carboxylate O- as Acceptor
> # NDonor in line 13 !{AmideN} in order to exclude e.g. dimethylamides from
> # NDonor (they won't be protonated!)
> # Amidine as Pos ion Feature analogous to Guanidine
>
> AtomType AmideN [$(N-C(=O))]
> AtomType SulfonamideN [$([N;H0]S(=O)(=O))]
> AtomType NDonor [N&!H0&v3,N&!H0&+1&v4,n&H1&+0]
> AtomType NDonor [$([Nv3](-C)(-C)-C);!{AmideN}]
> AtomType NDonor [$(n[n;H1]),$(nc[n;H1])]
>
> AtomType AcidicHydroxyl [$([O]C(=[O,S,P]))]
> AtomType ChalcDonor [O,S;H1;+0]
> DefineFeature SingleAtomDonor [{NDonor},{ChalcDonor};!{AcidicHydroxyl}]
>  Family Donor
>  Weights 1
> EndFeature
>
> # aromatic N, but not indole or pyrrole or fusing two rings
> AtomType NAcceptor [n;+0;!X3;!$([n;H1](cc)cc)]
> AtomType NAcceptor [$([N;H0]#[C&v4])]
> # tertiary nitrogen adjacent to aromatic carbon
> AtomType NAcceptor [N&v3;H0;$(Nc)]
>
> # removes thioether and nitro oxygen
> AtomType ChalcAcceptor [O;H0;!$(O=N-*)]
> Atomtype ChalcAcceptor [O;-;!$(*-N=O)]
>
> # Removed aromatic sulfur from ChalcAcceptor definition
> Atomtype ChalcAcceptor [o;+0]
>
> # Hydroxyls and acids
> AtomType Hydroxyl [O;H1;v2]
>
> # F is an acceptor so long as the C has no other halogen neighbors. This is
> # maybe a bit too general, but the idea is to eliminate things like CF3
> AtomType HalogenAcceptor [F;$(F-[#6]);!$(FC[F,Cl,Br,I])]
>
> DefineFeature SingleAtomAcceptor [{Hydroxyl},{ChalcAcceptor},{NAcceptor},{HalogenAcceptor}]
>  Family Acceptor
>  Weights 1
> EndFeature
>
> # this one is delightfully easy:
> DefineFeature AcidicGroup [C,S](=[O,S,P])-[O;H1,H0&-1]
>  Family NegIonizable
>  Weights 1.0,1.0,1.0
> EndFeature
>
> AtomType Carbon_NotDouble [C;!$(C=*)]
> AtomType BasicNH2 [$([N;H2&+0][{Carbon_NotDouble}])]
> AtomType BasicNH1 [$([N;H1&+0]([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
> AtomType PosNH3 [$([N;H3&+1][{Carbon_NotDouble}])]
> AtomType PosNH2 [$([N;H2&+1]([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
> AtomType PosNH1 [$([N;H1&+1]([{Carbon_NotDouble}])([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
> AtomType BasicNH0 [$([N;H0&+0]([{Carbon_NotDouble}])([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
> AtomType QuatN [$([N;H0&+1]([{Carbon_NotDouble}])([{Carbon_NotDouble}])([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
>
>
> DefineFeature BasicGroup [{BasicNH2},{BasicNH1},{BasicNH0};!$(N[a])]
>  Family PosIonizable
>  Weights 1.0
> EndFeature
>
> # 14.11.2007 (GL): add !$([N+]-[O-]) constraint so we don't match
> # nitro (or similar) groups
> DefineFeature PosN [#7;+;!$([N+]-[O-])]
>  Family PosIonizable
>  Weights 1.0
> EndFeature
>
> # imidazole group can be positively charged (too promiscuous?)
> DefineFeature Imidazole c1ncnc1
>  Family PosIonizable
>  Weights 1.0,1.0,1.0,1.0,1.0
> EndFeature
>
> # guanidine group is positively charged (too promiscuous?)
> DefineFeature Guanidine NC(=N)N
>  Family PosIonizable
>  Weights 1.0,1.0,1.0,1.0
> EndFeature
>
> # amidine group is positively charged (too promiscuous?)
> DefineFeature Amidine NC(=N)
>  Family PosIonizable
>  Weights 1.0,1.0,1.0
> EndFeature
>
> # the LigZn binder features were adapted from combichem.fdl
> DefineFeature ZnBinder1 [S;D1]-[#6]
>  Family ZnBinder
>  Weights 1,0
> EndFeature
> DefineFeature ZnBinder2 [#6]-C(=O)-C-[S;D1]
>  Family ZnBinder
>  Weights 0,0,1,0,1
> EndFeature
> DefineFeature ZnBinder3 [#6]-C(=O)-C-C-[S;D1]
>  Family ZnBinder
>  Weights 0,0,1,0,0,1
> EndFeature
>
> DefineFeature ZnBinder4 [#6]-C(=O)-N-[O;D1]
>  Family ZnBinder
>  Weights 0,0,1,0,1
> EndFeature
> DefineFeature ZnBinder5 [#6]-C(=O)-[O;D1]
>  Family ZnBinder
>  Weights 0,0,1,1
> EndFeature
> DefineFeature ZnBinder6 [#6]-P(=O)(-O)-[C,O,N]-[C,H]
>  Family ZnBinder
>  Weights 0,0,1,1,0,0
> EndFeature
>
>
> # aromatic rings of various sizes:
> #
> # Note that with the aromatics, it's important to include the ring-size
> # queries along with the aromaticity query for two reasons:
> #   1) Much of the current feature-location code assumes that the feature
> #      point is equidistant from the atoms defining it. Larger definitions
> #      like: a1aaaaaaaa1 will actually match things like 'o1c2cccc2ccc1',
> #      which have an aromatic unit spread across multiple simple rings and
> #      so don't fit that requirement.
> #   2) It's *way* faster.
> #
>
> #
> # 21.1.2008 (GL): update ring membership tests to reflect corrected meaning
> # of "r" in SMARTS parser
> #
> AtomType AromR4 [a;r4,!R1&r3]
> DefineFeature Arom4 [{AromR4}]1:[{AromR4}]:[{AromR4}]:[{AromR4}]:1
>  Family Aromatic
>  Weights 1.0,1.0,1.0,1.0
> EndFeature
> AtomType AromR5 [a;r5,!R1&r4,!R1&r3]
> DefineFeature Arom5 [{AromR5}]1:[{AromR5}]:[{AromR5}]:[{AromR5}]:[{AromR5}]:1
>  Family Aromatic
>  Weights 1.0,1.0,1.0,1.0,1.0
> EndFeature
> AtomType AromR6 [a;r6,!R1&r5,!R1&r4,!R1&r3]
> DefineFeature Arom6 [{AromR6}]1:[{AromR6}]:[{AromR6}]:[{AromR6}]:[{AromR6}]:[{AromR6}]:1
>  Family Aromatic
>  Weights 1.0,1.0,1.0,1.0,1.0,1.0
> EndFeature
> AtomType AromR7 [a;r7,!R1&r6,!R1&r5,!R1&r4,!R1&r3]
> DefineFeature Arom7 [{AromR7}]1:[{AromR7}]:[{AromR7}]:[{AromR7}]:[{AromR7}]:[{AromR7}]:[{AromR7}]:1
>  Family Aromatic
>  Weights 1.0,1.0,1.0,1.0,1.0,1.0,1.0
> EndFeature
> AtomType AromR8 [a;r8,!R1&r7,!R1&r6,!R1&r5,!R1&r4,!R1&r3]
> DefineFeature Arom8 [{AromR8}]1:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:1
>  Family Aromatic
>  Weights 1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
> EndFeature
>
> # hydrophobic features
> # any carbon that is not bonded to a polar atom is considered a hydrophobe
> #
> # 23.11.2007 (GL): match any bond (not just single bonds); add #6 at
> #  beginning to make it more efficient
> AtomType Carbon_Polar [#6;$([#6]~[#7,#8,#9])]
> # 23.11.2007 (GL): don't match charged carbon
> AtomType Carbon_NonPolar [#6;+0;!{Carbon_Polar}]
>
> DefineFeature ThreeWayAttach [D3,D4;{Carbon_NonPolar}]
>  Family Hydrophobe
>  Weights 1.0
> EndFeature
>
> DefineFeature ChainTwoWayAttach [R0;D2;{Carbon_NonPolar}]
>  Family Hydrophobe
>  Weights 1.0
> EndFeature
>
> # hydrophobic atom
> AtomType Hphobe [c,s,S&H0&v2,Br,I,{Carbon_NonPolar}]
> AtomType RingHphobe [R;{Hphobe}]
>
> # nitro groups in the RD code are always: *-[N+](=O)[O-]
> DefineFeature Nitro2 [N;D3;+](=O)[O-]
>  Family LumpedHydrophobe
>  Weights 1.0,1.0,1.0
> EndFeature
>
> #
> # 21.1.2008 (GL): update ring membership tests to reflect corrected meaning
> # of "r" in SMARTS parser
> #
> AtomType Ring6 [r6,!R1&r5,!R1&r4,!R1&r3]
> DefineFeature RH6_6 [{Ring6};{RingHphobe}]1[{Ring6};{RingHphobe}][{Ring6};{RingHphobe}][{Ring6};{RingHphobe}][{Ring6};{RingHphobe}][{Ring6};{RingHphobe}]1
>  Family LumpedHydrophobe
>  Weights 1.0,1.0,1.0,1.0,1.0,1.0
> EndFeature
>
> AtomType Ring5 [r5,!R1&r4,!R1&r3]
> DefineFeature RH5_5 [{Ring5};{RingHphobe}]1[{Ring5};{RingHphobe}][{Ring5};{RingHphobe}][{Ring5};{RingHphobe}][{Ring5};{RingHphobe}]1
>  Family LumpedHydrophobe
>  Weights 1.0,1.0,1.0,1.0,1.0
> EndFeature
>
> AtomType Ring4 [r4,!R1&r3]
> DefineFeature RH4_4 [{Ring4};{RingHphobe}]1[{Ring4};{RingHphobe}][{Ring4};{RingHphobe}][{Ring4};{RingHphobe}]1
>  Family LumpedHydrophobe
>  Weights 1.0,1.0,1.0,1.0
> EndFeature
>
> AtomType Ring3 [r3]
> DefineFeature RH3_3 [{Ring3};{RingHphobe}]1[{Ring3};{RingHphobe}][{Ring3};{RingHphobe}]1
>  Family LumpedHydrophobe
>  Weights 1.0,1.0,1.0
> EndFeature
>
> #DefineFeature tButyl [C;!R](-[CH3])(-[CH3])-[CH3]
> #  Family LumpedHydrophobe
> #  Weights 1.0,0.0,0.0,0.0
> #EndFeature
>
> #DefineFeature iPropyl [CH;!R](-[CH3])-[CH3]
> #  Family LumpedHydrophobe
> #  Weights 1.0,1.0,1.0
> #EndFeature
>
>