Markus,

Thank you very much for the modified definitions file. It will certainly go to good use here.
-Kirk

On Thu, Oct 16, 2008 at 1:49 AM, markus <[email protected]> wrote:

>
>> Message: 6
>> Date: Wed, 15 Oct 2008 14:07:58 -0600
>> From: "Robert DeLisle" <[email protected]>
>> Subject: Re: [Rdkit-discuss] H-bond Acceptor problem
>> To: "Hans Purkey" <[email protected]>
>> Cc: RD-Kit <[email protected]>, Greg Landrum
>>     <[email protected]>
>> Message-ID:
>>     <[email protected]>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Good point, Hans.
>>
>> I see that within the available descriptors there are NHOHCount and
>> NOCount, which I assume are equivalent to Lipinski's Donors and
>> Acceptors. Also there are NumHAcceptors and NumHDonors, which I would
>> expect to differentiate themselves from the Lipinski versions in some way.
>>
>> -Kirk
>>
>> On Wed, Oct 15, 2008 at 1:19 PM, Hans Purkey <[email protected]>
>> wrote:
>>
>>> If the intention is to follow Lipinski's definitions of Hbond acceptors,
>>> then it should be a simple N+O count (look back at the original paper
>>> and that is how he defined it "for simplicity").
>>>
>>> However, if the descriptor is intended to match a more
>>> intuitive/realistic definition of HBA, then N-H shouldn't be a part of it.
>>>
>>> Hans
>>>
>>> On Oct 15, 2008, at 11:50 AM, Greg Landrum wrote:
>>>
>>>> [heh, worse than sending a message without an attachment is hitting
>>>> send before the message is done and sending a message without text...
>>>> sorry]
>>>>
>>>> On Wed, Oct 15, 2008 at 7:59 PM, Robert DeLisle <[email protected]>
>>>> wrote:
>>>>
>>>>> As you know, I've been working with descriptors in RDKit, and I think
>>>>> I've found a bug in the calculation of H-bond Acceptors. Attached is an
>>>>> example structure, N-methyl-1H-indole-6-carboxamide. When I calculate
>>>>> NumHAcceptors for this structure, I get 3. I've looked at numerous other
>>>>> structures and it seems that nitrogens are always counted.
>>>>> I went into the code and found the definitions used for HAcceptors:
>>>>>
>>>> Here's a simpler case showing the same behavior:
>>>> [15] >>> m2 = Chem.MolFromSmiles('CNC(=O)c1c[nH]cc1')
>>>>
>>>> [16] >>> Lipinski.NumHAcceptors(m2)
>>>> Out[16]: 3
>>>>
>>>> so that confirms the wrong count
>>>>
>>>>> $([O,S;H1;v2]-[!$(*=[O,N,P,S])])
>>>>> $([O,S;H0;v2])
>>>>> $([O,S;-])
>>>>> $([N&v3;H1,H2]-[!$(*=[O,N,P,S])])
>>>>> $([N;v3;H0])
>>>>> $([n,o,s;+0])
>>>>> F
>>>>>
>>>>> Unless I'm misinterpreting the SMARTS (a very good possibility), both NH
>>>>> groups are being counted as an acceptor due to matching
>>>>> $([N&v3;H1,H2]-[!$(*=[O,N,P,S])]), but shouldn't the amide NH be
>>>>> excluded according to this same definition?
>>>>>
>>>> [20] >>> m2.GetSubstructMatches(Chem.MolFromSmarts('[$([N&v3;H1,H2]-[!$(*=[O,N,P,S])])]'))
>>>> Out[20]: ((1,),)
>>>>
>>>> Only matches one nitrogen... the amide nitrogen. The aromatic N
>>>> matches the second-to-last definition:
>>>> [29] >>> m2.GetSubstructMatches(Chem.MolFromSmarts('[$([n,o,s;+0])]'))
>>>> Out[29]: ((6,),)
>>>>
>>>> The problem is that the first definition matches an N that is single
>>>> bonded to an atom that isn't doubly bonded to O, N, P, or S. It does not
>>>> exclude Ns that are single bonded to an atom that is doubly bonded to
>>>> O, N, P, or S. So your amide with a secondary N matches. The problem
>>>> isn't the matcher, it's the definition.
>>>>
>>>> Is that clear?
>>>>
>>>> I agree that this is a bug in the definition and will fix it. Would
>>>> you mind entering the bug at sf.net or should I do it?
>>>>
>>>> -greg
>>>>
>>>> -------------------------------------------------------------------------
>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>>>> challenge
>>>> Build the coolest Linux based applications with Moblin SDK & win great
>>>> prizes
>>>> Grand prize is a trip for two to an Open Source event anywhere in the
>>>> world
>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>> End of Rdkit-discuss Digest, Vol 13, Issue 3
>> ********************************************
>>
> Hi there,
> some time ago, I had a similar concern about some Features in
> BaseFeatures.fdef.
> I thought this to be a matter of a somewhat personal view and simply
> changed the definitions to satisfy my needs.
> The three changes worth noting were:
> the amide (as discussed above),
> the amidine group, which I checked in as positive ionizable (does it
> interfere with the guanidine, Greg??),
> and a carboxylic acid oxygen, which I additionally define as Acceptor
> (I presume it to be deprotonated).
> I attached the customized file.
> Maybe this helps,
> Markus
>
> # $Id: BaseFeatures.fdef 662 2008-05-14 20:22:44Z glandrum $
> #
> # RDKit base fdef file.
> # Created by Greg Landrum
> # changes by M.Kossner:
> # ChalcAcceptor in line 27: removed a v2; in order to include e.g. carboxylate O- as Acceptor
> # NDonor in line 13 !{AmideN} in order to exclude e.g. dimethylamides from NDonor (they won't be protonated!)
> # Amidine as Pos ion Feature analogous to Guanidine
>
> AtomType AmideN [$(N-C(=O))]
> AtomType SulfonamideN [$([N;H0]S(=O)(=O))]
> AtomType NDonor [N&!H0&v3,N&!H0&+1&v4,n&H1&+0]
> AtomType NDonor [$([Nv3](-C)(-C)-C);!{AmideN}]
> AtomType NDonor [$(n[n;H1]),$(nc[n;H1])]
>
> AtomType AcidicHydroxyl [$([O]C(=[O,S,P]))]
> AtomType ChalcDonor [O,S;H1;+0]
> DefineFeature SingleAtomDonor [{NDonor},{ChalcDonor};!{AcidicHydroxyl}]
>   Family Donor
>   Weights 1
> EndFeature
>
> # aromatic N, but not indole or pyrrole or fusing two rings
> AtomType NAcceptor [n;+0;!X3;!$([n;H1](cc)cc)]
> AtomType NAcceptor [$([N;H0]#[C&v4])]
> # tertiary nitrogen adjacent to aromatic carbon
> AtomType NAcceptor [N&v3;H0;$(Nc)]
>
> # removes thioether and nitro oxygen
> AtomType ChalcAcceptor [O;H0;!$(O=N-*)]
> Atomtype ChalcAcceptor [O;-;!$(*-N=O)]
>
> # Removed aromatic sulfur from ChalcAcceptor definition
> Atomtype ChalcAcceptor [o;+0]
>
> # Hydroxyls and acids
> AtomType Hydroxyl [O;H1;v2]
>
> # F is an acceptor so long as the C has no other halogen neighbors.
> # This is maybe a bit too general, but the idea is to eliminate things like CF3
> AtomType HalogenAcceptor [F;$(F-[#6]);!$(FC[F,Cl,Br,I])]
>
> DefineFeature SingleAtomAcceptor [{Hydroxyl},{ChalcAcceptor},{NAcceptor},{HalogenAcceptor}]
>   Family Acceptor
>   Weights 1
> EndFeature
>
> # this one is delightfully easy:
> DefineFeature AcidicGroup [C,S](=[O,S,P])-[O;H1,H0&-1]
>   Family NegIonizable
>   Weights 1.0,1.0,1.0
> EndFeature
>
> AtomType Carbon_NotDouble [C;!$(C=*)]
> AtomType BasicNH2 [$([N;H2&+0][{Carbon_NotDouble}])]
> AtomType BasicNH1 [$([N;H1&+0]([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
> AtomType PosNH3 [$([N;H3&+1][{Carbon_NotDouble}])]
> AtomType PosNH2 [$([N;H2&+1]([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
> AtomType PosNH1 [$([N;H1&+1]([{Carbon_NotDouble}])([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
> AtomType BasicNH0 [$([N;H0&+0]([{Carbon_NotDouble}])([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
> AtomType QuatN [$([N;H0&+1]([{Carbon_NotDouble}])([{Carbon_NotDouble}])([{Carbon_NotDouble}])[{Carbon_NotDouble}])]
>
>
> DefineFeature BasicGroup [{BasicNH2},{BasicNH1},{BasicNH0};!$(N[a])]
>   Family PosIonizable
>   Weights 1.0
> EndFeature
>
> # 14.11.2007 (GL): add !$([N+]-[O-]) constraint so we don't match
> # nitro (or similar) groups
> DefineFeature PosN [#7;+;!$([N+]-[O-])]
>   Family PosIonizable
>   Weights 1.0
> EndFeature
>
> # imidazole group can be positively charged (too promiscuous?)
> DefineFeature Imidazole c1ncnc1
>   Family PosIonizable
>   Weights 1.0,1.0,1.0,1.0,1.0
> EndFeature
>
> # guanidine group is positively charged (too promiscuous?)
> DefineFeature Guanidine NC(=N)N
>   Family PosIonizable
>   Weights 1.0,1.0,1.0,1.0
> EndFeature
>
> # amidine group is positively charged (too promiscuous?)
> DefineFeature Amidine NC(=N)
>   Family PosIonizable
>   Weights 1.0,1.0,1.0
> EndFeature
>
> # the LigZn binder features were adapted from combichem.fdl
> DefineFeature ZnBinder1 [S;D1]-[#6]
>   Family ZnBinder
>   Weights 1,0
> EndFeature
> DefineFeature ZnBinder2 [#6]-C(=O)-C-[S;D1]
>   Family ZnBinder
>   Weights 0,0,1,0,1
> EndFeature
> DefineFeature ZnBinder3 [#6]-C(=O)-C-C-[S;D1]
>   Family ZnBinder
>   Weights 0,0,1,0,0,1
> EndFeature
>
> DefineFeature ZnBinder4 [#6]-C(=O)-N-[O;D1]
>   Family ZnBinder
>   Weights 0,0,1,0,1
> EndFeature
> DefineFeature ZnBinder5 [#6]-C(=O)-[O;D1]
>   Family ZnBinder
>   Weights 0,0,1,1
> EndFeature
> DefineFeature ZnBinder6 [#6]-P(=O)(-O)-[C,O,N]-[C,H]
>   Family ZnBinder
>   Weights 0,0,1,1,0,0
> EndFeature
>
>
> # aromatic rings of various sizes:
> #
> # Note that with the aromatics, it's important to include the ring-size queries along with
> # the aromaticity query for two reasons:
> #   1) Much of the current feature-location code assumes that the feature point is
> #      equidistant from the atoms defining it. Larger definitions like: a1aaaaaaaa1 will actually
> #      match things like 'o1c2cccc2ccc1', which have an aromatic unit spread across multiple simple
> #      rings and so don't fit that requirement.
> #   2) It's *way* faster.
> #
>
> #
> # 21.1.2008 (GL): update ring membership tests to reflect corrected meaning of
> # "r" in SMARTS parser
> #
> AtomType AromR4 [a;r4,!R1&r3]
> DefineFeature Arom4 [{AromR4}]1:[{AromR4}]:[{AromR4}]:[{AromR4}]:1
>   Family Aromatic
>   Weights 1.0,1.0,1.0,1.0
> EndFeature
> AtomType AromR5 [a;r5,!R1&r4,!R1&r3]
> DefineFeature Arom5 [{AromR5}]1:[{AromR5}]:[{AromR5}]:[{AromR5}]:[{AromR5}]:1
>   Family Aromatic
>   Weights 1.0,1.0,1.0,1.0,1.0
> EndFeature
> AtomType AromR6 [a;r6,!R1&r5,!R1&r4,!R1&r3]
> DefineFeature Arom6 [{AromR6}]1:[{AromR6}]:[{AromR6}]:[{AromR6}]:[{AromR6}]:[{AromR6}]:1
>   Family Aromatic
>   Weights 1.0,1.0,1.0,1.0,1.0,1.0
> EndFeature
> AtomType AromR7 [a;r7,!R1&r6,!R1&r5,!R1&r4,!R1&r3]
> DefineFeature Arom7 [{AromR7}]1:[{AromR7}]:[{AromR7}]:[{AromR7}]:[{AromR7}]:[{AromR7}]:[{AromR7}]:1
>   Family Aromatic
>   Weights 1.0,1.0,1.0,1.0,1.0,1.0,1.0
> EndFeature
> AtomType AromR8 [a;r8,!R1&r7,!R1&r6,!R1&r5,!R1&r4,!R1&r3]
> DefineFeature Arom8 [{AromR8}]1:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:[{AromR8}]:1
>   Family Aromatic
>   Weights 1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
> EndFeature
>
> # hydrophobic features
> # any carbon that is not bonded to a polar atom is considered a hydrophobe
> #
> # 23.11.2007 (GL): match any bond (not just single bonds); add #6 at
> # beginning to make it more efficient
> AtomType Carbon_Polar [#6;$([#6]~[#7,#8,#9])]
> # 23.11.2007 (GL): don't match charged carbon
> AtomType Carbon_NonPolar [#6;+0;!{Carbon_Polar}]
>
> DefineFeature ThreeWayAttach [D3,D4;{Carbon_NonPolar}]
>   Family Hydrophobe
>   Weights 1.0
> EndFeature
>
> DefineFeature ChainTwoWayAttach [R0;D2;{Carbon_NonPolar}]
>   Family Hydrophobe
>   Weights 1.0
> EndFeature
>
> # hydrophobic atom
> AtomType Hphobe [c,s,S&H0&v2,Br,I,{Carbon_NonPolar}]
> AtomType RingHphobe [R;{Hphobe}]
>
> # nitro groups in the RD code are always: *-[N+](=O)[O-]
> DefineFeature Nitro2 [N;D3;+](=O)[O-]
>   Family LumpedHydrophobe
>   Weights 1.0,1.0,1.0
> EndFeature
>
> #
> # 21.1.2008 (GL): update ring membership tests to reflect corrected meaning of
> # "r" in SMARTS parser
> #
> AtomType Ring6 [r6,!R1&r5,!R1&r4,!R1&r3]
> DefineFeature RH6_6 [{Ring6};{RingHphobe}]1[{Ring6};{RingHphobe}][{Ring6};{RingHphobe}][{Ring6};{RingHphobe}][{Ring6};{RingHphobe}][{Ring6};{RingHphobe}]1
>   Family LumpedHydrophobe
>   Weights 1.0,1.0,1.0,1.0,1.0,1.0
> EndFeature
>
> AtomType Ring5 [r5,!R1&r4,!R1&r3]
> DefineFeature RH5_5 [{Ring5};{RingHphobe}]1[{Ring5};{RingHphobe}][{Ring5};{RingHphobe}][{Ring5};{RingHphobe}][{Ring5};{RingHphobe}]1
>   Family LumpedHydrophobe
>   Weights 1.0,1.0,1.0,1.0,1.0
> EndFeature
>
> AtomType Ring4 [r4,!R1&r3]
> DefineFeature RH4_4 [{Ring4};{RingHphobe}]1[{Ring4};{RingHphobe}][{Ring4};{RingHphobe}][{Ring4};{RingHphobe}]1
>   Family LumpedHydrophobe
>   Weights 1.0,1.0,1.0,1.0
> EndFeature
>
> AtomType Ring3 [r3]
> DefineFeature RH3_3 [{Ring3};{RingHphobe}]1[{Ring3};{RingHphobe}][{Ring3};{RingHphobe}]1
>   Family LumpedHydrophobe
>   Weights 1.0,1.0,1.0
> EndFeature
>
> #DefineFeature tButyl [C;!R](-[CH3])(-[CH3])-[CH3]
> #  Family LumpedHydrophobe
> #  Weights 1.0,0.0,0.0,0.0
> #EndFeature
>
> #DefineFeature iPropyl [CH;!R](-[CH3])-[CH3]
> #  Family LumpedHydrophobe
> #  Weights 1.0,1.0,1.0
> #EndFeature
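
For anyone who wants to reproduce the SMARTS behaviour Greg describes in the quoted thread, here is a minimal sketch, assuming a working RDKit Python install. The variable names and the `stricter` pattern are my own illustrative assumptions; the stricter pattern is not the fix that went into RDKit, only one possible way to reject an N-H nitrogen that has any neighbour doubly bonded to O, N, P, or S.

```python
from rdkit import Chem

# Simpler test molecule from the thread (atom 1 is the amide N, atom 6 the ring [nH]).
mol = Chem.MolFromSmiles('CNC(=O)c1c[nH]cc1')

# Acceptor definition quoted in the thread: an N-H nitrogen that has at least
# one single-bonded neighbour which is NOT doubly bonded to O, N, P, or S.
# The methyl carbon satisfies that, so the amide N still matches.
original = Chem.MolFromSmarts('[$([N&v3;H1,H2]-[!$(*=[O,N,P,S])])]')

# Hypothetical stricter variant (an assumption for illustration only): reject
# the nitrogen if ANY single-bonded neighbour is doubly bonded to O, N, P, or S.
stricter = Chem.MolFromSmarts('[N&v3;H1,H2;!$(N-*=[O,N,P,S])]')

# Aromatic heteroatom definition quoted in the thread.
aromatic = Chem.MolFromSmarts('[$([n,o,s;+0])]')

orig_hits = mol.GetSubstructMatches(original)
strict_hits = mol.GetSubstructMatches(stricter)
arom_hits = mol.GetSubstructMatches(aromatic)

print('original definition:', orig_hits)    # ((1,),) as reported in the thread
print('stricter variant:   ', strict_hits)  # () since the amide N is now excluded
print('aromatic definition:', arom_hits)    # ((6,),) as reported in the thread
```

The difference between the two patterns is the position of the negation: the quoted definition only asks for one acceptable neighbour, while the sketch forbids any carbonyl-like neighbour. Whether the pyrrole-type [nH] should also be dropped is a separate question that this sketch does not address.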

