[heh, worse than sending a message without an attachment is hitting send before the message is done and sending a message without text... sorry]
On Wed, Oct 15, 2008 at 7:59 PM, Robert DeLisle <[email protected]> wrote:
>
> As you know, I've been working with descriptors in RDKit, and I think I've
> found a bug in the calculation of H-bond Acceptors. Attached is an example
> structure, N-methyl-1H-indole-6-carboxamide. When I calculate NumHAcceptors
> for this structure, I get 3. I've looked at numerous other structures and it
> seems that nitrogens are always counted. I went into the code and found the
> definitions used for HAcceptors:

Here's a simpler case showing the same behavior:

[15] >>> m2 = Chem.MolFromSmiles('CNC(=O)c1c[nH]cc1')
[16] >>> Lipinski.NumHAcceptors(m2)
Out[16]: 3

So that confirms the wrong count.

>
> $([O,S;H1;v2]-[!$(*=[O,N,P,S])])
> $([O,S;H0;v2])
> $([O,S;-])
> $([N&v3;H1,H2]-[!$(*=[O,N,P,S])])
> $([N;v3;H0])
> $([n,o,s;+0])
> F
>
> Unless I'm misinterpreting the SMARTS (a very good possibility), both NH
> groups are being counted as an acceptor due to matching
> $([N&v3;H1,H2]-[!$(*=[O,N,P,S])]), but shouldn't the amide NH be excluded
> according to this same definition?

[20] >>> m2.GetSubstructMatches(Chem.MolFromSmarts('[$([N&v3;H1,H2]-[!$(*=[O,N,P,S])])]'))
Out[20]: ((1,),)

That only matches one nitrogen: the amide nitrogen. The aromatic N matches
the second-to-last definition:

[29] >>> m2.GetSubstructMatches(Chem.MolFromSmarts('[$([n,o,s;+0])]'))
Out[29]: ((6,),)

The problem is that the first of those two definitions matches an N that is
single bonded to an atom that isn't doubly bonded to O, N, P, or S. It does
not exclude Ns that are also single bonded to an atom that is doubly bonded
to O, N, P, or S. So your amide with a secondary N matches through its
methyl carbon, even though its other neighbor is the carbonyl carbon. The
problem isn't the matcher, it's the definition.

Is that clear?

I agree that this is a bug in the definition and will fix it. Would you mind
entering the bug at sf.net or should I do it?

-greg
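
P.S. In case the "any neighbor" versus "no neighbor" distinction is still fuzzy, here is a toy sketch in plain Python. It does not use RDKit: the molecule encoding and the names make_mol, current_rule, and stricter_rule are made up for illustration, hydrogen counts and valence are ignored, and stricter_rule is just one possible way to tighten the logic, not necessarily the fix that will end up in the code.

```python
# Toy model (NOT RDKit): atoms are element symbols, bonds map an
# index pair to a bond order (1, 2, or 1.5 for aromatic).
HETERO = {"O", "N", "P", "S"}

def make_mol(atoms, bonds):
    """Build a (symbols, adjacency) pair from a bond dictionary."""
    nbrs = {i: [] for i in range(len(atoms))}
    for (a, b), order in bonds.items():
        nbrs[a].append((b, order))
        nbrs[b].append((a, order))
    return atoms, nbrs

def has_double_to_hetero(mol, j):
    """True if atom j is double bonded to O, N, P, or S."""
    atoms, nbrs = mol
    return any(order == 2 and atoms[k] in HETERO for k, order in nbrs[j])

def current_rule(mol, i):
    """Mimics N-[!$(*=[O,N,P,S])]: SOME single-bonded neighbor is 'clean'."""
    _, nbrs = mol
    return any(order == 1 and not has_double_to_hetero(mol, j)
               for j, order in nbrs[i])

def stricter_rule(mol, i):
    """Hypothetical alternative: NO single-bonded neighbor carries =O/N/P/S."""
    _, nbrs = mol
    return not any(order == 1 and has_double_to_hetero(mol, j)
                   for j, order in nbrs[i])

# CNC(=O)c1c[nH]cc1, same atom numbering as in the session above.
amide = make_mol(
    ["C", "N", "C", "O", "c", "c", "n", "c", "c"],
    {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1,
     (4, 5): 1.5, (5, 6): 1.5, (6, 7): 1.5, (7, 8): 1.5, (8, 4): 1.5},
)
# CNC (dimethylamine) as a control with no carbonyl anywhere.
amine = make_mol(["C", "N", "C"], {(0, 1): 1, (1, 2): 1})

print(current_rule(amide, 1), stricter_rule(amide, 1))  # True False
print(current_rule(amine, 1), stricter_rule(amine, 1))  # True True
```

The amide N passes current_rule because of its methyl carbon, which is exactly what Out[20] shows; only a rule phrased as an exclusion over all neighbors rejects it while still accepting the plain amine.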

