I wanted to make one more post on this topic, ask a couple of questions (at the bottom of the post), and give people a few days to comment before I regenerate the regression test data and commit a change for this bug.
On Wed, Oct 15, 2008 at 8:19 PM, Hans Purkey wrote:
> If the intention is to follow Lipinski's definitions of H-bond acceptors,
> then it should be a simple N+O count (look back at the original paper and
> that is how he defined it "for simplicity").

For those who are coming to this late, this is the NOCount() descriptor, which is already present in the RDKit.

> However, if the descriptor is intended to match a more intuitive/realistic
> definition of HBA, then N-H shouldn't be a part of it.

I don't think I agree with this. There are plenty of cases of nitrogens with attached Hs that act as H-bond acceptors (I did a CCD search yesterday to be sure), but that's a side topic.

Back to the main topic: since these descriptors are all defined in a module named "Lipinski", and since this is all qualitative anyway, I'd propose the following change: the existing NumHDonors and NumHAcceptors (with fixes, discussed below) would be renamed to NumHDonorsAlt and NumHAcceptorsAlt, and NOCount and NHOHCount would be aliased to NumHAcceptors and NumHDonors, respectively. I'd then deprecate NOCount and NHOHCount: they will generate warnings when used in the next release and be removed completely in the release after that.

For the purposes of fixing the more complex HAcceptor descriptor, I propose the following SMARTS:

    from rdkit import Chem

    HAcceptorSmarts = Chem.MolFromSmarts('[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),'
                                         '$([O,S;H0;v2]),$([O,S;-]),'
                                         '$([N;v3;!$(N-*=!@[O,N,P,S])]),'
                                         '$([nH0,o,s;+0]),'
                                         '$([F;!$(F-*-F)])]')

There are two changes here: the third line of the SMARTS and the last one. The third line includes trivalent (v3) nitrogens that are not connected to another atom that has a non-ring double bond to O, N, P, or S. The last line includes fluorines that are not connected to another atom that has more than one F attached (this excludes CF3 and CF2 groups).
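A minimal sketch of the proposal, assuming a current RDKit install (`from rdkit import Chem`). The functions are local stand-ins that borrow the names from this post; they are not the code in RDKit's `Lipinski` module. The nitrogen clause of the SMARTS is rebuilt from the description above, `_deprecated_alias` is a hypothetical helper, and only the acceptor side is shown (the donor pair would follow the same pattern):

```python
import warnings

from rdkit import Chem

# Acceptor pattern proposed in the post. The nitrogen clause is rebuilt from
# the prose description: skip N singly bonded to an atom that carries a
# non-ring double bond to O, N, P or S.
HAcceptorSmarts = Chem.MolFromSmarts(
    '[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),'
    '$([O,S;H0;v2]),$([O,S;-]),'
    '$([N;v3;!$(N-*=!@[O,N,P,S])]),'
    '$([nH0,o,s;+0]),'
    '$([F;!$(F-*-F)])]')


def NumHAcceptors(mol):
    """Lipinski's simple definition: count every N and O atom."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() in (7, 8))


def NumHAcceptorsAlt(mol):
    """SMARTS-based count (the existing, more detailed descriptor, renamed)."""
    # The query is a single atom, so each match tuple is one acceptor atom.
    return len(mol.GetSubstructMatches(HAcceptorSmarts))


def _deprecated_alias(func, old_name):
    """Hypothetical helper: warn on use of the old name, then call func."""
    def wrapper(mol):
        warnings.warn(f"{old_name} is deprecated; use {func.__name__} instead",
                      DeprecationWarning, stacklevel=2)
        return func(mol)
    wrapper.__name__ = old_name
    return wrapper


# The old name keeps working (with a warning) for one release.
NOCount = _deprecated_alias(NumHAcceptors, "NOCount")

mol = Chem.MolFromSmiles("CC(=O)NC")  # N-methylacetamide
print(NumHAcceptors(mol), NumHAcceptorsAlt(mol))  # prints: 2 1
```

The example molecule shows the difference the rename would expose: the simple count sees both the amide O and N, while the SMARTS drops the amide nitrogen because its neighbor carries a C=O. Wrapping the old name instead of deleting it lets existing scripts keep running, with a warning, during the deprecation release.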
I realize these are not highly tuned, very detailed definitions like those in the fdef file discussed elsewhere on this thread, but are they acceptable for a qualitative descriptor?

So, the two questions:
1) Should the renaming mentioned above be done? That is, the NumHAcceptors and NumHDonors descriptors would start returning the "official" Lipinski values, and the existing functions would be renamed to NumHAcceptorsAlt and NumHDonorsAlt.
2) Is the above SMARTS reasonable for the more detailed HAcceptor definition?

Thanks for any feedback,
-greg

