I wanted to make one more post on this topic, ask a couple questions
(at the bottom of the post), and give people a few days to comment
before I regenerate the regression test data and commit a change for
this bug.

On Wed, Oct 15, 2008 at 8:19 PM, Hans Purkey <[email protected]> wrote:
> If the intention is to follow Lipinski's definitions of Hbond acceptors,
> then it should be a simple N+O count (look back at the original paper and
> that is how he defined it "for simplicity").

For those who are coming to this late, this is the NOCount()
descriptor, which is already present in the RDKit.

> However, if the descriptor is intended to match a more intuitive/realistic
> definition of HBA, then N-H shouldn't be a part of it.

I don't think I agree with this. There are plenty of cases of
nitrogens with attached Hs that act as H-bond acceptors (I did a CCD
search yesterday to be sure), but that's a side topic.

Back to the main topic: since these descriptors are all defined in a
module named "Lipinski", and since this is all qualitative anyway, I'd
propose the following change:
The existing NumHDonors and NumHAcceptors (with fixes, discussed
below) be renamed to NumHDonorsAlt and NumHAcceptorsAlt and NOCount
and NHOHCount be aliased to NumHAcceptors and NumHDonors. I'd then
deprecate NOCount and NHOHCount (they will generate warnings when used
in the next release and then be completely removed in the release
after that).
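
To make the rename-plus-deprecation plan concrete, here is a minimal
sketch in plain Python. It is not the RDKit code: the function body is a
toy stand-in that counts N and O in a list of element symbols (the real
descriptors take molecule objects), and the helper _deprecated_alias is
a hypothetical name. It only shows the new name doing the work while the
old name warns and forwards.

```python
import warnings


def NumHAcceptors(symbols):
    """Toy stand-in for the simple Lipinski count: number of N and O atoms.

    Assumption: input is a list of element symbols, not an RDKit molecule.
    """
    return sum(1 for s in symbols if s in ("N", "O"))


def _deprecated_alias(old_name, new_func):
    """Hypothetical helper: wrap new_func so that calls via old_name warn."""
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return new_func(*args, **kwargs)
    wrapper.__name__ = old_name
    return wrapper


# The old name keeps working for one release, but warns on every call.
NOCount = _deprecated_alias("NOCount", NumHAcceptors)
```

Removing the old name in the following release then amounts to deleting
the alias line.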

For the purposes of fixing the more complex HAcceptor descriptor I
propose the following SMARTS:

HAcceptorSmarts = Chem.MolFromSmarts('[$([O,S;H1;v2]-[!$(*=[O,N,P,S])]),\
$([O,S;H0;v2]),$([O,S;-]),\
$([N;v3;!$(n-...@[o,N,P,S])]),\
$([nH0,o,s;+0]),\
$([F;!$(F-*-F)])]')

There are two changes here: the third line and the last one.
The third line includes nitrogens that have a total valence of three
(v3) and that are not connected to another atom that has a non-ring
double bond to O, N, P, or S.
The last line includes Fs that are not connected to another atom that
has more than one F attached (to exclude CF3 and CF2).
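
As a toolkit-free illustration of the fluorine rule, the sketch below
applies the same test as [F;!$(F-*-F)] to a hand-built graph. The
function name fluorine_acceptors and the (symbols, bonds) input format
are assumptions made for this example only; bond orders are ignored, so
it is a simplification of what a SMARTS matcher does.

```python
def fluorine_acceptors(symbols, bonds):
    """Return indices of F atoms whose neighbor carries no other F.

    symbols: list of element symbols, one per atom.
    bonds:   list of (i, j) index pairs; bond order is ignored here.
    """
    neighbors = {i: set() for i in range(len(symbols))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    hits = []
    for i, sym in enumerate(symbols):
        if sym != "F":
            continue
        # F-*-F: some atom attached to this F also bears a different F.
        has_sibling_f = any(
            symbols[k] == "F" and k != i
            for j in neighbors[i]
            for k in neighbors[j]
        )
        if not has_sibling_f:
            hits.append(i)
    return hits


# Fluoromethane: the lone F is kept.
print(fluorine_acceptors(["C", "F"], [(0, 1)]))
# Trifluoromethyl carbon: every F has a sibling F, so none are kept.
print(fluorine_acceptors(["C", "F", "F", "F"], [(0, 1), (0, 2), (0, 3)]))
```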

I realize these are not highly tuned, very detailed definitions like
those in the fdef file discussed elsewhere on this thread, but are
they acceptable for a qualitative descriptor?

So, the two questions:
1) Should the renaming mentioned above be done (i.e. the NumHAcceptors
and NumHDonors descriptors start returning the "official" Lipinski
values and the existing functions are renamed to NumHAcceptorsAlt and
NumHDonorsAlt)?
2) Is the above SMARTS reasonable for the more detailed HAcceptor definition?

Thanks for any feedback,
-greg
