Hi Paul,

On Mon, Feb 11, 2013 at 9:15 AM,  <[email protected]> wrote:
>
> how do the MOE and RDKit implementations of VSA descriptors correlate?
>

I've never checked, so I'm afraid I don't know. At the time we
implemented the descriptors, we didn't have access to MOE.

> I was looking into the documentation and only found a fingerprint
> correlation plot. Not sure if there is a correlation plot for the
> descriptors as well - or maybe it was part of my dreams.

Kirk DeLisle did an analysis quite a while ago that looked at the
correlations of the RDKit descriptors with each other. The data and
plots are here:
https://sourceforge.net/p/rdkit/code/2416/tree/trunk/Docs/Analysis/Descriptors/Correlations/

>
> Given that particular compound:
> m2 = Chem.MolFromSmiles('Clc1cc(Oc2ccc(cc2C(NC)C)C)ccc1Cl')
> Exemplary, the output for
> print Descriptors.PEOE_VSA2(m2)
> 0.0
> print Descriptors.PEOE_VSA6(m2)
> 40.8980654091
> print Descriptors.SlogP_VSA6(m2)
> 36.3982024108
>
>
> Switching into MOE gives these values:
> PEOE_VSA+2
> 42.35798
> PEOE_VSA-2
> 0
> PEOE_VSA+6
> 0
> PEOE_VSA-6
> 2.503756
> SlogP_VSA6
> 7.001213
>
>
>
> First question:
> A partial charge calculation is not necessary, or am I wrong?

Partial charges are needed for the PEOE_VSA descriptors, but the RDKit
will generate the charges when you call the function if you haven't
done so already.

> At least, the descriptor values are not different in the above case (data
> not shown).
>
>
> Second question:
> Am I comparing apples with peas, or is the above case just a bad example?

A bit of background before I answer this one:
These descriptors each combine a VSA calculation with another
descriptor value: MolLogP, MolMR, or a partial charge.
When we implemented each of those methods we tried to follow the
original publication as closely as we could. Most of these things
don't have large test sets available to use to validate new
implementations, and we didn't have access to any reference
implementations, so I can't be sure how well we did. At this point
others have compared the RDKit MolLogP and MolMR implementations with
the MOE implementations and reported that they compare quite well
(after some fixing, of course), so I have some confidence in those.
The VSA and Gasteiger-Marsili partial charge implementations, on the
other hand, have never had this treatment. I'm not even sure what
would qualify as a reference implementation of the partial charge
algorithm, and I've never done (nor has anyone else told me about
doing) the experiment with MOE for VSA.

The SLOGP_VSA, SMR_VSA, and PEOE_VSA descriptors combine a binned form
of either MolLogP, MolMR, or the partial charge with a VSA
contribution to get a descriptor value. The RDKit implementation
originally used the published bin values, but at some point along the
way (this was probably 9-10 years ago) we adjusted the bin definitions
for SLOGP_VSA and SMR_VSA a bit (by adding more bins) to better
represent the datasets we cared about at the time. You can see both
the original bin definitions and the RDKit values here:
https://sourceforge.net/p/rdkit/code/2416/tree/trunk/rdkit/Chem/MolSurf.py
at lines 109 and 142.

Because of these changes I definitely would not expect 100% agreement.
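
To make the effect of the bin definitions concrete, here is a minimal
pure-Python sketch of the binning idea described above. It is not the
RDKit (or MOE) code: the function name binned_vsa, the per-atom values,
and both sets of bin edges are made up for illustration. The same atoms
are binned twice, once with a coarse set of edges and once with an
extra edge added.

```python
from bisect import bisect_right


def binned_vsa(prop_values, vsa_contribs, bin_edges):
    """Sum per-atom surface-area contributions into property bins.

    prop_values  -- one property value per atom (e.g. a logP contribution)
    vsa_contribs -- one surface-area contribution per atom
    bin_edges    -- sorted edges; N edges define N + 1 bins
    """
    totals = [0.0] * (len(bin_edges) + 1)
    for prop, vsa in zip(prop_values, vsa_contribs):
        # bisect_right gives the index of the bin the property falls into
        totals[bisect_right(bin_edges, prop)] += vsa
    return totals


# Hypothetical per-atom data, NOT taken from any real molecule.
props = [-0.5, 0.1, 0.3, 0.3, 0.7]
areas = [5.0, 4.0, 6.0, 6.0, 3.0]

# Two hypothetical bin definitions: the second adds one edge at 0.2.
coarse = binned_vsa(props, areas, [0.0, 0.5])
fine = binned_vsa(props, areas, [0.0, 0.2, 0.5])

print(coarse)  # three bins
print(fine)    # four bins; the middle coarse bin is split in two
```

The total surface area is the same in both cases, but the value in any
given bin changes as soon as an edge is added, and every bin after the
new edge gets a different index. Two programs that use different bin
edges will therefore disagree on a descriptor such as SlogP_VSA6 even
if their per-atom values are identical.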

I would also argue that it doesn't really matter. SLOGP, SMR, partial
charges, and possibly VSA are all "primary" descriptors: they have a
more-or-less direct mapping to the real world and are somewhat
interpretable. The XXX_VSA descriptors, on the other hand, are
intended to be used to build predictive models. As such, there's no
need for them to be consistent across programs; the only thing that
really matters is whether or not they are useful in building models.
In my experience at least, the RDKit XXX_VSA descriptor
implementations are certainly useful for that.

Make sense?
-greg
