Hi Paul,

On Mon, Feb 11, 2013 at 9:15 AM, <[email protected]> wrote:
>
> how do the MOE and RDKit implementations of VSA descriptors correlate?
>
I've never checked, so I'm afraid I don't know. At the time we implemented the descriptors, we didn't have access to MOE.

> I was looking into the documentation and only found a fingerprint
> correlation plot. Not sure if there is a correlation plot for the
> descriptors as well - or maybe it was part of my dreams.

Kirk DeLisle did something quite a while ago that looks at the correlations of the RDKit descriptors with each other. The data and plots are here:
https://sourceforge.net/p/rdkit/code/2416/tree/trunk/Docs/Analysis/Descriptors/Correlations/

>
> Given that particular compound:
> m2 = Chem.MolFromSmiles('Clc1cc(Oc2ccc(cc2C(NC)C)C)ccc1Cl')
> Exemplary, the output for
> print Descriptors.PEOE_VSA2(m2)
> 0.0
> print Descriptors.PEOE_VSA6(m2)
> 40.8980654091
> print Descriptors.SlogP_VSA6(m2)
> 36.3982024108
>
>
> Switching into MOE gives these values:
> PEOE_VSA+2
> 42.35798
> PEOE_VSA-2
> 0
> PEOE_VSA+6
> 0
> PEOE_VSA-6
> 2.503756
> SlogP_VSA6
> 7.001213
>
>
>
> First question:
> A partial charge calculation is not necessary, or am I wrong?

Partial charges are needed for the PEOE_VSA descriptors, but the RDKit will generate the charges when you call the function if you haven't done so already.

> At least, the descriptor values are not different in the above case (data
> not shown).
>
>
> Second question:
> Am I comparing apples with peas, or is the above case just a bad example?

A bit of background before I answer this one:
These descriptors each combine a VSA calculation with another descriptor value: MolLogP, MolMR, or a partial charge. When we implemented each of those methods we tried to follow the original publication as closely as we could. Most of these things don't have large test sets available to use to validate new implementations, and we didn't have access to any reference implementations, so I can't be sure how well we did.
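To make "combine a VSA calculation with another descriptor value" a bit more concrete, here is a minimal sketch of the general idea: each atom's surface-area contribution is added to the bin that its property value (logP contribution, MR contribution, or partial charge) falls into. This is only an illustration, not the actual RDKit or MOE code; the function name binned_vsa is hypothetical, and the per-atom values and bin edges are made-up toy numbers rather than anything computed from a real molecule.

```python
from bisect import bisect_right


def binned_vsa(prop_contribs, vsa_contribs, bin_edges):
    """Sum per-atom VSA contributions into bins selected by a per-atom property.

    bin_edges must be sorted ascending; n edges give n + 1 bins.
    """
    if len(prop_contribs) != len(vsa_contribs):
        raise ValueError("need one property value and one VSA value per atom")
    totals = [0.0] * (len(bin_edges) + 1)
    for prop, vsa in zip(prop_contribs, vsa_contribs):
        # bisect_right returns the index of the bin this property value lands in
        totals[bisect_right(bin_edges, prop)] += vsa
    return totals


# Toy per-atom values (assumptions for illustration only)
charges = [-0.35, -0.08, 0.02, 0.12, 0.31]   # stand-in for partial charges
areas = [4.0, 6.0, 6.0, 5.5, 3.5]            # stand-in for per-atom VSA

fine_edges = [-0.30, -0.05, 0.0, 0.05, 0.30]
coarse_edges = [0.0]

fine = binned_vsa(charges, areas, fine_edges)
coarse = binned_vsa(charges, areas, coarse_edges)

print(fine)    # one value per bin, analogous to PEOE_VSA1, PEOE_VSA2, ...
print(coarse)  # same atoms, different bin edges, different descriptor values
```

Note that both results sum to the same total area; only the distribution across bins changes with the bin edges. So two programs that use different bin definitions, or slightly different per-atom property values, can report quite different numbers for the "same" descriptor.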
At this point others have compared the RDKit MolLogP and MolMR implementations with the MOE implementations and reported that they compare quite well (after some fixing, of course), so I have some confidence in those. The VSA and Gasteiger-Marsili partial charge implementations, on the other hand, have never had this treatment. I'm not even sure what would qualify as a reference implementation of the partial charge algorithm, and I've never done (nor has anyone else told me about doing) the comparison with MOE for VSA.

The SLOGP_VSA, SMR_VSA, and PEOE_VSA descriptors combine a binned form of either MolLogP, MolMR, or the partial charge with a VSA contribution to get a descriptor value. The RDKit implementation originally used the published bin values, but at some point along the way (this was probably 9-10 years ago) we changed the bin definitions for SLOGP_VSA and SMR_VSA a bit (by adding more bins) to better represent the datasets we cared about at the time. You can see both the original bin definitions and the RDKit values here:
https://sourceforge.net/p/rdkit/code/2416/tree/trunk/rdkit/Chem/MolSurf.py
at lines 109 and 142.

Because of these changes I definitely would not expect 100% agreement.

I would also argue that it doesn't really matter. SLOGP, SMR, partial charges, and possibly VSA are all "primary" descriptors: they have a more-or-less direct mapping to the real world and are somewhat interpretable. The XXX_VSA descriptors, on the other hand, are intended to be used to build predictive models. As such, there's no need for them to be consistent across programs; the only thing that really matters is whether or not they are useful in building models. In my experience at least, the RDKit XXX_VSA descriptor implementations are certainly useful for that.

Make sense?
-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

