Dear Greg,
Dear RDKitters,
> >
> > how do the MOE and RDKit implementations of VSA descriptors correlate?
> >
>
> I've never checked, so I'm afraid I don't know. At the time we
> implemented the descriptors, we didn't have access to MOE.
I could run a comparison between MOE and RDKit and put the results on
the wiki. However, given your statements below, it might amount to
"Fleissarbeit" (I can't think of the English translation; it means
diligent work without much intellectual challenge) that is of little
value to the community.
> > First question:
> > A partial charge calculation is not necessary, or am I wrong?
>
> Partial charges are needed for the PEOE_VSA descriptors, but the RDKit
> will generate the charges when you call the function if you haven't
> done so already.
What a relief! After building dozens of models, I started to actually look
into the descriptors and feared that all of the models were wrong because
I had not assigned charges beforehand.
> > Second question:
> > Am I comparing apples with peas, or is the above case just a bad
> > example?
>
> A bit of background before I answer this one:
> These descriptors each combine a VSA calculation with another
> descriptor value: MolLogP, MolMR, or a partial charge.
> When we implemented each of those methods we tried to follow the
> original publication as closely as we could. Most of these things
> don't have large test sets available to use to validate new
> implementations, and we didn't have access to any reference
> implementations, so I can't be sure how well we did. At this point
> others have compared the RDKit MolLogP and MolMR implementations with
> the MOE implementation and reported that it compares quite well (after
> some fixing, of course), so I have some confidence in that. The VSA
> and Gasteiger-Marsili partial charge implementations, on the other
> hand, have never had this treatment. I'm not even sure what would
> qualify as a reference implementation of the partial charge algorithm,
> but I've never done (nor has anyone else told me about doing) the
> experiment with MOE for VSA.
>
> The SLOGP_VSA, SMR_VSA, and PEOE_VSA descriptors combine a binned form
> of either MolLogP, MolMR, or the partial charge with a VSA
> contribution to get a descriptor value. The RDKit implementation
> originally used the published bin values, but at some point along the
> way (this was probably 9-10 years ago) we changed the bin definitions
> for SLOGP_VSA and SMR_VSA a bit (by adding more bins) to better
> represent the datasets we cared about at the time. You can see both
> the original bin definitions and the RDKit values here:
>
> https://sourceforge.net/p/rdkit/code/2416/tree/trunk/rdkit/Chem/MolSurf.py
> at lines 109 and 142.
>
> Because of these changes I definitely would not expect 100% agreement.
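
To illustrate the binning described above, here is a minimal Python sketch. It is not the RDKit or MOE code: the function name binned_vsa, the bin edges, and the per-atom numbers are all made up for illustration. It only assumes that each atom's VSA contribution is added to the bin that the atom's property value (logP contribution, MR contribution, or partial charge) falls into.

```python
from bisect import bisect_right


def binned_vsa(prop_values, vsa_contribs, bin_edges):
    """Sum per-atom VSA contributions into bins chosen by a per-atom property.

    bin_edges must be sorted ascending; n edges define n + 1 bins.
    """
    if len(prop_values) != len(vsa_contribs):
        raise ValueError("need one VSA contribution per property value")
    totals = [0.0] * (len(bin_edges) + 1)
    for prop, area in zip(prop_values, vsa_contribs):
        # bisect_right returns the index of the bin that contains prop
        totals[bisect_right(bin_edges, prop)] += area
    return totals


# Hypothetical per-atom values, for illustration only
props = [-0.3, -0.15, 0.05, 0.5]   # e.g. per-atom partial charges
areas = [4.0, 6.0, 5.0, 3.0]       # e.g. per-atom VSA contributions

coarse_edges = [-0.2, 0.0, 0.2]              # stand-in for "published" bins
fine_edges = [-0.2, -0.1, 0.0, 0.1, 0.2]     # same range with more bins

coarse = binned_vsa(props, areas, coarse_edges)
fine = binned_vsa(props, areas, fine_edges)

print(coarse)  # [4.0, 6.0, 5.0, 3.0]
print(fine)    # [4.0, 6.0, 0.0, 5.0, 0.0, 3.0]
```

With the finer edges the same atoms land in different bins, so the individual descriptor values differ even though the summed surface area is identical. That is the kind of disagreement one would expect between two programs that use different bin definitions.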
>
> I would also argue that it doesn't really matter. SLOGP, SMR, partial
> charges, and possibly VSA are all "primary" descriptors: they have a
> more-or-less direct mapping to the real world and are somewhat
> interpretable. The XXX_VSA descriptors, on the other hand, are
> intended to be used to build predictive models. As such, there's no
> need for them to be consistent across programs; the only thing that
> really matters is whether or not they are useful in building models.
> In my experience at least, the RDKit XXX_VSA descriptor
> implementations are certainly useful for that.
>
> Make sense?
Thanks for this comprehensive answer!
What I'm actually looking for are interpretable descriptors, both before
and after building QSAR models. For that purpose, I started reading the
RDKit and MOE documentation and stumbled on the different threshold
settings. Of course, my first thought was "Oops, what's that
difference?", but given your explanation, I feel much more comfortable
now.
If I submit a paper in the near future, reviewers might give me a hard
time, but (1) you never know, maybe Greg is reviewing, and (2) why should
MOE be more "correct" than RDKit?
To sum it up: by reading your email, I learnt a lot about RDKit's
descriptor world & history, and I'm eager to share my experiences.
Once again, Greg, thanks for your elaborate post!
Cheers & Thanks,
Paul
> -greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss