On Fri, Oct 7, 2011 at 12:31 PM, <[email protected]> wrote:
>
> Dear RDKitters,
>
> I'm in the process of training a 3-class decision tree model. I have roughly
> 1500 compounds with an almost equal distribution of the 3 classes.
> <snip>
>
> In all cases, the statistics are really bad: about 50 percent are
> misclassified, e.g.:
> "
> *** Vote Results ***
> misclassified: 580/1180 (%49.15) 580/1180 (%49.15)
>
> average correct confidence: 0.7837
> average incorrect confidence: 0.7528
> "
>
> Interestingly, there is a really small difference between the average
> confidence levels for the correct and the incorrect classifications.
> As far as I understand it, this tells me that the model is really bad,
> which the vote results themselves had already told me.
>
>
> Which parameters are worthwhile to test?

We talked about this at the Knime OSD meeting already, but I think it's worth
repeating for the community: I believe that predicting hERG binding is too
challenging for simple descriptors like the physicochemical descriptors the
RDKit provides or the standard Morgan fingerprint. This is particularly true
if you're trying to build a three-class model (which is much more difficult
than a two-class model).

One suggestion would be to try building a two-class model (either combine two
of your classes or use only classes 0 and 2 in the training) and see if that
helps. Another would be to try different descriptors. You might be able to
get something useful with the FeatMorgan fingerprints (similar to the FCFP
fingerprints).

-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
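To make the quoted "Vote Results" concrete, here is a minimal Python sketch of how a misclassification rate and the average correct/incorrect confidences can be computed for an ensemble of trees. It assumes (this is not taken from the RDKit source) that a prediction's confidence is the fraction of trees voting for the winning class; the function names vote and vote_summary and the toy votes are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def vote(tree_predictions: Sequence[Hashable]) -> Tuple[Hashable, float]:
    """Majority vote over per-tree predictions.

    Returns the winning label and its confidence, taken here (an assumption)
    as the fraction of trees that voted for it.
    """
    counts = Counter(tree_predictions)
    label, n_votes = counts.most_common(1)[0]
    return label, n_votes / len(tree_predictions)


def _mean(values: List[float]) -> float:
    # NaN signals "no predictions in this group" rather than raising.
    return sum(values) / len(values) if values else float("nan")


def vote_summary(
    true_labels: Sequence[Hashable],
    predictions: Sequence[Tuple[Hashable, float]],
) -> Dict[str, float]:
    """Misclassification rate plus mean confidence of right and wrong calls."""
    correct_conf: List[float] = []
    incorrect_conf: List[float] = []
    for truth, (pred, conf) in zip(true_labels, predictions):
        if pred == truth:
            correct_conf.append(conf)
        else:
            incorrect_conf.append(conf)
    total = len(correct_conf) + len(incorrect_conf)
    return {
        "misclassified": len(incorrect_conf),
        "total": total,
        "percent_misclassified": 100.0 * len(incorrect_conf) / total,
        "avg_correct_confidence": _mean(correct_conf),
        "avg_incorrect_confidence": _mean(incorrect_conf),
    }


if __name__ == "__main__":
    # Toy example: four compounds, each voted on by four hypothetical trees.
    truths = [0, 1, 2, 0]
    tree_votes = [[0, 0, 1, 2], [1, 1, 1, 1], [0, 0, 0, 2], [2, 1, 1, 0]]
    preds = [vote(v) for v in tree_votes]
    print(vote_summary(truths, preds))
```

With the quoted numbers, 100 × 580 / 1180 ≈ 49.15 percent. When the two averages are nearly equal, as in the quoted 0.7837 vs 0.7528, the confidence says little about whether an individual prediction is right.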

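Greg's first suggestion, turning the three-class problem into a two-class one, amounts to a relabeling step before training. The sketch below shows both variants he mentions: keeping only classes 0 and 2, or merging two classes via a label mapping. It assumes the labels are the integers 0, 1 and 2; the helper names keep_classes and merge_classes, the placeholder compound IDs, and the particular mapping in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def keep_classes(
    data: Sequence[T], labels: Sequence[int], keep: Tuple[int, ...] = (0, 2)
) -> Tuple[List[T], List[int]]:
    """Drop every example whose label is not in `keep`, then renumber the
    remaining labels 0..len(keep)-1 in the order given by `keep`."""
    new_label = {cls: i for i, cls in enumerate(keep)}
    kept = [(x, new_label[y]) for x, y in zip(data, labels) if y in new_label]
    return [x for x, _ in kept], [y for _, y in kept]


def merge_classes(labels: Sequence[int], mapping: Dict[int, int]) -> List[int]:
    """Relabel with an explicit mapping, e.g. {0: 0, 1: 1, 2: 1} folds
    classes 1 and 2 into a single class."""
    return [mapping[y] for y in labels]


if __name__ == "__main__":
    compounds = ["c1", "c2", "c3", "c4", "c5", "c6"]  # placeholder IDs
    labels = [0, 1, 2, 0, 1, 2]
    print(keep_classes(compounds, labels))
    merged = merge_classes(labels, {0: 0, 1: 1, 2: 1})
    print(merged, Counter(merged))
```

The two variants trade off differently: with three balanced classes, merging two of them leaves a roughly 2:1 class imbalance, while dropping class 1 keeps the classes balanced but discards about a third of the data.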

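For the second suggestion, the RDKit can generate feature-based Morgan fingerprints by passing useFeatures=True to AllChem.GetMorganFingerprintAsBitVect, which replaces the default atom invariants with pharmacophoric feature invariants, in the spirit of FCFP. The sketch below assumes a radius of 2 and 2048 bits (arbitrary choices, not from the thread) and turns each fingerprint into a plain 0/1 list that a decision-tree learner could consume; feat_morgan_bits is a hypothetical helper name. Recent RDKit releases also provide rdFingerprintGenerator for this and may log a deprecation warning for the older call.

```python
from typing import List

from rdkit import Chem
from rdkit.Chem import AllChem


def feat_morgan_bits(smiles: str, radius: int = 2, n_bits: int = 2048) -> List[int]:
    """Return an FCFP-like (feature-based Morgan) fingerprint as a 0/1 list."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # useFeatures=True: atoms are described by pharmacophoric features
    # (donor, acceptor, aromatic, ...) instead of element-based invariants.
    fp = AllChem.GetMorganFingerprintAsBitVect(
        mol, radius, nBits=n_bits, useFeatures=True
    )
    # ToBitString() yields one '0'/'1' character per bit.
    return [int(b) for b in fp.ToBitString()]


if __name__ == "__main__":
    # Two arbitrary example molecules; print length and number of set bits.
    for smi in ("Oc1ccccc1", "Nc1ccccc1"):
        bits = feat_morgan_bits(smi)
        print(smi, len(bits), sum(bits))
```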