On Fri, Oct 7, 2011 at 12:31 PM,  <[email protected]> wrote:
>
> Dear RDKitters,
>
> I'm in the process of training a 3-class decision tree model. I have roughly
> 1500 compounds with an almost equal distribution of the 3 classes.

<snip>

>
> In all cases, the statistics are really bad: about 50 percent are
> misclassified, e.g.:
> "
>         *** Vote Results ***
> misclassified: 580/1180 (%49.15)        580/1180 (%49.15)
>
> average correct confidence:    0.7837
> average incorrect confidence:  0.7528
> "
>
> Interestingly, there is very little difference between the average
> confidence levels for the correct and the incorrect classifications.
> As far as I understand it, this tells me that the model is really bad,
> which I already knew from the vote results themselves.
>
>
> Which parameters are worthwhile to test?

We talked about this at the Knime OSD meeting already, but I think
it's worth repeating for the community: I believe that prediction of
hERG binding is too challenging for simple descriptors like the
physicochemical descriptors the RDKit provides or the standard Morgan
fingerprint. This is particularly true if you're trying to build a
three-class model (which is much more difficult than a two-class
model).
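A rough sanity check on the numbers quoted above. This is only a sketch that assumes the three classes are equally populated, as described. The function names are made up for illustration. A model that guesses at random on k balanced classes misclassifies about 1 - 1/k of the compounds, so the baseline error depends on how many classes you have:

```python
def chance_error_rate(n_classes: int) -> float:
    """Expected misclassification rate of uniform random guessing
    when all classes are (roughly) equally populated."""
    return 1.0 - 1.0 / n_classes


def error_rate(misclassified: int, total: int) -> float:
    """Fraction of misclassified compounds."""
    return misclassified / total


# Counts taken from the vote results quoted above.
observed = error_rate(580, 1180)
for k in (2, 3):
    print(f"{k} classes: chance error {chance_error_rate(k):.1%}")
print(f"observed error: {observed:.2%}")
```

By this measure, 49% misclassified is below the roughly 67% you would expect from random guessing across three classes. However, the same error rate on a two-class problem would be no better than chance.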

One suggestion would be to try doing a two class model (either combine
two of your classes together or use only classes 0 and 2 in the
training) and see if that helps. Another would be to try using different
descriptors. You might be able to get something useful with the
FeatMorgan fingerprints (similar to the FCFP fingerprints).
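The two-class variants can be set up by relabeling the data before training. Here is a minimal sketch. It assumes the labels are the integers 0, 1 and 2, with class 1 as the intermediate class. The function name and the `middle_to` parameter are hypothetical. Descriptor calculation (e.g. the FeatMorgan fingerprints) would be done separately with the RDKit and is not shown:

```python
from typing import List, Optional, Tuple


def to_two_class(labels: List[int],
                 middle_to: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Turn 3-class labels (0, 1, 2) into binary labels.

    middle_to=None: drop class 1 and keep only classes 0 and 2.
    middle_to=0 or 2: merge class 1 into that class.
    Returns (indices of kept compounds, binary labels), with
    class 0 -> 0 and class 2 -> 1.
    """
    if middle_to not in (None, 0, 2):
        raise ValueError(f"middle_to must be None, 0 or 2, not {middle_to!r}")
    kept, binary = [], []
    for i, y in enumerate(labels):
        if y not in (0, 1, 2):
            raise ValueError(f"unexpected class label: {y!r}")
        if y == 1:
            if middle_to is None:
                continue  # leave the intermediate class out of training
            y = middle_to  # fold the intermediate class into one side
        kept.append(i)
        binary.append(0 if y == 0 else 1)
    return kept, binary


labels = [0, 1, 2, 2, 1, 0]
print(to_two_class(labels))               # only classes 0 and 2
print(to_two_class(labels, middle_to=0))  # classes 0+1 vs. 2
print(to_two_class(labels, middle_to=2))  # class 0 vs. 1+2
```

Keep the returned indices so you can select the matching descriptor rows. Also check the class balance afterwards: with three equally sized classes, merging one into a neighbor gives roughly a 2:1 split.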

-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
