[adding the mailing list back to this]

Dear Paul,

On Tue, Jun 7, 2011 at 4:54 PM,  <[email protected]> wrote:
>
> Dear folks,
>
> finally, I updated the Wiki entry for the 3class model:
> http://code.google.com/p/rdkit/wiki/TrainAThreeClassSolubilityModel
>
>
> Do you have any explanation for the bad statistics? [see at the end "The
> output"]
>
> Of course, this is not a simple question. But maybe I made a stupid mistake.
>

Aside from what looks like a mislabeling of the data points (it looks
like you've labeled the high-solubility points low and vice versa), I
don't see anything obviously wrong. I wouldn't expect a
fingerprint-based model to do a particularly good job of modeling
solubility, so I'm not surprised that you're getting bad stats.

I tried your dataset using a descriptor-based model based on the
example in this wiki page:
http://code.google.com/p/rdkit/wiki/BuildingModelsUsingDescriptors1
The descriptor calculation stays the same, but I had to tell the
model-building code to use three classes, which involved changing the
line:
    nPossible = [0]+[2]*ndescrs+[2]
to:
    nPossible = [0]+[2]*ndescrs+[3]
and the call to ShowVoteResults from:
    res = ScreenComposite.ShowVoteResults(range(len(pts)), pts, cmp, 2, 0,
                                          errorEstimate=True)
to:
    res = ScreenComposite.ShowVoteResults(range(len(pts)), pts, cmp, 3, 0,
                                          errorEstimate=True)
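
As a minimal sketch of the first change, the nPossible list can be built
from the number of classes rather than hard-coded. This assumes, as the
edit above implies, that the final entry is the number of activity
classes. The helper name make_n_possible and the value ndescrs = 4 are
hypothetical and used only for illustration; nothing here calls RDKit.

```python
def make_n_possible(ndescrs, n_classes):
    """Hypothetical helper: build the nPossible list for a given class count.

    The leading 0 and the 2 per descriptor are copied unchanged from the
    original line; only the final entry (the class count) varies.
    """
    return [0] + [2] * ndescrs + [n_classes]


ndescrs = 4  # illustrative value, not the descriptor count from the thread

two_class = make_n_possible(ndescrs, 2)
three_class = make_n_possible(ndescrs, 3)

print(two_class)    # [0, 2, 2, 2, 2, 2]
print(three_class)  # [0, 2, 2, 2, 2, 3]
```

Only the last element differs between the two lists, which matches the
one-character change described above.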

Doing that I get the following results:

        *** Vote Results ***
misclassified: 232/1025 (%22.63)        232/1025 (%22.63)

average correct confidence:    0.9485
average incorrect confidence:  0.8434

        Results Table:

         132      40       0      |  63.77
          73     313      69      |  77.67
           1      49     348      |  83.25
     ------- ------- -------
       64.08   77.86   83.45

Which shows that something reasonable is happening. It's not (IMO) a
bad start for a solubility model; from here one could certainly start
tuning parameters to try to improve the model.
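
For anyone who wants to check the summary numbers against the results
table, here is a small sketch in plain Python (no RDKit needed). It
re-derives the overall misclassification rate and the bottom row of
percentages. That the bottom row is the diagonal count divided by the
column total is an inference from the numbers, not something taken from
the ScreenComposite source. The right-hand column is not reproduced
here.

```python
# 3x3 vote table copied from the results above
table = [
    [132, 40, 0],
    [73, 313, 69],
    [1, 49, 348],
]
n = len(table)

# Overall counts: diagonal entries are the correctly classified points
total = sum(sum(row) for row in table)
correct = sum(table[i][i] for i in range(n))
misclassified = total - correct
pct_misclassified = 100.0 * misclassified / total

# Per-column percentage: diagonal entry over the column total
col_totals = [sum(table[r][c] for r in range(n)) for c in range(n)]
col_pct = [round(100.0 * table[c][c] / col_totals[c], 2) for c in range(n)]

print("misclassified: %d/%d (%.2f%%)" % (misclassified, total, pct_misclassified))
print("column percentages:", col_pct)
```

This gives 232/1025 (22.63%) and column percentages of 64.08, 77.86 and
83.45, matching the output above.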

-greg
