[adding the mailing list back to this]

Dear Paul,
On Tue, Jun 7, 2011 at 4:54 PM, <[email protected]> wrote:
>
> Dear folks,
>
> finally, I updated the Wiki entry for the 3class model:
> http://code.google.com/p/rdkit/wiki/TrainAThreeClassSolubilityModel
>
> Do you have any explanation for the bad statistics? [see at the end "The
> output"]
>
> Of course, this is not a simple question. But maybe I did stupid mistake..
>

Aside from what looks like a mislabeling of the data points (it looks like you've labeled the high-solubility points low and vice versa), I don't see anything obviously wrong. I wouldn't expect a fingerprint-based model to do a particularly good job of modeling solubility, so I'm not particularly surprised that you're getting bad stats.

I tried your dataset using a descriptor-based model based on the example in this wiki page:
http://code.google.com/p/rdkit/wiki/BuildingModelsUsingDescriptors1

The descriptor calculation stays the same, but I had to tell the model-building code to use three classes, which involved changing the line:

    nPossible = [0]+[2]*ndescrs+[2]

to:

    nPossible = [0]+[2]*ndescrs+[3]

and the call to ShowVoteResults from:

    res = ScreenComposite.ShowVoteResults(range(len(pts)), pts, cmp, 2, 0, errorEstimate=True)

to:

    res = ScreenComposite.ShowVoteResults(range(len(pts)), pts, cmp, 3, 0, errorEstimate=True)

Doing that I get the following results:

    *** Vote Results ***
    misclassified: 232/1025 (%22.63)    232/1025 (%22.63)

    average correct confidence:   0.9485
    average incorrect confidence: 0.8434

    Results Table:

        132      40       0 | 63.77
         73     313      69 | 77.67
          1      49     348 | 83.25
     ------- ------- -------
      64.08   77.86   83.45

This shows that something reasonable is happening. It's not (IMO) a bad start for a solubility model; from here one could certainly start doing parameter tuning to try to improve the model.
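As a quick sanity check on the vote results above, the overall error rate and the bottom row of the table can be recomputed from the raw counts. This is a minimal sketch in plain Python, not RDKit's own reporting code; the variable names are made up for illustration, and it does not attempt to reproduce the right-hand percentage column, whose exact definition is not given in this message.

```python
# Counts copied from the "Results Table" above (3 classes).
table = [
    [132, 40, 0],
    [73, 313, 69],
    [1, 49, 348],
]

n_classes = len(table)
total = sum(sum(row) for row in table)

# Diagonal entries are the points where both axes agree, i.e. correct votes.
correct = sum(table[i][i] for i in range(n_classes))
misclassified = total - correct
error_pct = 100.0 * misclassified / total

# Percent of each column that falls on the diagonal.
col_totals = [sum(row[j] for row in table) for j in range(n_classes)]
col_pct = [100.0 * table[j][j] / col_totals[j] for j in range(n_classes)]

print("misclassified: %d/%d (%.2f%%)" % (misclassified, total, error_pct))
print("per-column percent correct: " + "  ".join("%.2f" % p for p in col_pct))
```

The counts sum to the 1025 points reported, the off-diagonal entries sum to the 232 misclassified points, and the per-column figures agree with the bottom row of the table.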
-greg

