Dear all,

> Dear Paul,
>
> On Tue, Jun 7, 2011 at 4:54 PM, <[email protected]> wrote:
> >
> > Dear folks,
> >
> > finally, I updated the Wiki entry for the 3class model:
> > http://code.google.com/p/rdkit/wiki/TrainAThreeClassSolubilityModel
> >
> > Do you have any explanation for the bad statistics? [see at the end "The
> > output"]
> >
> > Of course, this is not a simple question. But maybe I made a stupid mistake.
> >
>
> Aside from what looks like a mis-labeling of the data points (it looks
> like you've labeled the high solubility points low and vice versa), I
> don't see anything obviously wrong. I don't think that I would expect
> a fingerprint-based model to be able to do a particularly good job of
> being able to model solubility, so I'm not particularly surprised that
> you're getting bad stats.

Regarding the classification, this should be correct. The solubility data
is given in the molar range. Therefore, very low logS numbers (< -3)
correspond to very insoluble compounds. Or am I wrong?

>
> I tried your dataset using a descriptor-based model based on the
> example in this wiki page:
> http://code.google.com/p/rdkit/wiki/BuildingModelsUsingDescriptors1
> The descriptor calculation stays the same, but I had to tell the
> model-building code to use three classes, which involved changing the
> line:
>   nPossible = [0]+[2]*ndescrs+[2]
> to:
>   nPossible = [0]+[2]*ndescrs+[3]
> and the call to ShowVoteResults from:
>   res = ScreenComposite.ShowVoteResults(range(len(pts)), pts, cmp, 2, 0,
>                                         errorEstimate=True)
> to:
>   res = ScreenComposite.ShowVoteResults(range(len(pts)), pts, cmp, 3, 0,
>                                         errorEstimate=True)
>
> Doing that I get the following results:
>
>   *** Vote Results ***
>   misclassified: 232/1025 (%22.63)      232/1025 (%22.63)
>
>   average correct confidence:    0.9485
>   average incorrect confidence:  0.8434
>
>   Results Table:
>
>        132      40       0   |  63.77
>         73     313      69   |  77.67
>          1      49     348   |  83.25
>    ------- ------- -------
>      64.08   77.86   83.45
>

The statistics do indeed look better.

Thanks so far,
Paul

> Which shows that something reasonable is happening. It's not
> (IMO) a bad start for a solubility model, certainly from here one
> could start doing parameter tuning to try and improve the model.
>
> -greg
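
As a quick sanity check on the vote results quoted above, here is a minimal sketch in plain Python (no RDKit needed) that recomputes the misclassification count and the bottom row of the results table from the nine cell counts. The variable names are mine, and I am assuming the bottom row is the diagonal entry divided by its column total. The thread does not say which axis holds the predicted classes or how the right-hand column is computed, so that column is not reproduced here.

```python
# Cell counts copied from the "Results Table" in the message above.
# Assumption: the thread does not state which axis is predicted vs. actual.
table = [
    [132, 40, 0],
    [73, 313, 69],
    [1, 49, 348],
]
n_classes = len(table)

# Everything on the diagonal counts as correctly classified.
total = sum(sum(row) for row in table)
correct = sum(table[i][i] for i in range(n_classes))
misclassified = total - correct

# Diagonal entry as a percentage of its column total
# (assumed to be how the bottom row of the table is derived).
column_totals = [sum(row[j] for row in table) for j in range(n_classes)]
column_pct = [100.0 * table[j][j] / column_totals[j] for j in range(n_classes)]

print(f"misclassified: {misclassified}/{total} ({100.0 * misclassified / total:.2f}%)")
print("per-column % correct:", [round(p, 2) for p in column_pct])
```

Under that assumption the script gives 232/1025 misclassified and 64.08, 77.86, 83.45 for the three columns, which agrees with the numbers reported above.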

