Dear all,
>
> Dear Paul,
>
> On Tue, Jun 7, 2011 at 4:54 PM,  <[email protected]> wrote:
> >
> > Dear folks,
> >
> > finally, I updated the Wiki entry for the 3class model:
> > http://code.google.com/p/rdkit/wiki/TrainAThreeClassSolubilityModel
> >
> >
> > Do you have any explanation for the bad statistics? [See "The output"
> > at the end.]
> >
> > Of course, this is not a simple question. But maybe I made a stupid
> > mistake.
> >
>
> Aside from what looks like a mis-labeling of the data points (it looks
> like you've labeled the high solubility points low and vice versa), I
> don't see anything obviously wrong. I don't think that I would expect
> a fingerprint-based model to be able to do a particularly good job of
> being able to model solubility, so I'm not particularly surprised that
> you're getting bad stats.

Regarding the classification, I think it is correct. The solubility data
are given in molar units, so very low logS values (< -3) correspond to
very insoluble compounds.
Or am I wrong?
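
To make the labeling convention concrete, here is a minimal sketch of a three-class binning by logS. Only the "< -3 means very insoluble" boundary comes from this message; the function name, the upper cutoff of -1, and the class indices are assumptions for illustration and may not match what the wiki page uses.

```python
def solubility_class(log_s, low_cutoff=-3.0, high_cutoff=-1.0):
    """Map a logS value (log10 of molar solubility) to a class index.

    low_cutoff follows the "< -3" boundary mentioned in the message;
    high_cutoff is a hypothetical value chosen only for this sketch.
    Returns 0 (low), 1 (medium), or 2 (high solubility).
    """
    if log_s < low_cutoff:
        return 0
    if log_s < high_cutoff:
        return 1
    return 2


labels = ["low", "medium", "high"]
examples = [-5.2, -2.0, 0.3]  # made-up logS values, not from the dataset
classes = [solubility_class(v) for v in examples]

for value, cls in zip(examples, classes):
    print(f"logS = {value:5.1f} -> class {cls} ({labels[cls]})")
```

With this convention the most negative logS values end up in the "low" class, which is the direction described above.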


>
> I tried your dataset using a descriptor-based model based on the
> example in this wiki page:
> http://code.google.com/p/rdkit/wiki/BuildingModelsUsingDescriptors1
> The descriptor calculation stays the same, but I had to tell the
> model-building code to use three classes, which involved changing the
> line:
> nPossible = [0]+[2]*ndescrs+[2]
> to:
> nPossible = [0]+[2]*ndescrs+[3]
> and the call to ShowVoteResults from:
> res = ScreenComposite.ShowVoteResults(range(len(pts)), pts, cmp, 2, 0,
> errorEstimate=True)
> to:
> res = ScreenComposite.ShowVoteResults(range(len(pts)), pts, cmp, 3, 0,
> errorEstimate=True)
>
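
The `nPossible` change quoted above can be checked in isolation. This is only a sketch: `ndescrs = 4` is a hypothetical descriptor count, and reading the final entry as the number of activity classes is an interpretation of the message, not something confirmed against the RDKit documentation.

```python
ndescrs = 4  # hypothetical descriptor count, for illustration only

# Two-class setup, as in the wiki example.
n_possible_two = [0] + [2] * ndescrs + [2]

# Three-class setup: only the final entry changes from 2 to 3.
n_possible_three = [0] + [2] * ndescrs + [3]

print(n_possible_two)    # [0, 2, 2, 2, 2, 2]
print(n_possible_three)  # [0, 2, 2, 2, 2, 3]
```

The list has one entry per descriptor plus one leading and one trailing entry, so its length is always `ndescrs + 2`.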
> Doing that I get the following results:
>
>    *** Vote Results ***
> misclassified: 232/1025 (%22.63)   232/1025 (%22.63)
>
> average correct confidence:    0.9485
> average incorrect confidence:  0.8434
>
>    Results Table:
>
>          132      40       0      |  63.77
>           73     313      69      |  77.67
>            1      49     348      |  83.25
>      ------- ------- -------
>        64.08   77.86   83.45
>
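
As a sanity check on the table above, the headline numbers can be recomputed from the nine counts. The sketch below uses plain Python rather than any RDKit call; the variable names are made up. It reproduces the misclassification rate and the bottom row of percentages (diagonal entry divided by column total). The right-hand column is not recomputed here.

```python
# Vote table copied from the quoted results.
table = [
    [132, 40, 0],
    [73, 313, 69],
    [1, 49, 348],
]
n = len(table)

total = sum(sum(row) for row in table)
correct = sum(table[i][i] for i in range(n))
misclassified = total - correct
error_pct = 100.0 * misclassified / total

# Percent correct per column: diagonal entry over the column total.
col_totals = [sum(row[j] for row in table) for j in range(n)]
col_pct = [100.0 * table[j][j] / col_totals[j] for j in range(n)]

print(f"misclassified: {misclassified}/{total} ({error_pct:.2f}%)")
print("per-column percent correct:", [f"{p:.2f}" for p in col_pct])
```

This gives 232/1025 (22.63%) and column percentages of 64.08, 77.86 and 83.45, matching the quoted output.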

The statistics do indeed look better.


Thanks so far,

Paul



> Which shows that something reasonable is happening. It's not (IMO) a
> bad start for a solubility model; from here one could certainly start
> tuning parameters to try to improve the model.
>
> -greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss