On Wed, Jun 8, 2011 at 1:35 PM, <[email protected]> wrote:
>>
>> Aside from what looks like a mis-labeling of the data points (it looks
>> like you've labeled the high solubility points low and vice versa), I
>> don't see anything obviously wrong. I don't think that I would expect
>> a fingerprint-based model to be able to do a particularly good job of
>> being able to model solubility, so I'm not particularly surprised that
>> you're getting bad stats.
>
> Regarding the classification, this should be correct. The solubility data
> is given in the molar range. Therefore, very low logS numbers (< -3)
> correspond to very insoluble compounds.
> Or am I wrong?
There's just a disconnect between the text on the wiki page, where you say
that there are 417 compounds in class A (low):
  (A) low      417 compounds
  (B) medium   402 compounds
  (C) high     206 compounds
and the confusion matrix at the bottom that shows 417 compounds in class C:
             calc
         (A)  (B)  (C)
  e (A)    4   16  186
  x (B)    0  100  302
  p (C)    0   23  394
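
A quick way to see the mismatch is to sum the rows of the confusion matrix
and compare them with the counts given in the text. This is only a minimal
sketch: it assumes the rows are the experimental classes and the columns the
calculated ones, and the variable names are purely illustrative.

```python
# Class sizes as stated in the text on the wiki page.
stated = {"A": 417, "B": 402, "C": 206}

# Confusion matrix from the bottom of the page. Assumption: rows are the
# experimental classes, columns the calculated classes, both ordered A, B, C.
confusion = {
    "A": [4, 16, 186],
    "B": [0, 100, 302],
    "C": [0, 23, 394],
}

# Each row sum is the number of compounds experimentally in that class.
row_totals = {label: sum(row) for label, row in confusion.items()}

for label in stated:
    status = "ok" if row_totals[label] == stated[label] else "MISMATCH"
    print(label, stated[label], row_totals[label], status)
```

The row totals come out as 206, 402 and 417 for A, B and C, so the A and C
counts are swapped relative to the text while the overall total is unchanged.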
-greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss