On Wed, Jun 8, 2011 at 1:35 PM,  <[email protected]> wrote:
>>
>> Aside from what looks like a mis-labeling of the data points (it looks
>> like you've labeled the high solubility points low and vice versa), I
>> don't see anything obviously wrong. I don't think that I would expect
>> a fingerprint-based model to be able to do a particularly good job of
>> being able to model solubility, so I'm not particularly surprised that
>> you're getting bad stats.
>
> Regarding the classification, this should be correct. The solubility data
> is given in the molar range. Therefore, very low logS numbers (< -3)
> correspond to very insoluble compounds.
> Or am I wrong?

There's just a disconnect in the text on the wiki page, where you say
that there are 417 compounds in class A (low):

(A) low     417 compounds
(B) medium  402 compounds
(C) high    206 compounds

and the confusion matrix at the bottom, which shows 417 compounds in class C:

           calc
       (A)  (B)  (C)
e (A)    4   16  186
x (B)    0  100  302
p (C)    0   23  394
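
To make the mismatch explicit, the row totals of the matrix can be compared with the class sizes listed in the text. This is only a minimal sketch in plain Python: it assumes the rows are the experimental classes and the columns the calculated ones, and the variable names are purely illustrative.

```python
# Confusion matrix as shown on the wiki page.
# Assumed orientation: rows = experimental class, columns = calculated class.
confusion = {
    "A": [4, 16, 186],
    "B": [0, 100, 302],
    "C": [0, 23, 394],
}

# Class sizes as listed in the text of the wiki page.
listed = {"A": 417, "B": 402, "C": 206}

# Number of compounds per experimental class according to the matrix.
row_totals = {label: sum(row) for label, row in confusion.items()}

# Classes whose matrix total disagrees with the listed size.
mismatched = sorted(label for label in listed if row_totals[label] != listed[label])

print(row_totals)   # {'A': 206, 'B': 402, 'C': 417}
print(mismatched)   # ['A', 'C']
```

The totals for A and C come out swapped relative to the list, while B agrees.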

-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
