On Mon, Oct 03, 2011 at 06:16:37PM -0400, Satrajit Ghosh wrote:
>    when I used mean_square_error as score_func, it gave me p = .98, when I
>    was pretty positive I had a significant result. But that's because the
>    lower a value sits in the distribution, the better it is. This obviously
>    reversed when I used explained_variance, where values closer to 1 are
>    better.
>    Do you think it would be better to state that score_func should return a
>    float between 0 and 1, or to state that if you have a score_func that
>    ranges from 0 to inf and whose lower bound is the better score, then
>    significance should be interpreted as 1 - p_value?

In the scikit, there is a convention that everything called a 'score' is
'bigger is better'. The reason is that this lets black-box optimizers tune
parameters or select models based on the score. I wouldn't want to enforce
that it be bounded between 0 and 1, because many scores used in real life
are unbounded. Also, in general, you cannot interpret a score (like
explained_variance) as related to a p-value. I wouldn't oversimplify the
message by forcing all the metrics into one framework. I don't think that
can work: they test for different things.
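To illustrate the convention: a lower-is-better loss such as mean squared
error can be negated so that a generic maximizer handles it like any other
score. This is a minimal sketch in plain Python; the helper names
(neg_mean_squared_error, select_best) are hypothetical, not scikit API.

```python
# Toy illustration of the "bigger is better" score convention:
# a black-box model selector can always maximize, provided
# lower-is-better losses (like mean squared error) are negated.

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def neg_mean_squared_error(y_true, y_pred):
    # Negating turns a loss into a score: the best model now has
    # the HIGHEST value, matching the convention.
    return -mean_squared_error(y_true, y_pred)

def select_best(predictions, y_true, score_func):
    # Generic selector: it always maximizes and never needs to know
    # whether the underlying metric was originally a loss or a score.
    return max(predictions, key=lambda name: score_func(y_true, predictions[name]))

y = [1.0, 2.0, 3.0]
candidates = {"good": [1.1, 2.0, 2.9], "bad": [3.0, 0.0, 5.0]}
best = select_best(candidates, y, neg_mean_squared_error)
```

With this sign flip, the selector picks the low-error model without any
special-casing of the metric's direction.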

Gaël

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
