On Mon, Oct 03, 2011 at 06:16:37PM -0400, Satrajit Ghosh wrote:
> when i used mean_square_error as score_func, it gave me p=.98, when i was
> pretty positive i had a significant result. but that's because the lower
> the value is in the distribution the better it is. this obviously reversed
> when i used explained_variance, where things closer to 1 are better.

> do you think stating that score_func should return a float between 0 and 1
> would be better or to state that if you have a score_func that ranges from
> 0 to inf and whose lower bound is a better score, then interpret
> significance as 1-p_value?

In the scikit, there is a convention that everything called a 'score' is
'bigger is better'. The reason is that this lets black-box optimizers tune
parameters or select models based on the score.

I wouldn't like to enforce that it is bounded between 0 and 1, because many
scores used in real life are not bounded. Also, in general, you cannot
interpret a score (like explained_variance) as related to a p-value.

I wouldn't try to oversimplify the message by fitting all the metrics into
one framework. I don't think that it can work: they test for different
things.

Gaël
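
To illustrate the 'bigger is better' convention discussed above, here is a minimal sketch in plain Python. It is an assumption-laden toy, not the scikit's implementation: the predictions are held fixed and only the targets are shuffled, whereas a real permutation test would refit the estimator on each shuffled target. The names `negated_mse` and `permutation_p_value` are hypothetical helpers introduced for this example only.

```python
import random


def mean_squared_error(y_true, y_pred):
    # A loss: lower is better, so it does NOT follow the score convention.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def negated_mse(y_true, y_pred):
    # Hypothetical wrapper: flip the sign so that bigger is better.
    return -mean_squared_error(y_true, y_pred)


def permutation_p_value(score_func, y_true, y_pred, n_permutations=200, seed=0):
    # Simplified permutation test (assumption: predictions stay fixed).
    # The p-value is the fraction of shuffles that score at least as high
    # as the observed score, which only makes sense if bigger is better.
    rng = random.Random(seed)
    observed = score_func(y_true, y_pred)
    shuffled = list(y_true)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if score_func(shuffled, y_pred) >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)


# Toy data: predictions track the targets closely.
y_true = [float(i) for i in range(20)]
y_pred = [t + 0.1 * (-1) ** i for i, t in enumerate(y_true)]

# Passing the raw loss as a "score" makes a good fit look insignificant.
p_loss = permutation_p_value(mean_squared_error, y_true, y_pred)
# Passing the negated loss respects the convention.
p_score = permutation_p_value(negated_mse, y_true, y_pred)

print("p-value with raw MSE as score_func:", p_loss)
print("p-value with negated MSE as score_func:", p_score)
```

With the raw loss the p-value comes out close to 1, as in the quoted report; with the sign flipped it comes out small. Negating the loss keeps the score unbounded, so no rescaling to [0, 1] is needed.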
