I think the answer is that RMSE scores are (somewhat
counterintuitively) always reported as negative by GridSearchCV.
The "greater_is_better" flag of make_scorer actually just flips the sign.
It does that because GridSearchCV always tries to maximize the
score, so error metrics have to be negated for "larger" to mean "better".
I am not sure this is documented anywhere, and I am not entirely happy
with it, but the other solutions were a bit overly complex.
We are still working on the reporting of GridSearchCV results.
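A minimal sketch of that convention, assuming a recent scikit-learn
(the import path below is the modern sklearn.model_selection; in the
0.14 era GridSearchCV lived in sklearn.grid_search) and synthetic data
rather than Ralf's:

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.metrics import mean_squared_error, make_scorer
    from sklearn.model_selection import GridSearchCV

    # greater_is_better=False makes the scorer return -MSE, so that
    # "maximize the score" still selects the smallest-error model.
    neg_mse = make_scorer(mean_squared_error, greater_is_better=False)

    rng = np.random.RandomState(0)
    X = rng.rand(40, 3)
    y = rng.rand(40)

    grid = GridSearchCV(SVR(kernel="rbf"),
                        param_grid={"C": [0.001, 1.0, 100.0]},
                        scoring=neg_mse, cv=3)
    grid.fit(X, y)

    print(grid.best_score_)            # negative: it is -MSE
    print(np.sqrt(-grid.best_score_))  # flip the sign to get a positive RMSE

The same sign flip applies to any scorer built with
greater_is_better=False, which is why a manually computed RMSE comes
out positive while the grid scores do not.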
On 10/22/2013 08:48 PM, Ralf Gunter wrote:
Hi Andreas,
I'm not sure what you mean by "more comprehensive"; the gist on the
first message should reproduce the problem -- if not, then it might be
something in my local configuration (Python, NumPy, etc.). The script
is exactly the same one I'm using in "production", just with a much
bigger dataset. Please let me know what kind of extra information
you're looking for.
Thanks!
2013/10/22 Andreas Mueller <[email protected]>
Hi Ralf.
Can you give a more comprehensive gist maybe? https://gist.github.com/
My first intuition would be that you are in fact using the r2
score, not the MSE, when outputting these numbers.
Cheers,
Andy
On 10/22/2013 07:20 PM, Ralf Gunter wrote:
Hello,
I'm testing a few regression algorithms to map ndarrays of
eigenvalues to floats, using StratifiedKFold + GridSearchCV for
cross-validation and hyperparameter estimation, with some code
borrowed from [1]. Although GridSearchCV appears to be working as
advertised (i.e. the "best_estimator_" is much better than the
baseline), it's giving a negative "mean_score" both for the
"mean_squared_error" metric and for my manually implemented RMS
error function. Code & sample datasets are at [2]. Here's a
trimmed sample output:
~/a/a/g/test $ python regression.py
...
Tuning hyper-parameters for mean_squared_error
Best parameters set found on development set:
SVR(C=100.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma=0.0,
    kernel=rbf, max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)
Grid scores on development set:
...
-122.503 (+/-21.396) for {'epsilon': 0.01, 'C': 0.001, 'kernel': 'rbf'}
-122.503 (+/-21.396) for {'epsilon': 0.10000000000000001, 'C': 0.001, 'kernel': 'rbf'}
...
RMSE: 0.100012
MAE: 0.100012
RMSE: 0.099933
MAE: 0.099933
...
Currently I'm testing both 0.14.1 and Mathieu Blondel's
"kernel_ridge" branch with Python 2.7.5 on Arch Linux. The above
phenomenon happens in both versions for SVR and (naturally) in
the dev branch for KernelRidge. As you can see, the exact same
custom metric applied manually (lines 117-122) gives appropriate,
positive errors, whereas the one printed in lines 113-114 does not.
Why are these numbers negative? Am I missing something obvious
here? I'm a bit concerned about trusting these estimators (or
rather, their optimality) because of this oddity. A quick Google
search only turned up similar "problems" with r2 (which aren't
really problems, since r2 can be negative, unlike
"mean_squared_error").
Thanks!
[1] - http://scikit-learn.org/dev/_downloads/grid_search_digits.py
[2] - https://gist.github.com/anonymous/1af53a1da1357a6a97c3 (sorry for
the mangled code -- it's gone through a botched anonymization procedure)