Hi Ralf.
Could you maybe post a more complete gist? https://gist.github.com/
My first intuition would be that you are in fact using the r2 score, not
the MSE, when outputting these numbers.
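For reference, r2 is 1 - SS_res/SS_tot, so it drops below zero whenever a model does worse than always predicting the mean. MSE is a mean of squares and can never be negative. The sketch below shows the difference in pure Python. The two functions are illustrative stand-ins written for this note, not scikit-learn's own `mean_squared_error`/`r2_score`, and the data are toy values.

```python
def mse(y_true, y_pred):
    """Mean of squared residuals; always >= 0."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data: the predictions are worse than always predicting the mean (2.0).
y_true = [1.0, 2.0, 3.0]
y_bad = [3.0, 2.0, 1.0]

print(mse(y_true, y_bad))  # positive: 8/3
print(r2(y_true, y_bad))   # -> -3.0
```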
Cheers,
Andy
On 10/22/2013 07:20 PM, Ralf Gunter wrote:
Hello,
I'm testing a few regression algorithms to map ndarrays of eigenvalues
to floats, using StratifiedKFold + GridSearchCV for cross-validation
and hyperparameter estimation, with some code borrowed from [1].
Although GridSearchCV appears to be working as advertised (i.e. the
"best_estimator_" is much better than the baseline), it reports a
negative "mean_score" both for the "mean_squared_error" metric and
for my manually implemented RMS error function. Code and sample
datasets are at [2]. Here's a trimmed sample output:
~/a/a/g/test $ python regression.py
...
Tuning hyper-parameters for mean_squared_error
Best parameters set found on development set:
SVR(C=100.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma=0.0,
  kernel=rbf, max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
Grid scores on development set:
...
-122.503 (+/-21.396) for {'epsilon': 0.01, 'C': 0.001, 'kernel': 'rbf'}
-122.503 (+/-21.396) for {'epsilon': 0.10000000000000001, 'C': 0.001, 'kernel': 'rbf'}
...
RMSE: 0.100012
MAE: 0.100012
RMSE: 0.099933
MAE: 0.099933
...
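A manual RMS error function of the kind described above might look like the sketch below, alongside a mean absolute error for comparison. The actual implementation lives in the gist, so `rms_error` and `mean_abs_error` are hypothetical names written for illustration. Both are non-negative by construction, and RMSE >= MAE, with equality only when every absolute error has the same size.

```python
import math


def rms_error(y_true, y_pred):
    """Root-mean-square error (hypothetical stand-in for the manual metric)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mean_abs_error(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


# Toy data: one exact prediction and one off by 0.2.
y_true = [0.0, 0.0]
y_pred = [0.0, 0.2]
print(rms_error(y_true, y_pred))       # sqrt(0.02), roughly 0.1414
print(mean_abs_error(y_true, y_pred))  # roughly 0.1
```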
Currently I'm testing both 0.14.1 and Mathieu Blondel's "kernel_ridge"
branch with Python 2.7.5 on Arch Linux. The above happens with both
versions for SVR and (naturally, only in the dev branch) for
KernelRidge. As you can see, the exact same custom metric applied
manually (lines 117-122) gives sensible, positive errors, whereas
the scores printed by lines 113-114 do not.
Why are these numbers negative? Am I missing something obvious here?
I'm a bit wary of trusting these estimators (or rather, their
optimality) because of this oddity. A quick Google search only turned up
similar "problems" with r2 (which aren't really problems, since
r2 can be negative, unlike "mean_squared_error").
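One convention worth ruling out: a search that always *maximizes* its score can only minimize a loss if the loss is negated first. scikit-learn's `make_scorer` has a `greater_is_better` flag for exactly this, and when it is False the scorer returns the negated loss, so an error of 122.5 would be reported as -122.5 while the best candidate is still the one with the smallest error. Whether that is what happens in this particular script is an assumption, not something confirmed here. The pure-Python sketch below mimics the convention; `make_negated_scorer` and `pick_best` are hypothetical helpers, not scikit-learn API, and the predictions are made-up toy values, not results from the gist.

```python
def mse(y_true, y_pred):
    """Mean squared error; always >= 0."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def make_negated_scorer(loss):
    """Hypothetical helper: wrap a loss so that a larger score means a better fit."""
    def scorer(y_true, y_pred):
        return -loss(y_true, y_pred)
    return scorer


def pick_best(candidates, y_true, scorer):
    """Return (label, score) with the highest score, as a maximizing search would."""
    best = None
    for label, y_pred in candidates.items():
        score = scorer(y_true, y_pred)
        if best is None or score > best[1]:
            best = (label, score)
    return best


# Toy predictions for two hypothetical parameter settings.
y_true = [1.0, 2.0, 3.0]
candidates = {
    "C=0.001": [2.0, 2.0, 2.0],  # poor fit: large error
    "C=100.0": [1.1, 2.0, 2.9],  # good fit: small error
}
best = pick_best(candidates, y_true, make_negated_scorer(mse))
print(best)  # the low-error setting wins, yet its reported score is negative
```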
Thanks!
[1] - http://scikit-learn.org/dev/_downloads/grid_search_digits.py
[2] - https://gist.github.com/anonymous/1af53a1da1357a6a97c3 (sorry
for the mangled code -- it's gone through a botched anonymization
procedure)
------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135991&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general