Hello,
I'm testing a few regression algorithms that map ndarrays of eigenvalues to
floats, using StratifiedKFold + GridSearchCV for cross-validation and
hyperparameter estimation, with some code borrowed from [1]. Although
GridSearchCV appears to be working as advertised (i.e. the
"best_estimator_" is much better than the baseline), it reports a negative
"mean_score" both for the "mean_squared_error" metric and for my
manually-implemented RMS error function. Code & sample datasets are at [2].
Here's a trimmed sample output:
~/a/a/g/test ❯❯❯ python regression.py
...
Tuning hyper-parameters for mean_squared_error
Best parameters set found on development set:
SVR(C=100.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma=0.0,
kernel=rbf, max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
Grid scores on development set:
...
-122.503 (+/-21.396) for {'epsilon': 0.01, 'C': 0.001, 'kernel': 'rbf'}
-122.503 (+/-21.396) for {'epsilon': 0.10000000000000001, 'C': 0.001,
'kernel': 'rbf'}
...
RMSE: 0.100012
MAE: 0.100012
RMSE: 0.099933
MAE: 0.099933
...
Currently I'm testing both scikit-learn 0.14.1 and Mathieu Blondel's
"kernel_ridge" branch with Python 2.7.5 on Arch Linux. The above phenomenon
happens with both versions for SVR and (naturally) in the dev branch for
KernelRidge. As you can see, the exact same custom metric applied manually
(lines 117-122 of the script) gives appropriate, positive errors, whereas
the one printed by lines 113-114 does not.
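For reference, the manual check amounts to something like the following.
This is a simplified, pure-Python sketch rather than the exact code in [2];
the names rms_error and mean_abs_error are placeholders I'm using here for
illustration only.

```python
import math


def rms_error(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences of floats."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Square each residual, average the squares, then take the square root.
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mean_abs_error(y_true, y_pred):
    """Mean absolute error between two equal-length sequences of floats."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Average the absolute residuals.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Both functions are built from squares or absolute values, so neither can
return a negative number, which is why the negative grid scores surprise me.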
Why are these numbers negative? Am I missing something obvious here? I'm a
bit concerned about trusting these estimators (or rather, their optimality)
because of this oddity. A quick Google search only came up with similar
"problems" with r2 (which aren't really problems, since r2 can be negative,
unlike "mean_squared_error").
Thanks!
[1] - http://scikit-learn.org/dev/_downloads/grid_search_digits.py
[2] - https://gist.github.com/anonymous/1af53a1da1357a6a97c3 (sorry for the
mangled code -- it's gone through a botched anonymization procedure)