Hi Andreas,

I'm not sure what you mean by "more comprehensive"; the gist in the first
message should reproduce the problem -- if not, then it might be something
in my local configuration (Python, numpy, etc.). The script is exactly the
same one I'm using in "production", just with a much bigger dataset. Please
let me know what kind of extra information you're looking for.

Thanks!


2013/10/22 Andreas Mueller <[email protected]>

>  Hi Ralf.
>
> Can you maybe give a more comprehensive gist? https://gist.github.com/
> My first intuition would be that you are in fact using the r2 score, not
> the MSE, when outputting these numbers.
>
> Cheers,
> Andy
>
>
>
> On 10/22/2013 07:20 PM, Ralf Gunter wrote:
>
> Hello,
>
>  I'm testing a few regression algorithms that map ndarrays of eigenvalues
> to floats, using StratifiedKFold + GridSearchCV for cross-validation and
> hyperparameter estimation, with some code borrowed from [1]. Although
> GridSearchCV appears to be working as advertised (i.e. the
> "best_estimator_" is much better than the baseline), it gives a negative
> "mean_score" both for the "mean_squared_error" metric and for my
> manually implemented RMS error function. Code & sample datasets are at [2].
> Here's a trimmed sample output:
>
>
>    ~/a/a/g/test ❯❯❯ python regression.py
>   ...
>   Tuning hyper-parameters for mean_squared_error
>
>   Best parameters set found on development set:
>
>   SVR(C=100.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma=0.0,
>     kernel=rbf, max_iter=-1, probability=False, random_state=None,
>       shrinking=True, tol=0.001, verbose=False)
>
>   Grid scores on development set:
>
>   ...
>   -122.503 (+/-21.396) for {'epsilon': 0.01, 'C': 0.001, 'kernel': 'rbf'}
>   -122.503 (+/-21.396) for {'epsilon': 0.10000000000000001, 'C': 0.001,
> 'kernel': 'rbf'}
>   ...
>   RMSE: 0.100012
>   MAE:  0.100012
>   RMSE: 0.099933
>   MAE:  0.099933
>   ...
>
>
>  Currently I'm testing both scikit-learn 0.14.1 and Mathieu Blondel's
> "kernel_ridge" branch, with Python 2.7.5 on Arch Linux. The phenomenon
> above happens in both versions for SVR and (naturally) in the dev branch
> for KernelRidge. As you can see, the exact same custom metric applied
> manually (lines 117-122 of the gist) gives appropriate, positive errors,
> whereas the one printed on lines 113-114 does not.
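For reference, the manual check can be sketched roughly like this (a minimal numpy sketch; the function names and sample arrays are illustrative, not taken from the gist):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-squared error: sqrt of the mean squared residual.
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    # Mean absolute error: mean of the absolute residuals.
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
print("RMSE: %f" % rmse(y_true, y_pred))
print("MAE:  %f" % mae(y_true, y_pred))
```

Both quantities are nonnegative by construction, which is why the manual numbers come out positive.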
>
>  Why are these numbers negative? Am I missing something obvious here? I'm
> a bit concerned about trusting these estimators (or rather, their
> optimality) because of this oddity. A quick Google search only turned up
> similar "problems" with r2 -- which aren't really problems, since r2 can
> be negative, unlike "mean_squared_error".
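One guess (an assumption on my part, not something I've verified in the scikit-learn source): if the grid search internally maximizes a unified "greater is better" score, then a loss like MSE would have to be stored with its sign flipped, so a reported -122.503 would really mean an MSE of 122.503. The convention I have in mind would look something like this:

```python
import numpy as np

def mse(y_true, y_pred):
    # Plain mean squared error -- always nonnegative.
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def as_score(loss_fn):
    # Hypothetical "greater is better" wrapper: negate the loss so that
    # maximizing the score is equivalent to minimizing the loss.
    return lambda y_true, y_pred: -loss_fn(y_true, y_pred)

y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.2]
loss = mse(y_true, y_pred)             # positive
score = as_score(mse)(y_true, y_pred)  # same magnitude, negative
```

If that is what's happening, the magnitudes would still be meaningful for comparing hyperparameters; only the sign would be surprising.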
>
>  Thanks!
>
>  [1] - http://scikit-learn.org/dev/_downloads/grid_search_digits.py
> [2] - https://gist.github.com/anonymous/1af53a1da1357a6a97c3 (sorry for
> the mangled code -- it's gone through a botched anonymization procedure)
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
> the latest Intel processors and coprocessors. See abstracts and register at
> http://pubads.g.doubleclick.net/gampad/clk?id=60135991&iu=/4140/ostg.clktrk
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
