Hi Chris,

Your observation is at least partially due to scaling differences between
the losses of the two estimators. Whereas `SGDRegressor` by construction
puts an extra 1/n_samples factor in front of the data-fit term, `Ridge`
does not, so equivalent penalties will differ by a factor of n_samples (see
this gist for a small example:
<https://gist.github.com/eickenberg/79d360540a7c1c0cc953>).
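
To make the scaling concrete, here is a minimal sketch along the lines of
the gist (assuming a recent scikit-learn and the default squared loss; the
SGD solution only matches up to optimization noise):

```python
import numpy as np
from sklearn.linear_model import Ridge, SGDRegressor

rng = np.random.RandomState(0)
n_samples, n_features = 1000, 5
X = rng.randn(n_samples, n_features)
w_true = rng.randn(n_features)
y = X.dot(w_true) + 0.1 * rng.randn(n_samples)

alpha_ridge = 1.0

# Ridge minimizes   sum_i (y_i - x_i . w)^2 + alpha * ||w||^2,
# while SGDRegressor averages the data-fit term over samples before
# adding alpha * ||w||^2, so the equivalent SGD alpha is smaller by
# a factor of n_samples.
ridge = Ridge(alpha=alpha_ridge, fit_intercept=False).fit(X, y)
sgd = SGDRegressor(penalty="l2", alpha=alpha_ridge / n_samples,
                   fit_intercept=False, max_iter=1000, tol=1e-6,
                   random_state=0).fit(X, y)

print(ridge.coef_)
print(sgd.coef_)  # close to ridge.coef_, up to SGD optimization noise
```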

The rest of the discrepancy may just be noise. E.g., are the
cross-validation scores obtained with `SGDRegressor` significantly
different from those obtained with `Ridge`?
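
A quick way to check, reusing `X`, `y`, and the matched alphas from the
sketch above (and assuming `cross_val_score` in its current home,
`sklearn.model_selection`):

```python
from sklearn.model_selection import cross_val_score

ridge_scores = cross_val_score(Ridge(alpha=alpha_ridge), X, y, cv=5)
sgd_scores = cross_val_score(
    SGDRegressor(alpha=alpha_ridge / n_samples, max_iter=1000, tol=1e-6,
                 random_state=0),
    X, y, cv=5)

# If the two score distributions overlap heavily, the remaining
# difference between the estimators is plausibly just noise.
print("Ridge: %.3f +/- %.3f" % (ridge_scores.mean(), ridge_scores.std()))
print("SGD:   %.3f +/- %.3f" % (sgd_scores.mean(), sgd_scores.std()))
```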


On Wed, Jun 11, 2014 at 4:02 AM, Chris Holdgraf <choldg...@berkeley.edu>
wrote:

> Hey all - I had a weird issue with my model fitting that I wanted to
> share in the event that I'm doing something stupid.
>
> I'm trying to fit a model with about 32 features x 20 time lags of those
> features (so ~650 features total). I have about 12919 samples, all of which
> are z-scored within features. Without a doubt, the data I'm dealing with is
> quite noisy and there's a fair amount of correlation between features.
>
> When I did a grid search for the best alpha, I noticed that it simply
> returned whichever alpha was largest. That is, even if I included alphas
> up to ~1e7, those would have the highest score. When I looked at the score
> across alphas, this was confirmed: it was basically a monotonically
> increasing function of alpha (at least up to the point that I stopped
> looking).
>
> That's a bit weird to me, as usually I've found that the ridge parameter
> settles somewhere between 1e-4 and 0.1.
>
> Has anyone experienced this before, or does anyone have an intuition for
> why this is happening? My first thought was that the model is basically a
> horrible one, and so the grid search just picks whatever is closest to
> predicting the mean. That said, if I use an alternate method like SGD with
> an l2 regularizer, it chooses alpha values around 0.01, and the weights
> look more like what I'd expect.
>
> Anyone have thoughts on what is going on?
>
> Chris
>
>
> --
> _____________________________________
>
> PhD Candidate in Neuroscience | UC Berkeley <http://hwni.org/>
>  Editor and Web Master | Berkeley Science Review
> <http://sciencereview.berkeley.edu/>
> _____________________________________
>
>