Ah yes, that makes sense. I've found basically two differences between
Ridge and SGD:
1. The model fits are a bit worse with SGD. But the weights are far more
interpretable and look like there's some "structure" in them.
2. The model fits with SGD can be brought up to those of Ridge regression if
I let SGD run for much longer (again, these are mean scores across CV sets).
However, in that case the weights become totally noisy-looking.
So it's a bit strange to me that the model performance on a held-out test
set will still *increase* even though the weights start to look like total
noise (which would make me think it's overfitting, except for the increase
in test performance).
It seems like the analytical Ridge solution, even with regularizers, is
still overfitting the data, and that if I stop the SGD early enough then
it's able to tease out the signal from the noise (though again, those cv
scores are strange to me).
By the way, +1 if anyone is still thinking of adding early stopping to
SGD.
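
In the meantime, early stopping can be approximated by hand with `partial_fit`. The sketch below is only an illustration on synthetic data: the validation split, the `patience` value, and all variable names are my assumptions, not anything from my actual analysis. It runs one pass over the training data per call, tracks the validation R^2, and keeps the weights from the best epoch.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): only the first 3 of 20 features carry signal.
rng = np.random.RandomState(0)
X = rng.randn(300, 20)
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.randn(300)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

sgd = SGDRegressor(alpha=1e-4, random_state=0)
patience, max_epochs = 5, 200  # hypothetical settings
best_score, best_epoch = -np.inf, 0
best_coef, best_intercept = None, None

for epoch in range(max_epochs):
    # Each partial_fit call performs one pass over the training data.
    sgd.partial_fit(X_train, y_train)
    score = sgd.score(X_val, y_val)  # R^2 on the held-out validation set
    if score > best_score:
        best_score, best_epoch = score, epoch
        best_coef = sgd.coef_.copy()
        best_intercept = sgd.intercept_.copy()
    elif epoch - best_epoch >= patience:
        # No improvement for `patience` epochs: stop training.
        break

# Restore the weights from the best validation epoch.
sgd.coef_ = best_coef
sgd.intercept_ = best_intercept
print("stopped after epoch", epoch, "best epoch", best_epoch)
```

Comparing `best_coef` with the weights after many more epochs would be one way to see whether the "structure" really degrades as training continues.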
Chris
> ---------- Forwarded message ----------
> From: Michael Eickenberg <michael.eickenb...@gmail.com>
> To: "scikit-learn-general@lists.sourceforge.net" <
> scikit-learn-general@lists.sourceforge.net>
> Cc:
> Date: Wed, 11 Jun 2014 14:35:59 +0200
> Subject: Re: [Scikit-learn-general] Ridge regression only working with
> huge alpha values?
> Hi Chris,
>
> Your observation is at least partially due to scaling differences between
> the losses of the two estimators. Whereas `SGDRegressor` by construction puts
> an extra 1/n_samples in front of your data fit term, `Ridge` does not. So
> the penalties used will differ by at least a factor n_samples (see this
> gist for a small example
> <https://gist.github.com/eickenberg/79d360540a7c1c0cc953>).
>
> The rest of the discrepancy may be due to noise. E.g. are the cross
> validation scores using `SGDRegressor` significantly different from those
> obtained by `Ridge`?
>
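
The scaling difference Michael describes can be checked numerically. This is a minimal sketch on synthetic data, assuming the documented objectives: `Ridge` minimizes ||y - Xw||^2 + alpha * ||w||^2, while `SGDRegressor` with squared loss and an L2 penalty minimizes (1/(2n)) * ||y - Xw||^2 + (alpha/2) * ||w||^2. Under that assumption the two match when `alpha_ridge = n_samples * alpha_sgd`. The data, the alpha value, and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge, SGDRegressor

# Synthetic, well-conditioned regression problem (assumption).
rng = np.random.RandomState(0)
n_samples, n_features = 200, 5
X = rng.randn(n_samples, n_features)
w_true = rng.randn(n_features)
y = X @ w_true + 0.1 * rng.randn(n_samples)

alpha_sgd = 0.01
alpha_ridge = n_samples * alpha_sgd  # rescale by n_samples

# Ridge with the rescaled penalty, no intercept to keep the algebra simple.
ridge = Ridge(alpha=alpha_ridge, fit_intercept=False).fit(X, y)

# Closed-form minimizer of the SGD-style objective
# (1/(2n)) * ||y - Xw||^2 + (alpha_sgd/2) * ||w||^2.
w_closed = np.linalg.solve(
    X.T @ X / n_samples + alpha_sgd * np.eye(n_features),
    X.T @ y / n_samples,
)

# SGDRegressor itself, run for a fixed number of epochs (tol=None).
sgd = SGDRegressor(
    penalty="l2", alpha=alpha_sgd, fit_intercept=False,
    max_iter=2000, tol=None, random_state=0,
).fit(X, y)

print("Ridge vs closed form:", np.abs(ridge.coef_ - w_closed).max())
print("Ridge vs SGD:        ", np.abs(ridge.coef_ - sgd.coef_).max())
```

The first difference should be at the level of floating-point error; the second depends on how well SGD has converged, so it is only expected to be small, not zero.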