Hi all,

I realized today that not all models scale the regularization
parameter (C or alpha) with the number of samples, i.e. not all of
them minimize during fit a cost function of the form:

1/n_samples \sum_i loss(x_i, y_i) + alpha \| ... \|_x

or

C/n_samples \sum_i loss(x_i, y_i) + \| ... \|_x

Apparently libsvm / liblinear and the corresponding models do not,
while Lasso, LassoLars, etc. do.

See this gist:

https://gist.github.com/1357024

for an illustration.
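To make the point concrete, here is a minimal sketch along the lines of
the gist (not the gist itself, and assuming the current sklearn API):
duplicating every sample leaves the Lasso solution unchanged but changes
the LinearSVC one unless you halve C by hand.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import LinearSVC

rng = np.random.RandomState(42)
X = rng.randn(20, 5)
y_reg = np.dot(X, rng.randn(5)) + 0.1 * rng.randn(20)
y_clf = (y_reg > 0).astype(int)

# Duplicate every sample: n_samples doubles, the data distribution
# is unchanged.
X2 = np.vstack([X, X])
y_reg2 = np.tile(y_reg, 2)
y_clf2 = np.tile(y_clf, 2)

# Lasso divides its loss by n_samples: same alpha, same coefficients.
print(Lasso(alpha=0.1).fit(X, y_reg).coef_)
print(Lasso(alpha=0.1).fit(X2, y_reg2).coef_)

# LinearSVC does not: the same C now weighs the loss term twice as
# much, so you would need C/2 on the duplicated data to get the same fit.
print(LinearSVC(C=1.0).fit(X, y_clf).coef_)
print(LinearSVC(C=1.0).fit(X2, y_clf2).coef_)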

To me, not applying such a scaling by n_samples is wrong. To motivate
this, just look at the gist and you will see that if you don't do it,
then C / alpha needs to be changed if you duplicate every sample. This
is particularly problematic with cross-validation, as you end up
finding a C / alpha adapted to the size of the training folds rather
than to the full data. Think about the refit in GridSearchCV.
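
Until this is fixed, the correction one has to apply by hand is simple;
transfer_C below is just a hypothetical helper to illustrate it:

def transfer_C(C_cv, n_train, n_full):
    # With an un-scaled objective C * \sum_i loss_i + penalty, the loss
    # term grows linearly with n_samples, so keeping C * n_samples
    # constant preserves the loss/penalty trade-off when moving from
    # the training folds to the full dataset, e.g. for the refit in
    # GridSearchCV.
    return C_cv * float(n_train) / n_full

# With 5-fold CV on n samples, each training fold has 4 * n / 5 samples:
# C_refit = transfer_C(best_C, n_train=4 * n / 5, n_full=n)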

Let me know what you think, but I feel we should fix this.

Alex
