Hi all,

I realized today that not all models scale the regularization parameter (C or alpha) with the number of samples, i.e. minimize during fit a cost function of the form:

    1/n_samples * sum_i loss(x_i, y_i) + alpha * ||w||

or:

    C/n_samples * sum_i loss(x_i, y_i) + ||w||

Apparently libsvm / liblinear and the corresponding models do not do this, while Lasso, LassoLars etc. do. See this gist for an illustration: https://gist.github.com/1357024

To me, not applying such a scaling by n_samples is wrong. To motivate this, just look at the gist and you will see that without the scaling, C / alpha needs to be changed if you duplicate every sample. This is particularly problematic with cross-validation: you end up finding a C / alpha adapted to the size of the training folds rather than to the full data. Think about the refit in GridSearchCV.

Let me know what you think, but I feel we should fix this.

Alex
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
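The duplication argument above can be sketched with a closed-form penalized least-squares model (ridge is used here only as a convenient stand-in for the losses discussed in the mail, since it has an exact solution; the function names `ridge_unscaled` / `ridge_scaled` are illustrative, not scikit-learn API):

```python
import numpy as np

def ridge_unscaled(X, y, alpha):
    # Minimizes  sum_i (y_i - x_i . w)^2 + alpha * ||w||^2   (no 1/n_samples factor)
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def ridge_scaled(X, y, alpha):
    # Minimizes  1/n_samples * sum_i (y_i - x_i . w)^2 + alpha * ||w||^2
    n_samples, n_features = X.shape
    return np.linalg.solve(X.T @ X / n_samples + alpha * np.eye(n_features),
                           X.T @ y / n_samples)

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(20)
alpha = 1.0

# Duplicate every sample once.
X2, y2 = np.vstack([X, X]), np.concatenate([y, y])

# Without the 1/n_samples scaling, duplicating the data changes the solution
# for a fixed alpha (the data term doubles while the penalty does not):
print(np.allclose(ridge_unscaled(X, y, alpha), ridge_unscaled(X2, y2, alpha)))  # False

# With the scaling, the solution is invariant under duplication:
print(np.allclose(ridge_scaled(X, y, alpha), ridge_scaled(X2, y2, alpha)))  # True
```

The same reasoning carries over to cross-validation: with the scaled formulation, the alpha selected on smaller training folds remains meaningful when refitting on the full data.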
