On 2012-03-21, at 7:25 PM, Gael Varoquaux <[email protected]> wrote:
> I'd like to stress that I don't think that following libsvm is much of
> a goal per se. I understand that it makes the life of someone like
> James easier, because he knows libsvm well and can relate to it.

I think it's less about disagreeing with libsvm than disagreeing with
the notation of every textbook presentation I know of. I agree that
libsvm is no golden calf.

> libsvm and liblinear do not agree on whether multi-class should be
> done with one versus rest or one versus one.

<side rant> In particular, doing 1-vs-rest for logistic regression
seems like an odd choice when there is a perfectly good multiclass
generalization of logistic regression. Mathieu clarified to me last
night how liblinear is calculating "probabilities" in the multiclass
case, and it seems insane to me from a calibration perspective:
normalizing a bunch of things by their sum does not make them
probabilities in any meaningful sense! (See the first sketch at the
end of this message.) </side rant>

> Actually, if we are going to debate about the exact value that the
> parameter should take, let me tell you my point of view from an
> abstract, user-centric aspect: it is meaningless that when I use
> logistic regression, bigger C means less regularization, whereas when
> I use lasso, bigger alpha means more regularization. As someone who
> has spent a little while doing statistical learning, I understand the
> reasons behind this, but it is really a nuisance for non-experts.

Agreed. It *still is* a nuisance for this quasi-expert. ;) (The second
sketch below shows the inconsistency side by side.)

> I believe that the right choice is to have the ratio between the loss
> and the penalization invariant in the number of samples. From a
> theoretical perspective, I believe that this is the case because the
> loss is a plug-in estimate of a risk. Such an estimate should not
> grow with the number of samples. From a practical point of view, I
> believe that this is the right choice because if I learn to set C on
> a dataset, and you give me a new dataset saying it comes from the
> same source/feed, I should be able to use the same C. In practice,
> the reason why Alex found this problem was that on real-life data he
> had difficulties setting C.
>
> That said, I agree with James that the docs should be much more
> explicit about what is going on, and how what we have differs from
> libsvm.

I think that renaming sklearn's scaled version of "C" is probably a
start. Using the name "C" for something other than what everyone else
means by "C" violates the principle of least surprise quite severely.
If users saw "zeta" or "Francis" or "unicorn", most of them would not
assume it was a moniker for C, but would refer to the documentation
for an explanation.

David
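
P.S. To make a few of the points above concrete, here are three toy
sketches. They are my own illustrations of the arguments, not
scikit-learn or liblinear internals.

First, the calibration rant: take per-class one-vs-rest decision
values, squash each through its own sigmoid, and divide by the sum.
The result sums to 1 by construction, but it is not the posterior of
any coherent joint model; a true multinomial (softmax) posterior over
the same scores generally disagrees with it. (I don't claim this is
literally what liblinear computes, just the flavor of "normalize and
call it a probability".)

    import numpy as np

    scores = np.array([2.0, 0.5, -1.0])              # per-class decision values
    ovr = 1.0 / (1.0 + np.exp(-scores))              # independent 1-vs-rest sigmoids
    ovr_normalized = ovr / ovr.sum()                 # "probabilities" by fiat
    softmax = np.exp(scores) / np.exp(scores).sum()  # multinomial posterior
    print(ovr_normalized)                            # the two generally disagree
    print(softmax)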
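
Second, the naming nuisance, side by side. In the current sklearn API
a bigger C weakens the penalty while a bigger alpha strengthens it:

    from sklearn.linear_model import Lasso, LogisticRegression

    logreg = LogisticRegression(C=100.0)  # large C     -> WEAK regularization
    lasso = Lasso(alpha=100.0)            # large alpha -> STRONG regularization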
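
Third, Gael's invariance argument, with a made-up squared loss (again
a toy objective, not sklearn code). If the data term is a sum over
samples, libsvm-style, the loss/penalty ratio grows with n, so a C
tuned on one dataset size does not transfer; if the data term is the
mean loss, a plug-in risk estimate, the ratio is stable:

    import numpy as np

    def objectives(residuals, w, C, alpha):
        sum_form = C * np.sum(residuals ** 2) + 0.5 * np.dot(w, w)  # libsvm-style
        mean_form = np.mean(residuals ** 2) + alpha * np.dot(w, w)  # risk-style
        return sum_form, mean_form

    rng = np.random.RandomState(0)
    w = rng.randn(5)
    small = rng.randn(100)    # residuals on a dataset with n = 100
    big = np.tile(small, 10)  # "same source/feed", ten times more samples

    print(objectives(small, w, C=1.0, alpha=0.1))
    print(objectives(big, w, C=1.0, alpha=0.1))
    # the loss term in sum_form grows 10x; mean_form is unchanged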
