Hi,

1/ I agree with Gael. When writing the maths, you don't want to carry n_samples around on every line. For sparse regression, this leads to papers with no n_samples scaling but implementations that do scale (e.g. R packages like glmnet).
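To make the scaling point concrete, here is a minimal sketch in plain Python for a one-feature lasso, where the solution is a closed-form soft-thresholding step. The function names (soft_threshold, lasso_1d_sum, lasso_1d_mean) and the toy data are made up for illustration; they are not taken from glmnet or scikit-learn. It compares the textbook "sum of losses" objective with the "mean of losses" objective, assuming the two penalties are related by lam = n_samples * alpha.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_1d_sum(x, y, lam):
    """Minimize 0.5 * sum_i (y_i - w * x_i)**2 + lam * |w| (textbook form)."""
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    return soft_threshold(xy, lam) / xx


def lasso_1d_mean(x, y, alpha):
    """Minimize 1 / (2 * n) * sum_i (y_i - w * x_i)**2 + alpha * |w| (scaled form)."""
    n = len(x)
    # Multiplying the scaled objective by n gives the sum form with lam = n * alpha.
    return lasso_1d_sum(x, y, n * alpha)


# Hypothetical toy data, one feature, three samples.
x = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]

w_sum = lasso_1d_sum(x, y, lam=3.0)
w_mean = lasso_1d_mean(x, y, alpha=1.0)  # same problem, since lam = 3 * alpha

# Duplicate every sample: the data carry the same information, but n doubles.
w_sum_dup = lasso_1d_sum(x * 2, y * 2, lam=3.0)
w_mean_dup = lasso_1d_mean(x * 2, y * 2, alpha=1.0)

print("sum form :", w_sum, "->", w_sum_dup)
print("mean form:", w_mean, "->", w_mean_dup)
```

With a fixed lam, the sum form shrinks less once the samples are duplicated, whereas the mean form with a fixed alpha returns the same coefficient (up to rounding). That is the practical reason an implementation may scale by n_samples even when the paper does not.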
2/ What you say, Olivier, is interesting, and I don't really understand why this happens, apart from non-iid samples. Are there any theoretical reasons that would justify a non-scaled regularization being valid under certain hypotheses on the data?

3/ David, I feel that renaming it to "unicorn" or "alpha", where alpha = n_samples / C_libsvm, would not solve the above problem, as you would still have no way to get libsvm's default C behavior. It would lead to less astonishment though, I agree...

Alex

On Thu, Mar 22, 2012 at 7:31 AM, Gael Varoquaux <[email protected]> wrote:
> On Wed, Mar 21, 2012 at 08:09:26PM -0400, David Warde-Farley wrote:
>> I think it's less about disagreeing with libsvm than disagreeing with the
>> notation of every textbook presentation I know of. I agree that libsvm is no
>> golden calf.
>
> But it is also the case for the lasso: the loss term is the sum of the
> sample-level losses, and not the mean of these, (I just checked in
> Tibshirani's paper and the 'Elements of statistical learning') and no
> library implements the lasso with a non-scaled version of the penalty. I
> think that many textbooks are just using the simplest possible
> formulation and not worrying about details like this one. There are many
> important details that are never mentioned in textbooks.
>
>> > That said, I agree with James that the docs should be much more
>> > explicit about what is going on, and how what we have differs from
>> > libsvm.
>
>> I think that renaming sklearn's scaled version of "C" is probably a
>> start. Using the name "C" for something other than what everyone else
>> means by "C" violates the principle of least surprise quite severely.
>> If they saw "zeta" or "Francis" or "unicorn", most people will not
>> assume it is a moniker for C but refer to the documentation for an
>> explanation.
>
> That might be a valid solution, also I don't think that it is as
> important as you say due to my above point.
>
> Gael
>
> ------------------------------------------------------------------------------
> This SF email is sponsored by:
> Try Windows Azure free for 90 days Click Here
> http://p.sf.net/sfu/sfd2d-msazure
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
