hi,

1/ I agree with Gael. When writing the maths you don't want to carry
n_samples around on every line, and for sparse regression that
produces papers with no n_samples scaling but implementations that do
scale (e.g. R packages like glmnet).
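
To make the scaling point concrete, here is a minimal sketch for a
one-feature lasso, where the minimizer has a closed form via
soft-thresholding. The function names and the toy data are made up for
illustration and are not taken from glmnet or any other library. It
compares the "sum of losses" formulation with penalty lam against the
"mean of losses" formulation with penalty alpha = lam / n_samples.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |w|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_1d_sum(x, y, lam):
    """Minimize 0.5 * sum_i (y_i - x_i * w)**2 + lam * |w| over scalar w."""
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    return soft_threshold(xy, lam) / xx


def lasso_1d_mean(x, y, alpha):
    """Minimize (1 / (2 * n)) * sum_i (y_i - x_i * w)**2 + alpha * |w|."""
    n = len(x)
    xy = sum(a * b for a, b in zip(x, y)) / n
    xx = sum(a * a for a in x) / n
    return soft_threshold(xy, alpha) / xx


# Toy data, chosen arbitrarily for the illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
lam = 2.0
n = len(x)

# Same data: the two formulations agree once alpha = lam / n.
w_sum = lasso_1d_sum(x, y, lam)
w_mean = lasso_1d_mean(x, y, lam / n)

# Duplicate every sample but keep the penalty values fixed.
w_sum_doubled = lasso_1d_sum(x + x, y + y, lam)
w_mean_doubled = lasso_1d_mean(x + x, y + y, lam / n)

print(w_sum, w_mean, w_sum_doubled, w_mean_doubled)
```

On the same data both formulations give the same w. Once every sample is
duplicated, the mean formulation with a fixed alpha returns the same w as
before, while the sum formulation with a fixed lam shrinks less.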

2/ What you say, Olivier, is interesting, and I don't really understand
why this happens other than with non-iid samples. Are there any theoretical
reasons that would justify a non-scaled regularization being valid
under certain hypotheses on the data?

3/ David, I feel that renaming C to "unicorn" or "alpha", where alpha =
n_samples / C_libsvm, would not solve the above problem, as you would
have no way to use libsvm's default C behavior. Although I agree that it
would lead to less astonishment...

Alex

On Thu, Mar 22, 2012 at 7:31 AM, Gael Varoquaux
<[email protected]> wrote:
> On Wed, Mar 21, 2012 at 08:09:26PM -0400, David Warde-Farley wrote:
>> I think it's less about disagreeing with libsvm than disagreeing with the 
>> notation of every textbook presentation I know of. I agree that libsvm is no 
>> golden calf.
>
> But it is also the case for the lasso: the loss term is the sum of the
> sample-level losses, and not the mean of these (I just checked in
> Tibshirani's paper and 'The Elements of Statistical Learning'), and no
> library implements the lasso with a non-scaled version of the penalty. I
> think that many textbooks are just using the simplest possible
> formulation and not worrying about details like this one. There are many
> important details that are never mentioned in textbooks.
>
>> > That said, I agree with James that the docs should be much more
>> > explicit about what is going on, and how what we have differs from
>> > libsvm.
>
>> I think that renaming sklearn's scaled version of "C" is probably a
>> start. Using the name "C" for something other than what everyone else
>> means by "C" violates the principle of least surprise quite severely.
>> If they saw "zeta" or "Francis" or "unicorn", most people would not
>> assume it is a moniker for C but would refer to the documentation for an
>> explanation.
>
> That might be a valid solution, although I don't think that it is as
> important as you say, due to my above point.
>
> Gael
>
> ------------------------------------------------------------------------------
> This SF email is sponsosred by:
> Try Windows Azure free for 90 days Click Here
> http://p.sf.net/sfu/sfd2d-msazure
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
