On Tue, Apr 17, 2012 at 02:39:33PM +0200, Alexandre Gramfort wrote:
> ok I give up… Let's move back to scale_C=None that spits a warning to
> strongly suggest users to make their choice.

We could do it, but it's broken. Basically this choice would be accepting
that in the small sample situation you and I (the happy few) know how to
make it work, and the others can just go and read the docs (which they
won't do, as we all know).

I have a _strong_ sense of failure here. We are basically saying that my
office neighbor (a neuroscientist, doesn't read docs, doesn't care about
our discussion) will continue doing what she does: picking a random C and
not caring about it.

I understand that when n_samples is large, since the number of support
vectors in an SVM scales with C, having C grow with the number of samples
is a pragmatic choice that leads to good tradeoffs.

It's nonsensical to have the scaling of the regularization parameter
change just because the loss is changing. Our lasso parameter is
independent of the number of samples. I'd strongly like to keep the
current scaling of C in the logistic regression at least.

A solution that I'd really like to see, and that would actually make me
happy, is adding a 'scale_params' option to all linear models that would
scale the regularization parameter by some natural scaling of the
problem. For l1 penalties, it would be 'l1_min_C'. For l2 penalties, I
have no intuition. I am suggesting 'scale_params' because it can be
universal and not specific to SVMs or any other model with different
parameter names.

In our experience, this choice is actually often a very good one when
setting parameters empirically.

Once we have this, we can suggest it in grid_search simply by checking
whether 'scale_params' is in 'estimator.get_param_names'.
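The check itself is trivial; a sketch of how grid_search could detect
support (the estimator class here is a stand-in, and 'scale_params' is
the proposed, not yet existing, parameter):

```python
class FakeEstimator:
    """Stand-in for a linear model exposing the proposed flag."""

    def __init__(self, C=1.0, scale_params=False):
        self.C = C
        self.scale_params = scale_params

    def get_params(self, deep=True):
        # Mimics the scikit-learn get_params() contract.
        return {'C': self.C, 'scale_params': self.scale_params}

def supports_param_scaling(estimator):
    # grid_search could branch on this before building the C grid.
    return 'scale_params' in estimator.get_params()

print(supports_param_scaling(FakeEstimator()))  # True
```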

One thing to keep in mind is that I would like not to have both
arguments, 'scale_C' and 'scale_params'. I would be in favor of not
scaling C by default and killing 'scale_C' once we have 'scale_params'.

Now there is some work before we get there. First we have to figure out
how C should scale for l2 penalties. Second, we have to write the PR.

How would people feel about this proposal? Ideally, in the long run, the
same strategy would be usable for other estimators.

G

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general