On Tue, Apr 17, 2012 at 02:39:33PM +0200, Alexandre Gramfort wrote:
> ok I give up… Let's move back to scale_C=None that spits a warning to
> strongly suggest users to make their choice.
We could do it, but it's broken. Basically, this choice would amount to accepting that in the small-sample situation you and I (the happy few) know how to make it work, and the others can just go and read the docs (which they won't do, as we all know). I have a _strong_ sense of failure here. We are basically saying that my office neighbor (a neuroscientist who doesn't read docs and doesn't care about our discussion) will continue doing what she does: picking a random C and not caring about it.

I understand that when n_samples is large, since the SVM has a number of support vectors that scales with C, having C grow with the number of samples is a pragmatic choice that leads to good tradeoffs. But it is nonsensical to have the scaling of the regularization parameter change just because the loss changes. Our lasso parameter is independent of the number of samples. I'd strongly like to keep the scaling of C in the logistic regression, at least.

A solution that I'd really like to see, and that would actually make me happy, is adding 'scale_params' to all linear models: it would scale the regularization parameter by some natural scaling of the problem. For l1 penalties, that would be 'l1_min_C'. For l2 penalties, I have no intuition. I am suggesting 'scale_params' because it can be universal, and not specific to SVMs or any other model with different parameter names. In our experience, this choice is often a very good one when setting parameters empirically. Once we have this, we can suggest it in grid_search simply by checking whether 'scale_params' is in 'estimator.get_param_names'.

One thing to keep in mind is that I would not like to have both 'scale_C' and 'scale_params' as arguments. I would be in favor of not scaling C by default and killing 'scale_C' if we have 'scale_params'.

Now there is some work before we get there. First, we have to figure out how C should scale for l2 penalties. Second, we have to write the PR.

How would people feel about this proposal?
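To make the idea concrete, here is a minimal sketch of what 'scale_params' could mean for an l1-penalized model. The names `l1_scale` and `effective_C` are hypothetical, and the scale is a toy stand-in for the real l1_min_C computation (scikit-learn's `svm.l1_min_c` does this properly): below C ~ 1/max_j |x_j · y|, all coefficients are zero, so that quantity gives a problem-dependent unit in which to express C.

```python
def l1_scale(X, y):
    """Toy 'natural scale' for an l1-penalized linear model.

    Uses the largest absolute correlation between a feature column and
    the target (y coded as +/-1).  Illustrative only -- not the exact
    l1_min_C formula, which also depends on the loss.
    """
    n_samples, n_features = len(X), len(X[0])
    den = max(abs(sum(X[i][j] * y[i] for i in range(n_samples)))
              for j in range(n_features))
    return 1.0 / den

def effective_C(C, X, y, scale_params=False):
    """Hypothetical 'scale_params' behavior: interpret the user's C in
    units of the problem's natural scale instead of raw units."""
    return C * l1_scale(X, y) if scale_params else C

# Tiny example: X^T y = [2, -1], so the natural scale is 1/2.
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
y = [1, -1, 1]
print(effective_C(10.0, X, y, scale_params=True))   # 10 * 0.5 = 5.0
print(effective_C(10.0, X, y, scale_params=False))  # unchanged: 10.0
```

The point of routing everything through one 'scale_params' flag is that grid_search does not need to know whether the estimator calls its parameter C or alpha; it only needs to check whether the flag is exposed.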
Ideally, in the long run, the same strategy might be usable for other estimators.

G

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
