What would be the semantics of scale_params? Shall we touch every estimator, or assume scale_params=True if it is not present as an attribute?
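To make the second option concrete, here is a minimal sketch of the "assume True if the attribute is missing" fallback. Everything in it is an assumption for illustration: the two toy classes, the helper name effective_C, and the convention that scaling means dividing C by n_samples are all hypothetical and are not taken from the existing code base.

```python
class WithFlag:
    """Hypothetical estimator that declares scale_params explicitly."""

    def __init__(self, C=1.0, scale_params=False):
        self.C = C
        self.scale_params = scale_params


class WithoutFlag:
    """Hypothetical estimator that was never touched: no scale_params attribute."""

    def __init__(self, C=1.0):
        self.C = C


def effective_C(estimator, n_samples):
    """Return the value of C that would actually be passed to the solver.

    Assumption: 'scaling' means dividing C by n_samples. An estimator that
    does not define scale_params is treated as if scale_params=True.
    """
    if getattr(estimator, "scale_params", True):
        return estimator.C / float(n_samples)
    return estimator.C


if __name__ == "__main__":
    n_samples = 200
    print(effective_C(WithoutFlag(C=100.0), n_samples))                    # attribute missing: scaled
    print(effective_C(WithFlag(C=100.0, scale_params=False), n_samples))  # explicit opt-out: unscaled
```

With this fallback no estimator needs to be touched to get the scaled behaviour; only those that want to opt out have to grow the attribute.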
Alex

On Tue, Apr 17, 2012 at 4:16 PM, Gael Varoquaux <[email protected]> wrote:
> On Tue, Apr 17, 2012 at 03:46:10PM +0200, Andreas Mueller wrote:
>> I agree that they show that scaling C seems better.
>
>> BUT: I would not agree with Gael that scale_C=False is broken.
>
>> Even with few samples, it is very hard to actually generate the problem.
>> You need to have a learning problem that is VERY sensitive to the value
>> of C and you need to have a difference in size of the validation set that
>> is larger than the tolerance you have for C.
>
> For the logistic regression with l1 penalty, this happens very easily.
> For the logistic regression with l1 penalty, it is not as bad. In
> practice, in our problems, it seems that for SVMs, as long as C is 'big
> enough', there is no catastrophic failure, though things might be
> slightly suboptimal. I realize that the last sentence is in contradiction
> with the common wisdom in statistics that tells you that you are better
> off with over-penalizing than under-penalizing.
>
> We found the problem using a logistic-l1 with n_samples ~ 200 and
> n_features ~ 50000.
>
> What do people think about my solution 'scale_params'? I thought that it
> was a way to make everybody happy, but I don't seem to be getting
> traction.
>
> I'd like us to give a good thought to this problem, as I think that it can
> be a recurrent pain. I'd actually be happy delaying the release a couple
> weeks and reaching a solution that we believe actually solves the problem
> for all classes of users (beginners and experts, large and small
> n_samples).
>
> Gaël

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
