On 17.04.2012 16:16, Gael Varoquaux wrote:
> On Tue, Apr 17, 2012 at 03:46:10PM +0200, Andreas Mueller wrote:
>> I agree that they show that scaling C seems better.
>> BUT: I would not agree with Gael that scale_C=False is broken.
>> Even with few samples, it is very hard to actually generate the problem.
>> You need to have a learning problem that is VERY sensitive to the value
>> of C and you need to have a difference in size of the validation set that
>> is larger than the tolerance you have for C.
> For the logistic regression with l1 penalty, this happens very easily.
> For the logistic regression with l1 penalty, it is not as bad. In
> practice, in our problems, it seems that for SVMs, as long as C is 'big
> enough', there is no catastrophic failure, though things might be
> slightly suboptimal. I realize that the last sentence is in contradiction
> with the common wisdom in statistics that tells you that you are better
> off with over-penalizing than under-penalizing.
>
> We found the problem using a logistic-l1 with n_samples ~ 200 and
> n_features ~ 50000.
>
Thanks for your explanation. I would be curious how many samples you used
in cross-validation and how many in training.

> What do people think about my solution 'scale_params'? I thought that it
> was a way to make everybody happy, but I don't seem to be getting
> traction.
Sorry, can't really concentrate right now. Traction -> later.

> I'd like us to give a good thought to this problem, as I think that it can
> be a recurrent pain. I'd actually be happy delaying the release a couple
> weeks and reaching a solution that we believe actually solves the problem
> for all classes of users (beginners and experts, large and small
> n_samples).
+1
Btw I feel it is somewhat of a problem to undo what was done in the current master, as I would guess some people are already working with that.
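
To put rough numbers on the point about fold size versus tolerance on C, here is a minimal sketch. It assumes the unscaled, liblinear-style objective penalty(w) + C * sum_i loss_i, so that the per-sample penalty strength behaves like 1 / (C * n_samples), and it assumes plain k-fold cross-validation on the ~200 samples mentioned above. The function names and the fold counts are hypothetical and chosen only for illustration; nothing here is scikit-learn API.

```python
def effective_alpha(C, n_samples):
    """Per-sample penalty strength implied by an unscaled C.

    Assumes the objective penalty(w) + C * sum_i loss_i. Dividing it by
    C * n_samples gives alpha * penalty(w) + mean_i loss_i with
    alpha = 1 / (C * n_samples).
    """
    return 1.0 / (C * n_samples)


def rescale_C(C_cv, n_cv_train, n_full_train):
    """C on the full training set that keeps alpha as selected in CV."""
    return C_cv * n_cv_train / n_full_train


if __name__ == "__main__":
    n_samples = 200  # order of magnitude quoted in the thread
    for n_folds in (2, 5, 10):  # assumed fold counts
        n_cv_train = n_samples - n_samples // n_folds
        factor = rescale_C(1.0, n_cv_train, n_samples)
        print("folds=%d  cv train size=%d  C correction factor=%.2f"
              % (n_folds, n_cv_train, factor))
```

Under these assumptions the mismatch between the C selected in cross-validation and the C that is equivalent on the full training set is just the ratio of the two training sizes. Whether a shift of that size matters depends on how sharply the score varies with C, which is the point being debated above.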
