On 17.04.2012 15:06, Alexandre Gramfort wrote:
> what's killing me is that Andy's plot shows that scale_C is the way to
> go, so it's not just me. Also, the libsvm/liblinear bindings are the only
> models that have a regularization parameter that depends on the
> number of samples. Either we stick to libsvm and we have an
> inconsistent grid search + inconsistent behavior across estimators,
> or we go the clean way and take the risk of having people reporting
> "SERIOUS BUGS"
>
> pick your side…

I am really trying to find a good solution to this (which was my motivation for doing the graphs), but I am also getting a bit fed up with this issue.
I am really torn between the different possible solutions. I agree that the plots show that scaling C seems better. BUT: I would not agree with Gael that scale_C=False is broken. Even with few samples, it is very hard to actually trigger the problem: you need a learning problem that is VERY sensitive to the value of C, and the validation-set sizes need to differ by more than your tolerance for C. In the plots, comparing the maximum of the blue curve with the maximum of the purple one, the optimum is off by a factor of 4 with scale_C=False and only by a factor of 2 with scale_C=True. Could you maybe give the real-life example that failed?

Cheers,
Andy
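
P.S. To make concrete what I mean by C's effect depending on the training-set size, here is a rough sketch (the dataset and the C grid are just for illustration; as far as I understand it, scale_C=True effectively amounts to the C / n variant):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

C = 1.0
for n in (100, 500, 1000):
    # Unscaled: the hinge-loss term sum_i xi_i grows with n, so a
    # fixed C effectively regularizes less as n grows.
    raw = LinearSVC(C=C).fit(X[:n], y[:n])
    # Scaled: dividing C by n turns the sum into an average, so the
    # loss/penalty trade-off stays roughly comparable across n.
    scaled = LinearSVC(C=C / n).fit(X[:n], y[:n])
    print(n, np.linalg.norm(raw.coef_), np.linalg.norm(scaled.coef_))

The point is that with the raw C the weight norms should drift as n changes, while with C / n they should stay in the same ballpark, which is exactly why an unscaled C interacts with differing fold sizes in grid search.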
