On 17.04.2012 15:06, Alexandre Gramfort wrote:
> what's killing me is that andy's plot shows that scale_C is the way to
> go so it's not just me. Also libsvm/liblinear bindings are the only
> models that have a regularization parameter that depends on the
> number of samples. Either we stick to libsvm and we have an
> inconsistent grid search + an inconsistent behavior across estimators
> or we go the clean way and we take the risk of having people reporting
> "SERIOUS BUGS"
>
> pick your side…
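
First, a quick recap of the mechanics for the list, since it keeps
coming up: libsvm/liblinear weight the *sum* of the losses by C, so
with scale_C=False the relative strength of the regularizer silently
drops as the training set grows; scale_C=True divides C by n_samples
to compensate. A minimal sketch of the equivalent manual scaling
(fit_svc is just an illustrative helper here, not our API):

    # Sketch only: reproducing the two conventions by hand.
    from sklearn.svm import LinearSVC

    def fit_svc(X, y, C=1.0, scale_C=True):
        # scale_C=True: hand C / n_samples to liblinear, so the
        # per-sample penalty stays comparable across dataset sizes.
        # scale_C=False: pass C through unchanged (libsvm convention).
        effective_C = C / X.shape[0] if scale_C else C
        return LinearSVC(C=effective_C).fit(X, y)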
I am really, really trying to find a good solution to this
(which was my motivation for making the plots), but
I am also getting a bit fed up with this issue.

I am really torn between the different possible solutions.


I agree that the plots show that scaling C seems better.

BUT: I would not agree with Gael that scale_C=False is broken.

Even with few samples, it is very hard to actually trigger the problem.
You need a learning problem that is VERY sensitive to the value
of C, and the difference in validation-set sizes has to be large
enough to push the effective C outside your tolerance.
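
To make that concrete, here is a toy sketch (my own illustration,
using KFold from the current cross-validation module) of how little
the per-sample weight of an unscaled C drifts between folds of
slightly unequal size:

    # Sketch: with an unscaled C, training folds of different sizes
    # see slightly different per-sample regularization strengths.
    import numpy as np
    from sklearn.model_selection import KFold

    X = np.zeros((103, 5))  # 103 samples -> unequal folds for 5 splits
    for train_idx, test_idx in KFold(n_splits=5).split(X):
        n_train = len(train_idx)  # 82 or 83 depending on the fold
        print("n_train=%d  per-sample weight of C=1.0: %.5f"
              % (n_train, 1.0 / n_train))

The drift is on the order of a percent, which is exactly why you only
see a failure when the model is extremely sensitive to C.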

In the plots, if you compare where the blue curve attains its maximum
with where the purple one does, the optimal C is off by a factor of 4
with scale_C=False but only by a factor of 2 with scale_C=True.


Could you maybe give the real-life example that failed?

Cheers,
Andy
