On 17 April 2012 02:45, Gael Varoquaux <[email protected]> wrote:
> @scikit-learn developers:
>
> Hum...
> http://www.flickr.com/photos/scriptingnews/3503448168/sizes/o/in/photostream/

hahaha

> The situation is that the authors of libSVM have chosen a solution that
> leads to an inconsistent estimator with bad statistical properties, but
> that works well on many datasets. I think it is wrong, but I am worried
> that this is a battle we might not win.
>
> On the one hand, we really cannot have C the way the libSVM guys have
> defined it, because parameter setting by cross-validation will not work.
> On the other hand, it is clear that people keep tripping over this
> difference. Should we introduce a different name, so that people are
> forced to read the docs?

Or we could revert to `scale_C=False` by default and let
statistically consistent people turn it on explicitly when they need
it (i.e. to do model selection in the low `n_samples` case).
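To make the two conventions concrete, here is a sketch of the soft-margin primal objectives. It assumes the standard hinge-loss formulation and assumes (as we read the scikit-learn 0.11 `scale_C` option) that `scale_C=True` simply divides C by `n_samples`, so the penalty applies to the mean hinge loss instead of the sum:

```latex
% libSVM convention (scale_C=False): C multiplies the *sum* of hinge losses
\min_{w,b}\; \tfrac{1}{2}\lVert w\rVert^2 + C \sum_{i=1}^{n} \xi_i,
\qquad \xi_i = \max\bigl(0,\; 1 - y_i (w^\top x_i + b)\bigr)

% scaled convention (scale_C=True): C multiplies the *mean* hinge loss
\min_{w,b}\; \tfrac{1}{2}\lVert w\rVert^2 + \frac{C}{n} \sum_{i=1}^{n} \xi_i

% k-fold CV fits each model on n' = n(k-1)/k samples. Refitting the libSVM
% objective on all n samples with the selected C scales the weight of the
% data term, relative to the regularizer, by roughly n/n' = k/(k-1) (for a
% comparable average loss); the scaled objective keeps that ratio fixed.
```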

This way, people who don't read the docs (the majority of users)
will not fall into the libsvm-gives-different-results trap, and will have
the tools to avoid the statistical inconsistency trap if they
make the effort to read the docs.
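A minimal Python sketch of the same bookkeeping. The helper names (`effective_libsvm_C`, `primal_objective`, `cv_refit_mismatch`) are hypothetical and not part of scikit-learn or libSVM, and the code again assumes that `scale_C=True` means "divide C by `n_samples`":

```python
def effective_libsvm_C(C, n_samples, scale_C):
    """C value libSVM would actually see (assumed scale_C semantics)."""
    return C / n_samples if scale_C else C


def primal_objective(w, hinge_losses, C, scale_C):
    """Linear soft-margin SVM primal objective for given per-sample hinge losses."""
    n = len(hinge_losses)
    reg = 0.5 * sum(wi * wi for wi in w)  # 0.5 * ||w||^2
    return reg + effective_libsvm_C(C, n, scale_C) * sum(hinge_losses)


def cv_refit_mismatch(n_samples, n_folds):
    """Ratio of data-term weight at refit vs. during k-fold CV (unscaled C)."""
    n_train_fold = n_samples * (n_folds - 1) / n_folds
    return n_samples / n_train_fold


if __name__ == "__main__":
    print(effective_libsvm_C(10.0, 100, scale_C=False))  # 10.0
    print(effective_libsvm_C(10.0, 100, scale_C=True))   # 0.1
    print(cv_refit_mismatch(100, 5))                      # 1.25
```

Duplicating every sample leaves the scaled objective unchanged but doubles the data term of the libSVM one, which is the sense in which only the scaled C is independent of the training set size.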

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
