On 17 April 2012 at 02:45, Gael Varoquaux <[email protected]> wrote:
> @scikit-learn developers:
>
> Hum...
> http://www.flickr.com/photos/scriptingnews/3503448168/sizes/o/in/photostream/
hahaha

> The situation is that the authors of libSVM have chosen a solution that
> leads to an inconsistent estimator with bad statistical properties, but
> works well on many datasets. I think it is wrong, but then, I am worried
> that this might be a battle that we might not win.
>
> On the one hand, we really cannot have C the way the libSVM authors have
> defined it, because parameter setting by cross-validation will not work.
> On the other hand, it is clear that people keep tripping over this
> difference. Should we introduce a different name, so that it forces
> people to read the docs?

Or we could revert to `scale_C=False` by default and let people who care about statistical consistency turn it on explicitly when they need it (i.e. to do model selection in the low `n_samples` case).

This way people who don't read the docs (the majority of users) will not fall into the libsvm-gives-different-results trap, and will have the tools to avoid the statistical inconsistency trap if they make the effort to read the docs.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
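To make the trade-off concrete, here is a minimal pure-Python sketch of why the two conventions behave differently when `n_samples` changes, as it does between a cross-validation fold and the full training set. It assumes that `scale_C=True` means dividing C by `n_samples`, which is my reading of this thread and not something checked against the code. The helpers `hinge_sum` and `data_term` are hypothetical names made up for illustration; they are not part of scikit-learn or libSVM.

```python
def hinge_sum(w, b, X, y):
    """Sum of hinge losses max(0, 1 - y_i * (w . x_i + b)) over all samples."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        total += max(0.0, 1.0 - margin)
    return total


def data_term(w, b, X, y, C, scale_C):
    """Data-fit part of the objective 0.5 * ||w||^2 + data_term.

    Assumption: scale_C=True divides C by n_samples (mean loss),
    scale_C=False uses C as is (sum of losses, libSVM-style).
    """
    effective_C = C / len(X) if scale_C else C
    return effective_C * hinge_sum(w, b, X, y)


# Toy data and a fixed, arbitrary linear model (no fitting involved).
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
y = [1, -1, 1, -1]
w, b = [0.5, 0.5], 0.0

# Same data repeated twice: n_samples doubles, the distribution is unchanged.
X2, y2 = X + X, y + y

unscaled_small = data_term(w, b, X, y, C=1.0, scale_C=False)
unscaled_big = data_term(w, b, X2, y2, C=1.0, scale_C=False)
scaled_small = data_term(w, b, X, y, C=1.0, scale_C=True)
scaled_big = data_term(w, b, X2, y2, C=1.0, scale_C=True)

print("scale_C=False:", unscaled_small, unscaled_big)
print("scale_C=True: ", scaled_small, scaled_big)
```

With `scale_C=False` the data term doubles when the data is duplicated, so the same C implies weaker regularization on the larger set. With the scaled convention the balance between the penalty and the data term stays the same, which is the property that matters when a C chosen on smaller folds is reused on the full set.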
