What's killing me is that Andy's plot shows that scale_C is the way to go, so it's not just me. Also, the libsvm/liblinear bindings are the only models whose regularization parameter depends on the number of samples. Either we stick to the libsvm convention and live with an inconsistent grid search plus inconsistent behavior across estimators, or we go the clean way and take the risk of people reporting "SERIOUS BUGS".
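To make the sample-size dependence concrete, here is a minimal sketch in plain Python. It assumes that scale_C=True means C is divided by the number of training samples before it reaches libsvm/liblinear; the function name and the fold sizes are made up for illustration and are not part of the library.

```python
def penalty_weight(C, n_samples, scale_C):
    """Weight of the ||w||^2 penalty relative to the *mean* loss.

    Assumed objective (libsvm convention):
        0.5 * ||w||^2 + C_eff * sum_i loss_i
    Dividing by (C_eff * n_samples) gives:
        mean_i loss_i + ||w||^2 / (2 * C_eff * n_samples)
    """
    # Assumption: scale_C=True divides C by the number of samples.
    C_eff = C / n_samples if scale_C else C
    return 1.0 / (2.0 * C_eff * n_samples)


# Hypothetical sizes: a cross-validation training fold vs. the full refit.
n_fold, n_full = 80, 100

for scale_C in (True, False):
    on_fold = penalty_weight(1.0, n_fold, scale_C)
    on_full = penalty_weight(1.0, n_full, scale_C)
    print(scale_C, on_fold, on_full)
```

With scale_C=True the two weights agree, so a C selected on the folds means the same thing when refitting on all the data. With scale_C=False the penalty gets relatively weaker as the number of samples grows, which is the grid-search inconsistency I am complaining about.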
pick your side…

Alex

On Tue, Apr 17, 2012 at 3:00 PM, Gael Varoquaux <[email protected]> wrote:
> On Tue, Apr 17, 2012 at 02:56:13PM +0200, Lars Buitinck wrote:
>> >> > This way people who don't read the doc (the majority of the users)
>> >> > will not fall in the libsvm-gives-different-results trap and will have
>> >> > the tools to not fall in the statistical inconsistency trap if they
>> >> > make the effort to read the doc.
>
>> >> + .5
>
>> > +1
>
>> +1
>
> It seems to me that we are hearing here the people with large number of
> samples who do not have the problems that scale_C=False creates saying
> that they prefer this default choice.
>
> :(. Basically the impression that I have is that either choice we take,
> we are breaking the library for a set of users.
>
>> > And we could add a warning in grid_search.py:
>
>> > if not getattr(clf, "scale_C", True):
>> >     warning.warning("scale_C=False is not recommended when using grid
>> > search: see http:// for a discussion")
>
>> I'm not very fond of adding estimator-specific heuristics to
>> general-purpose modules...
>
> I agree. This is a clearly a code smell, telling us that something is
> wrong with our objects: they are unable to abstract out enough the
> details of the model.
>
> G
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
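
For the record, the warning snippet quoted above would not run as written: the standard-library call is `warnings.warn`, not `warning.warning`. Below is a rough, runnable sketch of what was being proposed, only to make the suggestion concrete. The helper name and the dummy estimator class are hypothetical, and the discussion URL is left out because it was never filled in.

```python
import warnings


def warn_if_unscaled_C(estimator):
    """Hypothetical helper: warn when an estimator explicitly has scale_C=False.

    Returns True if a warning was issued, False otherwise.
    """
    # Estimators without a scale_C attribute default to True here,
    # mirroring the getattr call in the quoted snippet.
    if not getattr(estimator, "scale_C", True):
        warnings.warn(
            "scale_C=False is not recommended when using grid search",
            UserWarning,
        )
        return True
    return False


class DummySVC:
    """Stand-in estimator used only for this illustration."""

    def __init__(self, scale_C=True):
        self.scale_C = scale_C


if __name__ == "__main__":
    warn_if_unscaled_C(DummySVC(scale_C=False))  # emits a UserWarning
    warn_if_unscaled_C(DummySVC(scale_C=True))   # silent
```

This still has the problem Lars and Gael point out: the check lives in general-purpose code but knows about one estimator-specific parameter.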
