On 03/14/2012 09:16 PM, David Warde-Farley wrote:
> On Wed, Mar 14, 2012 at 12:58:39PM -0700, Olivier Grisel wrote:
>
>> I am not asserting that faster models for a given accuracy level will
>> always be the most regularized ones (I am not sure this is even true;
>> I am sure we can find counterexamples). I am just asserting that
>> faster parameter sets are a good way to arbitrarily break ties on
>> predictive accuracy, as the user will prefer using parameters that
>> lead to faster models.
>
> How often do you expect to need to break ties? It seems like a rather
> remote possibility to me, at least in terms of a continuous objective
> function evaluated on a validation set.
>
> Returning the "faster" one may be fine in the case of a true tie, but it
> doesn't really make a lot of sense to me. Assuming you're using grid
> search to optimize hyperparameters using a train/valid split and intend
> to retrain on union(train, valid) using the best set of hyperparameters,
> the speed of fitting on "train" alone may not have anything to do with
> the speed of fitting on the union. Not to mention that this selection
> criterion is potentially subject to the whims of the system load on a
> shared machine, or even other tasks that the user was running himself.
>
> If possible I'd make GridSearchCV return a *list* of hyperparameter sets
> that give equivalent performance in the case of a tie, and let the user
> do whatever they want with them. For simplicity, this might imply
> returning a list of length one in the non-tied case...
>
> David
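For concreteness, David's list-returning suggestion could be sketched as
below. This is not part of GridSearchCV's API; the function name, the
`(params, score)` input format, and the `tol` tolerance are all
assumptions for illustration:

```python
def best_parameter_sets(grid_results, tol=0.0):
    """Return the list of parameter dicts whose validation score is
    within `tol` of the best score (a list of length one when there
    is no tie)."""
    best = max(score for _, score in grid_results)
    return [params for params, score in grid_results
            if best - score <= tol]

results = [({"C": 1.0}, 0.92),
           ({"C": 10.0}, 0.95),
           ({"C": 100.0}, 0.95)]
# Both C=10 and C=100 are tied for the best score, so both are returned.
print(best_parameter_sets(results))
```

A nonzero `tol` would also cover "near ties" rather than exact ones,
which matters when scores are noisy cross-validation estimates.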
I agree with David, and most of the alternative approaches proposed in
this thread do not make much sense to me either (no offense): they can
be correct in some cases and wrong in others. The list of parameters and
scores is already available to the user of GridSearchCV, so the creative
user can play with it as she likes.

My point is that if GridSearchCV needs a "simple", unbiased policy that
returns a single classifier out of the grid search, then it should be
the one that chooses uniformly at random among equivalent parameters
(and we can discuss further what "equivalent" means, if you like :-) -
see below). I don't know a more convincing rule that is both simple and
effective, at least in my experience.

On a more general level, I would like to note that the cross-validated
scores of GridSearchCV are just estimates of the true scores and carry
an associated degree of uncertainty. For this reason, even parameters
with near-optimal scores could be of interest - in principle. Of course,
setting up a convincing selection process that accounts for the
uncertainty of the estimates and makes the optimal choice (in a
decision-theoretic sense) is not easy, if it is possible at all.
Moreover, the uncertainty associated with the cross-validation estimate
is still an open problem [0]. But most probably this is out of the scope
of sklearn.

Best,
Emanuele

[0]: See for example http://hunch.net/?p=29

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
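Emanuele's uniform-at-random rule could be sketched as follows. Again a
hypothetical helper, not GridSearchCV behaviour; the score tolerance
`tol` stands in for whatever definition of "equivalent" is agreed on
(e.g. a standard error of the cross-validation estimate):

```python
import random

def pick_uniformly(grid_results, tol=1e-3, rng=random):
    """Pick one parameter set uniformly at random among those whose
    score is within `tol` of the best (the "equivalent" set)."""
    best = max(score for _, score in grid_results)
    equivalent = [params for params, score in grid_results
                  if best - score <= tol]
    return rng.choice(equivalent)

results = [({"C": 1.0}, 0.92),
           ({"C": 10.0}, 0.95),
           ({"C": 100.0}, 0.9495)]
# With tol=1e-3, C=10 and C=100 are "equivalent"; one is picked at random.
print(pick_uniformly(results))
```

Because the choice is uniform over the equivalent set, it introduces no
systematic preference (for speed, regularization strength, or grid
order) that could bias the selection one way or another.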
