On Wed, Mar 14, 2012 at 12:58:39PM -0700, Olivier Grisel wrote:
> I am not asserting that faster models for a given accuracy level will
> always be the most regularized ones (I am not sure this is even true;
> I am sure we can find counterexamples). I am just asserting that
> faster parameter sets are a good way to arbitrarily break ties on
> predictive accuracy, as the user will prefer using parameters that lead
> to faster models.
How often do you expect to need to break ties? It seems like a rather remote possibility to me, at least for a continuous objective function evaluated on a validation set. Returning the "faster" one may be fine in the case of a true tie, but otherwise it doesn't make much sense to me.

Assuming you're using grid search to optimize hyperparameters on a train/valid split and intend to retrain on union(train, valid) using the best set of hyperparameters, the speed of fitting on "train" alone may have little to do with the speed of fitting on the union. Not to mention that this selection criterion is potentially subject to the whims of system load on a shared machine, or even of other tasks the user was running himself.

If possible, I'd make GridSearchCV return a *list* of hyperparameter sets that give equivalent performance in the case of a tie, and let the user do whatever they want with them. For simplicity, this might imply returning a list of length one in the non-tied case...

David

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
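[Editor's note: the list-of-tied-winners behavior suggested in the reply could be sketched roughly as below. This is plain Python with hypothetical names (`tied_best_params`, the `results` structure, the `tol` parameter) — it is not actual GridSearchCV API, just an illustration of returning every parameter set within tolerance of the best validation score.]

```python
# Hypothetical sketch of the tie-handling suggestion: instead of picking
# one winner arbitrarily, return every hyperparameter set whose validation
# score ties with the best (within a small tolerance).
# `results` stands in for grid-search output as (params, score) pairs;
# none of these names correspond to real scikit-learn API.

def tied_best_params(results, tol=1e-9):
    """results: list of (params_dict, validation_score) pairs.

    Returns the params of all candidates whose score is within `tol`
    of the best score, preserving the original candidate order.
    """
    best = max(score for _, score in results)
    return [params for params, score in results if best - score <= tol]

if __name__ == "__main__":
    results = [
        ({"C": 1.0},   0.93),
        ({"C": 10.0},  0.95),
        ({"C": 100.0}, 0.95),
    ]
    print(tied_best_params(results))
    # Two candidates tie at 0.95, so both parameter sets are returned.
    # In the non-tied case this is simply a list of length one.
```

The user can then apply any secondary criterion they like (fit time, regularization strength, ...) to the returned list, rather than having the library choose for them.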
