On 03/14/2012 09:16 PM, David Warde-Farley wrote:
> On Wed, Mar 14, 2012 at 12:58:39PM -0700, Olivier Grisel wrote:
>
>> I am not asserting that faster models for a given accuracy level will
>> always be the most regularized ones (I am not sure this is even true,
>> I am sure we can find counter examples). I am just asserting that
>> faster parameter sets are a good way to arbitrarily break ties on
>> predictive accuracy as the user will prefer using parameters that lead
>> to faster models.
> How often do you expect to need to break ties? It seems like a rather remote
> possibility to me, at least in terms of a continuous objective function
> evaluated on a validation set.
>
> Returning the "faster" one may be fine in the case of a true tie but it
> doesn't really make a lot of sense to me. Assuming you're using grid search
> to optimize hyperparameters using a train/valid split and intend to retrain
> on union(train,valid) using the best set of hyperparameters, the speed of
> fitting on "train" alone may not have anything to do with the speed of
> fitting on the union.  Not to mention that this selection criterion is
> potentially subject to the whims of the system load on a shared machine, or
> even other tasks that the user was running himself.
>
> If possible I'd make GridSearchCV return a *list* of hyperparameter sets that
> give equivalent performance in the case of a tie, and let the user do
> whatever they want with them. For simplicity, this might imply returning
> a list of length one in the non-tied case...
>
> David
>

I agree with David, and most of the alternative approaches explained
in this thread do not make much sense to me either (no offense): they can
be correct in some cases and wrong in others. The list of parameters
and scores is already available to the user of GridSearchCV, so the
creative user can play with it as she likes. My point is that if
GridSearchCV needs a "simple" unbiased policy that returns a single
classifier out of the grid search, then it should be the one that chooses
uniformly at random among equivalent parameters (and we can discuss more
on the meaning of "equivalent" if you like :-) - see below). I don't know
a more convincing rule that is both simple and effective, at least in my
experience.
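To make the policy concrete, here is a rough sketch of what I mean. The
function name and the (params, score) list shape are just assumptions for
illustration, not the actual GridSearchCV internals:

```python
import random

def pick_among_ties(grid_results, tol=0.0):
    """Pick one parameter set uniformly at random among the
    (near-)best scoring ones.

    `grid_results` is assumed to be a list of (params, score) pairs,
    such as a grid search might collect.  `tol` widens the notion of
    "equivalent" beyond exact ties on the score.
    """
    best = max(score for _, score in grid_results)
    # All parameter sets whose score is within `tol` of the best.
    ties = [params for params, score in grid_results
            if score >= best - tol]
    return random.choice(ties)

results = [({'C': 1}, 0.90), ({'C': 10}, 0.92), ({'C': 100}, 0.92)]
print(pick_among_ties(results))  # either {'C': 10} or {'C': 100}
```

With tol=0 this only breaks exact ties; a small positive tol turns it into
a choice among "equivalent" parameters in the looser sense discussed below.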

On a more general level I would like to note that the cross-validated
scores of GridSearchCV are just estimates of the true scores, and they
carry an associated degree of uncertainty. For this reason even
parameters with near-optimal scores could be of interest - in principle.
Of course, setting up a convincing selection process that accounts for the
uncertainty of the estimates and makes the optimal choice (in a
decision-theoretic sense) is not easy, if it is possible at all. Moreover,
quantifying the uncertainty associated with the cross-validation estimate
is still an open problem [0]. But most probably this is out of the scope
of sklearn.
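Just to illustrate what "accounting for the uncertainty" could look like
(not a proposal for sklearn, and the names and input format below are
made up for the example): a crude heuristic is to treat as "equivalent"
all parameter sets whose mean CV score lies within one standard error of
the best mean, in the spirit of the one-standard-error rule:

```python
import math

def within_one_sem(fold_scores_per_params):
    """Illustrative only: return the parameter keys whose mean CV
    score lies within one standard error of the best mean.

    `fold_scores_per_params` maps a hashable parameter key to the
    list of per-fold scores for that parameter set.
    """
    stats = {}
    for key, scores in fold_scores_per_params.items():
        n = len(scores)
        mean = sum(scores) / n
        # Sample variance of the fold scores, then the standard
        # error of the mean across folds.
        var = sum((s - mean) ** 2 for s in scores) / (n - 1)
        stats[key] = (mean, math.sqrt(var / n))
    best_key = max(stats, key=lambda k: stats[k][0])
    best_mean, best_sem = stats[best_key]
    return [k for k, (mean, _) in stats.items()
            if mean >= best_mean - best_sem]

fold_scores = {'C=1': [0.88, 0.90, 0.89], 'C=10': [0.91, 0.93, 0.92]}
print(within_one_sem(fold_scores))
```

Note this ignores the fact that fold scores are correlated, which is
exactly why the uncertainty of the CV estimate is a hard problem.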

Best,

Emanuele

[0]: See for example http://hunch.net/?p=29


