On Tue, Jan 14, 2014 at 4:16 PM, Joel Nothman <joel.noth...@gmail.com> wrote:
>
>
> - I like some ideas of your solution, in which you can have multiple
> objectives and hence best models, i.e. est.best_index_ could be an array,
> and the corresponding est.best_params_. Yet I think there are many cases
> where you don't actually want to find the best parameters for each metric
> (e.g. P and R are only there to explain the F1 objective; multiclass
> per-class vs average).
>
>
So it seems that we have different use cases: I want to find the best-tuned
estimator for each metric, while you want to reuse computations from
GridSearchCV to produce a multi-metric evaluation report. But then I am not
completely sure why you need to frame this within GridSearchCV.
My previous proposal was mainly about cross_val_score for the time being.
I actually think that supporting multiple scorers in GridSearchCV would be
problematic because GridSearchCV needs to behave like a predictor. So we
would need a stateful API like:
gs = GridSearchCV(LinearSVC(), param_dict, scoring=["auc", "f1"])
gs.fit(X, y)
gs.set_best_estimator(scoring="auc")
gs.predict(X)
gs.set_best_estimator(scoring="f1")
gs.predict(X) # predictions may be different
For this reason, I think that a function that outputs the best estimators
for each scorer would be better:
best_estimators = multiple_grid_search(LinearSVC(), param_dict,
                                       scoring=["auc", "f1"])
>
> - Passing a list of scorers doesn't take advantage of already having
> multiple metrics returned efficiently by a function (e.g. P,R,F; per-class
> F1), besides the need to do an extra prediction which you already point
> out. If each scorer were passed individually, you'd need a custom scorer
> for each class in the per-class F1 case; or the outputs from each scorer
> can be flattened and hstacked.
>
I think evaluating the metric is orders of magnitude faster than computing
the predictions, so there is little to gain from returning several metrics
from a single function call.
>
> - Using a list of scorer names means this *can* be optimised to do
> prediction as few times as possible, by grouping together those that
> require thresholds and those that don't. This of course requires a rewrite
> of scorer.py and is quite a complex solution.
>
But I think that the fact that predictions must be recomputed by every
scorer is a serious limitation of the current scorer API and should be
addressed.
My solution would be for scorers to take not a triplet (estimator, X,
y_true) but a pair (y_true, y_score), where y_score is a *continuous*
output (the output of decision_function). For metrics that need categorical
predictions, y_score can be converted inside the scorer, relying on the
fact that predict in classifiers is defined as the argmax of
decision_function.
This solution assumes that all classifiers have a decision_function. I
think that this is feasible, even for non-parametric estimators like kNN.
It also assumes that decision_function is defined as an alias of predict in
RegressorMixin. Log loss is the only metric that specifically needs
probabilities, but it could be reimplemented to take decision_function
outputs instead.
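As a minimal sketch of such pair-based scorers (the names are made up, and
I pass the class labels explicitly since the scorer no longer sees the
estimator; in practice they would have to come from estimator.classes_):

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def auc_pair_scorer(y_true, y_score):
    # Threshold-based metric: works directly on the continuous scores.
    return roc_auc_score(y_true, y_score)

def accuracy_pair_scorer(y_true, y_score, classes):
    # Categorical metric: recover the predictions from the continuous
    # scores, relying on predict being the argmax of decision_function.
    classes = np.asarray(classes)
    if y_score.ndim == 1:
        # binary case: positive class iff the decision value is positive
        y_pred = classes[(y_score > 0).astype(int)]
    else:
        # multiclass case: argmax over the columns of decision_function
        y_pred = classes[np.argmax(y_score, axis=1)]
    return accuracy_score(y_true, y_pred)

GridSearchCV could then call decision_function once per fit and feed the
same y_score to all the scorers.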
In any case, I can see the benefit of having a callback system in
GridSearchCV to let the user reuse some computations.
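Something like the following, where fit_callback is purely hypothetical
(no such parameter exists today), just to make the idea concrete:

def log_extra_metrics(estimator, X_test, y_test, parameters):
    # hypothetical hook, called once per (parameter setting, fold) with the
    # fitted estimator, so the user can compute and store any extra metrics
    pass

gs = GridSearchCV(LinearSVC(), param_dict, scoring="f1",
                  fit_callback=log_extra_metrics)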
Mathieu