On Tue, Jan 14, 2014 at 4:16 PM, Joel Nothman <joel.noth...@gmail.com> wrote:

>
>
>    - I like some ideas of your solution, in which you can have multiple
>    objectives and hence best models, i.e. est.best_index_ could be an array,
>    and the corresponding est.best_params_. Yet I think there are many cases
>    where you don't actually want to find the best parameters for each metric
>    (e.g. P and R are only there to explain the F1 objective; multiclass
>    per-class vs average).
>
>
So it seems that we have different use cases. I want to find the best-tuned
estimator against each metric, while you want to reuse computations from
GridSearchCV to build a multiple-metric evaluation report. But then I am not
completely sure why you need to frame this within GridSearchCV.

My previous proposal was mainly about cross_val_score for the time being.
I actually think that supporting multiple scorers in GridSearchCV would be
problematic because GridSearchCV needs to behave like a predictor. So we
would need a stateful API like:

gs = GridSearchCV(LinearSVC(), param_dict, scoring=["auc", "f1"])
gs.fit(X, y)
gs.set_best_estimator(scoring="auc")
gs.predict(X)
gs.set_best_estimator(scoring="f1")
gs.predict(X) # predictions may be different

For this reason, I think that a function that outputs the best estimators
for each scorer would be better:

best_estimators = multiple_grid_search(LinearSVC(), param_dict,
                                       scoring=["auc", "f1"])
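
For concreteness, here is a minimal sketch of what such a helper could look
like. multiple_grid_search is a hypothetical name, X and y are added to the
signature to keep the sketch self-contained, and this naive version simply
runs one GridSearchCV per scorer instead of sharing computations:

from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in later versions

def multiple_grid_search(estimator, param_grid, X, y, scoring, cv=None):
    # Hypothetical helper: return the best (refit) estimator for each scorer.
    # Naive sketch: one full grid search per scorer, so fits and predictions
    # are not reused across scorers.
    best_estimators = {}
    for scorer_name in scoring:
        gs = GridSearchCV(estimator, param_grid, scoring=scorer_name, cv=cv)
        gs.fit(X, y)
        best_estimators[scorer_name] = gs.best_estimator_
    return best_estimators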


>
>    - Passing a list of scorers doesn't take advantage of already having
>    multiple metrics returned efficiently by a function (e.g. P,R,F; per-class
>    F1), besides the need to do an extra prediction which you already point
>    out. If each scorer were passed individually, you'd need a custom scorer
>    for each class in the per-class F1 case; or the outputs from each scorer
>    can be flattened and hstacked.
>
I think evaluating the metric is orders of magnitude faster than computing
the predictions.


>
>    - Using a list of scorer names means this *can* be optimised to do
>    prediction as few times as possible, by grouping together those that
>    require thresholds and those that don't. This of course requires a rewrite
>    of scorer.py and is quite a complex solution.
>
But I think that the fact that predictions must be recomputed every time
is a serious limitation of the current scorer API and should be addressed.

My solution would be for scorers to take not a triplet (estimator, X,
y_true) but a pair (y_true, y_score), where y_score is a *continuous*
output (the output of decision_function). For metrics that need categorical
predictions, y_score can be converted inside the scorer. The conversion
would rely on the fact that predict in classifiers is defined as the argmax
of decision_function.
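
As an illustration (not the actual scorer API), a threshold-based metric and
a label-based metric could both be written against the (y_true, y_score)
pair. The argmax/sign conversion below is an assumption about how continuous
scores map to predictions, and integer class labels 0..n_classes-1 are
assumed for simplicity:

import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

def auc_from_scores(y_true, y_score):
    # Threshold-based metric: uses the continuous output directly.
    return roc_auc_score(y_true, y_score)

def f1_from_scores(y_true, y_score):
    # Label-based metric: convert the continuous output to predictions,
    # relying on predict == argmax(decision_function) for classifiers.
    y_score = np.asarray(y_score)
    if y_score.ndim == 1:
        y_pred = (y_score > 0).astype(int)   # binary decision_function
    else:
        y_pred = y_score.argmax(axis=1)      # multiclass decision_function
    return f1_score(y_true, y_pred)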

This solution assumes that all classifiers have a decision_function. I
think that this is feasible, even for non-parametric estimators like kNN.
It also assumes that decision_function is defined as an alias of predict in
RegressorMixin. The log loss is the only metric that specifically needs
probabilities, but it can be re-implemented so as to take decision_function
outputs instead.
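
For example, a log loss that accepts decision_function outputs could map
them to probabilities itself. The sigmoid/softmax mapping below is one
possible convention for this sketch, not an established scikit-learn API,
and integer class labels 0..n_classes-1 are again assumed:

import numpy as np

def log_loss_from_scores(y_true, y_score, eps=1e-15):
    # Map decision_function outputs to probabilities, then compute log loss.
    # Sigmoid for binary, softmax for multiclass: an assumed convention,
    # not a calibrated probability estimate.
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    if y_score.ndim == 1:
        p1 = 1.0 / (1.0 + np.exp(-y_score))          # sigmoid
        proba = np.column_stack([1.0 - p1, p1])
    else:
        z = y_score - y_score.max(axis=1, keepdims=True)
        e = np.exp(z)
        proba = e / e.sum(axis=1, keepdims=True)     # softmax
    proba = np.clip(proba, eps, 1 - eps)
    return -np.mean(np.log(proba[np.arange(len(y_true)), y_true]))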

In any case, I can see the benefit of having a callback system in
GridSearchCV to let the user reuse some computations.

Mathieu