Firstly, yes, fit_grid_point is being replaced by cross_val_score; the PR
awaits your review! https://github.com/scikit-learn/scikit-learn/pull/2736
Secondly, my prior proposals include:
- On the mailing list in March 2013 I suggested a minimal-code-change,
though not very user-friendly, approach: allow a scorer to return an
arbitrary object to be stored, as long as it implements __float__ so it can
be reduced to a single objective for averaging and argmax.
- Similarly, a scorer could return a tuple or array whose first element is
the objective;
- Or a scorer could return a dict, where the entry under a particular key
is the objective.
- #1768 <https://github.com/scikit-learn/scikit-learn/pull/1768> takes
this approach: scorers may return a list of (name, score) tuples, where the
name 'score' denotes the objective. Before storing, the search prefixes each
name with 'test_', and does the same with 'train_' when
`compute_training_score=True`.
[In that PR and in #1787, the data is then stored in a structured array,
which acts somewhat like a dict of arrays or an array of dicts, and can be
sliced and reshaped, which is useful for parameter grids. Structured arrays
have their issues
(#1787 <https://github.com/scikit-learn/scikit-learn/pull/1787>),
so #2079 <https://github.com/scikit-learn/scikit-learn/pull/2079> goes
for returning and storing a dict instead.]
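As a tiny illustration of why a structured array is attractive there (the
field names and shapes below are purely invented, not from any of those PRs):

    import numpy as np

    # Hypothetical per-fold results for 6 parameter settings and 3 CV folds.
    n_grid_points, n_folds = 6, 3
    results = np.zeros((n_grid_points, n_folds),
                       dtype=[('test_score', float),
                              ('test_precision', float),
                              ('test_recall', float)])

    # Acts like a dict of arrays: slice out one metric across the whole grid...
    mean_objective = results['test_score'].mean(axis=1)
    best_index = mean_objective.argmax()
    # ...and like an array of dicts: one grid point / fold gives all metrics.
    print(results[best_index, 0])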
In all of the above, it is possible to get multiple scores without
duplicating prediction work (and it would make sense in general to provide a
PRFScorer, as sketched below, rather than calculating F1, P and R
individually). In my present proposal, prediction must be duplicated, but
the API is arguably simpler.
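For concreteness, here is a rough sketch, not an existing scikit-learn API,
of what such a PRFScorer could look like under the dict-returning
convention, deriving all three metrics from a single prediction pass with
the 'score' entry as the objective:

    from sklearn.metrics import precision_recall_fscore_support

    def prf_scorer(estimator, X, y):
        """Hypothetical scorer returning several metrics from one prediction.

        The 'score' entry (here F1) would serve as the single objective for
        averaging and argmax; 'precision' and 'recall' are merely stored.
        """
        y_pred = estimator.predict(X)  # predict once, derive all three metrics
        p, r, f, _ = precision_recall_fscore_support(y, y_pred,
                                                     average='weighted')
        return {'score': f, 'precision': p, 'recall': r}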
Thirdly, to summarise my latest proposal:
- Provide a way for the user to retrieve arbitrary data calculated from
the estimator at one "grid point" (a sketch of such a callback follows this
list).
- Don't make it exclusively/explicitly about scoring: use a separate
parameter and a more expansive set of callback arguments.
- This duplicates work done in scoring, does not presuppose any
particular use-case, and leaves the search with a single objective.
- As with scoring, useful measures can be provided by name.
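To make the proposed interface concrete, a user-defined diagnostics callback
might look like the following; the `diagnostics` parameter itself is the
proposal, not an existing option, and the metric choices are just an example:

    from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

    def classification_diagnostics(estimator, X_train, y_train, X_test, y_test):
        """Example callback with the proposed signature; the returned object
        is stored per (parameter setting, fold) and never used as an objective."""
        y_pred = estimator.predict(X_test)
        p, r, f, _ = precision_recall_fscore_support(y_test, y_pred,
                                                     average='weighted')
        return {'precision': p,
                'recall': r,
                'f1': f,
                'confusion': confusion_matrix(y_test, y_pred)}

    # Proposed usage (the `diagnostics` parameter does not exist yet):
    # GridSearchCV(est, param_grid, scoring='f1',
    #              diagnostics=classification_diagnostics)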
Finally, on your proposal:
- I like some ideas in your solution, in which you can have multiple
objectives and hence multiple best models, i.e. est.best_index_ could be an
array, with est.best_params_ corresponding. Yet I think there are many cases
where you don't actually want to find the best parameters for each metric
(e.g. P and R are only there to explain the F1 objective; multiclass
per-class vs average). Where there are multiple objectives, you also cannot
sensibly refit a single best_estimator_ to which the search delegates its
predict.
- Passing a list of scorers doesn't take advantage of a single function
already returning multiple metrics efficiently (e.g. P, R and F; per-class
F1), quite apart from the extra prediction you already point out. If each
scorer were passed individually, you'd need a custom scorer for each class
in the per-class F1 case, or the outputs from each scorer would have to be
flattened and hstacked.
- Using a list of scorer names means this *can* be optimised to predict
as few times as possible, by grouping together those that require
thresholds and those that don't (a rough sketch follows this list). This of
course requires a rewrite of scorer.py and is quite a complex solution.
- Having multiple objectives won't work with a more clever CV search
that is guided by the objective in selecting the next parameters to try.
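To illustrate the kind of optimisation I mean (purely a sketch, not how
scorer.py is organised today; the name-to-metric grouping is assumed):
requested metric names could be grouped by the kind of prediction they
consume, so each prediction is computed at most once per group:

    from sklearn.metrics import f1_score, precision_score, roc_auc_score

    # Assumed grouping of metric names by the prediction they consume.
    PREDICT_METRICS = {'f1': f1_score, 'precision': precision_score}
    THRESHOLD_METRICS = {'roc_auc': roc_auc_score}

    def multi_metric_score(estimator, X_test, y_test, metric_names):
        """Compute several named metrics, predicting once per prediction type."""
        results = {}
        hard = [m for m in metric_names if m in PREDICT_METRICS]
        soft = [m for m in metric_names if m in THRESHOLD_METRICS]
        if hard:
            y_pred = estimator.predict(X_test)             # one call for all of these
            results.update((m, PREDICT_METRICS[m](y_test, y_pred)) for m in hard)
        if soft:
            y_score = estimator.decision_function(X_test)  # one call for all of these
            results.update((m, THRESHOLD_METRICS[m](y_test, y_score)) for m in soft)
        return results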
- Joel
On 14 January 2014 14:59, Mathieu Blondel <math...@mblondel.org> wrote:
> I'd definitely like to have support for multiple metrics. My use case is
> that I have several methods that I want to evaluate against different
> metrics and I want the hyper-parameters to be tuned against each metric. In
> addition I don't have a test set so I need to use cross-validation both for
> evaluation and hyper-parameter tuning.
>
> A first change would be for cross_val_score to accept a list of scorers
> and to return a n_folds x n_scorers array. This would only support a fixed
> set of hyper-parameters but this change seems rather straightforward and
> non-controversial. This would hopefully also serve as a basis for multiple
> metrics grid search (can't fit_grid_point be replaced with
> cross_val_score?).
>
> When using multiple metrics, a major limitation of the current scorer API
> is that it will recompute the predictions for each scorer. Unfortunately,
> for kernel methods or random forests, computing the predictions is really
> expensive.
>
> I will study your solution more carefully when I have more time. Could you
> also give a pointer to your previous proposed solution for comparison?
>
> Mathieu
>
>
> On Thu, Jan 9, 2014 at 4:48 PM, Eustache DIEMERT <eusta...@diemert.fr> wrote:
>
>> +1 for the "diagnostics" attribute
>>
>> I've struggled with this in the past and the workaround I found was to
>> subclass my estimator to hook up the computation of additional metrics and
>> store the results into a new attribute like diagnostics.
>>
>> Also, having a default set of diagnostics for different tasks is a must
>> for a practitioner-friendly library.
>>
>> my 2c :)
>>
>> Eustache
>>
>>
>> 2014/1/9 Joel Nothman <joel.noth...@gmail.com>
>>
>>> Hi all,
>>>
>>> I've had enough frustration at having to patch in things from a code
>>> fork in order to merely get back precision and recall while optimising F1
>>> in grid search. This is something I need to do really frequently, as I'm
>>> sure do others.
>>>
>>> When I wrote and submitted PRs about this problem nine months ago, I
>>> proposed relatively sophisticated solutions. Perhaps a simple, flexible
>>> solution is appropriate:
>>>
>>> GridSearchCV, RandomizedSearchCV, cross_val_score, and perhaps anything
>>> else supporting 'scoring', should take an additional parameter, e.g.
>>> 'diagnostics', which is a callable with interface:
>>> (estimator, X_train, y_train, X_test, y_test) -> object
>>>
>>> The results of CV will include a params x folds array (or list of
>>> arrays) to store each of these returned objects, whose dtype is
>>> automatically detected, so that it may be compactly stored and easily
>>> accessed if desired.
>>>
>>> So when scoring=f1, a diagnostic fn can be passed to calculate
>>> precision, recall, etc., which means a bit of duplicated scoring work, but
>>> no confusion of the existing scoring interface.
>>>
>>> Scikit-learn may indeed provide ready-made diagnostic functions for
>>> certain types of tasks. For example:
>>>
>>> - a binary classification diagnostic might return P, R, F, AUC,
>>> AvgPrec;
>>> - multiclass might add per-class performances, different averages
>>> and a confusion matrix;
>>> - a linear model diagnostic might measure model sparsity. (Perhaps
>>> the parameter can take a sequence of callables to return a tuple of
>>> diagnostic results per fold.)
>>>
>>>
>>> As opposed to some of my more intricate proposals, this approach leaves
>>> it to the user to do any averaging over folds etc.
>>>
>>> *SearchCV should also store best_index_ more importantly than
>>> best_params_ so that this data can be cross-referenced. If the diagnostic
>>> output is a list of arrays, rather than an array, the user can manually
>>> delete information from the non-best trials, before saving the model to
>>> disk.
>>>
>>> This also implies some refactoring of cross_val_score and fit_grid_point
>>> that is overdue.
>>>
>>> Does this seem the right level of complexity/flexibility? Please help me
>>> and the many others who have requested it resolve this issue sooner rather
>>> than later. I'd like to submit a PR towards this that actually gets
>>> accepted, so some feedback is really welcome.
>>>
>>> Cheers,
>>>
>>> - Joel
>>>