Firstly, yes, fit_grid_point is being replaced by cross_val_score; the PR
awaits your review! https://github.com/scikit-learn/scikit-learn/pull/2736
Secondly, my prior proposals include:
- On the mailing list in March 2013 I suggested a minimal-code-change,
though not very user-friendly, approach: allow a scorer to return an
arbitrary object to be stored, as long as it implements __float__ so it can
be reduced to a single objective for averaging and argmax.
- Similarly, a scorer could return a tuple or array whose first element is
the objective;
- Or a scorer could return a dict, where the entry under a particular key
is the objective.
- #1768 <https://github.com/scikit-learn/scikit-learn/pull/1768> takes
this approach: scorers may return a list of (name, score) tuples, where the
name 'score' denotes the objective. Before storing, the search prefixes each
name with 'test_', and does the same with 'train_' when
`compute_training_score=True`.
[In that PR and in #1787, the data is then stored in a structured array,
which acts somewhat like a dict of arrays or an array of dicts, and can be
sliced and reshaped, which is useful for parameter grids. Structured arrays
have their issues
(#1787 <https://github.com/scikit-learn/scikit-learn/pull/1787>),
so #2079 <https://github.com/scikit-learn/scikit-learn/pull/2079> goes
for returning and storing a dict instead.]
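As a tiny illustration of why a structured array is attractive there (the
field names and shapes below are purely invented, not from any of those PRs):

    import numpy as np

    # Hypothetical per-fold results for 6 parameter settings and 3 CV folds.
    n_grid_points, n_folds = 6, 3
    results = np.zeros((n_grid_points, n_folds),
                       dtype=[('test_score', float),
                              ('test_precision', float),
                              ('test_recall', float)])

    # Acts like a dict of arrays: slice out one metric across the whole grid...
    mean_objective = results['test_score'].mean(axis=1)
    best_index = mean_objective.argmax()
    # ...and like an array of dicts: one grid point / fold gives all metrics.
    print(results[best_index, 0])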
In all of the above, it is possible to get multiple scores without
duplicating prediction work (and it would make sense in general to provide a
PRFScorer, as sketched below, rather than calculating F1, P and R
individually). In my present proposal, prediction must be duplicated, but
the API is arguably simpler.
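For concreteness, here is a rough sketch, not an existing scikit-learn API,
of what such a PRFScorer could look like under the dict-returning
convention, deriving all three metrics from a single prediction pass with
the 'score' entry as the objective:

    from sklearn.metrics import precision_recall_fscore_support

    def prf_scorer(estimator, X, y):
        """Hypothetical scorer returning several metrics from one prediction.

        The 'score' entry (here F1) would serve as the single objective for
        averaging and argmax; 'precision' and 'recall' are merely stored.
        """
        y_pred = estimator.predict(X)  # predict once, derive all three metrics
        p, r, f, _ = precision_recall_fscore_support(y, y_pred,
                                                     average='weighted')
        return {'score': f, 'precision': p, 'recall': r}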
Thirdly, to summarise my latest proposal:
- Provide a way for the user to retrieve arbitrary data calculated from
the estimator at one "grid point" (a sketch of such a callback follows this
list).
- Don't make it exclusively/explicitly about scoring: use a separate
parameter and a more expansive set of callback arguments.
- This duplicates work done in scoring, does not presuppose any
particular use-case, and leaves the search with a single objective.
- As with scoring, useful measures can be provided by name.
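To make the proposed interface concrete, a user-defined diagnostics callback
might look like the following; the `diagnostics` parameter itself is the
proposal, not an existing option, and the metric choices are just an example:

    from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

    def classification_diagnostics(estimator, X_train, y_train, X_test, y_test):
        """Example callback with the proposed signature; the returned object
        is stored per (parameter setting, fold) and never used as an objective."""
        y_pred = estimator.predict(X_test)
        p, r, f, _ = precision_recall_fscore_support(y_test, y_pred,
                                                     average='weighted')
        return {'precision': p,
                'recall': r,
                'f1': f,
                'confusion': confusion_matrix(y_test, y_pred)}

    # Proposed usage (the `diagnostics` parameter does not exist yet):
    # GridSearchCV(est, param_grid, scoring='f1',
    #              diagnostics=classification_diagnostics)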
Finally, on your proposal:
- I like some ideas in your solution, in which you can have multiple
objectives and hence multiple best models, i.e. est.best_index_ could be an
array, with est.best_params_ corresponding. Yet I think there are many cases
where you don't actually want to find the best parameters for each metric
(e.g. P and R are only there to explain the F1 objective; multiclass
per-class vs average). Where there are multiple objectives, you also cannot
sensibly refit a single best_estimator_ to which the search delegates its
predict.
- Passing a list of scorers doesn't take advantage of a single function
already returning multiple metrics efficiently (e.g. P, R and F; per-class
F1), quite apart from the extra prediction you already point out. If each
scorer were passed individually, you'd need a custom scorer for each class
in the per-class F1 case, or the outputs from each scorer would have to be
flattened and hstacked.
- Using a list of scorer names means this *can* be optimised to predict
as few times as possible, by grouping together those that require
thresholds and those that don't (a rough sketch follows this list). This of
course requires a rewrite of scorer.py and is quite a complex solution.
- Having multiple objectives won't work with a more clever CV search
that is guided by the objective in selecting the next parameters to try.
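To illustrate the kind of optimisation I mean (purely a sketch, not how
scorer.py is organised today; the name-to-metric grouping is assumed):
requested metric names could be grouped by the kind of prediction they
consume, so each prediction is computed at most once per group:

    from sklearn.metrics import f1_score, precision_score, roc_auc_score

    # Assumed grouping of metric names by the prediction they consume.
    PREDICT_METRICS = {'f1': f1_score, 'precision': precision_score}
    THRESHOLD_METRICS = {'roc_auc': roc_auc_score}

    def multi_metric_score(estimator, X_test, y_test, metric_names):
        """Compute several named metrics, predicting once per prediction type."""
        results = {}
        hard = [m for m in metric_names if m in PREDICT_METRICS]
        soft = [m for m in metric_names if m in THRESHOLD_METRICS]
        if hard:
            y_pred = estimator.predict(X_test)             # one call for all of these
            results.update((m, PREDICT_METRICS[m](y_test, y_pred)) for m in hard)
        if soft:
            y_score = estimator.decision_function(X_test)  # one call for all of these
            results.update((m, THRESHOLD_METRICS[m](y_test, y_score)) for m in soft)
        return results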
- Joel
On 14 January 2014 14:59, Mathieu Blondel <math...@mblondel.org> wrote:
> I'd definitely like to have support for multiple metrics. My use case is
> that I have several methods that I want to evaluate against different
> metrics and I want the hyper-parameters to be tuned against each metric. In
> addition I don't have a test set so I need to use cross-validation both for
> evaluation and hyper-parameter tuning.
>
> A first change would be for cross_val_score to accept a list of scorers
> and to return a n_folds x n_scorers array. This would only support a fixed
> set of hyper-parameters but this change seems rather straightforward and
> non-controversial. This would hopefully also serve as a basis for multiple
> metrics grid search (can't fit_grid_point be replaced with
> cross_val_score?).
>
> When using multiple metrics, a major limitation of the current scorer API
> is that it will recompute the predictions for each scorer. Unfortunately,
> for kernel methods or random forests, computing the predictions is really
> expensive.
>
> I will study your solution more carefully when I have more time. Could you
> also give a pointer to your previous proposed solution for comparison?
>
> Mathieu
>
>
> On Thu, Jan 9, 2014 at 4:48 PM, Eustache DIEMERT <eusta...@diemert.fr> wrote:
>
>> +1 for the "diagnostics" attribute
>>
>> I've struggled with this in the past and the workaround I found was to
>> subclass my estimator to hook up the computation of additional metrics and
>> store the results into a new attribute like diagnostics.
>>
>> Also, having a default set of diagnostics for different tasks is a must
>> for a practitioner-friendly library.
>>
>> my 2c :)
>>
>> Eustache
>>
>>
>> 2014/1/9 Joel Nothman <joel.noth...@gmail.com>
>>
>>> Hi all,
>>>
>>> I've had enough frustration at having to patch in things from a code
>>> fork in order to merely get back precision and recall while optimising F1
>>> in grid search. This is something I need to do really frequently, as I'm
>>> sure do others.
>>>
>>> When I wrote and submitted PRs about this problem nine months ago, I
>>> proposed relatively sophisticated solutions. Perhaps a simple, flexible
>>> solution is appropriate:
>>>
>>> GridSearchCV, RandomizedSearchCV, cross_val_score, and perhaps anything
>>> else supporting 'scoring', should take an additional parameter, e.g.
>>> 'diagnostics', which is a callable with interface:
>>> (estimator, X_train, y_train, X_test, y_test) -> object
>>>
>>> The results of CV will include a params x folds array (or list of
>>> arrays) to store each of these returned objects, whose dtype is
>>> automatically detected, so that it may be compactly stored and easily
>>> accessed if desired.
>>>
>>> So when scoring=f1, a diagnostic fn can be passed to calculate
>>> precision, recall, etc., which means a bit of duplicated scoring work, but
>>> no confusion of the existing scoring interface.
>>>
>>> Scikit-learn may indeed provide ready-made diagnostic functions for
>>> certain types of tasks. For example:
>>>
>>> - a binary classification diagnostic might return P, R, F, AUC,
>>> AvgPrec;
>>> - multiclass might add per-class performances, different averages
>>> and a confusion matrix;
>>> - a linear model diagnostic might measure model sparsity. (Perhaps
>>> the parameter can take a sequence of callables to return a tuple of
>>> diagnostic results per fold.)
>>>
>>>
>>> As opposed to some of my more intricate proposals, this approach leaves
>>> it to the user to do any averaging over folds etc.
>>>
>>> *SearchCV should also store best_index_ more importantly than
>>> best_params_ so that this data can be cross-referenced. If the diagnostic
>>> output is a list of arrays, rather than an array, the user can manually
>>> delete information from the non-best trials, before saving the model to
>>> disk.
>>>
>>> This also implies some refactoring of cross_val_score and fit_grid_point
>>> that is overdue.
>>>
>>> Does this seem the right level of complexity/flexibility? Please help me
>>> and the many others who have requested it resolve this issue sooner rather
>>> than later. I'd like to submit a PR towards this that actually gets
>>> accepted, so some feedback is really welcome.
>>>
>>> Cheers,
>>>
>>> - Joel
>>>