Re: [Scikit-learn-general] Composite scores in grid_search.BaseSearchCV

Joel Nothman Sun, 10 Mar 2013 06:05:16 -0700

Thanks Andy,

On 03/10/2013 11:01, Andreas Mueller wrote:


Yes, we want to find the model with the highest F1 score, so F1 is what the
grid search should optimise. But an F1 score is hard to interpret on its own,
because it is by definition a compromise between precision and recall. Those
two are more informative about the relative strengths of the model, and most
research that reports F1 also reports precision and recall. It would
therefore be useful to see these metrics calculated for each grid point (or
at least for the best-scoring grid point).
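
To make the interpretability point concrete, here is a minimal pure-Python
sketch (the helper ``f1`` is hypothetical, not a scikit-learn function). Three
models with very different precision/recall trade-offs all receive the same
F1 score of 2/3, so the F1 value alone cannot tell them apart:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs for three models.
models = {
    "high recall": (0.5, 1.0),
    "high precision": (1.0, 0.5),
    "balanced": (2 / 3, 2 / 3),
}

for name, (p, r) in models.items():
    # Every model gets F1 = 2/3 despite the different trade-offs.
    print(f"{name:>14}: P={p:.3f} R={r:.3f} F1={f1(p, r):.3f}")
```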

> In the current stable, you can use ``score_func=f1_score``, in the
> current developer version you can use ``scoring='f1'``.

These mask the underlying precision and recall calculations, while using
precision_recall_fscore_support directly as the score function triggers
errors when BaseSearchCV aggregates the score.

> > I can see two ways to make this data available:
> > * allow the user to provide an arbitrary diagnostics function which is
> > run on each fold's predictions and whose output is stored with
> > cv_scores_ (or one could even store the learnt parameters for each fold)
> This should be possible in the current developer version with the
> introduction of the ``scoring`` parameter.

The only output of the ``scoring`` callable that is currently stored in
GridSearchCV.cv_scores_ is the score itself. But that score is also
aggregated (by BaseSearchCV, to produce mean_validation_score), which requires
that it can be added to 0, accumulated with += and, if BaseSearchCV.iid=True,
divided in place with /=. This means that the Scorer cannot currently return
an arbitrary score object: it needs to be a numeric type.

Alternatively, a custom Scorer could indeed store data every time it is
called, but that data would be stored neither in GridSearchCV.cv_scores_ nor
in any way aligned with its entries.
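
A minimal sketch of such a recording scorer, to show where the extra data
ends up. ``RecordingScorer`` is hypothetical and uses a simplified
``(y_true, y_pred)`` signature on binary 0/1 labels; it is not the
scikit-learn Scorer interface. Only the returned F1 would reach the search
results, while precision and recall accumulate on the scorer object:

```python
class RecordingScorer:
    """Hypothetical scorer: returns F1 for ranking, logs precision/recall per call."""

    def __init__(self):
        self.history = []  # one (precision, recall) pair per call, in call order

    def __call__(self, y_true, y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Side channel: nothing ties this entry to a grid point or fold.
        self.history.append((precision, recall))
        # Only this number would be stored alongside the search results.
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


scorer = RecordingScorer()
fold_1 = scorer([1, 1, 0, 0], [1, 0, 0, 0])
fold_2 = scorer([1, 0, 1, 0], [1, 1, 1, 1])
print(fold_1, fold_2, scorer.history)
```

Each history entry would have to be matched to a grid point and fold by call
order alone. If the search evaluated folds in parallel worker processes, each
worker would presumably append to its own copy of the scorer, leaving the
history in the parent process empty.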

I hope my point is a bit clearer now, though perhaps implementing a possible
solution would make it clearer still.

- Joel


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
