>
> On 03/10/2013 16:42:44 +0100, Andreas Mueller wrote:
>
> If you have an elegant solution, I'm all ears, though ;)
>
Here's a hacky solution for my particular case, which requires git revert
2d9cb81b8 to work at HEAD. It works by returning a Score object from the
Scorer; the Score pretends to be the fscore value when it is the operand of +
or *:
import sklearn.metrics
import sklearn.grid_search
import sklearn.linear_model
from sklearn.datasets import load_iris

class Score(object):
    __slots__ = ('value', 'meta')
    def __init__(self, value, meta=None):
        self.value = value
        self.meta = meta
    def __add__(self, other):
        return self.value + other
    def __radd__(self, other):
        return other + self.value
    def __mul__(self, other):
        return self.value * other
    def __repr__(self):
        return '<{}{}>'.format(
            self.value, '' if self.meta is None else ' ({})'.format(self.meta))

def prf(*args, **kwargs):
    if 'average' not in kwargs:
        kwargs['average'] = 'weighted'
    p, r, f, support = sklearn.metrics.precision_recall_fscore_support(
        *args, **kwargs)
    return Score(f, {'precision': p, 'recall': r, 'support': support})

iris = load_iris()
clf = sklearn.grid_search.GridSearchCV(
    sklearn.linear_model.LogisticRegression(),
    {'C': [1, 10]}, scoring=sklearn.metrics.Scorer(prf))
clf.fit(iris.data, iris.target == 1)  # binary classification
Then printing clf.cv_scores_ gives:

[CVScoreTuple(parameters={'C': 1},
    mean_validation_score=0.30285714285714288,
    cv_validation_scores=array([
        <0.48 ({'recall': 0.375, 'support': 16, 'precision': 0.66666666666666663})>,
        <0.428571428571 ({'recall': 0.35294117647058826, 'support': 17, 'precision': 0.54545454545454541})>,
        ...]
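The reason this slots into the existing averaging machinery is plain operator
overloading: sum() and division only ever see the proxied value. A minimal
standalone sketch of the trick (no scikit-learn required; the values are made
up for illustration):

```python
class Score(object):
    """Proxy that acts like its numeric value under + and *."""
    __slots__ = ('value', 'meta')

    def __init__(self, value, meta=None):
        self.value = value
        self.meta = meta

    def __add__(self, other):
        return self.value + other

    def __radd__(self, other):
        # sum() starts from the int 0, so this is the path actually taken.
        return other + self.value

    def __mul__(self, other):
        return self.value * other

scores = [Score(0.5, {'support': 16}), Score(0.25, {'support': 17})]
mean = sum(scores) / len(scores)  # the result is a plain float
print(mean)  # 0.375
```

Note that the metadata is lost as soon as any arithmetic happens; only the
objects stored verbatim in cv_validation_scores keep their meta dict.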
More generally, you might loosen the requirement that Scorer.__call__
return a number, so long as it returns something with __float__(), as in
https://github.com/jnothman/scikit-learn/commit/51d3ea. However, converting
to float may mess up results where scores are integers.
In this case, the following would suffice:
import sklearn.metrics
import sklearn.grid_search
import sklearn.linear_model

class FScore(object):
    __slots__ = ('precision', 'recall', 'fscore', 'support')
    def __init__(self, *args, **kwargs):
        if 'average' not in kwargs:
            kwargs['average'] = 'weighted'
        (self.precision, self.recall, self.fscore,
         self.support) = sklearn.metrics.precision_recall_fscore_support(
            *args, **kwargs)
    def __float__(self):
        return self.fscore

clf = sklearn.grid_search.GridSearchCV(
    sklearn.linear_model.LogisticRegression(),
    {'C': [1, 10]}, scoring=sklearn.metrics.Scorer(FScore))
...
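Under that loosened contract, the aggregation side would only ever need
float(score). A standalone sketch of what that contract looks like (the
RichScore/aggregate names are illustrative, not scikit-learn API):

```python
class RichScore(object):
    """Hypothetical rich score: carries extra metrics, coerces via float()."""
    def __init__(self, fscore, precision, recall):
        self.fscore = fscore
        self.precision = precision
        self.recall = recall

    def __float__(self):
        return self.fscore

def aggregate(scores):
    # All the search machinery would need: coerce each score, then average.
    values = [float(s) for s in scores]
    return sum(values) / len(values)

folds = [RichScore(0.5, 0.6, 0.4), RichScore(0.25, 0.3, 0.2)]
print(aggregate(folds))  # 0.375
# The caveat above: float() turns an exact integer score (say 5) into 5.0,
# so integer-valued metrics silently become floats.
```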
> As an aside: if you had all fitted estimators, it would also be quite
> easy to compute the other scores, right?
> Would that be an acceptable solution for you?
>
I guess so (noting that a modified scorer along the above lines could store
the estimator as well)... Perhaps that's a reasonable option -- its main
benefit over the above is less obfuscation -- though I do worry that
storing all estimators in the general case is expensive.
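To make that cost concrete: a scorer only needs a closure to hang on to every
fitted estimator it sees. A hypothetical sketch (none of these names are
scikit-learn API, and the toy estimator/metric stand in for real ones):

```python
def make_recording_scorer(metric):
    """Wrap a metric so each call also records the fitted estimator.

    The price is that every estimator from the search stays referenced
    (and in memory) until `records` is dropped.
    """
    records = []
    def scorer(estimator, X, y):
        value = metric(estimator, X, y)
        records.append((estimator, value))
        return value
    return scorer, records

# Toy demonstration with a stand-in estimator and metric:
class Dummy(object):
    def __init__(self, c):
        self.c = c

metric = lambda est, X, y: est.c * 0.1
scorer, records = make_recording_scorer(metric)
for c in (1, 10):
    scorer(Dummy(c), None, None)
print(len(records))  # 2
```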
- Joel
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general