[ https://issues.apache.org/jira/browse/SPARK-17906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng updated SPARK-17906:
---------------------------------
    Description: 
In practice, I sometimes only care about the metric for one specific label.
For example, in CTR prediction, I usually only care about the F1 of the positive class.

In sklearn, this is supported:
{code}
>>> from sklearn.metrics import classification_report
>>> y_true = [0, 1, 2, 2, 2]
>>> y_pred = [0, 0, 2, 2, 1]
>>> target_names = ['class 0', 'class 1', 'class 2']
>>> print(classification_report(y_true, y_pred, target_names=target_names))
             precision    recall  f1-score   support

    class 0       0.50      1.00      0.67         1
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.67      0.80         3

avg / total       0.70      0.60      0.61         5
{code}

Currently, ml's evaluator only supports the `weightedXXX` metrics, so I think there is room for improvement.

The API could be designed like this:
{code}
val dataset = ...
val evaluator = new MulticlassClassificationEvaluator
evaluator.setMetricName("f1")
evaluator.evaluate(dataset)       // weightedF1 of all classes

evaluator.setTarget(0.0).setMetricName("f1")
evaluator.evaluate(dataset)       // F1 of class "0"
{code}
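As a side note, a per-label metric can already be computed today by dropping down to the RDD-based `MulticlassMetrics`, which exposes `precision(label)`, `recall(label)` and `fMeasure(label)`. A minimal sketch, assuming `dataset` has the usual `prediction` and `label` double columns produced by a fitted classifier:
{code}
import org.apache.spark.mllib.evaluation.MulticlassMetrics

// assuming `dataset` contains "prediction" and "label" double columns,
// as produced by a fitted ml classifier
val predictionAndLabels = dataset
  .select("prediction", "label")
  .rdd
  .map(row => (row.getDouble(0), row.getDouble(1)))

val metrics = new MulticlassMetrics(predictionAndLabels)
metrics.fMeasure(0.0)    // F1 of class "0"
metrics.precision(0.0)   // precision of class "0"
metrics.recall(0.0)      // recall of class "0"
{code}
But this workaround cannot be plugged into `CrossValidator` / `TrainValidationSplit` the way an `Evaluator` can, which is the main motivation for extending the evaluator itself.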


What's your opinion? [~yanboliang] [~josephkb] [~sethah] [~srowen]
If this is useful and acceptable, I'm happy to work on it.

> MulticlassClassificationEvaluator support target label
> ------------------------------------------------------
>
>                 Key: SPARK-17906
>                 URL: https://issues.apache.org/jira/browse/SPARK-17906
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: zhengruifeng
>            Priority: Minor
>


