[jira] [Commented] (SPARK-17906) MulticlassClassificationEvaluator support target label

2016-10-21 Thread zhengruifeng (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594607#comment-15594607 ]

zhengruifeng commented on SPARK-17906:
--

It seems that with that PR, we would first obtain a model, then use it to 
{{evaluate}} some dataframe and generate a {{summary}}.
Such a per-label metric could perhaps also be added to 
{{MulticlassClassificationEvaluator}} for general-purpose use.
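A rough sketch of that flow (hypothetical: the exact accessor, e.g. {{fMeasureByLabel}}, depends on what the PR finally exposes):

{code}
import org.apache.spark.ml.classification.LogisticRegression

// Sketch only: fit a model, then evaluate a dataframe to obtain a summary.
// trainDF/testDF are assumed to be in scope; fMeasureByLabel is hypothetical.
val model = new LogisticRegression().fit(trainDF)
val summary = model.evaluate(testDF)
val f1PerLabel = summary.fMeasureByLabel
println(f1PerLabel(0))   // F1 of class "0"
{code}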

> MulticlassClassificationEvaluator support target label
> --
>
> Key: SPARK-17906
> URL: https://issues.apache.org/jira/browse/SPARK-17906
> Project: Spark
>  Issue Type: Brainstorming
>  Components: ML
>Reporter: zhengruifeng
>Priority: Minor
>
> In practice, I sometimes only focus on the metric of one specific label.
> For example, in CTR prediction, I usually only care about the F1 of the positive class.
> In sklearn, this is supported:
> {code}
> >>> from sklearn.metrics import classification_report
> >>> y_true = [0, 1, 2, 2, 2]
> >>> y_pred = [0, 0, 2, 2, 1]
> >>> target_names = ['class 0', 'class 1', 'class 2']
> >>> print(classification_report(y_true, y_pred, target_names=target_names))
>              precision    recall  f1-score   support
>
>     class 0       0.50      1.00      0.67         1
>     class 1       0.00      0.00      0.00         1
>     class 2       1.00      0.67      0.80         3
>
> avg / total       0.70      0.60      0.61         5
> {code}
> Currently, ml only supports `weightedXXX` metrics, so I think there is room 
> for improvement.
> The API may be designed like this:
> {code}
> val dataset = ...
> val evaluator = new MulticlassClassificationEvaluator
> evaluator.setMetricName("f1")
> evaluator.evaluate(dataset)   // weightedF1 of all classes
> evaluator.setTarget(0.0).setMetricName("f1")
> evaluator.evaluate(dataset)   // F1 of class "0"
> {code}
> What's your opinion? [~yanboliang] [~josephkb] [~sethah] [~srowen]
> If this is useful and acceptable, I'm happy to work on this.




[jira] [Commented] (SPARK-17906) MulticlassClassificationEvaluator support target label

2016-10-21 Thread zhengruifeng (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594570#comment-15594570 ]

zhengruifeng commented on SPARK-17906:
--

Yes, I think it would be useful to expose metrics computed for one label vs. the others.




[jira] [Commented] (SPARK-17906) MulticlassClassificationEvaluator support target label

2016-10-13 Thread Seth Hendrickson (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572154#comment-15572154 ]

Seth Hendrickson commented on SPARK-17906:
--

We are adding model summaries that would expose some of this behavior; for 
example, see [https://github.com/apache/spark/pull/15435]. That PR will likely 
cover some of the functionality requested here.




[jira] [Commented] (SPARK-17906) MulticlassClassificationEvaluator support target label

2016-10-13 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-17906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571946#comment-15571946 ]

Sean Owen commented on SPARK-17906:
---

Are you describing what the old MulticlassMetrics class supported with 
fMeasure? It would compute it for one label vs. the others. I think it makes 
some sense; it's just a question of how to expose it meaningfully and in a 
backwards-compatible way.
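For reference, a minimal sketch of that existing mllib API (assuming a 
{{SparkContext}} {{sc}} in scope), fed with the (prediction, label) pairs from 
the example in the description:

{code}
import org.apache.spark.mllib.evaluation.MulticlassMetrics

// (prediction, label) pairs matching the sklearn example above
val predictionAndLabels = sc.parallelize(Seq(
  (0.0, 0.0), (0.0, 1.0), (2.0, 2.0), (2.0, 2.0), (1.0, 2.0)))
val metrics = new MulticlassMetrics(predictionAndLabels)
metrics.precision(0.0)   // precision of class "0": 0.50
metrics.recall(0.0)      // recall of class "0": 1.00
metrics.fMeasure(0.0)    // F1 of class "0" vs. the rest: ~0.67
{code}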




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org