[jira] [Commented] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641138#comment-16641138 ]

Dongjoon Hyun commented on SPARK-14681:
---

This is reverted on the `master` branch, too.

> Provide label/impurity stats for spark.ml decision tree nodes
> -
>
> Key: SPARK-14681
> URL: https://issues.apache.org/jira/browse/SPARK-14681
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Joseph K. Bradley
> Assignee: Weichen Xu
> Priority: Major
>
> Currently, spark.ml decision trees provide all node info except for the
> aggregated stats about labels and impurities. This task is to provide those
> publicly. We need to choose a good API for it, so we should discuss the
> design on this issue before implementing it.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636138#comment-16636138 ]

Xiangrui Meng commented on SPARK-14681:
---

The changes were reverted in both branch-2.4 and master to avoid breaking API changes. If it is important to add the feature back, please open a new JIRA ticket.
[jira] [Commented] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636135#comment-16636135 ]

Apache Spark commented on SPARK-14681:
---

User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/22618
[jira] [Commented] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622944#comment-16622944 ]

Apache Spark commented on SPARK-14681:
---

User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/22492
[jira] [Commented] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392942#comment-16392942 ]

Apache Spark commented on SPARK-14681:
---

User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/20786
[jira] [Commented] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391762#comment-16391762 ]

Joseph K. Bradley commented on SPARK-14681:
---

[~WeichenXu123] Thanks for the PR! I'll comment on the design here in the JIRA.

From your PR:
{code}
class TreeClassifierStatInfo
  def getLabelCount(label: Int): Double

class TreeRegressorStatInfo
  def getCount(): Double
  def getSum(): Double
  def getSquareSum(): Double

class Node
  +++ def statInfo: TreeStatInfo

trait TreeStatInfo
  def asTreeClassifierStatInfo: TreeClassifierStatInfo
  def asTreeRegressorStatInfo: TreeRegressorStatInfo
{code}

I have a few thoughts:
* I like the overall approach of using classes instead of just returning plain double arrays.
* This will require users to explicitly cast TreeStatInfo to the classifier/regressor type. Would it be possible to avoid that without breaking APIs, e.g., by having a ClassificationNode and a RegressionNode inheriting from Node?
* Naming: What about using "Stats" or "Statistics" instead of "StatInfo"? I just feel the "Info" part is uninformative.
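The second and third points in the comment above (typed node subclasses instead of casts, "Stats" instead of "StatInfo") can be illustrated with a minimal, self-contained sketch. This is not the code from the PR and not Spark's API: every name below (ClassificationStats, RegressionStats, the fields of ClassificationNode and RegressionNode, NodeStatsDemo) is a hypothetical stand-in chosen for illustration, and the real Node hierarchy in spark.ml carries more members than shown here.

```scala
// Hypothetical sketch only: these types are NOT Spark API, they merely
// illustrate typed node subclasses that expose statistics without a cast.

// Per-label weighted counts for a classification node.
final case class ClassificationStats(labelCounts: Array[Double]) {
  def getLabelCount(label: Int): Double = labelCounts(label)
}

// Sufficient statistics for a regression node.
final case class RegressionStats(count: Double, sum: Double, squareSum: Double) {
  def mean: Double = sum / count
}

// The base type stays free of task-specific statistics.
sealed trait Node {
  def prediction: Double
}

// Each subclass carries statistics of the matching type.
final case class ClassificationNode(prediction: Double, stats: ClassificationStats) extends Node
final case class RegressionNode(prediction: Double, stats: RegressionStats) extends Node

object NodeStatsDemo {
  // Pattern matching selects the typed statistics; no asXxx conversion is needed.
  def describe(node: Node): String = node match {
    case c: ClassificationNode => s"count of label 1 = ${c.stats.getLabelCount(1)}"
    case r: RegressionNode     => s"mean = ${r.stats.mean}"
  }

  def main(args: Array[String]): Unit = {
    val cls = ClassificationNode(1.0, ClassificationStats(Array(3.0, 7.0)))
    val reg = RegressionNode(2.5, RegressionStats(4.0, 10.0, 30.0))
    println(describe(cls))
    println(describe(reg))
  }
}
```

With this shape, code that already holds a ClassificationNode reads `stats.getLabelCount(...)` directly, and code that only holds a Node uses a pattern match rather than an explicit conversion such as asTreeClassifierStatInfo. Whether such subclasses can be introduced without breaking the existing API is the open question raised in the comment; the sketch does not answer it.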
[jira] [Commented] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389382#comment-16389382 ]

Apache Spark commented on SPARK-14681:
---

User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/20758
[jira] [Commented] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956646#comment-15956646 ]

Apache Spark commented on SPARK-14681:
---

User 'shaynativ' has created a pull request for this issue: https://github.com/apache/spark/pull/17466
[jira] [Commented] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244574#comment-15244574 ]

zhengruifeng commented on SPARK-14681:
---

Will these stats be included in trainingSummary, or in a non-training summary evaluated on some DataFrame?