[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18832 merged to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18832 **[Test build #80333 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80333/testReport)** for PR 18832 at commit

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18832 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18832 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80333/ Test PASSed. ---

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18832 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18832 **[Test build #80331 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80331/testReport)** for PR 18832 at commit

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18832 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80331/ Test PASSed. ---

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18832 **[Test build #80333 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80333/testReport)** for PR 18832 at commit

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18832 Thanks @srowen , I revised the comments per Seth's suggestion: "Parent stats need to be explicitly tracked in the DTStatsAggregator because the parent [[Node]] object does not have ImpurityStats on

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18832 **[Test build #80331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80331/testReport)** for PR 18832 at commit

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-07 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18832 @mpjlu can you either close or update the change to reflect your input and Seth's? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18832 Thanks @sethah . I strongly think we should update the commend or just delete the comment as the current PR. Another reason is: there are three kinds of feature: categorical, ordered

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18832 If you want to change it, that's fine. I think it's fine either way. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18832 I agree with you. Do you think we should update the comment to help others understand the code. Since parantStats is updated and used in each iteration. Thanks. --- If your project is set

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18832 No, I don't think so. Computing parent stats is a very small fraction of the time and memory compared with the overall `allStats` array. That's why we decided to just add it in the first place.

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18832 I know your point. I am confusing the code doesn't work that way. The code update parentStats for each iteration. Actually, we only need to update parentStats for the first Iteration.

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18832 I don't agree the comment is _misleading_. It might be confusing, but that's something different. The reason that the `DTStatsAggregator` needs to keep track of `parentStats` is so that we

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18832 parentStats is used in this code: binAggregates.getParentImpurityCalculator(), this is used in all iteration. So that comment seems very misleading. `} else if

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18832 node.stats is ImpurityStats, and parentStats is Array[Double], there are different. Maybe this comment should be used on node.stats, but not on parentStats. Is my understanding wrong? --- If your

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18832 The comment is not wrong. It's added for when we are finding the best split, to compute the right child stats from the left child stats. We would have just used the stats that are already available

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18832 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80199/ Test PASSed. ---

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18832 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18832 **[Test build #80199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80199/testReport)** for PR 18832 at commit

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18832 @sethah --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18832 **[Test build #80199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80199/testReport)** for PR 18832 at commit