Github user srowen commented on the issue:
https://github.com/apache/spark/pull/18832
merged to master
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18832
**[Test build #80333 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80333/testReport)**
for PR 18832 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18832
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18832
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80333/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18832
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18832
**[Test build #80331 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80331/testReport)**
for PR 18832 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18832
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80331/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18832
**[Test build #80333 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80333/testReport)**
for PR 18832 at commit
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
Thanks @srowen , I revised the comments per Seth's suggestion: "Parent
stats need to be explicitly tracked in the DTStatsAggregator because the parent
[[Node]] object does not have ImpurityStats on
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18832
**[Test build #80331 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80331/testReport)**
for PR 18832 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/18832
@mpjlu can you either close or update the change to reflect your input and
Seth's?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
Thanks @sethah .
I strongly think we should update the commend or just delete the comment as
the current PR.
Another reason is: there are three kinds of feature: categorical, ordered
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18832
If you want to change it, that's fine. I think it's fine either way.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
I agree with you. Do you think we should update the comment to help others
understand the code.
Since parantStats is updated and used in each iteration.
Thanks.
---
If your project is set
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18832
No, I don't think so. Computing parent stats is a very small fraction of
the time and memory compared with the overall `allStats` array. That's why we
decided to just add it in the first place.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
I know your point.
I am confusing the code doesn't work that way.
The code update parentStats for each iteration. Actually, we only need to
update parentStats for the first Iteration.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18832
I don't agree the comment is _misleading_. It might be confusing, but
that's something different.
The reason that the `DTStatsAggregator` needs to keep track of
`parentStats` is so that we
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
parentStats is used in this
code:ãbinAggregates.getParentImpurityCalculator(), this is used in all
iteration.
So that comment seems very misleading.
`} else if
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
node.stats is ImpurityStats, and parentStats is Array[Double], there are
different. Maybe this comment should be used on node.stats, but not on
parentStats. Is my understanding wrong?
---
If your
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18832
The comment is not wrong. It's added for when we are finding the best
split, to compute the right child stats from the left child stats. We would
have just used the stats that are already available
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18832
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80199/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18832
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18832
**[Test build #80199 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80199/testReport)**
for PR 18832 at commit
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/18832
@sethah
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18832
**[Test build #80199 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80199/testReport)**
for PR 18832 at commit
25 matches
Mail list logo