[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/20632 @asolimando Can you change the title to include `[ML]` and also shorten it. Maybe just: `Add decision tree pruning` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20632 **[Test build #4136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4136/testReport)** for PR 20632 at commit [`d02ff2a`](https://github.com/apache/spark/commit/d02ff2aa5c7db983beb5c2af03f5cd2df1aca135). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20632 **[Test build #4136 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4136/testReport)** for PR 20632 at commit [`d02ff2a`](https://github.com/apache/spark/commit/d02ff2aa5c7db983beb5c2af03f5cd2df1aca135). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/20632 @srowen Do we need you to trigger the tests? I'm not sure why they haven't been run... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/20632 Jenkins test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20632 PS @sethah feel free to merge this one when you think it's ready. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20632 **[Test build #4103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4103/testReport)** for PR 20632 at commit [`eeb3cc9`](https://github.com/apache/spark/commit/eeb3cc93ffa39198ccaba75fc7a63032325baf1e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20632 **[Test build #4103 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4103/testReport)** for PR 20632 at commit [`eeb3cc9`](https://github.com/apache/spark/commit/eeb3cc93ffa39198ccaba75fc7a63032325baf1e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20632 Run `./dev/scalastyle` to replicate what Jenkins will do. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user asolimando commented on the issue: https://github.com/apache/spark/pull/20632 Given that we are converging I have squashed the commits into a single one. My local `mvn scalastyle:check` was passing (as well as the check done via the Scala plugin for IntelliiJ) while the one triggered in Jenkins failed. I have fixed the specific issue, but there might be more, any suggestion on how to reproduce the same setup locally in order to be sure there won't be additional issues on scalastyle side? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20632 **[Test build #4100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4100/testReport)** for PR 20632 at commit [`fd3b5df`](https://github.com/apache/spark/commit/fd3b5df05352fa581780f17510bb0d6662359647). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20632 **[Test build #4100 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4100/testReport)** for PR 20632 at commit [`fd3b5df`](https://github.com/apache/spark/commit/fd3b5df05352fa581780f17510bb0d6662359647). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20632 Another one that @jkbradley or @MLnick might want to look at, but seems like a nice win. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user asolimando commented on the issue: https://github.com/apache/spark/pull/20632 Hello Sean, here is my understanding of the problem and the main intuition of the proposed solution: We want to have a tree such that it does not contain any redundant subtree. A subtree is redundant iff it is not composed by a leaf and all the leaves belonging to it agree on the prediction (the leafPredictions has at least two distinct values). Given that for any subtree there is exactly one root node, we "store" this information in its root (learning) node. A redundant node can be replaced by a single leaf, predicting the single prediction value reachable via that subtree. So, there are basically two cases in our top-down exploration: 1) The subtree is redundant -> the whole subtree is explored (linearly) to collect the predictions, the node is translated into a LeafNode, and thus the toNode method is never called on its children, so no extra exploration 2) The subtree is *not* redundant -> the whole subtree is explored (linearly) to collect the predictions, the node is translated into an InternalNode, and toNode recursively called on its children. However, at this point, the lazy value shouldBeLeaf (private lazy val shouldBeLeaf: Boolean) has been already set in the previous call, so there won't be any further exploration. Each further step down the recursion path will be able to "reason" locally on stored shouldBeLeaf values. Indeed, I was about to switch from the first top-down version to a bottom-up one for avoiding the repeated explorations, but then I realized that "lazy" could achieve the same result, while keeping the top-down exploration that is, in my opinion, more straightforward to read. I have updated the PR according to your suggestions, I haven't squashed the commits yet in case there will be further comments. I will reply to your comments below so I hope I won't miss anything. Thanks for your help. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20632 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20632 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org