[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-28 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/20632
  
@asolimando Can you change the title to include `[ML]` and also shorten it. 
Maybe just: `Add decision tree pruning`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20632
  
**[Test build #4136 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4136/testReport)**
 for PR 20632 at commit 
[`d02ff2a`](https://github.com/apache/spark/commit/d02ff2aa5c7db983beb5c2af03f5cd2df1aca135).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20632
  
**[Test build #4136 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4136/testReport)**
 for PR 20632 at commit 
[`d02ff2a`](https://github.com/apache/spark/commit/d02ff2aa5c7db983beb5c2af03f5cd2df1aca135).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-28 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/20632
  
@srowen Do we need you to trigger the tests? I'm not sure why they haven't 
been run...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-28 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/20632
  
Jenkins test this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-27 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20632
  
PS @sethah feel free to merge this one when you think it's ready.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20632
  
**[Test build #4103 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4103/testReport)**
 for PR 20632 at commit 
[`eeb3cc9`](https://github.com/apache/spark/commit/eeb3cc93ffa39198ccaba75fc7a63032325baf1e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20632
  
**[Test build #4103 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4103/testReport)**
 for PR 20632 at commit 
[`eeb3cc9`](https://github.com/apache/spark/commit/eeb3cc93ffa39198ccaba75fc7a63032325baf1e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-20 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20632
  
Run `./dev/scalastyle` to replicate what Jenkins will do.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-20 Thread asolimando
Github user asolimando commented on the issue:

https://github.com/apache/spark/pull/20632
  
Given that we are converging I have squashed the commits into a single one.

My local `mvn scalastyle:check` was passing (as well as the check done via 
the Scala plugin for IntelliiJ) while the one triggered in Jenkins failed.

I have fixed the specific issue, but there might be more, any suggestion on 
how to reproduce the same setup locally in order to be sure there won't be 
additional issues on scalastyle side?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20632
  
**[Test build #4100 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4100/testReport)**
 for PR 20632 at commit 
[`fd3b5df`](https://github.com/apache/spark/commit/fd3b5df05352fa581780f17510bb0d6662359647).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20632
  
**[Test build #4100 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4100/testReport)**
 for PR 20632 at commit 
[`fd3b5df`](https://github.com/apache/spark/commit/fd3b5df05352fa581780f17510bb0d6662359647).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-19 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20632
  
Another one that @jkbradley or @MLnick might want to look at, but seems 
like a nice win.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-17 Thread asolimando
Github user asolimando commented on the issue:

https://github.com/apache/spark/pull/20632
  
Hello Sean,
here is my understanding of the problem and the main intuition of the 
proposed solution:
We want to have a tree such that it does not contain any redundant subtree.
A subtree is redundant iff it is not composed by a leaf and all the leaves 
belonging to it agree on the prediction (the leafPredictions has at least two 
distinct values).

Given that for any subtree there is exactly one root node, we "store" this 
information in its root (learning) node.

A redundant node can be replaced by a single leaf, predicting the single 
prediction value reachable via that subtree.

So, there are basically two cases in our top-down exploration:
1) The subtree is redundant -> the whole subtree is explored (linearly) to 
collect the predictions, the node is translated into a LeafNode, and thus the 
toNode method is never called on its children, so no extra exploration
2) The subtree is *not* redundant -> the whole subtree is explored 
(linearly) to collect the predictions, the node is translated into an 
InternalNode, and toNode recursively called on its children. However, at this 
point, the lazy value shouldBeLeaf (private lazy val shouldBeLeaf: Boolean) has 
been already set in the previous call, so there won't be any further 
exploration. Each further step down the recursion path will be able to "reason" 
locally on stored shouldBeLeaf values.

Indeed, I was about to switch from the first top-down version to a 
bottom-up one for avoiding the repeated explorations, but then I realized that 
"lazy" could achieve the same result, while keeping the top-down exploration 
that is, in my opinion, more straightforward to read.

I have updated the PR according to your suggestions, I haven't squashed the 
commits yet in case there will be further comments.

I will reply to your comments below so I hope I won't miss anything.

Thanks for your help.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20632
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20632
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org