Github user asolimando commented on the issue:
https://github.com/apache/spark/pull/20632
Thanks to you @srowen and @sethah for all your feedback, which significantly
improved the PR!
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r171983753
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -703,4 +707,16 @@ private object RandomForestSuite
Github user asolimando commented on the issue:
https://github.com/apache/spark/pull/20632
Thanks for the suggestion @sethah, I have updated the PR with the extra
check (both tests).
Github user asolimando commented on the issue:
https://github.com/apache/spark/pull/20632
@sethah I have shortened the name as suggested and squashed the commits
into a single one, let me know if that's ok
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r171045897
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -631,10 +634,99 @@ class RandomForestSuite extends
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r170734558
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -631,6 +651,160 @@ class RandomForestSuite extends
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r170733980
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
@@ -541,7 +541,7 @@ object DecisionTreeSuite extends
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r170140738
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -362,10 +365,10 @@ class RandomForestSuite extends
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r170140280
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
@@ -359,29 +339,6 @@ class DecisionTreeSuite extends
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r170139957
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -402,20 +405,40 @@ class RandomForestSuite extends
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r169742383
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
@@ -287,6 +291,34 @@ private[tree] class LearningNode
Github user asolimando commented on the issue:
https://github.com/apache/spark/pull/20632
Given that we are converging, I have squashed the commits into a single one.
My local `mvn scalastyle:check` was passing (as well as the check done via
the Scala plugin for IntelliJ
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r169172406
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -402,20 +406,40 @@ class RandomForestSuite extends
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r169172053
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
@@ -287,6 +291,34 @@ private[tree] class LearningNode
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168961174
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -640,4 +740,55 @@ private object RandomForestSuite
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168961183
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -640,4 +740,55 @@ private object RandomForestSuite
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168961088
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
@@ -303,26 +303,6 @@ class DecisionTreeSuite extends
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168946503
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -640,4 +689,96 @@ private object RandomForestSuite
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168946478
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
@@ -287,6 +292,41 @@ private[tree] class LearningNode
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168946358
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
@@ -287,6 +292,41 @@ private[tree] class LearningNode
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168925267
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -640,4 +689,96 @@ private object RandomForestSuite
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168925279
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -640,4 +689,96 @@ private object RandomForestSuite
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168925264
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -640,4 +689,96 @@ private object RandomForestSuite
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168925247
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
@@ -287,6 +292,41 @@ private[tree] class LearningNode
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168925240
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -631,6 +654,32 @@ class RandomForestSuite extends
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168925232
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
@@ -287,6 +292,41 @@ private[tree] class LearningNode
Github user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r168925224
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
@@ -287,6 +292,41 @@ private[tree] class LearningNode
Github user asolimando commented on the issue:
https://github.com/apache/spark/pull/20632
Hello Sean,
here is my understanding of the problem and the main intuition of the
proposed solution:
We want to have a tree such that it does not contain any redundant subtree
GitHub user asolimando opened a pull request:
https://github.com/apache/spark/pull/20632
[SPARK-3159] added subtree pruning in the translation from LearningNode to
Node, added unit tests for tree redundancy and adapted existing ones that were
affected
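
To make the idea of subtree pruning concrete, here is a minimal sketch. It assumes that a "redundant" subtree is one whose leaves all yield the same prediction. The types `Tree`, `Leaf`, `Internal` and the object `PruneSketch` are hypothetical and simplified for illustration; they are not Spark's `LearningNode` or `Node` classes, and this is not the code from the PR.

```scala
// Hypothetical, simplified tree types for illustration only
// (NOT Spark's LearningNode / Node classes).
sealed trait Tree
final case class Leaf(prediction: Double) extends Tree
final case class Internal(
    feature: Int,
    threshold: Double,
    left: Tree,
    right: Tree) extends Tree

object PruneSketch {
  /**
   * Bottom-up pruning: prune both children first, then collapse the split
   * if the two pruned children are leaves with the same prediction.
   * Assumption: "redundant" means every leaf below the split predicts
   * the same value.
   */
  def prune(tree: Tree): Tree = tree match {
    case leaf: Leaf => leaf
    case Internal(feature, threshold, left, right) =>
      (prune(left), prune(right)) match {
        // Both sides predict the same value, so the split adds nothing.
        case (Leaf(l), Leaf(r)) if l == r => Leaf(l)
        // Otherwise keep the split, with pruned children.
        case (pl, pr) => Internal(feature, threshold, pl, pr)
      }
  }
}
```

Because the children are pruned before the parent is examined, a chain of redundant splits collapses all the way up to a single leaf in one pass.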
## What changes were proposed