GitHub user asolimando commented on a diff in the pull request:
https://github.com/apache/spark/pull/20632#discussion_r170140280
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
@@ -359,29 +339,6 @@ class DecisionTreeSuite extends SparkFunSuite with MLlibTestSparkContext {
assert(rootNode.stats.isEmpty)
}
- test("do not choose split that does not satisfy min instance per node requirements") {
- // if a split does not satisfy min instances per node requirements,
- // this split is invalid, even though the information gain of split is large.
- val arr = Array(
- LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
--- End diff --
That's true. I have modified the input data for both tests as suggested
and moved the two tests back from _.../ml/tree/impl/RandomForestSuite.scala_
to _.../mllib/tree/DecisionTreeSuite.scala_, where they originally were. The
whole mllib test suite passes.
To recap, two tests have been adapted by slightly changing the input data:
- _"Multiclass classification stump with 10-ary (ordered) categorical
features"_
- _"do not choose split that does not satisfy min instance per node
requirements"_
_"Use soft prediction for binary classification with ordered categorical
features"_ was present in two files:
1. _.../ml/classification/DecisionTreeClassifierSuite.scala_
2. _.../ml/tree/impl/RandomForestSuite.scala_
The copy in 1. has been removed because it would have had to be adapted and
was redundant, while the copy in 2. has been adapted following the same
principle as other tests in that file, for instance _"Avoid aggregation on the
last level"_.