GitHub user manishamde commented on a diff in the pull request:
https://github.com/apache/spark/pull/2780#discussion_r18809178
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
@@ -102,6 +102,37 @@ class DecisionTreeSuite extends FunSuite with LocalSparkContext {
assert(List(3.0, 2.0, 0.0).toSeq === l.toSeq)
}
+ test("find splits for a continuous feature") {
+ // find splits for normal case
+ {
+ val fakeMetadata = new DecisionTreeMetadata(1, 0, 0, 0,
+ Map(), Set(),
+ Array(6), Gini, QuantileStrategy.Sort,
+ 0, 0, 0.0, 0, 0
+ )
+ val featureSamples = Array.fill(200000)(math.random)
+ val splits = DecisionTree.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
+ assert(splits.length === 5)
+ assert(fakeMetadata.numSplits(0) === 5)
+ assert(fakeMetadata.numBins(0) === 6)
+ }
+
--- End diff --
How about another unit test where the distribution is skewed: (a) most
samples close to the minimum and (b) most samples close to the maximum? This
will test the boundary conditions.
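The two skewed cases suggested above can be sketched outside Spark to show what such a test might assert. This is a minimal, self-contained sketch under stated assumptions: `SkewedSplitsSketch`, `findSplits`, and `checkSplits` are hypothetical names, not Spark APIs, and the selection rule (thresholds at approximate quantiles over the distinct values, never at the maximum) is an assumed stand-in, not the logic of `DecisionTree.findSplitsForContinuousFeature`.

```scala
object SkewedSplitsSketch {

  /** HYPOTHETICAL quantile-style split finder (not Spark's implementation).
    * Returns strictly increasing thresholds t (a sample x goes left when x <= t),
    * at most maxSplits of them, and never the maximum value, because a split
    * there would leave the right child empty.
    */
  def findSplits(samples: Array[Double], maxSplits: Int): Array[Double] = {
    require(samples.nonEmpty && maxSplits > 0, "need samples and a positive split budget")
    // Distinct values in ascending order, each paired with its number of occurrences.
    val valueCounts: Array[(Double, Int)] =
      samples.groupBy(identity).toArray
        .map { case (v, vs) => (v, vs.length) }
        .sortWith(_._1 < _._1)
    // Every distinct value except the largest is a usable threshold.
    val candidates: Array[Double] = valueCounts.init.map(_._1)
    if (candidates.length <= maxSplits) {
      candidates
    } else {
      // cum(i) = number of samples <= valueCounts(i)._1
      val cum: Array[Int] = valueCounts.map(_._2).scanLeft(0)(_ + _).tail
      val n = samples.length
      val picked = (1 to maxSplits).map { j =>
        val target = j.toDouble * n / (maxSplits + 1)
        // First distinct value whose cumulative count reaches the j-th quantile
        // target, clamped so the maximum value is never chosen.
        math.min(cum.indexWhere(_ >= target), candidates.length - 1)
      }
      // Heavy ties make several targets land on the same value; keep each once.
      picked.distinct.sorted.map(i => candidates(i)).toArray
    }
  }

  /** Boundary checks in the spirit of the review comment. Assumes the samples
    * contain at least two distinct values, so at least one split exists.
    */
  def checkSplits(samples: Array[Double], splits: Array[Double], maxSplits: Int): Unit = {
    val lo = samples.reduce((a, b) => math.min(a, b))
    val hi = samples.reduce((a, b) => math.max(a, b))
    assert(splits.nonEmpty && splits.length <= maxSplits, "split count out of range")
    assert(splits.zip(splits.tail).forall { case (a, b) => a < b }, "splits must strictly increase")
    assert(splits.head >= lo && splits.last < hi, "splits must lie in [min, max)")
  }

  def main(args: Array[String]): Unit = {
    val maxSplits = 5
    // (a) Most samples sit at the minimum.
    val nearMin = Array.fill(1000)(0.0) ++ (1 to 10).map(_.toDouble)
    val minSideSplits = findSplits(nearMin, maxSplits)
    checkSplits(nearMin, minSideSplits, maxSplits)
    // (b) Most samples sit at the maximum.
    val nearMax = (0 to 9).map(_.toDouble).toArray ++ Array.fill(1000)(10.0)
    val maxSideSplits = findSplits(nearMax, maxSplits)
    checkSplits(nearMax, maxSideSplits, maxSplits)
    println(minSideSplits.mkString("near min: [", ", ", "]")) // near min: [0.0]
    println(maxSideSplits.mkString("near max: [", ", ", "]")) // near max: [9.0]
  }
}
```

With these inputs the sketch yields a single split in each case, even though five were allowed: heavy ties collapse all quantile targets onto one value. Under this assumed rule, a skewed-data test would therefore likely assert a split count below the budget rather than a fixed one.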