[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

manishamde Mon, 13 Oct 2014 22:04:25 -0700

Github user manishamde commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2780#discussion_r18809178
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
    @@ -102,6 +102,37 @@ class DecisionTreeSuite extends FunSuite with LocalSparkContext {
         assert(List(3.0, 2.0, 0.0).toSeq === l.toSeq)
       }
     
    +  test("find splits for a continuous feature") {
    +    // find splits for normal case
    +    {
    +      val fakeMetadata = new DecisionTreeMetadata(1, 0, 0, 0,
    +        Map(), Set(),
    +        Array(6), Gini, QuantileStrategy.Sort,
    +        0, 0, 0.0, 0, 0
    +      )
    +      val featureSamples = Array.fill(200000)(math.random)
    +      val splits = DecisionTree.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
    +      assert(splits.length === 5)
    +      assert(fakeMetadata.numSplits(0) === 5)
    +      assert(fakeMetadata.numBins(0) === 6)
    +    }
    +
    --- End diff --
    
    How about another unit test where the distribution is skewed: (a) most samples close to the minimum and (b) most samples close to the maximum? That would exercise the boundary conditions.
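To make the two suggested boundary cases concrete, here is a minimal, self-contained Scala sketch that does not depend on Spark. `SkewedSplitSketch` and `quantileSplits` are hypothetical names. The rule used here (take thresholds at evenly spaced positions in the sorted samples, then drop duplicates) is an assumed simplification for illustration, not the logic of `DecisionTree.findSplitsForContinuousFeature`.

```scala
// Hypothetical sketch: NOT Spark's DecisionTree.findSplitsForContinuousFeature.
object SkewedSplitSketch {
  /** Picks up to numSplits thresholds at evenly spaced positions in the sorted
    * samples (numSplits + 1 bins), then drops duplicate thresholds, which are
    * common when most samples share the same value. */
  def quantileSplits(samples: Array[Double], numSplits: Int): Array[Double] = {
    require(numSplits > 0, "numSplits must be positive")
    require(samples.nonEmpty, "samples must be non-empty")
    val sorted = samples.sorted
    val n = sorted.length
    val candidates = (1 to numSplits).map { k =>
      // Index of the k-th bin boundary, clamped to the last sample.
      sorted(math.min(n - 1, (k.toLong * n / (numSplits + 1)).toInt))
    }
    candidates.distinct.toArray
  }

  def main(args: Array[String]): Unit = {
    // (a) most samples at the minimum: 990 zeros followed by 1.0 .. 10.0
    val nearMin = Array.fill(990)(0.0) ++ Array.tabulate(10)(i => (i + 1).toDouble)
    // (b) most samples at the maximum: 0.0 .. 9.0 followed by 990 tens
    val nearMax = Array.tabulate(10)(i => i.toDouble) ++ Array.fill(990)(10.0)
    println(quantileSplits(nearMin, 5).mkString(", ")) // prints 0.0
    println(quantileSplits(nearMax, 5).mkString(", ")) // prints 10.0
  }
}
```

With 5 requested splits, a uniform input such as `Array.tabulate(600)(_.toDouble)` yields 5 distinct thresholds, but both skewed inputs collapse to a single threshold. In case (b) that threshold equals the maximum, so under an assumed `x <= threshold` rule every sample lands in the left bin and the split separates nothing. A unit test on skewed data is what would catch that kind of degenerate split.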
