[GitHub] spark pull request #20632: [SPARK-3159] added subtree pruning in the transla...

asolimando Thu, 22 Feb 2018 17:00:03 -0800

Github user asolimando commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20632#discussion_r170140280
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala ---
    @@ -359,29 +339,6 @@ class DecisionTreeSuite extends SparkFunSuite with MLlibTestSparkContext {
         assert(rootNode.stats.isEmpty)
       }
     
    -  test("do not choose split that does not satisfy min instance per node requirements") {
    -    // if a split does not satisfy min instances per node requirements,
    -    // this split is invalid, even though the information gain of split is large.
    -    val arr = Array(
    -      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
    --- End diff --
    
    That's true. I have modified the input data for both tests as suggested, and moved the two tests back from _.../ml/tree/impl/RandomForestSuite.scala_ to _.../mllib/tree/DecisionTreeSuite.scala_, where they originally were. The whole mllib test suite passes.
    
    As a recap, two tests have been adapted by slightly changing their input data:
    - _"Multiclass classification stump with 10-ary (ordered) categorical features"_
    - _"do not choose split that does not satisfy min instance per node requirements"_
    
    _"Use soft prediction for binary classification with ordered categorical features"_ was present in two files:
    
    1. _.../ml/classification/DecisionTreeClassifierSuite.scala_
    2. _.../ml/tree/impl/RandomForestSuite.scala_
    
    The one in 1. has been removed, since it would have needed adapting and was redundant, while the one in 2. has been adapted following the same approach as other tests in that file, for instance _"Avoid aggregation on the last level"_.


