Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19993#discussion_r162940665
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala ---
    @@ -401,15 +390,14 @@ class BucketizerSuite extends SparkFunSuite with MLlibTestSparkContext with Defa
         }
       }
     
    -  test("Both inputCol and inputCols are set") {
    -    val bucket = new Bucketizer()
    -      .setInputCol("feature1")
    -      .setOutputCol("result")
    -      .setSplits(Array(-0.5, 0.0, 0.5))
    -      .setInputCols(Array("feature1", "feature2"))
    -
    -    // When both are set, we ignore `inputCols` and just map the column specified by `inputCol`.
    -    assert(bucket.isBucketizeMultipleColumns() == false)
    +  test("assert exception is thrown if both multi-column and single-column params are set") {
    +    val df = Seq((0.5, 0.3), (0.5, -0.4)).toDF("feature1", "feature2")
    +    ParamsSuite.testExclusiveParams(new Bucketizer, df, ("inputCol", "feature1"),
    +      ("inputCols", Array("feature1", "feature2")))
    +    ParamsSuite.testExclusiveParams(new Bucketizer, df, ("outputCol", "result1"),
    +      ("outputCols", Array("result1", "result2")))
    +    ParamsSuite.testExclusiveParams(new Bucketizer, df, ("splits", Array(-0.5, 0.0, 0.5)),
    --- End diff --
    
    The only comment I have is that I believe this line is not testing what you think it is.
    
    As I read the [checkSingleVsMultiColumnParams](https://github.com/apache/spark/pull/19993/files#diff-72f95a0938e5a140d5126f06bdc381a6R266) method, in this test case it will throw the error, _not_ because both `splits` and `splitsArray` are set, but rather because both `inputCol` & `inputCols` are _unset_.
    
    Actually, the same applies to the line above.
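    To make the point concrete, here is a rough sketch of the validation order I mean. This is **not** the actual Spark implementation; `validate`, its param map, and its error strings are all hypothetical, standalone Scala used only to illustrate which check fires first:
    
    ```scala
    // Hypothetical sketch, NOT the real checkSingleVsMultiColumnParams code.
    // The input-column exclusivity checks run before the splits/splitsArray
    // check, so a call that sets only `splits` and `splitsArray` fails on the
    // missing input columns, never reaching the splits-exclusivity branch.
    object ExclusiveParamsSketch {
      def validate(params: Map[String, Any]): String = {
        val singleIn = params.contains("inputCol")
        val multiIn = params.contains("inputCols")
        if (singleIn && multiIn) {
          "error: both inputCol and inputCols are set"
        } else if (!singleIn && !multiIn) {
          // This branch fires for the `splits` / `splitsArray` test case above.
          "error: neither inputCol nor inputCols is set"
        } else if (params.contains("splits") && params.contains("splitsArray")) {
          "error: both splits and splitsArray are set"
        } else {
          "ok"
        }
      }
    }
    ```
    
    Under that reading, the `splits`/`splitsArray` case would need exactly one of the input-column params set for the intended exclusivity check to actually be exercised.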
    
    @jkbradley 

