Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/19993#discussion_r162940665
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala ---
@@ -401,15 +390,14 @@ class BucketizerSuite extends SparkFunSuite with MLlibTestSparkContext with Defa
}
}
-  test("Both inputCol and inputCols are set") {
-    val bucket = new Bucketizer()
-      .setInputCol("feature1")
-      .setOutputCol("result")
-      .setSplits(Array(-0.5, 0.0, 0.5))
-      .setInputCols(Array("feature1", "feature2"))
-
-    // When both are set, we ignore `inputCols` and just map the column specified by `inputCol`.
-    assert(bucket.isBucketizeMultipleColumns() == false)
+  test("assert exception is thrown if both multi-column and single-column params are set") {
+    val df = Seq((0.5, 0.3), (0.5, -0.4)).toDF("feature1", "feature2")
+    ParamsSuite.testExclusiveParams(new Bucketizer, df, ("inputCol", "feature1"),
+      ("inputCols", Array("feature1", "feature2")))
+    ParamsSuite.testExclusiveParams(new Bucketizer, df, ("outputCol", "result1"),
+      ("outputCols", Array("result1", "result2")))
+    ParamsSuite.testExclusiveParams(new Bucketizer, df, ("splits", Array(-0.5, 0.0, 0.5)),
--- End diff ---
My only comment is that I believe this line is not testing what you may think it is.
As I read the [checkSingleVsMultiColumnParams](https://github.com/apache/spark/pull/19993/files#diff-72f95a0938e5a140d5126f06bdc381a6R266) method, in this test case it will throw the error _not_ because both `splits` and `splitsArray` are set, but rather because both `inputCol` and `inputCols` are _unset_.
Actually, the same applies to the line above it.
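To illustrate the ordering issue: here is a minimal, self-contained sketch of the shape of that validation. The names (`ExclusiveParamsSketch`, `Params`, `validate`, `demo`) are hypothetical simplifications for illustration, not the actual Spark ML API; the point is only that a single-vs-multi check on the input columns runs first, so a `splits`/`splitsArray` conflict is never the check that actually fires when no input column is set.

```scala
// Hypothetical sketch of a single-vs-multi column validation, with
// checks ordered the way the review comment describes. Not Spark code.
object ExclusiveParamsSketch {
  case class Params(
      inputCol: Option[String] = None,
      inputCols: Option[Array[String]] = None,
      splits: Option[Array[Double]] = None,
      splitsArray: Option[Array[Array[Double]]] = None)

  def validate(p: Params): Unit = {
    // First check: exactly one of inputCol / inputCols must be set.
    if (p.inputCol.isDefined == p.inputCols.isDefined) {
      throw new IllegalArgumentException(
        "Exactly one of inputCol and inputCols must be set")
    }
    // Only if the first check passes would a splits/splitsArray
    // conflict ever be reported.
    if (p.splits.isDefined && p.splitsArray.isDefined) {
      throw new IllegalArgumentException(
        "Only one of splits and splitsArray may be set")
    }
  }

  // Set only splits and splitsArray, leaving input columns unset,
  // and return the message of the exception that is thrown.
  def demo(): String = {
    try {
      validate(Params(
        splits = Some(Array(-0.5, 0.0, 0.5)),
        splitsArray = Some(Array(Array(-0.5, 0.0, 0.5)))))
      "no exception"
    } catch {
      case e: IllegalArgumentException => e.getMessage
    }
  }

  def main(args: Array[String]): Unit = {
    // The input-column check fires first, so the splits/splitsArray
    // conflict is never the reported error.
    println(demo())
  }
}
```

Under these (assumed) semantics, the test would pass for the wrong reason: the exception comes from the unset input columns, not from the `splits` vs `splitsArray` conflict the test name suggests.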
@jkbradley
---