GitHub user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20686#discussion_r173584784
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala
---
@@ -324,19 +352,24 @@ class QuantileDiscretizerSuite
.setStages(Array(discretizerForCol1, discretizerForCol2, discretizerForCol3))
.fit(df)
- val resultForMultiCols = plForMultiCols.transform(df)
- .select("result1", "result2", "result3")
- .collect()
+ val expected = plForSingleCol.transform(df).select("result1", "result2", "result3").collect()
- val resultForSingleCol = plForSingleCol.transform(df)
- .select("result1", "result2", "result3")
- .collect()
+ testTransformerByGlobalCheckFunc[(Double, Double, Double)](
+ df,
+ plForMultiCols,
+ "result1",
+ "result2",
+ "result3") { rows =>
+ assert(rows === expected)
+ }
- resultForSingleCol.zip(resultForMultiCols).foreach {
- case (rowForSingle, rowForMultiCols) =>
- assert(rowForSingle.getDouble(0) == rowForMultiCols.getDouble(0) &&
- rowForSingle.getDouble(1) == rowForMultiCols.getDouble(1) &&
- rowForSingle.getDouble(2) == rowForMultiCols.getDouble(2))
+ testTransformerByGlobalCheckFunc[(Double, Double, Double)](
--- End diff --
I'd remove this. The multiCol check above already tests batch vs.
streaming, so there's no need to test singleCol against itself.
---