GitHub user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20686#discussion_r173584784
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala ---
    @@ -324,19 +352,24 @@ class QuantileDiscretizerSuite
           .setStages(Array(discretizerForCol1, discretizerForCol2, discretizerForCol3))
           .fit(df)
     
    -    val resultForMultiCols = plForMultiCols.transform(df)
    -      .select("result1", "result2", "result3")
    -      .collect()
    +    val expected = plForSingleCol.transform(df).select("result1", "result2", "result3").collect()
     
    -    val resultForSingleCol = plForSingleCol.transform(df)
    -      .select("result1", "result2", "result3")
    -      .collect()
    +    testTransformerByGlobalCheckFunc[(Double, Double, Double)](
    +      df,
    +      plForMultiCols,
    +      "result1",
    +      "result2",
    +      "result3") { rows =>
    +        assert(rows === expected)
    +      }
     
    -    resultForSingleCol.zip(resultForMultiCols).foreach {
    -      case (rowForSingle, rowForMultiCols) =>
    -        assert(rowForSingle.getDouble(0) == rowForMultiCols.getDouble(0) &&
    -          rowForSingle.getDouble(1) == rowForMultiCols.getDouble(1) &&
    -          rowForSingle.getDouble(2) == rowForMultiCols.getDouble(2))
    +    testTransformerByGlobalCheckFunc[(Double, Double, Double)](
    --- End diff --
    
    I'd remove this. Testing against multiCol already covers batch vs. streaming, so there is no need to test singleCol against itself.

