Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165745445
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +598,11 @@ class ALSSuite
(ex, act) =>
ex.userFactors.first().getSeq[Float](1) ===
act.userFactors.first.getSeq[Float](1)
} { (ex, act, _) =>
- ex.transform(_:
DataFrame).select("prediction").first.getDouble(0) ~==
- act.transform(_:
DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+ testTransformerByGlobalCheckFunc[Float](_: DataFrame, act,
"prediction") {
+ case actRows: Seq[Row] =>
+ ex.transform(_:
DataFrame).select("prediction").first.getDouble(0) ~==
+ actRows(0).getDouble(0) absTol 1e-6
+ }
--- End diff --
I think this code does not check anything.
`testTransformerByGlobalCheckFunc[Float](_: DataFrame, act, "prediction")`
is just a partial application of `testTransformerByGlobalCheckFunc`.
However, `checkNumericTypesALS` expects `check2: (ALSModel, ALSModel,
DataFrame) => Unit`. It's happy to call the provided function, discard the
partially applied function and use `()` instead, so it will typecheck.
As a consequence, the function doing the assert is never called, so the
`~===` assertion never happens. You can check it say by asking for the 100th
column of the first row - it will not produce an error.
This problem is not a result of your change, the original code had the same
issue.
It could probably be simplified a bit but I think the original intent was
to do a check like this:
```
{ (ex, act, df) =>
ex.transform(df).selectExpr("cast(prediction as
double)").first.getDouble(0) ~==
act.transform(df).selectExpr("cast(prediction as
double)").first.getDouble(0) absTol
1e-6
}
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]