Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165312057
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -662,28 +676,32 @@ class ALSSuite
val knownItem = data.select(max("item")).as[Int].first()
val unknownItem = knownItem + 20
val test = Seq(
- (unknownUser, unknownItem),
- (knownUser, unknownItem),
- (unknownUser, knownItem),
- (knownUser, knownItem)
- ).toDF("user", "item")
+ (unknownUser, unknownItem, true),
+ (knownUser, unknownItem, true),
+ (unknownUser, knownItem, true),
+ (knownUser, knownItem, false)
+ ).toDF("user", "item", "expectedIsNaN")
val als = new ALS().setMaxIter(1).setRank(1)
// default is 'nan'
val defaultModel = als.fit(data)
- val defaultPredictions =
-   defaultModel.transform(test).select("prediction").as[Float].collect()
- assert(defaultPredictions.length == 4)
- assert(defaultPredictions.slice(0, 3).forall(_.isNaN))
- assert(!defaultPredictions.last.isNaN)
+ var defaultPredictionNotNaN = Float.NaN
--- End diff --
I would get rid of this variable.
In `testTransformer` it just adds overhead, and
`assert(!defaultPredictionNotNaN.isNaN)` asserts something that was already
checked in `testTransformer`, so its only use is in
`testTransformerByGlobalCheckFunc`.
Producing it is also a bit convoluted; it's not easy to understand why it's
needed.
I would make it clearer by doing a plain old transform on the `test` DF
(or a smaller one containing only the (knownUser, knownItem) pair) and selecting
the value.
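Something along these lines (a rough sketch only; `knownPair` and
`knownPrediction` are illustrative names, and I'm reusing `defaultModel`,
`knownUser`, and `knownItem` from the diff):

```scala
// Sketch: derive the reference prediction with a plain transform
// instead of threading it through a mutable var.
val knownPair = Seq((knownUser, knownItem)).toDF("user", "item")
val knownPrediction = defaultModel
  .transform(knownPair)
  .select("prediction")
  .as[Float]
  .first()
```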
An alternative solution could be to use real expected values in the `test`
DF instead of `isNaN` flags.
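Roughly (again a sketch only; `knownPrediction` would be the value obtained
from the plain transform above, and the `expected` column name is
illustrative):

```scala
// Sketch: carry the expected prediction in the test DF itself,
// using NaN for the cold-start rows instead of boolean flags.
val test = Seq(
  (unknownUser, unknownItem, Float.NaN),
  (knownUser, unknownItem, Float.NaN),
  (unknownUser, knownItem, Float.NaN),
  (knownUser, knownItem, knownPrediction)
).toDF("user", "item", "expected")
```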
---