Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165312057
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -662,28 +676,32 @@ class ALSSuite
val knownItem = data.select(max("item")).as[Int].first()
val unknownItem = knownItem + 20
val test = Seq(
- (unknownUser, unknownItem),
- (knownUser, unknownItem),
- (unknownUser, knownItem),
- (knownUser, knownItem)
- ).toDF("user", "item")
+ (unknownUser, unknownItem, true),
+ (knownUser, unknownItem, true),
+ (unknownUser, knownItem, true),
+ (knownUser, knownItem, false)
+ ).toDF("user", "item", "expectedIsNaN")
val als = new ALS().setMaxIter(1).setRank(1)
// default is 'nan'
val defaultModel = als.fit(data)
- val defaultPredictions =
-   defaultModel.transform(test).select("prediction").as[Float].collect()
- assert(defaultPredictions.length == 4)
- assert(defaultPredictions.slice(0, 3).forall(_.isNaN))
- assert(!defaultPredictions.last.isNaN)
+ var defaultPredictionNotNaN = Float.NaN
--- End diff --
I would get rid of this variable.
In `testTransformer` it just adds overhead, and
`assert(!defaultPredictionNotNaN.isNaN)` asserts something that was already
checked in `testTransformer`, so its only use is in
`testTransformerByGlobalCheckFunc`.
Producing it is also a bit convoluted; it's not easy to understand why it's
needed.
I would make it clearer by doing a plain old transform on the `test` DF
(or a smaller one containing only the (knownUser, knownItem) pair) and selecting
the value.
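Something along these lines (a rough sketch only; `knownPair` and
`knownPrediction` are illustrative names, and I'm reusing `defaultModel`,
`knownUser`, and `knownItem` from the diff):

```scala
// Sketch: derive the reference prediction with a plain transform
// instead of threading it through a mutable var.
val knownPair = Seq((knownUser, knownItem)).toDF("user", "item")
val knownPrediction = defaultModel
  .transform(knownPair)
  .select("prediction")
  .as[Float]
  .first()
```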
An alternative solution could be to use real expected values in the `test`
DF instead of `isNaN` flags.
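Roughly (again a sketch only; `knownPrediction` would be the value obtained
from the plain transform above, and the `expected` column name is
illustrative):

```scala
// Sketch: carry the expected prediction in the test DF itself,
// using NaN for the cold-start rows instead of boolean flags.
val test = Seq(
  (unknownUser, unknownItem, Float.NaN),
  (knownUser, unknownItem, Float.NaN),
  (unknownUser, knownItem, Float.NaN),
  (knownUser, knownItem, knownPrediction)
).toDF("user", "item", "expected")
```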
---