GitHub user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/20121#discussion_r172288890
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
---
@@ -2567,10 +2504,13 @@ class LogisticRegressionSuite
val model1 = lr.fit(smallBinaryDataset)
val lr2 = new LogisticRegression().setInitialModel(model1).setMaxIter(5).setFamily("binomial")
val model2 = lr2.fit(smallBinaryDataset)
- val predictions1 = model1.transform(smallBinaryDataset).select("prediction").collect()
- val predictions2 = model2.transform(smallBinaryDataset).select("prediction").collect()
- predictions1.zip(predictions2).foreach { case (Row(p1: Double), Row(p2: Double)) =>
- assert(p1 === p2)
+ val binaryExpected = model1.transform(smallBinaryDataset).select("prediction").collect()
+ .map(_.getDouble(0))
+ for (model <- Seq(model1, model2)) {
--- End diff --
My thought is that checking model2 alone against binaryExpected (computed
from model1) would already cover the two things we care about:
* batch vs. streaming prediction
* the initial model
I'll just merge this as is, though, since it's not a big deal (just slightly
longer test time).
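
For reference, a rough sketch of what the single-model check could look like. This is not the code in the PR (the diff above is cut off before the loop body). It assumes the snippet sits inside LogisticRegressionSuite after model1, model2 and binaryExpected are defined, that the suite mixes in MLTest (which provides testTransformerByGlobalCheckFunc), and that smallBinaryDataset rows can be read as (Double, Vector); the last point in particular is an assumption.

```scala
// Sketch only: a fragment meant to live inside the existing test body,
// not a standalone program. Relies on the suite's imports (Row, Vector,
// testImplicits._) already being in scope.
testTransformerByGlobalCheckFunc[(Double, Vector)](
  smallBinaryDataset.toDF(), model2, "prediction") { rows: Seq[Row] =>
  // binaryExpected holds model1's batch predictions; the helper runs model2
  // in both batch and streaming mode, so one comparison covers the initial
  // model and batch vs. streaming prediction.
  assert(rows.map(_.getDouble(0)).toArray === binaryExpected)
}
```

Dropping model1 from the loop only removes the check of model1 against its own output, which is where the small saving in test time would come from.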