[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20362


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r170074167
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -586,6 +586,68 @@ class ALSSuite
   allModelParamSettings, checkModelData)
   }
 
+  private def checkNumericTypesALS(
--- End diff --

It's changed because one of the tests didn't do anything. Please take a look at the last commit; it contains a test bugfix, not only the transformation.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r170072578
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -413,34 +411,36 @@ class ALSSuite
       .setSeed(0)
     val alpha = als.getAlpha
     val model = als.fit(training.toDF())
-    val predictions = model.transform(test.toDF()).select("rating", "prediction").rdd.map {
-      case Row(rating: Float, prediction: Float) =>
-        (rating.toDouble, prediction.toDouble)
+    testTransformerByGlobalCheckFunc[Rating[Int]](test.toDF(), model, "rating", "prediction") {
+      case rows: Seq[Row] =>
+        val predictions = rows.map(row => (row.getFloat(0).toDouble, row.getFloat(1).toDouble))
+
+        val rmse =
+          if (implicitPrefs) {
+            // TODO: Use a better (rank-based?) evaluation metric for implicit feedback.
+            // We limit the ratings and the predictions to interval [0, 1] and compute the
+            // weighted RMSE with the confidence scores as weights.
+            val (totalWeight, weightedSumSq) = predictions.map { case (rating, prediction) =>
+              val confidence = 1.0 + alpha * math.abs(rating)
+              val rating01 = math.max(math.min(rating, 1.0), 0.0)
+              val prediction01 = math.max(math.min(prediction, 1.0), 0.0)
+              val err = prediction01 - rating01
+              (confidence, confidence * err * err)
+            }.reduce[(Double, Double)] { case ((c0, e0), (c1, e1)) =>
+              (c0 + c1, e0 + e1)
+            }
+            math.sqrt(weightedSumSq / totalWeight)
+          } else {
+            val errorSquares = predictions.map { case (rating, prediction) =>
+              val err = rating - prediction
+              err * err
+            }
+            val mse = errorSquares.sum / errorSquares.length
+            math.sqrt(mse)
+          }
+        logInfo(s"Test RMSE is $rmse.")
+        assert(rmse < targetRMSE)
     }
-    val rmse =
--- End diff --

It's mainly a move, but there was no mean function, so I implemented one.
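For readers following along, the "mean" in question is the `errorSquares.sum / errorSquares.length` step of the RMSE computation in the diff above; a standalone sketch with made-up error values:

```scala
// Scala's Seq has no built-in mean, so the test computes it manually:
// mean of the squared errors, then the square root for RMSE.
val errorSquares = Seq(1.0, 4.0, 9.0) // hypothetical squared errors
val mse = errorSquares.sum / errorSquares.length
val rmse = math.sqrt(mse)
println(rmse)
```
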


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r170072392
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -586,6 +586,68 @@ class ALSSuite
   allModelParamSettings, checkModelData)
   }
 
+  private def checkNumericTypesALS(
--- End diff --

It's mainly a move, but there was no mean function, so I implemented this one-liner.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r170046788
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -586,6 +586,68 @@ class ALSSuite
   allModelParamSettings, checkModelData)
   }
 
+  private def checkNumericTypesALS(
--- End diff --

Is this mostly just moving the code from the test class, or did it change 
too?


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r170047180
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -693,7 +766,7 @@ class ALSSuite
     val data = ratings.toDF
     val model = new ALS().fit(data)
     Seq("nan", "NaN", "Nan", "drop", "DROP", "Drop").foreach { s =>
-      model.setColdStartStrategy(s).transform(data)
+      testTransformer[Rating[Int]](data, model.setColdStartStrategy(s), "prediction") { _ => }
--- End diff --

What's the no-op function at the end for -- just because it requires an 
argument?


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r170046857
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -413,34 +411,36 @@ class ALSSuite
       .setSeed(0)
     val alpha = als.getAlpha
     val model = als.fit(training.toDF())
-    val predictions = model.transform(test.toDF()).select("rating", "prediction").rdd.map {
-      case Row(rating: Float, prediction: Float) =>
-        (rating.toDouble, prediction.toDouble)
+    testTransformerByGlobalCheckFunc[Rating[Int]](test.toDF(), model, "rating", "prediction") {
+      case rows: Seq[Row] =>
+        val predictions = rows.map(row => (row.getFloat(0).toDouble, row.getFloat(1).toDouble))
+
+        val rmse =
+          if (implicitPrefs) {
+            // TODO: Use a better (rank-based?) evaluation metric for implicit feedback.
+            // We limit the ratings and the predictions to interval [0, 1] and compute the
+            // weighted RMSE with the confidence scores as weights.
+            val (totalWeight, weightedSumSq) = predictions.map { case (rating, prediction) =>
+              val confidence = 1.0 + alpha * math.abs(rating)
+              val rating01 = math.max(math.min(rating, 1.0), 0.0)
+              val prediction01 = math.max(math.min(prediction, 1.0), 0.0)
+              val err = prediction01 - rating01
+              (confidence, confidence * err * err)
+            }.reduce[(Double, Double)] { case ((c0, e0), (c1, e1)) =>
+              (c0 + c1, e0 + e1)
+            }
+            math.sqrt(weightedSumSq / totalWeight)
+          } else {
+            val errorSquares = predictions.map { case (rating, prediction) =>
+              val err = rating - prediction
+              err * err
+            }
+            val mse = errorSquares.sum / errorSquares.length
+            math.sqrt(mse)
+          }
+        logInfo(s"Test RMSE is $rmse.")
+        assert(rmse < targetRMSE)
     }
-    val rmse =
--- End diff --

This change is just a move, really? or did something else change as well?


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-08 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r167005865
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +598,11 @@ class ALSSuite
       (ex, act) =>
         ex.userFactors.first().getSeq[Float](1) === act.userFactors.first.getSeq[Float](1)
     } { (ex, act, _) =>
-      ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
-        act.transform(_: DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+      testTransformerByGlobalCheckFunc[Float](_: DataFrame, act, "prediction") {
+        case actRows: Seq[Row] =>
+          ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
+            actRows(0).getDouble(0) absTol 1e-6
+      }
--- End diff --

Whoa, I didn't check the original functionality in such depth. This is really dead code at runtime. Fixed 👍


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-02 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165745445
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +598,11 @@ class ALSSuite
       (ex, act) =>
         ex.userFactors.first().getSeq[Float](1) === act.userFactors.first.getSeq[Float](1)
     } { (ex, act, _) =>
-      ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
-        act.transform(_: DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+      testTransformerByGlobalCheckFunc[Float](_: DataFrame, act, "prediction") {
+        case actRows: Seq[Row] =>
+          ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
+            actRows(0).getDouble(0) absTol 1e-6
+      }
--- End diff --

I think this code does not check anything. `testTransformerByGlobalCheckFunc[Float](_: DataFrame, act, "prediction")` is just a partial application of `testTransformerByGlobalCheckFunc`. However, `checkNumericTypesALS` expects `check2: (ALSModel, ALSModel, DataFrame) => Unit`. It's happy to call the provided function, discard the partially applied function, and use `()` instead, so it still typechecks.
As a consequence, the function doing the assert is never called, so the `~==` assertion never happens. You can verify this, say, by asking for the 100th column of the first row: it will not produce an error.

This problem is not a result of your change; the original code had the same issue.

It could probably be simplified a bit, but I think the original intent was to do a check like this:

```
{ (ex, act, df) =>
  ex.transform(df).selectExpr("cast(prediction as double)").first.getDouble(0) ~==
    act.transform(df).selectExpr("cast(prediction as double)").first.getDouble(0) absTol 1e-6
}
```
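The trap described above can be reproduced outside Spark. In this hypothetical reduction (`runCheck` stands in for the `check2` slot, a `String` stands in for a DataFrame), a callback position expecting `Unit` silently discards a partially applied function, so the inner assertion code never runs:

```scala
// Hypothetical reduction of the bug: a lambda whose body is itself a
// function value, passed where Unit is expected.
var innerRan = false

// The caller expects a callback returning Unit, like check2 in the suite.
def runCheck(check: String => Unit): Unit = check("some DataFrame")

runCheck { df =>
  // This body has type String => Unit (a function value). The expected
  // result type is Unit, so Scala discards the value: the inner function
  // is never applied and innerRan stays false.
  (otherDf: String) => { innerRan = true }
}

println(innerRan) // remains false: the assertion-carrying function never ran
```

This is Scala's value discarding at work; the code compiles cleanly even though the check is a no-op.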


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165396480
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -662,28 +676,32 @@ class ALSSuite
     val knownItem = data.select(max("item")).as[Int].first()
     val unknownItem = knownItem + 20
     val test = Seq(
-      (unknownUser, unknownItem),
-      (knownUser, unknownItem),
-      (unknownUser, knownItem),
-      (knownUser, knownItem)
-    ).toDF("user", "item")
+      (unknownUser, unknownItem, true),
+      (knownUser, unknownItem, true),
+      (unknownUser, knownItem, true),
+      (knownUser, knownItem, false)
+    ).toDF("user", "item", "expectedIsNaN")
 
     val als = new ALS().setMaxIter(1).setRank(1)
     // default is 'nan'
     val defaultModel = als.fit(data)
-    val defaultPredictions = defaultModel.transform(test).select("prediction").as[Float].collect()
-    assert(defaultPredictions.length == 4)
-    assert(defaultPredictions.slice(0, 3).forall(_.isNaN))
-    assert(!defaultPredictions.last.isNaN)
+    var defaultPredictionNotNaN = Float.NaN
--- End diff --

Fixed.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165392031
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -693,7 +711,9 @@ class ALSSuite
     val data = ratings.toDF
     val model = new ALS().fit(data)
     Seq("nan", "NaN", "Nan", "drop", "DROP", "Drop").foreach { s =>
-      model.setColdStartStrategy(s).transform(data)
+      testTransformer[Rating[Int]](data, model.setColdStartStrategy(s), "prediction") {
+        case _ =>
--- End diff --

Fixed.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165382596
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -653,6 +666,7 @@ class ALSSuite
   test("ALS cold start user/item prediction strategy") {
 val spark = this.spark
 import spark.implicits._
+
--- End diff --

Fixed.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165382316
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -628,18 +635,24 @@ class ALSSuite
     }
     withClue("transform should fail when ids exceed integer range. ") {
       val model = als.fit(df)
-      assert(intercept[SparkException] {
-        model.transform(df.select(df("user_big").as("user"), df("item"))).first
-      }.getMessage.contains(msg))
-      assert(intercept[SparkException] {
-        model.transform(df.select(df("user_small").as("user"), df("item"))).first
-      }.getMessage.contains(msg))
-      assert(intercept[SparkException] {
-        model.transform(df.select(df("item_big").as("item"), df("user"))).first
-      }.getMessage.contains(msg))
-      assert(intercept[SparkException] {
-        model.transform(df.select(df("item_small").as("item"), df("user"))).first
-      }.getMessage.contains(msg))
+      def testTransformIdExceedsIntRange[A : Encoder](dataFrame: DataFrame): Unit = {
+        assert(intercept[SparkException] {
+          model.transform(dataFrame).first
+        }.getMessage.contains(msg))
+        assert(intercept[StreamingQueryException] {
+          testTransformer[A](dataFrame, model, "prediction") {
+            case _ =>

Partial function removed. This code part expects `StreamingQueryException`, which appears quite close to this area, so I'm not sure whether a comment would make it better.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165379876
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +599,15 @@ class ALSSuite
       (ex, act) =>
         ex.userFactors.first().getSeq[Float](1) === act.userFactors.first.getSeq[Float](1)
     } { (ex, act, _) =>
-      ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
-        act.transform(_: DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+      testTransformerByGlobalCheckFunc[Float](_: DataFrame, ex, "prediction") {
+        case exRows: Seq[Row] =>
--- End diff --

Fixed.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165378573
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -566,6 +565,7 @@ class ALSSuite
   test("read/write") {
 val spark = this.spark
 import spark.implicits._
+
--- End diff --

Fixed.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165304423
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -566,6 +565,7 @@ class ALSSuite
   test("read/write") {
 val spark = this.spark
 import spark.implicits._
+
--- End diff --

nit: new line is not needed


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165308129
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -628,18 +635,24 @@ class ALSSuite
     }
     withClue("transform should fail when ids exceed integer range. ") {
       val model = als.fit(df)
-      assert(intercept[SparkException] {
-        model.transform(df.select(df("user_big").as("user"), df("item"))).first
-      }.getMessage.contains(msg))
-      assert(intercept[SparkException] {
-        model.transform(df.select(df("user_small").as("user"), df("item"))).first
-      }.getMessage.contains(msg))
-      assert(intercept[SparkException] {
-        model.transform(df.select(df("item_big").as("item"), df("user"))).first
-      }.getMessage.contains(msg))
-      assert(intercept[SparkException] {
-        model.transform(df.select(df("item_small").as("item"), df("user"))).first
-      }.getMessage.contains(msg))
+      def testTransformIdExceedsIntRange[A : Encoder](dataFrame: DataFrame): Unit = {
+        assert(intercept[SparkException] {
+          model.transform(dataFrame).first
+        }.getMessage.contains(msg))
+        assert(intercept[StreamingQueryException] {
+          testTransformer[A](dataFrame, model, "prediction") {
+            case _ =>
--- End diff --

No need for a partial function here; you can simplify it to `{ _ => }`. I would also add a small comment to make it explicit that we intentionally do not check anything.
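Both literals compile to a function that ignores its argument; a minimal sketch with a hypothetical `runCheck` helper standing in for `testTransformer`'s check parameter:

```scala
var calls = 0

// Hypothetical stand-in for the check argument of testTransformer.
def runCheck(check: Int => Unit): Unit = { check(42); calls += 1 }

runCheck { case _ => } // pattern-matching anonymous function
runCheck { _ => }      // plain lambda: same behavior, less ceremony

println(calls) // 2
```
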


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165305989
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +599,15 @@ class ALSSuite
       (ex, act) =>
         ex.userFactors.first().getSeq[Float](1) === act.userFactors.first.getSeq[Float](1)
     } { (ex, act, _) =>
-      ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
-        act.transform(_: DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+      testTransformerByGlobalCheckFunc[Float](_: DataFrame, ex, "prediction") {
+        case exRows: Seq[Row] =>
--- End diff --

I think it's OK to keep `ex.transform` here. This way the code will be a bit simpler.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165312057
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -662,28 +676,32 @@ class ALSSuite
     val knownItem = data.select(max("item")).as[Int].first()
     val unknownItem = knownItem + 20
     val test = Seq(
-      (unknownUser, unknownItem),
-      (knownUser, unknownItem),
-      (unknownUser, knownItem),
-      (knownUser, knownItem)
-    ).toDF("user", "item")
+      (unknownUser, unknownItem, true),
+      (knownUser, unknownItem, true),
+      (unknownUser, knownItem, true),
+      (knownUser, knownItem, false)
+    ).toDF("user", "item", "expectedIsNaN")
 
     val als = new ALS().setMaxIter(1).setRank(1)
     // default is 'nan'
     val defaultModel = als.fit(data)
-    val defaultPredictions = defaultModel.transform(test).select("prediction").as[Float].collect()
-    assert(defaultPredictions.length == 4)
-    assert(defaultPredictions.slice(0, 3).forall(_.isNaN))
-    assert(!defaultPredictions.last.isNaN)
+    var defaultPredictionNotNaN = Float.NaN
--- End diff --

I would get rid of this variable.
In `testTransformer` it just adds overhead, and `assert(!defaultPredictionNotNaN.isNaN)` asserts something that was already checked in `testTransformer`, so its only use is in `testTransformerByGlobalCheckFunc`.
Producing it is a bit convoluted; it's not easy to understand why it's needed.
I would make it clearer by doing a plain old transform using the `test` DF (or a smaller one containing only the knownUser, knownItem pair) and selecting the value.
An alternative solution could be to use real expected values in the `test` DF instead of `isNaN` flags.
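The flag-based checking pattern discussed above can be reduced to plain collections. In this hypothetical sketch (made-up prediction values, no Spark involved), each prediction is paired with its expected-NaN flag and checked in one pass, with no mutable variable:

```scala
// Hypothetical reduction: predictions for the four (user, item) pairs,
// three unknown (NaN) and one known, paired with expectedIsNaN flags.
val predictions   = Seq(Float.NaN, Float.NaN, Float.NaN, 0.5f) // made-up values
val expectedIsNaN = Seq(true, true, true, false)

val allMatch = predictions.zip(expectedIsNaN).forall { case (p, shouldBeNaN) =>
  p.isNaN == shouldBeNaN
}
println(allMatch) // true
```
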


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165312157
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -693,7 +711,9 @@ class ALSSuite
     val data = ratings.toDF
     val model = new ALS().fit(data)
     Seq("nan", "NaN", "Nan", "drop", "DROP", "Drop").foreach { s =>
-      model.setColdStartStrategy(s).transform(data)
+      testTransformer[Rating[Int]](data, model.setColdStartStrategy(s), "prediction") {
+        case _ =>
--- End diff --

Just like above, no need for a partial function.


---




[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-02-01 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20362#discussion_r165308205
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -653,6 +666,7 @@ class ALSSuite
   test("ALS cold start user/item prediction strategy") {
 val spark = this.spark
 import spark.implicits._
+
--- End diff --

nit: no need for empty line here


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...

2018-01-23 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request:

https://github.com/apache/spark/pull/20362

[Spark-22886][ML][TESTS] ML test for structured streaming: ml.recomme…

## What changes were proposed in this pull request?

Converting spark.ml.recommendation tests to also check code with structured 
streaming, using the ML testing infrastructure implemented in SPARK-22882.

## How was this patch tested?

Automated: Pass the Jenkins.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gaborgsomogyi/spark SPARK-22886

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20362.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20362


commit 33654c93c2fe240eb0c6a6932353239ab84b0ce0
Author: Gabor Somogyi 
Date:   2018-01-18T20:27:08Z

[Spark-22886][ML][TESTS] ML test for structured streaming: ml.recommendation




---
