[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20111 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159116537 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala --- @@ -98,6 +97,21 @@ class BucketedRandomProjectionLSHSuite MLTestingUtils.checkCopyAndUids(brp, brpModel) } + test("BucketedRandomProjectionLSH: streaming transform") { +val brp = new BucketedRandomProjectionLSH() + .setNumHashTables(2) + .setInputCol("keys") + .setOutputCol("values") + .setBucketLength(1.0) + .setSeed(12345) +val brpModel = brp.fit(dataset) + +testTransformer[Tuple1[Vector]](dataset.toDF(), brpModel, "keys", "values") { + case Row(_: Vector, values: Seq[_]) => --- End diff -- OK. so I prefer to use simpler code: ``` testTransformer[Tuple1[Vector]](dataset.toDF(), brpModel, "values") { case Row(values: Seq[_]) => ... ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159106208 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala --- @@ -98,6 +97,21 @@ class BucketedRandomProjectionLSHSuite MLTestingUtils.checkCopyAndUids(brp, brpModel) } + test("BucketedRandomProjectionLSH: streaming transform") { +val brp = new BucketedRandomProjectionLSH() + .setNumHashTables(2) + .setInputCol("keys") + .setOutputCol("values") + .setBucketLength(1.0) + .setSeed(12345) +val brpModel = brp.fit(dataset) + +testTransformer[Tuple1[Vector]](dataset.toDF(), brpModel, "keys", "values") { + case Row(_: Vector, values: Seq[_]) => --- End diff -- I don't think we have to. The main thing here is to make sure that the transform really does happen. Other tests check validity of the values. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159048079 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala --- @@ -98,6 +97,21 @@ class BucketedRandomProjectionLSHSuite MLTestingUtils.checkCopyAndUids(brp, brpModel) } + test("BucketedRandomProjectionLSH: streaming transform") { +val brp = new BucketedRandomProjectionLSH() + .setNumHashTables(2) + .setInputCol("keys") + .setOutputCol("values") + .setBucketLength(1.0) + .setSeed(12345) +val brpModel = brp.fit(dataset) + +testTransformer[Tuple1[Vector]](dataset.toDF(), brpModel, "keys", "values") { + case Row(_: Vector, values: Seq[_]) => --- End diff -- So here do not need to select "keys" column ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159027879 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/HashingTFSuite.scala --- @@ -37,21 +36,28 @@ class HashingTFSuite extends SparkFunSuite with MLlibTestSparkContext with Defau } test("hashingTF") { --- End diff -- ditto: rearranged to do validity check per-row --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159027858 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/FeatureHasherSuite.scala --- @@ -51,31 +48,31 @@ class FeatureHasherSuite extends SparkFunSuite } test("feature hashing") { --- End diff -- Rearranged this test so it checks each row independently. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159022777 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/FeatureHasherSuite.scala --- @@ -17,26 +17,23 @@ package org.apache.spark.ml.feature -import org.apache.spark.SparkFunSuite import org.apache.spark.ml.attribute.AttributeGroup import org.apache.spark.ml.linalg.{Vector, Vectors} import org.apache.spark.ml.param.ParamsSuite -import org.apache.spark.ml.util.DefaultReadWriteTest +import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest} import org.apache.spark.ml.util.TestingUtils._ -import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.sql.Row import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder import org.apache.spark.sql.functions.col import org.apache.spark.sql.types._ -class FeatureHasherSuite extends SparkFunSuite - with MLlibTestSparkContext - with DefaultReadWriteTest { +class FeatureHasherSuite extends MLTest with DefaultReadWriteTest { import testImplicits._ import HashingTFSuite.murmur3FeatureIdx - implicit private val vectorEncoder = ExpressionEncoder[Vector]() + implicit private val vectorEncoder: ExpressionEncoder[Vector] = ExpressionEncoder[Vector]() --- End diff -- scala style --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159022766 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ElementwiseProductSuite.scala --- @@ -17,13 +17,31 @@ package org.apache.spark.ml.feature -import org.apache.spark.SparkFunSuite -import org.apache.spark.ml.linalg.Vectors -import org.apache.spark.ml.util.DefaultReadWriteTest -import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.ml.linalg.{Vector, Vectors} +import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest} +import org.apache.spark.ml.util.TestingUtils._ +import org.apache.spark.sql.Row -class ElementwiseProductSuite - extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest { +class ElementwiseProductSuite extends MLTest with DefaultReadWriteTest { + + import testImplicits._ + + test("streaming transform") { --- End diff -- No existing unit test to use --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159022677 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ChiSqSelectorSuite.scala --- @@ -163,18 +162,19 @@ class ChiSqSelectorSuite extends SparkFunSuite with MLlibTestSparkContext assert(expected.selectedFeatures === actual.selectedFeatures) } } -} -object ChiSqSelectorSuite { - - private def testSelector(selector: ChiSqSelector, dataset: Dataset[_]): ChiSqSelectorModel = { -val selectorModel = selector.fit(dataset) -selectorModel.transform(dataset).select("filtered", "topFeature").collect() - .foreach { case Row(vec1: Vector, vec2: Vector) => + private def testSelector(selector: ChiSqSelector, data: Dataset[_]): ChiSqSelectorModel = { --- End diff -- Moved from object to class b/c this needed testTransformer from the MLTest mix-in --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20111#discussion_r159022657 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala --- @@ -98,6 +97,21 @@ class BucketedRandomProjectionLSHSuite MLTestingUtils.checkCopyAndUids(brp, brpModel) } + test("BucketedRandomProjectionLSH: streaming transform") { --- End diff -- No existing test to use --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/20111 [SPARK-22883][ML][TEST] Streaming tests for spark.ml.feature, from A to H ## What changes were proposed in this pull request? Adds structured streaming tests using testTransformer for these suites: * BinarizerSuite * BucketedRandomProjectionLSHSuite * BucketizerSuite * ChiSqSelectorSuite * CountVectorizerSuite * DCTSuite.scala * ElementwiseProductSuite * FeatureHasherSuite * HashingTFSuite ## How was this patch tested? It tests itself because it is a bunch of tests! You can merge this pull request into a Git repository by running: $ git pull https://github.com/jkbradley/spark SPARK-22883-streaming-featureAM Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20111.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20111 commit 12b3dcf13f90ea00c2a12ec186a5f3277e812095 Author: Joseph K. BradleyDate: 2017-12-29T03:31:17Z added streaming tests for first quarter of spark.ml.feature --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org