[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16013

**[Test build #69219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69219/consoleFull)** for PR 16013 at commit [`29d65cc`](https://github.com/apache/spark/commit/29d65cce3e5f2e29010609c9323cd79ca889b9f8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16013 Merged build finished. Test FAILed.
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16013 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69219/
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69229/
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16013 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69228/
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16013 Merged build finished. Test FAILed.
[GitHub] spark issue #16029: [MINOR][ML] Remove duplicate import in GLR
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16029 Merged build finished. Test FAILed.
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16013 Merged build finished. Test FAILed.
[GitHub] spark issue #15983: [SPARK-18544] [SQL] Append with df.saveAsTable writes da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15983 Merged build finished. Test FAILed.
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15976 Merged build finished. Test FAILed.
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15976 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69224/
[GitHub] spark issue #16029: [MINOR][ML] Remove duplicate import in GLR
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16029 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69227/
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16013 Merged build finished. Test FAILed.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Merged build finished. Test FAILed.
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16013 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69222/
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16013 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69225/
[GitHub] spark issue #15983: [SPARK-18544] [SQL] Append with df.saveAsTable writes da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15983 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69223/
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15780 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69221/
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15780 Merged build finished. Test FAILed.
[GitHub] spark issue #16029: [MINOR][ML] Remove duplicate import in GLR
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16029 This is too trivial to bother with.
[GitHub] spark pull request #16030: [SPARK-18108][SQL] Fix a bug to fail partition sc...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/16030

[SPARK-18108][SQL] Fix a bug to fail partition schema inference

## What changes were proposed in this pull request?

This PR fixes a bug where partition schema inference produces a wrong type:

```
scala> case class A(a: Long, b: Int)
scala> val as = Seq(A(1, 2))
scala> spark.createDataFrame(as).write.parquet("/data/a=1/")
scala> val df = spark.read.parquet("/data/")
scala> df.printSchema
root
 |-- a: long (nullable = true)
 |-- b: integer (nullable = true)

scala> df.collect
java.lang.NullPointerException
    at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:283)
    at org.apache.spark.sql.execution.vectorized.ColumnarBatch$Row.getLong(ColumnarBatch.java:191)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
```

This happened because Spark failed to infer the partition column as `LongType` and wrongly regarded it as `IntegerType` in `DataSource`; the query then failed when scanning that column from a Parquet file.

## How was this patch tested?

Added tests in `ParquetPartitionDiscoverySuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-18108

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16030.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16030

commit 6bd8b4cdb63b20bc292a5ec1d8ca38281ee5bfbf
Author: Takeshi YAMAMURO
Date: 2016-11-28T07:45:30Z

    Fix a bug to fail partition schema inference
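The type conflict described above can be sketched outside Spark: the value parsed from the directory name `a=1` is inferred as an integer, while the Parquet file schema declares `a` as a long, and the reconciled schema must keep the wider type. The following standalone Scala sketch is illustrative only; the `reconcile` helper and the `ColType` names are hypothetical, not Spark's actual API.

```scala
// Hypothetical model of reconciling a partition-inferred column type with
// the type recorded in the data files. Illustrates the fix direction
// described in the PR body; names are invented for this sketch.
sealed trait ColType
case object IntType extends ColType
case object LongType extends ColType

object PartitionSchemaSketch {
  // Prefer the data-file type when the partition-inferred type is narrower;
  // reading an Int-backed column as Long is what triggered the NPE above.
  def reconcile(inferred: ColType, fromFiles: ColType): ColType =
    (inferred, fromFiles) match {
      case (IntType, LongType) => LongType // widen to the file schema's type
      case (i, f) if i == f    => f        // types agree
      case (_, f)              => f        // otherwise trust the file schema
    }

  def main(args: Array[String]): Unit = {
    assert(reconcile(IntType, LongType) == LongType)
    assert(reconcile(LongType, LongType) == LongType)
    println("partition type reconciled: " + reconcile(IntType, LongType))
  }
}
```

The key design point, under these assumptions, is that directory-name inference alone cannot distinguish `1: Int` from `1: Long`, so the file schema has to win the tie.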
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16030 This query passed in the released Spark 2.0.2, so this regression appears to have been introduced by SPARK-18510.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994 **[Test build #69231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69231/consoleFull)** for PR 15994 at commit [`662acfb`](https://github.com/apache/spark/commit/662acfb9ab046842f0fbe2f9344dd3c0df12ad7a).
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #69230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69230/consoleFull)** for PR 16030 at commit [`6bd8b4c`](https://github.com/apache/spark/commit/6bd8b4cdb63b20bc292a5ec1d8ca38281ee5bfbf).
[GitHub] spark issue #16000: [SPARK-18537][Web UI]Add a REST api to spark streaming
Github user ChorPangChan commented on the issue: https://github.com/apache/spark/pull/16000 If there are no other comments, I believe this PR is ready to go. @ajbozarth, forgive me if it's not appropriate to ask, but would you please take a look at the code?
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89736733

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -153,19 +168,20 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
    * (Scala-specific) Returns a new [[DataFrame]] that replaces null or NaN values in specified
    * numeric columns. If a specified column is not a numeric column, it is ignored.
    *
+   * @since 2.1.0
+   */
+  def fill(value: Long, cols: Seq[String]): DataFrame = {
+    fill1(value, cols)
--- End diff --

nit: put it in one line? i.e. `def fill... = fill1...`
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15976 Retest this please.
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14136 **[Test build #69233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69233/consoleFull)** for PR 14136 at commit [`3c699ad`](https://github.com/apache/spark/commit/3c699adfee609781c1e4ce2c08493308f5e7f511).
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15780 **[Test build #69232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69232/consoleFull)** for PR 15780 at commit [`2a1287a`](https://github.com/apache/spark/commit/2a1287a84cb303a8df9f8c310aad154e04b6b4d4).
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89736915

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
     case v => throw new IllegalArgumentException(
       s"Unsupported value type ${v.getClass.getName} ($v).")
   }
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in specified
+   * numeric, string columns. If a specified column is not a numeric, string column,
+   * it is ignored.
+   */
+  private def fill1[T](value: T, cols: Seq[String]): DataFrame = {
+    // the fill[T] which T is Long/Integer/Float/Double,
--- End diff --

Why can `T` be `Integer` and `Float`?
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89736984

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
     case v => throw new IllegalArgumentException(
       s"Unsupported value type ${v.getClass.getName} ($v).")
   }
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in specified
+   * numeric, string columns. If a specified column is not a numeric, string column,
+   * it is ignored.
+   */
+  private def fill1[T](value: T, cols: Seq[String]): DataFrame = {
+    // the fill[T] which T is Long/Integer/Float/Double,
+    // should apply on all the NumericType Column, for example:
+    // val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a","b")
+    // input.na.fill(3.1)
+    // the result is (3,164.3), not (null, 164.3)
--- End diff --

`(3, 164.3)`? Shouldn't it be 3.1?
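The behaviour the code comment above describes — a `Double` fill value applied to an integer column is cast to that column's type, so `3.1` becomes `3` — can be modeled without Spark. The sketch below is an illustrative stand-in, not Spark's actual implementation; `FillSketch` and its methods are hypothetical names.

```scala
// Illustrative model of numeric fill: the one fill value is cast to each
// target column's type before replacing nulls. Not Spark's implementation.
object FillSketch {
  // For an Int column, the Double fill value is truncated: 3.1 -> 3.
  def fillInt(col: Seq[Option[Int]], value: Double): Seq[Int] =
    col.map(_.getOrElse(value.toInt))

  // For a Double column, the fill value is used as-is (non-null cells keep
  // their original values, so 164.3 stays 164.3).
  def fillDouble(col: Seq[Option[Double]], value: Double): Seq[Double] =
    col.map(_.getOrElse(value))

  def main(args: Array[String]): Unit = {
    assert(fillInt(Seq(None, Some(7)), 3.1) == Seq(3, 7))
    assert(fillDouble(Seq(Some(164.3)), 3.1) == Seq(164.3))
    println("fill semantics match the (3, 164.3) example in the comment")
  }
}
```

Under this model, the `(3, 164.3)` result quoted in the review thread is correct: the `Double` column had no null to fill, and the `Int` column receives the truncated value.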
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89737081

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
     case v => throw new IllegalArgumentException(
       s"Unsupported value type ${v.getClass.getName} ($v).")
   }
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in specified
+   * numeric, string columns. If a specified column is not a numeric, string column,
+   * it is ignored.
+   */
+  private def fill1[T](value: T, cols: Seq[String]): DataFrame = {
+    // the fill[T] which T is Long/Integer/Float/Double,
+    // should apply on all the NumericType Column, for example:
+    // val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a","b")
+    // input.na.fill(3.1)
+    // the result is (3,164.3), not (null, 164.3)
+    val targetType = value match {
+      case _: jl.Double | _: jl.Integer | _: jl.Float | _: jl.Long => NumericType
--- End diff --

Why do we match `jl.Double` here instead of Scala `Double`?
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15976 **[Test build #69234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69234/consoleFull)** for PR 15976 at commit [`6db5af9`](https://github.com/apache/spark/commit/6db5af95e456d6529a37c243f41a4632a69f40d0).
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89737732

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -643,8 +645,9 @@ case class ExternalMapToCatalyst private(
   override def foldable: Boolean = false

-  override def dataType: MapType = MapType(
-    keyConverter.dataType, valueConverter.dataType, valueContainsNull = valueConverter.nullable)
+  override def dataType: MapType = {
+    MapType(keyConverter.dataType, valueConverter.dataType, valueConverter.nullable)
+  }
--- End diff --

Looks like there is no difference here?
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89737784

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -130,6 +130,13 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
   /**
    * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`.
    *
+   * @since 2.1.0
+   */
+  def fill(value: Long): DataFrame = fill(value, df.columns)
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`.
--- End diff --

Could I ask you to change `[[DataFrame]]` to `` `DataFrame` ``? It seems `DataFrame` is unrecognisable via unidoc/genjavadoc (see https://github.com/apache/spark/pull/16013), which ends up causing a documentation build failure with Java 8.
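The requested change is the difference between the two doc-comment styles below. `[[DataFrame]]` is scaladoc link syntax, which genjavadoc carries into the generated javadoc where Java 8's stricter doclint rejects it when the target cannot be resolved; plain backticks render as code text and build cleanly:

```scala
// Scaladoc link syntax: breaks the Java 8 javadoc build via unidoc/genjavadoc
// when the [[...]] target cannot be resolved in the generated sources:
/** Returns a new [[DataFrame]] that replaces null or NaN values. */

// Backtick code formatting: safe in both scaladoc and generated javadoc:
/** Returns a new `DataFrame` that replaces null or NaN values. */
```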
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89737836

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ReferenceToExpressions.scala ---
@@ -74,7 +74,8 @@ case class ReferenceToExpressions(result: Expression, children: Seq[Expression])
       ctx.addMutableState("boolean", classChildVarIsNull, "")

       val classChildVar =
-        LambdaVariable(classChildVarName, classChildVarIsNull, child.dataType)
+        LambdaVariable(classChildVarName, classChildVarIsNull, child.dataType,
+          childGen.isNull != "false")
--- End diff --

Use `child.nullable` if you want to specify it here.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89739080

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala ---
@@ -396,12 +396,15 @@ object JavaTypeInference {

       case _ if mapType.isAssignableFrom(typeToken) =>
         val (keyType, valueType) = mapKeyValueType(typeToken)
+        val (_, valueNullable) = inferDataType(valueType)
--- End diff --

Good catch. Done.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89739106

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -643,8 +645,9 @@ case class ExternalMapToCatalyst private(

   override def foldable: Boolean = false

-  override def dataType: MapType = MapType(
-    keyConverter.dataType, valueConverter.dataType, valueContainsNull = valueConverter.nullable)
+  override def dataType: MapType = {
+    MapType(keyConverter.dataType, valueConverter.dataType, valueConverter.nullable)
+  }
--- End diff --

Good catch. Done.
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/14136

Currently `ImplicitTypeCasts` doesn't support casts between `ArrayType(elementType)`s, so we have to support `ArrayType(NumericType)` for now. Once we have added that support, we can make the code that analyzes `percentageExpression` more concise.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89739433

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ReferenceToExpressions.scala ---
@@ -74,7 +74,8 @@ case class ReferenceToExpressions(result: Expression, children: Seq[Expression])
       ctx.addMutableState("boolean", classChildVarIsNull, "")

       val classChildVar =
-        LambdaVariable(classChildVarName, classChildVarIsNull, child.dataType)
+        LambdaVariable(classChildVarName, classChildVarIsNull, child.dataType,
+          childGen.isNull != "false")
--- End diff --

Thank you very much for pointing that out. Done.
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15780

**[Test build #69235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69235/consoleFull)** for PR 15780 at commit [`b7bf966`](https://github.com/apache/spark/commit/b7bf966a808668c08787c39632fc4634c9a8d3da).
[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89739977

--- Diff: core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala ---
@@ -67,17 +67,19 @@ private[spark] object SamplingUtils {
   }

   /**
-   * Returns a sampling rate that guarantees a sample of size >= sampleSizeLowerBound 99.99% of
-   * the time.
+   * Returns a sampling rate that guarantees a sample of size greater than or equal to
+   * sampleSizeLowerBound 99.99% of the time.
    *
    * How the sampling rate is determined:
+   *
    * Let p = num / total, where num is the sample size and total is the total number of
-   * datapoints in the RDD. We're trying to compute q > p such that
+   * datapoints in the RDD. We're trying to compute q {@literal >} p such that
    *   - when sampling with replacement, we're drawing each datapoint with prob_i ~ Pois(q),
-   *     where we want to guarantee Pr[s < num] < 0.0001 for s = sum(prob_i for i from 0 to total),
-   *     i.e. the failure rate of not having a sufficiently large sample < 0.0001.
+   *     where we want to guarantee
+   *     Pr[s {@literal <} num] {@literal <} 0.0001 for s = sum(prob_i for i from 0 to total),
+   *     i.e. the failure rate of not having a sufficiently large sample {@literal <} 0.0001.
    *     Setting q = p + 5 * sqrt(p/total) is sufficient to guarantee 0.9999 success rate for
-   *     num > 12, but we need a slightly larger q (9 empirically determined).
+   *     num {@literal >} 12, but we need a slightly larger q (9 empirically determined).
--- End diff --

That's fine, but outside of actual mathematical equations, I think it's fine to use prose like "greater than". No big deal either way, up to your taste about what to change.
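For reference, the oversampling formula quoted in that doc comment can be sketched standalone. The class and method names below are hypothetical; this is only an illustration of q = p + 9 * sqrt(p / total) as described in the comment, not Spark's actual `SamplingUtils` implementation (which handles more cases, e.g. sampling without replacement).

```java
// Illustrative sketch only: the oversampled rate q = p + 9 * sqrt(p / total)
// described in the SamplingUtils doc comment above. Names are hypothetical.
public class SamplingSketch {
    public static double oversampledFraction(long sampleSizeLowerBound, long total) {
        double p = (double) sampleSizeLowerBound / total;  // target fraction
        // Add a slack term so a sample of size >= sampleSizeLowerBound is drawn
        // 99.99% of the time; 9 is the empirically determined constant mentioned
        // in the doc comment (5 suffices for num > 12, per the same comment).
        return p + 9.0 * Math.sqrt(p / total);
    }
}
```

The returned fraction is strictly larger than the naive p = num / total, which is the whole point of the comment being edited.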
[GitHub] spark pull request #16020: [SPARK-18596][ML] add checking and caching to bis...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/16020#discussion_r89740085

--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -334,10 +334,8 @@ class KMeans @Since("1.5.0") (
     val summary = new KMeansSummary(
       model.transform(dataset), $(predictionCol), $(featuresCol), $(k))
     model.setSummary(Some(summary))
+    if (handlePersistence) instances.unpersist()
     instr.logSuccess(model)
-    if (handlePersistence) {
--- End diff --

Prefer to keep this form, according to the style guide.
[GitHub] spark pull request #16020: [SPARK-18596][ML] add checking and caching to bis...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/16020#discussion_r89740051

--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -255,10 +256,19 @@ class BisectingKMeans @Since("2.0.0") (

   @Since("2.0.0")
   override def fit(dataset: Dataset[_]): BisectingKMeansModel = {
+    val handlePersistence = dataset.rdd.getStorageLevel == StorageLevel.NONE
--- End diff --

By the way, I've been meaning to log a ticket for this issue, but have been tied up. This will actually never work: `dataset.rdd` will always have storage level `NONE`. To see this:

```
scala> import org.apache.spark.storage.StorageLevel
import org.apache.spark.storage.StorageLevel

scala> val df = spark.range(10).toDF("num")
df: org.apache.spark.sql.DataFrame = [num: bigint]

scala> df.storageLevel == StorageLevel.NONE
res0: Boolean = true

scala> df.persist
res1: df.type = [num: bigint]

scala> df.storageLevel == StorageLevel.MEMORY_AND_DISK
res2: Boolean = true

scala> df.rdd.getStorageLevel == StorageLevel.MEMORY_AND_DISK
res3: Boolean = false

scala> df.rdd.getStorageLevel == StorageLevel.NONE
res4: Boolean = true
```

So in fact all the algorithms that check the storage level using `dataset.rdd` are actually double-caching the data if the input DataFrame is already cached, because the RDD will not appear to be cached. We should migrate all the checks to use `dataset.storageLevel`, which was added in https://github.com/apache/spark/pull/13780
[GitHub] spark pull request #16020: [SPARK-18596][ML] add checking and caching to bis...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/16020#discussion_r89740159

--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -273,6 +283,7 @@ class BisectingKMeans @Since("2.0.0") (
     val summary = new BisectingKMeansSummary(
       model.transform(dataset), $(predictionCol), $(featuresCol), $(k))
     model.setSummary(Some(summary))
+    if (handlePersistence) rdd.unpersist()
--- End diff --

Prefer

```
if (handlePersistence) {
  rdd.unpersist()
}
```
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89740299

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
     case v => throw new IllegalArgumentException(
       s"Unsupported value type ${v.getClass.getName} ($v).")
   }
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in specified
+   * numeric, string columns. If a specified column is not a numeric, string column,
+   * it is ignored.
+   */
+  private def fill1[T](value: T, cols: Seq[String]): DataFrame = {
+    // the fill[T] which T is Long/Integer/Float/Double,
--- End diff --

Removing them is OK.
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89740897

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -130,6 +130,13 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {

   /**
    * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`.
    *
+   * @since 2.1.0
+   */
+  def fill(value: Long): DataFrame = fill(value, df.columns)
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`.
--- End diff --

Would it be better to change them in #16013?
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994

**[Test build #69236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69236/consoleFull)** for PR 15994 at commit [`d1ba27f`](https://github.com/apache/spark/commit/d1ba27f96dba9f69b3c92a0f15fa5b3ada50dfaf).
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89741302

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -130,6 +130,13 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {

   /**
    * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`.
    *
+   * @since 2.1.0
+   */
+  def fill(value: Long): DataFrame = fill(value, df.columns)
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`.
--- End diff --

What if that one is merged first?
[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89741513

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala ---
@@ -36,14 +36,14 @@ import org.apache.spark.mllib.tree.loss.{LogLoss, Loss, SquaredError}
 * @param validationTol validationTol is a condition which decides iteration termination when
 *                      runWithValidation is used.
 *                      The end of iteration is decided based on below logic:
-*                      If the current loss on the validation set is > 0.01, the diff
+*                      If the current loss on the validation set is greater than 0.01, the diff
 *                      of validation error is compared to relative tolerance which is
 *                      validationTol * (current loss on the validation set).
-*                      If the current loss on the validation set is <= 0.01, the diff
-*                      of validation error is compared to absolute tolerance which is
+*                      If the current loss on the validation set is less than or euqal to 0.01,
--- End diff --

typo: euqal -> equal
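The termination rule described in the quoted doc comment (relative tolerance when the current validation loss exceeds 0.01, absolute tolerance otherwise) can be sketched standalone. This is a hypothetical illustration of the documented logic, not Spark's actual `runWithValidation` code; the class and method names are made up.

```java
// Hypothetical sketch of the validationTol termination rule described in the
// BoostingStrategy doc comment above. Not Spark's implementation.
public class ValidationTolSketch {
    public static boolean shouldStop(double previousLoss, double currentLoss, double validationTol) {
        double diff = previousLoss - currentLoss;  // improvement on the validation set
        if (currentLoss > 0.01) {
            // compare to relative tolerance: validationTol * current loss
            return diff < validationTol * currentLoss;
        } else {
            // compare to absolute tolerance
            return diff < validationTol;
        }
    }
}
```

A tiny improvement at low loss stops iteration under the absolute tolerance, while a large improvement at high loss keeps iterating under the relative one.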
[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89741431

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala ---
@@ -42,8 +42,8 @@ class SlidingRDDPartition[T](val idx: Int, val prev: Partition, val tail: Seq[T]
 * @param windowSize the window size, must be greater than 1
 * @param step step size for windows
 *
- * @see [[org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*]]
- * @see [[scala.collection.IterableLike.sliding(Int, Int)*]]
+ * @see `org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*`
--- End diff --

Is the trailing * intentional, or a typo? No big deal either way.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994

**[Test build #69237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69237/consoleFull)** for PR 15994 at commit [`d7dc343`](https://github.com/apache/spark/commit/d7dc34341e8d17e892e500f7445a887b59a5f841).
[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89740182

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2063,6 +2063,7 @@ class SparkContext(config: SparkConf) extends Logging {
   * @param jobId the job ID to cancel
   * @throws InterruptedException if the cancel message cannot be sent
   */
+  @throws(classOf[InterruptedException])
--- End diff --

I think these need to be reverted too; we don't want to introduce checked exceptions.
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89742186

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -130,6 +130,13 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {

   /**
    * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`.
    *
+   * @since 2.1.0
+   */
+  def fill(value: Long): DataFrame = fill(value, df.columns)
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`.
--- End diff --

ok
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89742322

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
     case v => throw new IllegalArgumentException(
       s"Unsupported value type ${v.getClass.getName} ($v).")
   }
+
+  /**
+   * Returns a new [[DataFrame]] that replaces null or NaN values in specified
+   * numeric, string columns. If a specified column is not a numeric, string column,
+   * it is ignored.
+   */
+  private def fill1[T](value: T, cols: Seq[String]): DataFrame = {
+    // the fill[T] which T is Long/Integer/Float/Double,
+    // should apply on all the NumericType Column, for example:
+    // val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a","b")
+    // input.na.fill(3.1)
+    // the result is (3,164.3), not (null, 164.3)
+    val targetType = value match {
+      case _: jl.Double | _: jl.Integer | _: jl.Float | _: jl.Long => NumericType
--- End diff --

fixed it
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994

**[Test build #69238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69238/consoleFull)** for PR 15994 at commit [`36bff41`](https://github.com/apache/spark/commit/36bff418825a8ac98a266549b1f11d9ce87ddd15).
[GitHub] spark issue #15961: [SPARK-18523][PySpark]Make SparkContext.stop more reliab...
Github user kxepal commented on the issue: https://github.com/apache/spark/pull/15961

@holdenk Agree with you here. The message is fixed, PR rebased.
[GitHub] spark pull request #16031: [SPARK-18606][HISTORYSERVER]remove useless elemen...
GitHub user WangTaoTheTonic opened a pull request: https://github.com/apache/spark/pull/16031

[SPARK-18606][HISTORYSERVER] remove useless elements while searching

## What changes were proposed in this pull request?

When we search applications in HistoryServer, the search will match all contents between tags, which includes useless elements that should not be matched.

Before: ![before](https://cloud.githubusercontent.com/assets/5276001/20662840/28bcc874-b590-11e6-9115-12fb64e49898.jpg)

After: ![after](https://cloud.githubusercontent.com/assets/5276001/20662844/2f717af2-b590-11e6-97dc-a48b08a54247.jpg)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WangTaoTheTonic/spark span

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16031.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16031

commit 37aa3a2d2fddfa46fb4c5427cebed5683530153d
Author: WangTaoTheTonic
Date: 2016-11-28T08:37:13Z

    remove useless elements while searching
[GitHub] spark issue #16031: [SPARK-18606][HISTORYSERVER]remove useless elements whil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16031

**[Test build #69239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69239/consoleFull)** for PR 16031 at commit [`37aa3a2`](https://github.com/apache/spark/commit/37aa3a2d2fddfa46fb4c5427cebed5683530153d).
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89744772

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala ---
@@ -86,7 +86,7 @@ class FileStreamSinkSuite extends StreamTest {
     val outputDf = spark.read.parquet(outputDir)
     val expectedSchema = new StructType()
-      .add(StructField("value", IntegerType))
+      .add(StructField("value", IntegerType, nullable = false))
       .add(StructField("id", IntegerType))
--- End diff --

BTW, do you know why `id` is not `nullable == false`? It looks like both `value` and `id` are `nullable == false`.
[GitHub] spark pull request #16032: [SPARK-18118][SQL] fix a compilation error due to...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/16032

[SPARK-18118][SQL] fix a compilation error due to nested JavaBeans

## What changes were proposed in this pull request?

This PR avoids a compilation error due to more than 64KB of Java byte code. This error occurs because the generated Java code of `SpecificSafeProjection.apply()` for nested JavaBeans is too big. This PR avoids the compilation error by splitting a big code chunk into multiple methods, by calling `CodegenContext.splitExpression` in `InitializeJavaBean.doGenCode`. An object reference for the JavaBean is stored in an instance variable `javaBean...`. Then, the instance variable is referenced in the split methods.

Generated code with this PR:

```
/* 22098 */   private void apply130_0(InternalRow i) {
...
/* 22125 */     boolean isNull238 = i.isNullAt(2);
/* 22126 */     InternalRow value238 = isNull238 ? null : (i.getStruct(2, 3));
/* 22127 */     boolean isNull236 = false;
/* 22128 */     test.org.apache.spark.sql.JavaDatasetSuite$Nesting1 value236 = null;
/* 22129 */     if (!false && isNull238) {
/* 22130 */
/* 22131 */       final test.org.apache.spark.sql.JavaDatasetSuite$Nesting1 value239 = null;
/* 22132 */       isNull236 = true;
/* 22133 */       value236 = value239;
/* 22134 */     } else {
/* 22135 */
/* 22136 */       final test.org.apache.spark.sql.JavaDatasetSuite$Nesting1 value241 = false ? null : new test.org.apache.spark.sql.JavaDatasetSuite$Nesting1();
/* 22137 */       this.javaBean14 = value241;
/* 22138 */       if (!false) {
/* 22139 */         apply25_0(i);
/* 22140 */         apply25_1(i);
/* 22141 */         apply25_2(i);
/* 22142 */       }
/* 22143 */       isNull236 = false;
/* 22144 */       value236 = value241;
/* 22145 */     }
/* 22146 */     this.javaBean.setField2(value236);
/* 22147 */
/* 22148 */   }
...
/* 22928 */   public java.lang.Object apply(java.lang.Object _i) {
/* 22929 */     InternalRow i = (InternalRow) _i;
/* 22930 */
/* 22931 */     final test.org.apache.spark.sql.JavaDatasetSuite$NestedComplicatedJavaBean value1 = false ? null : new test.org.apache.spark.sql.JavaDatasetSuite$NestedComplicatedJavaBean();
/* 22932 */     this.javaBean = value1;
/* 22933 */     if (!false) {
/* 22934 */       apply130_0(i);
/* 22935 */       apply130_1(i);
/* 22936 */       apply130_2(i);
/* 22937 */       apply130_3(i);
/* 22938 */       apply130_4(i);
/* 22939 */     }
/* 22940 */     if (false) {
/* 22941 */       mutableRow.setNullAt(0);
/* 22942 */     } else {
/* 22943 */
/* 22944 */       mutableRow.update(0, value1);
/* 22945 */     }
/* 22946 */
/* 22947 */     return mutableRow;
/* 22948 */   }
```

## How was this patch tested?

added a test suite into `JavaDatasetSuite.java`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-18118

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16032.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16032
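The core idea of the PR above, splitting one overly long generated method into several bounded-size helper methods, can be sketched standalone. This is only an illustration of the chunking technique; it is not Spark's `CodegenContext` code, and the names (`splitIntoMethods`, `SplitDemo`) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: greedily pack generated statements into method bodies no
// longer than maxBodyLength characters, emitting helpers name_0, name_1, ...
// so that no single method exceeds the size budget (cf. the JVM's 64KB limit).
public class SplitDemo {
    public static List<String> splitIntoMethods(List<String> statements, int maxBodyLength, String name) {
        List<String> methods = new ArrayList<>();
        StringBuilder body = new StringBuilder();
        int part = 0;
        for (String stmt : statements) {
            // flush the current chunk before it would exceed the budget
            if (body.length() > 0 && body.length() + stmt.length() > maxBodyLength) {
                methods.add("private void " + name + "_" + part++ + "(InternalRow i) {\n" + body + "}");
                body.setLength(0);
            }
            body.append(stmt).append('\n');
        }
        if (body.length() > 0) {
            methods.add("private void " + name + "_" + part + "(InternalRow i) {\n" + body + "}");
        }
        return methods;
    }
}
```

The caller would then emit each helper plus a dispatcher that invokes them in order, which is what the generated `apply130_0(i); apply130_1(i); ...` calls above correspond to.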
[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89744913 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala --- @@ -42,8 +42,8 @@ class SlidingRDDPartition[T](val idx: Int, val prev: Partition, val tail: Seq[T] * @param windowSize the window size, must be greater than 1 * @param step step size for windows * - * @see [[org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*]] - * @see [[scala.collection.IterableLike.sliding(Int, Int)*]] + * @see `org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*` --- End diff -- Let me please leave them as they are. I am worried about getting blamed in the future. I will keep in mind to leave a comment about this if someone tries to change something around it.
[GitHub] spark issue #16032: [SPARK-18118][SQL] fix a compilation error due to nested...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16032 **[Test build #69240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69240/consoleFull)** for PR 16032 at commit [`5debc84`](https://github.com/apache/spark/commit/5debc847bba6dd2824d856e6325e0647df525870).
[GitHub] spark pull request #16013: [SPARK-3359][DOCS] Make javadoc8 working for unid...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89745428 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala --- @@ -42,8 +42,8 @@ class SlidingRDDPartition[T](val idx: Int, val prev: Partition, val tail: Seq[T] * @param windowSize the window size, must be greater than 1 * @param step step size for windows * - * @see [[org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*]] - * @see [[scala.collection.IterableLike.sliding(Int, Int)*]] + * @see `org.apache.spark.mllib.rdd.RDDFunctions.sliding(Int, Int)*` --- End diff -- Heh, OK. I don't think it has any meaning in a hyperlink or javadoc syntax, and this isn't either one anyway, but it's OK to leave it.
[GitHub] spark issue #15992: [SPARK-18560][CORE][STREAMING] Receiver data can not be ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15992 I am not familiar enough with this code to review it. I do think @JoshRosen is the right person given https://issues.apache.org/jira/browse/SPARK-13990, and I believe he's said he will start reviewing again this week after the holiday.
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16013 **[Test build #69241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69241/consoleFull)** for PR 16013 at commit [`7d44dc5`](https://github.com/apache/spark/commit/7d44dc5ee69a75aa58132bac65de2f46a21845ba).
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89747135 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -130,6 +130,13 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { /** * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`. * + * @since 2.1.0 + */ + def fill(value: Long): DataFrame = fill(value, df.columns) + + /** + * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`. --- End diff -- @HyukjinKwon yeah, the bad news is that I'm sure the javadoc generation is going to re-break periodically. We can try to catch it with reviews, and your work at least gets it to a working state, but we'll regularly clean it up again before releases.
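The `fill` overload in this diff simply delegates a scalar fill to every column. As a rough illustration of the null/NaN-replacement semantics, here is a toy Python model over a list of dicts standing in for a DataFrame (the `fill` helper and its signature are illustrative, not Spark's implementation, which only touches numeric columns and returns a new immutable DataFrame):

```python
import math

def fill(rows, value, columns=None):
    """Return new rows with None/NaN replaced by `value` in the given
    columns (all columns by default), loosely mimicking
    DataFrameNaFunctions.fill(value: Long)."""
    out = []
    for row in rows:
        new_row = dict(row)  # never mutate the input, like a DataFrame
        for col in (columns or row.keys()):
            v = new_row.get(col)
            if v is None or (isinstance(v, float) and math.isnan(v)):
                new_row[col] = value
        out.append(new_row)
    return out
```

Note that both `None` and `NaN` are replaced, matching the documented "replaces null or NaN values" behavior.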
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89424696 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/WatermarkSuite.scala --- @@ -96,28 +96,58 @@ class WatermarkSuite extends StreamTest with BeforeAndAfter with Logging { ) } - ignore("recovery") { -val inputData = MemoryStream[Int] - -val windowedAggregation = inputData.toDF() + test("recovery") { +val ms = new MemoryStream[Int](0, sqlContext) +val df = ms.toDF().toDF("a") +val tableName = "recovery" +def startQuery: StreamingQuery = { + ms.toDF() .withColumn("eventTime", $"value".cast("timestamp")) .withWatermark("eventTime", "10 seconds") .groupBy(window($"eventTime", "5 seconds") as 'window) .agg(count("*") as 'count) .select($"window".getField("start").cast("long").as[Long], $"count".as[Long]) +.writeStream +.format("memory") +.queryName(tableName) +.outputMode("append") +.start() +} -testStream(windowedAggregation)( - AddData(inputData, 10, 11, 12, 13, 14, 15), - CheckAnswer(), - AddData(inputData, 25), // Advance watermark to 15 seconds - StopStream, - StartStream(), - CheckAnswer(), - AddData(inputData, 25), // Evict items less than previous watermark. - StopStream, - StartStream(), - CheckAnswer((10, 5)) +var q = startQuery +ms.addData(10, 11, 12, 13, 14, 15) +q.processAllAvailable() + +checkAnswer( + spark.table(tableName), Seq() +) + +// Advance watermark to 15 seconds, +// but do not process batch +ms.addData(25) +q.stop() --- End diff -- Why don't you want to process the batch? Let it process the batch, check whether the results are correct (i.e. things were evicted), and then stop. Then drop the table, restart, call processAllAvailable, and verify that the same result is recreated. This will actually verify that the watermark is recovered and used to evict the records again.
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89746924 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.streaming + +import java.io.File + +import org.apache.spark.sql.{AnalysisException, Row} +import org.apache.spark.sql.execution.streaming.{MemoryStream, StreamExecutionMetadata} +import org.apache.spark.sql.functions._ +import org.apache.spark.util.{SystemClock, Utils} + +class StreamExecutionMetadataSuite extends StreamTest { + + private def newMetadataDir = +Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath + + test("stream execution metadata") { +assert(StreamExecutionMetadata(0, 0) === + StreamExecutionMetadata("""{}""")) +assert(StreamExecutionMetadata(1, 0) === + StreamExecutionMetadata("""{"batchWatermarkMs":1}""")) +assert(StreamExecutionMetadata(0, 2) === + StreamExecutionMetadata("""{"batchTimestampMs":2}""")) +assert(StreamExecutionMetadata(1, 2) === + StreamExecutionMetadata( +"""{"batchWatermarkMs":1,"batchTimestampMs":2}""")) + } + + test("metadata is recovered from log when query is restarted") { +import testImplicits._ +val clock = new SystemClock() +val ms = new MemoryStream[Long](0, sqlContext) +val df = ms.toDF().toDF("a") +val checkpointLoc = newMetadataDir +val checkpointDir = new File(checkpointLoc, "complete") +checkpointDir.mkdirs() +assert(checkpointDir.exists()) +val tableName = "test" +// Query that prunes timestamps less than current_timestamp, making +// it easy to use for ensuring that a batch is re-processed with the +// timestamp used when it was first processed. +def startQuery: StreamingQuery = { + df.groupBy("a") +.count() +.where('a >= current_timestamp().cast("long")) +.writeStream +.format("memory") +.queryName(tableName) +.option("checkpointLocation", checkpointLoc) +.outputMode("complete") +.start() +} +// no exception here --- End diff -- what does this comment mean? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89747718 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.streaming + +import java.io.File + +import org.apache.spark.sql.{AnalysisException, Row} +import org.apache.spark.sql.execution.streaming.{MemoryStream, StreamExecutionMetadata} +import org.apache.spark.sql.functions._ +import org.apache.spark.util.{SystemClock, Utils} + +class StreamExecutionMetadataSuite extends StreamTest { + + private def newMetadataDir = +Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath + + test("stream execution metadata") { +assert(StreamExecutionMetadata(0, 0) === + StreamExecutionMetadata("""{}""")) +assert(StreamExecutionMetadata(1, 0) === + StreamExecutionMetadata("""{"batchWatermarkMs":1}""")) +assert(StreamExecutionMetadata(0, 2) === + StreamExecutionMetadata("""{"batchTimestampMs":2}""")) +assert(StreamExecutionMetadata(1, 2) === + StreamExecutionMetadata( +"""{"batchWatermarkMs":1,"batchTimestampMs":2}""")) + } + + test("metadata is recovered from log when query is restarted") { +import testImplicits._ +val clock = new SystemClock() +val ms = new MemoryStream[Long](0, sqlContext) +val df = ms.toDF().toDF("a") +val checkpointLoc = newMetadataDir +val checkpointDir = new File(checkpointLoc, "complete") +checkpointDir.mkdirs() +assert(checkpointDir.exists()) +val tableName = "test" +// Query that prunes timestamps less than current_timestamp, making +// it easy to use for ensuring that a batch is re-processed with the +// timestamp used when it was first processed. 
+def startQuery: StreamingQuery = { + df.groupBy("a") +.count() +.where('a >= current_timestamp().cast("long")) +.writeStream +.format("memory") +.queryName(tableName) +.option("checkpointLocation", checkpointLoc) +.outputMode("complete") +.start() +} +// no exception here +val t1 = clock.getTimeMillis() + 60L * 1000L +val t2 = clock.getTimeMillis() + 60L * 1000L + 1000L +val q = startQuery +ms.addData(t1, t2) +q.processAllAvailable() + +checkAnswer( + spark.table(tableName), + Seq(Row(t1, 1), Row(t2, 1)) +) + +q.stop() +Thread.sleep(60L * 1000L + 5000L) // Expire t1 and t2 --- End diff -- This test will now take 60 seconds! I think I didn't quite understand the test earlier, but now I do. I think the earlier 5 seconds was closer to being fine. Okay, let's just use 10 seconds. And instead of sleep, use `eventually` to check the condition `t2 < clock.getTimeMillis()`. This would make the test sleep no longer than necessary.
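The suggestion to replace the fixed `Thread.sleep` with `eventually` amounts to polling a condition until it holds or a timeout elapses. A hedged sketch of such a helper in Python (ScalaTest's actual `Eventually.eventually` is richer, with configurable patience and exception handling; this illustrates only the timing benefit):

```python
import time

def eventually(condition, timeout=10.0, interval=0.05):
    """Poll `condition` until it returns True or `timeout` seconds pass.
    Unlike a fixed sleep, this returns as soon as the condition holds,
    so the test sleeps no longer than necessary."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise AssertionError("condition not met within %.1fs" % timeout)
```

In the test above this would be used roughly as `eventually(lambda: t2 < clock.getTimeMillis())`, instead of sleeping for the full worst-case duration.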
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89748128 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.streaming + +import java.io.File + +import org.apache.spark.sql.{AnalysisException, Row} +import org.apache.spark.sql.execution.streaming.{MemoryStream, StreamExecutionMetadata} +import org.apache.spark.sql.functions._ +import org.apache.spark.util.{SystemClock, Utils} + +class StreamExecutionMetadataSuite extends StreamTest { + + private def newMetadataDir = +Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath + + test("stream execution metadata") { +assert(StreamExecutionMetadata(0, 0) === + StreamExecutionMetadata("""{}""")) +assert(StreamExecutionMetadata(1, 0) === + StreamExecutionMetadata("""{"batchWatermarkMs":1}""")) +assert(StreamExecutionMetadata(0, 2) === + StreamExecutionMetadata("""{"batchTimestampMs":2}""")) +assert(StreamExecutionMetadata(1, 2) === + StreamExecutionMetadata( +"""{"batchWatermarkMs":1,"batchTimestampMs":2}""")) + } + + test("metadata is recovered from log when query is restarted") { +import testImplicits._ +val clock = new SystemClock() +val ms = new MemoryStream[Long](0, sqlContext) +val df = ms.toDF().toDF("a") +val checkpointLoc = newMetadataDir +val checkpointDir = new File(checkpointLoc, "complete") +checkpointDir.mkdirs() +assert(checkpointDir.exists()) +val tableName = "test" +// Query that prunes timestamps less than current_timestamp, making +// it easy to use for ensuring that a batch is re-processed with the +// timestamp used when it was first processed. 
+def startQuery: StreamingQuery = { + df.groupBy("a") +.count() +.where('a >= current_timestamp().cast("long")) +.writeStream +.format("memory") +.queryName(tableName) +.option("checkpointLocation", checkpointLoc) +.outputMode("complete") +.start() +} +// no exception here +val t1 = clock.getTimeMillis() + 60L * 1000L +val t2 = clock.getTimeMillis() + 60L * 1000L + 1000L +val q = startQuery +ms.addData(t1, t2) +q.processAllAvailable() + +checkAnswer( + spark.table(tableName), + Seq(Row(t1, 1), Row(t2, 1)) +) + +q.stop() +Thread.sleep(60L * 1000L + 5000L) // Expire t1 and t2 +assert(t1 < clock.getTimeMillis()) +assert(t2 < clock.getTimeMillis()) + +spark.sql(s"drop table $tableName") + +// verify table is dropped +intercept[AnalysisException](spark.table(tableName).collect()) --- End diff -- I think you can use `spark.catalog.tableExists`
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89747045 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.streaming + +import java.io.File + +import org.apache.spark.sql.{AnalysisException, Row} +import org.apache.spark.sql.execution.streaming.{MemoryStream, StreamExecutionMetadata} +import org.apache.spark.sql.functions._ +import org.apache.spark.util.{SystemClock, Utils} + +class StreamExecutionMetadataSuite extends StreamTest { + + private def newMetadataDir = +Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath + + test("stream execution metadata") { +assert(StreamExecutionMetadata(0, 0) === + StreamExecutionMetadata("""{}""")) +assert(StreamExecutionMetadata(1, 0) === + StreamExecutionMetadata("""{"batchWatermarkMs":1}""")) +assert(StreamExecutionMetadata(0, 2) === + StreamExecutionMetadata("""{"batchTimestampMs":2}""")) +assert(StreamExecutionMetadata(1, 2) === + StreamExecutionMetadata( +"""{"batchWatermarkMs":1,"batchTimestampMs":2}""")) + } + + test("metadata is recovered from log when query is restarted") { +import testImplicits._ +val clock = new SystemClock() +val ms = new MemoryStream[Long](0, sqlContext) +val df = ms.toDF().toDF("a") +val checkpointLoc = newMetadataDir +val checkpointDir = new File(checkpointLoc, "complete") +checkpointDir.mkdirs() +assert(checkpointDir.exists()) +val tableName = "test" +// Query that prunes timestamps less than current_timestamp, making +// it easy to use for ensuring that a batch is re-processed with the +// timestamp used when it was first processed. +def startQuery: StreamingQuery = { + df.groupBy("a") +.count() +.where('a >= current_timestamp().cast("long")) +.writeStream +.format("memory") +.queryName(tableName) +.option("checkpointLocation", checkpointLoc) +.outputMode("complete") +.start() +} +// no exception here +val t1 = clock.getTimeMillis() + 60L * 1000L +val t2 = clock.getTimeMillis() + 60L * 1000L + 1000L --- End diff -- add a comment explaining how the test works. 
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89745095 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala --- @@ -235,4 +239,85 @@ class StreamingAggregationSuite extends StreamTest with BeforeAndAfterAll { CheckLastBatch(("a", 30), ("b", 3), ("c", 1)) ) } + + test("prune results by current_time, complete mode") { +import testImplicits._ +import StreamingAggregationSuite._ +clock = new StreamManualClock + +val inputData = MemoryStream[Long] + +val aggregated = + inputData.toDF() +.groupBy($"value") +.agg(count("*")) +.where('value >= current_timestamp().cast("long") - 10L) + +testStream(aggregated, Complete)( + StartStream(ProcessingTime("10 seconds"), triggerClock = clock), + + // advance clock to 10 seconds + AddData(inputData, 0L, 5L, 5L, 10L), + AdvanceManualClock(10 * 1000), + CheckLastBatch((0L, 1), (5L, 2), (10L, 1)), + + // advance clock to 20 seconds, should retain keys >= 10 + AddData(inputData, 15L, 15L, 20L), + AdvanceManualClock(10 * 1000), + CheckLastBatch((10L, 1), (15L, 2), (20L, 1)), + + // advance clock to 30 seconds, should retain keys >= 20 + AddData(inputData, 0L), + AdvanceManualClock(10 * 1000), + CheckLastBatch((20L, 1)), + + // advance clock to 40 seconds, should retain keys >= 30 + AddData(inputData, 25L, 30L, 40L, 45L), + AdvanceManualClock(10 * 1000), + CheckLastBatch((30L, 1), (40L, 1), (45L, 1)) +) + } + --- End diff -- nit: extra line
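The test above drives a manual clock and checks which aggregation keys survive the `'value >= current_timestamp - 10` filter after each trigger. The retention rule itself can be simulated directly in a few lines; this is a toy model of the predicate and the deterministic clock, not Spark's state store (`ManualClock` and `prune` are illustrative names):

```python
class ManualClock:
    """Deterministic clock advanced explicitly, like StreamManualClock."""
    def __init__(self, now=0):
        self.now = now
    def advance(self, delta):
        self.now += delta

def prune(counts, clock, horizon=10):
    """Keep only keys within `horizon` of the current (manual) time,
    mirroring the where('value >= current_timestamp - 10) filter."""
    return {k: v for k, v in counts.items() if k >= clock.now - horizon}
```

Replaying the first three triggers of the test against this model reproduces the expected `CheckLastBatch` contents: at t=10 all keys survive, at t=20 only keys >= 10, at t=30 only keys >= 20.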
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89744923 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/WatermarkSuite.scala --- @@ -96,27 +96,42 @@ class WatermarkSuite extends StreamTest with BeforeAndAfter with Logging { ) } - ignore("recovery") { + test("recovery") { val inputData = MemoryStream[Int] - -val windowedAggregation = inputData.toDF() -.withColumn("eventTime", $"value".cast("timestamp")) -.withWatermark("eventTime", "10 seconds") -.groupBy(window($"eventTime", "5 seconds") as 'window) -.agg(count("*") as 'count) -.select($"window".getField("start").cast("long").as[Long], $"count".as[Long]) - -testStream(windowedAggregation)( +val df = inputData.toDF() + .withColumn("eventTime", $"value".cast("timestamp")) + .withWatermark("eventTime", "10 seconds") + .groupBy(window($"eventTime", "5 seconds") as 'window) + .agg(count("*") as 'count) + .select($"window".getField("start").cast("long").as[Long], $"count".as[Long]) +val outputMode = OutputMode.Append +val memorySink = new MemorySink(df.schema, outputMode) +testStream(df)( AddData(inputData, 10, 11, 12, 13, 14, 15), CheckAnswer(), AddData(inputData, 25), // Advance watermark to 15 seconds StopStream, StartStream(), - CheckAnswer(), + CheckLastBatch(), AddData(inputData, 25), // Evict items less than previous watermark. + CheckLastBatch((10, 5)), StopStream, + AssertOnQuery { q => // clear the sink +q.sink.asInstanceOf[MemorySink].clear() +true + }, StartStream(), - CheckAnswer((10, 5)) + CheckLastBatch((10, 5)), --- End diff -- nit: add comment to explain // Should recompute the last batch and re-evict timestamp 10 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
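The recovery test being discussed exercises the watermark rule: the watermark trails the maximum observed event time by the configured delay (10 seconds here), and a 5-second window is emitted and evicted once it falls entirely below the watermark. That bookkeeping can be sketched standalone; this is an illustrative model (`WatermarkTracker` is a made-up name), not Spark's streaming aggregation operator:

```python
class WatermarkTracker:
    """Track (max event time - delay) and evict closed tumbling windows."""
    def __init__(self, delay=10, window=5):
        self.delay, self.window = delay, window
        self.max_event = 0
        self.counts = {}  # window start -> event count

    def add(self, *events):
        for t in events:
            self.max_event = max(self.max_event, t)
            start = (t // self.window) * self.window
            self.counts[start] = self.counts.get(start, 0) + 1

    @property
    def watermark(self):
        return self.max_event - self.delay

    def evict(self):
        """Emit windows whose end <= watermark (append-mode output)."""
        done = {s: c for s, c in self.counts.items()
                if s + self.window <= self.watermark}
        for s in done:
            del self.counts[s]
        return sorted(done.items())
```

Feeding it the test's data reproduces the expected sequence: events 10..15 produce no output (watermark is 5), and adding 25 advances the watermark to 15, which closes the [10, 15) window with count 5 — the `(10, 5)` row the test checks for after recovery.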
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89739836 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala --- @@ -235,4 +239,85 @@ class StreamingAggregationSuite extends StreamTest with BeforeAndAfterAll { CheckLastBatch(("a", 30), ("b", 3), ("c", 1)) ) } + + test("prune results by current_time, complete mode") { +import testImplicits._ +import StreamingAggregationSuite._ +clock = new StreamManualClock + +val inputData = MemoryStream[Long] + +val aggregated = + inputData.toDF() +.groupBy($"value") +.agg(count("*")) +.where('value >= current_timestamp().cast("long") - 10L) + +testStream(aggregated, Complete)( + StartStream(ProcessingTime("10 seconds"), triggerClock = clock), + + // advance clock to 10 seconds + AddData(inputData, 0L, 5L, 5L, 10L), + AdvanceManualClock(10 * 1000), + CheckLastBatch((0L, 1), (5L, 2), (10L, 1)), + + // advance clock to 20 seconds, should retain keys >= 10 + AddData(inputData, 15L, 15L, 20L), + AdvanceManualClock(10 * 1000), + CheckLastBatch((10L, 1), (15L, 2), (20L, 1)), + + // advance clock to 30 seconds, should retain keys >= 20 + AddData(inputData, 0L), + AdvanceManualClock(10 * 1000), + CheckLastBatch((20L, 1)), + + // advance clock to 40 seconds, should retain keys >= 30 + AddData(inputData, 25L, 30L, 40L, 45L), + AdvanceManualClock(10 * 1000), + CheckLastBatch((30L, 1), (40L, 1), (45L, 1)) +) + } + + + test("prune results by current_date, complete mode") { +import testImplicits._ +import StreamingAggregationSuite._ +clock = new StreamManualClock +val tz = TimeZone.getDefault.getID +val inputData = MemoryStream[Long] +val aggregated = + inputData.toDF() +.select(to_utc_timestamp(from_unixtime('value * DateTimeUtils.SECONDS_PER_DAY), tz)) +.toDF("value") +.groupBy($"value") +.agg(count("*")) +// .select('value, date_sub(current_date(), 10).cast("timestamp").alias("t")) +// .select('value, 't, 'value >= 't) 
+.where($"value".cast("date") >= date_sub(current_date(), 10)) +.select(($"value".cast("long") / DateTimeUtils.SECONDS_PER_DAY).cast("long"), $"count(1)") +testStream(aggregated, Complete)( + StartStream(ProcessingTime("10 day"), triggerClock = clock), + // advance clock to 10 days, should retain all keys + AddData(inputData, 0L, 5L, 5L, 10L), + AdvanceManualClock(DateTimeUtils.MILLIS_PER_DAY * 10), + CheckLastBatch((0L, 1), (5L, 2), (10L, 1)), + // advance clock to 20 days, should retain keys >= 10 + AddData(inputData, 15L, 15L, 20L), + AdvanceManualClock(DateTimeUtils.MILLIS_PER_DAY * 10), + CheckLastBatch((10L, 1), (15L, 2), (20L, 1)), + // advance clock to 30 days, should retain keys >= 20 + AddData(inputData, 0L), + AdvanceManualClock(DateTimeUtils.MILLIS_PER_DAY * 10), + CheckLastBatch((20L, 1)), + // advance clock to 40 seconds, should retain keys >= 30 --- End diff -- 40 seconds -> days
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89739983 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/WatermarkSuite.scala --- @@ -96,27 +96,42 @@ class WatermarkSuite extends StreamTest with BeforeAndAfter with Logging { ) } - ignore("recovery") { + test("recovery") { val inputData = MemoryStream[Int] - -val windowedAggregation = inputData.toDF() -.withColumn("eventTime", $"value".cast("timestamp")) -.withWatermark("eventTime", "10 seconds") -.groupBy(window($"eventTime", "5 seconds") as 'window) -.agg(count("*") as 'count) -.select($"window".getField("start").cast("long").as[Long], $"count".as[Long]) - -testStream(windowedAggregation)( +val df = inputData.toDF() + .withColumn("eventTime", $"value".cast("timestamp")) + .withWatermark("eventTime", "10 seconds") + .groupBy(window($"eventTime", "5 seconds") as 'window) + .agg(count("*") as 'count) + .select($"window".getField("start").cast("long").as[Long], $"count".as[Long]) +val outputMode = OutputMode.Append +val memorySink = new MemorySink(df.schema, outputMode) +testStream(df)( AddData(inputData, 10, 11, 12, 13, 14, 15), CheckAnswer(), --- End diff -- nit: Make this CheckAnswer -> CheckLastBatch for being consistent with rest of the checks in this test
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89746706

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.io.File
+
+import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.execution.streaming.{MemoryStream, StreamExecutionMetadata}
+import org.apache.spark.sql.functions._
+import org.apache.spark.util.{SystemClock, Utils}
+
+class StreamExecutionMetadataSuite extends StreamTest {
+
+  private def newMetadataDir =
+    Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
+
+  test("stream execution metadata") {
+    assert(StreamExecutionMetadata(0, 0) ===
+      StreamExecutionMetadata("""{}"""))
+    assert(StreamExecutionMetadata(1, 0) ===
+      StreamExecutionMetadata("""{"batchWatermarkMs":1}"""))
+    assert(StreamExecutionMetadata(0, 2) ===
+      StreamExecutionMetadata("""{"batchTimestampMs":2}"""))
+    assert(StreamExecutionMetadata(1, 2) ===
+      StreamExecutionMetadata(
+        """{"batchWatermarkMs":1,"batchTimestampMs":2}"""))
+  }
+
+  test("metadata is recovered from log when query is restarted") {
+    import testImplicits._
+    val clock = new SystemClock()
+    val ms = new MemoryStream[Long](0, sqlContext)
+    val df = ms.toDF().toDF("a")
+    val checkpointLoc = newMetadataDir
+    val checkpointDir = new File(checkpointLoc, "complete")
+    checkpointDir.mkdirs()
+    assert(checkpointDir.exists())
+    val tableName = "test"
+    // Query that prunes timestamps less than current_timestamp, making
+    // it easy to use for ensuring that a batch is re-processed with the
+    // timestamp used when it was first processed.
+    def startQuery: StreamingQuery = {
--- End diff --

nit: functions that have side effects (like starting a thread) usually have `()` at the end and are called with the `()`, for example `val q = startQuery()`
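The defaulting behavior that the `stream execution metadata` asserts above exercise, where a missing JSON field falls back to 0, can be sketched independently of Spark (Python for illustration; `parse_metadata` is a hypothetical stand-in for `StreamExecutionMetadata`'s JSON deserialization):

```python
import json

def parse_metadata(s):
    # absent fields default to 0, mirroring
    # StreamExecutionMetadata(0, 0) === StreamExecutionMetadata("{}")
    d = json.loads(s)
    return (d.get("batchWatermarkMs", 0), d.get("batchTimestampMs", 0))

assert parse_metadata('{}') == (0, 0)
assert parse_metadata('{"batchWatermarkMs":1}') == (1, 0)
assert parse_metadata('{"batchTimestampMs":2}') == (0, 2)
assert parse_metadata('{"batchWatermarkMs":1,"batchTimestampMs":2}') == (1, 2)
```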
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89739781

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala ---
@@ -235,4 +239,85 @@ class StreamingAggregationSuite extends StreamTest with BeforeAndAfterAll {
       CheckLastBatch(("a", 30), ("b", 3), ("c", 1))
     )
   }
+
+  test("prune results by current_time, complete mode") {
+    import testImplicits._
+    import StreamingAggregationSuite._
+    clock = new StreamManualClock
+
+    val inputData = MemoryStream[Long]
+
+    val aggregated =
+      inputData.toDF()
+        .groupBy($"value")
+        .agg(count("*"))
+        .where('value >= current_timestamp().cast("long") - 10L)
+
+    testStream(aggregated, Complete)(
+      StartStream(ProcessingTime("10 seconds"), triggerClock = clock),
+
+      // advance clock to 10 seconds
+      AddData(inputData, 0L, 5L, 5L, 10L),
+      AdvanceManualClock(10 * 1000),
+      CheckLastBatch((0L, 1), (5L, 2), (10L, 1)),
+
+      // advance clock to 20 seconds, should retain keys >= 10
+      AddData(inputData, 15L, 15L, 20L),
+      AdvanceManualClock(10 * 1000),
+      CheckLastBatch((10L, 1), (15L, 2), (20L, 1)),
+
+      // advance clock to 30 seconds, should retain keys >= 20
+      AddData(inputData, 0L),
+      AdvanceManualClock(10 * 1000),
+      CheckLastBatch((20L, 1)),
+
+      // advance clock to 40 seconds, should retain keys >= 30
+      AddData(inputData, 25L, 30L, 40L, 45L),
+      AdvanceManualClock(10 * 1000),
+      CheckLastBatch((30L, 1), (40L, 1), (45L, 1))
+    )
+  }
+
+
+  test("prune results by current_date, complete mode") {
+    import testImplicits._
+    import StreamingAggregationSuite._
+    clock = new StreamManualClock
+    val tz = TimeZone.getDefault.getID
+    val inputData = MemoryStream[Long]
+    val aggregated =
+      inputData.toDF()
+        .select(to_utc_timestamp(from_unixtime('value * DateTimeUtils.SECONDS_PER_DAY), tz))
+        .toDF("value")
+        .groupBy($"value")
+        .agg(count("*"))
+        // .select('value, date_sub(current_date(), 10).cast("timestamp").alias("t"))
+        // .select('value, 't, 'value >= 't)
--- End diff --

please remove these lines
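The retention predicate `'value >= current_timestamp().cast("long") - 10L` keeps only keys from the last 10 seconds of trigger-clock time. A minimal model of the batches this test expects (Python for illustration; `retained` is a hypothetical helper, not part of Spark):

```python
def retained(keys, now_s, horizon_s=10):
    # models .where('value >= current_timestamp().cast("long") - 10L):
    # keep keys no older than `horizon_s` seconds before the trigger clock
    return sorted(k for k in keys if k >= now_s - horizon_s)

# the manual clock advances 10 s before each batch, as in the test
assert retained([0, 5, 5, 10], 10) == [0, 5, 5, 10]              # all retained
assert retained([0, 5, 10, 15, 15, 20], 20) == [10, 15, 15, 20]  # keys >= 10
assert retained([0, 5, 10, 15, 20], 30) == [20]                  # keys >= 20
```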
[GitHub] spark pull request #15949: [SPARK-18339] [SPARK-18513] [SQL] Don't push down...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15949#discussion_r89745152

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamExecutionMetadataSuite.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.io.File
+
+import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.execution.streaming.{MemoryStream, StreamExecutionMetadata}
+import org.apache.spark.sql.functions._
+import org.apache.spark.util.{SystemClock, Utils}
+
+class StreamExecutionMetadataSuite extends StreamTest {
+
+  private def newMetadataDir =
+    Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
+
+  test("stream execution metadata") {
+    assert(StreamExecutionMetadata(0, 0) ===
+      StreamExecutionMetadata("""{}"""))
+    assert(StreamExecutionMetadata(1, 0) ===
+      StreamExecutionMetadata("""{"batchWatermarkMs":1}"""))
+    assert(StreamExecutionMetadata(0, 2) ===
+      StreamExecutionMetadata("""{"batchTimestampMs":2}"""))
+    assert(StreamExecutionMetadata(1, 2) ===
+      StreamExecutionMetadata(
+        """{"batchWatermarkMs":1,"batchTimestampMs":2}"""))
+  }
+
+  test("metadata is recovered from log when query is restarted") {
+    import testImplicits._
+    val clock = new SystemClock()
+    val ms = new MemoryStream[Long](0, sqlContext)
+    val df = ms.toDF().toDF("a")
+    val checkpointLoc = newMetadataDir
+    val checkpointDir = new File(checkpointLoc, "complete")
+    checkpointDir.mkdirs()
+    assert(checkpointDir.exists())
+    val tableName = "test"
+    // Query that prunes timestamps less than current_timestamp, making
+    // it easy to use for ensuring that a batch is re-processed with the
+    // timestamp used when it was first processed.
+    def startQuery: StreamingQuery = {
+      df.groupBy("a")
+        .count()
+        .where('a >= current_timestamp().cast("long"))
+        .writeStream
+        .format("memory")
+        .queryName(tableName)
+        .option("checkpointLocation", checkpointLoc)
+        .outputMode("complete")
+        .start()
+    }
+    // no exception here
+    val t1 = clock.getTimeMillis() + 60L * 1000L
+    val t2 = clock.getTimeMillis() + 60L * 1000L + 1000L
+    val q = startQuery
+    ms.addData(t1, t2)
+    q.processAllAvailable()
+
+    checkAnswer(
+      spark.table(tableName),
+      Seq(Row(t1, 1), Row(t2, 1))
+    )
+
+    q.stop()
+    Thread.sleep(60L * 1000L + 5000L) // Expire t1 and t2
+    assert(t1 < clock.getTimeMillis())
+    assert(t2 < clock.getTimeMillis())
+
+    spark.sql(s"drop table $tableName")
+
+    // verify table is dropped
+    intercept[AnalysisException](spark.table(tableName).collect())
+    val q2 = startQuery
+    q2.processAllAvailable()
+    checkAnswer(
+      spark.table(tableName),
+      Seq(Row(t1, 1), Row(t2, 1))
+    )
+
+    q2.stop()
+
--- End diff --

nit: extra line.
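The recovery guarantee this test checks is that a restarted query re-processes a batch with the timestamp recorded when the batch first ran, not the current wall clock, so `t1` and `t2` still pass the `current_timestamp` filter after they have expired in real time. A toy model of that "first writer wins" log (Python for illustration; `MetadataLog` here is a hypothetical stand-in, not Spark's checkpoint API):

```python
class MetadataLog:
    """Toy stand-in for the checkpoint metadata log (hypothetical)."""

    def __init__(self):
        self._by_batch = {}

    def record(self, batch_id, now_ms):
        # first writer wins: a re-run of the batch after a restart gets the
        # timestamp recorded when the batch was first processed
        return self._by_batch.setdefault(batch_id, now_ms)

log = MetadataLog()
first = log.record(0, 1_000)    # batch 0 first runs at t = 1000 ms
rerun = log.record(0, 99_000)   # after restart the wall clock has moved on
assert first == 1_000 and rerun == 1_000
```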
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #69230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69230/consoleFull)** for PR 16030 at commit [`6bd8b4c`](https://github.com/apache/spark/commit/6bd8b4cdb63b20bc292a5ec1d8ca38281ee5bfbf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69230/ Test PASSed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Merged build finished. Test PASSed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16030 @brkyvz @tdas Could you check this?
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994 **[Test build #69231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69231/consoleFull)** for PR 15994 at commit [`662acfb`](https://github.com/apache/spark/commit/662acfb9ab046842f0fbe2f9344dd3c0df12ad7a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69231/ Test PASSed.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Merged build finished. Test PASSed.
[GitHub] spark issue #16027: [SPARK-18604][SQL] Make sure CollapseWindow returns the ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16027 Merging to master/2.1. Thanks for the review!
[GitHub] spark pull request #16027: [SPARK-18604][SQL] Make sure CollapseWindow retur...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16027
[GitHub] spark pull request #16031: [SPARK-18606][HISTORYSERVER]remove useless elemen...
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/16031#discussion_r89758734

--- Diff: core/src/main/resources/org/apache/spark/ui/static/historypage.js ---
@@ -78,6 +78,12 @@ jQuery.extend( jQuery.fn.dataTableExt.oSort, {
     }
 } );

+jQuery.extend( jQuery.fn.dataTableExt.ofnSearch, {
+    "appid-numeric": function ( a ) {
+        return a.replace(/[\r\n]/g, " ").replace(/<.*?>/g, "");
--- End diff --

Refer to `jquery.dataTables.1.10.4.min.js`. I'd like to change it to a better style if there's one :)
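The two regex passes in the `appid-numeric` search filter fold newlines into spaces and then strip HTML tags, so a search matches the visible application id rather than the anchor markup around it. The same transformation, sketched in Python for illustration (`searchable_text` and the sample markup are hypothetical):

```python
import re

def searchable_text(cell_html):
    # same two passes as the "appid-numeric" filter:
    # fold newlines to spaces, then strip HTML tags non-greedily
    return re.sub(r"<.*?>", "", re.sub(r"[\r\n]", " ", cell_html))

assert searchable_text('<a href="/history/app-1">app-1</a>') == "app-1"
assert searchable_text("line1\nline2") == "line1 line2"
```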
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15976 **[Test build #69234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69234/consoleFull)** for PR 15976 at commit [`6db5af9`](https://github.com/apache/spark/commit/6db5af95e456d6529a37c243f41a4632a69f40d0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15976 Merged build finished. Test PASSed.
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15976 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69234/ Test PASSed.
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15780 **[Test build #69232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69232/consoleFull)** for PR 15780 at commit [`2a1287a`](https://github.com/apache/spark/commit/2a1287a84cb303a8df9f8c310aad154e04b6b4d4). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class LambdaVariable(value: String, isNull: String, dataType: DataType,`
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15780 Merged build finished. Test PASSed.
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15780 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69232/ Test PASSed.
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14136 **[Test build #69233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69233/consoleFull)** for PR 14136 at commit [`3c699ad`](https://github.com/apache/spark/commit/3c699adfee609781c1e4ce2c08493308f5e7f511). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14136 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69233/ Test PASSed.
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14136 Merged build finished. Test PASSed.
[GitHub] spark pull request #16033: SPARK-18607 get a result on a percent of the task...
GitHub user Ru-Xiang opened a pull request: https://github.com/apache/spark/pull/16033

SPARK-18607 get a result once a percentage of the tasks succeed

## What changes were proposed in this pull request?

This patch modifies the code around runApproximateJob so that a result can be returned once a specified percentage of tasks has succeeded. In a production environment, the "long tail" is a common and urgent problem; in practice, as long as results from a specified percentage of tasks are available, the final result can be guaranteed to an acceptable accuracy. This is a common requirement when running machine learning algorithms.

## How was this patch tested?

We compiled the code with dev/make-distribution.sh, deployed it on a cluster, ran a test reduce job, and obtained the desired results.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Ru-Xiang/spark my_change

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16033.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16033
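The idea behind returning early, extrapolating a full result from the fraction of tasks that finished, can be sketched as follows (Python for illustration; this is a toy model of approximate evaluation under the assumption that stragglers look like completed partitions on average, not Spark's actual `runApproximateJob` API):

```python
def approx_sum(finished_partition_sums, completed_fraction):
    # extrapolate the full sum from the partitions that finished, assuming
    # the straggling partitions resemble the finished ones on average
    assert 0 < completed_fraction <= 1
    return sum(finished_partition_sums) / completed_fraction

# 8 of 10 partitions finished, each summing to 5: estimate the full total
assert approx_sum([5] * 8, 0.8) == 50.0
```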
[GitHub] spark issue #16033: SPARK-18607 get a result on a percent of the tasks succe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16033 Can one of the admins verify this patch?