[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20625 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20626 Merged build finished. Test PASSed.
[GitHub] spark pull request #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20511#discussion_r168691737 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala --- @@ -160,6 +160,15 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll { } } } + + test("SPARK-23340 Empty float/double array columns raise EOFException") { +Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF()).foreach { df => + withTempPath { path => --- End diff -- I already added the test case at [HiveOrcQuerySuite.scala](https://github.com/apache/spark/pull/20511/files#diff-1569b2874975978ed62a01aab108d093R212), too.
[GitHub] spark pull request #20626: [SPARK-23447][SQL] Cleanup codegen template for L...
GitHub user rednaxelafx opened a pull request: https://github.com/apache/spark/pull/20626 [SPARK-23447][SQL] Cleanup codegen template for Literal

## What changes were proposed in this pull request?

Cleaned up the codegen templates for `Literal`s, to make sure that the `ExprCode` returned from `Literal.doGenCode()` has:

1. an empty `code` field;
2. an `isNull` field of either literal `true` or `false`;
3. a `value` field that is just a simple literal/constant.

Before this PR, there were a couple of paths that would return a non-trivial `code`, and all of them are actually unnecessary. The `NaN` and `Infinity` constants for `double` and `float` can be accessed through directly available constants, so there's no need to add a reference for them.

Also took the opportunity to add a new util method for ease of creating `ExprCode` for inline-able non-null values.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rednaxelafx/apache-spark codegen-literal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20626.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20626

commit 68edf0f3463daed3bb7042becb333788b22b23b0
Author: Kris Mok
Date: 2018-02-16T07:44:43Z

Cleanup codegen templates for Literals: make sure the `code` field is empty and the `value` field is a simple literal.
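To make the three invariants concrete, here is a minimal Python sketch that models an `ExprCode`-like triple and renders a `double` literal as Java source, using `java.lang.Double` constants for NaN and the infinities instead of a generated reference. The `ExprCode` dataclass, `inline_non_null`, `double_literal`, and `literal_gen_code` are hypothetical illustrations, not Spark's Scala API, and the null placeholder value is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExprCode:
    """Hypothetical stand-in for Spark's codegen result triple."""
    code: str      # statements to run before reading `value`; empty for literals
    is_null: str   # Java expression for nullness; literal "true"/"false" here
    value: str     # Java expression for the value itself


def double_literal(x: float) -> str:
    """Render a Python float as a Java double literal expression."""
    if math.isnan(x):
        return "Double.NaN"
    if math.isinf(x):
        return "Double.POSITIVE_INFINITY" if x > 0 else "Double.NEGATIVE_INFINITY"
    # repr() round-trips the float exactly; the D suffix marks a double literal.
    return repr(x) + "D"


def inline_non_null(value: str) -> ExprCode:
    """Hypothetical helper: ExprCode for an inline-able, non-null constant."""
    return ExprCode(code="", is_null="false", value=value)


def literal_gen_code(x):
    """ExprCode for a nullable double literal (None stands for SQL NULL)."""
    if x is None:
        # Assumed placeholder default; never read because is_null is "true".
        return ExprCode(code="", is_null="true", value="-1.0")
    return inline_non_null(double_literal(x))
```

Every branch returns an empty `code` field and a constant `is_null`, so a caller can inline the value directly without emitting any setup statements.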
[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20625 **[Test build #87502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87502/testReport)** for PR 20625 at commit [`c79c6df`](https://github.com/apache/spark/commit/c79c6df7284b9717fe4e4c26090dcb51bf7712da).
[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 I just opened https://github.com/apache/spark/pull/20625. I believe this is the smallest and simplest change. I will turn this PR into one that adds a configuration later.
[GitHub] spark pull request #20625: [SPARK-23446][PYTHON] Explicitly check supported ...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/20625 [SPARK-23446][PYTHON] Explicitly check supported types in toPandas

## What changes were proposed in this pull request?

This PR explicitly specifies the types we support in `toPandas`. This was a hole. For example, we haven't finished binary type support on the Python side yet, but it is currently allowed, as below:

```python
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
df = spark.createDataFrame([[bytearray("a")]])
df.toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
df.toPandas()
```

```
     _1
0  [97]
  _1
0  a
```

This should be disallowed. I think the same applies to nested timestamps.

I also added a nicer message about `spark.sql.execution.arrow.enabled` to the error message.

## How was this patch tested?

Manually tested, and tests added in `python/pyspark/sql/tests.py`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark pandas_convertion_supported_type

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20625.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20625

commit c79c6df7284b9717fe4e4c26090dcb51bf7712da
Author: hyukjinkwon
Date: 2018-02-16T07:45:52Z

Explicitly specify supported types in toPandas
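A minimal sketch of an explicit allow-list check, written in plain Python so it runs without PySpark. Types are modeled as strings, with nested types as `("array", elem)` and `("struct", [(name, type), ...])` tuples. The `SUPPORTED_ATOMIC` set, `_check`, and `check_supported_for_pandas` are hypothetical; which types the PR actually accepts, and under which configuration, is not reproduced here.

```python
ARROW_CONF = "spark.sql.execution.arrow.enabled"

# Hypothetical allow-list of flat type names; not the PR's actual list.
SUPPORTED_ATOMIC = {"boolean", "byte", "short", "integer", "long",
                    "float", "double", "decimal", "string", "date", "timestamp"}


def _check(dtype, path, nested):
    """Recursively reject unsupported types; `path` names the offending field."""
    if isinstance(dtype, tuple) and dtype[0] == "array":
        _check(dtype[1], path + "[]", nested=True)
    elif isinstance(dtype, tuple) and dtype[0] == "struct":
        for name, field_type in dtype[1]:
            _check(field_type, f"{path}.{name}", nested=True)
    elif dtype not in SUPPORTED_ATOMIC:
        # Covers e.g. "binary" and any unknown nested kind such as maps.
        raise TypeError(f"Unsupported type in conversion to pandas: {dtype} at {path}")
    elif dtype == "timestamp" and nested:
        raise TypeError(f"Nested timestamps are not supported: {path}")


def check_supported_for_pandas(schema, arrow_enabled):
    """Fail fast, naming the Arrow config in the message, on unsupported columns."""
    for name, dtype in schema:
        try:
            _check(dtype, name, nested=False)
        except TypeError as e:
            raise TypeError(f"{e}. Note: {ARROW_CONF} is set to "
                            f"{str(arrow_enabled).lower()}.") from None
```

Checking the schema up front, before any data is converted, is what turns a silently different result (as in the two outputs above) into an explicit error.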
[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 Yup, I will. Sorry for the delay. I was trying to make the fix as small as possible. Let me just open it in the simplest way.
[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 @HyukjinKwon Will you submit a fix for the binary type today? We are very close to RC4. This is kind of urgent if we still want to block it in the Spark 2.3.0 release.
[GitHub] spark pull request #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20511#discussion_r168686683 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala --- @@ -160,6 +160,15 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll { } } } + + test("SPARK-23340 Empty float/double array columns raise EOFException") { +Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF()).foreach { df => + withTempPath { path => --- End diff -- Please also test both readers as we discussed above.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20295 Don't worry, I am keeping an eye on this, and I believe @ueshin is too.
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20568 Jenkins, retest this please.
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20568 Merged build finished. Test FAILed.
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20568 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87501/ Test FAILed.
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20568 **[Test build #87501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87501/testReport)** for PR 20568 at commit [`c20cd97`](https://github.com/apache/spark/commit/c20cd97d7ce5690993b4490bb7cca955e7703d90). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20621: [SPARK-23436][SQL] Infer partition as Date only i...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20621#discussion_r168680871 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -407,6 +407,29 @@ object PartitioningUtils { Literal(bigDecimal) } +val dateTry = Try { + // try and parse the date, if no exception occurs this is a candidate to be resolved as + // DateType + DateTimeUtils.getThreadLocalDateFormat.parse(raw) + // SPARK-23436: Casting the string to date may still return null if a bad Date is provided. + // We need to check that we can cast the raw string since we later can use Cast to get + // the partition values with the right DataType (see + // org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning) + val dateOption = Option(Cast(Literal(raw), DateType).eval()) --- End diff -- Can we add `require(dateOption.isDefined)` with some explicit comments?
[GitHub] spark pull request #20621: [SPARK-23436][SQL] Infer partition as Date only i...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20621#discussion_r168680397 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -407,6 +407,29 @@ object PartitioningUtils { Literal(bigDecimal) } +val dateTry = Try { + // try and parse the date, if no exception occurs this is a candidate to be resolved as + // DateType + DateTimeUtils.getThreadLocalDateFormat.parse(raw) --- End diff -- Ah, so the root cause is more specific to `SimpleDateFormat` because it allows invalid dates like `2018-01-01-04` to be parsed fine ..
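`java.text.DateFormat.parse(String)` parses from the beginning of the string and may not consume all of it, which is how a value like `2018-01-01-04` gets through. The Python sketch below contrasts a prefix-only ("lenient") parse with a full-match, cast-like check, and infers a date only when both succeed, mirroring the parse-then-cast structure in the diff. The regex and the names `lenient_parse`, `strict_cast`, and `infer_partition_type` are illustrative assumptions, not Spark's partition-inference code; unlike a lenient `SimpleDateFormat`, this sketch does not roll over out-of-range fields such as month 13.

```python
import re
from datetime import date

# Illustrative pattern: yyyy-[m]m-[d]d
_DATE = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")


def _to_date(m):
    """Build a date from a regex match, or None if the fields are out of range."""
    if m is None:
        return None
    try:
        return date(*map(int, m.groups()))
    except ValueError:  # e.g. month 13 or day 32
        return None


def lenient_parse(raw):
    """Prefix parse: succeeds if the string merely *starts* with a date."""
    return _to_date(_DATE.match(raw))


def strict_cast(raw):
    """Cast-like check: the entire string must be a valid date."""
    return _to_date(_DATE.fullmatch(raw))


def infer_partition_type(raw):
    """Infer 'date' only when both the parser and the cast-like check accept raw."""
    if lenient_parse(raw) is not None and strict_cast(raw) is not None:
        return "date"
    return "string"
```

With this split, `2018-01-01-04` passes the prefix parse but fails the full-match check, so it stays a string partition value instead of becoming a date that later casts to null.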
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20624 Merged build finished. Test PASSed.
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20624 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87500/ Test PASSed.
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20624 **[Test build #87500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87500/testReport)** for PR 20624 at commit [`cf36020`](https://github.com/apache/spark/commit/cf3602075dcee35494c72975e361b739939079b4). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CatalogColumnStat(`
[GitHub] spark issue #20501: [SPARK-22430][Docs] Unknown tag warnings when building R...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20501 @rekhajoshm feel free to follow up after we are through with 2.3.0, thanks!
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20464 Sorry, I'm a bit occupied with testing 2.3 RC, will get back to this after.
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168670009 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -196,7 +208,13 @@ case class Asin(child: Expression) extends UnaryMathExpression(math.asin, "ASIN" // scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(expr) - Returns the inverse tangent (a.k.a. arctangent).", + usage = "_FUNC_(expr) - Returns the inverse tangent (a.k.a. arc tangent) of `expr`, " + +"as if computed by `java.lang.Math._FUNC_`.", + arguments = --- End diff -- Could we just save one line and stick to the same indentation?

```scala
  arguments = """
    Arguments:
      * expr - number whose arc tangent is to be returned.
  """,
  examples = """
  ...
```
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168670792 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -521,7 +554,13 @@ case class Signum(child: Expression) extends UnaryMathExpression(math.signum, "S case class Sin(child: Expression) extends UnaryMathExpression(math.sin, "SIN") @ExpressionDescription( - usage = "_FUNC_(expr) - Returns the hyperbolic sine of `expr`.", + usage = "_FUNC_(expr) - Returns hyperbolic sine of `expr`, " + --- End diff -- I think we can just do as below:

```scala
...
  usage = """
    _FUNC_(expr) - Returns hyperbolic sine of `expr`, as if computed by `java.lang.Math._FUNC_`.
  """,
...
```
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20622 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87499/ Test PASSed.
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20622 Merged build finished. Test PASSed.
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20622 **[Test build #87499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87499/testReport)** for PR 20622 at commit [`35f5b4a`](https://github.com/apache/spark/commit/35f5b4a495517d4f11998d6b7fb463851304712d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168669517 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1313,131 +1313,178 @@ object functions { // /** - * Computes the cosine inverse of the given value; the returned angle is in the range - * 0.0 through pi. + * @param e the value whose arc cosine is to be returned + * @return cosine inverse of the given value in the range of 0.0 through pi, + * as if computed by [[java.lang.Math#acos]] * * @group math_funcs * @since 1.4.0 */ def acos(e: Column): Column = withExpr { Acos(e.expr) } /** - * Computes the cosine inverse of the given column; the returned angle is in the range - * 0.0 through pi. + * @param colName the value whose arc cosine is to be returned + * @returncosine inverse of the given value in the range of 0.0 through pi, + *as if computed by [[java.lang.Math#acos]] * * @group math_funcs * @since 1.4.0 */ - def acos(columnName: String): Column = acos(Column(columnName)) + def acos(colName: String): Column = acos(Column(colName)) --- End diff -- I don't think we should change the name for that reason ..
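The new doc text promises a result in the range 0.0 through pi, "as if computed by `java.lang.Math#acos`". Python's `math.acos` raises `ValueError` outside [-1, 1], whereas `java.lang.Math.acos` returns NaN when the argument is NaN or its absolute value exceeds 1, so a Python sketch that mimics the Java contract needs an explicit guard. `java_acos` is a hypothetical name used only for illustration.

```python
import math


def java_acos(x: float) -> float:
    """Arc cosine following java.lang.Math.acos's special-case rules.

    Returns an angle in [0.0, pi] for x in [-1, 1]; returns NaN when x is NaN
    or |x| > 1 (plain math.acos would raise ValueError instead).
    """
    if math.isnan(x) or abs(x) > 1.0:
        return math.nan
    return math.acos(x)
```

Returning NaN rather than raising matches how a SQL expression evaluating row by row would typically want to handle out-of-domain input.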
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20568 **[Test build #87501 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87501/testReport)** for PR 20568 at commit [`c20cd97`](https://github.com/apache/spark/commit/c20cd97d7ce5690993b4490bb7cca955e7703d90).
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20568 Jenkins, retest this please.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Merged build finished. Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87498/ Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20554 **[Test build #87498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87498/testReport)** for PR 20554 at commit [`2dea08a`](https://github.com/apache/spark/commit/2dea08a4c5f85991e4ad4c7da886c2e0bf456bb8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20567: [SPARK-23380][PYTHON] Make toPandas fallback to n...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20567#discussion_r168665914 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1941,12 +1941,24 @@ def toPandas(self): timezone = None if self.sql_ctx.getConf("spark.sql.execution.arrow.enabled", "false").lower() == "true": +should_fallback = False try: -from pyspark.sql.types import _check_dataframe_convert_date, \ -_check_dataframe_localize_timestamps +from pyspark.sql.types import to_arrow_schema from pyspark.sql.utils import require_minimum_pyarrow_version -import pyarrow require_minimum_pyarrow_version() +# Check if its schema is convertible in Arrow format. +to_arrow_schema(self.schema) +except Exception as e: --- End diff -- Hm, it might depend on which message we want to show. Will open another PR as discussed above.
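The diff introduces a `should_fallback` flag around the Arrow schema check. A minimal, PySpark-free sketch of that control flow follows; `to_arrow_schema`, `arrow_to_pandas`, and `plain_to_pandas` are hypothetical callables injected by the caller, and the `fallback_enabled` switch is an assumption standing in for a configuration. Whether to warn and fall back or to re-raise is exactly the open question in the comment, so the policy here is only one option.

```python
import warnings


def to_pandas_with_fallback(schema, to_arrow_schema, arrow_to_pandas,
                            plain_to_pandas, fallback_enabled=True):
    """Try the Arrow path; fall back to the plain path if the schema is unsupported."""
    should_fallback = False
    try:
        # Expected to raise if any column type cannot be represented in Arrow.
        to_arrow_schema(schema)
    except Exception as e:
        if not fallback_enabled:
            raise
        warnings.warn(f"Arrow optimization disabled for this call: {e}; "
                      "falling back to the non-Arrow conversion.")
        should_fallback = True

    if should_fallback:
        return plain_to_pandas()
    return arrow_to_pandas()
```

Validating only the schema before converting any data keeps the fallback decision cheap and avoids a half-finished Arrow conversion.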
[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listerners f...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20619 Moving the registrations to the new (earlier) places looks good to me.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Merged build finished. Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87497/ Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20554 **[Test build #87497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87497/testReport)** for PR 20554 at commit [`7f5df22`](https://github.com/apache/spark/commit/7f5df222da2e6cf59ed632b1c05165f1035202f3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20604: [SPARK-23365][CORE] Do not adjust num executors when kil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20604 Merged build finished. Test PASSed.
[GitHub] spark issue #20604: [SPARK-23365][CORE] Do not adjust num executors when kil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20604 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87496/ Test PASSed.
[GitHub] spark issue #20604: [SPARK-23365][CORE] Do not adjust num executors when kil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20604 **[Test build #87496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87496/testReport)** for PR 20604 at commit [`4d0b52e`](https://github.com/apache/spark/commit/4d0b52edc89bf98e3dccf4e6b044712bc09547ef). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20624 **[Test build #87500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87500/testReport)** for PR 20624 at commit [`cf36020`](https://github.com/apache/spark/commit/cf3602075dcee35494c72975e361b739939079b4).
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20624 Merged build finished. Test PASSed.
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20624 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/929/ Test PASSed.
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20624 cc @gatorsmile @cloud-fan @marmbrus
[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/20624 [SPARK-23445] ColumnStat refactoring

## What changes were proposed in this pull request?

Refactor ColumnStat to be more flexible.

* Split `ColumnStat` and `CatalogColumnStat` just like `CatalogStatistics` is split from `Statistics`. This detaches how the statistics are stored from how they are processed in the query plan. `CatalogColumnStat` keeps `min` and `max` as `String`, so it does not depend on data type information.
* For `CatalogColumnStat`, parse column names from property names in the metastore (`KEY_VERSION` property), not from the metastore schema. This means that `CatalogColumnStat`s can be created for columns even if the schema itself is not stored in the metastore.
* Make all fields optional. `min`, `max` and `histogram` for columns were optional already. Having them all optional is more consistent, and gives flexibility to e.g. drop some of the fields through transformations if they are difficult or impossible to calculate.

The added flexibility will make it possible to have alternative implementations for stats, and separates stats collection from stats and estimation processing in plans.

## How was this patch tested?

Refactored existing tests to work with refactored `ColumnStat` and `CatalogColumnStat`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/juliuszsompolski/apache-spark SPARK-23445

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20624.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20624

commit cf3602075dcee35494c72975e361b739939079b4
Author: Juliusz Sompolski
Date: 2018-01-19T13:57:46Z

column stat refactoring
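A rough Python model of the catalog-side half of this split: a record whose fields are all optional and whose `min`/`max` stay as strings, plus a parser that discovers column names from property keys that end in a version key rather than from a schema. The key layout `spark.sql.statistics.colStats.<column>.<stat>`, the stat names, and all Python names here are assumptions for illustration, not a verified copy of the metastore format or of the Scala classes.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PREFIX = "spark.sql.statistics.colStats."  # assumed property-key prefix
KEY_VERSION = "version"                    # assumed per-column marker key


@dataclass
class CatalogColumnStat:
    """Catalog-side stats: every field optional; min/max kept as strings."""
    distinct_count: Optional[int] = None
    min: Optional[str] = None
    max: Optional[str] = None
    null_count: Optional[int] = None
    avg_len: Optional[int] = None
    max_len: Optional[int] = None
    version: Optional[int] = None


def parse_column_stats(props: Dict[str, str]) -> Dict[str, CatalogColumnStat]:
    """Find columns via '<prefix><col>.version' keys, without consulting a schema."""
    suffix = "." + KEY_VERSION
    # Slicing off prefix and suffix keeps column names that contain dots intact.
    columns = [k[len(PREFIX):-len(suffix)] for k in props
               if k.startswith(PREFIX) and k.endswith(suffix)]

    def get(col, stat, conv=str):
        raw = props.get(f"{PREFIX}{col}.{stat}")
        return None if raw is None else conv(raw)

    return {
        col: CatalogColumnStat(
            distinct_count=get(col, "distinctCount", int),
            min=get(col, "min"),
            max=get(col, "max"),
            null_count=get(col, "nullCount", int),
            avg_len=get(col, "avgLen", int),
            max_len=get(col, "maxLen", int),
            version=get(col, "version", int),
        )
        for col in columns
    }
```

Keeping `min`/`max` as strings means the catalog record needs no type information; converting them to typed values would be the job of the plan-side `ColumnStat`, once the column's data type is known.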
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20568 Retest this please
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20568 @mrkm4ntr Do not worry about these failures. There are some known unstable tests, and our community is trying to fix them. For now, we have to re-trigger the tests.
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/20568 I cannot reproduce this test failure in my environment. It seems to me that it is not related to this change...
[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Github user mrkm4ntr commented on a diff in the pull request: https://github.com/apache/spark/pull/20568#discussion_r168659153 --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java --- @@ -71,6 +73,20 @@ public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes, i return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytes2(Object base, long offset, int lengthInBytes, int seed) { +// This is compatible with original and another implementations. +// Use this method after 2.3.0. --- End diff -- Thanks, fixed it.
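To see why a second method is needed, here is a pure-Python sketch of MurmurHash3 x86_32 with two tail strategies. `murmur3_32` follows the reference algorithm, packing the 1 to 3 trailing bytes little-endian into a single block, while `legacy_tail_hash` mixes each trailing byte (sign-extended) as a full round, which is our reading of the pre-existing `hashUnsafeBytes` behavior that SPARK-23381 reports as incompatible. Both function names are hypothetical, body blocks are read little-endian as on x86, and results are unsigned rather than Java's signed `int`.

```python
C1, C2 = 0xCC9E2D51, 0x1B873593
MASK = 0xFFFFFFFF


def _rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1):
    k1 = (k1 * C1) & MASK
    k1 = _rotl(k1, 15)
    return (k1 * C2) & MASK


def _mix_h1(h1, k1):
    h1 ^= k1
    h1 = _rotl(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1, length):
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    return h1 ^ (h1 >> 16)


def _hash_blocks(data, seed):
    """Mix every full 4-byte block (read little-endian) into the running hash."""
    h1 = seed & MASK
    aligned = len(data) - len(data) % 4
    for i in range(0, aligned, 4):
        k1 = int.from_bytes(data[i:i + 4], "little")
        h1 = _mix_h1(h1, _mix_k1(k1))
    return h1, aligned


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Reference MurmurHash3_x86_32 tail handling; unsigned 32-bit result."""
    h1, aligned = _hash_blocks(data, seed)
    k1 = 0
    for shift, b in enumerate(data[aligned:]):
        k1 |= b << (8 * shift)
    h1 ^= _mix_k1(k1)  # _mix_k1(0) == 0, so an empty tail is a no-op
    return _fmix(h1, len(data))


def legacy_tail_hash(data: bytes, seed: int = 0) -> int:
    """Each tail byte is sign-extended and mixed as a full round (non-standard)."""
    h1, aligned = _hash_blocks(data, seed)
    for b in data[aligned:]:
        signed = b - 256 if b >= 128 else b
        h1 = _mix_h1(h1, _mix_k1(signed & MASK))
    return _fmix(h1, len(data))
```

The two agree whenever the length is a multiple of 4 and generally differ otherwise, which is why existing hash values (and anything persisted with them) cannot simply be switched over and a separate method is kept for compatibility.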
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20382 Hi @tdas, I'm on vacation this week, will update the code when I have time. Sorry for the delay.
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20622 **[Test build #87499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87499/testReport)** for PR 20622 at commit [`35f5b4a`](https://github.com/apache/spark/commit/35f5b4a495517d4f11998d6b7fb463851304712d).
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20554 LGTM again
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20622 Merged build finished. Test FAILed.
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20622 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87495/ Test FAILed.
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20622 **[Test build #87495 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87495/testReport)** for PR 20622 at commit [`3ad7b3f`](https://github.com/apache/spark/commit/3ad7b3f547dac787022262a2f55bc7a7a6c30cd7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/20424 @squito We made a few adjustments since your "lgtm". Do you want to take a quick look? @HyukjinKwon also gave his "lgtm" after the adjustments.
[GitHub] spark pull request #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20511#discussion_r168643357 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala --- @@ -160,6 +160,15 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll { } } } + + test("SPARK-23340 Empty float/double array columns raise EOFException") { +Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF()).foreach { df => + withTempPath { path => --- End diff -- Sure. This suite is in `sql/core` and inherits `OrcTest.scala`'s `val orcImp: String = "native"`.
[GitHub] spark issue #20623: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20623 Merged build finished. Test PASSed.
[GitHub] spark issue #20623: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20623 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87492/ Test PASSed.
[GitHub] spark issue #20623: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20623 **[Test build #87492 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87492/testReport)** for PR 20623 at commit [`f7a2282`](https://github.com/apache/spark/commit/f7a22827694a3aa92e8a7dd20195e2895e86880a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #8745: [SPARK-10589] [WEBUI] Add defense against external site f...
Github user alexmnyc commented on the issue: https://github.com/apache/spark/pull/8745 Now I am not able to embed it in my Grafana dashboard... That should be a configuration parameter.
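Browsers decide whether a page may be embedded in another site's frame from the `X-Frame-Options` response header. Our understanding, which this thread does not confirm, is that the change in PR 8745 sends `SAMEORIGIN` by default and switches to `ALLOW-FROM <uri>` when a `spark.ui.allowFramingFrom` setting is provided; check the Spark version in use before relying on that name. A minimal sketch of that selection logic, with a hypothetical class and method:

```java
import java.util.Optional;

// Hypothetical helper; mirrors the header selection we believe the Spark UI uses.
final class FramingHeaderSketch {
    // No allowed origin configured: only same-origin framing is permitted.
    // Allowed origin configured: name it explicitly with ALLOW-FROM.
    static String xFrameOptionsValue(Optional<String> allowFramingFrom) {
        return allowFramingFrom.map(uri -> "ALLOW-FROM " + uri).orElse("SAMEORIGIN");
    }

    public static void main(String[] args) {
        System.out.println(xFrameOptionsValue(Optional.empty()));
        System.out.println(xFrameOptionsValue(Optional.of("https://grafana.example.com")));
    }
}
```

One caveat on this design: `ALLOW-FROM` is obsolete and is ignored by current major browsers, which honour the Content-Security-Policy `frame-ancestors` directive instead. A dashboard embed may therefore still be blocked even when an allowed origin is configured.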
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20554 LGTM
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/20295 Resolved conflict and addressed @ueshin's comment. (Btw, I am fine with merging after Spark 2.3 RC passes, as that seems to be the priority now, just want to make sure this PR doesn't sit forever...)
[GitHub] spark pull request #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20511#discussion_r168638420 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala --- @@ -160,6 +160,15 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll { } } } + + test("SPARK-23340 Empty float/double array columns raise EOFException") { +Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF()).foreach { df => + withTempPath { path => --- End diff -- Are we testing the native readers?
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Merged build finished. Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20554 **[Test build #87498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87498/testReport)** for PR 20554 at commit [`2dea08a`](https://github.com/apache/spark/commit/2dea08a4c5f85991e4ad4c7da886c2e0bf456bb8).
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/928/ Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20382 @jerryshao any updates?
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20554 **[Test build #87497 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87497/testReport)** for PR 20554 at commit [`7f5df22`](https://github.com/apache/spark/commit/7f5df222da2e6cf59ed632b1c05165f1035202f3).
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Merged build finished. Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/927/ Test PASSed.
[GitHub] spark issue #20604: [SPARK-23365][CORE] Do not adjust num executors when kil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20604 Merged build finished. Test PASSed.
[GitHub] spark issue #20604: [SPARK-23365][CORE] Do not adjust num executors when kil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20604 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/926/ Test PASSed.
[GitHub] spark issue #20604: [WIP][SPARK-23365][CORE] Do not adjust num executors whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20604 **[Test build #87496 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87496/testReport)** for PR 20604 at commit [`4d0b52e`](https://github.com/apache/spark/commit/4d0b52edc89bf98e3dccf4e6b044712bc09547ef).
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87490/ Test PASSed.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Merged build finished. Test PASSed.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20295 **[Test build #87490 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87490/testReport)** for PR 20295 at commit [`9ed3779`](https://github.com/apache/spark/commit/9ed3779b665c90e5bb25bc6636997a4b080c3d34). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20057 Merged build finished. Test FAILed.
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20057 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87493/ Test FAILed.
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20057 **[Test build #87493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87493/testReport)** for PR 20057 at commit [`6c0d3df`](https://github.com/apache/spark/commit/6c0d3dfd415e5630dbb02ce65c6adf3db419bdec). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20554#discussion_r168626487 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala --- @@ -112,14 +112,18 @@ abstract class KafkaSourceTest extends StreamTest with SharedSQLContext { query.nonEmpty, "Cannot add data when there is no query for finding the active kafka source") - val sources = query.get.logicalPlan.collect { -case StreamingExecutionRelation(source: KafkaSource, _) => source - } ++ (query.get.lastExecution match { -case null => Seq() -case e => e.logical.collect { - case DataSourceV2Relation(_, reader: KafkaContinuousReader) => reader -} - }) + val sources = { +query.get.logicalPlan.collect { + case StreamingExecutionRelation(source: KafkaSource, _) => source + case StreamingExecutionRelation(source: KafkaMicroBatchReader, _) => source +} ++ (query.get.lastExecution match { + case null => Seq() + case e => e.logical.collect { +case DataSourceV2Relation(_, reader: KafkaContinuousReader) => reader + } +}) + }.distinct --- End diff -- yes.
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20622 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87491/ Test FAILed.
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20622 Merged build finished. Test FAILed.
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20622 **[Test build #87491 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87491/testReport)** for PR 20622 at commit [`3d8acd2`](https://github.com/apache/spark/commit/3d8acd2974d11a790ab9cd9338673bba18d683ac). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20554#discussion_r168625863 --- Diff: external/kafka-0-10-sql/src/test/resources/kafka-source-initial-offset-future-version.bin --- @@ -0,0 +1,2 @@ +0v9 +{"kafka-initial-offset-future-version":{"2":2,"1":1,"0":0}} --- End diff -- done
[GitHub] spark pull request #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20554#discussion_r168625742 --- Diff: external/kafka-0-10-sql/src/test/resources/kafka-source-initial-offset-version-2.1.0.bin --- @@ -1 +1 @@ -2{"kafka-initial-offset-2-1-0":{"2":0,"1":0,"0":0}} \ No newline at end of file +2{"kafka-initial-offset-2-1-0":{"2":2,"1":1,"0":0}} --- End diff -- I modified the file to make the test "deserialization of initial offset written by Spark 2.1.0" stronger. See the updated test. The way it works now is that we start the query from the earliest offset, and simultaneously have initial offsets that are NOT at offset 0. We then check that the query starts reading from the offsets given in the initial offsets, not from the earliest offsets available in the topic. Hence I am changing the file a little: the values, not the format.
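The strengthened test relies on one precedence rule: offsets already recorded in the initial-offset file win over the configured starting position ("earliest"), which is only consulted when nothing was recorded. A minimal sketch of that rule over partition-to-offset maps, using the same values as the updated resource file; the class and method names are hypothetical and this is not `KafkaSource`'s code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the precedence rule; maps are partition -> offset.
final class InitialOffsetSketch {
    // Recorded initial offsets take precedence; the configured start position
    // is used only when no initial offsets were recorded.
    static Map<Integer, Long> resolveStartOffsets(Map<Integer, Long> configuredStart,
                                                  Map<Integer, Long> recordedInitial) {
        Map<Integer, Long> chosen =
            (recordedInitial == null || recordedInitial.isEmpty()) ? configuredStart : recordedInitial;
        return new TreeMap<>(chosen); // sorted by partition for readable output
    }

    public static void main(String[] args) {
        Map<Integer, Long> earliest = Map.of(0, 0L, 1, 0L, 2, 0L);
        // Same values as the updated resource: {"2":2,"1":1,"0":0}
        Map<Integer, Long> recorded = Map.of(2, 2L, 1, 1L, 0, 0L);
        System.out.println(resolveStartOffsets(earliest, recorded)); // {0=0, 1=1, 2=2}
        System.out.println(resolveStartOffsets(earliest, Map.of())); // {0=0, 1=0, 2=0}
    }
}
```

With the old resource values (all offsets at 0), both branches would produce the same map, so the test could not tell them apart. Non-zero recorded offsets make a regression to "earliest" observable.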
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20622 @zsxwing pointed out that the original behavior was more subtly wrong than I expected. What we want to do is cancel the Spark job, and then cleanly restart it from the last checkpoint. But in fact, this was not working, since cancelling a Spark job throws an opaque SparkException which we didn't anticipate. The reason things seemed to work was that the interrupt() call would almost always (but was not guaranteed to) interrupt the job cancellation, thus preventing the SparkException. So I've updated the PR to anticipate that SparkException, and filed SPARK-23444 to ask for a better handle for job cancellations. Note that the continuous processing reconfiguration tests will always deterministically fail if they don't properly catch this exception, so the checking logic isn't really fragile despite being weird.
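The approach described here is to treat the exception surfaced by a cancelled job as expected only when we requested the cancellation ourselves, then restart from the last checkpoint, and to propagate every other failure. The generic sketch below shows that pattern. It is not Spark's continuous-execution code: `JobFailedException` stands in for the opaque `SparkException`, and the flag, class, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Generic sketch of "anticipate the cancellation exception"; names are hypothetical.
final class ReconfigurationSketch {
    // Stand-in for the opaque exception a cancelled job surfaces.
    static final class JobFailedException extends RuntimeException {
        JobFailedException(String message) { super(message); }
    }

    private volatile boolean reconfiguring = false;
    final List<Long> startedFrom = new ArrayList<>();

    // Set before cancelling the running job, so its failure can be recognised.
    void requestReconfiguration() { reconfiguring = true; }

    // Runs the job from the checkpoint until it completes. A failure counts as an
    // expected cancellation only if a reconfiguration was requested; otherwise it is rethrown.
    void run(long lastCheckpoint, LongConsumer job) {
        while (true) {
            startedFrom.add(lastCheckpoint);
            try {
                job.accept(lastCheckpoint);
                return;
            } catch (JobFailedException e) {
                if (!reconfiguring) {
                    throw e; // a genuine failure, not our own cancellation
                }
                reconfiguring = false; // expected: loop and restart from the checkpoint
            }
        }
    }

    public static void main(String[] args) {
        ReconfigurationSketch engine = new ReconfigurationSketch();
        int[] attempts = {0};
        engine.run(7L, from -> {
            if (attempts[0]++ == 0) {           // first attempt is cancelled mid-flight
                engine.requestReconfiguration();
                throw new JobFailedException("Job cancelled");
            }
        });
        System.out.println(engine.startedFrom); // [7, 7]
    }
}
```

Keying the decision on an explicit flag rather than on the exception's message keeps the check deterministic. This matches the comment's point that a missed exception makes the reconfiguration tests fail every time rather than intermittently.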
[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20622 **[Test build #87495 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87495/testReport)** for PR 20622 at commit [`3ad7b3f`](https://github.com/apache/spark/commit/3ad7b3f547dac787022262a2f55bc7a7a6c30cd7).
[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20621 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87488/ Test PASSed.
[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20621 Merged build finished. Test PASSed.
[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20621 **[Test build #87488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87488/testReport)** for PR 20621 at commit [`6b56408`](https://github.com/apache/spark/commit/6b5640833a2d45986a0cf6074d7211a8ba9d2b3e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87494/ Test PASSed.
[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20620 Merged build finished. Test PASSed.
[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20620 **[Test build #87494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87494/testReport)** for PR 20620 at commit [`bd46d1c`](https://github.com/apache/spark/commit/bd46d1cb63e7a04e0236f7b1bf70b46fb55f3ea4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20511 Thank you, all. It's now ready for review again, targeting Apache Spark 2.4.
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20601 Merged build finished. Test FAILed.
[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87483/ Test PASSed.
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20601 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87487/ Test FAILed.
[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Merged build finished. Test PASSed.