[spark] branch branch-3.4 updated: [SPARK-42401][SQL] Set `containsNull` correctly in the data type for array_insert/array_append
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new a71aed977f5 [SPARK-42401][SQL] Set `containsNull` correctly in the data type for array_insert/array_append
a71aed977f5 is described below

commit a71aed977f5006ad271f947a6ae3cdd38349ed8e
Author: Bruce Robbins
AuthorDate: Mon Feb 13 15:49:46 2023 +0900

    [SPARK-42401][SQL] Set `containsNull` correctly in the data type for array_insert/array_append

    ### What changes were proposed in this pull request?

    In the `DataType` instance returned by `ArrayInsert#dataType` and `ArrayAppend#dataType`, set `containsNull` to true if either

    - the input array has `containsNull` set to true, or
    - the expression to be inserted/appended is nullable.

    ### Why are the changes needed?

    The following two queries return the wrong answer:
    ```
    spark-sql> select array_insert(array(1, 2, 3, 4), 5, cast(null as int));
    [1,2,3,4,0] <== should be [1,2,3,4,null]
    Time taken: 3.879 seconds, Fetched 1 row(s)
    spark-sql> select array_append(array(1, 2, 3, 4), cast(null as int));
    [1,2,3,4,0] <== should be [1,2,3,4,null]
    Time taken: 0.068 seconds, Fetched 1 row(s)
    spark-sql>
    ```
    The following two queries throw a `NullPointerException`:
    ```
    spark-sql> select array_insert(array('1', '2', '3', '4'), 5, cast(null as string));
    23/02/10 11:24:59 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
    java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    ...
    spark-sql> select array_append(array('1', '2', '3', '4'), cast(null as string));
    23/02/10 11:25:10 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
    java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    ...
    spark-sql>
    ```
    The bug arises because both `ArrayInsert` and `ArrayAppend` use the first child's data type as the function's data type. That is, each uses the first child's `containsNull` setting, regardless of whether the insert/append operation might produce an array containing a null value.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    New unit tests.

    Closes #39970 from bersprockets/array_insert_anomaly.

    Authored-by: Bruce Robbins
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 718b6b7ed8277d5f6577367ab0d49f60f9777df7)
    Signed-off-by: Hyukjin Kwon
---
 .../sql/catalyst/expressions/collectionOperations.scala   |  4 ++--
 .../catalyst/expressions/CollectionExpressionsSuite.scala | 14 ++
 .../org/apache/spark/sql/DataFrameFunctionsSuite.scala    | 14 ++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 92a3127d438..53d8ff160c0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -4840,7 +4840,7 @@ case class ArrayInsert(srcArrayExpr: Expression, posExpr: Expression, itemExpr:
   override def third: Expression = itemExpr
   override def prettyName: String = "array_insert"
-  override def dataType: DataType = first.dataType
+  override def dataType: DataType = if (third.nullable) first.dataType.asNullable else first.dataType
   override def nullable: Boolean = first.nullable | second.nullable
   @transient private lazy val elementType: DataType =
@@ -5024,7 +5024,7 @@ case class ArrayAppend(left: Expression, right: Expression)
    * Returns the [[DataType]] of the result of evaluating this expression. It is invalid to query
    * the dataType of an unresolved expression (i.e., when `resolved` == false).
    */
-  override def dataType: DataType = left.dataType
+  override def dataType: DataType = if (right.nullable) left.dataType.asNullable else left.dataType
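The nullability rule the patch above introduces can be modeled without Spark. The sketch below is a minimal, Spark-free stand-in: `ArrayType` and `as_nullable` here are simplified analogues of the Catalyst `ArrayType`/`DataType#asNullable`, not Spark's actual API.

```python
# Minimal sketch of the containsNull propagation rule from SPARK-42401.
# ArrayType here is a toy stand-in for Catalyst's ArrayType.
from dataclasses import dataclass

@dataclass(frozen=True)
class ArrayType:
    element_type: str
    contains_null: bool

    def as_nullable(self) -> "ArrayType":
        # Mirrors the idea of DataType#asNullable: the result may hold nulls.
        return ArrayType(self.element_type, True)

def array_append_result_type(array_type: ArrayType, item_nullable: bool) -> ArrayType:
    # Before the fix: the first child's type was returned unchanged, so a
    # nullable item could be written into an array declared null-free.
    # After the fix: a nullable item forces containsNull = true on the result.
    return array_type.as_nullable() if item_nullable else array_type

src = ArrayType("int", contains_null=False)
assert array_append_result_type(src, item_nullable=True).contains_null is True
assert array_append_result_type(src, item_nullable=False).contains_null is False
```

Appending `cast(null as int)` corresponds to `item_nullable=True`, which is exactly the case where the old behavior produced a type that could not represent the null element.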
[spark] branch master updated: [SPARK-42401][SQL] Set `containsNull` correctly in the data type for array_insert/array_append
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 718b6b7ed82 [SPARK-42401][SQL] Set `containsNull` correctly in the data type for array_insert/array_append
718b6b7ed82 is described below

commit 718b6b7ed8277d5f6577367ab0d49f60f9777df7
Author: Bruce Robbins
AuthorDate: Mon Feb 13 15:49:46 2023 +0900

    [SPARK-42401][SQL] Set `containsNull` correctly in the data type for array_insert/array_append

    ### What changes were proposed in this pull request?

    In the `DataType` instance returned by `ArrayInsert#dataType` and `ArrayAppend#dataType`, set `containsNull` to true if either

    - the input array has `containsNull` set to true, or
    - the expression to be inserted/appended is nullable.

    ### Why are the changes needed?

    The following two queries return the wrong answer:
    ```
    spark-sql> select array_insert(array(1, 2, 3, 4), 5, cast(null as int));
    [1,2,3,4,0] <== should be [1,2,3,4,null]
    Time taken: 3.879 seconds, Fetched 1 row(s)
    spark-sql> select array_append(array(1, 2, 3, 4), cast(null as int));
    [1,2,3,4,0] <== should be [1,2,3,4,null]
    Time taken: 0.068 seconds, Fetched 1 row(s)
    spark-sql>
    ```
    The following two queries throw a `NullPointerException`:
    ```
    spark-sql> select array_insert(array('1', '2', '3', '4'), 5, cast(null as string));
    23/02/10 11:24:59 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
    java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    ...
    spark-sql> select array_append(array('1', '2', '3', '4'), cast(null as string));
    23/02/10 11:25:10 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
    java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    ...
    spark-sql>
    ```
    The bug arises because both `ArrayInsert` and `ArrayAppend` use the first child's data type as the function's data type. That is, each uses the first child's `containsNull` setting, regardless of whether the insert/append operation might produce an array containing a null value.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    New unit tests.

    Closes #39970 from bersprockets/array_insert_anomaly.

    Authored-by: Bruce Robbins
    Signed-off-by: Hyukjin Kwon
---
 .../sql/catalyst/expressions/collectionOperations.scala   |  4 ++--
 .../catalyst/expressions/CollectionExpressionsSuite.scala | 14 ++
 .../org/apache/spark/sql/DataFrameFunctionsSuite.scala    | 14 ++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 92a3127d438..53d8ff160c0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -4840,7 +4840,7 @@ case class ArrayInsert(srcArrayExpr: Expression, posExpr: Expression, itemExpr:
   override def third: Expression = itemExpr
   override def prettyName: String = "array_insert"
-  override def dataType: DataType = first.dataType
+  override def dataType: DataType = if (third.nullable) first.dataType.asNullable else first.dataType
   override def nullable: Boolean = first.nullable | second.nullable
   @transient private lazy val elementType: DataType =
@@ -5024,7 +5024,7 @@ case class ArrayAppend(left: Expression, right: Expression)
    * Returns the [[DataType]] of the result of evaluating this expression. It is invalid to query
    * the dataType of an unresolved expression (i.e., when `resolved` == false).
    */
-  override def dataType: DataType = left.dataType
+  override def dataType: DataType = if (right.nullable) left.dataType.asNullable else left.dataType
   protected def withNewChildrenInternal(newLeft: Expression, newRight: Expression): ArrayAppend =
[spark] branch branch-3.4 updated: [SPARK-42269][CONNECT][PYTHON] Support complex return types in DDL strings
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 95919e26993 [SPARK-42269][CONNECT][PYTHON] Support complex return types in DDL strings
95919e26993 is described below

commit 95919e269930f3d1f3716b869e1abb25185d8a44
Author: Xinrong Meng
AuthorDate: Mon Feb 13 14:10:34 2023 +0900

    [SPARK-42269][CONNECT][PYTHON] Support complex return types in DDL strings

    ### What changes were proposed in this pull request?

    Support complex return types in DDL strings.

    ### Why are the changes needed?

    Parity with vanilla PySpark.

    ### Does this PR introduce _any_ user-facing change?

    Yes.
    ```py
    # BEFORE
    >>> spark.range(2).select(udf(lambda x: (x, x), "struct")("id"))
    ...
    AssertionError: returnType should be singular
    >>> spark.udf.register('f', lambda x: (x, x), "struct")
    ...
    AssertionError: returnType should be singular

    # AFTER
    >>> spark.range(2).select(udf(lambda x: (x, x), "struct")("id"))
    DataFrame[(id): struct]
    >>> spark.udf.register('f', lambda x: (x, x), "struct")
    at 0x7faee0eaaca0>
    ```

    ### How was this patch tested?

    Unit tests.

    Closes #39964 from xinrong-meng/collection_ret_type.

    Authored-by: Xinrong Meng
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 3985b91633f5e49c8c97433651f81604dad193e9)
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/connect/client.py             | 15 ++
 python/pyspark/sql/connect/types.py              | 19 ++
 python/pyspark/sql/connect/udf.py                | 23 --
 .../pyspark/sql/tests/connect/test_parity_udf.py |  5 -
 4 files changed, 25 insertions(+), 37 deletions(-)

diff --git a/python/pyspark/sql/connect/client.py b/python/pyspark/sql/connect/client.py
index 943a7e70464..2c07596fec0 100644
--- a/python/pyspark/sql/connect/client.py
+++ b/python/pyspark/sql/connect/client.py
@@ -68,12 +68,12 @@ from pyspark.sql.connect.expressions import (
     PythonUDF,
     CommonInlineUserDefinedFunction,
 )
+from pyspark.sql.connect.types import parse_data_type
 from pyspark.sql.types import (
     DataType,
     StructType,
     StructField,
 )
-from pyspark.sql.utils import is_remote
 from pyspark.serializers import CloudPickleSerializer
 from pyspark.rdd import PythonEvalType
@@ -443,23 +443,12 @@ class SparkConnectClient(object):
         """Create a temporary UDF in the session catalog on the other side.

         We generate a temporary name for it."""
-        from pyspark.sql import SparkSession as PySparkSession
-
         if name is None:
             name = f"fun_{uuid.uuid4().hex}"
         # convert str return_type to DataType
         if isinstance(return_type, str):
-
-            assert is_remote()
-            return_type_schema = (  # a workaround to parse the DataType from DDL strings
-                PySparkSession.builder.getOrCreate()
-                .createDataFrame(data=[], schema=return_type)
-                .schema
-            )
-            assert len(return_type_schema.fields) == 1, "returnType should be singular"
-            return_type = return_type_schema.fields[0].dataType
-
+            return_type = parse_data_type(return_type)
         # construct a PythonUDF
         py_udf = PythonUDF(
             output_type=return_type.json(),
diff --git a/python/pyspark/sql/connect/types.py b/python/pyspark/sql/connect/types.py
index f12d6c4827e..6b9975c52cd 100644
--- a/python/pyspark/sql/connect/types.py
+++ b/python/pyspark/sql/connect/types.py
@@ -51,6 +51,7 @@ from pyspark.sql.types import (
 )
 import pyspark.sql.connect.proto as pb2
+from pyspark.sql.utils import is_remote
 JVM_BYTE_MIN: int = -(1 << 7)
@@ -337,3 +338,21 @@ def from_arrow_schema(arrow_schema: "pa.Schema") -> StructType:
             for field in arrow_schema
         ]
     )
+
+
+def parse_data_type(data_type: str) -> DataType:
+    # Currently we don't have a way to have a current Spark session in Spark Connect, and
+    # pyspark.sql.SparkSession has a centralized logic to control the session creation.
+    # So uses pyspark.sql.SparkSession for now. Should replace this to using the current
+    # Spark session for Spark Connect in the future.
+    from pyspark.sql import SparkSession as PySparkSession
+
+    assert is_remote()
+    return_type_schema = (
+        PySparkSession.builder.getOrCreate().createDataFrame(data=[], schema=data_type).schema
+    )
+    if len(return_type_schema.fields) == 1:
+        return_type = return_type_schema.fields[0].dataType
+    else:
+        return_type = return_type_schema
+    return return_type
diff --git a/python/pyspark/sql/connect/udf.py b/python/pyspark/sql/connect/udf.py
index 573d
[spark] branch master updated: [SPARK-42269][CONNECT][PYTHON] Support complex return types in DDL strings
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3985b91633f [SPARK-42269][CONNECT][PYTHON] Support complex return types in DDL strings
3985b91633f is described below

commit 3985b91633f5e49c8c97433651f81604dad193e9
Author: Xinrong Meng
AuthorDate: Mon Feb 13 14:10:34 2023 +0900

    [SPARK-42269][CONNECT][PYTHON] Support complex return types in DDL strings

    ### What changes were proposed in this pull request?

    Support complex return types in DDL strings.

    ### Why are the changes needed?

    Parity with vanilla PySpark.

    ### Does this PR introduce _any_ user-facing change?

    Yes.
    ```py
    # BEFORE
    >>> spark.range(2).select(udf(lambda x: (x, x), "struct")("id"))
    ...
    AssertionError: returnType should be singular
    >>> spark.udf.register('f', lambda x: (x, x), "struct")
    ...
    AssertionError: returnType should be singular

    # AFTER
    >>> spark.range(2).select(udf(lambda x: (x, x), "struct")("id"))
    DataFrame[(id): struct]
    >>> spark.udf.register('f', lambda x: (x, x), "struct")
    at 0x7faee0eaaca0>
    ```

    ### How was this patch tested?

    Unit tests.

    Closes #39964 from xinrong-meng/collection_ret_type.

    Authored-by: Xinrong Meng
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/connect/client.py             | 15 ++
 python/pyspark/sql/connect/types.py              | 19 ++
 python/pyspark/sql/connect/udf.py                | 23 --
 .../pyspark/sql/tests/connect/test_parity_udf.py |  5 -
 4 files changed, 25 insertions(+), 37 deletions(-)

diff --git a/python/pyspark/sql/connect/client.py b/python/pyspark/sql/connect/client.py
index 943a7e70464..2c07596fec0 100644
--- a/python/pyspark/sql/connect/client.py
+++ b/python/pyspark/sql/connect/client.py
@@ -68,12 +68,12 @@ from pyspark.sql.connect.expressions import (
     PythonUDF,
     CommonInlineUserDefinedFunction,
 )
+from pyspark.sql.connect.types import parse_data_type
 from pyspark.sql.types import (
     DataType,
     StructType,
     StructField,
 )
-from pyspark.sql.utils import is_remote
 from pyspark.serializers import CloudPickleSerializer
 from pyspark.rdd import PythonEvalType
@@ -443,23 +443,12 @@ class SparkConnectClient(object):
         """Create a temporary UDF in the session catalog on the other side.

         We generate a temporary name for it."""
-        from pyspark.sql import SparkSession as PySparkSession
-
         if name is None:
             name = f"fun_{uuid.uuid4().hex}"
         # convert str return_type to DataType
         if isinstance(return_type, str):
-
-            assert is_remote()
-            return_type_schema = (  # a workaround to parse the DataType from DDL strings
-                PySparkSession.builder.getOrCreate()
-                .createDataFrame(data=[], schema=return_type)
-                .schema
-            )
-            assert len(return_type_schema.fields) == 1, "returnType should be singular"
-            return_type = return_type_schema.fields[0].dataType
-
+            return_type = parse_data_type(return_type)
         # construct a PythonUDF
         py_udf = PythonUDF(
             output_type=return_type.json(),
diff --git a/python/pyspark/sql/connect/types.py b/python/pyspark/sql/connect/types.py
index f12d6c4827e..6b9975c52cd 100644
--- a/python/pyspark/sql/connect/types.py
+++ b/python/pyspark/sql/connect/types.py
@@ -51,6 +51,7 @@ from pyspark.sql.types import (
 )
 import pyspark.sql.connect.proto as pb2
+from pyspark.sql.utils import is_remote
 JVM_BYTE_MIN: int = -(1 << 7)
@@ -337,3 +338,21 @@ def from_arrow_schema(arrow_schema: "pa.Schema") -> StructType:
             for field in arrow_schema
         ]
     )
+
+
+def parse_data_type(data_type: str) -> DataType:
+    # Currently we don't have a way to have a current Spark session in Spark Connect, and
+    # pyspark.sql.SparkSession has a centralized logic to control the session creation.
+    # So uses pyspark.sql.SparkSession for now. Should replace this to using the current
+    # Spark session for Spark Connect in the future.
+    from pyspark.sql import SparkSession as PySparkSession
+
+    assert is_remote()
+    return_type_schema = (
+        PySparkSession.builder.getOrCreate().createDataFrame(data=[], schema=data_type).schema
+    )
+    if len(return_type_schema.fields) == 1:
+        return_type = return_type_schema.fields[0].dataType
+    else:
+        return_type = return_type_schema
+    return return_type
diff --git a/python/pyspark/sql/connect/udf.py b/python/pyspark/sql/connect/udf.py
index 573d8f582e2..39c31e85992 100644
--- a/python/pyspark/sql/connect/udf.py
+++ b/python/pyspark/sql/connect/udf.py
@@ -33
[spark] branch master updated: [SPARK-42034] QueryExecutionListener and Observation API do not work with `foreach` / `reduce` / `foreachPartition` action
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a1649ad2429 [SPARK-42034] QueryExecutionListener and Observation API do not work with `foreach` / `reduce` / `foreachPartition` action
a1649ad2429 is described below

commit a1649ad24298d988267acb8588d19848c7fb16c4
Author: 佘志铭
AuthorDate: Mon Feb 13 14:09:54 2023 +0900

    [SPARK-42034] QueryExecutionListener and Observation API do not work with `foreach` / `reduce` / `foreachPartition` action

    ### What changes were proposed in this pull request?

    Add the name parameter for the `foreach`/`reduce`/`foreachPartition` operators in `Dataset#withNewRDDExecutionId`, because the QueryExecutionListener and Observation API are triggered only when the operators carry a name:
    https://github.com/apache/spark/blob/84ddd409c11e4da769c5b1f496f2b61c3d928c07/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L181

    ### Why are the changes needed?

    The QueryExecutionListener and Observation API are triggered only when the operators have the name parameter.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Adds two unit tests.

    Closes #39976 from ming95/SPARK-42034.

    Authored-by: 佘志铭
    Signed-off-by: Hyukjin Kwon
---
 .../main/scala/org/apache/spark/sql/Dataset.scala  | 10 
 .../scala/org/apache/spark/sql/DatasetSuite.scala  | 13 ++
 .../spark/sql/util/DataFrameCallbackSuite.scala    | 28 ++
 3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 28177b90c7e..edcfad0c798 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1858,7 +1858,7 @@ class Dataset[T] private[sql](
    * @group action
    * @since 1.6.0
    */
-  def reduce(func: (T, T) => T): T = withNewRDDExecutionId {
+  def reduce(func: (T, T) => T): T = withNewRDDExecutionId("reduce") {
     rdd.reduce(func)
   }
@@ -3336,7 +3336,7 @@ class Dataset[T] private[sql](
    * @group action
    * @since 1.6.0
    */
-  def foreach(f: T => Unit): Unit = withNewRDDExecutionId {
+  def foreach(f: T => Unit): Unit = withNewRDDExecutionId("foreach") {
     rdd.foreach(f)
   }
@@ -3355,7 +3355,7 @@ class Dataset[T] private[sql](
    * @group action
    * @since 1.6.0
    */
-  def foreachPartition(f: Iterator[T] => Unit): Unit = withNewRDDExecutionId {
+  def foreachPartition(f: Iterator[T] => Unit): Unit = withNewRDDExecutionId("foreachPartition") {
     rdd.foreachPartition(f)
   }
@@ -4148,8 +4148,8 @@ class Dataset[T] private[sql](
    * them with an execution. Before performing the action, the metrics of the executed plan will be
    * reset.
    */
-  private def withNewRDDExecutionId[U](body: => U): U = {
-    SQLExecution.withNewExecutionId(rddQueryExecution) {
+  private def withNewRDDExecutionId[U](name: String)(body: => U): U = {
+    SQLExecution.withNewExecutionId(rddQueryExecution, Some(name)) {
       rddQueryExecution.executedPlan.resetMetrics()
       body
     }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
index 86e640a4fa8..263e361413c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@@ -960,6 +960,19 @@ class DatasetSuite extends QueryTest
     observe(spark.range(1, 10, 1, 11), Map("percentile_approx_val" -> 5))
   }
+  test("observation on datasets when a DataSet trigger foreach action") {
+    def f(): Unit = {}
+
+    val namedObservation = Observation("named")
+    val observed_df = spark.range(100).observe(
+      namedObservation, percentile_approx($"id", lit(0.5), lit(100)).as("percentile_approx_val"))
+
+    observed_df.foreach(r => f)
+    val expected = Map("percentile_approx_val" -> 49)
+
+    assert(namedObservation.get === expected)
+  }
+
   test("sample with replacement") {
     val n = 100
     val data = sparkContext.parallelize(1 to n, 2).toDS()
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
index 2fc1f10d3ea..f046daacb91 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
@@ -96,6 +96,34 @@ class DataFrameCallbackSuite extends QueryTest
     spark.listenerManager.unregister(listener)
   }
+  test("execut
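The core idea of this fix — listeners only fire for executions that carry a name, so unnamed actions are invisible to them — can be sketched without Spark. Everything below is a simplified, Spark-free model: `with_new_execution_id` stands in for `SQLExecution.withNewExecutionId`, and a plain callback stands in for `QueryExecutionListener`.

```python
# Spark-free sketch of the SPARK-42034 behavior: callbacks fire only for
# executions registered with a name, so giving foreach/reduce a name makes
# them observable.
listeners = []
events = []

def with_new_execution_id(name, body):
    result = body()
    if name is not None:  # the crux of the PR: no name, no listener callback
        for listener in listeners:
            listener(name, result)
    return result

listeners.append(lambda name, result: events.append(name))

# Before the patch: the action runs unnamed and listeners never hear of it.
with_new_execution_id(None, lambda: sum(range(5)))
# After the patch: the action passes its name, and the listener fires.
with_new_execution_id("reduce", lambda: sum(range(5)))

assert events == ["reduce"]
```

This mirrors why the PR changes `withNewRDDExecutionId` to take a `name: String` parameter and pass `Some(name)` through to the execution machinery.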
[spark] branch branch-3.4 updated: [SPARK-42331][SQL] Fix metadata col can not been resolved
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 85ec8d3d19a [SPARK-42331][SQL] Fix metadata col can not been resolved
85ec8d3d19a is described below

commit 85ec8d3d19a43658f4eb3c9cb0d19dd63bbf43d9
Author: ulysses-you
AuthorDate: Mon Feb 13 12:43:33 2023 +0800

    [SPARK-42331][SQL] Fix metadata col can not been resolved

    ### What changes were proposed in this pull request?

    This PR makes the metadata output consistent during analysis by checking the output and reusing the existing metadata attributes if present. It also deduplicates the metadata output when merging it into the output.

    ### Why are the changes needed?

    Let's say a process of resolving metadata:
    ```
    Project (_metadata.file_size)
      Filter (_metadata.file_size > 0)
        Relation
    ```
    1. `ResolveReferences` resolves `_metadata.file_size` for `Filter`.
    2. `ResolveReferences` cannot resolve `_metadata.file_size` for `Project`, because `Filter` is not yet resolved (the data type does not match).
    3. Then `AddMetadataColumns` merges the metadata output into the output.
    4. The next round of `ResolveReferences` still cannot resolve `_metadata.file_size` for `Project`, since we filter out the conflicting names (the output already contains the metadata output); see the code:
    ```
    def isOutputColumn(col: MetadataColumn): Boolean = {
      outputNames.exists(name => resolve(col.name, name))
    }
    // filter out metadata columns that have names conflicting with output columns. if the table
    // has a column "line" and the table can produce a metadata column called "line", then the
    // data column should be returned, not the metadata column.
    hasMeta.metadataColumns.filterNot(isOutputColumn).toAttributes
    ```
    We also cannot simply skip metadata columns when filtering conflicting names; otherwise the newly generated metadata attribute would have a different expr id from the previous one.

    One failed example:
    ```scala
    SELECT _metadata.row_index FROM t WHERE _metadata.row_index >= 0;
    ```

    ### Does this PR introduce _any_ user-facing change?

    Yes, bug fix.

    ### How was this patch tested?

    Adds tests for v1, v2 and streaming relations.

    Closes #39870 from ulysses-you/SPARK-42331.

    Authored-by: ulysses-you
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 5705436f70e6c6d5a127db7773d3627c8e3d695a)
    Signed-off-by: Wenchen Fan
---
 .../sql/catalyst/plans/logical/LogicalPlan.scala   | 23 ++
 .../datasources/v2/DataSourceV2Relation.scala      | 17 -
 .../execution/datasources/LogicalRelation.scala    | 16 -
 .../execution/streaming/StreamingRelation.scala    | 18 +-
 .../spark/sql/connector/MetadataColumnSuite.scala  | 13 ++
 .../datasources/FileMetadataStructSuite.scala      | 28 +-
 6 files changed, 79 insertions(+), 36 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
index 9a7726f6a03..5a7dcff3667 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
@@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.{AliasAwareQueryOutputOrdering, QueryPlan}
 import org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats
 import org.apache.spark.sql.catalyst.trees.{BinaryLike, LeafLike, UnaryLike}
+import org.apache.spark.sql.catalyst.util.MetadataColumnHelper
 import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryExecutionErrors}
 import org.apache.spark.sql.types.{DataType, StructType}
@@ -317,5 +318,27 @@ object LogicalPlanIntegrity {
  * A logical plan node that can generate metadata columns
  */
trait ExposesMetadataColumns extends LogicalPlan {
+  protected def metadataOutputWithOutConflicts(
+      metadataOutput: Seq[AttributeReference]): Seq[AttributeReference] = {
+    // If `metadataColFromOutput` is not empty that means `AddMetadataColumns` merged
+    // metadata output into output. We should still return an available metadata output
+    // so that the rule `ResolveReferences` can resolve metadata column correctly.
+    val metadataColFromOutput = output.filter(_.isMetadataCol)
+    if (metadataColFromOutput.isEmpty) {
+      val resolve = conf.resolver
+      val outputNames = outputSet.map(_.name)
+
+      def isOutputColumn(col: AttributeReference): Boolean = {
+        outputNames.exists(name => resolve(col.na
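The conflict filter quoted in the commit message above — metadata columns whose names collide with data columns are dropped, so that "line" the data column wins over "line" the metadata column — can be sketched in a few lines. This is a simplified model: a case-insensitive string comparison stands in for `conf.resolver`, and plain strings stand in for `AttributeReference`s.

```python
# Sketch of the metadata/output name-conflict filter from SPARK-42331.
# Case-insensitive comparison models Spark's default (case-insensitive) resolver.
def metadata_output_without_conflicts(output_names, metadata_cols):
    def resolve(a, b):
        return a.lower() == b.lower()

    def is_output_column(col):
        return any(resolve(col, name) for name in output_names)

    # Metadata columns shadowed by a data column of the same name are dropped;
    # the data column should be returned, not the metadata column.
    return [col for col in metadata_cols if not is_output_column(col)]

cols = metadata_output_without_conflicts(
    output_names=["line", "value"],
    metadata_cols=["line", "_metadata"])
assert cols == ["_metadata"]
```

The bug described in the commit arises because, once `AddMetadataColumns` has merged the metadata attributes into `output_names`, this filter drops *all* metadata columns, leaving later resolution rounds with nothing to resolve against; the fix reuses the already-merged attributes instead of regenerating them.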
[spark] branch master updated: [SPARK-42331][SQL] Fix metadata col can not been resolved
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 5705436f70e [SPARK-42331][SQL] Fix metadata col can not been resolved
5705436f70e is described below

commit 5705436f70e6c6d5a127db7773d3627c8e3d695a
Author: ulysses-you
AuthorDate: Mon Feb 13 12:43:33 2023 +0800

    [SPARK-42331][SQL] Fix metadata col can not been resolved

    ### What changes were proposed in this pull request?

    This PR makes the metadata output consistent during analysis by checking the output and reusing the existing metadata attributes if present. It also deduplicates the metadata output when merging it into the output.

    ### Why are the changes needed?

    Let's say a process of resolving metadata:
    ```
    Project (_metadata.file_size)
      Filter (_metadata.file_size > 0)
        Relation
    ```
    1. `ResolveReferences` resolves `_metadata.file_size` for `Filter`.
    2. `ResolveReferences` cannot resolve `_metadata.file_size` for `Project`, because `Filter` is not yet resolved (the data type does not match).
    3. Then `AddMetadataColumns` merges the metadata output into the output.
    4. The next round of `ResolveReferences` still cannot resolve `_metadata.file_size` for `Project`, since we filter out the conflicting names (the output already contains the metadata output); see the code:
    ```
    def isOutputColumn(col: MetadataColumn): Boolean = {
      outputNames.exists(name => resolve(col.name, name))
    }
    // filter out metadata columns that have names conflicting with output columns. if the table
    // has a column "line" and the table can produce a metadata column called "line", then the
    // data column should be returned, not the metadata column.
    hasMeta.metadataColumns.filterNot(isOutputColumn).toAttributes
    ```
    We also cannot simply skip metadata columns when filtering conflicting names; otherwise the newly generated metadata attribute would have a different expr id from the previous one.

    One failed example:
    ```scala
    SELECT _metadata.row_index FROM t WHERE _metadata.row_index >= 0;
    ```

    ### Does this PR introduce _any_ user-facing change?

    Yes, bug fix.

    ### How was this patch tested?

    Adds tests for v1, v2 and streaming relations.

    Closes #39870 from ulysses-you/SPARK-42331.

    Authored-by: ulysses-you
    Signed-off-by: Wenchen Fan
---
 .../sql/catalyst/plans/logical/LogicalPlan.scala   | 23 ++
 .../datasources/v2/DataSourceV2Relation.scala      | 17 -
 .../execution/datasources/LogicalRelation.scala    | 16 -
 .../execution/streaming/StreamingRelation.scala    | 18 +-
 .../spark/sql/connector/MetadataColumnSuite.scala  | 13 ++
 .../datasources/FileMetadataStructSuite.scala      | 28 +-
 6 files changed, 79 insertions(+), 36 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
index 9a7726f6a03..5a7dcff3667 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
@@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.{AliasAwareQueryOutputOrdering, QueryPlan}
 import org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats
 import org.apache.spark.sql.catalyst.trees.{BinaryLike, LeafLike, UnaryLike}
+import org.apache.spark.sql.catalyst.util.MetadataColumnHelper
 import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryExecutionErrors}
 import org.apache.spark.sql.types.{DataType, StructType}
@@ -317,5 +318,27 @@ object LogicalPlanIntegrity {
  * A logical plan node that can generate metadata columns
  */
trait ExposesMetadataColumns extends LogicalPlan {
+  protected def metadataOutputWithOutConflicts(
+      metadataOutput: Seq[AttributeReference]): Seq[AttributeReference] = {
+    // If `metadataColFromOutput` is not empty that means `AddMetadataColumns` merged
+    // metadata output into output. We should still return an available metadata output
+    // so that the rule `ResolveReferences` can resolve metadata column correctly.
+    val metadataColFromOutput = output.filter(_.isMetadataCol)
+    if (metadataColFromOutput.isEmpty) {
+      val resolve = conf.resolver
+      val outputNames = outputSet.map(_.name)
+
+      def isOutputColumn(col: AttributeReference): Boolean = {
+        outputNames.exists(name => resolve(col.name, name))
+      }
+      // filter out the metadata struct column if it has the name conflicting with output c
svn commit: r60067 - /release/spark/spark-3.1.3/
Author: srowen Date: Mon Feb 13 04:32:53 2023 New Revision: 60067 Log: Remove EOL Spark 3.1.x Removed: release/spark/spark-3.1.3/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-42391][CORE][TESTS] Close live `AppStatusStore` in the finally block for test cases
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 89d1d17e57a [SPARK-42391][CORE][TESTS] Close live `AppStatusStore` in the finally block for test cases 89d1d17e57a is described below commit 89d1d17e57abc4c825cb9259f7a319b80e0d854a Author: yangjie01 AuthorDate: Sun Feb 12 20:14:52 2023 -0800 [SPARK-42391][CORE][TESTS] Close live `AppStatusStore` in the finally block for test cases ### What changes were proposed in this pull request? `AppStatusStore.createLiveStore` returns a `RocksDB`-backed `AppStatusStore` when `LIVE_UI_LOCAL_STORE_DIR` is configured; it should be closed in a finally block to release resources in test cases. Four test suites use the `AppStatusStore.createLiveStore` function: - `AppStatusStoreSuite`: one case does not close the `AppStatusStore` - `StagePageSuite`: already calls close in a `finally` block - `AllExecutionsPageSuite`: already calls close in `after` - `SQLAppStatusListenerSuite`: already calls close in `after` Only `AppStatusStoreSuite` leaves an `AppStatusStore` without manual closing, so this PR makes the following changes: - For `SPARK-36038: speculation summary should not be present if there are no speculative tasks` in `AppStatusStoreSuite`, close the `AppStatusStore` in a finally block - For `SPARK-26260: summary should contain only successful tasks' metrics (store = $hint)`, move the existing `AppStatusStore.close` to the finally block ### Why are the changes needed? Call `AppStatusStore.close` in the finally block to release possible RocksDB resources. ### Does this PR introduce _any_ user-facing change? No, just for test ### How was this patch tested? 
- Pass GitHub Actions - Manual test ``` export LIVE_UI_LOCAL_STORE_DIR=/tmp/spark-ui build/sbt clean "core/testOnly org.apache.spark.status.AppStatusStoreSuite" ``` All tests passed Closes #39961 from LuciferYang/SPARK-42391. Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- .../apache/spark/status/AppStatusStoreSuite.scala | 173 +++-- 1 file changed, 90 insertions(+), 83 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala b/core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala index d38b0857e57..ccf6c9184cc 100644 --- a/core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala +++ b/core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala @@ -133,78 +133,81 @@ class AppStatusStoreSuite extends SparkFunSuite { cases.foreach { case (hint, appStore) => test(s"SPARK-26260: summary should contain only successful tasks' metrics (store = $hint)") { assume(appStore != null) - val store = appStore.store - - // Success and failed tasks metrics - for (i <- 0 to 5) { -if (i % 2 == 0) { - writeTaskDataToStore(i, store, "FAILED") -} else { - writeTaskDataToStore(i, store, "SUCCESS") + try { +val store = appStore.store + +// Success and failed tasks metrics +for (i <- 0 to 5) { + if (i % 2 == 0) { +writeTaskDataToStore(i, store, "FAILED") + } else { +writeTaskDataToStore(i, store, "SUCCESS") + } } - } - // Running tasks metrics (-1 = no metrics reported, positive = metrics have been reported) - Seq(-1, 6).foreach { metric => -writeTaskDataToStore(metric, store, "RUNNING") - } +// Running tasks metrics (-1 = no metrics reported, positive = metrics have been reported) +Seq(-1, 6).foreach { metric => + writeTaskDataToStore(metric, store, "RUNNING") +} - /** - * Following are the tasks metrics, - * 1, 3, 5 => Success - * 0, 2, 4 => Failed - * -1, 6 => Running - * - * Task summary will consider (1, 3, 5) only - */ - val summary = appStore.taskSummary(stageId, attemptId, uiQuantiles).get - val 
successfulTasks = Array(getTaskMetrics(1), getTaskMetrics(3), getTaskMetrics(5)) - - def assertQuantiles(metricGetter: TaskMetrics => Double, -actualQuantiles: Seq[Double]): Unit = { -val values = successfulTasks.map(metricGetter) -val expectedQuantiles = new Distribution(values, 0, values.length) - .getQuantiles(uiQuantiles.sorted) - -assert(actualQuantiles === expectedQuantiles) - } +/** + * Following are the tasks metrics, + * 1, 3, 5 => Success + * 0, 2, 4 => Failed + * -1, 6 => Running + * + * Task summary will consider (1, 3, 5) only + */ +val summary
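The diff above wraps the whole test body in `try`/`finally` so the store is closed even when an assertion fails. The same cleanup pattern, sketched in Python with an illustrative `FakeStore` (not Spark's `AppStatusStore`):

```python
# Sketch of the resource-cleanup pattern applied in the commit: close the
# store in a finally block so it is released even if the test body raises.
# FakeStore and run_with_store are illustrative stand-ins.

class FakeStore:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def run_with_store(store, body):
    try:
        return body(store)
    finally:
        store.close()  # released even when body raises

def failing_test(store):
    raise AssertionError("simulated test failure")

store = FakeStore()
try:
    run_with_store(store, failing_test)
except AssertionError:
    pass

assert store.closed  # cleanup happened despite the failure
```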
[spark] branch branch-3.4 updated: [SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix `PlanGenerationTestSuite` together
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 4dd2ef95af8 [SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix `PlanGenerationTestSuite` together 4dd2ef95af8 is described below commit 4dd2ef95af8546a6dc80aa9f1b9dde5513b5d0c9 Author: Dongjoon Hyun AuthorDate: Sun Feb 12 18:57:52 2023 -0800 [SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix `PlanGenerationTestSuite` together ### What changes were proposed in this pull request? This is a follow-up of #39982 to fix `PlanGenerationTestSuite` together. ### Why are the changes needed? SPARK-42377 originally added two test suites which fail with Scala 2.13, but SPARK-42410 missed `PlanGenerationTestSuite` while fixing `ProtoToParsedPlanTestSuite`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ``` $ dev/change-scala-version.sh 2.13 $ build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite" -Pscala-2.13 [info] PlanGenerationTestSuite: ... [info] - function udf 2.13 (514 milliseconds) ... $ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite" -Pscala-2.13 ... [info] PlanGenerationTestSuite: ... [info] - function udf 2.13 (574 milliseconds) ``` Closes #39986 from dongjoon-hyun/SPARK-42410-2. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit f8fda8aa1356b3ac902570e0a5536d9c00838490) Signed-off-by: Dongjoon Hyun --- .../test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala| 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala index a15539afaa4..494c497c553 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala @@ -20,6 +20,7 @@ import java.nio.file.{Files, Path} import scala.collection.mutable import scala.util.{Failure, Success, Try} +import scala.util.Properties.versionNumberString import com.google.protobuf.util.JsonFormat import io.grpc.inprocess.InProcessChannelBuilder @@ -57,6 +58,8 @@ class PlanGenerationTestSuite extends ConnectFunSuite with BeforeAndAfterAll wit // Borrowed from SparkFunSuite private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" + private val scala = versionNumberString.substring(0, versionNumberString.indexOf(".", 2)) + // Borrowed from SparkFunSuite private def getWorkspaceFilePath(first: String, more: String*): Path = { if (!(sys.props.contains("spark.test.home") || sys.env.contains("SPARK_HOME"))) { @@ -209,7 +212,7 @@ class PlanGenerationTestSuite extends ConnectFunSuite with BeforeAndAfterAll wit select(fn.col("id")) } - test("function udf") { + test("function udf " + scala) { // This test might be a bit tricky if different JVM // versions are used to generate the golden files. val functions = Seq(
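The one-liner added in the diff above, `versionNumberString.substring(0, versionNumberString.indexOf(".", 2))`, trims the full Scala version (e.g. `2.13.8`) down to its binary version (`2.13`) so the test name carries a version suffix. The same string logic sketched in Python:

```python
# Sketch of the version-suffix computation: take everything up to the
# second dot of the full Scala version string.
# Mirrors versionNumberString.substring(0, versionNumberString.indexOf(".", 2)).

def scala_binary_version(version: str) -> str:
    # The first dot sits at index 1 ("2.x..."), so searching from index 2
    # finds the second dot; slice up to it.
    return version[:version.index(".", 2)]

assert scala_binary_version("2.13.8") == "2.13"
assert scala_binary_version("2.12.17") == "2.12"
```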
[spark] branch master updated: [SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix `PlanGenerationTestSuite` together
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f8fda8aa135 [SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix `PlanGenerationTestSuite` together f8fda8aa135 is described below commit f8fda8aa1356b3ac902570e0a5536d9c00838490 Author: Dongjoon Hyun AuthorDate: Sun Feb 12 18:57:52 2023 -0800 [SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix `PlanGenerationTestSuite` together ### What changes were proposed in this pull request? This is a follow-up of #39982 to fix `PlanGenerationTestSuite` together. ### Why are the changes needed? SPARK-42377 originally added two test suites which fail with Scala 2.13, but SPARK-42410 missed `PlanGenerationTestSuite` while fixing `ProtoToParsedPlanTestSuite`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ``` $ dev/change-scala-version.sh 2.13 $ build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite" -Pscala-2.13 [info] PlanGenerationTestSuite: ... [info] - function udf 2.13 (514 milliseconds) ... $ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite" -Pscala-2.13 ... [info] PlanGenerationTestSuite: ... [info] - function udf 2.13 (574 milliseconds) ``` Closes #39986 from dongjoon-hyun/SPARK-42410-2. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala| 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala index a15539afaa4..494c497c553 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala @@ -20,6 +20,7 @@ import java.nio.file.{Files, Path} import scala.collection.mutable import scala.util.{Failure, Success, Try} +import scala.util.Properties.versionNumberString import com.google.protobuf.util.JsonFormat import io.grpc.inprocess.InProcessChannelBuilder @@ -57,6 +58,8 @@ class PlanGenerationTestSuite extends ConnectFunSuite with BeforeAndAfterAll wit // Borrowed from SparkFunSuite private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" + private val scala = versionNumberString.substring(0, versionNumberString.indexOf(".", 2)) + // Borrowed from SparkFunSuite private def getWorkspaceFilePath(first: String, more: String*): Path = { if (!(sys.props.contains("spark.test.home") || sys.env.contains("SPARK_HOME"))) { @@ -209,7 +212,7 @@ class PlanGenerationTestSuite extends ConnectFunSuite with BeforeAndAfterAll wit select(fn.col("id")) } - test("function udf") { + test("function udf " + scala) { // This test might be a bit tricky if different JVM // versions are used to generate the golden files. val functions = Seq(
[spark] branch branch-3.4 updated: [SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 tests in `connect` module
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new d3b59aad100 [SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 tests in `connect` module d3b59aad100 is described below commit d3b59aad1000820d7fcbd78e0e1810d9bf81b8cc Author: Dongjoon Hyun AuthorDate: Sun Feb 12 17:31:26 2023 -0800 [SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 tests in `connect` module ### What changes were proposed in this pull request? This PR aims to support both Scala 2.12/2.13 tests in the `connect` module by splitting the golden files. ### Why are the changes needed? After #39933, the Scala 2.13 CIs are broken. - **master**: https://github.com/apache/spark/actions/runs/4157806226/jobs/7192575602 - **branch-3.4**: https://github.com/apache/spark/actions/runs/4155578848/jobs/7188777977 ``` [info] - function_udf *** FAILED *** (29 milliseconds) [info] java.io.InvalidClassException: org.apache.spark.sql.TestUDFs$$anon$1; local class incompatible: stream classdesc serialVersionUID = 505010451380771093, local class serialVersionUID = 643575318841761245 ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs and run the Scala 2.13 tests manually. ``` $ dev/change-scala-version.sh 2.13 $ build/sbt -Dscala.version=2.13.8 -Pscala-2.13 -Phadoop-3 assembly/package "connect/test" ... [info] - function_udf_2.13 (19 milliseconds) [info] Run completed in 8 seconds, 964 milliseconds. [info] Total number of tests run: 119 [info] Suites: completed 8, aborted 0 [info] Tests: succeeded 119, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. [success] Total time: 18 s, completed Feb 12, 2023, 3:33:40 PM ``` Closes #39982 from dongjoon-hyun/SPARK-42410. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit a5db9b976d8f37d22f8a91f461c05cbb20601d8a) Signed-off-by: Dongjoon Hyun --- ...ction_udf.explain => function_udf_2.12.explain} | 0 ...ction_udf.explain => function_udf_2.13.explain} | 0 .../{function_udf.json => function_udf_2.12.json} | 0 ...n_udf.proto.bin => function_udf_2.12.proto.bin} | Bin .../query-tests/queries/function_udf_2.13.json | 96 + .../queries/function_udf_2.13.proto.bin| Bin 0 -> 12092 bytes .../sql/connect/ProtoToParsedPlanTestSuite.scala | 5 ++ 7 files changed, 101 insertions(+) diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.12.explain similarity index 100% copy from connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain copy to connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.12.explain diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.13.explain similarity index 100% rename from connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain rename to connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.13.explain diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_udf.json b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.json similarity index 100% rename from connector/connect/common/src/test/resources/query-tests/queries/function_udf.json rename to connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.json diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_udf.proto.bin 
b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.proto.bin similarity index 100% rename from connector/connect/common/src/test/resources/query-tests/queries/function_udf.proto.bin rename to connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.proto.bin diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json new file mode 100644 index 000..23aeb078543 --- /dev/null +++ b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json @@ -0,0 +1,96 @@ +{ + "project": { +"input": { + "localRelation": { +"schema": "struct\u003cid:bigint,a:int,b:double\u003e" + } +}, +"expressions": [{ +
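The renames in the diff above split each version-dependent golden file into a `_2.12` and a `_2.13` variant, so each Scala build reads its own serialized UDF payload. A small Python sketch of how such a suite might resolve the right file (the path layout and the `version_dependent` set are illustrative, not the suite's actual logic):

```python
# Sketch of golden-file resolution after the split: version-dependent test
# names get a Scala-binary-version suffix; everything else keeps one file.
# Paths and the version_dependent set are illustrative assumptions.
import os

def golden_file(name: str, scala_version: str,
                version_dependent=("function_udf",)) -> str:
    base = f"{name}_{scala_version}" if name in version_dependent else name
    return os.path.join("query-tests", "queries", base + ".json")

assert golden_file("function_udf", "2.13").endswith("function_udf_2.13.json")
assert golden_file("function_udf", "2.12").endswith("function_udf_2.12.json")
# a version-independent query still resolves to an unsuffixed file
assert golden_file("select_id", "2.13").endswith("select_id.json")
```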
[spark] branch master updated: [SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 tests in `connect` module
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a5db9b976d8 [SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 tests in `connect` module a5db9b976d8 is described below commit a5db9b976d8f37d22f8a91f461c05cbb20601d8a Author: Dongjoon Hyun AuthorDate: Sun Feb 12 17:31:26 2023 -0800 [SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 tests in `connect` module ### What changes were proposed in this pull request? This PR aims to support both Scala 2.12/2.13 tests in the `connect` module by splitting the golden files. ### Why are the changes needed? After #39933, the Scala 2.13 CIs are broken. - **master**: https://github.com/apache/spark/actions/runs/4157806226/jobs/7192575602 - **branch-3.4**: https://github.com/apache/spark/actions/runs/4155578848/jobs/7188777977 ``` [info] - function_udf *** FAILED *** (29 milliseconds) [info] java.io.InvalidClassException: org.apache.spark.sql.TestUDFs$$anon$1; local class incompatible: stream classdesc serialVersionUID = 505010451380771093, local class serialVersionUID = 643575318841761245 ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs and run the Scala 2.13 tests manually. ``` $ dev/change-scala-version.sh 2.13 $ build/sbt -Dscala.version=2.13.8 -Pscala-2.13 -Phadoop-3 assembly/package "connect/test" ... [info] - function_udf_2.13 (19 milliseconds) [info] Run completed in 8 seconds, 964 milliseconds. [info] Total number of tests run: 119 [info] Suites: completed 8, aborted 0 [info] Tests: succeeded 119, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. [success] Total time: 18 s, completed Feb 12, 2023, 3:33:40 PM ``` Closes #39982 from dongjoon-hyun/SPARK-42410. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- ...ction_udf.explain => function_udf_2.12.explain} | 0 ...ction_udf.explain => function_udf_2.13.explain} | 0 .../{function_udf.json => function_udf_2.12.json} | 0 ...n_udf.proto.bin => function_udf_2.12.proto.bin} | Bin .../query-tests/queries/function_udf_2.13.json | 96 + .../queries/function_udf_2.13.proto.bin| Bin 0 -> 12092 bytes .../sql/connect/ProtoToParsedPlanTestSuite.scala | 5 ++ 7 files changed, 101 insertions(+) diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.12.explain similarity index 100% copy from connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain copy to connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.12.explain diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.13.explain similarity index 100% rename from connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain rename to connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.13.explain diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_udf.json b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.json similarity index 100% rename from connector/connect/common/src/test/resources/query-tests/queries/function_udf.json rename to connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.json diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_udf.proto.bin b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.proto.bin similarity index 100% rename from 
connector/connect/common/src/test/resources/query-tests/queries/function_udf.proto.bin rename to connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.proto.bin diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json new file mode 100644 index 000..23aeb078543 --- /dev/null +++ b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json @@ -0,0 +1,96 @@ +{ + "project": { +"input": { + "localRelation": { +"schema": "struct\u003cid:bigint,a:int,b:double\u003e" + } +}, +"expressions": [{ + "commonInlineUserDefinedFunction": { +"scalarScalaUdf": { + "payload": "rO0ABXNyAC1vcmcuYXBhY2hl
[spark] branch branch-3.4 updated: [SPARK-41963][CONNECT] Fix DataFrame.unpivot to raise the same error class when the `values` argument is empty
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 92d82698760 [SPARK-41963][CONNECT] Fix DataFrame.unpivot to raise the same error class when the `values` argument is empty 92d82698760 is described below commit 92d8269876019e8580b2e60d3b3891ac13b5740b Author: Takuya UESHIN AuthorDate: Mon Feb 13 09:45:05 2023 +0900 [SPARK-41963][CONNECT] Fix DataFrame.unpivot to raise the same error class when the `values` argument is empty ### What changes were proposed in this pull request? Fixes `DataFrame.unpivot` to raise the same error class when the `values` argument is an empty list/tuple. ### Why are the changes needed? Currently `DataFrame.unpivot` raises different error classes: `UNPIVOT_REQUIRES_VALUE_COLUMNS` in PySpark vs. `UNPIVOT_VALUE_DATA_TYPE_MISMATCH` in Spark Connect. In `Unpivot`, an empty list/tuple as the `values` argument is different from `None`, and the two should be handled differently. ### Does this PR introduce _any_ user-facing change? `DataFrame.unpivot` on Spark Connect will raise the same error class as PySpark. ### How was this patch tested? Enabled `DataFrameParityTests.test_unpivot_negative`. Closes #39960 from ueshin/issues/SPARK-41963/unpivot. 
Authored-by: Takuya UESHIN Signed-off-by: Hyukjin Kwon (cherry picked from commit 633a486c65067b483524b079810b5590ac482a48) Signed-off-by: Hyukjin Kwon --- .../main/protobuf/spark/connect/relations.proto| 6 +++- .../org/apache/spark/sql/connect/dsl/package.scala | 5 ++- .../sql/connect/planner/SparkConnectPlanner.scala | 4 +-- python/pyspark/sql/connect/dataframe.py| 8 +++-- python/pyspark/sql/connect/plan.py | 5 +-- python/pyspark/sql/connect/proto/relations_pb2.py | 25 ++ python/pyspark/sql/connect/proto/relations_pb2.pyi | 39 +- .../pyspark/sql/tests/connect/test_connect_plan.py | 19 +++ .../sql/tests/connect/test_parity_dataframe.py | 5 --- 9 files changed, 83 insertions(+), 33 deletions(-) diff --git a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto index 3d597fd2744..ea1216957d8 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto +++ b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto @@ -732,13 +732,17 @@ message Unpivot { repeated Expression ids = 2; // (Optional) Value columns to unpivot. - repeated Expression values = 3; + optional Values values = 3; // (Required) Name of the variable column. string variable_column_name = 4; // (Required) Name of the value column. 
string value_column_name = 5; + + message Values { +repeated Expression values = 1; + } } message ToSchema { diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala index 88531286e24..f91040a1009 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala @@ -990,7 +990,10 @@ package object dsl { .newBuilder() .setInput(logicalPlan) .addAllIds(ids.asJava) - .addAllValues(values.asJava) + .setValues(Unpivot.Values +.newBuilder() +.addAllValues(values.asJava) +.build()) .setVariableColumnName(variableColumnName) .setValueColumnName(valueColumnName)) .build() diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index 194588fe89b..740d6b85964 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -515,7 +515,7 @@ class SparkConnectPlanner(val session: SparkSession) { Column(transformExpression(expr)) } -if (rel.getValuesList.isEmpty) { +if (!rel.hasValues) { Unpivot( Some(ids.map(_.named)), None, @@ -524,7 +524,7 @@ class SparkConnectPlanner(val session: SparkSession) { Seq(rel.getValueColumnName), transformRelation(rel.getInput)) } else { - val values = rel.getValuesList.asScala.toArray.map { expr => + val values = rel.getValues.ge
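The proto change above replaces a plain `repeated Expression values` field with an `optional Values` wrapper message, because a bare repeated field cannot distinguish "values not given" from "values given as an empty list" — exactly the distinction the planner's `rel.hasValues` check relies on. The same field-presence idea modeled in plain Python (`None` vs. an empty list; class names are illustrative, not the generated proto API):

```python
# Sketch of why the wrapper message is needed: an optional wrapper lets the
# receiver tell "values absent" apart from "values present but empty".
# UnpivotValues/Unpivot are illustrative models, not the generated classes.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class UnpivotValues:
    values: List[str]

@dataclass
class Unpivot:
    # None models an unset optional field (rel.hasValues == false)
    values: Optional[UnpivotValues] = None

def classify(rel: Unpivot) -> str:
    if rel.values is None:
        return "infer value columns"              # values argument was None
    if not rel.values.values:
        return "UNPIVOT_REQUIRES_VALUE_COLUMNS"   # explicit empty list: error
    return "use given value columns"

assert classify(Unpivot()) == "infer value columns"
assert classify(Unpivot(UnpivotValues([]))) == "UNPIVOT_REQUIRES_VALUE_COLUMNS"
assert classify(Unpivot(UnpivotValues(["a"]))) == "use given value columns"
```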
[spark] branch master updated (d703808347c -> 633a486c650)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d703808347c [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest due to Scala 2.13 failure add 633a486c650 [SPARK-41963][CONNECT] Fix DataFrame.unpivot to raise the same error class when the `values` argument is empty No new revisions were added by this update. Summary of changes: .../main/protobuf/spark/connect/relations.proto| 6 +++- .../org/apache/spark/sql/connect/dsl/package.scala | 5 ++- .../sql/connect/planner/SparkConnectPlanner.scala | 4 +-- python/pyspark/sql/connect/dataframe.py| 8 +++-- python/pyspark/sql/connect/plan.py | 5 +-- python/pyspark/sql/connect/proto/relations_pb2.py | 25 ++ python/pyspark/sql/connect/proto/relations_pb2.pyi | 39 +- .../pyspark/sql/tests/connect/test_connect_plan.py | 19 +++ .../sql/tests/connect/test_parity_dataframe.py | 5 --- 9 files changed, 83 insertions(+), 33 deletions(-)
[spark] branch branch-3.4 updated: [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest due to Scala 2.13 failure
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 35cd46183b4 [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest due to Scala 2.13 failure 35cd46183b4 is described below commit 35cd46183b4a4861b53120c8fbcad9c969fe465e Author: Dongjoon Hyun AuthorDate: Mon Feb 13 09:43:12 2023 +0900 [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest due to Scala 2.13 failure ### What changes were proposed in this pull request? This is a follow-up of #39947 to skip the `freqItems` doctest again. ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #39983 from dongjoon-hyun/SPARK-40453. Authored-by: Dongjoon Hyun Signed-off-by: Hyukjin Kwon (cherry picked from commit d703808347ca61ed05541e7252a0e881b1be7431) Signed-off-by: Hyukjin Kwon --- python/pyspark/sql/dataframe.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index 5649d362b8b..d9de9ee14ac 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -4598,7 +4598,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): Examples >>> df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"]) ->>> df.freqItems(["c1", "c2"]).show() +>>> df.freqItems(["c1", "c2"]).show() # doctest: +SKIP +++ |c1_freqItems|c2_freqItems| +++
[spark] branch master updated: [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest due to Scala 2.13 failure
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d703808347c [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest due to Scala 2.13 failure d703808347c is described below commit d703808347ca61ed05541e7252a0e881b1be7431 Author: Dongjoon Hyun AuthorDate: Mon Feb 13 09:43:12 2023 +0900 [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest due to Scala 2.13 failure ### What changes were proposed in this pull request? This is a follow-up of #39947 to skip the `freqItems` doctest again. ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #39983 from dongjoon-hyun/SPARK-40453. Authored-by: Dongjoon Hyun Signed-off-by: Hyukjin Kwon --- python/pyspark/sql/dataframe.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index 5649d362b8b..d9de9ee14ac 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -4598,7 +4598,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): Examples >>> df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"]) ->>> df.freqItems(["c1", "c2"]).show() +>>> df.freqItems(["c1", "c2"]).show() # doctest: +SKIP +++ |c1_freqItems|c2_freqItems| +++
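The fix above appends the inline `# doctest: +SKIP` directive so the environment-sensitive `freqItems` example is parsed but never executed. A self-contained sketch of how that directive behaves, using Python's standard `doctest` machinery (the `demo` docstring and the undefined `unstable_result` call are illustrative):

```python
# Sketch of the "# doctest: +SKIP" directive: the skipped example would fail
# if executed (unstable_result is undefined), yet the run reports no failures
# because the runner never evaluates it. The second example still runs.
import doctest

def demo():
    """
    >>> unstable_result()  # doctest: +SKIP
    [1, 2, 3]
    >>> 1 + 1
    2
    """

parser = doctest.DocTestParser()
test = parser.get_doctest(demo.__doc__, globs={}, name="demo",
                          filename=None, lineno=0)
results = doctest.DocTestRunner().run(test)
assert results.failed == 0  # the unstable example was skipped, not run
```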
[spark] branch master updated: [SPARK-42409][BUILD] Upgrade ZSTD-JNI to 1.5.4-1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 17eff5d82d7 [SPARK-42409][BUILD] Upgrade ZSTD-JNI to 1.5.4-1 17eff5d82d7 is described below commit 17eff5d82d700d30707b42f68e8710894694b7b6 Author: yangjie01 AuthorDate: Sun Feb 12 13:11:06 2023 -0800 [SPARK-42409][BUILD] Upgrade ZSTD-JNI to 1.5.4-1 ### What changes were proposed in this pull request? This PR upgrades zstd-jni from 1.5.2-5 to 1.5.4-1. ### Why are the changes needed? This version adds support for a no-finalizer zstd `BufferDecompressingStream` - https://github.com/luben/zstd-jni/pull/244 All changes as follows: - https://github.com/luben/zstd-jni/compare/c1.5.2-5...v1.5.4-1 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions Closes #39981 from LuciferYang/SPARK-42409. Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3 index 7e6990b5b24..964c06cb74c 100644 --- a/dev/deps/spark-deps-hadoop-2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2-hive-2.3 @@ -268,4 +268,4 @@ xz/1.9//xz-1.9.jar zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar zookeeper-jute/3.6.3//zookeeper-jute-3.6.3.jar zookeeper/3.6.3//zookeeper-3.6.3.jar -zstd-jni/1.5.2-5//zstd-jni-1.5.2-5.jar +zstd-jni/1.5.4-1//zstd-jni-1.5.4-1.jar diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index d9ea8223eeb..0783abfe2fc 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -253,4 +253,4 @@ xz/1.9//xz-1.9.jar zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar 
zookeeper-jute/3.6.3//zookeeper-jute-3.6.3.jar zookeeper/3.6.3//zookeeper-3.6.3.jar -zstd-jni/1.5.2-5//zstd-jni-1.5.2-5.jar +zstd-jni/1.5.4-1//zstd-jni-1.5.4-1.jar diff --git a/pom.xml b/pom.xml index 8fdc06b335c..637e7a790ed 100644 --- a/pom.xml +++ b/pom.xml @@ -806,7 +806,7 @@ com.github.luben zstd-jni -1.5.2-5 +1.5.4-1 com.clearspring.analytics - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
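As an aside, not part of the commit: the `dev/deps/spark-deps-*` manifests touched above use a simple `name/version//jar-file` line format, so a version bump like this one can be spotted mechanically by diffing two manifests. A minimal sketch in Python (the helper names `parse_dep` and `version_bumps` are invented for illustration):

```python
def parse_dep(line: str) -> tuple[str, str]:
    """Split a 'name/version//jar-file' manifest line into (name, version)."""
    name_and_version = line.split("//", 1)[0]
    name, version = name_and_version.rsplit("/", 1)
    return name, version

def version_bumps(old_lines, new_lines):
    """Map artifact name -> (old_version, new_version) for changed entries."""
    before = dict(parse_dep(line) for line in old_lines)
    after = dict(parse_dep(line) for line in new_lines)
    return {name: (v, after[name])
            for name, v in before.items()
            if name in after and after[name] != v}

# Two tiny manifests, using lines from the diff above.
old = ["zookeeper/3.6.3//zookeeper-3.6.3.jar", "zstd-jni/1.5.2-5//zstd-jni-1.5.2-5.jar"]
new = ["zookeeper/3.6.3//zookeeper-3.6.3.jar", "zstd-jni/1.5.4-1//zstd-jni-1.5.4-1.jar"]
print(version_bumps(old, new))  # {'zstd-jni': ('1.5.2-5', '1.5.4-1')}
```

`rsplit("/", 1)` is used because artifact names may themselves contain dashes but never a trailing slash, while the version is always the last `/`-separated field before the `//`.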
[spark] branch master updated (4a27c604eef -> c49b23fd81e)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 4a27c604eef [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042
     add c49b23fd81e [SPARK-42400] Code clean up in org.apache.spark.storage

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/storage/BlockManager.scala        | 12 +---
 .../main/scala/org/apache/spark/storage/BlockManagerId.scala      |  2 +-
 .../scala/org/apache/spark/storage/BlockManagerMaster.scala       |  3 +--
 .../apache/spark/storage/BlockManagerMasterEndpoint.scala         |  8
 .../scala/org/apache/spark/storage/DiskBlockManager.scala         |  8
 core/src/main/scala/org/apache/spark/storage/RDDInfo.scala        |  6 ++
 .../apache/spark/storage/ShuffleBlockFetcherIterator.scala        |  8
 .../scala/org/apache/spark/storage/memory/MemoryStore.scala       |  2 +-
 .../spark/storage/ShuffleBlockFetcherIteratorSuite.scala          |  2 --
 9 files changed, 22 insertions(+), 29 deletions(-)
[spark] branch branch-3.4 updated: [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 2729e12d3ed [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042

2729e12d3ed is described below

commit 2729e12d3ed0e0513da7fcb2a4c6bdc78aa955ac
Author: itholic
AuthorDate: Sun Feb 12 19:13:05 2023 +0500

    [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042

    ### What changes were proposed in this pull request?
    This PR proposes to assign the name "INVALID_SET_SYNTAX" to _LEGACY_ERROR_TEMP_0042.

    ### Why are the changes needed?
    We should assign a proper name to each _LEGACY_ERROR_TEMP_* error class.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    `./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*"`

    Closes #39951 from itholic/LEGACY_0042.

    Lead-authored-by: itholic
    Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
    Signed-off-by: Max Gekk
    (cherry picked from commit 4a27c604eef6f06672a3d2aaa5e40285e15bacab)
    Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json   | 11 +
 .../spark/sql/errors/QueryParsingErrors.scala      |  2 +-
 .../spark/sql/execution/SparkSqlParserSuite.scala  | 28 +++---
 3 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index 3082e482392..33f31a97acf 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -1004,6 +1004,12 @@
     },
     "sqlState" : "42K07"
   },
+  "INVALID_SET_SYNTAX" : {
+    "message" : [
+      "Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, or include semicolon in value, please use backquotes, e.g., SET `key`=`value`."
+    ],
+    "sqlState" : "42000"
+  },
   "INVALID_SQL_ARG" : {
     "message" : [
       "The argument of `sql()` is invalid. Consider to replace it by a SQL literal."
@@ -2052,11 +2058,6 @@
       "Found duplicate clauses: ."
     ]
   },
-  "_LEGACY_ERROR_TEMP_0042" : {
-    "message" : [
-      "Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, or include semicolon in value, please use quotes, e.g., SET `key`=`value`."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_0043" : {
     "message" : [
       "Expected format is 'RESET' or 'RESET key'. If you want to include special characters in key, please use quotes, e.g., RESET `key`."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index accf5363d6c..57868020736 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -478,7 +478,7 @@ private[sql] object QueryParsingErrors extends QueryErrorsBase {
   }

   def unexpectedFormatForSetConfigurationError(ctx: ParserRuleContext): Throwable = {
-    new ParseException(errorClass = "_LEGACY_ERROR_TEMP_0042", ctx)
+    new ParseException(errorClass = "INVALID_SET_SYNTAX", ctx)
   }

   def invalidPropertyKeyForSetQuotedConfigurationError(
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
index 57c991c34d9..d6a3b74ee4c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
@@ -119,7 +119,7 @@ class SparkSqlParserSuite extends AnalysisTest with SharedSparkSession {
     val sql1 = "SET spark.sql.key value"
     checkError(
       exception = parseException(sql1),
-      errorClass = "_LEGACY_ERROR_TEMP_0042",
+      errorClass = "INVALID_SET_SYNTAX",
       parameters = Map.empty,
       context = ExpectedContext(
         fragment = sql1,
@@ -129,7 +129,7 @@ class SparkSqlParserSuite extends AnalysisTest with SharedSparkSession {
     val sql2 = "SET spark.sql.key 'value'"
     checkError(
       exception = parseException(sql2),
-      errorClass = "_LEGACY_ERROR_TEMP_0042",
+      errorClass = "INVALID_SET_SYNTAX",
       parameters = Map.empty,
       context = ExpectedContext(
         fragment = sql2,
@@ -139,7 +139,7 @@ class SparkSqlParserSuite extends AnalysisTest with SharedSparkSession {
     val sql3 = "SETspark.sql.key \"value\" "
     checkError(
       exception = parseException(sql3),
-      errorClass = "_LEGACY_ERROR_TEMP_0042",
+      errorClass = "INVALID_SET
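The new INVALID_SET_SYNTAX message spells out the accepted forms of the SET command: `SET`, `SET key`, and `SET key=value`, with backquotes for keys or values containing special characters. As a rough illustration of that rule only (a hand-written Python sketch, not Spark's actual ANTLR-based parser; `is_valid_set` is an invented name):

```python
import re

# Accepted forms per the error message: 'SET', 'SET key', 'SET key=value'.
# Keys/values with special characters must be backquoted, e.g. SET `key`=`value`.
_KEY = r"(`[^`]+`|[\w.]+)"
_SET_RE = re.compile(rf"^SET(\s+{_KEY}(\s*=\s*\S.*)?)?\s*$")

def is_valid_set(command: str) -> bool:
    """Return True if the command matches one of the accepted SET forms."""
    return _SET_RE.match(command.strip()) is not None

print(is_valid_set("SET"))                       # True
print(is_valid_set("SET spark.sql.key=value"))   # True
print(is_valid_set("SET `a key`=`a;value`"))     # True
print(is_valid_set("SET spark.sql.key value"))   # False -> INVALID_SET_SYNTAX
```

The last case mirrors `sql1` in the test suite above: a bare space between key and value (instead of `=`) is exactly what now raises the renamed error.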
[spark] branch master updated: [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4a27c604eef [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042

4a27c604eef is described below

commit 4a27c604eef6f06672a3d2aaa5e40285e15bacab
Author: itholic
AuthorDate: Sun Feb 12 19:13:05 2023 +0500

    [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042

    ### What changes were proposed in this pull request?
    This PR proposes to assign the name "INVALID_SET_SYNTAX" to _LEGACY_ERROR_TEMP_0042.

    ### Why are the changes needed?
    We should assign a proper name to each _LEGACY_ERROR_TEMP_* error class.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    `./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*"`

    Closes #39951 from itholic/LEGACY_0042.

    Lead-authored-by: itholic
    Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
    Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json   | 11 +
 .../spark/sql/errors/QueryParsingErrors.scala      |  2 +-
 .../spark/sql/execution/SparkSqlParserSuite.scala  | 28 +++---
 3 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index 20685622bc5..f7e4086263d 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -1004,6 +1004,12 @@
     },
     "sqlState" : "42K07"
   },
+  "INVALID_SET_SYNTAX" : {
+    "message" : [
+      "Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, or include semicolon in value, please use backquotes, e.g., SET `key`=`value`."
+    ],
+    "sqlState" : "42000"
+  },
   "INVALID_SQL_ARG" : {
     "message" : [
       "The argument of `sql()` is invalid. Consider to replace it by a SQL literal."
@@ -2052,11 +2058,6 @@
       "Found duplicate clauses: ."
     ]
   },
-  "_LEGACY_ERROR_TEMP_0042" : {
-    "message" : [
-      "Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, or include semicolon in value, please use quotes, e.g., SET `key`=`value`."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_0043" : {
     "message" : [
       "Expected format is 'RESET' or 'RESET key'. If you want to include special characters in key, please use quotes, e.g., RESET `key`."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index accf5363d6c..57868020736 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -478,7 +478,7 @@ private[sql] object QueryParsingErrors extends QueryErrorsBase {
   }

   def unexpectedFormatForSetConfigurationError(ctx: ParserRuleContext): Throwable = {
-    new ParseException(errorClass = "_LEGACY_ERROR_TEMP_0042", ctx)
+    new ParseException(errorClass = "INVALID_SET_SYNTAX", ctx)
   }

   def invalidPropertyKeyForSetQuotedConfigurationError(
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
index 57c991c34d9..d6a3b74ee4c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
@@ -119,7 +119,7 @@ class SparkSqlParserSuite extends AnalysisTest with SharedSparkSession {
     val sql1 = "SET spark.sql.key value"
     checkError(
       exception = parseException(sql1),
-      errorClass = "_LEGACY_ERROR_TEMP_0042",
+      errorClass = "INVALID_SET_SYNTAX",
       parameters = Map.empty,
       context = ExpectedContext(
         fragment = sql1,
@@ -129,7 +129,7 @@ class SparkSqlParserSuite extends AnalysisTest with SharedSparkSession {
     val sql2 = "SET spark.sql.key 'value'"
     checkError(
       exception = parseException(sql2),
-      errorClass = "_LEGACY_ERROR_TEMP_0042",
+      errorClass = "INVALID_SET_SYNTAX",
       parameters = Map.empty,
       context = ExpectedContext(
         fragment = sql2,
@@ -139,7 +139,7 @@ class SparkSqlParserSuite extends AnalysisTest with SharedSparkSession {
     val sql3 = "SETspark.sql.key \"value\" "
    checkError(
       exception = parseException(sql3),
-      errorClass = "_LEGACY_ERROR_TEMP_0042",
+      errorClass = "INVALID_SET_SYNTAX",
       parameters = Map.empty,
       context = ExpectedContext(
         fragment = "SETspark.s