[spark] branch branch-3.4 updated: [SPARK-42401][SQL] Set `containsNull` correctly in the data type for array_insert/array_append

2023-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new a71aed977f5 [SPARK-42401][SQL] Set `containsNull` correctly in the 
data type for array_insert/array_append
a71aed977f5 is described below

commit a71aed977f5006ad271f947a6ae3cdd38349ed8e
Author: Bruce Robbins 
AuthorDate: Mon Feb 13 15:49:46 2023 +0900

[SPARK-42401][SQL] Set `containsNull` correctly in the data type for 
array_insert/array_append

### What changes were proposed in this pull request?

In the `DataType` instance returned by `ArrayInsert#dataType` and 
`ArrayAppend#dataType`, set `containsNull` to true if either

- the input array has `containsNull` set to true
- the expression to be inserted/appended is nullable.

### Why are the changes needed?

The following two queries return the wrong answer:
```
spark-sql> select array_insert(array(1, 2, 3, 4), 5, cast(null as int));
[1,2,3,4,0] <== should be [1,2,3,4,null]
Time taken: 3.879 seconds, Fetched 1 row(s)
spark-sql> select array_append(array(1, 2, 3, 4), cast(null as int));
[1,2,3,4,0] <== should be [1,2,3,4,null]
Time taken: 0.068 seconds, Fetched 1 row(s)
spark-sql>
```
The following two queries throw a `NullPointerException`:
```
spark-sql> select array_insert(array('1', '2', '3', '4'), 5, cast(null as 
string));
23/02/10 11:24:59 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.NullPointerException
at 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
...
spark-sql> select array_append(array('1', '2', '3', '4'), cast(null as 
string));
23/02/10 11:25:10 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.NullPointerException
at 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
...
spark-sql>
```
The bug arises because both `ArrayInsert` and `ArrayAppend` use the first
child's data type as the function's data type. That is, they use the first
child's `containsNull` setting, regardless of whether the insert/append
operation might produce an array containing a null value.
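The corrected rule can be sketched outside Spark's own classes as follows (a
minimal illustration; `arrayResultType` is a hypothetical helper, while the
actual fix calls `asNullable` on the first child's data type):
```scala
import org.apache.spark.sql.types.{ArrayType, IntegerType}

// Hypothetical helper mirroring the fixed rule: the result admits nulls if the
// source array already does, or if the inserted/appended item is nullable.
def arrayResultType(src: ArrayType, itemNullable: Boolean): ArrayType =
  if (itemNullable || src.containsNull) src.copy(containsNull = true) else src

// array(1, 2, 3, 4) has type ArrayType(IntegerType, containsNull = false);
// appending CAST(NULL AS INT) (a nullable item) must yield containsNull = true.
assert(arrayResultType(ArrayType(IntegerType, containsNull = false), itemNullable = true)
  == ArrayType(IntegerType, containsNull = true))
```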

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.

Closes #39970 from bersprockets/array_insert_anomaly.

Authored-by: Bruce Robbins 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 718b6b7ed8277d5f6577367ab0d49f60f9777df7)
Signed-off-by: Hyukjin Kwon 
---
 .../sql/catalyst/expressions/collectionOperations.scala|  4 ++--
 .../catalyst/expressions/CollectionExpressionsSuite.scala  | 14 ++
 .../org/apache/spark/sql/DataFrameFunctionsSuite.scala | 14 ++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 92a3127d438..53d8ff160c0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -4840,7 +4840,7 @@ case class ArrayInsert(srcArrayExpr: Expression, posExpr: 
Expression, itemExpr:
   override def third: Expression = itemExpr
 
   override def prettyName: String = "array_insert"
-  override def dataType: DataType = first.dataType
+  override def dataType: DataType = if (third.nullable) 
first.dataType.asNullable else first.dataType
   override def nullable: Boolean = first.nullable | second.nullable
 
   @transient private lazy val elementType: DataType =
@@ -5024,7 +5024,7 @@ case class ArrayAppend(left: Expression, right: 
Expression)
* Returns the [[DataType]] of the result of evaluating this expression. It 
is invalid to query
* the dataType of an unresolved expression (i.e., when `resolved` == false).
*/
-  override def dataType: DataType = left.dataType
+  override def dataType: DataType = if (right.nullable) 
left.dataType.asNullable else left.dataType

[spark] branch master updated: [SPARK-42401][SQL] Set `containsNull` correctly in the data type for array_insert/array_append

2023-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 718b6b7ed82 [SPARK-42401][SQL] Set `containsNull` correctly in the 
data type for array_insert/array_append
718b6b7ed82 is described below

commit 718b6b7ed8277d5f6577367ab0d49f60f9777df7
Author: Bruce Robbins 
AuthorDate: Mon Feb 13 15:49:46 2023 +0900

[SPARK-42401][SQL] Set `containsNull` correctly in the data type for 
array_insert/array_append

### What changes were proposed in this pull request?

In the `DataType` instance returned by `ArrayInsert#dataType` and 
`ArrayAppend#dataType`, set `containsNull` to true if either

- the input array has `containsNull` set to true
- the expression to be inserted/appended is nullable.

### Why are the changes needed?

The following two queries return the wrong answer:
```
spark-sql> select array_insert(array(1, 2, 3, 4), 5, cast(null as int));
[1,2,3,4,0] <== should be [1,2,3,4,null]
Time taken: 3.879 seconds, Fetched 1 row(s)
spark-sql> select array_append(array(1, 2, 3, 4), cast(null as int));
[1,2,3,4,0] <== should be [1,2,3,4,null]
Time taken: 0.068 seconds, Fetched 1 row(s)
spark-sql>
```
The following two queries throw a `NullPointerException`:
```
spark-sql> select array_insert(array('1', '2', '3', '4'), 5, cast(null as 
string));
23/02/10 11:24:59 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.NullPointerException
at 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
...
spark-sql> select array_append(array('1', '2', '3', '4'), cast(null as 
string));
23/02/10 11:25:10 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.NullPointerException
at 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
...
spark-sql>
```
The bug arises because both `ArrayInsert` and `ArrayAppend` use the first
child's data type as the function's data type. That is, they use the first
child's `containsNull` setting, regardless of whether the insert/append
operation might produce an array containing a null value.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.
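For illustration only, a hedged sketch (not the PR's actual test) of the kind
of assertion those unit tests can now make:
```scala
import org.apache.spark.sql.catalyst.expressions.{ArrayAppend, Literal}
import org.apache.spark.sql.types.{ArrayType, IntegerType}

// A non-nullable int array appended with a nullable item: the result type must
// report containsNull = true, so codegen reserves a proper null slot.
val arr  = Literal.create(Seq(1, 2, 3, 4), ArrayType(IntegerType, containsNull = false))
val item = Literal.create(null, IntegerType) // nullable
assert(ArrayAppend(arr, item).dataType == ArrayType(IntegerType, containsNull = true))
```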

Closes #39970 from bersprockets/array_insert_anomaly.

Authored-by: Bruce Robbins 
Signed-off-by: Hyukjin Kwon 
---
 .../sql/catalyst/expressions/collectionOperations.scala|  4 ++--
 .../catalyst/expressions/CollectionExpressionsSuite.scala  | 14 ++
 .../org/apache/spark/sql/DataFrameFunctionsSuite.scala | 14 ++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 92a3127d438..53d8ff160c0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -4840,7 +4840,7 @@ case class ArrayInsert(srcArrayExpr: Expression, posExpr: 
Expression, itemExpr:
   override def third: Expression = itemExpr
 
   override def prettyName: String = "array_insert"
-  override def dataType: DataType = first.dataType
+  override def dataType: DataType = if (third.nullable) 
first.dataType.asNullable else first.dataType
   override def nullable: Boolean = first.nullable | second.nullable
 
   @transient private lazy val elementType: DataType =
@@ -5024,7 +5024,7 @@ case class ArrayAppend(left: Expression, right: 
Expression)
* Returns the [[DataType]] of the result of evaluating this expression. It 
is invalid to query
* the dataType of an unresolved expression (i.e., when `resolved` == false).
*/
-  override def dataType: DataType = left.dataType
+  override def dataType: DataType = if (right.nullable) 
left.dataType.asNullable else left.dataType
   protected def withNewChildrenInternal(newLeft: Expression, newRight: 
Expression): ArrayAppend =
   

[spark] branch branch-3.4 updated: [SPARK-42269][CONNECT][PYTHON] Support complex return types in DDL strings

2023-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 95919e26993 [SPARK-42269][CONNECT][PYTHON] Support complex return 
types in DDL strings
95919e26993 is described below

commit 95919e269930f3d1f3716b869e1abb25185d8a44
Author: Xinrong Meng 
AuthorDate: Mon Feb 13 14:10:34 2023 +0900

[SPARK-42269][CONNECT][PYTHON] Support complex return types in DDL strings

### What changes were proposed in this pull request?
Support complex return types in DDL strings.

### Why are the changes needed?
Parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?
Yes.

```py
# BEFORE
>>> spark.range(2).select(udf(lambda x: (x, x), "struct")("id"))
...
AssertionError: returnType should be singular

>>> spark.udf.register('f', lambda x: (x, x), "struct")
...
AssertionError: returnType should be singular

# AFTER
>>> spark.range(2).select(udf(lambda x: (x, x), "struct")("id"))
DataFrame[(id): struct]

>>> spark.udf.register('f', lambda x: (x, x), "struct")
 at 0x7faee0eaaca0>

```
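For reference, a minimal Scala sketch of the JVM-side equivalent, where DDL
strings with nested struct types are parsed via the public `StructType.fromDDL`
(the schema string below is illustrative only):
```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Parse a table-schema DDL string that contains a complex (struct) column.
val schema: StructType = StructType.fromDDL("id BIGINT, pair STRUCT<a: INT, b: INT>")
val pairField: StructField = schema("pair")
// pairField.dataType is a StructType with two IntegerType fields, a and b.
```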

### How was this patch tested?
Unit tests.

Closes #39964 from xinrong-meng/collection_ret_type.

Authored-by: Xinrong Meng 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 3985b91633f5e49c8c97433651f81604dad193e9)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/client.py   | 15 ++
 python/pyspark/sql/connect/types.py| 19 ++
 python/pyspark/sql/connect/udf.py  | 23 --
 .../pyspark/sql/tests/connect/test_parity_udf.py   |  5 -
 4 files changed, 25 insertions(+), 37 deletions(-)

diff --git a/python/pyspark/sql/connect/client.py 
b/python/pyspark/sql/connect/client.py
index 943a7e70464..2c07596fec0 100644
--- a/python/pyspark/sql/connect/client.py
+++ b/python/pyspark/sql/connect/client.py
@@ -68,12 +68,12 @@ from pyspark.sql.connect.expressions import (
 PythonUDF,
 CommonInlineUserDefinedFunction,
 )
+from pyspark.sql.connect.types import parse_data_type
 from pyspark.sql.types import (
 DataType,
 StructType,
 StructField,
 )
-from pyspark.sql.utils import is_remote
 from pyspark.serializers import CloudPickleSerializer
 from pyspark.rdd import PythonEvalType
 
@@ -443,23 +443,12 @@ class SparkConnectClient(object):
 """Create a temporary UDF in the session catalog on the other side. We 
generate a
 temporary name for it."""
 
-from pyspark.sql import SparkSession as PySparkSession
-
 if name is None:
 name = f"fun_{uuid.uuid4().hex}"
 
 # convert str return_type to DataType
 if isinstance(return_type, str):
-
-assert is_remote()
-return_type_schema = (  # a workaround to parse the DataType from 
DDL strings
-PySparkSession.builder.getOrCreate()
-.createDataFrame(data=[], schema=return_type)
-.schema
-)
-assert len(return_type_schema.fields) == 1, "returnType should be 
singular"
-return_type = return_type_schema.fields[0].dataType
-
+return_type = parse_data_type(return_type)
 # construct a PythonUDF
 py_udf = PythonUDF(
 output_type=return_type.json(),
diff --git a/python/pyspark/sql/connect/types.py 
b/python/pyspark/sql/connect/types.py
index f12d6c4827e..6b9975c52cd 100644
--- a/python/pyspark/sql/connect/types.py
+++ b/python/pyspark/sql/connect/types.py
@@ -51,6 +51,7 @@ from pyspark.sql.types import (
 )
 
 import pyspark.sql.connect.proto as pb2
+from pyspark.sql.utils import is_remote
 
 
 JVM_BYTE_MIN: int = -(1 << 7)
@@ -337,3 +338,21 @@ def from_arrow_schema(arrow_schema: "pa.Schema") -> 
StructType:
 for field in arrow_schema
 ]
 )
+
+
+def parse_data_type(data_type: str) -> DataType:
+# Currently we don't have a way to have a current Spark session in Spark 
Connect, and
+# pyspark.sql.SparkSession has a centralized logic to control the session 
creation.
+# So uses pyspark.sql.SparkSession for now. Should replace this to using 
the current
+# Spark session for Spark Connect in the future.
+from pyspark.sql import SparkSession as PySparkSession
+
+assert is_remote()
+return_type_schema = (
+PySparkSession.builder.getOrCreate().createDataFrame(data=[], 
schema=data_type).schema
+)
+if len(return_type_schema.fields) == 1:
+return_type = return_type_schema.fields[0].dataType
+else:
+return_type = return_type_schema
+return return_type
diff --git a/python/pyspark/sql/connect/udf.py 
b/python/pyspark/sql/connect/udf.py
index 573d

[spark] branch master updated: [SPARK-42269][CONNECT][PYTHON] Support complex return types in DDL strings

2023-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3985b91633f [SPARK-42269][CONNECT][PYTHON] Support complex return 
types in DDL strings
3985b91633f is described below

commit 3985b91633f5e49c8c97433651f81604dad193e9
Author: Xinrong Meng 
AuthorDate: Mon Feb 13 14:10:34 2023 +0900

[SPARK-42269][CONNECT][PYTHON] Support complex return types in DDL strings

### What changes were proposed in this pull request?
Support complex return types in DDL strings.

### Why are the changes needed?
Parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?
Yes.

```py
# BEFORE
>>> spark.range(2).select(udf(lambda x: (x, x), "struct")("id"))
...
AssertionError: returnType should be singular

>>> spark.udf.register('f', lambda x: (x, x), "struct")
...
AssertionError: returnType should be singular

# AFTER
>>> spark.range(2).select(udf(lambda x: (x, x), "struct")("id"))
DataFrame[(id): struct]

>>> spark.udf.register('f', lambda x: (x, x), "struct")
 at 0x7faee0eaaca0>

```

### How was this patch tested?
Unit tests.

Closes #39964 from xinrong-meng/collection_ret_type.

Authored-by: Xinrong Meng 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/client.py   | 15 ++
 python/pyspark/sql/connect/types.py| 19 ++
 python/pyspark/sql/connect/udf.py  | 23 --
 .../pyspark/sql/tests/connect/test_parity_udf.py   |  5 -
 4 files changed, 25 insertions(+), 37 deletions(-)

diff --git a/python/pyspark/sql/connect/client.py 
b/python/pyspark/sql/connect/client.py
index 943a7e70464..2c07596fec0 100644
--- a/python/pyspark/sql/connect/client.py
+++ b/python/pyspark/sql/connect/client.py
@@ -68,12 +68,12 @@ from pyspark.sql.connect.expressions import (
 PythonUDF,
 CommonInlineUserDefinedFunction,
 )
+from pyspark.sql.connect.types import parse_data_type
 from pyspark.sql.types import (
 DataType,
 StructType,
 StructField,
 )
-from pyspark.sql.utils import is_remote
 from pyspark.serializers import CloudPickleSerializer
 from pyspark.rdd import PythonEvalType
 
@@ -443,23 +443,12 @@ class SparkConnectClient(object):
 """Create a temporary UDF in the session catalog on the other side. We 
generate a
 temporary name for it."""
 
-from pyspark.sql import SparkSession as PySparkSession
-
 if name is None:
 name = f"fun_{uuid.uuid4().hex}"
 
 # convert str return_type to DataType
 if isinstance(return_type, str):
-
-assert is_remote()
-return_type_schema = (  # a workaround to parse the DataType from 
DDL strings
-PySparkSession.builder.getOrCreate()
-.createDataFrame(data=[], schema=return_type)
-.schema
-)
-assert len(return_type_schema.fields) == 1, "returnType should be 
singular"
-return_type = return_type_schema.fields[0].dataType
-
+return_type = parse_data_type(return_type)
 # construct a PythonUDF
 py_udf = PythonUDF(
 output_type=return_type.json(),
diff --git a/python/pyspark/sql/connect/types.py 
b/python/pyspark/sql/connect/types.py
index f12d6c4827e..6b9975c52cd 100644
--- a/python/pyspark/sql/connect/types.py
+++ b/python/pyspark/sql/connect/types.py
@@ -51,6 +51,7 @@ from pyspark.sql.types import (
 )
 
 import pyspark.sql.connect.proto as pb2
+from pyspark.sql.utils import is_remote
 
 
 JVM_BYTE_MIN: int = -(1 << 7)
@@ -337,3 +338,21 @@ def from_arrow_schema(arrow_schema: "pa.Schema") -> 
StructType:
 for field in arrow_schema
 ]
 )
+
+
+def parse_data_type(data_type: str) -> DataType:
+# Currently we don't have a way to have a current Spark session in Spark 
Connect, and
+# pyspark.sql.SparkSession has a centralized logic to control the session 
creation.
+# So uses pyspark.sql.SparkSession for now. Should replace this to using 
the current
+# Spark session for Spark Connect in the future.
+from pyspark.sql import SparkSession as PySparkSession
+
+assert is_remote()
+return_type_schema = (
+PySparkSession.builder.getOrCreate().createDataFrame(data=[], 
schema=data_type).schema
+)
+if len(return_type_schema.fields) == 1:
+return_type = return_type_schema.fields[0].dataType
+else:
+return_type = return_type_schema
+return return_type
diff --git a/python/pyspark/sql/connect/udf.py 
b/python/pyspark/sql/connect/udf.py
index 573d8f582e2..39c31e85992 100644
--- a/python/pyspark/sql/connect/udf.py
+++ b/python/pyspark/sql/connect/udf.py
@@ -33

[spark] branch master updated: [SPARK-42034] QueryExecutionListener and Observation API do not work with `foreach` / `reduce` / `foreachPartition` action

2023-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a1649ad2429 [SPARK-42034] QueryExecutionListener and Observation API 
do not work with `foreach` / `reduce` / `foreachPartition` action
a1649ad2429 is described below

commit a1649ad24298d988267acb8588d19848c7fb16c4
Author: 佘志铭 
AuthorDate: Mon Feb 13 14:09:54 2023 +0900

[SPARK-42034] QueryExecutionListener and Observation API do not work with 
`foreach` / `reduce` / `foreachPartition` action

### What changes were proposed in this pull request?

Add a name parameter for the `foreach`/`reduce`/`foreachPartition` operators
in `Dataset#withNewRDDExecutionId`, because the QueryExecutionListener and
Observation APIs are triggered only when the operators carry a name.


https://github.com/apache/spark/blob/84ddd409c11e4da769c5b1f496f2b61c3d928c07/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L181

### Why are the changes needed?

The QueryExecutionListener and Observation APIs are triggered only when the
operators carry a name.
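A hedged sketch of why the name matters: a registered listener only reports an
action when the execution carries a name (the session setup below is
illustrative):
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

val spark = SparkSession.builder().master("local[*]").appName("listener-demo").getOrCreate()

// The listener receives the action's name as `funcName`; before this change,
// foreach/reduce/foreachPartition ran without a name and were not reported.
spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"action '$funcName' finished in ${durationNs / 1e6} ms")
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"action '$funcName' failed: ${exception.getMessage}")
})

spark.range(10).foreach(_ => ()) // now reported to the listener as "foreach"
```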

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Add two unit tests.

Closes #39976 from ming95/SPARK-42034.

Authored-by: 佘志铭 
Signed-off-by: Hyukjin Kwon 
---
 .../main/scala/org/apache/spark/sql/Dataset.scala  | 10 
 .../scala/org/apache/spark/sql/DatasetSuite.scala  | 13 ++
 .../spark/sql/util/DataFrameCallbackSuite.scala| 28 ++
 3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 28177b90c7e..edcfad0c798 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1858,7 +1858,7 @@ class Dataset[T] private[sql](
* @group action
* @since 1.6.0
*/
-  def reduce(func: (T, T) => T): T = withNewRDDExecutionId {
+  def reduce(func: (T, T) => T): T = withNewRDDExecutionId("reduce") {
 rdd.reduce(func)
   }
 
@@ -3336,7 +3336,7 @@ class Dataset[T] private[sql](
* @group action
* @since 1.6.0
*/
-  def foreach(f: T => Unit): Unit = withNewRDDExecutionId {
+  def foreach(f: T => Unit): Unit = withNewRDDExecutionId("foreach") {
 rdd.foreach(f)
   }
 
@@ -3355,7 +3355,7 @@ class Dataset[T] private[sql](
* @group action
* @since 1.6.0
*/
-  def foreachPartition(f: Iterator[T] => Unit): Unit = withNewRDDExecutionId {
+  def foreachPartition(f: Iterator[T] => Unit): Unit = 
withNewRDDExecutionId("foreachPartition") {
 rdd.foreachPartition(f)
   }
 
@@ -4148,8 +4148,8 @@ class Dataset[T] private[sql](
* them with an execution. Before performing the action, the metrics of the 
executed plan will be
* reset.
*/
-  private def withNewRDDExecutionId[U](body: => U): U = {
-SQLExecution.withNewExecutionId(rddQueryExecution) {
+  private def withNewRDDExecutionId[U](name: String)(body: => U): U = {
+SQLExecution.withNewExecutionId(rddQueryExecution, Some(name)) {
   rddQueryExecution.executedPlan.resetMetrics()
   body
 }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
index 86e640a4fa8..263e361413c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@@ -960,6 +960,19 @@ class DatasetSuite extends QueryTest
 observe(spark.range(1, 10, 1, 11), Map("percentile_approx_val" -> 5))
   }
 
+  test("observation on datasets when a DataSet trigger foreach action") {
+def f(): Unit = {}
+
+val namedObservation = Observation("named")
+val observed_df = spark.range(100).observe(
+  namedObservation, percentile_approx($"id", lit(0.5), 
lit(100)).as("percentile_approx_val"))
+
+observed_df.foreach(r => f)
+val expected = Map("percentile_approx_val" -> 49)
+
+assert(namedObservation.get === expected)
+  }
+
   test("sample with replacement") {
 val n = 100
 val data = sparkContext.parallelize(1 to n, 2).toDS()
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
index 2fc1f10d3ea..f046daacb91 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
@@ -96,6 +96,34 @@ class DataFrameCallbackSuite extends QueryTest
 spark.listenerManager.unregister(listener)
   }
 
+  test("execut

[spark] branch branch-3.4 updated: [SPARK-42331][SQL] Fix metadata col can not been resolved

2023-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 85ec8d3d19a [SPARK-42331][SQL] Fix metadata col can not been resolved
85ec8d3d19a is described below

commit 85ec8d3d19a43658f4eb3c9cb0d19dd63bbf43d9
Author: ulysses-you 
AuthorDate: Mon Feb 13 12:43:33 2023 +0800

[SPARK-42331][SQL] Fix metadata col can not been resolved

### What changes were proposed in this pull request?

This PR makes the metadata output consistent during analysis by checking the
existing output and reusing it if present.

This PR also deduplicates the metadata output when merging it into the output.

### Why are the changes needed?

Let's say a process of resolving metadata:
```
Project (_metadata.file_size)
  Filter (_metadata.file_size > 0)
Relation
```

1. `ResolveReferences` resolves _metadata.file_size for `Filter`
2. `ResolveReferences` cannot resolve _metadata.file_size for `Project`,
because `Filter` is not yet resolved (its data type does not match)
3. then `AddMetadataColumns` merges the metadata output into the output
4. the next round of `ResolveReferences` still cannot resolve
_metadata.file_size for `Project`, since columns with conflicting names are
filtered out (the output already contains the metadata output), see the code:
```
def isOutputColumn(col: MetadataColumn): Boolean = {
  outputNames.exists(name => resolve(col.name, name))
}
// filter out metadata columns that have names conflicting with 
output columns. if the table
// has a column "line" and the table can produce a metadata column 
called "line", then the
// data column should be returned, not the metadata column.
hasMeta.metadataColumns.filterNot(isOutputColumn).toAttributes
   ```
   And we also cannot skip metadata columns when filtering out conflicting
names, otherwise the newly generated metadata attribute would have a different
expr id from the previous one.

One failed example:
```scala
SELECT _metadata.row_index  FROM t WHERE _metadata.row_index >= 0;
```
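The same failure can be reproduced through the DataFrame API; a hedged sketch,
assuming an active `spark` session and a hypothetical Parquet path (any
file-based relation exposing `_metadata` works):
```scala
import org.apache.spark.sql.functions.col

// Filter on a metadata column first, then project it: before this fix the
// projection could fail to resolve once AddMetadataColumns had already merged
// the metadata attribute into the relation's output for the filter.
val df = spark.read.parquet("/path/to/t") // hypothetical path
  .where(col("_metadata.row_index") >= 0)
  .select(col("_metadata.row_index"))
df.show()
```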

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

Add tests for v1, v2 and streaming relations.

Closes #39870 from ulysses-you/SPARK-42331.

Authored-by: ulysses-you 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 5705436f70e6c6d5a127db7773d3627c8e3d695a)
Signed-off-by: Wenchen Fan 
---
 .../sql/catalyst/plans/logical/LogicalPlan.scala   | 23 ++
 .../datasources/v2/DataSourceV2Relation.scala  | 17 -
 .../execution/datasources/LogicalRelation.scala| 16 -
 .../execution/streaming/StreamingRelation.scala| 18 +-
 .../spark/sql/connector/MetadataColumnSuite.scala  | 13 ++
 .../datasources/FileMetadataStructSuite.scala  | 28 +-
 6 files changed, 79 insertions(+), 36 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
index 9a7726f6a03..5a7dcff3667 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
@@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.{AliasAwareQueryOutputOrdering, 
QueryPlan}
 import 
org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats
 import org.apache.spark.sql.catalyst.trees.{BinaryLike, LeafLike, UnaryLike}
+import org.apache.spark.sql.catalyst.util.MetadataColumnHelper
 import org.apache.spark.sql.errors.{QueryCompilationErrors, 
QueryExecutionErrors}
 import org.apache.spark.sql.types.{DataType, StructType}
 
@@ -317,5 +318,27 @@ object LogicalPlanIntegrity {
  * A logical plan node that can generate metadata columns
  */
 trait ExposesMetadataColumns extends LogicalPlan {
+  protected def metadataOutputWithOutConflicts(
+  metadataOutput: Seq[AttributeReference]): Seq[AttributeReference] = {
+// If `metadataColFromOutput` is not empty that means `AddMetadataColumns` 
merged
+// metadata output into output. We should still return an available 
metadata output
+// so that the rule `ResolveReferences` can resolve metadata column 
correctly.
+val metadataColFromOutput = output.filter(_.isMetadataCol)
+if (metadataColFromOutput.isEmpty) {
+  val resolve = conf.resolver
+  val outputNames = outputSet.map(_.name)
+
+  def isOutputColumn(col: AttributeReference): Boolean = {
+outputNames.exists(name => resolve(col.na

[spark] branch master updated: [SPARK-42331][SQL] Fix metadata col can not been resolved

2023-02-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5705436f70e [SPARK-42331][SQL] Fix metadata col can not been resolved
5705436f70e is described below

commit 5705436f70e6c6d5a127db7773d3627c8e3d695a
Author: ulysses-you 
AuthorDate: Mon Feb 13 12:43:33 2023 +0800

[SPARK-42331][SQL] Fix metadata col can not been resolved

### What changes were proposed in this pull request?

This PR makes the metadata output consistent during analysis by checking the
existing output and reusing it if present.

This PR also deduplicates the metadata output when merging it into the output.

### Why are the changes needed?

Let's say a process of resolving metadata:
```
Project (_metadata.file_size)
  Filter (_metadata.file_size > 0)
Relation
```

1. `ResolveReferences` resolves _metadata.file_size for `Filter`
2. `ResolveReferences` cannot resolve _metadata.file_size for `Project`,
because `Filter` is not yet resolved (its data type does not match)
3. then `AddMetadataColumns` merges the metadata output into the output
4. the next round of `ResolveReferences` still cannot resolve
_metadata.file_size for `Project`, since columns with conflicting names are
filtered out (the output already contains the metadata output), see the code:
```
def isOutputColumn(col: MetadataColumn): Boolean = {
  outputNames.exists(name => resolve(col.name, name))
}
// filter out metadata columns that have names conflicting with 
output columns. if the table
// has a column "line" and the table can produce a metadata column 
called "line", then the
// data column should be returned, not the metadata column.
hasMeta.metadataColumns.filterNot(isOutputColumn).toAttributes
   ```
   And we also cannot skip metadata columns when filtering out conflicting
names, otherwise the newly generated metadata attribute would have a different
expr id from the previous one.

One failed example:
```scala
SELECT _metadata.row_index  FROM t WHERE _metadata.row_index >= 0;
```

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

Add tests for v1, v2 and streaming relations.

Closes #39870 from ulysses-you/SPARK-42331.

Authored-by: ulysses-you 
Signed-off-by: Wenchen Fan 
---
 .../sql/catalyst/plans/logical/LogicalPlan.scala   | 23 ++
 .../datasources/v2/DataSourceV2Relation.scala  | 17 -
 .../execution/datasources/LogicalRelation.scala| 16 -
 .../execution/streaming/StreamingRelation.scala| 18 +-
 .../spark/sql/connector/MetadataColumnSuite.scala  | 13 ++
 .../datasources/FileMetadataStructSuite.scala  | 28 +-
 6 files changed, 79 insertions(+), 36 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
index 9a7726f6a03..5a7dcff3667 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
@@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.{AliasAwareQueryOutputOrdering, 
QueryPlan}
 import 
org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats
 import org.apache.spark.sql.catalyst.trees.{BinaryLike, LeafLike, UnaryLike}
+import org.apache.spark.sql.catalyst.util.MetadataColumnHelper
 import org.apache.spark.sql.errors.{QueryCompilationErrors, 
QueryExecutionErrors}
 import org.apache.spark.sql.types.{DataType, StructType}
 
@@ -317,5 +318,27 @@ object LogicalPlanIntegrity {
  * A logical plan node that can generate metadata columns
  */
 trait ExposesMetadataColumns extends LogicalPlan {
+  protected def metadataOutputWithOutConflicts(
+  metadataOutput: Seq[AttributeReference]): Seq[AttributeReference] = {
+// If `metadataColFromOutput` is not empty that means `AddMetadataColumns` 
merged
+// metadata output into output. We should still return an available 
metadata output
+// so that the rule `ResolveReferences` can resolve metadata column 
correctly.
+val metadataColFromOutput = output.filter(_.isMetadataCol)
+if (metadataColFromOutput.isEmpty) {
+  val resolve = conf.resolver
+  val outputNames = outputSet.map(_.name)
+
+  def isOutputColumn(col: AttributeReference): Boolean = {
+outputNames.exists(name => resolve(col.name, name))
+  }
+  // filter out the metadata struct column if it has the name conflicting 
with output c

svn commit: r60067 - /release/spark/spark-3.1.3/

2023-02-12 Thread srowen
Author: srowen
Date: Mon Feb 13 04:32:53 2023
New Revision: 60067

Log:
Remove EOL Spark 3.1.x

Removed:
release/spark/spark-3.1.3/





[spark] branch master updated: [SPARK-42391][CORE][TESTS] Close live `AppStatusStore` in the finally block for test cases

2023-02-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 89d1d17e57a [SPARK-42391][CORE][TESTS] Close live `AppStatusStore` in 
the finally block for test cases
89d1d17e57a is described below

commit 89d1d17e57abc4c825cb9259f7a319b80e0d854a
Author: yangjie01 
AuthorDate: Sun Feb 12 20:14:52 2023 -0800

[SPARK-42391][CORE][TESTS] Close live `AppStatusStore` in the finally block 
for test cases

### What changes were proposed in this pull request?
`AppStatusStore.createLiveStore` returns a RocksDB-backed `AppStatusStore`
when `LIVE_UI_LOCAL_STORE_DIR` is configured; it should be closed in a finally
block so that test cases release its resources.

There are 4 test suites that use the `AppStatusStore.createLiveStore` function:

- `AppStatusStoreSuite`: one case does not close the `AppStatusStore`
- `StagePageSuite`: already calls close in a `finally` block
- `AllExecutionsPageSuite`: already calls close in `after`
- `SQLAppStatusListenerSuite`: already calls close in `after`

Only `AppStatusStoreSuite` leaves an `AppStatusStore` without a manual close,
so this PR makes the following changes:

- For `SPARK-36038: speculation summary should not be present if there are
no speculative tasks` in `AppStatusStoreSuite`, close the `AppStatusStore` in
a finally block
- For `SPARK-26260: summary should contain only successful tasks' metrics
(store = $hint)`, move the existing `AppStatusStore.close` call into the
finally block

### Why are the changes needed?
Call `AppStatusStore.close` in the finally block to release possible 
RocksDB resources.
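A hedged sketch of the pattern (not the suite's exact code):
```scala
import org.apache.spark.SparkConf
import org.apache.spark.status.AppStatusStore

val appStore = AppStatusStore.createLiveStore(new SparkConf())
try {
  // ... write task data and run assertions against appStore ...
} finally {
  appStore.close() // releases the RocksDB-backed store when a local dir is configured
}
```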

### Does this PR introduce _any_ user-facing change?
No, just for tests

### How was this patch tested?
- Pass GitHub Actions
- Manual test

```
export LIVE_UI_LOCAL_STORE_DIR=/tmp/spark-ui
build/sbt clean "core/testOnly org.apache.spark.status.AppStatusStoreSuite"
```
All tests passed

Closes #39961 from LuciferYang/SPARK-42391.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 .../apache/spark/status/AppStatusStoreSuite.scala  | 173 +++--
 1 file changed, 90 insertions(+), 83 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala 
b/core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala
index d38b0857e57..ccf6c9184cc 100644
--- a/core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala
+++ b/core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala
@@ -133,78 +133,81 @@ class AppStatusStoreSuite extends SparkFunSuite {
   cases.foreach { case (hint, appStore) =>
 test(s"SPARK-26260: summary should contain only successful tasks' metrics 
(store = $hint)") {
   assume(appStore != null)
-  val store = appStore.store
-
-  // Success and failed tasks metrics
-  for (i <- 0 to 5) {
-if (i % 2 == 0) {
-  writeTaskDataToStore(i, store, "FAILED")
-} else {
-  writeTaskDataToStore(i, store, "SUCCESS")
+  try {
+val store = appStore.store
+
+// Success and failed tasks metrics
+for (i <- 0 to 5) {
+  if (i % 2 == 0) {
+writeTaskDataToStore(i, store, "FAILED")
+  } else {
+writeTaskDataToStore(i, store, "SUCCESS")
+  }
 }
-  }
 
-  // Running tasks metrics (-1 = no metrics reported, positive = metrics 
have been reported)
-  Seq(-1, 6).foreach { metric =>
-writeTaskDataToStore(metric, store, "RUNNING")
-  }
+// Running tasks metrics (-1 = no metrics reported, positive = metrics 
have been reported)
+Seq(-1, 6).foreach { metric =>
+  writeTaskDataToStore(metric, store, "RUNNING")
+}
 
-  /**
-   * Following are the tasks metrics,
-   * 1, 3, 5 => Success
-   * 0, 2, 4 => Failed
-   * -1, 6 => Running
-   *
-   * Task summary will consider (1, 3, 5) only
-   */
-  val summary = appStore.taskSummary(stageId, attemptId, uiQuantiles).get
-  val successfulTasks = Array(getTaskMetrics(1), getTaskMetrics(3), 
getTaskMetrics(5))
-
-  def assertQuantiles(metricGetter: TaskMetrics => Double,
-actualQuantiles: Seq[Double]): Unit = {
-val values = successfulTasks.map(metricGetter)
-val expectedQuantiles = new Distribution(values, 0, values.length)
-  .getQuantiles(uiQuantiles.sorted)
-
-assert(actualQuantiles === expectedQuantiles)
-  }
+/**
+ * Following are the tasks metrics,
+ * 1, 3, 5 => Success
+ * 0, 2, 4 => Failed
+ * -1, 6 => Running
+ *
+ * Task summary will consider (1, 3, 5) only
+ */
+val summary

[spark] branch branch-3.4 updated: [SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix `PlanGenerationTestSuite` together

2023-02-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 4dd2ef95af8 [SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix 
`PlanGenerationTestSuite` together
4dd2ef95af8 is described below

commit 4dd2ef95af8546a6dc80aa9f1b9dde5513b5d0c9
Author: Dongjoon Hyun 
AuthorDate: Sun Feb 12 18:57:52 2023 -0800

[SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix `PlanGenerationTestSuite` 
together

### What changes were proposed in this pull request?

This is a follow-up of #39982 to fix `PlanGenerationTestSuite` together.

### Why are the changes needed?

SPARK-42377 originally added two test suites that fail with Scala 2.13, but
SPARK-42410 missed `PlanGenerationTestSuite` while fixing
`ProtoToParsedPlanTestSuite`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

```
$ dev/change-scala-version.sh 2.13
$ build/sbt "connect-client-jvm/testOnly 
org.apache.spark.sql.PlanGenerationTestSuite" -Pscala-2.13
[info] PlanGenerationTestSuite:
...
[info] - function udf 2.13 (514 milliseconds)
...
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "connect-client-jvm/testOnly 
org.apache.spark.sql.PlanGenerationTestSuite" -Pscala-2.13
...
[info] PlanGenerationTestSuite:
...
[info] - function udf 2.13 (574 milliseconds)
```
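For context, the suffix appended to the test name comes from the running Scala
version; a small sketch of the expression used in the diff below:
```scala
import scala.util.Properties.versionNumberString

// versionNumberString is e.g. "2.13.8"; taking everything before the second
// dot yields the binary version ("2.13") used to pick matching golden files.
val scalaBinary = versionNumberString.substring(0, versionNumberString.indexOf(".", 2))
```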

Closes #39986 from dongjoon-hyun/SPARK-42410-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit f8fda8aa1356b3ac902570e0a5536d9c00838490)
Signed-off-by: Dongjoon Hyun 
---
 .../test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala| 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
index a15539afaa4..494c497c553 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
@@ -20,6 +20,7 @@ import java.nio.file.{Files, Path}
 
 import scala.collection.mutable
 import scala.util.{Failure, Success, Try}
+import scala.util.Properties.versionNumberString
 
 import com.google.protobuf.util.JsonFormat
 import io.grpc.inprocess.InProcessChannelBuilder
@@ -57,6 +58,8 @@ class PlanGenerationTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll wit
   // Borrowed from SparkFunSuite
   private val regenerateGoldenFiles: Boolean = 
System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
 
+  private val scala = versionNumberString.substring(0, 
versionNumberString.indexOf(".", 2))
+
   // Borrowed from SparkFunSuite
   private def getWorkspaceFilePath(first: String, more: String*): Path = {
 if (!(sys.props.contains("spark.test.home") || 
sys.env.contains("SPARK_HOME"))) {
@@ -209,7 +212,7 @@ class PlanGenerationTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll wit
 select(fn.col("id"))
   }
 
-  test("function udf") {
+  test("function udf " + scala) {
 // This test might be a bit tricky if different JVM
 // versions are used to generate the golden files.
 val functions = Seq(





[spark] branch master updated: [SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix `PlanGenerationTestSuite` together

2023-02-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f8fda8aa135 [SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix 
`PlanGenerationTestSuite` together
f8fda8aa135 is described below

commit f8fda8aa1356b3ac902570e0a5536d9c00838490
Author: Dongjoon Hyun 
AuthorDate: Sun Feb 12 18:57:52 2023 -0800

[SPARK-42410][CONNECT][TESTS][FOLLOWUP] Fix `PlanGenerationTestSuite` 
together

### What changes were proposed in this pull request?

This is a follow-up of #39982 to fix `PlanGenerationTestSuite` together.

### Why are the changes needed?

SPARK-42377 originally added two test suites that fail with Scala 2.13, but
SPARK-42410 missed `PlanGenerationTestSuite` while fixing
`ProtoToParsedPlanTestSuite`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

```
$ dev/change-scala-version.sh 2.13
$ build/sbt "connect-client-jvm/testOnly 
org.apache.spark.sql.PlanGenerationTestSuite" -Pscala-2.13
[info] PlanGenerationTestSuite:
...
[info] - function udf 2.13 (514 milliseconds)
...
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "connect-client-jvm/testOnly 
org.apache.spark.sql.PlanGenerationTestSuite" -Pscala-2.13
...
[info] PlanGenerationTestSuite:
...
[info] - function udf 2.13 (574 milliseconds)
```

Closes #39986 from dongjoon-hyun/SPARK-42410-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala| 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
index a15539afaa4..494c497c553 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
@@ -20,6 +20,7 @@ import java.nio.file.{Files, Path}
 
 import scala.collection.mutable
 import scala.util.{Failure, Success, Try}
+import scala.util.Properties.versionNumberString
 
 import com.google.protobuf.util.JsonFormat
 import io.grpc.inprocess.InProcessChannelBuilder
@@ -57,6 +58,8 @@ class PlanGenerationTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll wit
   // Borrowed from SparkFunSuite
   private val regenerateGoldenFiles: Boolean = 
System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
 
+  private val scala = versionNumberString.substring(0, 
versionNumberString.indexOf(".", 2))
+
   // Borrowed from SparkFunSuite
   private def getWorkspaceFilePath(first: String, more: String*): Path = {
 if (!(sys.props.contains("spark.test.home") || 
sys.env.contains("SPARK_HOME"))) {
@@ -209,7 +212,7 @@ class PlanGenerationTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll wit
 select(fn.col("id"))
   }
 
-  test("function udf") {
+  test("function udf " + scala) {
 // This test might be a bit tricky if different JVM
 // versions are used to generate the golden files.
 val functions = Seq(





[spark] branch branch-3.4 updated: [SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 tests in `connect` module

2023-02-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new d3b59aad100 [SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 
tests in `connect` module
d3b59aad100 is described below

commit d3b59aad1000820d7fcbd78e0e1810d9bf81b8cc
Author: Dongjoon Hyun 
AuthorDate: Sun Feb 12 17:31:26 2023 -0800

[SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 tests in `connect` 
module

### What changes were proposed in this pull request?

This PR aims to support both Scala 2.12 and 2.13 tests in the `connect` module
by splitting the golden files.

### Why are the changes needed?

After #39933, the Scala 2.13 CIs are broken.
- **master**: 
https://github.com/apache/spark/actions/runs/4157806226/jobs/7192575602
- **branch-3.4**: 
https://github.com/apache/spark/actions/runs/4155578848/jobs/7188777977
```
[info] - function_udf *** FAILED *** (29 milliseconds)
[info]   java.io.InvalidClassException: 
org.apache.spark.sql.TestUDFs$$anon$1; local class incompatible: stream 
classdesc serialVersionUID = 505010451380771093, local class serialVersionUID = 
643575318841761245
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and do Scala 2.13 tests manually.

```
$ dev/change-scala-version.sh 2.13
$ build/sbt -Dscala.version=2.13.8 -Pscala-2.13 -Phadoop-3 assembly/package 
"connect/test"
...
[info] - function_udf_2.13 (19 milliseconds)
[info] Run completed in 8 seconds, 964 milliseconds.
[info] Total number of tests run: 119
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 119, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 18 s, completed Feb 12, 2023, 3:33:40 PM
```

Closes #39982 from dongjoon-hyun/SPARK-42410.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit a5db9b976d8f37d22f8a91f461c05cbb20601d8a)
Signed-off-by: Dongjoon Hyun 
---
 ...ction_udf.explain => function_udf_2.12.explain} |   0
 ...ction_udf.explain => function_udf_2.13.explain} |   0
 .../{function_udf.json => function_udf_2.12.json}  |   0
 ...n_udf.proto.bin => function_udf_2.12.proto.bin} | Bin
 .../query-tests/queries/function_udf_2.13.json |  96 +
 .../queries/function_udf_2.13.proto.bin| Bin 0 -> 12092 bytes
 .../sql/connect/ProtoToParsedPlanTestSuite.scala   |   5 ++
 7 files changed, 101 insertions(+)

diff --git 
a/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain
 
b/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.12.explain
similarity index 100%
copy from 
connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain
copy to 
connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.12.explain
diff --git 
a/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain
 
b/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.13.explain
similarity index 100%
rename from 
connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain
rename to 
connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.13.explain
diff --git 
a/connector/connect/common/src/test/resources/query-tests/queries/function_udf.json
 
b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.json
similarity index 100%
rename from 
connector/connect/common/src/test/resources/query-tests/queries/function_udf.json
rename to 
connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.json
diff --git 
a/connector/connect/common/src/test/resources/query-tests/queries/function_udf.proto.bin
 
b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.proto.bin
similarity index 100%
rename from 
connector/connect/common/src/test/resources/query-tests/queries/function_udf.proto.bin
rename to 
connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.proto.bin
diff --git 
a/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json
 
b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json
new file mode 100644
index 000..23aeb078543
--- /dev/null
+++ 
b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json
@@ -0,0 +1,96 @@
+{
+  "project": {
+"input": {
+  "localRelation": {
+"schema": "struct\u003cid:bigint,a:int,b:double\u003e"
+  }
+},
+"expressions": [{
+ 

[spark] branch master updated: [SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 tests in `connect` module

2023-02-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a5db9b976d8 [SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 
tests in `connect` module
a5db9b976d8 is described below

commit a5db9b976d8f37d22f8a91f461c05cbb20601d8a
Author: Dongjoon Hyun 
AuthorDate: Sun Feb 12 17:31:26 2023 -0800

[SPARK-42410][CONNECT][TESTS] Support Scala 2.12/2.13 tests in `connect` 
module

### What changes were proposed in this pull request?

This PR aims to support both Scala 2.12 and 2.13 tests in the `connect` module
by splitting the golden files.

### Why are the changes needed?

After #39933, the Scala 2.13 CIs are broken.
- **master**: 
https://github.com/apache/spark/actions/runs/4157806226/jobs/7192575602
- **branch-3.4**: 
https://github.com/apache/spark/actions/runs/4155578848/jobs/7188777977
```
[info] - function_udf *** FAILED *** (29 milliseconds)
[info]   java.io.InvalidClassException: 
org.apache.spark.sql.TestUDFs$$anon$1; local class incompatible: stream 
classdesc serialVersionUID = 505010451380771093, local class serialVersionUID = 
643575318841761245
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and do Scala 2.13 tests manually.

```
$ dev/change-scala-version.sh 2.13
$ build/sbt -Dscala.version=2.13.8 -Pscala-2.13 -Phadoop-3 assembly/package 
"connect/test"
...
[info] - function_udf_2.13 (19 milliseconds)
[info] Run completed in 8 seconds, 964 milliseconds.
[info] Total number of tests run: 119
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 119, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 18 s, completed Feb 12, 2023, 3:33:40 PM
```

Closes #39982 from dongjoon-hyun/SPARK-42410.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 ...ction_udf.explain => function_udf_2.12.explain} |   0
 ...ction_udf.explain => function_udf_2.13.explain} |   0
 .../{function_udf.json => function_udf_2.12.json}  |   0
 ...n_udf.proto.bin => function_udf_2.12.proto.bin} | Bin
 .../query-tests/queries/function_udf_2.13.json |  96 +
 .../queries/function_udf_2.13.proto.bin| Bin 0 -> 12092 bytes
 .../sql/connect/ProtoToParsedPlanTestSuite.scala   |   5 ++
 7 files changed, 101 insertions(+)

diff --git 
a/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain
 
b/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.12.explain
similarity index 100%
copy from 
connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain
copy to 
connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.12.explain
diff --git 
a/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain
 
b/connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.13.explain
similarity index 100%
rename from 
connector/connect/common/src/test/resources/query-tests/explain-results/function_udf.explain
rename to 
connector/connect/common/src/test/resources/query-tests/explain-results/function_udf_2.13.explain
diff --git 
a/connector/connect/common/src/test/resources/query-tests/queries/function_udf.json
 
b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.json
similarity index 100%
rename from 
connector/connect/common/src/test/resources/query-tests/queries/function_udf.json
rename to 
connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.json
diff --git 
a/connector/connect/common/src/test/resources/query-tests/queries/function_udf.proto.bin
 
b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.proto.bin
similarity index 100%
rename from 
connector/connect/common/src/test/resources/query-tests/queries/function_udf.proto.bin
rename to 
connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.12.proto.bin
diff --git 
a/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json
 
b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json
new file mode 100644
index 000..23aeb078543
--- /dev/null
+++ 
b/connector/connect/common/src/test/resources/query-tests/queries/function_udf_2.13.json
@@ -0,0 +1,96 @@
+{
+  "project": {
+"input": {
+  "localRelation": {
+"schema": "struct\u003cid:bigint,a:int,b:double\u003e"
+  }
+},
+"expressions": [{
+  "commonInlineUserDefinedFunction": {
+"scalarScalaUdf": {
+  "payload": 
"rO0ABXNyAC1vcmcuYXBhY2hl

[spark] branch branch-3.4 updated: [SPARK-41963][CONNECT] Fix DataFrame.unpivot to raise the same error class when the `values` argument is empty

2023-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 92d82698760 [SPARK-41963][CONNECT] Fix DataFrame.unpivot to raise the 
same error class when the `values` argument is empty
92d82698760 is described below

commit 92d8269876019e8580b2e60d3b3891ac13b5740b
Author: Takuya UESHIN 
AuthorDate: Mon Feb 13 09:45:05 2023 +0900

[SPARK-41963][CONNECT] Fix DataFrame.unpivot to raise the same error class 
when the `values` argument is empty

### What changes were proposed in this pull request?

Fixes `DataFrame.unpivot` to raise the same error class when the `values` 
argument is an empty list/tuple.

### Why are the changes needed?

Currently `DataFrame.unpivot` raises a different error class, 
`UNPIVOT_REQUIRES_VALUE_COLUMNS` for PySpark vs. 
`UNPIVOT_VALUE_DATA_TYPE_MISMATCH` for Spark Connect.

In `Unpivot`, an empty list/tuple passed as the `values` argument is different
from `None`, and the two should be handled differently.
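A hedged sketch of how the new wrapper message (see the proto diff below) lets
the server tell the two cases apart; the import path of the generated class is
assumed:
```scala
import org.apache.spark.connect.proto.Unpivot // assumed location of the generated class

// values left unset -> hasValues is false -> treated as None on the server.
val noValues = Unpivot.newBuilder().build()
assert(!noValues.hasValues)

// values set to an empty Values message -> hasValues is true with zero
// expressions -> treated as an empty list, which surfaces
// UNPIVOT_REQUIRES_VALUE_COLUMNS just like PySpark.
val emptyValues = Unpivot.newBuilder()
  .setValues(Unpivot.Values.newBuilder())
  .build()
assert(emptyValues.hasValues && emptyValues.getValues.getValuesCount == 0)
```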

### Does this PR introduce _any_ user-facing change?

`DataFrame.unpivot` on Spark Connect will raise the same error class as 
PySpark.

### How was this patch tested?

Enabled `DataFrameParityTests.test_unpivot_negative`.

Closes #39960 from ueshin/issues/SPARK-41963/unpivot.

Authored-by: Takuya UESHIN 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 633a486c65067b483524b079810b5590ac482a48)
Signed-off-by: Hyukjin Kwon 
---
 .../main/protobuf/spark/connect/relations.proto|  6 +++-
 .../org/apache/spark/sql/connect/dsl/package.scala |  5 ++-
 .../sql/connect/planner/SparkConnectPlanner.scala  |  4 +--
 python/pyspark/sql/connect/dataframe.py|  8 +++--
 python/pyspark/sql/connect/plan.py |  5 +--
 python/pyspark/sql/connect/proto/relations_pb2.py  | 25 ++
 python/pyspark/sql/connect/proto/relations_pb2.pyi | 39 +-
 .../pyspark/sql/tests/connect/test_connect_plan.py | 19 +++
 .../sql/tests/connect/test_parity_dataframe.py |  5 ---
 9 files changed, 83 insertions(+), 33 deletions(-)

diff --git 
a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto 
b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto
index 3d597fd2744..ea1216957d8 100644
--- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto
+++ b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto
@@ -732,13 +732,17 @@ message Unpivot {
   repeated Expression ids = 2;
 
   // (Optional) Value columns to unpivot.
-  repeated Expression values = 3;
+  optional Values values = 3;
 
   // (Required) Name of the variable column.
   string variable_column_name = 4;
 
   // (Required) Name of the value column.
   string value_column_name = 5;
+
+  message Values {
+    repeated Expression values = 1;
+  }
 }
 
 message ToSchema {
diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
index 88531286e24..f91040a1009 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
@@ -990,7 +990,10 @@ package object dsl {
   .newBuilder()
   .setInput(logicalPlan)
   .addAllIds(ids.asJava)
-  .addAllValues(values.asJava)
+  .setValues(Unpivot.Values
+.newBuilder()
+.addAllValues(values.asJava)
+.build())
   .setVariableColumnName(variableColumnName)
   .setValueColumnName(valueColumnName))
   .build()
diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 194588fe89b..740d6b85964 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -515,7 +515,7 @@ class SparkConnectPlanner(val session: SparkSession) {
   Column(transformExpression(expr))
 }
 
-if (rel.getValuesList.isEmpty) {
+if (!rel.hasValues) {
   Unpivot(
 Some(ids.map(_.named)),
 None,
@@ -524,7 +524,7 @@ class SparkConnectPlanner(val session: SparkSession) {
 Seq(rel.getValueColumnName),
 transformRelation(rel.getInput))
 } else {
-  val values = rel.getValuesList.asScala.toArray.map { expr =>
+  val values = rel.getValues.ge

[spark] branch master updated (d703808347c -> 633a486c650)

2023-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d703808347c [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip 
freqItems doctest due to Scala 2.13 failure
 add 633a486c650 [SPARK-41963][CONNECT] Fix DataFrame.unpivot to raise the 
same error class when the `values` argument is empty

No new revisions were added by this update.

Summary of changes:
 .../main/protobuf/spark/connect/relations.proto|  6 +++-
 .../org/apache/spark/sql/connect/dsl/package.scala |  5 ++-
 .../sql/connect/planner/SparkConnectPlanner.scala  |  4 +--
 python/pyspark/sql/connect/dataframe.py|  8 +++--
 python/pyspark/sql/connect/plan.py |  5 +--
 python/pyspark/sql/connect/proto/relations_pb2.py  | 25 ++
 python/pyspark/sql/connect/proto/relations_pb2.pyi | 39 +-
 .../pyspark/sql/tests/connect/test_connect_plan.py | 19 +++
 .../sql/tests/connect/test_parity_dataframe.py |  5 ---
 9 files changed, 83 insertions(+), 33 deletions(-)





[spark] branch branch-3.4 updated: [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest due to Scala 2.13 failure

2023-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 35cd46183b4 [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip 
freqItems doctest due to Scala 2.13 failure
35cd46183b4 is described below

commit 35cd46183b4a4861b53120c8fbcad9c969fe465e
Author: Dongjoon Hyun 
AuthorDate: Mon Feb 13 09:43:12 2023 +0900

[SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest 
due to Scala 2.13 failure

### What changes were proposed in this pull request?

This is a follow-up of #39947 to ignore the `freqItems` doctest again.
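
As background, a hedged standalone sketch (not from this patch) of how the 
`# doctest: +SKIP` directive used in the diff below behaves:

```
# Hedged sketch of the doctest mechanism relied on here: an example marked
# with `# doctest: +SKIP` is ignored entirely, so its output is never checked.
import doctest

def demo():
    """
    >>> sorted({1: "a", 2: "b"})      # this example is checked
    [1, 2]
    >>> print("platform-dependent")   # doctest: +SKIP
    """

if __name__ == "__main__":
    doctest.testmod(verbose=True)
```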

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #39983 from dongjoon-hyun/SPARK-40453.

Authored-by: Dongjoon Hyun 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit d703808347ca61ed05541e7252a0e881b1be7431)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/dataframe.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 5649d362b8b..d9de9ee14ac 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -4598,7 +4598,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 Examples
--------
 >>> df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 
8)], ["c1", "c2"])
->>> df.freqItems(["c1", "c2"]).show()
+>>> df.freqItems(["c1", "c2"]).show()  # doctest: +SKIP
+------------+------------+
|c1_freqItems|c2_freqItems|
+------------+------------+





[spark] branch master updated: [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest due to Scala 2.13 failure

2023-02-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d703808347c [SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip 
freqItems doctest due to Scala 2.13 failure
d703808347c is described below

commit d703808347ca61ed05541e7252a0e881b1be7431
Author: Dongjoon Hyun 
AuthorDate: Mon Feb 13 09:43:12 2023 +0900

[SPARK-40453][SPARK-41715][CONNECT][TESTS][FOLLOWUP] Skip freqItems doctest 
due to Scala 2.13 failure

### What changes were proposed in this pull request?

This is a follow-up of #39947 to ignore the `freqItems` doctest again.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes #39983 from dongjoon-hyun/SPARK-40453.

Authored-by: Dongjoon Hyun 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/dataframe.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 5649d362b8b..d9de9ee14ac 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -4598,7 +4598,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 Examples
--------
 >>> df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 
8)], ["c1", "c2"])
->>> df.freqItems(["c1", "c2"]).show()
+>>> df.freqItems(["c1", "c2"]).show()  # doctest: +SKIP
+------------+------------+
|c1_freqItems|c2_freqItems|
+------------+------------+





[spark] branch master updated: [SPARK-42409][BUILD] Upgrade ZSTD-JNI to 1.5.4-1

2023-02-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 17eff5d82d7 [SPARK-42409][BUILD] Upgrade ZSTD-JNI to 1.5.4-1
17eff5d82d7 is described below

commit 17eff5d82d700d30707b42f68e8710894694b7b6
Author: yangjie01 
AuthorDate: Sun Feb 12 13:11:06 2023 -0800

[SPARK-42409][BUILD] Upgrade ZSTD-JNI to 1.5.4-1

### What changes were proposed in this pull request?
This PR aims to upgrade zstd-jni from 1.5.2-5 to 1.5.4-1.

### Why are the changes needed?
This version adds support for a no-finalizer zstd `BufferDecompressingStream`; a usage sketch follows the links below.

- https://github.com/luben/zstd-jni/pull/244

All changes as follows:

- https://github.com/luben/zstd-jni/compare/c1.5.2-5...v1.5.4-1
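
As a hedged illustration (not part of this patch), one common place the 
upgraded native library gets exercised is Spark's I/O compression codec; the 
configuration key below is standard, while the workload is illustrative:

```
# Hedged sketch: selecting zstd for shuffle/broadcast compression routes that
# I/O through the zstd-jni bindings bumped by this PR.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.io.compression.codec", "zstd")
    .getOrCreate()
)

# A repartition forces a shuffle, whose blocks are compressed with zstd.
spark.range(1000).repartition(4).count()
```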

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #39981 from LuciferYang/SPARK-42409.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 7e6990b5b24..964c06cb74c 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -268,4 +268,4 @@ xz/1.9//xz-1.9.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
 zookeeper-jute/3.6.3//zookeeper-jute-3.6.3.jar
 zookeeper/3.6.3//zookeeper-3.6.3.jar
-zstd-jni/1.5.2-5//zstd-jni-1.5.2-5.jar
+zstd-jni/1.5.4-1//zstd-jni-1.5.4-1.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index d9ea8223eeb..0783abfe2fc 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -253,4 +253,4 @@ xz/1.9//xz-1.9.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
 zookeeper-jute/3.6.3//zookeeper-jute-3.6.3.jar
 zookeeper/3.6.3//zookeeper-3.6.3.jar
-zstd-jni/1.5.2-5//zstd-jni-1.5.2-5.jar
+zstd-jni/1.5.4-1//zstd-jni-1.5.4-1.jar
diff --git a/pom.xml b/pom.xml
index 8fdc06b335c..637e7a790ed 100644
--- a/pom.xml
+++ b/pom.xml
@@ -806,7 +806,7 @@
       <dependency>
         <groupId>com.github.luben</groupId>
         <artifactId>zstd-jni</artifactId>
-        <version>1.5.2-5</version>
+        <version>1.5.4-1</version>
       </dependency>
       <dependency>
         <groupId>com.clearspring.analytics</groupId>





[spark] branch master updated (4a27c604eef -> c49b23fd81e)

2023-02-12 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4a27c604eef [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042
 add c49b23fd81e [SPARK-42400] Code clean up in org.apache.spark.storage

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/storage/BlockManager.scala   | 12 +---
 .../main/scala/org/apache/spark/storage/BlockManagerId.scala |  2 +-
 .../scala/org/apache/spark/storage/BlockManagerMaster.scala  |  3 +--
 .../apache/spark/storage/BlockManagerMasterEndpoint.scala|  8 
 .../scala/org/apache/spark/storage/DiskBlockManager.scala|  8 
 core/src/main/scala/org/apache/spark/storage/RDDInfo.scala   |  6 ++
 .../apache/spark/storage/ShuffleBlockFetcherIterator.scala   |  8 
 .../scala/org/apache/spark/storage/memory/MemoryStore.scala  |  2 +-
 .../spark/storage/ShuffleBlockFetcherIteratorSuite.scala |  2 --
 9 files changed, 22 insertions(+), 29 deletions(-)





[spark] branch branch-3.4 updated: [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042

2023-02-12 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 2729e12d3ed [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042
2729e12d3ed is described below

commit 2729e12d3ed0e0513da7fcb2a4c6bdc78aa955ac
Author: itholic 
AuthorDate: Sun Feb 12 19:13:05 2023 +0500

[SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042

### What changes were proposed in this pull request?

This PR proposes to assign the name `INVALID_SET_SYNTAX` to 
`_LEGACY_ERROR_TEMP_0042`.

### Why are the changes needed?

We should assign proper names to the `_LEGACY_ERROR_TEMP_*` error classes.
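
A hedged illustration follows: the SQL text is taken from the parser tests in 
this patch, while the surrounding error handling is illustrative.

```
# Hedged sketch: a SET statement with a value but no '=' is rejected by the
# parser; after this change the ParseException carries the INVALID_SET_SYNTAX
# error class instead of _LEGACY_ERROR_TEMP_0042.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
try:
    spark.sql("SET spark.sql.key value")
except Exception as e:
    print(e)
```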

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

`./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*"`

Closes #39951 from itholic/LEGACY_0042.

Lead-authored-by: itholic 
Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
Signed-off-by: Max Gekk 
(cherry picked from commit 4a27c604eef6f06672a3d2aaa5e40285e15bacab)
Signed-off-by: Max Gekk 
---
 core/src/main/resources/error/error-classes.json   | 11 +
 .../spark/sql/errors/QueryParsingErrors.scala  |  2 +-
 .../spark/sql/execution/SparkSqlParserSuite.scala  | 28 +++---
 3 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json 
b/core/src/main/resources/error/error-classes.json
index 3082e482392..33f31a97acf 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -1004,6 +1004,12 @@
 },
 "sqlState" : "42K07"
   },
+  "INVALID_SET_SYNTAX" : {
+"message" : [
+  "Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to 
include special characters in key, or include semicolon in value, please use 
backquotes, e.g., SET `key`=`value`."
+],
+"sqlState" : "42000"
+  },
   "INVALID_SQL_ARG" : {
 "message" : [
   "The argument  of `sql()` is invalid. Consider to replace it by a 
SQL literal."
@@ -2052,11 +2058,6 @@
   "Found duplicate clauses: ."
 ]
   },
-  "_LEGACY_ERROR_TEMP_0042" : {
-"message" : [
-  "Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to 
include special characters in key, or include semicolon in value, please use 
quotes, e.g., SET `key`=`value`."
-]
-  },
   "_LEGACY_ERROR_TEMP_0043" : {
 "message" : [
   "Expected format is 'RESET' or 'RESET key'. If you want to include 
special characters in key, please use quotes, e.g., RESET `key`."
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index accf5363d6c..57868020736 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -478,7 +478,7 @@ private[sql] object QueryParsingErrors extends 
QueryErrorsBase {
   }
 
   def unexpectedFormatForSetConfigurationError(ctx: ParserRuleContext): 
Throwable = {
-new ParseException(errorClass = "_LEGACY_ERROR_TEMP_0042", ctx)
+new ParseException(errorClass = "INVALID_SET_SYNTAX", ctx)
   }
 
   def invalidPropertyKeyForSetQuotedConfigurationError(
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
index 57c991c34d9..d6a3b74ee4c 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
@@ -119,7 +119,7 @@ class SparkSqlParserSuite extends AnalysisTest with 
SharedSparkSession {
 val sql1 = "SET spark.sql.key value"
 checkError(
   exception = parseException(sql1),
-  errorClass = "_LEGACY_ERROR_TEMP_0042",
+  errorClass = "INVALID_SET_SYNTAX",
   parameters = Map.empty,
   context = ExpectedContext(
 fragment = sql1,
@@ -129,7 +129,7 @@ class SparkSqlParserSuite extends AnalysisTest with 
SharedSparkSession {
 val sql2 = "SET spark.sql.key   'value'"
 checkError(
   exception = parseException(sql2),
-  errorClass = "_LEGACY_ERROR_TEMP_0042",
+  errorClass = "INVALID_SET_SYNTAX",
   parameters = Map.empty,
   context = ExpectedContext(
 fragment = sql2,
@@ -139,7 +139,7 @@ class SparkSqlParserSuite extends AnalysisTest with 
SharedSparkSession {
 val sql3 = "SETspark.sql.key \"value\" "
 checkError(
   exception = parseException(sql3),
-  errorClass = "_LEGACY_ERROR_TEMP_0042",
+  errorClass = "INVALID_SET

[spark] branch master updated: [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042

2023-02-12 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4a27c604eef [SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042
4a27c604eef is described below

commit 4a27c604eef6f06672a3d2aaa5e40285e15bacab
Author: itholic 
AuthorDate: Sun Feb 12 19:13:05 2023 +0500

[SPARK-42312][SQL] Assign name to _LEGACY_ERROR_TEMP_0042

### What changes were proposed in this pull request?

This PR proposes to assign the name `INVALID_SET_SYNTAX` to 
`_LEGACY_ERROR_TEMP_0042`.

### Why are the changes needed?

We should assign proper names to the `_LEGACY_ERROR_TEMP_*` error classes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

`./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*"`

Closes #39951 from itholic/LEGACY_0042.

Lead-authored-by: itholic 
Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
Signed-off-by: Max Gekk 
---
 core/src/main/resources/error/error-classes.json   | 11 +
 .../spark/sql/errors/QueryParsingErrors.scala  |  2 +-
 .../spark/sql/execution/SparkSqlParserSuite.scala  | 28 +++---
 3 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json 
b/core/src/main/resources/error/error-classes.json
index 20685622bc5..f7e4086263d 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -1004,6 +1004,12 @@
 },
 "sqlState" : "42K07"
   },
+  "INVALID_SET_SYNTAX" : {
+"message" : [
+  "Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to 
include special characters in key, or include semicolon in value, please use 
backquotes, e.g., SET `key`=`value`."
+],
+"sqlState" : "42000"
+  },
   "INVALID_SQL_ARG" : {
 "message" : [
   "The argument  of `sql()` is invalid. Consider to replace it by a 
SQL literal."
@@ -2052,11 +2058,6 @@
   "Found duplicate clauses: ."
 ]
   },
-  "_LEGACY_ERROR_TEMP_0042" : {
-"message" : [
-  "Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to 
include special characters in key, or include semicolon in value, please use 
quotes, e.g., SET `key`=`value`."
-]
-  },
   "_LEGACY_ERROR_TEMP_0043" : {
 "message" : [
   "Expected format is 'RESET' or 'RESET key'. If you want to include 
special characters in key, please use quotes, e.g., RESET `key`."
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index accf5363d6c..57868020736 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -478,7 +478,7 @@ private[sql] object QueryParsingErrors extends 
QueryErrorsBase {
   }
 
   def unexpectedFormatForSetConfigurationError(ctx: ParserRuleContext): 
Throwable = {
-new ParseException(errorClass = "_LEGACY_ERROR_TEMP_0042", ctx)
+new ParseException(errorClass = "INVALID_SET_SYNTAX", ctx)
   }
 
   def invalidPropertyKeyForSetQuotedConfigurationError(
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
index 57c991c34d9..d6a3b74ee4c 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
@@ -119,7 +119,7 @@ class SparkSqlParserSuite extends AnalysisTest with 
SharedSparkSession {
 val sql1 = "SET spark.sql.key value"
 checkError(
   exception = parseException(sql1),
-  errorClass = "_LEGACY_ERROR_TEMP_0042",
+  errorClass = "INVALID_SET_SYNTAX",
   parameters = Map.empty,
   context = ExpectedContext(
 fragment = sql1,
@@ -129,7 +129,7 @@ class SparkSqlParserSuite extends AnalysisTest with 
SharedSparkSession {
 val sql2 = "SET spark.sql.key   'value'"
 checkError(
   exception = parseException(sql2),
-  errorClass = "_LEGACY_ERROR_TEMP_0042",
+  errorClass = "INVALID_SET_SYNTAX",
   parameters = Map.empty,
   context = ExpectedContext(
 fragment = sql2,
@@ -139,7 +139,7 @@ class SparkSqlParserSuite extends AnalysisTest with 
SharedSparkSession {
 val sql3 = "SETspark.sql.key \"value\" "
 checkError(
   exception = parseException(sql3),
-  errorClass = "_LEGACY_ERROR_TEMP_0042",
+  errorClass = "INVALID_SET_SYNTAX",
   parameters = Map.empty,
   context = ExpectedContext(
 fragment = "SETspark.s