(spark) branch master updated: [SPARK-48303][CORE] Reorganize `LogKeys`

2024-05-16 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5643cfb71d34 [SPARK-48303][CORE] Reorganize `LogKeys`
5643cfb71d34 is described below

commit 5643cfb71d343133a185aa257f137074f41abfb3
Author: panbingkun 
AuthorDate: Thu May 16 23:20:23 2024 -0700

[SPARK-48303][CORE] Reorganize `LogKeys`

### What changes were proposed in this pull request?
This PR aims to reorganize `LogKeys`. It includes:
- remove some unused `LogKeys`
  ACTUAL_BROADCAST_OUTPUT_STATUS_SIZE
  DEFAULT_COMPACTION_INTERVAL
  DRIVER_LIBRARY_PATH_KEY
  EXISTING_JARS
  EXPECTED_ANSWER
  FILTERS
  HAS_R_PACKAGE
  JAR_ENTRY
  LOG_KEY_FILE
  NUM_ADDED_MASTERS
  NUM_ADDED_WORKERS
  NUM_PARTITION_VALUES
  OUTPUT_LINE
  OUTPUT_LINE_NUMBER
  PARTITIONS_SIZE
  RULE_BATCH_NAME
  SERIALIZE_OUTPUT_LENGTH
  SHELL_COMMAND
  STREAM_SOURCE

- merge `PARAMETER` into `PARAM` (because some names were fully spelled and some abbreviated, which was inconsistent)
  ESTIMATOR_PARAMETER_MAP -> ESTIMATOR_PARAM_MAP
  FUNCTION_PARAMETER -> FUNCTION_PARAM
  METHOD_PARAMETER_TYPES -> METHOD_PARAM_TYPES

- merge `NUMBER` into `NUM` (abbreviations)
  MIN_VERSION_NUMBER -> MIN_VERSION_NUM
  RULE_NUMBER_OF_RUNS -> NUM_RULE_OF_RUNS
  VERSION_NUMBER -> VERSION_NUM

- merge `TOTAL` into `NUM`
  TOTAL_RECORDS_READ -> NUM_RECORDS_READ
  TRAIN_WORD_COUNT -> NUM_TRAIN_WORD

- `NUM` as prefix
  CHECKSUM_FILE_NUM -> NUM_CHECKSUM_FILE
  DATA_FILE_NUM -> NUM_DATA_FILE
  INDEX_FILE_NUM -> NUM_INDEX_FILE

- COUNT -> NUM
  EXECUTOR_DESIRED_COUNT -> NUM_EXECUTOR_DESIRED
  EXECUTOR_LAUNCH_COUNT -> NUM_EXECUTOR_LAUNCH
  EXECUTOR_TARGET_COUNT -> NUM_EXECUTOR_TARGET
  KAFKA_PULLS_COUNT -> NUM_KAFKA_PULLS
  KAFKA_RECORDS_PULLED_COUNT -> NUM_KAFKA_RECORDS_PULLED
  MIN_FREQUENT_PATTERN_COUNT -> MIN_NUM_FREQUENT_PATTERN
  POD_COUNT -> NUM_POD
  POD_SHARED_SLOT_COUNT -> NUM_POD_SHARED_SLOT
  POD_TARGET_COUNT -> NUM_POD_TARGET
  RETRY_COUNT -> NUM_RETRY

- fix some `typo`
  MALFORMATTED_STIRNG -> MALFORMATTED_STRING

- other
  MAX_LOG_NUM_POLICY -> MAX_NUM_LOG_POLICY
  WEIGHTED_NUM -> NUM_WEIGHTED_EXAMPLES

All other code changes follow from the renames above.
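
As an aside, a minimal sketch of how such keys are consumed by Spark's structured-logging API (the key `NUM_RETRY` comes from the rename list above; the surrounding class and message are invented for illustration):

```scala
// Sketch only: assumes the structured logging API on master
// (org.apache.spark.internal.{Logging, MDC} plus the LogKeys object).
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKeys.NUM_RETRY

class RetryDemo extends Logging {
  def onRetry(retryCount: Int): Unit = {
    // The LogKey becomes a named field in the structured (JSON) log output,
    // which is why consistent key naming matters.
    logWarning(log"Retrying block transfer, ${MDC(NUM_RETRY, retryCount)} retries so far")
  }
}
```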

### Why are the changes needed?
Let's make `LogKeys` easier to understand and more consistent.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46612 from panbingkun/reorganize_logkey.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../network/shuffle/RetryingBlockTransferor.java   |  6 +-
 .../scala/org/apache/spark/internal/LogKey.scala   | 68 --
 .../sql/connect/client/GrpcRetryHandler.scala  |  8 +--
 .../sql/kafka010/KafkaOffsetReaderAdmin.scala  |  4 +-
 .../sql/kafka010/KafkaOffsetReaderConsumer.scala   |  4 +-
 .../sql/kafka010/consumer/KafkaDataConsumer.scala  |  6 +-
 .../streaming/kinesis/KinesisBackedBlockRDD.scala  |  4 +-
 .../org/apache/spark/api/r/RBackendHandler.scala   |  4 +-
 .../spark/deploy/history/FsHistoryProvider.scala   |  2 +-
 .../org/apache/spark/deploy/master/Master.scala|  2 +-
 .../apache/spark/ml/tree/impl/RandomForest.scala   |  4 +-
 .../apache/spark/ml/tuning/CrossValidator.scala|  4 +-
 .../spark/ml/tuning/TrainValidationSplit.scala |  4 +-
 .../org/apache/spark/mllib/feature/Word2Vec.scala  |  4 +-
 .../org/apache/spark/mllib/fpm/PrefixSpan.scala|  4 +-
 .../apache/spark/mllib/linalg/VectorsSuite.scala   |  4 +-
 .../cluster/k8s/ExecutorPodsAllocator.scala|  6 +-
 ...ernetesLocalDiskShuffleExecutorComponents.scala |  6 +-
 .../apache/spark/deploy/yarn/YarnAllocator.scala   |  6 +-
 .../catalyst/expressions/V2ExpressionUtils.scala   |  4 +-
 .../spark/sql/catalyst/rules/RuleExecutor.scala|  6 +-
 .../sql/execution/streaming/state/RocksDB.scala| 18 +++---
 .../streaming/state/RocksDBFileManager.scala   | 22 +++
 .../state/RocksDBStateStoreProvider.scala  |  6 +-
 .../apache/hive/service/server/HiveServer2.java|  2 +-
 .../spark/sql/hive/client/HiveClientImpl.scala |  2 +-
 .../org/apache/spark/streaming/Checkpoint.scala|  4 +-
 .../streaming/util/FileBasedWriteAheadLog.scala|  4 +-
 28 files changed, 101 insertions(+), 117 deletions(-)

diff --git a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java

(spark) branch master updated (74a1a76e811a -> e07f1af03edf)

2024-05-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 74a1a76e811a [MINOR][PYTHON][TESTS] Call 
`test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row`
 add e07f1af03edf [SPARK-48317][PYTHON][CONNECT][TESTS] Enable 
`test_udtf_with_analyze_using_archive` and `test_udtf_with_analyze_using_file`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_parity_udtf.py |  7 ++-
 python/pyspark/sql/tests/test_udtf.py| 14 --
 2 files changed, 10 insertions(+), 11 deletions(-)





(spark) branch master updated (889820c1ff39 -> 74a1a76e811a)

2024-05-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 889820c1ff39 [SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable 
`DataFrameObservationParityTests.test_observe_str`
 add 74a1a76e811a [MINOR][PYTHON][TESTS] Call 
`test_apply_schema_to_dict_and_rows` in `test_apply_schema_to_row`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_parity_types.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





(spark) branch master updated: [SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable `DataFrameObservationParityTests.test_observe_str`

2024-05-16 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 889820c1ff39 [SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable 
`DataFrameObservationParityTests.test_observe_str`
889820c1ff39 is described below

commit 889820c1ff392983c52b55d80bd8d80be22785ab
Author: Hyukjin Kwon 
AuthorDate: Fri May 17 11:57:34 2024 +0800

[SPARK-41625][PYTHON][CONNECT][TESTS][FOLLOW-UP] Enable 
`DataFrameObservationParityTests.test_observe_str`

### What changes were proposed in this pull request?

This PR proposes to enable 
`DataFrameObservationParityTests.test_observe_str`.

### Why are the changes needed?

To maintain test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

CI in this PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46630 from HyukjinKwon/SPARK-41625-followup.

Authored-by: Hyukjin Kwon 
Signed-off-by: Ruifeng Zheng 
---
 python/pyspark/sql/tests/connect/test_parity_observation.py | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/python/pyspark/sql/tests/connect/test_parity_observation.py 
b/python/pyspark/sql/tests/connect/test_parity_observation.py
index a7b0009357b6..e16053d5a082 100644
--- a/python/pyspark/sql/tests/connect/test_parity_observation.py
+++ b/python/pyspark/sql/tests/connect/test_parity_observation.py
@@ -25,10 +25,7 @@ class DataFrameObservationParityTests(
     DataFrameObservationTestsMixin,
     ReusedConnectTestCase,
 ):
-    # TODO(SPARK-41625): Support Structured Streaming
-    @unittest.skip("Fails in Spark Connect, should enable.")
-    def test_observe_str(self):
-        super().test_observe_str()
+    pass
 
 
 if __name__ == "__main__":





(spark) branch master updated: [SPARK-48316][PS][CONNECT][TESTS] Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition

2024-05-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 714fc8cd872d [SPARK-48316][PS][CONNECT][TESTS] Fix comments for 
SparkFrameMethodsParityTests.test_coalesce and test_repartition
714fc8cd872d is described below

commit 714fc8cd872d6f583a6066e9ddb4a51caa51caf3
Author: Hyukjin Kwon 
AuthorDate: Fri May 17 12:09:49 2024 +0900

[SPARK-48316][PS][CONNECT][TESTS] Fix comments for 
SparkFrameMethodsParityTests.test_coalesce and test_repartition

### What changes were proposed in this pull request?

This PR proposes to enable `SparkFrameMethodsParityTests.test_coalesce` and 
`SparkFrameMethodsParityTests.test_repartition` in Spark Connect by avoiding 
RDD usage in the test.

### Why are the changes needed?

To maintain test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

CI in this PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46629 from HyukjinKwon/SPARK-48316.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/tests/connect/test_parity_frame_spark.py | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/pandas/tests/connect/test_parity_frame_spark.py 
b/python/pyspark/pandas/tests/connect/test_parity_frame_spark.py
index 24626a9164e8..c3672647b71b 100644
--- a/python/pyspark/pandas/tests/connect/test_parity_frame_spark.py
+++ b/python/pyspark/pandas/tests/connect/test_parity_frame_spark.py
@@ -28,7 +28,9 @@ class SparkFrameMethodsParityTests(
     def test_checkpoint(self):
         super().test_checkpoint()
 
-    @unittest.skip("Test depends on RDD which is not supported from Spark Connect.")
+    @unittest.skip(
+        "Test depends on RDD, and cannot use SQL expression due to Catalyst optimization"
+    )
     def test_coalesce(self):
         super().test_coalesce()
 
@@ -36,7 +38,9 @@ class SparkFrameMethodsParityTests(
     def test_local_checkpoint(self):
         super().test_local_checkpoint()
 
-    @unittest.skip("Test depends on RDD which is not supported from Spark Connect.")
+    @unittest.skip(
+        "Test depends on RDD, and cannot use SQL expression due to Catalyst optimization"
+    )
     def test_repartition(self):
         super().test_repartition()
 





(spark) branch master updated (b0e535217bf8 -> 403619a3974c)

2024-05-16 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b0e535217bf8 [SPARK-48301][SQL][FOLLOWUP] Update the error message
 add 403619a3974c [SPARK-48306][SQL] Improve UDT in error message

No new revisions were added by this update.

Summary of changes:
 .../src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala   |  2 +-
 .../scala/org/apache/spark/sql/errors/DataTypeErrorsBase.scala |  3 ++-
 .../scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala  | 10 +-
 .../main/scala/org/apache/spark/sql/hive/HiveInspectors.scala  |  5 +++--
 .../sql/hive/execution/HiveScriptTransformationSuite.scala |  9 -
 .../org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala |  4 ++--
 6 files changed, 17 insertions(+), 16 deletions(-)





(spark) branch master updated: [SPARK-48301][SQL][FOLLOWUP] Update the error message

2024-05-16 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b0e535217bf8 [SPARK-48301][SQL][FOLLOWUP] Update the error message
b0e535217bf8 is described below

commit b0e535217bf891f2320f2419d213e1c700e15b41
Author: Ruifeng Zheng 
AuthorDate: Fri May 17 09:56:06 2024 +0800

[SPARK-48301][SQL][FOLLOWUP] Update the error message

### What changes were proposed in this pull request?
Update the error message

### Why are the changes needed?
We don't support `CREATE PROCEDURE` in Spark; this addresses
https://github.com/apache/spark/pull/46608#discussion_r1604205064

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #46628 from zhengruifeng/nit_error.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 common/utils/src/main/resources/error/error-conditions.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/utils/src/main/resources/error/error-conditions.json 
b/common/utils/src/main/resources/error/error-conditions.json
index 5d750ade7867..69889435b02e 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -2677,7 +2677,7 @@
   },
   "CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE" : {
     "message" : [
-      "CREATE PROCEDURE or CREATE FUNCTION with both IF NOT EXISTS and REPLACE is not allowed."
+      "Cannot create a routine with both IF NOT EXISTS and REPLACE specified."
     ]
   },
   "CREATE_TEMP_FUNC_WITH_DATABASE" : {





(spark) branch master updated: [SPARK-48310][PYTHON][CONNECT] Cached properties must return copies

2024-05-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 05e1706e5aa6 [SPARK-48310][PYTHON][CONNECT] Cached properties must 
return copies
05e1706e5aa6 is described below

commit 05e1706e5aa66a592e61b03263683a2dbbc64afe
Author: Martin Grund 
AuthorDate: Fri May 17 10:28:36 2024 +0900

[SPARK-48310][PYTHON][CONNECT] Cached properties must return copies

### What changes were proposed in this pull request?
When a consumer modifies the list returned by a cached property, it mutates the
cached value itself.

Before:
```python
df_columns = df.columns
for col in ['id', 'name']:
  df_columns.remove(col)
assert len(df_columns) == len(df.columns)
```

But this is wrong, and this patch fixes it so that:

```python
df_columns = df.columns
for col in ['id', 'name']:
  df_columns.remove(col)
assert len(df_columns) != len(df.columns)
```
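
The underlying fix pattern, sketched in Scala for generality (the patch itself is Python and uses `copy.deepcopy`; the class below is hypothetical):

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch of the defensive-copy pattern: return a copy of a cached mutable
// value so callers cannot corrupt the cache by mutating the result.
class CachedColumns(compute: () => Seq[String]) {
  private var cached: ArrayBuffer[String] = _

  def columns: ArrayBuffer[String] = {
    if (cached == null) cached = ArrayBuffer(compute(): _*)
    cached.clone()  // callers may freely mutate the returned copy
  }
}
```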

### Why are the changes needed?
Correctness of the API

### Does this PR introduce _any_ user-facing change?
No, this makes the code consistent with Spark classic.

### How was this patch tested?
UT

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46621 from grundprinzip/grundprinzip/SPARK-48310.

Authored-by: Martin Grund 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/dataframe.py|  3 ++-
 .../sql/tests/connect/test_parity_dataframe.py | 24 ++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index ccaaa15f3190..05300909cdce 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -43,6 +43,7 @@ from typing import (
     Type,
 )
 
+import copy
 import sys
 import random
 import pyarrow as pa
@@ -1787,7 +1788,7 @@ class DataFrame(ParentDataFrame):
         if self._cached_schema is None:
             query = self._plan.to_proto(self._session.client)
             self._cached_schema = self._session.client.schema(query)
-        return self._cached_schema
+        return copy.deepcopy(self._cached_schema)
 
     def isLocal(self) -> bool:
         query = self._plan.to_proto(self._session.client)
diff --git a/python/pyspark/sql/tests/connect/test_parity_dataframe.py b/python/pyspark/sql/tests/connect/test_parity_dataframe.py
index 343f485553a9..c9888a6a8f1a 100644
--- a/python/pyspark/sql/tests/connect/test_parity_dataframe.py
+++ b/python/pyspark/sql/tests/connect/test_parity_dataframe.py
@@ -19,6 +19,7 @@ import unittest
 
 from pyspark.sql.tests.test_dataframe import DataFrameTestsMixin
 from pyspark.testing.connectutils import ReusedConnectTestCase
+from pyspark.sql.types import StructType, StructField, IntegerType, StringType
 
 
 class DataFrameParityTests(DataFrameTestsMixin, ReusedConnectTestCase):
@@ -26,6 +27,29 @@ class DataFrameParityTests(DataFrameTestsMixin, ReusedConnectTestCase):
         df = self.spark.createDataFrame(data=[{"foo": "bar"}, {"foo": "baz"}])
         super().check_help_command(df)
 
+    def test_cached_property_is_copied(self):
+        schema = StructType(
+            [
+                StructField("id", IntegerType(), True),
+                StructField("name", StringType(), True),
+                StructField("age", IntegerType(), True),
+                StructField("city", StringType(), True),
+            ]
+        )
+        # Create some dummy data
+        data = [
+            (1, "Alice", 30, "New York"),
+            (2, "Bob", 25, "San Francisco"),
+            (3, "Cathy", 29, "Los Angeles"),
+            (4, "David", 35, "Chicago"),
+        ]
+        df = self.spark.createDataFrame(data, schema)
+        df_columns = df.columns
+        assert len(df.columns) == 4
+        for col in ["id", "name"]:
+            df_columns.remove(col)
+        assert len(df.columns) == 4
+
     @unittest.skip("Spark Connect does not support RDD but the tests depend on them.")
     def test_toDF_with_schema_string(self):
         super().test_toDF_with_schema_string()





(spark) branch master updated (e9d4152a319a -> 153053fe6c3d)

2024-05-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from e9d4152a319a [SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in 
view lookup test
 add 153053fe6c3d [SPARK-48268][CORE] Add a configuration for 
SparkContext.setCheckpointDir

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/SparkContext.scala  |  2 ++
 .../scala/org/apache/spark/internal/config/package.scala |  9 +
 .../test/scala/org/apache/spark/CheckpointSuite.scala| 16 
 docs/configuration.md|  9 +
 4 files changed, 36 insertions(+)
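
A hedged usage sketch of what such a configuration enables (the config key `spark.checkpoint.dir` is an assumption; the email only names `SparkContext.setCheckpointDir` and `config/package.scala`):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the exact config key is assumed, not taken from this email.
val conf = new SparkConf()
  .setAppName("checkpoint-dir-demo")
  .setMaster("local[2]")
  .set("spark.checkpoint.dir", "/tmp/spark-checkpoints")  // assumed key

val sc = new SparkContext(conf)
// Equivalent in effect to calling sc.setCheckpointDir("/tmp/spark-checkpoints").
```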





(spark) branch master updated: [SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in view lookup test

2024-05-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e9d4152a319a [SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in 
view lookup test
e9d4152a319a is described below

commit e9d4152a319af4ad138ad1a6eb87bdf0b051ec9e
Author: Hyukjin Kwon 
AuthorDate: Fri May 17 08:35:37 2024 +0900

[SPARK-48031][SQL][FOLLOW-UP] Use ANSI-enabled cast in view lookup test

### What changes were proposed in this pull request?

This PR is a follow-up of https://github.com/apache/spark/pull/46267, which
intentionally uses an ANSI-enabled cast in `castColToType` when looking up a
view. This follow-up uses an ANSI-enabled cast in the tests as well.

### Why are the changes needed?

In order to fix the scheduled CI build without ANSI:

- https://github.com/apache/spark/actions/runs/9072308206/job/24960016975
- https://github.com/apache/spark/actions/runs/9072308206/job/24960019187

```
[info] - look up view relation *** FAILED *** (72 milliseconds)
[info]   == FAIL: Plans do not match ===
[info]    'SubqueryAlias spark_catalog.db3.view1                               'SubqueryAlias spark_catalog.db3.view1
[info]    +- View (`spark_catalog`.`db3`.`view1`, ['col1, 'col2, 'a, 'b])      +- View (`spark_catalog`.`db3`.`view1`, ['col1, 'col2, 'a, 'b])
[info]       +- 'Project [cast(getviewcolumnbynameandordinal(`spark_catalog`.`db3`.`view1`, col1, 0, 1) as int) AS col1#0, cast(getviewcolumnbynameandordinal(`spark_catalog`.`db3`.`view1`, col2, 0, 1) as string) AS col2#0, cast(getviewcolumnbynameandordinal(`spark_catalog`.`db3`.`view1`, a, 0, 1) as int) AS a#0, cast(getviewcolumnbynameandordinal(`spark_catalog`.`db3`.`view1`, b, 0, 1) as string) AS b#0]      +- 'Project [cast(getviewcolumnbynameandordinal(`spark_catalog`.`db3`.`view1 [...]
[info]          +- 'Project [*]                                                +- 'Project [*]
[info]             +- 'UnresolvedRelation [tbl1], [], false
```

```
[info] - look up view created before Spark 3.0 *** FAILED *** (452 milliseconds)
[info]   == FAIL: Plans do not match ===
[info]    'SubqueryAlias spark_catalog.db3.view2                'SubqueryAlias spark_catalog.db3.view2
[info]    +- View (`db3`.`view2`, ['col1, 'col2, 'a, 'b])       +- View (`db3`.`view2`, ['col1, 'col2, 'a, 'b])
[info]       +- 'Project [cast(getviewcolumnbynameandordinal(`db3`.`view2`, col1, 0, 1) as int) AS col1#0, cast(getviewcolumnbynameandordinal(`db3`.`view2`, col2, 0, 1) as string) AS col2#0, cast(getviewcolumnbynameandordinal(`db3`.`view2`, a, 0, 1) as int) AS a#0, cast(getviewcolumnbynameandordinal(`db3`.`view2`, b, 0, 1) as string) AS b#0]      +- 'Project [cast(getviewcolumnbynameandordinal(`db3`.`view2`, col1, 0, 1) as int) AS col1#0, cast(getviewcolumnbynameandordinal(`db3`.`view [...]
[info]          +- 'Project [*]                                  +- 'Project [
```

(spark) branch dependabot/bundler/docs/rexml-3.2.8 deleted (was 96e70ab579c3)

2024-05-16 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch dependabot/bundler/docs/rexml-3.2.8
in repository https://gitbox.apache.org/repos/asf/spark.git


 was 96e70ab579c3 Bump rexml from 3.2.6 to 3.2.8 in /docs

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.





(spark) branch master updated: [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError

2024-05-16 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 59f88c372522 [SPARK-48294][SQL] Handle lowercase in 
nestedTypeMissingElementTypeError
59f88c372522 is described below

commit 59f88c3725222b84b2d0b51ba40a769d99866b56
Author: Michael Zhang 
AuthorDate: Thu May 16 14:58:25 2024 -0700

[SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError

### What changes were proposed in this pull request?

Handle lowercase values inside of `nestedTypeMissingElementTypeError` to
prevent match errors.

### Why are the changes needed?

The previous match error was not user-friendly. Now it gives an actionable 
`INCOMPLETE_TYPE_DEFINITION` error.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Newly added tests pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46623 from michaelzhan-db/SPARK-48294.

Authored-by: Michael Zhang 
Signed-off-by: Gengliang Wang 
---
 .../apache/spark/sql/errors/QueryParsingErrors.scala  |  2 +-
 .../spark/sql/errors/QueryParsingErrorsSuite.scala| 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index 5eafd4d915a4..816fa546a138 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -289,7 +289,7 @@ private[sql] object QueryParsingErrors extends DataTypeErrorsBase {
 
   def nestedTypeMissingElementTypeError(
       dataType: String, ctx: PrimitiveDataTypeContext): Throwable = {
-    dataType match {
+    dataType.toUpperCase(Locale.ROOT) match {
       case "ARRAY" =>
         new ParseException(
           errorClass = "INCOMPLETE_TYPE_DEFINITION.ARRAY",
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
index 29ab6e994e42..b7fb65091ef7 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
@@ -647,6 +647,13 @@ class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession with SQL
       sqlState = "42K01",
       parameters = Map("elementType" -> ""),
       context = ExpectedContext(fragment = "ARRAY", start = 30, stop = 34))
+    // Create column of array type without specifying element type in lowercase
+    checkError(
+      exception = parseException("CREATE TABLE tbl_120691 (col1 array)"),
+      errorClass = "INCOMPLETE_TYPE_DEFINITION.ARRAY",
+      sqlState = "42K01",
+      parameters = Map("elementType" -> ""),
+      context = ExpectedContext(fragment = "array", start = 30, stop = 34))
   }
 
   test("INCOMPLETE_TYPE_DEFINITION: struct type definition is incomplete") {
@@ -674,6 +681,12 @@ class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession with SQL
       errorClass = "PARSE_SYNTAX_ERROR",
       sqlState = "42601",
       parameters = Map("error" -> "'<'", "hint" -> ": missing ')'"))
+    // Create column of struct type without specifying field type in lowercase
+    checkError(
+      exception = parseException("CREATE TABLE tbl_120691 (col1 struct)"),
+      errorClass = "INCOMPLETE_TYPE_DEFINITION.STRUCT",
+      sqlState = "42K01",
+      context = ExpectedContext(fragment = "struct", start = 30, stop = 35))
   }
 
   test("INCOMPLETE_TYPE_DEFINITION: map type definition is incomplete") {
@@ -695,6 +708,12 @@ class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession with SQL
       errorClass = "PARSE_SYNTAX_ERROR",
       sqlState = "42601",
       parameters = Map("error" -> "'<'", "hint" -> ": missing ')'"))
+    // Create column of map type without specifying key/value types in lowercase
+    checkError(
+      exception = parseException("SELECT CAST(map('1',2) AS map)"),
+      errorClass = "INCOMPLETE_TYPE_DEFINITION.MAP",
+      sqlState = "42K01",
+      context = ExpectedContext(fragment = "map", start = 26, stop = 28))
   }
 
   test("INVALID_ESC: Escape string must contain only one character") {





(spark) branch dependabot/bundler/docs/rexml-3.2.8 created (now 96e70ab579c3)

2024-05-16 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch dependabot/bundler/docs/rexml-3.2.8
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 96e70ab579c3 Bump rexml from 3.2.6 to 3.2.8 in /docs

No new revisions were added by this update.





(spark) branch master updated: [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite*

2024-05-16 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 283b2ff42221 [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* 
as *SparkLoggerSuite*
283b2ff42221 is described below

commit 283b2ff422218b025e7b0170e4b7ed31a1294a80
Author: panbingkun 
AuthorDate: Thu May 16 11:55:20 2024 -0700

[SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as 
*SparkLoggerSuite*

### What changes were proposed in this pull request?
This PR is a follow-up of https://github.com/apache/spark/pull/46600.
Similarly, to maintain consistency, the related Java logger test suites should
be renamed to the `Spark*Logger*` naming.

### Why are the changes needed?
After `org.apache.spark.internal.Logger` was renamed to
`org.apache.spark.internal.SparkLogger` and
`org.apache.spark.internal.LoggerFactory` to
`org.apache.spark.internal.SparkLoggerFactory`, the related UTs should also be
renamed so that developers can easily locate them.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46615 from panbingkun/SPARK-48291_follow_up.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../util/{PatternLoggerSuite.java => PatternSparkLoggerSuite.java} | 7 ---
 .../spark/util/{LoggerSuiteBase.java => SparkLoggerSuiteBase.java} | 2 +-
 ...{StructuredLoggerSuite.java => StructuredSparkLoggerSuite.java} | 6 +++---
 common/utils/src/test/resources/log4j2.properties  | 4 ++--
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git 
a/common/utils/src/test/java/org/apache/spark/util/PatternLoggerSuite.java 
b/common/utils/src/test/java/org/apache/spark/util/PatternSparkLoggerSuite.java
similarity index 91%
rename from 
common/utils/src/test/java/org/apache/spark/util/PatternLoggerSuite.java
rename to 
common/utils/src/test/java/org/apache/spark/util/PatternSparkLoggerSuite.java
index 33de91697efa..2d370bad4cc8 100644
--- a/common/utils/src/test/java/org/apache/spark/util/PatternLoggerSuite.java
+++ 
b/common/utils/src/test/java/org/apache/spark/util/PatternSparkLoggerSuite.java
@@ -22,9 +22,10 @@ import org.apache.logging.log4j.Level;
 import org.apache.spark.internal.SparkLogger;
 import org.apache.spark.internal.SparkLoggerFactory;
 
-public class PatternLoggerSuite extends LoggerSuiteBase {
+public class PatternSparkLoggerSuite extends SparkLoggerSuiteBase {
 
-  private static final SparkLogger LOGGER = SparkLoggerFactory.getLogger(PatternLoggerSuite.class);
+  private static final SparkLogger LOGGER =
+    SparkLoggerFactory.getLogger(PatternSparkLoggerSuite.class);
 
   private String toRegexPattern(Level level, String msg) {
 return msg
@@ -39,7 +40,7 @@ public class PatternLoggerSuite extends LoggerSuiteBase {
 
   @Override
   String className() {
-return PatternLoggerSuite.class.getSimpleName();
+return PatternSparkLoggerSuite.class.getSimpleName();
   }
 
   @Override
diff --git 
a/common/utils/src/test/java/org/apache/spark/util/LoggerSuiteBase.java 
b/common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java
similarity index 99%
rename from 
common/utils/src/test/java/org/apache/spark/util/LoggerSuiteBase.java
rename to 
common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java
index ecc0a75070c7..46bfe3415080 100644
--- a/common/utils/src/test/java/org/apache/spark/util/LoggerSuiteBase.java
+++ b/common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java
@@ -30,7 +30,7 @@ import org.apache.spark.internal.SparkLogger;
 import org.apache.spark.internal.LogKeys;
 import org.apache.spark.internal.MDC;
 
-public abstract class LoggerSuiteBase {
+public abstract class SparkLoggerSuiteBase {
 
   abstract SparkLogger logger();
   abstract String className();
diff --git 
a/common/utils/src/test/java/org/apache/spark/util/StructuredLoggerSuite.java 
b/common/utils/src/test/java/org/apache/spark/util/StructuredSparkLoggerSuite.java
similarity index 95%
rename from 
common/utils/src/test/java/org/apache/spark/util/StructuredLoggerSuite.java
rename to 
common/utils/src/test/java/org/apache/spark/util/StructuredSparkLoggerSuite.java
index 110e7cc7794e..416f0b6172c0 100644
--- 
a/common/utils/src/test/java/org/apache/spark/util/StructuredLoggerSuite.java
+++ 
b/common/utils/src/test/java/org/apache/spark/util/StructuredSparkLoggerSuite.java
@@ -24,10 +24,10 @@ import org.apache.logging.log4j.Level;
 import org.apache.spark.internal.SparkLogger;
 import org.apache.spark.internal.SparkLoggerFactory;
 
-public class StructuredLoggerSuite extends LoggerSuiteBase {
+public class StructuredSparkLoggerSuite extends SparkLoggerSuiteBase {
 
   private static final 

(spark) branch master updated: [SPARK-48308][CORE] Unify getting data schema without partition columns in FileSourceStrategy

2024-05-16 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 57948c865e06 [SPARK-48308][CORE] Unify getting data schema without 
partition columns in FileSourceStrategy
57948c865e06 is described below

commit 57948c865e064469a75c92f8b58c632b9b40fdd3
Author: Johan Lasperas 
AuthorDate: Thu May 16 22:38:02 2024 +0800

[SPARK-48308][CORE] Unify getting data schema without partition columns in 
FileSourceStrategy

### What changes were proposed in this pull request?
Compute the schema of the data without partition columns only once in 
FileSourceStrategy.

### Why are the changes needed?
In FileSourceStrategy, the schema of the data excluding partition columns is
computed twice in slightly different ways: using an AttributeSet
(`partitionSet`) and using the attributes directly (`partitionColumns`).
These don't have exactly the same semantics: an AttributeSet only compares
expression ids, while comparing against the actual attributes uses the name,
type, nullability and metadata. We want the former here.
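
A small sketch of that semantic difference using Catalyst internals (illustrative only, not code from the patch):

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, AttributeSet}
import org.apache.spark.sql.types.IntegerType

val part = AttributeReference("p", IntegerType, nullable = true)()
val sameIdDifferentNullability = part.withNullability(false)  // keeps the exprId

// AttributeSet membership compares expression ids only:
assert(AttributeSet(Seq(part)).contains(sameIdDifferentNullability))
// Seq.contains uses full equality (name, type, nullability, metadata, ...):
assert(!Seq(part).contains(sameIdDifferentNullability))
```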

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46619 from johanl-db/reuse-schema-without-partition-columns.

Authored-by: Johan Lasperas 
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/execution/datasources/FileSourceStrategy.scala| 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
index 8333c276cdd8..d31cb111924b 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
@@ -216,9 +216,8 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
       val requiredExpressions: Seq[NamedExpression] = filterAttributes.toSeq ++ projects
       val requiredAttributes = AttributeSet(requiredExpressions)
 
-      val readDataColumns = dataColumns
+      val readDataColumns = dataColumnsWithoutPartitionCols
         .filter(requiredAttributes.contains)
-        .filterNot(partitionColumns.contains)
 
       // Metadata attributes are part of a column of type struct up to this point. Here we extract
       // this column from the schema and specify a matcher for that.





(spark) branch master updated: [SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to `CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE`

2024-05-16 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3d3d18f14ba2 [SPARK-48301][SQL] Rename 
`CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to 
`CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE`
3d3d18f14ba2 is described below

commit 3d3d18f14ba29074ca3ff8b661449ad45d84369e
Author: Ruifeng Zheng 
AuthorDate: Thu May 16 20:58:15 2024 +0800

[SPARK-48301][SQL] Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to 
`CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE`

### What changes were proposed in this pull request?
Rename `CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE` to 
`CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE`

### Why are the changes needed?
The `IF NOT EXISTS` + `REPLACE` restriction is standard for routines, not just
for functions. Renaming the error condition makes it reusable.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
updated tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #46608 from zhengruifeng/sql_rename_if_not_exists_replace.

Lead-authored-by: Ruifeng Zheng 
Co-authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 common/utils/src/main/resources/error/error-conditions.json   | 4 ++--
 .../main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala   | 2 +-
 .../scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-conditions.json 
b/common/utils/src/main/resources/error/error-conditions.json
index 75067a1920f7..5d750ade7867 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -2675,9 +2675,9 @@
       "ANALYZE TABLE(S) ... COMPUTE STATISTICS ...  must be either NOSCAN or empty."
     ]
   },
-  "CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE" : {
+  "CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE" : {
     "message" : [
-      "CREATE FUNCTION with both IF NOT EXISTS and REPLACE is not allowed."
+      "CREATE PROCEDURE or CREATE FUNCTION with both IF NOT EXISTS and REPLACE is not allowed."
     ]
   },
   "CREATE_TEMP_FUNC_WITH_DATABASE" : {
diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index d07aa6741a14..5eafd4d915a4 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -576,7 +576,7 @@ private[sql] object QueryParsingErrors extends DataTypeErrorsBase {
 
   def createFuncWithBothIfNotExistsAndReplaceError(ctx: CreateFunctionContext): Throwable = {
     new ParseException(
-      errorClass = "INVALID_SQL_SYNTAX.CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE",
+      errorClass = "INVALID_SQL_SYNTAX.CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE",
       ctx)
   }
 
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
index 5babce0ddb8d..29ab6e994e42 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
@@ -288,7 +288,7 @@ class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession with SQL
         stop = 27))
   }
 
-  test("INVALID_SQL_SYNTAX.CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE: " +
+  test("INVALID_SQL_SYNTAX.CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE: " +
     "Create function with both if not exists and replace") {
     val sqlText =
       """CREATE OR REPLACE FUNCTION IF NOT EXISTS func1 as
@@ -297,7 +297,7 @@ class QueryParsingErrorsSuite extends QueryTest with SharedSparkSession with SQL
 
     checkError(
       exception = parseException(sqlText),
-      errorClass = "INVALID_SQL_SYNTAX.CREATE_FUNC_WITH_IF_NOT_EXISTS_AND_REPLACE",
+      errorClass = "INVALID_SQL_SYNTAX.CREATE_ROUTINE_WITH_IF_NOT_EXISTS_AND_REPLACE",
       sqlState = "42000",
       context = ExpectedContext(
         fragment = sqlText,





(spark) branch master updated (fa83d0f8fce7 -> 4be0828e6e6a)

2024-05-16 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from fa83d0f8fce7 [SPARK-48296][SQL] Codegen Support for `to_xml`
 add 4be0828e6e6a [SPARK-48288] Add source data type for connector cast 
expression

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/connector/expressions/Cast.java   | 18 +-
 .../sql/connector/util/V2ExpressionSQLBuilder.java |  6 +++---
 .../spark/sql/catalyst/util/V2ExpressionBuilder.scala  |  2 +-
 .../scala/org/apache/spark/sql/jdbc/JdbcDialects.scala |  4 ++--
 4 files changed, 23 insertions(+), 7 deletions(-)





(spark) branch master updated (3bd845ea930a -> fa83d0f8fce7)

2024-05-16 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 3bd845ea930a [SPARK-48297][SQL] Fix a regression TRANSFORM clause with 
char/varchar
 add fa83d0f8fce7 [SPARK-48296][SQL] Codegen Support for `to_xml`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/xmlExpressions.scala | 11 ++-
 .../org/apache/spark/sql/XmlFunctionsSuite.scala  | 19 ++-
 2 files changed, 24 insertions(+), 6 deletions(-)
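
As a quick, hedged illustration of the expression gaining codegen support (assumes `to_xml` is available as a built-in SQL function on master):

```scala
// Sketch only: exercises to_xml via SQL; whether it is evaluated via codegen
// or the interpreted path is an internal detail and does not change the result.
spark.sql("SELECT to_xml(named_struct('a', 1, 'b', 'spark')) AS xml").show(truncate = false)
```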





(spark) branch branch-3.5 updated: [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar

2024-05-16 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new c1dd4a5df693 [SPARK-48297][SQL] Fix a regression TRANSFORM clause with 
char/varchar
c1dd4a5df693 is described below

commit c1dd4a5df69340884f3f0f0c28ce916bf9e30159
Author: Kent Yao 
AuthorDate: Thu May 16 17:29:47 2024 +0800

[SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar

### What changes were proposed in this pull request?

TRANSFORM with char/varchar has been accidentally broken since 3.1, failing
with a scala.MatchError; this PR fixes it.

### Why are the changes needed?

bugfix
### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #46603 from yaooqinn/SPARK-48297.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit 3bd845ea930a4709b7a2f0447b5f8af64c697239)
Signed-off-by: Kent Yao 
---
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala |  4 +++-
 .../resources/sql-tests/analyzer-results/transform.sql.out| 11 +++
 sql/core/src/test/resources/sql-tests/inputs/transform.sql|  6 +-
 .../src/test/resources/sql-tests/results/transform.sql.out| 10 ++
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 5d68aed9245a..f38d41af445e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -787,7 +787,9 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     // Create the attributes.
     val (attributes, schemaLess) = if (transformClause.colTypeList != null) {
       // Typed return columns.
-      (DataTypeUtils.toAttributes(createSchema(transformClause.colTypeList)), false)
+      val schema = createSchema(transformClause.colTypeList)
+      val replacedSchema = CharVarcharUtils.replaceCharVarcharWithStringInSchema(schema)
+      (DataTypeUtils.toAttributes(replacedSchema), false)
     } else if (transformClause.identifierSeq != null) {
       // Untyped return columns.
       val attrs = visitIdentifierSeq(transformClause.identifierSeq).map { name =>
diff --git 
a/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out 
b/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out
index ceca433a1c91..aa595c551f79 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out
@@ -1035,3 +1035,14 @@ ScriptTransformation cat, [a#x, b#x], 
ScriptInputOutputSchema(List(),List(),None
 +- Project [a#x, b#x]
+- SubqueryAlias complex_trans
   +- LocalRelation [a#x, b#x]
+
+
+-- !query
+SELECT TRANSFORM (a, b)
+  USING 'cat' AS (a CHAR(10), b VARCHAR(10))
+FROM VALUES('apache', 'spark') t(a, b)
+-- !query analysis
+ScriptTransformation cat, [a#x, b#x], 
ScriptInputOutputSchema(List(),List(),None,None,List(),List(),None,None,false)
++- Project [a#x, b#x]
+   +- SubqueryAlias t
+  +- LocalRelation [a#x, b#x]
diff --git a/sql/core/src/test/resources/sql-tests/inputs/transform.sql 
b/sql/core/src/test/resources/sql-tests/inputs/transform.sql
index 922a1d817778..8570496d439e 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/transform.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/transform.sql
@@ -415,4 +415,8 @@ FROM (
   ORDER BY a
 ) map_output
 SELECT TRANSFORM(a, b)
-  USING 'cat' AS (a, b);
\ No newline at end of file
+  USING 'cat' AS (a, b);
+
+SELECT TRANSFORM (a, b)
+  USING 'cat' AS (a CHAR(10), b VARCHAR(10))
+FROM VALUES('apache', 'spark') t(a, b);
diff --git a/sql/core/src/test/resources/sql-tests/results/transform.sql.out 
b/sql/core/src/test/resources/sql-tests/results/transform.sql.out
index ab726b93c07c..7975392fd014 100644
--- a/sql/core/src/test/resources/sql-tests/results/transform.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/transform.sql.out
@@ -837,3 +837,13 @@ struct
 3  3
 3  3
 3  3
+
+
+-- !query
+SELECT TRANSFORM (a, b)
+  USING 'cat' AS (a CHAR(10), b VARCHAR(10))
+FROM VALUES('apache', 'spark') t(a, b)
+-- !query schema
+struct
+-- !query output
+apache spark



(spark) branch master updated (b53d78e94f6e -> 3bd845ea930a)

2024-05-16 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b53d78e94f6e [SPARK-48036][DOCS][FOLLOWUP] Update 
sql-ref-ansi-compliance.md
 add 3bd845ea930a [SPARK-48297][SQL] Fix a regression TRANSFORM clause with 
char/varchar

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala |  4 +++-
 .../resources/sql-tests/analyzer-results/transform.sql.out| 11 +++
 sql/core/src/test/resources/sql-tests/inputs/transform.sql|  6 +-
 .../src/test/resources/sql-tests/results/transform.sql.out| 10 ++
 4 files changed, 29 insertions(+), 2 deletions(-)





(spark) branch master updated (0ba8ddc9ce5b -> b53d78e94f6e)

2024-05-16 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 0ba8ddc9ce5b [SPARK-48293][SS] Add test for when 
ForeachBatchUserFuncException wraps interrupted exception due to query stop
 add b53d78e94f6e [SPARK-48036][DOCS][FOLLOWUP] Update 
sql-ref-ansi-compliance.md

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-ansi-compliance.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

