(spark) branch branch-3.4 updated: [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 3c41b1d97e1f [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208
3c41b1d97e1f is described below

commit 3c41b1d97e1f5ff9f74f9ea72f7ea92dcbca2122
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 15 22:42:17 2024 -0700

[SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208

### What changes were proposed in this pull request?

This PR aims to upgrade Jetty to 9.4.54.v20240208 for Apache Spark 3.4.3.

### Why are the changes needed?

To bring the latest bug fixes.
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.52.v20230823
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.51.v20230217

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45544 from dongjoon-hyun/SPARK-47428-3.4.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml   | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 691c83632b38..a94fbcd0ca77 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -143,7 +143,7 @@ jersey-hk2/2.36//jersey-hk2-2.36.jar
 jersey-server/2.36//jersey-server-2.36.jar
 jetty-sslengine/6.1.26//jetty-sslengine-6.1.26.jar
 jetty-util/6.1.26//jetty-util-6.1.26.jar
-jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jetty/6.1.26//jetty-6.1.26.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.2//joda-time-2.12.2.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 4d94cb5c699e..99665da7d16a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -128,8 +128,8 @@ jersey-container-servlet/2.36//jersey-container-servlet-2.36.jar
 jersey-hk2/2.36//jersey-hk2-2.36.jar
 jersey-server/2.36//jersey-server-2.36.jar
 jettison/1.1//jettison-1.1.jar
-jetty-util-ajax/9.4.50.v20221201//jetty-util-ajax-9.4.50.v20221201.jar
-jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar
+jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.2//joda-time-2.12.2.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
diff --git a/pom.xml b/pom.xml
index 373d17b76c09..77218d162c41 100644
--- a/pom.xml
+++ b/pom.xml
@@ -143,7 +143,7 @@
     <parquet.version>1.12.3</parquet.version>
     <orc.version>1.8.6</orc.version>
     <orc.classifier>shaded-protobuf</orc.classifier>
-    <jetty.version>9.4.50.v20221201</jetty.version>
+    <jetty.version>9.4.54.v20240208</jetty.version>
     <javaxservlet.version>4.0.3</javaxservlet.version>
     <chill.version>0.10.0</chill.version>
     <ivy.version>2.5.1</ivy.version>


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 210e80e8b7ba [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
210e80e8b7ba is described below

commit 210e80e8b7baa5fc1e6462615bc8134a4c90647c
Author: Dongjoon Hyun 
AuthorDate: Tue Oct 17 23:38:56 2023 -0700

[SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

### What changes were proposed in this pull request?

This PR aims to skip the `Unidoc` and `MIMA` phases in many general test pipelines. The `mima` test is moved to the `lint` job.

### Why are the changes needed?

By moving document generation and MiMa checking into an independent GitHub Action job, we can skip them in the following jobs (a small sketch of the flag check follows the link).


https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190
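A minimal sketch of how such `SKIP_*` environment flags can gate build phases (illustrative only: the real gating lives in Spark's `dev/` scripts and SBT build, and `runUnidoc`/`runMima` here are hypothetical stand-ins):

```scala
object SkipFlags {
  // The workflow above exports SKIP_UNIDOC / SKIP_MIMA as "true"; treat any
  // case-insensitive "true" as a request to skip the phase.
  def skipped(flag: String): Boolean =
    sys.env.get(flag).exists(_.equalsIgnoreCase("true"))

  def runUnidoc(): Unit = println("generating unified docs")           // stand-in
  def runMima(): Unit = println("running MiMa compatibility checks")   // stand-in

  def main(args: Array[String]): Unit = {
    if (!skipped("SKIP_UNIDOC")) runUnidoc()
    if (!skipped("SKIP_MIMA")) runMima()
  }
}
```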

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually check the GitHub action logs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43422 from dongjoon-hyun/SPARK-45587.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794)
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 13527119e51a..33747fb5b61d 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -198,6 +198,8 @@ jobs:
   HIVE_PROFILE: ${{ matrix.hive }}
   GITHUB_PREV_SHA: ${{ github.event.before }}
   SPARK_LOCAL_IP: localhost
+  SKIP_UNIDOC: true
+  SKIP_MIMA: true
   SKIP_PACKAGING: true
 steps:
 - name: Checkout Spark repository
@@ -578,6 +580,8 @@ jobs:
   run: ./dev/check-license
 - name: Dependencies test
   run: ./dev/test-dependencies.sh
+- name: MIMA test
+  run: ./dev/mima
 - name: Scala linter
   run: ./dev/lint-scala
 - name: Java linter


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 8c6eeb8ab018 [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
8c6eeb8ab018 is described below

commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794
Author: Dongjoon Hyun 
AuthorDate: Tue Oct 17 23:38:56 2023 -0700

[SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

### What changes were proposed in this pull request?

This PR aims to skip the `Unidoc` and `MIMA` phases in many general test pipelines. The `mima` test is moved to the `lint` job.

### Why are the changes needed?

By moving document generation and MiMa checking into an independent GitHub Action job, we can skip them in the following jobs.


https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually check the GitHub action logs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43422 from dongjoon-hyun/SPARK-45587.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index ad8685754b31..b0760a955342 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -204,6 +204,8 @@ jobs:
   HIVE_PROFILE: ${{ matrix.hive }}
   GITHUB_PREV_SHA: ${{ github.event.before }}
   SPARK_LOCAL_IP: localhost
+  SKIP_UNIDOC: true
+  SKIP_MIMA: true
   SKIP_PACKAGING: true
 steps:
 - name: Checkout Spark repository
@@ -627,6 +629,8 @@ jobs:
   run: ./dev/check-license
 - name: Dependencies test
   run: ./dev/test-dependencies.sh
+- name: MIMA test
+  run: ./dev/mima
 - name: Scala linter
   run: ./dev/lint-scala
 - name: Java linter


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new d59425275cdd [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208
d59425275cdd is described below

commit d59425275cdd0ff678a5bcccef4c7b74fe8170cb
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 15 22:28:45 2024 -0700

[SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208

### What changes were proposed in this pull request?

This PR aims to upgrade Jetty to 9.4.54.v20240208.

### Why are the changes needed?

To bring the latest bug fixes.
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45543 from dongjoon-hyun/SPARK-47428.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index c76702cd0af0..8ecf931bf513 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -130,8 +130,8 @@ jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar
 jersey-hk2/2.40//jersey-hk2-2.40.jar
 jersey-server/2.40//jersey-server-2.40.jar
 jettison/1.1//jettison-1.1.jar
-jetty-util-ajax/9.4.52.v20230823//jetty-util-ajax-9.4.52.v20230823.jar
-jetty-util/9.4.52.v20230823//jetty-util-9.4.52.v20230823.jar
+jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.5//joda-time-2.12.5.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
diff --git a/pom.xml b/pom.xml
index 5db3c78e00eb..fb6208777d3f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -143,7 +143,7 @@
     <parquet.version>1.13.1</parquet.version>
     <orc.version>1.9.2</orc.version>
     <orc.classifier>shaded-protobuf</orc.classifier>
-    <jetty.version>9.4.52.v20230823</jetty.version>
+    <jetty.version>9.4.54.v20240208</jetty.version>
     <javaxservlet.version>4.0.3</javaxservlet.version>
     <chill.version>0.10.0</chill.version>
 

(spark) branch master updated: [SPARK-47327][SQL] Move sort keys concurrency test to CollationFactorySuite

2024-03-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6719168b6ec7 [SPARK-47327][SQL] Move sort keys concurrency test to CollationFactorySuite
6719168b6ec7 is described below

commit 6719168b6ec72242e111bcb3aae75985d36fdad2
Author: Stefan Kandic 
AuthorDate: Sat Mar 16 09:24:22 2024 +0500

[SPARK-47327][SQL] Move sort keys concurrency test to CollationFactorySuite

### What changes were proposed in this pull request?

Move concurrency test to the `CollationFactorySuite`

### Why are the changes needed?

This is a more appropriate location for the test, as it directly uses the `CollationFactory`.

Also, I just found out that the `par` method is highly discouraged and that we should use `ParSeq` instead (see the sketch below).
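A minimal sketch of the two styles (assuming the `scala-parallel-collections` module is on the classpath; `work` is a hypothetical task):

```scala
import scala.collection.parallel.CollectionConverters.ImmutableIterableIsParallelizable
import scala.collection.parallel.immutable.ParSeq

object ParStyles extends App {
  def work(i: Int): Unit = println(s"processing $i")

  // Old style: the implicit conversion adds .par to a sequential range.
  (0 to 100).par.foreach(work)

  // Preferred style: build the parallel collection explicitly.
  ParSeq(0 to 100: _*).foreach(work)
}
```

Note that `ParSeq(0 to 100)` without the `: _*` splat would wrap the whole `Range` as a single element, so per-element parallelism needs the expansion (or an explicit conversion).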

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

With existing UTs

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45501 from stefankandic/moveTest.

Authored-by: Stefan Kandic 
Signed-off-by: Max Gekk 
---
 common/unsafe/pom.xml                                      |  6 ++++++
 .../apache/spark/unsafe/types/CollationFactorySuite.scala  | 14 ++++++++++++++
 .../test/scala/org/apache/spark/sql/CollationSuite.scala   | 14 --------------
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml
index e9785ebb7ad4..13b45f55a4ad 100644
--- a/common/unsafe/pom.xml
+++ b/common/unsafe/pom.xml
@@ -47,6 +47,12 @@
       <version>${project.version}</version>
     </dependency>
 
+    <dependency>
+      <groupId>org.scala-lang.modules</groupId>
+      <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
+      <scope>test</scope>
+    </dependency>
+
     <dependency>
       <groupId>com.ibm.icu</groupId>
       <artifactId>icu4j</artifactId>
diff --git a/common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala b/common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala
index f9927b94fd42..0a9ff7558e3a 100644
--- a/common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala
+++ b/common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala
@@ -17,6 +17,7 @@
 
 package org.apache.spark.unsafe.types
 
+import scala.collection.parallel.immutable.ParSeq
 import scala.jdk.CollectionConverters.MapHasAsScala
 
 import org.apache.spark.SparkException
@@ -138,4 +139,17 @@ class CollationFactorySuite extends AnyFunSuite with Matchers { // scalastyle:ignore funsuite
   assert(result == testCase.expectedResult)
 })
   }
+
+  test("test concurrently generating collation keys") {
+// generating ICU sort keys is not thread-safe by default so this should 
fail
+// if we don't handle the concurrency properly on Collator level
+
+(0 to 10).foreach(_ => {
+  val collator = fetchCollation("UNICODE").collator
+
+  ParSeq(0 to 100).foreach { _ =>
+collator.getCollationKey("aaa")
+  }
+})
+  }
 }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
index bef7417be36c..aaf3e88c9bdb 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
@@ -18,7 +18,6 @@
 package org.apache.spark.sql
 
 import scala.collection.immutable.Seq
-import scala.collection.parallel.CollectionConverters.ImmutableIterableIsParallelizable
 import scala.jdk.CollectionConverters.MapHasAsJava
 
 import org.apache.spark.SparkException
@@ -413,19 +412,6 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper {
 }
   }
 
-  test("test concurrently generating collation keys") {
-// generating ICU sort keys is not thread-safe by default so this should 
fail
-// if we don't handle the concurrency properly on Collator level
-
-(0 to 10).foreach(_ => {
-  val collator = CollationFactory.fetchCollation("UNICODE").collator
-
-  (0 to 100).par.foreach { _ =>
-collator.getCollationKey("aaa")
-  }
-})
-  }
-
   test("text writing to parquet with collation enclosed with backticks") {
 withTempPath{ path =>
   sql(s"select 'a' COLLATE `UNICODE`").write.parquet(path.getAbsolutePath)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47423][SQL] Collations - Set operation support for strings with collations

2024-03-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 653ac5b729e2 [SPARK-47423][SQL] Collations - Set operation support for strings with collations
653ac5b729e2 is described below

commit 653ac5b729e2eba9bf097905b3fd136603b7a298
Author: Aleksandar Tomic 
AuthorDate: Sat Mar 16 09:21:08 2024 +0500

[SPARK-47423][SQL] Collations - Set operation support for strings with collations

### What changes were proposed in this pull request?

This PR fixes support for set operations on strings whose collation differs from `UTF8_BINARY`. The fix is not strictly related to set operations and may resolve other problems in the collation space: it adds a default value for `StringType` with collation. Previously the pattern match did not catch collated `StringType` instances; the fix is simply to match on `st: StringType` instead of relying on a bare `StringType` match (see the sketch below).
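A self-contained sketch of the matching pitfall, using simplified stand-in types (Spark's real `StringType` hierarchy is richer, but the mechanics are the same):

```scala
sealed trait DataType
class StringType(val collationId: Int) extends DataType {
  // Equality by collation id, mirroring how a collated type compares.
  override def equals(obj: Any): Boolean = obj match {
    case st: StringType => st.collationId == collationId
    case _ => false
  }
  override def hashCode(): Int = collationId
}
case object StringType extends StringType(0) // the default (UTF8_BINARY) type

object Demo extends App {
  def defaultLiteralBroken(dt: DataType): String = dt match {
    // Equality match: only collationId == 0 lands here.
    case StringType => "''"
    case _          => "unsupported"
  }

  def defaultLiteralFixed(dt: DataType): String = dt match {
    // Type match binds any StringType and preserves its collation.
    case st: StringType => s"'' with collation ${st.collationId}"
    case _              => "unsupported"
  }

  val collated = new StringType(collationId = 1)
  println(defaultLiteralBroken(collated)) // "unsupported" -- the bug
  println(defaultLiteralFixed(collated))  // "'' with collation 1"
}
```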

### Why are the changes needed?

Fixing behaviour of set operations.

### Does this PR introduce _any_ user-facing change?

Yes - fixing the logic that previously didn't work.

### How was this patch tested?

Golden file tests are added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45536 from dbatomic/collations_and_set_ops.

Authored-by: Aleksandar Tomic 
Signed-off-by: Max Gekk 
---
 .../spark/sql/catalyst/expressions/literals.scala  |  2 +-
 .../sql-tests/analyzer-results/collations.sql.out  | 51 +
 .../test/resources/sql-tests/inputs/collations.sql |  7 +++
 .../resources/sql-tests/results/collations.sql.out | 53 ++
 4 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
index 9603647db06f..eadd4c04f4b3 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
@@ -195,7 +195,7 @@ object Literal {
     case TimestampNTZType => create(0L, TimestampNTZType)
     case it: DayTimeIntervalType => create(0L, it)
     case it: YearMonthIntervalType => create(0, it)
-    case StringType => Literal("")
+    case st: StringType => Literal(UTF8String.fromString(""), st)
     case BinaryType => Literal("".getBytes(StandardCharsets.UTF_8))
     case CalendarIntervalType => Literal(new CalendarInterval(0, 0, 0))
     case arr: ArrayType => create(Array(), arr)
diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out
index fff2d4eab717..6d9bb3470be6 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out
@@ -149,6 +149,57 @@ DropTable false, false
 +- ResolvedIdentifier V2SessionCatalog(spark_catalog), default.t1
 
 
+-- !query
+select col1 collate utf8_binary_lcase from values ('aaa'), ('AAA'), ('bbb'), ('BBB'), ('zzz'), ('ZZZ') except select col1 collate utf8_binary_lcase from values ('aaa'), ('bbb')
+-- !query analysis
+Except false
+:- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x]
+:  +- LocalRelation [col1#x]
++- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x]
+   +- LocalRelation [col1#x]
+
+
+-- !query
+select col1 collate utf8_binary_lcase from values ('aaa'), ('AAA'), ('bbb'), ('BBB'), ('zzz'), ('ZZZ') except all select col1 collate utf8_binary_lcase from values ('aaa'), ('bbb')
+-- !query analysis
+Except All true
+:- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x]
+:  +- LocalRelation [col1#x]
++- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x]
+   +- LocalRelation [col1#x]
+
+
+-- !query
+select col1 collate utf8_binary_lcase from values ('aaa'), ('AAA'), ('bbb'), ('BBB'), ('zzz'), ('ZZZ') union select col1 collate utf8_binary_lcase from values ('aaa'), ('bbb')
+-- !query analysis
+Distinct
++- Union false, false
+   :- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x]
+   :  +- LocalRelation [col1#x]
+   +- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x]
+  +- LocalRelation [col1#x]
+
+
+-- !query
+select col1 collate utf8_binary_lcase from values ('aaa'), ('AAA'), ('bbb'), ('BBB'), ('zzz'), ('ZZZ') union all select col1 collate utf8_binary_lcase from values ('aaa'), ('bbb')
+-- !query analysis
+Union false, false
+:- Project [collate(col1#x, utf8_binary_lcase) AS collate(col1)#x]
+:  +- LocalRelation [col1#x]
++- 

(spark) branch master updated (7c81bdf1ed17 -> add49b3c115f)

2024-03-15 Thread ueshin
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 7c81bdf1ed17 [SPARK-47345][SQL][TESTS] Xml functions suite
 add add49b3c115f [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/SparkEnv.scala | 27 ++
 .../spark/api/python/PythonWorkerFactory.scala | 12 +-
 .../spark/api/python/StreamingPythonRunner.scala   | 18 +--
 .../api/python/PythonWorkerFactorySuite.scala  |  2 +-
 .../sql/execution/python/PythonPlannerRunner.scala |  4 ++--
 .../python/PythonStreamingSourceRunner.scala   |  2 +-
 6 files changed, 39 insertions(+), 26 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47345][SQL][TESTS] Xml functions suite

2024-03-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7c81bdf1ed17 [SPARK-47345][SQL][TESTS] Xml functions suite
7c81bdf1ed17 is described below

commit 7c81bdf1ed17df31ec6d7a3ee9f18b73d8ae2bd6
Author: Yousof Hosny 
AuthorDate: Fri Mar 15 22:56:29 2024 +0500

[SPARK-47345][SQL][TESTS] Xml functions suite

### What changes were proposed in this pull request?

Convert JsonFunctionsSuite.scala to an XML equivalent. Note that XML doesn't implement all JSON functions, such as json_tuple, get_json_object, etc.

### Why are the changes needed?

Improve unit test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45466 from yhosny/xml-functions-suite.

Authored-by: Yousof Hosny 
Signed-off-by: Max Gekk 
---
 .../org/apache/spark/sql/XmlFunctionsSuite.scala   | 480 +
 1 file changed, 480 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala
new file mode 100644
index 000000000000..fcfbebaa61ec
--- /dev/null
+++ b/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala
@@ -0,0 +1,480 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.text.SimpleDateFormat
+import java.util.Locale
+
+import scala.jdk.CollectionConverters._
+
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types._
+
+class XmlFunctionsSuite extends QueryTest with SharedSparkSession {
+  import testImplicits._
+
+  test("from_xml") {
+val df = Seq("""1""").toDS()
+val schema = new StructType().add("a", IntegerType)
+
+checkAnswer(
+  df.select(from_xml($"value", schema)),
+  Row(Row(1)) :: Nil)
+  }
+
+  test("from_xml with option (timestampFormat)") {
+val df = Seq("""26/08/2015 18:00""").toDS()
+val schema = new StructType().add("time", TimestampType)
+val options = Map("timestampFormat" -> "dd/MM/ HH:mm").asJava
+
+checkAnswer(
+  df.select(from_xml($"value", schema, options)),
+  Row(Row(java.sql.Timestamp.valueOf("2015-08-26 18:00:00.0"
+  }
+
+  test("from_xml with option (rowTag)") {
+val df = Seq("""1""").toDS()
+val schema = new StructType().add("a", IntegerType)
+val options = Map("rowTag" -> "foo").asJava
+
+checkAnswer(
+  df.select(from_xml($"value", schema)),
+  Row(Row(1)) :: Nil)
+  }
+
+  test("from_xml with option (dateFormat)") {
+val df = Seq("""26/08/2015""").toDS()
+val schema = new StructType().add("time", DateType)
+val options = Map("dateFormat" -> "dd/MM/").asJava
+
+checkAnswer(
+  df.select(from_xml($"value", schema, options)),
+  Row(Row(java.sql.Date.valueOf("2015-08-26"
+  }
+
+  test("from_xml missing columns") {
+val df = Seq("""1""").toDS()
+val schema = new StructType().add("b", IntegerType)
+
+checkAnswer(
+  df.select(from_xml($"value", schema)),
+  Row(Row(null)) :: Nil)
+  }
+
+  test("from_xml invalid xml") {
+val df = Seq("""1""").toDS()
+val schema = new StructType().add("a", IntegerType)
+
+checkAnswer(
+  df.select(from_xml($"value", schema)),
+  Row(Row(null)) :: Nil)
+  }
+
+  test("from_xml - xml doesn't conform to the array type") {
+val df = Seq("""1""").toDS()
+val schema = StructType(StructField("a", ArrayType(IntegerType)) :: Nil)
+
+checkAnswer(df.select(from_xml($"value", schema)), Row(Row(null)))
+  }
+
+  test("from_xml array support") {
+val df = Seq(s""" 1 2 """.stripMargin).toDS()
+val schema = StructType(StructField("a", ArrayType(IntegerType)) :: Nil)
+
+checkAnswer(
+  df.select(from_xml($"value", schema)),
+  

(spark) branch master updated (6bf031796c8c -> e2c0471476ea)

2024-03-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


 from 6bf031796c8c [SPARK-44740][CONNECT][TESTS][FOLLOWUP] Deduplicate `test_metadata`
 add e2c0471476ea [SPARK-47395] Add collate and collation to other APIs

No new revisions were added by this update.

Summary of changes:
 R/pkg/NAMESPACE|   2 +
 R/pkg/R/functions.R|  29 
 R/pkg/R/generics.R |   8 
 R/pkg/tests/fulltests/test_sparkSQL.R  |   2 +
 .../scala/org/apache/spark/sql/functions.scala |  16 +++
 .../apache/spark/sql/PlanGenerationTestSuite.scala |   8 
 ...ulls_first.explain => function_collate.explain} |   2 +-
 ...ls_first.explain => function_collation.explain} |   2 +-
 .../{column_rlike.json => function_collate.json}   |   4 +-
 ...sition.proto.bin => function_collate.proto.bin} | Bin 189 -> 189 bytes
 ...{column_isNull.json => function_collation.json} |   2 +-
 ..._geq.proto.bin => function_collation.proto.bin} | Bin 178 -> 178 bytes
 .../source/reference/pyspark.sql/functions.rst |   2 +
 python/pyspark/sql/connect/functions/builtin.py|  14 ++
 python/pyspark/sql/functions/builtin.py|  52 +
 python/pyspark/sql/tests/test_functions.py |   5 ++
 .../scala/org/apache/spark/sql/functions.scala |  16 +++
 17 files changed, 159 insertions(+), 5 deletions(-)
 copy connector/connect/common/src/test/resources/query-tests/explain-results/{column_asc_nulls_first.explain => function_collate.explain} (57%)
 copy connector/connect/common/src/test/resources/query-tests/explain-results/{column_asc_nulls_first.explain => function_collation.explain} (61%)
 copy connector/connect/common/src/test/resources/query-tests/queries/{column_rlike.json => function_collate.json} (89%)
 copy connector/connect/common/src/test/resources/query-tests/queries/{function_bitmap_bit_position.proto.bin => function_collate.proto.bin} (85%)
 copy connector/connect/common/src/test/resources/query-tests/queries/{column_isNull.json => function_collation.json} (93%)
 copy connector/connect/common/src/test/resources/query-tests/queries/{column_geq.proto.bin => function_collation.proto.bin} (90%)
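A hedged usage sketch of the two functions this change exposes across the APIs (Scala shown; the R and Python variants mirror it, and the example data here is made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collate, collation}

object CollateDemo extends App {
  val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._

  val df = Seq("Hello").toDF("s")
  df.select(
    collate(col("s"), "UNICODE"), // re-tag the column with the UNICODE collation
    collation(col("s"))           // report the collation name of the column
  ).show()
}
```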


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (4437e6e21237 -> 6bf031796c8c)

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


 from 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` to `common/utils`
 add 6bf031796c8c [SPARK-44740][CONNECT][TESTS][FOLLOWUP] Deduplicate `test_metadata`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_connect_session.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (b7aa9740249b -> 4437e6e21237)

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


 from b7aa9740249b [SPARK-47407][SQL] Support java.sql.Types.NULL map to NullType
 add 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` to `common/utils`

No new revisions were added by this update.

Summary of changes:
 .../utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {core => common/utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties (100%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (4b1f8c3d779b -> b7aa9740249b)

2024-03-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


 from 4b1f8c3d779b [SPARK-47399][SQL] Disable generated columns on expressions with collations
 add b7aa9740249b [SPARK-47407][SQL] Support java.sql.Types.NULL map to NullType

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/jdbc/PostgresIntegrationSuite.scala | 16 +++-
 .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala |  6 +-
 .../org/apache/spark/sql/jdbc/PostgresDialect.scala  |  1 +
 3 files changed, 21 insertions(+), 2 deletions(-)
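As a hedged illustration of what such a mapping can look like on a `JdbcDialect` (a sketch only, not the actual PostgresDialect patch):

```scala
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, NullType}

// Sketch only: map the JDBC NULL type (e.g. a literal NULL column in a
// pushed-down query) to Spark's NullType instead of failing on an
// unrecognized SQL type.
object NullMappingDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:postgresql")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    if (sqlType == Types.NULL) Some(NullType) else None
}

// Custom dialects are registered with: JdbcDialects.registerDialect(NullMappingDialect)
```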


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47399][SQL] Disable generated columns on expressions with collations

2024-03-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4b1f8c3d779b [SPARK-47399][SQL] Disable generated columns on expressions with collations
4b1f8c3d779b is described below

commit 4b1f8c3d779b1391b414d6d6791bed5800b600bd
Author: Stefan Kandic 
AuthorDate: Fri Mar 15 16:12:40 2024 +0500

[SPARK-47399][SQL] Disable generated columns on expressions with collations

### What changes were proposed in this pull request?
Disable the ability to use collations in expressions for generated columns.

### Why are the changes needed?
Changing the collation of a column, or even just changing the ICU version, could lead to differences in the resulting expression, so it would be best to simply disable it for now (see the sketch below).
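A hedged illustration of the hazard, assuming a `spark` session and the collation names used elsewhere in this patch series:

```scala
// The same comparison flips under a different collation, so an expression
// baked into a generated column would not be stable across collation changes.
spark.sql("SELECT 'AAA' COLLATE UTF8_BINARY = 'aaa' COLLATE UTF8_BINARY").show()
// -> false: binary comparison is case-sensitive
spark.sql(
  "SELECT 'AAA' COLLATE UTF8_BINARY_LCASE = 'aaa' COLLATE UTF8_BINARY_LCASE").show()
// -> true: the lowercase collation compares case-insensitively
```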

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

With new unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45520 from stefankandic/disableGeneratedColumnsCollation.

Authored-by: Stefan Kandic 
Signed-off-by: Max Gekk 
---
 .../spark/sql/catalyst/util/GeneratedColumn.scala  |  5 ++
 .../org/apache/spark/sql/CollationSuite.scala  | 53 ++
 2 files changed, 58 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala
index 28ddc16cf6b0..747a0e225a2f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala
@@ -29,6 +29,7 @@ import org.apache.spark.sql.connector.catalog.{CatalogManager, Identifier, Table
 import org.apache.spark.sql.errors.QueryCompilationErrors
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.{DataType, StructField, StructType}
+import org.apache.spark.sql.util.SchemaUtils
 
 /**
  * This object contains utility methods and values for Generated Columns
@@ -162,6 +163,10 @@ object GeneratedColumn {
 s"generation expression data type ${analyzed.dataType.simpleString} " +
 s"is incompatible with column data type ${dataType.simpleString}")
 }
+    if (analyzed.exists(e => SchemaUtils.hasNonDefaultCollatedString(e.dataType))) {
+      throw unsupportedExpressionError(
+        "generation expression cannot contain non-default collated string type")
+    }
   }
 
   /**
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
index 72e72a53c4f6..bef7417be36c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
@@ -622,4 +622,57 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper {
   case _: SortMergeJoinExec => ()
 }.nonEmpty)
   }
+
+  test("Generated column expressions using collations - errors out") {
+checkError(
+  exception = intercept[AnalysisException] {
+sql(
+  s"""
+ |CREATE TABLE testcat.test_table(
+ |  c1 STRING COLLATE UNICODE,
+ |  c2 STRING COLLATE UNICODE GENERATED ALWAYS AS (SUBSTRING(c1, 0, 1))
+ |)
+ |USING $v2Source
+ |""".stripMargin)
+  },
+  errorClass = "UNSUPPORTED_EXPRESSION_GENERATED_COLUMN",
+  parameters = Map(
+"fieldName" -> "c2",
+"expressionStr" -> "SUBSTRING(c1, 0, 1)",
+"reason" -> "generation expression cannot contain non-default collated 
string type"))
+
+checkError(
+  exception = intercept[AnalysisException] {
+sql(
+  s"""
+ |CREATE TABLE testcat.test_table(
+ |  c1 STRING COLLATE UNICODE,
+ |  c2 STRING COLLATE UNICODE GENERATED ALWAYS AS (c1 || 'a' COLLATE UNICODE)
+ |)
+ |USING $v2Source
+ |""".stripMargin)
+  },
+  errorClass = "UNSUPPORTED_EXPRESSION_GENERATED_COLUMN",
+  parameters = Map(
+"fieldName" -> "c2",
+"expressionStr" -> "c1 || 'a' COLLATE UNICODE",
+"reason" -> "generation expression cannot contain non-default collated 
string type"))
+
+checkError(
+  exception = intercept[AnalysisException] {
+sql(
+  s"""
+ |CREATE TABLE testcat.test_table(
+ |  struct1 STRUCT<a: STRING COLLATE UNICODE>,
+ |  c2 STRING COLLATE UNICODE GENERATED ALWAYS AS (SUBSTRING(struct1.a, 0, 1))
+ |)
+ |USING $v2Source
+ |""".stripMargin)
+  },
+  errorClass = 

(spark) branch master updated: [SPARK-47406][SQL] Handle TIMESTAMP and DATETIME in MYSQLDialect

2024-03-15 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 19ac1fc13646 [SPARK-47406][SQL] Handle TIMESTAMP and DATETIME in MYSQLDialect
19ac1fc13646 is described below

commit 19ac1fc13646e982ef76718b5e7a0f0e5147794e
Author: Kent Yao 
AuthorDate: Fri Mar 15 16:38:08 2024 +0800

[SPARK-47406][SQL] Handle TIMESTAMP and DATETIME in MYSQLDialect

### What changes were proposed in this pull request?

In MySQL, TIMESTAMP and DATETIME are different. The former is a TIMESTAMP WITH LOCAL TIME ZONE and the latter is a TIMESTAMP WITHOUT TIME ZONE.

Following [SPARK-47375](https://issues.apache.org/jira/browse/SPARK-47375), MySQL TIMESTAMP now maps directly to TimestampType, while DATETIME's mapping is decided by preferTimestampNTZ (see the sketch below).
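A hedged sketch of the resulting mapping, modeled on the test added below (assumes a `spark` session and a reachable MySQL `jdbcUrl` with a `dates` table):

```scala
val df = spark.read
  .option("preferTimestampNTZ", "true")
  .jdbc(jdbcUrl, "dates", new java.util.Properties)

df.printSchema()
// MySQL TIMESTAMP column -> Spark TimestampType (session time zone),
//                           regardless of preferTimestampNTZ
// MySQL DATETIME  column -> TimestampNTZType when preferTimestampNTZ=true,
//                           TimestampType otherwise
```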

### Why are the changes needed?

Align with the guidelines for JDBC timestamps.

### Does this PR introduce _any_ user-facing change?

Yes, a migration guide is provided.

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?

Closes #45530 from yaooqinn/SPARK-47406.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 14 +++
 docs/sql-migration-guide.md|  1 +
 .../sql/execution/datasources/jdbc/JdbcUtils.scala | 10 +++--
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   |  7 
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 45 +-
 5 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 48b94cf28a63..b1d239337aa0 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -19,6 +19,7 @@ package org.apache.spark.sql.jdbc
 
 import java.math.BigDecimal
 import java.sql.{Connection, Date, Timestamp}
+import java.time.LocalDateTime
 import java.util.Properties
 
 import org.apache.spark.sql.Row
@@ -134,6 +135,19 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
 }
   }
 
+  test("SPARK-47406: MySQL datetime types with preferTimestampNTZ") {
+    withDefaultTimeZone(UTC) {
+      val df = sqlContext.read.option("preferTimestampNTZ", true)
+        .jdbc(jdbcUrl, "dates", new Properties)
+      checkAnswer(df, Row(
+        Date.valueOf("1991-11-09"),
+        LocalDateTime.of(1970, 1, 1, 13, 31, 24),
+        LocalDateTime.of(1996, 1, 1, 1, 23, 45),
+        Timestamp.valueOf("2009-02-13 23:31:30"),
+        Date.valueOf("2001-01-01")))
+    }
+  }
+
   test("String types") {
 val df = sqlContext.read.jdbc(jdbcUrl, "strings", new Properties)
 val rows = df.collect()
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 28fa19c351fc..27b62a6bd792 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -41,6 +41,7 @@ license: |
 - Since Spark 4.0, the SQL config `spark.sql.legacy.allowZeroIndexInFormatString` is deprecated. Consider to change `strfmt` of the `format_string` function to use 1-based indexes. The first argument must be referenced by "1$", the second by "2$", etc.
 - Since Spark 4.0, the function `to_csv` no longer supports input with the data type `STRUCT`, `ARRAY`, `MAP`, `VARIANT` and `BINARY` (because the `CSV specification` does not have standards for these data types and cannot be read back using `from_csv`), Spark will throw `DATATYPE_MISMATCH.UNSUPPORTED_INPUT_TYPE` exception.
 - Since Spark 4.0, JDBC read option `preferTimestampNTZ=true` will not convert Postgres TIMESTAMP WITH TIME ZONE and TIME WITH TIME ZONE data types to TimestampNTZType, which is available in Spark 3.5.
+- Since Spark 4.0, JDBC read option `preferTimestampNTZ=true` will not convert MySQL TIMESTAMP to TimestampNTZType, which is available in Spark 3.5. MySQL DATETIME is not affected.
 
 ## Upgrading from Spark SQL 3.4 to 3.5
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
index b037d862fa1a..393f09b6075e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
@@ -167,6 +167,10 @@ object JdbcUtils extends Logging with SQLConfHelper {
   throw QueryExecutionErrors.cannotGetJdbcTypeError(dt))
   }
 
+  def