(spark) branch master updated (6b879c2ca1f0 -> dc82285610e6)

2024-09-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 6b879c2ca1f0 [SPARK-49455][SQL][TESTS] Refactor 
`StagingInMemoryTableCatalog` to override the non-deprecated functions
 add dc82285610e6 [SPARK-49483][BUILD] Upgrade `commons-lang3` to 3.17.0

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)





(spark) branch master updated (fb8d01acf166 -> 6b879c2ca1f0)

2024-09-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from fb8d01acf166 [SPARK-48682][SQL][FOLLOW-UP] Changed initCap behaviour 
with UTF8_BINARY collation
 add 6b879c2ca1f0 [SPARK-49455][SQL][TESTS] Refactor 
`StagingInMemoryTableCatalog` to override the non-deprecated functions

No new revisions were added by this update.

Summary of changes:
 .../connector/catalog/StagingInMemoryTableCatalog.scala   | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)





(spark) branch master updated: [SPARK-49460][SQL] Remove `cleanupResource()` from EmptyRelationExec

2024-08-30 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9cec3c4f7c1b [SPARK-49460][SQL] Remove `cleanupResource()` from 
EmptyRelationExec
9cec3c4f7c1b is described below

commit 9cec3c4f7c1b467023f0eefff69e8b7c5105417d
Author: Ziqi Liu 
AuthorDate: Sat Aug 31 10:05:18 2024 +0800

[SPARK-49460][SQL] Remove `cleanupResource()` from EmptyRelationExec

### What changes were proposed in this pull request?
Remove `cleanupResources()` from `EmptyRelationExec`.

### Why are the changes needed?

This bug was introduced in https://github.com/apache/spark/pull/46830: `cleanupResources` might be executed on the executor, where `logical` is null.

After revisiting the code paths relevant to `cleanupResources`, I think `EmptyRelationExec` doesn't need to do anything here.

- For driver-side cleanup, we have [this code path](https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala) to clean up each AQE query stage.
- For executor-side cleanup, so far only `SortMergeJoinExec` invokes `cleanupResources` during its execution, so by the time an `EmptyRelationExec` is created, the necessary cleanup is guaranteed to have already been done.
- After all, `EmptyRelationExec` is only a never-executed wrapper around materialized physical query stages; it should not be responsible for any cleanup invocation.

So I'm removing the `cleanupResources` implementation from `EmptyRelationExec`.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
New unit test.

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #47931 from liuzqt/SPARK-49460.

Authored-by: Ziqi Liu 
Signed-off-by: yangjie01 
---
 .../spark/sql/execution/EmptyRelationExec.scala| 10 --
 .../adaptive/AdaptiveQueryExecSuite.scala  | 37 ++
 2 files changed, 37 insertions(+), 10 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/EmptyRelationExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/EmptyRelationExec.scala
index 085c0b22524c..8a544de7567e 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/EmptyRelationExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/EmptyRelationExec.scala
@@ -22,7 +22,6 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
-import org.apache.spark.sql.execution.adaptive.LogicalQueryStage
 import org.apache.spark.sql.vectorized.ColumnarBatch
 
 /**
@@ -81,13 +80,4 @@ case class EmptyRelationExec(@transient logical: 
LogicalPlan) extends LeafExecNo
   override def doCanonicalize(): SparkPlan = {
 this.copy(logical = LocalRelation(logical.output).canonicalized)
   }
-
-  override protected[sql] def cleanupResources(): Unit = {
-logical.foreach {
-  case LogicalQueryStage(_, physical) =>
-physical.cleanupResources()
-  case _ =>
-}
-super.cleanupResources()
-  }
 }
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
index fc54e7ecd46d..938a96a86b01 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
@@ -1608,6 +1608,43 @@ class AdaptiveQueryExecSuite
 }
   }
 
+  test("SPARK-49460: NPE in EmptyRelationExec.cleanupResources") {
+withTable("t1left", "t1right", "t1empty") {
+  spark.sql("create table t1left (a int, b int);")
+  spark.sql("insert into t1left values (1, 1), (2,2), (3,3);")
+  spark.sql("create table t1right (a int, b int);")
+  spark.sql("create table t1empty (a int, b int);")
+  spark.sql("insert into t1right values (2,20), (4, 40);")
+
+  spark.sql("""
+  |with leftT as (
+  |  with erp as (
+  |select
+  |  *
+  |from
+  |  t1left
+  |  join t1empty on t1left.a = t1empty.a
+  |  join t1right on t1left.a = t1right.a
+  |  )
+  |  SELECT
+  |CASE
+  |

(spark) branch master updated: [SPARK-49119][SQL] Fix the inconsistency of syntax `show columns` between v1 and v2

2024-08-30 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 53c1f31dc26b [SPARK-49119][SQL] Fix the inconsistency of syntax `show 
columns` between v1 and v2
53c1f31dc26b is described below

commit 53c1f31dc26bb56d56e0b71b144910df5d376a76
Author: panbingkun 
AuthorDate: Fri Aug 30 16:15:01 2024 +0800

[SPARK-49119][SQL] Fix the inconsistency of syntax `show columns` between 
v1 and v2

### What changes were proposed in this pull request?
This PR aims to:
- fix the inconsistency of the `show columns` syntax between v1 and v2.
- assign the name `SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE` to the error condition `_LEGACY_ERROR_TEMP_1057`.
- unify the v1 and v2 `SHOW COLUMNS ...` tests.
- move some UTs related to `SHOW COLUMNS` from `DDLSuite` to `command/ShowColumnsSuiteBase` or `v1/ShowColumnsSuiteBase`.
- move some UTs related to `SHOW COLUMNS` from `DDLParserSuite` and `ErrorParserSuite` to `ShowColumnsParserSuite`.

### Why are the changes needed?
In `AstBuilder`, we have a comment that explains the expected check:

https://github.com/apache/spark/blob/2a752105091ef95f994526b15bae2159657c8ed0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L5054-L5055
However, our v2 implementation of the `show columns` syntax did not perform the above check, as shown below:

```
withNamespaceAndTable("ns", "tbl") { t =>
  sql(s"CREATE TABLE $t (col1 int, col2 string) $defaultUsing")
  sql(s"SHOW COLUMNS IN $t IN ns1")
}
```

- Before (inconsistent: v1 will fail, but v2 will succeed)
  v1:
  ```
  [SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting 
namespace: `ns1` != `ns`.
  ```

  v2:
  ```
  Execute successfully.
  ```

So, we should fix it.

- After (consistent: both v1 and v2 will fail)
  v1:
  ```
  [SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting 
namespace: `ns1` != `ns`.
  ```

  v2:
  ```
  [SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting 
namespace: `ns1` != `ns`.
  ```

### Does this PR introduce _any_ user-facing change?
Yes. For v2 tables, in the syntax `SHOW COLUMNS {FROM | IN} {tableName} {FROM | IN} {namespace}`, if the namespace (the second parameter) is different from the namespace of the table (the first parameter), the command used to succeed silently before this PR; after this PR, it reports an error.
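
For illustration, assuming an active `spark` session and a v2 catalog registered as `testcat` (all names here are hypothetical), the behavior after this PR looks roughly like this:

```
// Illustrative sketch only; catalog, namespace, and table names are made up.
spark.sql("CREATE TABLE testcat.ns.tbl (col1 INT, col2 STRING) USING foo")
spark.sql("SHOW COLUMNS IN testcat.ns.tbl IN ns")   // OK: namespace matches the table's
spark.sql("SHOW COLUMNS IN testcat.ns.tbl IN ns1")
// [SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `ns1` != `ns`.
```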

### How was this patch tested?
Add new UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47628 from panbingkun/SPARK-49119.

Lead-authored-by: panbingkun 
    Co-authored-by: Kent Yao 
Signed-off-by: yangjie01 
---
 .../src/main/resources/error/error-conditions.json |  11 +--
 .../spark/sql/errors/QueryCompilationErrors.scala  |  11 +--
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  23 -
 .../sql/catalyst/parser/ErrorParserSuite.scala |   4 -
 .../catalyst/analysis/ResolveSessionCatalog.scala  |   3 +-
 .../datasources/v2/DataSourceV2Strategy.scala  |  13 ++-
 ...olumnsTableExec.scala => ShowColumnsExec.scala} |   4 +-
 .../analyzer-results/show_columns.sql.out  |   7 +-
 .../sql-tests/results/show_columns.sql.out |   7 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |  10 ---
 .../spark/sql/execution/command/DDLSuite.scala |  33 ---
 .../execution/command/ShowColumnsParserSuite.scala |  55 
 .../execution/command/ShowColumnsSuiteBase.scala   | 100 +
 .../execution/command/v1/ShowColumnsSuite.scala|  55 
 .../execution/command/v2/ShowColumnsSuite.scala}   |  17 +---
 .../hive/execution/command/ShowColumnsSuite.scala} |  18 ++--
 16 files changed, 255 insertions(+), 116 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-conditions.json 
b/common/utils/src/main/resources/error/error-conditions.json
index 89d2627ef32e..496a90e5db34 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -3866,6 +3866,12 @@
 ],
 "sqlState" : "42K08"
   },
+  "SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE" : {
+"message" : [
+  "SHOW COLUMNS with conflicting namespaces:  != ."
+],
+"sqlState" : "42K05"
+  },
   "SORT_BY_WITHOUT_BUCKETING" : {
 "message" : [
   "sortBy must be used together with bucketBy."
@@ -5685,11 +5691,6 @@
   "ADD COLUM

(spark) branch master updated: [SPARK-49457][BUILD] Remove uncommon curl option `--retry-all-errors`

2024-08-29 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new effcf22a029f [SPARK-49457][BUILD] Remove uncommon curl option 
`--retry-all-errors`
effcf22a029f is described below

commit effcf22a029f2f61aa2513ae06554d171a774f5b
Author: Cheng Pan 
AuthorDate: Thu Aug 29 22:55:08 2024 +0800

[SPARK-49457][BUILD] Remove uncommon curl option `--retry-all-errors`

### What changes were proposed in this pull request?

Remove the uncommon curl option `--retry-all-errors`, which was added in curl 7.71.0 (June 24, 2020); older versions cannot recognize this option.

### Why are the changes needed?

It causes `build/mvn` to fail on Ubuntu 20.04.

```
exec: curl --retry 3 --retry-all-errors --silent --show-error -L 
https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz?action=download
curl: option --retry-all-errors: is unknown
curl: try 'curl --help' or 'curl --manual' for more information
```

```
$ curl --version
curl 7.68.0 (aarch64-unknown-linux-gnu) libcurl/7.68.0 OpenSSL/1.1.1f 
zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) 
libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Release-Date: 2020-01-08
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 
pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos 
Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47926 from pan3793/SPARK-49457.

Authored-by: Cheng Pan 
Signed-off-by: yangjie01 
---
 build/mvn | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build/mvn b/build/mvn
index 28454c68fd12..060209ac1ac4 100755
--- a/build/mvn
+++ b/build/mvn
@@ -58,7 +58,7 @@ install_app() {
   local local_checksum="${local_tarball}.${checksum_suffix}"
   local 
remote_checksum="https://archive.apache.org/dist/${url_path}.${checksum_suffix}";
 
-  local curl_opts="--retry 3 --retry-all-errors --silent --show-error -L"
+  local curl_opts="--retry 3 --silent --show-error -L"
   local wget_opts="--no-verbose"
 
   if [ ! -f "$binary" ]; then





(spark) branch master updated: [MINOR][SQL] Fix the incorrect method `@link` tag in `StagingTableCatalog`

2024-08-28 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 57b6bc114e33 [MINOR][SQL] Fix the incorrect method `@link` tag in 
`StagingTableCatalog`
57b6bc114e33 is described below

commit 57b6bc114e3348e00cbe88af3be0ad2a5cc0a579
Author: yangjie01 
AuthorDate: Wed Aug 28 17:44:53 2024 +0800

[MINOR][SQL] Fix the incorrect method `@link` tag in `StagingTableCatalog`

### What changes were proposed in this pull request?
This PR fixes an incorrect method `@link` in `StagingTableCatalog`, pointing it to the method that should be overridden instead of to the current method.

### Why are the changes needed?
Fix the incorrect method `@link` tag.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47899 from LuciferYang/minor-wrong-link-StagingTableCatalog.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .../org/apache/spark/sql/connector/catalog/StagingTableCatalog.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/StagingTableCatalog.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/StagingTableCatalog.java
index 6f074faf6e58..eead1ade4079 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/StagingTableCatalog.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/StagingTableCatalog.java
@@ -102,7 +102,7 @@ public interface StagingTableCatalog extends TableCatalog {
* returned table's {@link StagedTable#commitStagedChanges()} is called.
* 
* This is deprecated, please override
-   * {@link #stageReplace(Identifier, StructType, Transform[], Map)} instead.
+   * {@link #stageReplace(Identifier, Column[], Transform[], Map)} instead.
*/
   @Deprecated(since = "3.4.0")
   default StagedTable stageReplace(





(spark) branch master updated: [SPARK-49327][BUILD] Upgrade `commons-compress` to 1.27.1

2024-08-21 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a31a5acbd389 [SPARK-49327][BUILD] Upgrade `commons-compress` to 1.27.1
a31a5acbd389 is described below

commit a31a5acbd3891b9b903c65878707f64ec338a8fb
Author: panbingkun 
AuthorDate: Wed Aug 21 17:30:16 2024 +0800

[SPARK-49327][BUILD] Upgrade `commons-compress` to 1.27.1

### What changes were proposed in this pull request?
This PR aims to upgrade `commons-compress` from `1.27.0` to `1.27.1`.

### Why are the changes needed?
Although the last upgrade occurred only 10 days ago, this version fixes a serious bug:

https://commons.apache.org/proper/commons-compress/changes-report.html#a1.27.1
- Compression into the BZip2 format produced an unexpected end of file when using a `BufferedOutputStream`. Fixes [COMPRESS-686](https://issues.apache.org/jira/browse/COMPRESS-686).
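
For context, a minimal sketch of the affected write pattern (the file path and payload are made up; only the API names come from commons-compress):

```
import java.io.{BufferedOutputStream, FileOutputStream}
import scala.util.Using
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream

// Compress a payload to BZip2 through a BufferedOutputStream -- the combination
// that could produce a truncated stream before commons-compress 1.27.1.
Using.resource(new BZip2CompressorOutputStream(
    new BufferedOutputStream(new FileOutputStream("/tmp/sample.bz2")))) { out =>
  out.write("example payload".getBytes("UTF-8"))
}
```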

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47821 from panbingkun/SPARK-49327.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 5a86dee79d98..60f11565658b 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -40,7 +40,7 @@ commons-codec/1.17.1//commons-codec-1.17.1.jar
 commons-collections/3.2.2//commons-collections-3.2.2.jar
 commons-collections4/4.4//commons-collections4-4.4.jar
 commons-compiler/3.1.9//commons-compiler-3.1.9.jar
-commons-compress/1.27.0//commons-compress-1.27.0.jar
+commons-compress/1.27.1//commons-compress-1.27.1.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
 commons-io/2.16.1//commons-io-2.16.1.jar
diff --git a/pom.xml b/pom.xml
index fc1836f1c406..3fb276e53059 100644
--- a/pom.xml
+++ b/pom.xml
@@ -186,7 +186,7 @@
 1.1.10.6
 3.0.3
 1.17.1
-1.27.0
+1.27.1
 2.16.1
 
 2.6





(spark) branch master updated (e64f620fe8fd -> 1ae482ac7e64)

2024-08-19 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from e64f620fe8fd [SPARK-48796][SS] Load Column Family Id from 
RocksDBCheckpointMetadata for VCF when restarting
 add 1ae482ac7e64 [SPARK-49075][BUILD] Upgrade JUnit5 related to the latest 
version

No new revisions were added by this update.

Summary of changes:
 pom.xml | 10 +-
 project/plugins.sbt |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)





(spark) branch master updated: [MINOR][SQL][TESTS] Changes the `test:runMain` in the code comments to `Test/runMain`

2024-08-15 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7b43a6fdf0f9 [MINOR][SQL][TESTS] Changes the `test:runMain` in the 
code comments to `Test/runMain`
7b43a6fdf0f9 is described below

commit 7b43a6fdf0f906ebbc83ca2a91f31e3ab76a68b2
Author: yangjie01 
AuthorDate: Thu Aug 15 17:11:26 2024 +0800

[MINOR][SQL][TESTS] Changes the `test:runMain` in the code comments to 
`Test/runMain`

### What changes were proposed in this pull request?
This PR only changes the `test:runMain` run command descriptions in the code comments to `Test/runMain`.

### Why are the changes needed?
When we use the run command from the code comments, we will see the following compilation warning:

```
build/sbt "sql/test:runMain 
org.apache.spark.sql.execution.benchmark.TopKBenchmark"
```

```
[warn] sbt 0.13 shell syntax is deprecated; use slash syntax instead: sql / 
Test / runMain
```

The relevant comments should be updated to eliminate the compilation warning when running the command.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually run the test using the updated command and check that the 
corresponding compilation warning is no longer present.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47767 from LuciferYang/runMain-command-comments.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala   | 2 +-
 .../sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala | 4 ++--
 .../sql/execution/benchmark/TakeOrderedAndProjectBenchmark.scala  | 4 ++--
 .../org/apache/spark/sql/execution/benchmark/TopKBenchmark.scala  | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala
index 00b757e4f78f..48a16f01d574 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala
@@ -359,7 +359,7 @@ class GenTPCDSDataConfig(args: Array[String]) {
   private def printUsageAndExit(exitCode: Int): Unit = {
 // scalastyle:off
 System.err.println("""
-  |build/sbt "test:runMain  [Options]"
+  |build/sbt "Test/runMain  [Options]"
   |Options:
   |  --masterthe Spark master to use, default to 
local[*]
   |  --dsdgenDir location of dsdgen
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
index e7f83cb7eb4b..0078c3f9f65d 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
@@ -33,9 +33,9 @@ import 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter
  *   1. without sbt:
  *  bin/spark-submit --class  --jars  

  *   2. build/sbt build/sbt ";project sql;set javaOptions
- *in Test += \"-Dspark.memory.debugFill=false\";test:runMain "
+ *in Test += \"-Dspark.memory.debugFill=false\";Test/runMain "
  *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt ";project 
sql;set javaOptions
- *in Test += \"-Dspark.memory.debugFill=false\";test:runMain "
+ *in Test += \"-Dspark.memory.debugFill=false\";Test/runMain "
  *  Results will be written to
  *  "benchmarks/ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt".
  * }}}
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TakeOrderedAndProjectBenchmark.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TakeOrderedAndProjectBenchmark.scala
index 88cdfebbb173..1244dd029981 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TakeOrderedAndProjectBenchmark.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TakeOrderedAndProjectBenchmark.scala
@@ -28,9 +28,9 @@ import org.apache.spark.sql.internal.SQLConf
  *   1. without sbt:
  *  bin/spark-submit --class 
  *--jars , 
- *   2. build/sbt "sql/test:runMain "
+ *   2. build/sbt "sql/Test/runMain "
  *   3. generate result:
- *  SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain &qu

(spark) branch master updated: [SPARK-49240][BUILD] Add `scalastyle` and `checkstyle` rules to avoid `URL` constructors

2024-08-14 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0c24ae11164d [SPARK-49240][BUILD] Add `scalastyle` and `checkstyle` 
rules to avoid `URL` constructors
0c24ae11164d is described below

commit 0c24ae11164d2aa5dcd5042d63e43839ee479757
Author: Dongjoon Hyun 
AuthorDate: Thu Aug 15 12:27:15 2024 +0800

[SPARK-49240][BUILD] Add `scalastyle` and `checkstyle` rules to avoid `URL` 
constructors

### What changes were proposed in this pull request?

This PR aims to add `scalastyle` and `checkstyle` rules to avoid `URL` 
constructors.

### Why are the changes needed?

The `java.net.URL` class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC 2396.

So, since Java 20, all `URL` constructors are deprecated. We had better use the `URI` class instead.
- https://bugs.openjdk.org/browse/JDK-8295949
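
A minimal sketch of the replacement pattern the new rules steer toward (the sample URL is illustrative):

```
import java.net.{URI, URL}

// Deprecated since Java 20:
//   val url = new URL("https://spark.apache.org/docs/latest/")
// Preferred: construct a URI first and convert only where a URL is truly required.
val uri: URI = new URI("https://spark.apache.org/docs/latest/")
val url: URL = uri.toURL
```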

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with newly added rules.

After this PR, there are only two exceptional instances, in `JettyUtils.scala` and `UISuite.scala`.
- `JettyUtils` is a tricky instance.
- The `UISuite` test case is supposed to build a bad URL, which `URI` would reject with `java.net.URISyntaxException`. This is an example of why `URI` is better. In this PR, we keep the old `URL` class there to preserve the test coverage.
```
$ git grep -C1 'new URL('
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala-// 
scalastyle:off URLConstructor
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:val newUrl 
= new URL(requestURL, prefixedDestPath).toString
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala-// 
scalastyle:on URLConstructor
--
core/src/test/scala/org/apache/spark/ui/UISuite.scala-  // 
scalastyle:off URLConstructor
core/src/test/scala/org/apache/spark/ui/UISuite.scala:  val badRequest 
= new URL(
core/src/test/scala/org/apache/spark/ui/UISuite.scala-
s"http://$localhost:${serverInfo.boundPort}$path/root?bypass&invalid<=foo")
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47762 from dongjoon-hyun/SPARK-49240.

Authored-by: Dongjoon Hyun 
    Signed-off-by: yangjie01 
---
 .../org/apache/spark/sql/avro/AvroSuite.scala  |  8 +++
 .../apache/spark/deploy/FaultToleranceTest.scala   |  4 ++--
 .../spark/deploy/rest/RestSubmissionClient.scala   | 14 +--
 .../scala/org/apache/spark/ui/JettyUtils.scala |  5 +++-
 .../main/scala/org/apache/spark/util/Utils.scala   |  2 +-
 .../spark/deploy/LogUrlsStandaloneSuite.scala  |  4 ++--
 .../deploy/history/HistoryServerPageSuite.scala|  6 ++---
 .../spark/deploy/history/HistoryServerSuite.scala  | 28 +++---
 .../deploy/master/MasterDecommisionSuite.scala |  4 ++--
 .../spark/deploy/master/ui/MasterWebUISuite.scala  |  4 ++--
 .../deploy/rest/StandaloneRestSubmitSuite.scala|  4 ++--
 .../org/apache/spark/ui/UISeleniumSuite.scala  | 16 ++---
 .../test/scala/org/apache/spark/ui/UISuite.scala   | 25 ++-
 dev/checkstyle.xml |  4 
 .../k8s/integrationtest/DepsTestsSuite.scala   |  4 ++--
 .../org/apache/spark/deploy/yarn/AmIpFilter.java   |  6 ++---
 .../cluster/YarnSchedulerBackendSuite.scala|  8 +++
 scalastyle-config.xml  |  5 
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  6 ++---
 .../v1/sql/SqlResourceWithActualMetricsSuite.scala | 18 +++---
 20 files changed, 95 insertions(+), 80 deletions(-)

diff --git 
a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala 
b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
index 3f1314e970b1..b20ee4b3cc23 100644
--- a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
+++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
@@ -18,7 +18,7 @@
 package org.apache.spark.sql.avro
 
 import java.io._
-import java.net.URL
+import java.net.URI
 import java.nio.file.{Files, Paths, StandardCopyOption}
 import java.sql.{Date, Timestamp}
 import java.util.UUID
@@ -648,7 +648,7 @@ abstract class AvroSuite
 assert(message.contains("No Avro files found."))
 
 Files.copy(
-  Paths.get(new URL(episodesAvro).toURI),
+  Paths.get(new URI(episodesAvro)),
   Paths.get(dir.getCanonicalPath, "episodes.avro"))
 
 val result = spark.read.format("avro").load(episodesAvro).collect()
@@ -2139,7 +2139,7 @@ abstract class AvroSuite
   test("SPARK

(spark) branch master updated: [SPARK-49234][BUILD] Upgrade `xz` to `1.10`

2024-08-14 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1947646f2b63 [SPARK-49234][BUILD] Upgrade `xz` to `1.10`
1947646f2b63 is described below

commit 1947646f2b6398506405dc592c6c9b15075cc4e7
Author: Dongjoon Hyun 
AuthorDate: Wed Aug 14 16:50:04 2024 +0800

[SPARK-49234][BUILD] Upgrade `xz` to `1.10`

### What changes were proposed in this pull request?

This PR aims to upgrade `xz` to `1.10`, independently from the Apache Avro change.

### Why are the changes needed?

`1.10` is the latest minor version update with new improvements (like 
`ARM64`, `optimized classes for Java >= 9`) and bug fixes.
- 
https://github.com/tukaani-project/xz-java/blob/master/NEWS.md#110-2024-07-29

Note that the license is also changed from `Public Domain` to `BSD Zero 
Clause`.
> Licensing change: From version 1.10 onwards, XZ for Java is under the BSD 
Zero Clause License (0BSD). 1.9 and older are in the public domain and 
obviously remain so; the change only affects the new releases.

### Does this PR introduce _any_ user-facing change?

No behavior change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47750 from dongjoon-hyun/SPARK-49234.

Authored-by: Dongjoon Hyun 
Signed-off-by: yangjie01 
---
 LICENSE-binary| 11 +--
 dev/deps/spark-deps-hadoop-3-hive-2.3 |  2 +-
 licenses/LICENSE-xz.txt   | 11 +++
 pom.xml   |  2 +-
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/LICENSE-binary b/LICENSE-binary
index 28f1b63033d5..89826482d363 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -422,6 +422,11 @@ Python Software Foundation License
 python/pyspark/loose_version.py
 
 
+BSD 0-Clause
+
+org.tukaani:xz
+
+
 BSD 2-Clause
 
 com.github.luben:zstd-jni
@@ -520,12 +525,6 @@ org.glassfish.hk2:hk2-locator
 org.glassfish.hk2:hk2-utils
 org.glassfish.hk2:osgi-resource-locator
 
-
-Public Domain
--
-org.tukaani:xz
-
-
 Creative Commons CC0 1.0 Universal Public Domain Dedication
 ---
 (see LICENSE-CC0.txt)
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index aac733e16bed..a0febbfc721a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -271,7 +271,7 @@ univocity-parsers/2.9.1//univocity-parsers-2.9.1.jar
 wildfly-openssl/1.1.3.Final//wildfly-openssl-1.1.3.Final.jar
 xbean-asm9-shaded/4.25//xbean-asm9-shaded-4.25.jar
 xmlschema-core/2.3.1//xmlschema-core-2.3.1.jar
-xz/1.9//xz-1.9.jar
+xz/1.10//xz-1.10.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
 zookeeper-jute/3.9.2//zookeeper-jute-3.9.2.jar
 zookeeper/3.9.2//zookeeper-3.9.2.jar
diff --git a/licenses/LICENSE-xz.txt b/licenses/LICENSE-xz.txt
new file mode 100644
index ..4322122aecf1
--- /dev/null
+++ b/licenses/LICENSE-xz.txt
@@ -0,0 +1,11 @@
+Permission to use, copy, modify, and/or distribute this
+software for any purpose with or without fee is hereby granted.
+
+THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
+WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
+THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
+CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
+NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
diff --git a/pom.xml b/pom.xml
index cd95cc9f9587..6b2e8b3482d0 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1608,7 +1608,7 @@
   
 org.tukaani
 xz
-1.9
+1.10
   
   

(spark) branch master updated: [SPARK-49187][BUILD] Upgrade slf4j to 2.0.16

2024-08-12 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e72d21c299a4 [SPARK-49187][BUILD] Upgrade slf4j to 2.0.16
e72d21c299a4 is described below

commit e72d21c299a450e48b3cf6e5d36b8f3e9a568088
Author: yangjie01 
AuthorDate: Tue Aug 13 10:42:21 2024 +0800

[SPARK-49187][BUILD] Upgrade slf4j to 2.0.16

### What changes were proposed in this pull request?
This pr aims to upgrade slf4j from 2.0.14 to 2.0.16.

### Why are the changes needed?
The new version brings two fixes:

- Fixed an issue with stale MANIFEST.MF files. This issue was raised in https://github.com/qos-ch/slf4j/issues/421.
- The information about which provider the `LoggerFactory` connected with is now reported at the DEBUG level and is no longer printed by default (https://github.com/qos-ch/slf4j/commit/3ff00870b32c2067d72fb83d6a9e95548130).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Action
- manual check

run `build/sbt core/test`

**before**

we can see the following message before the tests:

```
SLF4J(I): Connected with provider of type 
[org.apache.logging.slf4j.SLF4JServiceProvider]
```

**after**

No more logs similar to `SLF4J(I): Connected with provider of type 
[org.apache.logging.slf4j.SLF4JServiceProvider]`

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47720 from LuciferYang/SPARK-49187.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++---
 pom.xml   | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 66774ec42ce4..b3ded712072f 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -126,7 +126,7 @@ javax.servlet-api/4.0.1//javax.servlet-api-4.0.1.jar
 javolution/5.5.1//javolution-5.5.1.jar
 jaxb-api/2.2.11//jaxb-api-2.2.11.jar
 jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
-jcl-over-slf4j/2.0.14//jcl-over-slf4j-2.0.14.jar
+jcl-over-slf4j/2.0.16//jcl-over-slf4j-2.0.16.jar
 jdo-api/3.0.1//jdo-api-3.0.1.jar
 jdom2/2.0.6//jdom2-2.0.6.jar
 jersey-client/3.0.12//jersey-client-3.0.12.jar
@@ -153,7 +153,7 @@ json4s-jackson_2.13/4.0.7//json4s-jackson_2.13-4.0.7.jar
 json4s-scalap_2.13/4.0.7//json4s-scalap_2.13-4.0.7.jar
 jsr305/3.0.0//jsr305-3.0.0.jar
 jta/1.1//jta-1.1.jar
-jul-to-slf4j/2.0.14//jul-to-slf4j-2.0.14.jar
+jul-to-slf4j/2.0.16//jul-to-slf4j-2.0.16.jar
 kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
 kubernetes-client-api/6.13.2//kubernetes-client-api-6.13.2.jar
 kubernetes-client/6.13.2//kubernetes-client-6.13.2.jar
@@ -253,7 +253,7 @@ 
scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar
 scala-parser-combinators_2.13/2.4.0//scala-parser-combinators_2.13-2.4.0.jar
 scala-reflect/2.13.14//scala-reflect-2.13.14.jar
 scala-xml_2.13/2.3.0//scala-xml_2.13-2.3.0.jar
-slf4j-api/2.0.14//slf4j-api-2.0.14.jar
+slf4j-api/2.0.16//slf4j-api-2.0.16.jar
 snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar
 snakeyaml/2.2//snakeyaml-2.2.jar
 snappy-java/1.1.10.6//snappy-java-1.1.10.6.jar
diff --git a/pom.xml b/pom.xml
index 8f3283d2498b..cda6d8d5c289 100644
--- a/pom.xml
+++ b/pom.xml
@@ -119,7 +119,7 @@
 3.2.0
 spark
 9.7
-2.0.14
+2.0.16
 2.22.1
 
 3.4.0





(spark) branch master updated: [SPARK-49206][CORE][UI] Add `Environment Variables` table to Master `EnvironmentPage`

2024-08-12 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2973097b4fae [SPARK-49206][CORE][UI] Add `Environment Variables` table 
to Master `EnvironmentPage`
2973097b4fae is described below

commit 2973097b4fae86f69322c7927962c2599f3f98b6
Author: Dongjoon Hyun 
AuthorDate: Mon Aug 12 20:51:41 2024 +0800

[SPARK-49206][CORE][UI] Add `Environment Variables` table to Master 
`EnvironmentPage`

### What changes were proposed in this pull request?

This PR aims to add an `Environment Variables` table to the Master `EnvironmentPage` via a new configuration, `spark.master.ui.visibleEnvVarPrefixes`.
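
As a rough illustration of the prefix filtering this enables (the environment entries below are made up; the config name and the filtering logic come from this change):

```
// Only variables whose names start with a configured prefix become visible,
// e.g. with spark.master.ui.visibleEnvVarPrefixes=AWS_
val prefixes = Seq("AWS_")
val env = Map(
  "AWS_ENDPOINT_URL" -> "https://s3express-usw2-az1.us-west-2.amazonaws.com",
  "PATH"             -> "/usr/bin",
  "SPARK_HOME"       -> "/opt/spark")
val visible = env.filter { case (k, _) => prefixes.exists(k.startsWith(_)) }.toSeq.sorted
// visible contains only the AWS_ENDPOINT_URL entry
```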

### Why are the changes needed?

To allow users to expose and show the environment variables of Spark Master.

### Does this PR introduce _any_ user-facing change?

Yes, but this is a new table on `Spark Master` UI's `EnvironmentPage`.

### How was this patch tested?

Pass the CIs with newly added test case.

**DEFAULT**
```
$ sbin/start-master.sh
```
![Screenshot 2024-08-12 at 00 53 
26](https://github.com/user-attachments/assets/b7929536-a25d-4bd5-876d-908ed7403b92)

**Expose `AWS_`**
```
$ AWS_CA_BUNDLE=/tmp/root-ca.pem \
  AWS_ENDPOINT_URL=https://s3express-usw2-az1.us-west-2.amazonaws.com \
  SPARK_MASTER_OPTS="-Dspark.master.ui.visibleEnvVarPrefixes=AWS_" \
  sbin/start-master.sh
```
![Screenshot 2024-08-12 at 01 05 
25](https://github.com/user-attachments/assets/50a57ba1-8ed8-4827-ad51-b303da09a663)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47714 from dongjoon-hyun/SPARK-49206.

Authored-by: Dongjoon Hyun 
Signed-off-by: yangjie01 
---
 .../main/resources/org/apache/spark/ui/static/webui.js |  1 +
 .../spark/deploy/master/ui/EnvironmentPage.scala   | 18 ++
 .../scala/org/apache/spark/internal/config/UI.scala|  7 +++
 .../deploy/master/ui/ReadOnlyMasterWebUISuite.scala| 11 +++
 4 files changed, 37 insertions(+)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/webui.js 
b/core/src/main/resources/org/apache/spark/ui/static/webui.js
index b365082c1e14..4c7cf8c8ea90 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/webui.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/webui.js
@@ -75,6 +75,7 @@ $(function() {
   
collapseTablePageLoad('collapse-aggregated-systemProperties','aggregated-systemProperties');
   
collapseTablePageLoad('collapse-aggregated-metricsProperties','aggregated-metricsProperties');
   
collapseTablePageLoad('collapse-aggregated-classpathEntries','aggregated-classpathEntries');
+  
collapseTablePageLoad('collapse-aggregated-environmentVariables','aggregated-environmentVariables');
   
collapseTablePageLoad('collapse-aggregated-activeJobs','aggregated-activeJobs');
   
collapseTablePageLoad('collapse-aggregated-completedJobs','aggregated-completedJobs');
   
collapseTablePageLoad('collapse-aggregated-failedJobs','aggregated-failedJobs');
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala 
b/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala
index 190e821524ba..c05b20d30b98 100644
--- 
a/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala
+++ 
b/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala
@@ -17,12 +17,14 @@
 
 package org.apache.spark.deploy.master.ui
 
+import scala.jdk.CollectionConverters._
 import scala.xml.Node
 
 import jakarta.servlet.http.HttpServletRequest
 
 import org.apache.spark.{SparkConf, SparkEnv}
 import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.config.UI.MASTER_UI_VISIBLE_ENV_VAR_PREFIXES
 import org.apache.spark.ui._
 import org.apache.spark.util.Utils
 
@@ -39,6 +41,9 @@ private[ui] class EnvironmentPage(
 val systemProperties = Utils.redact(conf, details("System 
Properties")).sorted
 val metricsProperties = Utils.redact(conf, details("Metrics 
Properties")).sorted
 val classpathEntries = details("Classpath Entries").sorted
+val prefixes = conf.get(MASTER_UI_VISIBLE_ENV_VAR_PREFIXES)
+val environmentVariables = System.getenv().asScala
+  .filter { case (k, _) => prefixes.exists(k.startsWith(_)) }.toSeq.sorted
 
 val runtimeInformationTable = UIUtils.listingTable(propertyHeader, 
propertyRow,
   jvmInformation, fixedWidth = true, headerClasses = headerClasses)
@@ -52,6 +57,8 @@ private[ui] class EnvironmentPage(
   metricsPr

(spark) branch master updated: [SPARK-49077][SQL][TESTS] Remove `bouncycastle-related` test dependencies from `hive-thriftserver` module

2024-08-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 08b1fb55d738 [SPARK-49077][SQL][TESTS] Remove `bouncycastle-related` 
test dependencies from `hive-thriftserver` module
08b1fb55d738 is described below

commit 08b1fb55d738dde5a69aa94aab946d15bc2568af
Author: yangjie01 
AuthorDate: Thu Aug 1 23:34:17 2024 +0800

[SPARK-49077][SQL][TESTS] Remove `bouncycastle-related` test dependencies 
from `hive-thriftserver` module

### What changes were proposed in this pull request?
After SPARK-49066 was merged, the test cases for writing ORC data, other than `OrcEncryptionSuite`, no longer require the use of `FakeKeyProvider`. As a result, `hive-thriftserver` no longer needs these test dependencies.

### Why are the changes needed?
Clean up the test dependencies that are no longer needed by 
`hive-thriftserver`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual Test with this pr.

```
build/mvn -Phive -Phive-thriftserver clean install -DskipTests
build/mvn -Phive -Phive-thriftserver clean install -Dtest=none 
-DwildcardSuites=org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite
 -pl sql/hive-thriftserver
```

```
Run completed in 5 minutes, 14 seconds.
Total number of tests run: 243
Suites: completed 2, aborted 0
Tests: succeeded 243, failed 0, canceled 0, ignored 20, pending 0
All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47563 from LuciferYang/SPARK-49077.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 sql/hive-thriftserver/pom.xml | 10 --
 1 file changed, 10 deletions(-)

diff --git a/sql/hive-thriftserver/pom.xml b/sql/hive-thriftserver/pom.xml
index d50c78bd1f9b..6a352f8a530d 100644
--- a/sql/hive-thriftserver/pom.xml
+++ b/sql/hive-thriftserver/pom.xml
@@ -156,16 +156,6 @@
   org.apache.httpcomponents
   httpcore
 
-
-  org.bouncycastle
-  bcprov-jdk18on
-  test
-
-
-  org.bouncycastle
-  bcpkix-jdk18on
-  test
-
   
   
 
target/scala-${scala.binary.version}/classes





(spark) branch master updated: [SPARK-49076][SQL] Fix the outdated `logical plan name` in `AstBuilder's` comments

2024-08-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new abf9bac69680 [SPARK-49076][SQL] Fix the outdated `logical plan name` 
in `AstBuilder's` comments
abf9bac69680 is described below

commit abf9bac696804d7af28fbc9cd9026efddea303e3
Author: panbingkun 
AuthorDate: Thu Aug 1 23:32:44 2024 +0800

[SPARK-49076][SQL] Fix the outdated `logical plan name` in `AstBuilder's` 
comments

### What changes were proposed in this pull request?
This PR aims to fix the outdated logical plan names in `AstBuilder`'s comments.

### Why are the changes needed?
- After https://github.com/apache/spark/pull/33609, the names of the logical plans below were changed:
  `AlterTableAddColumns` -> `AddColumns`
  `AlterTableRenameColumn` -> `RenameColumn`
  `AlterTableAlterColumn` -> `AlterColumn`
  `AlterTableDropColumns` -> `DropColumns`

- After https://github.com/apache/spark/pull/30398, the name of the logical plan `ShowPartitionsStatement` was changed to `ShowPartitions`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Only update comments.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47562 from panbingkun/fix_astbuilder.

Lead-authored-by: panbingkun 
Co-authored-by: panbingkun 
    Signed-off-by: yangjie01 
---
 .../apache/spark/sql/catalyst/parser/AstBuilder.scala| 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index a046ededf964..feb3ef4e7155 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -61,8 +61,8 @@ import org.apache.spark.util.random.RandomSampler
  * The AstBuilder converts an ANTLR4 ParseTree into a catalyst Expression, 
LogicalPlan or
  * TableIdentifier.
  */
-class AstBuilder extends DataTypeAstBuilder with SQLConfHelper
-  with Logging with DataTypeErrorsBase {
+class AstBuilder extends DataTypeAstBuilder
+  with SQLConfHelper with Logging with DataTypeErrorsBase {
   import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
   import ParserUtils._
 
@@ -4452,7 +4452,7 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper
   }
 
   /**
-   * Parse a [[AlterTableAddColumns]] command.
+   * Parse a [[AddColumns]] command.
*
* For example:
* {{{
@@ -4469,7 +4469,7 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper
   }
 
   /**
-   * Parse a [[AlterTableRenameColumn]] command.
+   * Parse a [[RenameColumn]] command.
*
* For example:
* {{{
@@ -4485,7 +4485,7 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper
   }
 
   /**
-   * Parse a [[AlterTableAlterColumn]] command to alter a column's property.
+   * Parse a [[AlterColumn]] command to alter a column's property.
*
* For example:
* {{{
@@ -4555,7 +4555,7 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper
   }
 
   /**
-   * Parse a [[AlterTableAlterColumn]] command. This is Hive SQL syntax.
+   * Parse a [[AlterColumn]] command. This is Hive SQL syntax.
*
* For example:
* {{{
@@ -4639,7 +4639,7 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper
   }
 
   /**
-   * Parse a [[AlterTableDropColumns]] command.
+   * Parse a [[DropColumns]] command.
*
* For example:
* {{{
@@ -4979,7 +4979,7 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper
* A command for users to list the partition names of a table. If partition 
spec is specified,
* partitions that match the spec are returned. Otherwise an empty result 
set is returned.
*
-   * This function creates a [[ShowPartitionsStatement]] logical plan
+   * This function creates a [[ShowPartitions]] logical plan
*
* The syntax of using this command in SQL is:
* {{{





(spark) branch master updated: [SPARK-48964][SQL][DOCS] Fix the discrepancy between implementation, comment and documentation of option `recursive.fields.max.depth` in ProtoBuf connector

2024-07-31 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5954ed19b9dd [SPARK-48964][SQL][DOCS] Fix the discrepancy between 
implementation, comment and documentation of option 
`recursive.fields.max.depth` in ProtoBuf connector
5954ed19b9dd is described below

commit 5954ed19b9dd1e7e741e37a689ae741722f7d5b6
Author: Wei Guo 
AuthorDate: Wed Jul 31 17:04:26 2024 +0800

[SPARK-48964][SQL][DOCS] Fix the discrepancy between implementation, 
comment and documentation of option `recursive.fields.max.depth` in ProtoBuf 
connector

### What changes were proposed in this pull request?

This PR aims to fix the discrepancy between the implementation, the comments, and the documentation of the option `recursive.fields.max.depth` in the ProtoBuf connector.
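
As a hedged usage sketch (the message name, descriptor path, and input data are placeholders; only the option name and its corrected 1-to-10 semantics come from this change), the option is passed to `from_protobuf` roughly like this, assuming the `spark-protobuf` module is on the classpath:

```
import scala.jdk.CollectionConverters._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.protobuf.functions.from_protobuf

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Placeholder input: serialized protobuf bytes in a `value` column.
val events = spark.read.parquet("/tmp/events")

// Corrected semantics: 1 drops all recursive fields, 2 allows one level of
// recursion, and so on up to the maximum of 10.
val options = Map("recursive.fields.max.depth" -> "2").asJava
val parsed = events.select(
  from_protobuf($"value", "Person", "/tmp/person.desc", options).as("person"))
```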

### Why are the changes needed?

Unify code implementation and documentation description.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47458 from wayneguow/SPARK-48964.

Authored-by: Wei Guo 
Signed-off-by: yangjie01 
---
 .../utils/src/main/resources/error/error-conditions.json   |  2 +-
 .../apache/spark/sql/protobuf/utils/SchemaConverters.scala | 14 +++---
 docs/sql-data-sources-protobuf.md  | 11 ++-
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-conditions.json 
b/common/utils/src/main/resources/error/error-conditions.json
index 15b851a78d62..de127d4a7bf0 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -3640,7 +3640,7 @@
   },
   "RECURSIVE_PROTOBUF_SCHEMA" : {
 "message" : [
-  "Found recursive reference in Protobuf schema, which can not be 
processed by Spark by default: . try setting the option 
`recursive.fields.max.depth` 0 to 10. Going beyond 10 levels of recursion is 
not allowed."
+  "Found recursive reference in Protobuf schema, which can not be 
processed by Spark by default: . try setting the option 
`recursive.fields.max.depth` 1 to 10. Going beyond 10 levels of recursion is 
not allowed."
 ],
 "sqlState" : "42K0G"
   },
diff --git 
a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala
 
b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala
index feb5aed03451..56c1f8185061 100644
--- 
a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala
+++ 
b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala
@@ -176,16 +176,16 @@ object SchemaConverters extends Logging {
 }
   case MESSAGE =>
 // If the `recursive.fields.max.depth` value is not specified, it will 
default to -1,
-// and recursive fields are not permitted. Setting it to 0 drops all 
recursive fields,
-// 1 allows it to be recursed once, and 2 allows it to be recursed 
twice and so on.
-// A value greater than 10 is not allowed, and if a protobuf record 
has more depth for
-// recursive fields than the allowed value, it will be truncated and 
some fields may be
-// discarded.
+// and recursive fields are not permitted. Setting it to 1 drops all 
recursive fields,
+// 2 allows it to be recursed once, and 3 allows it to be recursed 
twice and so on.
+// A value less than or equal to 0 or greater than 10 is not allowed, 
and if a protobuf
+// record has more depth for recursive fields than the allowed value, 
it will be truncated
+// and some fields may be discarded.
 // SQL Schema for protob2uf `message Person { string name = 1; Person 
bff = 2;}`
 // will vary based on the value of "recursive.fields.max.depth".
 // 1: struct
-// 2: struct>
-// 3: struct>>
+// 2: struct>
+// 3: struct>>
 // and so on.
 // TODO(rangadi): A better way to terminate would be replace the 
remaining recursive struct
 //  with the byte array of corresponding protobuf. This way no 
information is lost.
diff --git a/docs/sql-data-sources-protobuf.md 
b/docs/sql-data-sources-protobuf.md
index 28e3e83bef7c..34cb1d4997d2 100644
--- a/docs/sql-data-sources-protobuf.md
+++ b/docs/sql-data-sources-protobuf.md
@@ -402,9 +402,9 @@ Spark supports the writing of all Spark SQL types into 
Protobuf. For most types,
 ## Handling circular references protobuf fields
 
 One common i

(spark) branch master updated: [SPARK-48829][BUILD] Upgrade `RoaringBitmap` to 1.2.1

2024-07-29 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6ff93eaca57e [SPARK-48829][BUILD] Upgrade `RoaringBitmap` to 1.2.1
6ff93eaca57e is described below

commit 6ff93eaca57e14ed26e2e3cceb25d53e811f4765
Author: panbingkun 
AuthorDate: Tue Jul 30 13:28:53 2024 +0800

[SPARK-48829][BUILD] Upgrade `RoaringBitmap` to 1.2.1

### What changes were proposed in this pull request?
The pr aims to upgrade `RoaringBitmap` from `1.1.0` to `1.2.1`.

### Why are the changes needed?
- The full release notes: 
https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.2.0
- The latest version brings bug fixes and some improvements:
  - improve: Optimize `RoaringBitSet.get(int fromIndex, int toIndex)` (https://github.com/RoaringBitmap/RoaringBitmap/pull/727)
  - fix: add a non-static `bitmapOfRange` (https://github.com/RoaringBitmap/RoaringBitmap/pull/728)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47247 from panbingkun/SPARK-48829.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt | 8 
 core/benchmarks/MapStatusesConvertBenchmark-results.txt   | 8 
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt 
b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt
index a15442496b24..8f0886ae4d99 100644
--- a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt
+++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt
@@ -2,12 +2,12 @@
 MapStatuses Convert Benchmark
 

 
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure
+OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.5.0-1023-azure
 AMD EPYC 7763 64-Core Processor
 MapStatuses Convert:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 

-Num Maps: 5 Fetch partitions:500674685 
 12  0.0   673772738.0   1.0X
-Num Maps: 5 Fetch partitions:1000  1579   1590 
 12  0.0  1579383970.0   0.4X
-Num Maps: 5 Fetch partitions:1500  2435   2472 
 37  0.0  2434530380.0   0.3X
+Num Maps: 5 Fetch partitions:500697707 
 10  0.0   697013793.0   1.0X
+Num Maps: 5 Fetch partitions:1000  1608   1621 
 16  0.0  1608250487.0   0.4X
+Num Maps: 5 Fetch partitions:1500  2443   2478 
 39  0.0  2443321570.0   0.3X
 
 
diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt 
b/core/benchmarks/MapStatusesConvertBenchmark-results.txt
index b9f36af4a653..b64b0b392473 100644
--- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt
+++ b/core/benchmarks/MapStatusesConvertBenchmark-results.txt
@@ -2,12 +2,12 @@
 MapStatuses Convert Benchmark
 

 
-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
+OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1023-azure
 AMD EPYC 7763 64-Core Processor
 MapStatuses Convert:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 

-Num Maps: 5 Fetch partitions:500703716 
 11  0.0   703103575.0   1.0X
-Num Maps: 5 Fetch partitions:1000  1707   1723 
 14  0.0  1707060398.0   0.4X
-Num Maps: 5 Fetch partitions:1500  2626   2638 
 14  0.0  2625981097.0   0.3X
+Num Maps: 5 Fetch partitions:500769772 
  3  0.0   769382967.0   1.0X
+Num Maps: 5 Fetch partitions:1000  1698   1715 
 14  0.0  1698166886.0   0.5X
+Num Maps: 5 Fetch partitions:1500  2588   2606 
 26  0.0  2587840071.0   0.3X
 
 
diff --git a/dev

(spark) branch master updated: [SPARK-48974][SQL][SS][ML][MLLIB] Use `SparkSession.implicits` instead of `SQLContext.implicits`

2024-07-23 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 877c3f2bea92 [SPARK-48974][SQL][SS][ML][MLLIB] Use 
`SparkSession.implicits` instead of `SQLContext.implicits`
877c3f2bea92 is described below

commit 877c3f2bea924ca9f3fd5b7e9c6cbfb0fc3be958
Author: yangjie01 
AuthorDate: Wed Jul 24 10:41:07 2024 +0800

[SPARK-48974][SQL][SS][ML][MLLIB] Use `SparkSession.implicits` instead of 
`SQLContext.implicits`

### What changes were proposed in this pull request?
This PR replaces `SQLContext.implicits`  with `SparkSession.implicits` in 
the Spark codebase.
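
A minimal, self-contained sketch of the preferred pattern (the object name and the trivial query are made up for illustration; the actual call sites are in the diff below):

```scala
import org.apache.spark.sql.SparkSession

object ImplicitsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("implicits-example").getOrCreate()
    val df = spark.range(5).toDF("id")

    // Bring in encoders and the $"..." column syntax via the Dataset's owning
    // SparkSession instead of the legacy `df.sqlContext` bridge.
    import df.sparkSession.implicits._

    df.select($"id" * 2).show()
    spark.stop()
  }
}
```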

### Why are the changes needed?
Reduce the usage of code from `SQLContext` within the internal code of 
Spark.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47457 from LuciferYang/use-sparksession-implicits.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: yangjie01 
---
 .../src/main/scala/org/apache/spark/mllib/util/MLUtils.scala |  2 +-
 .../apache/spark/ml/classification/FMClassifierSuite.scala   |  4 ++--
 .../spark/ml/classification/LogisticRegressionSuite.scala| 12 ++--
 .../apache/spark/ml/recommendation/CollectTopKSuite.scala|  4 ++--
 .../apache/spark/ml/regression/LinearRegressionSuite.scala   |  4 ++--
 .../test/scala/org/apache/spark/ml/util/MLTestingUtils.scala |  2 +-
 .../spark/sql/execution/datasources/csv/CSVUtils.scala   |  2 +-
 .../org/apache/spark/sql/SparkSessionExtensionSuite.scala|  8 
 .../org/apache/spark/sql/streaming/util/BlockingSource.scala |  2 +-
 .../spark/sql/hive/HiveContextCompatibilitySuite.scala   |  4 ++--
 .../org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala |  2 +-
 .../scala/org/apache/spark/sql/hive/ListTablesSuite.scala|  2 +-
 .../org/apache/spark/sql/hive/execution/HiveQuerySuite.scala |  2 +-
 .../spark/sql/hive/execution/HiveResolutionSuite.scala   |  2 +-
 .../apache/spark/sql/hive/execution/HiveTableScanSuite.scala |  2 +-
 .../sql/sources/BucketedWriteWithHiveSupportSuite.scala  |  2 +-
 16 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
index e23423e4c004..1257d2ccfbfb 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
@@ -119,7 +119,7 @@ object MLUtils extends Logging {
   ).resolveRelation(checkFilesExist = false))
   .select("value")
 
-import lines.sqlContext.implicits._
+import lines.sparkSession.implicits._
 
 lines.select(trim($"value").as("line"))
   .filter(not((length($"line") === 0).or($"line".startsWith("#"
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/classification/FMClassifierSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/classification/FMClassifierSuite.scala
index 68e83fccf3d1..ff9ce1ca7b9f 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/classification/FMClassifierSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/classification/FMClassifierSuite.scala
@@ -52,8 +52,8 @@ class FMClassifierSuite extends MLTest with 
DefaultReadWriteTest {
   }
 
   test("FMClassifier: Predictor, Classifier methods") {
-val sqlContext = smallBinaryDataset.sqlContext
-import sqlContext.implicits._
+val session = smallBinaryDataset.sparkSession
+import session.implicits._
 val fm = new FMClassifier()
 
 val model = fm.fit(smallBinaryDataset)
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
index 8e54262e2f61..b0e275f5e193 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
@@ -550,8 +550,8 @@ class LogisticRegressionSuite extends MLTest with 
DefaultReadWriteTest {
   }
 
   test("multinomial logistic regression: Predictor, Classifier methods") {
-val sqlContext = smallMultinomialDataset.sqlContext
-import sqlContext.implicits._
+val session = smallMultinomialDataset.sparkSession
+import session.implicits._
 val mlr = new LogisticRegression().setFamily("multinomial")
 
 val model = mlr.fit(smallMultinomialDataset)
@@ -590,8 +590,8 @@ class LogisticRegressionSuite extends MLTest with 
DefaultReadWriteTes

(spark) branch master updated: [SPARK-48893][SQL][PYTHON][DOCS] Add some examples for `linearRegression` built-in functions

2024-07-22 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a809740cf6ec [SPARK-48893][SQL][PYTHON][DOCS] Add some examples for 
`linearRegression` built-in functions
a809740cf6ec is described below

commit a809740cf6ec039141d416f6fb27a6deb66b3d2c
Author: Wei Guo 
AuthorDate: Tue Jul 23 10:27:21 2024 +0800

[SPARK-48893][SQL][PYTHON][DOCS] Add some examples for `linearRegression` 
built-in functions

### What changes were proposed in this pull request?

This PR aims to add some extra examples for `linearRegression` built-in 
functions.

### Why are the changes needed?

- Align the usage examples for this series of functions.
- Allow users to better understand the usage of the `linearRegression`-related
methods from the SQL built-in functions docs
(https://spark.apache.org/docs/latest/api/sql/index.html).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA and Manual testing for new examples.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47343 from wayneguow/regr_series.

Authored-by: Wei Guo 
Signed-off-by: yangjie01 
---
 python/pyspark/sql/functions/builtin.py| 545 ++---
 .../expressions/aggregate/linearRegression.scala   |  28 +-
 .../sql-functions/sql-expression-schema.md |   4 +-
 3 files changed, 494 insertions(+), 83 deletions(-)

diff --git a/python/pyspark/sql/functions/builtin.py 
b/python/pyspark/sql/functions/builtin.py
index 5b9d0dd87002..3d094dd38c50 100644
--- a/python/pyspark/sql/functions/builtin.py
+++ b/python/pyspark/sql/functions/builtin.py
@@ -3671,16 +3671,59 @@ def regr_avgx(y: "ColumnOrName", x: "ColumnOrName") -> 
Column:
 
 Examples
 
->>> from pyspark.sql import functions as sf
->>> x = (sf.col("id") % 3).alias("x")
->>> y = (sf.randn(42) + x * 10).alias("y")
->>> spark.range(0, 1000, 1, 1).select(x, y).select(
-... sf.regr_avgx("y", "x"), sf.avg("x")
-... ).show()
+Example 1: All pairs are non-null
+
+>>> import pyspark.sql.functions as sf
+>>> df = spark.sql("SELECT * FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS 
tab(y, x)")
+>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show()
 +---+--+
 |regr_avgx(y, x)|avg(x)|
 +---+--+
-|  0.999| 0.999|
+|   2.75|  2.75|
++---+--+
+
+Example 2: All pairs' x values are null
+
+>>> import pyspark.sql.functions as sf
+>>> df = spark.sql("SELECT * FROM VALUES (1, null) AS tab(y, x)")
+>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show()
++---+--+
+|regr_avgx(y, x)|avg(x)|
++---+--+
+|   NULL|  NULL|
++---+--+
+
+Example 3: All pairs' y values are null
+
+>>> import pyspark.sql.functions as sf
+>>> df = spark.sql("SELECT * FROM VALUES (null, 1) AS tab(y, x)")
+>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show()
++---+--+
+|regr_avgx(y, x)|avg(x)|
++---+--+
+|   NULL|   1.0|
++---+--+
+
+Example 4: Some pairs' x values are null
+
+>>> import pyspark.sql.functions as sf
+>>> df = spark.sql("SELECT * FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) 
AS tab(y, x)")
+>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show()
++---+--+
+|regr_avgx(y, x)|avg(x)|
++---+--+
+|3.0|   3.0|
++---+--+
+
+Example 5: Some pairs' x or y values are null
+
+>>> import pyspark.sql.functions as sf
+>>> df = spark.sql("SELECT * FROM VALUES (1, 2), (2, null), (null, 3), (2, 
4) AS tab(y, x)")
+>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show()
++---+--+
+|regr_avgx(y, x)|avg(x)|
++---+--+
+|3.0|   3.0|
 +---+--+
 """
 return _invoke_function_over_columns("regr_avgx", y, x)
@@ -3708,17 +3751,60 @@ def regr_avgy(y: "ColumnOrName", x: "ColumnOrName") -> 
Column:
 
 Examples
 

(spark) branch master updated: [SPARK-48943][TESTS] Upgrade `h2` to 2.3.230 and enhance the test coverage of behavior changes of `asin` and `acos` complying Standard SQL

2024-07-22 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4cc41ea63f94 [SPARK-48943][TESTS] Upgrade `h2` to 2.3.230 and enhance 
the test coverage of behavior changes of `asin` and `acos` complying Standard 
SQL
4cc41ea63f94 is described below

commit 4cc41ea63f943b61be8f771f5cd95cfd4ea15c2e
Author: Wei Guo 
AuthorDate: Tue Jul 23 10:21:32 2024 +0800

[SPARK-48943][TESTS] Upgrade `h2` to 2.3.230 and enhance the test coverage 
of behavior changes of `asin` and `acos` complying Standard SQL

### What changes were proposed in this pull request?

This PR aims to upgrade `h2` from 2.2.220 to 2.3.230 and enhance the test 
coverage of behavior changes of `asin` and `acos` complying Standard SQL.

The details of the behavior change are as follows:
After this commit (
https://github.com/h2database/h2database/commit/186647d4a35d05681febf4f53502b306aa6d511a),
the behavior of `asin` and `acos` in h2 complies with Standard SQL: an exception is
thrown directly when the argument is invalid (< -1d || > 1d).
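
As a rough illustration (not part of this patch), the sketch below contrasts Spark's own default behavior with the new h2 behavior described above; it assumes default (non-ANSI) settings and is only meant to show why the updated tests pass in-range arguments to the filters that get pushed down:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{acos, asin, lit}

object AsinAcosRangeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("asin-acos-range").getOrCreate()

    // With default (non-ANSI) settings, Spark evaluates out-of-range input to NaN ...
    spark.range(1).select(asin(lit(2.0)), acos(lit(-2.0))).show()

    // ... whereas H2 2.3.230 now raises a SQL error for |x| > 1, so filters such as
    // asin(col) that are pushed down to an H2 table must use in-range arguments.
    spark.stop()
  }
}
```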

### Why are the changes needed?

2.3.230 is the latest version of `h2`; it includes a lot of bug fixes and
improvements.
Full change notes:
https://www.h2database.com/html/changelog.html

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Update an existing test case and add a new one.
Pass GA and manually test `JDBCV2Suite`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47414 from wayneguow/upgrade_h2.

Authored-by: Wei Guo 
Signed-off-by: yangjie01 
---
 connect/server/pom.xml |  2 +-
 sql/core/pom.xml   |  2 +-
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 58 +++---
 3 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/connect/server/pom.xml b/connect/server/pom.xml
index 73a3310c8a38..ecbb22168aa1 100644
--- a/connect/server/pom.xml
+++ b/connect/server/pom.xml
@@ -254,7 +254,7 @@
 
   com.h2database
   h2
-  2.2.220
+  2.3.230
   test
 
 
diff --git a/sql/core/pom.xml b/sql/core/pom.xml
index 59d798e6e62f..c891763eb4e1 100644
--- a/sql/core/pom.xml
+++ b/sql/core/pom.xml
@@ -166,7 +166,7 @@
 
   com.h2database
   h2
-  2.2.220
+  2.3.230
   test
 
 
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
index e1a7971b283c..db06aac7f5e0 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
@@ -37,7 +37,7 @@ import org.apache.spark.sql.connector.expressions.Expression
 import org.apache.spark.sql.execution.FormattedMode
 import 
org.apache.spark.sql.execution.datasources.v2.{DataSourceV2ScanRelation, 
V1ScanWrapper}
 import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog
-import org.apache.spark.sql.functions.{abs, acos, asin, atan, atan2, avg, 
ceil, coalesce, cos, cosh, cot, count, count_distinct, degrees, exp, floor, 
lit, log => logarithm, log10, not, pow, radians, round, signum, sin, sinh, 
sqrt, sum, tan, tanh, udf, when}
+import org.apache.spark.sql.functions.{abs, acos, asin, avg, ceil, coalesce, 
count, count_distinct, degrees, exp, floor, lit, log => logarithm, log10, not, 
pow, radians, round, signum, sqrt, sum, udf, when}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
 import org.apache.spark.sql.types.{DataType, IntegerType, StringType}
@@ -1258,25 +1258,29 @@ class JDBCV2Suite extends QueryTest with 
SharedSparkSession with ExplainSuiteHel
 checkAnswer(df15, Seq(Row(1, "cathy", 9000, 1200, false),
   Row(2, "alex", 12000, 1200, false), Row(6, "jen", 12000, 1200, true)))
 
-val df16 = spark.table("h2.test.employee")
-  .filter(sin($"bonus") < -0.08)
-  .filter(sinh($"bonus") > 200)
-  .filter(cos($"bonus") > 0.9)
-  .filter(cosh($"bonus") > 200)
-  .filter(tan($"bonus") < -0.08)
-  .filter(tanh($"bonus") === 1)
-  .filter(cot($"bonus") < -11)
-  .filter(asin($"bonus") > 0.1)
-  .filter(acos($"bonus") > 1.4)
-  .filter(atan($"bonus") > 1.4)
-  .filter(atan2($"bonus", $"bonus") > 0.7)
+val df16 = sql(
+  """
+|SELECT * FROM h2.test.employee
+|WHERE sin(bonus) < -0.08
+|

(spark) branch master updated: [MINOR][SQL][TESTS] Enable test case `testOrcAPI` in `JavaDataFrameReaderWriterSuite`

2024-07-18 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 546da0d5522e [MINOR][SQL][TESTS] Enable test case `testOrcAPI` in 
`JavaDataFrameReaderWriterSuite`
546da0d5522e is described below

commit 546da0d5522ec79620bd29563c5ea809386635f5
Author: yangjie01 
AuthorDate: Thu Jul 18 15:58:21 2024 +0800

[MINOR][SQL][TESTS] Enable test case `testOrcAPI` in 
`JavaDataFrameReaderWriterSuite`

### What changes were proposed in this pull request?
This PR enables the test case `testOrcAPI` in `JavaDataFrameReaderWriterSuite`.
Because this test no longer depends on Hive classes, we can run it like the other
test cases in this suite.
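
As a rough illustration of the point (a scratch path and made-up data; not the suite itself), the built-in ORC data source works without Hive classes, so a round trip like the one below runs on a plain `SparkSession`:

```scala
import org.apache.spark.sql.SparkSession

object OrcWithoutHiveExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("orc-no-hive").getOrCreate()
    import spark.implicits._

    val path = "/tmp/orc-example"  // hypothetical scratch location
    Seq((1, "a"), (2, "b")).toDF("id", "value").write.mode("overwrite").orc(path)

    // Reading back uses the same built-in (non-Hive) ORC implementation, which is why
    // the Java API test can now be executed rather than only compiled.
    spark.read.orc(path).show()
    spark.stop()
  }
}
```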

### Why are the changes needed?
Enable test case `testOrcAPI` in `JavaDataFrameReaderWriterSuite`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47400 from LuciferYang/minor-testOrcAPI.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .../test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java| 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git 
a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java
 
b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java
index 2a0c8c00574a..691fb67bbe90 100644
--- 
a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java
+++ 
b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java
@@ -144,10 +144,7 @@ public class JavaDataFrameReaderWriterSuite {
 .write().parquet(output);
   }
 
-  /**
-   * This only tests whether API compiles, but does not run it as orc()
-   * cannot be run without Hive classes.
-   */
+  @Test
   public void testOrcAPI() {
 spark.read().schema(schema).orc();
 spark.read().schema(schema).orc(input);


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [MINOR][SQL][TESTS] Fix compilation warning `adaptation of an empty argument list by inserting () is deprecated`

2024-07-15 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b6c05259a0b9 [MINOR][SQL][TESTS] Fix compilation warning `adaptation 
of an empty argument list by inserting () is deprecated`
b6c05259a0b9 is described below

commit b6c05259a0b98205d2f0fe2476ecd09c8d258b0a
Author: panbingkun 
AuthorDate: Mon Jul 15 17:11:17 2024 +0800

[MINOR][SQL][TESTS] Fix compilation warning `adaptation of an empty 
argument list by inserting () is deprecated`

### What changes were proposed in this pull request?
The pr aims to fix the compilation warning `adaptation of an empty argument
list by inserting () is deprecated`.
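
For readers unfamiliar with the warning, here is a minimal stand-alone sketch of the pattern (the method name is made up; in the suite the same shape appears as `withSQLConf(...)(...)`):

```scala
object EmptyArgAdaptationExample {
  // A helper whose second parameter list takes a by-name Unit block,
  // mirroring the shape of SQLTestUtils.withSQLConf.
  def withTempFlag(flag: Boolean)(body: => Unit): Unit = if (flag) body

  def main(args: Array[String]): Unit = {
    // Deprecated since Scala 2.13: `withTempFlag(true)()` makes the compiler adapt the
    // empty argument list by inserting (), which triggers the warning fixed here.
    // Passing the Unit value explicitly avoids it:
    withTempFlag(true)(())
  }
}
```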

### Why are the changes needed?
Fix compilation warning.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually check.
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47350 from panbingkun/ParquetCommitterSuite_deprecated.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 .../spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala
index eadd55bdc320..fb435e3639fd 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala
@@ -116,7 +116,7 @@ class ParquetCommitterSuite extends SparkFunSuite with 
SQLTestUtils
   test("SPARK-48804: Fail fast on unloadable or invalid committers") {
 Seq("invalid", getClass.getName).foreach { committer =>
   val e = intercept[IllegalArgumentException] {
-withSQLConf(SQLConf.PARQUET_OUTPUT_COMMITTER_CLASS.key -> committer)()
+withSQLConf(SQLConf.PARQUET_OUTPUT_COMMITTER_CLASS.key -> 
committer)(())
   }
   assert(e.getMessage.contains(classOf[OutputCommitter].getName))
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug

2024-07-12 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4c7edd2a2048 [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and 
fix bug
4c7edd2a2048 is described below

commit 4c7edd2a20480a8521fcc88a966b22619143aebd
Author: panbingkun 
AuthorDate: Fri Jul 12 15:22:34 2024 +0800

[SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug

### What changes were proposed in this pull request?
The pr aims to refactor `HiveQuerySuite` and fix a bug. It includes:
- use `getWorkspaceFilePath` to enable `HiveQuerySuite` to run successfully
in the IDE.
- make the test `lookup hive UDF in another thread` independent, without
relying on the previous UT `current_database with multiple sessions`.
- enable two tests: `non-boolean conditions in a CaseWhen are illegal` and
`Dynamic partition folder layout`.

### Why are the changes needed?
- Run successfully in the `IDE`
  Before:
  https://github.com/apache/spark/assets/15246973/005fd49c-3edf-4e51-8223-097fd7a485bf

  After:
  https://github.com/apache/spark/assets/15246973/caedec72-be0c-4bb5-bc06-26cceef8b4b8

- Make UT `lookup hive UDF in another thread` independent
  when running only it, it actually failed with the following error:
  https://github.com/apache/spark/assets/15246973/ef9c260f-8c0d-4821-8233-d4d7ae13802a

  **Why?**
  Because the previous UT `current_database with multiple sessions`
changed the current database and did not restore it after it finished running.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47293 from panbingkun/refactor_HiveQuerySuite.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 .../sql/hive/execution/HiveComparisonTest.scala|   5 +-
 .../spark/sql/hive/execution/HiveQuerySuite.scala  | 249 +++--
 2 files changed, 135 insertions(+), 119 deletions(-)

diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala
index f0feccb4f494..87e58bb8fa13 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala
@@ -100,8 +100,9 @@ abstract class HiveComparisonTest extends SparkFunSuite 
with BeforeAndAfterAll {
   .map(name => new File(targetDir, s"$suiteName.$name"))
 
   /** The local directory with cached golden answer will be stored. */
-  protected val answerCache = new File("src" + File.separator + "test" +
-File.separator + "resources" + File.separator + "golden")
+  protected val answerCache = getWorkspaceFilePath(
+"sql", "hive", "src", "test", "resources", "golden").toFile
+
   if (!answerCache.exists) {
 answerCache.mkdir()
   }
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
index 5ccb7f0d1f84..24d1e24b30c8 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
@@ -18,7 +18,6 @@
 package org.apache.spark.sql.hive.execution
 
 import java.io.File
-import java.net.URI
 import java.nio.file.Files
 import java.sql.Timestamp
 
@@ -679,15 +678,23 @@ class HiveQuerySuite extends HiveComparisonTest with 
SQLTestUtils with BeforeAnd
 assert(actual === expected)
   }
 
-  // TODO: adopt this test when Spark SQL has the functionality / framework to 
report errors.
-  // See https://github.com/apache/spark/pull/1055#issuecomment-45820167 for a 
discussion.
-  ignore("non-boolean conditions in a CaseWhen are illegal") {
+  test("non-boolean conditions in a CaseWhen are illegal") {
 checkError(
   exception = intercept[AnalysisException] {
 sql("SELECT (CASE WHEN key > 2 THEN 3 WHEN 1 THEN 2 ELSE 0 END) FROM 
src").collect()
   },
-  errorClass = null,
-  parameters = Map.empty)
+  errorClass = "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE",
+  parameters = Map(
+"sqlExpr" -> "\"CASE WHEN (key > 2) THEN 3 WHEN 1 THEN 2 ELSE 0 END\"",
+"paramIndex" -> "second",
+"i

(spark) branch master updated: [SPARK-48866][SQL] Fix hints of valid charset in the error message of INVALID_PARAMETER_VALUE.CHARSET

2024-07-11 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 261dbf4a9047 [SPARK-48866][SQL] Fix hints of valid charset in the 
error message of INVALID_PARAMETER_VALUE.CHARSET
261dbf4a9047 is described below

commit 261dbf4a9047bc00271137b547341e02351106ed
Author: Kent Yao 
AuthorDate: Thu Jul 11 18:59:10 2024 +0800

[SPARK-48866][SQL] Fix hints of valid charset in the error message of 
INVALID_PARAMETER_VALUE.CHARSET

### What changes were proposed in this pull request?

This PR fixes the hints in the error message of
INVALID_PARAMETER_VALUE.CHARSET. The current error message does not enumerate
all valid charsets, e.g. UTF-32.

This PR parameterizes it to fix this issue.
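
A minimal sketch (not from the patch) of how the improved message surfaces; the unsupported charset mirrors the one used in the updated golden files, and the message shape assumes a build that includes this change:

```scala
import org.apache.spark.sql.SparkSession

object CharsetErrorExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("charset-error").getOrCreate()

    // 'WINDOWS-1252' is not a supported charset, so encode() fails with
    // INVALID_PARAMETER_VALUE.CHARSET; the message now lists every valid charset,
    // including UTF-32, rather than a hard-coded subset.
    try {
      spark.sql("SELECT encode('abc', 'WINDOWS-1252')").collect()
    } catch {
      case e: Exception => println(e.getMessage)
    }
    spark.stop()
  }
}
```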

### Why are the changes needed?
Bugfix; the hint with the missing charsets is not helpful.

### Does this PR introduce _any_ user-facing change?
Yes, the error message changes.

### How was this patch tested?
modified tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #47295 from yaooqinn/SPARK-48866.

Authored-by: Kent Yao 
Signed-off-by: yangjie01 
---
 common/utils/src/main/resources/error/error-conditions.json   | 2 +-
 .../scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala  | 5 +++--
 .../resources/sql-tests/results/ansi/string-functions.sql.out | 8 
 .../src/test/resources/sql-tests/results/string-functions.sql.out | 8 
 .../org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala | 8 +---
 5 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-conditions.json 
b/common/utils/src/main/resources/error/error-conditions.json
index 02d1e63e380a..7f54a77c94a0 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -2584,7 +2584,7 @@
   },
   "CHARSET" : {
 "message" : [
-  "expects one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 
'UTF-16BE', 'UTF-16LE', 'UTF-16', but got ."
+  "expects one of the , but got ."
 ]
   },
   "DATETIME_UNIT" : {
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index d524742e126e..bdd53219de40 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -42,7 +42,7 @@ import org.apache.spark.sql.catalyst.plans.JoinType
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import 
org.apache.spark.sql.catalyst.plans.logical.statsEstimation.ValueInterval
 import org.apache.spark.sql.catalyst.trees.{Origin, TreeNode}
-import org.apache.spark.sql.catalyst.util.{sideBySide, DateTimeUtils, 
FailFastMode, MapData}
+import org.apache.spark.sql.catalyst.util.{sideBySide, CharsetProvider, 
DateTimeUtils, FailFastMode, MapData}
 import org.apache.spark.sql.connector.catalog.{CatalogNotFoundException, 
Table, TableProvider}
 import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
 import org.apache.spark.sql.connector.expressions.Transform
@@ -2742,7 +2742,8 @@ private[sql] object QueryExecutionErrors extends 
QueryErrorsBase with ExecutionE
   messageParameters = Map(
 "functionName" -> toSQLId(functionName),
 "parameter" -> toSQLId("charset"),
-"charset" -> charset))
+"charset" -> charset,
+"charsets" -> CharsetProvider.VALID_CHARSETS.mkString(", ")))
   }
 
   def malformedCharacterCoding(functionName: String, charset: String): 
RuntimeException = {
diff --git 
a/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out 
b/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out
index da2fa9ca0c18..d4adec22c50f 100644
--- 
a/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out
@@ -846,6 +846,7 @@ org.apache.spark.SparkIllegalArgumentException
   "sqlState" : "22023",
   "messageParameters" : {
 "charset" : "WINDOWS-1252",
+"charsets" : "UTF-16LE, UTF-8, UTF-32, UTF-16BE, UTF-16, US-ASCII, 
ISO-8859-1",
 "functionName" : "`encode`",
 "parameter" : "`charset`"
   }
@@ -863,6 +864,7 @@

(spark) branch master updated: [SPARK-48826][BUILD] Upgrade `fasterxml.jackson` to 2.17.2

2024-07-09 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 65daff55f556 [SPARK-48826][BUILD] Upgrade `fasterxml.jackson` to 2.17.2
65daff55f556 is described below

commit 65daff55f556ab48e06aa1f0536b627a8b479b9b
Author: Wei Guo 
AuthorDate: Tue Jul 9 16:01:27 2024 +0800

[SPARK-48826][BUILD] Upgrade `fasterxml.jackson` to 2.17.2

### What changes were proposed in this pull request?

This PR aims to upgrade `fasterxml.jackson` from 2.17.1 to 2.17.2.

### Why are the changes needed?

There are some bug fixes about 
[Databind](https://github.com/FasterXML/jackson-databind):
[#4561](https://github.com/FasterXML/jackson-databind/issues/4561): Issues 
using jackson-databind 2.17.1 with Reactor (wrt DeserializerCache and 
ReentrantLock)
[#4575](https://github.com/FasterXML/jackson-databind/issues/4575): 
StdDelegatingSerializer does not consider a Converter that may return null for 
a non-null input
[#4577](https://github.com/FasterXML/jackson-databind/issues/4577): Cannot 
deserialize value of type java.math.BigDecimal from String "3." (not a valid 
representation)
[#4595](https://github.com/FasterXML/jackson-databind/issues/4595): No way 
to explicitly disable wrapping in custom annotation processor
[#4607](https://github.com/FasterXML/jackson-databind/issues/4607): 
MismatchedInput: No Object Id found for an instance of X to assign to property 
'id'
[#4610](https://github.com/FasterXML/jackson-databind/issues/4610): 
DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS does not work when used 
with Polymorphic type handling

The full release note of 2.17.2:
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47241 from wayneguow/upgrade_jackson.

Authored-by: Wei Guo 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 14 +++---
 pom.xml   |  4 ++--
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 2c3bee92176b..5ec7cb541ee7 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -102,16 +102,16 @@ icu4j/75.1//icu4j-75.1.jar
 ini4j/0.5.4//ini4j-0.5.4.jar
 istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar
 ivy/2.5.2//ivy-2.5.2.jar
-jackson-annotations/2.17.1//jackson-annotations-2.17.1.jar
+jackson-annotations/2.17.2//jackson-annotations-2.17.2.jar
 jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar
-jackson-core/2.17.1//jackson-core-2.17.1.jar
-jackson-databind/2.17.1//jackson-databind-2.17.1.jar
-jackson-dataformat-cbor/2.17.1//jackson-dataformat-cbor-2.17.1.jar
-jackson-dataformat-yaml/2.17.1//jackson-dataformat-yaml-2.17.1.jar
+jackson-core/2.17.2//jackson-core-2.17.2.jar
+jackson-databind/2.17.2//jackson-databind-2.17.2.jar
+jackson-dataformat-cbor/2.17.2//jackson-dataformat-cbor-2.17.2.jar
+jackson-dataformat-yaml/2.17.2//jackson-dataformat-yaml-2.17.2.jar
 jackson-datatype-jdk8/2.17.0//jackson-datatype-jdk8-2.17.0.jar
-jackson-datatype-jsr310/2.17.1//jackson-datatype-jsr310-2.17.1.jar
+jackson-datatype-jsr310/2.17.2//jackson-datatype-jsr310-2.17.2.jar
 jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
-jackson-module-scala_2.13/2.17.1//jackson-module-scala_2.13-2.17.1.jar
+jackson-module-scala_2.13/2.17.2//jackson-module-scala_2.13-2.17.2.jar
 jakarta.annotation-api/2.0.0//jakarta.annotation-api-2.0.0.jar
 jakarta.inject-api/2.0.1//jakarta.inject-api-2.0.1.jar
 jakarta.servlet-api/5.0.0//jakarta.servlet-api-5.0.0.jar
diff --git a/pom.xml b/pom.xml
index 0ebe6ab8c580..b2dd22cb0c0a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -180,8 +180,8 @@
 true
 true
 1.9.13
-2.17.1
-
2.17.1
+2.17.2
+
2.17.2
 2.3.1
 3.0.2
 1.1.10.5


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48720][SQL] Align the command `ALTER TABLE ... UNSET TBLPROPERTIES ...` in v1 and v2

2024-07-07 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 73126b17515a [SPARK-48720][SQL] Align the command `ALTER TABLE ... 
UNSET TBLPROPERTIES ...` in v1 and v2
73126b17515a is described below

commit 73126b17515adc73dbb63f199fd641c330171d02
Author: panbingkun 
AuthorDate: Mon Jul 8 11:40:21 2024 +0800

[SPARK-48720][SQL] Align the command `ALTER TABLE ... UNSET TBLPROPERTIES 
...` in v1 and v2

### What changes were proposed in this pull request?
The pr aims to:
- align the command `ALTER TABLE ... UNSET TBLPROPERTIES ...` in v1 and v2.
(this means that in v1, regardless of whether `IF EXISTS` is specified or not,
unsetting a `non-existent` property is now `ignored` and no longer `fails`.)
- update the description of `ALTER TABLE ... UNSET TBLPROPERTIES ...` in 
the doc `docs/sql-ref-syntax-ddl-alter-table.md`.
- unify v1 and v2 `ALTER TABLE ... UNSET TBLPROPERTIES ...` tests.
- Add the following scenarios for `ALTER TABLE ... SET TBLPROPERTIES ...`
testing:
A.`table to alter does not exist`
B.`alter table set reserved properties`

### Why are the changes needed?
- align the command `ALTER TABLE ... UNSET TBLPROPERTIES ...` in v1 and v2, 
avoid confusing end-users.
- to improve test coverage.
- align with other similar tests, eg: `AlterTableSetTblProperties*`

### Does this PR introduce _any_ user-facing change?
Yes: in v1, regardless of whether `IF EXISTS` is specified or not, unsetting a
`non-existent` property is now `ignored` and no longer `fails`.
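
A self-contained sketch of the new v1 behavior (the table name and property keys are made up, and it assumes a build that includes this change):

```scala
import org.apache.spark.sql.SparkSession

object UnsetTblPropertiesExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("unset-tblproperties").getOrCreate()

    spark.sql("CREATE TABLE t (id INT) USING parquet TBLPROPERTIES ('owner' = 'team-a')")

    // Both statements below now succeed in v1 even though the keys were never set,
    // with or without IF EXISTS, instead of failing with UNSET_NONEXISTENT_PROPERTIES.
    spark.sql("ALTER TABLE t UNSET TBLPROPERTIES ('does_not_exist')")
    spark.sql("ALTER TABLE t UNSET TBLPROPERTIES IF EXISTS ('also_missing')")

    spark.sql("SHOW TBLPROPERTIES t").show(truncate = false)
    spark.stop()
  }
}
```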

### How was this patch tested?
Update some UT & Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47097 from panbingkun/alter_unset_table.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 .../src/main/resources/error/error-conditions.json |   6 -
 docs/sql-ref-syntax-ddl-alter-table.md |  20 ++-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  10 --
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  19 ---
 .../apache/spark/sql/execution/command/ddl.scala   |   8 --
 .../AlterTableSetTblPropertiesSuiteBase.scala  |  80 +--
 .../AlterTableUnsetTblPropertiesParserSuite.scala  |  65 +
 .../AlterTableUnsetTblPropertiesSuiteBase.scala| 149 +
 .../sql/execution/command/DDLParserSuite.scala |  12 --
 .../spark/sql/execution/command/DDLSuite.scala |  67 -
 .../v1/AlterTableSetTblPropertiesSuite.scala   |   4 +
 ...ala => AlterTableUnsetTblPropertiesSuite.scala} |  17 ++-
 .../v2/AlterTableSetTblPropertiesSuite.scala   |   4 +
 ...ala => AlterTableUnsetTblPropertiesSuite.scala} |  10 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala|  26 +---
 .../AlterTableUnsetTblPropertiesSuite.scala|  27 
 16 files changed, 353 insertions(+), 171 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-conditions.json 
b/common/utils/src/main/resources/error/error-conditions.json
index 45b922b88063..06f8d3a78252 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -4275,12 +4275,6 @@
 ],
 "sqlState" : "42883"
   },
-  "UNSET_NONEXISTENT_PROPERTIES" : {
-"message" : [
-  "Attempted to unset non-existent properties [] in table 
."
-],
-"sqlState" : "42K0J"
-  },
   "UNSUPPORTED_ADD_FILE" : {
 "message" : [
   "Don't support add file."
diff --git a/docs/sql-ref-syntax-ddl-alter-table.md 
b/docs/sql-ref-syntax-ddl-alter-table.md
index 566e73da2151..31eaf659b5c7 100644
--- a/docs/sql-ref-syntax-ddl-alter-table.md
+++ b/docs/sql-ref-syntax-ddl-alter-table.md
@@ -236,21 +236,29 @@ ALTER TABLE table_identifier DROP [ IF EXISTS ] 
partition_spec [PURGE]
  
 ### SET AND UNSET
 
- SET TABLE PROPERTIES
+ SET PROPERTIES
 
 `ALTER TABLE SET` command is used for setting the table properties. If a 
particular property was already set, 
 this overrides the old value with the new one.
 
-`ALTER TABLE UNSET` is used to drop the table property. 
-
 # Syntax
 
 ```sql
--- Set Table Properties 
+-- Set Properties
 ALTER TABLE table_identifier SET TBLPROPERTIES ( key1 = val1, key2 = val2, ... 
)
+```
+
+ UNSET PROPERTIES
+
+`ALTER TABLE UNSET` command is used to drop the table property.
 
--- Unset Table Properties
-ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, 
... )
+**Note:** If the specified property key does not exist, whether specify `IF 
EXISTS` or not, the command will ignore it and finally suc

(spark) branch master updated (f1eca903f5c2 -> 489e32535aad)

2024-07-07 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f1eca903f5c2 [SPARK-48719][SQL] Fix the calculation bug of `RegrSlope` 
& `RegrIntercept` when the first parameter is null
 add 489e32535aad [SPARK-48177][BUILD][FOLLOWUP] Update parquet version in 
`sql-data-sources-parquet.md` doc

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-parquet.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48805][SQL][ML][SS][AVRO][EXAMPLES] Replace calls to bridged APIs based on `SparkSession#sqlContext` with `SparkSession` API

2024-07-04 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 54b75582506d [SPARK-48805][SQL][ML][SS][AVRO][EXAMPLES] Replace calls 
to bridged APIs based on `SparkSession#sqlContext` with `SparkSession` API
54b75582506d is described below

commit 54b75582506d0e58af7f500b9d284ab7222e98f0
Author: yangjie01 
AuthorDate: Thu Jul 4 19:27:22 2024 +0800

[SPARK-48805][SQL][ML][SS][AVRO][EXAMPLES] Replace calls to bridged APIs 
based on `SparkSession#sqlContext` with `SparkSession` API

### What changes were proposed in this pull request?
In the internal code of Spark, there are instances where, despite having a 
SparkSession instance, the bridged APIs based on SparkSession#sqlContext are 
still used. Therefore, this PR makes some simplifications in this regard:

1. `SparkSession#sqlContext#read` -> `SparkSession#read`

```scala
/**
   * Returns a [[DataFrameReader]] that can be used to read non-streaming 
data in as a
   * `DataFrame`.
   * {{{
   *   sqlContext.read.parquet("/path/to/file.parquet")
   *   sqlContext.read.schema(schema).json("/path/to/file.json")
   * }}}
   *
   * group genericdata
   * since 1.4.0
   */
  def read: DataFrameReader = sparkSession.read
```

2. `SparkSession#sqlContext#setConf` -> `SparkSession#conf#set`

```scala
  /**
   * Set the given Spark SQL configuration property.
   *
   * group config
   * since 1.0.0
   */
  def setConf(key: String, value: String): Unit = {
sparkSession.conf.set(key, value)
  }
```

3. `SparkSession#sqlContext#getConf` -> `SparkSession#conf#get`

```scala
/**
   * Return the value of Spark SQL configuration property for the given key.
   *
   * group config
   * since 1.0.0
   */
  def getConf(key: String): String = {
sparkSession.conf.get(key)
  }
```

4. `SparkSession#sqlContext#createDataFrame` -> 
`SparkSession#createDataFrame`

```scala
/**
   * Creates a DataFrame from an RDD of Product (e.g. case classes, tuples).
   *
   * group dataframes
   * since 1.3.0
   */
  def createDataFrame[A <: Product : TypeTag](rdd: RDD[A]): DataFrame = {
sparkSession.createDataFrame(rdd)
  }
```

5. `SparkSession#sqlContext#sessionState` -> `SparkSession#sessionState`

```scala
private[sql] def sessionState: SessionState = sparkSession.sessionState
```

6. `SparkSession#sqlContext#sharedState` -> `SparkSession#sharedState`

```scala
private[sql] def sharedState: SharedState = sparkSession.sharedState
```

7. `SparkSession#sqlContext#streams` -> `SparkSession#streams`

```scala
/**
   * Returns a `StreamingQueryManager` that allows managing all the
   * [[org.apache.spark.sql.streaming.StreamingQuery StreamingQueries]] 
active on `this` context.
   *
   * since 2.0.0
   */
  def streams: StreamingQueryManager = sparkSession.streams
```

8. `SparkSession#sqlContext#uncacheTable` -> 
`SparkSession#catalog#uncacheTable`

```scala
/**
   * Removes the specified table from the in-memory cache.
   * group cachemgmt
   * since 1.3.0
   */
  def uncacheTable(tableName: String): Unit = {
sparkSession.catalog.uncacheTable(tableName)
  }
```

### Why are the changes needed?
Decrease the nesting levels of API calls

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manually checked `SparkHiveExample`

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47210 from LuciferYang/session.sqlContext.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .../src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala |  4 ++--
 .../apache/spark/examples/sql/hive/SparkHiveExample.scala|  4 ++--
 .../apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala  |  2 +-
 .../sql/execution/streaming/FlatMapGroupsWithStateExec.scala | 12 ++--
 .../sql/execution/streaming/TransformWithStateExec.scala | 12 ++--
 .../test/scala/org/apache/spark/sql/CachedTableSuite.scala   |  4 ++--
 .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala  |  2 +-
 .../apache/spark/sql/hive/HiveParquetMetastoreSuite.scala|  2 +-
 .../org/apache/spark/sql/hive/HiveUDFDynamicLoadSuite.scala  |  2 +-
 .../spark/sql/hive/PartitionedTablePerfStatsSuite.scala  |  2 +-
 10 files changed, 23 insertions(+), 23 deletions(-)

diff --git 
a/connector/avro/src/test/scala/org/apa

(spark) branch master updated: [SPARK-48765][DEPLOY] Enhance default value evaluation for SPARK_IDENT_STRING

2024-07-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bc16b24c7a32 [SPARK-48765][DEPLOY] Enhance default value evaluation 
for SPARK_IDENT_STRING
bc16b24c7a32 is described below

commit bc16b24c7a328cf103b003b1c4a5cf16832cf2bd
Author: Cheng Pan 
AuthorDate: Mon Jul 1 19:49:31 2024 +0800

[SPARK-48765][DEPLOY] Enhance default value evaluation for 
SPARK_IDENT_STRING

### What changes were proposed in this pull request?

This PR follows Hadoop[1] to enhance the `SPARK_IDENT_STRING` default value 
evaluation.

[1] 
https://github.com/apache/hadoop/blob/rel/release-3.4.0/hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh#L893-L896

### Why are the changes needed?

I found that in some cases `$USER` is not available, so the auto-generated
log and pid file names are strange.

For example, there is no `$USER` when login to docker
```
$ docker run -t -i ubuntu:latest
root1dbeaefd6cd4:/# echo $USER

root1dbeaefd6cd4:/# id -nu
root
root1dbeaefd6cd4:/# exit
```

### Does this PR introduce _any_ user-facing change?

Yes, affects log/pid file names.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47160 from pan3793/SPARK-48765.

Authored-by: Cheng Pan 
Signed-off-by: yangjie01 
---
 sbin/spark-daemon.sh | 4 
 1 file changed, 4 insertions(+)

diff --git a/sbin/spark-daemon.sh b/sbin/spark-daemon.sh
index 28d205f03e0f..b7233e6e9bf3 100755
--- a/sbin/spark-daemon.sh
+++ b/sbin/spark-daemon.sh
@@ -98,6 +98,10 @@ spark_rotate_log ()
 . "${SPARK_HOME}/bin/load-spark-env.sh"
 
 if [ "$SPARK_IDENT_STRING" = "" ]; then
+  # if for some reason the shell doesn't have $USER defined
+  # (e.g., ssh'd in to execute a command)
+  # let's get the effective username and use that
+  USER=${USER:-$(id -nu)}
   export SPARK_IDENT_STRING="$USER"
 fi
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48691][BUILD] Upgrade scalatest related dependencies to the 3.2.19 series

2024-06-26 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7f5f96cad224 [SPARK-48691][BUILD] Upgrade scalatest related 
dependencies to the 3.2.19 series
7f5f96cad224 is described below

commit 7f5f96cad22464e02679ab1a1c6eb08b9da039ef
Author: Wei Guo 
AuthorDate: Wed Jun 26 22:31:08 2024 +0800

[SPARK-48691][BUILD] Upgrade scalatest related dependencies to the 3.2.19 
series

### What changes were proposed in this pull request?

This PR aims to upgrade:
- `scalatest` to 3.2.19
- `mockito` to 5.12.0
- `selenium` to 4.21.0
- `bytebuddy` to 1.14.17

### Why are the changes needed?

Full release notes:

scalatest: 
https://github.com/scalatest/scalatest/releases/tag/release-3.2.19
mockito:
https://github.com/mockito/mockito/releases/tag/v5.12.0
https://github.com/mockito/mockito/releases/tag/v5.11.0
selenium:

https://github.com/SeleniumHQ/selenium/compare/selenium-4.17.0...selenium-4.21.0
bytebuddy:

https://github.com/raphw/byte-buddy/compare/byte-buddy-1.14.11...byte-buddy-1.14.17

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Pass GitHub Actions

- Manual tests:

```
build/sbt clean -Dguava.version=33.0.0-jre 
-Dspark.test.webdriver.chrome.driver=/opt/homebrew/bin/chromedriver 
-Dtest.default.exclude.tags="org.apache.spark.tags.ExtendedLevelDBTest" -Phive 
-Phive-thriftserver \
"core/testOnly *HistoryServerSuite"
```
```
build/sbt clean  -Dguava.version=33.0.0-jre 
-Dspark.test.webdriver.chrome.driver=/opt/homebrew/bin/chromedriver 
-Dtest.default.exclude.tags="org.apache.spark.tags.ExtendedLevelDBTest" -Phive 
-Phive-thriftserver \
"core/testOnly *UISeleniumSuite"
```
```
build/sbt clean  -Dguava.version=33.0.0-jre 
-Dspark.test.webdriver.chrome.driver=/opt/homebrew/bin/chromedriver 
-Dtest.default.exclude.tags="org.apache.spark.tags.ExtendedLevelDBTest" -Phive 
-Phive-thriftserver \
"sql/testOnly *UISeleniumSuite"
```
```
build/sbt clean  -Dguava.version=33.0.0-jre 
-Dspark.test.webdriver.chrome.driver=/opt/homebrew/bin/chromedriver 
-Dtest.default.exclude.tags="org.apache.spark.tags.ExtendedLevelDBTest" -Phive 
-Phive-thriftserver \
"streaming/testOnly *UISeleniumSuite"
```
```
build/sbt clean  -Dguava.version=33.0.0-jre 
-Dspark.test.webdriver.chrome.driver=/opt/homebrew/bin/chromedriver 
-Dtest.default.exclude.tags="org.apache.spark.tags.ExtendedLevelDBTest" -Phive 
-Phive-thriftserver \
"hive-thriftserver/testOnly *UISeleniumSuite"
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47065 from wayneguow/upgrade_mockito.

Authored-by: Wei Guo 
Signed-off-by: yangjie01 
---
 pom.xml | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/pom.xml b/pom.xml
index b2e7bba3ec75..5e181cc38d31 100644
--- a/pom.xml
+++ b/pom.xml
@@ -206,8 +206,8 @@
 0.16.0
 4.13.1
 1.1
-4.17.0
-4.17.0
+4.21.0
+4.21.0
 3.1.0
 1.1.0
 1.8.0
@@ -420,12 +420,12 @@
 
 
   org.scalatestplus
-  mockito-5-10_${scala.binary.version}
+  mockito-5-12_${scala.binary.version}
   test
 
 
   org.scalatestplus
-  selenium-4-17_${scala.binary.version}
+  selenium-4-21_${scala.binary.version}
   test
 
 
@@ -1156,25 +1156,25 @@
   
 org.scalatest
 scalatest_${scala.binary.version}
-3.2.18
+3.2.19
 test
   
   
 org.scalatestplus
 scalacheck-1-18_${scala.binary.version}
-3.2.18.0
+3.2.19.0
 test
   
   
 org.scalatestplus
-mockito-5-10_${scala.binary.version}
-3.2.18.0
+mockito-5-12_${scala.binary.version}
+3.2.19.0
 test
   
   
 org.scalatestplus
-selenium-4-17_${scala.binary.version}
-3.2.18.0
+selenium-4-21_${scala.binary.version}
+3.2.19.0
 test
 
   
@@ -1186,19 +1186,19 @@
   
 org.mockito
 mockito-core
-5.10.0
+5.12.0
 test
   
   
 net.bytebuddy
 byte-buddy
-1.14.11
+1.14.17
 test
   
   
 net.bytebuddy
 byte-buddy-agent
-1.14.11
+1.14.17
 test
   
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48724][SQL][TESTS] Fix incorrect conf settings of `ignoreCorruptFiles` related tests case in `ParquetQuerySuite`

2024-06-26 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a474b88aa2eb [SPARK-48724][SQL][TESTS] Fix incorrect conf settings of 
`ignoreCorruptFiles` related tests case in `ParquetQuerySuite`
a474b88aa2eb is described below

commit a474b88aa2ebb2af17273975f2f91584c0ce9af1
Author: Wei Guo 
AuthorDate: Wed Jun 26 19:30:52 2024 +0800

[SPARK-48724][SQL][TESTS] Fix incorrect conf settings of 
`ignoreCorruptFiles` related tests case in `ParquetQuerySuite`

### What changes were proposed in this pull request?

This PR aims to fix incorrect conf settings of the `ignoreCorruptFiles`-related
test cases in `ParquetQuerySuite`. The inner
`withSQLConf(SQLConf.IGNORE_CORRUPT_FILES.key -> "false")` overwrites the outer
configuration, making it impossible to test the situation where `sqlConf` is
true.
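
To make the shadowing concrete, here is a small stand-alone sketch (not the suite code; `withSQLConf` below is a simplified stand-in for `SQLTestUtils.withSQLConf`):

```scala
object NestedConfShadowing {
  private var conf = Map.empty[String, String]

  // Temporarily applies the given settings while running `body`, then restores them.
  def withSQLConf(pairs: (String, String)*)(body: => Unit): Unit = {
    val saved = conf
    conf ++= pairs
    try body finally conf = saved
  }

  def main(args: Array[String]): Unit = {
    Seq("true", "false").foreach { outer =>
      withSQLConf("spark.sql.files.ignoreCorruptFiles" -> outer) {
        // The inner block always pins the value to "false",
        // so the outer "true" case is never actually exercised.
        withSQLConf("spark.sql.files.ignoreCorruptFiles" -> "false") {
          println(s"outer=$outer effective=${conf("spark.sql.files.ignoreCorruptFiles")}")
        }
      }
    }
  }
}
```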

### Why are the changes needed?

Fix test coverage logic.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47088 from wayneguow/parquet_query_suite.

Authored-by: Wei Guo 
Signed-off-by: yangjie01 
---
 .../datasources/parquet/ParquetQuerySuite.scala| 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
index a329d3fdc3cb..4d413efe5043 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
@@ -369,16 +369,14 @@ abstract class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedS
   }
 
   withSQLConf(SQLConf.IGNORE_CORRUPT_FILES.key -> sqlConf) {
-withSQLConf(SQLConf.IGNORE_CORRUPT_FILES.key -> "false") {
-  val exception = intercept[SparkException] {
-testIgnoreCorruptFiles(options)
-  }.getCause
-  assert(exception.getMessage().contains("is not a Parquet file"))
-  val exception2 = intercept[SparkException] {
-testIgnoreCorruptFilesWithoutSchemaInfer(options)
-  }.getCause
-  assert(exception2.getMessage().contains("is not a Parquet file"))
-}
+val exception = intercept[SparkException] {
+  testIgnoreCorruptFiles(options)
+}.getCause
+assert(exception.getMessage().contains("is not a Parquet file"))
+val exception2 = intercept[SparkException] {
+  testIgnoreCorruptFilesWithoutSchemaInfer(options)
+}.getCause
+assert(exception2.getMessage().contains("is not a Parquet file"))
   }
 }
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48692][BUILD] Upgrade `rocksdbjni` to 9.2.1

2024-06-24 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5112e5887714 [SPARK-48692][BUILD] Upgrade `rocksdbjni` to 9.2.1
5112e5887714 is described below

commit 5112e58877147c7fb169d2c53845ce00de127866
Author: panbingkun 
AuthorDate: Tue Jun 25 11:45:42 2024 +0800

[SPARK-48692][BUILD] Upgrade `rocksdbjni` to 9.2.1

### What changes were proposed in this pull request?
The pr aims to upgrade rocksdbjni from `8.11.4` to `9.2.1`.

### Why are the changes needed?
The full release notes as follows:
https://github.com/facebook/rocksdb/releases/tag/v9.2.1
https://github.com/facebook/rocksdb/releases/tag/v9.1.1
https://github.com/facebook/rocksdb/releases/tag/v9.1.0
https://github.com/facebook/rocksdb/releases/tag/v9.0.1
https://github.com/facebook/rocksdb/releases/tag/v9.0.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46146 from panbingkun/test_rocksdbjni_9.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3  |   2 +-
 pom.xml|   2 +-
 ...StoreBasicOperationsBenchmark-jdk21-results.txt | 120 ++---
 .../StateStoreBasicOperationsBenchmark-results.txt | 120 ++---
 4 files changed, 122 insertions(+), 122 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index edaf1c494d13..b99ec346e6ab 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -244,7 +244,7 @@ parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar
 pickle/1.5//pickle-1.5.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
-rocksdbjni/8.11.4//rocksdbjni-8.11.4.jar
+rocksdbjni/9.2.1//rocksdbjni-9.2.1.jar
 scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar
 scala-compiler/2.13.14//scala-compiler-2.13.14.jar
 scala-library/2.13.14//scala-library-2.13.14.jar
diff --git a/pom.xml b/pom.xml
index 85fc2aefdf90..a6dc3a60d89c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -691,7 +691,7 @@
   
 org.rocksdb
 rocksdbjni
-8.11.4
+9.2.1
   
   
 ${leveldbjni.group}
diff --git 
a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt 
b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
index e563e60a8f48..6a42c7b283b7 100644
--- a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
@@ -2,143 +2,143 @@
 put rows
 

 
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 putting 1 rows (1 rows to overwrite - rate 100):  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 
---
-In-memory   10 
12   1  1.01023.2   1.0X
-RocksDB (trackTotalNumberOfRows: true)  42 
44   2  0.24197.6   0.2X
-RocksDB (trackTotalNumberOfRows: false) 16 
17   1  0.61591.7   0.6X
+In-memory   10 
11   1  1.0 968.0   1.0X
+RocksDB (trackTotalNumberOfRows: true)  40 
42   2  0.24033.5   0.2X
+RocksDB (trackTotalNumberOfRows: false) 15 
16   1  0.71502.0   0.6X
 
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 putting 1 rows (5000 rows to overwrite - rate 50):  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 
-
-In-memory 10   
  11   1  1.01009.0   1.0X
-RocksDB (trackTotalNumberOfRows: true

(spark) branch master updated: [SPARK-48661][BUILD] Upgrade `RoaringBitmap` to 1.1.0

2024-06-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 67c71874bcb2 [SPARK-48661][BUILD] Upgrade `RoaringBitmap` to 1.1.0
67c71874bcb2 is described below

commit 67c71874bcb2ce6fe2f68e0e47cab72e0d37a687
Author: Wei Guo 
AuthorDate: Fri Jun 21 10:31:43 2024 +0800

[SPARK-48661][BUILD] Upgrade `RoaringBitmap` to 1.1.0

### What changes were proposed in this pull request?

This PR aims to upgrade `RoaringBitmap` to 1.1.0.

### Why are the changes needed?

There are some bug fixes in `RoaringBitmap` 1.1.0:
Fix RunContainer#contains(BitmapContainer) 
(https://github.com/RoaringBitmap/RoaringBitmap/issues/721) by LeeWorrall in 
https://github.com/RoaringBitmap/RoaringBitmap/pull/722
Fix ArrayContainer#contains(RunContainer) 
(https://github.com/RoaringBitmap/RoaringBitmap/issues/723) by LeeWorrall in 
https://github.com/RoaringBitmap/RoaringBitmap/pull/724

Full release note:
https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.1.0

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47020 from wayneguow/upgrade_RoaringBitmap.

Authored-by: Wei Guo 
Signed-off-by: yangjie01 
---
 core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt | 8 
 core/benchmarks/MapStatusesConvertBenchmark-results.txt   | 8 
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt 
b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt
index 71c13a0fc5ad..a15442496b24 100644
--- a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt
+++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt
@@ -2,12 +2,12 @@
 MapStatuses Convert Benchmark
 

 
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 MapStatuses Convert:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ----------------------------------------------------------------------------------------------------------------------------------
-Num Maps: 5 Fetch partitions:500                            664            678          14          0.0   664277160.0       1.0X
-Num Maps: 5 Fetch partitions:1000                          1597           1616          29          0.0  1596794881.0       0.4X
-Num Maps: 5 Fetch partitions:1500                          2402           2421          18          0.0  2401654923.0       0.3X
+Num Maps: 5 Fetch partitions:500                            674            685          12          0.0   673772738.0       1.0X
+Num Maps: 5 Fetch partitions:1000                          1579           1590          12          0.0  1579383970.0       0.4X
+Num Maps: 5 Fetch partitions:1500                          2435           2472          37          0.0  2434530380.0       0.3X
 
 
diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt 
b/core/benchmarks/MapStatusesConvertBenchmark-results.txt
index a7379aa0d4af..b9f36af4a653 100644
--- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt
+++ b/core/benchmarks/MapStatusesConvertBenchmark-results.txt
@@ -2,12 +2,12 @@
 MapStatuses Convert Benchmark
 

 
-OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure
 AMD EPYC 7763 64-Core Processor
 MapStatuses Convert:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ----------------------------------------------------------------------------------------------------------------------------------
-Num Maps: 5 Fetch partitions:500                            699            715          14          0.0   698750825.0       1.0X
-Num Maps: 5 Fetch partitions:1000                          1653           1676          36          0.0  1653453370.0       0.4X
-Num Maps: 5 Fetch partitions:1500                          2580           2613          30          0.0  2579900318.0       0.3X
+Num Maps: 5 Fetch partitions:500                            703            716          11          0.0   703103575.0       1.0X
+Num Maps: 5 Fetch partitions:1000                          1707           1723          14          0.0  1707060398.0       0.4X
+Num Maps

(spark) branch master updated: [SPARK-47148][SQL][FOLLOWUP] Use broadcast hint to make test more stable

2024-06-18 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 58701d811d95 [SPARK-47148][SQL][FOLLOWUP] Use broadcast hint to make 
test more stable
58701d811d95 is described below

commit 58701d811d95918ac4a73d8fb260c46ccbf25bdd
Author: Wenchen Fan 
AuthorDate: Tue Jun 18 16:10:19 2024 +0800

[SPARK-47148][SQL][FOLLOWUP] Use broadcast hint to make test more stable

### What changes were proposed in this pull request?

A followup of https://github.com/apache/spark/pull/45234 to make the test 
more stable by using broadcast hint.
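
As context for why a broadcast hint stabilizes the plan, here is a minimal, self-contained sketch (not part of the patch; table names mirror the ones in the test) showing the DataFrame-side equivalent of the `/*+ BROADCAST(t3) */` SQL hint:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastHintSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("broadcast-hint").getOrCreate()
    import spark.implicits._

    val t1 = spark.range(10).toDF("col1")
    val t3 = spark.range(15).toDF("col3").filter($"col3" >= 2)

    // Hinting t3 forces a broadcast join regardless of size estimates,
    // so the test's expected plan shape does not depend on AQE decisions.
    val joined = t1.join(broadcast(t3), $"col1" === $"col3")
    joined.explain() // shows BroadcastHashJoin

    spark.stop()
  }
}
```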

### Why are the changes needed?

test improvement

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47007 from cloud-fan/follow.

Authored-by: Wenchen Fan 
Signed-off-by: yangjie01 
---
 .../apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
index 4e1e171c8a84..d6fd45269ce6 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
@@ -961,7 +961,7 @@ class AdaptiveQueryExecSuite
   spark.range(10).toDF("col1").createTempView("t1")
   spark.range(5).coalesce(2).toDF("col2").createTempView("t2")
   spark.range(15).toDF("col3").filter(Symbol("col3") >= 
2).createTempView("t3")
-  sql("SELECT * FROM (SELECT /*+ BROADCAST(t2) */ * FROM t1 " +
+  sql("SELECT /*+ BROADCAST(t3) */ * FROM (SELECT /*+ BROADCAST(t2) */ * 
FROM t1 " +
 "INNER JOIN t2 ON t1.col1 = t2.col2) t JOIN t3 ON t.col1 = t3.col3;")
 }
 withTempView("t1", "t2", "t3") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48585][SQL] Make `built-in` JdbcDialect's method `classifyException` throw out the `original` exception

2024-06-17 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a3feffdff9cd [SPARK-48585][SQL] Make `built-in` JdbcDialect's method 
`classifyException` throw out the `original` exception
a3feffdff9cd is described below

commit a3feffdff9cd17e0435ac5620731093f40d1a3bf
Author: panbingkun 
AuthorDate: Tue Jun 18 14:50:34 2024 +0800

[SPARK-48585][SQL] Make `built-in` JdbcDialect's method `classifyException` 
throw out the `original` exception

### What changes were proposed in this pull request?
The pr aims to make `built-in` JdbcDialect's method classifyException throw 
out the `original` exception.

### Why are the changes needed?
As discussed in 
https://github.com/apache/spark/pull/46912#discussion_r1630876576, the 
following code:

https://github.com/apache/spark/blob/df4156aa3217cf0f58b4c6cbf33c967bb43f7155/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala#L746-L751
has lost the original cause of the error; let's correct it.
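
A rough sketch of the idea (simplified names, not the actual `JdbcDialects` signatures): the caught throwable should travel along as the cause of the classified exception instead of being dropped.

```scala
import java.sql.SQLException

// Hypothetical stand-in for a dialect's classifyException.
def classifyException(description: String, e: Throwable): Throwable =
  new RuntimeException(description, e) // keep `e` as the cause

val original = new SQLException("""relation "t1" does not exist""")
val classified = classifyException("Failed to load table: t1", original)
assert(classified.getCause eq original) // the original error condition is preserved
```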

### Does this PR introduce _any_ user-facing change?
Yes, more accurate error conditions for end users.

### How was this patch tested?
- Manually test.
- Update existed UT & Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46937 from panbingkun/improve_JDBCTableCatalog.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala  | 26 --
 .../apache/spark/sql/jdbc/AggregatedDialect.scala  |  3 ++-
 .../org/apache/spark/sql/jdbc/DB2Dialect.scala |  2 +-
 .../apache/spark/sql/jdbc/DatabricksDialect.scala  |  2 +-
 .../org/apache/spark/sql/jdbc/DerbyDialect.scala   |  2 +-
 .../org/apache/spark/sql/jdbc/H2Dialect.scala  |  2 +-
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   | 17 ++
 .../apache/spark/sql/jdbc/MsSqlServerDialect.scala |  2 +-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   |  2 +-
 .../org/apache/spark/sql/jdbc/OracleDialect.scala  |  2 +-
 .../apache/spark/sql/jdbc/PostgresDialect.scala|  3 ++-
 .../apache/spark/sql/jdbc/SnowflakeDialect.scala   |  2 +-
 .../apache/spark/sql/jdbc/TeradataDialect.scala|  2 +-
 .../v2/jdbc/JDBCTableCatalogSuite.scala| 16 ++---
 14 files changed, 52 insertions(+), 31 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala
index c78e87d0b846..88ba00a8a1ae 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala
@@ -83,14 +83,16 @@ private[v2] trait V2JDBCTest extends SharedSparkSession 
with DockerIntegrationFu
 
   def testCreateTableWithProperty(tbl: String): Unit = {}
 
-  def checkErrorFailedLoadTable(e: AnalysisException, tbl: String): Unit = {
-checkError(
+  private def checkErrorFailedJDBC(
+  e: AnalysisException,
+  errorClass: String,
+  tbl: String): Unit = {
+checkErrorMatchPVals(
   exception = e,
-  errorClass = "FAILED_JDBC.UNCLASSIFIED",
+  errorClass = errorClass,
   parameters = Map(
-"url" -> "jdbc:",
-"message" -> s"Failed to load table: $tbl"
-  )
+"url" -> "jdbc:.*",
+"tableName" -> s"`$tbl`")
 )
   }
 
@@ -132,7 +134,7 @@ private[v2] trait V2JDBCTest extends SharedSparkSession 
with DockerIntegrationFu
 val e = intercept[AnalysisException] {
   sql(s"ALTER TABLE $catalogName.not_existing_table ADD COLUMNS (C4 
STRING)")
 }
-checkErrorFailedLoadTable(e, "not_existing_table")
+checkErrorFailedJDBC(e, "FAILED_JDBC.LOAD_TABLE", "not_existing_table")
   }
 
   test("SPARK-33034: ALTER TABLE ... drop column") {
@@ -154,7 +156,7 @@ private[v2] trait V2JDBCTest extends SharedSparkSession 
with DockerIntegrationFu
 val e = intercept[AnalysisException] {
   sql(s"ALTER TABLE $catalogName.not_existing_table DROP COLUMN C1")
 }
-checkErrorFailedLoadTable(e, "not_existing_table")
+checkErrorFailedJDBC(e, "FAILED_JDBC.LOAD_TABLE", "not_existing_table")
   }
 
   test("SPARK-33034: ALTER TABLE ... update column type") {
@@ -170,7 +172,7 @@ private[v2] trait V2JDBCTest extends SharedSparkSession 
with DockerIntegrationFu
 val e = intercept[AnalysisException] {
   sql(s"ALTER TABLE $catalogName

(spark) branch master updated: [SPARK-48615][SQL] Perf improvement for parsing hex string

2024-06-16 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 257a7883f215 [SPARK-48615][SQL] Perf improvement for parsing hex string
257a7883f215 is described below

commit 257a7883f2150e037eb05f8c7a84103ad9a1
Author: Kent Yao 
AuthorDate: Mon Jun 17 09:56:05 2024 +0800

[SPARK-48615][SQL] Perf improvement for parsing hex string

### What changes were proposed in this pull request?

Currently, we use two hexadecimal string parsing functions. One uses Apache
Commons Codec's Hex for X-prefixed literal parsing, and the other uses a builtin
implementation for the unhex function. I benchmarked both against
`java.util.HexFormat`, which was introduced in JDK 17.

```
OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
Apple M2 Max
Cardinality 100:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Apache                                             5050           5100          86          0.2        5050.1       1.0X
Spark                                              3822           3840          30          0.3        3821.6       1.3X
Java                                               2462           2522          87          0.4        2462.1       2.1X

OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
Apple M2 Max
Cardinality 200:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Apache                                            10020          10828        1154          0.2        5010.1       1.0X
Spark                                              6875           6966         144          0.3        3437.7       1.5X
Java                                               4999           5092          89          0.4        2499.3       2.0X

OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
Apple M2 Max
Cardinality 400:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Apache                                            20090          20433         433          0.2        5022.5       1.0X
Spark                                             13389          13620         229          0.3        3347.2       1.5X
Java                                              10023          10069          42          0.4        2505.6       2.0X

OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
Apple M2 Max
Cardinality 800:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Apache                                            40277          43453        2755          0.2        5034.7       1.0X
Spark                                             27145          27380         311          0.3        3393.1       1.5X
Java                                              19980          21198        1473          0.4        2497.5       2.0X
```

The results indicate that the speed is Apache Codecs < builtin < Java, 
increasing by ~50%.

In this PR, we replace these two with the Java 17 API
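
For reference, a small sketch of the JDK 17 `java.util.HexFormat` calls that replace the two older paths (example values are illustrative):

```scala
import java.util.HexFormat

val hex = HexFormat.of()

// unhex-style parsing: hex text -> raw bytes
val bytes: Array[Byte] = hex.parseHex("537061726b")
assert(new String(bytes, "UTF-8") == "Spark")

// hex-style formatting: raw bytes -> hex text (upper-cased here)
assert(hex.withUpperCase().formatHex(bytes) == "537061726B")
```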

### Why are the changes needed?

performance enhance

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

benchmarking

existing unit tests in 
org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #46972 from yaooqinn/SPARK-48615.

Authored-by: Kent Yao 
Signed-off-by: yangjie01 
---
 .../benchmarks/HexBenchmark-jdk21-results.txt  | 14 
 sql/catalyst/benchmarks/HexBenchmark-results.txt   | 14 
 .../sql/catalyst/expressions/mathExpressions.scala | 94 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala |  7 +-
 .../sql/catalyst/expressions/HexBenchmark.scala| 90 +
 5 files changed, 158 insertions(+), 61 deletions(-)

diff --git a/sql/catalyst/benchmarks/HexBenchmark-jdk21-results.txt 
b/sql/catalyst/benchmarks/HexBenchmark-jdk21-results.txt
new file mode 100644
index ..afa3efa7a919
--- /dev/null
+++ b/sql/ca

(spark) branch master updated: [SPARK-48626][CORE] Change the scope of object LogKeys as private in Spark

2024-06-14 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 878de0014a37 [SPARK-48626][CORE] Change the scope of object LogKeys as 
private in Spark
878de0014a37 is described below

commit 878de0014a3782187180c40158f0805e51335cb5
Author: Gengliang Wang 
AuthorDate: Fri Jun 14 15:08:41 2024 +0800

[SPARK-48626][CORE] Change the scope of object LogKeys as private in Spark

### What changes were proposed in this pull request?

Change the scope of object LogKeys as private in Spark.
### Why are the changes needed?

LogKeys are internal and still evolving. Making the object private avoids future
confusion or compilation failures.
This is suggested by pan3793 in 
https://github.com/apache/spark/pull/46947#issuecomment-2167164424
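
A tiny sketch (hypothetical object name) of what `private[spark]` changes: the object stays visible to everything under the `org.apache.spark` package tree but can no longer be referenced by code outside Spark.

```scala
package org.apache.spark.internal

// Visible anywhere under org.apache.spark, hidden from third-party projects
// compiled against Spark; external references now fail at compile time.
private[spark] object ExampleLogKeys {
  case object ACCUMULATOR_ID
}
```
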
### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Existing UT
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46983 from gengliangwang/changeScope.

Authored-by: Gengliang Wang 
Signed-off-by: yangjie01 
---
 common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index b8b63382fe4c..ec621c4f84ce 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -57,7 +57,7 @@ trait LogKey {
  * Various keys used for mapped diagnostic contexts(MDC) in logging. All 
structured logging keys
  * should be defined here for standardization.
  */
-object LogKeys {
+private[spark] object LogKeys {
   case object ACCUMULATOR_ID extends LogKey
   case object ACL_ENABLED extends LogKey
   case object ACTUAL_NUM_FILES extends LogKey


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48612][SQL][SS] Cleanup deprecated api usage related to commons-pool2

2024-06-13 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 157b1e3ba5d5 [SPARK-48612][SQL][SS] Cleanup deprecated api usage 
related to commons-pool2
157b1e3ba5d5 is described below

commit 157b1e3ba5d5d5e75eb79805eaa3ea14fa876f5b
Author: yangjie01 
AuthorDate: Fri Jun 14 12:36:22 2024 +0800

[SPARK-48612][SQL][SS] Cleanup deprecated api usage related to commons-pool2

### What changes were proposed in this pull request?
This pr makes the following changes:

- o.a.c.pool2.impl.BaseObjectPoolConfig#setMinEvictableIdleTime -> 
o.a.c.pool2.impl.BaseObjectPoolConfig#setMinEvictableIdleDuration
- o.a.c.pool2.impl.BaseObjectPoolConfig#setSoftMinEvictableIdleTime -> 
o.a.c.pool2.impl.BaseObjectPoolConfig#setSoftMinEvictableIdleDuration

to fix the following compilation warnings related to 'commons-pool2':

```
[WARNING] [Warn] 
/Users/yangjie01/SourceCode/git/spark-mine-13/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/InternalKafkaConsumerPool.scala:186:
 method setMinEvictableIdleTime in class BaseObjectPoolConfig is deprecated
Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation, 
site=org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool.PoolConfig.init,
 
origin=org.apache.commons.pool2.impl.BaseObjectPoolConfig.setMinEvictableIdleTime
[WARNING] [Warn] 
/Users/yangjie01/SourceCode/git/spark-mine-13/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/InternalKafkaConsumerPool.scala:187:
 method setSoftMinEvictableIdleTime in class BaseObjectPoolConfig is deprecated
Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation, 
site=org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool.PoolConfig.init,
 
origin=org.apache.commons.pool2.impl.BaseObjectPoolConfig.setSoftMinEvictableIdleTime
```
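
A minimal sketch of the replacement calls (shown on a plain `GenericObjectPoolConfig`; the real change lives in `InternalKafkaConsumerPool`, and the durations below are illustrative):

```scala
import java.time.Duration

import org.apache.commons.pool2.impl.GenericObjectPoolConfig

val poolConfig = new GenericObjectPoolConfig[AnyRef]()
// Deprecated:
//   poolConfig.setMinEvictableIdleTime(Duration.ofMinutes(5))
//   poolConfig.setSoftMinEvictableIdleTime(Duration.ofMinutes(1))
// Non-deprecated replacements taking the same Duration values:
poolConfig.setMinEvictableIdleDuration(Duration.ofMinutes(5))
poolConfig.setSoftMinEvictableIdleDuration(Duration.ofMinutes(1))
```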

The fix refers to:

- 
https://github.com/apache/commons-pool/blob/e5c44f5184a55a58fef4a1efec8124d162a348bd/src/main/java/org/apache/commons/pool2/impl/BaseObjectPoolConfig.java#L765-L789
- 
https://github.com/apache/commons-pool/blob/e5c44f5184a55a58fef4a1efec8124d162a348bd/src/main/java/org/apache/commons/pool2/impl/BaseObjectPoolConfig.java#L815-L839

```java
/**
 * Sets the value for the {@code minEvictableIdleTime} configuration attribute for pools created with this configuration instance.
 *
 * @param minEvictableIdleTime The new setting of {@code minEvictableIdleTime} for this configuration instance
 * @see GenericObjectPool#getMinEvictableIdleDuration()
 * @see GenericKeyedObjectPool#getMinEvictableIdleDuration()
 * @since 2.10.0
 * @deprecated Use {@link #setMinEvictableIdleDuration(Duration)}.
 */
@Deprecated
public void setMinEvictableIdleTime(final Duration minEvictableIdleTime) {
    this.minEvictableIdleDuration = PoolImplUtils.nonNull(minEvictableIdleTime, DEFAULT_MIN_EVICTABLE_IDLE_TIME);
}

/**
 * Sets the value for the {@code minEvictableIdleTime} configuration attribute for pools created with this configuration instance.
 *
 * @param minEvictableIdleTime The new setting of {@code minEvictableIdleTime} for this configuration instance
 * @see GenericObjectPool#getMinEvictableIdleDuration()
 * @see GenericKeyedObjectPool#getMinEvictableIdleDuration()
 * @since 2.12.0
 */
public void setMinEvictableIdleDuration(final Duration minEvictableIdleTime) {
    this.minEvictableIdleDuration = PoolImplUtils.nonNull(minEvictableIdleTime, DEFAULT_MIN_EVICTABLE_IDLE_TIME);
}

/**
 * Sets the value for the {@code softMinEvictableIdleTime} configuration attribute for pools created with this configuration instance.
 *
 * @param softMinEvictableIdleTime The new setting of {@code softMinEvictableIdleTime} for this configuration instance
 * @see GenericObjectPool#getSoftMinEvictableIdleDuration()
 * @see GenericKeyedObjectPool#getSoftMinEvictableIdleDuration()
 * @since 2.10.0
 * @deprecated Use {@link #setSoftMinEvictableIdleDuration(Duration)}.
 */
@Deprecated
public void setSoftMinEvictableIdleTime(final Duration softMinEvictableIdleTime) {
    this.softMinEvictableIdleDuration = PoolImplUtils.nonNull(softMinEvictableIdleTime, DEFAULT_SOFT_MIN_EVICTABLE_IDLE_TIME);
}

/**
 * Sets the value for the {@code softMinEvictableIdleTime} configuration attribute for pools created with this configuration instance.
 *
 * @param softMinEvictableIdleTime The new setting of {@code softMinEvictableId

(spark) branch master updated: [SPARK-45685][SQL][FOLLOWUP] Add handling for `Stream` where `LazyList.force` is called

2024-06-13 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 75fff90d2618 [SPARK-45685][SQL][FOLLOWUP] Add handling for `Stream` 
where `LazyList.force` is called
75fff90d2618 is described below

commit 75fff90d2618617a66b9a3311792c8b16e8e
Author: yangjie01 
AuthorDate: Fri Jun 14 12:30:44 2024 +0800

[SPARK-45685][SQL][FOLLOWUP] Add handling for `Stream` where 
`LazyList.force` is called

### What changes were proposed in this pull request?
Refer to the suggestion of 
https://github.com/apache/spark/pull/43563#pullrequestreview-2114900378, this 
pr add handling for Stream where LazyList.force is called

### Why are the changes needed?
Even though `Stream` is deprecated in 2.13, it is not _removed_, and thus it is
possible that some parts of Spark / Catalyst (or third-party code) might
continue to pass around `Stream` instances. Hence, we should restore the call
to `Stream.force` where `.force` is called on `LazyList`, to avoid losing the
eager materialization for Streams that happen to flow to these call sites. This
also preserves compatibility.
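
A small sketch of why the explicit `.force` matters: without it, the mapped function runs lazily and the transformation is deferred past the point where callers expect it to have happened.

```scala
var evaluated = 0

val mapped = LazyList(1, 2, 3).map { x => evaluated += 1; x * 2 }
assert(evaluated == 0) // nothing has run yet: LazyList.map is fully lazy

mapped.force           // eagerly materializes every element
assert(evaluated == 3)

// The deprecated Stream is lazy in its tail in the same way, which is why the
// Stream case keeps its own `.force` call alongside the LazyList one.
```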

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add some new tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46970 from LuciferYang/SPARK-45685-FOLLOWUP.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .../spark/sql/catalyst/plans/QueryPlan.scala   |  4 +++-
 .../apache/spark/sql/catalyst/trees/TreeNode.scala | 13 ---
 .../sql/catalyst/plans/LogicalPlanSuite.scala  | 22 ++
 .../spark/sql/catalyst/trees/TreeNodeSuite.scala   | 27 ++
 .../sql/execution/WholeStageCodegenExec.scala  |  4 +++-
 .../apache/spark/sql/execution/PlannerSuite.scala  |  8 +++
 .../sql/execution/WholeStageCodegenSuite.scala | 10 
 7 files changed, 83 insertions(+), 5 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
index bc0ca31dc635..c9c8fdb676b2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
@@ -226,12 +226,14 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]]
   }
 }
 
+@scala.annotation.nowarn("cat=deprecation")
 def recursiveTransform(arg: Any): AnyRef = arg match {
   case e: Expression => transformExpression(e)
   case Some(value) => Some(recursiveTransform(value))
   case m: Map[_, _] => m
   case d: DataType => d // Avoid unpacking Structs
-  case stream: LazyList[_] => stream.map(recursiveTransform).force
+  case stream: Stream[_] => stream.map(recursiveTransform).force
+  case lazyList: LazyList[_] => lazyList.map(recursiveTransform).force
   case seq: Iterable[_] => seq.map(recursiveTransform)
   case other: AnyRef => other
   case null => null
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
index 23d26854a767..6683f2dbfb39 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
@@ -19,6 +19,7 @@ package org.apache.spark.sql.catalyst.trees
 
 import java.util.UUID
 
+import scala.annotation.nowarn
 import scala.collection.{mutable, Map}
 import scala.jdk.CollectionConverters._
 import scala.reflect.ClassTag
@@ -378,12 +379,16 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]]
   case nonChild: AnyRef => nonChild
   case null => null
 }
+@nowarn("cat=deprecation")
 val newArgs = mapProductIterator {
   case s: StructType => s // Don't convert struct types to some other type 
of Seq[StructField]
   // Handle Seq[TreeNode] in TreeNode parameters.
-  case s: LazyList[_] =>
-// LazyList is lazy so we need to force materialization
+  case s: Stream[_] =>
+// Stream is lazy so we need to force materialization
 s.map(mapChild).force
+  case l: LazyList[_] =>
+// LazyList is lazy so we need to force materialization
+l.map(mapChild).force
   case s: Seq[_] =>
 s.map(mapChild)
   case m: Map[_, _] =>
@@ -801,6 +806,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]]
   case other => other
 }
 
+@nowarn("cat=deprecation")
 val newArgs =

(spark) branch master updated: [SPARK-48604][SQL] Replace deprecated `new ArrowType.Decimal(precision, scale)` method call

2024-06-13 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 08e741b92b8f [SPARK-48604][SQL] Replace deprecated `new 
ArrowType.Decimal(precision, scale)` method call
08e741b92b8f is described below

commit 08e741b92b8fc9e43c838d0849317916218414ce
Author: Wei Guo 
AuthorDate: Thu Jun 13 18:11:30 2024 +0800

[SPARK-48604][SQL] Replace deprecated `new ArrowType.Decimal(precision, 
scale)` method call

### What changes were proposed in this pull request?

This pr replaces deprecated classes and methods of `arrow-vector` called in 
Spark:

- `Decimal(int precision, int scale)` -> `Decimal(
  @JsonProperty("precision") int precision,
  @JsonProperty("scale") int scale,
  @JsonProperty("bitWidth") int bitWidth
)`

I double-checked all `arrow-vector`-related Spark classes; only `ArrowUtils`
contains a deprecated method call.
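
A one-line sketch of the replacement (precision and scale below are arbitrary; `8 * 16` just spells out the 128-bit width the two-argument constructor used to default to):

```scala
import org.apache.arrow.vector.types.pojo.ArrowType

// Deprecated: new ArrowType.Decimal(precision, scale)
// Non-deprecated, with the bit width made explicit:
val decimalType = new ArrowType.Decimal(38, 18, 8 * 16)
```
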
### Why are the changes needed?

Clean up deprecated API usage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Passed GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46961 from wayneguow/deprecated_arrow.

Authored-by: Wei Guo 
Signed-off-by: yangjie01 
---
 sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala
index d9bd3b0e612b..6852fe09ef96 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala
@@ -51,7 +51,7 @@ private[sql] object ArrowUtils {
 case BinaryType if !largeVarTypes => ArrowType.Binary.INSTANCE
 case _: StringType if largeVarTypes => ArrowType.LargeUtf8.INSTANCE
 case BinaryType if largeVarTypes => ArrowType.LargeBinary.INSTANCE
-case DecimalType.Fixed(precision, scale) => new 
ArrowType.Decimal(precision, scale)
+case DecimalType.Fixed(precision, scale) => new 
ArrowType.Decimal(precision, scale, 8 * 16)
 case DateType => new ArrowType.Date(DateUnit.DAY)
 case TimestampType if timeZoneId == null =>
   throw SparkException.internalError("Missing timezoneId where it is 
mandatory.")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (78fd4e3301ff -> b8c7aee12f02)

2024-06-12 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 78fd4e3301ff [SPARK-48584][SQL][FOLLOWUP] Improve the unescapePathName
 add b8c7aee12f02 [SPARK-48609][BUILD] Upgrade `scala-xml` to 2.3.0

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of `commons-io` called in Spark

2024-06-12 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fd045c9887fe [SPARK-48583][SQL][TESTS] Replace deprecated classes and 
methods of `commons-io` called in Spark
fd045c9887fe is described below

commit fd045c9887feabc37c0f15fa41c860847f5fffa0
Author: Wei Guo 
AuthorDate: Thu Jun 13 11:03:45 2024 +0800

[SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of 
`commons-io` called in Spark

### What changes were proposed in this pull request?

This pr replaces deprecated classes and methods of `commons-io`  called in 
Spark:

- `writeStringToFile(final File file, final String data)` -> 
`writeStringToFile(final File file, final String data, final Charset charset)`
- `CountingInputStream` -> `BoundedInputStream`
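
A short, hedged sketch of both replacements outside Spark's code base (the file path is illustrative):

```scala
import java.io.File
import java.nio.charset.StandardCharsets

import org.apache.commons.io.FileUtils
import org.apache.commons.io.input.BoundedInputStream

val file = new File("/tmp/commons-io-demo.txt")

// Explicit charset instead of the deprecated platform-default overload.
FileUtils.writeStringToFile(file, "hello", StandardCharsets.UTF_8)

// BoundedInputStream.builder() replaces the deprecated CountingInputStream;
// getCount still reports how many bytes have been read so far.
val in = BoundedInputStream.builder()
  .setInputStream(FileUtils.openInputStream(file))
  .get()
try {
  in.readAllBytes()
  assert(in.getCount == "hello".length)
} finally in.close()
```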

### Why are the changes needed?

Clean up deprecated API usage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Passed related test cases in `UDFXPathUtilSuite` and `XmlSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46935 from wayneguow/deprecated.

Authored-by: Wei Guo 
Signed-off-by: yangjie01 
---
 .../spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala |  3 ++-
 .../spark/sql/execution/datasources/xml/XmlInputFormat.scala   | 10 ++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala
index a8dc2b20f56d..8351e94c0c36 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala
@@ -17,6 +17,7 @@
 
 package org.apache.spark.sql.catalyst.expressions.xml
 
+import java.nio.charset.StandardCharsets
 import javax.xml.xpath.XPathConstants.STRING
 
 import org.w3c.dom.Node
@@ -85,7 +86,7 @@ class UDFXPathUtilSuite extends SparkFunSuite {
 tempFile.deleteOnExit()
 val fname = tempFile.getAbsolutePath
 
-FileUtils.writeStringToFile(tempFile, secretValue)
+FileUtils.writeStringToFile(tempFile, secretValue, StandardCharsets.UTF_8)
 
 val xml =
   s"""
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlInputFormat.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlInputFormat.scala
index 4359ac02f5f5..6169cec6f821 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlInputFormat.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlInputFormat.scala
@@ -20,7 +20,7 @@ import java.io.{InputStream, InputStreamReader, IOException, 
Reader}
 import java.nio.ByteBuffer
 import java.nio.charset.Charset
 
-import org.apache.commons.io.input.CountingInputStream
+import org.apache.commons.io.input.BoundedInputStream
 import org.apache.hadoop.fs.Seekable
 import org.apache.hadoop.io.{LongWritable, Text}
 import org.apache.hadoop.io.compress._
@@ -67,7 +67,7 @@ private[xml] class XmlRecordReader extends 
RecordReader[LongWritable, Text] {
   private var end: Long = _
   private var reader: Reader = _
   private var filePosition: Seekable = _
-  private var countingIn: CountingInputStream = _
+  private var countingIn: BoundedInputStream = _
   private var readerLeftoverCharFn: () => Boolean = _
   private var readerByteBuffer: ByteBuffer = _
   private var decompressor: Decompressor = _
@@ -117,7 +117,9 @@ private[xml] class XmlRecordReader extends 
RecordReader[LongWritable, Text] {
   }
 } else {
   fsin.seek(start)
-  countingIn = new CountingInputStream(fsin)
+  countingIn = BoundedInputStream.builder()
+.setInputStream(fsin)
+.get()
   in = countingIn
   // don't use filePosition in this case. We have to count bytes read 
manually
 }
@@ -156,7 +158,7 @@ private[xml] class XmlRecordReader extends 
RecordReader[LongWritable, Text] {
 if (filePosition != null) {
   filePosition.getPos
 } else {
-  start + countingIn.getByteCount -
+  start + countingIn.getCount -
 readerByteBuffer.remaining() -
 (if (readerLeftoverCharFn()) 1 else 0)
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (53d65fd12dd9 -> 452c1b64b625)

2024-06-11 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 53d65fd12dd9 [SPARK-48565][UI] Fix thread dump display in UI
 add 452c1b64b625 [SPARK-48551][SQL] Perf improvement for escapePathName

No new revisions were added by this update.

Summary of changes:
 .../EscapePathBenchmark-jdk21-results.txt} |  9 +--
 .../benchmarks/EscapePathBenchmark-results.txt}|  9 +--
 .../catalyst/catalog/ExternalCatalogUtils.scala| 42 
 .../spark/sql/catalyst/EscapePathBenchmark.scala   | 74 ++
 .../catalog/ExternalCatalogUtilsSuite.scala| 42 
 5 files changed, 154 insertions(+), 22 deletions(-)
 copy sql/{core/benchmarks/HashedRelationMetricsBenchmark-results.txt => 
catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt} (51%)
 copy sql/{core/benchmarks/HashedRelationMetricsBenchmark-results.txt => 
catalyst/benchmarks/EscapePathBenchmark-results.txt} (51%)
 create mode 100644 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/EscapePathBenchmark.scala
 create mode 100644 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtilsSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48563][BUILD] Upgrade `pickle` to 1.5

2024-06-10 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3fe6abde125b [SPARK-48563][BUILD] Upgrade `pickle` to 1.5
3fe6abde125b is described below

commit 3fe6abde125b7c34437a3f72d17ee97d9653c218
Author: yangjie01 
AuthorDate: Tue Jun 11 10:36:32 2024 +0800

[SPARK-48563][BUILD] Upgrade `pickle` to 1.5

### What changes were proposed in this pull request?
This pr aims to upgrade `pickle` from 1.3 to 1.5.

### Why are the changes needed?
The new version includes a fix related to [empty bytes object
construction](https://github.com/irmen/pickle/commit/badc8fe08c9e47b87df66b8a16c67010e3614e35).

All changes from 1.3 to 1.5 are as follows:

- https://github.com/irmen/pickle/compare/pickle-1.3...pickle-1.5

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46913 from LuciferYang/pickle-1.5.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 core/pom.xml  | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/pom.xml b/core/pom.xml
index 7413ad0d3393..adb1b3034b42 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -399,7 +399,7 @@
 
   net.razorvine
   pickle
-  1.3
+  1.5
 
 
   net.sf.py4j
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 8ab76b5787b8..4585b534e908 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -241,7 +241,7 @@ parquet-encoding/1.13.1//parquet-encoding-1.13.1.jar
 parquet-format-structures/1.13.1//parquet-format-structures-1.13.1.jar
 parquet-hadoop/1.13.1//parquet-hadoop-1.13.1.jar
 parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar
-pickle/1.3//pickle-1.3.jar
+pickle/1.5//pickle-1.5.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
 rocksdbjni/8.11.4//rocksdbjni-8.11.4.jar


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48522][BUILD] Update Stream Library to 2.9.8 and attach its NOTICE

2024-06-04 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8b88f5ae10cc [SPARK-48522][BUILD] Update Stream Library to 2.9.8 and 
attach its NOTICE
8b88f5ae10cc is described below

commit 8b88f5ae10cc676a9778c186b12c691fa913088d
Author: Kent Yao 
AuthorDate: Tue Jun 4 21:33:01 2024 +0800

[SPARK-48522][BUILD] Update Stream Library to 2.9.8 and attach its NOTICE

### What changes were proposed in this pull request?

Update Stream Library to 2.9.8 and attach its NOTICE

### Why are the changes needed?

update dep and notice file

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

passing ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #46861 from yaooqinn/SPARK-48522.

Authored-by: Kent Yao 
Signed-off-by: yangjie01 
---
 NOTICE-binary | 9 +
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/NOTICE-binary b/NOTICE-binary
index c82d0b52f31c..c4cfe0e9f8b3 100644
--- a/NOTICE-binary
+++ b/NOTICE-binary
@@ -33,11 +33,12 @@ services.
 // Version 2.0, in this case for
 // --
 
-Hive Beeline
-Copyright 2016 The Apache Software Foundation
+=== NOTICE FOR com.clearspring.analytics:streams ===
+stream-api
+Copyright 2016 AddThis
 
-This product includes software developed at
-The Apache Software Foundation (http://www.apache.org/).
+This product includes software developed by AddThis.
+=== END OF NOTICE FOR com.clearspring.analytics:streams ===
 
 Apache Avro
 Copyright 2009-2014 The Apache Software Foundation
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 3d8ffee05d3a..acb236e1c4e0 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -262,7 +262,7 @@ spire-platform_2.13/0.18.0//spire-platform_2.13-0.18.0.jar
 spire-util_2.13/0.18.0//spire-util_2.13-0.18.0.jar
 spire_2.13/0.18.0//spire_2.13-0.18.0.jar
 stax-api/1.0.1//stax-api-1.0.1.jar
-stream/2.9.6//stream-2.9.6.jar
+stream/2.9.8//stream-2.9.8.jar
 super-csv/2.2.0//super-csv-2.2.0.jar
 threeten-extra/1.7.1//threeten-extra-1.7.1.jar
 tink/1.13.0//tink-1.13.0.jar
diff --git a/pom.xml b/pom.xml
index ce3b4041ae57..bd384e42b0ec 100644
--- a/pom.xml
+++ b/pom.xml
@@ -806,7 +806,7 @@
   
 com.clearspring.analytics
 stream
-2.9.6
+2.9.8
 
   
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48506][CORE] Compression codec short names are case insensitive except for event logging

2024-06-04 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f4afa2215a1a [SPARK-48506][CORE] Compression codec short names are 
case insensitive except for event logging
f4afa2215a1a is described below

commit f4afa2215a1a390d9f099a26155fbefc5beefbe9
Author: Kent Yao 
AuthorDate: Tue Jun 4 20:33:51 2024 +0800

[SPARK-48506][CORE] Compression codec short names are case insensitive 
except for event logging

### What changes were proposed in this pull request?

Compression codec short names are treated case-insensitively everywhere (e.g.
for map statuses, broadcasts, shuffle, and parquet/orc/avro outputs) except for
event logging. Calling `org.apache.spark.io.CompressionCodec.getShortName` is what causes this inconsistency.

In this PR, we make `CompressionCodec.getShortName` handle case sensitivity 
correctly.
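
A hedged sketch of the resulting behaviour (callable from the `org.apache.spark` namespace, since `CompressionCodec` is `private[spark]`):

```scala
import org.apache.spark.io.CompressionCodec

// Every spelling of the short name now resolves identically.
Seq("zstd", "ZSTD", "Zstd").foreach { name =>
  assert(CompressionCodec.getShortName(name) == "zstd")
}

// Fully-qualified codec class names still map back to their short name.
assert(CompressionCodec.getShortName("org.apache.spark.io.ZStdCompressionCodec") == "zstd")
```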

### Why are the changes needed?

Feature parity

### Does this PR introduce _any_ user-facing change?

Yes, spark.eventLog.compression.codec now accepts not only the lowercased
forms of lz4, lzf, snappy, and zstd, but also forms with any of the characters
upcased.

### How was this patch tested?

new tests
### Was this patch authored or co-authored using generative AI tooling?
no

Closes #46847 from yaooqinn/SPARK-48506.

Authored-by: Kent Yao 
Signed-off-by: yangjie01 
---
 .../main/scala/org/apache/spark/io/CompressionCodec.scala |  5 +++--
 .../scala/org/apache/spark/io/CompressionCodecSuite.scala | 15 +++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala 
b/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala
index 7d5a86d1a81d..233228a9c6d4 100644
--- a/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala
+++ b/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala
@@ -101,8 +101,9 @@ private[spark] object CompressionCodec {
* If it is already a short name, just return it.
*/
   def getShortName(codecName: String): String = {
-if (shortCompressionCodecNames.contains(codecName)) {
-  codecName
+val lowercasedCodec = codecName.toLowerCase(Locale.ROOT)
+if (shortCompressionCodecNames.contains(lowercasedCodec)) {
+  lowercasedCodec
 } else {
   shortCompressionCodecNames
 .collectFirst { case (k, v) if v == codecName => k }
diff --git 
a/core/src/test/scala/org/apache/spark/io/CompressionCodecSuite.scala 
b/core/src/test/scala/org/apache/spark/io/CompressionCodecSuite.scala
index 729fcecff120..5c09a1f965b9 100644
--- a/core/src/test/scala/org/apache/spark/io/CompressionCodecSuite.scala
+++ b/core/src/test/scala/org/apache/spark/io/CompressionCodecSuite.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.io
 
 import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
+import java.util.Locale
 
 import com.google.common.io.ByteStreams
 
@@ -160,4 +161,18 @@ class CompressionCodecSuite extends SparkFunSuite {
 ByteStreams.readFully(concatenatedBytes, decompressed)
 assert(decompressed.toSeq === (0 to 127))
   }
+
+  test("SPARK-48506: CompressionCodec getShortName is case insensitive for 
short names") {
+CompressionCodec.shortCompressionCodecNames.foreach { case (shortName, 
codecClass) =>
+  assert(CompressionCodec.getShortName(shortName) === shortName)
+  assert(CompressionCodec.getShortName(shortName.toUpperCase(Locale.ROOT)) 
=== shortName)
+  assert(CompressionCodec.getShortName(codecClass) === shortName)
+  checkError(
+exception = intercept[SparkIllegalArgumentException] {
+  CompressionCodec.getShortName(codecClass.toUpperCase(Locale.ROOT))
+},
+errorClass = "CODEC_SHORT_NAME_NOT_FOUND",
+parameters = Map("codecName" -> codecClass.toUpperCase(Locale.ROOT)))
+}
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48519][BUILD] Upgrade jetty to 11.0.21

2024-06-04 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d273fdf37bc2 [SPARK-48519][BUILD] Upgrade jetty to 11.0.21
d273fdf37bc2 is described below

commit d273fdf37bc291aadf8677305bda2a91b593219f
Author: yangjie01 
AuthorDate: Tue Jun 4 19:08:40 2024 +0800

[SPARK-48519][BUILD] Upgrade jetty to 11.0.21

### What changes were proposed in this pull request?
This pr aims to upgrade jetty from 11.0.20 to 11.0.21.

### Why are the changes needed?
The new version brings some bug fixes, such as [Reduce ByteBuffer churning in
HttpOutput](https://github.com/jetty/jetty.project/commit/fe94c9f8a40df49021b28280f708448870c5b420).
The full release notes are as follows:
- https://github.com/jetty/jetty.project/releases/tag/jetty-11.0.21

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46843 from LuciferYang/jetty-11.0.21.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 65e627b1854f..3d8ffee05d3a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -137,8 +137,8 @@ 
jersey-container-servlet/3.0.12//jersey-container-servlet-3.0.12.jar
 jersey-hk2/3.0.12//jersey-hk2-3.0.12.jar
 jersey-server/3.0.12//jersey-server-3.0.12.jar
 jettison/1.5.4//jettison-1.5.4.jar
-jetty-util-ajax/11.0.20//jetty-util-ajax-11.0.20.jar
-jetty-util/11.0.20//jetty-util-11.0.20.jar
+jetty-util-ajax/11.0.21//jetty-util-ajax-11.0.21.jar
+jetty-util/11.0.21//jetty-util-11.0.21.jar
 jline/2.14.6//jline-2.14.6.jar
 jline/3.25.1//jline-3.25.1.jar
 jna/5.14.0//jna-5.14.0.jar
diff --git a/pom.xml b/pom.xml
index ded8cc2405fd..ce3b4041ae57 100644
--- a/pom.xml
+++ b/pom.xml
@@ -140,7 +140,7 @@
 1.13.1
 2.0.1
 shaded-protobuf
-11.0.20
+11.0.21
 5.0.0
 
 4.0.1


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (8d534c048866 -> 9270931221d4)

2024-06-02 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8d534c048866 [SPARK-48487][INFRA] Update License & Notice according to 
the dependency changes
 add 9270931221d4 [SPARK-48433][BUILD] Upgrade `checkstyle` to 10.17.0

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 project/plugins.sbt | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts

2024-05-31 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 11c06fcbf2e6 [SPARK-48484][SQL] Fix: V2Write use the same 
TaskAttemptId for different task attempts
11c06fcbf2e6 is described below

commit 11c06fcbf2e62e870c758cedcd386ba2d539352d
Author: jackylee-ch 
AuthorDate: Fri May 31 22:37:49 2024 +0800

[SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different 
task attempts

### What changes were proposed in this pull request?
After #40064 , we always get the same TaskAttemptId for different task
attempts that have the same partitionId. This would lead different task
attempts to write to the same directory.
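
As a side note on the fix itself, a small sketch (hypothetical task id) of why the patch masks the attempt number with `Int.MaxValue`: narrowing a large task id to `Int` can go negative, and the mask keeps the `TaskAttemptID` non-negative while still differing between attempts.

```scala
val realTaskId: Long = 0xFFFFFFF5L // hypothetical id handed to createWriter

val truncated = realTaskId.toInt                    // -11: sign bit set after narrowing
val attemptNumber = realTaskId.toInt & Int.MaxValue

assert(truncated < 0)
assert(attemptNumber == 0x7FFFFFF5)                 // 2147483637, safe as an attempt number
```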

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46811 from 
jackylee-ch/fix_v2write_use_same_directories_for_different_task_attempts.

Lead-authored-by: jackylee-ch 
Co-authored-by: Kent Yao 
Signed-off-by: yangjie01 
(cherry picked from commit 67d11b1992aaa100d0e1fa30b0e5c33684c93a89)
Signed-off-by: yangjie01 
---
 .../datasources/v2/FileWriterFactory.scala |  8 ++--
 .../datasources/v2/FileWriterFactorySuite.scala| 48 ++
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
index 4b1a099d3bac..f18424b4bcb8 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
@@ -38,7 +38,7 @@ case class FileWriterFactory (
   @transient private lazy val jobId = 
SparkHadoopWriterUtils.createJobID(jobTrackerID, 0)
 
   override def createWriter(partitionId: Int, realTaskId: Long): 
DataWriter[InternalRow] = {
-val taskAttemptContext = createTaskAttemptContext(partitionId)
+val taskAttemptContext = createTaskAttemptContext(partitionId, 
realTaskId.toInt & Int.MaxValue)
 committer.setupTask(taskAttemptContext)
 if (description.partitionColumns.isEmpty) {
   new SingleDirectoryDataWriter(description, taskAttemptContext, committer)
@@ -47,9 +47,11 @@ case class FileWriterFactory (
 }
   }
 
-  private def createTaskAttemptContext(partitionId: Int): 
TaskAttemptContextImpl = {
+  private def createTaskAttemptContext(
+  partitionId: Int,
+  realTaskId: Int): TaskAttemptContextImpl = {
 val taskId = new TaskID(jobId, TaskType.MAP, partitionId)
-val taskAttemptId = new TaskAttemptID(taskId, 0)
+val taskAttemptId = new TaskAttemptID(taskId, realTaskId)
 // Set up the configuration object
 val hadoopConf = description.serializableHadoopConf.value
 hadoopConf.set("mapreduce.job.id", jobId.toString)
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala
new file mode 100644
index ..bd2030797441
--- /dev/null
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.mockito.Mockito._
+import org.scalatest.PrivateMethodTester
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.sql.execution.datasources.WriteJobDescription
+import org.apache.spark.util.SerializableConfiguration
+
+class FileWriterFactorySuite extends SparkFunSuite with PrivateMethodTester

(spark) branch branch-3.5 updated: [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts

2024-05-31 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 7d39000f809a [SPARK-48484][SQL] Fix: V2Write use the same 
TaskAttemptId for different task attempts
7d39000f809a is described below

commit 7d39000f809a117d2ef9e73e46697704e45ba262
Author: jackylee-ch 
AuthorDate: Fri May 31 22:37:49 2024 +0800

[SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different 
task attempts

### What changes were proposed in this pull request?
After #40064 , we always get the same TaskAttemptId for different task
attempts that have the same partitionId. This would lead different task
attempts to write to the same directory.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46811 from 
jackylee-ch/fix_v2write_use_same_directories_for_different_task_attempts.

Lead-authored-by: jackylee-ch 
Co-authored-by: Kent Yao 
Signed-off-by: yangjie01 
(cherry picked from commit 67d11b1992aaa100d0e1fa30b0e5c33684c93a89)
Signed-off-by: yangjie01 
---
 .../datasources/v2/FileWriterFactory.scala |  8 ++--
 .../datasources/v2/FileWriterFactorySuite.scala| 48 ++
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
index 4b1a099d3bac..f18424b4bcb8 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
@@ -38,7 +38,7 @@ case class FileWriterFactory (
   @transient private lazy val jobId = 
SparkHadoopWriterUtils.createJobID(jobTrackerID, 0)
 
   override def createWriter(partitionId: Int, realTaskId: Long): 
DataWriter[InternalRow] = {
-val taskAttemptContext = createTaskAttemptContext(partitionId)
+val taskAttemptContext = createTaskAttemptContext(partitionId, 
realTaskId.toInt & Int.MaxValue)
 committer.setupTask(taskAttemptContext)
 if (description.partitionColumns.isEmpty) {
   new SingleDirectoryDataWriter(description, taskAttemptContext, committer)
@@ -47,9 +47,11 @@ case class FileWriterFactory (
 }
   }
 
-  private def createTaskAttemptContext(partitionId: Int): 
TaskAttemptContextImpl = {
+  private def createTaskAttemptContext(
+  partitionId: Int,
+  realTaskId: Int): TaskAttemptContextImpl = {
 val taskId = new TaskID(jobId, TaskType.MAP, partitionId)
-val taskAttemptId = new TaskAttemptID(taskId, 0)
+val taskAttemptId = new TaskAttemptID(taskId, realTaskId)
 // Set up the configuration object
 val hadoopConf = description.serializableHadoopConf.value
 hadoopConf.set("mapreduce.job.id", jobId.toString)
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala
new file mode 100644
index ..bd2030797441
--- /dev/null
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.mockito.Mockito._
+import org.scalatest.PrivateMethodTester
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.sql.execution.datasources.WriteJobDescription
+import org.apache.spark.util.SerializableConfiguration
+
+class FileWriterFactorySuite extends SparkFunSuite with PrivateMethodTester

(spark) branch master updated: [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts

2024-05-31 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 67d11b1992aa [SPARK-48484][SQL] Fix: V2Write use the same 
TaskAttemptId for different task attempts
67d11b1992aa is described below

commit 67d11b1992aaa100d0e1fa30b0e5c33684c93a89
Author: jackylee-ch 
AuthorDate: Fri May 31 22:37:49 2024 +0800

[SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different 
task attempts

### What changes were proposed in this pull request?
After #40064 , we always get the same TaskAttemptId for different task
attempts that have the same partitionId. This would lead different task
attempts to write to the same directory.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46811 from 
jackylee-ch/fix_v2write_use_same_directories_for_different_task_attempts.

Lead-authored-by: jackylee-ch 
Co-authored-by: Kent Yao 
Signed-off-by: yangjie01 
---
 .../datasources/v2/FileWriterFactory.scala |  8 ++--
 .../datasources/v2/FileWriterFactorySuite.scala| 48 ++
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
index 4b1a099d3bac..f18424b4bcb8 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala
@@ -38,7 +38,7 @@ case class FileWriterFactory (
   @transient private lazy val jobId = 
SparkHadoopWriterUtils.createJobID(jobTrackerID, 0)
 
   override def createWriter(partitionId: Int, realTaskId: Long): 
DataWriter[InternalRow] = {
-val taskAttemptContext = createTaskAttemptContext(partitionId)
+val taskAttemptContext = createTaskAttemptContext(partitionId, 
realTaskId.toInt & Int.MaxValue)
 committer.setupTask(taskAttemptContext)
 if (description.partitionColumns.isEmpty) {
   new SingleDirectoryDataWriter(description, taskAttemptContext, committer)
@@ -47,9 +47,11 @@ case class FileWriterFactory (
 }
   }
 
-  private def createTaskAttemptContext(partitionId: Int): 
TaskAttemptContextImpl = {
+  private def createTaskAttemptContext(
+  partitionId: Int,
+  realTaskId: Int): TaskAttemptContextImpl = {
 val taskId = new TaskID(jobId, TaskType.MAP, partitionId)
-val taskAttemptId = new TaskAttemptID(taskId, 0)
+val taskAttemptId = new TaskAttemptID(taskId, realTaskId)
 // Set up the configuration object
 val hadoopConf = description.serializableHadoopConf.value
 hadoopConf.set("mapreduce.job.id", jobId.toString)
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala
new file mode 100644
index ..bd2030797441
--- /dev/null
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.mockito.Mockito._
+import org.scalatest.PrivateMethodTester
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.sql.execution.datasources.WriteJobDescription
+import org.apache.spark.util.SerializableConfiguration
+
+class FileWriterFactorySuite extends SparkFunSuite with PrivateMethodTester {
+
+  test("SPARK-48484: V2Write uses different TaskAttemptIds for different task 
attempts&quo

(spark) branch master updated: [SPARK-47361][SQL] Derby: Calculate suitable precision and scale for DECIMAL type

2024-05-30 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 69afd4be9c93 [SPARK-47361][SQL] Derby: Calculate suitable precision 
and scale for DECIMAL type
69afd4be9c93 is described below

commit 69afd4be9c93cb31a840b969ed1984c0b6b92f8e
Author: Kent Yao 
AuthorDate: Thu May 30 17:31:28 2024 +0800

[SPARK-47361][SQL] Derby: Calculate suitable precision and scale for 
DECIMAL type

### What changes were proposed in this pull request?

When storing `decimal(p, s)` to Derby, if `p > 31`, `s` is wrongly hardcoded 
to `5`, which is assumed to be the default scale of a Derby DECIMAL. Actually, 
0 is the default scale and 5 is the default precision: 
https://db.apache.org/derby/docs/10.13/ref/rrefsqlj15260.html

This PR calculates a suitable scale to make room for precision.
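
A minimal sketch of the clamping rule (the helper name is illustrative, not the
actual `DerbyDialect` code): when the precision exceeds Derby's maximum of 31,
the scale gives up just enough digits to make room, but never drops below 0.

```scala
// Illustrative helper mirroring the dialect change; not Spark API.
def derbyDecimal(precision: Int, scale: Int): (Int, Int) =
  if (precision > 31) (31, math.max(scale - (precision - 31), 0))
  else (precision, scale)

assert(derbyDecimal(38, 18) == (31, 11)) // decimal(38,18) -> DECIMAL(31,11)
assert(derbyDecimal(40, 5)  == (31, 0))  // scale is clamped at 0
assert(derbyDecimal(20, 4)  == (20, 4))  // small decimals pass through unchanged
```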

### Why are the changes needed?

avoid precision loss

### Does this PR introduce _any_ user-facing change?

Yes, but Derby is rare in production environments, and the new mapping is 
compatible with most use cases.

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #46776 from yaooqinn/SPARK-48439.

Authored-by: Kent Yao 
Signed-off-by: yangjie01 
---
 .../main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala  | 12 +---
 .../datasources/v2/jdbc/DerbyTableCatalogSuite.scala |  8 
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala
index 36af0e6aeaf1..23da4dbb60a5 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala
@@ -48,9 +48,15 @@ private case class DerbyDialect() extends JdbcDialect {
 case ByteType => Option(JdbcType("SMALLINT", java.sql.Types.SMALLINT))
 case ShortType => Option(JdbcType("SMALLINT", java.sql.Types.SMALLINT))
 case BooleanType => Option(JdbcType("BOOLEAN", java.sql.Types.BOOLEAN))
-// 31 is the maximum precision and 5 is the default scale for a Derby 
DECIMAL
-case t: DecimalType if t.precision > 31 =>
-  Option(JdbcType("DECIMAL(31,5)", java.sql.Types.DECIMAL))
+// 31 is the maximum precision
+// https://db.apache.org/derby/docs/10.13/ref/rrefsqlj15260.html
+case t: DecimalType =>
+  val (p, s) = if (t.precision > 31) {
+(31, math.max(t.scale - (t.precision - 31), 0))
+  } else {
+(t.precision, t.scale)
+  }
+  Option(JdbcType(s"DECIMAL($p,$s)", java.sql.Types.DECIMAL))
 case _ => None
   }
 
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/DerbyTableCatalogSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/DerbyTableCatalogSuite.scala
index e3714e604495..d793ef526c47 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/DerbyTableCatalogSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/DerbyTableCatalogSuite.scala
@@ -51,4 +51,12 @@ class DerbyTableCatalogSuite extends QueryTest with 
SharedSparkSession {
   checkAnswer(sql(s"SHOW TABLES IN derby.test1"), Row("test1", "TABLE2", 
false))
 }
   }
+
+  test("SPARK-48439: Calculate suitable precision and scale for DECIMAL type") 
{
+withTable("derby.test1.table1") {
+  sql("CREATE TABLE derby.test1.table1 (c1 decimal(38, 18))")
+  sql("INSERT INTO derby.test1.table1 VALUES (1.123456789123456789)")
+  checkAnswer(sql("SELECT * FROM derby.test1.table1"), Row(1.12345678912))
+}
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48420][BUILD] Upgrade netty to `4.1.110.Final`

2024-05-27 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a88cc1ad9319 [SPARK-48420][BUILD] Upgrade netty to `4.1.110.Final`
a88cc1ad9319 is described below

commit a88cc1ad9319bd0f4a14e2d6094865229449c8cb
Author: panbingkun 
AuthorDate: Tue May 28 13:09:39 2024 +0800

[SPARK-48420][BUILD] Upgrade netty to `4.1.110.Final`

### What changes were proposed in this pull request?
The pr aims to upgrade `netty` from `4.1.109.Final` to `4.1.110.Final`.

### Why are the changes needed?
- https://netty.io/news/2024/05/22/4-1-110-Final.html
  This version has brought some bug fixes and improvements, such as:
  Fix Zstd throws Exception on read-only volumes 
(https://github.com/netty/netty/pull/13982)
  Add unix domain socket transport in netty 4.x via JDK16+ 
([#13965](https://github.com/netty/netty/pull/13965))
  Backport #13075: Add the AdaptivePoolingAllocator 
([#13976](https://github.com/netty/netty/pull/13976))
  Add no-value key handling only for form body 
([#13998](https://github.com/netty/netty/pull/13998))
  Add support for specifying SecureRandom in SSLContext initialization 
([#14058](https://github.com/netty/netty/pull/14058))

- 
https://github.com/netty/netty/issues?q=milestone%3A4.1.110.Final+is%3Aclosed

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46744 from panbingkun/SPARK-48420.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 38 +--
 pom.xml   |  2 +-
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 10d812c9fd8a..e854bd0e804a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -197,31 +197,31 @@ metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar
 metrics-json/4.2.25//metrics-json-4.2.25.jar
 metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar
 minlog/1.3.0//minlog-1.3.0.jar
-netty-all/4.1.109.Final//netty-all-4.1.109.Final.jar
-netty-buffer/4.1.109.Final//netty-buffer-4.1.109.Final.jar
-netty-codec-http/4.1.109.Final//netty-codec-http-4.1.109.Final.jar
-netty-codec-http2/4.1.109.Final//netty-codec-http2-4.1.109.Final.jar
-netty-codec-socks/4.1.109.Final//netty-codec-socks-4.1.109.Final.jar
-netty-codec/4.1.109.Final//netty-codec-4.1.109.Final.jar
-netty-common/4.1.109.Final//netty-common-4.1.109.Final.jar
-netty-handler-proxy/4.1.109.Final//netty-handler-proxy-4.1.109.Final.jar
-netty-handler/4.1.109.Final//netty-handler-4.1.109.Final.jar
-netty-resolver/4.1.109.Final//netty-resolver-4.1.109.Final.jar
+netty-all/4.1.110.Final//netty-all-4.1.110.Final.jar
+netty-buffer/4.1.110.Final//netty-buffer-4.1.110.Final.jar
+netty-codec-http/4.1.110.Final//netty-codec-http-4.1.110.Final.jar
+netty-codec-http2/4.1.110.Final//netty-codec-http2-4.1.110.Final.jar
+netty-codec-socks/4.1.110.Final//netty-codec-socks-4.1.110.Final.jar
+netty-codec/4.1.110.Final//netty-codec-4.1.110.Final.jar
+netty-common/4.1.110.Final//netty-common-4.1.110.Final.jar
+netty-handler-proxy/4.1.110.Final//netty-handler-proxy-4.1.110.Final.jar
+netty-handler/4.1.110.Final//netty-handler-4.1.110.Final.jar
+netty-resolver/4.1.110.Final//netty-resolver-4.1.110.Final.jar
 
netty-tcnative-boringssl-static/2.0.65.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.65.Final-linux-aarch_64.jar
 
netty-tcnative-boringssl-static/2.0.65.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-linux-x86_64.jar
 
netty-tcnative-boringssl-static/2.0.65.Final/osx-aarch_64/netty-tcnative-boringssl-static-2.0.65.Final-osx-aarch_64.jar
 
netty-tcnative-boringssl-static/2.0.65.Final/osx-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-osx-x86_64.jar
 
netty-tcnative-boringssl-static/2.0.65.Final/windows-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-windows-x86_64.jar
 netty-tcnative-classes/2.0.65.Final//netty-tcnative-classes-2.0.65.Final.jar
-netty-transport-classes-epoll/4.1.109.Final//netty-transport-classes-epoll-4.1.109.Final.jar
-netty-transport-classes-kqueue/4.1.109.Final//netty-transport-classes-kqueue-4.1.109.Final.jar
-netty-transport-native-epoll/4.1.109.Final/linux-aarch_64/netty-transport-native-epoll-4.1.109.Final-linux-aarch_64.jar
-netty-transport-native-epoll/4.1.109.Final/linux-riscv64/netty-transport-native-epoll-4.1.109.Final-linux-riscv64.jar
-netty-transport-native-epoll/4.1.109.Final/linux-x86_64/netty-transport-native-epoll-4.1.109.Final-linux-x86_64.jar
-netty-transport-native-kqueue/4.1.109.Final/osx-aarch_64/netty-transport

(spark) branch master updated (3346afd4b250 -> ef43bbbc1163)

2024-05-24 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 3346afd4b250 [SPARK-46090][SQL][FOLLOWUP] Add DeveloperApi import
 add ef43bbbc1163 [SPARK-48384][BUILD] Exclude 
`io.netty:netty-tcnative-boringssl-static` from `zookeeper`

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 1 -
 pom.xml   | 4 
 2 files changed, 4 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48386][TESTS] Replace JVM assert with JUnit Assert in tests

2024-05-22 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5df9a0866ae6 [SPARK-48386][TESTS] Replace JVM assert with JUnit Assert 
in tests
5df9a0866ae6 is described below

commit 5df9a0866ae60a42d78136a21a82a0b6e58daefa
Author: panbingkun 
AuthorDate: Thu May 23 10:46:08 2024 +0800

[SPARK-48386][TESTS] Replace JVM assert with JUnit Assert in tests

### What changes were proposed in this pull request?
The pr aims to replace `JVM assert` with `JUnit Assert` in tests.

### Why are the changes needed?
assert() statements do not produce useful error messages when they fail and, 
if assertions were somehow disabled, they would fail to test anything.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46698 from panbingkun/minor_assert.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 .../protocol/EncryptedMessageWithHeaderSuite.java  |  2 +-
 .../shuffle/RetryingBlockTransferorSuite.java  |  8 +++---
 .../apache/spark/util/SparkLoggerSuiteBase.java| 30 --
 .../apache/spark/sql/TestStatefulProcessor.java| 10 +---
 .../sql/TestStatefulProcessorWithInitialState.java |  4 ++-
 .../JavaAdvancedDataSourceV2WithV2Filter.java  | 14 +-
 6 files changed, 38 insertions(+), 30 deletions(-)

diff --git 
a/common/network-common/src/test/java/org/apache/spark/network/protocol/EncryptedMessageWithHeaderSuite.java
 
b/common/network-common/src/test/java/org/apache/spark/network/protocol/EncryptedMessageWithHeaderSuite.java
index 7478fa1db711..2865d411bf67 100644
--- 
a/common/network-common/src/test/java/org/apache/spark/network/protocol/EncryptedMessageWithHeaderSuite.java
+++ 
b/common/network-common/src/test/java/org/apache/spark/network/protocol/EncryptedMessageWithHeaderSuite.java
@@ -116,7 +116,7 @@ public class EncryptedMessageWithHeaderSuite {
 
   // Validate we read data correctly
   assertEquals(bodyResult.readableBytes(), chunkSize);
-  assert(bodyResult.readableBytes() < (randomData.length - readIndex));
+  assertTrue(bodyResult.readableBytes() < (randomData.length - readIndex));
   while (bodyResult.readableBytes() > 0) {
 assertEquals(bodyResult.readByte(), randomData[readIndex++]);
   }
diff --git 
a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java
 
b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java
index 3725973ae733..84c8b1b3353f 100644
--- 
a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java
+++ 
b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java
@@ -288,7 +288,7 @@ public class RetryingBlockTransferorSuite {
 verify(listener, timeout(5000)).onBlockTransferSuccess("b0", block0);
 verify(listener).getTransferType();
 verifyNoMoreInteractions(listener);
-assert(_retryingBlockTransferor.getRetryCount() == 0);
+assertEquals(0, _retryingBlockTransferor.getRetryCount());
   }
 
   @Test
@@ -310,7 +310,7 @@ public class RetryingBlockTransferorSuite {
 verify(listener, timeout(5000)).onBlockTransferFailure("b0", 
saslTimeoutException);
 verify(listener, times(3)).getTransferType();
 verifyNoMoreInteractions(listener);
-assert(_retryingBlockTransferor.getRetryCount() == MAX_RETRIES);
+assertEquals(MAX_RETRIES, _retryingBlockTransferor.getRetryCount());
   }
 
   @Test
@@ -339,7 +339,7 @@ public class RetryingBlockTransferorSuite {
 // This should be equal to 1 because after the SASL exception is retried,
 // retryCount should be set back to 0. Then after that b1 encounters an
 // exception that is retried.
-assert(_retryingBlockTransferor.getRetryCount() == 1);
+assertEquals(1, _retryingBlockTransferor.getRetryCount());
   }
 
   @Test
@@ -368,7 +368,7 @@ public class RetryingBlockTransferorSuite {
 verify(listener, timeout(5000)).onBlockTransferFailure("b0", 
saslExceptionFinal);
 verify(listener, atLeastOnce()).getTransferType();
 verifyNoMoreInteractions(listener);
-assert(_retryingBlockTransferor.getRetryCount() == MAX_RETRIES);
+assertEquals(MAX_RETRIES, _retryingBlockTransferor.getRetryCount());
   }
 
   @Test
diff --git 
a/common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java 
b/common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java
index 46bfe3415080..0869f9827324 100644
--- a/common/utils/src/test/java/org/apache/spark/util/SparkLogg

(spark) branch master updated: [SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked implementation

2024-05-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4fc2910f92d1 [SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a 
forked implementation
4fc2910f92d1 is described below

commit 4fc2910f92d1b5f7e0dd5f803e822668f23c21c5
Author: Cheng Pan 
AuthorDate: Mon May 20 20:42:57 2024 +0800

[SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked 
implementation

### What changes were proposed in this pull request?

This PR replaces AmIpFilter with a forked implementation, and removes the 
dependency `hadoop-yarn-server-web-proxy`

### Why are the changes needed?

SPARK-47118 upgraded Spark's built-in Jetty from 10 to 11 and migrated from 
`javax.servlet` to `jakarta.servlet`, which breaks Spark on YARN.

```
Caused by: java.lang.IllegalStateException: class 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a 
jakarta.servlet.Filter
at 
org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
at 
org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
at 
org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724)
at 
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at 
java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)
at 
java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
at 
org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)
... 38 more
```

During the investigation, I found a comment here 
https://github.com/apache/spark/pull/31642#issuecomment-786257114

> Agree that in the long term we should either: 1) consider to re-implement 
the logic in Spark which allows us to get away from server-side dependency in 
Hadoop ...

This is a simple and clean way to address the exact issue: we no longer need 
to wait for Hadoop's `jakarta.servlet` migration, and it also strips a 
Hadoop dependency.
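
For context, a minimal sketch of what the servlet API migration requires
(simplified; the real forked `AmIpFilter` also validates proxy hosts, handles
redirects, and wraps the request with the remote user principal):

```scala
// Assumes jakarta.servlet-api is on the classpath; shown only to illustrate why
// Hadoop's javax.servlet-based AmIpFilter cannot be loaded by Jetty 11.
import jakarta.servlet.{Filter, FilterChain, ServletRequest, ServletResponse}

class AmIpFilterSketch extends Filter {
  override def doFilter(req: ServletRequest, res: ServletResponse, chain: FilterChain): Unit = {
    // The real filter only lets requests coming from known YARN proxy hosts through.
    chain.doFilter(req, res)
  }
}
```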

### Does this PR introduce _any_ user-facing change?

No. This restores the bootstrap of Spark applications in YARN mode, 
keeping the same behavior as Spark 3.5 and earlier versions.

### How was this patch tested?

UTs are added. (refer to 
`org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter`)

I tested it in a YARN cluster.

Spark successfully started.
```
root@hadoop-master1:/opt/spark-SPARK-48238# JAVA_HOME=/opt/openjdk-17 
bin/spark-sql --conf spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17 --conf 
spark.executorEnv.JAVA_HOME=/opt/openjdk-17
WARNING: Using incubator modules: jdk.incubator.vector
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
2024-05-18 04:11:36 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
2024-05-18 04:11:44 WARN Client: Neither spark.yarn.jars nor 
spark.yarn.archive} is set, falling back to uploading libraries under 
SPARK_HOME.
Spark Web UI available at http://hadoop-master1.orb.local:4040
Spark master: yarn, Application Id: application_1716005503866_0001
spark-sql (default)> select version();
4.0.0 4ddc2303c7cbabee12a3de9f674aaacad3f5eb01
Time taken: 1.707 seconds, Fetched 1 row(s)
spark-sql (default)>
```

When access `http://hadoop-master1.orb.local:4040`, it redirects to 
`http://hadoop-master1.orb.local:8088/proxy/redirect/application_1716005503866_0001/`,
 and the UI looks correct.

https://github.com/apache/spark/assets/26535726/8500fc83-48c5-4603-8d05-37855f0308ae

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46611 from pan3793/SPARK-48238.

Authored-by: Cheng Pan 
Signed-off-by: yangjie01 
---
 assembly/pom.xml   |   4 -
 dev/deps/spark-deps-hadoop-3-hive-2.3  |   1 -
 pom.xml|  77 -
 .../org/apache/spark/deploy/yarn/AmIpFilter.java   | 239 ++
 .../apache/spark/deploy/yarn/AmIpPrincipal.java|  35 +++
 .../deploy/yarn/AmIpServletRequestWrapper.java |  54 
 .../org/apache/spark/deploy/yarn/ProxyUtils.java   | 126 
 .../spark/deploy/yarn/ApplicationMaster.scala  |   2 +-
 .../apache/spark/deploy/yarn/AmIpFilterSuite.scala | 342 +
 .../org/apache/spark/streaming/Checkpoint.scala|   2 +-
 10 files changed, 798 insertions(+), 84 deletions(-)

diff --g

(spark) branch master updated: [SPARK-48242][BUILD] Upgrade extra-enforcer-rules to 1.8.0

2024-05-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2eea28b5efd4 [SPARK-48242][BUILD] Upgrade extra-enforcer-rules to 1.8.0
2eea28b5efd4 is described below

commit 2eea28b5efd4ae30d7962c92b2f9851cf3938b5e
Author: panbingkun 
AuthorDate: Mon May 20 17:23:59 2024 +0800

[SPARK-48242][BUILD] Upgrade extra-enforcer-rules to 1.8.0

### What changes were proposed in this pull request?
This PR aims to upgrade `extra-enforcer-rules` from `1.7.0` to `1.8.0`.

### Why are the changes needed?
The full release notes:
https://github.com/mojohaus/extra-enforcer-rules/releases/tag/1.8.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test.
```
sh dev/test-dependencies.sh
```
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46538 from panbingkun/SPARK-48242.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 611e82f343d8..5811e5b7716d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -3008,7 +3008,7 @@
 
   org.codehaus.mojo
   extra-enforcer-rules
-  1.7.0
+  1.8.0
 
   
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48274][BUILD] Upgrade GenJavadoc to `0.19`

2024-05-14 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7441bd0197e6 [SPARK-48274][BUILD] Upgrade GenJavadoc to `0.19`
7441bd0197e6 is described below

commit 7441bd0197e6442c4f98481bf2fb23b49b5f75cf
Author: panbingkun 
AuthorDate: Wed May 15 10:22:34 2024 +0800

[SPARK-48274][BUILD] Upgrade GenJavadoc to `0.19`

### What changes were proposed in this pull request?
This PR upgrades `GenJavadoc` plugin from `0.18` to `0.19`.

### Why are the changes needed?
1. The full release notes: 
https://github.com/lightbend/genjavadoc/releases/tag/v0.19

2. The latest version supports Scala `2.13.14`, which is a prerequisite for 
upgrading Spark to Scala `2.13.14`.
https://mvnrepository.com/artifact/com.typesafe.genjavadoc/genjavadoc-plugin

3. The last upgrade occurred 3 years ago: 
https://github.com/apache/spark/pull/33383

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Built the doc:

```
./build/sbt -Phadoop-3 -Pkubernetes -Pkinesis-asl -Phive-thriftserver 
-Pdocker-integration-tests -Pyarn -Phadoop-cloud -Pspark-ganglia-lgpl -Phive 
-Pjvm-profiler unidoc
```
https://github.com/apache/spark/assets/15246973/58d3fac8-c968-44e0-83f3-84cf00a5084f

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46579 from panbingkun/unidocGenjavadocVersion_0_19.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 project/SparkBuild.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 5bb7745d77bf..d1b0ed953e30 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -266,7 +266,7 @@ object SparkBuild extends PomBuild {
   .orElse(sys.props.get("java.home").map { p => new 
File(p).getParentFile().getAbsolutePath() })
   .map(file),
 publishMavenStyle := true,
-unidocGenjavadocVersion := "0.18",
+unidocGenjavadocVersion := "0.19",
 
 // Override SBT's default resolvers:
 resolvers := Seq(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48257][BUILD] Polish POM for Hive dependencies

2024-05-13 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c4c9ccbdf562 [SPARK-48257][BUILD] Polish POM for Hive dependencies
c4c9ccbdf562 is described below

commit c4c9ccbdf562b5da6066d6cd0517ab27bf9de3fa
Author: Cheng Pan 
AuthorDate: Mon May 13 21:59:14 2024 +0800

[SPARK-48257][BUILD] Polish POM for Hive dependencies

### What changes were proposed in this pull request?

1. `org.apache.hive` and `${hive.group}` co-exist in `pom.xml`; this PR 
unifies them to `${hive.group}`.
2. `hive23.version`, `hive.version.short`, `` were used in the Spark 3.0 period 
to distinguish Hive 1.2 from Hive 2.3; they are useless today, so this PR 
removes those outdated definitions.
3. Update/remove some outdated comments, e.g. remove the comment for the Hive 
LOG4J exclusion because Spark has already switched to LOG4J2, and generalize 
the comments for the Hive Parquet/Jetty exclusions.

### Why are the changes needed?

Cleanup POM.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass CI.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46558 from pan3793/SPARK-48257.

Authored-by: Cheng Pan 
Signed-off-by: yangjie01 
---
 pom.xml  | 48 ++--
 sql/core/pom.xml |  2 +-
 sql/hive/pom.xml |  4 ++--
 3 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/pom.xml b/pom.xml
index ad6e9391b68c..12d20f4f0736 100644
--- a/pom.xml
+++ b/pom.xml
@@ -133,9 +133,6 @@
 core
 
 2.3.10
-2.3.10
-
-2.3
 
 3.7.0
 
@@ -2112,7 +2109,6 @@
 commons-logging
 commons-logging
   
-  
   
+  
   
 org.eclipse.jetty.aggregate
 jetty-all
   
-  
   
 org.apache.logging.log4j
 *
@@ -2139,10 +2134,9 @@
   
   
   
-org.apache.hive
+${hive.group}
 hive-storage-api
   
-  
 
   
 
@@ -2261,7 +2255,6 @@
 org.json
 json
   
-  
   
   
 ${hive.group}
@@ -2276,7 +2269,6 @@
 org.apache.calcite.avatica
 avatica
   
-  
   
 org.apache.logging.log4j
 *
@@ -2297,7 +2289,6 @@
 net.hydromatic
 aggdesigner-algorithm
   
-  
 
   
   
@@ -2410,7 +2401,6 @@
 org.slf4j
 slf4j-log4j12
   
-  
   
   
 org.apache.hbase
@@ -2420,7 +2410,6 @@
 co.cask.tephra
 *
   
-  
 
   
 
@@ -2478,12 +2467,14 @@
 org.codehaus.groovy
 groovy-all
   
-  
   
 ${hive.group}
 hive-service-rpc
   
-  
+  
   
 org.apache.parquet
 parquet-hadoop-bundle
@@ -2497,7 +2488,6 @@
 tomcat
 jasper-runtime
   
-  
 
   
 
@@ -2574,30 +2564,28 @@
 org.codehaus.groovy
 groovy-all
   
-  
   
   
 org.apache.logging.log4j
 log4j-slf4j-impl
   
-  
 
   
 
   
   
-org.apache.hive
+${hive.group}
 hive-llap-common
-${hive23.version}
+${hive.version}
 ${hive.deps.scope}
 
   
-org.apache.hive
+${hive.group}
 hive-common
   
   
-org.apache.hive
+${hive.group}
 hive-serde
   
   
@@ -2608,21 +2596,21 @@
   
   
   
-org.apache.hive
+${hive.group}
 hive-llap-client
-${hive23.version}
+${hive.version}
 test
 
   
-org.apache.hive
+${hive.group}
 hive-common
   
   
-org.apache.hive
+${hive.group}
 hive-serde
   
   
-org.apache.hive
+${hive.group}
 hive-llap-common
   
   
@@ -2683,7 +2671,7 @@
 hadoop-client-api
   
   
-org.apache.hive
+${hive.group}
 hive-storage-api
   
 
@@ -2713,7 +2701,7 @@
 orc-core
   
   
-org.apache.hive
+${hive.group}
 hive-storage-api
   
   
@@ -2902,7

(spark) branch master updated (cae2248bc13d -> acc37531deb9)

2024-05-12 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from cae2248bc13d [MINOR][PYTHON][TESTS] Move test 
`test_named_arguments_negative` to `test_arrow_python_udf`
 add acc37531deb9 [SPARK-47993][PYTHON][FOLLOW-UP] Update migration guide 
about Python 3.8 dropped

No new revisions were added by this update.

Summary of changes:
 python/docs/source/migration_guide/pyspark_upgrade.rst | 1 +
 1 file changed, 1 insertion(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test

2024-05-05 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8294c5962feb [SPARK-48138][CONNECT][TESTS] Disable a flaky 
`SparkSessionE2ESuite.interrupt tag` test
8294c5962feb is described below

commit 8294c5962febe53eebdff79f65f5f293d93a1997
Author: Dongjoon Hyun 
AuthorDate: Mon May 6 13:45:54 2024 +0800

[SPARK-48138][CONNECT][TESTS] Disable a flaky 
`SparkSessionE2ESuite.interrupt tag` test

### What changes were proposed in this pull request?

This PR aims to disable  a flaky test, `SparkSessionE2ESuite.interrupt 
tag`, temporarily.

To re-enable this, SPARK-48139 is created as a blocker issue for 4.0.0.

### Why are the changes needed?

This test case was added in Apache Spark 3.5.0 but has unfortunately been 
unstable until now.
- #42009

We tried to stabilize this test case before `Apache Spark 4.0.0-preview`.
- #45173
- #46374

However, it's still flaky.

- https://github.com/apache/spark/actions/runs/8962353911/job/24611130573 
(Master, 2024-05-05)
- https://github.com/apache/spark/actions/runs/8948176536/job/24581022674 
(Master, 2024-05-04)

This PR aims to stabilize CI first and to track this flaky test as a 
blocker-level issue for Spark Connect GA via SPARK-48139, before Apache 
Spark 4.0.0.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46396 from dongjoon-hyun/SPARK-48138.

Authored-by: Dongjoon Hyun 
Signed-off-by: yangjie01 
---
 .../jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala
index d1015d55b1df..f56085191f87 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala
@@ -108,7 +108,8 @@ class SparkSessionE2ESuite extends RemoteSparkSession {
 assert(interrupted.length == 2, s"Interrupted operations: $interrupted.")
   }
 
-  test("interrupt tag") {
+  // TODO(SPARK-48139): Re-enable `SparkSessionE2ESuite.interrupt tag`
+  ignore("interrupt tag") {
 val session = spark
 import session.implicits._
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48021][ML][BUILD][FOLLOWUP] add `--add-modules=jdk.incubator.vector` to maven compile args

2024-04-28 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1f9e09ce2148 [SPARK-48021][ML][BUILD][FOLLOWUP] add 
`--add-modules=jdk.incubator.vector` to maven compile args
1f9e09ce2148 is described below

commit 1f9e09ce2148dfc5e0fd9f3e43e5ceef8133414b
Author: panbingkun 
AuthorDate: Sun Apr 28 16:24:43 2024 +0800

[SPARK-48021][ML][BUILD][FOLLOWUP] add `--add-modules=jdk.incubator.vector` 
to maven compile args

### What changes were proposed in this pull request?
This PR follows up https://github.com/apache/spark/pull/46246 and aims to add 
`--add-modules=jdk.incubator.vector` to the Maven compile args.

### Why are the changes needed?
As commented by LuciferYang, we need to be consistent in the Maven compile.
https://github.com/apache/spark/pull/46246#issuecomment-2081298219
https://github.com/apache/spark/assets/15246973/26163da2-f27d-4ec2-893f-d9282b68aec1

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46259 from panbingkun/SPARK-48021.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 pom.xml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pom.xml b/pom.xml
index b916659fdbfa..efbf93856333 100644
--- a/pom.xml
+++ b/pom.xml
@@ -304,6 +304,7 @@
 
 
   -XX:+IgnoreUnrecognizedVMOptions
+  --add-modules=jdk.incubator.vector
   --add-opens=java.base/java.lang=ALL-UNNAMED
   --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
   --add-opens=java.base/java.lang.reflect=ALL-UNNAMED


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47928][SQL][TEST] Speed up test "Add jar support Ivy URI in SQL"

2024-04-22 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 61dc9d991373 [SPARK-47928][SQL][TEST] Speed up test "Add jar support 
Ivy URI in SQL"
61dc9d991373 is described below

commit 61dc9d991373c01d449a8ed26d9bfd7eb93f9301
Author: Cheng Pan 
AuthorDate: Mon Apr 22 18:36:49 2024 +0800

[SPARK-47928][SQL][TEST] Speed up test "Add jar support Ivy URI in SQL"

### What changes were proposed in this pull request?

`SQLQuerySuite`/"SPARK-33084: Add jar support Ivy URI in SQL" uses Hive deps 
to test `ADD JAR`, which pulls in tons of transitive deps; this PR replaces 
them with lightweight jars that still cover all the semantics, to speed up the UT.
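
For reference, the Ivy URI semantics the rewritten test still covers, as a short
sketch (assumes a running `SparkSession` named `spark`; the artifacts are the
lightweight ones used in the new test):

```scala
// transitive=false downloads only the named jar
spark.sql("ADD JAR ivy://org.springframework:spring-core:6.1.6?transitive=false")

// the default transitive=true also pulls the declared dependencies
spark.sql("ADD JAR ivy://org.awaitility:awaitility:4.2.1")

// exclude=group:artifact drops a chosen transitive dependency, keeping the rest
spark.sql("ADD JAR ivy://org.junit.jupiter:junit-jupiter:5.10.2" +
  "?exclude=org.junit.jupiter:junit-jupiter-engine&transitive=true")

spark.sparkContext.listJars().foreach(println)
```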

### Why are the changes needed?

Speed up the test, and reduce unnecessary relationships with Hive.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Run UT locally.

Before
```
[info] - SPARK-33084: Add jar support Ivy URI in SQL (16 minutes, 55 
seconds)
```
After
```
[info] - SPARK-33084: Add jar support Ivy URI in SQL (17 seconds, 783 
milliseconds)
```
### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46150 from pan3793/SPARK-47928.

Authored-by: Cheng Pan 
Signed-off-by: yangjie01 
---
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 25 +++---
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index f81369bbad36..78d4b91088a6 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -3748,22 +3748,21 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
 val sc = spark.sparkContext
-val hiveVersion = "2.3.9"
 // transitive=false, only download specified jar
-sql(s"ADD JAR 
ivy://org.apache.hive.hcatalog:hive-hcatalog-core:$hiveVersion?transitive=false")
-assert(sc.listJars()
-  
.exists(_.contains(s"org.apache.hive.hcatalog_hive-hcatalog-core-$hiveVersion.jar")))
+sql(s"ADD JAR 
ivy://org.springframework:spring-core:6.1.6?transitive=false")
+
assert(sc.listJars().exists(_.contains("org.springframework_spring-core-6.1.6.jar")))
+
assert(!sc.listJars().exists(_.contains("org.springframework_spring-jcl-6.1.6.jar")))
 
 // default transitive=true, test download ivy URL jar return multiple jars
-sql("ADD JAR ivy://org.scala-js:scalajs-test-interface_2.12:1.2.0")
-assert(sc.listJars().exists(_.contains("scalajs-library_2.12")))
-assert(sc.listJars().exists(_.contains("scalajs-test-interface_2.12")))
-
-sql(s"ADD JAR ivy://org.apache.hive:hive-contrib:$hiveVersion" +
-  "?exclude=org.pentaho:pentaho-aggdesigner-algorithm&transitive=true")
-
assert(sc.listJars().exists(_.contains(s"org.apache.hive_hive-contrib-$hiveVersion.jar")))
-
assert(sc.listJars().exists(_.contains(s"org.apache.hive_hive-exec-$hiveVersion.jar")))
-
assert(!sc.listJars().exists(_.contains("org.pentaho.pentaho_aggdesigner-algorithm")))
+sql("ADD JAR ivy://org.awaitility:awaitility:4.2.1")
+
assert(sc.listJars().exists(_.contains("org.awaitility_awaitility-4.2.1.jar")))
+assert(sc.listJars().exists(_.contains("org.hamcrest_hamcrest-2.1.jar")))
+
+sql("ADD JAR ivy://org.junit.jupiter:junit-jupiter:5.10.2" +
+  "?exclude=org.junit.jupiter:junit-jupiter-engine&transitive=true")
+
assert(sc.listJars().exists(_.contains("org.junit.jupiter_junit-jupiter-api-5.10.2.jar")))
+
assert(sc.listJars().exists(_.contains("org.junit.jupiter_junit-jupiter-params-5.10.2.jar")))
+
assert(!sc.listJars().exists(_.contains("org.junit.jupiter_junit-jupiter-engine-5.10.2.jar")))
   }
 
   test("SPARK-33677: LikeSimplification should be skipped if pattern contains 
any escapeChar") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (458f70bd5213 -> 2d0b56c3eac6)

2024-04-21 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 458f70bd5213 [SPARK-47902][SQL] Making Compute Current Time* 
expressions foldable
 add 2d0b56c3eac6 [SPARK-47932][SQL][TESTS] Avoid using legacy commons-lang

No new revisions were added by this update.

Summary of changes:
 sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47901][BUILD] Upgrade common-text 1.12.0

2024-04-18 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 06b12fc31709 [SPARK-47901][BUILD] Upgrade common-text 1.12.0
06b12fc31709 is described below

commit 06b12fc317093d5a45c7c76e6617a9917f98b10d
Author: yangjie01 
AuthorDate: Fri Apr 19 13:21:37 2024 +0800

[SPARK-47901][BUILD] Upgrade common-text 1.12.0

### What changes were proposed in this pull request?
This PR aims to upgrade Apache commons-text from 1.11.0 to 1.12.0.

### Why are the changes needed?
The new version brings 2 bug fixes:

- [TEXT-232](https://issues.apache.org/jira/browse/TEXT-232): 
WordUtils.containsAllWords() may throw PatternSyntaxException
- [TEXT-175](https://issues.apache.org/jira/browse/TEXT-175): Fix 
regression for determining whitespace in WordUtils

The full release notes as follows:

- 
https://github.com/apache/commons-text/blob/rel/commons-text-1.12.0/RELEASE-NOTES.txt

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46127 from LuciferYang/commons-text-1.12.0.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 45a4d499e513..770a7522e9f7 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -48,7 +48,7 @@ commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/3.14.0//commons-lang3-3.14.0.jar
 commons-math3/3.6.1//commons-math3-3.6.1.jar
 commons-pool/1.5.4//commons-pool-1.5.4.jar
-commons-text/1.11.0//commons-text-1.11.0.jar
+commons-text/1.12.0//commons-text-1.12.0.jar
 compress-lzf/1.1.2//compress-lzf-1.1.2.jar
 curator-client/5.6.0//curator-client-5.6.0.jar
 curator-framework/5.6.0//curator-framework-5.6.0.jar
diff --git a/pom.xml b/pom.xml
index 682365d9704a..74a2a61d6e09 100644
--- a/pom.xml
+++ b/pom.xml
@@ -606,7 +606,7 @@
   
 org.apache.commons
 commons-text
-1.11.0
+1.12.0
   
   
 commons-lang


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47850][SQL] Support `spark.sql.hive.convertInsertingUnpartitionedTable`

2024-04-18 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 52a2e63dd714 [SPARK-47850][SQL] Support 
`spark.sql.hive.convertInsertingUnpartitionedTable`
52a2e63dd714 is described below

commit 52a2e63dd7147e2701c9c26667fe5bd9fdc3f14c
Author: Cheng Pan 
AuthorDate: Thu Apr 18 15:05:15 2024 +0800

[SPARK-47850][SQL] Support 
`spark.sql.hive.convertInsertingUnpartitionedTable`

### What changes were proposed in this pull request?

This PR introduces a new configuration 
`spark.sql.hive.convertInsertingUnpartitionedTable`, alongside the existing 
`spark.sql.hive.convertInsertingPartitionedTable`, to allow fine-grained 
switching from Hive SerDe to data source when inserting into Parquet/ORC Hive tables.

### Why are the changes needed?

In hybrid workloads (Hive tables may be read/written by Hive, Spark, Impala, 
etc.), we usually use the data source path for reading Parquet/ORC tables but 
Hive SerDe for writing; the current configuration combination allows us to 
achieve that except for unpartitioned tables.
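
A hedged sketch of that hybrid setup with the new flag (the config keys come
from this PR; the table names are hypothetical, and depending on the deployment
these settings may belong in spark-defaults rather than the session):

```scala
// Read Parquet/ORC Hive tables through the built-in data source readers ...
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "true")
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")

// ... but keep Hive SerDe writers for partitioned and, with this PR,
// unpartitioned tables as well.
spark.conf.set("spark.sql.hive.convertInsertingPartitionedTable", "false")
spark.conf.set("spark.sql.hive.convertInsertingUnpartitionedTable", "false")

spark.sql("INSERT INTO parquet_hive_table SELECT * FROM staging_view")
```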

### Does this PR introduce _any_ user-facing change?

No. The newly added configuration 
`spark.sql.hive.convertInsertingUnpartitionedTable` defaults to `true`, 
which keeps the existing behavior.

### How was this patch tested?

New UT is added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46052 from pan3793/SPARK-47850.

Authored-by: Cheng Pan 
Signed-off-by: yangjie01 
---
 .../plans/logical/basicLogicalOperators.scala  |  1 +
 .../apache/spark/sql/execution/command/views.scala |  1 +
 .../org/apache/spark/sql/hive/HiveStrategies.scala | 10 --
 .../org/apache/spark/sql/hive/HiveUtils.scala  | 10 ++
 .../execution/CreateHiveTableAsSelectCommand.scala |  5 ++-
 .../sql/hive/execution/InsertIntoHiveTable.scala   |  7 
 .../spark/sql/hive/orc/HiveOrcQuerySuite.scala | 37 ++
 7 files changed, 67 insertions(+), 4 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
index 1c8f7a97dd7f..7c36e3bc79af 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
@@ -783,6 +783,7 @@ object View {
 "spark.sql.hive.convertMetastoreParquet",
 "spark.sql.hive.convertMetastoreOrc",
 "spark.sql.hive.convertInsertingPartitionedTable",
+"spark.sql.hive.convertInsertingUnpartitionedTable",
 "spark.sql.hive.convertMetastoreCtas"
   ).contains(key) || key.startsWith("spark.sql.catalog.")
 }
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
index d71d0d43683c..cb5e7e7f42d2 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
@@ -360,6 +360,7 @@ object ViewHelper extends SQLConfHelper with Logging {
 "spark.sql.hive.convertMetastoreParquet",
 "spark.sql.hive.convertMetastoreOrc",
 "spark.sql.hive.convertInsertingPartitionedTable",
+"spark.sql.hive.convertInsertingUnpartitionedTable",
 "spark.sql.hive.convertMetastoreCtas",
 SQLConf.ADDITIONAL_REMOTE_REPOSITORIES.key)
 
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
index 5972a9df78ec..e74cc088a1f6 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
@@ -34,6 +34,7 @@ import 
org.apache.spark.sql.execution.command.{CreateTableCommand, DDLUtils, Ins
 import org.apache.spark.sql.execution.datasources.{CreateTable, 
DataSourceStrategy, HadoopFsRelation, InsertIntoHadoopFsRelationCommand, 
LogicalRelation}
 import org.apache.spark.sql.hive.execution._
 import org.apache.spark.sql.hive.execution.HiveScriptTransformationExec
+import org.apache.spark.sql.hive.execution.InsertIntoHiveTable.BY_CTAS
 import org.apache.spark.sql.internal.HiveSerDe
 
 
@@ -194,6 +195,8 @@ object HiveAnalysis extends Rule[LogicalPlan] {
  * - When writing to non-partitioned Hive-serde Parquet/Orc tables
  * - When writing to partitioned Hive-serde Parquet/Orc tables when

(spark) branch master updated: [SPARK-47770][INFRA] Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return `false` instead of failing

2024-04-08 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 08c49637795f [SPARK-47770][INFRA] Fix 
`GenerateMIMAIgnore.isPackagePrivateModule` to return `false` instead of failing
08c49637795f is described below

commit 08c49637795fd56ef550a509648f0890ff22a948
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 9 11:14:49 2024 +0800

[SPARK-47770][INFRA] Fix `GenerateMIMAIgnore.isPackagePrivateModule` to 
return `false` instead of failing

### What changes were proposed in this pull request?

This PR aims to fix `GenerateMIMAIgnore.isPackagePrivateModule` to work 
correctly.

For example, `Metadata` is a case class inside the package-private 
`DefaultParamsReader` class. Currently, MIMA fails while analyzing this class.


https://github.com/apache/spark/blob/f8e652e88320528a70e605a6a3cf986725e153a5/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L474-L485

The root cause is that `isPackagePrivateModule` fails with 
`scala.ScalaReflectionException`. We can simply make `isPackagePrivateModule` 
return `false` instead of failing, as sketched after the stack trace below.
```
Error instrumenting 
class:org.apache.spark.ml.util.DefaultParamsReader$Metadata
Exception in thread "main" scala.ScalaReflectionException: type 
Serializable is not a class
at scala.reflect.api.Symbols$SymbolApi.asClass(Symbols.scala:284)
at scala.reflect.api.Symbols$SymbolApi.asClass$(Symbols.scala:284)
at 
scala.reflect.internal.Symbols$SymbolContextApiImpl.asClass(Symbols.scala:99)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.classToScala1(JavaMirrors.scala:1085)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.$anonfun$classToScala$1(JavaMirrors.scala:1040)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.$anonfun$toScala$1(JavaMirrors.scala:150)
at 
scala.reflect.runtime.TwoWayCaches$TwoWayCache.toScala(TwoWayCaches.scala:50)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.toScala(JavaMirrors.scala:148)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.classToScala(JavaMirrors.scala:1040)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.typeToScala(JavaMirrors.scala:1148)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.$anonfun$completeRest$2(JavaMirrors.scala:816)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.$anonfun$completeRest$1(JavaMirrors.scala:816)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.completeRest(JavaMirrors.scala:810)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.complete(JavaMirrors.scala:806)
at 
scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1575)
at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1538)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.scala$reflect$runtime$SynchronizedSymbols$SynchronizedSymbol$$super$info(SynchronizedSymbols.scala:221)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info(SynchronizedSymbols.scala:158)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info$(SynchronizedSymbols.scala:158)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.info(SynchronizedSymbols.scala:221)
at 
scala.reflect.internal.Symbols$Symbol.initialize(Symbols.scala:1733)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.privateWithin(SynchronizedSymbols.scala:109)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.privateWithin$(SynchronizedSymbols.scala:107)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.privateWithin(SynchronizedSymbols.scala:221)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.privateWithin(SynchronizedSymbols.scala:221)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.isPackagePrivateModule(GenerateMIMAIgnore.scala:48)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.$anonfun$privateWithin$1(GenerateMIMAIgnore.scala:67)
at scala.collection.immutable.List.foreach(List.scala:334)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:61)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:125)
at 
org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala)
```
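
A minimal sketch of the defensive change (simplified; not the actual
`GenerateMIMAIgnore` code): wrap the reflection call and fall back to `false`
when the symbol cannot be analyzed.

```scala
import scala.reflect.runtime.{universe => ru}
import scala.util.control.NonFatal

def isPackagePrivateModule(moduleSymbol: ru.ModuleSymbol): Boolean =
  try {
    // a symbol with no `privateWithin` qualifier reports "<none>"
    !moduleSymbol.privateWithin.fullName.startsWith("<none>")
  } catch {
    // e.g. scala.ScalaReflectionException on classes like DefaultParamsReader$Metadata
    case NonFatal(_) => false
  }
```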

### Why are the changes needed?

**BEFORE**
```
$ dev/mima | grep org.apache.spark.ml.util.DefaultParamsReader
Using SPARK_LOCAL_IP=localhost
Using SPARK_LOCAL_IP

(spark) branch master updated (a598f654066d -> 03f4e45cd7e9)

2024-04-02 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from a598f654066d [SPARK-47664][PYTHON][CONNECT][TESTS][FOLLOW-UP] Add more 
tests
 add 03f4e45cd7e9 [SPARK-47685][SQL] Restore the support for `Stream` type 
in `Dataset#groupBy`

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/RelationalGroupedDataset.scala | 4 +++-
 .../test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala | 8 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-45593][BUILD][3.5] Correct relocation connect guava dependency

2024-04-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 2da520e88266 [SPARK-45593][BUILD][3.5] Correct relocation connect 
guava dependency
2da520e88266 is described below

commit 2da520e88266530b2283ef3c9ac90bdc806b7556
Author: yikaifei 
AuthorDate: Mon Apr 1 15:35:23 2024 +0800

[SPARK-45593][BUILD][3.5] Correct relocation connect guava dependency

### What changes were proposed in this pull request?

This PR aims to correct the relocation of the Connect Guava dependency and 
remove the duplicate connect-common entry from the SBT-built jars.

This PR cherry-picks https://github.com/apache/spark/pull/43436 and 
https://github.com/apache/spark/pull/44801 as a backport to the 3.5 branch.

### Why are the changes needed?

Bugfix

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Follow the steps described at 
https://github.com/apache/spark/pull/43195#issue-1921234067 to test manually.

In addition, we will continue to observe the GA status over the coming days.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45775 from Yikf/branch-3.5.

Authored-by: yikaifei 
Signed-off-by: yangjie01 
---
 assembly/pom.xml |  6 ++
 connector/connect/client/jvm/pom.xml | 22 ++
 connector/connect/common/pom.xml | 33 +
 connector/connect/server/pom.xml |  1 +
 project/SparkBuild.scala |  6 +-
 5 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index d1ef9b24afda..21330058f77d 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -159,6 +159,12 @@
   org.apache.spark
   spark-connect_${scala.binary.version}
   ${project.version}
+  
+
+  org.apache.spark
+  
spark-connect-common_${scala.binary.version}
+
+  
 
 
   org.apache.spark
diff --git a/connector/connect/client/jvm/pom.xml 
b/connector/connect/client/jvm/pom.xml
index 53ff0b0147e0..6febc5ee6bd6 100644
--- a/connector/connect/client/jvm/pom.xml
+++ b/connector/connect/client/jvm/pom.xml
@@ -51,9 +51,14 @@
   ${project.version}
 
 
+
+  com.google.protobuf
+  protobuf-java
+  compile
+
 
   com.google.guava
   guava
@@ -61,8 +66,9 @@
   compile
 
 
-  com.google.protobuf
-  protobuf-java
+  com.google.guava
+  failureaccess
+  ${guava.failureaccess.version}
   compile
 
 
@@ -108,6 +114,7 @@
   true
   
 
+  com.google.guava:*
   com.google.android:*
   com.google.api.grpc:*
   com.google.code.findbugs:*
@@ -127,6 +134,13 @@
 
   
   
+
+  com.google.common
+  
${spark.shade.packageName}.connect.guava
+  
+com.google.common.**
+  
+
 
   io.grpc
   ${spark.shade.packageName}.io.grpc
@@ -138,7 +152,7 @@
   com.google
   
${spark.shade.packageName}.com.google
   
-
+
 com.google.common.**
   
 
diff --git a/connector/connect/common/pom.xml b/connector/connect/common/pom.xml
index 7ce0aa6615d3..3c07b63c50a5 100644
--- a/connector/connect/common/pom.xml
+++ b/connector/connect/common/pom.xml
@@ -47,18 +47,6 @@
 com.google.protobuf
 protobuf-java
 
-
-com.google.guava
-guava
-${connect.guava.version}
-compile
-
-
-com.google.guava
-failureaccess
-${guava.failureaccess.version}
-compile
-
 
 io.grpc
 grpc-netty
@@ -152,6 +140,27 @@
 
 
 
+
+org.apache.maven.plugins
+maven-shade-plugin
+
+false
+
+
+org.spark-project.spark:unused
+
org.apache.tomcat:annotations-api
+
+
+
+
+
+package
+
+shade
+
+
+
+
 
 
 
diff --git a/connector/connect/server/pom.xml b/connector/connect

(spark) branch master updated: [SPARK-47629][INFRA] Add `common/variant` and `connector/kinesis-asl` to maven daily test module list

2024-03-28 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a8b247e9a50a [SPARK-47629][INFRA] Add `common/variant` and 
`connector/kinesis-asl` to maven daily test module list
a8b247e9a50a is described below

commit a8b247e9a50ae0450360e76bc69b2c6cdf5ea6f8
Author: yangjie01 
AuthorDate: Fri Mar 29 13:26:40 2024 +0800

[SPARK-47629][INFRA] Add `common/variant` and `connector/kinesis-asl` to 
maven daily test module list

### What changes were proposed in this pull request?
This PR adds `common/variant` and `connector/kinesis-asl` to the Maven daily
test module list.

### Why are the changes needed?
Synchronize the modules to be tested in Maven daily test

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Monitor GA after merge

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45754 from LuciferYang/SPARK-47629.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .github/workflows/maven_test.yml | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/.github/workflows/maven_test.yml b/.github/workflows/maven_test.yml
index 34fa9a8b7768..b01f08a23e47 100644
--- a/.github/workflows/maven_test.yml
+++ b/.github/workflows/maven_test.yml
@@ -62,7 +62,7 @@ jobs:
   - hive2.3
 modules:
   - >-
-
core,launcher,common#unsafe,common#kvstore,common#network-common,common#network-shuffle,common#sketch,common#utils
+
core,launcher,common#unsafe,common#kvstore,common#network-common,common#network-shuffle,common#sketch,common#utils,common#variant
   - >-
 graphx,streaming,hadoop-cloud
   - >-
@@ -70,7 +70,7 @@ jobs:
   - >-
 repl,sql#hive-thriftserver
   - >-
-
connector#kafka-0-10,connector#kafka-0-10-sql,connector#kafka-0-10-token-provider,connector#spark-ganglia-lgpl,connector#protobuf,connector#avro
+
connector#kafka-0-10,connector#kafka-0-10-sql,connector#kafka-0-10-token-provider,connector#spark-ganglia-lgpl,connector#protobuf,connector#avro,connector#kinesis-asl
   - >-
 
sql#api,sql#catalyst,resource-managers#yarn,resource-managers#kubernetes#core
 # Here, we split Hive and SQL tests into some of slow ones and the 
rest of them.
@@ -188,20 +188,21 @@ jobs:
   export MAVEN_OPTS="-Xss64m -Xmx4g -Xms4g 
-XX:ReservedCodeCacheSize=128m -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
   export MAVEN_CLI_OPTS="--no-transfer-progress"
   export JAVA_VERSION=${{ matrix.java }}
+  export ENABLE_KINESIS_TESTS=0
   # Replace with the real module name, for example, 
connector#kafka-0-10 -> connector/kafka-0-10
   export TEST_MODULES=`echo "$MODULES_TO_TEST" | sed -e "s%#%/%g"`
-  ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes 
-Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl 
-Djava.version=${JAVA_VERSION/-ea} clean install
+  ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes 
-Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl 
-Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} clean install
   if [[ "$INCLUDED_TAGS" != "" ]]; then
-./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn 
-Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud 
-Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} 
-Dtest.include.tags="$INCLUDED_TAGS" test -fae
+./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn 
-Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud 
-Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} 
-Dtest.include.tags="$INCLUDED_TAGS" test -fae
   elif [[ "$MODULES_TO_TEST" == "connect" ]]; then
 ./build/mvn $MAVEN_CLI_OPTS -Dtest.exclude.tags="$EXCLUDED_TAGS" 
-Djava.version=${JAVA_VERSION/-ea} -pl 
connector/connect/client/jvm,connector/connect/common,connector/connect/server 
test -fae
   elif [[ "$EXCLUDED_TAGS" != "" ]]; then
-./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn 
-Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud 
-Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} 
-Dtest.exclude.tags="$EXCLUDED_TAGS" test -fae
+./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn 
-Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud 
-Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} 
-Dtes

(spark) branch master updated: [SPARK-47610][CORE] Always set `io.netty.tryReflectionSetAccessible=true`

2024-03-26 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5f392a219de2 [SPARK-47610][CORE] Always set 
`io.netty.tryReflectionSetAccessible=true`
5f392a219de2 is described below

commit 5f392a219de29b0856884fb95ff3e313f1047013
Author: Cheng Pan 
AuthorDate: Wed Mar 27 13:16:13 2024 +0800

[SPARK-47610][CORE] Always set `io.netty.tryReflectionSetAccessible=true`

### What changes were proposed in this pull request?

Always set `io.netty.tryReflectionSetAccessible=true`

### Why are the changes needed?

Arrow requires `-Dio.netty.tryReflectionSetAccessible=true` on JDK 9+; see
details in ARROW-7223. SPARK-29924 (fixed in 3.0.0) added documentation guiding
users to add this JavaOpt manually, since Arrow is a Spark built-in component,
and later we added the option to the build system (Maven, SBT, and the PySpark
test suite) manually. Now that Spark requires JDK 17+, we can add this Java
option by default so users no longer need to set it themselves.
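
For reference, this is the kind of manual setting the change removes the need
for (a sketch using standard Spark conf keys; not part of this patch):

```scala
import org.apache.spark.SparkConf

// What users on JDK 9+ previously had to configure themselves.
// With this change the JVM flag is applied by default, so these settings become unnecessary.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-Dio.netty.tryReflectionSetAccessible=true")
  .set("spark.executor.extraJavaOptions", "-Dio.netty.tryReflectionSetAccessible=true")
```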

### Does this PR introduce _any_ user-facing change?

Yes. There is no impact for users who already manually add
`io.netty.tryReflectionSetAccessible=true`, but it makes life easier for new
Spark users.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45733 from pan3793/SPARK-47610.

Authored-by: Cheng Pan 
Signed-off-by: yangjie01 
---
 docs/index.md| 2 --
 .../src/main/java/org/apache/spark/launcher/JavaModuleOptions.java   | 3 ++-
 pom.xml  | 5 +++--
 project/SparkBuild.scala | 4 ++--
 python/docs/source/getting_started/install.rst   | 5 +
 python/run-tests.py  | 2 +-
 sql/hive/pom.xml | 2 +-
 7 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 12c53c40c8f7..57f701316bd0 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -38,8 +38,6 @@ Spark runs on Java 17/21, Scala 2.13, Python 3.8+, and R 3.5+.
 When using the Scala API, it is necessary for applications to use the same 
version of Scala that Spark was compiled for.
 For example, when using Scala 2.13, use Spark compiled for 2.13, and compile 
code/applications for Scala 2.13 as well.
 
-Setting `-Dio.netty.tryReflectionSetAccessible=true` is required for the 
Apache Arrow library. This prevents the 
`java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
java.nio.DirectByteBuffer.(long, int) not available` error when Apache Arrow 
uses Netty internally.
-
 # Running the Examples and Shell
 
 Spark comes with several sample programs. Python, Scala, Java, and R examples 
are in the
diff --git 
a/launcher/src/main/java/org/apache/spark/launcher/JavaModuleOptions.java 
b/launcher/src/main/java/org/apache/spark/launcher/JavaModuleOptions.java
index 8893f4bcb85a..3a8fa6c42d47 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/JavaModuleOptions.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/JavaModuleOptions.java
@@ -42,7 +42,8 @@ public class JavaModuleOptions {
   "--add-opens=java.base/sun.security.action=ALL-UNNAMED",
   "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED",
   "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED",
-  "-Djdk.reflect.useDirectMethodHandle=false"};
+  "-Djdk.reflect.useDirectMethodHandle=false",
+  "-Dio.netty.tryReflectionSetAccessible=true"};
 
 /**
  * Returns the default Java options related to `--add-opens' and
diff --git a/pom.xml b/pom.xml
index 79f8745e01f8..ffa4b5df36cb 100644
--- a/pom.xml
+++ b/pom.xml
@@ -316,6 +316,7 @@
   --add-opens=java.base/sun.security.action=ALL-UNNAMED
   --add-opens=java.base/sun.util.calendar=ALL-UNNAMED
   -Djdk.reflect.useDirectMethodHandle=false
+  -Dio.netty.tryReflectionSetAccessible=true
 
   
   
@@ -3109,7 +3110,7 @@
   **/*Suite.java
 
 
${project.build.directory}/surefire-reports
--ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g 
-XX:ReservedCodeCacheSize=${CodeCacheSize} ${extraJavaTestArgs} 
-Dio.netty.tryReflectionSetAccessible=true
+-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g 
-XX:ReservedCodeCacheSize=${CodeCacheSize} ${extraJavaTestArgs}
 
   
-  -da -Xmx4g -XX:ReservedCodeCacheSize=${CodeCacheSize} 
${extraJavaTestArgs} -Dio.netty.tryReflectionSetAccessible=true
+  -da -Xmx4

(spark) branch master updated: [SPARK-47474][CORE] Revert SPARK-47461 and add some comments

2024-03-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bee1bceb [SPARK-47474][CORE] Revert SPARK-47461 and add some 
comments
bee1bceb is described below

commit bee1bcebdad218a4151ad192d4893ff0fed9
Author: yangjie01 
AuthorDate: Thu Mar 21 13:58:39 2024 +0800

[SPARK-47474][CORE] Revert SPARK-47461 and add some comments

### What changes were proposed in this pull request?
This PR reverts the change of SPARK-47461 and adds some comments to
`ExecutorAllocationManager#totalRunningTasksPerResourceProfile` to clarify that
the tests in `ExecutorAllocationManagerSuite` need to call
`listener.totalRunningTasksPerResourceProfile` within `synchronized`.

### Why are the changes needed?
`ExecutorAllocationManagerSuite` needs to call
`listener.totalRunningTasksPerResourceProfile` within `synchronized`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45602 from LuciferYang/SPARK-47474.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .../src/main/scala/org/apache/spark/ExecutorAllocationManager.scala | 6 ++
 .../scala/org/apache/spark/ExecutorAllocationManagerSuite.scala | 4 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git 
a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala 
b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
index cdd1aecf4a22..94927caff1d7 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
@@ -320,6 +320,12 @@ private[spark] class ExecutorAllocationManager(
 }
   }
 
+  // Please do not delete this function, the tests in 
`ExecutorAllocationManagerSuite`
+  // need to access `listener.totalRunningTasksPerResourceProfile` with 
`synchronized`.
+  private def totalRunningTasksPerResourceProfile(id: Int): Int = synchronized 
{
+listener.totalRunningTasksPerResourceProfile(id)
+  }
+
   /**
* This is called at a fixed interval to regulate the number of pending 
executor requests
* and number of executors running.
diff --git 
a/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala 
b/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala
index aeb3cf53ff1a..e1da2b6dd9d6 100644
--- a/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala
@@ -1934,6 +1934,8 @@ private object ExecutorAllocationManagerSuite extends 
PrivateMethodTester {
 PrivateMethod[Map[Int, Map[String, 
Int]]](Symbol("rpIdToHostToLocalTaskCount"))
   private val _onSpeculativeTaskSubmitted =
 PrivateMethod[Unit](Symbol("onSpeculativeTaskSubmitted"))
+  private val _totalRunningTasksPerResourceProfile =
+PrivateMethod[Int](Symbol("totalRunningTasksPerResourceProfile"))
 
   private val defaultProfile = ResourceProfile.getOrCreateDefaultProfile(new 
SparkConf)
 
@@ -2041,7 +2043,7 @@ private object ExecutorAllocationManagerSuite extends 
PrivateMethodTester {
   }
 
   private def totalRunningTasksPerResourceProfile(manager: 
ExecutorAllocationManager): Int = {
-manager.listener.totalRunningTasksPerResourceProfile(defaultProfile.id)
+manager invokePrivate 
_totalRunningTasksPerResourceProfile(defaultProfile.id)
   }
 
   private def hostToLocalTaskCount(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (bb0867f54d43 -> 5d3845f2942a)

2024-03-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from bb0867f54d43 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to 
`jul-to-slf4j`
 add 5d3845f2942a [SPARK-46920][YARN] Improve executor exit error message 
on YARN

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/deploy/yarn/YarnAllocator.scala   | 28 --
 1 file changed, 16 insertions(+), 12 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47455][BUILD] Fix resource leak during the initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala`

2024-03-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 142677bcd203 [SPARK-47455][BUILD] Fix resource leak during the 
initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala`
142677bcd203 is described below

commit 142677bcd203caf2b6d07bf41d654e123d910ee8
Author: yangjie01 
AuthorDate: Wed Mar 20 15:19:33 2024 +0800

[SPARK-47455][BUILD] Fix resource leak during the initialization of 
`scalaStyleOnCompileConfig` in `SparkBuild.scala`

### What changes were proposed in this pull request?

https://github.com/apache/spark/blob/e01ed0da22f24204fe23143032ff39be7f4b56af/project/SparkBuild.scala#L157-L173

`Source.fromFile(in)` opens a `BufferedSource` resource handle but never
closes it; this PR fixes that.
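
As an aside, on Scala 2.13 the same close-on-exit guarantee can be written with
`scala.util.Using` (a sketch only; the file path is hypothetical, and the patch
itself keeps the explicit try/finally shown in the diff below):

```scala
import scala.io.Source
import scala.util.Using

// Using.resource closes the BufferedSource even if the body throws,
// which is the guarantee the try/finally in this patch provides.
val contents = Using.resource(Source.fromFile("scalastyle-config.xml")) { source =>
  source.getLines().mkString("\n")
}
```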

### Why are the changes needed?
Close the resource after use.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45582 from LuciferYang/SPARK-47455.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
(cherry picked from commit 85bf7615f85eea3e9192a7684ef711cf44042e05)
Signed-off-by: yangjie01 
---
 project/SparkBuild.scala | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 31263eaa4c8d..31516c8c6ffe 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -159,16 +159,21 @@ object SparkBuild extends PomBuild {
 val replacements = Map(
   """customId="println" level="error"""" -> """customId="println" 
level="warn""""
 )
-var contents = Source.fromFile(in).getLines.mkString("\n")
-for ((k, v) <- replacements) {
-  require(contents.contains(k), s"Could not rewrite '$k' in original 
scalastyle config.")
-  contents = contents.replace(k, v)
-}
-new PrintWriter(out) {
-  write(contents)
-  close()
+val source = Source.fromFile(in)
+try {
+  var contents = source.getLines.mkString("\n")
+  for ((k, v) <- replacements) {
+require(contents.contains(k), s"Could not rewrite '$k' in original 
scalastyle config.")
+contents = contents.replace(k, v)
+  }
+  new PrintWriter(out) {
+write(contents)
+close()
+  }
+  out
+} finally {
+  source.close()
 }
-out
   }
 
   // Return a cached scalastyle task for a given configuration (usually 
Compile or Test)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47455][BUILD] Fix resource leak during the initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala`

2024-03-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 8fcd9a1b0024 [SPARK-47455][BUILD] Fix resource leak during the 
initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala`
8fcd9a1b0024 is described below

commit 8fcd9a1b0024d24e3622b1948123e7f239a734a5
Author: yangjie01 
AuthorDate: Wed Mar 20 15:19:33 2024 +0800

[SPARK-47455][BUILD] Fix resource leak during the initialization of 
`scalaStyleOnCompileConfig` in `SparkBuild.scala`

### What changes were proposed in this pull request?

https://github.com/apache/spark/blob/e01ed0da22f24204fe23143032ff39be7f4b56af/project/SparkBuild.scala#L157-L173

`Source.fromFile(in)` opens a `BufferedSource` resource handle but never
closes it; this PR fixes that.

### Why are the changes needed?
Close the resource after use.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45582 from LuciferYang/SPARK-47455.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
(cherry picked from commit 85bf7615f85eea3e9192a7684ef711cf44042e05)
Signed-off-by: yangjie01 
---
 project/SparkBuild.scala | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 79b58deafde5..dfadfea172d8 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -160,16 +160,21 @@ object SparkBuild extends PomBuild {
 val replacements = Map(
   """customId="println" level="error"""" -> """customId="println" 
level="warn""""
 )
-var contents = Source.fromFile(in).getLines.mkString("\n")
-for ((k, v) <- replacements) {
-  require(contents.contains(k), s"Could not rewrite '$k' in original 
scalastyle config.")
-  contents = contents.replace(k, v)
-}
-new PrintWriter(out) {
-  write(contents)
-  close()
+val source = Source.fromFile(in)
+try {
+  var contents = source.getLines.mkString("\n")
+  for ((k, v) <- replacements) {
+require(contents.contains(k), s"Could not rewrite '$k' in original 
scalastyle config.")
+contents = contents.replace(k, v)
+  }
+  new PrintWriter(out) {
+write(contents)
+close()
+  }
+  out
+} finally {
+  source.close()
 }
-out
   }
 
   // Return a cached scalastyle task for a given configuration (usually 
Compile or Test)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (c3a04fa59ce1 -> 85bf7615f85e)

2024-03-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from c3a04fa59ce1 [SPARK-47447][SQL] Allow reading Parquet TimestampLTZ as 
TimestampNTZ
 add 85bf7615f85e [SPARK-47455][BUILD] Fix resource leak during the 
initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala`

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite`

2024-03-08 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 8b4316461e2b [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the 
compilation error related to `PropagateEmptyRelationSuite`
8b4316461e2b is described below

commit 8b4316461e2bc3ca3b72170648ca6b6e36537a65
Author: yangjie01 
AuthorDate: Fri Mar 8 16:44:54 2024 +0800

[SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related 
to `PropagateEmptyRelationSuite`

### What changes were proposed in this pull request?
https://github.com/apache/spark/pull/45406 has been backported to branch-3.4,
where the newly added test case in `PropagateEmptyRelationSuite` uses
`DataTypeUtils`, but `DataTypeUtils` is a utility class added in Apache Spark
3.5 (SPARK-44475), so this triggered a compilation failure in branch-3.4:

- https://github.com/apache/spark/actions/runs/8183755511/job/22377119069

```
[error] 
/home/runner/work/spark/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala:229:27:
 not found: value DataTypeUtils
[error] val schemaForStream = 
DataTypeUtils.fromAttributes(outputForStream)
[error]   ^
[error] 
/home/runner/work/spark/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala:233:26:
 not found: value DataTypeUtils
[error] val schemaForBatch = 
DataTypeUtils.fromAttributes(outputForBatch)
[error]  ^
[info] done compiling
[info] compiling 1 Scala source to 
/home/runner/work/spark/spark/connector/connect/common/target/scala-2.12/test-classes
 ...
[info] compiling 25 Scala sources and 1 Java source to 
/home/runner/work/spark/spark/connector/connect/client/jvm/target/scala-2.12/classes
 ...
[info] done compiling
[error] two errors found
```

Therefore, this PR switches to the `StructType.fromAttributes` function to fix
the compilation failure.

### Why are the changes needed?
Fix the compilation failure in branch-3.4

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45428 from LuciferYang/SPARK-47305-FOLLOWUP-34.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .../spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala| 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala
index a1132eadcc6f..91b62ae953f0 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala
@@ -226,11 +226,11 @@ class PropagateEmptyRelationSuite extends PlanTest {
 val data = Seq(Row(1))
 
 val outputForStream = Seq($"a".int)
-val schemaForStream = DataTypeUtils.fromAttributes(outputForStream)
+val schemaForStream = StructType.fromAttributes(outputForStream)
 val converterForStream = 
CatalystTypeConverters.createToCatalystConverter(schemaForStream)
 
 val outputForBatch = Seq($"b".int)
-val schemaForBatch = DataTypeUtils.fromAttributes(outputForBatch)
+val schemaForBatch = StructType.fromAttributes(outputForBatch)
 val converterForBatch = 
CatalystTypeConverters.createToCatalystConverter(schemaForBatch)
 
 val streamingRelation = LocalRelation(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (3aa16e8193cf -> 433c9b064a3f)

2024-03-03 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 3aa16e8193cf [MINOR] Update outdated comments for class 
`o.a.s.s.functions`
 add 433c9b064a3f [SPARK-47246][SQL] Replace `InternalRow.fromSeq` with 
`new GenericInternalRow` to save a collection conversion

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/expressions/literals.scala |  5 ++---
 .../spark/sql/catalyst/expressions/objects/objects.scala |  2 +-
 .../org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala| 12 ++--
 .../spark/sql/execution/columnar/InMemoryRelation.scala  |  4 ++--
 .../datasources/v2/state/metadata/StateMetadataSource.scala  |  5 +++--
 .../scala/org/apache/spark/sql/hive/HiveInspectors.scala |  2 +-
 .../apache/spark/sql/hive/execution/HiveTableScanExec.scala  |  2 +-
 7 files changed, 16 insertions(+), 16 deletions(-)
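
For context, a minimal sketch of the substitution named in the title
(Catalyst-internal classes, shown purely for illustration):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.unsafe.types.UTF8String

val values: Array[Any] = Array(1, UTF8String.fromString("a"))

// Before: InternalRow.fromSeq converts the Seq back into an Array internally.
val before: InternalRow = InternalRow.fromSeq(values.toSeq)

// After: building the GenericInternalRow directly avoids that extra collection conversion.
val after: InternalRow = new GenericInternalRow(values)
```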


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (c8d293dff595 -> 7e7ba4eaf071)

2024-02-28 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from c8d293dff595 [SPARK-47147][PYTHON][SQL] Fix PySpark collated string 
conversion error
 add 7e7ba4eaf071 [MINOR][SQL] Remove out-of-dated comment in 
`CollectLimitExec`

No new revisions were added by this update.

Summary of changes:
 sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala | 5 -
 1 file changed, 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47100][BUILD] Upgrade `netty` to 4.1.107.Final and `netty-tcnative` to 2.0.62.Final

2024-02-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fb1e7872a3e6 [SPARK-47100][BUILD] Upgrade `netty` to 4.1.107.Final and 
`netty-tcnative` to 2.0.62.Final
fb1e7872a3e6 is described below

commit fb1e7872a3e64eab6127f9c2b3ffa42b63162f6c
Author: Dongjoon Hyun 
AuthorDate: Tue Feb 20 17:04:41 2024 +0800

[SPARK-47100][BUILD] Upgrade `netty` to 4.1.107.Final and `netty-tcnative` 
to 2.0.62.Final

### What changes were proposed in this pull request?

This PR aims to upgrade `netty` to 4.1.107.Final and `netty-tcnative` to 
2.0.62.Final.

### Why are the changes needed?

To bring the latest bug fixes.
- https://netty.io/news/2024/02/13/4-1-107-Final.html

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45178 from dongjoon-hyun/SPARK-47100.

Authored-by: Dongjoon Hyun 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +--
 pom.xml   |  4 +--
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index dbbddbc54c11..cc0145e004a0 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -192,32 +192,32 @@ metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar
 metrics-json/4.2.25//metrics-json-4.2.25.jar
 metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar
 minlog/1.3.0//minlog-1.3.0.jar
-netty-all/4.1.106.Final//netty-all-4.1.106.Final.jar
-netty-buffer/4.1.106.Final//netty-buffer-4.1.106.Final.jar
-netty-codec-http/4.1.106.Final//netty-codec-http-4.1.106.Final.jar
-netty-codec-http2/4.1.106.Final//netty-codec-http2-4.1.106.Final.jar
-netty-codec-socks/4.1.106.Final//netty-codec-socks-4.1.106.Final.jar
-netty-codec/4.1.106.Final//netty-codec-4.1.106.Final.jar
-netty-common/4.1.106.Final//netty-common-4.1.106.Final.jar
-netty-handler-proxy/4.1.106.Final//netty-handler-proxy-4.1.106.Final.jar
-netty-handler/4.1.106.Final//netty-handler-4.1.106.Final.jar
-netty-resolver/4.1.106.Final//netty-resolver-4.1.106.Final.jar
+netty-all/4.1.107.Final//netty-all-4.1.107.Final.jar
+netty-buffer/4.1.107.Final//netty-buffer-4.1.107.Final.jar
+netty-codec-http/4.1.107.Final//netty-codec-http-4.1.107.Final.jar
+netty-codec-http2/4.1.107.Final//netty-codec-http2-4.1.107.Final.jar
+netty-codec-socks/4.1.107.Final//netty-codec-socks-4.1.107.Final.jar
+netty-codec/4.1.107.Final//netty-codec-4.1.107.Final.jar
+netty-common/4.1.107.Final//netty-common-4.1.107.Final.jar
+netty-handler-proxy/4.1.107.Final//netty-handler-proxy-4.1.107.Final.jar
+netty-handler/4.1.107.Final//netty-handler-4.1.107.Final.jar
+netty-resolver/4.1.107.Final//netty-resolver-4.1.107.Final.jar
 
netty-tcnative-boringssl-static/2.0.61.Final//netty-tcnative-boringssl-static-2.0.61.Final.jar
-netty-tcnative-boringssl-static/2.0.61.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-aarch_64.jar
-netty-tcnative-boringssl-static/2.0.61.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar
-netty-tcnative-boringssl-static/2.0.61.Final/osx-aarch_64/netty-tcnative-boringssl-static-2.0.61.Final-osx-aarch_64.jar
-netty-tcnative-boringssl-static/2.0.61.Final/osx-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-osx-x86_64.jar
-netty-tcnative-boringssl-static/2.0.61.Final/windows-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-windows-x86_64.jar
-netty-tcnative-classes/2.0.61.Final//netty-tcnative-classes-2.0.61.Final.jar
-netty-transport-classes-epoll/4.1.106.Final//netty-transport-classes-epoll-4.1.106.Final.jar
-netty-transport-classes-kqueue/4.1.106.Final//netty-transport-classes-kqueue-4.1.106.Final.jar
-netty-transport-native-epoll/4.1.106.Final/linux-aarch_64/netty-transport-native-epoll-4.1.106.Final-linux-aarch_64.jar
-netty-transport-native-epoll/4.1.106.Final/linux-riscv64/netty-transport-native-epoll-4.1.106.Final-linux-riscv64.jar
-netty-transport-native-epoll/4.1.106.Final/linux-x86_64/netty-transport-native-epoll-4.1.106.Final-linux-x86_64.jar
-netty-transport-native-kqueue/4.1.106.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.106.Final-osx-aarch_64.jar
-netty-transport-native-kqueue/4.1.106.Final/osx-x86_64/netty-transport-native-kqueue-4.1.106.Final-osx-x86_64.jar
-netty-transport-native-unix-common/4.1.106.Final//netty-transport-native-unix-common-4.1.106.Final.jar
-netty-transport/4.1.106.Final//netty-transport-4.1.106.Final.jar
+netty-tcnative-boringssl-static/2.0.62.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.62.Final-linux-aarch_64.jar
+netty

(spark) branch master updated: [SPARK-47084][BUILD] Upgrade joda-time to 2.12.7

2024-02-18 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 85108b0cb605 [SPARK-47084][BUILD] Upgrade joda-time to 2.12.7
85108b0cb605 is described below

commit 85108b0cb6059e9a5301b63ab266084defd0ddf2
Author: panbingkun 
AuthorDate: Mon Feb 19 10:15:37 2024 +0800

[SPARK-47084][BUILD] Upgrade joda-time to 2.12.7

### What changes were proposed in this pull request?
This PR aims to upgrade `joda-time` from `2.12.6` to `2.12.7`.

### Why are the changes needed?
This version updates the `DateTimeZone` data to version `2024agtz`.
The full release notes: 
https://www.joda.org/joda-time/changes-report.html#a2.12.7

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45153 from panbingkun/SPARK-47084.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 0b619a249e96..5aabe0e4aef1 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -139,7 +139,7 @@ jetty-util/10.0.19//jetty-util-10.0.19.jar
 jline/2.14.6//jline-2.14.6.jar
 jline/3.22.0//jline-3.22.0.jar
 jna/5.13.0//jna-5.13.0.jar
-joda-time/2.12.6//joda-time-2.12.6.jar
+joda-time/2.12.7//joda-time-2.12.7.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
diff --git a/pom.xml b/pom.xml
index a14f2d255a90..64931dd73282 100644
--- a/pom.xml
+++ b/pom.xml
@@ -208,7 +208,7 @@
   Because it transitions Jakarta REST API from javax to jakarta package.
 -->
 2.41
-2.12.6
+2.12.7
 3.5.2
 3.0.0
 0.12.0


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [MINOR][INFRA][DOCS] Remove undated comment in build_and_test.yml

2024-02-18 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7c60fe21a29d [MINOR][INFRA][DOCS] Remove undated comment in 
build_and_test.yml
7c60fe21a29d is described below

commit 7c60fe21a29dd852de01da214c84e6a3deb38e31
Author: Hyukjin Kwon 
AuthorDate: Mon Feb 19 10:13:52 2024 +0800

[MINOR][INFRA][DOCS] Remove undated comment in build_and_test.yml

### What changes were proposed in this pull request?

This PR removes an outdated comment. We don't use branch-3.3 anymore.

### Why are the changes needed?

To remove obsolete information

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45149 from HyukjinKwon/minor-ga.

Authored-by: Hyukjin Kwon 
Signed-off-by: yangjie01 
---
 .github/workflows/build_and_test.yml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index c578d5079be8..bad34fd746ba 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -285,7 +285,6 @@ jobs:
   infra-image:
 name: "Base image build"
 needs: precondition
-# Currently, enable docker build from cache for `master` and branch (since 
3.4) jobs
 if: >-
   fromJson(needs.precondition.outputs.required).pyspark == 'true' ||
   fromJson(needs.precondition.outputs.required).lint == 'true' ||


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47073][BUILD] Upgrade several Maven plugins to the latest versions

2024-02-16 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2abd3a2f445e [SPARK-47073][BUILD] Upgrade several Maven plugins to the 
latest versions
2abd3a2f445e is described below

commit 2abd3a2f445e86337ad94da19f301cb2b8bc232f
Author: Dongjoon Hyun 
AuthorDate: Fri Feb 16 22:10:41 2024 +0800

[SPARK-47073][BUILD] Upgrade several Maven plugins to the latest versions

### What changes were proposed in this pull request?

This PR aims to upgrade several maven plugins to the latest versions for 
Apache Spark 4.0.0.

### Why are the changes needed?

To bring the latest bug fixes.

- `versions-maven-plugin` from 2.16.0 to 2.16.2.
- `maven-enforcer-plugin` from 3.3.0 to 3.4.1.
- `maven-compiler-plugin` from 3.11.0 to 3.12.1.
- `maven-surefire-plugin` from 3.1.2 to 3.2.5.
- `maven-clean-plugin` from 3.3.1 to 3.3.2.
- `maven-javadoc-plugin` from 3.5.0 to 3.6.3.
- `maven-shade-plugin` from 3.5.0 to 3.5.1.
- `maven-dependency-plugin` from 3.6.0 to 3.6.1.
- `maven-checkstyle-plugin` from 3.3.0 to 3.3.1.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manual.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45136 from dongjoon-hyun/SPARK-47073.

Authored-by: Dongjoon Hyun 
Signed-off-by: yangjie01 
---
 pom.xml | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/pom.xml b/pom.xml
index b83378af30ff..cd669dd02b6d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -179,7 +179,7 @@
 
 4.7.1
 false
-2.16.0
+2.16.2
 
 true
 true
@@ -2852,7 +2852,7 @@
 
   org.apache.maven.plugins
   maven-enforcer-plugin
-  3.3.0
+  3.4.1
   
 
   enforce-versions
@@ -3035,7 +3035,7 @@
 
   org.apache.maven.plugins
   maven-compiler-plugin
-  3.11.0
+  3.12.1
   
 ${java.version}
 ${java.version}
@@ -3052,7 +3052,7 @@
 
   org.apache.maven.plugins
   maven-surefire-plugin
-  3.1.2
+  3.2.5
   
   
 
@@ -3189,7 +3189,7 @@
 
   org.apache.maven.plugins
   maven-clean-plugin
-  3.3.1
+  3.3.2
   
 
   
@@ -3216,7 +3216,7 @@
 
   org.apache.maven.plugins
   maven-javadoc-plugin
-  3.5.0
+  3.6.3
   
 
   -Xdoclint:all
@@ -3272,7 +3272,7 @@
 
   org.apache.maven.plugins
   maven-shade-plugin
-  3.5.0
+  3.5.1
   
 
   org.ow2.asm
@@ -3299,7 +3299,7 @@
 
   org.apache.maven.plugins
   maven-dependency-plugin
-  3.6.0
+  3.6.1
   
 
   default-cli
@@ -3439,7 +3439,7 @@
   
 org.apache.maven.plugins
 maven-checkstyle-plugin
-3.3.0
+3.3.1
 
   false
   true


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47025][BUILD][TESTS] Upgrade `Guava` dependency in `docker-integration-tests` test module

2024-02-12 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d260f5753e9d [SPARK-47025][BUILD][TESTS] Upgrade `Guava` dependency in 
`docker-integration-tests` test module
d260f5753e9d is described below

commit d260f5753e9db00b84d85c34d1ebd21e36a98ac1
Author: Dongjoon Hyun 
AuthorDate: Tue Feb 13 08:41:00 2024 +0800

[SPARK-47025][BUILD][TESTS] Upgrade `Guava` dependency in 
`docker-integration-tests` test module

### What changes were proposed in this pull request?

This PR aims to update `docker-integration-tests` test module to use the 
latest `Guava` test dependency. Specifically,
- Switch from `provided` dependency to `test` dependency
- Upgrade from version `19.0` to `33.0.0-jre`.

### Why are the changes needed?

Previously, `docker-integration-tests` used the `Guava 19.0` dependency in
`provided` scope because `docker-java-core` still uses `Guava 19.0`.

- 
https://mvnrepository.com/artifact/com.github.docker-java/docker-java-core/3.3.4

### Does this PR introduce _any_ user-facing change?

No, `docker-integration-tests` is an integration test module.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45088 from dongjoon-hyun/SPARK-47025.

Authored-by: Dongjoon Hyun 
Signed-off-by: yangjie01 
---
 connector/docker-integration-tests/pom.xml | 3 ++-
 project/SparkBuild.scala   | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/connector/docker-integration-tests/pom.xml 
b/connector/docker-integration-tests/pom.xml
index 4cca3ef12ae5..f9430da052be 100644
--- a/connector/docker-integration-tests/pom.xml
+++ b/connector/docker-integration-tests/pom.xml
@@ -49,7 +49,8 @@
 
   com.google.guava
   guava
-  19.0
+  33.0.0-jre
+  test
 
 
   org.apache.spark
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 45b51cb0ff5b..24e2c814f99f 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -951,7 +951,7 @@ object Unsafe {
 object DockerIntegrationTests {
   // This serves to override the override specified in DependencyOverrides:
   lazy val settings = Seq(
-dependencyOverrides += "com.google.guava" % "guava" % "19.0"
+dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre"
   )
 }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-46615][CONNECT] Support s.c.immutable.ArraySeq in ArrowDeserializers

2024-02-07 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 729fc8ec95e0 [SPARK-46615][CONNECT] Support s.c.immutable.ArraySeq in 
ArrowDeserializers
729fc8ec95e0 is described below

commit 729fc8ec95e017bd6eead283c0b660b9c57a174d
Author: panbingkun 
AuthorDate: Thu Feb 8 14:57:13 2024 +0800

[SPARK-46615][CONNECT] Support s.c.immutable.ArraySeq in ArrowDeserializers

### What changes were proposed in this pull request?
This PR aims to support s.c.immutable.ArraySeq as a customCollectionCls in
ArrowDeserializers.

### Why are the changes needed?
Because s.c.immutable.ArraySeq is a commonly used type in Scala 2.13, we 
should support it.
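
For illustration, a rough sketch of what this enables on the client side
(assuming a Spark Connect session named `spark`):

```scala
import scala.collection.immutable.ArraySeq

// With ArraySeq supported as a custom collection class in ArrowDeserializers,
// an immutable.ArraySeq value can round-trip through the encoder framework.
import spark.implicits._
val ds = spark.createDataset(Seq(ArraySeq(1, 2, 3)))
assert(ds.collect().head == ArraySeq(1, 2, 3))
```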

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Updated the existing UT (SQLImplicitsTestSuite).

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44618 from panbingkun/SPARK-46615.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 .../scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala| 11 +++
 .../spark/sql/connect/client/arrow/ArrowDeserializer.scala|  9 +
 .../spark/sql/connect/client/arrow/ArrowEncoderUtils.scala|  2 ++
 3 files changed, 22 insertions(+)

diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala
index b2c13850a13a..3e4704b6ab8e 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala
@@ -52,6 +52,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll {
 
   test("test implicit encoder resolution") {
 val spark = session
+import org.apache.spark.util.ArrayImplicits._
 import spark.implicits._
 def testImplicit[T: Encoder](expected: T): Unit = {
   val encoder = encoderFor[T]
@@ -84,6 +85,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll {
 testImplicit(booleans)
 testImplicit(booleans.toSeq)
 testImplicit(booleans.toSeq)(newBooleanSeqEncoder)
+testImplicit(booleans.toImmutableArraySeq)
 
 val bytes = Array(76.toByte, 59.toByte, 121.toByte)
 testImplicit(bytes.head)
@@ -91,6 +93,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll {
 testImplicit(bytes)
 testImplicit(bytes.toSeq)
 testImplicit(bytes.toSeq)(newByteSeqEncoder)
+testImplicit(bytes.toImmutableArraySeq)
 
 val shorts = Array(21.toShort, (-213).toShort, 14876.toShort)
 testImplicit(shorts.head)
@@ -98,6 +101,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll {
 testImplicit(shorts)
 testImplicit(shorts.toSeq)
 testImplicit(shorts.toSeq)(newShortSeqEncoder)
+testImplicit(shorts.toImmutableArraySeq)
 
 val ints = Array(4, 6, 5)
 testImplicit(ints.head)
@@ -105,6 +109,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll {
 testImplicit(ints)
 testImplicit(ints.toSeq)
 testImplicit(ints.toSeq)(newIntSeqEncoder)
+testImplicit(ints.toImmutableArraySeq)
 
 val longs = Array(System.nanoTime(), System.currentTimeMillis())
 testImplicit(longs.head)
@@ -112,6 +117,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll {
 testImplicit(longs)
 testImplicit(longs.toSeq)
 testImplicit(longs.toSeq)(newLongSeqEncoder)
+testImplicit(longs.toImmutableArraySeq)
 
 val floats = Array(3f, 10.9f)
 testImplicit(floats.head)
@@ -119,6 +125,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll {
 testImplicit(floats)
 testImplicit(floats.toSeq)
 testImplicit(floats.toSeq)(newFloatSeqEncoder)
+testImplicit(floats.toImmutableArraySeq)
 
 val doubles = Array(23.78d, -329.6d)
 testImplicit(doubles.head)
@@ -126,22 +133,26 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with 
BeforeAndAfterAll {
 testImplicit(doubles)
 testImplicit(doubles.toSeq)
 testImplicit(doubles.toSeq)(newDoubleSeqEncoder)
+testImplicit(doubles.toImmutableArraySeq)
 
 val strings = Array("foo", "baz", "bar")
 testImplicit(strings.head)
 testImplicit(strings)
 testImplicit(strings.toSeq)
 testImplicit(strings.toSeq)(newStringSeqEncoder)
+testImplicit(strings.toImmutableArraySeq)
 
 val myTypes = Array(MyType(12L, Math.E, Math.PI), MyType(0, 0, 0))
 testImplicit(myTypes.head)
 testImplicit(myTypes)
 testImplicit(myType

(spark) branch branch-3.5 updated: [SPARK-46400][CORE][SQL][3.5] When there are corrupted files in the local maven repo, skip this cache and try again

2024-02-07 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 77f8b38a1091 [SPARK-46400][CORE][SQL][3.5] When there are corrupted 
files in the local maven repo, skip this cache and try again
77f8b38a1091 is described below

commit 77f8b38a1091aa51af32dc790b61ae54ac47a2c2
Author: panbingkun 
AuthorDate: Thu Feb 8 14:41:51 2024 +0800

[SPARK-46400][CORE][SQL][3.5] When there are corrupted files in the local 
maven repo, skip this cache and try again

### What changes were proposed in this pull request?
This PR aims to:
- fix a potential bug (see https://github.com/apache/spark/pull/44208) and
enhance the user experience.
- make the code more compliant with standards.

Backport above to branch 3.5.
Master branch pr: https://github.com/apache/spark/pull/44343

### Why are the changes needed?
We use the local maven repo as the first-level cache in Ivy. The original
intention was to reduce the time required to resolve and obtain the jars, but
when there are corrupted files in the local maven repo, this mechanism is
interrupted outright and the error message is very unfriendly, which greatly
confuses users. In keeping with the original intention, we should skip this
cache in such situations and try again.
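
For illustration, a sketch of a code path that exercises this resolution (the
coordinate is only an example):

```scala
import org.apache.spark.sql.SparkSession

// Resolving spark.jars.packages (or --packages) goes through the Ivy chain described
// above, with the local maven repo (~/.m2) consulted as a first-level cache. With this
// fix, a corrupted jar or pom in that cache no longer aborts resolution; the local-m2
// resolver is skipped and the artifact is resolved again.
val spark = SparkSession.builder()
  .config("spark.jars.packages", "org.apache.kafka:kafka-clients:3.4.0")
  .getOrCreate()
```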

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested manually.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45017 from panbingkun/branch-3.5_SPARK-46400.

Authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 .../org/apache/spark/deploy/SparkSubmit.scala  | 116 +
 .../sql/hive/client/IsolatedClientLoader.scala |   4 +
 2 files changed, 98 insertions(+), 22 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala 
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index af35f451e370..0f0d8b6c07c0 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -41,7 +41,7 @@ import org.apache.ivy.Ivy
 import org.apache.ivy.core.LogOptions
 import org.apache.ivy.core.module.descriptor._
 import org.apache.ivy.core.module.id.{ArtifactId, ModuleId, ModuleRevisionId}
-import org.apache.ivy.core.report.ResolveReport
+import org.apache.ivy.core.report.{DownloadStatus, ResolveReport}
 import org.apache.ivy.core.resolve.ResolveOptions
 import org.apache.ivy.core.retrieve.RetrieveOptions
 import org.apache.ivy.core.settings.IvySettings
@@ -1226,7 +1226,7 @@ private[spark] object SparkSubmitUtils extends Logging {
 s"be whitespace. The artifactId provided is: ${splits(1)}")
   require(splits(2) != null && splits(2).trim.nonEmpty, s"The version 
cannot be null or " +
 s"be whitespace. The version provided is: ${splits(2)}")
-  new MavenCoordinate(splits(0), splits(1), splits(2))
+  MavenCoordinate(splits(0), splits(1), splits(2))
 }
   }
 
@@ -1241,21 +1241,27 @@ private[spark] object SparkSubmitUtils extends Logging {
   }
 
   /**
-   * Extracts maven coordinates from a comma-delimited string
+   * Create a ChainResolver used by Ivy to search for and resolve dependencies.
+   *
* @param defaultIvyUserDir The default user path for Ivy
+   * @param useLocalM2AsCache Whether to use the local maven repo as a cache
* @return A ChainResolver used by Ivy to search for and resolve 
dependencies.
*/
-  def createRepoResolvers(defaultIvyUserDir: File): ChainResolver = {
+  def createRepoResolvers(
+  defaultIvyUserDir: File,
+  useLocalM2AsCache: Boolean = true): ChainResolver = {
 // We need a chain resolver if we want to check multiple repositories
 val cr = new ChainResolver
 cr.setName("spark-list")
 
-val localM2 = new IBiblioResolver
-localM2.setM2compatible(true)
-localM2.setRoot(m2Path.toURI.toString)
-localM2.setUsepoms(true)
-localM2.setName("local-m2-cache")
-cr.add(localM2)
+if (useLocalM2AsCache) {
+  val localM2 = new IBiblioResolver
+  localM2.setM2compatible(true)
+  localM2.setRoot(m2Path.toURI.toString)
+  localM2.setUsepoms(true)
+  localM2.setName("local-m2-cache")
+  cr.add(localM2)
+}
 
 val localIvy = new FileSystemResolver
 val localIvyRoot = new File(defaultIvyUserDir, "local")
@@ -1351,18 +1357,23 @@ private[spark] object SparkSubmitUtils extends Logging {
 
   /**
* Build Ivy Settings using options with default resolvers
+   *
* @param remoteRepos Comma-delimited string of remote repositories other 
than maven central
* @param ivyPath The path to the local ivy r

(spark) branch master updated: [MINOR][PYTHON][SQL][TESTS] Don't load Python Data Source when Python executable is not available even for testing

2024-02-07 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ab7fcacca41d [MINOR][PYTHON][SQL][TESTS] Don't load Python Data Source 
when Python executable is not available even for testing
ab7fcacca41d is described below

commit ab7fcacca41dad0ec2334b5d990bf36522fb5c82
Author: Hyukjin Kwon 
AuthorDate: Thu Feb 8 14:19:43 2024 +0800

[MINOR][PYTHON][SQL][TESTS] Don't load Python Data Source when Python 
executable is not available even for testing

### What changes were proposed in this pull request?

This PR proposes not to load Python Data Sources when the Python executable is
not available, even for testing.

### Why are the changes needed?

Whether we are in a test or not, loading Python Data Sources cannot work
anyway.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45067 from HyukjinKwon/minor-checking.

Authored-by: Hyukjin Kwon 
    Signed-off-by: yangjie01 
---
 .../org/apache/spark/sql/execution/datasources/DataSourceManager.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala
index f63157b91efb..1b396675d909 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala
@@ -98,7 +98,7 @@ object DataSourceManager extends Logging {
   private def normalize(name: String): String = name.toLowerCase(Locale.ROOT)
 
   private def initialStaticDataSourceBuilders: Map[String, 
UserDefinedPythonDataSource] = {
-if (Utils.isTesting || shouldLoadPythonDataSources) this.synchronized {
+if (shouldLoadPythonDataSources) this.synchronized {
   if (dataSourceBuilders.isEmpty) {
 val maybeResult = try {
   Some(UserDefinedPythonDataSource.lookupAllDataSourcesInPython())


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47005][PYTHON][DOCS] Refine docstring of `asc_nulls_first/asc_nulls_last/desc_nulls_first/desc_nulls_last`

2024-02-07 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 45956f72d864 [SPARK-47005][PYTHON][DOCS] Refine docstring of 
`asc_nulls_first/asc_nulls_last/desc_nulls_first/desc_nulls_last`
45956f72d864 is described below

commit 45956f72d864701cd84635e9cac0a29592c08b1c
Author: yangjie01 
AuthorDate: Thu Feb 8 14:09:06 2024 +0800

[SPARK-47005][PYTHON][DOCS] Refine docstring of 
`asc_nulls_first/asc_nulls_last/desc_nulls_first/desc_nulls_last`

### What changes were proposed in this pull request?
This PR refines the docstrings of
`asc_nulls_first/asc_nulls_last/desc_nulls_first/desc_nulls_last` and adds some
new examples.

### Why are the changes needed?
To improve PySpark documentation

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45066 from LuciferYang/sort-funcs.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 python/pyspark/sql/functions/builtin.py | 148 +++-
 1 file changed, 128 insertions(+), 20 deletions(-)

diff --git a/python/pyspark/sql/functions/builtin.py 
b/python/pyspark/sql/functions/builtin.py
index 110006df4317..6320f9b922ee 100644
--- a/python/pyspark/sql/functions/builtin.py
+++ b/python/pyspark/sql/functions/builtin.py
@@ -2889,7 +2889,7 @@ def getbit(col: "ColumnOrName", pos: "ColumnOrName") -> 
Column:
 @_try_remote_functions
 def asc_nulls_first(col: "ColumnOrName") -> Column:
 """
-Returns a sort expression based on the ascending order of the given
+Sort Function: Returns a sort expression based on the ascending order of 
the given
 column name, and null values return before non-null values.
 
 .. versionadded:: 2.4.0
@@ -2909,10 +2909,11 @@ def asc_nulls_first(col: "ColumnOrName") -> Column:
 
 Examples
 
->>> df1 = spark.createDataFrame([(1, "Bob"),
-...  (0, None),
-...  (2, "Alice")], ["age", "name"])
->>> df1.sort(asc_nulls_first(df1.name)).show()
+Example 1: Sorting a DataFrame with null values in ascending order
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([(1, "Bob"), (0, None), (2, "Alice")], 
["age", "name"])
+>>> df.sort(sf.asc_nulls_first(df.name)).show()
 +---+-+
 |age| name|
 +---+-+
@@ -2921,6 +2922,32 @@ def asc_nulls_first(col: "ColumnOrName") -> Column:
 |  1|  Bob|
 +---+-+
 
+Example 2: Sorting a DataFrame with multiple columns, null values in 
ascending order
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame(
+...   [(1, "Bob", None), (0, None, "Z"), (2, "Alice", "Y")], ["age", 
"name", "grade"])
+>>> df.sort(sf.asc_nulls_first(df.name), 
sf.asc_nulls_first(df.grade)).show()
++---+-+-+
+|age| name|grade|
++---+-+-+
+|  0| NULL|Z|
+|  2|Alice|Y|
+|  1|  Bob| NULL|
++---+-+-+
+
+Example 3: Sorting a DataFrame with null values in ascending order using 
column name string
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([(1, "Bob"), (0, None), (2, "Alice")], 
["age", "name"])
+>>> df.sort(sf.asc_nulls_first("name")).show()
++---+-+
+|age| name|
++---+-+
+|  0| NULL|
+|  2|Alice|
+|  1|  Bob|
++---+-+
 """
 return (
 col.asc_nulls_first()
@@ -2932,7 +2959,7 @@ def asc_nulls_first(col: "ColumnOrName") -> Column:
 @_try_remote_functions
 def asc_nulls_last(col: "ColumnOrName") -> Column:
 """
-Returns a sort expression based on the ascending order of the given
+Sort Function: Returns a sort expression based on the ascending order of 
the given
 column name, and null values appear after non-null values.
 
 .. versionadded:: 2.4.0
@@ -2952,10 +2979,11 @@ def asc_nulls_last(col: "ColumnOrName") -> Column:
 
 Examples
 
->>> df1 = spark.createDataFrame([(0, None),
-...  (1, "Bob"),
-...  (2, "Alice")], ["age", "name"])
->>> df1.sort

(spark) branch master updated: [SPARK-46987][CONNECT] `ProtoUtils.abbreviate` avoid unnecessary `setField` operation

2024-02-06 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a95aa7a7dda1 [SPARK-46987][CONNECT] `ProtoUtils.abbreviate` avoid 
unnecessary `setField` operation
a95aa7a7dda1 is described below

commit a95aa7a7dda1a5a2cfee69b3c132c524c0e01c7d
Author: Ruifeng Zheng 
AuthorDate: Wed Feb 7 10:26:34 2024 +0800

[SPARK-46987][CONNECT] `ProtoUtils.abbreviate` avoid unnecessary `setField` 
operation

### What changes were proposed in this pull request?
`ProtoUtils.abbreviate` avoid unnecessary `setField` operation

### Why are the changes needed?
According to the [API reference](https://protobuf.dev/reference/java/api-docs/com/google/protobuf/Message.html#toBuilder--):

> Message.Builder toBuilder()
> Constructs a builder initialized with the current message. Use this to derive a new message from the current one.

The builder we use already has all the fields, so we only need to update the truncated fields.
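
A minimal sketch of the idea, assuming only protobuf-java on the classpath; `truncateStrings` is a hypothetical helper for illustration, not Spark's `ProtoUtils.abbreviate`:

```scala
// A minimal sketch assuming protobuf-java on the classpath; illustrative only.
import scala.jdk.CollectionConverters._
import com.google.protobuf.Message

def truncateStrings(msg: Message, maxLen: Int): Message = {
  // toBuilder starts from a full copy of `msg`, so untouched fields keep their values.
  val builder = msg.toBuilder
  msg.getAllFields.asScala.foreach { case (field, value) =>
    value match {
      case s: String if s.length > maxLen =>
        builder.setField(field, s.take(maxLen)) // only oversized fields are rewritten
      case _ => // no setField needed: the builder already carries the original value
    }
  }
  builder.build()
}
```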

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45045 from zhengruifeng/connect_redaction_nit.

Authored-by: Ruifeng Zheng 
Signed-off-by: yangjie01 
---
 .../scala/org/apache/spark/sql/connect/common/ProtoUtils.scala| 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git 
a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala
 
b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala
index 44de2350b9fd..2f31b63acf87 100644
--- 
a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala
+++ 
b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala
@@ -43,8 +43,6 @@ private[connect] object ProtoUtils {
 val threshold = thresholds.getOrElse(STRING, MAX_STRING_SIZE)
 if (size > threshold) {
   builder.setField(field, createString(string.take(threshold), size))
-} else {
-  builder.setField(field, string)
 }
 
   case (field: FieldDescriptor, byteString: ByteString)
@@ -57,8 +55,6 @@ private[connect] object ProtoUtils {
 byteString
   .substring(0, threshold)
   .concat(createTruncatedByteString(size)))
-} else {
-  builder.setField(field, byteString)
 }
 
   case (field: FieldDescriptor, byteArray: Array[Byte])
@@ -71,8 +67,6 @@ private[connect] object ProtoUtils {
 ByteString
   .copyFrom(byteArray, 0, threshold)
   .concat(createTruncatedByteString(size)))
-} else {
-  builder.setField(field, byteArray)
 }
 
   // TODO(SPARK-43117): should also support 1, repeated msg; 2, map
@@ -80,7 +74,7 @@ private[connect] object ProtoUtils {
   if field.getJavaType == FieldDescriptor.JavaType.MESSAGE && msg != 
null =>
 builder.setField(field, abbreviate(msg, thresholds))
 
-  case (field: FieldDescriptor, value: Any) => builder.setField(field, 
value)
+  case _ =>
 }
 
 builder.build()


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-46895][CORE] Replace Timer with single thread scheduled executor

2024-02-06 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5d5b3a54b7b5 [SPARK-46895][CORE] Replace Timer with single thread 
scheduled executor
5d5b3a54b7b5 is described below

commit 5d5b3a54b7b5fb4308fe40da696ba805c72983fc
Author: beliefer 
AuthorDate: Tue Feb 6 17:23:03 2024 +0800

[SPARK-46895][CORE] Replace Timer with single thread scheduled executor

### What changes were proposed in this pull request?
This PR proposes to replace `Timer` with a single-thread scheduled executor.

### Why are the changes needed?
The javadoc recommends `ScheduledThreadPoolExecutor` instead of `Timer`.
![Screenshot 2024-01-12 at 12.47.57 PM](https://github.com/apache/spark/assets/8486025/4fc5ed61-6bb9-4768-915a-ad919a067d04)

This change is based on the following two points.

**System time sensitivity**

`Timer` scheduling is based on the operating system's absolute time and is sensitive to changes in it: once the system time changes, `Timer` scheduling is no longer precise. `ScheduledThreadPoolExecutor` scheduling is based on relative time and is not affected by changes in the operating system time.

**Whether exceptions are captured**

`Timer` does not capture exceptions thrown by `TimerTask`s, and in addition `Timer` is single-threaded: once a scheduled task throws an exception, the whole thread terminates and the remaining scheduled tasks are no longer executed. `ScheduledThreadPoolExecutor` implements scheduling on top of a thread pool; after one task throws an exception, other tasks can still run normally.
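
A minimal sketch of the replacement pattern, assuming only the JDK (the actual patch goes through Spark's `ThreadUtils.newSingleThreadScheduledExecutor`; the thread name and timings below are illustrative):

```scala
// A minimal JDK-only sketch; this is not Spark's ThreadUtils helper.
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}

object TimerReplacementSketch {
  // One named daemon thread, analogous to `new Timer("...", true)`.
  private def newSingleThreadScheduler(name: String): ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, name)
        t.setDaemon(true)
        t
      }
    })

  def main(args: Array[String]): Unit = {
    val scheduler = newSingleThreadScheduler("barrier-epoch-increment-timer")
    // schedule(runnable, delay, unit) replaces Timer.schedule(task, delayMillis);
    // an exception thrown by the task does not prevent later tasks from running.
    scheduler.schedule(new Runnable {
      override def run(): Unit = println("timeout fired")
    }, 1, TimeUnit.SECONDS)
    Thread.sleep(1500) // let the delayed task fire in this demo
    scheduler.shutdown()
  }
}
```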

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
GA tests.

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #44718 from beliefer/replace-timer-with-threadpool.

Authored-by: beliefer 
Signed-off-by: yangjie01 
---
 .../main/scala/org/apache/spark/BarrierCoordinator.scala | 11 +++
 .../main/scala/org/apache/spark/BarrierTaskContext.scala | 14 ++
 .../org/apache/spark/scheduler/TaskSchedulerImpl.scala   | 15 ---
 .../scala/org/apache/spark/ui/ConsoleProgressBar.scala   | 11 ---
 .../main/scala/org/apache/spark/util/ThreadUtils.scala   | 16 ++--
 .../java/org/apache/spark/launcher/LauncherServer.java   |  8 
 6 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala 
b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
index 9bc7ade2e5ad..942242107e22 100644
--- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
+++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
@@ -17,8 +17,8 @@
 
 package org.apache.spark
 
-import java.util.{Timer, TimerTask}
-import java.util.concurrent.ConcurrentHashMap
+import java.util.TimerTask
+import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
 import java.util.function.Consumer
 
 import scala.collection.mutable.{ArrayBuffer, HashSet}
@@ -26,6 +26,7 @@ import scala.collection.mutable.{ArrayBuffer, HashSet}
 import org.apache.spark.internal.Logging
 import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
 import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, 
SparkListenerStageCompleted}
+import org.apache.spark.util.ThreadUtils
 
 /**
  * For each barrier stage attempt, only at most one barrier() call can be 
active at any time, thus
@@ -51,7 +52,8 @@ private[spark] class BarrierCoordinator(
 
   // TODO SPARK-25030 Create a Timer() in the mainClass submitted to 
SparkSubmit makes it unable to
   // fetch result, we shall fix the issue.
-  private lazy val timer = new Timer("BarrierCoordinator barrier epoch 
increment timer")
+  private lazy val timer = ThreadUtils.newSingleThreadScheduledExecutor(
+"BarrierCoordinator barrier epoch increment timer")
 
   // Listen to StageCompleted event, clear corresponding ContextBarrierState.
   private val listener = new SparkListener {
@@ -77,6 +79,7 @@ private[spark] class BarrierCoordinator(
   states.forEachValue(1, clearStateConsumer)
   states.clear()
   listenerBus.removeListener(listener)
+  ThreadUtils.shutdown(timer)
 } finally {
   super.onStop()
 }
@@ -168,7 +171,7 @@ private[spark] class BarrierCoordinator(
 // we may timeout for the sync.
 if (requesters.isEmpty) {
   initTimerTask(this)
-  timer.schedule(timerTask, timeoutInSecs * 1000)
+  timer.schedule(timerTask, timeoutInSecs, TimeUnit.SECONDS)
 }
 // Add the requester to array of RPCC

(spark) branch master updated (0154c059cddb -> fd476c1c855a)

2024-02-03 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 0154c059cddb [MINOR][DOCS] Remove Java 8/11 at 
`IgnoreUnrecognizedVMOptions` description
 add fd476c1c855a [SPARK-46969][SQL][TESTS] Recover `to_timestamp('366', 
'DD')` test case of `datetime-parsing-invalid.sql`

No new revisions were added by this update.

Summary of changes:
 .../ansi/datetime-parsing-invalid.sql.out|  7 +++
 .../analyzer-results/datetime-parsing-invalid.sql.out|  7 +++
 .../sql-tests/inputs/datetime-parsing-invalid.sql|  3 +--
 .../results/ansi/datetime-parsing-invalid.sql.out| 16 
 .../sql-tests/results/datetime-parsing-invalid.sql.out   |  8 
 5 files changed, 39 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-46918][YARN] Replace self-defined variables with Hadoop ContainerExitStatus

2024-01-30 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e32a8249c6dd [SPARK-46918][YARN] Replace self-defined variables with 
Hadoop ContainerExitStatus
e32a8249c6dd is described below

commit e32a8249c6ddb15e01d2307964f2978f4a10ad56
Author: Cheng Pan 
AuthorDate: Tue Jan 30 20:17:11 2024 +0800

[SPARK-46918][YARN] Replace self-defined variables with Hadoop 
ContainerExitStatus

### What changes were proposed in this pull request?

Replace the Spark self-defined `VMEM_EXCEEDED_EXIT_CODE` and `PMEM_EXCEEDED_EXIT_CODE` with the Hadoop-defined `ContainerExitStatus.KILLED_EXCEEDED_VMEM` and `ContainerExitStatus.KILLED_EXCEEDED_PMEM`, which were introduced in YARN-2091 (since Hadoop 2.5.0).
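
A minimal sketch of the resulting pattern, assuming `hadoop-yarn-api` is on the classpath; the function and messages are illustrative, not the actual `YarnAllocator` diagnostics:

```scala
// A minimal sketch assuming hadoop-yarn-api on the classpath; illustrative only.
import org.apache.hadoop.yarn.api.records.ContainerExitStatus

def describeExit(exitStatus: Int): String = exitStatus match {
  case ContainerExitStatus.KILLED_EXCEEDED_VMEM => // replaces the self-defined -103
    "container killed by YARN for exceeding virtual memory limits"
  case ContainerExitStatus.KILLED_EXCEEDED_PMEM => // replaces the self-defined -104
    "container killed by YARN for exceeding physical memory limits"
  case ContainerExitStatus.PREEMPTED =>
    "container preempted by the scheduler"
  case other =>
    s"container exited with status $other"
}
```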

### Why are the changes needed?

Minor code clean-up

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44950 from pan3793/SPARK-46918.

Authored-by: Cheng Pan 
Signed-off-by: yangjie01 
---
 .../main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala  | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index 736eaa52b81c..7f0469937fef 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -851,9 +851,6 @@ private[yarn] class YarnAllocator(
   onHostStr,
   completedContainer.getState,
   completedContainer.getExitStatus))
-// Hadoop 2.2.X added a ContainerExitStatus we should switch to use
-// there are some exit status' we shouldn't necessarily count against 
us, but for
-// now I think its ok as none of the containers are expected to exit.
 val exitStatus = completedContainer.getExitStatus
 val (exitCausedByApp, containerExitReason) = exitStatus match {
   case _ if shutdown =>
@@ -867,7 +864,7 @@ private[yarn] class YarnAllocator(
 // just as easily finish on any other executor. See SPARK-8167.
 (false, s"Container ${containerId}${onHostStr} was preempted.")
   // Should probably still count memory exceeded exit codes towards 
task failures
-  case VMEM_EXCEEDED_EXIT_CODE =>
+  case ContainerExitStatus.KILLED_EXCEEDED_VMEM =>
 val vmemExceededPattern = raw"$MEM_REGEX of $MEM_REGEX virtual 
memory used".r
 val diag = 
vmemExceededPattern.findFirstIn(completedContainer.getDiagnostics)
   .map(_.concat(".")).getOrElse("")
@@ -876,7 +873,7 @@ private[yarn] class YarnAllocator(
   s"${YarnConfiguration.NM_VMEM_PMEM_RATIO} or disabling " +
   s"${YarnConfiguration.NM_VMEM_CHECK_ENABLED} because of 
YARN-4714."
 (true, message)
-  case PMEM_EXCEEDED_EXIT_CODE =>
+  case ContainerExitStatus.KILLED_EXCEEDED_PMEM =>
 val pmemExceededPattern = raw"$MEM_REGEX of $MEM_REGEX physical 
memory used".r
 val diag = 
pmemExceededPattern.findFirstIn(completedContainer.getDiagnostics)
   .map(_.concat(".")).getOrElse("")
@@ -1025,8 +1022,6 @@ private[yarn] class YarnAllocator(
 
 private object YarnAllocator {
   val MEM_REGEX = "[0-9.]+ [KMG]B"
-  val VMEM_EXCEEDED_EXIT_CODE = -103
-  val PMEM_EXCEEDED_EXIT_CODE = -104
   val DECOMMISSIONING_NODES_CACHE_SIZE = 200
 
   val NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS = Set(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-46898][CONNECT] Simplify the protobuf function transformation in Planner

2024-01-28 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 56633e697571 [SPARK-46898][CONNECT] Simplify the protobuf function 
transformation in Planner
56633e697571 is described below

commit 56633e69757174da8a7dd8f4ea5298fd0a00e656
Author: Ruifeng Zheng 
AuthorDate: Mon Jan 29 13:55:59 2024 +0800

[SPARK-46898][CONNECT] Simplify the protobuf function transformation in 
Planner

### What changes were proposed in this pull request?
Simplify the protobuf function transformation in Planner

### Why are the changes needed?
Make `transformUnregisteredFunction` simpler and reuse an existing helper function.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44925 from zhengruifeng/connect_proto_simple.

Authored-by: Ruifeng Zheng 
Signed-off-by: yangjie01 
---
 .../sql/connect/planner/SparkConnectPlanner.scala  | 80 +++---
 1 file changed, 25 insertions(+), 55 deletions(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 3e59b2644755..977bff690bac 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -1710,53 +1710,6 @@ class SparkConnectPlanner(
*/
   private def transformUnregisteredFunction(
   fun: proto.Expression.UnresolvedFunction): Option[Expression] = {
-def extractArgsOfProtobufFunction(
-functionName: String,
-argumentsCount: Int,
-children: collection.Seq[Expression])
-: (String, Option[Array[Byte]], Map[String, String]) = {
-  val messageClassName = children(1) match {
-case Literal(s, StringType) if s != null => s.toString
-case other =>
-  throw InvalidPlanInput(
-s"MessageClassName in $functionName should be a literal string, 
but got $other")
-  }
-  val (binaryFileDescSetOpt, options) = if (argumentsCount == 2) {
-(None, Map.empty[String, String])
-  } else if (argumentsCount == 3) {
-children(2) match {
-  case Literal(b, BinaryType) if b != null =>
-(Some(b.asInstanceOf[Array[Byte]]), Map.empty[String, String])
-  case UnresolvedFunction(Seq("map"), arguments, _, _, _, _) =>
-(None, ExprUtils.convertToMapData(CreateMap(arguments)))
-  case other =>
-throw InvalidPlanInput(
-  s"The valid type for the 3rd arg in $functionName " +
-s"is binary or map, but got $other")
-}
-  } else if (argumentsCount == 4) {
-val fileDescSetOpt = children(2) match {
-  case Literal(b, BinaryType) if b != null =>
-Some(b.asInstanceOf[Array[Byte]])
-  case other =>
-throw InvalidPlanInput(
-  s"DescFilePath in $functionName should be a literal binary, but 
got $other")
-}
-val map = children(3) match {
-  case UnresolvedFunction(Seq("map"), arguments, _, _, _, _) =>
-ExprUtils.convertToMapData(CreateMap(arguments))
-  case other =>
-throw InvalidPlanInput(
-  s"Options in $functionName should be created by map, but got 
$other")
-}
-(fileDescSetOpt, map)
-  } else {
-throw InvalidPlanInput(
-  s"$functionName requires 2 ~ 4 arguments, but got $argumentsCount 
ones!")
-  }
-  (messageClassName, binaryFileDescSetOpt, options)
-}
-
 fun.getFunctionName match {
   case "product" if fun.getArgumentsCount == 1 =>
 Some(
@@ -1979,17 +1932,13 @@ class SparkConnectPlanner(
   // Protobuf-specific functions
   case "from_protobuf" if Seq(2, 3, 4).contains(fun.getArgumentsCount) =>
 val children = fun.getArgumentsList.asScala.map(transformExpression)
-val (messageClassName, binaryFileDescSetOpt, options) =
-  extractArgsOfProtobufFunction("from_protobuf", 
fun.getArgumentsCount, children)
-Some(
-  ProtobufDataToCatalyst(children.head, messageClassName, 
binaryFileDescSetOpt, options))
+val (msgName, desc, options) = extractProtobufArgs(children.toSeq)
+Some(ProtobufDataToCatalyst(children(0), msgName, desc, options))
 
   case "to_protobuf" if Seq

(spark) branch master updated: [SPARK-46432][BUILD] Upgrade Netty to 4.1.106.Final

2024-01-25 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 44b163d281b9 [SPARK-46432][BUILD] Upgrade Netty to 4.1.106.Final
44b163d281b9 is described below

commit 44b163d281b9773cab9995e690ec3f4751c8be69
Author: panbingkun 
AuthorDate: Fri Jan 26 11:12:11 2024 +0800

[SPARK-46432][BUILD] Upgrade Netty to 4.1.106.Final

### What changes were proposed in this pull request?
This PR aims to upgrade `Netty` from `4.1.100.Final` to `4.1.106.Final`.

### Why are the changes needed?
- To bring the latest bug fixes:
  - Automatically close `Http2StreamChannel` when `Http2FrameStreamException` reaches end of `ChannelPipeline` ([#13651](https://github.com/netty/netty/pull/13651))
  - Symbol not found: `_netty_jni_util_JNI_OnLoad` ([#13695](https://github.com/netty/netty/issues/13728))

- 4.1.106.Final release note: 
https://netty.io/news/2024/01/19/4-1-106-Final.html
- 4.1.105.Final release note: 
https://netty.io/news/2024/01/16/4-1-105-Final.html
- 4.1.104.Final release note: 
https://netty.io/news/2023/12/15/4-1-104-Final.html
- 4.1.103.Final release note: 
https://netty.io/news/2023/12/13/4-1-103-Final.html
- 4.1.101.Final release note: 
https://netty.io/news/2023/11/09/4-1-101-Final.html

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44384 from panbingkun/SPARK-46432.

Lead-authored-by: panbingkun 
Co-authored-by: panbingkun 
Signed-off-by: yangjie01 
---
 common/network-yarn/pom.xml   | 44 ++-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 37 +++--
 pom.xml   |  2 +-
 3 files changed, 43 insertions(+), 40 deletions(-)

diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index c809bdfbbc1d..3f2ae21eeb3b 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -173,27 +173,29 @@
 unpack
 package
 
-
-
-
-
-
-
-
-
-
-
-
-
-
+  
+
+
+
+
+
+
+
+
+
+
+
+
+  
 
 
   run
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 4ee0f5a41191..71f9ac8665b0 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -191,16 +191,16 @@ metrics-jmx/4.2.21//metrics-jmx-4.2.21.jar
 metrics-json/4.2.21//metrics-json-4.2.21.jar
 metrics-jvm/4.2.21//metrics-jvm-4.2.21.jar
 minlog/1.3.0//minlog-1.3.0.jar
-netty-all/4.1.100.Final//netty-all-4.1.100.Final.jar
-netty-buffer/4.1.100.Final//netty-buffer-4.1.100.Final.jar
-netty-codec-http/4.1.100.Final//netty-codec-http-4.1.100.Final.jar
-netty-codec-http2/4.1.100.Final//netty-codec-http2-4.1.100.Final.jar
-netty-codec-socks/4.1.100.Final//netty-codec-socks-4.1.100.Final.jar
-netty-codec/4.1.100.Final//netty-codec-4.1.100.Final.jar
-netty-common/4.1.100.Final//netty-common-4.1.100.Final.jar
-netty-handler-proxy/4.1.100.Final//netty-handler-proxy-4.1.100.Final.jar
-netty-handler/4.1.100.Final//netty-handler-4.1.100.Final.jar
-netty-resolver/4.1.100.Final//netty-resolver-4.1.100.Final.jar
+netty-all/4.1.106.Final//netty-all-4.1.106.Final.jar
+netty-buffer/4.1.106.Final//netty-buffer-4.1.106.Final.jar
+netty-codec-http/4.1.106.Final//netty-codec-http-4.1.106.Final.jar
+netty-codec-http2/4.1.106.Final//netty-codec-http2-4.1.106.Final.jar
+netty-codec-socks/4.1.106.Final//netty-codec-socks-4.1.106.Final.jar
+netty-codec/4.1.106.Final//netty-codec-4.1.106.Final.jar
+netty-common/4.1.106.Final//netty-common-4.1.106.Final.jar
+netty-handler-proxy/4.1.106.Final//netty-handler-proxy-4.1.106.Final.jar
+netty-handler/4.1.106.Final//netty-handler-4.1.106.Final.jar
+netty-resolver/4.1.106.Final//netty-resolver-4.1.106.Final.jar
 
netty-tcnative-boringssl-static/2.0.61.Final//netty-tcnative-boringssl-static-2.0.61.Final.jar
 
netty-tcnative-boringssl-static/2.0.61.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-aarch_64.jar
 
netty-tcnative-boringssl-static/2.0.61.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar
@@ -208,14 +208,15 @@ 
netty-tcnative-boringssl-static/2.0.61.Final/osx

(spark) branch master updated: [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input

2024-01-24 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d3a8b303c5c0 [SPARK-46787][CONNECT] `bloomFilter` function should 
throw `AnalysisException` for invalid input
d3a8b303c5c0 is described below

commit d3a8b303c5c056ec0863d20b33de6f1a5865dfae
Author: Ruifeng Zheng 
AuthorDate: Thu Jan 25 11:11:18 2024 +0800

[SPARK-46787][CONNECT] `bloomFilter` function should throw 
`AnalysisException` for invalid input

### What changes were proposed in this pull request?
`bloomFilter` function should throw `AnalysisException` for invalid input

### Why are the changes needed?

1. `BloomFilterAggregate` itself validates the input and throws meaningful errors, so we should not handle such invalid input and throw `InvalidPlanInput` in the Planner.
2. To be consistent with the vanilla Scala API and other functions.

### Does this PR introduce _any_ user-facing change?
yes, `InvalidPlanInput` -> `AnalysisException`
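
A minimal sketch of the new user-facing behavior, assuming an active `SparkSession` named `spark`; the column name and values are illustrative:

```scala
// A minimal sketch assuming an active SparkSession named `spark`; illustrative only.
import org.apache.spark.sql.AnalysisException

val df = spark.range(1000).toDF("id")

// Valid input still returns a BloomFilter as before.
val bf = df.stat.bloomFilter("id", expectedNumItems = 1000L, fpp = 0.03)
println(bf.mightContain(1L))

// Invalid input (fpp outside (0.0, 1.0)) now surfaces as an AnalysisException
// raised by BloomFilterAggregate, instead of InvalidPlanInput from the Planner.
try {
  df.stat.bloomFilter("id", expectedNumItems = 1000L, fpp = 1.5)
} catch {
  case e: AnalysisException => println(s"rejected: ${e.getMessage}")
}
```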

### How was this patch tested?
updated CI

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44821 from zhengruifeng/connect_bloom_filter_agg_error.

Authored-by: Ruifeng Zheng 
Signed-off-by: yangjie01 
---
 .../apache/spark/sql/DataFrameStatFunctions.scala  | 28 --
 .../spark/sql/ClientDataFrameStatSuite.scala   | 20 
 .../sql/connect/planner/SparkConnectPlanner.scala  | 25 +--
 3 files changed, 16 insertions(+), 57 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
index 4daa9fa88e66..4eef26da706f 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
@@ -22,7 +22,6 @@ import java.io.ByteArrayInputStream
 
 import scala.jdk.CollectionConverters._
 
-import org.apache.spark.SparkException
 import org.apache.spark.connect.proto.{Relation, StatSampleBy}
 import org.apache.spark.sql.DataFrameStatFunctions.approxQuantileResultEncoder
 import org.apache.spark.sql.catalyst.encoders.AgnosticEncoders.{ArrayEncoder, 
BinaryEncoder, PrimitiveDoubleEncoder}
@@ -599,7 +598,7 @@ final class DataFrameStatFunctions private[sql] 
(sparkSession: SparkSession, roo
* @since 3.5.0
*/
   def bloomFilter(colName: String, expectedNumItems: Long, fpp: Double): 
BloomFilter = {
-buildBloomFilter(Column(colName), expectedNumItems, -1L, fpp)
+bloomFilter(Column(colName), expectedNumItems, fpp)
   }
 
   /**
@@ -614,7 +613,8 @@ final class DataFrameStatFunctions private[sql] 
(sparkSession: SparkSession, roo
* @since 3.5.0
*/
   def bloomFilter(col: Column, expectedNumItems: Long, fpp: Double): 
BloomFilter = {
-buildBloomFilter(col, expectedNumItems, -1L, fpp)
+val numBits = BloomFilter.optimalNumOfBits(expectedNumItems, fpp)
+bloomFilter(col, expectedNumItems, numBits)
   }
 
   /**
@@ -629,7 +629,7 @@ final class DataFrameStatFunctions private[sql] 
(sparkSession: SparkSession, roo
* @since 3.5.0
*/
   def bloomFilter(colName: String, expectedNumItems: Long, numBits: Long): 
BloomFilter = {
-buildBloomFilter(Column(colName), expectedNumItems, numBits, Double.NaN)
+bloomFilter(Column(colName), expectedNumItems, numBits)
   }
 
   /**
@@ -644,25 +644,7 @@ final class DataFrameStatFunctions private[sql] 
(sparkSession: SparkSession, roo
* @since 3.5.0
*/
   def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): 
BloomFilter = {
-buildBloomFilter(col, expectedNumItems, numBits, Double.NaN)
-  }
-
-  private def buildBloomFilter(
-  col: Column,
-  expectedNumItems: Long,
-  numBits: Long,
-  fpp: Double): BloomFilter = {
-def numBitsValue: Long = if (!fpp.isNaN) {
-  BloomFilter.optimalNumOfBits(expectedNumItems, fpp)
-} else {
-  numBits
-}
-
-if (fpp <= 0d || fpp >= 1d) {
-  throw new SparkException("False positive probability must be within 
range (0.0, 1.0)")
-}
-val agg = Column.fn("bloom_filter_agg", col, lit(expectedNumItems), 
lit(numBitsValue))
-
+val agg = Column.fn("bloom_filter_agg", col, lit(expectedNumItems), 
lit(numBits))
 val ds = sparkSession.newDataset(BinaryEncoder) { builder =>
   builder.getProjectBuilder
 .setInput(root)
diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientDataFrameStatSuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientDataFrameStatSuite.scala
index d0a89f67

(spark) branch master updated: [SPARK-46826][INFRA] Reset `grpcio` installation version of `Python linter dependencies for branch-3.4/branch-3.5`

2024-01-24 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 74b6301f152d [SPARK-46826][INFRA] Reset `grpcio` installation version 
of `Python linter dependencies for branch-3.4/branch-3.5`
74b6301f152d is described below

commit 74b6301f152d438246756d665a3aa69e401e6273
Author: yangjie01 
AuthorDate: Wed Jan 24 19:06:31 2024 +0800

[SPARK-46826][INFRA] Reset `grpcio` installation version of `Python linter 
dependencies for branch-3.4/branch-3.5`

### What changes were proposed in this pull request?
https://github.com/apache/spark/pull/43942 upgraded the `grpcio` version and simultaneously upgraded the `grpcio` version installed by the `Install Python linter dependencies for branch-3.4` and `Install Python linter dependencies for branch-3.5` steps in `build_and_test.yml`. These two steps are used to install Python linter dependencies for `branch-3.4/branch-3.5` in daily tests, so for safety they should use the same configuration as `branch-3.4/branch-3.5`. So this PR resets the version of grpc [...]

- branch-3.4


https://github.com/apache/spark/blob/e56bd97c04c184104046e51e6759e616c86683fa/.github/workflows/build_and_test.yml#L588-L595

- branch-3.5


https://github.com/apache/spark/blob/0956db6901bf03d2d948b23f00bcd6e74a0c251b/.github/workflows/build_and_test.yml#L637-L644

### Why are the changes needed?
The versions of the dependencies installed in `Install Python linter 
dependencies for branch-3.4` and `Install Python linter dependencies for 
branch-3.5` should be consistent with `branch-3.4/branch-3.5`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Monitor GA after merged

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44866 from LuciferYang/SPARK-46826.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .github/workflows/build_and_test.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 4038f63fb0dc..1d98727a4231 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -687,14 +687,14 @@ jobs:
 # SPARK-44554: Copy from 
https://github.com/apache/spark/blob/a05c27e85829fe742c1828507a1fd180cdc84b54/.github/workflows/build_and_test.yml#L571-L578
 # Should delete this section after SPARK 3.4 EOL.
 python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 
'mypy==0.920' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 
'jinja2<3.0.0' 'black==22.6.0'
-python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 
'grpcio==1.59.3' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0'
+python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 
'grpcio==1.48.1' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0'
 - name: Install Python linter dependencies for branch-3.5
   if: inputs.branch == 'branch-3.5'
   run: |
 # SPARK-45212: Copy from 
https://github.com/apache/spark/blob/555c8def51e5951c7bf5165a332795e9e330ec9d/.github/workflows/build_and_test.yml#L631-L638
 # Should delete this section after SPARK 3.5 EOL.
 python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 
'mypy==0.982' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 
'jinja2<3.0.0' 'black==22.6.0'
-python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 
'grpcio==1.59.3' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0'
+python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 
'grpcio==1.56.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0'
 - name: Install Python dependencies for python linter and documentation 
generation
   if: inputs.branch != 'branch-3.4' && inputs.branch != 'branch-3.5'
   run: |


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-45593][BUILD][FOLLOWUP] Correct relocation connect guava dependency

2024-01-24 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ea0436752fe5 [SPARK-45593][BUILD][FOLLOWUP] Correct relocation connect 
guava dependency
ea0436752fe5 is described below

commit ea0436752fe5b2a1ca58fad3877f48905b3c2d8a
Author: yikaifei 
AuthorDate: Wed Jan 24 19:03:00 2024 +0800

[SPARK-45593][BUILD][FOLLOWUP] Correct relocation connect guava dependency

### What changes were proposed in this pull request?

This PR aims to correct the relocation of the connect guava dependency and to remove the duplicate connect-common from the SBT-built jars.

**Item 1:** In https://github.com/apache/spark/pull/43436, we fixed the connect module's dependency on guava, but the guava dependency was relocated incorrectly.
- connect server and connect client jvm do not relocate the guava dependency, which risks conflicts;
- the connect common relocation does not take effect because it defines relocation rules that conflict with the parent pom (now we remove the guava dependency from connect-common, as it never uses this library);

**Item 2:** Remove the duplicate connect-common from the SBT-built jars, as it is already shaded into the spark connect jar. In fact, before this PR, connect-common and connect-server were treated the same in the jars built with SBT, because both hit the `jar.getName.contains("spark-connect")` condition.

### Why are the changes needed?

Bugfix

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

GA

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44801 from Yikf/SPARK-45593-SBT.

Authored-by: yikaifei 
Signed-off-by: yangjie01 
---
 connector/connect/client/jvm/pom.xml | 22 +-
 connector/connect/common/pom.xml | 25 -
 connector/connect/server/pom.xml | 26 ++
 project/SparkBuild.scala |  6 +-
 4 files changed, 52 insertions(+), 27 deletions(-)

diff --git a/connector/connect/client/jvm/pom.xml 
b/connector/connect/client/jvm/pom.xml
index 9bedebf523a7..81ffb140226e 100644
--- a/connector/connect/client/jvm/pom.xml
+++ b/connector/connect/client/jvm/pom.xml
@@ -59,6 +59,18 @@
   protobuf-java
   compile
 
+
+  com.google.guava
+  guava
+  ${connect.guava.version}
+  compile
+
+
+  com.google.guava
+  failureaccess
+  ${guava.failureaccess.version}
+  compile
+
 
   com.lihaoyi
   ammonite_${scala.version}
@@ -105,6 +117,7 @@
   true
   
 
+  com.google.guava:*
   com.google.android:*
   com.google.api.grpc:*
   com.google.code.findbugs:*
@@ -124,6 +137,13 @@
 
   
   
+
+  com.google.common
+  
${spark.shade.packageName}.connect.guava
+  
+com.google.common.**
+  
+
 
   io.grpc
   ${spark.shade.packageName}.io.grpc
@@ -135,7 +155,7 @@
   com.google
   
${spark.shade.packageName}.com.google
   
-
+
 com.google.common.**
   
 
diff --git a/connector/connect/common/pom.xml b/connector/connect/common/pom.xml
index 336d83e04c15..b0f015246f4c 100644
--- a/connector/connect/common/pom.xml
+++ b/connector/connect/common/pom.xml
@@ -47,23 +47,6 @@
 com.google.protobuf
 protobuf-java
 
-
-
-com.google.guava
-guava
-${connect.guava.version}
-compile
-
-
-com.google.guava
-failureaccess
-${guava.failureaccess.version}
-compile
-
 
 io.grpc
 grpc-netty
@@ -158,17 +141,9 @@
 
 
 org.spark-project.spark:unused
-com.google.guava:guava
-com.google.guava:failureaccess
 
org.apache.tomcat:annotations-api
 
 
-
-
-com.google.common
-
${spark.shade.packageName}.connect.guava
-
-
 
 
 
diff --git a/connector/connect/server/pom.xml b/connector/connect/server/pom.xml
index 82127f736ccb..bdea8a627000 100644
--- a/connector/connect/serve

(spark) branch master updated: [SPARK-44495][INFRA][K8S] Use the latest minikube in K8s IT

2024-01-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d114e262ba29 [SPARK-44495][INFRA][K8S] Use the latest minikube in K8s 
IT
d114e262ba29 is described below

commit d114e262ba295995bb6a85035c1717cd353a526a
Author: Dongjoon Hyun 
AuthorDate: Sun Jan 21 08:53:16 2024 +0800

[SPARK-44495][INFRA][K8S] Use the latest minikube in K8s IT

### What changes were proposed in this pull request?

This PR aims to restore the GitHub Actions K8s IT to the latest Minikube and to make sure that the Apache Spark K8s module is tested against all Minikube versions without any issues.

**BEFORE**
- Minikube: v1.30.1
- K8s: v1.26.3

**AFTER**
- Minikube: v1.32.0
- K8s: v1.28.3

### Why are the changes needed?

- Previously, it was pinned due to a failure.
- After this PR, we will always track the latest Minikube and K8s versions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44813 from dongjoon-hyun/SPARK-44495.

Authored-by: Dongjoon Hyun 
Signed-off-by: yangjie01 
---
 .github/workflows/build_and_test.yml  | 8 +++-
 .../deploy/k8s/integrationtest/KubernetesTestComponents.scala | 2 ++
 .../apache/spark/deploy/k8s/integrationtest/PVTestsSuite.scala| 3 ++-
 .../spark/deploy/k8s/integrationtest/VolcanoTestsSuite.scala  | 4 ++--
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 99bb2b12e083..69636629ca9d 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -1063,9 +1063,7 @@ jobs:
   - name: start minikube
 run: |
   # See more in "Installation" https://minikube.sigs.k8s.io/docs/start/
-  # curl -LO 
https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
-  # TODO(SPARK-44495): Resume to use the latest minikube for 
k8s-integration-tests.
-  curl -LO 
https://storage.googleapis.com/minikube/releases/v1.30.1/minikube-linux-amd64
+  curl -LO 
https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
   sudo install minikube-linux-amd64 /usr/local/bin/minikube
   rm minikube-linux-amd64
   # Github Action limit cpu:2, memory: 6947MB, limit to 2U6G for 
better resource statistic
@@ -1074,7 +1072,7 @@ jobs:
 run: |
   kubectl get pods -A
   kubectl describe node
-  - name: Run Spark on K8S integration test (With driver cpu 0.5, executor 
cpu 0.2 limited)
+  - name: Run Spark on K8S integration test
 run: |
   # Prepare PV test
   PVC_TMP_DIR=$(mktemp -d)
@@ -1084,7 +1082,7 @@ jobs:
   kubectl create clusterrolebinding serviceaccounts-cluster-admin 
--clusterrole=cluster-admin --group=system:serviceaccounts || true
   kubectl apply -f 
https://raw.githubusercontent.com/volcano-sh/volcano/v1.8.2/installer/volcano-development.yaml
 || true
   eval $(minikube docker-env)
-  build/sbt -Phadoop-3 -Psparkr -Pkubernetes -Pvolcano 
-Pkubernetes-integration-tests -Dspark.kubernetes.test.driverRequestCores=0.5 
-Dspark.kubernetes.test.executorRequestCores=0.2 
-Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 -Dtest.exclude.tags=local 
"kubernetes-integration-tests/test"
+  build/sbt -Phadoop-3 -Psparkr -Pkubernetes -Pvolcano 
-Pkubernetes-integration-tests 
-Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 -Dtest.exclude.tags=local 
"kubernetes-integration-tests/test"
   - name: Upload Spark on K8S integration tests log files
 if: ${{ !success() }}
 uses: actions/upload-artifact@v4
diff --git 
a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala
 
b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala
index 3762c31538dc..9581a78619dd 100644
--- 
a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala
+++ 
b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala
@@ -75,6 +75,8 @@ private[spark] class KubernetesTestComponents(val 
kubernetesClient: KubernetesCl
   .set(UI_ENABLED.key, "true")
   .set("spark.kubernetes.submis

(spark) branch master updated: [SPARK-46767][PYTHON][DOCS] Refine docstring of `abs/acos/acosh`

2024-01-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dceb8bdc72ef [SPARK-46767][PYTHON][DOCS] Refine docstring of 
`abs/acos/acosh`
dceb8bdc72ef is described below

commit dceb8bdc72ef24ffa1eb5c1820e6350207f042f5
Author: yangjie01 
AuthorDate: Sat Jan 20 17:39:01 2024 +0800

[SPARK-46767][PYTHON][DOCS] Refine docstring of `abs/acos/acosh`

### What changes were proposed in this pull request?
This pr refine docstring of  `abs/acos/acosh` and add some new examples.

### Why are the changes needed?
To improve PySpark documentation

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44794 from LuciferYang/math-functions-1.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 python/pyspark/sql/functions/builtin.py | 169 ++--
 1 file changed, 137 insertions(+), 32 deletions(-)

diff --git a/python/pyspark/sql/functions/builtin.py 
b/python/pyspark/sql/functions/builtin.py
index 1f6d86de28dc..62400accba10 100644
--- a/python/pyspark/sql/functions/builtin.py
+++ b/python/pyspark/sql/functions/builtin.py
@@ -734,7 +734,7 @@ def try_sum(col: "ColumnOrName") -> Column:
 @_try_remote_functions
 def abs(col: "ColumnOrName") -> Column:
 """
-Computes the absolute value.
+Mathematical Function: Computes the absolute value of the given column or 
expression.
 
 .. versionadded:: 1.3.0
 
@@ -744,22 +744,66 @@ def abs(col: "ColumnOrName") -> Column:
 Parameters
 --
 col : :class:`~pyspark.sql.Column` or str
-target column to compute on.
+The target column or expression to compute the absolute value on.
 
 Returns
 ---
 :class:`~pyspark.sql.Column`
-column for computed results.
+A new column object representing the absolute value of the input.
 
 Examples
 
->>> df = spark.range(1)
->>> df.select(abs(lit(-1))).show()
-+---+
-|abs(-1)|
-+---+
-|  1|
-+---+
+Example 1: Compute the absolute value of a negative number
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([(1, -1), (2, -2), (3, -3)], ["id", 
"value"])
+>>> df.select(sf.abs(df.value)).show()
++--+
+|abs(value)|
++--+
+| 1|
+| 2|
+| 3|
++--+
+
+Example 2: Compute the absolute value of an expression
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([(1, 1), (2, -2), (3, 3)], ["id", "value"])
+>>> df.select(sf.abs(df.id - df.value)).show()
++-+
+|abs((id - value))|
++-+
+|0|
+|4|
+|0|
++-+
+
+Example 3: Compute the absolute value of a column with null values
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([(1, None), (2, -2), (3, None)], ["id", 
"value"])
+>>> df.select(sf.abs(df.value)).show()
++--+
+|abs(value)|
++--+
+|  NULL|
+| 2|
+|  NULL|
++--+
+
+Example 4: Compute the absolute value of a column with double values
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([(1, -1.5), (2, -2.5), (3, -3.5)], ["id", 
"value"])
+>>> df.select(sf.abs(df.value)).show()
++--+
+|abs(value)|
++--+
+|   1.5|
+|   2.5|
+|   3.5|
++--+
 """
 return _invoke_function_over_columns("abs", col)
 
@@ -1478,7 +1522,8 @@ def product(col: "ColumnOrName") -> Column:
 @_try_remote_functions
 def acos(col: "ColumnOrName") -> Column:
 """
-Computes inverse cosine of the input column.
+Mathematical Function: Computes the inverse cosine (also known as 
arccosine)
+of the given column or expression.
 
 .. versionadded:: 1.4.0
 
@@ -1488,23 +1533,54 @@ def acos(col: "ColumnOrName") -> Column:
 Parameters
 --
 col : :class:`~pyspark.sql.Column` or str
-target column to compute on.
+The target column or expression to compute the inverse cosine on.
 
 Returns
 ---
 :class:`~pyspark.sql

(spark) branch master updated: [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error

2024-01-17 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 44d2c86e71fc [SPARK-45593][BUILD] Building a runnable distribution 
from master code running spark-sql raise error
44d2c86e71fc is described below

commit 44d2c86e71fca7044e6d5d9e9222eecff17c360c
Author: yikaifei 
AuthorDate: Thu Jan 18 11:32:01 2024 +0800

[SPARK-45593][BUILD] Building a runnable distribution from master code 
running spark-sql raise error

### What changes were proposed in this pull request?

Fix a build issue: when building a runnable distribution from master code, running spark-sql raises an error:
```
Caused by: java.lang.ClassNotFoundException: 
org.sparkproject.guava.util.concurrent.internal.InternalFutureFailureAccess
at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
... 58 more
```
The problem is due to a guava dependency in the spark-connect-common POM that **conflicts** with the shade plugin of the parent pom.

- spark-connect-common contains guava at version `connect.guava.version`, and it is relocated as `${spark.shade.packageName}.guava`, not `${spark.shade.packageName}.connect.guava`;
- spark-network-common also contains guava-related classes, and they are likewise relocated to `${spark.shade.packageName}.guava`, but at guava version `${guava.version}`;
- As a result, different versions of `org.sparkproject.guava.xx` classes end up on the classpath.

In addition, after investigation, it seems that the spark-connect-common module does not actually use guava, so we can remove the guava dependency from spark-connect-common.
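
A minimal sketch of how the conflict can be probed at runtime, assuming a distribution's jars are on the classpath; the class names come from the stack trace above, and this is a diagnostic only, not part of the fix:

```scala
// A minimal diagnostic sketch assuming the distribution's jars are on the classpath.
def probe(className: String): Unit =
  try {
    val cls = Class.forName(className)
    // The jar a relocated class is loaded from shows which relocation "won".
    val location = cls.getProtectionDomain.getCodeSource.getLocation
    println(s"$className loaded from $location")
  } catch {
    case _: ClassNotFoundException => println(s"$className NOT found")
  }

probe("org.sparkproject.guava.cache.LocalCache")
probe("org.sparkproject.guava.util.concurrent.internal.InternalFutureFailureAccess")
```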

### Why are the changes needed?

A runnable distribution built from master code currently fails to run spark-sql.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

I manually ran the build command to produce a runnable distribution package for the tests;

Build command:
```
./dev/make-distribution.sh --name ui --pip --tgz  -Phive 
-Phive-thriftserver -Pyarn -Pconnect
```

Test result:
https://github.com/apache/spark/assets/51110188/aefbc433-ea5c-4287-8ebd-367806043ac8

I also checked the `org.sparkproject.guava.cache.LocalCache` from jars dir;
Before:
```
➜  jars grep -lr 'org.sparkproject.guava.cache.LocalCache' ./
.//spark-connect_2.13-4.0.0-SNAPSHOT.jar
.//spark-network-common_2.13-4.0.0-SNAPSHOT.jar
.//spark-connect-common_2.13-4.0.0-SNAPSHOT.jar
```

Now:
```
➜  jars grep -lr 'org.sparkproject.guava.cache.LocalCache' ./
.//spark-network-common_2.13-4.0.0-SNAPSHOT.jar
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43436 from Yikf/SPARK-45593.

Authored-by: yikaifei 
Signed-off-by: yangjie01 
---
 assembly/pom.xml |  6 ++
 connector/connect/client/jvm/pom.xml |  8 +---
 connector/connect/common/pom.xml | 34 ++
 connector/connect/server/pom.xml | 25 -
 4 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 77ff87c17f52..cd8c3fca9d23 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -149,6 +149,12 @@
   org.apache.spark
   spark-connect_${scala.binary.version}
   ${project.version}
+  
+
+  org.apache.spark
+  
spark-connect-common_${scala.binary.version}
+
+  
 
 
   org.apache.spark
diff --git a/connector/connect/client/jvm/pom.xml 
b/connector/connect/client/jvm/pom.xml
index 8057a33df178..9bedebf523a7 100644
--- a/connector/connect/client/jvm/pom.xml
+++ b/connector/connect/client/jvm/pom.xml
@@ -51,15 +51,9 @@
   ${project.version}
 
 
-
-  com.google.guava
-  guava
-  ${connect.guava.version}
-  compile
-
 
   com.google.protobuf
   protobuf-java
diff --git a/connector/connect/common/pom.xml b/connector/connect/common/pom.xml
index a374646f8f29..336d83e04c15 100644
--- a/connector/connect/common/pom.xml
+++ b/connector/connect/common/pom.xml
@@ -47,6 +47,11 @@
 com.google.protobuf
 protobuf-java
 
+
 
 com.google.guava
 guava
@@ -145,6 +150,35 @@
 
 
 
+
+org.apache.maven.plugins
