(spark) branch master updated (6b879c2ca1f0 -> dc82285610e6)
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

  from 6b879c2ca1f0 [SPARK-49455][SQL][TESTS] Refactor `StagingInMemoryTableCatalog` to override the non-deprecated functions
   add dc82285610e6 [SPARK-49483][BUILD] Upgrade `commons-lang3` to 3.17.0

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml                               | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (fb8d01acf166 -> 6b879c2ca1f0)
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

  from fb8d01acf166 [SPARK-48682][SQL][FOLLOW-UP] Changed initCap behaviour with UTF8_BINARY collation
   add 6b879c2ca1f0 [SPARK-49455][SQL][TESTS] Refactor `StagingInMemoryTableCatalog` to override the non-deprecated functions

No new revisions were added by this update.

Summary of changes:
 .../connector/catalog/StagingInMemoryTableCatalog.scala | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)
(spark) branch master updated: [SPARK-49460][SQL] Remove `cleanupResource()` from EmptyRelationExec
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9cec3c4f7c1b [SPARK-49460][SQL] Remove `cleanupResource()` from EmptyRelationExec
9cec3c4f7c1b is described below

commit 9cec3c4f7c1b467023f0eefff69e8b7c5105417d
Author: Ziqi Liu
AuthorDate: Sat Aug 31 10:05:18 2024 +0800

    [SPARK-49460][SQL] Remove `cleanupResource()` from EmptyRelationExec

    ### What changes were proposed in this pull request?

    Remove `cleanupResource()` from `EmptyRelationExec`.

    ### Why are the changes needed?

    This bug was introduced in https://github.com/apache/spark/pull/46830: `cleanupResources` might be executed on the executor, where `logical` is null. After revisiting the relevant `cleanupResources` code paths, I think `EmptyRelationExec` doesn't need to do anything here:

    - For driver-side cleanup, we have [this code path](https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala) to clean up each AQE query stage.
    - For executor-side cleanup, so far only `SortMergeJoinExec` invokes `cleanupResources` during its execution, so by the time `EmptyRelationExec` is created, the necessary cleanup is guaranteed to have been done.
    - After all, `EmptyRelationExec` is only a never-executed wrapper for materialized physical query stages; it should not be responsible for any cleanup invocation.

    So I'm removing the `cleanupResources` implementation from `EmptyRelationExec`.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    New unit test.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #47931 from liuzqt/SPARK-49460.
Authored-by: Ziqi Liu
Signed-off-by: yangjie01
---
 .../spark/sql/execution/EmptyRelationExec.scala | 10 --
 .../adaptive/AdaptiveQueryExecSuite.scala       | 37 ++
 2 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/EmptyRelationExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/EmptyRelationExec.scala
index 085c0b22524c..8a544de7567e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/EmptyRelationExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/EmptyRelationExec.scala
@@ -22,7 +22,6 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
-import org.apache.spark.sql.execution.adaptive.LogicalQueryStage
 import org.apache.spark.sql.vectorized.ColumnarBatch

 /**
@@ -81,13 +80,4 @@ case class EmptyRelationExec(@transient logical: LogicalPlan) extends LeafExecNo
   override def doCanonicalize(): SparkPlan = {
     this.copy(logical = LocalRelation(logical.output).canonicalized)
   }
-
-  override protected[sql] def cleanupResources(): Unit = {
-    logical.foreach {
-      case LogicalQueryStage(_, physical) =>
-        physical.cleanupResources()
-      case _ =>
-    }
-    super.cleanupResources()
-  }
 }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
index fc54e7ecd46d..938a96a86b01 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
@@ -1608,6 +1608,43 @@ class AdaptiveQueryExecSuite
     }
   }

+  test("SPARK-49460: NPE in EmptyRelationExec.cleanupResources") {
+    withTable("t1left", "t1right", "t1empty") {
+      spark.sql("create table t1left (a int, b int);")
+      spark.sql("insert into t1left values (1, 1), (2,2), (3,3);")
+      spark.sql("create table t1right (a int, b int);")
+      spark.sql("create table t1empty (a int, b int);")
+      spark.sql("insert into t1right values (2,20), (4, 40);")
+
+      spark.sql("""
+        |with leftT as (
+        |  with erp as (
+        |    select
+        |      *
+        |    from
+        |      t1left
+        |      join t1empty on t1left.a = t1empty.a
+        |      join t1right on t1left.a = t1right.a
+        |  )
+        |  SELECT
+        |    CASE
+        |
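The root cause described above, a `@transient` field that is null once the plan reaches an executor, can be reproduced with plain Java serialization. This is an illustrative sketch (the class and field names are hypothetical stand-ins, not Spark's actual types), showing why executor-side code must not dereference driver-only state:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientOnExecutor implements Serializable {
    // Hypothetical stand-in for EmptyRelationExec's `@transient logical` field:
    // populated on the driver, dropped during serialization to executors.
    transient String logical = "LocalRelation";

    // Simulate shipping the object to an executor via a serialize/deserialize round trip.
    static TransientOnExecutor shipToExecutor(TransientOnExecutor plan) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(plan);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientOnExecutor) in.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        TransientOnExecutor onDriver = new TransientOnExecutor();
        TransientOnExecutor onExecutor = shipToExecutor(onDriver);
        System.out.println(onDriver.logical);   // "LocalRelation"
        System.out.println(onExecutor.logical); // null: transient fields are not serialized
    }
}
```

Any method on the deserialized copy that dereferences the transient field (as the removed `cleanupResources` override did with `logical.foreach`) would throw a `NullPointerException`, which is exactly the failure mode the patch eliminates.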
(spark) branch master updated: [SPARK-49119][SQL] Fix the inconsistency of syntax `show columns` between v1 and v2
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 53c1f31dc26b [SPARK-49119][SQL] Fix the inconsistency of syntax `show columns` between v1 and v2
53c1f31dc26b is described below

commit 53c1f31dc26bb56d56e0b71b144910df5d376a76
Author: panbingkun
AuthorDate: Fri Aug 30 16:15:01 2024 +0800

    [SPARK-49119][SQL] Fix the inconsistency of syntax `show columns` between v1 and v2

    ### What changes were proposed in this pull request?

    The pr aims to:
    - fix the inconsistency of the syntax `show columns` between v1 and v2.
    - assign the name `SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE` to the error condition `_LEGACY_ERROR_TEMP_1057`.
    - unify the v1 and v2 `SHOW COLUMNS ...` tests.
    - move some UTs related to `SHOW COLUMNS` from `DDLSuite` to `command/ShowColumnsSuiteBase` or `v1/ShowColumnsSuiteBase`.
    - move some UTs related to `SHOW COLUMNS` from `DDLParserSuite` and `ErrorParserSuite` to `ShowColumnsParserSuite`.

    ### Why are the changes needed?

    In `AstBuilder`, we have a comment that explains as follows:
    https://github.com/apache/spark/blob/2a752105091ef95f994526b15bae2159657c8ed0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L5054-L5055
    However, our v2 implementation of the `show columns` syntax did not perform the above check, as shown below:
    ```
    withNamespaceAndTable("ns", "tbl") { t =>
      sql(s"CREATE TABLE $t (col1 int, col2 string) $defaultUsing")
      sql(s"SHOW COLUMNS IN $t IN ns1")
    }
    ```
    - Before (inconsistent: v1 fails, but v2 succeeds)
      v1:
      ```
      [SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `ns1` != `ns`.
      ```
      v2:
      ```
      Executes successfully.
      ```
      So we should fix it.
    - After (consistent: both v1 and v2 fail)
      v1:
      ```
      [SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `ns1` != `ns`.
      ```
      v2:
      ```
      [SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `ns1` != `ns`.
      ```

    ### Does this PR introduce _any_ user-facing change?

    Yes. For v2 tables, in the syntax `SHOW COLUMNS {FROM | IN} {tableName} {FROM | IN} {namespace}`, if the namespace (the second parameter) differs from the namespace of the table (the first parameter), the command would succeed silently before this PR; after this PR, it reports an error.

    ### How was this patch tested?

    Add new UT.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #47628 from panbingkun/SPARK-49119.

    Lead-authored-by: panbingkun
    Co-authored-by: Kent Yao
    Signed-off-by: yangjie01
---
 .../src/main/resources/error/error-conditions.json |  11 +--
 .../spark/sql/errors/QueryCompilationErrors.scala  |  11 +--
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  23 -
 .../sql/catalyst/parser/ErrorParserSuite.scala     |   4 -
 .../catalyst/analysis/ResolveSessionCatalog.scala  |   3 +-
 .../datasources/v2/DataSourceV2Strategy.scala      |  13 ++-
 ...olumnsTableExec.scala => ShowColumnsExec.scala} |   4 +-
 .../analyzer-results/show_columns.sql.out          |   7 +-
 .../sql-tests/results/show_columns.sql.out         |   7 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |  10 ---
 .../spark/sql/execution/command/DDLSuite.scala     |  33 ---
 .../execution/command/ShowColumnsParserSuite.scala |  55 
 .../execution/command/ShowColumnsSuiteBase.scala   | 100 +
 .../execution/command/v1/ShowColumnsSuite.scala    |  55 
 .../execution/command/v2/ShowColumnsSuite.scala}   |  17 +---
 .../hive/execution/command/ShowColumnsSuite.scala} |  18 ++--
 16 files changed, 255 insertions(+), 116 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json
index 89d2627ef32e..496a90e5db34 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -3866,6 +3866,12 @@
     ],
     "sqlState" : "42K08"
   },
+  "SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE" : {
+    "message" : [
+      "SHOW COLUMNS with conflicting namespaces: != ."
+    ],
+    "sqlState" : "42K05"
+  },
   "SORT_BY_WITHOUT_BUCKETING" : {
     "message" : [
       "sortBy must be used together with bucketBy."
@@ -5685,11 +5691,6 @@
     "ADD COLUM
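The consistency fix boils down to a namespace-equality check performed before SHOW COLUMNS executes. A minimal sketch in plain Java of that check (the method and class names here are hypothetical illustrations, not Spark's actual API):

```java
import java.util.Arrays;

public class ShowColumnsNamespaceCheck {
    // Hypothetical sketch: the optional namespace argument of SHOW COLUMNS must
    // match the namespace the table was resolved in, for both v1 and v2 catalogs.
    static void checkShowColumnsNamespace(String[] tableNamespace, String[] requestedNamespace) {
        if (requestedNamespace != null && !Arrays.equals(tableNamespace, requestedNamespace)) {
            throw new IllegalArgumentException(
                "[SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE] SHOW COLUMNS with conflicting namespace: `"
                + String.join(".", requestedNamespace) + "` != `"
                + String.join(".", tableNamespace) + "`.");
        }
    }

    public static void main(String[] args) {
        // SHOW COLUMNS IN ns.tbl IN ns  -> namespaces agree, no error
        checkShowColumnsNamespace(new String[]{"ns"}, new String[]{"ns"});
        // SHOW COLUMNS IN ns.tbl IN ns1 -> rejected, matching the v1 behavior
        try {
            checkShowColumnsNamespace(new String[]{"ns"}, new String[]{"ns1"});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

After the patch, both catalog implementations surface the same `SHOW_COLUMNS_WITH_CONFLICT_NAMESPACE` error instead of the v2 path silently ignoring the mismatched namespace.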
(spark) branch master updated: [SPARK-49457][BUILD] Remove uncommon curl option `--retry-all-errors`
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new effcf22a029f [SPARK-49457][BUILD] Remove uncommon curl option `--retry-all-errors`
effcf22a029f is described below

commit effcf22a029f2f61aa2513ae06554d171a774f5b
Author: Cheng Pan
AuthorDate: Thu Aug 29 22:55:08 2024 +0800

    [SPARK-49457][BUILD] Remove uncommon curl option `--retry-all-errors`

    ### What changes were proposed in this pull request?

    Remove the uncommon curl option `--retry-all-errors`, which was added in curl 7.71.0 (June 24, 2020); older versions cannot recognize this option.

    ### Why are the changes needed?

    It causes `build/mvn` to fail on Ubuntu 20.04.
    ```
    exec: curl --retry 3 --retry-all-errors --silent --show-error -L https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz?action=download
    curl: option --retry-all-errors: is unknown
    curl: try 'curl --help' or 'curl --manual' for more information
    ```
    ```
    $ curl --version
    curl 7.68.0 (aarch64-unknown-linux-gnu) libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
    Release-Date: 2020-01-08
    Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
    Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual test.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #47926 from pan3793/SPARK-49457.
Authored-by: Cheng Pan
Signed-off-by: yangjie01
---
 build/mvn | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build/mvn b/build/mvn
index 28454c68fd12..060209ac1ac4 100755
--- a/build/mvn
+++ b/build/mvn
@@ -58,7 +58,7 @@ install_app() {
   local local_checksum="${local_tarball}.${checksum_suffix}"
   local remote_checksum="https://archive.apache.org/dist/${url_path}.${checksum_suffix}"

-  local curl_opts="--retry 3 --retry-all-errors --silent --show-error -L"
+  local curl_opts="--retry 3 --silent --show-error -L"
   local wget_opts="--no-verbose"

   if [ ! -f "$binary" ]; then
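An alternative the patch deliberately did not take would be to keep `--retry-all-errors` and gate it on the detected curl version. As a hedged illustration of that compatibility check (the helper below is a hypothetical sketch, not part of `build/mvn`):

```java
public class CurlFlagGate {
    // --retry-all-errors first appeared in curl 7.71.0; compare dotted
    // version strings numerically, component by component.
    static boolean supportsRetryAllErrors(String curlVersion) {
        int[] minimum = {7, 71, 0};
        String[] parts = curlVersion.split("\\.");
        for (int i = 0; i < 3; i++) {
            int v = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (v != minimum[i]) {
                return v > minimum[i];
            }
        }
        return true; // exactly 7.71.0
    }

    public static void main(String[] args) {
        String baseOpts = "--retry 3 --silent --show-error -L";
        for (String version : new String[]{"7.68.0", "7.71.0", "8.5.0"}) {
            String opts = supportsRetryAllErrors(version)
                ? baseOpts + " --retry-all-errors"
                : baseOpts;
            System.out.println("curl " + version + " -> " + opts);
        }
    }
}
```

Dropping the flag outright, as the patch does, is the simpler choice: it works on every curl version without probing, at the cost of not retrying on non-transient HTTP errors on newer curls.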
(spark) branch master updated: [MINOR][SQL] Fix the incorrect method `@link` tag in `StagingTableCatalog`
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 57b6bc114e33 [MINOR][SQL] Fix the incorrect method `@link` tag in `StagingTableCatalog`
57b6bc114e33 is described below

commit 57b6bc114e3348e00cbe88af3be0ad2a5cc0a579
Author: yangjie01
AuthorDate: Wed Aug 28 17:44:53 2024 +0800

    [MINOR][SQL] Fix the incorrect method `@link` tag in `StagingTableCatalog`

    ### What changes were proposed in this pull request?

    This pr fixes an incorrect method `@link` in `StagingTableCatalog`, pointing it at the overload that should be overridden instead of at the current (deprecated) method.

    ### Why are the changes needed?

    Fix the incorrect method `@link` tag.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass GitHub Actions.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #47899 from LuciferYang/minor-wrong-link-StagingTableCatalog.

    Authored-by: yangjie01
    Signed-off-by: yangjie01
---
 .../org/apache/spark/sql/connector/catalog/StagingTableCatalog.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/StagingTableCatalog.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/StagingTableCatalog.java
index 6f074faf6e58..eead1ade4079 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/StagingTableCatalog.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/StagingTableCatalog.java
@@ -102,7 +102,7 @@ public interface StagingTableCatalog extends TableCatalog {
    * returned table's {@link StagedTable#commitStagedChanges()} is called.
    *
    * This is deprecated, please override
-   * {@link #stageReplace(Identifier, StructType, Transform[], Map)} instead.
+   * {@link #stageReplace(Identifier, Column[], Transform[], Map)} instead.
    */
   @Deprecated(since = "3.4.0")
   default StagedTable stageReplace(
(spark) branch master updated: [SPARK-49327][BUILD] Upgrade `commons-compress` to 1.27.1
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a31a5acbd389 [SPARK-49327][BUILD] Upgrade `commons-compress` to 1.27.1
a31a5acbd389 is described below

commit a31a5acbd3891b9b903c65878707f64ec338a8fb
Author: panbingkun
AuthorDate: Wed Aug 21 17:30:16 2024 +0800

    [SPARK-49327][BUILD] Upgrade `commons-compress` to 1.27.1

    ### What changes were proposed in this pull request?

    The pr aims to upgrade `commons-compress` from `1.27.0` to `1.27.1`.

    ### Why are the changes needed?

    Although the last upgrade occurred only 10 days ago, this version fixes a serious bug:
    https://commons.apache.org/proper/commons-compress/changes-report.html#a1.27.1
    - Compression into BZip2 format has an unexpected end of file when using a BufferedOutputStream. Fixes [COMPRESS-686](https://issues.apache.org/jira/browse/COMPRESS-686).

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass GA.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #47821 from panbingkun/SPARK-49327.
Authored-by: panbingkun
Signed-off-by: yangjie01
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml                               | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 5a86dee79d98..60f11565658b 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -40,7 +40,7 @@ commons-codec/1.17.1//commons-codec-1.17.1.jar
 commons-collections/3.2.2//commons-collections-3.2.2.jar
 commons-collections4/4.4//commons-collections4-4.4.jar
 commons-compiler/3.1.9//commons-compiler-3.1.9.jar
-commons-compress/1.27.0//commons-compress-1.27.0.jar
+commons-compress/1.27.1//commons-compress-1.27.1.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
 commons-io/2.16.1//commons-io-2.16.1.jar
diff --git a/pom.xml b/pom.xml
index fc1836f1c406..3fb276e53059 100644
--- a/pom.xml
+++ b/pom.xml
@@ -186,7 +186,7 @@
     1.1.10.6
     3.0.3
     1.17.1
-    1.27.0
+    1.27.1
     2.16.1
     2.6
(spark) branch master updated (e64f620fe8fd -> 1ae482ac7e64)
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

  from e64f620fe8fd [SPARK-48796][SS] Load Column Family Id from RocksDBCheckpointMetadata for VCF when restarting
   add 1ae482ac7e64 [SPARK-49075][BUILD] Upgrade JUnit5 related to the latest version

No new revisions were added by this update.

Summary of changes:
 pom.xml             | 10 +-
 project/plugins.sbt |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)
(spark) branch master updated: [MINOR][SQL][TESTS] Changes the `test:runMain` in the code comments to `Test/runMain`
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7b43a6fdf0f9 [MINOR][SQL][TESTS] Changes the `test:runMain` in the code comments to `Test/runMain`
7b43a6fdf0f9 is described below

commit 7b43a6fdf0f906ebbc83ca2a91f31e3ab76a68b2
Author: yangjie01
AuthorDate: Thu Aug 15 17:11:26 2024 +0800

    [MINOR][SQL][TESTS] Changes the `test:runMain` in the code comments to `Test/runMain`

    ### What changes were proposed in this pull request?

    This PR only changes the `test:runMain` run-command descriptions in code comments to `Test/runMain`.

    ### Why are the changes needed?

    When we use the execution command from the code comments, we see the following compilation warning:
    ```
    build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.TopKBenchmark"
    ```
    ```
    [warn] sbt 0.13 shell syntax is deprecated; use slash syntax instead: sql / Test / runMain
    ```
    The relevant comments should be updated to eliminate the compilation warning when running the command.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manually ran the tests using the updated commands and checked that the corresponding compilation warning is no longer present.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #47767 from LuciferYang/runMain-command-comments.
Authored-by: yangjie01
Signed-off-by: yangjie01
---
 sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala   | 2 +-
 .../sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala | 4 ++--
 .../sql/execution/benchmark/TakeOrderedAndProjectBenchmark.scala  | 4 ++--
 .../org/apache/spark/sql/execution/benchmark/TopKBenchmark.scala  | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala b/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala
index 00b757e4f78f..48a16f01d574 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala
@@ -359,7 +359,7 @@ class GenTPCDSDataConfig(args: Array[String]) {
   private def printUsageAndExit(exitCode: Int): Unit = {
     // scalastyle:off
     System.err.println("""
-      |build/sbt "test:runMain  [Options]"
+      |build/sbt "Test/runMain  [Options]"
       |Options:
       |  --master     the Spark master to use, default to local[*]
       |  --dsdgenDir  location of dsdgen
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
index e7f83cb7eb4b..0078c3f9f65d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
@@ -33,9 +33,9 @@ import org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter
  * 1. without sbt:
  *    bin/spark-submit --class  --jars 
  * 2. build/sbt build/sbt ";project sql;set javaOptions
- *    in Test += \"-Dspark.memory.debugFill=false\";test:runMain "
+ *    in Test += \"-Dspark.memory.debugFill=false\";Test/runMain "
  * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt ";project sql;set javaOptions
- *    in Test += \"-Dspark.memory.debugFill=false\";test:runMain "
+ *    in Test += \"-Dspark.memory.debugFill=false\";Test/runMain "
  *    Results will be written to
  *    "benchmarks/ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt".
  * }}}
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TakeOrderedAndProjectBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TakeOrderedAndProjectBenchmark.scala
index 88cdfebbb173..1244dd029981 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TakeOrderedAndProjectBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TakeOrderedAndProjectBenchmark.scala
@@ -28,9 +28,9 @@ import org.apache.spark.sql.internal.SQLConf
  * 1. without sbt:
  *    bin/spark-submit --class  --jars ,
- * 2. build/sbt "sql/test:runMain "
+ * 2. build/sbt "sql/Test/runMain "
  * 3. generate result:
- *    SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain
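The change above is purely mechanical: every occurrence of the deprecated sbt 0.13 `config:task` form becomes slash syntax. A hedged sketch of that rewrite as a tiny converter (a generic helper for illustration, not the actual tool used for the patch):

```java
public class SbtSlashSyntax {
    // Rewrite the deprecated sbt 0.13 `test:runMain` prefix to slash syntax.
    static String toSlashSyntax(String command) {
        return command.replace("test:runMain", "Test/runMain");
    }

    public static void main(String[] args) {
        String old =
            "build/sbt \"sql/test:runMain org.apache.spark.sql.execution.benchmark.TopKBenchmark\"";
        // Prints the updated command that no longer triggers the sbt deprecation warning.
        System.out.println(toSlashSyntax(old));
    }
}
```

Running the converted command (`build/sbt "sql/Test/runMain ..."`) avoids the `[warn] sbt 0.13 shell syntax is deprecated` message quoted in the commit description.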
(spark) branch master updated: [SPARK-49240][BUILD] Add `scalastyle` and `checkstyle` rules to avoid `URL` constructors
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0c24ae11164d [SPARK-49240][BUILD] Add `scalastyle` and `checkstyle` rules to avoid `URL` constructors
0c24ae11164d is described below

commit 0c24ae11164d2aa5dcd5042d63e43839ee479757
Author: Dongjoon Hyun
AuthorDate: Thu Aug 15 12:27:15 2024 +0800

    [SPARK-49240][BUILD] Add `scalastyle` and `checkstyle` rules to avoid `URL` constructors

    ### What changes were proposed in this pull request?

    This PR aims to add `scalastyle` and `checkstyle` rules to avoid `URL` constructors.

    ### Why are the changes needed?

    The `java.net.URL` class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC 2396. So, from Java 20, all `URL` constructors are deprecated. We had better use the `URI` class instead.
    - https://bugs.openjdk.org/browse/JDK-8295949

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs with the newly added rules. After this PR, there are only two exceptional instances, in `JettyUtils.scala` and `UISuite.scala`:
    - `JettyUtils` is a tricky instance.
    - The `UISuite` test case is supposed to build a bad URL, which `URI` prevents with `java.net.URISyntaxException`. This is an example of why `URI` is better. In this PR, we keep the old `URL` class there to preserve the test coverage.
    ```
    $ git grep -C1 'new URL('
    core/src/main/scala/org/apache/spark/ui/JettyUtils.scala-    // scalastyle:off URLConstructor
    core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:    val newUrl = new URL(requestURL, prefixedDestPath).toString
    core/src/main/scala/org/apache/spark/ui/JettyUtils.scala-    // scalastyle:on URLConstructor
    --
    core/src/test/scala/org/apache/spark/ui/UISuite.scala-      // scalastyle:off URLConstructor
    core/src/test/scala/org/apache/spark/ui/UISuite.scala:      val badRequest = new URL(
    core/src/test/scala/org/apache/spark/ui/UISuite.scala-        s"http://$localhost:${serverInfo.boundPort}$path/root?bypass&invalid<=foo")
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #47762 from dongjoon-hyun/SPARK-49240.

    Authored-by: Dongjoon Hyun
    Signed-off-by: yangjie01
---
 .../org/apache/spark/sql/avro/AvroSuite.scala      |  8 +++----
 .../apache/spark/deploy/FaultToleranceTest.scala   |  4 ++--
 .../spark/deploy/rest/RestSubmissionClient.scala   | 14 +--
 .../scala/org/apache/spark/ui/JettyUtils.scala     |  5 +++-
 .../main/scala/org/apache/spark/util/Utils.scala   |  2 +-
 .../spark/deploy/LogUrlsStandaloneSuite.scala      |  4 ++--
 .../deploy/history/HistoryServerPageSuite.scala    |  6 ++---
 .../spark/deploy/history/HistoryServerSuite.scala  | 28 +++---
 .../deploy/master/MasterDecommisionSuite.scala     |  4 ++--
 .../spark/deploy/master/ui/MasterWebUISuite.scala  |  4 ++--
 .../deploy/rest/StandaloneRestSubmitSuite.scala    |  4 ++--
 .../org/apache/spark/ui/UISeleniumSuite.scala      | 16 ++---
 .../test/scala/org/apache/spark/ui/UISuite.scala   | 25 ++-
 dev/checkstyle.xml                                 |  4 
 .../k8s/integrationtest/DepsTestsSuite.scala       |  4 ++--
 .../org/apache/spark/deploy/yarn/AmIpFilter.java   |  6 ++---
 .../cluster/YarnSchedulerBackendSuite.scala        |  8 +++----
 scalastyle-config.xml                              |  5 
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  6 ++---
 .../v1/sql/SqlResourceWithActualMetricsSuite.scala | 18 +++---
 20 files changed, 95 insertions(+), 80 deletions(-)

diff --git a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
index 3f1314e970b1..b20ee4b3cc23 100644
--- a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
+++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
@@ -18,7 +18,7 @@ package org.apache.spark.sql.avro

 import java.io._
-import java.net.URL
+import java.net.URI
 import java.nio.file.{Files, Paths, StandardCopyOption}
 import java.sql.{Date, Timestamp}
 import java.util.UUID
@@ -648,7 +648,7 @@ abstract class AvroSuite
     assert(message.contains("No Avro files found."))
     Files.copy(
-      Paths.get(new URL(episodesAvro).toURI),
+      Paths.get(new URI(episodesAvro)),
       Paths.get(dir.getCanonicalPath, "episodes.avro"))

     val result = spark.read.format("avro").load(episodesAvro).collect()
@@ -2139,7 +2139,7 @@ abstract class AvroSuite
   test("SPARK
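The practical difference between the two classes is easy to demonstrate with the JDK alone: `new URI(...)` validates its input against RFC 2396 at construction time, while the deprecated `URL` constructors accept malformed strings. A small self-contained sketch:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriVsUrl {
    // Returns true when the string is a syntactically valid URI per RFC 2396.
    static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A well-formed address parses fine.
        System.out.println(isValidUri("https://example.com/root"));
        // The bad request from the UISuite test case: '<' is illegal in a query
        // and must be percent-encoded, so URI rejects it where URL would not.
        System.out.println(isValidUri("https://example.com/root?bypass&invalid<=foo"));
    }
}
```

This is exactly why the `UISuite` test case mentioned above keeps the old `URL` class behind a `scalastyle:off` exception: the test needs to construct a deliberately malformed request, and `URI` would refuse to build it.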
(spark) branch master updated: [SPARK-49234][BUILD] Upgrade `xz` to `1.10`
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1947646f2b63 [SPARK-49234][BUILD] Upgrade `xz` to `1.10`
1947646f2b63 is described below

commit 1947646f2b6398506405dc592c6c9b15075cc4e7
Author: Dongjoon Hyun
AuthorDate: Wed Aug 14 16:50:04 2024 +0800

    [SPARK-49234][BUILD] Upgrade `xz` to `1.10`

    ### What changes were proposed in this pull request?

    This PR aims to upgrade `xz` to `1.10` independently from the Apache Avro change.

    ### Why are the changes needed?

    `1.10` is the latest minor version update with new improvements (such as `ARM64` support and optimized classes for Java >= 9) and bug fixes.
    - https://github.com/tukaani-project/xz-java/blob/master/NEWS.md#110-2024-07-29

    Note that the license also changed from `Public Domain` to `BSD Zero Clause`.
    > Licensing change: From version 1.10 onwards, XZ for Java is under the BSD Zero Clause License (0BSD). 1.9 and older are in the public domain and obviously remain so; the change only affects the new releases.

    ### Does this PR introduce _any_ user-facing change?

    No behavior change.

    ### How was this patch tested?

    Pass the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #47750 from dongjoon-hyun/SPARK-49234.
Authored-by: Dongjoon Hyun
Signed-off-by: yangjie01
---
 LICENSE-binary                        | 11 +--
 dev/deps/spark-deps-hadoop-3-hive-2.3 |  2 +-
 licenses/LICENSE-xz.txt               | 11 +++
 pom.xml                               |  2 +-
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/LICENSE-binary b/LICENSE-binary
index 28f1b63033d5..89826482d363 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -422,6 +422,11 @@ Python Software Foundation License
 python/pyspark/loose_version.py

+BSD 0-Clause
+
+org.tukaani:xz
+
+
 BSD 2-Clause

 com.github.luben:zstd-jni
@@ -520,12 +525,6 @@ org.glassfish.hk2:hk2-locator
 org.glassfish.hk2:hk2-utils
 org.glassfish.hk2:osgi-resource-locator

-
-Public Domain
---
-org.tukaani:xz
-
-
 Creative Commons CC0 1.0 Universal Public Domain Dedication
 ---
 (see LICENSE-CC0.txt)
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index aac733e16bed..a0febbfc721a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -271,7 +271,7 @@ univocity-parsers/2.9.1//univocity-parsers-2.9.1.jar
 wildfly-openssl/1.1.3.Final//wildfly-openssl-1.1.3.Final.jar
 xbean-asm9-shaded/4.25//xbean-asm9-shaded-4.25.jar
 xmlschema-core/2.3.1//xmlschema-core-2.3.1.jar
-xz/1.9//xz-1.9.jar
+xz/1.10//xz-1.10.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
 zookeeper-jute/3.9.2//zookeeper-jute-3.9.2.jar
 zookeeper/3.9.2//zookeeper-3.9.2.jar
diff --git a/licenses/LICENSE-xz.txt b/licenses/LICENSE-xz.txt
new file mode 100644
index ..4322122aecf1
--- /dev/null
+++ b/licenses/LICENSE-xz.txt
@@ -0,0 +1,11 @@
+Permission to use, copy, modify, and/or distribute this
+software for any purpose with or without fee is hereby granted.
+
+THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
+WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
+THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
+CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
+NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
diff --git a/pom.xml b/pom.xml
index cd95cc9f9587..6b2e8b3482d0 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1608,7 +1608,7 @@
       org.tukaani
       xz
-      1.9
+      1.10
(spark) branch master updated: [SPARK-49187][BUILD] Upgrade slf4j to 2.0.16
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e72d21c299a4 [SPARK-49187][BUILD] Upgrade slf4j to 2.0.16
e72d21c299a4 is described below

commit e72d21c299a450e48b3cf6e5d36b8f3e9a568088
Author: yangjie01
AuthorDate: Tue Aug 13 10:42:21 2024 +0800

    [SPARK-49187][BUILD] Upgrade slf4j to 2.0.16

    ### What changes were proposed in this pull request?

    This pr aims to upgrade slf4j from 2.0.14 to 2.0.16.

    ### Why are the changes needed?

    The new version brings 2 fixes:
    - Fixed an issue with stale MANIFEST.MF files. This issue was raised in https://github.com/qos-ch/slf4j/issues/421
    - The information about the provider `LoggerFactory` connected with will now be reported at the DEBUG level and will not be printed by default. (https://github.com/qos-ch/slf4j/commit/3ff00870b32c2067d72fb83d6a9e95548130)

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    - Pass GitHub Actions.
    - Manual check: run `build/sbt core/test`.

    **Before**: we can see the following message before the tests:
    ```
    SLF4J(I): Connected with provider of type [org.apache.logging.slf4j.SLF4JServiceProvider]
    ```

    **After**: no more logs similar to `SLF4J(I): Connected with provider of type [org.apache.logging.slf4j.SLF4JServiceProvider]`.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #47720 from LuciferYang/SPARK-49187.
Authored-by: yangjie01 Signed-off-by: yangjie01 --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 66774ec42ce4..b3ded712072f 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -126,7 +126,7 @@ javax.servlet-api/4.0.1//javax.servlet-api-4.0.1.jar javolution/5.5.1//javolution-5.5.1.jar jaxb-api/2.2.11//jaxb-api-2.2.11.jar jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar -jcl-over-slf4j/2.0.14//jcl-over-slf4j-2.0.14.jar +jcl-over-slf4j/2.0.16//jcl-over-slf4j-2.0.16.jar jdo-api/3.0.1//jdo-api-3.0.1.jar jdom2/2.0.6//jdom2-2.0.6.jar jersey-client/3.0.12//jersey-client-3.0.12.jar @@ -153,7 +153,7 @@ json4s-jackson_2.13/4.0.7//json4s-jackson_2.13-4.0.7.jar json4s-scalap_2.13/4.0.7//json4s-scalap_2.13-4.0.7.jar jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar -jul-to-slf4j/2.0.14//jul-to-slf4j-2.0.14.jar +jul-to-slf4j/2.0.16//jul-to-slf4j-2.0.16.jar kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar kubernetes-client-api/6.13.2//kubernetes-client-api-6.13.2.jar kubernetes-client/6.13.2//kubernetes-client-6.13.2.jar @@ -253,7 +253,7 @@ scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar scala-parser-combinators_2.13/2.4.0//scala-parser-combinators_2.13-2.4.0.jar scala-reflect/2.13.14//scala-reflect-2.13.14.jar scala-xml_2.13/2.3.0//scala-xml_2.13-2.3.0.jar -slf4j-api/2.0.14//slf4j-api-2.0.14.jar +slf4j-api/2.0.16//slf4j-api-2.0.16.jar snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar snakeyaml/2.2//snakeyaml-2.2.jar snappy-java/1.1.10.6//snappy-java-1.1.10.6.jar diff --git a/pom.xml b/pom.xml index 8f3283d2498b..cda6d8d5c289 100644 --- a/pom.xml +++ b/pom.xml @@ -119,7 +119,7 @@ 3.2.0 spark 9.7 -2.0.14 +2.0.16 2.22.1 3.4.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-49206][CORE][UI] Add `Environment Variables` table to Master `EnvironmentPage`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2973097b4fae [SPARK-49206][CORE][UI] Add `Environment Variables` table to Master `EnvironmentPage` 2973097b4fae is described below commit 2973097b4fae86f69322c7927962c2599f3f98b6 Author: Dongjoon Hyun AuthorDate: Mon Aug 12 20:51:41 2024 +0800 [SPARK-49206][CORE][UI] Add `Environment Variables` table to Master `EnvironmentPage` ### What changes were proposed in this pull request? This PR aims to add `Environment Variables` table to Master `EnvironmentPage` via a new configuration, `spark.master.ui.visibleEnvVarPrefixes`. ### Why are the changes needed? To allow users to expose and show the environment variables of Spark Master. ### Does this PR introduce _any_ user-facing change? Yes, but this is a new table on `Spark Master` UI's `EnvironmentPage`. ### How was this patch tested? Pass the CIs with newly added test case. **DEFAULT** ``` $ sbin/start-master.sh ``` ![Screenshot 2024-08-12 at 00 53 26](https://github.com/user-attachments/assets/b7929536-a25d-4bd5-876d-908ed7403b92) **Expose `AWS_`** ``` $ AWS_CA_BUNDLE=/tmp/root-ca.pem \ AWS_ENDPOINT_URL=https://s3express-usw2-az1.us-west-2.amazonaws.com \ SPARK_MASTER_OPTS="-Dspark.master.ui.visibleEnvVarPrefixes=AWS_" \ sbin/start-master.sh ``` ![Screenshot 2024-08-12 at 01 05 25](https://github.com/user-attachments/assets/50a57ba1-8ed8-4827-ad51-b303da09a663) ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47714 from dongjoon-hyun/SPARK-49206. 
Authored-by: Dongjoon Hyun Signed-off-by: yangjie01 --- .../main/resources/org/apache/spark/ui/static/webui.js | 1 + .../spark/deploy/master/ui/EnvironmentPage.scala | 18 ++ .../scala/org/apache/spark/internal/config/UI.scala| 7 +++ .../deploy/master/ui/ReadOnlyMasterWebUISuite.scala| 11 +++ 4 files changed, 37 insertions(+) diff --git a/core/src/main/resources/org/apache/spark/ui/static/webui.js b/core/src/main/resources/org/apache/spark/ui/static/webui.js index b365082c1e14..4c7cf8c8ea90 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/webui.js +++ b/core/src/main/resources/org/apache/spark/ui/static/webui.js @@ -75,6 +75,7 @@ $(function() { collapseTablePageLoad('collapse-aggregated-systemProperties','aggregated-systemProperties'); collapseTablePageLoad('collapse-aggregated-metricsProperties','aggregated-metricsProperties'); collapseTablePageLoad('collapse-aggregated-classpathEntries','aggregated-classpathEntries'); + collapseTablePageLoad('collapse-aggregated-environmentVariables','aggregated-environmentVariables'); collapseTablePageLoad('collapse-aggregated-activeJobs','aggregated-activeJobs'); collapseTablePageLoad('collapse-aggregated-completedJobs','aggregated-completedJobs'); collapseTablePageLoad('collapse-aggregated-failedJobs','aggregated-failedJobs'); diff --git a/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala b/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala index 190e821524ba..c05b20d30b98 100644 --- a/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala +++ b/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala @@ -17,12 +17,14 @@ package org.apache.spark.deploy.master.ui +import scala.jdk.CollectionConverters._ import scala.xml.Node import jakarta.servlet.http.HttpServletRequest import org.apache.spark.{SparkConf, SparkEnv} import org.apache.spark.deploy.SparkHadoopUtil +import 
org.apache.spark.internal.config.UI.MASTER_UI_VISIBLE_ENV_VAR_PREFIXES import org.apache.spark.ui._ import org.apache.spark.util.Utils @@ -39,6 +41,9 @@ private[ui] class EnvironmentPage( val systemProperties = Utils.redact(conf, details("System Properties")).sorted val metricsProperties = Utils.redact(conf, details("Metrics Properties")).sorted val classpathEntries = details("Classpath Entries").sorted +val prefixes = conf.get(MASTER_UI_VISIBLE_ENV_VAR_PREFIXES) +val environmentVariables = System.getenv().asScala + .filter { case (k, _) => prefixes.exists(k.startsWith(_)) }.toSeq.sorted val runtimeInformationTable = UIUtils.listingTable(propertyHeader, propertyRow, jvmInformation, fixedWidth = true, headerClasses = headerClasses) @@ -52,6 +57,8 @@ private[ui] class EnvironmentPage( metricsPr
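The Scala filter shown in the diff keeps only those environment variables whose names start with one of the prefixes configured via `spark.master.ui.visibleEnvVarPrefixes`, sorted by name. A minimal pure-Python sketch of the same selection logic (the function name is illustrative, not part of Spark):

```python
import os

def visible_env_vars(prefixes):
    """Return (name, value) pairs for environment variables whose name
    starts with any of the given prefixes, sorted by name -- mirroring
    the filter added to the Master's EnvironmentPage."""
    return sorted(
        (k, v) for k, v in os.environ.items()
        if any(k.startswith(p) for p in prefixes)
    )
```

With `-Dspark.master.ui.visibleEnvVarPrefixes=AWS_`, as in the screenshot example, only `AWS_*` variables would be selected for the new table.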
(spark) branch master updated: [SPARK-49077][SQL][TESTS] Remove `bouncycastle-related` test dependencies from `hive-thriftserver` module
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 08b1fb55d738 [SPARK-49077][SQL][TESTS] Remove `bouncycastle-related` test dependencies from `hive-thriftserver` module 08b1fb55d738 is described below commit 08b1fb55d738dde5a69aa94aab946d15bc2568af Author: yangjie01 AuthorDate: Thu Aug 1 23:34:17 2024 +0800 [SPARK-49077][SQL][TESTS] Remove `bouncycastle-related` test dependencies from `hive-thriftserver` module ### What changes were proposed in this pull request? After SPARK-49066 merged, other than `OrcEncryptionSuite`, the test cases for writing Orc data no longer require the use of `FakeKeyProvider`. As a result, `hive-thriftserver` no longer needs these test dependencies. ### Why are the changes needed? Clean up the test dependencies that are no longer needed by `hive-thriftserver`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual Test with this pr. ``` build/mvn -Phive -Phive-thriftserver clean install -DskipTests build/mvn -Phive -Phive-thriftserver clean install -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite -pl sql/hive-thriftserver ``` ``` Run completed in 5 minutes, 14 seconds. Total number of tests run: 243 Suites: completed 2, aborted 0 Tests: succeeded 243, failed 0, canceled 0, ignored 20, pending 0 All tests passed. ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #47563 from LuciferYang/SPARK-49077. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- sql/hive-thriftserver/pom.xml | 10 -- 1 file changed, 10 deletions(-) diff --git a/sql/hive-thriftserver/pom.xml b/sql/hive-thriftserver/pom.xml index d50c78bd1f9b..6a352f8a530d 100644 --- a/sql/hive-thriftserver/pom.xml +++ b/sql/hive-thriftserver/pom.xml @@ -156,16 +156,6 @@ org.apache.httpcomponents httpcore - - org.bouncycastle - bcprov-jdk18on - test - - - org.bouncycastle - bcpkix-jdk18on - test - target/scala-${scala.binary.version}/classes - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-49076][SQL] Fix the outdated `logical plan name` in `AstBuilder's` comments
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new abf9bac69680 [SPARK-49076][SQL] Fix the outdated `logical plan name` in `AstBuilder's` comments abf9bac69680 is described below commit abf9bac696804d7af28fbc9cd9026efddea303e3 Author: panbingkun AuthorDate: Thu Aug 1 23:32:44 2024 +0800 [SPARK-49076][SQL] Fix the outdated `logical plan name` in `AstBuilder's` comments ### What changes were proposed in this pull request? The pr aims to fix the outdated `logical plan name` in `AstBuilder's` comments. ### Why are the changes needed? - After the pr https://github.com/apache/spark/pull/33609, the name of the logical plan below has been changed: `AlterTableAddColumns` -> `AddColumns` `AlterTableRenameColumn` -> `RenameColumn` `AlterTableAlterColumn` -> `AlterColumn` `AlterTableDropColumns` -> `DropColumns` - After the pr https://github.com/apache/spark/pull/30398 The name of the logical plan `ShowPartitionsStatement` has been changed to `ShowPartitions`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Only update comments. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47562 from panbingkun/fix_astbuilder. 
Lead-authored-by: panbingkun Co-authored-by: panbingkun Signed-off-by: yangjie01 --- .../apache/spark/sql/catalyst/parser/AstBuilder.scala| 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index a046ededf964..feb3ef4e7155 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -61,8 +61,8 @@ import org.apache.spark.util.random.RandomSampler * The AstBuilder converts an ANTLR4 ParseTree into a catalyst Expression, LogicalPlan or * TableIdentifier. */ -class AstBuilder extends DataTypeAstBuilder with SQLConfHelper - with Logging with DataTypeErrorsBase { +class AstBuilder extends DataTypeAstBuilder + with SQLConfHelper with Logging with DataTypeErrorsBase { import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._ import ParserUtils._ @@ -4452,7 +4452,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper } /** - * Parse a [[AlterTableAddColumns]] command. + * Parse a [[AddColumns]] command. * * For example: * {{{ @@ -4469,7 +4469,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper } /** - * Parse a [[AlterTableRenameColumn]] command. + * Parse a [[RenameColumn]] command. * * For example: * {{{ @@ -4485,7 +4485,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper } /** - * Parse a [[AlterTableAlterColumn]] command to alter a column's property. + * Parse a [[AlterColumn]] command to alter a column's property. * * For example: * {{{ @@ -4555,7 +4555,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper } /** - * Parse a [[AlterTableAlterColumn]] command. This is Hive SQL syntax. + * Parse a [[AlterColumn]] command. This is Hive SQL syntax. 
* * For example: * {{{ @@ -4639,7 +4639,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper } /** - * Parse a [[AlterTableDropColumns]] command. + * Parse a [[DropColumns]] command. * * For example: * {{{ @@ -4979,7 +4979,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper * A command for users to list the partition names of a table. If partition spec is specified, * partitions that match the spec are returned. Otherwise an empty result set is returned. * - * This function creates a [[ShowPartitionsStatement]] logical plan + * This function creates a [[ShowPartitions]] logical plan * * The syntax of using this command in SQL is: * {{{ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48964][SQL][DOCS] Fix the discrepancy between implementation, comment and documentation of option `recursive.fields.max.depth` in ProtoBuf connector
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5954ed19b9dd [SPARK-48964][SQL][DOCS] Fix the discrepancy between implementation, comment and documentation of option `recursive.fields.max.depth` in ProtoBuf connector 5954ed19b9dd is described below commit 5954ed19b9dd1e7e741e37a689ae741722f7d5b6 Author: Wei Guo AuthorDate: Wed Jul 31 17:04:26 2024 +0800 [SPARK-48964][SQL][DOCS] Fix the discrepancy between implementation, comment and documentation of option `recursive.fields.max.depth` in ProtoBuf connector ### What changes were proposed in this pull request? This PR aims to fix the discrepancy between implementation, comment and documentation of option `recursive.fields.max.depth` in ProtoBuf connector ### Why are the changes needed? Unify code implementation and documentation description. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47458 from wayneguow/SPARK-48964. Authored-by: Wei Guo Signed-off-by: yangjie01 --- .../utils/src/main/resources/error/error-conditions.json | 2 +- .../apache/spark/sql/protobuf/utils/SchemaConverters.scala | 14 +++--- docs/sql-data-sources-protobuf.md | 11 ++- 3 files changed, 14 insertions(+), 13 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 15b851a78d62..de127d4a7bf0 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -3640,7 +3640,7 @@ }, "RECURSIVE_PROTOBUF_SCHEMA" : { "message" : [ - "Found recursive reference in Protobuf schema, which can not be processed by Spark by default: . 
try setting the option `recursive.fields.max.depth` 0 to 10. Going beyond 10 levels of recursion is not allowed." + "Found recursive reference in Protobuf schema, which can not be processed by Spark by default: . try setting the option `recursive.fields.max.depth` 1 to 10. Going beyond 10 levels of recursion is not allowed." ], "sqlState" : "42K0G" }, diff --git a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala index feb5aed03451..56c1f8185061 100644 --- a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala +++ b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala @@ -176,16 +176,16 @@ object SchemaConverters extends Logging { } case MESSAGE => // If the `recursive.fields.max.depth` value is not specified, it will default to -1, -// and recursive fields are not permitted. Setting it to 0 drops all recursive fields, -// 1 allows it to be recursed once, and 2 allows it to be recursed twice and so on. -// A value greater than 10 is not allowed, and if a protobuf record has more depth for -// recursive fields than the allowed value, it will be truncated and some fields may be -// discarded. +// and recursive fields are not permitted. Setting it to 1 drops all recursive fields, +// 2 allows it to be recursed once, and 3 allows it to be recursed twice and so on. +// A value less than or equal to 0 or greater than 10 is not allowed, and if a protobuf +// record has more depth for recursive fields than the allowed value, it will be truncated +// and some fields may be discarded. // SQL Schema for protob2uf `message Person { string name = 1; Person bff = 2;}` // will vary based on the value of "recursive.fields.max.depth". // 1: struct -// 2: struct> -// 3: struct>> +// 2: struct> +// 3: struct>> // and so on. 
// TODO(rangadi): A better way to terminate would be replace the remaining recursive struct // with the byte array of corresponding protobuf. This way no information is lost. diff --git a/docs/sql-data-sources-protobuf.md b/docs/sql-data-sources-protobuf.md index 28e3e83bef7c..34cb1d4997d2 100644 --- a/docs/sql-data-sources-protobuf.md +++ b/docs/sql-data-sources-protobuf.md @@ -402,9 +402,9 @@ Spark supports the writing of all Spark SQL types into Protobuf. For most types, ## Handling circular references protobuf fields One common i
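The corrected comment's semantics — a `recursive.fields.max.depth` of 1 drops recursive fields, 2 allows one level of recursion, and values outside 1 to 10 are rejected — can be illustrated with a small Python sketch that builds the resulting schema shape for the `Person` example. This is only an illustration of the documented rule, not Spark's actual `SchemaConverters` code:

```python
def person_schema(max_depth):
    """Schema string for `message Person { string name = 1; Person bff = 2; }`
    under a given recursive.fields.max.depth (post-fix semantics)."""
    if max_depth < 1 or max_depth > 10:
        raise ValueError("recursive.fields.max.depth must be between 1 and 10")

    def build(depth):
        # Once the maximum depth is reached, the recursive `bff`
        # field is truncated (dropped) at this level.
        if depth >= max_depth:
            return "struct<name: string>"
        return f"struct<name: string, bff: {build(depth + 1)}>"

    return build(1)
```

For depths 1, 2, and 3 this reproduces the three schema shapes listed in the updated comment.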
(spark) branch master updated: [SPARK-48829][BUILD] Upgrade `RoaringBitmap` to 1.2.1
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6ff93eaca57e [SPARK-48829][BUILD] Upgrade `RoaringBitmap` to 1.2.1 6ff93eaca57e is described below commit 6ff93eaca57e14ed26e2e3cceb25d53e811f4765 Author: panbingkun AuthorDate: Tue Jul 30 13:28:53 2024 +0800 [SPARK-48829][BUILD] Upgrade `RoaringBitmap` to 1.2.1 ### What changes were proposed in this pull request? The pr aims to upgrade `RoaringBitmap` from `1.1.0` to `1.2.1`. ### Why are the changes needed? - The full release notes: https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.2.0 - The latest version has brought bug fixes and some improvements: improve: Optimize RoaringBitSet.get(int fromIndex, int toIndex) https://github.com/RoaringBitmap/RoaringBitmap/pull/727 fix: add bitmapOfRange (non-static) in https://github.com/RoaringBitmap/RoaringBitmap/pull/728 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47247 from panbingkun/SPARK-48829. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt | 8 core/benchmarks/MapStatusesConvertBenchmark-results.txt | 8 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt index a15442496b24..8f0886ae4d99 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure +OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.5.0-1023-azure AMD EPYC 7763 64-Core Processor MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500674685 12 0.0 673772738.0 1.0X -Num Maps: 5 Fetch partitions:1000 1579 1590 12 0.0 1579383970.0 0.4X -Num Maps: 5 Fetch partitions:1500 2435 2472 37 0.0 2434530380.0 0.3X +Num Maps: 5 Fetch partitions:500697707 10 0.0 697013793.0 1.0X +Num Maps: 5 Fetch partitions:1000 1608 1621 16 0.0 1608250487.0 0.4X +Num Maps: 5 Fetch partitions:1500 2443 2478 39 0.0 2443321570.0 0.3X diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-results.txt index b9f36af4a653..b64b0b392473 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure +OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1023-azure AMD EPYC 7763 64-Core Processor MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500703716 11 0.0 703103575.0 1.0X -Num Maps: 5 Fetch 
partitions:1000 1707 1723 14 0.0 1707060398.0 0.4X -Num Maps: 5 Fetch partitions:1500 2626 2638 14 0.0 2625981097.0 0.3X +Num Maps: 5 Fetch partitions:500769772 3 0.0 769382967.0 1.0X +Num Maps: 5 Fetch partitions:1000 1698 1715 14 0.0 1698166886.0 0.5X +Num Maps: 5 Fetch partitions:1500 2588 2606 26 0.0 2587840071.0 0.3X diff --git a/dev
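One of the cited RoaringBitmap fixes adds a non-static `bitmapOfRange`, whose semantics — a bitmap with every bit in `[start, end)` set — can be sketched in pure Python using an integer as the bit store. This is only an analogue of the behavior, not the RoaringBitmap API:

```python
def bitmap_of_range(start, end):
    """Integer whose bits start..end-1 are set, analogous in behavior
    to RoaringBitmap's bitmapOfRange(start, end)."""
    if end <= start:
        return 0
    return ((1 << (end - start)) - 1) << start

def contains(bitmap, i):
    """True if bit i is set in the bitmap."""
    return (bitmap >> i) & 1 == 1
```

(The real RoaringBitmap uses compressed containers precisely so that such ranges do not cost one bit per element.)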
(spark) branch master updated: [SPARK-48974][SQL][SS][ML][MLLIB] Use `SparkSession.implicits` instead of `SQLContext.implicits`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 877c3f2bea92 [SPARK-48974][SQL][SS][ML][MLLIB] Use `SparkSession.implicits` instead of `SQLContext.implicits` 877c3f2bea92 is described below commit 877c3f2bea924ca9f3fd5b7e9c6cbfb0fc3be958 Author: yangjie01 AuthorDate: Wed Jul 24 10:41:07 2024 +0800 [SPARK-48974][SQL][SS][ML][MLLIB] Use `SparkSession.implicits` instead of `SQLContext.implicits` ### What changes were proposed in this pull request? This PR replaces `SQLContext.implicits` with `SparkSession.implicits` in the Spark codebase. ### Why are the changes needed? Reduce the usage of code from `SQLContext` within the internal code of Spark. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #47457 from LuciferYang/use-sparksession-implicits. 
Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: yangjie01 --- .../src/main/scala/org/apache/spark/mllib/util/MLUtils.scala | 2 +- .../apache/spark/ml/classification/FMClassifierSuite.scala | 4 ++-- .../spark/ml/classification/LogisticRegressionSuite.scala| 12 ++-- .../apache/spark/ml/recommendation/CollectTopKSuite.scala| 4 ++-- .../apache/spark/ml/regression/LinearRegressionSuite.scala | 4 ++-- .../test/scala/org/apache/spark/ml/util/MLTestingUtils.scala | 2 +- .../spark/sql/execution/datasources/csv/CSVUtils.scala | 2 +- .../org/apache/spark/sql/SparkSessionExtensionSuite.scala| 8 .../org/apache/spark/sql/streaming/util/BlockingSource.scala | 2 +- .../spark/sql/hive/HiveContextCompatibilitySuite.scala | 4 ++-- .../org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala | 2 +- .../scala/org/apache/spark/sql/hive/ListTablesSuite.scala| 2 +- .../org/apache/spark/sql/hive/execution/HiveQuerySuite.scala | 2 +- .../spark/sql/hive/execution/HiveResolutionSuite.scala | 2 +- .../apache/spark/sql/hive/execution/HiveTableScanSuite.scala | 2 +- .../sql/sources/BucketedWriteWithHiveSupportSuite.scala | 2 +- 16 files changed, 28 insertions(+), 28 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala b/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala index e23423e4c004..1257d2ccfbfb 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala @@ -119,7 +119,7 @@ object MLUtils extends Logging { ).resolveRelation(checkFilesExist = false)) .select("value") -import lines.sqlContext.implicits._ +import lines.sparkSession.implicits._ lines.select(trim($"value").as("line")) .filter(not((length($"line") === 0).or($"line".startsWith("#" diff --git a/mllib/src/test/scala/org/apache/spark/ml/classification/FMClassifierSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/classification/FMClassifierSuite.scala index 
68e83fccf3d1..ff9ce1ca7b9f 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/classification/FMClassifierSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/classification/FMClassifierSuite.scala @@ -52,8 +52,8 @@ class FMClassifierSuite extends MLTest with DefaultReadWriteTest { } test("FMClassifier: Predictor, Classifier methods") { -val sqlContext = smallBinaryDataset.sqlContext -import sqlContext.implicits._ +val session = smallBinaryDataset.sparkSession +import session.implicits._ val fm = new FMClassifier() val model = fm.fit(smallBinaryDataset) diff --git a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala index 8e54262e2f61..b0e275f5e193 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala @@ -550,8 +550,8 @@ class LogisticRegressionSuite extends MLTest with DefaultReadWriteTest { } test("multinomial logistic regression: Predictor, Classifier methods") { -val sqlContext = smallMultinomialDataset.sqlContext -import sqlContext.implicits._ +val session = smallMultinomialDataset.sparkSession +import session.implicits._ val mlr = new LogisticRegression().setFamily("multinomial") val model = mlr.fit(smallMultinomialDataset) @@ -590,8 +590,8 @@ class LogisticRegressionSuite extends MLTest with DefaultReadWriteTes
(spark) branch master updated: [SPARK-48893][SQL][PYTHON][DOCS] Add some examples for `linearRegression` built-in functions
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a809740cf6ec [SPARK-48893][SQL][PYTHON][DOCS] Add some examples for `linearRegression` built-in functions a809740cf6ec is described below commit a809740cf6ec039141d416f6fb27a6deb66b3d2c Author: Wei Guo AuthorDate: Tue Jul 23 10:27:21 2024 +0800 [SPARK-48893][SQL][PYTHON][DOCS] Add some examples for `linearRegression` built-in functions ### What changes were proposed in this pull request? This PR aims to add some extra examples for `linearRegression` built-in functions. ### Why are the changes needed? - Align the use examples for this series of functions. - Allow users to better understand the usage of `linearRegression` related methods from sql built-in functions docs(https://spark.apache.org/docs/latest/api/sql/index.html). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA and Manual testing for new examples. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47343 from wayneguow/regr_series. Authored-by: Wei Guo Signed-off-by: yangjie01 --- python/pyspark/sql/functions/builtin.py| 545 ++--- .../expressions/aggregate/linearRegression.scala | 28 +- .../sql-functions/sql-expression-schema.md | 4 +- 3 files changed, 494 insertions(+), 83 deletions(-) diff --git a/python/pyspark/sql/functions/builtin.py b/python/pyspark/sql/functions/builtin.py index 5b9d0dd87002..3d094dd38c50 100644 --- a/python/pyspark/sql/functions/builtin.py +++ b/python/pyspark/sql/functions/builtin.py @@ -3671,16 +3671,59 @@ def regr_avgx(y: "ColumnOrName", x: "ColumnOrName") -> Column: Examples ->>> from pyspark.sql import functions as sf ->>> x = (sf.col("id") % 3).alias("x") ->>> y = (sf.randn(42) + x * 10).alias("y") ->>> spark.range(0, 1000, 1, 1).select(x, y).select( -... 
sf.regr_avgx("y", "x"), sf.avg("x") -... ).show() +Example 1: All pairs are non-null + +>>> import pyspark.sql.functions as sf +>>> df = spark.sql("SELECT * FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x)") +>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show() +---+--+ |regr_avgx(y, x)|avg(x)| +---+--+ -| 0.999| 0.999| +| 2.75| 2.75| ++---+--+ + +Example 2: All pairs' x values are null + +>>> import pyspark.sql.functions as sf +>>> df = spark.sql("SELECT * FROM VALUES (1, null) AS tab(y, x)") +>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show() ++---+--+ +|regr_avgx(y, x)|avg(x)| ++---+--+ +| NULL| NULL| ++---+--+ + +Example 3: All pairs' y values are null + +>>> import pyspark.sql.functions as sf +>>> df = spark.sql("SELECT * FROM VALUES (null, 1) AS tab(y, x)") +>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show() ++---+--+ +|regr_avgx(y, x)|avg(x)| ++---+--+ +| NULL| 1.0| ++---+--+ + +Example 4: Some pairs' x values are null + +>>> import pyspark.sql.functions as sf +>>> df = spark.sql("SELECT * FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x)") +>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show() ++---+--+ +|regr_avgx(y, x)|avg(x)| ++---+--+ +|3.0| 3.0| ++---+--+ + +Example 5: Some pairs' x or y values are null + +>>> import pyspark.sql.functions as sf +>>> df = spark.sql("SELECT * FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x)") +>>> df.select(sf.regr_avgx("y", "x"), sf.avg("x")).show() ++---+--+ +|regr_avgx(y, x)|avg(x)| ++---+--+ +|3.0| 3.0| +---+--+ """ return _invoke_function_over_columns("regr_avgx", y, x) @@ -3708,17 +3751,60 @@ def regr_avgy(y: "ColumnOrName", x: "ColumnOrName") -> Column: Examples
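The doctest examples above all follow one rule: `regr_avgx(y, x)` averages `x` only over rows where *both* `y` and `x` are non-null, which is why it can differ from `avg(x)` when only `y` is null. A small pure-Python sketch of that aggregate (illustrative only, not PySpark's implementation):

```python
def regr_avgx(pairs):
    """Average of x over (y, x) pairs where both values are non-null,
    mirroring SQL's regr_avgx(y, x); None when no pair qualifies."""
    xs = [x for y, x in pairs if y is not None and x is not None]
    return sum(xs) / len(xs) if xs else None
```

Example 3 in the docstring shows the difference: for the single pair `(null, 1)`, `avg(x)` is 1.0 but `regr_avgx` is NULL, because the pair is excluded when either side is null.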
(spark) branch master updated: [SPARK-48943][TESTS] Upgrade `h2` to 2.3.230 and enhance the test coverage of behavior changes of `asin` and `acos` complying Standard SQL
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4cc41ea63f94 [SPARK-48943][TESTS] Upgrade `h2` to 2.3.230 and enhance the test coverage of behavior changes of `asin` and `acos` complying Standard SQL 4cc41ea63f94 is described below commit 4cc41ea63f943b61be8f771f5cd95cfd4ea15c2e Author: Wei Guo AuthorDate: Tue Jul 23 10:21:32 2024 +0800 [SPARK-48943][TESTS] Upgrade `h2` to 2.3.230 and enhance the test coverage of behavior changes of `asin` and `acos` complying Standard SQL ### What changes were proposed in this pull request? This PR aims to upgrade `h2` from 2.2.220 to 2.3.230 and enhance the test coverage of the behavior changes of `asin` and `acos`, which now comply with Standard SQL. The details of the behavior change are as follows: after this commit (https://github.com/h2database/h2database/commit/186647d4a35d05681febf4f53502b306aa6d511a), the behavior of `asin` and `acos` changed in h2 to comply with Standard SQL, throwing an exception directly when the argument is invalid (< -1d || > 1d). ### Why are the changes needed? 2.3.230 is the latest version of `h2`; it brings a lot of bug fixes and improvements. Full change notes: https://www.h2database.com/html/changelog.html ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated an existing test case and added a new one. Pass GA and manually test `JDBCV2Suite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47414 from wayneguow/upgrade_h2. 
Authored-by: Wei Guo Signed-off-by: yangjie01 --- connect/server/pom.xml | 2 +- sql/core/pom.xml | 2 +- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 58 +++--- 3 files changed, 42 insertions(+), 20 deletions(-) diff --git a/connect/server/pom.xml b/connect/server/pom.xml index 73a3310c8a38..ecbb22168aa1 100644 --- a/connect/server/pom.xml +++ b/connect/server/pom.xml @@ -254,7 +254,7 @@ com.h2database h2 - 2.2.220 + 2.3.230 test diff --git a/sql/core/pom.xml b/sql/core/pom.xml index 59d798e6e62f..c891763eb4e1 100644 --- a/sql/core/pom.xml +++ b/sql/core/pom.xml @@ -166,7 +166,7 @@ com.h2database h2 - 2.2.220 + 2.3.230 test diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala index e1a7971b283c..db06aac7f5e0 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala @@ -37,7 +37,7 @@ import org.apache.spark.sql.connector.expressions.Expression import org.apache.spark.sql.execution.FormattedMode import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2ScanRelation, V1ScanWrapper} import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog -import org.apache.spark.sql.functions.{abs, acos, asin, atan, atan2, avg, ceil, coalesce, cos, cosh, cot, count, count_distinct, degrees, exp, floor, lit, log => logarithm, log10, not, pow, radians, round, signum, sin, sinh, sqrt, sum, tan, tanh, udf, when} +import org.apache.spark.sql.functions.{abs, acos, asin, avg, ceil, coalesce, count, count_distinct, degrees, exp, floor, lit, log => logarithm, log10, not, pow, radians, round, signum, sqrt, sum, udf, when} import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.test.SharedSparkSession import org.apache.spark.sql.types.{DataType, IntegerType, StringType} @@ -1258,25 +1258,29 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with 
ExplainSuiteHel checkAnswer(df15, Seq(Row(1, "cathy", 9000, 1200, false), Row(2, "alex", 12000, 1200, false), Row(6, "jen", 12000, 1200, true))) -val df16 = spark.table("h2.test.employee") - .filter(sin($"bonus") < -0.08) - .filter(sinh($"bonus") > 200) - .filter(cos($"bonus") > 0.9) - .filter(cosh($"bonus") > 200) - .filter(tan($"bonus") < -0.08) - .filter(tanh($"bonus") === 1) - .filter(cot($"bonus") < -11) - .filter(asin($"bonus") > 0.1) - .filter(acos($"bonus") > 1.4) - .filter(atan($"bonus") > 1.4) - .filter(atan2($"bonus", $"bonus") > 0.7) +val df16 = sql( + """ +|SELECT * FROM h2.test.employee +|WHERE sin(bonus) < -0.08 +|
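The domain rule behind the SPARK-48943 change above is easy to demonstrate outside h2. A standalone sketch (plain Python `math`, not h2 or Spark code): `asin`/`acos` are only defined for arguments in [-1, 1], and a Standard-SQL-compliant implementation raises an error for anything else instead of returning NaN.

```python
import math

# asin(1.0) is valid: the argument lies inside [-1.0, 1.0].
print(math.asin(1.0))  # 1.5707963267948966 (pi/2)

# asin(2.0) is invalid: h2 2.3.x now throws directly for such arguments,
# and Python's math.asin behaves the same way.
try:
    math.asin(2.0)
except ValueError as e:
    print(f"rejected: {e}")  # rejected: math domain error
```

This is also why the updated `JDBCV2Suite` filters above only feed `asin`/`acos` values inside the legal domain.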
(spark) branch master updated: [MINOR][SQL][TESTS] Enable test case `testOrcAPI` in `JavaDataFrameReaderWriterSuite`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 546da0d5522e [MINOR][SQL][TESTS] Enable test case `testOrcAPI` in `JavaDataFrameReaderWriterSuite` 546da0d5522e is described below commit 546da0d5522ec79620bd29563c5ea809386635f5 Author: yangjie01 AuthorDate: Thu Jul 18 15:58:21 2024 +0800 [MINOR][SQL][TESTS] Enable test case `testOrcAPI` in `JavaDataFrameReaderWriterSuite` ### What changes were proposed in this pull request? This PR enables the test case `testOrcAPI` in `JavaDataFrameReaderWriterSuite`. Because this test no longer depends on Hive classes, it can be run like the other test cases in this suite. ### Why are the changes needed? Enable the test case `testOrcAPI` in `JavaDataFrameReaderWriterSuite`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #47400 from LuciferYang/minor-testOrcAPI. Authored-by: yangjie01 Signed-off-by: yangjie01 --- .../test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java| 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java index 2a0c8c00574a..691fb67bbe90 100644 --- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java +++ b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java @@ -144,10 +144,7 @@ public class JavaDataFrameReaderWriterSuite { .write().parquet(output); } - /** - * This only tests whether API compiles, but does not run it as orc() - * cannot be run without Hive classes. 
- */ + @Test public void testOrcAPI() { spark.read().schema(schema).orc(); spark.read().schema(schema).orc(input); - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][SQL][TESTS] Fix compilation warning `adaptation of an empty argument list by inserting () is deprecated`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b6c05259a0b9 [MINOR][SQL][TESTS] Fix compilation warning `adaptation of an empty argument list by inserting () is deprecated` b6c05259a0b9 is described below commit b6c05259a0b98205d2f0fe2476ecd09c8d258b0a Author: panbingkun AuthorDate: Mon Jul 15 17:11:17 2024 +0800 [MINOR][SQL][TESTS] Fix compilation warning `adaptation of an empty argument list by inserting () is deprecated` ### What changes were proposed in this pull request? The PR aims to fix the compilation warning `adaptation of an empty argument list by inserting () is deprecated`. ### Why are the changes needed? Fix a compilation warning. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually checked. Passes GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47350 from panbingkun/ParquetCommitterSuite_deprecated. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- .../spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala index eadd55bdc320..fb435e3639fd 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala @@ -116,7 +116,7 @@ class ParquetCommitterSuite extends SparkFunSuite with SQLTestUtils test("SPARK-48804: Fail fast on unloadable or invalid committers") { Seq("invalid", getClass.getName).foreach { committer => val e = intercept[IllegalArgumentException] { -withSQLConf(SQLConf.PARQUET_OUTPUT_COMMITTER_CLASS.key -> committer)() +withSQLConf(SQLConf.PARQUET_OUTPUT_COMMITTER_CLASS.key -> committer)(()) } assert(e.getMessage.contains(classOf[OutputCommitter].getName)) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4c7edd2a2048 [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug 4c7edd2a2048 is described below commit 4c7edd2a20480a8521fcc88a966b22619143aebd Author: panbingkun AuthorDate: Fri Jul 12 15:22:34 2024 +0800 [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug ### What changes were proposed in this pull request? The PR aims to refactor `HiveQuerySuite` and fix a bug. It includes: - use `getWorkspaceFilePath` so that `HiveQuerySuite` also runs successfully in the IDE. - make the test `lookup hive UDF in another thread` independent, so it no longer relies on the previous UT `current_database with multiple sessions`. - enable two tests: `non-boolean conditions in a CaseWhen are illegal` and `Dynamic partition folder layout`. ### Why are the changes needed? - Run successfully in the IDE. Before: https://github.com/apache/spark/assets/15246973/005fd49c-3edf-4e51-8223-097fd7a485bf After: https://github.com/apache/spark/assets/15246973/caedec72-be0c-4bb5-bc06-26cceef8b4b8 - Make the UT `lookup hive UDF in another thread` independent. When running only this UT, it actually failed with the following error: https://github.com/apache/spark/assets/15246973/ef9c260f-8c0d-4821-8233-d4d7ae13802a **Why?** Because the previous UT `current_database with multiple sessions` changed the current database and did not restore it after it finished running. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually tested. - Passes GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47293 from panbingkun/refactor_HiveQuerySuite. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- .../sql/hive/execution/HiveComparisonTest.scala| 5 +- .../spark/sql/hive/execution/HiveQuerySuite.scala | 249 +++-- 2 files changed, 135 insertions(+), 119 deletions(-) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala index f0feccb4f494..87e58bb8fa13 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala @@ -100,8 +100,9 @@ abstract class HiveComparisonTest extends SparkFunSuite with BeforeAndAfterAll { .map(name => new File(targetDir, s"$suiteName.$name")) /** The local directory with cached golden answer will be stored. */ - protected val answerCache = new File("src" + File.separator + "test" + -File.separator + "resources" + File.separator + "golden") + protected val answerCache = getWorkspaceFilePath( +"sql", "hive", "src", "test", "resources", "golden").toFile + if (!answerCache.exists) { answerCache.mkdir() } diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala index 5ccb7f0d1f84..24d1e24b30c8 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala @@ -18,7 +18,6 @@ package org.apache.spark.sql.hive.execution import java.io.File -import java.net.URI import java.nio.file.Files import java.sql.Timestamp @@ -679,15 +678,23 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd assert(actual === expected) } - // TODO: adopt this test when Spark SQL has the functionality / framework to report errors. 
- // See https://github.com/apache/spark/pull/1055#issuecomment-45820167 for a discussion. - ignore("non-boolean conditions in a CaseWhen are illegal") { + test("non-boolean conditions in a CaseWhen are illegal") { checkError( exception = intercept[AnalysisException] { sql("SELECT (CASE WHEN key > 2 THEN 3 WHEN 1 THEN 2 ELSE 0 END) FROM src").collect() }, - errorClass = null, - parameters = Map.empty) + errorClass = "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE", + parameters = Map( +"sqlExpr" -> "\"CASE WHEN (key > 2) THEN 3 WHEN 1 THEN 2 ELSE 0 END\"", +"paramIndex" -> "second", +"i
(spark) branch master updated: [SPARK-48866][SQL] Fix hints of valid charset in the error message of INVALID_PARAMETER_VALUE.CHARSET
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 261dbf4a9047 [SPARK-48866][SQL] Fix hints of valid charset in the error message of INVALID_PARAMETER_VALUE.CHARSET 261dbf4a9047 is described below commit 261dbf4a9047bc00271137b547341e02351106ed Author: Kent Yao AuthorDate: Thu Jul 11 18:59:10 2024 +0800 [SPARK-48866][SQL] Fix hints of valid charset in the error message of INVALID_PARAMETER_VALUE.CHARSET ### What changes were proposed in this pull request? This PR fixes the hints in the error message of INVALID_PARAMETER_VALUE.CHARSET. The current error message does not enumerate all valid charsets, e.g. UTF-32; this PR parameterizes the message to fix the issue. ### Why are the changes needed? Bugfix; a hint with charsets missing is not helpful. ### Does this PR introduce _any_ user-facing change? Yes, the error message changes. ### How was this patch tested? Modified tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes #47295 from yaooqinn/SPARK-48866. 
Authored-by: Kent Yao Signed-off-by: yangjie01 --- common/utils/src/main/resources/error/error-conditions.json | 2 +- .../scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 5 +++-- .../resources/sql-tests/results/ansi/string-functions.sql.out | 8 .../src/test/resources/sql-tests/results/string-functions.sql.out | 8 .../org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala | 8 +--- 5 files changed, 25 insertions(+), 6 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 02d1e63e380a..7f54a77c94a0 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -2584,7 +2584,7 @@ }, "CHARSET" : { "message" : [ - "expects one of the charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', but got ." + "expects one of the , but got ." ] }, "DATETIME_UNIT" : { diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index d524742e126e..bdd53219de40 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -42,7 +42,7 @@ import org.apache.spark.sql.catalyst.plans.JoinType import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.catalyst.plans.logical.statsEstimation.ValueInterval import org.apache.spark.sql.catalyst.trees.{Origin, TreeNode} -import org.apache.spark.sql.catalyst.util.{sideBySide, DateTimeUtils, FailFastMode, MapData} +import org.apache.spark.sql.catalyst.util.{sideBySide, CharsetProvider, DateTimeUtils, FailFastMode, MapData} import org.apache.spark.sql.connector.catalog.{CatalogNotFoundException, Table, TableProvider} import 
org.apache.spark.sql.connector.catalog.CatalogV2Implicits._ import org.apache.spark.sql.connector.expressions.Transform @@ -2742,7 +2742,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE messageParameters = Map( "functionName" -> toSQLId(functionName), "parameter" -> toSQLId("charset"), -"charset" -> charset)) +"charset" -> charset, +"charsets" -> CharsetProvider.VALID_CHARSETS.mkString(", "))) } def malformedCharacterCoding(functionName: String, charset: String): RuntimeException = { diff --git a/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out b/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out index da2fa9ca0c18..d4adec22c50f 100644 --- a/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out @@ -846,6 +846,7 @@ org.apache.spark.SparkIllegalArgumentException "sqlState" : "22023", "messageParameters" : { "charset" : "WINDOWS-1252", +"charsets" : "UTF-16LE, UTF-8, UTF-32, UTF-16BE, UTF-16, US-ASCII, ISO-8859-1", "functionName" : "`encode`", "parameter" : "`charset`" } @@ -863,6 +864,7 @@
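The shape of the SPARK-48866 fix is worth noting: instead of hardcoding the charset list in the message template, the hint is rendered from a single list of valid charsets (in Spark, `CharsetProvider.VALID_CHARSETS`). A minimal sketch of the idea — the list below is copied from the updated golden-file output, and the message text is a paraphrase, not Spark's exact template:

```python
# One source of truth for the supported charsets; the error hint is built
# from it, so newly supported charsets (e.g. UTF-32) show up automatically.
VALID_CHARSETS = ("UTF-16LE", "UTF-8", "UTF-32", "UTF-16BE", "UTF-16", "US-ASCII", "ISO-8859-1")

def charset_error(function_name: str, charset: str) -> str:
    return (f"{function_name} expects one of the {', '.join(VALID_CHARSETS)}, "
            f"but got {charset}.")

print(charset_error("`encode`", "WINDOWS-1252"))
```

With this pattern, adding a charset to the list updates every rendered error message; nothing else needs to change.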
(spark) branch master updated: [SPARK-48826][BUILD] Upgrade `fasterxml.jackson` to 2.17.2
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 65daff55f556 [SPARK-48826][BUILD] Upgrade `fasterxml.jackson` to 2.17.2 65daff55f556 is described below commit 65daff55f556ab48e06aa1f0536b627a8b479b9b Author: Wei Guo AuthorDate: Tue Jul 9 16:01:27 2024 +0800 [SPARK-48826][BUILD] Upgrade `fasterxml.jackson` to 2.17.2 ### What changes were proposed in this pull request? This PR aims to upgrade `fasterxml.jackson` from 2.17.1 to 2.17.2. ### Why are the changes needed? There are some bug fixes in [Databind](https://github.com/FasterXML/jackson-databind): [#4561](https://github.com/FasterXML/jackson-databind/issues/4561): Issues using jackson-databind 2.17.1 with Reactor (wrt DeserializerCache and ReentrantLock) [#4575](https://github.com/FasterXML/jackson-databind/issues/4575): StdDelegatingSerializer does not consider a Converter that may return null for a non-null input [#4577](https://github.com/FasterXML/jackson-databind/issues/4577): Cannot deserialize value of type java.math.BigDecimal from String "3." (not a valid representation) [#4595](https://github.com/FasterXML/jackson-databind/issues/4595): No way to explicitly disable wrapping in custom annotation processor [#4607](https://github.com/FasterXML/jackson-databind/issues/4607): MismatchedInput: No Object Id found for an instance of X to assign to property 'id' [#4610](https://github.com/FasterXML/jackson-databind/issues/4610): DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS does not work when used with Polymorphic type handling The full release notes of 2.17.2: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. 
Closes #47241 from wayneguow/upgrade_jackson. Authored-by: Wei Guo Signed-off-by: yangjie01 --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 14 +++--- pom.xml | 4 ++-- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 2c3bee92176b..5ec7cb541ee7 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -102,16 +102,16 @@ icu4j/75.1//icu4j-75.1.jar ini4j/0.5.4//ini4j-0.5.4.jar istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar ivy/2.5.2//ivy-2.5.2.jar -jackson-annotations/2.17.1//jackson-annotations-2.17.1.jar +jackson-annotations/2.17.2//jackson-annotations-2.17.2.jar jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar -jackson-core/2.17.1//jackson-core-2.17.1.jar -jackson-databind/2.17.1//jackson-databind-2.17.1.jar -jackson-dataformat-cbor/2.17.1//jackson-dataformat-cbor-2.17.1.jar -jackson-dataformat-yaml/2.17.1//jackson-dataformat-yaml-2.17.1.jar +jackson-core/2.17.2//jackson-core-2.17.2.jar +jackson-databind/2.17.2//jackson-databind-2.17.2.jar +jackson-dataformat-cbor/2.17.2//jackson-dataformat-cbor-2.17.2.jar +jackson-dataformat-yaml/2.17.2//jackson-dataformat-yaml-2.17.2.jar jackson-datatype-jdk8/2.17.0//jackson-datatype-jdk8-2.17.0.jar -jackson-datatype-jsr310/2.17.1//jackson-datatype-jsr310-2.17.1.jar +jackson-datatype-jsr310/2.17.2//jackson-datatype-jsr310-2.17.2.jar jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar -jackson-module-scala_2.13/2.17.1//jackson-module-scala_2.13-2.17.1.jar +jackson-module-scala_2.13/2.17.2//jackson-module-scala_2.13-2.17.2.jar jakarta.annotation-api/2.0.0//jakarta.annotation-api-2.0.0.jar jakarta.inject-api/2.0.1//jakarta.inject-api-2.0.1.jar jakarta.servlet-api/5.0.0//jakarta.servlet-api-5.0.0.jar diff --git a/pom.xml b/pom.xml index 0ebe6ab8c580..b2dd22cb0c0a 100644 --- a/pom.xml +++ b/pom.xml @@ -180,8 +180,8 @@ true true 1.9.13 -2.17.1 - 2.17.1 +2.17.2 + 2.17.2 2.3.1 3.0.2 
1.1.10.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48720][SQL] Align the command `ALTER TABLE ... UNSET TBLPROPERTIES ...` in v1 and v2
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 73126b17515a [SPARK-48720][SQL] Align the command `ALTER TABLE ... UNSET TBLPROPERTIES ...` in v1 and v2 73126b17515a is described below commit 73126b17515adc73dbb63f199fd641c330171d02 Author: panbingkun AuthorDate: Mon Jul 8 11:40:21 2024 +0800 [SPARK-48720][SQL] Align the command `ALTER TABLE ... UNSET TBLPROPERTIES ...` in v1 and v2 ### What changes were proposed in this pull request? The PR aims to: - align the command `ALTER TABLE ... UNSET TBLPROPERTIES ...` in v1 and v2 (this means that in v1, regardless of whether `IF EXISTS` is specified, unsetting a non-existent property is ignored and no longer fails). - update the description of `ALTER TABLE ... UNSET TBLPROPERTIES ...` in the doc `docs/sql-ref-syntax-ddl-alter-table.md`. - unify the v1 and v2 `ALTER TABLE ... UNSET TBLPROPERTIES ...` tests. - add the following scenarios to the `ALTER TABLE ... SET TBLPROPERTIES ...` tests: A. `table to alter does not exist`; B. `alter table set reserved properties`. ### Why are the changes needed? - align the command `ALTER TABLE ... UNSET TBLPROPERTIES ...` in v1 and v2 to avoid confusing end users. - improve test coverage. - align with other similar tests, e.g. `AlterTableSetTblProperties*`. ### Does this PR introduce _any_ user-facing change? Yes: in v1, regardless of whether `IF EXISTS` is specified, unsetting a non-existent property is ignored and no longer fails. ### How was this patch tested? Updated some UTs & passes GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47097 from panbingkun/alter_unset_table. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- .../src/main/resources/error/error-conditions.json | 6 - docs/sql-ref-syntax-ddl-alter-table.md | 20 ++- .../spark/sql/errors/QueryCompilationErrors.scala | 10 -- .../spark/sql/catalyst/parser/DDLParserSuite.scala | 19 --- .../apache/spark/sql/execution/command/ddl.scala | 8 -- .../AlterTableSetTblPropertiesSuiteBase.scala | 80 +-- .../AlterTableUnsetTblPropertiesParserSuite.scala | 65 + .../AlterTableUnsetTblPropertiesSuiteBase.scala| 149 + .../sql/execution/command/DDLParserSuite.scala | 12 -- .../spark/sql/execution/command/DDLSuite.scala | 67 - .../v1/AlterTableSetTblPropertiesSuite.scala | 4 + ...ala => AlterTableUnsetTblPropertiesSuite.scala} | 17 ++- .../v2/AlterTableSetTblPropertiesSuite.scala | 4 + ...ala => AlterTableUnsetTblPropertiesSuite.scala} | 10 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 26 +--- .../AlterTableUnsetTblPropertiesSuite.scala| 27 16 files changed, 353 insertions(+), 171 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 45b922b88063..06f8d3a78252 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -4275,12 +4275,6 @@ ], "sqlState" : "42883" }, - "UNSET_NONEXISTENT_PROPERTIES" : { -"message" : [ - "Attempted to unset non-existent properties [] in table ." -], -"sqlState" : "42K0J" - }, "UNSUPPORTED_ADD_FILE" : { "message" : [ "Don't support add file." diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 566e73da2151..31eaf659b5c7 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -236,21 +236,29 @@ ALTER TABLE table_identifier DROP [ IF EXISTS ] partition_spec [PURGE] ### SET AND UNSET - SET TABLE PROPERTIES + SET PROPERTIES `ALTER TABLE SET` command is used for setting the table properties. 
If a particular property was already set, this overrides the old value with the new one. -`ALTER TABLE UNSET` is used to drop the table property. - # Syntax ```sql --- Set Table Properties +-- Set Properties ALTER TABLE table_identifier SET TBLPROPERTIES ( key1 = val1, key2 = val2, ... ) +``` + + UNSET PROPERTIES + +`ALTER TABLE UNSET` command is used to drop the table property. --- Unset Table Properties -ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ) +**Note:** If the specified property key does not exist, whether specify `IF EXISTS` or not, the command will ignore it and finally suc
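The aligned SPARK-48720 semantics reduce to one rule: unsetting a property that does not exist is a no-op, whether or not `IF EXISTS` is written. A standalone sketch of that rule over a plain dict (illustrative only, not Spark's catalog code):

```python
def unset_properties(props: dict, keys: list) -> dict:
    # Keys that are absent from `props` are silently ignored, matching the
    # aligned v1/v2 behavior of ALTER TABLE ... UNSET TBLPROPERTIES.
    to_drop = set(keys)
    return {k: v for k, v in props.items() if k not in to_drop}

table_props = {"owner": "alice", "retention": "30d"}
# "no-such-key" does not exist, but the operation still succeeds.
print(unset_properties(table_props, ["retention", "no-such-key"]))  # {'owner': 'alice'}
```

This is the behavior v2 already had; the change brings v1 in line instead of raising the removed `UNSET_NONEXISTENT_PROPERTIES` error.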
(spark) branch master updated (f1eca903f5c2 -> 489e32535aad)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f1eca903f5c2 [SPARK-48719][SQL] Fix the calculation bug of `RegrSlope` & `RegrIntercept` when the first parameter is null add 489e32535aad [SPARK-48177][BUILD][FOLLOWUP] Update parquet version in `sql-data-sources-parquet.md` doc No new revisions were added by this update. Summary of changes: docs/sql-data-sources-parquet.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48805][SQL][ML][SS][AVRO][EXAMPLES] Replace calls to bridged APIs based on `SparkSession#sqlContext` with `SparkSession` API
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 54b75582506d [SPARK-48805][SQL][ML][SS][AVRO][EXAMPLES] Replace calls to bridged APIs based on `SparkSession#sqlContext` with `SparkSession` API 54b75582506d is described below commit 54b75582506d0e58af7f500b9d284ab7222e98f0 Author: yangjie01 AuthorDate: Thu Jul 4 19:27:22 2024 +0800 [SPARK-48805][SQL][ML][SS][AVRO][EXAMPLES] Replace calls to bridged APIs based on `SparkSession#sqlContext` with `SparkSession` API ### What changes were proposed in this pull request? In the internal code of Spark, there are instances where, despite having a SparkSession instance, the bridged APIs based on SparkSession#sqlContext are still used. Therefore, this PR makes some simplifications in this regard: 1. `SparkSession#sqlContext#read` -> `SparkSession#read` ```scala /** * Returns a [[DataFrameReader]] that can be used to read non-streaming data in as a * `DataFrame`. * {{{ * sqlContext.read.parquet("/path/to/file.parquet") * sqlContext.read.schema(schema).json("/path/to/file.json") * }}} * * group genericdata * since 1.4.0 */ def read: DataFrameReader = sparkSession.read ``` 2. `SparkSession#sqlContext#setConf` -> `SparkSession#conf#set` ```scala /** * Set the given Spark SQL configuration property. * * group config * since 1.0.0 */ def setConf(key: String, value: String): Unit = { sparkSession.conf.set(key, value) } ``` 3. `SparkSession#sqlContext#getConf` -> `SparkSession#conf#get` ```scala /** * Return the value of Spark SQL configuration property for the given key. * * group config * since 1.0.0 */ def getConf(key: String): String = { sparkSession.conf.get(key) } ``` 4. `SparkSession#sqlContext#createDataFrame` -> `SparkSession#createDataFrame` ```scala /** * Creates a DataFrame from an RDD of Product (e.g. case classes, tuples). 
* * group dataframes * since 1.3.0 */ def createDataFrame[A <: Product : TypeTag](rdd: RDD[A]): DataFrame = { sparkSession.createDataFrame(rdd) } ``` 5. `SparkSession#sqlContext#sessionState` -> `SparkSession#sessionState` ```scala private[sql] def sessionState: SessionState = sparkSession.sessionState ``` 6. `SparkSession#sqlContext#sharedState` -> `SparkSession#sharedState` ```scala private[sql] def sharedState: SharedState = sparkSession.sharedState ``` 7. `SparkSession#sqlContext#streams` -> `SparkSession#streams` ```scala /** * Returns a `StreamingQueryManager` that allows managing all the * [[org.apache.spark.sql.streaming.StreamingQuery StreamingQueries]] active on `this` context. * * since 2.0.0 */ def streams: StreamingQueryManager = sparkSession.streams ``` 8. `SparkSession#sqlContext#uncacheTable` -> `SparkSession#catalog#uncacheTable` ```scala /** * Removes the specified table from the in-memory cache. * group cachemgmt * since 1.3.0 */ def uncacheTable(tableName: String): Unit = { sparkSession.catalog.uncacheTable(tableName) } ``` ### Why are the changes needed? Decrease the nesting levels of API calls ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Actions - Manually checked `SparkHiveExample` ### Was this patch authored or co-authored using generative AI tooling? No Closes #47210 from LuciferYang/session.sqlContext. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- .../src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala | 4 ++-- .../apache/spark/examples/sql/hive/SparkHiveExample.scala| 4 ++-- .../apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala | 2 +- .../sql/execution/streaming/FlatMapGroupsWithStateExec.scala | 12 ++-- .../sql/execution/streaming/TransformWithStateExec.scala | 12 ++-- .../test/scala/org/apache/spark/sql/CachedTableSuite.scala | 4 ++-- .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 2 +- .../apache/spark/sql/hive/HiveParquetMetastoreSuite.scala| 2 +- .../org/apache/spark/sql/hive/HiveUDFDynamicLoadSuite.scala | 2 +- .../spark/sql/hive/PartitionedTablePerfStatsSuite.scala | 2 +- 10 files changed, 23 insertions(+), 23 deletions(-) diff --git a/connector/avro/src/test/scala/org/apa
(spark) branch master updated: [SPARK-48765][DEPLOY] Enhance default value evaluation for SPARK_IDENT_STRING
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bc16b24c7a32 [SPARK-48765][DEPLOY] Enhance default value evaluation for SPARK_IDENT_STRING bc16b24c7a32 is described below commit bc16b24c7a328cf103b003b1c4a5cf16832cf2bd Author: Cheng Pan AuthorDate: Mon Jul 1 19:49:31 2024 +0800 [SPARK-48765][DEPLOY] Enhance default value evaluation for SPARK_IDENT_STRING ### What changes were proposed in this pull request? This PR follows Hadoop[1] to enhance the `SPARK_IDENT_STRING` default value evaluation. [1] https://github.com/apache/hadoop/blob/rel/release-3.4.0/hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh#L893-L896 ### Why are the changes needed? I found that in some cases `$USER` is not available, so the auto-generated log and pid file names are strange. For example, there is no `$USER` when logged in to a Docker container: ``` $ docker run -t -i ubuntu:latest root@1dbeaefd6cd4:/# echo $USER root@1dbeaefd6cd4:/# id -nu root root@1dbeaefd6cd4:/# exit ``` ### Does this PR introduce _any_ user-facing change? Yes, it affects log/pid file names. ### How was this patch tested? Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No Closes #47160 from pan3793/SPARK-48765. Authored-by: Cheng Pan Signed-off-by: yangjie01 --- sbin/spark-daemon.sh | 4 1 file changed, 4 insertions(+) diff --git a/sbin/spark-daemon.sh b/sbin/spark-daemon.sh index 28d205f03e0f..b7233e6e9bf3 100755 --- a/sbin/spark-daemon.sh +++ b/sbin/spark-daemon.sh @@ -98,6 +98,10 @@ spark_rotate_log () . 
"${SPARK_HOME}/bin/load-spark-env.sh" if [ "$SPARK_IDENT_STRING" = "" ]; then + # if for some reason the shell doesn't have $USER defined + # (e.g., ssh'd in to execute a command) + # let's get the effective username and use that + USER=${USER:-$(id -nu)} export SPARK_IDENT_STRING="$USER" fi - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
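The four added lines implement the shell default `USER=${USER:-$(id -nu)}`: keep `$USER` when it is set and non-empty, otherwise fall back to the effective username. The same logic, sketched outside the shell (`ident_string` is a hypothetical helper for illustration, not Spark code):

```python
def ident_string(env: dict, effective_user: str) -> str:
    # Mirrors ${USER:-$(id -nu)}: the fallback applies both when USER is
    # unset and when it is set to the empty string, because "" is falsy.
    return env.get("USER") or effective_user

print(ident_string({}, "root"))                 # root (e.g. inside `docker run`)
print(ident_string({"USER": "alice"}, "root"))  # alice
```

Note that `:-` (rather than `-`) in the shell expansion is what makes the empty-string case fall back too, which the `or` above reproduces.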
(spark) branch master updated: [SPARK-48691][BUILD] Upgrade scalatest related dependencies to the 3.2.19 series
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7f5f96cad224 [SPARK-48691][BUILD] Upgrade scalatest related dependencies to the 3.2.19 series 7f5f96cad224 is described below commit 7f5f96cad22464e02679ab1a1c6eb08b9da039ef Author: Wei Guo AuthorDate: Wed Jun 26 22:31:08 2024 +0800 [SPARK-48691][BUILD] Upgrade scalatest related dependencies to the 3.2.19 series ### What changes were proposed in this pull request? This PR aims to upgrade: - `scalatest` to 3.2.19 - `mockito` to 5.12.0 - `selenium` to 4.21 - `bytebuddy` to 1.14.17 ### Why are the changes needed? Full release notes: scalatest: https://github.com/scalatest/scalatest/releases/tag/release-3.2.19 mockito: https://github.com/mockito/mockito/releases/tag/v5.12.0 https://github.com/mockito/mockito/releases/tag/v5.11.0 selenium: https://github.com/SeleniumHQ/selenium/compare/selenium-4.17.0...selenium-4.21.0 bytebuddy: https://github.com/raphw/byte-buddy/compare/byte-buddy-1.14.11...byte-buddy-1.14.17 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? 
- Pass GitHub Actions - Manual tests: ``` build/sbt clean -Dguava.version=33.0.0-jre -Dspark.test.webdriver.chrome.driver=/opt/homebrew/bin/chromedriver -Dtest.default.exclude.tags="org.apache.spark.tags.ExtendedLevelDBTest" -Phive -Phive-thriftserver \ "core/testOnly *HistoryServerSuite" ``` ``` build/sbt clean -Dguava.version=33.0.0-jre -Dspark.test.webdriver.chrome.driver=/opt/homebrew/bin/chromedriver -Dtest.default.exclude.tags="org.apache.spark.tags.ExtendedLevelDBTest" -Phive -Phive-thriftserver \ "core/testOnly *UISeleniumSuite" ``` ``` build/sbt clean -Dguava.version=33.0.0-jre -Dspark.test.webdriver.chrome.driver=/opt/homebrew/bin/chromedriver -Dtest.default.exclude.tags="org.apache.spark.tags.ExtendedLevelDBTest" -Phive -Phive-thriftserver \ "sql/testOnly *UISeleniumSuite" ``` ``` build/sbt clean -Dguava.version=33.0.0-jre -Dspark.test.webdriver.chrome.driver=/opt/homebrew/bin/chromedriver -Dtest.default.exclude.tags="org.apache.spark.tags.ExtendedLevelDBTest" -Phive -Phive-thriftserver \ "streaming/testOnly *UISeleniumSuite" ``` ``` build/sbt clean -Dguava.version=33.0.0-jre -Dspark.test.webdriver.chrome.driver=/opt/homebrew/bin/chromedriver -Dtest.default.exclude.tags="org.apache.spark.tags.ExtendedLevelDBTest" -Phive -Phive-thriftserver \ "hive-thriftserver/testOnly *UISeleniumSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47065 from wayneguow/upgrade_mockito. 
Authored-by: Wei Guo Signed-off-by: yangjie01 --- pom.xml | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/pom.xml b/pom.xml index b2e7bba3ec75..5e181cc38d31 100644 --- a/pom.xml +++ b/pom.xml @@ -206,8 +206,8 @@ 0.16.0 4.13.1 1.1 -4.17.0 -4.17.0 +4.21.0 +4.21.0 3.1.0 1.1.0 1.8.0 @@ -420,12 +420,12 @@ org.scalatestplus - mockito-5-10_${scala.binary.version} + mockito-5-12_${scala.binary.version} test org.scalatestplus - selenium-4-17_${scala.binary.version} + selenium-4-21_${scala.binary.version} test @@ -1156,25 +1156,25 @@ org.scalatest scalatest_${scala.binary.version} -3.2.18 +3.2.19 test org.scalatestplus scalacheck-1-18_${scala.binary.version} -3.2.18.0 +3.2.19.0 test org.scalatestplus -mockito-5-10_${scala.binary.version} -3.2.18.0 +mockito-5-12_${scala.binary.version} +3.2.19.0 test org.scalatestplus -selenium-4-17_${scala.binary.version} -3.2.18.0 +selenium-4-21_${scala.binary.version} +3.2.19.0 test @@ -1186,19 +1186,19 @@ org.mockito mockito-core -5.10.0 +5.12.0 test net.bytebuddy byte-buddy -1.14.11 +1.14.17 test net.bytebuddy byte-buddy-agent -1.14.11 +1.14.17 test - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48724][SQL][TESTS] Fix incorrect conf settings of `ignoreCorruptFiles` related test cases in `ParquetQuerySuite`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a474b88aa2eb [SPARK-48724][SQL][TESTS] Fix incorrect conf settings of `ignoreCorruptFiles` related test cases in `ParquetQuerySuite` a474b88aa2eb is described below commit a474b88aa2ebb2af17273975f2f91584c0ce9af1 Author: Wei Guo AuthorDate: Wed Jun 26 19:30:52 2024 +0800 [SPARK-48724][SQL][TESTS] Fix incorrect conf settings of `ignoreCorruptFiles` related test cases in `ParquetQuerySuite` ### What changes were proposed in this pull request? This PR aims to fix incorrect conf settings of `ignoreCorruptFiles` related test cases in `ParquetQuerySuite`. The inner `withSQLConf(SQLConf.IGNORE_CORRUPT_FILES.key -> "false")` will overwrite the outer configuration, making it impossible to test the situation where `sqlConf` is true. ### Why are the changes needed? Fix test coverage logic. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47088 from wayneguow/parquet_query_suite. 
Authored-by: Wei Guo Signed-off-by: yangjie01 --- .../datasources/parquet/ParquetQuerySuite.scala| 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala index a329d3fdc3cb..4d413efe5043 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala @@ -369,16 +369,14 @@ abstract class ParquetQuerySuite extends QueryTest with ParquetTest with SharedS } withSQLConf(SQLConf.IGNORE_CORRUPT_FILES.key -> sqlConf) { -withSQLConf(SQLConf.IGNORE_CORRUPT_FILES.key -> "false") { - val exception = intercept[SparkException] { -testIgnoreCorruptFiles(options) - }.getCause - assert(exception.getMessage().contains("is not a Parquet file")) - val exception2 = intercept[SparkException] { -testIgnoreCorruptFilesWithoutSchemaInfer(options) - }.getCause - assert(exception2.getMessage().contains("is not a Parquet file")) -} +val exception = intercept[SparkException] { + testIgnoreCorruptFiles(options) +}.getCause +assert(exception.getMessage().contains("is not a Parquet file")) +val exception2 = intercept[SparkException] { + testIgnoreCorruptFilesWithoutSchemaInfer(options) +}.getCause +assert(exception2.getMessage().contains("is not a Parquet file")) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
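The bug pattern fixed above — an inner config block silently overriding the outer one — is easy to reproduce without Spark. Below is a minimal, self-contained sketch; `withConf` is a hypothetical stand-in for Spark's `withSQLConf`, not the real helper:

```scala
import scala.collection.mutable

object NestedConfDemo {
  val conf = mutable.Map[String, String]()

  // Minimal stand-in for Spark's withSQLConf: set a key, run the body,
  // then restore the previous value (or remove the key if it was unset).
  def withConf(key: String, value: String)(body: => Unit): Unit = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally previous match {
      case Some(v) => conf(key) = v
      case None    => conf.remove(key)
    }
  }

  def main(args: Array[String]): Unit = {
    var observed = ""
    withConf("ignoreCorruptFiles", "true") {
      // The buggy test shape: the inner block always pins "false",
      // so the outer "true" setting is never actually exercised.
      withConf("ignoreCorruptFiles", "false") {
        observed = conf("ignoreCorruptFiles")
      }
    }
    // The inner value wins regardless of the outer parameter.
    assert(observed == "false")
  }
}
```

Because the inner block always forced `"false"`, the `"true"` branch of the parameterized test never ran — exactly the coverage gap the patch closes by deleting the inner `withSQLConf`.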
(spark) branch master updated: [SPARK-48692][BUILD] Upgrade `rocksdbjni` to 9.2.1
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5112e5887714 [SPARK-48692][BUILD] Upgrade `rocksdbjni` to 9.2.1 5112e5887714 is described below commit 5112e58877147c7fb169d2c53845ce00de127866 Author: panbingkun AuthorDate: Tue Jun 25 11:45:42 2024 +0800 [SPARK-48692][BUILD] Upgrade `rocksdbjni` to 9.2.1 ### What changes were proposed in this pull request? This PR aims to upgrade `rocksdbjni` from `8.11.4` to `9.2.1`. ### Why are the changes needed? The full release notes are as follows: https://github.com/facebook/rocksdb/releases/tag/v9.2.1 https://github.com/facebook/rocksdb/releases/tag/v9.1.1 https://github.com/facebook/rocksdb/releases/tag/v9.1.0 https://github.com/facebook/rocksdb/releases/tag/v9.0.1 https://github.com/facebook/rocksdb/releases/tag/v9.0.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46146 from panbingkun/test_rocksdbjni_9. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml| 2 +- ...StoreBasicOperationsBenchmark-jdk21-results.txt | 120 ++--- .../StateStoreBasicOperationsBenchmark-results.txt | 120 ++--- 4 files changed, 122 insertions(+), 122 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index edaf1c494d13..b99ec346e6ab 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -244,7 +244,7 @@ parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar pickle/1.5//pickle-1.5.jar py4j/0.10.9.7//py4j-0.10.9.7.jar remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar -rocksdbjni/8.11.4//rocksdbjni-8.11.4.jar +rocksdbjni/9.2.1//rocksdbjni-9.2.1.jar scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar scala-compiler/2.13.14//scala-compiler-2.13.14.jar scala-library/2.13.14//scala-library-2.13.14.jar diff --git a/pom.xml b/pom.xml index 85fc2aefdf90..a6dc3a60d89c 100644 --- a/pom.xml +++ b/pom.xml @@ -691,7 +691,7 @@ org.rocksdb rocksdbjni -8.11.4 +9.2.1 ${leveldbjni.group} diff --git a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt index e563e60a8f48..6a42c7b283b7 100644 --- a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt +++ b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt @@ -2,143 +2,143 @@ put rows -OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure AMD EPYC 7763 64-Core Processor putting 1 rows (1 rows to overwrite - rate 100): Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative --- -In-memory 10 12 1 1.01023.2 1.0X -RocksDB (trackTotalNumberOfRows: true) 42 44 2 0.24197.6 0.2X -RocksDB (trackTotalNumberOfRows: false) 16 17 1 0.61591.7 0.6X +In-memory 10 11 1 1.0 968.0 1.0X +RocksDB 
(trackTotalNumberOfRows: true) 40 42 2 0.24033.5 0.2X +RocksDB (trackTotalNumberOfRows: false) 15 16 1 0.71502.0 0.6X -OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure AMD EPYC 7763 64-Core Processor putting 1 rows (5000 rows to overwrite - rate 50): Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative - -In-memory 10 11 1 1.01009.0 1.0X -RocksDB (trackTotalNumberOfRows: true
(spark) branch master updated: [SPARK-48661][BUILD] Upgrade `RoaringBitmap` to 1.1.0
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 67c71874bcb2 [SPARK-48661][BUILD] Upgrade `RoaringBitmap` to 1.1.0 67c71874bcb2 is described below commit 67c71874bcb2ce6fe2f68e0e47cab72e0d37a687 Author: Wei Guo AuthorDate: Fri Jun 21 10:31:43 2024 +0800 [SPARK-48661][BUILD] Upgrade `RoaringBitmap` to 1.1.0 ### What changes were proposed in this pull request? This PR aims to upgrade `RoaringBitmap` to 1.1.0. ### Why are the changes needed? There are some bug fixes in `RoaringBitmap` 1.1.0: Fix RunContainer#contains(BitmapContainer) (https://github.com/RoaringBitmap/RoaringBitmap/issues/721) by LeeWorrall in https://github.com/RoaringBitmap/RoaringBitmap/pull/722 Fix ArrayContainer#contains(RunContainer) (https://github.com/RoaringBitmap/RoaringBitmap/issues/723) by LeeWorrall in https://github.com/RoaringBitmap/RoaringBitmap/pull/724 Full release note: https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.1.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47020 from wayneguow/upgrade_RoaringBitmap. 
Authored-by: Wei Guo Signed-off-by: yangjie01 --- core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt | 8 core/benchmarks/MapStatusesConvertBenchmark-results.txt | 8 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt index 71c13a0fc5ad..a15442496b24 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1022-azure AMD EPYC 7763 64-Core Processor MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500664678 14 0.0 664277160.0 1.0X -Num Maps: 5 Fetch partitions:1000 1597 1616 29 0.0 1596794881.0 0.4X -Num Maps: 5 Fetch partitions:1500 2402 2421 18 0.0 2401654923.0 0.3X +Num Maps: 5 Fetch partitions:500674685 12 0.0 673772738.0 1.0X +Num Maps: 5 Fetch partitions:1000 1579 1590 12 0.0 1579383970.0 0.4X +Num Maps: 5 Fetch partitions:1500 2435 2472 37 0.0 2434530380.0 0.3X diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-results.txt index a7379aa0d4af..b9f36af4a653 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1022-azure AMD EPYC 7763 64-Core Processor MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500699715 14 0.0 698750825.0 1.0X -Num Maps: 5 Fetch partitions:1000 
1653 1676 36 0.0 1653453370.0 0.4X -Num Maps: 5 Fetch partitions:1500 2580 2613 30 0.0 2579900318.0 0.3X +Num Maps: 5 Fetch partitions:500703716 11 0.0 703103575.0 1.0X +Num Maps: 5 Fetch partitions:1000 1707 1723 14 0.0 1707060398.0 0.4X +Num Maps
(spark) branch master updated: [SPARK-47148][SQL][FOLLOWUP] Use broadcast hint to make test more stable
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 58701d811d95 [SPARK-47148][SQL][FOLLOWUP] Use broadcast hint to make test more stable 58701d811d95 is described below commit 58701d811d95918ac4a73d8fb260c46ccbf25bdd Author: Wenchen Fan AuthorDate: Tue Jun 18 16:10:19 2024 +0800 [SPARK-47148][SQL][FOLLOWUP] Use broadcast hint to make test more stable ### What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/45234 to make the test more stable by using broadcast hint. ### Why are the changes needed? test improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? no Closes #47007 from cloud-fan/follow. Authored-by: Wenchen Fan Signed-off-by: yangjie01 --- .../apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala index 4e1e171c8a84..d6fd45269ce6 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala @@ -961,7 +961,7 @@ class AdaptiveQueryExecSuite spark.range(10).toDF("col1").createTempView("t1") spark.range(5).coalesce(2).toDF("col2").createTempView("t2") spark.range(15).toDF("col3").filter(Symbol("col3") >= 2).createTempView("t3") - sql("SELECT * FROM (SELECT /*+ BROADCAST(t2) */ * FROM t1 " + + sql("SELECT /*+ BROADCAST(t3) */ * FROM (SELECT /*+ BROADCAST(t2) */ * FROM t1 " + "INNER JOIN t2 ON t1.col1 = 
t2.col2) t JOIN t3 ON t.col1 = t3.col3;") } withTempView("t1", "t2", "t3") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48585][SQL] Make `built-in` JdbcDialect's method `classifyException` throw out the `original` exception
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a3feffdff9cd [SPARK-48585][SQL] Make `built-in` JdbcDialect's method `classifyException` throw out the `original` exception a3feffdff9cd is described below commit a3feffdff9cd17e0435ac5620731093f40d1a3bf Author: panbingkun AuthorDate: Tue Jun 18 14:50:34 2024 +0800 [SPARK-48585][SQL] Make `built-in` JdbcDialect's method `classifyException` throw out the `original` exception ### What changes were proposed in this pull request? This PR aims to make the built-in JdbcDialects' `classifyException` method throw the `original` exception. ### Why are the changes needed? As discussed in https://github.com/apache/spark/pull/46912#discussion_r1630876576, the following code: https://github.com/apache/spark/blob/df4156aa3217cf0f58b4c6cbf33c967bb43f7155/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala#L746-L751 has lost the original cause of the error; let's correct it. ### Does this PR introduce _any_ user-facing change? Yes, more accurate error conditions for end users. ### How was this patch tested? - Manual tests. - Updated existing UTs & passed GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46937 from panbingkun/improve_JDBCTableCatalog. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 26 -- .../apache/spark/sql/jdbc/AggregatedDialect.scala | 3 ++- .../org/apache/spark/sql/jdbc/DB2Dialect.scala | 2 +- .../apache/spark/sql/jdbc/DatabricksDialect.scala | 2 +- .../org/apache/spark/sql/jdbc/DerbyDialect.scala | 2 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 2 +- .../org/apache/spark/sql/jdbc/JdbcDialects.scala | 17 ++ .../apache/spark/sql/jdbc/MsSqlServerDialect.scala | 2 +- .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 2 +- .../org/apache/spark/sql/jdbc/OracleDialect.scala | 2 +- .../apache/spark/sql/jdbc/PostgresDialect.scala| 3 ++- .../apache/spark/sql/jdbc/SnowflakeDialect.scala | 2 +- .../apache/spark/sql/jdbc/TeradataDialect.scala| 2 +- .../v2/jdbc/JDBCTableCatalogSuite.scala| 16 ++--- 14 files changed, 52 insertions(+), 31 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala index c78e87d0b846..88ba00a8a1ae 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala @@ -83,14 +83,16 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu def testCreateTableWithProperty(tbl: String): Unit = {} - def checkErrorFailedLoadTable(e: AnalysisException, tbl: String): Unit = { -checkError( + private def checkErrorFailedJDBC( + e: AnalysisException, + errorClass: String, + tbl: String): Unit = { +checkErrorMatchPVals( exception = e, - errorClass = "FAILED_JDBC.UNCLASSIFIED", + errorClass = errorClass, parameters = Map( -"url" -> "jdbc:", -"message" -> s"Failed to load table: $tbl" - ) +"url" -> "jdbc:.*", +"tableName" -> s"`$tbl`") ) } @@ -132,7 +134,7 @@ private[v2] trait V2JDBCTest extends 
SharedSparkSession with DockerIntegrationFu val e = intercept[AnalysisException] { sql(s"ALTER TABLE $catalogName.not_existing_table ADD COLUMNS (C4 STRING)") } -checkErrorFailedLoadTable(e, "not_existing_table") +checkErrorFailedJDBC(e, "FAILED_JDBC.LOAD_TABLE", "not_existing_table") } test("SPARK-33034: ALTER TABLE ... drop column") { @@ -154,7 +156,7 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu val e = intercept[AnalysisException] { sql(s"ALTER TABLE $catalogName.not_existing_table DROP COLUMN C1") } -checkErrorFailedLoadTable(e, "not_existing_table") +checkErrorFailedJDBC(e, "FAILED_JDBC.LOAD_TABLE", "not_existing_table") } test("SPARK-33034: ALTER TABLE ... update column type") { @@ -170,7 +172,7 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu val e = intercept[AnalysisException] { sql(s"ALTER TABLE $catalogName
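The essence of the fix above is wrapping a driver error without discarding the original exception. A minimal sketch of the pattern, using plain JDK exceptions rather than Spark's `AnalysisException` (the `classify` helper below is hypothetical, not the real `JdbcDialect.classifyException` signature):

```scala
object CauseDemo {
  // Hypothetical stand-in for a dialect's classifyException: wrap a
  // low-level driver error in a higher-level exception, keeping the
  // original as the cause instead of dropping it.
  def classify(description: String)(thunk: => Unit): Unit =
    try thunk
    catch {
      case e: Exception =>
        // Passing `e` as the cause is the essence of the fix: without it,
        // callers and error-matching tests lose the driver's message.
        throw new RuntimeException(description, e)
    }

  def main(args: Array[String]): Unit = {
    val wrapped =
      try {
        classify("Failed to load table") {
          throw new IllegalStateException("table not found")
        }
        sys.error("unreachable: classify always rethrows here")
      } catch {
        case e: RuntimeException => e
      }
    // Both the high-level description and the original cause survive.
    assert(wrapped.getMessage == "Failed to load table")
    assert(wrapped.getCause.getMessage == "table not found")
  }
}
```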
(spark) branch master updated: [SPARK-48615][SQL] Perf improvement for parsing hex string
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 257a7883f215 [SPARK-48615][SQL] Perf improvement for parsing hex string 257a7883f215 is described below commit 257a7883f2150e037eb05f8c7a84103ad9a1 Author: Kent Yao AuthorDate: Mon Jun 17 09:56:05 2024 +0800 [SPARK-48615][SQL] Perf improvement for parsing hex string ### What changes were proposed in this pull request? Currently, we use two hexadecimal string parsing functions. One uses Apache Commons Codec's `Hex` for X-prefixed literal parsing, and the other uses a builtin implementation for the `unhex` function. I did a benchmark comparing them with `java.util.HexFormat`, which was introduced in JDK 17.
```
OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
Apple M2 Max
Cardinality 100:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Apache                     5050          5100         86        0.2       5050.1      1.0X
Spark                      3822          3840         30        0.3       3821.6      1.3X
Java                       2462          2522         87        0.4       2462.1      2.1X

Cardinality 200:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Apache                    10020         10828       1154        0.2       5010.1      1.0X
Spark                      6875          6966        144        0.3       3437.7      1.5X
Java                       4999          5092         89        0.4       2499.3      2.0X

Cardinality 400:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Apache                    20090         20433        433        0.2       5022.5      1.0X
Spark                     13389         13620        229        0.3       3347.2      1.5X
Java                      10023         10069         42        0.4       2505.6      2.0X

Cardinality 800:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Apache                    40277         43453       2755        0.2       5034.7      1.0X
Spark                     27145         27380        311        0.3       3393.1      1.5X
Java                      19980         21198       1473        0.4       2497.5      2.0X
```
The results indicate that the speed ranking is Commons Codec < builtin < Java, increasing by ~50% at each step. 
In this PR, we replace these two with the Java 17 API ### Why are the changes needed? performance enhance ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? benchmarking existing unit tests in org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite ### Was this patch authored or co-authored using generative AI tooling? no Closes #46972 from yaooqinn/SPARK-48615. Authored-by: Kent Yao Signed-off-by: yangjie01 --- .../benchmarks/HexBenchmark-jdk21-results.txt | 14 sql/catalyst/benchmarks/HexBenchmark-results.txt | 14 .../sql/catalyst/expressions/mathExpressions.scala | 94 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 7 +- .../sql/catalyst/expressions/HexBenchmark.scala| 90 + 5 files changed, 158 insertions(+), 61 deletions(-) diff --git a/sql/catalyst/benchmarks/HexBenchmark-jdk21-results.txt b/sql/catalyst/benchmarks/HexBenchmark-jdk21-results.txt new file mode 100644 index ..afa3efa7a919 --- /dev/null +++ b/sql/ca
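The `java.util.HexFormat` API (JDK 17+) that this patch migrates to can be exercised directly. A small sketch of both directions, plus the odd-length error case a parser must handle:

```scala
import java.util.HexFormat

object HexDemo {
  def main(args: Array[String]): Unit = {
    val hex = HexFormat.of()

    // unhex direction: parse a hex string into raw bytes.
    val bytes = hex.parseHex("537061726b")
    assert(new String(bytes, "UTF-8") == "Spark")

    // hex direction: format raw bytes back into a (lowercase) hex string.
    assert(hex.formatHex("Spark".getBytes("UTF-8")) == "537061726b")

    // parseHex rejects odd-length input with IllegalArgumentException,
    // which a caller has to surface as a parse error.
    val rejected =
      try { hex.parseHex("abc"); false }
      catch { case _: IllegalArgumentException => true }
    assert(rejected)
  }
}
```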
(spark) branch master updated: [SPARK-48626][CORE] Change the scope of object LogKeys as private in Spark
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 878de0014a37 [SPARK-48626][CORE] Change the scope of object LogKeys as private in Spark 878de0014a37 is described below commit 878de0014a3782187180c40158f0805e51335cb5 Author: Gengliang Wang AuthorDate: Fri Jun 14 15:08:41 2024 +0800 [SPARK-48626][CORE] Change the scope of object LogKeys as private in Spark ### What changes were proposed in this pull request? Change the scope of object LogKeys as private in Spark. ### Why are the changes needed? LogKeys are internal and still evolving. Making the object private can avoid future confusion or compilation failures. This is suggested by pan3793 in https://github.com/apache/spark/pull/46947#issuecomment-2167164424 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing UT ### Was this patch authored or co-authored using generative AI tooling? No Closes #46983 from gengliangwang/changeScope. Authored-by: Gengliang Wang Signed-off-by: yangjie01 --- common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala index b8b63382fe4c..ec621c4f84ce 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala +++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala @@ -57,7 +57,7 @@ trait LogKey { * Various keys used for mapped diagnostic contexts(MDC) in logging. All structured logging keys * should be defined here for standardization. 
*/ -object LogKeys { +private[spark] object LogKeys { case object ACCUMULATOR_ID extends LogKey case object ACL_ENABLED extends LogKey case object ACTUAL_NUM_FILES extends LogKey - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48612][SQL][SS] Cleanup deprecated api usage related to commons-pool2
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 157b1e3ba5d5 [SPARK-48612][SQL][SS] Cleanup deprecated api usage related to commons-pool2 157b1e3ba5d5 is described below commit 157b1e3ba5d5d5e75eb79805eaa3ea14fa876f5b Author: yangjie01 AuthorDate: Fri Jun 14 12:36:22 2024 +0800 [SPARK-48612][SQL][SS] Cleanup deprecated api usage related to commons-pool2 ### What changes were proposed in this pull request? This PR makes the following changes: - o.a.c.pool2.impl.BaseObjectPoolConfig#setMinEvictableIdleTime -> o.a.c.pool2.impl.BaseObjectPoolConfig#setMinEvictableIdleDuration - o.a.c.pool2.impl.BaseObjectPoolConfig#setSoftMinEvictableIdleTime -> o.a.c.pool2.impl.BaseObjectPoolConfig#setSoftMinEvictableIdleDuration to fix the following compilation warnings related to 'commons-pool2': ``` [WARNING] [Warn] /Users/yangjie01/SourceCode/git/spark-mine-13/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/InternalKafkaConsumerPool.scala:186: method setMinEvictableIdleTime in class BaseObjectPoolConfig is deprecated Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool.PoolConfig.init, origin=org.apache.commons.pool2.impl.BaseObjectPoolConfig.setMinEvictableIdleTime [WARNING] [Warn] /Users/yangjie01/SourceCode/git/spark-mine-13/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/InternalKafkaConsumerPool.scala:187: method setSoftMinEvictableIdleTime in class BaseObjectPoolConfig is deprecated Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool.PoolConfig.init, origin=org.apache.commons.pool2.impl.BaseObjectPoolConfig.setSoftMinEvictableIdleTime ``` 
The fix refers to: - https://github.com/apache/commons-pool/blob/e5c44f5184a55a58fef4a1efec8124d162a348bd/src/main/java/org/apache/commons/pool2/impl/BaseObjectPoolConfig.java#L765-L789 - https://github.com/apache/commons-pool/blob/e5c44f5184a55a58fef4a1efec8124d162a348bd/src/main/java/org/apache/commons/pool2/impl/BaseObjectPoolConfig.java#L815-L839 ```java /** * Sets the value for the {code minEvictableIdleTime} configuration attribute for pools created with this configuration instance. * * param minEvictableIdleTime The new setting of {code minEvictableIdleTime} for this configuration instance * see GenericObjectPool#getMinEvictableIdleDuration() * see GenericKeyedObjectPool#getMinEvictableIdleDuration() * since 2.10.0 * deprecated Use {link #setMinEvictableIdleDuration(Duration)}. */ Deprecated public void setMinEvictableIdleTime(final Duration minEvictableIdleTime) { this.minEvictableIdleDuration = PoolImplUtils.nonNull(minEvictableIdleTime, DEFAULT_MIN_EVICTABLE_IDLE_TIME); } /** * Sets the value for the {code minEvictableIdleTime} configuration attribute for pools created with this configuration instance. * * param minEvictableIdleTime The new setting of {code minEvictableIdleTime} for this configuration instance * see GenericObjectPool#getMinEvictableIdleDuration() * see GenericKeyedObjectPool#getMinEvictableIdleDuration() * since 2.12.0 */ public void setMinEvictableIdleDuration(final Duration minEvictableIdleTime) { this.minEvictableIdleDuration = PoolImplUtils.nonNull(minEvictableIdleTime, DEFAULT_MIN_EVICTABLE_IDLE_TIME); } /** * Sets the value for the {code softMinEvictableIdleTime} configuration attribute for pools created with this configuration instance. 
* * param softMinEvictableIdleTime The new setting of {code softMinEvictableIdleTime} for this configuration instance * see GenericObjectPool#getSoftMinEvictableIdleDuration() * see GenericKeyedObjectPool#getSoftMinEvictableIdleDuration() * since 2.10.0 * deprecated Use {link #setSoftMinEvictableIdleDuration(Duration)}. */ Deprecated public void setSoftMinEvictableIdleTime(final Duration softMinEvictableIdleTime) { this.softMinEvictableIdleDuration = PoolImplUtils.nonNull(softMinEvictableIdleTime, DEFAULT_SOFT_MIN_EVICTABLE_IDLE_TIME); } /** * Sets the value for the {code softMinEvictableIdleTime} configuration attribute for pools created with this configuration instance. * * param softMinEvictableIdleTime The new setting of {code softMinEvictableId
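The call-site change itself is mechanical: the same timeout value, now passed to the `...Duration`-named setter. A self-contained sketch — `PoolConfigStub` is hypothetical and only mimics the shape of commons-pool2's `BaseObjectPoolConfig`, so this compiles without the library:

```scala
import java.time.Duration

// Hypothetical stand-in that mimics only the shape of the commons-pool2
// setters touched by the patch (not the real BaseObjectPoolConfig).
class PoolConfigStub {
  private var minEvictableIdle: Duration = Duration.ofMinutes(30)

  @deprecated("use setMinEvictableIdleDuration", since = "2.12.0")
  def setMinEvictableIdleTime(d: Duration): Unit = minEvictableIdle = d

  def setMinEvictableIdleDuration(d: Duration): Unit = minEvictableIdle = d

  def getMinEvictableIdleDuration: Duration = minEvictableIdle
}

object PoolMigrationDemo {
  def main(args: Array[String]): Unit = {
    val conf = new PoolConfigStub
    // After the patch, call sites use the Duration-named setter; the
    // argument type and value are unchanged, only the method name moves.
    conf.setMinEvictableIdleDuration(Duration.ofMillis(60000))
    assert(conf.getMinEvictableIdleDuration == Duration.ofSeconds(60))
  }
}
```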
(spark) branch master updated: [SPARK-45685][SQL][FOLLOWUP] Add handling for `Stream` where `LazyList.force` is called
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 75fff90d2618 [SPARK-45685][SQL][FOLLOWUP] Add handling for `Stream` where `LazyList.force` is called 75fff90d2618 is described below commit 75fff90d2618617a66b9a3311792c8b16e8e Author: yangjie01 AuthorDate: Fri Jun 14 12:30:44 2024 +0800 [SPARK-45685][SQL][FOLLOWUP] Add handling for `Stream` where `LazyList.force` is called ### What changes were proposed in this pull request? Following the suggestion in https://github.com/apache/spark/pull/43563#pullrequestreview-2114900378, this PR adds handling for `Stream` where `LazyList.force` is called. ### Why are the changes needed? Even though `Stream` is deprecated in 2.13, it is not _removed_, and thus it is possible that some parts of Spark / Catalyst (or third-party code) might continue to pass around `Stream` instances. Hence, we should restore the call to `Stream.force` where `.force` is called on `LazyList`, to avoid losing the eager materialization for Streams that happen to flow to these call sites. This is also a guarantee of compatibility. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added some new tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46970 from LuciferYang/SPARK-45685-FOLLOWUP. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- .../spark/sql/catalyst/plans/QueryPlan.scala | 4 +++- .../apache/spark/sql/catalyst/trees/TreeNode.scala | 13 --- .../sql/catalyst/plans/LogicalPlanSuite.scala | 22 ++ .../spark/sql/catalyst/trees/TreeNodeSuite.scala | 27 ++ .../sql/execution/WholeStageCodegenExec.scala | 4 +++- .../apache/spark/sql/execution/PlannerSuite.scala | 8 +++ .../sql/execution/WholeStageCodegenSuite.scala | 10 7 files changed, 83 insertions(+), 5 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala index bc0ca31dc635..c9c8fdb676b2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala @@ -226,12 +226,14 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] } } +@scala.annotation.nowarn("cat=deprecation") def recursiveTransform(arg: Any): AnyRef = arg match { case e: Expression => transformExpression(e) case Some(value) => Some(recursiveTransform(value)) case m: Map[_, _] => m case d: DataType => d // Avoid unpacking Structs - case stream: LazyList[_] => stream.map(recursiveTransform).force + case stream: Stream[_] => stream.map(recursiveTransform).force + case lazyList: LazyList[_] => lazyList.map(recursiveTransform).force case seq: Iterable[_] => seq.map(recursiveTransform) case other: AnyRef => other case null => null diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala index 23d26854a767..6683f2dbfb39 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala @@ -19,6 +19,7 @@ package org.apache.spark.sql.catalyst.trees import 
java.util.UUID +import scala.annotation.nowarn import scala.collection.{mutable, Map} import scala.jdk.CollectionConverters._ import scala.reflect.ClassTag @@ -378,12 +379,16 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] case nonChild: AnyRef => nonChild case null => null } +@nowarn("cat=deprecation") val newArgs = mapProductIterator { case s: StructType => s // Don't convert struct types to some other type of Seq[StructField] // Handle Seq[TreeNode] in TreeNode parameters. - case s: LazyList[_] => -// LazyList is lazy so we need to force materialization + case s: Stream[_] => +// Stream is lazy so we need to force materialization s.map(mapChild).force + case l: LazyList[_] => +// LazyList is lazy so we need to force materialization +l.map(mapChild).force case s: Seq[_] => s.map(mapChild) case m: Map[_, _] => @@ -801,6 +806,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] case other => other } +@nowarn("cat=deprecation") val newArgs =
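The reason both `Stream` and `LazyList` need an explicit `.force` in `TreeNode` is that `map` on either collection is lazy: without forcing, children would only be transformed when (if ever) the result is traversed. A minimal sketch using `LazyList`:

```scala
object LazyForceDemo {
  def main(args: Array[String]): Unit = {
    var evaluated = 0
    def tick(i: Int): Int = { evaluated += 1; i }

    // `map` on LazyList (and on the deprecated Stream) is lazy:
    // no element is computed when the mapped collection is built.
    val mapped = LazyList(1, 2, 3).map(tick)
    assert(evaluated == 0)

    // `.force` materializes every element eagerly — this is what the
    // TreeNode code relies on when rewriting children, and why the
    // Stream case needs the same treatment as LazyList.
    mapped.force
    assert(evaluated == 3)
  }
}
```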
(spark) branch master updated: [SPARK-48604][SQL] Replace deprecated `new ArrowType.Decimal(precision, scale)` method call
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 08e741b92b8f [SPARK-48604][SQL] Replace deprecated `new ArrowType.Decimal(precision, scale)` method call 08e741b92b8f is described below commit 08e741b92b8fc9e43c838d0849317916218414ce Author: Wei Guo AuthorDate: Thu Jun 13 18:11:30 2024 +0800 [SPARK-48604][SQL] Replace deprecated `new ArrowType.Decimal(precision, scale)` method call ### What changes were proposed in this pull request? This pr replaces deprecated classes and methods of `arrow-vector` called in Spark: - `Decimal(int precision, int scale)` -> `Decimal( JsonProperty("precision") int precision, JsonProperty("scale") int scale, JsonProperty("bitWidth") int bitWidth )` I double-checked all `arrow-vector`-related Spark classes; only `ArrowUtils` contains a deprecated method call. ### Why are the changes needed? Clean up deprecated API usage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passed GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46961 from wayneguow/deprecated_arrow. 
Authored-by: Wei Guo Signed-off-by: yangjie01 --- sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala b/sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala index d9bd3b0e612b..6852fe09ef96 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala @@ -51,7 +51,7 @@ private[sql] object ArrowUtils { case BinaryType if !largeVarTypes => ArrowType.Binary.INSTANCE case _: StringType if largeVarTypes => ArrowType.LargeUtf8.INSTANCE case BinaryType if largeVarTypes => ArrowType.LargeBinary.INSTANCE -case DecimalType.Fixed(precision, scale) => new ArrowType.Decimal(precision, scale) +case DecimalType.Fixed(precision, scale) => new ArrowType.Decimal(precision, scale, 8 * 16) case DateType => new ArrowType.Date(DateUnit.DAY) case TimestampType if timeZoneId == null => throw SparkException.internalError("Missing timezoneId where it is mandatory.") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
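The replacement constructor makes the bit width explicit: Spark passes `8 * 16` = 128, the width of Arrow's decimal128 layout. A rough stdlib illustration of what that width means — the unscaled value of a decimal must fit in a signed 128-bit integer. The helper below is hypothetical and uses only `java.math`, not the Arrow API:

```java
import java.math.BigDecimal;

public class DecimalWidthCheck {
    // Hypothetical helper: does the unscaled value fit in a signed
    // `bitWidth`-bit integer, as a bitWidth of 8 * 16 = 128 requires?
    static boolean fitsInBits(BigDecimal d, int bitWidth) {
        // bitLength() excludes the sign bit, so allow up to bitWidth - 1 bits.
        return d.unscaledValue().bitLength() <= bitWidth - 1;
    }

    public static void main(String[] args) {
        int bitWidth = 8 * 16; // 128, as passed in the patch
        // 38 digits is the maximum precision representable in 128 bits.
        System.out.println(fitsInBits(new BigDecimal("9".repeat(38)), bitWidth)); // true
        System.out.println(fitsInBits(new BigDecimal("9".repeat(40)), bitWidth)); // false
    }
}
```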
(spark) branch master updated (78fd4e3301ff -> b8c7aee12f02)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 78fd4e3301ff [SPARK-48584][SQL][FOLLOWUP] Improve the unescapePathName add b8c7aee12f02 [SPARK-48609][BUILD] Upgrade `scala-xml` to 2.3.0 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of `commons-io` called in Spark
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fd045c9887fe [SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of `commons-io` called in Spark fd045c9887fe is described below commit fd045c9887feabc37c0f15fa41c860847f5fffa0 Author: Wei Guo AuthorDate: Thu Jun 13 11:03:45 2024 +0800 [SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of `commons-io` called in Spark ### What changes were proposed in this pull request? This pr replaces deprecated classes and methods of `commons-io` called in Spark: - `writeStringToFile(final File file, final String data)` -> `writeStringToFile(final File file, final String data, final Charset charset)` - `CountingInputStream` -> `BoundedInputStream` ### Why are the changes needed? Clean up deprecated API usage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passed related test cases in `UDFXPathUtilSuite` and `XmlSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46935 from wayneguow/deprecated. 
Authored-by: Wei Guo Signed-off-by: yangjie01 --- .../spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala | 3 ++- .../spark/sql/execution/datasources/xml/XmlInputFormat.scala | 10 ++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala index a8dc2b20f56d..8351e94c0c36 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala @@ -17,6 +17,7 @@ package org.apache.spark.sql.catalyst.expressions.xml +import java.nio.charset.StandardCharsets import javax.xml.xpath.XPathConstants.STRING import org.w3c.dom.Node @@ -85,7 +86,7 @@ class UDFXPathUtilSuite extends SparkFunSuite { tempFile.deleteOnExit() val fname = tempFile.getAbsolutePath -FileUtils.writeStringToFile(tempFile, secretValue) +FileUtils.writeStringToFile(tempFile, secretValue, StandardCharsets.UTF_8) val xml = s""" diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlInputFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlInputFormat.scala index 4359ac02f5f5..6169cec6f821 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlInputFormat.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlInputFormat.scala @@ -20,7 +20,7 @@ import java.io.{InputStream, InputStreamReader, IOException, Reader} import java.nio.ByteBuffer import java.nio.charset.Charset -import org.apache.commons.io.input.CountingInputStream +import org.apache.commons.io.input.BoundedInputStream import org.apache.hadoop.fs.Seekable import org.apache.hadoop.io.{LongWritable, Text} import org.apache.hadoop.io.compress._ @@ -67,7 +67,7 @@ private[xml] 
class XmlRecordReader extends RecordReader[LongWritable, Text] { private var end: Long = _ private var reader: Reader = _ private var filePosition: Seekable = _ - private var countingIn: CountingInputStream = _ + private var countingIn: BoundedInputStream = _ private var readerLeftoverCharFn: () => Boolean = _ private var readerByteBuffer: ByteBuffer = _ private var decompressor: Decompressor = _ @@ -117,7 +117,9 @@ private[xml] class XmlRecordReader extends RecordReader[LongWritable, Text] { } } else { fsin.seek(start) - countingIn = new CountingInputStream(fsin) + countingIn = BoundedInputStream.builder() +.setInputStream(fsin) +.get() in = countingIn // don't use filePosition in this case. We have to count bytes read manually } @@ -156,7 +158,7 @@ private[xml] class XmlRecordReader extends RecordReader[LongWritable, Text] { if (filePosition != null) { filePosition.getPos } else { - start + countingIn.getByteCount - + start + countingIn.getCount - readerByteBuffer.remaining() - (if (readerLeftoverCharFn()) 1 else 0) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
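The migration above preserves the byte-count semantics `XmlRecordReader` depends on: `BoundedInputStream.builder().setInputStream(fsin).get()` (with no bound configured) simply counts bytes as they pass through, and `getCount` replaces the deprecated `CountingInputStream.getByteCount` when computing `start + bytes read so far`. A minimal stdlib analogue of that counting behavior — `CountingFilter` is a hypothetical class for illustration, not commons-io API:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal stdlib analogue of the counting behavior the patch relies on.
class CountingFilter extends FilterInputStream {
    private long count;

    CountingFilter(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b != -1) count++;          // count single bytes actually read
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;         // count bulk reads, skipping EOF (-1)
        return n;
    }

    long getCount() { return count; }
}

public class CountingDemo {
    public static void main(String[] args) throws IOException {
        CountingFilter in = new CountingFilter(
            new ByteArrayInputStream("<root/>".getBytes()));
        in.readNBytes(4);
        // XmlRecordReader derives its file position as start + bytes read.
        System.out.println(in.getCount()); // 4
    }
}
```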
(spark) branch master updated (53d65fd12dd9 -> 452c1b64b625)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 53d65fd12dd9 [SPARK-48565][UI] Fix thread dump display in UI add 452c1b64b625 [SPARK-48551][SQL] Perf improvement for escapePathName No new revisions were added by this update. Summary of changes: .../EscapePathBenchmark-jdk21-results.txt} | 9 +-- .../benchmarks/EscapePathBenchmark-results.txt}| 9 +-- .../catalyst/catalog/ExternalCatalogUtils.scala| 42 .../spark/sql/catalyst/EscapePathBenchmark.scala | 74 ++ .../catalog/ExternalCatalogUtilsSuite.scala| 42 5 files changed, 154 insertions(+), 22 deletions(-) copy sql/{core/benchmarks/HashedRelationMetricsBenchmark-results.txt => catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt} (51%) copy sql/{core/benchmarks/HashedRelationMetricsBenchmark-results.txt => catalyst/benchmarks/EscapePathBenchmark-results.txt} (51%) create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/EscapePathBenchmark.scala create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtilsSuite.scala
(spark) branch master updated: [SPARK-48563][BUILD] Upgrade `pickle` to 1.5
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3fe6abde125b [SPARK-48563][BUILD] Upgrade `pickle` to 1.5 3fe6abde125b is described below commit 3fe6abde125b7c34437a3f72d17ee97d9653c218 Author: yangjie01 AuthorDate: Tue Jun 11 10:36:32 2024 +0800 [SPARK-48563][BUILD] Upgrade `pickle` to 1.5 ### What changes were proposed in this pull request? This pr aims to upgrade `pickle` from 1.3 to 1.5. ### Why are the changes needed? The new version includes a fix related to [empty bytes object construction](https://github.com/irmen/pickle/commit/badc8fe08c9e47b87df66b8a16c67010e3614e35). All changes from 1.3 to 1.5 are as follows: - https://github.com/irmen/pickle/compare/pickle-1.3...pickle-1.5 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46913 from LuciferYang/pickle-1.5. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- core/pom.xml | 2 +- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/core/pom.xml b/core/pom.xml index 7413ad0d3393..adb1b3034b42 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -399,7 +399,7 @@ net.razorvine pickle - 1.3 + 1.5 net.sf.py4j diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 8ab76b5787b8..4585b534e908 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -241,7 +241,7 @@ parquet-encoding/1.13.1//parquet-encoding-1.13.1.jar parquet-format-structures/1.13.1//parquet-format-structures-1.13.1.jar parquet-hadoop/1.13.1//parquet-hadoop-1.13.1.jar parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar -pickle/1.3//pickle-1.3.jar +pickle/1.5//pickle-1.5.jar py4j/0.10.9.7//py4j-0.10.9.7.jar remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar rocksdbjni/8.11.4//rocksdbjni-8.11.4.jar - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48522][BUILD] Update Stream Library to 2.9.8 and attach its NOTICE
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8b88f5ae10cc [SPARK-48522][BUILD] Update Stream Library to 2.9.8 and attach its NOTICE 8b88f5ae10cc is described below commit 8b88f5ae10cc676a9778c186b12c691fa913088d Author: Kent Yao AuthorDate: Tue Jun 4 21:33:01 2024 +0800 [SPARK-48522][BUILD] Update Stream Library to 2.9.8 and attach its NOTICE ### What changes were proposed in this pull request? Update Stream Library to 2.9.8 and attach its NOTICE ### Why are the changes needed? update dep and notice file ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? passing ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46861 from yaooqinn/SPARK-48522. Authored-by: Kent Yao Signed-off-by: yangjie01 --- NOTICE-binary | 9 + dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/NOTICE-binary b/NOTICE-binary index c82d0b52f31c..c4cfe0e9f8b3 100644 --- a/NOTICE-binary +++ b/NOTICE-binary @@ -33,11 +33,12 @@ services. // Version 2.0, in this case for // -- -Hive Beeline -Copyright 2016 The Apache Software Foundation +=== NOTICE FOR com.clearspring.analytics:streams === +stream-api +Copyright 2016 AddThis -This product includes software developed at -The Apache Software Foundation (http://www.apache.org/). +This product includes software developed by AddThis. 
+=== END OF NOTICE FOR com.clearspring.analytics:streams === Apache Avro Copyright 2009-2014 The Apache Software Foundation diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 3d8ffee05d3a..acb236e1c4e0 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -262,7 +262,7 @@ spire-platform_2.13/0.18.0//spire-platform_2.13-0.18.0.jar spire-util_2.13/0.18.0//spire-util_2.13-0.18.0.jar spire_2.13/0.18.0//spire_2.13-0.18.0.jar stax-api/1.0.1//stax-api-1.0.1.jar -stream/2.9.6//stream-2.9.6.jar +stream/2.9.8//stream-2.9.8.jar super-csv/2.2.0//super-csv-2.2.0.jar threeten-extra/1.7.1//threeten-extra-1.7.1.jar tink/1.13.0//tink-1.13.0.jar diff --git a/pom.xml b/pom.xml index ce3b4041ae57..bd384e42b0ec 100644 --- a/pom.xml +++ b/pom.xml @@ -806,7 +806,7 @@ com.clearspring.analytics stream -2.9.6 +2.9.8 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48506][CORE] Compression codec short names are case insensitive except for event logging
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f4afa2215a1a [SPARK-48506][CORE] Compression codec short names are case insensitive except for event logging f4afa2215a1a is described below commit f4afa2215a1a390d9f099a26155fbefc5beefbe9 Author: Kent Yao AuthorDate: Tue Jun 4 20:33:51 2024 +0800 [SPARK-48506][CORE] Compression codec short names are case insensitive except for event logging ### What changes were proposed in this pull request? Compression codec short names, e.g. for map statuses, broadcasts, shuffle, parquet/orc/avro outputs, are case insensitive except for event logging. Calling `org.apache.spark.io.CompressionCodec.getShortName` causes this issue. In this PR, we make `CompressionCodec.getShortName` handle case sensitivity correctly. ### Why are the changes needed? Feature parity ### Does this PR introduce _any_ user-facing change? Yes, spark.eventLog.compression.codec now accepts not only the lowercased form of lz4, lzf, snappy, and zstd, but also forms with any of the characters uppercased. ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46847 from yaooqinn/SPARK-48506. 
Authored-by: Kent Yao Signed-off-by: yangjie01 --- .../main/scala/org/apache/spark/io/CompressionCodec.scala | 5 +++-- .../scala/org/apache/spark/io/CompressionCodecSuite.scala | 15 +++ 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala b/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala index 7d5a86d1a81d..233228a9c6d4 100644 --- a/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala +++ b/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala @@ -101,8 +101,9 @@ private[spark] object CompressionCodec { * If it is already a short name, just return it. */ def getShortName(codecName: String): String = { -if (shortCompressionCodecNames.contains(codecName)) { - codecName +val lowercasedCodec = codecName.toLowerCase(Locale.ROOT) +if (shortCompressionCodecNames.contains(lowercasedCodec)) { + lowercasedCodec } else { shortCompressionCodecNames .collectFirst { case (k, v) if v == codecName => k } diff --git a/core/src/test/scala/org/apache/spark/io/CompressionCodecSuite.scala b/core/src/test/scala/org/apache/spark/io/CompressionCodecSuite.scala index 729fcecff120..5c09a1f965b9 100644 --- a/core/src/test/scala/org/apache/spark/io/CompressionCodecSuite.scala +++ b/core/src/test/scala/org/apache/spark/io/CompressionCodecSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.io import java.io.{ByteArrayInputStream, ByteArrayOutputStream} +import java.util.Locale import com.google.common.io.ByteStreams @@ -160,4 +161,18 @@ class CompressionCodecSuite extends SparkFunSuite { ByteStreams.readFully(concatenatedBytes, decompressed) assert(decompressed.toSeq === (0 to 127)) } + + test("SPARK-48506: CompressionCodec getShortName is case insensitive for short names") { +CompressionCodec.shortCompressionCodecNames.foreach { case (shortName, codecClass) => + assert(CompressionCodec.getShortName(shortName) === shortName) + 
assert(CompressionCodec.getShortName(shortName.toUpperCase(Locale.ROOT)) === shortName) + assert(CompressionCodec.getShortName(codecClass) === shortName) + checkError( +exception = intercept[SparkIllegalArgumentException] { + CompressionCodec.getShortName(codecClass.toUpperCase(Locale.ROOT)) +}, +errorClass = "CODEC_SHORT_NAME_NOT_FOUND", +parameters = Map("codecName" -> codecClass.toUpperCase(Locale.ROOT))) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
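The fixed lookup lowercases the input only for the short-name table; the reverse match from a fully qualified codec class name back to a short name stays case sensitive, which is why the new test expects an error for an upper-cased class name. A condensed Java sketch of that logic — the map mirrors `shortCompressionCodecNames`, and a plain `IllegalArgumentException` stands in for Spark's `SparkIllegalArgumentException`:

```java
import java.util.Locale;
import java.util.Map;

public class ShortNameDemo {
    static final Map<String, String> SHORT_NAMES = Map.of(
        "lz4", "org.apache.spark.io.LZ4CompressionCodec",
        "lzf", "org.apache.spark.io.LZFCompressionCodec",
        "snappy", "org.apache.spark.io.SnappyCompressionCodec",
        "zstd", "org.apache.spark.io.ZStdCompressionCodec");

    // Sketch of the fixed getShortName: lowercase before the short-name
    // lookup, keep the class-name reverse lookup case sensitive.
    static String getShortName(String codecName) {
        String lower = codecName.toLowerCase(Locale.ROOT);
        if (SHORT_NAMES.containsKey(lower)) return lower;
        return SHORT_NAMES.entrySet().stream()
            .filter(e -> e.getValue().equals(codecName))
            .map(Map.Entry::getKey)
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException(
                "CODEC_SHORT_NAME_NOT_FOUND: " + codecName));
    }

    public static void main(String[] args) {
        System.out.println(getShortName("ZSTD"));   // zstd
        System.out.println(getShortName(
            "org.apache.spark.io.LZ4CompressionCodec")); // lz4
    }
}
```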
(spark) branch master updated: [SPARK-48519][BUILD] Upgrade jetty to 11.0.21
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d273fdf37bc2 [SPARK-48519][BUILD] Upgrade jetty to 11.0.21 d273fdf37bc2 is described below commit d273fdf37bc291aadf8677305bda2a91b593219f Author: yangjie01 AuthorDate: Tue Jun 4 19:08:40 2024 +0800 [SPARK-48519][BUILD] Upgrade jetty to 11.0.21 ### What changes were proposed in this pull request? This pr aims to upgrade jetty from 11.0.20 to 11.0.21. ### Why are the changes needed? The new version brings bug fixes like [Reduce ByteBuffer churning in HttpOutput](https://github.com/jetty/jetty.project/commit/fe94c9f8a40df49021b28280f708448870c5b420). The full release notes are as follows: - https://github.com/jetty/jetty.project/releases/tag/jetty-11.0.21 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46843 from LuciferYang/jetty-11.0.21. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- pom.xml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 65e627b1854f..3d8ffee05d3a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -137,8 +137,8 @@ jersey-container-servlet/3.0.12//jersey-container-servlet-3.0.12.jar jersey-hk2/3.0.12//jersey-hk2-3.0.12.jar jersey-server/3.0.12//jersey-server-3.0.12.jar jettison/1.5.4//jettison-1.5.4.jar -jetty-util-ajax/11.0.20//jetty-util-ajax-11.0.20.jar -jetty-util/11.0.20//jetty-util-11.0.20.jar +jetty-util-ajax/11.0.21//jetty-util-ajax-11.0.21.jar +jetty-util/11.0.21//jetty-util-11.0.21.jar jline/2.14.6//jline-2.14.6.jar jline/3.25.1//jline-3.25.1.jar jna/5.14.0//jna-5.14.0.jar diff --git a/pom.xml b/pom.xml index ded8cc2405fd..ce3b4041ae57 100644 --- a/pom.xml +++ b/pom.xml @@ -140,7 +140,7 @@ 1.13.1 2.0.1 shaded-protobuf -11.0.20 +11.0.21 5.0.0 4.0.1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (8d534c048866 -> 9270931221d4)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8d534c048866 [SPARK-48487][INFRA] Update License & Notice according to the dependency changes add 9270931221d4 [SPARK-48433][BUILD] Upgrade `checkstyle` to 10.17.0 No new revisions were added by this update. Summary of changes: pom.xml | 2 +- project/plugins.sbt | 3 +-- 2 files changed, 2 insertions(+), 3 deletions(-)
(spark) branch branch-3.4 updated: [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 11c06fcbf2e6 [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts 11c06fcbf2e6 is described below commit 11c06fcbf2e62e870c758cedcd386ba2d539352d Author: jackylee-ch AuthorDate: Fri May 31 22:37:49 2024 +0800 [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts ### What changes were proposed in this pull request? After #40064, we always get the same TaskAttemptId for different task attempts that have the same partitionId. This would lead different task attempts to write to the same directory. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? GA ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46811 from jackylee-ch/fix_v2write_use_same_directories_for_different_task_attempts. 
Lead-authored-by: jackylee-ch Co-authored-by: Kent Yao Signed-off-by: yangjie01 (cherry picked from commit 67d11b1992aaa100d0e1fa30b0e5c33684c93a89) Signed-off-by: yangjie01 --- .../datasources/v2/FileWriterFactory.scala | 8 ++-- .../datasources/v2/FileWriterFactorySuite.scala| 48 ++ 2 files changed, 53 insertions(+), 3 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala index 4b1a099d3bac..f18424b4bcb8 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala @@ -38,7 +38,7 @@ case class FileWriterFactory ( @transient private lazy val jobId = SparkHadoopWriterUtils.createJobID(jobTrackerID, 0) override def createWriter(partitionId: Int, realTaskId: Long): DataWriter[InternalRow] = { -val taskAttemptContext = createTaskAttemptContext(partitionId) +val taskAttemptContext = createTaskAttemptContext(partitionId, realTaskId.toInt & Int.MaxValue) committer.setupTask(taskAttemptContext) if (description.partitionColumns.isEmpty) { new SingleDirectoryDataWriter(description, taskAttemptContext, committer) @@ -47,9 +47,11 @@ case class FileWriterFactory ( } } - private def createTaskAttemptContext(partitionId: Int): TaskAttemptContextImpl = { + private def createTaskAttemptContext( + partitionId: Int, + realTaskId: Int): TaskAttemptContextImpl = { val taskId = new TaskID(jobId, TaskType.MAP, partitionId) -val taskAttemptId = new TaskAttemptID(taskId, 0) +val taskAttemptId = new TaskAttemptID(taskId, realTaskId) // Set up the configuration object val hadoopConf = description.serializableHadoopConf.value hadoopConf.set("mapreduce.job.id", jobId.toString) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala new file mode 100644 index ..bd2030797441 --- /dev/null +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl +import org.mockito.Mockito._ +import org.scalatest.PrivateMethodTester + +import org.apache.spark.SparkFunSuite +import org.apache.spark.internal.io.FileCommitProtocol +import org.apache.spark.sql.execution.datasources.WriteJobDescription +import org.apache.spark.util.SerializableConfiguration + +class FileWriterFactorySuite extends SparkFunSuite with PrivateMethodTester
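The core of the fix is deriving the attempt number from the real (long) task id: `realTaskId.toInt & Int.MaxValue` truncates the long to an int and clears the sign bit, so each attempt gets a distinct, non-negative `TaskAttemptID` instead of the hard-coded `0`. The same arithmetic in plain Java:

```java
public class AttemptIdDemo {
    // Sketch of the fix: map a long task id to a non-negative int attempt
    // number, as in `realTaskId.toInt & Int.MaxValue`.
    static int attemptNumber(long realTaskId) {
        return (int) realTaskId & Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(attemptNumber(7L));          // 7
        // Truncation to int can set the sign bit; the mask clears it.
        System.out.println(attemptNumber(2147483648L)); // 0
        System.out.println(attemptNumber(-1L));         // 2147483647
    }
}
```

Hadoop's `TaskAttemptID` expects a non-negative attempt number, which is why the sign bit must be masked off after the narrowing cast.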
(spark) branch branch-3.5 updated: [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 7d39000f809a [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts 7d39000f809a is described below commit 7d39000f809a117d2ef9e73e46697704e45ba262 Author: jackylee-ch AuthorDate: Fri May 31 22:37:49 2024 +0800 [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts ### What changes were proposed in this pull request? After #40064, we always get the same TaskAttemptId for different task attempts that have the same partitionId. This would lead different task attempts to write to the same directory. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? GA ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46811 from jackylee-ch/fix_v2write_use_same_directories_for_different_task_attempts. 
Lead-authored-by: jackylee-ch Co-authored-by: Kent Yao Signed-off-by: yangjie01 (cherry picked from commit 67d11b1992aaa100d0e1fa30b0e5c33684c93a89) Signed-off-by: yangjie01 --- .../datasources/v2/FileWriterFactory.scala | 8 ++-- .../datasources/v2/FileWriterFactorySuite.scala| 48 ++ 2 files changed, 53 insertions(+), 3 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala index 4b1a099d3bac..f18424b4bcb8 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala @@ -38,7 +38,7 @@ case class FileWriterFactory ( @transient private lazy val jobId = SparkHadoopWriterUtils.createJobID(jobTrackerID, 0) override def createWriter(partitionId: Int, realTaskId: Long): DataWriter[InternalRow] = { -val taskAttemptContext = createTaskAttemptContext(partitionId) +val taskAttemptContext = createTaskAttemptContext(partitionId, realTaskId.toInt & Int.MaxValue) committer.setupTask(taskAttemptContext) if (description.partitionColumns.isEmpty) { new SingleDirectoryDataWriter(description, taskAttemptContext, committer) @@ -47,9 +47,11 @@ case class FileWriterFactory ( } } - private def createTaskAttemptContext(partitionId: Int): TaskAttemptContextImpl = { + private def createTaskAttemptContext( + partitionId: Int, + realTaskId: Int): TaskAttemptContextImpl = { val taskId = new TaskID(jobId, TaskType.MAP, partitionId) -val taskAttemptId = new TaskAttemptID(taskId, 0) +val taskAttemptId = new TaskAttemptID(taskId, realTaskId) // Set up the configuration object val hadoopConf = description.serializableHadoopConf.value hadoopConf.set("mapreduce.job.id", jobId.toString) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala new file mode 100644 index ..bd2030797441 --- /dev/null +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl +import org.mockito.Mockito._ +import org.scalatest.PrivateMethodTester + +import org.apache.spark.SparkFunSuite +import org.apache.spark.internal.io.FileCommitProtocol +import org.apache.spark.sql.execution.datasources.WriteJobDescription +import org.apache.spark.util.SerializableConfiguration + +class FileWriterFactorySuite extends SparkFunSuite with PrivateMethodTester
(spark) branch master updated: [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 67d11b1992aa [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts 67d11b1992aa is described below commit 67d11b1992aaa100d0e1fa30b0e5c33684c93a89 Author: jackylee-ch AuthorDate: Fri May 31 22:37:49 2024 +0800 [SPARK-48484][SQL] Fix: V2Write use the same TaskAttemptId for different task attempts ### What changes were proposed in this pull request? After #40064, we always get the same TaskAttemptId for different task attempts that have the same partitionId. This would lead different task attempts to write to the same directory. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? GA ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46811 from jackylee-ch/fix_v2write_use_same_directories_for_different_task_attempts. 
Lead-authored-by: jackylee-ch Co-authored-by: Kent Yao Signed-off-by: yangjie01 --- .../datasources/v2/FileWriterFactory.scala | 8 ++-- .../datasources/v2/FileWriterFactorySuite.scala| 48 ++ 2 files changed, 53 insertions(+), 3 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala index 4b1a099d3bac..f18424b4bcb8 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactory.scala @@ -38,7 +38,7 @@ case class FileWriterFactory ( @transient private lazy val jobId = SparkHadoopWriterUtils.createJobID(jobTrackerID, 0) override def createWriter(partitionId: Int, realTaskId: Long): DataWriter[InternalRow] = { -val taskAttemptContext = createTaskAttemptContext(partitionId) +val taskAttemptContext = createTaskAttemptContext(partitionId, realTaskId.toInt & Int.MaxValue) committer.setupTask(taskAttemptContext) if (description.partitionColumns.isEmpty) { new SingleDirectoryDataWriter(description, taskAttemptContext, committer) @@ -47,9 +47,11 @@ case class FileWriterFactory ( } } - private def createTaskAttemptContext(partitionId: Int): TaskAttemptContextImpl = { + private def createTaskAttemptContext( + partitionId: Int, + realTaskId: Int): TaskAttemptContextImpl = { val taskId = new TaskID(jobId, TaskType.MAP, partitionId) -val taskAttemptId = new TaskAttemptID(taskId, 0) +val taskAttemptId = new TaskAttemptID(taskId, realTaskId) // Set up the configuration object val hadoopConf = description.serializableHadoopConf.value hadoopConf.set("mapreduce.job.id", jobId.toString) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala new file mode 100644 index ..bd2030797441 --- /dev/null +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/FileWriterFactorySuite.scala @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl +import org.mockito.Mockito._ +import org.scalatest.PrivateMethodTester + +import org.apache.spark.SparkFunSuite +import org.apache.spark.internal.io.FileCommitProtocol +import org.apache.spark.sql.execution.datasources.WriteJobDescription +import org.apache.spark.util.SerializableConfiguration + +class FileWriterFactorySuite extends SparkFunSuite with PrivateMethodTester { + + test("SPARK-48484: V2Write uses different TaskAttemptIds for different task attempts&quo
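The attempt-id derivation in the FileWriterFactory diff above can be sketched in isolation. This is a minimal model using a hypothetical helper (`attemptNumber`) that mirrors the patch's `realTaskId.toInt & Int.MaxValue` expression; it is not the actual Spark or Hadoop classes.

```java
// Sketch (assumption: simplified stand-in for FileWriterFactory's attempt-id
// derivation, not the Spark implementation). Before SPARK-48484 the attempt
// number was hardcoded to 0, so retries of the same partition shared one
// TaskAttemptID and hence one output directory. Deriving it from the globally
// unique task id keeps attempts distinct; the mask clears the sign bit so the
// value stays nonnegative even when truncating the long overflows int.
public class AttemptIdSketch {
    static int attemptNumber(long realTaskId) {
        // Java equivalent of the Scala `realTaskId.toInt & Int.MaxValue`
        return (int) realTaskId & Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long firstAttempt = 7L;    // hypothetical task id of attempt 0 of a partition
        long retryAttempt = 1007L; // hypothetical task id of the retried attempt
        System.out.println(attemptNumber(firstAttempt)); // 7
        System.out.println(attemptNumber(retryAttempt)); // 1007
        System.out.println(attemptNumber(-1L));          // 2147483647
    }
}
```

The two attempts now map to different attempt numbers, so their `TaskAttemptID`s (and output directories) no longer collide.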
(spark) branch master updated: [SPARK-47361][SQL] Derby: Calculate suitable precision and scale for DECIMAL type
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 69afd4be9c93 [SPARK-47361][SQL] Derby: Calculate suitable precision and scale for DECIMAL type 69afd4be9c93 is described below commit 69afd4be9c93cb31a840b969ed1984c0b6b92f8e Author: Kent Yao AuthorDate: Thu May 30 17:31:28 2024 +0800 [SPARK-47361][SQL] Derby: Calculate suitable precision and scale for DECIMAL type ### What changes were proposed in this pull request? When storing `decimal(p, s)` to derby, if `p > 31`, `s` is wrongly hardcoded to `5` which is the assumed default scale of derby decimal. Actually, 0 is the default scale, 5 is the default precision https://db.apache.org/derby/docs/10.13/ref/rrefsqlj15260.html This PR calculates a suitable scale to make room for precision. ### Why are the changes needed? avoid precision loss ### Does this PR introduce _any_ user-facing change? Yes, but derby is rare in production environments, and the new mapping are compatible for most usecases ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46776 from yaooqinn/SPARK-48439. 
Authored-by: Kent Yao Signed-off-by: yangjie01 --- .../main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala | 12 +--- .../datasources/v2/jdbc/DerbyTableCatalogSuite.scala | 8 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala index 36af0e6aeaf1..23da4dbb60a5 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala @@ -48,9 +48,15 @@ private case class DerbyDialect() extends JdbcDialect { case ByteType => Option(JdbcType("SMALLINT", java.sql.Types.SMALLINT)) case ShortType => Option(JdbcType("SMALLINT", java.sql.Types.SMALLINT)) case BooleanType => Option(JdbcType("BOOLEAN", java.sql.Types.BOOLEAN)) -// 31 is the maximum precision and 5 is the default scale for a Derby DECIMAL -case t: DecimalType if t.precision > 31 => - Option(JdbcType("DECIMAL(31,5)", java.sql.Types.DECIMAL)) +// 31 is the maximum precision +// https://db.apache.org/derby/docs/10.13/ref/rrefsqlj15260.html +case t: DecimalType => + val (p, s) = if (t.precision > 31) { +(31, math.max(t.scale - (t.precision - 31), 0)) + } else { +(t.precision, t.scale) + } + Option(JdbcType(s"DECIMAL($p,$s)", java.sql.Types.DECIMAL)) case _ => None } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/DerbyTableCatalogSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/DerbyTableCatalogSuite.scala index e3714e604495..d793ef526c47 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/DerbyTableCatalogSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/DerbyTableCatalogSuite.scala @@ -51,4 +51,12 @@ class DerbyTableCatalogSuite extends QueryTest with SharedSparkSession { checkAnswer(sql(s"SHOW TABLES IN derby.test1"), 
Row("test1", "TABLE2", false)) } } + + test("SPARK-48439: Calculate suitable precision and scale for DECIMAL type") { +withTable("derby.test1.table1") { + sql("CREATE TABLE derby.test1.table1 (c1 decimal(38, 18))") + sql("INSERT INTO derby.test1.table1 VALUES (1.123456789123456789)") + checkAnswer(sql("SELECT * FROM derby.test1.table1"), Row(1.12345678912)) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48420][BUILD] Upgrade netty to `4.1.110.Final`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a88cc1ad9319 [SPARK-48420][BUILD] Upgrade netty to `4.1.110.Final` a88cc1ad9319 is described below commit a88cc1ad9319bd0f4a14e2d6094865229449c8cb Author: panbingkun AuthorDate: Tue May 28 13:09:39 2024 +0800 [SPARK-48420][BUILD] Upgrade netty to `4.1.110.Final` ### What changes were proposed in this pull request? The pr aims to upgrade `netty` from `4.1.109.Final` to `4.1.110.Final`. ### Why are the changes needed? - https://netty.io/news/2024/05/22/4-1-110-Final.html This version has brought some bug fixes and improvements, such as: Fix Zstd throws Exception on read-only volumes (https://github.com/netty/netty/pull/13982) Add unix domain socket transport in netty 4.x via JDK16+ ([#13965](https://github.com/netty/netty/pull/13965)) Backport #13075: Add the AdaptivePoolingAllocator ([#13976](https://github.com/netty/netty/pull/13976)) Add no-value key handling only for form body ([#13998](https://github.com/netty/netty/pull/13998)) Add support for specifying SecureRandom in SSLContext initialization ([#14058](https://github.com/netty/netty/pull/14058)) - https://github.com/netty/netty/issues?q=milestone%3A4.1.110.Final+is%3Aclosed ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46744 from panbingkun/SPARK-48420. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 38 +-- pom.xml | 2 +- 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 10d812c9fd8a..e854bd0e804a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -197,31 +197,31 @@ metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar metrics-json/4.2.25//metrics-json-4.2.25.jar metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.109.Final//netty-all-4.1.109.Final.jar -netty-buffer/4.1.109.Final//netty-buffer-4.1.109.Final.jar -netty-codec-http/4.1.109.Final//netty-codec-http-4.1.109.Final.jar -netty-codec-http2/4.1.109.Final//netty-codec-http2-4.1.109.Final.jar -netty-codec-socks/4.1.109.Final//netty-codec-socks-4.1.109.Final.jar -netty-codec/4.1.109.Final//netty-codec-4.1.109.Final.jar -netty-common/4.1.109.Final//netty-common-4.1.109.Final.jar -netty-handler-proxy/4.1.109.Final//netty-handler-proxy-4.1.109.Final.jar -netty-handler/4.1.109.Final//netty-handler-4.1.109.Final.jar -netty-resolver/4.1.109.Final//netty-resolver-4.1.109.Final.jar +netty-all/4.1.110.Final//netty-all-4.1.110.Final.jar +netty-buffer/4.1.110.Final//netty-buffer-4.1.110.Final.jar +netty-codec-http/4.1.110.Final//netty-codec-http-4.1.110.Final.jar +netty-codec-http2/4.1.110.Final//netty-codec-http2-4.1.110.Final.jar +netty-codec-socks/4.1.110.Final//netty-codec-socks-4.1.110.Final.jar +netty-codec/4.1.110.Final//netty-codec-4.1.110.Final.jar +netty-common/4.1.110.Final//netty-common-4.1.110.Final.jar +netty-handler-proxy/4.1.110.Final//netty-handler-proxy-4.1.110.Final.jar +netty-handler/4.1.110.Final//netty-handler-4.1.110.Final.jar +netty-resolver/4.1.110.Final//netty-resolver-4.1.110.Final.jar netty-tcnative-boringssl-static/2.0.65.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.65.Final-linux-aarch_64.jar 
netty-tcnative-boringssl-static/2.0.65.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-linux-x86_64.jar netty-tcnative-boringssl-static/2.0.65.Final/osx-aarch_64/netty-tcnative-boringssl-static-2.0.65.Final-osx-aarch_64.jar netty-tcnative-boringssl-static/2.0.65.Final/osx-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-osx-x86_64.jar netty-tcnative-boringssl-static/2.0.65.Final/windows-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-windows-x86_64.jar netty-tcnative-classes/2.0.65.Final//netty-tcnative-classes-2.0.65.Final.jar -netty-transport-classes-epoll/4.1.109.Final//netty-transport-classes-epoll-4.1.109.Final.jar -netty-transport-classes-kqueue/4.1.109.Final//netty-transport-classes-kqueue-4.1.109.Final.jar -netty-transport-native-epoll/4.1.109.Final/linux-aarch_64/netty-transport-native-epoll-4.1.109.Final-linux-aarch_64.jar -netty-transport-native-epoll/4.1.109.Final/linux-riscv64/netty-transport-native-epoll-4.1.109.Final-linux-riscv64.jar -netty-transport-native-epoll/4.1.109.Final/linux-x86_64/netty-transport-native-epoll-4.1.109.Final-linux-x86_64.jar -netty-transport-native-kqueue/4.1.109.Final/osx-aarch_64/netty-transport
(spark) branch master updated (3346afd4b250 -> ef43bbbc1163)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3346afd4b250 [SPARK-46090][SQL][FOLLOWUP] Add DeveloperApi import add ef43bbbc1163 [SPARK-48384][BUILD] Exclude `io.netty:netty-tcnative-boringssl-static` from `zookeeper` No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 1 - pom.xml | 4 2 files changed, 4 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48386][TESTS] Replace JVM assert with JUnit Assert in tests
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5df9a0866ae6 [SPARK-48386][TESTS] Replace JVM assert with JUnit Assert in tests 5df9a0866ae6 is described below commit 5df9a0866ae60a42d78136a21a82a0b6e58daefa Author: panbingkun AuthorDate: Thu May 23 10:46:08 2024 +0800 [SPARK-48386][TESTS] Replace JVM assert with JUnit Assert in tests ### What changes were proposed in this pull request? The pr aims to replace `JVM assert` with `JUnit Assert` in tests. ### Why are the changes needed? assert() statements do not produce as useful errors when they fail, and, if they were somehow disabled, would fail to test anything. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46698 from panbingkun/minor_assert. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- .../protocol/EncryptedMessageWithHeaderSuite.java | 2 +- .../shuffle/RetryingBlockTransferorSuite.java | 8 +++--- .../apache/spark/util/SparkLoggerSuiteBase.java| 30 -- .../apache/spark/sql/TestStatefulProcessor.java| 10 +--- .../sql/TestStatefulProcessorWithInitialState.java | 4 ++- .../JavaAdvancedDataSourceV2WithV2Filter.java | 14 +- 6 files changed, 38 insertions(+), 30 deletions(-) diff --git a/common/network-common/src/test/java/org/apache/spark/network/protocol/EncryptedMessageWithHeaderSuite.java b/common/network-common/src/test/java/org/apache/spark/network/protocol/EncryptedMessageWithHeaderSuite.java index 7478fa1db711..2865d411bf67 100644 --- a/common/network-common/src/test/java/org/apache/spark/network/protocol/EncryptedMessageWithHeaderSuite.java +++ b/common/network-common/src/test/java/org/apache/spark/network/protocol/EncryptedMessageWithHeaderSuite.java @@ -116,7 +116,7 @@ public class EncryptedMessageWithHeaderSuite { // Validate we read data correctly assertEquals(bodyResult.readableBytes(), chunkSize); - assert(bodyResult.readableBytes() < (randomData.length - readIndex)); + assertTrue(bodyResult.readableBytes() < (randomData.length - readIndex)); while (bodyResult.readableBytes() > 0) { assertEquals(bodyResult.readByte(), randomData[readIndex++]); } diff --git a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java index 3725973ae733..84c8b1b3353f 100644 --- a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java +++ b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java @@ -288,7 +288,7 @@ public class RetryingBlockTransferorSuite { verify(listener, timeout(5000)).onBlockTransferSuccess("b0", block0); verify(listener).getTransferType(); 
verifyNoMoreInteractions(listener); -assert(_retryingBlockTransferor.getRetryCount() == 0); +assertEquals(0, _retryingBlockTransferor.getRetryCount()); } @Test @@ -310,7 +310,7 @@ public class RetryingBlockTransferorSuite { verify(listener, timeout(5000)).onBlockTransferFailure("b0", saslTimeoutException); verify(listener, times(3)).getTransferType(); verifyNoMoreInteractions(listener); -assert(_retryingBlockTransferor.getRetryCount() == MAX_RETRIES); +assertEquals(MAX_RETRIES, _retryingBlockTransferor.getRetryCount()); } @Test @@ -339,7 +339,7 @@ public class RetryingBlockTransferorSuite { // This should be equal to 1 because after the SASL exception is retried, // retryCount should be set back to 0. Then after that b1 encounters an // exception that is retried. -assert(_retryingBlockTransferor.getRetryCount() == 1); +assertEquals(1, _retryingBlockTransferor.getRetryCount()); } @Test @@ -368,7 +368,7 @@ public class RetryingBlockTransferorSuite { verify(listener, timeout(5000)).onBlockTransferFailure("b0", saslExceptionFinal); verify(listener, atLeastOnce()).getTransferType(); verifyNoMoreInteractions(listener); -assert(_retryingBlockTransferor.getRetryCount() == MAX_RETRIES); +assertEquals(MAX_RETRIES, _retryingBlockTransferor.getRetryCount()); } @Test diff --git a/common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java b/common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java index 46bfe3415080..0869f9827324 100644 --- a/common/utils/src/test/java/org/apache/spark/util/SparkLogg
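The rationale for SPARK-48386 — that `assert()` statements verify nothing when JVM assertions are disabled — can be demonstrated with a toy snippet. This is a sketch, not Spark test code; `assertEquals` below is a hypothetical stand-in for what JUnit's assertion does internally.

```java
// Sketch (assumption: minimal demonstration, not the Spark test suites).
// A JVM `assert` is compiled in but silently skipped unless the JVM runs
// with -ea, so a test relying on it can pass without checking anything.
// An explicit check always runs and reports both values on failure.
public class AssertSketch {
    static boolean jvmAssertRan() {
        boolean[] ran = {false};
        // The assignment side effect only happens if assertions are enabled.
        assert (ran[0] = true);
        return ran[0];
    }

    static void assertEquals(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    public static void main(String[] args) {
        // May print true or false depending on whether the JVM was started with -ea:
        System.out.println("jvm asserts enabled: " + jvmAssertRan());
        try {
            assertEquals(0, 1);
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // expected:<0> but was:<1>
        }
    }
}
```

This is why the patch rewrites `assert(x == y)` as `assertEquals(y, x)`: the JUnit form both always executes and produces a diagnostic comparing expected and actual values.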
(spark) branch master updated: [SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked implementation
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4fc2910f92d1 [SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked implementation 4fc2910f92d1 is described below commit 4fc2910f92d1b5f7e0dd5f803e822668f23c21c5 Author: Cheng Pan AuthorDate: Mon May 20 20:42:57 2024 +0800 [SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked implementation ### What changes were proposed in this pull request? This PR replaces AmIpFilter with a forked implementation, and removes the dependency `hadoop-yarn-server-web-proxy` ### Why are the changes needed? SPARK-47118 upgraded Spark built-in Jetty from 10 to 11, and migrated from `javax.servlet` to `jakarta.servlet`, which breaks the Spark on YARN. ``` Caused by: java.lang.IllegalStateException: class org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724) at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734) at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762) at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749) ... 38 more ``` During the investigation, I found a comment here https://github.com/apache/spark/pull/31642#issuecomment-786257114 > Agree that in the long term we should either: 1) consider to re-implement the logic in Spark which allows us to get away from server-side dependency in Hadoop ... 
This should be a simple and clean way to address the exact issue, then we don't need to wait for Hadoop `jakarta.servlet` migration, and it also strips a Hadoop dependency. ### Does this PR introduce _any_ user-facing change? No, this recovers the bootstrap of the Spark application on YARN mode, keeping the same behavior with Spark 3.5 and earlier versions. ### How was this patch tested? UTs are added. (refer to `org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter`) I tested it in a YARN cluster. Spark successfully started. ``` roothadoop-master1:/opt/spark-SPARK-48238# JAVA_HOME=/opt/openjdk-17 bin/spark-sql --conf spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17 --conf spark.executorEnv.JAVA_HOME=/opt/openjdk-17 WARNING: Using incubator modules: jdk.incubator.vector Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 2024-05-18 04:11:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2024-05-18 04:11:44 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive} is set, falling back to uploading libraries under SPARK_HOME. Spark Web UI available at http://hadoop-master1.orb.local:4040 Spark master: yarn, Application Id: application_1716005503866_0001 spark-sql (default)> select version(); 4.0.0 4ddc2303c7cbabee12a3de9f674aaacad3f5eb01 Time taken: 1.707 seconds, Fetched 1 row(s) spark-sql (default)> ``` When access `http://hadoop-master1.orb.local:4040`, it redirects to `http://hadoop-master1.orb.local:8088/proxy/redirect/application_1716005503866_0001/`, and the UI looks correct. https://github.com/apache/spark/assets/26535726/8500fc83-48c5-4603-8d05-37855f0308ae";> ### Was this patch authored or co-authored using generative AI tooling? No Closes #46611 from pan3793/SPARK-48238. 
Authored-by: Cheng Pan Signed-off-by: yangjie01 --- assembly/pom.xml | 4 - dev/deps/spark-deps-hadoop-3-hive-2.3 | 1 - pom.xml| 77 - .../org/apache/spark/deploy/yarn/AmIpFilter.java | 239 ++ .../apache/spark/deploy/yarn/AmIpPrincipal.java| 35 +++ .../deploy/yarn/AmIpServletRequestWrapper.java | 54 .../org/apache/spark/deploy/yarn/ProxyUtils.java | 126 .../spark/deploy/yarn/ApplicationMaster.scala | 2 +- .../apache/spark/deploy/yarn/AmIpFilterSuite.scala | 342 + .../org/apache/spark/streaming/Checkpoint.scala| 2 +- 10 files changed, 798 insertions(+), 84 deletions(-) diff --g
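The `IllegalStateException` quoted in the SPARK-48238 message comes down to a type-identity issue that a toy model makes concrete. The interfaces below are stand-ins, not the real servlet API: even with identical method signatures, `javax.servlet.Filter` and `jakarta.servlet.Filter` are distinct types, so Jetty 11's `instanceof`-style check rejects the unmodified YARN filter.

```java
// Sketch (assumption: toy interfaces modeling the javax -> jakarta namespace
// split; no real servlet classes involved). The YARN AmIpFilter implements
// javax.servlet.Filter, but Jetty 11 requires jakarta.servlet.Filter --
// structurally identical, nominally unrelated.
public class ServletNamespaceSketch {
    interface JavaxFilter { void doFilter(); }   // stand-in for javax.servlet.Filter
    interface JakartaFilter { void doFilter(); } // stand-in for jakarta.servlet.Filter

    static class AmIpFilterLike implements JavaxFilter {
        public void doFilter() {}
    }

    public static void main(String[] args) {
        Object filter = new AmIpFilterLike();
        System.out.println(filter instanceof JavaxFilter);   // true
        System.out.println(filter instanceof JakartaFilter); // false: why Jetty rejects it
    }
}
```

No amount of classpath shading fixes this at runtime, which is why the PR forks the filter against `jakarta.servlet` instead of waiting on Hadoop's migration.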
(spark) branch master updated: [SPARK-48242][BUILD] Upgrade extra-enforcer-rules to 1.8.0
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2eea28b5efd4 [SPARK-48242][BUILD] Upgrade extra-enforcer-rules to 1.8.0 2eea28b5efd4 is described below commit 2eea28b5efd4ae30d7962c92b2f9851cf3938b5e Author: panbingkun AuthorDate: Mon May 20 17:23:59 2024 +0800 [SPARK-48242][BUILD] Upgrade extra-enforcer-rules to 1.8.0 ### What changes were proposed in this pull request? The pr aims to upgrade `extra-enforcer-rules to 1.8.0` from `1.7.0` to `1.8.0`. ### Why are the changes needed? The full release notes: https://github.com/mojohaus/extra-enforcer-rules/releases/tag/1.8.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. ``` sh dev/test-dependencies.sh ``` - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46538 from panbingkun/SPARK-48242. Authored-by: panbingkun Signed-off-by: yangjie01 --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index 611e82f343d8..5811e5b7716d 100644 --- a/pom.xml +++ b/pom.xml @@ -3008,7 +3008,7 @@ org.codehaus.mojo extra-enforcer-rules - 1.7.0 + 1.8.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48274][BUILD] Upgrade GenJavadoc to `0.19`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7441bd0197e6 [SPARK-48274][BUILD] Upgrade GenJavadoc to `0.19` 7441bd0197e6 is described below commit 7441bd0197e6442c4f98481bf2fb23b49b5f75cf Author: panbingkun AuthorDate: Wed May 15 10:22:34 2024 +0800 [SPARK-48274][BUILD] Upgrade GenJavadoc to `0.19` ### What changes were proposed in this pull request? This PR upgrades `GenJavadoc` plugin from `0.18` to `0.19`. ### Why are the changes needed? 1.The full release notes: https://github.com/lightbend/genjavadoc/releases/tag/v0.19 2.The latest version supports scala `2.13.14`, which is a `prerequisite` for us to upgrade spark's scala `2.13.14`. https://mvnrepository.com/artifact/com.typesafe.genjavadoc/genjavadoc-plugin 3.The last upgrade occurred 3 years ago https://github.com/apache/spark/pull/33383 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Built the doc: ``` ./build/sbt -Phadoop-3 -Pkubernetes -Pkinesis-asl -Phive-thriftserver -Pdocker-integration-tests -Pyarn -Phadoop-cloud -Pspark-ganglia-lgpl -Phive -Pjvm-profiler unidoc ``` https://github.com/apache/spark/assets/15246973/58d3fac8-c968-44e0-83f3-84cf00a5084f";> ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46579 from panbingkun/unidocGenjavadocVersion_0_19. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- project/SparkBuild.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 5bb7745d77bf..d1b0ed953e30 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -266,7 +266,7 @@ object SparkBuild extends PomBuild { .orElse(sys.props.get("java.home").map { p => new File(p).getParentFile().getAbsolutePath() }) .map(file), publishMavenStyle := true, -unidocGenjavadocVersion := "0.18", +unidocGenjavadocVersion := "0.19", // Override SBT's default resolvers: resolvers := Seq( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48257][BUILD] Polish POM for Hive dependencies
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c4c9ccbdf562 [SPARK-48257][BUILD] Polish POM for Hive dependencies c4c9ccbdf562 is described below commit c4c9ccbdf562b5da6066d6cd0517ab27bf9de3fa Author: Cheng Pan AuthorDate: Mon May 13 21:59:14 2024 +0800 [SPARK-48257][BUILD] Polish POM for Hive dependencies ### What changes were proposed in this pull request? 1. `org.apache.hive` and `${hive.group}` co-exists in `pom.xml`, this PR unifies them to `${hive.group}` 2. `hive23.version`, `hive.version.short`, `` were used in Spark 3.0 period to distinguish hive 1.2 and hive 2.3, which are useless today, this PR removes those outdated definitions. 3. update/remove some outdated comments. e.g. remove the comment for Hive LOG4J exclusion because Spark already switched to LOG4J2, generalize the comments for Hive Parquet/Jetty exclusion ### Why are the changes needed? Cleanup POM. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass CI. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46558 from pan3793/SPARK-48257. 
Authored-by: Cheng Pan Signed-off-by: yangjie01 --- pom.xml | 48 ++-- sql/core/pom.xml | 2 +- sql/hive/pom.xml | 4 ++-- 3 files changed, 21 insertions(+), 33 deletions(-) diff --git a/pom.xml b/pom.xml index ad6e9391b68c..12d20f4f0736 100644 --- a/pom.xml +++ b/pom.xml @@ -133,9 +133,6 @@ core 2.3.10 -2.3.10 - -2.3 3.7.0 @@ -2112,7 +2109,6 @@ commons-logging commons-logging - + org.eclipse.jetty.aggregate jetty-all - org.apache.logging.log4j * @@ -2139,10 +2134,9 @@ -org.apache.hive +${hive.group} hive-storage-api - @@ -2261,7 +2255,6 @@ org.json json - ${hive.group} @@ -2276,7 +2269,6 @@ org.apache.calcite.avatica avatica - org.apache.logging.log4j * @@ -2297,7 +2289,6 @@ net.hydromatic aggdesigner-algorithm - @@ -2410,7 +2401,6 @@ org.slf4j slf4j-log4j12 - org.apache.hbase @@ -2420,7 +2410,6 @@ co.cask.tephra * - @@ -2478,12 +2467,14 @@ org.codehaus.groovy groovy-all - ${hive.group} hive-service-rpc - + org.apache.parquet parquet-hadoop-bundle @@ -2497,7 +2488,6 @@ tomcat jasper-runtime - @@ -2574,30 +2564,28 @@ org.codehaus.groovy groovy-all - org.apache.logging.log4j log4j-slf4j-impl - -org.apache.hive +${hive.group} hive-llap-common -${hive23.version} +${hive.version} ${hive.deps.scope} -org.apache.hive +${hive.group} hive-common -org.apache.hive +${hive.group} hive-serde @@ -2608,21 +2596,21 @@ -org.apache.hive +${hive.group} hive-llap-client -${hive23.version} +${hive.version} test -org.apache.hive +${hive.group} hive-common -org.apache.hive +${hive.group} hive-serde -org.apache.hive +${hive.group} hive-llap-common @@ -2683,7 +2671,7 @@ hadoop-client-api -org.apache.hive +${hive.group} hive-storage-api @@ -2713,7 +2701,7 @@ orc-core -org.apache.hive +${hive.group} hive-storage-api @@ -2902,7
(spark) branch master updated (cae2248bc13d -> acc37531deb9)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cae2248bc13d [MINOR][PYTHON][TESTS] Move test `test_named_arguments_negative` to `test_arrow_python_udf` add acc37531deb9 [SPARK-47993][PYTHON][FOLLOW-UP] Update migration guide about Python 3.8 dropped No new revisions were added by this update. Summary of changes: python/docs/source/migration_guide/pyspark_upgrade.rst | 1 + 1 file changed, 1 insertion(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8294c5962feb [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test 8294c5962feb is described below commit 8294c5962febe53eebdff79f65f5f293d93a1997 Author: Dongjoon Hyun AuthorDate: Mon May 6 13:45:54 2024 +0800 [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test ### What changes were proposed in this pull request? This PR aims to disable a flaky test, `SparkSessionE2ESuite.interrupt tag`, temporarily. To re-enable this, SPARK-48139 is created as a blocker issue for 4.0.0. ### Why are the changes needed? This test case was added at `Apache Spark 3.5.0` but has been unstable unfortunately until now. - #42009 We tried to stabilize this test case before `Apache Spark 4.0.0-preview`. - #45173 - #46374 However, it's still flaky. - https://github.com/apache/spark/actions/runs/8962353911/job/24611130573 (Master, 2024-05-05) - https://github.com/apache/spark/actions/runs/8948176536/job/24581022674 (Master, 2024-05-04) This PR aims to stablize CI first and to focus this flaky issue as a blocker level before going on `Spark Connect GA` in SPARK-48139 before Apache Spark 4.0.0. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46396 from dongjoon-hyun/SPARK-48138. 
Authored-by: Dongjoon Hyun Signed-off-by: yangjie01 --- .../jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala index d1015d55b1df..f56085191f87 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala @@ -108,7 +108,8 @@ class SparkSessionE2ESuite extends RemoteSparkSession { assert(interrupted.length == 2, s"Interrupted operations: $interrupted.") } - test("interrupt tag") { + // TODO(SPARK-48139): Re-enable `SparkSessionE2ESuite.interrupt tag` + ignore("interrupt tag") { val session = spark import session.implicits._ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48021][ML][BUILD][FOLLOWUP] add `--add-modules=jdk.incubator.vector` to maven compile args
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1f9e09ce2148 [SPARK-48021][ML][BUILD][FOLLOWUP] add `--add-modules=jdk.incubator.vector` to maven compile args 1f9e09ce2148 is described below commit 1f9e09ce2148dfc5e0fd9f3e43e5ceef8133414b Author: panbingkun AuthorDate: Sun Apr 28 16:24:43 2024 +0800 [SPARK-48021][ML][BUILD][FOLLOWUP] add `--add-modules=jdk.incubator.vector` to maven compile args ### What changes were proposed in this pull request? The pr is following up https://github.com/apache/spark/pull/46246 The pr aims to add `--add-modules=jdk.incubator.vector` to maven `compile args`. ### Why are the changes needed? As commented by LuciferYang , we need to be consistent in `maven` compile. https://github.com/apache/spark/pull/46246#issuecomment-2081298219 https://github.com/apache/spark/assets/15246973/26163da2-f27d-4ec2-893f-d9282b68aec1";> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46259 from panbingkun/SPARK-48021. Authored-by: panbingkun Signed-off-by: yangjie01 --- pom.xml | 1 + 1 file changed, 1 insertion(+) diff --git a/pom.xml b/pom.xml index b916659fdbfa..efbf93856333 100644 --- a/pom.xml +++ b/pom.xml @@ -304,6 +304,7 @@ -XX:+IgnoreUnrecognizedVMOptions + --add-modules=jdk.incubator.vector --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47928][SQL][TEST] Speed up test "Add jar support Ivy URI in SQL"
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61dc9d991373 [SPARK-47928][SQL][TEST] Speed up test "Add jar support Ivy URI in SQL" 61dc9d991373 is described below commit 61dc9d991373c01d449a8ed26d9bfd7eb93f9301 Author: Cheng Pan AuthorDate: Mon Apr 22 18:36:49 2024 +0800 [SPARK-47928][SQL][TEST] Speed up test "Add jar support Ivy URI in SQL" ### What changes were proposed in this pull request? `SQLQuerySuite`/"SPARK-33084: Add jar support Ivy URI in SQL" uses Hive deps to test `ADD JAR` which pulls tons of transitive deps, this PR replaces it with light jars but covers all semantics to speed up the UT. ### Why are the changes needed? Speed up the test, and reduce unnecessary relationships with Hive. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Run UT locally. Before ``` [info] - SPARK-33084: Add jar support Ivy URI in SQL (16 minutes, 55 seconds) ``` After ``` [info] - SPARK-33084: Add jar support Ivy URI in SQL (17 seconds, 783 milliseconds) ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46150 from pan3793/SPARK-47928. 
Authored-by: Cheng Pan Signed-off-by: yangjie01 --- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 25 +++--- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index f81369bbad36..78d4b91088a6 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -3748,22 +3748,21 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark test("SPARK-33084: Add jar support Ivy URI in SQL") { val sc = spark.sparkContext -val hiveVersion = "2.3.9" // transitive=false, only download specified jar -sql(s"ADD JAR ivy://org.apache.hive.hcatalog:hive-hcatalog-core:$hiveVersion?transitive=false") -assert(sc.listJars() - .exists(_.contains(s"org.apache.hive.hcatalog_hive-hcatalog-core-$hiveVersion.jar"))) +sql(s"ADD JAR ivy://org.springframework:spring-core:6.1.6?transitive=false") + assert(sc.listJars().exists(_.contains("org.springframework_spring-core-6.1.6.jar"))) + assert(!sc.listJars().exists(_.contains("org.springframework_spring-jcl-6.1.6.jar"))) // default transitive=true, test download ivy URL jar return multiple jars -sql("ADD JAR ivy://org.scala-js:scalajs-test-interface_2.12:1.2.0") -assert(sc.listJars().exists(_.contains("scalajs-library_2.12"))) -assert(sc.listJars().exists(_.contains("scalajs-test-interface_2.12"))) - -sql(s"ADD JAR ivy://org.apache.hive:hive-contrib:$hiveVersion" + - "?exclude=org.pentaho:pentaho-aggdesigner-algorithm&transitive=true") - assert(sc.listJars().exists(_.contains(s"org.apache.hive_hive-contrib-$hiveVersion.jar"))) - assert(sc.listJars().exists(_.contains(s"org.apache.hive_hive-exec-$hiveVersion.jar"))) - assert(!sc.listJars().exists(_.contains("org.pentaho.pentaho_aggdesigner-algorithm"))) +sql("ADD JAR ivy://org.awaitility:awaitility:4.2.1") + 
assert(sc.listJars().exists(_.contains("org.awaitility_awaitility-4.2.1.jar"))) +assert(sc.listJars().exists(_.contains("org.hamcrest_hamcrest-2.1.jar"))) + +sql("ADD JAR ivy://org.junit.jupiter:junit-jupiter:5.10.2" + + "?exclude=org.junit.jupiter:junit-jupiter-engine&transitive=true") + assert(sc.listJars().exists(_.contains("org.junit.jupiter_junit-jupiter-api-5.10.2.jar"))) + assert(sc.listJars().exists(_.contains("org.junit.jupiter_junit-jupiter-params-5.10.2.jar"))) + assert(!sc.listJars().exists(_.contains("org.junit.jupiter_junit-jupiter-engine-5.10.2.jar"))) } test("SPARK-33677: LikeSimplification should be skipped if pattern contains any escapeChar") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
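The Ivy URI form accepted by `ADD JAR` supports `transitive` and `exclude` query parameters, as exercised in the test above. A sketch of the three shapes, using the same coordinates as the diff (assumes a running `spark` session, e.g. in `spark-shell`):

```scala
// transitive=false: fetch only the named artifact
spark.sql("ADD JAR ivy://org.springframework:spring-core:6.1.6?transitive=false")

// transitive defaults to true: transitive dependencies are pulled in as well
spark.sql("ADD JAR ivy://org.awaitility:awaitility:4.2.1")

// exclude specific transitive artifacts while resolving the rest
spark.sql("ADD JAR ivy://org.junit.jupiter:junit-jupiter:5.10.2" +
  "?exclude=org.junit.jupiter:junit-jupiter-engine&transitive=true")

// Inspect what actually got registered with the SparkContext
spark.sparkContext.listJars().foreach(println)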
(spark) branch master updated (458f70bd5213 -> 2d0b56c3eac6)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 458f70bd5213 [SPARK-47902][SQL] Making Compute Current Time* expressions foldable add 2d0b56c3eac6 [SPARK-47932][SQL][TESTS] Avoid using legacy commons-lang No new revisions were added by this update. Summary of changes: sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
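Moving off legacy commons-lang is mostly a package change: the 2.x line lives under `org.apache.commons.lang`, its maintained successor under `org.apache.commons.lang3`. A sketch of the migration for a common utility (a generic illustration, not the code touched by this commit):

```scala
// Legacy (commons-lang 2.x):
//   import org.apache.commons.lang.StringUtils
// Replacement (commons-lang3) — same method names for most utilities:
import org.apache.commons.lang3.StringUtils

val blank = StringUtils.isBlank("   ")       // true
val title = StringUtils.capitalize("spark")  // "Spark"
```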
(spark) branch master updated: [SPARK-47901][BUILD] Upgrade common-text 1.12.0
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 06b12fc31709 [SPARK-47901][BUILD] Upgrade common-text 1.12.0 06b12fc31709 is described below commit 06b12fc317093d5a45c7c76e6617a9917f98b10d Author: yangjie01 AuthorDate: Fri Apr 19 13:21:37 2024 +0800 [SPARK-47901][BUILD] Upgrade common-text 1.12.0 ### What changes were proposed in this pull request? This pr aims to upgrade Apache commons-text from 1.11.0 to 1.12.0. ### Why are the changes needed? The new version brings two bug fixes: - [TEXT-232](https://issues.apache.org/jira/browse/TEXT-232): WordUtils.containsAllWords() may throw PatternSyntaxException - [TEXT-175](https://issues.apache.org/jira/browse/TEXT-175): Fix regression for determining whitespace in WordUtils The full release notes are as follows: - https://github.com/apache/commons-text/blob/rel/commons-text-1.12.0/RELEASE-NOTES.txt ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46127 from LuciferYang/commons-text-1.12.0. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 45a4d499e513..770a7522e9f7 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -48,7 +48,7 @@ commons-lang/2.6//commons-lang-2.6.jar commons-lang3/3.14.0//commons-lang3-3.14.0.jar commons-math3/3.6.1//commons-math3-3.6.1.jar commons-pool/1.5.4//commons-pool-1.5.4.jar -commons-text/1.11.0//commons-text-1.11.0.jar +commons-text/1.12.0//commons-text-1.12.0.jar compress-lzf/1.1.2//compress-lzf-1.1.2.jar curator-client/5.6.0//curator-client-5.6.0.jar curator-framework/5.6.0//curator-framework-5.6.0.jar diff --git a/pom.xml b/pom.xml index 682365d9704a..74a2a61d6e09 100644 --- a/pom.xml +++ b/pom.xml @@ -606,7 +606,7 @@ org.apache.commons commons-text -1.11.0 +1.12.0 commons-lang - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
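TEXT-232 concerned `WordUtils.containsAllWords` building a regex from its word arguments, so a word containing regex metacharacters could raise `PatternSyntaxException` before 1.12.0. A hedged sketch of the affected call (generic illustration, not Spark code):

```scala
import org.apache.commons.text.WordUtils

// Plain words work on any commons-text version:
val ok = WordUtils.containsAllWords("lorem ipsum dolor", "ipsum", "lorem") // true

// A word containing regex metacharacters such as "(" could throw
// PatternSyntaxException before commons-text 1.12.0 (TEXT-232):
val risky = WordUtils.containsAllWords("f(x) = y", "f(x)")
```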
(spark) branch master updated: [SPARK-47850][SQL] Support `spark.sql.hive.convertInsertingUnpartitionedTable`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 52a2e63dd714 [SPARK-47850][SQL] Support `spark.sql.hive.convertInsertingUnpartitionedTable` 52a2e63dd714 is described below commit 52a2e63dd7147e2701c9c26667fe5bd9fdc3f14c Author: Cheng Pan AuthorDate: Thu Apr 18 15:05:15 2024 +0800 [SPARK-47850][SQL] Support `spark.sql.hive.convertInsertingUnpartitionedTable` ### What changes were proposed in this pull request? This PR introduces a new configuration `spark.sql.hive.convertInsertingUnpartitionedTable` alongside the existing `spark.sql.hive.convertInsertingPartitionedTable` to allow fine-grained switching from Hive Serde to Data Source when inserting into Parquet/ORC Hive tables. ### Why are the changes needed? In hybrid workloads (Hive tables may be read/written by Hive, Spark, Impala, etc.), we usually use DataSource for reading Parquet/ORC tables but Hive Serde for writing; the current configuration combination allows us to achieve that for everything except unpartitioned tables. ### Does this PR introduce _any_ user-facing change? No. The newly added configuration `spark.sql.hive.convertInsertingUnpartitionedTable` defaults to `true`, which keeps the existing behavior. ### How was this patch tested? New UT is added. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46052 from pan3793/SPARK-47850. 
Authored-by: Cheng Pan Signed-off-by: yangjie01 --- .../plans/logical/basicLogicalOperators.scala | 1 + .../apache/spark/sql/execution/command/views.scala | 1 + .../org/apache/spark/sql/hive/HiveStrategies.scala | 10 -- .../org/apache/spark/sql/hive/HiveUtils.scala | 10 ++ .../execution/CreateHiveTableAsSelectCommand.scala | 5 ++- .../sql/hive/execution/InsertIntoHiveTable.scala | 7 .../spark/sql/hive/orc/HiveOrcQuerySuite.scala | 37 ++ 7 files changed, 67 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index 1c8f7a97dd7f..7c36e3bc79af 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -783,6 +783,7 @@ object View { "spark.sql.hive.convertMetastoreParquet", "spark.sql.hive.convertMetastoreOrc", "spark.sql.hive.convertInsertingPartitionedTable", +"spark.sql.hive.convertInsertingUnpartitionedTable", "spark.sql.hive.convertMetastoreCtas" ).contains(key) || key.startsWith("spark.sql.catalog.") } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala index d71d0d43683c..cb5e7e7f42d2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala @@ -360,6 +360,7 @@ object ViewHelper extends SQLConfHelper with Logging { "spark.sql.hive.convertMetastoreParquet", "spark.sql.hive.convertMetastoreOrc", "spark.sql.hive.convertInsertingPartitionedTable", +"spark.sql.hive.convertInsertingUnpartitionedTable", "spark.sql.hive.convertMetastoreCtas", SQLConf.ADDITIONAL_REMOTE_REPOSITORIES.key) diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala index 5972a9df78ec..e74cc088a1f6 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala @@ -34,6 +34,7 @@ import org.apache.spark.sql.execution.command.{CreateTableCommand, DDLUtils, Ins import org.apache.spark.sql.execution.datasources.{CreateTable, DataSourceStrategy, HadoopFsRelation, InsertIntoHadoopFsRelationCommand, LogicalRelation} import org.apache.spark.sql.hive.execution._ import org.apache.spark.sql.hive.execution.HiveScriptTransformationExec +import org.apache.spark.sql.hive.execution.InsertIntoHiveTable.BY_CTAS import org.apache.spark.sql.internal.HiveSerDe @@ -194,6 +195,8 @@ object HiveAnalysis extends Rule[LogicalPlan] { * - When writing to non-partitioned Hive-serde Parquet/Orc tables * - When writing to partitioned Hive-serde Parquet/Orc tables when
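With the new flag, the hybrid-workload setup the PR description motivates — DataSource for reads, Hive SerDe for all writes, partitioned or not — can be sketched as the following configuration combination (assumes a Hive-enabled `SparkSession`; whether each conf may be set at session level or must go in `spark-defaults` depends on the deployment):

```scala
// Read Parquet/ORC Hive tables through the faster DataSource path ...
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "true")
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")

// ... but keep Hive SerDe for inserts, now including unpartitioned tables
// thanks to the flag introduced by SPARK-47850:
spark.conf.set("spark.sql.hive.convertInsertingPartitionedTable", "false")
spark.conf.set("spark.sql.hive.convertInsertingUnpartitionedTable", "false")
```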
(spark) branch master updated: [SPARK-47770][INFRA] Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return `false` instead of failing
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 08c49637795f [SPARK-47770][INFRA] Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return `false` instead of failing 08c49637795f is described below commit 08c49637795fd56ef550a509648f0890ff22a948 Author: Dongjoon Hyun AuthorDate: Tue Apr 9 11:14:49 2024 +0800 [SPARK-47770][INFRA] Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return `false` instead of failing ### What changes were proposed in this pull request? This PR aims to fix `GenerateMIMAIgnore.isPackagePrivateModule` to work correctly. For example, `Metadata` is a case class inside package private `DefaultParamsReader` class. Currently, MIMA fails at this class analysis. https://github.com/apache/spark/blob/f8e652e88320528a70e605a6a3cf986725e153a5/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L474-L485 The root cause is `isPackagePrivateModule` fails due to `scala.ScalaReflectionException`. We can simply make `isPackagePrivateModule` return `false` instead of failing. 
``` Error instrumenting class:org.apache.spark.ml.util.DefaultParamsReader$Metadata Exception in thread "main" scala.ScalaReflectionException: type Serializable is not a class at scala.reflect.api.Symbols$SymbolApi.asClass(Symbols.scala:284) at scala.reflect.api.Symbols$SymbolApi.asClass$(Symbols.scala:284) at scala.reflect.internal.Symbols$SymbolContextApiImpl.asClass(Symbols.scala:99) at scala.reflect.runtime.JavaMirrors$JavaMirror.classToScala1(JavaMirrors.scala:1085) at scala.reflect.runtime.JavaMirrors$JavaMirror.$anonfun$classToScala$1(JavaMirrors.scala:1040) at scala.reflect.runtime.JavaMirrors$JavaMirror.$anonfun$toScala$1(JavaMirrors.scala:150) at scala.reflect.runtime.TwoWayCaches$TwoWayCache.toScala(TwoWayCaches.scala:50) at scala.reflect.runtime.JavaMirrors$JavaMirror.toScala(JavaMirrors.scala:148) at scala.reflect.runtime.JavaMirrors$JavaMirror.classToScala(JavaMirrors.scala:1040) at scala.reflect.runtime.JavaMirrors$JavaMirror.typeToScala(JavaMirrors.scala:1148) at scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.$anonfun$completeRest$2(JavaMirrors.scala:816) at scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.$anonfun$completeRest$1(JavaMirrors.scala:816) at scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.completeRest(JavaMirrors.scala:810) at scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.complete(JavaMirrors.scala:806) at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1575) at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1538) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.scala$reflect$runtime$SynchronizedSymbols$SynchronizedSymbol$$super$info(SynchronizedSymbols.scala:221) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info(SynchronizedSymbols.scala:158) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info$(SynchronizedSymbols.scala:158) at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.info(SynchronizedSymbols.scala:221) at scala.reflect.internal.Symbols$Symbol.initialize(Symbols.scala:1733) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.privateWithin(SynchronizedSymbols.scala:109) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.privateWithin$(SynchronizedSymbols.scala:107) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.privateWithin(SynchronizedSymbols.scala:221) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.privateWithin(SynchronizedSymbols.scala:221) at org.apache.spark.tools.GenerateMIMAIgnore$.isPackagePrivateModule(GenerateMIMAIgnore.scala:48) at org.apache.spark.tools.GenerateMIMAIgnore$.$anonfun$privateWithin$1(GenerateMIMAIgnore.scala:67) at scala.collection.immutable.List.foreach(List.scala:334) at org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:61) at org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:125) at org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala) ``` ### Why are the changes needed? **BEFORE** ``` $ dev/mima | grep org.apache.spark.ml.util.DefaultParamsReader Using SPARK_LOCAL_IP=localhost Using SPARK_LOCAL_IP
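The shape of the fix — shown here as a hypothetical simplification, not the actual Spark source — is to stop letting the reflective lookup propagate `ScalaReflectionException` and instead fall back to `false`:

```scala
import scala.reflect.runtime.{universe => ru}
import scala.util.control.NonFatal

// Hypothetical simplification: if inspecting the symbol's access
// modifier fails reflectively, report "not package private" rather
// than crashing the whole MIMA-ignore generation run.
def isPackagePrivateModule(sym: ru.ModuleSymbol): Boolean =
  try {
    sym.privateWithin != ru.NoSymbol
  } catch {
    case NonFatal(_) => false // e.g. scala.ScalaReflectionException
  }
```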
(spark) branch master updated (a598f654066d -> 03f4e45cd7e9)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a598f654066d [SPARK-47664][PYTHON][CONNECT][TESTS][FOLLOW-UP] Add more tests add 03f4e45cd7e9 [SPARK-47685][SQL] Restore the support for `Stream` type in `Dataset#groupBy` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/RelationalGroupedDataset.scala | 4 +++- .../test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala | 8 +++- 2 files changed, 10 insertions(+), 2 deletions(-)
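A sketch of what the restored behavior allows: grouping columns supplied as a lazy `Stream` rather than a strict `Seq` (assumes a running `spark` session, e.g. in `spark-shell`; the data is illustrative):

```scala
import org.apache.spark.sql.functions.col
import spark.implicits._

val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("k", "v")

// Grouping columns passed as a lazily-evaluated Stream; SPARK-47685
// restores support for this in Dataset#groupBy.
val grouped = df.groupBy(Stream(col("k")): _*).sum("v")
grouped.show()
```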
(spark) branch branch-3.5 updated: [SPARK-45593][BUILD][3.5] Correct relocation connect guava dependency
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 2da520e88266 [SPARK-45593][BUILD][3.5] Correct relocation connect guava dependency 2da520e88266 is described below commit 2da520e88266530b2283ef3c9ac90bdc806b7556 Author: yikaifei AuthorDate: Mon Apr 1 15:35:23 2024 +0800 [SPARK-45593][BUILD][3.5] Correct relocation connect guava dependency ### What changes were proposed in this pull request? This PR aims to correct the relocation of the Connect Guava dependency and to remove the duplicate connect-common entry from SBT-built jars. It cherry-picks https://github.com/apache/spark/pull/43436 and https://github.com/apache/spark/pull/44801 as a backport to the 3.5 branch. ### Why are the changes needed? Bugfix ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Follow the steps described at https://github.com/apache/spark/pull/43195#issue-1921234067 to test manually. In addition, we will continue to observe the GA situation over the coming days. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45775 from Yikf/branch-3.5. 
Authored-by: yikaifei Signed-off-by: yangjie01 --- assembly/pom.xml | 6 ++ connector/connect/client/jvm/pom.xml | 22 ++ connector/connect/common/pom.xml | 33 + connector/connect/server/pom.xml | 1 + project/SparkBuild.scala | 6 +- 5 files changed, 51 insertions(+), 17 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index d1ef9b24afda..21330058f77d 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -159,6 +159,12 @@ org.apache.spark spark-connect_${scala.binary.version} ${project.version} + + + org.apache.spark + spark-connect-common_${scala.binary.version} + + org.apache.spark diff --git a/connector/connect/client/jvm/pom.xml b/connector/connect/client/jvm/pom.xml index 53ff0b0147e0..6febc5ee6bd6 100644 --- a/connector/connect/client/jvm/pom.xml +++ b/connector/connect/client/jvm/pom.xml @@ -51,9 +51,14 @@ ${project.version} + + com.google.protobuf + protobuf-java + compile + com.google.guava guava @@ -61,8 +66,9 @@ compile - com.google.protobuf - protobuf-java + com.google.guava + failureaccess + ${guava.failureaccess.version} compile @@ -108,6 +114,7 @@ true + com.google.guava:* com.google.android:* com.google.api.grpc:* com.google.code.findbugs:* @@ -127,6 +134,13 @@ + + com.google.common + ${spark.shade.packageName}.connect.guava + +com.google.common.** + + io.grpc ${spark.shade.packageName}.io.grpc @@ -138,7 +152,7 @@ com.google ${spark.shade.packageName}.com.google - + com.google.common.** diff --git a/connector/connect/common/pom.xml b/connector/connect/common/pom.xml index 7ce0aa6615d3..3c07b63c50a5 100644 --- a/connector/connect/common/pom.xml +++ b/connector/connect/common/pom.xml @@ -47,18 +47,6 @@ com.google.protobuf protobuf-java - -com.google.guava -guava -${connect.guava.version} -compile - - -com.google.guava -failureaccess -${guava.failureaccess.version} -compile - io.grpc grpc-netty @@ -152,6 +140,27 @@ + +org.apache.maven.plugins +maven-shade-plugin + +false + + +org.spark-project.spark:unused + 
org.apache.tomcat:annotations-api + + + + + +package + +shade + + + + diff --git a/connector/connect/server/pom.xml b/connector/connect
(spark) branch master updated: [SPARK-47629][INFRA] Add `common/variant` and `connector/kinesis-asl` to maven daily test module list
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a8b247e9a50a [SPARK-47629][INFRA] Add `common/variant` and `connector/kinesis-asl` to maven daily test module list a8b247e9a50a is described below commit a8b247e9a50ae0450360e76bc69b2c6cdf5ea6f8 Author: yangjie01 AuthorDate: Fri Mar 29 13:26:40 2024 +0800 [SPARK-47629][INFRA] Add `common/variant` and `connector/kinesis-asl` to maven daily test module list ### What changes were proposed in this pull request? This pr adds `common/variant` and `connector/kinesis-asl` to the Maven daily test module list. ### Why are the changes needed? Synchronize the module list with the modules to be tested in the Maven daily test. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Monitor GA after merge ### Was this patch authored or co-authored using generative AI tooling? No Closes #45754 from LuciferYang/SPARK-47629. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- .github/workflows/maven_test.yml | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/.github/workflows/maven_test.yml b/.github/workflows/maven_test.yml index 34fa9a8b7768..b01f08a23e47 100644 --- a/.github/workflows/maven_test.yml +++ b/.github/workflows/maven_test.yml @@ -62,7 +62,7 @@ jobs: - hive2.3 modules: - >- - core,launcher,common#unsafe,common#kvstore,common#network-common,common#network-shuffle,common#sketch,common#utils + core,launcher,common#unsafe,common#kvstore,common#network-common,common#network-shuffle,common#sketch,common#utils,common#variant - >- graphx,streaming,hadoop-cloud - >- @@ -70,7 +70,7 @@ jobs: - >- repl,sql#hive-thriftserver - >- - connector#kafka-0-10,connector#kafka-0-10-sql,connector#kafka-0-10-token-provider,connector#spark-ganglia-lgpl,connector#protobuf,connector#avro + connector#kafka-0-10,connector#kafka-0-10-sql,connector#kafka-0-10-token-provider,connector#spark-ganglia-lgpl,connector#protobuf,connector#avro,connector#kinesis-asl - >- sql#api,sql#catalyst,resource-managers#yarn,resource-managers#kubernetes#core # Here, we split Hive and SQL tests into some of slow ones and the rest of them. 
@@ -188,20 +188,21 @@ jobs: export MAVEN_OPTS="-Xss64m -Xmx4g -Xms4g -XX:ReservedCodeCacheSize=128m -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN" export MAVEN_CLI_OPTS="--no-transfer-progress" export JAVA_VERSION=${{ matrix.java }} + export ENABLE_KINESIS_TESTS=0 # Replace with the real module name, for example, connector#kafka-0-10 -> connector/kafka-0-10 export TEST_MODULES=`echo "$MODULES_TO_TEST" | sed -e "s%#%/%g"` - ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} clean install + ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} clean install if [[ "$INCLUDED_TAGS" != "" ]]; then -./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} -Dtest.include.tags="$INCLUDED_TAGS" test -fae +./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Pkinesis-asl -Djava.version=${JAVA_VERSION/-ea} -Dtest.include.tags="$INCLUDED_TAGS" test -fae elif [[ "$MODULES_TO_TEST" == "connect" ]]; then ./build/mvn $MAVEN_CLI_OPTS -Dtest.exclude.tags="$EXCLUDED_TAGS" -Djava.version=${JAVA_VERSION/-ea} -pl connector/connect/client/jvm,connector/connect/common,connector/connect/server test -fae elif [[ "$EXCLUDED_TAGS" != "" ]]; then -./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} -Dtest.exclude.tags="$EXCLUDED_TAGS" test -fae +./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Pkinesis-asl 
-Djava.version=${JAVA_VERSION/-ea} -Dtes
(spark) branch master updated: [SPARK-47610][CORE] Always set `io.netty.tryReflectionSetAccessible=true`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5f392a219de2 [SPARK-47610][CORE] Always set `io.netty.tryReflectionSetAccessible=true` 5f392a219de2 is described below commit 5f392a219de29b0856884fb95ff3e313f1047013 Author: Cheng Pan AuthorDate: Wed Mar 27 13:16:13 2024 +0800 [SPARK-47610][CORE] Always set `io.netty.tryReflectionSetAccessible=true` ### What changes were proposed in this pull request? Always set `io.netty.tryReflectionSetAccessible=true` ### Why are the changes needed? Arrow requires `-Dio.netty.tryReflectionSetAccessible=true` for JDK9+, see details in ARROW-7223. SPARK-29924 (fixed in 3.0.0) added document to guide users to add such JavaOpts manually, as Arrow is a Spark built-in component, and later we added such Java options to the building system(Maven, SBT, and PySpark test suite) manually. Now Spark requires JDK 17+, I think we can add such Java options by default to reduce disturbing users. ### Does this PR introduce _any_ user-facing change? Yes, no impacts for those users who manually added `io.netty.tryReflectionSetAccessible=true`, but makes life easier for new Spark users. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45733 from pan3793/SPARK-47610. 
Authored-by: Cheng Pan Signed-off-by: yangjie01 --- docs/index.md| 2 -- .../src/main/java/org/apache/spark/launcher/JavaModuleOptions.java | 3 ++- pom.xml | 5 +++-- project/SparkBuild.scala | 4 ++-- python/docs/source/getting_started/install.rst | 5 + python/run-tests.py | 2 +- sql/hive/pom.xml | 2 +- 7 files changed, 10 insertions(+), 13 deletions(-) diff --git a/docs/index.md b/docs/index.md index 12c53c40c8f7..57f701316bd0 100644 --- a/docs/index.md +++ b/docs/index.md @@ -38,8 +38,6 @@ Spark runs on Java 17/21, Scala 2.13, Python 3.8+, and R 3.5+. When using the Scala API, it is necessary for applications to use the same version of Scala that Spark was compiled for. For example, when using Scala 2.13, use Spark compiled for 2.13, and compile code/applications for Scala 2.13 as well. -Setting `-Dio.netty.tryReflectionSetAccessible=true` is required for the Apache Arrow library. This prevents the `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available` error when Apache Arrow uses Netty internally. - # Running the Examples and Shell Spark comes with several sample programs. 
Python, Scala, Java, and R examples are in the diff --git a/launcher/src/main/java/org/apache/spark/launcher/JavaModuleOptions.java b/launcher/src/main/java/org/apache/spark/launcher/JavaModuleOptions.java index 8893f4bcb85a..3a8fa6c42d47 100644 --- a/launcher/src/main/java/org/apache/spark/launcher/JavaModuleOptions.java +++ b/launcher/src/main/java/org/apache/spark/launcher/JavaModuleOptions.java @@ -42,7 +42,8 @@ public class JavaModuleOptions { "--add-opens=java.base/sun.security.action=ALL-UNNAMED", "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED", "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED", - "-Djdk.reflect.useDirectMethodHandle=false"}; + "-Djdk.reflect.useDirectMethodHandle=false", + "-Dio.netty.tryReflectionSetAccessible=true"}; /** * Returns the default Java options related to `--add-opens' and diff --git a/pom.xml b/pom.xml index 79f8745e01f8..ffa4b5df36cb 100644 --- a/pom.xml +++ b/pom.xml @@ -316,6 +316,7 @@ --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false + -Dio.netty.tryReflectionSetAccessible=true @@ -3109,7 +3110,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports --ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} ${extraJavaTestArgs} -Dio.netty.tryReflectionSetAccessible=true +-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} ${extraJavaTestArgs} - -da -Xmx4g -XX:ReservedCodeCacheSize=${CodeCacheSize} ${extraJavaTestArgs} -Dio.netty.tryReflectionSetAccessible=true + -da -Xmx4
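Before this change, the documented workaround (SPARK-29924) was to pass the flag yourself, typically via the extra-Java-options confs. A sketch of what that looked like and a quick runtime check from the driver (the exact way the options reach the driver JVM depends on deploy mode):

```scala
// Pre-change workaround, no longer needed once the launcher sets it:
//   --conf spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true
//   --conf spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true

// Verify the system property from the driver at run time:
println(sys.props.getOrElse("io.netty.tryReflectionSetAccessible", "<unset>"))
```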
(spark) branch master updated: [SPARK-47474][CORE] Revert SPARK-47461 and add some comments
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bee1bceb [SPARK-47474][CORE] Revert SPARK-47461 and add some comments bee1bceb is described below commit bee1bcebdad218a4151ad192d4893ff0fed9 Author: yangjie01 AuthorDate: Thu Mar 21 13:58:39 2024 +0800 [SPARK-47474][CORE] Revert SPARK-47461 and add some comments ### What changes were proposed in this pull request? This pr reverts the change from SPARK-47461 and adds some comments to `ExecutorAllocationManager#totalRunningTasksPerResourceProfile` to clarify that the tests in `ExecutorAllocationManagerSuite` need to call `listener.totalRunningTasksPerResourceProfile` with `synchronized`. ### Why are the changes needed? `ExecutorAllocationManagerSuite` needs to call `listener.totalRunningTasksPerResourceProfile` with `synchronized`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #45602 from LuciferYang/SPARK-47474. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- .../src/main/scala/org/apache/spark/ExecutorAllocationManager.scala | 6 ++ .../scala/org/apache/spark/ExecutorAllocationManagerSuite.scala | 4 +++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala index cdd1aecf4a22..94927caff1d7 100644 --- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala +++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala @@ -320,6 +320,12 @@ private[spark] class ExecutorAllocationManager( } } + // Please do not delete this function, the tests in `ExecutorAllocationManagerSuite` + // need to access `listener.totalRunningTasksPerResourceProfile` with `synchronized`. + private def totalRunningTasksPerResourceProfile(id: Int): Int = synchronized { +listener.totalRunningTasksPerResourceProfile(id) + } + /** * This is called at a fixed interval to regulate the number of pending executor requests * and number of executors running. 
diff --git a/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala b/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala index aeb3cf53ff1a..e1da2b6dd9d6 100644 --- a/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala +++ b/core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala @@ -1934,6 +1934,8 @@ private object ExecutorAllocationManagerSuite extends PrivateMethodTester { PrivateMethod[Map[Int, Map[String, Int]]](Symbol("rpIdToHostToLocalTaskCount")) private val _onSpeculativeTaskSubmitted = PrivateMethod[Unit](Symbol("onSpeculativeTaskSubmitted")) + private val _totalRunningTasksPerResourceProfile = +PrivateMethod[Int](Symbol("totalRunningTasksPerResourceProfile")) private val defaultProfile = ResourceProfile.getOrCreateDefaultProfile(new SparkConf) @@ -2041,7 +2043,7 @@ private object ExecutorAllocationManagerSuite extends PrivateMethodTester { } private def totalRunningTasksPerResourceProfile(manager: ExecutorAllocationManager): Int = { -manager.listener.totalRunningTasksPerResourceProfile(defaultProfile.id) +manager invokePrivate _totalRunningTasksPerResourceProfile(defaultProfile.id) } private def hostToLocalTaskCount( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
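The test-side change above reaches the new private, `synchronized` accessor through ScalaTest's `PrivateMethodTester` (`invokePrivate`). A rough Java analogue of both halves of the pattern, a private synchronized accessor plus reflective invocation from test code, is sketched below (`Manager` and its field are invented for illustration):

```java
import java.lang.reflect.Method;

public class PrivateAccessDemo {
    static class Manager {
        private int runningTasks = 3;

        // Synchronized so readers observe a consistent value, mirroring the
        // wrapper added around listener.totalRunningTasksPerResourceProfile.
        private synchronized int totalRunningTasks() {
            return runningTasks;
        }
    }

    // Roughly what `manager invokePrivate _totalRunningTasksPerResourceProfile(...)`
    // does under the hood in ScalaTest's PrivateMethodTester.
    static int invokePrivate(Manager m) {
        try {
            Method method = Manager.class.getDeclaredMethod("totalRunningTasks");
            method.setAccessible(true);
            return (Integer) method.invoke(m);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokePrivate(new Manager())); // prints 3
    }
}
```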
(spark) branch master updated (bb0867f54d43 -> 5d3845f2942a)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from bb0867f54d43 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` add 5d3845f2942a [SPARK-46920][YARN] Improve executor exit error message on YARN No new revisions were added by this update. Summary of changes: .../apache/spark/deploy/yarn/YarnAllocator.scala | 28 -- 1 file changed, 16 insertions(+), 12 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47455][BUILD] Fix resource leak during the initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 142677bcd203 [SPARK-47455][BUILD] Fix resource leak during the initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala` 142677bcd203 is described below commit 142677bcd203caf2b6d07bf41d654e123d910ee8 Author: yangjie01 AuthorDate: Wed Mar 20 15:19:33 2024 +0800 [SPARK-47455][BUILD] Fix resource leak during the initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala` ### What changes were proposed in this pull request? https://github.com/apache/spark/blob/e01ed0da22f24204fe23143032ff39be7f4b56af/project/SparkBuild.scala#L157-L173 `Source.fromFile(in)` opens a `BufferedSource` resource handle, but it does not close it; this PR fixes the issue. ### Why are the changes needed? Close the resource after use. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #45582 from LuciferYang/SPARK-47455. 
Authored-by: yangjie01 Signed-off-by: yangjie01 (cherry picked from commit 85bf7615f85eea3e9192a7684ef711cf44042e05) Signed-off-by: yangjie01 --- project/SparkBuild.scala | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 31263eaa4c8d..31516c8c6ffe 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -159,16 +159,21 @@ object SparkBuild extends PomBuild { val replacements = Map( """customId="println" level="error"""" -> """customId="println" level="warn"""" ) -var contents = Source.fromFile(in).getLines.mkString("\n") -for ((k, v) <- replacements) { - require(contents.contains(k), s"Could not rewrite '$k' in original scalastyle config.") - contents = contents.replace(k, v) -} -new PrintWriter(out) { - write(contents) - close() +val source = Source.fromFile(in) +try { + var contents = source.getLines.mkString("\n") + for ((k, v) <- replacements) { +require(contents.contains(k), s"Could not rewrite '$k' in original scalastyle config.") +contents = contents.replace(k, v) + } + new PrintWriter(out) { +write(contents) +close() + } + out +} finally { + source.close() } -out } // Return a cached scalastyle task for a given configuration (usually Compile or Test) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47455][BUILD] Fix resource leak during the initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 8fcd9a1b0024 [SPARK-47455][BUILD] Fix resource leak during the initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala` 8fcd9a1b0024 is described below commit 8fcd9a1b0024d24e3622b1948123e7f239a734a5 Author: yangjie01 AuthorDate: Wed Mar 20 15:19:33 2024 +0800 [SPARK-47455][BUILD] Fix resource leak during the initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala` ### What changes were proposed in this pull request? https://github.com/apache/spark/blob/e01ed0da22f24204fe23143032ff39be7f4b56af/project/SparkBuild.scala#L157-L173 `Source.fromFile(in)` opens a `BufferedSource` resource handle, but it does not close it; this PR fixes the issue. ### Why are the changes needed? Close the resource after use. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #45582 from LuciferYang/SPARK-47455. 
Authored-by: yangjie01 Signed-off-by: yangjie01 (cherry picked from commit 85bf7615f85eea3e9192a7684ef711cf44042e05) Signed-off-by: yangjie01 --- project/SparkBuild.scala | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 79b58deafde5..dfadfea172d8 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -160,16 +160,21 @@ object SparkBuild extends PomBuild { val replacements = Map( """customId="println" level="error"""" -> """customId="println" level="warn"""" ) -var contents = Source.fromFile(in).getLines.mkString("\n") -for ((k, v) <- replacements) { - require(contents.contains(k), s"Could not rewrite '$k' in original scalastyle config.") - contents = contents.replace(k, v) -} -new PrintWriter(out) { - write(contents) - close() +val source = Source.fromFile(in) +try { + var contents = source.getLines.mkString("\n") + for ((k, v) <- replacements) { +require(contents.contains(k), s"Could not rewrite '$k' in original scalastyle config.") +contents = contents.replace(k, v) + } + new PrintWriter(out) { +write(contents) +close() + } + out +} finally { + source.close() } -out } // Return a cached scalastyle task for a given configuration (usually Compile or Test) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
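The fix in both backports follows the classic acquire/try/finally shape: keep the `BufferedSource` in a local, do the work inside `try`, and close in `finally`. In Java the same guarantee is idiomatically written with try-with-resources; a self-contained sketch (a `StringReader` stands in for the scalastyle config file):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.stream.Collectors;

public class ResourceCloseDemo {
    // Before the fix: the reader is never closed, leaking a handle
    // exactly like the unclosed Source.fromFile(in).
    static String readLeaky(String text) {
        BufferedReader r = new BufferedReader(new StringReader(text));
        return r.lines().collect(Collectors.joining("\n"));
    }

    // After the fix: try-with-resources closes the reader on every path,
    // the same guarantee the Scala patch gets from try/finally + source.close().
    static String read(String text) {
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            return r.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String contents = "customId=\"println\" level=\"error\"";
        // Mirrors the config rewrite the build performs before writing it back out.
        System.out.println(read(contents).replace("level=\"error\"", "level=\"warn\""));
    }
}
```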
(spark) branch master updated (c3a04fa59ce1 -> 85bf7615f85e)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from c3a04fa59ce1 [SPARK-47447][SQL] Allow reading Parquet TimestampLTZ as TimestampNTZ add 85bf7615f85e [SPARK-47455][BUILD] Fix resource leak during the initialization of `scalaStyleOnCompileConfig` in `SparkBuild.scala` No new revisions were added by this update. Summary of changes: project/SparkBuild.scala | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 8b4316461e2b [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` 8b4316461e2b is described below commit 8b4316461e2bc3ca3b72170648ca6b6e36537a65 Author: yangjie01 AuthorDate: Fri Mar 8 16:44:54 2024 +0800 [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/45406 has been backported to branch-3.4, where the newly added test case in `PropagateEmptyRelationSuite` uses `DataTypeUtils`, but `DataTypeUtils` is a utility class added in Apache Spark 3.5(SPARK-44475), so this triggered a compilation failure in branch-3.4: - https://github.com/apache/spark/actions/runs/8183755511/job/22377119069 ``` [error] /home/runner/work/spark/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala:229:27: not found: value DataTypeUtils [error] val schemaForStream = DataTypeUtils.fromAttributes(outputForStream) [error] ^ [error] /home/runner/work/spark/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala:233:26: not found: value DataTypeUtils [error] val schemaForBatch = DataTypeUtils.fromAttributes(outputForBatch) [error] ^ [info] done compiling [info] compiling 1 Scala source to /home/runner/work/spark/spark/connector/connect/common/target/scala-2.12/test-classes ... [info] compiling 25 Scala sources and 1 Java source to /home/runner/work/spark/spark/connector/connect/client/jvm/target/scala-2.12/classes ... 
[info] done compiling [error] two errors found ``` Therefore, this PR switches to the `StructType.fromAttributes` function to fix the compilation failure. ### Why are the changes needed? Fix the compilation failure in branch-3.4 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #45428 from LuciferYang/SPARK-47305-FOLLOWUP-34. Authored-by: yangjie01 Signed-off-by: yangjie01 --- .../spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala index a1132eadcc6f..91b62ae953f0 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala @@ -226,11 +226,11 @@ class PropagateEmptyRelationSuite extends PlanTest { val data = Seq(Row(1)) val outputForStream = Seq($"a".int) -val schemaForStream = DataTypeUtils.fromAttributes(outputForStream) +val schemaForStream = StructType.fromAttributes(outputForStream) val converterForStream = CatalystTypeConverters.createToCatalystConverter(schemaForStream) val outputForBatch = Seq($"b".int) -val schemaForBatch = DataTypeUtils.fromAttributes(outputForBatch) +val schemaForBatch = StructType.fromAttributes(outputForBatch) val converterForBatch = CatalystTypeConverters.createToCatalystConverter(schemaForBatch) val streamingRelation = LocalRelation( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (3aa16e8193cf -> 433c9b064a3f)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3aa16e8193cf [MINOR] Update outdated comments for class `o.a.s.s.functions` add 433c9b064a3f [SPARK-47246][SQL] Replace `InternalRow.fromSeq` with `new GenericInternalRow` to save a collection conversion No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/expressions/literals.scala | 5 ++--- .../spark/sql/catalyst/expressions/objects/objects.scala | 2 +- .../org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala| 12 ++-- .../spark/sql/execution/columnar/InMemoryRelation.scala | 4 ++-- .../datasources/v2/state/metadata/StateMetadataSource.scala | 5 +++-- .../scala/org/apache/spark/sql/hive/HiveInspectors.scala | 2 +- .../apache/spark/sql/hive/execution/HiveTableScanExec.scala | 2 +- 7 files changed, 16 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
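The replacement above is a small allocation saving: `InternalRow.fromSeq(seq)` first converts the Seq into an array, while `new GenericInternalRow(array)` adopts an array the caller already has. A hypothetical Java illustration of the difference (the `Row` class below is invented for the example, not Spark's):

```java
import java.util.List;

public class RowDemo {
    static final class Row {
        final Object[] values;

        // Adopts the caller's array with no copy, like `new GenericInternalRow(array)`.
        Row(Object[] values) {
            this.values = values;
        }

        // Goes through a List first, paying one extra allocation and copy,
        // analogous to `InternalRow.fromSeq(seq)` converting a Seq.
        static Row fromList(List<?> values) {
            return new Row(values.toArray());
        }
    }

    public static void main(String[] args) {
        Object[] data = {1, "a"};
        Row direct = new Row(data);                    // shares `data`, zero-copy
        Row converted = Row.fromList(List.of(1, "a")); // extra conversion
        System.out.println(direct.values == data);     // true
        System.out.println(converted.values.length);   // 2
    }
}
```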
(spark) branch master updated (c8d293dff595 -> 7e7ba4eaf071)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from c8d293dff595 [SPARK-47147][PYTHON][SQL] Fix PySpark collated string conversion error add 7e7ba4eaf071 [MINOR][SQL] Remove out-of-dated comment in `CollectLimitExec` No new revisions were added by this update. Summary of changes: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala | 5 - 1 file changed, 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47100][BUILD] Upgrade `netty` to 4.1.107.Final and `netty-tcnative` to 2.0.62.Final
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fb1e7872a3e6 [SPARK-47100][BUILD] Upgrade `netty` to 4.1.107.Final and `netty-tcnative` to 2.0.62.Final fb1e7872a3e6 is described below commit fb1e7872a3e64eab6127f9c2b3ffa42b63162f6c Author: Dongjoon Hyun AuthorDate: Tue Feb 20 17:04:41 2024 +0800 [SPARK-47100][BUILD] Upgrade `netty` to 4.1.107.Final and `netty-tcnative` to 2.0.62.Final ### What changes were proposed in this pull request? This PR aims to upgrade `netty` to 4.1.107.Final and `netty-tcnative` to 2.0.62.Final. ### Why are the changes needed? To bring the latest bug fixes. - https://netty.io/news/2024/02/13/4-1-107-Final.html ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45178 from dongjoon-hyun/SPARK-47100. 
Authored-by: Dongjoon Hyun Signed-off-by: yangjie01 --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +-- pom.xml | 4 +-- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index dbbddbc54c11..cc0145e004a0 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -192,32 +192,32 @@ metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar metrics-json/4.2.25//metrics-json-4.2.25.jar metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.106.Final//netty-all-4.1.106.Final.jar -netty-buffer/4.1.106.Final//netty-buffer-4.1.106.Final.jar -netty-codec-http/4.1.106.Final//netty-codec-http-4.1.106.Final.jar -netty-codec-http2/4.1.106.Final//netty-codec-http2-4.1.106.Final.jar -netty-codec-socks/4.1.106.Final//netty-codec-socks-4.1.106.Final.jar -netty-codec/4.1.106.Final//netty-codec-4.1.106.Final.jar -netty-common/4.1.106.Final//netty-common-4.1.106.Final.jar -netty-handler-proxy/4.1.106.Final//netty-handler-proxy-4.1.106.Final.jar -netty-handler/4.1.106.Final//netty-handler-4.1.106.Final.jar -netty-resolver/4.1.106.Final//netty-resolver-4.1.106.Final.jar +netty-all/4.1.107.Final//netty-all-4.1.107.Final.jar +netty-buffer/4.1.107.Final//netty-buffer-4.1.107.Final.jar +netty-codec-http/4.1.107.Final//netty-codec-http-4.1.107.Final.jar +netty-codec-http2/4.1.107.Final//netty-codec-http2-4.1.107.Final.jar +netty-codec-socks/4.1.107.Final//netty-codec-socks-4.1.107.Final.jar +netty-codec/4.1.107.Final//netty-codec-4.1.107.Final.jar +netty-common/4.1.107.Final//netty-common-4.1.107.Final.jar +netty-handler-proxy/4.1.107.Final//netty-handler-proxy-4.1.107.Final.jar +netty-handler/4.1.107.Final//netty-handler-4.1.107.Final.jar +netty-resolver/4.1.107.Final//netty-resolver-4.1.107.Final.jar netty-tcnative-boringssl-static/2.0.61.Final//netty-tcnative-boringssl-static-2.0.61.Final.jar 
-netty-tcnative-boringssl-static/2.0.61.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-aarch_64.jar -netty-tcnative-boringssl-static/2.0.61.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar -netty-tcnative-boringssl-static/2.0.61.Final/osx-aarch_64/netty-tcnative-boringssl-static-2.0.61.Final-osx-aarch_64.jar -netty-tcnative-boringssl-static/2.0.61.Final/osx-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-osx-x86_64.jar -netty-tcnative-boringssl-static/2.0.61.Final/windows-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-windows-x86_64.jar -netty-tcnative-classes/2.0.61.Final//netty-tcnative-classes-2.0.61.Final.jar -netty-transport-classes-epoll/4.1.106.Final//netty-transport-classes-epoll-4.1.106.Final.jar -netty-transport-classes-kqueue/4.1.106.Final//netty-transport-classes-kqueue-4.1.106.Final.jar -netty-transport-native-epoll/4.1.106.Final/linux-aarch_64/netty-transport-native-epoll-4.1.106.Final-linux-aarch_64.jar -netty-transport-native-epoll/4.1.106.Final/linux-riscv64/netty-transport-native-epoll-4.1.106.Final-linux-riscv64.jar -netty-transport-native-epoll/4.1.106.Final/linux-x86_64/netty-transport-native-epoll-4.1.106.Final-linux-x86_64.jar -netty-transport-native-kqueue/4.1.106.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.106.Final-osx-aarch_64.jar -netty-transport-native-kqueue/4.1.106.Final/osx-x86_64/netty-transport-native-kqueue-4.1.106.Final-osx-x86_64.jar -netty-transport-native-unix-common/4.1.106.Final//netty-transport-native-unix-common-4.1.106.Final.jar -netty-transport/4.1.106.Final//netty-transport-4.1.106.Final.jar +netty-tcnative-boringssl-static/2.0.62.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.62.Final-linux-aarch_64.jar +netty
(spark) branch master updated: [SPARK-47084][BUILD] Upgrade joda-time to 2.12.7
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 85108b0cb605 [SPARK-47084][BUILD] Upgrade joda-time to 2.12.7 85108b0cb605 is described below commit 85108b0cb6059e9a5301b63ab266084defd0ddf2 Author: panbingkun AuthorDate: Mon Feb 19 10:15:37 2024 +0800 [SPARK-47084][BUILD] Upgrade joda-time to 2.12.7 ### What changes were proposed in this pull request? The PR aims to upgrade `joda-time` from `2.12.6` to `2.12.7`. ### Why are the changes needed? The `DateTimeZone` data is updated to version `2024agtz`. The full release notes: https://www.joda.org/joda-time/changes-report.html#a2.12.7 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45153 from panbingkun/SPARK-47084. Authored-by: panbingkun Signed-off-by: yangjie01 --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 0b619a249e96..5aabe0e4aef1 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -139,7 +139,7 @@ jetty-util/10.0.19//jetty-util-10.0.19.jar jline/2.14.6//jline-2.14.6.jar jline/3.22.0//jline-3.22.0.jar jna/5.13.0//jna-5.13.0.jar -joda-time/2.12.6//joda-time-2.12.6.jar +joda-time/2.12.7//joda-time-2.12.7.jar jodd-core/3.5.2//jodd-core-3.5.2.jar jpam/1.1//jpam-1.1.jar json/1.8//json-1.8.jar diff --git a/pom.xml b/pom.xml index a14f2d255a90..64931dd73282 100644 --- a/pom.xml +++ b/pom.xml @@ -208,7 +208,7 @@ Because it transitions Jakarta REST API from javax to jakarta package. 
--> 2.41 -2.12.6 +2.12.7 3.5.2 3.0.0 0.12.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][INFRA][DOCS] Remove outdated comment in build_and_test.yml
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7c60fe21a29d [MINOR][INFRA][DOCS] Remove outdated comment in build_and_test.yml 7c60fe21a29d is described below commit 7c60fe21a29dd852de01da214c84e6a3deb38e31 Author: Hyukjin Kwon AuthorDate: Mon Feb 19 10:13:52 2024 +0800 [MINOR][INFRA][DOCS] Remove outdated comment in build_and_test.yml ### What changes were proposed in this pull request? This PR removes an outdated comment. We don't use branch-3.3 anymore. ### Why are the changes needed? To remove obsolete information ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45149 from HyukjinKwon/minor-ga. Authored-by: Hyukjin Kwon Signed-off-by: yangjie01 --- .github/workflows/build_and_test.yml | 1 - 1 file changed, 1 deletion(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index c578d5079be8..bad34fd746ba 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -285,7 +285,6 @@ jobs: infra-image: name: "Base image build" needs: precondition -# Currently, enable docker build from cache for `master` and branch (since 3.4) jobs if: >- fromJson(needs.precondition.outputs.required).pyspark == 'true' || fromJson(needs.precondition.outputs.required).lint == 'true' || - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47073][BUILD] Upgrade several Maven plugins to the latest versions
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2abd3a2f445e [SPARK-47073][BUILD] Upgrade several Maven plugins to the latest versions 2abd3a2f445e is described below commit 2abd3a2f445e86337ad94da19f301cb2b8bc232f Author: Dongjoon Hyun AuthorDate: Fri Feb 16 22:10:41 2024 +0800 [SPARK-47073][BUILD] Upgrade several Maven plugins to the latest versions ### What changes were proposed in this pull request? This PR aims to upgrade several maven plugins to the latest versions for Apache Spark 4.0.0. ### Why are the changes needed? To bring the latest bug fixes. - `versions-maven-plugin` from 2.16.0 to 2.16.2. - `maven-enforcer-plugin` from 3.3.0 to 3.4.1. - `maven-compiler-plugin` from 3.11.0 to 3.12.1. - `maven-surefire-plugin` from 3.1.2 to 3.2.5. - `maven-clean-plugin` from 3.3.1 to 3.3.2. - `maven-javadoc-plugin` from 3.5.0 to 3.6.3. - `maven-shade-plugin` from 3.5.0 to 3.5.1. - `maven-dependency-plugin` from 3.6.0 to 3.6.1. - `maven-checkstyle-plugin` from 3.3.0 to 3.3.1. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs and manual. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45136 from dongjoon-hyun/SPARK-47073. 
Authored-by: Dongjoon Hyun Signed-off-by: yangjie01 --- pom.xml | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/pom.xml b/pom.xml index b83378af30ff..cd669dd02b6d 100644 --- a/pom.xml +++ b/pom.xml @@ -179,7 +179,7 @@ 4.7.1 false -2.16.0 +2.16.2 true true @@ -2852,7 +2852,7 @@ org.apache.maven.plugins maven-enforcer-plugin - 3.3.0 + 3.4.1 enforce-versions @@ -3035,7 +3035,7 @@ org.apache.maven.plugins maven-compiler-plugin - 3.11.0 + 3.12.1 ${java.version} ${java.version} @@ -3052,7 +3052,7 @@ org.apache.maven.plugins maven-surefire-plugin - 3.1.2 + 3.2.5 @@ -3189,7 +3189,7 @@ org.apache.maven.plugins maven-clean-plugin - 3.3.1 + 3.3.2 @@ -3216,7 +3216,7 @@ org.apache.maven.plugins maven-javadoc-plugin - 3.5.0 + 3.6.3 -Xdoclint:all @@ -3272,7 +3272,7 @@ org.apache.maven.plugins maven-shade-plugin - 3.5.0 + 3.5.1 org.ow2.asm @@ -3299,7 +3299,7 @@ org.apache.maven.plugins maven-dependency-plugin - 3.6.0 + 3.6.1 default-cli @@ -3439,7 +3439,7 @@ org.apache.maven.plugins maven-checkstyle-plugin -3.3.0 +3.3.1 false true - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47025][BUILD][TESTS] Upgrade `Guava` dependency in `docker-integration-tests` test module
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d260f5753e9d [SPARK-47025][BUILD][TESTS] Upgrade `Guava` dependency in `docker-integration-tests` test module d260f5753e9d is described below commit d260f5753e9db00b84d85c34d1ebd21e36a98ac1 Author: Dongjoon Hyun AuthorDate: Tue Feb 13 08:41:00 2024 +0800 [SPARK-47025][BUILD][TESTS] Upgrade `Guava` dependency in `docker-integration-tests` test module ### What changes were proposed in this pull request? This PR aims to update `docker-integration-tests` test module to use the latest `Guava` test dependency. Specifically, - Switch from `provided` dependency to `test` dependency - Upgrade from version `19.0` to `33.0.0-jre`. ### Why are the changes needed? Previously, `docker-integration-tests` used the `Guava 19.0` dependency in `provided` scope because `docker-java-core` still uses `Guava 19.0`. - https://mvnrepository.com/artifact/com.github.docker-java/docker-java-core/3.3.4 ### Does this PR introduce _any_ user-facing change? No, `docker-integration-tests` is an integration test module. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45088 from dongjoon-hyun/SPARK-47025. 
Authored-by: Dongjoon Hyun Signed-off-by: yangjie01 --- connector/docker-integration-tests/pom.xml | 3 ++- project/SparkBuild.scala | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/connector/docker-integration-tests/pom.xml b/connector/docker-integration-tests/pom.xml index 4cca3ef12ae5..f9430da052be 100644 --- a/connector/docker-integration-tests/pom.xml +++ b/connector/docker-integration-tests/pom.xml @@ -49,7 +49,8 @@ com.google.guava guava - 19.0 + 33.0.0-jre + test org.apache.spark diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 45b51cb0ff5b..24e2c814f99f 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -951,7 +951,7 @@ object Unsafe { object DockerIntegrationTests { // This serves to override the override specified in DependencyOverrides: lazy val settings = Seq( -dependencyOverrides += "com.google.guava" % "guava" % "19.0" +dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre" ) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46615][CONNECT] Support s.c.immutable.ArraySeq in ArrowDeserializers
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 729fc8ec95e0 [SPARK-46615][CONNECT] Support s.c.immutable.ArraySeq in ArrowDeserializers 729fc8ec95e0 is described below commit 729fc8ec95e017bd6eead283c0b660b9c57a174d Author: panbingkun AuthorDate: Thu Feb 8 14:57:13 2024 +0800 [SPARK-46615][CONNECT] Support s.c.immutable.ArraySeq in ArrowDeserializers ### What changes were proposed in this pull request? The PR aims to support s.c.immutable.ArraySeq as customCollectionCls in ArrowDeserializers. ### Why are the changes needed? Because s.c.immutable.ArraySeq is a commonly used type in Scala 2.13, we should support it. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Update the existing UT (SQLImplicitsTestSuite). ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44618 from panbingkun/SPARK-46615. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- .../scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala| 11 +++ .../spark/sql/connect/client/arrow/ArrowDeserializer.scala| 9 + .../spark/sql/connect/client/arrow/ArrowEncoderUtils.scala| 2 ++ 3 files changed, 22 insertions(+) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala index b2c13850a13a..3e4704b6ab8e 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala @@ -52,6 +52,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with BeforeAndAfterAll { test("test implicit encoder resolution") { val spark = session +import org.apache.spark.util.ArrayImplicits._ import spark.implicits._ def testImplicit[T: Encoder](expected: T): Unit = { val encoder = encoderFor[T] @@ -84,6 +85,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with BeforeAndAfterAll { testImplicit(booleans) testImplicit(booleans.toSeq) testImplicit(booleans.toSeq)(newBooleanSeqEncoder) +testImplicit(booleans.toImmutableArraySeq) val bytes = Array(76.toByte, 59.toByte, 121.toByte) testImplicit(bytes.head) @@ -91,6 +93,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with BeforeAndAfterAll { testImplicit(bytes) testImplicit(bytes.toSeq) testImplicit(bytes.toSeq)(newByteSeqEncoder) +testImplicit(bytes.toImmutableArraySeq) val shorts = Array(21.toShort, (-213).toShort, 14876.toShort) testImplicit(shorts.head) @@ -98,6 +101,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with BeforeAndAfterAll { testImplicit(shorts) testImplicit(shorts.toSeq) testImplicit(shorts.toSeq)(newShortSeqEncoder) +testImplicit(shorts.toImmutableArraySeq) val ints = Array(4, 6, 5) testImplicit(ints.head) @@ -105,6 +109,7 @@ class SQLImplicitsTestSuite 
extends ConnectFunSuite with BeforeAndAfterAll { testImplicit(ints) testImplicit(ints.toSeq) testImplicit(ints.toSeq)(newIntSeqEncoder) +testImplicit(ints.toImmutableArraySeq) val longs = Array(System.nanoTime(), System.currentTimeMillis()) testImplicit(longs.head) @@ -112,6 +117,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with BeforeAndAfterAll { testImplicit(longs) testImplicit(longs.toSeq) testImplicit(longs.toSeq)(newLongSeqEncoder) +testImplicit(longs.toImmutableArraySeq) val floats = Array(3f, 10.9f) testImplicit(floats.head) @@ -119,6 +125,7 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with BeforeAndAfterAll { testImplicit(floats) testImplicit(floats.toSeq) testImplicit(floats.toSeq)(newFloatSeqEncoder) +testImplicit(floats.toImmutableArraySeq) val doubles = Array(23.78d, -329.6d) testImplicit(doubles.head) @@ -126,22 +133,26 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with BeforeAndAfterAll { testImplicit(doubles) testImplicit(doubles.toSeq) testImplicit(doubles.toSeq)(newDoubleSeqEncoder) +testImplicit(doubles.toImmutableArraySeq) val strings = Array("foo", "baz", "bar") testImplicit(strings.head) testImplicit(strings) testImplicit(strings.toSeq) testImplicit(strings.toSeq)(newStringSeqEncoder) +testImplicit(strings.toImmutableArraySeq) val myTypes = Array(MyType(12L, Math.E, Math.PI), MyType(0, 0, 0)) testImplicit(myTypes.head) testImplicit(myTypes) testImplicit(myType
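The test changes above repeatedly call `array.toImmutableArraySeq`, which exposes an array as an immutable, array-backed sequence. As a JVM-level illustration of the same idea (not Spark or Scala-library code — the class and method names below are made up), a primitive array can be surfaced as an unmodifiable collection:

```java
import java.util.Arrays;
import java.util.List;

public class ImmutableSeqSketch {
    // Copy a primitive array into an unmodifiable List; similar in spirit to
    // Scala 2.13's array.toImmutableArraySeq used in the test diff above.
    static List<Integer> toImmutableSeq(int[] values) {
        // Stream.toList() (Java 16+) returns an unmodifiable List.
        return Arrays.stream(values).boxed().toList();
    }

    public static void main(String[] args) {
        System.out.println(toImmutableSeq(new int[] {4, 6, 5})); // prints [4, 6, 5]
    }
}
```

Any attempt to mutate the returned list throws `UnsupportedOperationException`, which is what makes such a type a safe deserialization target.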
(spark) branch branch-3.5 updated: [SPARK-46400][CORE][SQL][3.5] When there are corrupted files in the local maven repo, skip this cache and try again
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 77f8b38a1091 [SPARK-46400][CORE][SQL][3.5] When there are corrupted files in the local maven repo, skip this cache and try again 77f8b38a1091 is described below commit 77f8b38a1091aa51af32dc790b61ae54ac47a2c2 Author: panbingkun AuthorDate: Thu Feb 8 14:41:51 2024 +0800 [SPARK-46400][CORE][SQL][3.5] When there are corrupted files in the local maven repo, skip this cache and try again ### What changes were proposed in this pull request? This PR aims to fix a potential bug (i.e. https://github.com/apache/spark/pull/44208), enhance the user experience, and make the code more compliant with standards. It backports the above to branch-3.5. Master branch PR: https://github.com/apache/spark/pull/44343 ### Why are the changes needed? We use the local maven repo as the first-level cache in Ivy. The original intention was to reduce the time required to parse and obtain the ar, but when there are corrupted files in the local maven repo, the above mechanism is interrupted outright and the resulting prompt is very unfriendly, which greatly confuses the user. In keeping with the original intention, we should skip the cache in such situations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45017 from panbingkun/branch-3.5_SPARK-46400. 
Authored-by: panbingkun Signed-off-by: yangjie01 --- .../org/apache/spark/deploy/SparkSubmit.scala | 116 + .../sql/hive/client/IsolatedClientLoader.scala | 4 + 2 files changed, 98 insertions(+), 22 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala index af35f451e370..0f0d8b6c07c0 100644 --- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala +++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala @@ -41,7 +41,7 @@ import org.apache.ivy.Ivy import org.apache.ivy.core.LogOptions import org.apache.ivy.core.module.descriptor._ import org.apache.ivy.core.module.id.{ArtifactId, ModuleId, ModuleRevisionId} -import org.apache.ivy.core.report.ResolveReport +import org.apache.ivy.core.report.{DownloadStatus, ResolveReport} import org.apache.ivy.core.resolve.ResolveOptions import org.apache.ivy.core.retrieve.RetrieveOptions import org.apache.ivy.core.settings.IvySettings @@ -1226,7 +1226,7 @@ private[spark] object SparkSubmitUtils extends Logging { s"be whitespace. The artifactId provided is: ${splits(1)}") require(splits(2) != null && splits(2).trim.nonEmpty, s"The version cannot be null or " + s"be whitespace. The version provided is: ${splits(2)}") - new MavenCoordinate(splits(0), splits(1), splits(2)) + MavenCoordinate(splits(0), splits(1), splits(2)) } } @@ -1241,21 +1241,27 @@ private[spark] object SparkSubmitUtils extends Logging { } /** - * Extracts maven coordinates from a comma-delimited string + * Create a ChainResolver used by Ivy to search for and resolve dependencies. + * * @param defaultIvyUserDir The default user path for Ivy + * @param useLocalM2AsCache Whether to use the local maven repo as a cache * @return A ChainResolver used by Ivy to search for and resolve dependencies. 
*/ - def createRepoResolvers(defaultIvyUserDir: File): ChainResolver = { + def createRepoResolvers( + defaultIvyUserDir: File, + useLocalM2AsCache: Boolean = true): ChainResolver = { // We need a chain resolver if we want to check multiple repositories val cr = new ChainResolver cr.setName("spark-list") -val localM2 = new IBiblioResolver -localM2.setM2compatible(true) -localM2.setRoot(m2Path.toURI.toString) -localM2.setUsepoms(true) -localM2.setName("local-m2-cache") -cr.add(localM2) +if (useLocalM2AsCache) { + val localM2 = new IBiblioResolver + localM2.setM2compatible(true) + localM2.setRoot(m2Path.toURI.toString) + localM2.setUsepoms(true) + localM2.setName("local-m2-cache") + cr.add(localM2) +} val localIvy = new FileSystemResolver val localIvyRoot = new File(defaultIvyUserDir, "local") @@ -1351,18 +1357,23 @@ private[spark] object SparkSubmitUtils extends Logging { /** * Build Ivy Settings using options with default resolvers + * * @param remoteRepos Comma-delimited string of remote repositories other than maven central * @param ivyPath The path to the local ivy r
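The diff above makes the local-m2 cache an optional first link in Ivy's `ChainResolver`, so a corrupted cache entry no longer aborts resolution. The fall-through behaviour the commit relies on can be sketched in plain Java — this is an illustrative stand-in, not the real Ivy API; `Resolver`, `resolveThroughChain`, and the coordinates are invented:

```java
import java.util.List;
import java.util.Optional;

public class ChainResolverSketch {
    interface Resolver {
        // Returns the resolved artifact location, or empty when this repository
        // cannot serve it (missing or, per SPARK-46400, corrupted files).
        Optional<String> resolve(String coordinate);
    }

    // Walk the chain in order; a failing cache level simply falls through to
    // the next resolver instead of aborting the whole resolution.
    static String resolveThroughChain(List<Resolver> chain, String coordinate) {
        for (Resolver r : chain) {
            Optional<String> artifact = r.resolve(coordinate);
            if (artifact.isPresent()) {
                return artifact.get();
            }
        }
        throw new IllegalStateException("unresolved dependency: " + coordinate);
    }

    static String demo() {
        Resolver corruptedLocalM2 = c -> Optional.empty();   // simulated local-m2-cache with a corrupted entry
        Resolver remoteRepo = c -> Optional.of(c + ".jar");  // simulated remote repository that succeeds
        return resolveThroughChain(List.of(corruptedLocalM2, remoteRepo), "org.apache:example:1.0");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints org.apache:example:1.0.jar
    }
}
```

The `useLocalM2AsCache` flag in the patch effectively decides whether the first (cache) link is present in this chain at all.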
(spark) branch master updated: [MINOR][PYTHON][SQL][TESTS] Don't load Python Data Source when Python executable is not available even for testing
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ab7fcacca41d [MINOR][PYTHON][SQL][TESTS] Don't load Python Data Source when Python executable is not available even for testing ab7fcacca41d is described below commit ab7fcacca41dad0ec2334b5d990bf36522fb5c82 Author: Hyukjin Kwon AuthorDate: Thu Feb 8 14:19:43 2024 +0800 [MINOR][PYTHON][SQL][TESTS] Don't load Python Data Source when Python executable is not available even for testing ### What changes were proposed in this pull request? This PR proposes not to load Python Data Sources when the Python executable is not available, even for testing. ### Why are the changes needed? Whether we are in a test or not, loading Python Data Sources cannot work without a Python executable anyway. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45067 from HyukjinKwon/minor-checking. 
Authored-by: Hyukjin Kwon Signed-off-by: yangjie01 --- .../org/apache/spark/sql/execution/datasources/DataSourceManager.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala index f63157b91efb..1b396675d909 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala @@ -98,7 +98,7 @@ object DataSourceManager extends Logging { private def normalize(name: String): String = name.toLowerCase(Locale.ROOT) private def initialStaticDataSourceBuilders: Map[String, UserDefinedPythonDataSource] = { -if (Utils.isTesting || shouldLoadPythonDataSources) this.synchronized { +if (shouldLoadPythonDataSources) this.synchronized { if (dataSourceBuilders.isEmpty) { val maybeResult = try { Some(UserDefinedPythonDataSource.lookupAllDataSourcesInPython()) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47005][PYTHON][DOCS] Refine docstring of `asc_nulls_first/asc_nulls_last/desc_nulls_first/desc_nulls_last`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 45956f72d864 [SPARK-47005][PYTHON][DOCS] Refine docstring of `asc_nulls_first/asc_nulls_last/desc_nulls_first/desc_nulls_last` 45956f72d864 is described below commit 45956f72d864701cd84635e9cac0a29592c08b1c Author: yangjie01 AuthorDate: Thu Feb 8 14:09:06 2024 +0800 [SPARK-47005][PYTHON][DOCS] Refine docstring of `asc_nulls_first/asc_nulls_last/desc_nulls_first/desc_nulls_last` ### What changes were proposed in this pull request? This pr refine docstring of `asc_nulls_first/asc_nulls_last/desc_nulls_first/desc_nulls_last` and add some new examples. ### Why are the changes needed? To improve PySpark documentation ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass Github Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #45066 from LuciferYang/sort-funcs. Authored-by: yangjie01 Signed-off-by: yangjie01 --- python/pyspark/sql/functions/builtin.py | 148 +++- 1 file changed, 128 insertions(+), 20 deletions(-) diff --git a/python/pyspark/sql/functions/builtin.py b/python/pyspark/sql/functions/builtin.py index 110006df4317..6320f9b922ee 100644 --- a/python/pyspark/sql/functions/builtin.py +++ b/python/pyspark/sql/functions/builtin.py @@ -2889,7 +2889,7 @@ def getbit(col: "ColumnOrName", pos: "ColumnOrName") -> Column: @_try_remote_functions def asc_nulls_first(col: "ColumnOrName") -> Column: """ -Returns a sort expression based on the ascending order of the given +Sort Function: Returns a sort expression based on the ascending order of the given column name, and null values return before non-null values. .. 
versionadded:: 2.4.0 @@ -2909,10 +2909,11 @@ def asc_nulls_first(col: "ColumnOrName") -> Column: Examples ->>> df1 = spark.createDataFrame([(1, "Bob"), -... (0, None), -... (2, "Alice")], ["age", "name"]) ->>> df1.sort(asc_nulls_first(df1.name)).show() +Example 1: Sorting a DataFrame with null values in ascending order + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame([(1, "Bob"), (0, None), (2, "Alice")], ["age", "name"]) +>>> df.sort(sf.asc_nulls_first(df.name)).show() +---+-+ |age| name| +---+-+ @@ -2921,6 +2922,32 @@ def asc_nulls_first(col: "ColumnOrName") -> Column: | 1| Bob| +---+-+ +Example 2: Sorting a DataFrame with multiple columns, null values in ascending order + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame( +... [(1, "Bob", None), (0, None, "Z"), (2, "Alice", "Y")], ["age", "name", "grade"]) +>>> df.sort(sf.asc_nulls_first(df.name), sf.asc_nulls_first(df.grade)).show() ++---+-+-+ +|age| name|grade| ++---+-+-+ +| 0| NULL|Z| +| 2|Alice|Y| +| 1| Bob| NULL| ++---+-+-+ + +Example 3: Sorting a DataFrame with null values in ascending order using column name string + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame([(1, "Bob"), (0, None), (2, "Alice")], ["age", "name"]) +>>> df.sort(sf.asc_nulls_first("name")).show() ++---+-+ +|age| name| ++---+-+ +| 0| NULL| +| 2|Alice| +| 1| Bob| ++---+-+ """ return ( col.asc_nulls_first() @@ -2932,7 +2959,7 @@ def asc_nulls_first(col: "ColumnOrName") -> Column: @_try_remote_functions def asc_nulls_last(col: "ColumnOrName") -> Column: """ -Returns a sort expression based on the ascending order of the given +Sort Function: Returns a sort expression based on the ascending order of the given column name, and null values appear after non-null values. .. versionadded:: 2.4.0 @@ -2952,10 +2979,11 @@ def asc_nulls_last(col: "ColumnOrName") -> Column: Examples ->>> df1 = spark.createDataFrame([(0, None), -... (1, "Bob"), -... 
(2, "Alice")], ["age", "name"]) ->>> df1.sort
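The null-ordering semantics documented above (`asc_nulls_first`: nulls sort before non-null values) have a direct JVM analogue in `Comparator.nullsFirst`. The sketch below is an illustration of those semantics in Java, not PySpark code; the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullsFirstSketch {
    static List<String> sortNullsFirst(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        // Nulls compare before every non-null value; non-null values use natural order.
        copy.sort(Comparator.nullsFirst(Comparator.<String>naturalOrder()));
        return copy;
    }

    public static void main(String[] args) {
        // Mirrors the docstring data: names "Bob", NULL, "Alice".
        System.out.println(sortNullsFirst(Arrays.asList("Bob", null, "Alice")));
        // prints [null, Alice, Bob]
    }
}
```

`desc_nulls_last` corresponds to `Comparator.nullsLast(...).reversed()`-style composition in the same way.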
(spark) branch master updated: [SPARK-46987][CONNECT] `ProtoUtils.abbreviate` avoid unnecessary `setField` operation
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a95aa7a7dda1 [SPARK-46987][CONNECT] `ProtoUtils.abbreviate` avoid unnecessary `setField` operation a95aa7a7dda1 is described below commit a95aa7a7dda1a5a2cfee69b3c132c524c0e01c7d Author: Ruifeng Zheng AuthorDate: Wed Feb 7 10:26:34 2024 +0800 [SPARK-46987][CONNECT] `ProtoUtils.abbreviate` avoid unnecessary `setField` operation ### What changes were proposed in this pull request? `ProtoUtils.abbreviate` avoid unnecessary `setField` operation ### Why are the changes needed? according to the [API reference](https://protobuf.dev/reference/java/api-docs/com/google/protobuf/Message.html#toBuilder--): > Message.Builder toBuilder() Constructs a builder initialized with the current message. Use this to derive a new message from the current one. the builder we used already has all the fields, so we only need to update the truncated fields. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #45045 from zhengruifeng/connect_redaction_nit. 
Authored-by: Ruifeng Zheng Signed-off-by: yangjie01 --- .../scala/org/apache/spark/sql/connect/common/ProtoUtils.scala| 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala index 44de2350b9fd..2f31b63acf87 100644 --- a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala +++ b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/ProtoUtils.scala @@ -43,8 +43,6 @@ private[connect] object ProtoUtils { val threshold = thresholds.getOrElse(STRING, MAX_STRING_SIZE) if (size > threshold) { builder.setField(field, createString(string.take(threshold), size)) -} else { - builder.setField(field, string) } case (field: FieldDescriptor, byteString: ByteString) @@ -57,8 +55,6 @@ private[connect] object ProtoUtils { byteString .substring(0, threshold) .concat(createTruncatedByteString(size))) -} else { - builder.setField(field, byteString) } case (field: FieldDescriptor, byteArray: Array[Byte]) @@ -71,8 +67,6 @@ private[connect] object ProtoUtils { ByteString .copyFrom(byteArray, 0, threshold) .concat(createTruncatedByteString(size))) -} else { - builder.setField(field, byteArray) } // TODO(SPARK-43117): should also support 1, repeated msg; 2, map @@ -80,7 +74,7 @@ private[connect] object ProtoUtils { if field.getJavaType == FieldDescriptor.JavaType.MESSAGE && msg != null => builder.setField(field, abbreviate(msg, thresholds)) - case (field: FieldDescriptor, value: Any) => builder.setField(field, value) + case _ => } builder.build() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
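The diff above removes the redundant `else { builder.setField(field, ...) }` branches because a builder obtained via `toBuilder()` already carries every field of the source message, so only the oversized fields need to be overwritten. A minimal hand-rolled sketch of that principle (not the real protobuf API — `Msg`, `Builder`, and the `[truncated]` marker are invented for illustration):

```java
public class AbbreviateSketch {
    static final class Msg {
        final String name;
        final String payload;
        Msg(String name, String payload) { this.name = name; this.payload = payload; }
        // Like protobuf's Message.toBuilder(): the builder starts with all fields populated.
        Builder toBuilder() { return new Builder(this); }
    }

    static final class Builder {
        String name;
        String payload;
        Builder(Msg m) { this.name = m.name; this.payload = m.payload; }
        Builder setPayload(String p) { this.payload = p; return this; }
        Msg build() { return new Msg(name, payload); }
    }

    static Msg abbreviate(Msg msg, int threshold) {
        Builder builder = msg.toBuilder();
        // Only oversized fields are touched; everything else is already in the builder,
        // so no else-branch re-setting untouched fields is needed.
        if (msg.payload.length() > threshold) {
            builder.setPayload(msg.payload.substring(0, threshold) + "[truncated]");
        }
        return builder.build();
    }

    public static void main(String[] args) {
        Msg out = abbreviate(new Msg("plan", "x".repeat(100)), 8);
        System.out.println(out.name + " " + out.payload);
        // prints: plan xxxxxxxx[truncated]
    }
}
```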
(spark) branch master updated: [SPARK-46895][CORE] Replace Timer with single thread scheduled executor
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5d5b3a54b7b5 [SPARK-46895][CORE] Replace Timer with single thread scheduled executor 5d5b3a54b7b5 is described below commit 5d5b3a54b7b5fb4308fe40da696ba805c72983fc Author: beliefer AuthorDate: Tue Feb 6 17:23:03 2024 +0800 [SPARK-46895][CORE] Replace Timer with single thread scheduled executor ### What changes were proposed in this pull request? This PR proposes to replace `Timer` with a single-thread scheduled executor. ### Why are the changes needed? The javadoc recommends `ScheduledThreadPoolExecutor` instead of `Timer`. ![Screenshot 2024-01-12 at 12:47:57 PM](https://github.com/apache/spark/assets/8486025/4fc5ed61-6bb9-4768-915a-ad919a067d04) This change is based on the following two points. **System time sensitivity** Timer scheduling is based on the absolute time of the operating system and is sensitive to changes in that time. Once the operating system's time changes, Timer scheduling is no longer precise. `ScheduledThreadPoolExecutor` scheduling is based on relative time and is not affected by changes in operating system time. **Whether exceptions are captured** Timer does not capture exceptions thrown by `TimerTask`s, and in addition, Timer is single threaded. Once a scheduled task encounters an exception, the entire thread terminates and other tasks that need to be scheduled are no longer executed. `ScheduledThreadPoolExecutor` implements scheduling on top of a thread pool; after a task throws an exception, other tasks still execute normally. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? GA tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44718 from beliefer/replace-timer-with-threadpool. 
Authored-by: beliefer Signed-off-by: yangjie01 --- .../main/scala/org/apache/spark/BarrierCoordinator.scala | 11 +++ .../main/scala/org/apache/spark/BarrierTaskContext.scala | 14 ++ .../org/apache/spark/scheduler/TaskSchedulerImpl.scala | 15 --- .../scala/org/apache/spark/ui/ConsoleProgressBar.scala | 11 --- .../main/scala/org/apache/spark/util/ThreadUtils.scala | 16 ++-- .../java/org/apache/spark/launcher/LauncherServer.java | 8 6 files changed, 47 insertions(+), 28 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala index 9bc7ade2e5ad..942242107e22 100644 --- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala +++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala @@ -17,8 +17,8 @@ package org.apache.spark -import java.util.{Timer, TimerTask} -import java.util.concurrent.ConcurrentHashMap +import java.util.TimerTask +import java.util.concurrent.{ConcurrentHashMap, TimeUnit} import java.util.function.Consumer import scala.collection.mutable.{ArrayBuffer, HashSet} @@ -26,6 +26,7 @@ import scala.collection.mutable.{ArrayBuffer, HashSet} import org.apache.spark.internal.Logging import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint} import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted} +import org.apache.spark.util.ThreadUtils /** * For each barrier stage attempt, only at most one barrier() call can be active at any time, thus @@ -51,7 +52,8 @@ private[spark] class BarrierCoordinator( // TODO SPARK-25030 Create a Timer() in the mainClass submitted to SparkSubmit makes it unable to // fetch result, we shall fix the issue. 
- private lazy val timer = new Timer("BarrierCoordinator barrier epoch increment timer") + private lazy val timer = ThreadUtils.newSingleThreadScheduledExecutor( +"BarrierCoordinator barrier epoch increment timer") // Listen to StageCompleted event, clear corresponding ContextBarrierState. private val listener = new SparkListener { @@ -77,6 +79,7 @@ private[spark] class BarrierCoordinator( states.forEachValue(1, clearStateConsumer) states.clear() listenerBus.removeListener(listener) + ThreadUtils.shutdown(timer) } finally { super.onStop() } @@ -168,7 +171,7 @@ private[spark] class BarrierCoordinator( // we may timeout for the sync. if (requesters.isEmpty) { initTimerTask(this) - timer.schedule(timerTask, timeoutInSecs * 1000) + timer.schedule(timerTask, timeoutInSecs, TimeUnit.SECONDS) } // Add the requester to array of RPCC
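The exception-isolation point made in the commit message can be demonstrated directly: with `java.util.Timer` a throwing task kills the shared timer thread, whereas a `ScheduledThreadPoolExecutor` records the failure in the task's `Future` and keeps scheduling. A minimal, self-contained sketch (not Spark's `BarrierCoordinator` code; names are invented):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerSketch {
    static boolean secondTaskRuns() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch ran = new CountDownLatch(1);
        // The first task throws; the executor captures the failure in the
        // returned ScheduledFuture instead of killing the worker thread.
        Runnable failing = () -> { throw new RuntimeException("boom"); };
        scheduler.schedule(failing, 10, TimeUnit.MILLISECONDS);
        // With java.util.Timer the shared thread would now be dead and this
        // task would never fire; here it still executes.
        scheduler.schedule(ran::countDown, 50, TimeUnit.MILLISECONDS);
        boolean ok;
        try {
            ok = ran.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            ok = false;
        }
        scheduler.shutdown();
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(secondTaskRuns()); // prints true
    }
}
```

The `schedule(timerTask, timeoutInSecs, TimeUnit.SECONDS)` call in the diff follows the same relative-delay API shown here.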
(spark) branch master updated (0154c059cddb -> fd476c1c855a)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 0154c059cddb [MINOR][DOCS] Remove Java 8/11 at `IgnoreUnrecognizedVMOptions` description add fd476c1c855a [SPARK-46969][SQL][TESTS] Recover `to_timestamp('366', 'DD')` test case of `datetime-parsing-invalid.sql` No new revisions were added by this update. Summary of changes: .../ansi/datetime-parsing-invalid.sql.out| 7 +++ .../analyzer-results/datetime-parsing-invalid.sql.out| 7 +++ .../sql-tests/inputs/datetime-parsing-invalid.sql| 3 +-- .../results/ansi/datetime-parsing-invalid.sql.out| 16 .../sql-tests/results/datetime-parsing-invalid.sql.out | 8 5 files changed, 39 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-46918][YARN] Replace self-defined variables with Hadoop ContainerExitStatus
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e32a8249c6dd [SPARK-46918][YARN] Replace self-defined variables with Hadoop ContainerExitStatus e32a8249c6dd is described below commit e32a8249c6ddb15e01d2307964f2978f4a10ad56 Author: Cheng Pan AuthorDate: Tue Jan 30 20:17:11 2024 +0800 [SPARK-46918][YARN] Replace self-defined variables with Hadoop ContainerExitStatus ### What changes were proposed in this pull request? Replace the Spark self-defined `VMEM_EXCEEDED_EXIT_CODE` and `PMEM_EXCEEDED_EXIT_CODE` with Hadoop defined `ContainerExitStatus.KILLED_EXCEEDED_VMEM` and `ContainerExitStatus.KILLED_EXCEEDED_PMEM` which were introduced in YARN-2091(since Hadoop 2.5.0) ### Why are the changes needed? Minor code clean-up ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44950 from pan3793/SPARK-46918. 
Authored-by: Cheng Pan Signed-off-by: yangjie01 --- .../main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala index 736eaa52b81c..7f0469937fef 100644 --- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala +++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala @@ -851,9 +851,6 @@ private[yarn] class YarnAllocator( onHostStr, completedContainer.getState, completedContainer.getExitStatus)) -// Hadoop 2.2.X added a ContainerExitStatus we should switch to use -// there are some exit status' we shouldn't necessarily count against us, but for -// now I think its ok as none of the containers are expected to exit. val exitStatus = completedContainer.getExitStatus val (exitCausedByApp, containerExitReason) = exitStatus match { case _ if shutdown => @@ -867,7 +864,7 @@ private[yarn] class YarnAllocator( // just as easily finish on any other executor. See SPARK-8167. (false, s"Container ${containerId}${onHostStr} was preempted.") // Should probably still count memory exceeded exit codes towards task failures - case VMEM_EXCEEDED_EXIT_CODE => + case ContainerExitStatus.KILLED_EXCEEDED_VMEM => val vmemExceededPattern = raw"$MEM_REGEX of $MEM_REGEX virtual memory used".r val diag = vmemExceededPattern.findFirstIn(completedContainer.getDiagnostics) .map(_.concat(".")).getOrElse("") @@ -876,7 +873,7 @@ private[yarn] class YarnAllocator( s"${YarnConfiguration.NM_VMEM_PMEM_RATIO} or disabling " + s"${YarnConfiguration.NM_VMEM_CHECK_ENABLED} because of YARN-4714." 
(true, message) - case PMEM_EXCEEDED_EXIT_CODE => + case ContainerExitStatus.KILLED_EXCEEDED_PMEM => val pmemExceededPattern = raw"$MEM_REGEX of $MEM_REGEX physical memory used".r val diag = pmemExceededPattern.findFirstIn(completedContainer.getDiagnostics) .map(_.concat(".")).getOrElse("") @@ -1025,8 +1022,6 @@ private[yarn] class YarnAllocator( private object YarnAllocator { val MEM_REGEX = "[0-9.]+ [KMG]B" - val VMEM_EXCEEDED_EXIT_CODE = -103 - val PMEM_EXCEEDED_EXIT_CODE = -104 val DECOMMISSIONING_NODES_CACHE_SIZE = 200 val NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS = Set( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
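Besides swapping the magic exit codes `-103`/`-104` for the named `ContainerExitStatus.KILLED_EXCEEDED_VMEM`/`KILLED_EXCEEDED_PMEM` constants, the surrounding code extracts the "N GB of M GB … memory used" fragment from container diagnostics with `MEM_REGEX`. A self-contained sketch of that extraction — the diagnostics string is a made-up example, not real YARN output:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MemDiagSketch {
    // Same pattern as YarnAllocator.MEM_REGEX in the diff above.
    static final String MEM_REGEX = "[0-9.]+ [KMG]B";

    static String vmemExceededDetail(String diagnostics) {
        // Matches e.g. "4.5 GB of 4.2 GB virtual memory used".
        Pattern p = Pattern.compile(MEM_REGEX + " of " + MEM_REGEX + " virtual memory used");
        Matcher m = p.matcher(diagnostics);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String diag = "Container killed: 4.5 GB of 4.2 GB virtual memory used; killing container.";
        System.out.println(vmemExceededDetail(diag));
        // prints: 4.5 GB of 4.2 GB virtual memory used
    }
}
```

The extracted fragment is what gets folded into the user-facing "consider boosting NM_VMEM_PMEM_RATIO" error message.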
(spark) branch master updated: [SPARK-46898][CONNECT] Simplify the protobuf function transformation in Planner
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 56633e697571 [SPARK-46898][CONNECT] Simplify the protobuf function transformation in Planner 56633e697571 is described below commit 56633e69757174da8a7dd8f4ea5298fd0a00e656 Author: Ruifeng Zheng AuthorDate: Mon Jan 29 13:55:59 2024 +0800 [SPARK-46898][CONNECT] Simplify the protobuf function transformation in Planner ### What changes were proposed in this pull request? Simplify the protobuf function transformation in Planner ### Why are the changes needed? make `transformUnregisteredFunction` simple and reuse existing helper function ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #44925 from zhengruifeng/connect_proto_simple. 
Authored-by: Ruifeng Zheng Signed-off-by: yangjie01 --- .../sql/connect/planner/SparkConnectPlanner.scala | 80 +++--- 1 file changed, 25 insertions(+), 55 deletions(-) diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index 3e59b2644755..977bff690bac 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -1710,53 +1710,6 @@ class SparkConnectPlanner( */ private def transformUnregisteredFunction( fun: proto.Expression.UnresolvedFunction): Option[Expression] = { -def extractArgsOfProtobufFunction( -functionName: String, -argumentsCount: Int, -children: collection.Seq[Expression]) -: (String, Option[Array[Byte]], Map[String, String]) = { - val messageClassName = children(1) match { -case Literal(s, StringType) if s != null => s.toString -case other => - throw InvalidPlanInput( -s"MessageClassName in $functionName should be a literal string, but got $other") - } - val (binaryFileDescSetOpt, options) = if (argumentsCount == 2) { -(None, Map.empty[String, String]) - } else if (argumentsCount == 3) { -children(2) match { - case Literal(b, BinaryType) if b != null => -(Some(b.asInstanceOf[Array[Byte]]), Map.empty[String, String]) - case UnresolvedFunction(Seq("map"), arguments, _, _, _, _) => -(None, ExprUtils.convertToMapData(CreateMap(arguments))) - case other => -throw InvalidPlanInput( - s"The valid type for the 3rd arg in $functionName " + -s"is binary or map, but got $other") -} - } else if (argumentsCount == 4) { -val fileDescSetOpt = children(2) match { - case Literal(b, BinaryType) if b != null => -Some(b.asInstanceOf[Array[Byte]]) - case other => -throw InvalidPlanInput( - s"DescFilePath in $functionName should be a literal 
binary, but got $other") -} -val map = children(3) match { - case UnresolvedFunction(Seq("map"), arguments, _, _, _, _) => -ExprUtils.convertToMapData(CreateMap(arguments)) - case other => -throw InvalidPlanInput( - s"Options in $functionName should be created by map, but got $other") -} -(fileDescSetOpt, map) - } else { -throw InvalidPlanInput( - s"$functionName requires 2 ~ 4 arguments, but got $argumentsCount ones!") - } - (messageClassName, binaryFileDescSetOpt, options) -} - fun.getFunctionName match { case "product" if fun.getArgumentsCount == 1 => Some( @@ -1979,17 +1932,13 @@ class SparkConnectPlanner( // Protobuf-specific functions case "from_protobuf" if Seq(2, 3, 4).contains(fun.getArgumentsCount) => val children = fun.getArgumentsList.asScala.map(transformExpression) -val (messageClassName, binaryFileDescSetOpt, options) = - extractArgsOfProtobufFunction("from_protobuf", fun.getArgumentsCount, children) -Some( - ProtobufDataToCatalyst(children.head, messageClassName, binaryFileDescSetOpt, options)) +val (msgName, desc, options) = extractProtobufArgs(children.toSeq) +Some(ProtobufDataToCatalyst(children(0), msgName, desc, options)) case "to_protobuf" if Seq
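The refactor above replaces the inline arity handling with a shared `extractProtobufArgs` helper: `from_protobuf`/`to_protobuf` take 2–4 arguments, where argument 2 is the message class name and the optional arguments 3/4 are a binary descriptor set and/or an options map. A simplified Java sketch of that arity-based extraction — the types and names here are invented stand-ins, not the real `SparkConnectPlanner` code:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ProtobufArgsSketch {
    record Args(String messageClassName, Optional<byte[]> descriptor, Map<String, String> options) {}

    @SuppressWarnings("unchecked")
    static Args extract(String fnName, List<Object> args) {
        if (args.size() < 2 || args.size() > 4) {
            throw new IllegalArgumentException(
                fnName + " requires 2 ~ 4 arguments, but got " + args.size());
        }
        String className = (String) args.get(1);     // arg 0 is the data column
        Optional<byte[]> descriptor = Optional.empty();
        Map<String, String> options = Map.of();
        // The optional trailing args are distinguished by type, not position.
        for (Object extra : args.subList(2, args.size())) {
            if (extra instanceof byte[] bytes) {
                descriptor = Optional.of(bytes);                 // binary file descriptor set
            } else if (extra instanceof Map<?, ?> m) {
                options = (Map<String, String>) m;               // options map
            }
        }
        return new Args(className, descriptor, options);
    }

    public static void main(String[] args) {
        Args a = extract("from_protobuf", List.of("data", "com.example.Event"));
        System.out.println(a.messageClassName()); // prints com.example.Event
    }
}
```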
(spark) branch master updated: [SPARK-46432][BUILD] Upgrade Netty to 4.1.106.Final
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 44b163d281b9 [SPARK-46432][BUILD] Upgrade Netty to 4.1.106.Final 44b163d281b9 is described below commit 44b163d281b9773cab9995e690ec3f4751c8be69 Author: panbingkun AuthorDate: Fri Jan 26 11:12:11 2024 +0800 [SPARK-46432][BUILD] Upgrade Netty to 4.1.106.Final ### What changes were proposed in this pull request? This PR upgrades `Netty` from `4.1.100.Final` to `4.1.106.Final`. ### Why are the changes needed? - To bring in the latest bug fixes: Automatically close Http2StreamChannel when Http2FrameStreamException reaches the end of the ChannelPipeline ([#13651](https://github.com/netty/netty/pull/13651)); Symbol not found: _netty_jni_util_JNI_OnLoad ([#13695](https://github.com/netty/netty/issues/13728)) - 4.1.106.Final release note: https://netty.io/news/2024/01/19/4-1-106-Final.html - 4.1.105.Final release note: https://netty.io/news/2024/01/16/4-1-105-Final.html - 4.1.104.Final release note: https://netty.io/news/2023/12/15/4-1-104-Final.html - 4.1.103.Final release note: https://netty.io/news/2023/12/13/4-1-103-Final.html - 4.1.101.Final release note: https://netty.io/news/2023/11/09/4-1-101-Final.html ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passes GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44384 from panbingkun/SPARK-46432. 
Lead-authored-by: panbingkun Co-authored-by: panbingkun Signed-off-by: yangjie01 --- common/network-yarn/pom.xml | 44 ++- dev/deps/spark-deps-hadoop-3-hive-2.3 | 37 +++-- pom.xml | 2 +- 3 files changed, 43 insertions(+), 40 deletions(-) diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index c809bdfbbc1d..3f2ae21eeb3b 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -173,27 +173,29 @@ unpack package - - - - - - - - - - - - - + + + + + + + + + + + + + + run diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 4ee0f5a41191..71f9ac8665b0 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -191,16 +191,16 @@ metrics-jmx/4.2.21//metrics-jmx-4.2.21.jar metrics-json/4.2.21//metrics-json-4.2.21.jar metrics-jvm/4.2.21//metrics-jvm-4.2.21.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.100.Final//netty-all-4.1.100.Final.jar -netty-buffer/4.1.100.Final//netty-buffer-4.1.100.Final.jar -netty-codec-http/4.1.100.Final//netty-codec-http-4.1.100.Final.jar -netty-codec-http2/4.1.100.Final//netty-codec-http2-4.1.100.Final.jar -netty-codec-socks/4.1.100.Final//netty-codec-socks-4.1.100.Final.jar -netty-codec/4.1.100.Final//netty-codec-4.1.100.Final.jar -netty-common/4.1.100.Final//netty-common-4.1.100.Final.jar -netty-handler-proxy/4.1.100.Final//netty-handler-proxy-4.1.100.Final.jar -netty-handler/4.1.100.Final//netty-handler-4.1.100.Final.jar -netty-resolver/4.1.100.Final//netty-resolver-4.1.100.Final.jar +netty-all/4.1.106.Final//netty-all-4.1.106.Final.jar +netty-buffer/4.1.106.Final//netty-buffer-4.1.106.Final.jar +netty-codec-http/4.1.106.Final//netty-codec-http-4.1.106.Final.jar +netty-codec-http2/4.1.106.Final//netty-codec-http2-4.1.106.Final.jar +netty-codec-socks/4.1.106.Final//netty-codec-socks-4.1.106.Final.jar +netty-codec/4.1.106.Final//netty-codec-4.1.106.Final.jar +netty-common/4.1.106.Final//netty-common-4.1.106.Final.jar 
+netty-handler-proxy/4.1.106.Final//netty-handler-proxy-4.1.106.Final.jar +netty-handler/4.1.106.Final//netty-handler-4.1.106.Final.jar +netty-resolver/4.1.106.Final//netty-resolver-4.1.106.Final.jar netty-tcnative-boringssl-static/2.0.61.Final//netty-tcnative-boringssl-static-2.0.61.Final.jar netty-tcnative-boringssl-static/2.0.61.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-aarch_64.jar netty-tcnative-boringssl-static/2.0.61.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar @@ -208,14 +208,15 @@ netty-tcnative-boringssl-static/2.0.61.Final/osx
(spark) branch master updated: [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d3a8b303c5c0 [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input d3a8b303c5c0 is described below

commit d3a8b303c5c056ec0863d20b33de6f1a5865dfae
Author: Ruifeng Zheng
AuthorDate: Thu Jan 25 11:11:18 2024 +0800

[SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input

### What changes were proposed in this pull request?
The `bloomFilter` function should throw `AnalysisException` for invalid input.

### Why are the changes needed?
1. `BloomFilterAggregate` itself validates the input and throws meaningful errors; we should not handle that invalid input and throw `InvalidPlanInput` in the planner.
2. To be consistent with the vanilla Scala API and other functions.

### Does this PR introduce _any_ user-facing change?
Yes, `InvalidPlanInput` -> `AnalysisException`.

### How was this patch tested?
Updated CI.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44821 from zhengruifeng/connect_bloom_filter_agg_error.
Authored-by: Ruifeng Zheng Signed-off-by: yangjie01 --- .../apache/spark/sql/DataFrameStatFunctions.scala | 28 -- .../spark/sql/ClientDataFrameStatSuite.scala | 20 .../sql/connect/planner/SparkConnectPlanner.scala | 25 +-- 3 files changed, 16 insertions(+), 57 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala index 4daa9fa88e66..4eef26da706f 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala @@ -22,7 +22,6 @@ import java.io.ByteArrayInputStream import scala.jdk.CollectionConverters._ -import org.apache.spark.SparkException import org.apache.spark.connect.proto.{Relation, StatSampleBy} import org.apache.spark.sql.DataFrameStatFunctions.approxQuantileResultEncoder import org.apache.spark.sql.catalyst.encoders.AgnosticEncoders.{ArrayEncoder, BinaryEncoder, PrimitiveDoubleEncoder} @@ -599,7 +598,7 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo * @since 3.5.0 */ def bloomFilter(colName: String, expectedNumItems: Long, fpp: Double): BloomFilter = { -buildBloomFilter(Column(colName), expectedNumItems, -1L, fpp) +bloomFilter(Column(colName), expectedNumItems, fpp) } /** @@ -614,7 +613,8 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo * @since 3.5.0 */ def bloomFilter(col: Column, expectedNumItems: Long, fpp: Double): BloomFilter = { -buildBloomFilter(col, expectedNumItems, -1L, fpp) +val numBits = BloomFilter.optimalNumOfBits(expectedNumItems, fpp) +bloomFilter(col, expectedNumItems, numBits) } /** @@ -629,7 +629,7 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo * @since 3.5.0 */ def bloomFilter(colName: String, expectedNumItems: Long, numBits: 
Long): BloomFilter = { -buildBloomFilter(Column(colName), expectedNumItems, numBits, Double.NaN) +bloomFilter(Column(colName), expectedNumItems, numBits) } /** @@ -644,25 +644,7 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo * @since 3.5.0 */ def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): BloomFilter = { -buildBloomFilter(col, expectedNumItems, numBits, Double.NaN) - } - - private def buildBloomFilter( - col: Column, - expectedNumItems: Long, - numBits: Long, - fpp: Double): BloomFilter = { -def numBitsValue: Long = if (!fpp.isNaN) { - BloomFilter.optimalNumOfBits(expectedNumItems, fpp) -} else { - numBits -} - -if (fpp <= 0d || fpp >= 1d) { - throw new SparkException("False positive probability must be within range (0.0, 1.0)") -} -val agg = Column.fn("bloom_filter_agg", col, lit(expectedNumItems), lit(numBitsValue)) - +val agg = Column.fn("bloom_filter_agg", col, lit(expectedNumItems), lit(numBits)) val ds = sparkSession.newDataset(BinaryEncoder) { builder => builder.getProjectBuilder .setInput(root) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientDataFrameStatSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientDataFrameStatSuite.scala index d0a89f67
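The refactor above collapses the `fpp` overloads of `bloomFilter` into the `numBits` overloads by converting the false-positive probability into a bit count via `BloomFilter.optimalNumOfBits`, and moves input validation into `BloomFilterAggregate`. A minimal Python sketch of that standard sizing formula, under the assumption that Spark's Java `BloomFilter.optimalNumOfBits` implements the textbook bound m = -n·ln(p)/(ln 2)² (the function name below mirrors the Java one; the exact rounding is an assumption):

```python
import math

def optimal_num_of_bits(expected_num_items: int, fpp: float) -> int:
    """Textbook Bloom filter sizing: m = -n * ln(p) / (ln 2)^2."""
    # After this change, Spark validates fpp in BloomFilterAggregate and
    # raises an AnalysisException; mirrored here as a plain ValueError.
    if not (0.0 < fpp < 1.0):
        raise ValueError("False positive probability must be within range (0.0, 1.0)")
    return int(-expected_num_items * math.log(fpp) / (math.log(2) ** 2))

# The fpp overload can then simply delegate to the numBits overload:
num_bits = optimal_num_of_bits(1_000_000, 0.03)
print(num_bits)
```

Note how this makes the delegation chain in the diff possible: every overload ultimately funnels into the single `(col, expectedNumItems, numBits)` variant.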
(spark) branch master updated: [SPARK-46826][INFRA] Reset `grpcio` installation version of `Python linter dependencies for branch-3.4/branch-3.5`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 74b6301f152d [SPARK-46826][INFRA] Reset `grpcio` installation version of `Python linter dependencies for branch-3.4/branch-3.5` 74b6301f152d is described below

commit 74b6301f152d438246756d665a3aa69e401e6273
Author: yangjie01
AuthorDate: Wed Jan 24 19:06:31 2024 +0800

[SPARK-46826][INFRA] Reset `grpcio` installation version of `Python linter dependencies for branch-3.4/branch-3.5`

### What changes were proposed in this pull request?
https://github.com/apache/spark/pull/43942 upgraded the `grpcio` version and simultaneously upgraded the `grpcio` version installed in `Install Python linter dependencies for branch-3.4` and `Install Python linter dependencies for branch-3.5` in `build_and_test.yml`. These two steps are used to install Python linter dependencies for `branch-3.4/branch-3.5` in daily tests, so they should use the same configuration as `branch-3.4/branch-3.5` for safety. This PR resets the version of grpc [...]
- branch-3.4 https://github.com/apache/spark/blob/e56bd97c04c184104046e51e6759e616c86683fa/.github/workflows/build_and_test.yml#L588-L595
- branch-3.5 https://github.com/apache/spark/blob/0956db6901bf03d2d948b23f00bcd6e74a0c251b/.github/workflows/build_and_test.yml#L637-L644

### Why are the changes needed?
The versions of the dependencies installed in `Install Python linter dependencies for branch-3.4` and `Install Python linter dependencies for branch-3.5` should be consistent with `branch-3.4/branch-3.5`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Monitor GA after merge.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44866 from LuciferYang/SPARK-46826.
Authored-by: yangjie01 Signed-off-by: yangjie01 --- .github/workflows/build_and_test.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 4038f63fb0dc..1d98727a4231 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -687,14 +687,14 @@ jobs: # SPARK-44554: Copy from https://github.com/apache/spark/blob/a05c27e85829fe742c1828507a1fd180cdc84b54/.github/workflows/build_and_test.yml#L571-L578 # Should delete this section after SPARK 3.4 EOL. python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 'mypy==0.920' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 'jinja2<3.0.0' 'black==22.6.0' -python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.59.3' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' +python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.48.1' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' - name: Install Python linter dependencies for branch-3.5 if: inputs.branch == 'branch-3.5' run: | # SPARK-45212: Copy from https://github.com/apache/spark/blob/555c8def51e5951c7bf5165a332795e9e330ec9d/.github/workflows/build_and_test.yml#L631-L638 # Should delete this section after SPARK 3.5 EOL. 
python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 'mypy==0.982' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 'jinja2<3.0.0' 'black==22.6.0' -python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.59.3' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' +python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.56.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' - name: Install Python dependencies for python linter and documentation generation if: inputs.branch != 'branch-3.4' && inputs.branch != 'branch-3.5' run: |
(spark) branch master updated: [SPARK-45593][BUILD][FOLLOWUP] Correct relocation connect guava dependency
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ea0436752fe5 [SPARK-45593][BUILD][FOLLOWUP] Correct relocation connect guava dependency ea0436752fe5 is described below

commit ea0436752fe5b2a1ca58fad3877f48905b3c2d8a
Author: yikaifei
AuthorDate: Wed Jan 24 19:03:00 2024 +0800

[SPARK-45593][BUILD][FOLLOWUP] Correct relocation connect guava dependency

### What changes were proposed in this pull request?
This PR aims to correct the relocation of the connect guava dependency and remove the duplicate connect-common from the SBT build jars.

**Item 1:** In https://github.com/apache/spark/pull/43436, we fixed the connect module's dependency on guava, but the guava dependency was relocated incorrectly.
- connect server and connect client jvm do not relocate the guava dependency, which risks conflicts;
- connect common's relocation does not take effect because it defines relocation rules that conflict with the parent pom (now we remove the guava dependency from connect-common, as it never uses this library);

**Item 2:** Remove the duplicate connect-common from the SBT build jars, as it is shaded into spark-connect. In fact, before this PR, in the output jars built using SBT, connect-common and common-server were the same thing, because they both hit the `jar.getName.contains("spark-connect")` condition.

### Why are the changes needed?
Bugfix

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44801 from Yikf/SPARK-45593-SBT.
Authored-by: yikaifei Signed-off-by: yangjie01 --- connector/connect/client/jvm/pom.xml | 22 +- connector/connect/common/pom.xml | 25 - connector/connect/server/pom.xml | 26 ++ project/SparkBuild.scala | 6 +- 4 files changed, 52 insertions(+), 27 deletions(-) diff --git a/connector/connect/client/jvm/pom.xml b/connector/connect/client/jvm/pom.xml index 9bedebf523a7..81ffb140226e 100644 --- a/connector/connect/client/jvm/pom.xml +++ b/connector/connect/client/jvm/pom.xml @@ -59,6 +59,18 @@ protobuf-java compile + + com.google.guava + guava + ${connect.guava.version} + compile + + + com.google.guava + failureaccess + ${guava.failureaccess.version} + compile + com.lihaoyi ammonite_${scala.version} @@ -105,6 +117,7 @@ true + com.google.guava:* com.google.android:* com.google.api.grpc:* com.google.code.findbugs:* @@ -124,6 +137,13 @@ + + com.google.common + ${spark.shade.packageName}.connect.guava + +com.google.common.** + + io.grpc ${spark.shade.packageName}.io.grpc @@ -135,7 +155,7 @@ com.google ${spark.shade.packageName}.com.google - + com.google.common.** diff --git a/connector/connect/common/pom.xml b/connector/connect/common/pom.xml index 336d83e04c15..b0f015246f4c 100644 --- a/connector/connect/common/pom.xml +++ b/connector/connect/common/pom.xml @@ -47,23 +47,6 @@ com.google.protobuf protobuf-java - - -com.google.guava -guava -${connect.guava.version} -compile - - -com.google.guava -failureaccess -${guava.failureaccess.version} -compile - io.grpc grpc-netty @@ -158,17 +141,9 @@ org.spark-project.spark:unused -com.google.guava:guava -com.google.guava:failureaccess org.apache.tomcat:annotations-api - - -com.google.common - ${spark.shade.packageName}.connect.guava - - diff --git a/connector/connect/server/pom.xml b/connector/connect/server/pom.xml index 82127f736ccb..bdea8a627000 100644 --- a/connector/connect/serve
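The relocation rules these two commits adjust can be pictured as prefix rewrites on class names. A hedged Python sketch (an illustration, not Maven's actual implementation) of how maven-shade-plugin-style `pattern` -> `shadedPattern` rules apply, and why the connect modules need their own `connect.guava` shaded package rather than sharing the parent pom's `org.sparkproject.guava` prefix while bundling a different guava version:

```python
def relocate(class_name: str, rules: list[tuple[str, str]]) -> str:
    # Apply the first matching (pattern, shadedPattern) rule, a simplified
    # stand-in for the shade plugin's <relocation> entries (plain prefixes
    # only; the real plugin also supports includes/excludes).
    for pattern, shaded in rules:
        if class_name.startswith(pattern):
            return shaded + class_name[len(pattern):]
    return class_name

# Connect modules get a dedicated shaded guava package...
connect_rules = [("com.google.common", "org.sparkproject.connect.guava")]
# ...while network-common keeps the parent pom's relocation.
parent_rules = [("com.google.common", "org.sparkproject.guava")]

print(relocate("com.google.common.cache.LocalCache", parent_rules))
print(relocate("com.google.common.cache.LocalCache", connect_rules))
```

With a single shared prefix, two jars would ship different guava versions under the identical `org.sparkproject.guava.*` names, which is exactly the classpath conflict the follow-up fixes.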
(spark) branch master updated: [SPARK-44495][INFRA][K8S] Use the latest minikube in K8s IT
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d114e262ba29 [SPARK-44495][INFRA][K8S] Use the latest minikube in K8s IT d114e262ba29 is described below

commit d114e262ba295995bb6a85035c1717cd353a526a
Author: Dongjoon Hyun
AuthorDate: Sun Jan 21 08:53:16 2024 +0800

[SPARK-44495][INFRA][K8S] Use the latest minikube in K8s IT

### What changes were proposed in this pull request?
This PR aims to recover the GitHub Action K8s IT to use the latest Minikube and to make sure that the Apache Spark K8s module is tested with all Minikube versions without any issues.

**BEFORE**
- Minikube: v1.30.1
- K8s: v1.26.3

**AFTER**
- Minikube: v1.32.0
- K8s: v1.28.3

### Why are the changes needed?
- Previously, it was pinned due to a failure.
- After this PR, we will always track the latest Minikube and K8s versions.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44813 from dongjoon-hyun/SPARK-44495.
Authored-by: Dongjoon Hyun Signed-off-by: yangjie01 --- .github/workflows/build_and_test.yml | 8 +++- .../deploy/k8s/integrationtest/KubernetesTestComponents.scala | 2 ++ .../apache/spark/deploy/k8s/integrationtest/PVTestsSuite.scala| 3 ++- .../spark/deploy/k8s/integrationtest/VolcanoTestsSuite.scala | 4 ++-- 4 files changed, 9 insertions(+), 8 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 99bb2b12e083..69636629ca9d 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -1063,9 +1063,7 @@ jobs: - name: start minikube run: | # See more in "Installation" https://minikube.sigs.k8s.io/docs/start/ - # curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 - # TODO(SPARK-44495): Resume to use the latest minikube for k8s-integration-tests. - curl -LO https://storage.googleapis.com/minikube/releases/v1.30.1/minikube-linux-amd64 + curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 sudo install minikube-linux-amd64 /usr/local/bin/minikube rm minikube-linux-amd64 # Github Action limit cpu:2, memory: 6947MB, limit to 2U6G for better resource statistic @@ -1074,7 +1072,7 @@ jobs: run: | kubectl get pods -A kubectl describe node - - name: Run Spark on K8S integration test (With driver cpu 0.5, executor cpu 0.2 limited) + - name: Run Spark on K8S integration test run: | # Prepare PV test PVC_TMP_DIR=$(mktemp -d) @@ -1084,7 +1082,7 @@ jobs: kubectl create clusterrolebinding serviceaccounts-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts || true kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.8.2/installer/volcano-development.yaml || true eval $(minikube docker-env) - build/sbt -Phadoop-3 -Psparkr -Pkubernetes -Pvolcano -Pkubernetes-integration-tests -Dspark.kubernetes.test.driverRequestCores=0.5 -Dspark.kubernetes.test.executorRequestCores=0.2 
-Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 -Dtest.exclude.tags=local "kubernetes-integration-tests/test" + build/sbt -Phadoop-3 -Psparkr -Pkubernetes -Pvolcano -Pkubernetes-integration-tests -Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 -Dtest.exclude.tags=local "kubernetes-integration-tests/test" - name: Upload Spark on K8S integration tests log files if: ${{ !success() }} uses: actions/upload-artifact@v4 diff --git a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala index 3762c31538dc..9581a78619dd 100644 --- a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala +++ b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala @@ -75,6 +75,8 @@ private[spark] class KubernetesTestComponents(val kubernetesClient: KubernetesCl .set(UI_ENABLED.key, "true") .set("spark.kubernetes.submis
(spark) branch master updated: [SPARK-46767][PYTHON][DOCS] Refine docstring of `abs/acos/acosh`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dceb8bdc72ef [SPARK-46767][PYTHON][DOCS] Refine docstring of `abs/acos/acosh` dceb8bdc72ef is described below commit dceb8bdc72ef24ffa1eb5c1820e6350207f042f5 Author: yangjie01 AuthorDate: Sat Jan 20 17:39:01 2024 +0800 [SPARK-46767][PYTHON][DOCS] Refine docstring of `abs/acos/acosh` ### What changes were proposed in this pull request? This pr refine docstring of `abs/acos/acosh` and add some new examples. ### Why are the changes needed? To improve PySpark documentation ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass Github Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #44794 from LuciferYang/math-functions-1. Authored-by: yangjie01 Signed-off-by: yangjie01 --- python/pyspark/sql/functions/builtin.py | 169 ++-- 1 file changed, 137 insertions(+), 32 deletions(-) diff --git a/python/pyspark/sql/functions/builtin.py b/python/pyspark/sql/functions/builtin.py index 1f6d86de28dc..62400accba10 100644 --- a/python/pyspark/sql/functions/builtin.py +++ b/python/pyspark/sql/functions/builtin.py @@ -734,7 +734,7 @@ def try_sum(col: "ColumnOrName") -> Column: @_try_remote_functions def abs(col: "ColumnOrName") -> Column: """ -Computes the absolute value. +Mathematical Function: Computes the absolute value of the given column or expression. .. versionadded:: 1.3.0 @@ -744,22 +744,66 @@ def abs(col: "ColumnOrName") -> Column: Parameters -- col : :class:`~pyspark.sql.Column` or str -target column to compute on. +The target column or expression to compute the absolute value on. Returns --- :class:`~pyspark.sql.Column` -column for computed results. +A new column object representing the absolute value of the input. 
Examples ->>> df = spark.range(1) ->>> df.select(abs(lit(-1))).show() -+---+ -|abs(-1)| -+---+ -| 1| -+---+ +Example 1: Compute the absolute value of a negative number + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame([(1, -1), (2, -2), (3, -3)], ["id", "value"]) +>>> df.select(sf.abs(df.value)).show() ++--+ +|abs(value)| ++--+ +| 1| +| 2| +| 3| ++--+ + +Example 2: Compute the absolute value of an expression + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame([(1, 1), (2, -2), (3, 3)], ["id", "value"]) +>>> df.select(sf.abs(df.id - df.value)).show() ++-+ +|abs((id - value))| ++-+ +|0| +|4| +|0| ++-+ + +Example 3: Compute the absolute value of a column with null values + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame([(1, None), (2, -2), (3, None)], ["id", "value"]) +>>> df.select(sf.abs(df.value)).show() ++--+ +|abs(value)| ++--+ +| NULL| +| 2| +| NULL| ++--+ + +Example 4: Compute the absolute value of a column with double values + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame([(1, -1.5), (2, -2.5), (3, -3.5)], ["id", "value"]) +>>> df.select(sf.abs(df.value)).show() ++--+ +|abs(value)| ++--+ +| 1.5| +| 2.5| +| 3.5| ++--+ """ return _invoke_function_over_columns("abs", col) @@ -1478,7 +1522,8 @@ def product(col: "ColumnOrName") -> Column: @_try_remote_functions def acos(col: "ColumnOrName") -> Column: """ -Computes inverse cosine of the input column. +Mathematical Function: Computes the inverse cosine (also known as arccosine) +of the given column or expression. .. versionadded:: 1.4.0 @@ -1488,23 +1533,54 @@ def acos(col: "ColumnOrName") -> Column: Parameters -- col : :class:`~pyspark.sql.Column` or str -target column to compute on. +The target column or expression to compute the inverse cosine on. Returns --- :class:`~pyspark.sql
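The refined docstrings lean on JVM semantics: Spark's `acos` follows `java.lang.Math.acos`, which returns NaN for inputs outside [-1, 1], whereas Python's own `math.acos` raises a `ValueError`. A small illustrative sketch of that difference (the wrapper name is ours, not a PySpark API):

```python
import math

def spark_like_acos(x: float) -> float:
    # Mirror java.lang.Math.acos: NaN outside the domain [-1, 1],
    # instead of the ValueError Python's math.acos would raise.
    if -1.0 <= x <= 1.0:
        return math.acos(x)
    return float("nan")

print(spark_like_acos(1.0))   # -> 0.0
print(spark_like_acos(-1.0))  # -> pi (3.141592653589793)
print(spark_like_acos(2.0))   # -> nan
```

This is why the PySpark examples can show NULL/NaN rows flowing through the column functions rather than aborting the query.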
(spark) branch master updated: [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 44d2c86e71fc [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error 44d2c86e71fc is described below

commit 44d2c86e71fca7044e6d5d9e9222eecff17c360c
Author: yikaifei
AuthorDate: Thu Jan 18 11:32:01 2024 +0800

[SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error

### What changes were proposed in this pull request?
Fix a build issue: when building a runnable distribution from master code, running spark-sql raises an error:

```
Caused by: java.lang.ClassNotFoundException: org.sparkproject.guava.util.concurrent.internal.InternalFutureFailureAccess
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
    ... 58 more
```

The problem is due to a guava dependency in the spark-connect-common POM that **conflicts** with the shade plugin of the parent pom.
- spark-connect-common contains the `connect.guava.version` version of guava, and it is relocated as `${spark.shade.packageName}.guava`, not `${spark.shade.packageName}.connect.guava`;
- spark-network-common also contains guava-related classes; it is also relocated as `${spark.shade.packageName}.guava`, but with guava version `${guava.version}`;
- As a result, different versions of org.sparkproject.guava.xx classes are present on the classpath.

In addition, after investigation, it seems that the spark-connect-common module does not depend on guava, so we can remove the guava dependency from spark-connect-common.

### Why are the changes needed?
A runnable distribution built from master code fails to run.
### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? I ran the build command output a runnable distribution package manually for the tests; Build command: ``` ./dev/make-distribution.sh --name ui --pip --tgz -Phive -Phive-thriftserver -Pyarn -Pconnect ``` Test result: https://github.com/apache/spark/assets/51110188/aefbc433-ea5c-4287-8ebd-367806043ac8";> I also checked the `org.sparkproject.guava.cache.LocalCache` from jars dir; Before: ``` ➜ jars grep -lr 'org.sparkproject.guava.cache.LocalCache' ./ .//spark-connect_2.13-4.0.0-SNAPSHOT.jar .//spark-network-common_2.13-4.0.0-SNAPSHOT.jar .//spark-connect-common_2.13-4.0.0-SNAPSHOT.jar ``` Now: ``` ➜ jars grep -lr 'org.sparkproject.guava.cache.LocalCache' ./ .//spark-network-common_2.13-4.0.0-SNAPSHOT.jar ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #43436 from Yikf/SPARK-45593. Authored-by: yikaifei Signed-off-by: yangjie01 --- assembly/pom.xml | 6 ++ connector/connect/client/jvm/pom.xml | 8 +--- connector/connect/common/pom.xml | 34 ++ connector/connect/server/pom.xml | 25 - 4 files changed, 41 insertions(+), 32 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index 77ff87c17f52..cd8c3fca9d23 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -149,6 +149,12 @@ org.apache.spark spark-connect_${scala.binary.version} ${project.version} + + + org.apache.spark + spark-connect-common_${scala.binary.version} + + org.apache.spark diff --git a/connector/connect/client/jvm/pom.xml b/connector/connect/client/jvm/pom.xml index 8057a33df178..9bedebf523a7 100644 --- a/connector/connect/client/jvm/pom.xml +++ b/connector/connect/client/jvm/pom.xml @@ -51,15 +51,9 @@ ${project.version} - - com.google.guava - guava - ${connect.guava.version} - compile - com.google.protobuf protobuf-java diff --git a/connector/connect/common/pom.xml b/connector/connect/common/pom.xml index a374646f8f29..336d83e04c15 100644 --- 
a/connector/connect/common/pom.xml +++ b/connector/connect/common/pom.xml @@ -47,6 +47,11 @@ com.google.protobuf protobuf-java + com.google.guava guava @@ -145,6 +150,35 @@ + +org.apache.maven.plugins