(spark) branch master updated: [SPARK-47611][SQL] Cleanup dead code in MySQLDialect.getCatalystType
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b540cc538614 [SPARK-47611][SQL] Cleanup dead code in MySQLDialect.getCatalystType

b540cc538614 is described below

commit b540cc538614c9808dc5e83a339ff52917fa0f37
Author: Kent Yao
AuthorDate: Wed Mar 27 01:45:22 2024 -0700

[SPARK-47611][SQL] Cleanup dead code in MySQLDialect.getCatalystType

### What changes were proposed in this pull request?

This PR removes an unnecessary case-match branch for Types.BIT in MySQLDialect.getCatalystType. The branch is a special case for the MariaDB Connector/J and can be handled by the defaults, since Types.BIT with size > 1 has already been matched and handled before it. Additionally, we add some new tests for this corner case and other MySQL/MariaDB quirks.

### Why are the changes needed?

Code refactoring and test improvement.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45734 from yaooqinn/SPARK-47611.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala   | 32 --
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala |  2 --
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala    |  2 --
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 705957631601..10049169caa1 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -64,10 +64,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     conn.prepareStatement("CREATE TABLE unsigned_numbers (" +
       "tiny TINYINT UNSIGNED, small SMALLINT UNSIGNED, med MEDIUMINT UNSIGNED," +
       "nor INT UNSIGNED, big BIGINT UNSIGNED, deci DECIMAL(40,20) UNSIGNED," +
-      "dbl DOUBLE UNSIGNED)").executeUpdate()
+      "dbl DOUBLE UNSIGNED, tiny1u TINYINT(1) UNSIGNED)").executeUpdate()

     conn.prepareStatement("INSERT INTO unsigned_numbers VALUES (255, 65535, 16777215, 4294967295," +
-      "9223372036854775808, 123456789012345.123456789012345, 1.0002)").executeUpdate()
+      "9223372036854775808, 123456789012345.123456789012345, 1.0002, 0)")
+      .executeUpdate()

     conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " +
       "yr YEAR)").executeUpdate()
@@ -150,6 +151,13 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(rows.get(4).isInstanceOf[BigDecimal])
     assert(rows.get(5).isInstanceOf[BigDecimal])
     assert(rows.get(6).isInstanceOf[Double])
+    // Unlike MySQL, MariaDB seems not to distinguish signed and unsigned tinyint(1).
+    val isMaria = jdbcUrl.indexOf("disableMariaDbDriver") == -1
+    if (isMaria) {
+      assert(rows.get(7).isInstanceOf[Boolean])
+    } else {
+      assert(rows.get(7).isInstanceOf[Short])
+    }
     assert(rows.getShort(0) === 255)
     assert(rows.getInt(1) === 65535)
     assert(rows.getInt(2) === 16777215)
@@ -157,6 +165,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(rows.getAs[BigDecimal](4).equals(new BigDecimal("9223372036854775808")))
     assert(rows.getAs[BigDecimal](5).equals(new BigDecimal("123456789012345.1234567890123450")))
     assert(rows.getDouble(6) === 1.0002)
+    if (isMaria) {
+      assert(rows.getBoolean(7) === false)
+    } else {
+      assert(rows.getShort(7) === 0)
+    }
   }

   test("Date types") {
@@ -260,6 +273,21 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
   test("SPARK-47478: all boolean synonyms read-write roundtrip") {
     val df = sqlContext.read.jdbc(jdbcUrl, "bools", new Properties)
     checkAnswer(df, Row(true, true, true))
+
+    val properties0 = new Properties()
+    properties0.setProperty("transformedBitIsBoolean", "false")
+    properties0.setProperty("tinyInt1isBit", "true")
+
+    checkAnswer(spark.read.jdbc(jdbcUrl, "bools", properties0), Row(true, true, true))
+
+    val properties1 = new Properties()
+    properties1.setProperty("transformedBitIsBoolean", "true")
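The properties exercised in the test above (`tinyInt1isBit`, `transformedBitIsBoolean`) are real MySQL Connector/J connection flags that decide whether TINYINT(1)/BIT(1) columns surface as booleans. A minimal read sketch, assuming a spark-shell session and placeholder connection details:

```scala
import java.util.Properties

// Hypothetical URL and table, for illustration only.
val url = "jdbc:mysql://localhost:3306/mysql?user=root&password=rootpass"

val props = new Properties()
props.setProperty("tinyInt1isBit", "true")            // report TINYINT(1) as BIT
props.setProperty("transformedBitIsBoolean", "true")  // report transformed BIT(1) as BOOLEAN

// With both flags set, a TINYINT(1) column should surface as BooleanType in Spark;
// the exact behavior depends on the Connector/J version.
spark.read.jdbc(url, "bools", props).printSchema()
```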
(spark) branch master updated (f9eb3f3c13bf -> a600c0ea3159)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from f9eb3f3c13bf [SPARK-46575][SQL][FOLLOWUP] Add back `HiveThriftServer2.startWithContext(SQLContext)` method for compatibility
  add a600c0ea3159 [SPARK-47491][CORE] Add `slf4j-api` jar to the class path first before the others of `jars` directory

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/spark/launcher/AbstractCommandBuilder.java | 6 ++
 .../test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala  | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)
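The launcher change itself is in Java (`AbstractCommandBuilder`); the idea is simply to order the classpath so the `slf4j-api` jar precedes everything else in the `jars` directory. A rough sketch of that ordering under assumed naming conventions (not the actual launcher code):

```scala
import java.io.File

// Partition the jars directory so any slf4j-api jar is prepended, ensuring the
// SLF4J facade classes are resolved from it before other jars that bundle them.
def orderedClassPath(jarsDir: File): Seq[File] = {
  val jars = Option(jarsDir.listFiles()).getOrElse(Array.empty[File]).toSeq
    .filter(_.getName.endsWith(".jar"))
  val (slf4jApi, others) = jars.partition(_.getName.startsWith("slf4j-api"))
  slf4jApi ++ others
}
```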
(spark) branch master updated (fd4b8e89f3a0 -> 7d87a94dd77f)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from fd4b8e89f3a0 [SPARK-47555][SQL] Show a warning message about SQLException if `JDBCTableCatalog.loadTable` fails
  add 7d87a94dd77f [MINOR][CORE] When failed to canceling the job group, add a warning log

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala | 3 +++
 1 file changed, 3 insertions(+)
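The `DAGScheduler` change only adds a warning when a cancellation request matches no active jobs. A condensed sketch of that behavior with illustrative names (not the actual scheduler internals):

```scala
// Warn instead of returning silently when the group has no active jobs.
def cancelJobGroup(groupId: String, activeGroups: Set[String], logWarning: String => Unit): Unit = {
  if (!activeGroups.contains(groupId)) {
    logWarning(s"Failed to cancel job group $groupId. Cannot find active jobs for it.")
  }
  // ... otherwise cancel the jobs that belong to the group ...
}
```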
(spark) branch master updated (e00eace41a63 -> fd4b8e89f3a0)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from e00eace41a63 [SPARK-47561][SQL] Fix analyzer rule order issues about Alias
  add fd4b8e89f3a0 [SPARK-47555][SQL] Show a warning message about SQLException if `JDBCTableCatalog.loadTable` fails

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
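A hedged sketch of the pattern the title describes, with an invented helper name (the real change lives inside `JDBCTableCatalog.loadTable`): log the underlying `SQLException` as a warning so the root cause stays visible even if a generic "table not found" error surfaces later.

```scala
import java.sql.SQLException

def loadTableLoudly[T](ident: String, logWarning: (String, Throwable) => Unit)(load: => T): T =
  try load catch {
    case e: SQLException =>
      // Surface the driver-level failure before rethrowing.
      logWarning(s"Failed to load table: $ident", e)
      throw e
  }
```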
(spark) branch master updated: [SPARK-47561][SQL] Fix analyzer rule order issues about Alias
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e00eace41a63 [SPARK-47561][SQL] Fix analyzer rule order issues about Alias

e00eace41a63 is described below

commit e00eace41a63996deb213b6e1816257ebca281e5
Author: Wenchen Fan
AuthorDate: Tue Mar 26 07:45:54 2024 -0700

[SPARK-47561][SQL] Fix analyzer rule order issues about Alias

### What changes were proposed in this pull request?

We found two analyzer rule execution order issues in our internal workloads:
- `CreateStruct.apply` creates `NamePlaceholder` for an unresolved `NamedExpression`. However, with a certain rule execution order, the `NamedExpression` may be removed (e.g., removing an unnecessary `Alias`) before `NamePlaceholder` is resolved, and then `NamePlaceholder` can't be resolved anymore.
- UNPIVOT uses `UnresolvedAlias` to wrap `UnresolvedAttribute`. There is a conflict about how to determine the final alias name. If `ResolveAliases` runs first, `UnresolvedAlias` is removed and the alias eventually becomes `b` for the nested column `a.b`. If `ResolveReferences` runs first, `a.b` is resolved first and `UnresolvedAlias` then determines the alias as `a.b`, not `b`.

This PR fixes the two issues:
- `CreateStruct.apply` should determine the field name immediately if the input is an `Alias`.
- The parser rule for UNPIVOT should follow how SELECT is parsed and return `UnresolvedAttribute` directly, without the `UnresolvedAlias` wrapper.

It's a bit risky to fix the order issue between `ResolveAliases` and `ResolveReferences`, as it can change the final query schema; we will save it for later.

### Why are the changes needed?

Fix unstable analyzer behavior under different rule execution orders.

### Does this PR introduce _any_ user-facing change?

Yes, some previously failing queries can run now. The issue for UNPIVOT only affects the error message.

### How was this patch tested?

Verified by our internal workloads. The repro query is quite complicated (it has to trigger a specific rule execution order), so we won't add tests for it. The fix is quite obvious.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45718 from cloud-fan/rule.

Authored-by: Wenchen Fan
Signed-off-by: Dongjoon Hyun
---
 .../catalyst/expressions/complexTypeCreator.scala |  1 +
 .../spark/sql/catalyst/parser/AstBuilder.scala    |  2 +-
 .../sql/catalyst/parser/UnpivotParserSuite.scala  | 39 ++
 3 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
index 332a49f78ab9..993684f2c1ed 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@@ -374,6 +374,7 @@ object CreateStruct {
       // alias name inside CreateNamedStruct.
       case (u: UnresolvedAttribute, _) => Seq(Literal(u.nameParts.last), u)
       case (u @ UnresolvedExtractValue(_, e: Literal), _) if e.dataType == StringType => Seq(e, u)
+      case (a: Alias, _) => Seq(Literal(a.name), a)
       case (e: NamedExpression, _) if e.resolved => Seq(Literal(e.name), e)
       case (e: NamedExpression, _) => Seq(NamePlaceholder, e)
       case (g @ GetStructField(_, _, Some(name)), _) => Seq(Literal(name), g)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 131eaa3d..170dcc37f0a5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -1346,7 +1346,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    * Create an Unpivot column.
    */
   override def visitUnpivotColumn(ctx: UnpivotColumnContext): NamedExpression = withOrigin(ctx) {
-    UnresolvedAlias(UnresolvedAttribute(visitMultipartIdentifier(ctx.multipartIdentifier)))
+    UnresolvedAttribute(visitMultipartIdentifier(ctx.multipartIdentifier))
   }

   /**
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala
index c680e08c1c83..3012ef6f1544 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala
+++ b/sql/catalys
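A condensed sketch of the naming rule the first fix establishes (simplified from `CreateStruct.apply`; the helper name is ours and the imports assume a spark-catalyst dependency):

```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, Expression, Literal, NamePlaceholder, NamedExpression}

// Pick the struct field name eagerly when it is already known; only fall back to
// NamePlaceholder for expressions the analyzer still has to resolve.
def fieldName(e: Expression): Expression = e match {
  case a: Alias => Literal(a.name)                          // the fix: name resolved eagerly
  case n: NamedExpression if n.resolved => Literal(n.name)
  case _ => NamePlaceholder                                 // filled in by a later rule
}
```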
(spark) branch master updated: [SPARK-47544][PYTHON] SparkSession builder method is incompatible with visual studio code intellisense
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8e20f8e3b440 [SPARK-47544][PYTHON] SparkSession builder method is incompatible with visual studio code intellisense

8e20f8e3b440 is described below

commit 8e20f8e3b4404b6d72ec47c546c94a040467c774
Author: Niranjan Jayakar
AuthorDate: Tue Mar 26 07:43:10 2024 -0700

[SPARK-47544][PYTHON] SparkSession builder method is incompatible with visual studio code intellisense

### What changes were proposed in this pull request?

VS Code's IntelliSense is unable to detect the methods and properties of `SparkSession.builder`. A video is worth a thousand words: [video](https://github.com/apache/spark/assets/16217941/e611e7e7-8760-4d9f-aa6c-9d4bd519d516).

Adjust the implementation for better compatibility with the IDE.

### Why are the changes needed?

Compatibility with IDE tooling.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Built the wheel file locally and tested on a local IDE. See [video](https://github.com/apache/spark/assets/16217941/429b06dd-44a7-4d13-a551-c2b72c326c1e). Confirmed the same works for PyCharm. Further confirmed that the Pydocs for these methods are unaffected.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45700 from nija-at/vscode-intellisense.

Authored-by: Niranjan Jayakar
Signed-off-by: Dongjoon Hyun
---
 python/pyspark/sql/connect/session.py |  7 +++
 python/pyspark/sql/session.py         | 29 ++---
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py
index f339fada0d11..2c08349a3300 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -236,10 +236,9 @@ class SparkSession:
     _client: SparkConnectClient

-    @classproperty
-    def builder(cls) -> Builder:
-        return cls.Builder()
-
+    # SPARK-47544: Explicitly declaring this as an identifier instead of a method.
+    # If changing, make sure this bug is not reintroduced.
+    builder: Builder = classproperty(lambda cls: cls.Builder())  # type: ignore
     builder.__doc__ = PySparkSession.builder.__doc__

     def __init__(self, connection: Union[str, DefaultChannelBuilder], userId: Optional[str] = None):
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 6c80b7f42da4..4a8a653fd466 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -499,12 +499,18 @@ class SparkSession(SparkConversionMixin):
                 os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
                 opts["spark.remote"] = url
-                return RemoteSparkSession.builder.config(map=opts).getOrCreate()
+                return cast(
+                    SparkSession,
+                    RemoteSparkSession.builder.config(map=opts).getOrCreate(),
+                )
             elif "SPARK_LOCAL_REMOTE" in os.environ:
                 url = "sc://localhost"
                 os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
                 opts["spark.remote"] = url
-                return RemoteSparkSession.builder.config(map=opts).getOrCreate()
+                return cast(
+                    SparkSession,
+                    RemoteSparkSession.builder.config(map=opts).getOrCreate(),
+                )
             else:
                 raise PySparkRuntimeError(
                     error_class="SESSION_ALREADY_EXIST",
@@ -560,14 +566,14 @@ class SparkSession(SparkConversionMixin):
                 # used in conjunction with Spark Connect mode.
                 os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
                 opts["spark.remote"] = url
-                return RemoteSparkSession.builder.config(map=opts).create()
+                return cast(SparkSession, RemoteSparkSession.builder.config(map=opts).create())
             else:
                 raise PySparkRuntimeError(
                     error_class="ONLY_SUPPORTED_WITH_SPARK_CONNECT",
                     message_parameters={"feature": "SparkSession.builder.create"},
                 )

-    # TODO(SPARK-38912): Replace @classproperty with @classmethod + @property once support for
+    # TOD
(spark) branch master updated: [SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 89104b93d324 [SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types

89104b93d324 is described below

commit 89104b93d324129ebe4dec3c666fe5e36a7586ad
Author: Kent Yao
AuthorDate: Tue Mar 26 07:37:39 2024 -0700

[SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types

### What changes were proposed in this pull request?

This PR adds tests for the MySQL ENUM/SET types.

In the MySQL/MariaDB Connector/J, the JDBC ResultSetMetaData API maps ENUM/SET types to `typeId:java.sql.Types.CHAR, typeName:'CHAR'`, which makes it impossible to distinguish them from a normal `CHAR(n)` type. When working with ENUM/SET, it's possible to encounter char padding issues. However, this can be resolved by setting the LEGACY_CHAR_VARCHAR_AS_STRING parameter to true.

### Why are the changes needed?

API auditing for the MySQL JDBC data source.

### Does this PR introduce _any_ user-facing change?

No, test only.

### How was this patch tested?

Added tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45713 from yaooqinn/SPARK-47557.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 09eb99c25227..705957631601 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -26,6 +26,7 @@ import scala.util.Using

 import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.tags.DockerTest

 /**
@@ -84,6 +85,10 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
       "f4 FLOAT UNSIGNED, f5 FLOAT(10) UNSIGNED, f6 FLOAT(53) UNSIGNED)").executeUpdate()
     conn.prepareStatement("INSERT INTO floats VALUES (1.23, 4.56, 7.89, 1.23, 4.56, 7.89)")
       .executeUpdate()
+
+    conn.prepareStatement("CREATE TABLE collections (" +
+      "a SET('cap', 'hat', 'helmet'), b ENUM('S', 'M', 'L', 'XL'))").executeUpdate()
+    conn.prepareStatement("INSERT INTO collections VALUES ('cap,hat', 'M')").executeUpdate()
   }

   def testConnection(): Unit = {
@@ -275,6 +280,16 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     val df = spark.read.jdbc(jdbcUrl, "floats", new Properties)
     checkAnswer(df, Row(1.23f, 4.56f, 7.89d, 1.23d, 4.56d, 7.89d))
   }
+
+  test("SPARK-47557: MySQL ENUM/SET types contains only java.sq.Types.CHAR information") {
+    val df = spark.read.jdbc(jdbcUrl, "collections", new Properties)
+    checkAnswer(df, Row("cap,hat ", "M "))
+    df.write.mode("append").jdbc(jdbcUrl, "collections", new Properties)
+    withSQLConf(SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key -> "true") {
+      checkAnswer(spark.read.jdbc(jdbcUrl, "collections", new Properties),
+        Row("cap,hat", "M") :: Row("cap,hat", "M") :: Nil)
+    }
+  }
 }
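Because the driver reports ENUM/SET as CHAR, values come back blank-padded to the column length by default. Two hypothetical user-side workarounds in a spark-shell session (URL and table are placeholders):

```scala
import java.util.Properties
import org.apache.spark.sql.functions.rtrim

val url = "jdbc:mysql://localhost:3306/mysql?user=root&password=rootpass"

// Option 1: strip the CHAR padding after reading.
val df = spark.read.jdbc(url, "collections", new Properties)
df.select(rtrim(df("a")), rtrim(df("b"))).show()

// Option 2: treat CHAR/VARCHAR as plain STRING for the whole session
// (the key behind SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING).
spark.conf.set("spark.sql.legacy.charVarcharAsString", "true")
```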
(spark) branch master updated: [SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ded8cdf8d945 [SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions

ded8cdf8d945 is described below

commit ded8cdf8d9459e0e5b73c01c8ee41ae54ccd7ac5
Author: Hyukjin Kwon
AuthorDate: Tue Mar 26 07:35:49 2024 -0700

[SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/45486 that addresses the https://github.com/apache/spark/pull/45486#discussion_r1538753052 review comment to recover the test coverage related to the number of partitions in Python Data Source.

### Why are the changes needed?

To restore the test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Unit test fixed; CI in this PR should verify it.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45720 from HyukjinKwon/SPARK-47367-folliwup.

Authored-by: Hyukjin Kwon
Signed-off-by: Dongjoon Hyun
---
 python/pyspark/sql/tests/test_python_datasource.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/python/pyspark/sql/tests/test_python_datasource.py b/python/pyspark/sql/tests/test_python_datasource.py
index f69e1dee1285..d028a210b007 100644
--- a/python/pyspark/sql/tests/test_python_datasource.py
+++ b/python/pyspark/sql/tests/test_python_datasource.py
@@ -28,6 +28,7 @@ from pyspark.sql.datasource import (
     WriterCommitMessage,
     CaseInsensitiveDict,
 )
+from pyspark.sql.functions import spark_partition_id
 from pyspark.sql.types import Row, StructType
 from pyspark.testing.sqlutils import (
     have_pyarrow,
@@ -236,10 +237,12 @@ class BasePythonDataSourceTestsMixin:
         self.spark.dataSource.register(InMemoryDataSource)
         df = self.spark.read.format("memory").load()
+        self.assertEqual(df.select(spark_partition_id()).distinct().count(), 3)
         assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1"), Row(x=2, y="2")])

         df = self.spark.read.format("memory").option("num_partitions", 2).load()
         assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1")])
+        self.assertEqual(df.select(spark_partition_id()).distinct().count(), 2)

     def _get_test_json_data_source(self):
         import json
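`spark_partition_id()` is what makes the partition count observable in the recovered test. The same check is available from Scala; a self-contained sketch:

```scala
import org.apache.spark.sql.functions.spark_partition_id

// 100 rows spread over 3 partitions: each partition is non-empty, so exactly
// 3 distinct partition ids show up in the result.
val df = spark.range(0, 100, 1, numPartitions = 3)
assert(df.select(spark_partition_id()).distinct().count() == 3)
```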
(spark) branch master updated: [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 87cae7bc7870 [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing

87cae7bc7870 is described below

commit 87cae7bc7870bacafc6afad99ba86a6efca2a464
Author: Dongjoon Hyun
AuthorDate: Mon Mar 25 16:06:03 2024 -0700

[SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing

### What changes were proposed in this pull request?

This PR aims to handle HADOOP-19097 from the Apache Spark side. We can remove this when Apache Hadoop `3.4.1` is released.
- https://github.com/apache/hadoop/pull/6601

### Why are the changes needed?

Apache Hadoop shows a warning about its default configuration. The default-value issue is fixed in Apache Hadoop 3.4.1.

```
24/03/25 14:46:21 WARN ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead
```

This change suppresses the Apache Hadoop default warning in a way consistent with future Hadoop releases.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs. Manually:

**BUILD**
```
$ dev/make-distribution.sh -Phadoop-cloud
```

**BEFORE**
```
scala> spark.range(10).write.mode("overwrite").orc("s3a://express-1-zone--***--x-s3/orc/")
...
24/03/25 15:50:46 WARN ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead
```

**AFTER**
```
scala> spark.range(10).write.mode("overwrite").orc("s3a://express-1-zone--***--x-s3/orc/")
...(ConfigurationHelper warning is gone)...
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45710 from dongjoon-hyun/SPARK-47552.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/SparkContext.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index d519617c4095..f8f0107ed139 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -417,6 +417,9 @@ class SparkContext(config: SparkConf) extends Logging {
     if (!_conf.contains("spark.app.name")) {
       throw new SparkException("An application name must be set in your configuration")
     }
+    // HADOOP-19097 Set fs.s3a.connection.establish.timeout to 30s
+    // We can remove this after Apache Hadoop 3.4.1 releases
+    conf.setIfMissing("spark.hadoop.fs.s3a.connection.establish.timeout", "30s")
     // This should be set as early as possible.
     SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)
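`setIfMissing` only supplies a default, so an explicit user setting still wins. A minimal standalone illustration with `SparkConf`:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf(loadDefaults = false)
  .set("spark.hadoop.fs.s3a.connection.establish.timeout", "60s") // user-provided

// The 30s value is applied only when the key is absent, so 60s survives.
conf.setIfMissing("spark.hadoop.fs.s3a.connection.establish.timeout", "30s")
assert(conf.get("spark.hadoop.fs.s3a.connection.establish.timeout") == "60s")
```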
(spark) branch master updated: [SPARK-47550][K8S][BUILD] Update `kubernetes-client` to 6.11.0
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7b9b3cb9d82c [SPARK-47550][K8S][BUILD] Update `kubernetes-client` to 6.11.0

7b9b3cb9d82c is described below

commit 7b9b3cb9d82cdf017d6cd57e0dee4239deb1727d
Author: Bjørn Jørgensen
AuthorDate: Mon Mar 25 13:40:15 2024 -0700

[SPARK-47550][K8S][BUILD] Update `kubernetes-client` to 6.11.0

### What changes were proposed in this pull request?

Update `kubernetes-client` from 6.10.0 to 6.11.0.

### Why are the changes needed?

[Release notes for 6.11.0](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.11.0)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45707 from bjornjorgensen/kub-client6.11.0.

Authored-by: Bjørn Jørgensen
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +--
 pom.xml                               |  2 +-
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 2ffef88dbe7e..0d3e24161fe6 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -155,31 +155,31 @@ jsr305/3.0.0//jsr305-3.0.0.jar
 jta/1.1//jta-1.1.jar
 jul-to-slf4j/2.0.12//jul-to-slf4j-2.0.12.jar
 kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
-kubernetes-client-api/6.10.0//kubernetes-client-api-6.10.0.jar
-kubernetes-client/6.10.0//kubernetes-client-6.10.0.jar
-kubernetes-httpclient-okhttp/6.10.0//kubernetes-httpclient-okhttp-6.10.0.jar
-kubernetes-model-admissionregistration/6.10.0//kubernetes-model-admissionregistration-6.10.0.jar
-kubernetes-model-apiextensions/6.10.0//kubernetes-model-apiextensions-6.10.0.jar
-kubernetes-model-apps/6.10.0//kubernetes-model-apps-6.10.0.jar
-kubernetes-model-autoscaling/6.10.0//kubernetes-model-autoscaling-6.10.0.jar
-kubernetes-model-batch/6.10.0//kubernetes-model-batch-6.10.0.jar
-kubernetes-model-certificates/6.10.0//kubernetes-model-certificates-6.10.0.jar
-kubernetes-model-common/6.10.0//kubernetes-model-common-6.10.0.jar
-kubernetes-model-coordination/6.10.0//kubernetes-model-coordination-6.10.0.jar
-kubernetes-model-core/6.10.0//kubernetes-model-core-6.10.0.jar
-kubernetes-model-discovery/6.10.0//kubernetes-model-discovery-6.10.0.jar
-kubernetes-model-events/6.10.0//kubernetes-model-events-6.10.0.jar
-kubernetes-model-extensions/6.10.0//kubernetes-model-extensions-6.10.0.jar
-kubernetes-model-flowcontrol/6.10.0//kubernetes-model-flowcontrol-6.10.0.jar
-kubernetes-model-gatewayapi/6.10.0//kubernetes-model-gatewayapi-6.10.0.jar
-kubernetes-model-metrics/6.10.0//kubernetes-model-metrics-6.10.0.jar
-kubernetes-model-networking/6.10.0//kubernetes-model-networking-6.10.0.jar
-kubernetes-model-node/6.10.0//kubernetes-model-node-6.10.0.jar
-kubernetes-model-policy/6.10.0//kubernetes-model-policy-6.10.0.jar
-kubernetes-model-rbac/6.10.0//kubernetes-model-rbac-6.10.0.jar
-kubernetes-model-resource/6.10.0//kubernetes-model-resource-6.10.0.jar
-kubernetes-model-scheduling/6.10.0//kubernetes-model-scheduling-6.10.0.jar
-kubernetes-model-storageclass/6.10.0//kubernetes-model-storageclass-6.10.0.jar
+kubernetes-client-api/6.11.0//kubernetes-client-api-6.11.0.jar
+kubernetes-client/6.11.0//kubernetes-client-6.11.0.jar
+kubernetes-httpclient-okhttp/6.11.0//kubernetes-httpclient-okhttp-6.11.0.jar
+kubernetes-model-admissionregistration/6.11.0//kubernetes-model-admissionregistration-6.11.0.jar
+kubernetes-model-apiextensions/6.11.0//kubernetes-model-apiextensions-6.11.0.jar
+kubernetes-model-apps/6.11.0//kubernetes-model-apps-6.11.0.jar
+kubernetes-model-autoscaling/6.11.0//kubernetes-model-autoscaling-6.11.0.jar
+kubernetes-model-batch/6.11.0//kubernetes-model-batch-6.11.0.jar
+kubernetes-model-certificates/6.11.0//kubernetes-model-certificates-6.11.0.jar
+kubernetes-model-common/6.11.0//kubernetes-model-common-6.11.0.jar
+kubernetes-model-coordination/6.11.0//kubernetes-model-coordination-6.11.0.jar
+kubernetes-model-core/6.11.0//kubernetes-model-core-6.11.0.jar
+kubernetes-model-discovery/6.11.0//kubernetes-model-discovery-6.11.0.jar
+kubernetes-model-events/6.11.0//kubernetes-model-events-6.11.0.jar
+kubernetes-model-extensions/6.11.0//kubernetes-model-extensions-6.11.0.jar
+kubernetes-model-flowcontrol/6.11.0//kubernetes-model-flowcontrol-6.11.0.jar
+kubernetes-model-gatewayapi/6.11.0//kubernetes-model-gatewayapi-6.11.0.jar
+kubernetes-model-metrics/6.11.0//kubernetes-model-metrics-6.11.0.jar
+kubernetes-model-networking/6.11.0//kubernetes-model-networking-6.11.0.jar
+kubernetes-model-node/6.11.0
(spark) branch master updated: [SPARK-47548][BUILD] Remove unused `commons-beanutils` dependency
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 90d506573409 [SPARK-47548][BUILD] Remove unused `commons-beanutils` dependency

90d506573409 is described below

commit 90d5065734095d840f51d1eea3449d969565b742
Author: Dongjoon Hyun
AuthorDate: Mon Mar 25 11:35:18 2024 -0700

[SPARK-47548][BUILD] Remove unused `commons-beanutils` dependency

### What changes were proposed in this pull request?

This PR aims to remove the unused `commons-beanutils` dependency from `pom.xml` and `LICENSE-binary`.

### Why are the changes needed?

#30701 removed `commons-beanutils` from the `hadoop-3` profile in Apache Spark 3.2.0.
- https://github.com/apache/spark/pull/30701

#40788 removed the `hadoop-2` profile in Apache Spark 3.5.0.
- https://github.com/apache/spark/pull/40788

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45705 from dongjoon-hyun/SPARK-47548.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 LICENSE-binary | 1 -
 pom.xml        | 5 -
 2 files changed, 6 deletions(-)

diff --git a/LICENSE-binary b/LICENSE-binary
index b9e7820c0baf..40271c9924bc 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -204,7 +204,6 @@ This project bundles some components that are also licensed under the Apache
 License Version 2.0:

-commons-beanutils:commons-beanutils
 org.apache.zookeeper:zookeeper
 oro:oro
 commons-configuration:commons-configuration
diff --git a/pom.xml b/pom.xml
index 8e68ad7346f8..de26e6ed33a8 100644
--- a/pom.xml
+++ b/pom.xml
@@ -646,11 +646,6 @@
 commons-collections4
 ${commons.collections4.version}

-
-commons-beanutils
-commons-beanutils
-1.9.4
-
 org.apache.ivy
 ivy
(spark) branch branch-3.4 updated (77fd58bf8d52 -> 5e7600eab833)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 77fd58bf8d52 [SPARK-47537][SQL][3.4] Fix error data type mapping on MySQL Connector/J
  add 5e7600eab833 [SPARK-47503][SQL][3.4] Make makeDotNode escape graph node name always

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/ui/SparkPlanGraph.scala |  3 +-
 .../sql/execution/ui/SparkPlanGraphSuite.scala  | 44 ++
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala
(spark) branch branch-3.4 updated: [SPARK-47537][SQL][3.4] Fix error data type mapping on MySQL Connector/J
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 77fd58bf8d52 [SPARK-47537][SQL][3.4] Fix error data type mapping on MySQL Connector/J

77fd58bf8d52 is described below

commit 77fd58bf8d5276906c674f3cdcec2715c8520d47
Author: Kent Yao
AuthorDate: Mon Mar 25 08:51:20 2024 -0700

[SPARK-47537][SQL][3.4] Fix error data type mapping on MySQL Connector/J

### What changes were proposed in this pull request?

This PR fixes:
- BIT(n>1) was wrongly mapped to boolean instead of long for the MySQL Connector/J, because we only had a case branch for the MariaDB Connector/J.
- The MySQL Docker integration tests were using the MariaDB Connector/J, not the MySQL Connector/J.

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45691 from yaooqinn/SPARK-47537-BB.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala    | 47 +-
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 28 -
 .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala   |  4 +-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala  |  5 +++
 4 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index bc202b1b8323..d0fcbfb7aaa8 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -43,7 +43,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     override val usesIpc = false
     override val jdbcPort: Int = 3306
     override def getJdbcUrl(ip: String, port: Int): String =
-      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
+      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass&disableMariaDbDriver"
   }

   override def dataPreparation(conn: Connection): Unit = {
@@ -74,6 +74,19 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
       "'jumps', 'over', 'the', 'lazy', 'dog')").executeUpdate()
   }

+  def testConnection(): Unit = {
+    val conn = getConnection()
+    try {
+      assert(conn.getClass.getName === "com.mysql.cj.jdbc.ConnectionImpl")
+    } finally {
+      conn.close()
+    }
+  }
+
+  test("SPARK-47537: ensure use the right jdbc driver") {
+    testConnection()
+  }
+
   test("Basic test") {
     val df = sqlContext.read.jdbc(jdbcUrl, "tbl", new Properties)
     val rows = df.collect()
@@ -193,3 +206,35 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(sql("select x, y from queryOption").collect.toSet == expectedResult)
   }
 }
+
+/**
+ * To run this test suite for a specific version (e.g., mysql:8.3.0):
+ * {{{
+ *   ENABLE_DOCKER_INTEGRATION_TESTS=1 MYSQL_DOCKER_IMAGE_NAME=mysql:8.3.0
+ *     ./build/sbt -Pdocker-integration-tests
+ *     "docker-integration-tests/testOnly *MySQLOverMariaConnectorIntegrationSuite"
+ * }}}
+ */
+@DockerTest
+class MySQLOverMariaConnectorIntegrationSuite extends MySQLIntegrationSuite {
+
+  override val db = new DatabaseOnDocker {
+    override val imageName = sys.env.getOrElse("MYSQL_DOCKER_IMAGE_NAME", "mysql:8.0.31")
+    override val env = Map(
+      "MYSQL_ROOT_PASSWORD" -> "rootpass"
+    )
+    override val usesIpc = false
+    override val jdbcPort: Int = 3306
+    override def getJdbcUrl(ip: String, port: Int): String =
+      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
+  }
+
+  override def testConnection(): Unit = {
+    val conn = getConnection()
+    try {
+      assert(conn.getClass.getName === "org.mariadb.jdbc.MariaDbConnection")
+    } finally {
+      conn.close()
+    }
+  }
+}
diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
index 072fdbb3f342..c4056c224f66 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test
(spark) branch branch-3.5 updated: [SPARK-47537][SQL][3.5] Fix error data type mapping on MySQL Connector/J
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 9ad7b75784da [SPARK-47537][SQL][3.5] Fix error data type mapping on MySQL Connector/J

9ad7b75784da is described below

commit 9ad7b75784daa48bf20dd00ae3288c718272fd69
Author: Kent Yao
AuthorDate: Mon Mar 25 08:50:00 2024 -0700

[SPARK-47537][SQL][3.5] Fix error data type mapping on MySQL Connector/J

### What changes were proposed in this pull request?

This PR fixes:
- BIT(n>1) was wrongly mapped to boolean instead of long for the MySQL Connector/J, because we only had a case branch for the MariaDB Connector/J.
- The MySQL Docker integration tests were using the MariaDB Connector/J, not the MySQL Connector/J.

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45690 from yaooqinn/SPARK-47537-B.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala    | 47 +-
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 38 +
 .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala   |  4 +-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala  |  5 +++
 4 files changed, 84 insertions(+), 10 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index dcf4225d522d..68d88fbc552a 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -43,7 +43,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     override val usesIpc = false
     override val jdbcPort: Int = 3306
     override def getJdbcUrl(ip: String, port: Int): String =
-      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
+      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass&disableMariaDbDriver"
   }

   override def dataPreparation(conn: Connection): Unit = {
@@ -75,6 +75,19 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
       "'jumps', 'over', 'the', 'lazy', 'dog', '{\"status\": \"merrily\"}')").executeUpdate()
   }

+  def testConnection(): Unit = {
+    val conn = getConnection()
+    try {
+      assert(conn.getClass.getName === "com.mysql.cj.jdbc.ConnectionImpl")
+    } finally {
+      conn.close()
+    }
+  }
+
+  test("SPARK-47537: ensure use the right jdbc driver") {
+    testConnection()
+  }
+
   test("Basic test") {
     val df = sqlContext.read.jdbc(jdbcUrl, "tbl", new Properties)
     val rows = df.collect()
@@ -200,3 +213,35 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(sql("select x, y from queryOption").collect.toSet == expectedResult)
   }
 }
+
+/**
+ * To run this test suite for a specific version (e.g., mysql:8.3.0):
+ * {{{
+ *   ENABLE_DOCKER_INTEGRATION_TESTS=1 MYSQL_DOCKER_IMAGE_NAME=mysql:8.3.0
+ *     ./build/sbt -Pdocker-integration-tests
+ *     "docker-integration-tests/testOnly *MySQLOverMariaConnectorIntegrationSuite"
+ * }}}
+ */
+@DockerTest
+class MySQLOverMariaConnectorIntegrationSuite extends MySQLIntegrationSuite {
+
+  override val db = new DatabaseOnDocker {
+    override val imageName = sys.env.getOrElse("MYSQL_DOCKER_IMAGE_NAME", "mysql:8.0.31")
+    override val env = Map(
+      "MYSQL_ROOT_PASSWORD" -> "rootpass"
+    )
+    override val usesIpc = false
+    override val jdbcPort: Int = 3306
+    override def getJdbcUrl(ip: String, port: Int): String =
+      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
+  }
+
+  override def testConnection(): Unit = {
+    val conn = getConnection()
+    try {
+      assert(conn.getClass.getName === "org.mariadb.jdbc.MariaDbConnection")
+    } finally {
+      conn.close()
+    }
+  }
+}
diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
index 719b858b87b6..f6f264804e7d 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegration
(spark) branch master updated: [SPARK-47538][BUILD] Remove `commons-logging` dependency
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e22ddcbd852c [SPARK-47538][BUILD] Remove `commons-logging` dependency

e22ddcbd852c is described below

commit e22ddcbd852c95375d39fd6074627e1b5a91c6e7
Author: Dongjoon Hyun
AuthorDate: Sun Mar 24 23:16:50 2024 -0700

[SPARK-47538][BUILD] Remove `commons-logging` dependency

### What changes were proposed in this pull request?

This PR aims to remove the `commons-logging` dependency in favor of `jcl-over-slf4j`.

### Why are the changes needed?

- https://slf4j.org/legacy.html#jclOverSLF4J

> To ease migration to SLF4J from JCL, SLF4J distributions include the jar file jcl-over-slf4j.jar. This jar file is intended as a drop-in replacement for JCL version 1.1.1. It implements the public API of JCL but using SLF4J underneath, hence the name "JCL over SLF4J."

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45687 from dongjoon-hyun/commons-logging.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 LICENSE-binary                        |  1 -
 NOTICE-binary                         |  8
 connector/kafka-0-10-sql/pom.xml      |  4
 connector/kafka-0-10/pom.xml          |  4
 dev/deps/spark-deps-hadoop-3-hive-2.3 |  1 -
 pom.xml                               | 24 ++--
 sql/hive-thriftserver/pom.xml         |  6 ++
 7 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/LICENSE-binary b/LICENSE-binary
index 2073d85246b6..b9e7820c0baf 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -325,7 +325,6 @@ commons-cli:commons-cli
 commons-dbcp:commons-dbcp
 commons-io:commons-io
 commons-lang:commons-lang
-commons-logging:commons-logging
 commons-net:commons-net
 commons-pool:commons-pool
 io.fabric8:zjsonpatch
diff --git a/NOTICE-binary b/NOTICE-binary
index ef2dba45055a..5f1c1c617c36 100644
--- a/NOTICE-binary
+++ b/NOTICE-binary
@@ -271,14 +271,6 @@ benchmarking framework, which can be obtained at:
   * HOMEPAGE:
     * https://github.com/google/caliper

-This product optionally depends on 'Apache Commons Logging', a logging
-framework, which can be obtained at:
-
-  * LICENSE:
-    * license/LICENSE.commons-logging.txt (Apache License 2.0)
-  * HOMEPAGE:
-    * http://commons.apache.org/logging/
-
 This product optionally depends on 'Apache Log4J', a logging framework, which
 can be obtained at:
diff --git a/connector/kafka-0-10-sql/pom.xml b/connector/kafka-0-10-sql/pom.xml
index e22a57354b89..35f58134f1a8 100644
--- a/connector/kafka-0-10-sql/pom.xml
+++ b/connector/kafka-0-10-sql/pom.xml
@@ -116,6 +116,10 @@
 com.fasterxml.jackson.core
 jackson-annotations
+
+commons-logging
+commons-logging
+
diff --git a/connector/kafka-0-10/pom.xml b/connector/kafka-0-10/pom.xml
index 6a71a5d446c8..1b26839a371c 100644
--- a/connector/kafka-0-10/pom.xml
+++ b/connector/kafka-0-10/pom.xml
@@ -92,6 +92,10 @@
 com.fasterxml.jackson.core
 jackson-annotations
+
+commons-logging
+commons-logging
+
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index d39c92f3fc37..2ffef88dbe7e 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -46,7 +46,6 @@ commons-dbcp/1.4//commons-dbcp-1.4.jar
 commons-io/2.15.1//commons-io-2.15.1.jar
 commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/3.14.0//commons-lang3-3.14.0.jar
-commons-logging/1.1.3//commons-logging-1.1.3.jar
 commons-math3/3.6.1//commons-math3-3.6.1.jar
 commons-pool/1.5.4//commons-pool-1.5.4.jar
 commons-text/1.11.0//commons-text-1.11.0.jar
diff --git a/pom.xml b/pom.xml
index 5a878a1c3319..8e68ad7346f8 100644
--- a/pom.xml
+++ b/pom.xml
@@ -651,12 +651,6 @@
 commons-beanutils
 1.9.4

-
-commons-logging
-commons-logging
-
-1.1.3
-
 org.apache.ivy
 ivy
@@ -671,6 +665,12 @@
 org.apache.httpcomponents
 httpclient
 ${commons.httpclient.version}
+
+commons-logging
+commons-logging
+
+
 org.apache.httpcomponents
@@ -721,6 +721,12 @@
 htmlunit3-driver
 ${htmlunit3-driver.version}
 test
+
+commons-logging
+commons-logging
+
+
@@ -1
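The practical effect of the swap is that code written against the JCL API keeps working unchanged, just routed through SLF4J. A small sketch, assuming `jcl-over-slf4j` and an SLF4J backend are on the classpath:

```scala
// The Jakarta Commons Logging facade; jcl-over-slf4j reimplements this API
// on top of SLF4J, so no caller needs to change.
import org.apache.commons.logging.LogFactory

val log = LogFactory.getLog("example")
log.info("this message is delivered through the SLF4J backend")
```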
(spark) branch master updated: [SPARK-47537][SQL] Fix error data type mapping on MySQL Connector/J
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 824d67052c75 [SPARK-47537][SQL] Fix error data type mapping on MySQL Connector/J

824d67052c75 is described below

commit 824d67052c7542594fe98405b5062593d90233ee
Author: Kent Yao
AuthorDate: Sun Mar 24 23:04:44 2024 -0700

[SPARK-47537][SQL] Fix error data type mapping on MySQL Connector/J

### What changes were proposed in this pull request?

This PR fixes:
- BIT(n>1) was wrongly mapped to boolean instead of long for the MySQL Connector/J, because we only had a case branch for the MariaDB Connector/J.
- The MySQL Docker integration tests were using the MariaDB Connector/J, not the MySQL Connector/J.

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45689 from yaooqinn/SPARK-47537.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/jdbc/MySQLDatabaseOnDocker.scala    |  4 +-
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala    | 45 ++
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 29 +++---
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala  |  5 +++
 4 files changed, 67 insertions(+), 16 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala
index 87b13a06d965..568eb5f10973 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala
@@ -26,6 +26,6 @@ class MySQLDatabaseOnDocker extends DatabaseOnDocker {
   override val jdbcPort: Int = 3306

   override def getJdbcUrl(ip: String, port: Int): String =
-    s"jdbc:mysql://$ip:$port/" +
-      s"mysql?user=root&password=rootpass&allowPublicKeyRetrieval=true&useSSL=false"
+    s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass&allowPublicKeyRetrieval=true" +
+      s"&useSSL=false&disableMariaDbDriver"
 }
diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 921e63acf7e1..09eb99c25227 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -22,9 +22,10 @@ import java.sql.{Connection, Date, Timestamp}
 import java.time.LocalDateTime
 import java.util.Properties

+import scala.util.Using
+
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
-import org.apache.spark.sql.types.{BooleanType, MetadataBuilder, StructType}
 import org.apache.spark.tags.DockerTest

 /**
@@ -85,6 +86,16 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
       "f4 FLOAT UNSIGNED, f5 FLOAT(10) UNSIGNED, f6 FLOAT(53) UNSIGNED)").executeUpdate()
     conn.prepareStatement("INSERT INTO floats VALUES (1.23, 4.56, 7.89, 1.23, 4.56, 7.89)")
       .executeUpdate()
   }

+  def testConnection(): Unit = {
+    Using.resource(getConnection()) { conn =>
+      assert(conn.getClass.getName === "com.mysql.cj.jdbc.ConnectionImpl")
+    }
+  }
+
+  test("SPARK-47537: ensure use the right jdbc driver") {
+    testConnection()
+  }
+
   test("Basic test") {
     val df = sqlContext.read.jdbc(jdbcUrl, "tbl", new Properties)
     val rows = df.collect()
@@ -246,13 +257,6 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     checkAnswer(df, Row(true, true, true))
     df.write.mode("append").jdbc(jdbcUrl, "bools", new Properties)
     checkAnswer(df, Seq(Row(true, true, true), Row(true, true, true)))
-    val mb = new MetadataBuilder()
-      .putBoolean("isTimestampNTZ", false)
-      .putLong("scale", 0)
-    assert(df.schema === new StructType()
-      .add("b1", BooleanType, nullable = true, mb.putBoolean("isSigned", true).build())
-      .add("b2", BooleanType, nullable = true, mb.putBoolean("isSigned", false).build())
-      .add("b3", BooleanType, nullable = true, mb.putBoolean("isSigned", true).build()))
   }

   test("SPARK-47515: Save TimestampNTZType as DATETIME in MySQL") {
@@ -272,3 +276,28 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     checkAnswe
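A condensed sketch of the mapping this commit fixes (simplified from `MySQLDialect.getCatalystType`; the helper shape is ours): BIT(1) can fall through to the default boolean mapping, but BIT(n>1) must become LongType regardless of which connector is in use.

```scala
import java.sql.Types
import org.apache.spark.sql.types.{DataType, LongType}

// Return Some(LongType) for BIT(n > 1); None falls through to the default
// JDBC-to-Catalyst mapping (which yields BooleanType for BIT(1)).
def mapBit(sqlType: Int, typeName: String, size: Int): Option[DataType] =
  if (sqlType == Types.BIT && typeName.equalsIgnoreCase("BIT") && size > 1) Some(LongType)
  else None
```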
(spark) branch master updated (b34a44f175aa -> d8d119a21e07)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from b34a44f175aa [SPARK-47535][INFRA] Update `publish_snapshot.yml` to publish twice per day
  add d8d119a21e07 [SPARK-47536][BUILD] Upgrade `jmock-junit5` to 2.13.1

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (99fb84b7ad27 -> b34a44f175aa)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 99fb84b7ad27 [SPARK-47534][SQL] Move `o.a.s.variant` to `o.a.s.types.variant`
  add b34a44f175aa [SPARK-47535][INFRA] Update `publish_snapshot.yml` to publish twice per day

No new revisions were added by this update.

Summary of changes:
 .github/workflows/publish_snapshot.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-47534][SQL] Move `o.a.s.variant` to `o.a.s.types.variant`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 99fb84b7ad27 [SPARK-47534][SQL] Move `o.a.s.variant` to `o.a.s.types.variant`

99fb84b7ad27 is described below

commit 99fb84b7ad276114b1ab97bd71704d4bdc163a40
Author: Dongjoon Hyun
AuthorDate: Sun Mar 24 17:18:12 2024 -0700

[SPARK-47534][SQL] Move `o.a.s.variant` to `o.a.s.types.variant`

### What changes were proposed in this pull request?

According to https://github.com/apache/spark/pull/45479#pullrequestreview-1946939461, this PR aims to rename the `variant` package and the corresponding test suite like the following.

```
- package org.apache.spark.variant;
+ package org.apache.spark.types.variant;
```

```
$ git diff master --stat
 common/variant/src/main/java/org/apache/spark/{ => types}/variant/Variant.java                                   | 2 +-
 common/variant/src/main/java/org/apache/spark/{ => types}/variant/VariantBuilder.java                            | 4 ++--
 common/variant/src/main/java/org/apache/spark/{ => types}/variant/VariantSizeLimitException.java                 | 2 +-
 common/variant/src/main/java/org/apache/spark/{ => types}/variant/VariantUtil.java                               | 2 +-
 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala           | 2 +-
 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/{ => variant}/VariantExpressionSuite.scala | 6 +++---
 6 files changed, 9 insertions(+), 9 deletions(-)
```

### Why are the changes needed?

To make it clear that the `variant` package is type-related.

### Does this PR introduce _any_ user-facing change?

No. This package is new in Apache Spark 4.0.0.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45685 from dongjoon-hyun/SPARK-47534.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 .../src/main/java/org/apache/spark/{ => types}/variant/Variant.java        | 2 +-
 .../java/org/apache/spark/{ => types}/variant/VariantBuilder.java          | 4 ++--
 .../apache/spark/{ => types}/variant/VariantSizeLimitException.java        | 2 +-
 .../main/java/org/apache/spark/{ => types}/variant/VariantUtil.java        | 2 +-
 .../spark/sql/catalyst/expressions/variant/variantExpressions.scala        | 2 +-
 .../catalyst/expressions/{ => variant}/VariantExpressionSuite.scala        | 6 +++---
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/common/variant/src/main/java/org/apache/spark/variant/Variant.java b/common/variant/src/main/java/org/apache/spark/types/variant/Variant.java
similarity index 96%
rename from common/variant/src/main/java/org/apache/spark/variant/Variant.java
rename to common/variant/src/main/java/org/apache/spark/types/variant/Variant.java
index 11c82d3fe1c0..e43b7ec8ac54 100644
--- a/common/variant/src/main/java/org/apache/spark/variant/Variant.java
+++ b/common/variant/src/main/java/org/apache/spark/types/variant/Variant.java
@@ -15,7 +15,7 @@
  * limitations under the License.
  */

-package org.apache.spark.variant;
+package org.apache.spark.types.variant;

 /**
  * This class is structurally equivalent to {@link org.apache.spark.unsafe.types.VariantVal}. We
diff --git a/common/variant/src/main/java/org/apache/spark/variant/VariantBuilder.java b/common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java
similarity index 99%
rename from common/variant/src/main/java/org/apache/spark/variant/VariantBuilder.java
rename to common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java
index 70227d67746d..21a12cbe9d71 100644
--- a/common/variant/src/main/java/org/apache/spark/variant/VariantBuilder.java
+++ b/common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java
@@ -15,7 +15,7 @@
  * limitations under the License.
  */

-package org.apache.spark.variant;
+package org.apache.spark.types.variant;

 import java.io.IOException;
 import java.math.BigDecimal;
@@ -32,7 +32,7 @@ import com.fasterxml.jackson.core.JsonParseException;
 import com.fasterxml.jackson.core.JsonToken;
 import com.fasterxml.jackson.core.exc.InputCoercionException;

-import static org.apache.spark.variant.VariantUtil.*;
+import static org.apache.spark.types.variant.VariantUtil.*;

 /**
  * Build variant value and metadata by parsing JSON values.
diff --git a/common/variant/src/main/java/org/apache/spark/variant/VariantSizeLimitException.java b/common/variant/src/main/java/org/apache/spark/types/variant/VariantSizeLimitException.java
similarity index 96%
rename from common/variant/
(spark) branch master updated: [SPARK-47533][BUILD] Migrate scalafmt dialect to `scala213`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 310fd6517887 [SPARK-47533][BUILD] Migrate scalafmt dialect to `scala213`

310fd6517887 is described below

commit 310fd65178876aa613448d86e64bee16ea3f888f
Author: panbingkun
AuthorDate: Sun Mar 24 16:41:19 2024 -0700

[SPARK-47533][BUILD] Migrate scalafmt dialect to `scala213`

### What changes were proposed in this pull request?

The PR aims to migrate the `scalafmt dialect` from `scala212` to `scala213`.

### Why are the changes needed?

In Spark `4.0.0`, Scala `2.12` is no longer supported. As part of the migration from `scala2.12` to `scala2.13`, this should be migrated as well.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Manual test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45683 from panbingkun/scalafmt_dialect.

Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
---
 dev/.scalafmt.conf | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/dev/.scalafmt.conf b/dev/.scalafmt.conf
index 6d1ab0243dc5..9a01136dfaf8 100644
--- a/dev/.scalafmt.conf
+++ b/dev/.scalafmt.conf
@@ -26,10 +26,5 @@ optIn = {
 danglingParentheses.preset = false
 docstrings.style = Asterisk
 maxColumn = 98
-runner.dialect = scala212
-fileOverride {
-  "glob:**/src/**/scala-2.13/**.scala" {
-    runner.dialect = scala213
-  }
-}
+runner.dialect = scala213
 version = 3.8.0
(spark) branch branch-3.5 updated: [SPARK-47503][SQL][3.5] Make makeDotNode escape graph node name always
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 2016db66e578 [SPARK-47503][SQL][3.5] Make makeDotNode escape graph node name always 2016db66e578 is described below commit 2016db66e578a0459672aad2a82a53a69601eaec Author: alexey AuthorDate: Sun Mar 24 15:17:10 2024 -0700 [SPARK-47503][SQL][3.5] Make makeDotNode escape graph node name always ### What changes were proposed in this pull request? This is a backport of https://github.com/apache/spark/pull/45640 To prevent corruption of dot file a node name should be escaped even if there is no metrics to display ### Why are the changes needed? This pr fixes a bug in spark history server which fails to display query for cached JDBC relation named in quotes. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Unit test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45684 from alex35736/branch-3.5. Authored-by: alexey Signed-off-by: Dongjoon Hyun --- .../spark/sql/execution/ui/SparkPlanGraph.scala| 3 +- .../sql/execution/ui/SparkPlanGraphSuite.scala | 44 ++ 2 files changed, 46 insertions(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala index 1504207d39cb..668cece53335 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala @@ -189,7 +189,8 @@ class SparkPlanGraphNode( } else { // SPARK-30684: when there is no metrics, add empty lines to increase the height of the node, // so that there won't be gaps between an edge and a small node. - s""" $id [labelType="html" label="$name"];""" + val escapedName = StringEscapeUtils.escapeJava(name) + s""" $id [labelType="html" label="$escapedName"];""" } } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala new file mode 100644 index ..88237cd09ac7 --- /dev/null +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.execution.ui + +import org.apache.spark.SparkFunSuite + +class SparkPlanGraphSuite extends SparkFunSuite { + test("SPARK-47503: name of a node should be escaped even if there is no metrics") { +val planGraphNode = new SparkPlanGraphNode( + id = 24, + name = "Scan JDBCRelation(\"test-schema\".tickets) [numPartitions=1]", + desc = "Scan JDBCRelation(\"test-schema\".tickets) [numPartitions=1] " + +"[ticket_no#0] PushedFilters: [], ReadSchema: struct", + metrics = List( +SQLPlanMetric( + name = "number of output rows", + accumulatorId = 75, + metricType = "sum" +), +SQLPlanMetric( + name = "JDBC query execution time", + accumulatorId = 35, + metricType = "nsTiming"))) +val dotNode = planGraphNode.makeDotNode(Map.empty[Long, String]) +val expectedDotNode = " 24 [labelType=\"html\" label=\"" + + "Scan JDBCRelation(\\\"test-schema\\\".tickets) [numPartitions=1]\"];" + +assertResult(expectedDotNode)(dotNode) + } +} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
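For context, a standalone sketch of why the escaping matters (it assumes only Apache Commons Text on the classpath; the node id and name below are illustrative, not taken from Spark internals):

```scala
import org.apache.commons.text.StringEscapeUtils

object DotEscapeDemo extends App {
  val name = """Scan JDBCRelation("test-schema".tickets) [numPartitions=1]"""

  // Unescaped, the embedded double quotes close the label attribute early and
  // the remainder of the name spills into the dot file, corrupting it.
  val broken = s""" 24 [labelType="html" label="$name"];"""

  // escapeJava rewrites " as \" (and escapes other metacharacters), so the
  // whole name survives as a single attribute value.
  val escaped = StringEscapeUtils.escapeJava(name)
  val fixed = s""" 24 [labelType="html" label="$escaped"];"""

  println(broken)
  println(fixed)
}
```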
(spark) branch master updated: [SPARK-47528][SQL] Add UserDefinedType support to DataTypeUtils.canWrite
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 767a52d5db35 [SPARK-47528][SQL] Add UserDefinedType support to DataTypeUtils.canWrite 767a52d5db35 is described below commit 767a52d5db354786d5ca07ddc4192d0eb8e8be80 Author: Liang-Chi Hsieh AuthorDate: Sun Mar 24 01:27:52 2024 -0700 [SPARK-47528][SQL] Add UserDefinedType support to DataTypeUtils.canWrite ### What changes were proposed in this pull request? This patch adds `UserDefinedType` handling to `DataTypeUtils.canWrite`. ### Why are the changes needed? Our customer hits an issue recently when they tries to save a DataFrame containing some UDTs as table (`saveAsTable`). The error looks like: ``` - Cannot write 'xxx': struct<...> is incompatible with struct<...> ``` The catalog strings between two sides are actually same which makes the customer confused. It is because `DataTypeUtils.canWrite` doesn't handle `UserDefinedType`. If the `UserDefinedType`'s underlying sql type is same as read side, `canWrite` should return true for two sides. ### Does this PR introduce _any_ user-facing change? Yes. Write side column with `UserDefinedType` can be written into read side column with same sql data type. ### How was this patch tested? Unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes #45678 from viirya/udt_dt_write. Authored-by: Liang-Chi Hsieh Signed-off-by: Dongjoon Hyun --- .../spark/sql/catalyst/types/DataTypeUtils.scala | 14 ++- .../types/DataTypeWriteCompatibilitySuite.scala| 134 + 2 files changed, 147 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala index 01fb86bf2957..cf8e903f03a3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala @@ -22,7 +22,7 @@ import org.apache.spark.sql.catalyst.util.TypeUtils.toSQLId import org.apache.spark.sql.errors.QueryCompilationErrors import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy.{ANSI, STRICT} -import org.apache.spark.sql.types.{ArrayType, AtomicType, DataType, Decimal, DecimalType, MapType, NullType, StructField, StructType} +import org.apache.spark.sql.types.{ArrayType, AtomicType, DataType, Decimal, DecimalType, MapType, NullType, StructField, StructType, UserDefinedType} import org.apache.spark.sql.types.DecimalType.{forType, fromDecimal} object DataTypeUtils { @@ -64,6 +64,8 @@ object DataTypeUtils { * - Both types are structs and have the same number of fields. The type and nullability of each * field from read/write is compatible. If byName is true, the name of each field from * read/write needs to be the same. + * - It is user defined type and its underlying sql type is same as the read type, or the read + * type is user defined type and its underlying sql type is same as the write type. * - Both types are atomic and the write type can be safely cast to the read type. 
* * Extra fields in write-side structs are not allowed to avoid accidentally writing data that @@ -180,6 +182,16 @@ object DataTypeUtils { case (w, r) if DataTypeUtils.sameType(w, r) && !w.isInstanceOf[NullType] => true + // If write-side data type is a user-defined type, check with its underlying data type. + case (w, r) if w.isInstanceOf[UserDefinedType[_]] && !r.isInstanceOf[UserDefinedType[_]] => +canWrite(tableName, w.asInstanceOf[UserDefinedType[_]].sqlType, r, byName, resolver, + context, storeAssignmentPolicy, addError) + + // If read-side data type is a user-defined type, check with its underlying data type. + case (w, r) if r.isInstanceOf[UserDefinedType[_]] && !w.isInstanceOf[UserDefinedType[_]] => +canWrite(tableName, w, r.asInstanceOf[UserDefinedType[_]].sqlType, byName, resolver, + context, storeAssignmentPolicy, addError) + case (w, r) => throw QueryCompilationErrors.incompatibleDataToTableCannotSafelyCastError( tableName, context, w.catalogString, r.catalogString diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeWriteCompatibilitySuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeWriteCompatibilitySuite.scala index 7aaa69a0a5dd..8c9196cc33ca
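To make the new rule concrete, here is a schematic UDT (not from the patch; the UDT developer API is assumed and serialization is elided) whose `sqlType` is a plain struct. After this change, `canWrite` unwraps the UDT on either side and compares the underlying struct instead of failing outright:

```scala
// Schematic sketch only; Point and PointUDT are hypothetical.
import org.apache.spark.sql.types._

case class Point(x: Double, y: Double)

class PointUDT extends UserDefinedType[Point] {
  // The underlying SQL type that canWrite now compares against the other side.
  override def sqlType: DataType = StructType(Seq(
    StructField("x", DoubleType, nullable = false),
    StructField("y", DoubleType, nullable = false)))

  override def serialize(p: Point): Any = ???        // elided for brevity
  override def deserialize(datum: Any): Point = ???  // elided for brevity
  override def userClass: Class[Point] = classOf[Point]
}
```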
(spark) branch master updated: [SPARK-47526][BUILD] Upgrade `netty` to 4.1.108.Final and `netty-tcnative` to 2.0.65.Final
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ba175e3f6ae9 [SPARK-47526][BUILD] Upgrade `netty` to 4.1.108.Final and `netty-tcnative` to 2.0.65.Final ba175e3f6ae9 is described below commit ba175e3f6ae92b584c1e38083a6792ab0bdb726d Author: panbingkun AuthorDate: Sun Mar 24 00:23:54 2024 -0700 [SPARK-47526][BUILD] Upgrade `netty` to 4.1.108.Final and `netty-tcnative` to 2.0.65.Final ### What changes were proposed in this pull request? The pr aims to upgrade: - `netty` from `4.1.107.Final` to `4.1.108.Final`. - `netty-tcnative` from `2.0.62.Final` to `2.0.65.Final`. ### Why are the changes needed? - `netty` 1.The release notes: https://netty.io/news/2024/03/21/4-1-108-Final.html 2.To bring some bug fixes, eg: Epoll: Fix possible Classloader deadlock caused by loading class via JNI ([#13879](https://github.com/netty/netty/issues/13879)) - `netty-tcnative` 2.0.62.Final VS 2.0.65.Final https://github.com/netty/netty-tcnative/compare/netty-tcnative-parent-2.0.62.Final...netty-tcnative-parent-2.0.65.Final ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45676 from panbingkun/SPARK-47526. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +-- pom.xml | 4 +-- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index eed298bc19c0..d39c92f3fc37 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -197,32 +197,32 @@ metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar metrics-json/4.2.25//metrics-json-4.2.25.jar metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.107.Final//netty-all-4.1.107.Final.jar -netty-buffer/4.1.107.Final//netty-buffer-4.1.107.Final.jar -netty-codec-http/4.1.107.Final//netty-codec-http-4.1.107.Final.jar -netty-codec-http2/4.1.107.Final//netty-codec-http2-4.1.107.Final.jar -netty-codec-socks/4.1.107.Final//netty-codec-socks-4.1.107.Final.jar -netty-codec/4.1.107.Final//netty-codec-4.1.107.Final.jar -netty-common/4.1.107.Final//netty-common-4.1.107.Final.jar -netty-handler-proxy/4.1.107.Final//netty-handler-proxy-4.1.107.Final.jar -netty-handler/4.1.107.Final//netty-handler-4.1.107.Final.jar -netty-resolver/4.1.107.Final//netty-resolver-4.1.107.Final.jar +netty-all/4.1.108.Final//netty-all-4.1.108.Final.jar +netty-buffer/4.1.108.Final//netty-buffer-4.1.108.Final.jar +netty-codec-http/4.1.108.Final//netty-codec-http-4.1.108.Final.jar +netty-codec-http2/4.1.108.Final//netty-codec-http2-4.1.108.Final.jar +netty-codec-socks/4.1.108.Final//netty-codec-socks-4.1.108.Final.jar +netty-codec/4.1.108.Final//netty-codec-4.1.108.Final.jar +netty-common/4.1.108.Final//netty-common-4.1.108.Final.jar +netty-handler-proxy/4.1.108.Final//netty-handler-proxy-4.1.108.Final.jar +netty-handler/4.1.108.Final//netty-handler-4.1.108.Final.jar +netty-resolver/4.1.108.Final//netty-resolver-4.1.108.Final.jar netty-tcnative-boringssl-static/2.0.61.Final//netty-tcnative-boringssl-static-2.0.61.Final.jar -netty-tcnative-boringssl-static/2.0.62.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.62.Final-linux-aarch_64.jar 
-netty-tcnative-boringssl-static/2.0.62.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.62.Final-linux-x86_64.jar -netty-tcnative-boringssl-static/2.0.62.Final/osx-aarch_64/netty-tcnative-boringssl-static-2.0.62.Final-osx-aarch_64.jar -netty-tcnative-boringssl-static/2.0.62.Final/osx-x86_64/netty-tcnative-boringssl-static-2.0.62.Final-osx-x86_64.jar -netty-tcnative-boringssl-static/2.0.62.Final/windows-x86_64/netty-tcnative-boringssl-static-2.0.62.Final-windows-x86_64.jar -netty-tcnative-classes/2.0.62.Final//netty-tcnative-classes-2.0.62.Final.jar -netty-transport-classes-epoll/4.1.107.Final//netty-transport-classes-epoll-4.1.107.Final.jar -netty-transport-classes-kqueue/4.1.107.Final//netty-transport-classes-kqueue-4.1.107.Final.jar -netty-transport-native-epoll/4.1.107.Final/linux-aarch_64/netty-transport-native-epoll-4.1.107.Final-linux-aarch_64.jar -netty-transport-native-epoll/4.1.107.Final/linux-riscv64/netty-transport-native-epoll-4.1.107.Final-linux-riscv64.jar -netty-transport-native-epoll/4.1.107.Final/linux-x86_64/netty-transport-native-epoll-4.1.107.Final-linux-x86_64.jar -netty-transport-native-kqueue/4.1.107.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.107.Final-osx-aarch_64.jar -netty-transport-native-kqueue/4.1.107.Final/osx
(spark) branch master updated: [SPARK-47503][SQL] Make `makeDotNode` escape graph node name always
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 15ea9de88b8d [SPARK-47503][SQL] Make `makeDotNode` escape graph node name always 15ea9de88b8d is described below commit 15ea9de88b8d8a87daa6abbc8e80c439e2d38e03 Author: Alexey AuthorDate: Sun Mar 24 00:16:05 2024 -0700 [SPARK-47503][SQL] Make `makeDotNode` escape graph node name always ### What changes were proposed in this pull request? To prevent corruption of dot file a node name should be escaped even if there is no metrics to display ### Why are the changes needed? This pr fixes a bug in spark history server which fails to display query for cached JDBC relation named in quotes. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Unit test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45640 from alex35736/SPARK-47503. Lead-authored-by: Alexey Co-authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../spark/sql/execution/ui/SparkPlanGraph.scala| 2 +- .../sql/execution/ui/SparkPlanGraphSuite.scala | 46 ++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala index 11ba3cd05e26..f94d7dc7ab4c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala @@ -189,7 +189,7 @@ class SparkPlanGraphNode( } else { // SPARK-30684: when there is no metrics, add empty lines to increase the height of the node, // so that there won't be gaps between an edge and a small node. - s"$name" + s"${StringEscapeUtils.escapeJava(name)}" } s""" $id [id="$nodeId" labelType="html" label="$labelStr" tooltip="$tooltip"];""" diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala new file mode 100644 index ..975dbc1a1d8d --- /dev/null +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.execution.ui + +import org.apache.spark.SparkFunSuite + +class SparkPlanGraphSuite extends SparkFunSuite { + test("SPARK-47503: name of a node should be escaped even if there is no metrics") { +val planGraphNode = new SparkPlanGraphNode( + id = 24, + name = "Scan JDBCRelation(\"test-schema\".tickets) [numPartitions=1]", + desc = "Scan JDBCRelation(\"test-schema\".tickets) [numPartitions=1] " + +"[ticket_no#0] PushedFilters: [], ReadSchema: struct", + metrics = List( +SQLPlanMetric( + name = "number of output rows", + accumulatorId = 75, + metricType = "sum" +), +SQLPlanMetric( + name = "JDBC query execution time", + accumulatorId = 35, + metricType = "nsTiming"))) +val dotNode = planGraphNode.makeDotNode(Map.empty[Long, String]) +val expectedDotNode = " 24 [id=\"node24\" labelType=\"html\" label=\"" + + "Scan JDBCRelation(\\\"test-schema\\\".tickets) [numPartitions=1]\" " + + "tooltip=\"Scan JDBCRelation(\\\"test-schema\\\".tickets) [numPartitions=1] [ticket_no#0] " + + "PushedFilters: [], ReadSchema: struct\"];" + +assertResult(expectedDotNode)(dotNode) + } +} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47497][SQL] Make `to_csv` support the output of `array/struct/map/binary` as pretty strings
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 18bb6a3aea82 [SPARK-47497][SQL] Make `to_csv` support the output of `array/struct/map/binary` as pretty strings 18bb6a3aea82 is described below commit 18bb6a3aea826c2e279457ab72ce6656646cda69 Author: panbingkun AuthorDate: Sun Mar 24 00:13:24 2024 -0700 [SPARK-47497][SQL] Make `to_csv` support the output of `array/struct/map/binary` as pretty strings ### What changes were proposed in this pull request? The pr aims make `to_csv` - support the output of `array/struct/map/binary` as `pretty strings`. - not support `variant`. ### Why are the changes needed? This PR was generated from follow-up comment suggestions https://github.com/apache/spark/pull/44665#issuecomment-2011239475, https://github.com/apache/spark/assets/15246973/04dd1497-da42-4b03-b21d-b041ead86f87";> ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? - Update existed UT. - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45657 from panbingkun/SPARK-47497. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/functions/builtin.py| 12 +- .../sql/catalyst/csv/UnivocityGenerator.scala | 126 ++-- .../sql/catalyst/expressions/csvExpressions.scala | 35 - .../org/apache/spark/sql/CsvFunctionsSuite.scala | 165 + 4 files changed, 308 insertions(+), 30 deletions(-) diff --git a/python/pyspark/sql/functions/builtin.py b/python/pyspark/sql/functions/builtin.py index a31465a77873..99a2375965c2 100644 --- a/python/pyspark/sql/functions/builtin.py +++ b/python/pyspark/sql/functions/builtin.py @@ -15591,12 +15591,12 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col >>> from pyspark.sql import Row, functions as sf >>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))] >>> df = spark.createDataFrame(data, ("key", "value")) ->>> df.select(sf.to_csv(df.value)).show(truncate=False) # doctest: +SKIP -+---+ -|to_csv(value) | -+---+ -|2,Alice,"[100,200,300]"| -+---+ +>>> df.select(sf.to_csv(df.value)).show(truncate=False) ++-+ +|to_csv(value)| ++-+ +|2,Alice,"[100, 200, 300]"| ++-+ Example 3: Converting a StructType with null values to a CSV string diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala index b61652f4b523..f10a53bde5dd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala @@ -22,7 +22,8 @@ import java.io.Writer import com.univocity.parsers.csv.CsvWriter import org.apache.spark.sql.catalyst.InternalRow -import org.apache.spark.sql.catalyst.util.{DateFormatter, DateTimeUtils, IntervalStringStyles, IntervalUtils, TimestampFormatter} +import org.apache.spark.sql.catalyst.expressions.SpecializedGetters +import org.apache.spark.sql.catalyst.util.{DateFormatter, DateTimeUtils, IntervalStringStyles, IntervalUtils, SparkStringUtils, TimestampFormatter} import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ @@ -36,9 +37,9 @@ class UnivocityGenerator( 
writerSettings.setHeaders(schema.fieldNames: _*) private val gen = new CsvWriter(writer, writerSettings) - // A `ValueConverter` is responsible for converting a value of an `InternalRow` to `String`. - // When the value is null, this converter should not be called. - private type ValueConverter = (InternalRow, Int) => String + // A `ValueConverter` is responsible for converting a value of an `SpecializedGetters` + // to `String`. When the value is null, this converter should not be called. + private type ValueConverter = (SpecializedGetters, Int) => String // `ValueConverter`s for all values in the fields of the schema private val valueConverters: Array[ValueConverter] = @@ -64,33 +65,126 @@ class UnivocityGenerator( private val nullAsQuotedEmptyString = SQLConf.get.getConf(SQLConf.LEGACY_NULL_VALUE_WRITTEN_AS_QUOTED_EMPTY_STRING_CSV) - @scala.annotation.tailrec private
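A quick way to observe the new rendering from Scala (a sketch, assuming an active `SparkSession` named `spark`; the values mirror the updated doctest):

```scala
// Nested array/struct/map/binary values now render as pretty strings.
val out = spark.sql(
  """SELECT to_csv(named_struct(
    |  'age', 2, 'name', 'Alice', 'scores', array(100, 200, 300)))""".stripMargin)
out.show(truncate = false)
// Expected, per the updated doctest: 2,Alice,"[100, 200, 300]"
```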
(spark) branch master updated: [SPARK-47531][BUILD] Upgrade `Arrow` to 15.0.2
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 063638b4d5db [SPARK-47531][BUILD] Upgrade `Arrow` to 15.0.2 063638b4d5db is described below commit 063638b4d5dbc6732226cc98b8557db7d3578755 Author: panbingkun AuthorDate: Sun Mar 24 00:11:35 2024 -0700 [SPARK-47531][BUILD] Upgrade `Arrow` to 15.0.2 ### What changes were proposed in this pull request? The pr aims to upgrade `arrow-memory-netty` from `15.0.0` to `15.0.2`. ### Why are the changes needed? The release notes: https://arrow.apache.org/release/15.0.2.html https://arrow.apache.org/release/15.0.1.html ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45682 from panbingkun/SPARK-47531. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 pom.xml | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 903c7a245af3..eed298bc19c0 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -16,10 +16,10 @@ antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar aopalliance-repackaged/3.0.3//aopalliance-repackaged-3.0.3.jar arpack/3.0.3//arpack-3.0.3.jar arpack_combined_all/0.1//arpack_combined_all-0.1.jar -arrow-format/15.0.0//arrow-format-15.0.0.jar -arrow-memory-core/15.0.0//arrow-memory-core-15.0.0.jar -arrow-memory-netty/15.0.0//arrow-memory-netty-15.0.0.jar -arrow-vector/15.0.0//arrow-vector-15.0.0.jar +arrow-format/15.0.2//arrow-format-15.0.2.jar +arrow-memory-core/15.0.2//arrow-memory-core-15.0.2.jar +arrow-memory-netty/15.0.2//arrow-memory-netty-15.0.2.jar +arrow-vector/15.0.2//arrow-vector-15.0.2.jar audience-annotations/0.12.0//audience-annotations-0.12.0.jar avro-ipc/1.11.3//avro-ipc-1.11.3.jar avro-mapred/1.11.3//avro-mapred-1.11.3.jar diff --git a/pom.xml b/pom.xml index 637aa50f0314..83dbe5c23789 100644 --- a/pom.xml +++ b/pom.xml @@ -225,7 +225,7 @@ If you are changing Arrow version specification, please check ./python/pyspark/sql/pandas/utils.py, and ./python/setup.py too. --> -15.0.0 +15.0.2 3.0.0-M1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (9d6b9f7305f2 -> 11f5d3fa10b3)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9d6b9f7305f2 [SPARK-47529][DOCS] Use hadoop 3.4.0 in some docs add 11f5d3fa10b3 [SPARK-47530][BUILD][TESTS] Add `bcpkix-jdk18on` test dependencies to `hive` module for Hadoop 3.4.0 No new revisions were added by this update. Summary of changes: sql/hive/pom.xml | 5 + 1 file changed, 5 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47529][DOCS] Use hadoop 3.4.0 in some docs
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9d6b9f7305f2 [SPARK-47529][DOCS] Use hadoop 3.4.0 in some docs 9d6b9f7305f2 is described below commit 9d6b9f7305f208a8b66f5f390e063db86b5678cd Author: panbingkun AuthorDate: Sat Mar 23 18:27:29 2024 -0700 [SPARK-47529][DOCS] Use hadoop 3.4.0 in some docs ### What changes were proposed in this pull request? This PR aims to update `Hadoop` dependency in some docs. ### Why are the changes needed? Currently Spark codebase master is using Apache Hadoop `3.4.0` by default. ### Does this PR introduce _any_ user-facing change? No. This is a doc-only change. ### How was this patch tested? N/A. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45679 from panbingkun/minor_use_hadoop_3.4. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- assembly/README | 2 +- docs/building-spark.md | 2 +- docs/running-on-kubernetes.md| 2 +- resource-managers/kubernetes/integration-tests/README.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/assembly/README b/assembly/README index 3dde243d3e69..ad1305c5b4d5 100644 --- a/assembly/README +++ b/assembly/README @@ -9,4 +9,4 @@ This module is off by default. To activate it specify the profile in the command If you need to build an assembly for a different version of Hadoop the hadoop-version system property needs to be set as in this example: - -Dhadoop.version=3.3.6 + -Dhadoop.version=3.4.0 diff --git a/docs/building-spark.md b/docs/building-spark.md index 3d12b521c024..56efbc1a0110 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -79,7 +79,7 @@ from `hadoop.version`. Example: -./build/mvn -Pyarn -Dhadoop.version=3.3.0 -DskipTests clean package +./build/mvn -Pyarn -Dhadoop.version=3.4.0 -DskipTests clean package ## Building With Hive and JDBC Support diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 8c92bd9f8cb3..01e9d6382c18 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -236,7 +236,7 @@ A typical example of this using S3 is via passing the following options: ``` ... ---packages org.apache.hadoop:hadoop-aws:3.2.2 +--packages org.apache.hadoop:hadoop-aws:3.4.0 --conf spark.kubernetes.file.upload.path=s3a:///path --conf spark.hadoop.fs.s3a.access.key=... --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem diff --git a/resource-managers/kubernetes/integration-tests/README.md b/resource-managers/kubernetes/integration-tests/README.md index c0d92d988b1a..f8070ec4ce93 100644 --- a/resource-managers/kubernetes/integration-tests/README.md +++ b/resource-managers/kubernetes/integration-tests/README.md @@ -130,7 +130,7 @@ properties to Maven. For example: mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.13 \ -Pkubernetes -Pkubernetes-integration-tests \ --Phadoop-3 -Dhadoop.version=3.3.6 \ +-Phadoop-3 -Dhadoop.version=3.4.0 \ -Dspark.kubernetes.test.sparkTgz=spark-4.0.0-SNAPSHOT-bin-example.tgz \ -Dspark.kubernetes.test.imageTag=sometag \ -Dspark.kubernetes.test.imageRepo=docker.io/somerepo \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (b9335b90280a -> c29d132aeb5d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b9335b90280a [SPARK-47510][INFRA] Fix `DSTREAM` label pattern in `labeler.yml` add c29d132aeb5d [SPARK-47495][CORE] Fix primary resource jar added to spark.jars twice under k8s cluster mode No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/deploy/SparkSubmit.scala | 3 ++- .../org/apache/spark/deploy/SparkSubmitSuite.scala| 19 +++ 2 files changed, 21 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (39500a315166 -> b9335b90280a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 39500a315166 [SPARK-47522][SQL][FOLLOWUP] Add float(p) values for MySQLIntegrationSuite add b9335b90280a [SPARK-47510][INFRA] Fix `DSTREAM` label pattern in `labeler.yml` No new revisions were added by this update. Summary of changes: .github/labeler.yml | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47522][SQL][FOLLOWUP] Add float(p) values for MySQLIntegrationSuite
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 39500a315166 [SPARK-47522][SQL][FOLLOWUP] Add float(p) values for MySQLIntegrationSuite 39500a315166 is described below commit 39500a315166d8e342b678ef3038995a03ce84d6 Author: Kent Yao AuthorDate: Fri Mar 22 13:23:51 2024 -0700 [SPARK-47522][SQL][FOLLOWUP] Add float(p) values for MySQLIntegrationSuite ### What changes were proposed in this pull request? Add float(p) values for MySQLIntegrationSuite ### Why are the changes needed? test improvements ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new test cases ### Was this patch authored or co-authored using generative AI tooling? no Closes #45672 from yaooqinn/SPARK-47522-F. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index cd3001311b03..921e63acf7e1 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -79,8 +79,10 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { conn.prepareStatement("INSERT INTO strings VALUES ('the', 'quick', 'brown', 'fox', " + "'jumps', 'over', 'the', 'lazy', 'dog', '{\"status\": \"merrily\"}')").executeUpdate() -conn.prepareStatement("CREATE TABLE floats (f1 FLOAT, f2 FLOAT UNSIGNED)").executeUpdate() -conn.prepareStatement("INSERT INTO floats VALUES (1.23, 4.56)").executeUpdate() +conn.prepareStatement("CREATE TABLE floats (f1 FLOAT, f2 FLOAT(10), f3 FLOAT(53), " + + "f4 FLOAT UNSIGNED, f5 FLOAT(10) UNSIGNED, f6 FLOAT(53) UNSIGNED)").executeUpdate() +conn.prepareStatement("INSERT INTO floats VALUES (1.23, 4.56, 7.89, 1.23, 4.56, 7.89)") + .executeUpdate() } test("Basic test") { @@ -267,6 +269,6 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { test("SPARK-47522: Read MySQL FLOAT as FloatType to keep consistent with the write side") { val df = spark.read.jdbc(jdbcUrl, "floats", new Properties) -checkAnswer(df, Row(1.23f, 4.56d)) +checkAnswer(df, Row(1.23f, 4.56f, 7.89d, 1.23d, 4.56d, 7.89d)) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
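Background for the expected values (MySQL semantics, not part of the patch): MySQL stores `FLOAT(p)` as a 4-byte single-precision column for `p <= 23` and as an 8-byte double-precision column for `24 <= p <= 53`, which is why `f3 FLOAT(53)` round-trips as a double while `f1`/`f2` stay floats, and the test pins the `UNSIGNED` variants to doubles. A hedged read mirroring the test (assumes a reachable MySQL `jdbcUrl` and the `floats` table above):

```scala
// Sketch: schema expected after SPARK-47522 and this follow-up.
val floats = spark.read.jdbc(jdbcUrl, "floats", new java.util.Properties)
floats.printSchema()
// f1 FLOAT, f2 FLOAT(10)      -> float  (single-precision storage, p <= 23)
// f3 FLOAT(53)                -> double (double-precision storage, p >= 24)
// f4, f5, f6 (... UNSIGNED)   -> double (widened, per the test expectations)
```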
(spark) branch master updated (245669053a34 -> 36126a5c1821)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 245669053a34 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage add 36126a5c1821 [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala| 12 ++-- docs/sql-migration-guide.md | 1 + .../main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala | 2 ++ .../src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 8 4 files changed, 21 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (93f98c0a61dd -> 245669053a34)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 93f98c0a61dd [SPARK-47523][SQL] Replace deprecated `JsonParser#getCurrentName` with `JsonParser#currentName` add 245669053a34 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/storage/FallbackStorage.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 585845ec9ef8 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage 585845ec9ef8 is described below commit 585845ec9ef85a7b2ce10882e5e0d391702a0769 Author: maheshbehera AuthorDate: Fri Mar 22 10:44:55 2024 -0700 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage ### What changes were proposed in this pull request? In method FallbackStorage.open, file open is guarded by Utils.tryWithResource to avoid file handle leakage incase of failure during read. ### Why are the changes needed? To avoid file handle leakage in case of read failure. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing UTs ### Was this patch authored or co-authored using generative AI tooling? No Closes #45663 from maheshk114/SPARK-47521. Authored-by: maheshbehera Signed-off-by: Dongjoon Hyun (cherry picked from commit 245669053a34cb1d4a84689230e5bd1d163be5c6) Signed-off-by: Dongjoon Hyun --- .../main/scala/org/apache/spark/storage/FallbackStorage.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala b/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala index 5aa5c6eff7b2..98ed3167d119 100644 --- a/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala +++ b/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala @@ -187,15 +187,15 @@ private[spark] object FallbackStorage extends Logging { val name = ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID).name val hash = JavaUtils.nonNegativeHash(name) val dataFile = new Path(fallbackPath, s"$appId/$shuffleId/$hash/$name") -val f = fallbackFileSystem.open(dataFile) val size = nextOffset - offset logDebug(s"To byte array $size") val array = new Array[Byte](size.toInt) val startTimeNs = System.nanoTime() -f.seek(offset) -f.readFully(array) -logDebug(s"Took ${(System.nanoTime() - startTimeNs) / (1000 * 1000)}ms") -f.close() +Utils.tryWithResource(fallbackFileSystem.open(dataFile)) { f => + f.seek(offset) + f.readFully(array) + logDebug(s"Took ${(System.nanoTime() - startTimeNs) / (1000 * 1000)}ms") +} new NioManagedBuffer(ByteBuffer.wrap(array)) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 30cb7edecbf0 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage 30cb7edecbf0 is described below commit 30cb7edecbf0ef7aed1e216ad147ebb318aea09c Author: maheshbehera AuthorDate: Fri Mar 22 10:44:55 2024 -0700 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage ### What changes were proposed in this pull request? In method FallbackStorage.open, file open is guarded by Utils.tryWithResource to avoid file handle leakage incase of failure during read. ### Why are the changes needed? To avoid file handle leakage in case of read failure. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing UTs ### Was this patch authored or co-authored using generative AI tooling? No Closes #45663 from maheshk114/SPARK-47521. Authored-by: maheshbehera Signed-off-by: Dongjoon Hyun (cherry picked from commit 245669053a34cb1d4a84689230e5bd1d163be5c6) Signed-off-by: Dongjoon Hyun --- .../main/scala/org/apache/spark/storage/FallbackStorage.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala b/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala index eb23fb4b1c84..161120393490 100644 --- a/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala +++ b/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala @@ -188,15 +188,15 @@ private[spark] object FallbackStorage extends Logging { val name = ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID).name val hash = JavaUtils.nonNegativeHash(name) val dataFile = new Path(fallbackPath, s"$appId/$shuffleId/$hash/$name") -val f = fallbackFileSystem.open(dataFile) val size = nextOffset - offset logDebug(s"To byte array $size") val array = new Array[Byte](size.toInt) val startTimeNs = System.nanoTime() -f.seek(offset) -f.readFully(array) -logDebug(s"Took ${(System.nanoTime() - startTimeNs) / (1000 * 1000)}ms") -f.close() +Utils.tryWithResource(fallbackFileSystem.open(dataFile)) { f => + f.seek(offset) + f.readFully(array) + logDebug(s"Took ${(System.nanoTime() - startTimeNs) / (1000 * 1000)}ms") +} new NioManagedBuffer(ByteBuffer.wrap(array)) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
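The pattern itself is the standard Scala loan pattern; a minimal standalone sketch in the spirit of `Utils.tryWithResource` (not Spark's exact implementation, whose signature may differ):

```scala
import java.io.Closeable

// The resource is closed on every path, so a failure inside `f` (for example a
// short read in readFully) can no longer leak the open file handle the way the
// previous open -> read -> close sequence could.
def tryWithResource[R <: Closeable, T](createResource: => R)(f: R => T): T = {
  val resource = createResource
  try f(resource) finally resource.close()
}
```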
(spark) branch master updated: [SPARK-47523][SQL] Replace deprecated `JsonParser#getCurrentName` with `JsonParser#currentName`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 93f98c0a61dd [SPARK-47523][SQL] Replace deprecated `JsonParser#getCurrentName` with `JsonParser#currentName` 93f98c0a61dd is described below commit 93f98c0a61ddb66eb777c3940fbf29fc58e2d79b Author: yangjie01 AuthorDate: Fri Mar 22 08:37:09 2024 -0700 [SPARK-47523][SQL] Replace deprecated `JsonParser#getCurrentName` with `JsonParser#currentName` ### What changes were proposed in this pull request? This pr replaces the use of `JsonParser#getCurrentName` with `JsonParser#currentName` in Spark code, as `JsonParser#getCurrentName` has been deprecated since jackson 2.17. https://github.com/FasterXML/jackson-core/blob/8fba680579885bf9cdae72e93f16de557056d6e3/src/main/java/com/fasterxml/jackson/core/JsonParser.java#L1521-L1551 ```java /** * Deprecated alias of {link #currentName()}. * * return Name of the current field in the parsing context * * throws IOException for low-level read issues, or * {link JsonParseException} for decoding problems * * deprecated Since 2.17 use {link #currentName} instead. */ Deprecated public abstract String getCurrentName() throws IOException; /** * Method that can be called to get the name associated with * the current token: for {link JsonToken#FIELD_NAME}s it will * be the same as what {link #getText} returns; * for field values it will be preceding field name; * and for others (array values, root-level values) null. * * return Name of the current field in the parsing context * * throws IOException for low-level read issues, or * {link JsonParseException} for decoding problems * * since 2.10 */ public String currentName() throws IOException { // !!! TODO: switch direction in 2.18 or later return getCurrentName(); } ``` ### Why are the changes needed? Clean up deprecated API usage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #45668 from LuciferYang/SPARK-47523. 
Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- .../spark/sql/catalyst/expressions/jsonExpressions.scala | 6 +++--- .../org/apache/spark/sql/catalyst/json/JacksonParser.scala | 10 +- .../org/apache/spark/sql/catalyst/json/JsonInferSchema.scala | 2 +- .../org/apache/spark/sql/errors/QueryExecutionErrors.scala | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index 9fca09b46a99..b155987242b3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -411,7 +411,7 @@ class GetJsonObjectEvaluator(cachedPath: UTF8String) { p.nextToken() arrayIndex(p, () => evaluatePath(p, g, style, xs))(idx) - case (FIELD_NAME, Named(name) :: xs) if p.getCurrentName == name => + case (FIELD_NAME, Named(name) :: xs) if p.currentName == name => // exact field match if (p.nextToken() != JsonToken.VALUE_NULL) { evaluatePath(p, g, style, xs) @@ -546,7 +546,7 @@ case class JsonTuple(children: Seq[Expression]) while (parser.nextToken() != JsonToken.END_OBJECT) { if (parser.getCurrentToken == JsonToken.FIELD_NAME) { // check to see if this field is desired in the output -val jsonField = parser.getCurrentName +val jsonField = parser.currentName var idx = fieldNames.indexOf(jsonField) if (idx >= 0) { // it is, copy the child tree to the correct location in the output row @@ -1056,7 +1056,7 @@ case class JsonObjectKeys(child: Expression) extends UnaryExpression with Codege // traverse until the end of input and ensure it returns valid key while(parser.nextValue() != null && parser.currentName() != null) { // add current fieldName to the ArrayBuffer - arrayBufferOfKeys += UTF8String.fromString(parser.getCurrentName) + arrayBufferOfKeys += UTF8String.fromString(parser.currentName) // skip all the children of inner object or array parser.skipChildren() diff --git a/sql/catalyst/src/main/scala/
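A minimal Jackson walkthrough of the replacement accessor (plain `jackson-core`, independent of Spark internals):

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonToken}

// currentName() (available since Jackson 2.10) returns the same value as the
// getCurrentName() it replaces; the latter is deprecated since 2.17.
val parser = new JsonFactory().createParser("""{"a": 1, "b": 2}""")
try {
  while (parser.nextToken() != JsonToken.END_OBJECT) {
    if (parser.currentToken() == JsonToken.FIELD_NAME) {
      println(parser.currentName()) // prints a, then b
    }
  }
} finally {
  parser.close()
}
```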
(spark) branch master updated (32dfdd305aec -> d1be4fb61368)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 32dfdd305aec [SPARK-47517][CORE][SQL] Prefer Utils.bytesToString for size display add d1be4fb61368 [SPARK-47517][INFRA][FOLLOWUP] Prevent `byteCountToDisplaySize` via Scalastyle No new revisions were added by this update. Summary of changes: scalastyle-config.xml | 5 + 1 file changed, 5 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][DOCS] Fix typo in spark connect overview
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d7be50f122ed [MINOR][DOCS] Fix typo in spark connect overview d7be50f122ed is described below commit d7be50f122ed9eec5551a842e1a61a3503d308ed Author: rrueda AuthorDate: Fri Mar 22 07:41:46 2024 -0700 [MINOR][DOCS] Fix typo in spark connect overview ### What changes were proposed in this pull request? Fix a typo in the command to install the `spark-connect-repl` by replacing the `en-dash` character with a `hyphen-minus`. ### Why are the changes needed? The command doesn't work with the `en-dash`. ### Does this PR introduce _any_ user-facing change? Only documentation. ### How was this patch tested? The documentation was built and the output checked manually. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45650 from rorueda/fix-docs-connect. Authored-by: rrueda Signed-off-by: Dongjoon Hyun --- docs/spark-connect-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/spark-connect-overview.md b/docs/spark-connect-overview.md index 7a085df86c8d..268155360fcc 100644 --- a/docs/spark-connect-overview.md +++ b/docs/spark-connect-overview.md @@ -224,7 +224,7 @@ For the Scala shell, we use an Ammonite-based REPL that is currently not include To set up the new Scala shell, first download and install [Coursier CLI](https://get-coursier.io/docs/cli-installation). Then, install the REPL using the following command in a terminal window: {% highlight bash %} -cs install –-contrib spark-connect-repl +cs install --contrib spark-connect-repl {% endhighlight %} And now you can start the Ammonite-based Scala REPL/shell to connect to your Spark server like this: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
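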
(spark) branch master updated (aea13fca5d57 -> ca44489f4585)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from aea13fca5d57 [SPARK-47500][PYTHON][CONNECT] Factor column name handling out of `plan.py` add ca44489f4585 [SPARK-47499][PYTHON][CONNECT][TESTS] Enable `test_help_command` test No new revisions were added by this update. Summary of changes: python/pyspark/sql/tests/connect/test_parity_dataframe.py | 4 ++-- python/pyspark/sql/tests/test_dataframe.py| 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new e57a7d068839 [SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes e57a7d068839 is described below commit e57a7d068839d549afe08b4a79e82d027b56a5f5 Author: Kent Yao AuthorDate: Thu Mar 21 23:06:03 2024 -0700 [SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes ### What changes were proposed in this pull request? Add migration guide for TINYINT type mapping changes ### Why are the changes needed? behavior change doc ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? doc build ### Was this patch authored or co-authored using generative AI tooling? no Closes #45658 from yaooqinn/SPARK-47462-FB. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 8 1 file changed, 8 insertions(+) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index f788d89c4999..3bb83750ef92 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -22,6 +22,14 @@ license: | * Table of contents {:toc} +## Upgrading from Spark SQL 3.5.1 to 3.5.2 + +- Since 3.5.2, MySQL JDBC datasource will read TINYINT UNSIGNED as ShortType, while in 3.5.1, it was wrongly read as ByteType. + +## Upgrading from Spark SQL 3.5.0 to 3.5.1 + +- Since Spark 3.5.1, MySQL JDBC datasource will read TINYINT(n > 1) and TINYINT UNSIGNED as ByteType, while in Spark 3.5.0 and below, they were read as IntegerType. To restore the previous behavior, you can cast the column to the old type. + ## Upgrading from Spark SQL 3.4 to 3.5 - Since Spark 3.5, the JDBC options related to DS V2 pushdown are `true` by default. These options include: `pushDownAggregate`, `pushDownLimit`, `pushDownOffset` and `pushDownTableSample`. To restore the legacy behavior, please set them to `false`. e.g. set `spark.sql.catalog.your_catalog_name.pushDownAggregate` to `false`. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
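The "cast the column to the old type" workaround in concrete form (a sketch; `jdbcUrl`, the table, and the column name are hypothetical):

```scala
// Restores the pre-3.5.1 reading of a MySQL TINYINT column by casting the
// ShortType/ByteType column back to IntegerType after the read.
val df = spark.read.jdbc(jdbcUrl, "some_table", new java.util.Properties)
val legacy = df.withColumn("tiny_col", df("tiny_col").cast("int"))
```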
(spark) branch master updated (0ef7b771b33d -> 47bce8ececa8)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 0ef7b771b33d [SPARK-47501][SQL][FOLLOWUP] Rename convertDateToDate to convertJavaDateToDate add 47bce8ececa8 [MINOR][SQL] Fix a typo in `DelegateSymlinkTextInputSplit` comment No new revisions were added by this update. Summary of changes: .../org/apache/hadoop/hive/ql/io/DelegateSymlinkTextInputFormat.java| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (5042263f8668 -> 0ef7b771b33d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5042263f8668 [SPARK-47479][SQL] Optimize cannot write data to relations with multiple paths error log add 0ef7b771b33d [SPARK-47501][SQL][FOLLOWUP] Rename convertDateToDate to convertJavaDateToDate No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala | 4 ++-- sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala | 2 +- .../src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala| 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47514][SQL][TESTS] Add a test coverage for createTable method (partitioned-table) in CatalogSuite
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9ba70d6ed302 [SPARK-47514][SQL][TESTS] Add a test coverage for createTable method (partitioned-table) in CatalogSuite 9ba70d6ed302 is described below commit 9ba70d6ed3029b444d6a37835eb27c6916e5c78a Author: panbingkun AuthorDate: Thu Mar 21 20:57:25 2024 -0700 [SPARK-47514][SQL][TESTS] Add a test coverage for createTable method (partitioned-table) in CatalogSuite ### What changes were proposed in this pull request? The pr aims to add a test coverage for createTable method (`partitioned-table`) in `CatalogSuite`. ### Why are the changes needed? Currently, the UT about `createTable` the partitions are `empty`. Let's improve it. ### Does this PR introduce _any_ user-facing change? No, only for tests. ### How was this patch tested? - Manually test. - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45637 from panbingkun/minor_catalogsuites. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- .../spark/sql/connector/catalog/CatalogSuite.scala | 27 -- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogSuite.scala index 145bfd286123..e20dfd4f6051 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogSuite.scala @@ -28,7 +28,7 @@ import org.apache.spark.sql.catalyst.analysis.{NamespaceAlreadyExistsException, import org.apache.spark.sql.catalyst.parser.CatalystSqlParser import org.apache.spark.sql.catalyst.util.quoteIdentifier import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction} -import org.apache.spark.sql.connector.expressions.{LogicalExpressions, Transform} +import org.apache.spark.sql.connector.expressions.{Expressions, LogicalExpressions, Transform} import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType, LongType, StringType, StructType, TimestampType} import org.apache.spark.sql.util.CaseInsensitiveStringMap @@ -96,7 +96,7 @@ class CatalogSuite extends SparkFunSuite { assert(catalog.listTables(Array("ns2")).toSet == Set(ident3)) } - test("createTable") { + test("createTable: non-partitioned table") { val catalog = newCatalog() assert(!catalog.tableExists(testIdent)) @@ -111,6 +111,29 @@ class CatalogSuite extends SparkFunSuite { assert(catalog.tableExists(testIdent)) } + test("createTable: partitioned table") { +val partCatalog = new InMemoryPartitionTableCatalog +partCatalog.initialize("test", CaseInsensitiveStringMap.empty()) + +assert(!partCatalog.tableExists(testIdent)) + +val columns = Array( +Column.create("col0", IntegerType), +Column.create("part0", IntegerType)) +val table = partCatalog.createTable( + testIdent, + columns, + Array[Transform](Expressions.identity("part0")), + util.Collections.emptyMap[String, String]) + +val parsed = CatalystSqlParser.parseMultipartIdentifier(table.name) +assert(parsed == Seq("test", "`", ".", "test_table")) +assert(table.columns === columns) +assert(table.properties.asScala == Map()) + +assert(partCatalog.tableExists(testIdent)) + } + test("createTable: with 
properties") { val catalog = newCatalog() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (6a27789ad7d5 -> b4c09221b2e0)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6a27789ad7d5 [SPARK-47398][SQL] Extract a trait for InMemoryTableScanExec to allow for extending functionality add b4c09221b2e0 [SPARK-47502][INFRA] Make output the installation packages in `descending size`, add `titles`, and remove `unused` packages No new revisions were added by this update. Summary of changes: dev/free_disk_space | 5 +++-- dev/free_disk_space_container | 15 +-- 2 files changed, 16 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47507][BUILD][3.5] Upgrade ORC to 1.9.3
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 203f943efcb1 [SPARK-47507][BUILD][3.5] Upgrade ORC to 1.9.3 203f943efcb1 is described below commit 203f943efcb1ad699f1098c86d4bb4e46fc3bbc2 Author: Dongjoon Hyun AuthorDate: Thu Mar 21 12:16:25 2024 -0700 [SPARK-47507][BUILD][3.5] Upgrade ORC to 1.9.3 ### What changes were proposed in this pull request? This PR aims to upgrade ORC to 1.9.3 for Apache Spark 3.5.2. ### Why are the changes needed? Apache ORC 1.9.3 is the latest maintenance release. To bring the latest bug fixes, we had better upgrade. - https://orc.apache.org/news/2024/03/20/ORC-1.9.3/ - https://github.com/apache/orc/pull/1692 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45646 from dongjoon-hyun/SPARK-47507. Lead-authored-by: Dongjoon Hyun Co-authored-by: Gang Wu Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 8ecf931bf513..1cd7d5a8f2d7 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -212,9 +212,9 @@ opencsv/2.3//opencsv-2.3.jar opentracing-api/0.33.0//opentracing-api-0.33.0.jar opentracing-noop/0.33.0//opentracing-noop-0.33.0.jar opentracing-util/0.33.0//opentracing-util-0.33.0.jar -orc-core/1.9.2/shaded-protobuf/orc-core-1.9.2-shaded-protobuf.jar -orc-mapreduce/1.9.2/shaded-protobuf/orc-mapreduce-1.9.2-shaded-protobuf.jar -orc-shims/1.9.2//orc-shims-1.9.2.jar +orc-core/1.9.3/shaded-protobuf/orc-core-1.9.3-shaded-protobuf.jar +orc-mapreduce/1.9.3/shaded-protobuf/orc-mapreduce-1.9.3-shaded-protobuf.jar +orc-shims/1.9.3//orc-shims-1.9.3.jar oro/2.0.8//oro-2.0.8.jar osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar paranamer/2.8//paranamer-2.8.jar diff --git a/pom.xml b/pom.xml index fb6208777d3f..269a42d41f17 100644 --- a/pom.xml +++ b/pom.xml @@ -141,7 +141,7 @@ 10.14.2.0 1.13.1 -1.9.2 +1.9.3 shaded-protobuf 9.4.54.v20240208 4.0.3 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47505][INFRA][3.4] Fix `Pyspark-errors` test jobs for branch-3.4
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 47c698e0bac9 [SPARK-47505][INFRA][3.4] Fix `Pyspark-errors` test jobs for branch-3.4 47c698e0bac9 is described below commit 47c698e0bac9b0edecfc3f85801e4b4f8b57534a Author: panbingkun AuthorDate: Thu Mar 21 10:34:11 2024 -0700 [SPARK-47505][INFRA][3.4] Fix `Pyspark-errors` test jobs for branch-3.4 ### What changes were proposed in this pull request? The pr aims to fix `pyspark-errors` test jobs for branch-3.4. ### Why are the changes needed? Fix `pyspark-errors` test jobs for branch-3.4. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45624 from panbingkun/branch-3.4_fix_pyerrors. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 17 + dev/free_disk_space_container| 33 + 2 files changed, 50 insertions(+) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 2184577d5c44..8ae303178033 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -396,6 +396,12 @@ jobs: key: pyspark-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }} restore-keys: | pyspark-coursier- +- name: Free up disk space + shell: 'script -q -e -c "bash {0}"' + run: | +if [ -f ./dev/free_disk_space_container ]; then + ./dev/free_disk_space_container +fi - name: Install Java ${{ matrix.java }} uses: actions/setup-java@v3 with: @@ -493,6 +499,12 @@ jobs: key: sparkr-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }} restore-keys: | sparkr-coursier- +- name: Free up disk space + shell: 'script -q -e -c "bash {0}"' + run: | +if [ -f ./dev/free_disk_space_container ]; then + ./dev/free_disk_space_container +fi - name: Install Java ${{ inputs.java }} uses: actions/setup-java@v3 with: @@ -571,6 +583,11 @@ jobs: key: docs-maven-${{ hashFiles('**/pom.xml') }} restore-keys: | docs-maven- +- name: Free up disk space + run: | +if [ -f ./dev/free_disk_space_container ]; then + ./dev/free_disk_space_container +fi - name: Install Java 8 uses: actions/setup-java@v3 with: diff --git a/dev/free_disk_space_container b/dev/free_disk_space_container new file mode 100755 index ..cc3b74643e4f --- /dev/null +++ b/dev/free_disk_space_container @@ -0,0 +1,33 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +echo "==" +echo "Free up disk space on CI system" +echo "==" + +echo "Listing 100 largest packages" +dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -n | tail -n 100 +df -h + +echo "Removing large packages" +rm -rf /__t/CodeQL +rm -rf /__t/go +rm -rf /__t/node + +df -h - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47487][SQL] Simplify code in AnsiTypeCoercion
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 25ecde948beb [SPARK-47487][SQL] Simplify code in AnsiTypeCoercion 25ecde948beb is described below commit 25ecde948bebf01d2cb1e160516238e1d949ffdb Author: Wenchen Fan AuthorDate: Thu Mar 21 08:54:26 2024 -0700 [SPARK-47487][SQL] Simplify code in AnsiTypeCoercion ### What changes were proposed in this pull request? Simplify the code in `AnsiTypeCoercion.implicitCast`, to merge common code paths. ### Why are the changes needed? improve code readability ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #45612 from cloud-fan/type-coercion. Authored-by: Wenchen Fan Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/types/DataType.scala | 2 +- .../sql/catalyst/analysis/AnsiTypeCoercion.scala | 56 ++ 2 files changed, 16 insertions(+), 42 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala index b37924a6d353..16cf6224ce27 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala @@ -102,7 +102,7 @@ abstract class DataType extends AbstractDataType { */ private[spark] def existsRecursively(f: (DataType) => Boolean): Boolean = f(this) - override private[sql] def defaultConcreteType: DataType = this + final override private[sql] def defaultConcreteType: DataType = this override private[sql] def acceptsType(other: DataType): Boolean = sameType(other) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala index c70d6696ad06..92ea3ba1ca29 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala @@ -180,56 +180,30 @@ object AnsiTypeCoercion extends TypeCoercionBase { // cast the input to decimal. case (n: NumericType, DecimalType) => Some(DecimalType.forType(n)) - // Cast null type (usually from null literals) into target types - // By default, the result type is `target.defaultConcreteType`. When the target type is - // `TypeCollection`, there is another branch to find the "closet convertible data type" below. - case (NullType, target) if !target.isInstanceOf[TypeCollection] => -Some(target.defaultConcreteType) - // If a function expects a StringType, no StringType instance should be implicitly cast to // StringType with a collation that's not accepted (aka. lockdown unsupported collations). case (_: StringType, StringType) => None case (_: StringType, _: StringTypeCollated) => None - // This type coercion system will allow implicit converting String type as other - // primitive types, in case of breaking too many existing Spark SQL queries. - case (StringType, a: AtomicType) => -Some(a) - - // If the target type is any Numeric type, convert the String type as Double type. - case (StringType, NumericType) => -Some(DoubleType) - - // If the target type is any Decimal type, convert the String type as the default - // Decimal type. 
- case (StringType, DecimalType) => -Some(DecimalType.SYSTEM_DEFAULT) - - // If the target type is any timestamp type, convert the String type as the default - // Timestamp type. - case (StringType, AnyTimestampType) => -Some(AnyTimestampType.defaultConcreteType) - - case (DateType, AnyTimestampType) => -Some(AnyTimestampType.defaultConcreteType) - - case (_, target: DataType) => -if (Cast.canANSIStoreAssign(inType, target)) { - Some(target) + // If a function expects integral type, fractional input is not allowed. + case (_: FractionalType, IntegralType) => None + + // Ideally the implicit cast rule should be the same as `Cast.canANSIStoreAssign` so that it's + // consistent with table insertion. To avoid breaking too many existing Spark SQL queries, + // we make the system to allow implicitly converting String type as other primitive types. + case (StringType, a @ (_: AtomicType | NumericType | DecimalType | AnyTimestampType)) => +Some(a.defaultConcreteType) + +
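The hunk above is cut off mid-diff, but the shape of the refactor is already visible: the separate `(StringType, ...)` cases collapse into one branch that defers to the expected type's `defaultConcreteType`. Below is a toy, self-contained sketch of that idea, using stand-in types rather than the real Catalyst classes:

```scala
// Stand-ins for Catalyst's DataType / AbstractDataType hierarchy.
sealed trait AbstractType { def defaultConcreteType: ConcreteType }
sealed trait ConcreteType extends AbstractType {
  def defaultConcreteType: ConcreteType = this
}
case object StringT extends ConcreteType
case object DoubleT extends ConcreteType
case object TimestampT extends ConcreteType
case object NumericT extends AbstractType {
  def defaultConcreteType: ConcreteType = DoubleT
}
case object AnyTimestampT extends AbstractType {
  def defaultConcreteType: ConcreteType = TimestampT
}

object CoercionSketch extends App {
  def implicitCast(in: ConcreteType, expected: AbstractType): Option[ConcreteType] =
    (in, expected) match {
      // One case replaces the per-target String branches removed in the diff:
      // the expected type itself knows its default concrete type.
      case (StringT, t @ (NumericT | AnyTimestampT)) => Some(t.defaultConcreteType)
      case _ => None
    }

  assert(implicitCast(StringT, NumericT).contains(DoubleT))
  assert(implicitCast(StringT, AnyTimestampT).contains(TimestampT))
}
```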
(spark) branch master updated: [SPARK-47501][SQL] Add convertDateToDate like the existing convertTimestampToTimestamp for JdbcDialect
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 32f3d4dc0389 [SPARK-47501][SQL] Add convertDateToDate like the existing convertTimestampToTimestamp for JdbcDialect 32f3d4dc0389 is described below commit 32f3d4dc03892f23bc073205335ad0d4174c213a Author: Kent Yao AuthorDate: Thu Mar 21 07:48:41 2024 -0700 [SPARK-47501][SQL] Add convertDateToDate like the existing convertTimestampToTimestamp for JdbcDialect ### What changes were proposed in this pull request? Add convertDateToDate like the existing convertTimestampToTimestamp for JdbcDialect ### Why are the changes needed? The date '±infinity' values cause overflows like timestamp '±infinity' in #41843 ### Does this PR introduce _any_ user-facing change? fix expected overflow for dates to align with the timestamps of PostgreSQL ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45638 from yaooqinn/SPARK-47501. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 24 ++ .../sql/execution/datasources/jdbc/JdbcUtils.scala | 6 ++-- .../org/apache/spark/sql/jdbc/JdbcDialects.scala | 8 + .../apache/spark/sql/jdbc/PostgresDialect.scala| 37 +++--- 4 files changed, 54 insertions(+), 21 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala index 8d137ba88cb1..a47e834a4b3c 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala @@ -155,6 +155,14 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite { "('-infinity', ARRAY[TIMESTAMP '-infinity'])") .executeUpdate() +conn.prepareStatement("CREATE TABLE infinity_dates" + +"(id SERIAL PRIMARY KEY, date_column DATE, date_array DATE[])") + .executeUpdate() +conn.prepareStatement("INSERT INTO infinity_dates (date_column, date_array)" + +" VALUES ('infinity', ARRAY[DATE 'infinity']), " + +"('-infinity', ARRAY[DATE '-infinity'])") + .executeUpdate() + conn.prepareStatement("CREATE DOMAIN not_null_text AS TEXT DEFAULT ''").executeUpdate() conn.prepareStatement("create table custom_type(type_array not_null_text[]," + "type not_null_text)").executeUpdate() @@ -462,6 +470,22 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite { assert(negativeInfinitySeq.head.getTime == minTimeStamp) } + test("SPARK-47501: infinity date test") { +val df = sqlContext.read.jdbc(jdbcUrl, "infinity_dates", new Properties) +val row = df.collect() + +assert(row.length == 2) +val infinity = row(0).getDate(1) +val negativeInfinity = row(1).getDate(1) +val infinitySeq = row(0).getAs[scala.collection.Seq[Date]]("date_array") +val negativeInfinitySeq = row(1).getAs[scala.collection.Seq[Date]]("date_array") +val minDate = -6213565440L +val maxDate = 25340215680L +assert(infinity.getTime == maxDate) +assert(negativeInfinity.getTime == minDate) +assert(infinitySeq.head.getTime == maxDate) +assert(negativeInfinitySeq.head.getTime == minDate) + } test("SPARK-47407: Support java.sql.Types.NULL for NullType") { val df = spark.read.format("jdbc") 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index 84d87f008217..70fd9bd071e9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -404,7 +404,7 @@ object JdbcUtils extends Logging with SQLConfHelper { // DateTimeUtils.fromJavaDate does not handle null value, so we need to check it. val dateVal = rs.getDate(pos + 1) if (dateVal != null) { - row.setInt(pos, fromJavaDate(dateVal)) + row.setInt(pos, fromJavaDate(dialect.convertDateToDate(dateVal))) } else { row.update(pos, null)
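The PostgresDialect hunk itself is not visible in this excerpt, so the following is only a hypothetical sketch of a dialect-side `convertDateToDate` in the style of the existing `convertTimestampToTimestamp`. The sentinel epoch-milli constants and the clamp bounds are illustrative assumptions, not the committed code:

```scala
import java.sql.Date

object DateClampSketch {
  // Assumed sentinels for PostgreSQL's date 'infinity' / '-infinity';
  // the real driver constants may differ.
  private val PgDatePositiveInfinity = 9223372036825200000L
  private val PgDateNegativeInfinity = -9223372036832400000L
  private val MaxDate = Date.valueOf("9999-12-31")
  private val MinDate = Date.valueOf("0001-01-01")

  def convertDateToDate(d: Date): Date = d.getTime match {
    case PgDatePositiveInfinity => MaxDate // clamp 'infinity' before fromJavaDate
    case PgDateNegativeInfinity => MinDate // clamp '-infinity' likewise
    case _ => d                            // ordinary dates pass through
  }
}
```

With an override of this shape in place, the `JdbcUtils` change above simply threads every date through the dialect hook, so PostgreSQL's open-ended values no longer overflow the days-since-epoch conversion.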
(spark) branch master updated: [SPARK-45393][BUILD][FOLLOWUP] Update IsolatedClientLoader fallback Hadoop version to 3.4.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3a1609af41e4 [SPARK-45393][BUILD][FOLLOWUP] Update IsolatedClientLoader fallback Hadoop version to 3.4.0 3a1609af41e4 is described below commit 3a1609af41e46ffaefddd4faabb24c284c108254 Author: Cheng Pan AuthorDate: Wed Mar 20 23:58:00 2024 -0700 [SPARK-45393][BUILD][FOLLOWUP] Update IsolatedClientLoader fallback Hadoop version to 3.4.0 ### What changes were proposed in this pull request? Update IsolatedClientLoader fallback Hadoop version to 3.4.0 ### Why are the changes needed? Sync with the default Hadoop version ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45628 from pan3793/SPARK-45393-followup. Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala index 5693041d21f9..99fa0d46b903 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala @@ -66,7 +66,7 @@ private[hive] object IsolatedClientLoader extends Logging { case e: RuntimeException if e.getMessage.contains("hadoop") => // If the error message contains hadoop, it is probably because the hadoop // version cannot be resolved. -val fallbackVersion = "3.3.6" +val fallbackVersion = "3.4.0" logWarning(s"Failed to resolve Hadoop artifacts for the version $hadoopVersion. We " + s"will change the hadoop version from $hadoopVersion to $fallbackVersion and try " + "again. It is recommended to set jars used by Hive metastore client through " + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-41888][PYTHON][CONNECT][TESTS] Enable doctest for `DataFrame.observe`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 91df046e0825 [SPARK-41888][PYTHON][CONNECT][TESTS] Enable doctest for `DataFrame.observe` 91df046e0825 is described below commit 91df046e0825c8916107de1cdf7c4a022fb1a53d Author: Ruifeng Zheng AuthorDate: Wed Mar 20 23:26:30 2024 -0700 [SPARK-41888][PYTHON][CONNECT][TESTS] Enable doctest for `DataFrame.observe` ### What changes were proposed in this pull request? Enable doctest for `DataFrame.observe` ### Why are the changes needed? for test coverage ### Does this PR introduce _any_ user-facing change? no, test-only ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #45627 from zhengruifeng/enable_listener_doctest. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/connect/dataframe.py | 3 --- 1 file changed, 3 deletions(-) diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py index 7171bee24bd7..741606c89aa4 100644 --- a/python/pyspark/sql/connect/dataframe.py +++ b/python/pyspark/sql/connect/dataframe.py @@ -2243,9 +2243,6 @@ def _test() -> None: globs = pyspark.sql.connect.dataframe.__dict__.copy() -# TODO(SPARK-41888): Support StreamingQueryListener for DataFrame.observe -del pyspark.sql.connect.dataframe.DataFrame.observe.__doc__ - globs["spark"] = ( PySparkSession.builder.appName("sql.connect.dataframe tests") .remote("local[4]") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new a79101775c6f [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` a79101775c6f is described below commit a79101775c6fc83e61d0fd393dace7b96286bb38 Author: Dongjoon Hyun AuthorDate: Wed Mar 20 22:01:37 2024 -0700 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` ### What changes were proposed in this pull request? This PR aims to fix a typo `slf4j-to-jul` to `jul-to-slf4j`. There exists only one. ``` $ git grep slf4j-to-jul common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:// slf4j-to-jul bridge order to route their logs to JUL. ``` Apache Spark uses `jul-to-slf4j` which includes a `java.util.logging` (jul) handler, namely `SLF4JBridgeHandler`, which routes all incoming jul records to the SLF4j API. https://github.com/apache/spark/blob/bb3e27581887a094ead0d2f7b4a6b2a17ee84b6f/pom.xml#L735 ### Why are the changes needed? This typo was there since Apache Spark 1.0.0. ### Does this PR introduce _any_ user-facing change? No, this is a comment fix. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45625 from dongjoon-hyun/jul-to-slf4j. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit bb0867f54d437f6467274e854506aea2900bceb1) Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/internal/Logging.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/internal/Logging.scala b/core/src/main/scala/org/apache/spark/internal/Logging.scala index 614103dee7b7..22c51e999771 100644 --- a/core/src/main/scala/org/apache/spark/internal/Logging.scala +++ b/core/src/main/scala/org/apache/spark/internal/Logging.scala @@ -193,7 +193,7 @@ private[spark] object Logging { val initLock = new Object() try { // We use reflection here to handle the case where users remove the -// slf4j-to-jul bridge order to route their logs to JUL. +// jul-to-slf4j bridge order to route their logs to JUL. val bridgeClass = Utils.classForName("org.slf4j.bridge.SLF4JBridgeHandler") bridgeClass.getMethod("removeHandlersForRootLogger").invoke(null) val installed = bridgeClass.getMethod("isInstalled").invoke(null).asInstanceOf[Boolean] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new e17fdba1f507 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` e17fdba1f507 is described below commit e17fdba1f507fda816dcf5af0f15684399f5b7f8 Author: Dongjoon Hyun AuthorDate: Wed Mar 20 22:01:37 2024 -0700 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` ### What changes were proposed in this pull request? This PR aims to fix a typo `slf4j-to-jul` to `jul-to-slf4j`. There exists only one. ``` $ git grep slf4j-to-jul common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:// slf4j-to-jul bridge order to route their logs to JUL. ``` Apache Spark uses `jul-to-slf4j` which includes a `java.util.logging` (jul) handler, namely `SLF4JBridgeHandler`, which routes all incoming jul records to the SLF4j API. https://github.com/apache/spark/blob/bb3e27581887a094ead0d2f7b4a6b2a17ee84b6f/pom.xml#L735 ### Why are the changes needed? This typo was there since Apache Spark 1.0.0. ### Does this PR introduce _any_ user-facing change? No, this is a comment fix. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45625 from dongjoon-hyun/jul-to-slf4j. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit bb0867f54d437f6467274e854506aea2900bceb1) Signed-off-by: Dongjoon Hyun --- common/utils/src/main/scala/org/apache/spark/internal/Logging.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala index 83e01330ce3f..bd82ce962b8d 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala +++ b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala @@ -196,7 +196,7 @@ private[spark] object Logging { val initLock = new Object() try { // We use reflection here to handle the case where users remove the -// slf4j-to-jul bridge order to route their logs to JUL. +// jul-to-slf4j bridge order to route their logs to JUL. val bridgeClass = SparkClassUtils.classForName("org.slf4j.bridge.SLF4JBridgeHandler") bridgeClass.getMethod("removeHandlersForRootLogger").invoke(null) val installed = bridgeClass.getMethod("isInstalled").invoke(null).asInstanceOf[Boolean] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bb0867f54d43 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` bb0867f54d43 is described below commit bb0867f54d437f6467274e854506aea2900bceb1 Author: Dongjoon Hyun AuthorDate: Wed Mar 20 22:01:37 2024 -0700 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` ### What changes were proposed in this pull request? This PR aims to fix a typo `slf4j-to-jul` to `jul-to-slf4j`. There exists only one. ``` $ git grep slf4j-to-jul common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:// slf4j-to-jul bridge order to route their logs to JUL. ``` Apache Spark uses `jul-to-slf4j` which includes a `java.util.logging` (jul) handler, namely `SLF4JBridgeHandler`, which routes all incoming jul records to the SLF4j API. https://github.com/apache/spark/blob/bb3e27581887a094ead0d2f7b4a6b2a17ee84b6f/pom.xml#L735 ### Why are the changes needed? This typo was there since Apache Spark 1.0.0. ### Does this PR introduce _any_ user-facing change? No, this is a comment fix. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45625 from dongjoon-hyun/jul-to-slf4j. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- common/utils/src/main/scala/org/apache/spark/internal/Logging.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala index 80c622bd5328..c2f61e4d7804 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala +++ b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala @@ -196,7 +196,7 @@ private[spark] object Logging { val initLock = new Object() try { // We use reflection here to handle the case where users remove the -// slf4j-to-jul bridge order to route their logs to JUL. +// jul-to-slf4j bridge order to route their logs to JUL. val bridgeClass = SparkClassUtils.classForName("org.slf4j.bridge.SLF4JBridgeHandler") bridgeClass.getMethod("removeHandlersForRootLogger").invoke(null) val installed = bridgeClass.getMethod("isInstalled").invoke(null).asInstanceOf[Boolean] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
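For readers who have not met the bridge: when `jul-to-slf4j` is simply on the classpath, installing `SLF4JBridgeHandler` is two static calls (Spark's code above goes through reflection only so the bridge stays optional). A small sketch, assuming the `org.slf4j:jul-to-slf4j` artifact is available:

```scala
import java.util.logging.{Logger => JulLogger}
import org.slf4j.bridge.SLF4JBridgeHandler

object JulBridgeDemo {
  def main(args: Array[String]): Unit = {
    SLF4JBridgeHandler.removeHandlersForRootLogger() // drop JUL's default console handler
    SLF4JBridgeHandler.install()                     // re-route JUL records to SLF4J
    JulLogger.getLogger("demo").info("delivered via the bound SLF4J backend")
  }
}
```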
(spark) branch branch-3.4 updated: [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 622ab532bb52 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 622ab532bb52 is described below commit 622ab532bb52a41d8e6ab0f6b4997350a63ec10b Author: Gengliang Wang AuthorDate: Wed Mar 20 15:17:23 2024 -0700 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 ### What changes were proposed in this pull request? Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 ### Why are the changes needed? Show the behavior change to users. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? It's just doc change ### Was this patch authored or co-authored using generative AI tooling? Yes, there are some doc suggestion from copilot in docs/sql-migration-guide.md Closes #45623 from gengliangwang/SPARK-47494. Authored-by: Gengliang Wang Signed-off-by: Dongjoon Hyun (cherry picked from commit 11247d804cd370aaeb88736a706c587e7f5c83b3) Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index b83745e75c79..b3b1fb2122e8 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -93,6 +93,8 @@ license: | - Since Spark 3.3, the `unbase64` function throws error for a malformed `str` input. Use `try_to_binary(<str>, 'base64')` to tolerate malformed input and return NULL instead. In Spark 3.2 and earlier, the `unbase64` function returns a best-efforts result for a malformed `str` input. + - Since Spark 3.3, when reading Parquet files that were not produced by Spark, Parquet timestamp columns with annotation `isAdjustedToUTC = false` are inferred as TIMESTAMP_NTZ type during schema inference. In Spark 3.2 and earlier, these columns are inferred as TIMESTAMP type. To restore the behavior before Spark 3.3, you can set `spark.sql.parquet.inferTimestampNTZ.enabled` to `false`. + - Since Spark 3.3.1 and 3.2.3, for `SELECT ... GROUP BY a GROUPING SETS (b)`-style SQL statements, `grouping__id` returns different values from Apache Spark 3.2.0, 3.2.1, 3.2.2, and 3.3.0. It computes based on user-given group-by expressions plus grouping set columns. To restore the behavior before 3.3.1 and 3.2.3, you can set `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`. For details, see [SPARK-40218](https://issues.apache.org/jira/browse/SPARK-40218) and [SPARK-40562](https:/ [...] ## Upgrading from Spark SQL 3.1 to 3.2 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 430a407c3963 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 430a407c3963 is described below commit 430a407c39633637dba738482877edf806561ba7 Author: Gengliang Wang AuthorDate: Wed Mar 20 15:17:23 2024 -0700 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 ### What changes were proposed in this pull request? Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 ### Why are the changes needed? Show the behavior change to users. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? It's just doc change ### Was this patch authored or co-authored using generative AI tooling? Yes, there are some doc suggestion from copilot in docs/sql-migration-guide.md Closes #45623 from gengliangwang/SPARK-47494. Authored-by: Gengliang Wang Signed-off-by: Dongjoon Hyun (cherry picked from commit 11247d804cd370aaeb88736a706c587e7f5c83b3) Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 0e54c33c6d12..f788d89c4999 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -99,6 +99,8 @@ license: | - Since Spark 3.3, the `unbase64` function throws error for a malformed `str` input. Use `try_to_binary(<str>, 'base64')` to tolerate malformed input and return NULL instead. In Spark 3.2 and earlier, the `unbase64` function returns a best-efforts result for a malformed `str` input. + - Since Spark 3.3, when reading Parquet files that were not produced by Spark, Parquet timestamp columns with annotation `isAdjustedToUTC = false` are inferred as TIMESTAMP_NTZ type during schema inference. In Spark 3.2 and earlier, these columns are inferred as TIMESTAMP type. To restore the behavior before Spark 3.3, you can set `spark.sql.parquet.inferTimestampNTZ.enabled` to `false`. + - Since Spark 3.3.1 and 3.2.3, for `SELECT ... GROUP BY a GROUPING SETS (b)`-style SQL statements, `grouping__id` returns different values from Apache Spark 3.2.0, 3.2.1, 3.2.2, and 3.3.0. It computes based on user-given group-by expressions plus grouping set columns. To restore the behavior before 3.3.1 and 3.2.3, you can set `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`. For details, see [SPARK-40218](https://issues.apache.org/jira/browse/SPARK-40218) and [SPARK-40562](https:/ [...] ## Upgrading from Spark SQL 3.1 to 3.2 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (b8e7d99d417a -> 11247d804cd3)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b8e7d99d417a [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning add 11247d804cd3 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 2 ++ 1 file changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
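A hedged illustration of the behavior the new migration note documents; the conf name comes from the doc text itself, while the session setup and file path are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object NtzInferenceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    // Since 3.3: a non-Spark Parquet file whose timestamp column carries
    // isAdjustedToUTC = false is inferred as TIMESTAMP_NTZ.
    spark.read.parquet("/path/to/external.parquet").printSchema()
    // Restoring the pre-3.3 TIMESTAMP inference:
    spark.conf.set("spark.sql.parquet.inferTimestampNTZ.enabled", "false")
    spark.read.parquet("/path/to/external.parquet").printSchema()
    spark.stop()
  }
}
```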
(spark) branch master updated: [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b8e7d99d417a [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning b8e7d99d417a is described below commit b8e7d99d417ab4bcc3e69d11a0eee5864cb083e3 Author: Anish Shrigondekar AuthorDate: Wed Mar 20 15:11:51 2024 -0700 [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning ### What changes were proposed in this pull request? Fix RocksDB Logger constructor use to avoid deprecation warning ### Why are the changes needed? With the latest RocksDB upgrade, the Logger constructor used was deprecated which was throwing a compiler warning. ``` [warn] val dbLogger = new Logger(dbOptions) { [warn]^ [warn] one warning found [warn] two warnings found [info] compiling 36 Scala sources and 16 Java sources to /Users/anish.shrigondekar/spark/spark/sql/core/target/scala-2.13/classes ... [warn] -target is deprecated: Use -release instead to compile against the correct platform API. [warn] Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation [warn] /Users/anish.shrigondekar/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:851:24: constructor Logger in class Logger is deprecated [warn] Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.execution.streaming.state.RocksDB.createLogger.dbLogger, origin=org.rocksdb.Logger. ``` Updated to use the new recommendation as mentioned here - https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Logger.html Recommendation: ``` [Logger](https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/Logger.html#Logger-org.rocksdb.DBOptions-)([DBOptions](https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/DBOptions.html) dboptions) Deprecated. Use [Logger(InfoLogLevel)](https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/Logger.html#Logger-org.rocksdb.InfoLogLevel-) instead, e.g. new Logger(dbOptions.infoLogLevel()). ``` After the fix, the warning is not seen. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #45616 from anishshri-db/task/SPARK-47490. Authored-by: Anish Shrigondekar Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala index 950baba9031b..8fad5ce7bd6a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala @@ -848,7 +848,7 @@ class RocksDB( /** Create a native RocksDB logger that forwards native logs to log4j with correct log levels. 
*/ private def createLogger(): Logger = { -val dbLogger = new Logger(dbOptions) { +val dbLogger = new Logger(dbOptions.infoLogLevel()) { override def log(infoLogLevel: InfoLogLevel, logMsg: String) = { // Map DB log level to log4j levels // Warn is mapped to info because RocksDB warn is too verbose - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
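A sketch of the forwarding pattern the one-line fix sits in: build the native logger with the options' level (the non-deprecated constructor quoted in the message) and translate RocksDB levels onto an application logger. Only the warn-to-info choice is grounded in the diff's comment; the rest of the mapping is an assumption:

```scala
import org.rocksdb.{DBOptions, InfoLogLevel, Logger}
import org.slf4j.LoggerFactory

object RocksDbLoggerSketch {
  private val slf4j = LoggerFactory.getLogger("RocksDB")

  def createLogger(dbOptions: DBOptions): Logger =
    new Logger(dbOptions.infoLogLevel()) { // non-deprecated constructor
      override def log(level: InfoLogLevel, msg: String): Unit = level match {
        case InfoLogLevel.FATAL_LEVEL | InfoLogLevel.ERROR_LEVEL => slf4j.error(msg)
        // warn maps to info because RocksDB warn is too verbose (per the diff)
        case InfoLogLevel.WARN_LEVEL | InfoLogLevel.INFO_LEVEL => slf4j.info(msg)
        case _ => slf4j.debug(msg)
      }
    }
}
```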
(spark) branch master updated: [SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f66274e92d1c [SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method f66274e92d1c is described below commit f66274e92d1ce6e65fecd45711da59eb08a9d296 Author: yangjie01 AuthorDate: Wed Mar 20 15:10:49 2024 -0700 [SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method ### What changes were proposed in this pull request? The private method `getString` in `ArrowDeserializers` is no longer used after SPARK-44449 | https://github.com/apache/spark/pull/42076, this pr removes it. ### Why are the changes needed? Code clean up. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #45610 from LuciferYang/SPARK-47486. Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- .../spark/sql/connect/client/arrow/ArrowDeserializer.scala | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala index ac9619487f02..eaf2927863ec 100644 --- a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala +++ b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala @@ -29,10 +29,9 @@ import scala.collection.mutable import scala.reflect.ClassTag import org.apache.arrow.memory.BufferAllocator -import org.apache.arrow.vector.{FieldVector, VarCharVector, VectorSchemaRoot} +import org.apache.arrow.vector.{FieldVector, VectorSchemaRoot} import org.apache.arrow.vector.complex.{ListVector, MapVector, StructVector} import org.apache.arrow.vector.ipc.ArrowReader -import org.apache.arrow.vector.util.Text import org.apache.spark.sql.catalyst.ScalaReflection import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder @@ -468,16 +467,6 @@ object ArrowDeserializers { private def isTuple(cls: Class[_]): Boolean = cls.getName.startsWith("scala.Tuple") - private def getString(v: VarCharVector, i: Int): String = { -// This is currently a bit heavy on allocations: -// - byte array created in VarCharVector.get -// - CharBuffer created CharSetEncoder -// - char array in String -// By using direct buffers and reusing the char buffer -// we could get rid of the first two allocations. -Text.decode(v.get(i)) - } - private def loadListIntoBuilder( v: ListVector, i: Int, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 49b4c3bc9c09 [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0 49b4c3bc9c09 is described below commit 49b4c3bc9c09325de941dfaf41e4fd3a4a4c345f Author: Dongjoon Hyun AuthorDate: Wed Mar 20 10:37:51 2024 -0700 [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache Hadoop 3.4.0 for Apache Spark 4.0.0. ### Why are the changes needed? To bring the new features like the following - https://hadoop.apache.org/docs/r3.4.0 - [HADOOP-18995](https://issues.apache.org/jira/browse/HADOOP-18995) Upgrade AWS SDK version to 2.21.33 for `S3 Express One Zone` - [HADOOP-18328](https://issues.apache.org/jira/browse/HADOOP-18328) Supports `S3 on Outposts` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45583 from dongjoon-hyun/SPARK-45393. Lead-authored-by: Dongjoon Hyun Co-authored-by: YangJie Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 27 -- pom.xml| 2 +- .../spark/deploy/yarn/YarnClusterSuite.scala | 3 ++- 3 files changed, 18 insertions(+), 14 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 86da61d89149..903c7a245af3 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -9,7 +9,7 @@ algebra_2.13/2.8.0//algebra_2.13-2.8.0.jar aliyun-java-sdk-core/4.5.10//aliyun-java-sdk-core-4.5.10.jar aliyun-java-sdk-kms/2.11.0//aliyun-java-sdk-kms-2.11.0.jar aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar -aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar +aliyun-sdk-oss/3.13.2//aliyun-sdk-oss-3.13.2.jar annotations/17.0.0//annotations-17.0.0.jar antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar @@ -24,7 +24,6 @@ audience-annotations/0.12.0//audience-annotations-0.12.0.jar avro-ipc/1.11.3//avro-ipc-1.11.3.jar avro-mapred/1.11.3//avro-mapred-1.11.3.jar avro/1.11.3//avro-1.11.3.jar -aws-java-sdk-bundle/1.12.367//aws-java-sdk-bundle-1.12.367.jar azure-data-lake-store-sdk/2.3.9//azure-data-lake-store-sdk-2.3.9.jar azure-keyvault-core/1.0.0//azure-keyvault-core-1.0.0.jar azure-storage/7.0.1//azure-storage-7.0.1.jar @@ -32,6 +31,7 @@ blas/3.0.3//blas-3.0.3.jar bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar +bundle/2.23.19//bundle-2.23.19.jar cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar chill-java/0.10.0//chill-java-0.10.0.jar chill_2.13/0.10.0//chill_2.13-0.10.0.jar @@ -65,21 +65,23 @@ derbytools/10.16.1.1//derbytools-10.16.1.1.jar dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar +esdk-obs-java/3.20.4.2//esdk-obs-java-3.20.4.2.jar flatbuffers-java/23.5.26//flatbuffers-java-23.5.26.jar gcs-connector/hadoop3-2.2.20/shaded/gcs-connector-hadoop3-2.2.20-shaded.jar gmetric4j/1.0.10//gmetric4j-1.0.10.jar gson/2.2.4//gson-2.2.4.jar guava/14.0.1//guava-14.0.1.jar -hadoop-aliyun/3.3.6//hadoop-aliyun-3.3.6.jar 
-hadoop-annotations/3.3.6//hadoop-annotations-3.3.6.jar -hadoop-aws/3.3.6//hadoop-aws-3.3.6.jar -hadoop-azure-datalake/3.3.6//hadoop-azure-datalake-3.3.6.jar -hadoop-azure/3.3.6//hadoop-azure-3.3.6.jar -hadoop-client-api/3.3.6//hadoop-client-api-3.3.6.jar -hadoop-client-runtime/3.3.6//hadoop-client-runtime-3.3.6.jar -hadoop-cloud-storage/3.3.6//hadoop-cloud-storage-3.3.6.jar -hadoop-shaded-guava/1.1.1//hadoop-shaded-guava-1.1.1.jar -hadoop-yarn-server-web-proxy/3.3.6//hadoop-yarn-server-web-proxy-3.3.6.jar +hadoop-aliyun/3.4.0//hadoop-aliyun-3.4.0.jar +hadoop-annotations/3.4.0//hadoop-annotations-3.4.0.jar +hadoop-aws/3.4.0//hadoop-aws-3.4.0.jar +hadoop-azure-datalake/3.4.0//hadoop-azure-datalake-3.4.0.jar +hadoop-azure/3.4.0//hadoop-azure-3.4.0.jar +hadoop-client-api/3.4.0//hadoop-client-api-3.4.0.jar +hadoop-client-runtime/3.4.0//hadoop-client-runtime-3.4.0.jar +hadoop-cloud-storage/3.4.0//hadoop-cloud-storage-3.4.0.jar +hadoop-huaweicloud/3.4.0//hadoop-huaweicloud-3.4.0.jar +hadoop-shaded-guava/1.2.0//hadoop-shaded-guava-1.2.0.jar +hadoop-yarn-server-web-proxy/3.4.0//hadoop-yarn-server-web-proxy-3.4.0.jar hive-beeline/2.3.9//hive-beeline-2.3.9.jar hive-cli/2.3.9//hive
(spark) branch master updated: [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a34c8ceb19bd [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect a34c8ceb19bd is described below commit a34c8ceb19bd1c1548a60bb144d1c587a2861cd8 Author: Kent Yao AuthorDate: Wed Mar 20 09:31:26 2024 -0700 [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect ### What changes were proposed in this pull request? Align mappings of other unsigned numeric types with TINYINT in MySQLDialect. TINYINT maps to ByteType and TINYINT UNSIGNED maps to ShortType. In this PR, we - map SMALLINT to ShortType, SMALLINT UNSIGNED to IntegerType. W/o this, both of them map to IntegerType - map MEDIUMINT UNSIGNED to IntegerType, and MEDIUMINT is AS-IS. W/o this, MEDIUMINT UNSIGNED uses LongType Other unsigned/signed types remain unchanged; we only improve their test coverage. ### Why are the changes needed? Consistency and efficiency while reading MySQL numeric values ### Does this PR introduce _any_ user-facing change? yes, the mappings described in the 1st section. ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45588 from yaooqinn/SPARK-47462. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 39 ++ .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 10 ++ 2 files changed, 42 insertions(+), 7 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index 3d65b4f305b3..5b2214f2efd6 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -53,11 +53,19 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits BIT(10), " + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci DECIMAL(40,20), flt FLOAT, " - + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate() + + "dbl DOUBLE, tiny TINYINT)").executeUpdate() conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', " + "17, 7, 123456789, 123456789012345, 123456789012345.123456789012345, " - + "42.75, 1.0002, -128, 255)").executeUpdate() + + "42.75, 1.0002, -128)").executeUpdate() + +conn.prepareStatement("CREATE TABLE unsigned_numbers (" + + "tiny TINYINT UNSIGNED, small SMALLINT UNSIGNED, med MEDIUMINT UNSIGNED," + + "nor INT UNSIGNED, big BIGINT UNSIGNED, deci DECIMAL(40,20) UNSIGNED," + + "dbl DOUBLE UNSIGNED)").executeUpdate() + +conn.prepareStatement("INSERT INTO unsigned_numbers VALUES (255, 65535, 16777215, 4294967295," + + "9223372036854775808, 123456789012345.123456789012345, 1.0002)").executeUpdate() conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " + "yr YEAR)").executeUpdate() @@ -87,10 +95,10 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { val rows = df.collect() assert(rows.length == 1) val types = rows(0).toSeq.map(x => x.getClass.toString) -assert(types.length == 11)
+assert(types.length == 10) assert(types(0).equals("class java.lang.Boolean")) assert(types(1).equals("class java.lang.Long")) -assert(types(2).equals("class java.lang.Integer")) +assert(types(2).equals("class java.lang.Short")) assert(types(3).equals("class java.lang.Integer")) assert(types(4).equals("class java.lang.Integer")) assert(types(5).equals("class java.lang.Long")) @@ -98,10 +106,9 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(types(7).equals("class java.lang.Double")) assert(types(8).equals("class java.lang.Double")) assert(types(9).equals("class java.lang.Byte")) -assert(types(10).equals("class java.lang.Short")) assert(rows(0).getBoolean(0) == false) assert(rows
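The MySQLDialect hunk itself is truncated above, so the following is only a sketch of the mapping rules the description spells out; the helper shape and the UNSIGNED detection via the JDBC type name are assumptions:

```scala
import java.sql.Types
import org.apache.spark.sql.types._

object MySqlTypeMappingSketch {
  // sqlType is the java.sql.Types code; typeName is the driver-reported name,
  // e.g. "SMALLINT UNSIGNED". Returning None falls back to the default mapping.
  def getCatalystType(sqlType: Int, typeName: String): Option[DataType] = {
    val isUnsigned = typeName.toUpperCase.contains("UNSIGNED")
    sqlType match {
      case Types.TINYINT  => Some(if (isUnsigned) ShortType else ByteType)
      case Types.SMALLINT => Some(if (isUnsigned) IntegerType else ShortType)
      // MEDIUMINT UNSIGNED maxes out at 16777215, which a signed Int holds.
      case Types.INTEGER if typeName.toUpperCase.startsWith("MEDIUMINT") =>
        Some(IntegerType)
      case _ => None
    }
  }
}
```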
(spark) branch branch-3.5 updated: [SPARK-47481][INFRA][3.5] Fix Python linter
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 9baf82b1c97a [SPARK-47481][INFRA][3.5] Fix Python linter 9baf82b1c97a is described below commit 9baf82b1c97a792a3733dedccf1c03737b592bbd Author: panbingkun AuthorDate: Wed Mar 20 07:19:29 2024 -0700 [SPARK-47481][INFRA][3.5] Fix Python linter ### What changes were proposed in this pull request? The pr aims to fix `python linter issue` on `branch-3.5` through pinning `matplotlib==3.7.2` ### Why are the changes needed? Fix `python linter issue` on `branch-3.5`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45550 from panbingkun/branch-3.5_scheduled_job. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/infra/Dockerfile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index d3fcd7ab3622..f0b88666c040 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -65,10 +65,10 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" RUN pypy3 -m pip install numpy 'pandas<=2.0.3' scipy coverage matplotlib -RUN python3.9 -m pip install numpy 'pyarrow==12.0.1' 'pandas<=2.0.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' +RUN python3.9 -m pip install 'numpy==1.25.1' 'pyarrow==12.0.1' 'pandas<=2.0.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage 'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' # Add Python deps for Spark Connect. RUN python3.9 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 'protobuf==3.20.3' 'googleapis-common-protos==1.56.4' # Add torch as a testing dependency for TorchDistributor -RUN python3.9 -m pip install torch torchvision torcheval +RUN python3.9 -m pip install 'torch==2.0.1' 'torchvision==0.15.2' torcheval - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 4de8000f21a4 [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure 4de8000f21a4 is described below commit 4de8000f21a48796d30af37bc57269395792a254 Author: panbingkun AuthorDate: Wed Mar 20 07:15:32 2024 -0700 [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure ### What changes were proposed in this pull request? The pr aims to fix `python linter issue` on branch-3.4 through pinning `matplotlib<3.3.0` ### Why are the changes needed? - Through this PR https://github.com/apache/spark/pull/45600, we found that the version of `matplotlib` in our Docker image was `3.8.2`, which clearly did not meet the original requirements for `branch-3.4`. https://github.com/panbingkun/spark/actions/runs/8354370179/job/22869580038 https://github.com/apache/spark/assets/15246973/dd425bfb-ce5f-4a99-a487-a462d6e9 https://github.com/apache/spark/blob/branch-3.4/dev/requirements.txt#L12 https://github.com/apache/spark/assets/15246973/70485648-b886-4218-bb21-c41a85d5eecf - Fix as follows: https://github.com/apache/spark/assets/15246973/db31d8fb-0b6c-4925-95e1-0ca0247bb9f5 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45608 from panbingkun/branch_3.4_pin_matplotlib. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/infra/Dockerfile | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 68d27052437b..5ebd10339be9 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -37,6 +37,7 @@ RUN add-apt-repository ppa:pypy/ppa RUN apt update RUN $APT_INSTALL gfortran libopenblas-dev liblapack-dev RUN $APT_INSTALL build-essential +RUN $APT_INSTALL python3-matplotlib RUN mkdir -p /usr/local/pypy/pypy3.7 && \ curl -sqL https://downloads.python.org/pypy/pypy3.7-v7.3.7-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.7 --strip-components=1 && \ @@ -64,8 +65,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht # See more in SPARK-39735 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" -RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib -RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' +RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage 'matplotlib<3.3.0' +RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage 'matplotlib<3.3.0' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' # Add Python deps for Spark Connect. RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new d25f49a14733 [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` d25f49a14733 is described below commit d25f49a14733c5a0e872498cab40a30a5ebc28b4 Author: Dongjoon Hyun AuthorDate: Tue Mar 19 20:53:45 2024 -0700 [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` ### What changes were proposed in this pull request? This PR aims to pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` to recover the following test failure. ### Why are the changes needed? `numpy==1.23.5` was the version of the last successful run. - https://github.com/apache/spark/actions/runs/8276453417/job/22725387782 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? Closes #45595 from dongjoon-hyun/pin-numpy. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/infra/Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 93d8793826ff..68d27052437b 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib -RUN python3.9 -m pip install numpy 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' +RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' # Add Python deps for Spark Connect. RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (bc378f4ff5e2 -> 61d7b0f24fc9)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from bc378f4ff5e2 [SPARK-47330][SQL][TESTS] XML: Added XmlExpressionsSuite add 61d7b0f24fc9 [SPARK-47470][SQL][TESTS] Ignore `IntentionallyFaultyConnectionProvider` error in `CliSuite` No new revisions were added by this update. Summary of changes: .../test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c32d27850e2e [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven c32d27850e2e is described below commit c32d27850e2ea5f8cb36099ab8453b09f4c70861 Author: Dongjoon Hyun AuthorDate: Tue Mar 19 17:52:38 2024 -0700 [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven ### What changes were proposed in this pull request? This PR aims to exclude `logback` from SBT dependency like Maven to fix the following SBT issue. ``` [info] stderr> SLF4J: Class path contains multiple SLF4J bindings. [info] stderr> SLF4J: Found binding in [jar:file:/home/runner/work/spark/spark/assembly/target/scala-2.13/jars/logback-classic-1.2.13.jar!/org/slf4j/impl/StaticLoggerBinder.class] [info] stderr> SLF4J: Found binding in [jar:file:/home/runner/.cache/coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/ch/qos/logback/logback-classic/1.2.13/logback-classic-1.2.13.jar!/org/slf4j/impl/StaticLoggerBinder.class] [info] stderr> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. [info] stderr> SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder] ``` ### Why are the changes needed? **Maven** ``` $ build/mvn dependency:tree --pl core | grep logback Using `mvn` from path: /opt/homebrew/bin/mvn Using SPARK_LOCAL_IP=localhost ``` **SBT (BEFORE)** ``` $ build/sbt "core/test:dependencyTree" | grep logback Using SPARK_LOCAL_IP=localhost [info] | +-ch.qos.logback:logback-classic:1.2.13 [info] | | +-ch.qos.logback:logback-core:1.2.13 [info] | +-ch.qos.logback:logback-core:1.2.13 [info] | | +-ch.qos.logback:logback-classic:1.2.13 [info] | | | +-ch.qos.logback:logback-core:1.2.13 [info] | | +-ch.qos.logback:logback-core:1.2.13 [info] | +-ch.qos.logback:logback-classic:1.2.13 [info] | | +-ch.qos.logback:logback-core:1.2.13 [info] | +-ch.qos.logback:logback-core:1.2.13 ``` **SBT (AFTER)** ``` $ build/sbt "core/test:dependencyTree" | grep logback Using SPARK_LOCAL_IP=localhost ``` ### Does this PR introduce _any_ user-facing change? No. This only fixes developer and CI issues. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45594 from dongjoon-hyun/SPARK-47468. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- project/SparkBuild.scala | 1 + 1 file changed, 1 insertion(+) diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index b7b9589568e1..3d89af2aa7b4 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -1078,6 +1078,7 @@ object ExcludedDependencies { // purpose only. Here we exclude them from the whole project scope and add them w/ yarn only. excludeDependencies ++= Seq( ExclusionRule(organization = "com.sun.jersey"), + ExclusionRule(organization = "ch.qos.logback"), ExclusionRule("javax.ws.rs", "jsr311-api")) ) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
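For readers outside the Spark build, the sketch below shows the same exclusion mechanism in a minimal standalone `build.sbt`; the project name, library, and versions are illustrative, not Spark's actual build definition.

```
// Minimal standalone build.sbt sketch (illustrative names and versions):
// a project-wide ExclusionRule drops every ch.qos.logback artifact from the
// dependency graph, so only one SLF4J binding can end up on the classpath.
lazy val root = (project in file("."))
  .settings(
    name := "single-slf4j-binding-demo",
    scalaVersion := "2.13.13",
    libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.36",
    // Applies to the transitive dependencies of everything added above.
    excludeDependencies += ExclusionRule(organization = "ch.qos.logback")
  )
```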
(spark) branch master updated: [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 32ee2d7936a5 [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant` 32ee2d7936a5 is described below commit 32ee2d7936a50a653e8ea599d622fbc550fa5eac Author: panbingkun AuthorDate: Tue Mar 19 16:27:15 2024 -0700 [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant` ### What changes were proposed in this pull request? The pr aims to update `labeler.yml` for module `common/sketch` and `common/variant`. ### Why are the changes needed? Currently, the above modules are not classified in the file `labeler.yml`, and the GitHub action label cannot automatically tag the submitted PR. ### Does this PR introduce _any_ user-facing change? Yes, only for dev. ### How was this patch tested? Manually test: after this PR is merged, continue to observe. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45590 from panbingkun/SPARK-47464. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- .github/labeler.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/labeler.yml b/.github/labeler.yml index 7d24390f2968..104eac99ec4d 100644 --- a/.github/labeler.yml +++ b/.github/labeler.yml @@ -101,6 +101,8 @@ SQL: ] - any-glob-to-any-file: [ 'common/unsafe/**/*', + 'common/sketch/**/*', + 'common/variant/**/*', 'bin/spark-sql*', 'bin/beeline*', 'sbin/*thriftserver*.sh', - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (90560dce85b0 -> db531c6ee719)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 90560dce85b0 [SPARK-47458][CORE] Fix the problem with calculating the maximum concurrent tasks for the barrier stage add db531c6ee719 [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala | 4 .../test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala | 4 +--- 2 files changed, 1 insertion(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (b6a836946311 -> a6bffcc3e5f0)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b6a836946311 [SPARK-47454][PYTHON][CONNECT][TESTS] Split `pyspark.sql.tests.test_dataframe` add a6bffcc3e5f0 [SPARK-47457][SQL] Fix `IsolatedClientLoader.supportsHadoopShadedClient` to handle Hadoop 3.4+ No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 2 ++ .../org/apache/spark/sql/hive/client/HadoopVersionInfoSuite.scala | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
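The summary above only lists touched files, so as background, here is a hedged sketch of the kind of version gate involved — not Spark's actual `IsolatedClientLoader` code, and the threshold below is illustrative. The failure mode it shows: matching a fixed list of version prefixes silently rejects a new Hadoop line such as 3.4, while a numeric comparison does not.

```
import scala.util.Try

object ShadedClientSupport {
  // Hypothetical check, not Spark's implementation: parse "major.minor" and
  // compare numerically so future lines (3.4.x, 4.x) are handled without
  // touching this code again.
  def supportsHadoopShadedClient(version: String): Boolean = {
    val parts = version.split("\\.")
    Try((parts(0).toInt, parts(1).toInt)).toOption.exists {
      case (major, minor) => major > 3 || (major == 3 && minor >= 2)
    }
  }
}
```

With this shape of check, `supportsHadoopShadedClient("3.4.0")` returns `true` instead of falling through a prefix list.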
(spark) branch master updated: [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ef94f7094989 [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile` ef94f7094989 is described below commit ef94f709498974cb31e805541e0803270cd5c39e Author: Dongjoon Hyun AuthorDate: Mon Mar 18 23:15:32 2024 -0700 [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile` ### What changes were proposed in this pull request? This PR aims to use `Ubuntu 22.04` in `dev/infra/Dockerfile` for Apache Spark 4.0.0.
| Installed SW | BEFORE | AFTER |
| --- | --- | --- |
| Ubuntu LTS | 20.04.5 | 22.04.4 |
| Java | 17.0.10 | 17.0.10 |
| PyPy 3.8 | 3.8.16 | 3.8.16 |
| Python 3.9 | 3.9.5 | 3.9.18 |
| Python 3.10 | 3.10.13 | 3.10.12 |
| Python 3.11 | 3.11.8 | 3.11.8 |
| Python 3.12 | 3.12.2 | 3.12.2 |
| R | 3.6.3 | 4.1.2 |
### Why are the changes needed? - Since Apache Spark 3.4.0, we use `Ubuntu 20.04` via SPARK-39522. - From Apache Spark 4.0.0, this PR aims to use `Ubuntu 22.04` mainly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45576 from dongjoon-hyun/SPARK-47452. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/infra/Dockerfile | 52 +--- 1 file changed, 25 insertions(+), 27 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 64adf33e6742..f17ee58c9d90 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -15,11 +15,11 @@ # limitations under the License. # -# Image for building and testing Spark branches. Based on Ubuntu 20.04. +# Image for building and testing Spark branches. Based on Ubuntu 22.04.
# See also in https://hub.docker.com/_/ubuntu -FROM ubuntu:focal-20221019 +FROM ubuntu:jammy-20240227 -ENV FULL_REFRESH_DATE 20240117 +ENV FULL_REFRESH_DATE 20240318 ENV DEBIAN_FRONTEND noninteractive ENV DEBCONF_NONINTERACTIVE_SEEN true @@ -50,10 +50,8 @@ RUN apt-get update && apt-get install -y \ openjdk-17-jdk-headless \ pandoc \ pkg-config \ -python3-pip \ -python3-setuptools \ -python3.8 \ -python3.9 \ +python3.10 \ +python3-psutil \ qpdf \ r-base \ ruby \ @@ -64,10 +62,10 @@ RUN apt-get update && apt-get install -y \ && rm -rf /var/lib/apt/lists/* -RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' >> /etc/apt/sources.list +RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/' >> /etc/apt/sources.list RUN gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9 RUN gpg -a --export E084DAB9 | apt-key add - -RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' +RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/' # See more in SPARK-39959, roxygen2 < 7.2.1 RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', \ @@ -82,9 +80,6 @@ RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', \ ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" -RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9 - - RUN add-apt-repository ppa:pypy/ppa RUN mkdir -p /usr/local/pypy/pypy3.8 && \ curl -sqL https://downloads.python.org/pypy/pypy3.8-v7.3.11-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.8 --strip-components=1 && \ @@ -98,41 +93,44 @@ ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.2.1 scipy plotly # Python deps for Spark Connect ARG CONNECT_PIP_PKGS="grpcio==1.62.0 grpcio-status==1.62.0 protobuf==4.25.1 googleapis-common-protos==1.56.4" -# Add torch as a testing dependency for TorchDistributor and DeepspeedTorchDistributor -RUN python3.9 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \ -python3.9 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu && \ -python3.9 -m pip install deepspeed torcheval && \ -python3.9 -m pip cache purge - -# Install Python 3.10 at the last stage to avoid breaking Python 3.9 -RUN add-apt-repository ppa:deadsnakes/ppa -RUN apt-get update && apt-get install -y \ -python3.10 python3.10-distut
(spark) branch master updated (5f48931fcdf7 -> 5e42ecc8163a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5f48931fcdf7 [SPARK-47453][SQL][DOCKER][BUILD][TESTS] Upgrade MySQL docker image version to 8.3.0 add 5e42ecc8163a [SPARK-47456][SQL] Support ORC Brotli codec No new revisions were added by this update. Summary of changes: docs/sql-data-sources-orc.md | 2 +- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 4 ++-- .../spark/sql/execution/datasources/orc/OrcCompressionCodec.java | 3 ++- .../org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala| 3 ++- .../spark/sql/execution/datasources/FileSourceCodecSuite.scala | 5 - 5 files changed, 11 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
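Since the change summary above only lists the touched files, here is a minimal usage sketch of the new codec, assuming the Brotli libraries are available to ORC at runtime; the output path is illustrative.

```
import org.apache.spark.sql.SparkSession

object OrcBrotliDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orc-brotli-demo").getOrCreate()
    import spark.implicits._

    // Session-wide default codec for ORC writes...
    spark.conf.set("spark.sql.orc.compression.codec", "brotli")

    // ...or per-write, via the DataFrameWriter option.
    Seq((1, "a"), (2, "b")).toDF("id", "value")
      .write.option("compression", "brotli").orc("/tmp/orc-brotli-demo")

    spark.stop()
  }
}
```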
(spark) branch master updated (681b41f0808e -> 5f48931fcdf7)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 681b41f0808e [SPARK-47422][SQL] Support collated strings in array operations add 5f48931fcdf7 [SPARK-47453][SQL][DOCKER][BUILD][TESTS] Upgrade MySQL docker image version to 8.3.0 No new revisions were added by this update. Summary of changes: ...baseOnDocker.scala => MySQLDatabaseOnDocker.scala} | 17 +++-- .../apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 15 +++ .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 19 --- .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala | 19 --- 4 files changed, 18 insertions(+), 52 deletions(-) copy connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/{DB2DatabaseOnDocker.scala => MySQLDatabaseOnDocker.scala} (66%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (9f8147c2a8d2 -> e01ed0da22f2)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9f8147c2a8d2 [SPARK-47329][SS][DOCS] Add note to persist dataframe while using foreachbatch and stateful streaming query to prevent state from being re-loaded in each batch add e01ed0da22f2 [SPARK-47345][SQL][TESTS][FOLLOW-UP] Rename JSON to XML within XmlFunctionsSuite No new revisions were added by this update. Summary of changes: sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (cb20fcae951d -> acf17fd67217)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cb20fcae951d [SPARK-47448][CORE] Enable `spark.shuffle.service.removeShuffle` by default add acf17fd67217 [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job No new revisions were added by this update. Summary of changes: .github/workflows/build_sparkr_window.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (51e8634a5883 -> cb20fcae951d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 51e8634a5883 [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same add cb20fcae951d [SPARK-47448][CORE] Enable `spark.shuffle.service.removeShuffle` by default No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +- core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala | 1 + docs/configuration.md | 2 +- docs/core-migration-guide.md | 2 ++ 4 files changed, 5 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
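As a usage note, the sketch below shows how a job could pin the old behavior if it relies on shuffle files outliving their shuffle; this is an illustration of opting out of the new default, not part of the patch.

```
import org.apache.spark.SparkConf

object KeepShuffleFiles {
  def main(args: Array[String]): Unit = {
    // Restore the pre-change behavior explicitly (the new default is true).
    val conf = new SparkConf()
      .setAppName("keep-shuffle-files")
      .set("spark.shuffle.service.removeShuffle", "false")
    // ... build the SparkContext / SparkSession from `conf` as usual.
  }
}
```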
(spark) branch master updated: [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a40940a0bc6d [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal` a40940a0bc6d is described below commit a40940a0bc6de58b5c56b8ad918f338c6e70572f Author: Dongjoon Hyun AuthorDate: Mon Mar 18 12:39:44 2024 -0700 [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal` ### What changes were proposed in this pull request? This PR aims to make `BlockManager` warn before invoking `removeBlockInternal` by switching the log position. To be clear, 1. For the case where `removeBlockInternal` succeeds, the log messages are identical before and after this PR. 2. For the case where `removeBlockInternal` fails, the user will see one additional warning message like the following which was hidden from the users before this PR. ``` logWarning(s"Putting block $blockId failed") ``` ### Why are the changes needed? When `Put` operation fails, Apache Spark currently tries `removeBlockInternal` first before logging. https://github.com/apache/spark/blob/ce93c9fd86715e2479552628398f6fc11e83b2af/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1554-L1567 On top of that, if `removeBlockInternal` fails consecutively, Spark shows the warning like the following and fails the job. ``` 24/03/18 18:40:46 WARN BlockManager: Putting block broadcast_0 failed due to exception java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e. 24/03/18 18:40:46 WARN BlockManager: Block broadcast_0 was not removed normally. 24/03/18 18:40:46 INFO TaskSchedulerImpl: Cancelling stage 0 24/03/18 18:40:46 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled 24/03/18 18:40:46 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) failed in 0.264 s due to Job aborted due to stage failure: Task serialization failed: java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e ``` It's misleading although they might share the same root cause. Since `Put` operation fails before the above failure, we had better switch WARN message to make it clear. ### Does this PR introduce _any_ user-facing change? No. This is a warning message change only. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45570 from dongjoon-hyun/SPARK-47446. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/storage/BlockManager.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala index 228ec5752e1b..89b3914e94af 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala @@ -1561,8 +1561,8 @@ private[spark] class BlockManager( blockInfoManager.unlock(blockId) } } else { -removeBlockInternal(blockId, tellMaster = false) logWarning(s"Putting block $blockId failed") +removeBlockInternal(blockId, tellMaster = false) } res } catch { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47383][CORE] Support `spark.shutdown.timeout` config
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ce93c9fd8671 [SPARK-47383][CORE] Support `spark.shutdown.timeout` config ce93c9fd8671 is described below commit ce93c9fd86715e2479552628398f6fc11e83b2af Author: Rob Reeves AuthorDate: Mon Mar 18 10:36:38 2024 -0700 [SPARK-47383][CORE] Support `spark.shutdown.timeout` config ### What changes were proposed in this pull request? Make the shutdown hook timeout configurable. If this is not defined it falls back to the existing behavior, which uses a default timeout of 30 seconds, or whatever is defined in core-site.xml for the hadoop.service.shutdown.timeout property. ### Why are the changes needed? Spark sometimes times out during the shutdown process. This can result in data left in the queues to be dropped and causes metadata loss (e.g. event logs, anything written by custom listeners). This is not easily configurable before this change. The underlying `org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30 seconds. It can be configured by setting hadoop.service.shutdown.timeout, but this must be done in the core-site.xml/core-default.xml because a new hadoop conf object is created and there is no opportunity to modify it. ### Does this PR introduce _any_ user-facing change? Yes, a new config `spark.shutdown.timeout` is added. ### How was this patch tested? Manual testing in spark-shell. This behavior is not practical to write a unit test for. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45504 from robreeves/sc_shutdown_timeout. Authored-by: Rob Reeves Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/internal/config/package.scala| 10 ++ .../org/apache/spark/util/ShutdownHookManager.scala | 19 +-- 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index aa240b5cc5b5..e72b9cb694eb 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -2683,4 +2683,14 @@ package object config { .version("4.0.0") .booleanConf .createWithDefault(false) + + private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS = +ConfigBuilder("spark.shutdown.timeout") + .internal() + .doc("Defines the timeout period to wait for all shutdown hooks to be executed. 
" + +"This must be passed as a system property argument in the Java options, for example " + +"spark.driver.extraJavaOptions=\"-Dspark.shutdown.timeout=60s\".") + .version("4.0.0") + .timeConf(TimeUnit.MILLISECONDS) + .createOptional } diff --git a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala index 4db268604a3e..c6cad9440168 100644 --- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala +++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala @@ -19,12 +19,16 @@ package org.apache.spark.util import java.io.File import java.util.PriorityQueue +import java.util.concurrent.TimeUnit import scala.util.Try import org.apache.hadoop.fs.FileSystem +import org.apache.spark.SparkConf import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.SPARK_SHUTDOWN_TIMEOUT_MS + /** * Various utility methods used by Spark. @@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager { val hookTask = new Runnable() { override def run(): Unit = runAll() } -org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( - hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30) +val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30 +// The timeout property must be passed as a Java system property because this +// is initialized before Spark configurations are registered as system +// properties later in initialization. +val timeout = new SparkConf().get(SPARK_SHUTDOWN_TIMEOUT_MS) + +timeout.fold { + org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( +hookTask, priority) +} { t => + org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( +hookTask, priority, t, TimeUnit.MILLISECONDS) +} } def runAll(): Unit = { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8bd42cbdb6bf [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561 8bd42cbdb6bf is described below commit 8bd42cbdb6bfa40aead94570b06e926f8e8aa9e1 Author: Kent Yao AuthorDate: Mon Mar 18 08:56:55 2024 -0700 [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561 ### What changes were proposed in this pull request? SPARK-45561 mapped java.sql.Types.TINYINT to ByteType in MySQLDialect, which caused unsigned TINYINT values to overflow, because java.sql.Types reports TINYINT for both signed and unsigned columns. In this PR, we put the signedness info into the metadata so that TINYINT can be mapped to short or byte accordingly. ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? Users can read MySQL UNSIGNED TINYINT values again after this PR, as in versions before 3.5.0; this had been broken since 3.5.1 ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45556 from yaooqinn/SPARK-47435. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 9 ++-- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 9 ++-- .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 ++- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 15 -- .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 9 ++-- .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 9 ++-- .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 26 ++ .../sql/execution/datasources/jdbc/JdbcUtils.scala | 5 +- .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 10 ++-- .../v2/jdbc/JDBCTableCatalogSuite.scala| 60 -- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 24 + 11 files changed, 114 insertions(+), 68 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index b1d239337aa0..79e88f109534 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -57,10 +57,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits BIT(10), " + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci DECIMAL(40,20), flt FLOAT, " - + "dbl DOUBLE, tiny TINYINT)").executeUpdate() + + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate() + conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', " + "17, 7, 123456789, 123456789012345, 123456789012345.123456789012345, " - + "42.75, 1.0002, -128)").executeUpdate() + + "42.75, 1.0002, -128, 255)").executeUpdate() conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " + "yr YEAR)").executeUpdate() @@ -90,7 +91,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { val rows = df.collect() assert(rows.length == 1) val types = rows(0).toSeq.map(x => x.getClass.toString) -assert(types.length == 10) +assert(types.length == 11) assert(types(0).equals("class java.lang.Boolean")) assert(types(1).equals("class
java.lang.Long")) assert(types(2).equals("class java.lang.Integer")) @@ -101,6 +102,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(types(7).equals("class java.lang.Double")) assert(types(8).equals("class java.lang.Double")) assert(types(9).equals("class java.lang.Byte")) +assert(types(10).equals("class java.lang.Short")) assert(rows(0).getBoolean(0) == false) assert(rows(0).getLong(1) == 0x225) assert(rows(0).getInt(2) == 17) @@ -112,6 +114,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(rows(0).getDouble(7) == 42.75) assert(rows(0).getDouble(8) == 1.0002) assert(rows(0).getByte(9) == 0x80.toByte) +assert(rows(0).getShort(10) == 0xff.toShort) } test("Date types") { diff --git a/connector/docker-integration-tests/src/test/scala/org/apa
(spark) branch master updated (4dc362dbc6c0 -> 1aafe60b3e76)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0 add 1aafe60b3e76 [SPARK-47442][CORE][TEST] Use port 0 to start worker servers in MasterSuite No new revisions were added by this update. Summary of changes: .../test/scala/org/apache/spark/deploy/master/MasterSuiteBase.scala| 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
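The one-line summary above hides the actual technique, so for context: binding a server socket to port 0 asks the kernel for any free ephemeral port, which removes "Address already in use" flakiness when several suites share a CI host. A self-contained illustration (not the `MasterSuite` code):

```
import java.net.ServerSocket

object EphemeralPortDemo {
  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(0) // port 0 = let the OS pick a free port
    try {
      println(s"listening on ephemeral port ${server.getLocalPort}")
    } finally {
      server.close()
    }
  }
}
```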
(spark) branch master updated: [SPARK-47438][BUILD] Upgrade jackson to 2.17.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0 4dc362dbc6c0 is described below commit 4dc362dbc6c039d955e4dceb87e53dfc76ef2a5c Author: panbingkun AuthorDate: Mon Mar 18 08:25:16 2024 -0700 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0 ### What changes were proposed in this pull request? The pr aims to upgrade jackson from `2.16.1` to `2.17.0`. ### Why are the changes needed? The full release notes: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45562 from panbingkun/SPARK-47438. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 14 +++--- pom.xml | 4 ++-- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index d4b7d38aea22..86da61d89149 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -103,15 +103,15 @@ icu4j/72.1//icu4j-72.1.jar ini4j/0.5.4//ini4j-0.5.4.jar istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar ivy/2.5.2//ivy-2.5.2.jar -jackson-annotations/2.16.1//jackson-annotations-2.16.1.jar +jackson-annotations/2.17.0//jackson-annotations-2.17.0.jar jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar -jackson-core/2.16.1//jackson-core-2.16.1.jar -jackson-databind/2.16.1//jackson-databind-2.16.1.jar -jackson-dataformat-cbor/2.16.1//jackson-dataformat-cbor-2.16.1.jar -jackson-dataformat-yaml/2.16.1//jackson-dataformat-yaml-2.16.1.jar -jackson-datatype-jsr310/2.16.1//jackson-datatype-jsr310-2.16.1.jar +jackson-core/2.17.0//jackson-core-2.17.0.jar +jackson-databind/2.17.0//jackson-databind-2.17.0.jar +jackson-dataformat-cbor/2.17.0//jackson-dataformat-cbor-2.17.0.jar +jackson-dataformat-yaml/2.17.0//jackson-dataformat-yaml-2.17.0.jar +jackson-datatype-jsr310/2.17.0//jackson-datatype-jsr310-2.17.0.jar jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar -jackson-module-scala_2.13/2.16.1//jackson-module-scala_2.13-2.16.1.jar +jackson-module-scala_2.13/2.17.0//jackson-module-scala_2.13-2.17.0.jar jakarta.annotation-api/2.0.0//jakarta.annotation-api-2.0.0.jar jakarta.inject-api/2.0.1//jakarta.inject-api-2.0.1.jar jakarta.servlet-api/5.0.0//jakarta.servlet-api-5.0.0.jar diff --git a/pom.xml b/pom.xml index 757d911c1229..5cc56a92999d 100644 --- a/pom.xml +++ b/pom.xml @@ -184,8 +184,8 @@ true true 1.9.13 -2.16.1 - 2.16.1 +2.17.0 + 2.17.0 2.3.1 3.0.2 1.1.10.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 57424b92c5b5 [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md 57424b92c5b5 is described below commit 57424b92c5b5e7c3de680a7d8a6b137911f45666 Author: Matt Braymer-Hayes AuthorDate: Mon Mar 18 07:53:11 2024 -0700 [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md ### What changes were proposed in this pull request? Adds the Web UI to the `Other Documents` list on the main page. ### Why are the changes needed? I found it difficult to find the Web UI docs: it's only linked inside the Monitoring docs. Adding it to the main page will make it easier for people to find and use the docs. ### Does this PR introduce _any_ user-facing change? Yes: adds another cross-reference on the main page. ### How was this patch tested? Visually verified that Markdown still rendered properly. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45534 from mattayes/patch-2. Authored-by: Matt Braymer-Hayes Signed-off-by: Dongjoon Hyun --- docs/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index 5f3858bec86b..12c53c40c8f7 100644 --- a/docs/index.md +++ b/docs/index.md @@ -138,6 +138,7 @@ options for deployment: * [Configuration](configuration.html): customize Spark via its configuration system * [Monitoring](monitoring.html): track the behavior of your applications +* [Web UI](web-ui.html): view useful information about your applications * [Tuning Guide](tuning.html): best practices to optimize performance and memory use * [Job Scheduling](job-scheduling.html): scheduling resources across and within Spark applications * [Security](security.html): Spark security support @@ -145,7 +146,7 @@ options for deployment: * Integration with other storage systems: * [Cloud Infrastructures](cloud-integration.html) * [OpenStack Swift](storage-openstack-swift.html) -* [Migration Guide](migration-guide.html): Migration guides for Spark components +* [Migration Guide](migration-guide.html): migration guides for Spark components * [Building Spark](building-spark.html): build Spark using the Maven system * [Contributing to Spark](https://spark.apache.org/contributing.html) * [Third Party Projects](https://spark.apache.org/third-party-projects.html): related third party Spark projects - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 7a899e219f5a [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` 7a899e219f5a is described below commit 7a899e219f5a17ab12aeb8d67738025b7e2b9d9c Author: Huw Campbell AuthorDate: Mon Mar 18 07:38:10 2024 -0700 [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` ### What changes were proposed in this pull request? Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when one is using proxy settings. Change the generated link to be consistent with other links and include a trailing slash. ### Why are the changes needed? When using a proxy, an invalid redirect is issued if the trailing slash is not included. ### Does this PR introduce _any_ user-facing change? Only that people will be able to use these links if they are using a proxy. ### How was this patch tested? With a proxy installed, I went to the location this link would generate and could reach the page, whereas the link as it exists today redirects. Edit: further tested by building a version of our application with this patch applied; the links work now. ### Was this patch authored or co-authored using generative AI tooling? No. Page with working link: https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3 Goes correctly to: https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5 Before, it would redirect and we'd get a 404: https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef Closes #45527 from HuwCampbell/patch-1. Authored-by: Huw Campbell Signed-off-by: Dongjoon Hyun (cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc) Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala index 7cd7db4088ac..ce3e7cde01b7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala @@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable( override def row(query: StructuredStreamingRow): Seq[Node] = { val streamingQuery = query.streamingUIData -val statisticsLink = "%s/%s/statistics?id=%s" +val statisticsLink = "%s/%s/statistics/?id=%s" .format(SparkUIUtils.prependBaseUri(request, parent.basePath), parent.prefix, streamingQuery.summary.runId) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new bb7a6138b827 [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` bb7a6138b827 is described below commit bb7a6138b827975fc827813ab42a2b9074bf8d5e Author: Huw Campbell AuthorDate: Mon Mar 18 07:38:10 2024 -0700 [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` ### What changes were proposed in this pull request? Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when one is using proxy settings. Change the generated link to be consistent with other links and include a trailing slash. ### Why are the changes needed? When using a proxy, an invalid redirect is issued if the trailing slash is not included. ### Does this PR introduce _any_ user-facing change? Only that people will be able to use these links if they are using a proxy. ### How was this patch tested? With a proxy installed, I went to the location this link would generate and could reach the page, whereas the link as it exists today redirects. Edit: further tested by building a version of our application with this patch applied; the links work now. ### Was this patch authored or co-authored using generative AI tooling? No. Page with working link: https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3 Goes correctly to: https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5 Before, it would redirect and we'd get a 404: https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef Closes #45527 from HuwCampbell/patch-1. Authored-by: Huw Campbell Signed-off-by: Dongjoon Hyun (cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc) Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala index 7cd7db4088ac..ce3e7cde01b7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala @@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable( override def row(query: StructuredStreamingRow): Seq[Node] = { val streamingQuery = query.streamingUIData -val statisticsLink = "%s/%s/statistics?id=%s" +val statisticsLink = "%s/%s/statistics/?id=%s" .format(SparkUIUtils.prependBaseUri(request, parent.basePath), parent.prefix, streamingQuery.summary.runId) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (d3f12df6e09e -> 9b466d329c3c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d3f12df6e09e [SPARK-47437][PYTHON][CONNECT] Correct the error class for `DataFrame.sort*` add 9b466d329c3c [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
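To make the one-character fix concrete, a runnable illustration of the two link formats with made-up path components; behind a reverse proxy, the form without the trailing slash typically triggers a 302 that the proxy may rewrite incorrectly.

```
object StatisticsLinkDemo {
  def main(args: Array[String]): Unit = {
    val (base, prefix, runId) = ("/proxy/app-42", "StreamingQuery", "run-1")
    val before = "%s/%s/statistics?id=%s".format(base, prefix, runId)
    val after = "%s/%s/statistics/?id=%s".format(base, prefix, runId)
    println(before) // /proxy/app-42/StreamingQuery/statistics?id=run-1 (may 302 behind a proxy)
    println(after)  // /proxy/app-42/StreamingQuery/statistics/?id=run-1 (served directly)
  }
}
```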
(spark) branch branch-3.4 updated (be0e44e59b3e -> b4e2c6750cb3)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from be0e44e59b3e [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI add b4e2c6750cb3 [SPARK-47433][PYTHON][DOCS][INFRA][3.4] Update PySpark package dependency with version ranges No new revisions were added by this update. Summary of changes: dev/requirements.txt | 2 +- python/docs/source/getting_started/install.rst | 16 2 files changed, 9 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound requirement, `<13.0.0`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new cc6912ec612c [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound requirement, `<13.0.0` cc6912ec612c is described below commit cc6912ec612c30e46e1595860a5519bb1caa221b Author: Dongjoon Hyun AuthorDate: Sun Mar 17 15:15:50 2024 -0700 [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound requirement, `<13.0.0` ### What changes were proposed in this pull request? This PR aims to add `pyarrow` upper bound requirement, `<13.0.0`, to Apache Spark 3.5.x. ### Why are the changes needed? PyArrow 13.0.0 has breaking changes mentioned by #42920 which is a part of Apache Spark 4.0.0. ### Does this PR introduce _any_ user-facing change? No, this only clarifies the upper bound. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45553 from dongjoon-hyun/SPARK-47432. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/requirements.txt | 2 +- python/docs/source/getting_started/install.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/requirements.txt b/dev/requirements.txt index 597417aba1f3..0749af75aa4b 100644 --- a/dev/requirements.txt +++ b/dev/requirements.txt @@ -3,7 +3,7 @@ py4j # PySpark dependencies (optional) numpy -pyarrow +pyarrow<13.0.0 pandas scipy plotly diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst index 6822285e9617..e97632a8b384 100644 --- a/python/docs/source/getting_started/install.rst +++ b/python/docs/source/getting_started/install.rst @@ -157,7 +157,7 @@ PackageSupported version Note == = == `py4j` >=0.10.9.7Required `pandas` >=1.0.5 Required for pandas API on Spark and Spark Connect; Optional for Spark SQL -`pyarrow` >=4.0.0 Required for pandas API on Spark and Spark Connect; Optional for Spark SQL +`pyarrow` >=4.0.0,<13.0.0 Required for pandas API on Spark and Spark Connect; Optional for Spark SQL `numpy`>=1.15Required for pandas API on Spark and MLLib DataFrame-based API; Optional for Spark SQL `grpcio` >=1.48,<1.57 Required for Spark Connect `grpcio-status`>=1.48,<1.57 Required for Spark Connect - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new be0e44e59b3e [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI be0e44e59b3e is described below commit be0e44e59b3e71cb11353e11f19146e0d1827432 Author: Ruifeng Zheng AuthorDate: Wed Sep 13 15:51:27 2023 +0800 [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI Pin `pyarrow==12.0.1` in CI to fix test failure, https://github.com/apache/spark/actions/runs/6167186123/job/16738683632 ``` == FAIL [0.095s]: test_from_to_pandas (pyspark.pandas.tests.data_type_ops.test_datetime_ops.DatetimeOpsTests) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 122, in _assert_pandas_equal assert_series_equal( File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 931, in assert_series_equal assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}") File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 415, in assert_attr_equal raise_assert_detail(obj, msg, left_attr, right_attr) File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 599, in raise_assert_detail raise AssertionError(msg) AssertionError: Attributes of Series are different Attribute "dtype" are different [left]: datetime64[ns] [right]: datetime64[us] ``` No CI and manually test No Closes #42897 from zhengruifeng/pin_pyarrow. Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng (cherry picked from commit e3d2dfa8b514f9358823c3cb1ad6523da8a6646b) Signed-off-by: Dongjoon Hyun (cherry picked from commit 8049a203b8c5f2f8045701916e66cfc786e16b57) Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 ++-- dev/infra/Dockerfile | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 33747fb5b61d..2184577d5c44 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -252,7 +252,7 @@ jobs: - name: Install Python packages (Python 3.8) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) run: | -python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5' +python3.8 -m pip install 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5' python3.8 -m pip list # Run the tests. - name: Run tests @@ -626,7 +626,7 @@ jobs: # See also https://issues.apache.org/jira/browse/SPARK-38279. 
python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 'alabaster==0.7.13' python3.9 -m pip install ipython_genutils # See SPARK-38517 -python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8' +python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas 'plotly>=4.8' python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421 apt-get update -y apt-get install -y ruby ruby-dev diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 2e78f4af2144..93d8793826ff 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib -RUN python3.9 -m pip install numpy pyarrow 'pandas<=1.5.3' scipy unittest-xml-reporting plotl
(spark) branch branch-3.5 updated: [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 8049a203b8c5 [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI 8049a203b8c5 is described below commit 8049a203b8c5f2f8045701916e66cfc786e16b57 Author: Ruifeng Zheng AuthorDate: Wed Sep 13 15:51:27 2023 +0800 [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI ### What changes were proposed in this pull request? Pin `pyarrow==12.0.1` in CI ### Why are the changes needed? to fix test failure, https://github.com/apache/spark/actions/runs/6167186123/job/16738683632 ``` == FAIL [0.095s]: test_from_to_pandas (pyspark.pandas.tests.data_type_ops.test_datetime_ops.DatetimeOpsTests) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 122, in _assert_pandas_equal assert_series_equal( File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 931, in assert_series_equal assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}") File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 415, in assert_attr_equal raise_assert_detail(obj, msg, left_attr, right_attr) File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 599, in raise_assert_detail raise AssertionError(msg) AssertionError: Attributes of Series are different Attribute "dtype" are different [left]: datetime64[ns] [right]: datetime64[us] ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI and manually test ### Was this patch authored or co-authored using generative AI tooling? No Closes #42897 from zhengruifeng/pin_pyarrow. Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng (cherry picked from commit e3d2dfa8b514f9358823c3cb1ad6523da8a6646b) Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 ++-- dev/infra/Dockerfile | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index b0760a955342..8488540b415d 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -258,7 +258,7 @@ jobs: - name: Install Python packages (Python 3.8) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) run: | -python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.20.3' +python3.8 -m pip install 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.20.3' python3.8 -m pip list # Run the tests. - name: Run tests @@ -684,7 +684,7 @@ jobs: # See also https://issues.apache.org/jira/browse/SPARK-38279. 
python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 'alabaster==0.7.13' python3.9 -m pip install ipython_genutils # See SPARK-38517 -python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8' +python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas 'plotly>=4.8' python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421 apt-get update -y apt-get install -y ruby ruby-dev diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index d3bae836cc63..d3fcd7ab3622 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" RUN pypy3 -m pip install numpy 'pandas
(spark) branch master updated: [SPARK-47426][BUILD] Upgrade Guava used by the connect module to `33.1.0-jre`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2dba72100e03 [SPARK-47426][BUILD] Upgrade Guava used by the connect module to `33.1.0-jre` 2dba72100e03 is described below commit 2dba72100e0326f1889ff0be2dc576b1e712ad15 Author: panbingkun AuthorDate: Sun Mar 17 13:52:14 2024 -0700 [SPARK-47426][BUILD] Upgrade Guava used by the connect module to `33.1.0-jre` ### What changes were proposed in this pull request? The pr aims to upgrade Guava used by the `connect` module to `33.1.0-jre`. ### Why are the changes needed? - The new version brings some bug fixes and optimizations, as follows: cache: fixed a bug (see https://github.com/google/guava/pull/6851#issuecomment-1931276822). hash: optimized Checksum-based hash functions for Java 9+. - The full release notes: https://github.com/google/guava/releases/tag/v33.1.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45540 from panbingkun/SPARK-47426. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index d67ab1c01273..757d911c1229 100644 --- a/pom.xml +++ b/pom.xml @@ -288,7 +288,7 @@ true -33.0.0-jre +33.1.0-jre 1.0.2 1.62.2 1.1.3 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-website) branch asf-site updated: Update the organization in committers.md (#509)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 3eae7010b9 Update the organization in committers.md (#509)
3eae7010b9 is described below

commit 3eae7010b9f3cc01ceabe5036c0bd8910ccb8c67
Author: Jerry Shao
AuthorDate: Sat Mar 16 20:53:28 2024 -0700

Update the organization in committers.md (#509)
---
 committers.md        | 2 +-
 site/committers.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/committers.md b/committers.md
index 58aedb94fd..17530a2411 100644
--- a/committers.md
+++ b/committers.md
@@ -73,7 +73,7 @@ navigation:
 |Josh Rosen|Stripe|
 |Sandy Ryza|Remix|
 |Kousuke Saruta|NTT Data|
-|Saisai Shao|Tencent|
+|Saisai Shao|Datastrato|
 |Prashant Sharma|IBM|
 |Gabor Somogyi|Apple|
 |Ram Sriharsha|Databricks|

diff --git a/site/committers.html b/site/committers.html
index 8a9839aa91..22e2f4c481 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -403,7 +403,7 @@
 Saisai Shao
-Tencent
+Datastrato
 Prashant Sharma
(spark) branch branch-3.4 updated: [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 3c41b1d97e1f [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208
3c41b1d97e1f is described below

commit 3c41b1d97e1f5ff9f74f9ea72f7ea92dcbca2122
Author: Dongjoon Hyun
AuthorDate: Fri Mar 15 22:42:17 2024 -0700

[SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208

### What changes were proposed in this pull request?
This PR aims to upgrade Jetty to 9.4.54.v20240208 for Apache Spark 3.4.3.

### Why are the changes needed?
To bring the latest bug fixes:
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.52.v20230823
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.51.v20230217

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45544 from dongjoon-hyun/SPARK-47428-3.4.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml                               | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 691c83632b38..a94fbcd0ca77 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -143,7 +143,7 @@ jersey-hk2/2.36//jersey-hk2-2.36.jar
 jersey-server/2.36//jersey-server-2.36.jar
 jetty-sslengine/6.1.26//jetty-sslengine-6.1.26.jar
 jetty-util/6.1.26//jetty-util-6.1.26.jar
-jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jetty/6.1.26//jetty-6.1.26.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.2//joda-time-2.12.2.jar

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 4d94cb5c699e..99665da7d16a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -128,8 +128,8 @@ jersey-container-servlet/2.36//jersey-container-servlet-2.36.jar
 jersey-hk2/2.36//jersey-hk2-2.36.jar
 jersey-server/2.36//jersey-server-2.36.jar
 jettison/1.1//jettison-1.1.jar
-jetty-util-ajax/9.4.50.v20221201//jetty-util-ajax-9.4.50.v20221201.jar
-jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar
+jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.2//joda-time-2.12.2.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar

diff --git a/pom.xml b/pom.xml
index 373d17b76c09..77218d162c41 100644
--- a/pom.xml
+++ b/pom.xml
@@ -143,7 +143,7 @@
 1.12.3
 1.8.6
 shaded-protobuf
-9.4.50.v20221201
+9.4.54.v20240208
 4.0.3
 0.10.0
 2.5.1
(spark) branch branch-3.4 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 210e80e8b7ba [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
210e80e8b7ba is described below

commit 210e80e8b7baa5fc1e6462615bc8134a4c90647c
Author: Dongjoon Hyun
AuthorDate: Tue Oct 17 23:38:56 2023 -0700

[SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

### What changes were proposed in this pull request?
This PR aims to skip the `Unidoc` and `MIMA` phases in the many general test pipelines. The `mima` test is moved to the `lint` job.

### Why are the changes needed?
With an independent documentation-generation and MIMA-checking GitHub Action job, we can skip those phases in the many jobs listed at https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually check the GitHub Action logs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43422 from dongjoon-hyun/SPARK-45587.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794)
Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 13527119e51a..33747fb5b61d 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -198,6 +198,8 @@ jobs:
       HIVE_PROFILE: ${{ matrix.hive }}
       GITHUB_PREV_SHA: ${{ github.event.before }}
       SPARK_LOCAL_IP: localhost
+      SKIP_UNIDOC: true
+      SKIP_MIMA: true
       SKIP_PACKAGING: true
     steps:
     - name: Checkout Spark repository
@@ -578,6 +580,8 @@ jobs:
       run: ./dev/check-license
     - name: Dependencies test
       run: ./dev/test-dependencies.sh
+    - name: MIMA test
+      run: ./dev/mima
     - name: Scala linter
       run: ./dev/lint-scala
     - name: Java linter
(spark) branch branch-3.5 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 8c6eeb8ab018 [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
8c6eeb8ab018 is described below

commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794
Author: Dongjoon Hyun
AuthorDate: Tue Oct 17 23:38:56 2023 -0700

[SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

### What changes were proposed in this pull request?
This PR aims to skip the `Unidoc` and `MIMA` phases in the many general test pipelines. The `mima` test is moved to the `lint` job.

### Why are the changes needed?
With an independent documentation-generation and MIMA-checking GitHub Action job, we can skip those phases in the many jobs listed at https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually check the GitHub Action logs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43422 from dongjoon-hyun/SPARK-45587.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index ad8685754b31..b0760a955342 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -204,6 +204,8 @@ jobs:
       HIVE_PROFILE: ${{ matrix.hive }}
       GITHUB_PREV_SHA: ${{ github.event.before }}
       SPARK_LOCAL_IP: localhost
+      SKIP_UNIDOC: true
+      SKIP_MIMA: true
       SKIP_PACKAGING: true
     steps:
     - name: Checkout Spark repository
@@ -627,6 +629,8 @@ jobs:
       run: ./dev/check-license
     - name: Dependencies test
       run: ./dev/test-dependencies.sh
+    - name: MIMA test
+      run: ./dev/mima
     - name: Scala linter
       run: ./dev/lint-scala
     - name: Java linter
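Mechanically, flags like `SKIP_UNIDOC`, `SKIP_MIMA`, and `SKIP_PACKAGING` are plain environment variables that the test scripts consult before launching a phase. A minimal sketch of that pattern, with a hypothetical `run_mima()` helper standing in for the real `./dev/mima` invocation (the actual logic in Spark's dev tooling differs):

```python
import os
import subprocess

def env_flag(name: str) -> bool:
    # CI sets these via the job's `env:` block, e.g. `SKIP_MIMA: true`.
    return os.environ.get(name, "false").lower() == "true"

def run_mima() -> None:
    # Hypothetical stand-in for invoking the real ./dev/mima script.
    subprocess.run(["./dev/mima"], check=True)

if not env_flag("SKIP_MIMA"):
    run_mima()
```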
(spark) branch branch-3.5 updated: [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new d59425275cdd [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208
d59425275cdd is described below

commit d59425275cdd0ff678a5bcccef4c7b74fe8170cb
Author: Dongjoon Hyun
AuthorDate: Fri Mar 15 22:28:45 2024 -0700

[SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208

### What changes were proposed in this pull request?
This PR aims to upgrade Jetty to 9.4.54.v20240208.

### Why are the changes needed?
To bring the latest bug fixes:
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45543 from dongjoon-hyun/SPARK-47428.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml                               | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index c76702cd0af0..8ecf931bf513 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -130,8 +130,8 @@ jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar
 jersey-hk2/2.40//jersey-hk2-2.40.jar
 jersey-server/2.40//jersey-server-2.40.jar
 jettison/1.1//jettison-1.1.jar
-jetty-util-ajax/9.4.52.v20230823//jetty-util-ajax-9.4.52.v20230823.jar
-jetty-util/9.4.52.v20230823//jetty-util-9.4.52.v20230823.jar
+jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.5//joda-time-2.12.5.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar

diff --git a/pom.xml b/pom.xml
index 5db3c78e00eb..fb6208777d3f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -143,7 +143,7 @@
 1.13.1
 1.9.2
 shaded-protobuf
-9.4.52.v20230823
+9.4.54.v20240208
 4.0.3
 0.10.0
(spark) branch master updated (4437e6e21237 -> 6bf031796c8c)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` to `common/utils`
 add 6bf031796c8c [SPARK-44740][CONNECT][TESTS][FOLLOWUP] Deduplicate `test_metadata`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_connect_session.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (b7aa9740249b -> 4437e6e21237)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from b7aa9740249b [SPARK-47407][SQL] Support java.sql.Types.NULL map to NullType
 add 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` to `common/utils`

No new revisions were added by this update.

Summary of changes:
 .../utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {core => common/utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties (100%)
(spark) branch master updated: [SPARK-47234][BUILD] Upgrade Scala to 2.13.13
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 56cfc89e8f15 [SPARK-47234][BUILD] Upgrade Scala to 2.13.13
56cfc89e8f15 is described below

commit 56cfc89e8f1599fe859db1bd6628a9b07d53bed4
Author: panbingkun
AuthorDate: Thu Mar 14 22:40:54 2024 -0700

[SPARK-47234][BUILD] Upgrade Scala to 2.13.13

### What changes were proposed in this pull request?
This PR upgrades Scala from `2.13.12` to `2.13.13`.

### Why are the changes needed?
- The new version brings some bug fixes:
  https://github.com/scala/scala/pull/10525
  https://github.com/scala/scala/pull/10528
- Release notes: https://github.com/scala/scala/releases/tag/v2.13.13

### Does this PR introduce _any_ user-facing change?
Yes, the Scala version changes from `2.13.12` to `2.13.13`.

### How was this patch tested?
- Pass GA.
- After master is upgraded to `2.13.13`, we will continue to monitor.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45342 from panbingkun/SPARK-47234.

Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 ++++----
 docs/_config.yml                      | 2 +-
 pom.xml                               | 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 2e091cb3638e..d4b7d38aea22 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -139,7 +139,7 @@ jettison/1.5.4//jettison-1.5.4.jar
 jetty-util-ajax/11.0.20//jetty-util-ajax-11.0.20.jar
 jetty-util/11.0.20//jetty-util-11.0.20.jar
 jline/2.14.6//jline-2.14.6.jar
-jline/3.22.0//jline-3.22.0.jar
+jline/3.24.1//jline-3.24.1.jar
 jna/5.13.0//jna-5.13.0.jar
 joda-time/2.12.7//joda-time-2.12.7.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
@@ -245,11 +245,11 @@ py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
 rocksdbjni/8.11.3//rocksdbjni-8.11.3.jar
 scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar
-scala-compiler/2.13.12//scala-compiler-2.13.12.jar
-scala-library/2.13.12//scala-library-2.13.12.jar
+scala-compiler/2.13.13//scala-compiler-2.13.13.jar
+scala-library/2.13.13//scala-library-2.13.13.jar
 scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar
 scala-parser-combinators_2.13/2.3.0//scala-parser-combinators_2.13-2.3.0.jar
-scala-reflect/2.13.12//scala-reflect-2.13.12.jar
+scala-reflect/2.13.13//scala-reflect-2.13.13.jar
 scala-xml_2.13/2.2.0//scala-xml_2.13-2.2.0.jar
 slf4j-api/2.0.12//slf4j-api-2.0.12.jar
 snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar

diff --git a/docs/_config.yml b/docs/_config.yml
index 7a305ceea67b..19183f85df23 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -22,7 +22,7 @@ include:
 SPARK_VERSION: 4.0.0-SNAPSHOT
 SPARK_VERSION_SHORT: 4.0.0
 SCALA_BINARY_VERSION: "2.13"
-SCALA_VERSION: "2.13.12"
+SCALA_VERSION: "2.13.13"
 SPARK_ISSUE_TRACKER_URL: https://issues.apache.org/jira/browse/SPARK
 SPARK_GITHUB_URL: https://github.com/apache/spark
 # Before a new release, we should:

diff --git a/pom.xml b/pom.xml
index 6a811e74e7f8..d67ab1c01273 100644
--- a/pom.xml
+++ b/pom.xml
@@ -172,7 +172,7 @@
 3.2.2
 4.4
-2.13.12
+2.13.13
 2.13
 2.2.0
@@ -226,7 +226,7 @@
       ./python/pyspark/sql/pandas/utils.py, and ./python/setup.py too.
     -->
 15.0.0
-2.5.11
+3.0.0-M1
 org.fusesource.leveldbjni
(spark) branch master updated (213399b61de5 -> fe0aa1edff04)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from 213399b61de5 [SPARK-47396][SQL] Add a general mapping for TIME WITHOUT TIME ZONE to TimestampNTZType
 add fe0aa1edff04 [SPARK-47402][BUILD] Upgrade `ZooKeeper` to 3.9.2

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml                               | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
(spark) branch master updated (7b4ab4fa452d -> 213399b61de5)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from 7b4ab4fa452d [SPARK-47387][SQL] Remove some unused error classes
 add 213399b61de5 [SPARK-47396][SQL] Add a general mapping for TIME WITHOUT TIME ZONE to TimestampNTZType

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala     |  1 +
 .../src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 10 ++
 2 files changed, 11 insertions(+)
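At the user level, the new mapping is visible through the JDBC reader's inferred schema. A minimal usage sketch, assuming a reachable JDBC source and a hypothetical `schedules` table whose driver reports a column as `java.sql.Types.TIME` (all connection details are invented):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical source: any JDBC driver that reports a column as
# java.sql.Types.TIME (i.e. TIME WITHOUT TIME ZONE) exercises the mapping.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/testdb")
      .option("dbtable", "schedules")
      .load())

# If the general mapping applies, the TIME column should surface as
# timestamp_ntz in the printed schema.
df.printSchema()
```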
(spark) branch master updated: [SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in PostgresDialect
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d41d5ecda8c1 [SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in PostgresDialect
d41d5ecda8c1 is described below

commit d41d5ecda8c11d7e8f6a1fafa1d2be97c0f49f04
Author: Kent Yao
AuthorDate: Thu Mar 14 10:30:48 2024 -0700

[SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in PostgresDialect

### What changes were proposed in this pull request?
This PR fixes a bug introduced in SPARK-47390: the `timetz` mapping must be separated from the `Types.TIMESTAMP` case-match branch, because `timetz` columns arrive as `Types.TIME`, not `Types.TIMESTAMP`.

### Why are the changes needed?
Bugfix.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested locally together with #45519.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45522 from yaooqinn/SPARK-47390-F.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
index 7d8ed70b2bd1..9b286620a140 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
@@ -58,10 +58,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
       // See SPARK-34333 and https://github.com/pgjdbc/pgjdbc/issues/100
       Some(StringType)
     case Types.TIMESTAMP
-        if "timestamptz".equalsIgnoreCase(typeName) || "timetz".equalsIgnoreCase(typeName) =>
+        if "timestamptz".equalsIgnoreCase(typeName) =>
       // timestamptz represents timestamp with time zone, currently it maps to Types.TIMESTAMP.
       // We need to change to Types.TIMESTAMP_WITH_TIMEZONE if the upstream changes.
       Some(TimestampType)
+    case Types.TIME if "timetz".equalsIgnoreCase(typeName) =>
+      // timetz represents time with time zone, currently it maps to Types.TIME.
+      // We need to change to Types.TIME_WITH_TIMEZONE if the upstream changes.
+      Some(TimestampType)
     case Types.OTHER => Some(StringType)
     case _ if "text".equalsIgnoreCase(typeName) => Some(StringType) // sqlType is Types.VARCHAR
     case Types.ARRAY =>
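The effect of the fix is easiest to see from the reader side. A minimal sketch, assuming a local PostgreSQL instance and a hypothetical `events` table with a `TIME WITH TIME ZONE` column (connection details invented):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# PostgreSQL reports a timetz column as java.sql.Types.TIME with type name
# "timetz"; the corrected dialect branch maps it to Spark's TimestampType.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/testdb")
      .option("dbtable", "events")
      .option("user", "postgres")
      .load())

df.printSchema()  # the timetz column should print as "timestamp"
```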
(spark) branch master updated (481597cd2d79 -> b98accd9d931)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from 481597cd2d79 [SPARK-47400][BUILD] Upgrade `gcs-connector` to 2.2.20
 add b98accd9d931 [SPARK-47401][K8S][DOCS] Update `YuniKorn` docs with v1.5

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)