(spark) branch master updated: [SPARK-47611][SQL] Cleanup dead code in MySQLDialect.getCatalystType
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b540cc538614 [SPARK-47611][SQL] Cleanup dead code in MySQLDialect.getCatalystType

b540cc538614 is described below

commit b540cc538614c9808dc5e83a339ff52917fa0f37
Author: Kent Yao
AuthorDate: Wed Mar 27 01:45:22 2024 -0700

[SPARK-47611][SQL] Cleanup dead code in MySQLDialect.getCatalystType

### What changes were proposed in this pull request?

This PR removes an unnecessary case-match branch for Types.BIT in MySQLDialect.getCatalystType. The branch is a special case for the MariaDB Connector/J and can be handled by the defaults, since Types.BIT with size > 1 has already been matched and handled before it. Additionally, we add some new tests for this corner case and other MySQL/MariaDB quirks.

### Why are the changes needed?

Code refactoring and test improvement.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45734 from yaooqinn/SPARK-47611.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala   | 32 --
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala |  2 --
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala    |  2 --
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 705957631601..10049169caa1 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -64,10 +64,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     conn.prepareStatement("CREATE TABLE unsigned_numbers (" +
       "tiny TINYINT UNSIGNED, small SMALLINT UNSIGNED, med MEDIUMINT UNSIGNED," +
       "nor INT UNSIGNED, big BIGINT UNSIGNED, deci DECIMAL(40,20) UNSIGNED," +
-      "dbl DOUBLE UNSIGNED)").executeUpdate()
+      "dbl DOUBLE UNSIGNED, tiny1u TINYINT(1) UNSIGNED)").executeUpdate()

     conn.prepareStatement("INSERT INTO unsigned_numbers VALUES (255, 65535, 16777215, 4294967295," +
-      "9223372036854775808, 123456789012345.123456789012345, 1.0002)").executeUpdate()
+      "9223372036854775808, 123456789012345.123456789012345, 1.0002, 0)")
+      .executeUpdate()

     conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " +
       "yr YEAR)").executeUpdate()
@@ -150,6 +151,13 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(rows.get(4).isInstanceOf[BigDecimal])
     assert(rows.get(5).isInstanceOf[BigDecimal])
     assert(rows.get(6).isInstanceOf[Double])
+    // Unlike MySQL, MariaDB seems not to distinguish signed and unsigned tinyint(1).
+    val isMaria = jdbcUrl.indexOf("disableMariaDbDriver") == -1
+    if (isMaria) {
+      assert(rows.get(7).isInstanceOf[Boolean])
+    } else {
+      assert(rows.get(7).isInstanceOf[Short])
+    }
     assert(rows.getShort(0) === 255)
     assert(rows.getInt(1) === 65535)
     assert(rows.getInt(2) === 16777215)
@@ -157,6 +165,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(rows.getAs[BigDecimal](4).equals(new BigDecimal("9223372036854775808")))
     assert(rows.getAs[BigDecimal](5).equals(new BigDecimal("123456789012345.1234567890123450")))
     assert(rows.getDouble(6) === 1.0002)
+    if (isMaria) {
+      assert(rows.getBoolean(7) === false)
+    } else {
+      assert(rows.getShort(7) === 0)
+    }
   }

   test("Date types") {
@@ -260,6 +273,21 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
   test("SPARK-47478: all boolean synonyms read-write roundtrip") {
     val df = sqlContext.read.jdbc(jdbcUrl, "bools", new Properties)
     checkAnswer(df, Row(true, true, true))
+
+    val properties0 = new Properties()
+    properties0.setProperty("transformedBitIsBoolean", "false")
+    properties0.setProperty("tinyInt1isBit", "true")
+
+    checkAnswer(spark.read.jdbc(jdbcUrl, "bools", properties0), Row(true, true, true))
+
+    val properties1 = new Properties()
+    properties1.setProperty("transformedBitIsBoolean", "true")
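The properties exercised in the test above (`tinyInt1isBit`, `transformedBitIsBoolean`) are real MySQL Connector/J connection flags that decide whether TINYINT(1)/BIT(1) columns surface as booleans. A minimal read sketch, assuming a spark-shell session and placeholder connection details:

```scala
import java.util.Properties

// Hypothetical URL and table, for illustration only.
val url = "jdbc:mysql://localhost:3306/mysql?user=root&password=rootpass"

val props = new Properties()
props.setProperty("tinyInt1isBit", "true")            // report TINYINT(1) as BIT
props.setProperty("transformedBitIsBoolean", "true")  // report transformed BIT(1) as BOOLEAN

// With both flags set, a TINYINT(1) column should surface as BooleanType in Spark;
// the exact behavior depends on the Connector/J version.
spark.read.jdbc(url, "bools", props).printSchema()
```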
(spark) branch master updated (f9eb3f3c13bf -> a600c0ea3159)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from f9eb3f3c13bf [SPARK-46575][SQL][FOLLOWUP] Add back `HiveThriftServer2.startWithContext(SQLContext)` method for compatibility
  add a600c0ea3159 [SPARK-47491][CORE] Add `slf4j-api` jar to the class path first before the others of `jars` directory

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/spark/launcher/AbstractCommandBuilder.java | 6 ++
 .../test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala  | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)
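The launcher change itself is in Java (`AbstractCommandBuilder`); the idea is simply to order the classpath so the `slf4j-api` jar precedes everything else in the `jars` directory. A rough sketch of that ordering under assumed naming conventions (not the actual launcher code):

```scala
import java.io.File

// Partition the jars directory so any slf4j-api jar is prepended, ensuring the
// SLF4J facade classes are resolved from it before other jars that bundle them.
def orderedClassPath(jarsDir: File): Seq[File] = {
  val jars = Option(jarsDir.listFiles()).getOrElse(Array.empty[File]).toSeq
    .filter(_.getName.endsWith(".jar"))
  val (slf4jApi, others) = jars.partition(_.getName.startsWith("slf4j-api"))
  slf4jApi ++ others
}
```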
(spark) branch master updated (fd4b8e89f3a0 -> 7d87a94dd77f)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from fd4b8e89f3a0 [SPARK-47555][SQL] Show a warning message about SQLException if `JDBCTableCatalog.loadTable` fails
  add 7d87a94dd77f [MINOR][CORE] When failed to canceling the job group, add a warning log

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala | 3 +++
 1 file changed, 3 insertions(+)
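The `DAGScheduler` change only adds a warning when a cancellation request matches no active jobs. A condensed sketch of that behavior with illustrative names (not the actual scheduler internals):

```scala
// Warn instead of returning silently when the group has no active jobs.
def cancelJobGroup(groupId: String, activeGroups: Set[String], logWarning: String => Unit): Unit = {
  if (!activeGroups.contains(groupId)) {
    logWarning(s"Failed to cancel job group $groupId. Cannot find active jobs for it.")
  }
  // ... otherwise cancel the jobs that belong to the group ...
}
```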
(spark) branch master updated (e00eace41a63 -> fd4b8e89f3a0)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from e00eace41a63 [SPARK-47561][SQL] Fix analyzer rule order issues about Alias
  add fd4b8e89f3a0 [SPARK-47555][SQL] Show a warning message about SQLException if `JDBCTableCatalog.loadTable` fails

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
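A hedged sketch of the pattern the title describes, with an invented helper name (the real change lives inside `JDBCTableCatalog.loadTable`): log the underlying `SQLException` as a warning so the root cause stays visible even if a generic "table not found" error surfaces later.

```scala
import java.sql.SQLException

def loadTableLoudly[T](ident: String, logWarning: (String, Throwable) => Unit)(load: => T): T =
  try load catch {
    case e: SQLException =>
      // Surface the driver-level failure before rethrowing.
      logWarning(s"Failed to load table: $ident", e)
      throw e
  }
```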
(spark) branch master updated: [SPARK-47561][SQL] Fix analyzer rule order issues about Alias
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e00eace41a63 [SPARK-47561][SQL] Fix analyzer rule order issues about Alias

e00eace41a63 is described below

commit e00eace41a63996deb213b6e1816257ebca281e5
Author: Wenchen Fan
AuthorDate: Tue Mar 26 07:45:54 2024 -0700

[SPARK-47561][SQL] Fix analyzer rule order issues about Alias

### What changes were proposed in this pull request?

We found two analyzer rule execution order issues in our internal workloads:
- `CreateStruct.apply` creates `NamePlaceholder` for an unresolved `NamedExpression`. However, with a certain rule execution order, the `NamedExpression` may be removed (e.g., removing an unnecessary `Alias`) before `NamePlaceholder` is resolved, and then `NamePlaceholder` can't be resolved anymore.
- UNPIVOT uses `UnresolvedAlias` to wrap `UnresolvedAttribute`. There is a conflict about how to determine the final alias name. If `ResolveAliases` runs first, `UnresolvedAlias` is removed and the alias eventually becomes `b` for the nested column `a.b`. If `ResolveReferences` runs first, `a.b` is resolved first and `UnresolvedAlias` then determines the alias as `a.b`, not `b`.

This PR fixes the two issues:
- `CreateStruct.apply` should determine the field name immediately if the input is an `Alias`.
- The parser rule for UNPIVOT should follow how SELECT is parsed and return `UnresolvedAttribute` directly, without the `UnresolvedAlias` wrapper.

It's a bit risky to fix the order issue between `ResolveAliases` and `ResolveReferences`, as it can change the final query schema; we will save it for later.

### Why are the changes needed?

Fix unstable analyzer behavior under different rule execution orders.

### Does this PR introduce _any_ user-facing change?

Yes, some previously failing queries can run now. The issue for UNPIVOT only affects the error message.

### How was this patch tested?

Verified by our internal workloads. The repro query is quite complicated (it has to trigger a specific rule execution order), so we won't add tests for it. The fix is quite obvious.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45718 from cloud-fan/rule.

Authored-by: Wenchen Fan
Signed-off-by: Dongjoon Hyun
---
 .../catalyst/expressions/complexTypeCreator.scala |  1 +
 .../spark/sql/catalyst/parser/AstBuilder.scala    |  2 +-
 .../sql/catalyst/parser/UnpivotParserSuite.scala  | 39 ++
 3 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
index 332a49f78ab9..993684f2c1ed 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@@ -374,6 +374,7 @@ object CreateStruct {
       // alias name inside CreateNamedStruct.
       case (u: UnresolvedAttribute, _) => Seq(Literal(u.nameParts.last), u)
       case (u @ UnresolvedExtractValue(_, e: Literal), _) if e.dataType == StringType => Seq(e, u)
+      case (a: Alias, _) => Seq(Literal(a.name), a)
       case (e: NamedExpression, _) if e.resolved => Seq(Literal(e.name), e)
       case (e: NamedExpression, _) => Seq(NamePlaceholder, e)
       case (g @ GetStructField(_, _, Some(name)), _) => Seq(Literal(name), g)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 131eaa3d..170dcc37f0a5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -1346,7 +1346,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    * Create an Unpivot column.
    */
   override def visitUnpivotColumn(ctx: UnpivotColumnContext): NamedExpression = withOrigin(ctx) {
-    UnresolvedAlias(UnresolvedAttribute(visitMultipartIdentifier(ctx.multipartIdentifier)))
+    UnresolvedAttribute(visitMultipartIdentifier(ctx.multipartIdentifier))
   }

   /**
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala
index c680e08c1c83..3012ef6f1544 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala
+++ b/sql/catalys
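A condensed sketch of the naming rule the first fix establishes (simplified from `CreateStruct.apply`; the helper name is ours and the imports assume a spark-catalyst dependency):

```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, Expression, Literal, NamePlaceholder, NamedExpression}

// Pick the struct field name eagerly when it is already known; only fall back to
// NamePlaceholder for expressions the analyzer still has to resolve.
def fieldName(e: Expression): Expression = e match {
  case a: Alias => Literal(a.name)                          // the fix: name resolved eagerly
  case n: NamedExpression if n.resolved => Literal(n.name)
  case _ => NamePlaceholder                                 // filled in by a later rule
}
```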
(spark) branch master updated: [SPARK-47544][PYTHON] SparkSession builder method is incompatible with visual studio code intellisense
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8e20f8e3b440 [SPARK-47544][PYTHON] SparkSession builder method is incompatible with visual studio code intellisense

8e20f8e3b440 is described below

commit 8e20f8e3b4404b6d72ec47c546c94a040467c774
Author: Niranjan Jayakar
AuthorDate: Tue Mar 26 07:43:10 2024 -0700

[SPARK-47544][PYTHON] SparkSession builder method is incompatible with visual studio code intellisense

### What changes were proposed in this pull request?

VS Code's IntelliSense is unable to detect the methods and properties of `SparkSession.builder`. A video is worth a thousand words: [video](https://github.com/apache/spark/assets/16217941/e611e7e7-8760-4d9f-aa6c-9d4bd519d516).

Adjust the implementation for better compatibility with the IDE.

### Why are the changes needed?

Compatibility with IDE tooling.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Built the wheel file locally and tested on a local IDE. See [video](https://github.com/apache/spark/assets/16217941/429b06dd-44a7-4d13-a551-c2b72c326c1e). Confirmed the same works for PyCharm. Further confirmed that the Pydocs for these methods are unaffected.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45700 from nija-at/vscode-intellisense.

Authored-by: Niranjan Jayakar
Signed-off-by: Dongjoon Hyun
---
 python/pyspark/sql/connect/session.py |  7 +++
 python/pyspark/sql/session.py         | 29 ++---
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py
index f339fada0d11..2c08349a3300 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -236,10 +236,9 @@ class SparkSession:
     _client: SparkConnectClient

-    @classproperty
-    def builder(cls) -> Builder:
-        return cls.Builder()
-
+    # SPARK-47544: Explicitly declaring this as an identifier instead of a method.
+    # If changing, make sure this bug is not reintroduced.
+    builder: Builder = classproperty(lambda cls: cls.Builder())  # type: ignore
     builder.__doc__ = PySparkSession.builder.__doc__

     def __init__(self, connection: Union[str, DefaultChannelBuilder], userId: Optional[str] = None):
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 6c80b7f42da4..4a8a653fd466 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -499,12 +499,18 @@ class SparkSession(SparkConversionMixin):
                 os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
                 opts["spark.remote"] = url
-                return RemoteSparkSession.builder.config(map=opts).getOrCreate()
+                return cast(
+                    SparkSession,
+                    RemoteSparkSession.builder.config(map=opts).getOrCreate(),
+                )
             elif "SPARK_LOCAL_REMOTE" in os.environ:
                 url = "sc://localhost"
                 os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
                 opts["spark.remote"] = url
-                return RemoteSparkSession.builder.config(map=opts).getOrCreate()
+                return cast(
+                    SparkSession,
+                    RemoteSparkSession.builder.config(map=opts).getOrCreate(),
+                )
             else:
                 raise PySparkRuntimeError(
                     error_class="SESSION_ALREADY_EXIST",
@@ -560,14 +566,14 @@ class SparkSession(SparkConversionMixin):
                 # used in conjunction with Spark Connect mode.
                 os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
                 opts["spark.remote"] = url
-                return RemoteSparkSession.builder.config(map=opts).create()
+                return cast(SparkSession, RemoteSparkSession.builder.config(map=opts).create())
             else:
                 raise PySparkRuntimeError(
                     error_class="ONLY_SUPPORTED_WITH_SPARK_CONNECT",
                     message_parameters={"feature": "SparkSession.builder.create"},
                 )

-    # TODO(SPARK-38912): Replace @classproperty with @classmethod + @property once support for
+    # TOD
(spark) branch master updated: [SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 89104b93d324 [SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types

89104b93d324 is described below

commit 89104b93d324129ebe4dec3c666fe5e36a7586ad
Author: Kent Yao
AuthorDate: Tue Mar 26 07:37:39 2024 -0700

[SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types

### What changes were proposed in this pull request?

This PR adds tests for the MySQL ENUM/SET types.

In the MySQL/MariaDB Connector/J, the JDBC ResultSetMetaData API maps ENUM/SET types to `typeId:java.sql.Types.CHAR, typeName:'CHAR'`, which makes it impossible to distinguish them from a normal `CHAR(n)` type. When working with ENUM/SET, it's possible to encounter char padding issues. However, this can be resolved by setting the LEGACY_CHAR_VARCHAR_AS_STRING parameter to true.

### Why are the changes needed?

API auditing for the MySQL JDBC data source.

### Does this PR introduce _any_ user-facing change?

No, test only.

### How was this patch tested?

Added tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45713 from yaooqinn/SPARK-47557.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 09eb99c25227..705957631601 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -26,6 +26,7 @@ import scala.util.Using

 import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.tags.DockerTest

 /**
@@ -84,6 +85,10 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
       "f4 FLOAT UNSIGNED, f5 FLOAT(10) UNSIGNED, f6 FLOAT(53) UNSIGNED)").executeUpdate()
     conn.prepareStatement("INSERT INTO floats VALUES (1.23, 4.56, 7.89, 1.23, 4.56, 7.89)")
       .executeUpdate()
+
+    conn.prepareStatement("CREATE TABLE collections (" +
+      "a SET('cap', 'hat', 'helmet'), b ENUM('S', 'M', 'L', 'XL'))").executeUpdate()
+    conn.prepareStatement("INSERT INTO collections VALUES ('cap,hat', 'M')").executeUpdate()
   }

   def testConnection(): Unit = {
@@ -275,6 +280,16 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     val df = spark.read.jdbc(jdbcUrl, "floats", new Properties)
     checkAnswer(df, Row(1.23f, 4.56f, 7.89d, 1.23d, 4.56d, 7.89d))
   }
+
+  test("SPARK-47557: MySQL ENUM/SET types contains only java.sq.Types.CHAR information") {
+    val df = spark.read.jdbc(jdbcUrl, "collections", new Properties)
+    checkAnswer(df, Row("cap,hat ", "M "))
+    df.write.mode("append").jdbc(jdbcUrl, "collections", new Properties)
+    withSQLConf(SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key -> "true") {
+      checkAnswer(spark.read.jdbc(jdbcUrl, "collections", new Properties),
+        Row("cap,hat", "M") :: Row("cap,hat", "M") :: Nil)
+    }
+  }
 }
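Because the driver reports ENUM/SET as CHAR, values come back blank-padded to the column length by default. Two hypothetical user-side workarounds in a spark-shell session (URL and table are placeholders):

```scala
import java.util.Properties
import org.apache.spark.sql.functions.rtrim

val url = "jdbc:mysql://localhost:3306/mysql?user=root&password=rootpass"

// Option 1: strip the CHAR padding after reading.
val df = spark.read.jdbc(url, "collections", new Properties)
df.select(rtrim(df("a")), rtrim(df("b"))).show()

// Option 2: treat CHAR/VARCHAR as plain STRING for the whole session
// (the key behind SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING).
spark.conf.set("spark.sql.legacy.charVarcharAsString", "true")
```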
(spark) branch master updated: [SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ded8cdf8d945 [SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions

ded8cdf8d945 is described below

commit ded8cdf8d9459e0e5b73c01c8ee41ae54ccd7ac5
Author: Hyukjin Kwon
AuthorDate: Tue Mar 26 07:35:49 2024 -0700

[SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/45486 that addresses the https://github.com/apache/spark/pull/45486#discussion_r1538753052 review comment to recover the test coverage related to the number of partitions in Python Data Source.

### Why are the changes needed?

To restore the test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Unit test fixed; CI in this PR should verify it.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45720 from HyukjinKwon/SPARK-47367-folliwup.

Authored-by: Hyukjin Kwon
Signed-off-by: Dongjoon Hyun
---
 python/pyspark/sql/tests/test_python_datasource.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/python/pyspark/sql/tests/test_python_datasource.py b/python/pyspark/sql/tests/test_python_datasource.py
index f69e1dee1285..d028a210b007 100644
--- a/python/pyspark/sql/tests/test_python_datasource.py
+++ b/python/pyspark/sql/tests/test_python_datasource.py
@@ -28,6 +28,7 @@ from pyspark.sql.datasource import (
     WriterCommitMessage,
     CaseInsensitiveDict,
 )
+from pyspark.sql.functions import spark_partition_id
 from pyspark.sql.types import Row, StructType
 from pyspark.testing.sqlutils import (
     have_pyarrow,
@@ -236,10 +237,12 @@ class BasePythonDataSourceTestsMixin:
         self.spark.dataSource.register(InMemoryDataSource)
         df = self.spark.read.format("memory").load()
+        self.assertEqual(df.select(spark_partition_id()).distinct().count(), 3)
         assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1"), Row(x=2, y="2")])

         df = self.spark.read.format("memory").option("num_partitions", 2).load()
         assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1")])
+        self.assertEqual(df.select(spark_partition_id()).distinct().count(), 2)

     def _get_test_json_data_source(self):
         import json
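`spark_partition_id()` is what makes the partition count observable in the recovered test. The same check is available from Scala; a self-contained sketch:

```scala
import org.apache.spark.sql.functions.spark_partition_id

// 100 rows spread over 3 partitions: each partition is non-empty, so exactly
// 3 distinct partition ids show up in the result.
val df = spark.range(0, 100, 1, numPartitions = 3)
assert(df.select(spark_partition_id()).distinct().count() == 3)
```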
(spark) branch master updated: [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 87cae7bc7870 [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing

87cae7bc7870 is described below

commit 87cae7bc7870bacafc6afad99ba86a6efca2a464
Author: Dongjoon Hyun
AuthorDate: Mon Mar 25 16:06:03 2024 -0700

[SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing

### What changes were proposed in this pull request?

This PR aims to handle HADOOP-19097 from the Apache Spark side. We can remove this when Apache Hadoop `3.4.1` is released.
- https://github.com/apache/hadoop/pull/6601

### Why are the changes needed?

Apache Hadoop shows a warning about its default configuration. The default-value issue is fixed in Apache Hadoop 3.4.1.

```
24/03/25 14:46:21 WARN ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead
```

This change suppresses the Apache Hadoop default warning in a way consistent with future Hadoop releases.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs. Manually:

**BUILD**
```
$ dev/make-distribution.sh -Phadoop-cloud
```

**BEFORE**
```
scala> spark.range(10).write.mode("overwrite").orc("s3a://express-1-zone--***--x-s3/orc/")
...
24/03/25 15:50:46 WARN ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead
```

**AFTER**
```
scala> spark.range(10).write.mode("overwrite").orc("s3a://express-1-zone--***--x-s3/orc/")
...(ConfigurationHelper warning is gone)...
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45710 from dongjoon-hyun/SPARK-47552.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/SparkContext.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index d519617c4095..f8f0107ed139 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -417,6 +417,9 @@ class SparkContext(config: SparkConf) extends Logging {
     if (!_conf.contains("spark.app.name")) {
       throw new SparkException("An application name must be set in your configuration")
     }
+    // HADOOP-19097 Set fs.s3a.connection.establish.timeout to 30s
+    // We can remove this after Apache Hadoop 3.4.1 releases
+    conf.setIfMissing("spark.hadoop.fs.s3a.connection.establish.timeout", "30s")
     // This should be set as early as possible.
     SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)
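`setIfMissing` only supplies a default, so an explicit user setting still wins. A minimal standalone illustration with `SparkConf`:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf(loadDefaults = false)
  .set("spark.hadoop.fs.s3a.connection.establish.timeout", "60s") // user-provided

// The 30s value is applied only when the key is absent, so 60s survives.
conf.setIfMissing("spark.hadoop.fs.s3a.connection.establish.timeout", "30s")
assert(conf.get("spark.hadoop.fs.s3a.connection.establish.timeout") == "60s")
```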
(spark) branch master updated: [SPARK-47550][K8S][BUILD] Update `kubernetes-client` to 6.11.0
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7b9b3cb9d82c [SPARK-47550][K8S][BUILD] Update `kubernetes-client` to 6.11.0

7b9b3cb9d82c is described below

commit 7b9b3cb9d82cdf017d6cd57e0dee4239deb1727d
Author: Bjørn Jørgensen
AuthorDate: Mon Mar 25 13:40:15 2024 -0700

[SPARK-47550][K8S][BUILD] Update `kubernetes-client` to 6.11.0

### What changes were proposed in this pull request?

Update `kubernetes-client` from 6.10.0 to 6.11.0.

### Why are the changes needed?

[Release notes for 6.11.0](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.11.0)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45707 from bjornjorgensen/kub-client6.11.0.

Authored-by: Bjørn Jørgensen
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +--
 pom.xml                               |  2 +-
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 2ffef88dbe7e..0d3e24161fe6 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -155,31 +155,31 @@ jsr305/3.0.0//jsr305-3.0.0.jar
 jta/1.1//jta-1.1.jar
 jul-to-slf4j/2.0.12//jul-to-slf4j-2.0.12.jar
 kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
-kubernetes-client-api/6.10.0//kubernetes-client-api-6.10.0.jar
-kubernetes-client/6.10.0//kubernetes-client-6.10.0.jar
-kubernetes-httpclient-okhttp/6.10.0//kubernetes-httpclient-okhttp-6.10.0.jar
-kubernetes-model-admissionregistration/6.10.0//kubernetes-model-admissionregistration-6.10.0.jar
-kubernetes-model-apiextensions/6.10.0//kubernetes-model-apiextensions-6.10.0.jar
-kubernetes-model-apps/6.10.0//kubernetes-model-apps-6.10.0.jar
-kubernetes-model-autoscaling/6.10.0//kubernetes-model-autoscaling-6.10.0.jar
-kubernetes-model-batch/6.10.0//kubernetes-model-batch-6.10.0.jar
-kubernetes-model-certificates/6.10.0//kubernetes-model-certificates-6.10.0.jar
-kubernetes-model-common/6.10.0//kubernetes-model-common-6.10.0.jar
-kubernetes-model-coordination/6.10.0//kubernetes-model-coordination-6.10.0.jar
-kubernetes-model-core/6.10.0//kubernetes-model-core-6.10.0.jar
-kubernetes-model-discovery/6.10.0//kubernetes-model-discovery-6.10.0.jar
-kubernetes-model-events/6.10.0//kubernetes-model-events-6.10.0.jar
-kubernetes-model-extensions/6.10.0//kubernetes-model-extensions-6.10.0.jar
-kubernetes-model-flowcontrol/6.10.0//kubernetes-model-flowcontrol-6.10.0.jar
-kubernetes-model-gatewayapi/6.10.0//kubernetes-model-gatewayapi-6.10.0.jar
-kubernetes-model-metrics/6.10.0//kubernetes-model-metrics-6.10.0.jar
-kubernetes-model-networking/6.10.0//kubernetes-model-networking-6.10.0.jar
-kubernetes-model-node/6.10.0//kubernetes-model-node-6.10.0.jar
-kubernetes-model-policy/6.10.0//kubernetes-model-policy-6.10.0.jar
-kubernetes-model-rbac/6.10.0//kubernetes-model-rbac-6.10.0.jar
-kubernetes-model-resource/6.10.0//kubernetes-model-resource-6.10.0.jar
-kubernetes-model-scheduling/6.10.0//kubernetes-model-scheduling-6.10.0.jar
-kubernetes-model-storageclass/6.10.0//kubernetes-model-storageclass-6.10.0.jar
+kubernetes-client-api/6.11.0//kubernetes-client-api-6.11.0.jar
+kubernetes-client/6.11.0//kubernetes-client-6.11.0.jar
+kubernetes-httpclient-okhttp/6.11.0//kubernetes-httpclient-okhttp-6.11.0.jar
+kubernetes-model-admissionregistration/6.11.0//kubernetes-model-admissionregistration-6.11.0.jar
+kubernetes-model-apiextensions/6.11.0//kubernetes-model-apiextensions-6.11.0.jar
+kubernetes-model-apps/6.11.0//kubernetes-model-apps-6.11.0.jar
+kubernetes-model-autoscaling/6.11.0//kubernetes-model-autoscaling-6.11.0.jar
+kubernetes-model-batch/6.11.0//kubernetes-model-batch-6.11.0.jar
+kubernetes-model-certificates/6.11.0//kubernetes-model-certificates-6.11.0.jar
+kubernetes-model-common/6.11.0//kubernetes-model-common-6.11.0.jar
+kubernetes-model-coordination/6.11.0//kubernetes-model-coordination-6.11.0.jar
+kubernetes-model-core/6.11.0//kubernetes-model-core-6.11.0.jar
+kubernetes-model-discovery/6.11.0//kubernetes-model-discovery-6.11.0.jar
+kubernetes-model-events/6.11.0//kubernetes-model-events-6.11.0.jar
+kubernetes-model-extensions/6.11.0//kubernetes-model-extensions-6.11.0.jar
+kubernetes-model-flowcontrol/6.11.0//kubernetes-model-flowcontrol-6.11.0.jar
+kubernetes-model-gatewayapi/6.11.0//kubernetes-model-gatewayapi-6.11.0.jar
+kubernetes-model-metrics/6.11.0//kubernetes-model-metrics-6.11.0.jar
+kubernetes-model-networking/6.11.0//kubernetes-model-networking-6.11.0.jar
+kubernetes-model-node/6.11.0
(spark) branch master updated: [SPARK-47548][BUILD] Remove unused `commons-beanutils` dependency
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 90d506573409 [SPARK-47548][BUILD] Remove unused `commons-beanutils` dependency

90d506573409 is described below

commit 90d5065734095d840f51d1eea3449d969565b742
Author: Dongjoon Hyun
AuthorDate: Mon Mar 25 11:35:18 2024 -0700

[SPARK-47548][BUILD] Remove unused `commons-beanutils` dependency

### What changes were proposed in this pull request?

This PR aims to remove the unused `commons-beanutils` dependency from `pom.xml` and `LICENSE-binary`.

### Why are the changes needed?

#30701 removed `commons-beanutils` from the `hadoop-3` profile in Apache Spark 3.2.0.
- https://github.com/apache/spark/pull/30701

#40788 removed the `hadoop-2` profile in Apache Spark 3.5.0.
- https://github.com/apache/spark/pull/40788

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45705 from dongjoon-hyun/SPARK-47548.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 LICENSE-binary | 1 -
 pom.xml        | 5 -
 2 files changed, 6 deletions(-)

diff --git a/LICENSE-binary b/LICENSE-binary
index b9e7820c0baf..40271c9924bc 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -204,7 +204,6 @@ This project bundles some components that are also licensed under the Apache
 License Version 2.0:

-commons-beanutils:commons-beanutils
 org.apache.zookeeper:zookeeper
 oro:oro
 commons-configuration:commons-configuration
diff --git a/pom.xml b/pom.xml
index 8e68ad7346f8..de26e6ed33a8 100644
--- a/pom.xml
+++ b/pom.xml
@@ -646,11 +646,6 @@
 commons-collections4
 ${commons.collections4.version}

-
-commons-beanutils
-commons-beanutils
-1.9.4
-
 org.apache.ivy
 ivy
(spark) branch branch-3.4 updated (77fd58bf8d52 -> 5e7600eab833)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 77fd58bf8d52 [SPARK-47537][SQL][3.4] Fix error data type mapping on MySQL Connector/J
  add 5e7600eab833 [SPARK-47503][SQL][3.4] Make makeDotNode escape graph node name always

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/ui/SparkPlanGraph.scala |  3 +-
 .../sql/execution/ui/SparkPlanGraphSuite.scala  | 44 ++
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala
(spark) branch branch-3.4 updated: [SPARK-47537][SQL][3.4] Fix error data type mapping on MySQL Connector/J
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 77fd58bf8d52 [SPARK-47537][SQL][3.4] Fix error data type mapping on MySQL Connector/J

77fd58bf8d52 is described below

commit 77fd58bf8d5276906c674f3cdcec2715c8520d47
Author: Kent Yao
AuthorDate: Mon Mar 25 08:51:20 2024 -0700

[SPARK-47537][SQL][3.4] Fix error data type mapping on MySQL Connector/J

### What changes were proposed in this pull request?

This PR fixes:
- BIT(n>1) was wrongly mapped to boolean instead of long for the MySQL Connector/J, because we only had a case branch for the MariaDB Connector/J.
- The MySQL Docker integration tests were using the MariaDB Connector/J, not the MySQL Connector/J.

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45691 from yaooqinn/SPARK-47537-BB.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala    | 47 +-
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 28 -
 .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala   |  4 +-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala  |  5 +++
 4 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index bc202b1b8323..d0fcbfb7aaa8 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -43,7 +43,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     override val usesIpc = false
     override val jdbcPort: Int = 3306
     override def getJdbcUrl(ip: String, port: Int): String =
-      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
+      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass&disableMariaDbDriver"
   }

   override def dataPreparation(conn: Connection): Unit = {
@@ -74,6 +74,19 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
       "'jumps', 'over', 'the', 'lazy', 'dog')").executeUpdate()
   }

+  def testConnection(): Unit = {
+    val conn = getConnection()
+    try {
+      assert(conn.getClass.getName === "com.mysql.cj.jdbc.ConnectionImpl")
+    } finally {
+      conn.close()
+    }
+  }
+
+  test("SPARK-47537: ensure use the right jdbc driver") {
+    testConnection()
+  }
+
   test("Basic test") {
     val df = sqlContext.read.jdbc(jdbcUrl, "tbl", new Properties)
     val rows = df.collect()
@@ -193,3 +206,35 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(sql("select x, y from queryOption").collect.toSet == expectedResult)
   }
 }
+
+/**
+ * To run this test suite for a specific version (e.g., mysql:8.3.0):
+ * {{{
+ *   ENABLE_DOCKER_INTEGRATION_TESTS=1 MYSQL_DOCKER_IMAGE_NAME=mysql:8.3.0
+ *     ./build/sbt -Pdocker-integration-tests
+ *     "docker-integration-tests/testOnly *MySQLOverMariaConnectorIntegrationSuite"
+ * }}}
+ */
+@DockerTest
+class MySQLOverMariaConnectorIntegrationSuite extends MySQLIntegrationSuite {
+
+  override val db = new DatabaseOnDocker {
+    override val imageName = sys.env.getOrElse("MYSQL_DOCKER_IMAGE_NAME", "mysql:8.0.31")
+    override val env = Map(
+      "MYSQL_ROOT_PASSWORD" -> "rootpass"
+    )
+    override val usesIpc = false
+    override val jdbcPort: Int = 3306
+    override def getJdbcUrl(ip: String, port: Int): String =
+      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
+  }
+
+  override def testConnection(): Unit = {
+    val conn = getConnection()
+    try {
+      assert(conn.getClass.getName === "org.mariadb.jdbc.MariaDbConnection")
+    } finally {
+      conn.close()
+    }
+  }
+}
diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
index 072fdbb3f342..c4056c224f66 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test
(spark) branch branch-3.5 updated: [SPARK-47537][SQL][3.5] Fix error data type mapping on MySQL Connector/J
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 9ad7b75784da [SPARK-47537][SQL][3.5] Fix error data type mapping on MySQL Connector/J

9ad7b75784da is described below

commit 9ad7b75784daa48bf20dd00ae3288c718272fd69
Author: Kent Yao
AuthorDate: Mon Mar 25 08:50:00 2024 -0700

[SPARK-47537][SQL][3.5] Fix error data type mapping on MySQL Connector/J

### What changes were proposed in this pull request?

This PR fixes:
- BIT(n>1) was wrongly mapped to boolean instead of long for the MySQL Connector/J, because we only had a case branch for the MariaDB Connector/J.
- The MySQL Docker integration tests were using the MariaDB Connector/J, not the MySQL Connector/J.

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45690 from yaooqinn/SPARK-47537-B.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala    | 47 +-
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 38 +
 .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala   |  4 +-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala  |  5 +++
 4 files changed, 84 insertions(+), 10 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index dcf4225d522d..68d88fbc552a 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -43,7 +43,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     override val usesIpc = false
     override val jdbcPort: Int = 3306
     override def getJdbcUrl(ip: String, port: Int): String =
-      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
+      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass&disableMariaDbDriver"
   }

   override def dataPreparation(conn: Connection): Unit = {
@@ -75,6 +75,19 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
       "'jumps', 'over', 'the', 'lazy', 'dog', '{\"status\": \"merrily\"}')").executeUpdate()
   }

+  def testConnection(): Unit = {
+    val conn = getConnection()
+    try {
+      assert(conn.getClass.getName === "com.mysql.cj.jdbc.ConnectionImpl")
+    } finally {
+      conn.close()
+    }
+  }
+
+  test("SPARK-47537: ensure use the right jdbc driver") {
+    testConnection()
+  }
+
   test("Basic test") {
     val df = sqlContext.read.jdbc(jdbcUrl, "tbl", new Properties)
     val rows = df.collect()
@@ -200,3 +213,35 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(sql("select x, y from queryOption").collect.toSet == expectedResult)
   }
 }
+
+/**
+ * To run this test suite for a specific version (e.g., mysql:8.3.0):
+ * {{{
+ *   ENABLE_DOCKER_INTEGRATION_TESTS=1 MYSQL_DOCKER_IMAGE_NAME=mysql:8.3.0
+ *     ./build/sbt -Pdocker-integration-tests
+ *     "docker-integration-tests/testOnly *MySQLOverMariaConnectorIntegrationSuite"
+ * }}}
+ */
+@DockerTest
+class MySQLOverMariaConnectorIntegrationSuite extends MySQLIntegrationSuite {
+
+  override val db = new DatabaseOnDocker {
+    override val imageName = sys.env.getOrElse("MYSQL_DOCKER_IMAGE_NAME", "mysql:8.0.31")
+    override val env = Map(
+      "MYSQL_ROOT_PASSWORD" -> "rootpass"
+    )
+    override val usesIpc = false
+    override val jdbcPort: Int = 3306
+    override def getJdbcUrl(ip: String, port: Int): String =
+      s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass"
+  }
+
+  override def testConnection(): Unit = {
+    val conn = getConnection()
+    try {
+      assert(conn.getClass.getName === "org.mariadb.jdbc.MariaDbConnection")
+    } finally {
+      conn.close()
+    }
+  }
+}
diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
index 719b858b87b6..f6f264804e7d 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegration
(spark) branch master updated: [SPARK-47538][BUILD] Remove `commons-logging` dependency
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e22ddcbd852c [SPARK-47538][BUILD] Remove `commons-logging` dependency

e22ddcbd852c is described below

commit e22ddcbd852c95375d39fd6074627e1b5a91c6e7
Author: Dongjoon Hyun
AuthorDate: Sun Mar 24 23:16:50 2024 -0700

[SPARK-47538][BUILD] Remove `commons-logging` dependency

### What changes were proposed in this pull request?

This PR aims to remove the `commons-logging` dependency in favor of `jcl-over-slf4j`.

### Why are the changes needed?

- https://slf4j.org/legacy.html#jclOverSLF4J

> To ease migration to SLF4J from JCL, SLF4J distributions include the jar file jcl-over-slf4j.jar. This jar file is intended as a drop-in replacement for JCL version 1.1.1. It implements the public API of JCL but using SLF4J underneath, hence the name "JCL over SLF4J."

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45687 from dongjoon-hyun/commons-logging.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 LICENSE-binary                        |  1 -
 NOTICE-binary                         |  8
 connector/kafka-0-10-sql/pom.xml      |  4
 connector/kafka-0-10/pom.xml          |  4
 dev/deps/spark-deps-hadoop-3-hive-2.3 |  1 -
 pom.xml                               | 24 ++--
 sql/hive-thriftserver/pom.xml         |  6 ++
 7 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/LICENSE-binary b/LICENSE-binary
index 2073d85246b6..b9e7820c0baf 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -325,7 +325,6 @@ commons-cli:commons-cli
 commons-dbcp:commons-dbcp
 commons-io:commons-io
 commons-lang:commons-lang
-commons-logging:commons-logging
 commons-net:commons-net
 commons-pool:commons-pool
 io.fabric8:zjsonpatch
diff --git a/NOTICE-binary b/NOTICE-binary
index ef2dba45055a..5f1c1c617c36 100644
--- a/NOTICE-binary
+++ b/NOTICE-binary
@@ -271,14 +271,6 @@ benchmarking framework, which can be obtained at:
   * HOMEPAGE:
     * https://github.com/google/caliper

-This product optionally depends on 'Apache Commons Logging', a logging
-framework, which can be obtained at:
-
-  * LICENSE:
-    * license/LICENSE.commons-logging.txt (Apache License 2.0)
-  * HOMEPAGE:
-    * http://commons.apache.org/logging/
-
 This product optionally depends on 'Apache Log4J', a logging framework, which
 can be obtained at:
diff --git a/connector/kafka-0-10-sql/pom.xml b/connector/kafka-0-10-sql/pom.xml
index e22a57354b89..35f58134f1a8 100644
--- a/connector/kafka-0-10-sql/pom.xml
+++ b/connector/kafka-0-10-sql/pom.xml
@@ -116,6 +116,10 @@
 com.fasterxml.jackson.core
 jackson-annotations
+
+commons-logging
+commons-logging
+
diff --git a/connector/kafka-0-10/pom.xml b/connector/kafka-0-10/pom.xml
index 6a71a5d446c8..1b26839a371c 100644
--- a/connector/kafka-0-10/pom.xml
+++ b/connector/kafka-0-10/pom.xml
@@ -92,6 +92,10 @@
 com.fasterxml.jackson.core
 jackson-annotations
+
+commons-logging
+commons-logging
+
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index d39c92f3fc37..2ffef88dbe7e 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -46,7 +46,6 @@ commons-dbcp/1.4//commons-dbcp-1.4.jar
 commons-io/2.15.1//commons-io-2.15.1.jar
 commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/3.14.0//commons-lang3-3.14.0.jar
-commons-logging/1.1.3//commons-logging-1.1.3.jar
 commons-math3/3.6.1//commons-math3-3.6.1.jar
 commons-pool/1.5.4//commons-pool-1.5.4.jar
 commons-text/1.11.0//commons-text-1.11.0.jar
diff --git a/pom.xml b/pom.xml
index 5a878a1c3319..8e68ad7346f8 100644
--- a/pom.xml
+++ b/pom.xml
@@ -651,12 +651,6 @@
 commons-beanutils
 1.9.4

-
-commons-logging
-commons-logging
-
-1.1.3
-
 org.apache.ivy
 ivy
@@ -671,6 +665,12 @@
 org.apache.httpcomponents
 httpclient
 ${commons.httpclient.version}
+
+commons-logging
+commons-logging
+
+
 org.apache.httpcomponents
@@ -721,6 +721,12 @@
 htmlunit3-driver
 ${htmlunit3-driver.version}
 test
+
+commons-logging
+commons-logging
+
+
@@ -1
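The practical effect of the swap is that code written against the JCL API keeps working unchanged, just routed through SLF4J. A small sketch, assuming `jcl-over-slf4j` and an SLF4J backend are on the classpath:

```scala
// The Jakarta Commons Logging facade; jcl-over-slf4j reimplements this API
// on top of SLF4J, so no caller needs to change.
import org.apache.commons.logging.LogFactory

val log = LogFactory.getLog("example")
log.info("this message is delivered through the SLF4J backend")
```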
(spark) branch master updated: [SPARK-47537][SQL] Fix error data type mapping on MySQL Connector/J
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 824d67052c75 [SPARK-47537][SQL] Fix error data type mapping on MySQL Connector/J

824d67052c75 is described below

commit 824d67052c7542594fe98405b5062593d90233ee
Author: Kent Yao
AuthorDate: Sun Mar 24 23:04:44 2024 -0700

[SPARK-47537][SQL] Fix error data type mapping on MySQL Connector/J

### What changes were proposed in this pull request?

This PR fixes:
- BIT(n>1) was wrongly mapped to boolean instead of long for the MySQL Connector/J, because we only had a case branch for the MariaDB Connector/J.
- The MySQL Docker integration tests were using the MariaDB Connector/J, not the MySQL Connector/J.

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45689 from yaooqinn/SPARK-47537.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/jdbc/MySQLDatabaseOnDocker.scala    |  4 +-
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala    | 45 ++
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 29 +++---
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala  |  5 +++
 4 files changed, 67 insertions(+), 16 deletions(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala
index 87b13a06d965..568eb5f10973 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala
@@ -26,6 +26,6 @@ class MySQLDatabaseOnDocker extends DatabaseOnDocker {
   override val jdbcPort: Int = 3306

   override def getJdbcUrl(ip: String, port: Int): String =
-    s"jdbc:mysql://$ip:$port/" +
-      s"mysql?user=root&password=rootpass&allowPublicKeyRetrieval=true&useSSL=false"
+    s"jdbc:mysql://$ip:$port/mysql?user=root&password=rootpass&allowPublicKeyRetrieval=true" +
+      s"&useSSL=false&disableMariaDbDriver"
 }
diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 921e63acf7e1..09eb99c25227 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -22,9 +22,10 @@ import java.sql.{Connection, Date, Timestamp}
 import java.time.LocalDateTime
 import java.util.Properties

+import scala.util.Using
+
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
-import org.apache.spark.sql.types.{BooleanType, MetadataBuilder, StructType}
 import org.apache.spark.tags.DockerTest

 /**
@@ -85,6 +86,16 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
       "f4 FLOAT UNSIGNED, f5 FLOAT(10) UNSIGNED, f6 FLOAT(53) UNSIGNED)").executeUpdate()
     conn.prepareStatement("INSERT INTO floats VALUES (1.23, 4.56, 7.89, 1.23, 4.56, 7.89)")
       .executeUpdate()
   }

+  def testConnection(): Unit = {
+    Using.resource(getConnection()) { conn =>
+      assert(conn.getClass.getName === "com.mysql.cj.jdbc.ConnectionImpl")
+    }
+  }
+
+  test("SPARK-47537: ensure use the right jdbc driver") {
+    testConnection()
+  }
+
   test("Basic test") {
     val df = sqlContext.read.jdbc(jdbcUrl, "tbl", new Properties)
     val rows = df.collect()
@@ -246,13 +257,6 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     checkAnswer(df, Row(true, true, true))
     df.write.mode("append").jdbc(jdbcUrl, "bools", new Properties)
     checkAnswer(df, Seq(Row(true, true, true), Row(true, true, true)))
-    val mb = new MetadataBuilder()
-      .putBoolean("isTimestampNTZ", false)
-      .putLong("scale", 0)
-    assert(df.schema === new StructType()
-      .add("b1", BooleanType, nullable = true, mb.putBoolean("isSigned", true).build())
-      .add("b2", BooleanType, nullable = true, mb.putBoolean("isSigned", false).build())
-      .add("b3", BooleanType, nullable = true, mb.putBoolean("isSigned", true).build()))
   }

   test("SPARK-47515: Save TimestampNTZType as DATETIME in MySQL") {
@@ -272,3 +276,28 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite {
     checkAnswe
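A condensed sketch of the mapping this commit fixes (simplified from `MySQLDialect.getCatalystType`; the helper shape is ours): BIT(1) can fall through to the default boolean mapping, but BIT(n>1) must become LongType regardless of which connector is in use.

```scala
import java.sql.Types
import org.apache.spark.sql.types.{DataType, LongType}

// Return Some(LongType) for BIT(n > 1); None falls through to the default
// JDBC-to-Catalyst mapping (which yields BooleanType for BIT(1)).
def mapBit(sqlType: Int, typeName: String, size: Int): Option[DataType] =
  if (sqlType == Types.BIT && typeName.equalsIgnoreCase("BIT") && size > 1) Some(LongType)
  else None
```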
(spark) branch master updated (b34a44f175aa -> d8d119a21e07)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from b34a44f175aa [SPARK-47535][INFRA] Update `publish_snapshot.yml` to publish twice per day
  add d8d119a21e07 [SPARK-47536][BUILD] Upgrade `jmock-junit5` to 2.13.1

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (99fb84b7ad27 -> b34a44f175aa)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 99fb84b7ad27 [SPARK-47534][SQL] Move `o.a.s.variant` to `o.a.s.types.variant`
  add b34a44f175aa [SPARK-47535][INFRA] Update `publish_snapshot.yml` to publish twice per day

No new revisions were added by this update.

Summary of changes:
 .github/workflows/publish_snapshot.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-47534][SQL] Move `o.a.s.variant` to `o.a.s.types.variant`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 99fb84b7ad27 [SPARK-47534][SQL] Move `o.a.s.variant` to `o.a.s.types.variant`

99fb84b7ad27 is described below

commit 99fb84b7ad276114b1ab97bd71704d4bdc163a40
Author: Dongjoon Hyun
AuthorDate: Sun Mar 24 17:18:12 2024 -0700

[SPARK-47534][SQL] Move `o.a.s.variant` to `o.a.s.types.variant`

### What changes were proposed in this pull request?

According to https://github.com/apache/spark/pull/45479#pullrequestreview-1946939461, this PR aims to rename the `variant` package and the corresponding test suite like the following.

```
- package org.apache.spark.variant;
+ package org.apache.spark.types.variant;
```

```
$ git diff master --stat
 common/variant/src/main/java/org/apache/spark/{ => types}/variant/Variant.java                                   | 2 +-
 common/variant/src/main/java/org/apache/spark/{ => types}/variant/VariantBuilder.java                            | 4 ++--
 common/variant/src/main/java/org/apache/spark/{ => types}/variant/VariantSizeLimitException.java                 | 2 +-
 common/variant/src/main/java/org/apache/spark/{ => types}/variant/VariantUtil.java                               | 2 +-
 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala           | 2 +-
 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/{ => variant}/VariantExpressionSuite.scala | 6 +++---
 6 files changed, 9 insertions(+), 9 deletions(-)
```

### Why are the changes needed?

To make it clear that the `variant` package is type-related.

### Does this PR introduce _any_ user-facing change?

No. This package is new in Apache Spark 4.0.0.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45685 from dongjoon-hyun/SPARK-47534.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 .../src/main/java/org/apache/spark/{ => types}/variant/Variant.java        | 2 +-
 .../java/org/apache/spark/{ => types}/variant/VariantBuilder.java          | 4 ++--
 .../apache/spark/{ => types}/variant/VariantSizeLimitException.java        | 2 +-
 .../main/java/org/apache/spark/{ => types}/variant/VariantUtil.java        | 2 +-
 .../spark/sql/catalyst/expressions/variant/variantExpressions.scala        | 2 +-
 .../catalyst/expressions/{ => variant}/VariantExpressionSuite.scala        | 6 +++---
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/common/variant/src/main/java/org/apache/spark/variant/Variant.java b/common/variant/src/main/java/org/apache/spark/types/variant/Variant.java
similarity index 96%
rename from common/variant/src/main/java/org/apache/spark/variant/Variant.java
rename to common/variant/src/main/java/org/apache/spark/types/variant/Variant.java
index 11c82d3fe1c0..e43b7ec8ac54 100644
--- a/common/variant/src/main/java/org/apache/spark/variant/Variant.java
+++ b/common/variant/src/main/java/org/apache/spark/types/variant/Variant.java
@@ -15,7 +15,7 @@
  * limitations under the License.
  */

-package org.apache.spark.variant;
+package org.apache.spark.types.variant;

 /**
  * This class is structurally equivalent to {@link org.apache.spark.unsafe.types.VariantVal}. We
diff --git a/common/variant/src/main/java/org/apache/spark/variant/VariantBuilder.java b/common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java
similarity index 99%
rename from common/variant/src/main/java/org/apache/spark/variant/VariantBuilder.java
rename to common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java
index 70227d67746d..21a12cbe9d71 100644
--- a/common/variant/src/main/java/org/apache/spark/variant/VariantBuilder.java
+++ b/common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java
@@ -15,7 +15,7 @@
  * limitations under the License.
  */

-package org.apache.spark.variant;
+package org.apache.spark.types.variant;

 import java.io.IOException;
 import java.math.BigDecimal;
@@ -32,7 +32,7 @@ import com.fasterxml.jackson.core.JsonParseException;
 import com.fasterxml.jackson.core.JsonToken;
 import com.fasterxml.jackson.core.exc.InputCoercionException;

-import static org.apache.spark.variant.VariantUtil.*;
+import static org.apache.spark.types.variant.VariantUtil.*;

 /**
  * Build variant value and metadata by parsing JSON values.
diff --git a/common/variant/src/main/java/org/apache/spark/variant/VariantSizeLimitException.java b/common/variant/src/main/java/org/apache/spark/types/variant/VariantSizeLimitException.java
similarity index 96%
rename from common/variant/
(spark) branch master updated: [SPARK-47533][BUILD] Migrate scalafmt dialect to `scala213`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 310fd6517887 [SPARK-47533][BUILD] Migrate scalafmt dialect to `scala213`

310fd6517887 is described below

commit 310fd65178876aa613448d86e64bee16ea3f888f
Author: panbingkun
AuthorDate: Sun Mar 24 16:41:19 2024 -0700

[SPARK-47533][BUILD] Migrate scalafmt dialect to `scala213`

### What changes were proposed in this pull request?

The PR aims to migrate the `scalafmt dialect` from `scala212` to `scala213`.

### Why are the changes needed?

In Spark `4.0.0`, Scala `2.12` is no longer supported. As part of the migration from `scala2.12` to `scala2.13`, this should be migrated as well.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Manual test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45683 from panbingkun/scalafmt_dialect.

Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
---
 dev/.scalafmt.conf | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/dev/.scalafmt.conf b/dev/.scalafmt.conf
index 6d1ab0243dc5..9a01136dfaf8 100644
--- a/dev/.scalafmt.conf
+++ b/dev/.scalafmt.conf
@@ -26,10 +26,5 @@ optIn = {
 danglingParentheses.preset = false
 docstrings.style = Asterisk
 maxColumn = 98
-runner.dialect = scala212
-fileOverride {
-  "glob:**/src/**/scala-2.13/**.scala" {
-    runner.dialect = scala213
-  }
-}
+runner.dialect = scala213
 version = 3.8.0
(spark) branch branch-3.5 updated: [SPARK-47503][SQL][3.5] Make makeDotNode escape graph node name always
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 2016db66e578 [SPARK-47503][SQL][3.5] Make makeDotNode escape graph node name always 2016db66e578 is described below commit 2016db66e578a0459672aad2a82a53a69601eaec Author: alexey AuthorDate: Sun Mar 24 15:17:10 2024 -0700 [SPARK-47503][SQL][3.5] Make makeDotNode escape graph node name always ### What changes were proposed in this pull request? This is a backport of https://github.com/apache/spark/pull/45640 To prevent corruption of dot file a node name should be escaped even if there is no metrics to display ### Why are the changes needed? This pr fixes a bug in spark history server which fails to display query for cached JDBC relation named in quotes. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Unit test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45684 from alex35736/branch-3.5. Authored-by: alexey Signed-off-by: Dongjoon Hyun --- .../spark/sql/execution/ui/SparkPlanGraph.scala| 3 +- .../sql/execution/ui/SparkPlanGraphSuite.scala | 44 ++ 2 files changed, 46 insertions(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala index 1504207d39cb..668cece53335 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala @@ -189,7 +189,8 @@ class SparkPlanGraphNode( } else { // SPARK-30684: when there is no metrics, add empty lines to increase the height of the node, // so that there won't be gaps between an edge and a small node. - s""" $id [labelType="html" label="$name"];""" + val escapedName = StringEscapeUtils.escapeJava(name) + s""" $id [labelType="html" label="$escapedName"];""" } } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala new file mode 100644 index ..88237cd09ac7 --- /dev/null +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.execution.ui + +import org.apache.spark.SparkFunSuite + +class SparkPlanGraphSuite extends SparkFunSuite { + test("SPARK-47503: name of a node should be escaped even if there is no metrics") { +val planGraphNode = new SparkPlanGraphNode( + id = 24, + name = "Scan JDBCRelation(\"test-schema\".tickets) [numPartitions=1]", + desc = "Scan JDBCRelation(\"test-schema\".tickets) [numPartitions=1] " + +"[ticket_no#0] PushedFilters: [], ReadSchema: struct", + metrics = List( +SQLPlanMetric( + name = "number of output rows", + accumulatorId = 75, + metricType = "sum" +), +SQLPlanMetric( + name = "JDBC query execution time", + accumulatorId = 35, + metricType = "nsTiming"))) +val dotNode = planGraphNode.makeDotNode(Map.empty[Long, String]) +val expectedDotNode = " 24 [labelType=\"html\" label=\"" + + "Scan JDBCRelation(\\\"test-schema\\\".tickets) [numPartitions=1]\"];" + +assertResult(expectedDotNode)(dotNode) + } +} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
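For context, a standalone sketch of why the escaping matters (it assumes only Apache Commons Text on the classpath; the node id and name below are illustrative, not taken from Spark internals):

```scala
import org.apache.commons.text.StringEscapeUtils

object DotEscapeDemo extends App {
  val name = """Scan JDBCRelation("test-schema".tickets) [numPartitions=1]"""

  // Unescaped, the embedded double quotes close the label attribute early and
  // the remainder of the name spills into the dot file, corrupting it.
  val broken = s""" 24 [labelType="html" label="$name"];"""

  // escapeJava rewrites " as \" (and escapes other metacharacters), so the
  // whole name survives as a single attribute value.
  val escaped = StringEscapeUtils.escapeJava(name)
  val fixed = s""" 24 [labelType="html" label="$escaped"];"""

  println(broken)
  println(fixed)
}
```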
(spark) branch master updated: [SPARK-47528][SQL] Add UserDefinedType support to DataTypeUtils.canWrite
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 767a52d5db35 [SPARK-47528][SQL] Add UserDefinedType support to DataTypeUtils.canWrite 767a52d5db35 is described below commit 767a52d5db354786d5ca07ddc4192d0eb8e8be80 Author: Liang-Chi Hsieh AuthorDate: Sun Mar 24 01:27:52 2024 -0700 [SPARK-47528][SQL] Add UserDefinedType support to DataTypeUtils.canWrite ### What changes were proposed in this pull request? This patch adds `UserDefinedType` handling to `DataTypeUtils.canWrite`. ### Why are the changes needed? Our customer hits an issue recently when they tries to save a DataFrame containing some UDTs as table (`saveAsTable`). The error looks like: ``` - Cannot write 'xxx': struct<...> is incompatible with struct<...> ``` The catalog strings between two sides are actually same which makes the customer confused. It is because `DataTypeUtils.canWrite` doesn't handle `UserDefinedType`. If the `UserDefinedType`'s underlying sql type is same as read side, `canWrite` should return true for two sides. ### Does this PR introduce _any_ user-facing change? Yes. Write side column with `UserDefinedType` can be written into read side column with same sql data type. ### How was this patch tested? Unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes #45678 from viirya/udt_dt_write. Authored-by: Liang-Chi Hsieh Signed-off-by: Dongjoon Hyun --- .../spark/sql/catalyst/types/DataTypeUtils.scala | 14 ++- .../types/DataTypeWriteCompatibilitySuite.scala| 134 + 2 files changed, 147 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala index 01fb86bf2957..cf8e903f03a3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala @@ -22,7 +22,7 @@ import org.apache.spark.sql.catalyst.util.TypeUtils.toSQLId import org.apache.spark.sql.errors.QueryCompilationErrors import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy.{ANSI, STRICT} -import org.apache.spark.sql.types.{ArrayType, AtomicType, DataType, Decimal, DecimalType, MapType, NullType, StructField, StructType} +import org.apache.spark.sql.types.{ArrayType, AtomicType, DataType, Decimal, DecimalType, MapType, NullType, StructField, StructType, UserDefinedType} import org.apache.spark.sql.types.DecimalType.{forType, fromDecimal} object DataTypeUtils { @@ -64,6 +64,8 @@ object DataTypeUtils { * - Both types are structs and have the same number of fields. The type and nullability of each * field from read/write is compatible. If byName is true, the name of each field from * read/write needs to be the same. + * - It is user defined type and its underlying sql type is same as the read type, or the read + * type is user defined type and its underlying sql type is same as the write type. * - Both types are atomic and the write type can be safely cast to the read type. 
* * Extra fields in write-side structs are not allowed to avoid accidentally writing data that @@ -180,6 +182,16 @@ object DataTypeUtils { case (w, r) if DataTypeUtils.sameType(w, r) && !w.isInstanceOf[NullType] => true + // If write-side data type is a user-defined type, check with its underlying data type. + case (w, r) if w.isInstanceOf[UserDefinedType[_]] && !r.isInstanceOf[UserDefinedType[_]] => +canWrite(tableName, w.asInstanceOf[UserDefinedType[_]].sqlType, r, byName, resolver, + context, storeAssignmentPolicy, addError) + + // If read-side data type is a user-defined type, check with its underlying data type. + case (w, r) if r.isInstanceOf[UserDefinedType[_]] && !w.isInstanceOf[UserDefinedType[_]] => +canWrite(tableName, w, r.asInstanceOf[UserDefinedType[_]].sqlType, byName, resolver, + context, storeAssignmentPolicy, addError) + case (w, r) => throw QueryCompilationErrors.incompatibleDataToTableCannotSafelyCastError( tableName, context, w.catalogString, r.catalogString diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeWriteCompatibilitySuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeWriteCompatibilitySuite.scala index 7aaa69a0a5dd..8c9196cc33ca
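To make the new rule concrete, here is a schematic UDT (not from the patch; the UDT developer API is assumed and serialization is elided) whose `sqlType` is a plain struct. After this change, `canWrite` unwraps the UDT on either side and compares the underlying struct instead of failing outright:

```scala
// Schematic sketch only; Point and PointUDT are hypothetical.
import org.apache.spark.sql.types._

case class Point(x: Double, y: Double)

class PointUDT extends UserDefinedType[Point] {
  // The underlying SQL type that canWrite now compares against the other side.
  override def sqlType: DataType = StructType(Seq(
    StructField("x", DoubleType, nullable = false),
    StructField("y", DoubleType, nullable = false)))

  override def serialize(p: Point): Any = ???        // elided for brevity
  override def deserialize(datum: Any): Point = ???  // elided for brevity
  override def userClass: Class[Point] = classOf[Point]
}
```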
(spark) branch master updated: [SPARK-47526][BUILD] Upgrade `netty` to 4.1.108.Final and `netty-tcnative` to 2.0.65.Final
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ba175e3f6ae9 [SPARK-47526][BUILD] Upgrade `netty` to 4.1.108.Final and `netty-tcnative` to 2.0.65.Final ba175e3f6ae9 is described below commit ba175e3f6ae92b584c1e38083a6792ab0bdb726d Author: panbingkun AuthorDate: Sun Mar 24 00:23:54 2024 -0700 [SPARK-47526][BUILD] Upgrade `netty` to 4.1.108.Final and `netty-tcnative` to 2.0.65.Final ### What changes were proposed in this pull request? The pr aims to upgrade: - `netty` from `4.1.107.Final` to `4.1.108.Final`. - `netty-tcnative` from `2.0.62.Final` to `2.0.65.Final`. ### Why are the changes needed? - `netty` 1.The release notes: https://netty.io/news/2024/03/21/4-1-108-Final.html 2.To bring some bug fixes, eg: Epoll: Fix possible Classloader deadlock caused by loading class via JNI ([#13879](https://github.com/netty/netty/issues/13879)) - `netty-tcnative` 2.0.62.Final VS 2.0.65.Final https://github.com/netty/netty-tcnative/compare/netty-tcnative-parent-2.0.62.Final...netty-tcnative-parent-2.0.65.Final ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45676 from panbingkun/SPARK-47526. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +-- pom.xml | 4 +-- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index eed298bc19c0..d39c92f3fc37 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -197,32 +197,32 @@ metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar metrics-json/4.2.25//metrics-json-4.2.25.jar metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.107.Final//netty-all-4.1.107.Final.jar -netty-buffer/4.1.107.Final//netty-buffer-4.1.107.Final.jar -netty-codec-http/4.1.107.Final//netty-codec-http-4.1.107.Final.jar -netty-codec-http2/4.1.107.Final//netty-codec-http2-4.1.107.Final.jar -netty-codec-socks/4.1.107.Final//netty-codec-socks-4.1.107.Final.jar -netty-codec/4.1.107.Final//netty-codec-4.1.107.Final.jar -netty-common/4.1.107.Final//netty-common-4.1.107.Final.jar -netty-handler-proxy/4.1.107.Final//netty-handler-proxy-4.1.107.Final.jar -netty-handler/4.1.107.Final//netty-handler-4.1.107.Final.jar -netty-resolver/4.1.107.Final//netty-resolver-4.1.107.Final.jar +netty-all/4.1.108.Final//netty-all-4.1.108.Final.jar +netty-buffer/4.1.108.Final//netty-buffer-4.1.108.Final.jar +netty-codec-http/4.1.108.Final//netty-codec-http-4.1.108.Final.jar +netty-codec-http2/4.1.108.Final//netty-codec-http2-4.1.108.Final.jar +netty-codec-socks/4.1.108.Final//netty-codec-socks-4.1.108.Final.jar +netty-codec/4.1.108.Final//netty-codec-4.1.108.Final.jar +netty-common/4.1.108.Final//netty-common-4.1.108.Final.jar +netty-handler-proxy/4.1.108.Final//netty-handler-proxy-4.1.108.Final.jar +netty-handler/4.1.108.Final//netty-handler-4.1.108.Final.jar +netty-resolver/4.1.108.Final//netty-resolver-4.1.108.Final.jar netty-tcnative-boringssl-static/2.0.61.Final//netty-tcnative-boringssl-static-2.0.61.Final.jar -netty-tcnative-boringssl-static/2.0.62.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.62.Final-linux-aarch_64.jar 
-netty-tcnative-boringssl-static/2.0.62.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.62.Final-linux-x86_64.jar -netty-tcnative-boringssl-static/2.0.62.Final/osx-aarch_64/netty-tcnative-boringssl-static-2.0.62.Final-osx-aarch_64.jar -netty-tcnative-boringssl-static/2.0.62.Final/osx-x86_64/netty-tcnative-boringssl-static-2.0.62.Final-osx-x86_64.jar -netty-tcnative-boringssl-static/2.0.62.Final/windows-x86_64/netty-tcnative-boringssl-static-2.0.62.Final-windows-x86_64.jar -netty-tcnative-classes/2.0.62.Final//netty-tcnative-classes-2.0.62.Final.jar -netty-transport-classes-epoll/4.1.107.Final//netty-transport-classes-epoll-4.1.107.Final.jar -netty-transport-classes-kqueue/4.1.107.Final//netty-transport-classes-kqueue-4.1.107.Final.jar -netty-transport-native-epoll/4.1.107.Final/linux-aarch_64/netty-transport-native-epoll-4.1.107.Final-linux-aarch_64.jar -netty-transport-native-epoll/4.1.107.Final/linux-riscv64/netty-transport-native-epoll-4.1.107.Final-linux-riscv64.jar -netty-transport-native-epoll/4.1.107.Final/linux-x86_64/netty-transport-native-epoll-4.1.107.Final-linux-x86_64.jar -netty-transport-native-kqueue/4.1.107.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.107.Final-osx-aarch_64.jar -netty-transport-native-kqueue/4.1.107.Final/osx
(spark) branch master updated: [SPARK-47503][SQL] Make `makeDotNode` escape graph node name always
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 15ea9de88b8d [SPARK-47503][SQL] Make `makeDotNode` escape graph node name always 15ea9de88b8d is described below commit 15ea9de88b8d8a87daa6abbc8e80c439e2d38e03 Author: Alexey AuthorDate: Sun Mar 24 00:16:05 2024 -0700 [SPARK-47503][SQL] Make `makeDotNode` escape graph node name always ### What changes were proposed in this pull request? To prevent corruption of dot file a node name should be escaped even if there is no metrics to display ### Why are the changes needed? This pr fixes a bug in spark history server which fails to display query for cached JDBC relation named in quotes. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Unit test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45640 from alex35736/SPARK-47503. Lead-authored-by: Alexey Co-authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../spark/sql/execution/ui/SparkPlanGraph.scala| 2 +- .../sql/execution/ui/SparkPlanGraphSuite.scala | 46 ++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala index 11ba3cd05e26..f94d7dc7ab4c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala @@ -189,7 +189,7 @@ class SparkPlanGraphNode( } else { // SPARK-30684: when there is no metrics, add empty lines to increase the height of the node, // so that there won't be gaps between an edge and a small node. - s"$name" + s"${StringEscapeUtils.escapeJava(name)}" } s""" $id [id="$nodeId" labelType="html" label="$labelStr" tooltip="$tooltip"];""" diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala new file mode 100644 index ..975dbc1a1d8d --- /dev/null +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanGraphSuite.scala @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.execution.ui + +import org.apache.spark.SparkFunSuite + +class SparkPlanGraphSuite extends SparkFunSuite { + test("SPARK-47503: name of a node should be escaped even if there is no metrics") { +val planGraphNode = new SparkPlanGraphNode( + id = 24, + name = "Scan JDBCRelation(\"test-schema\".tickets) [numPartitions=1]", + desc = "Scan JDBCRelation(\"test-schema\".tickets) [numPartitions=1] " + +"[ticket_no#0] PushedFilters: [], ReadSchema: struct", + metrics = List( +SQLPlanMetric( + name = "number of output rows", + accumulatorId = 75, + metricType = "sum" +), +SQLPlanMetric( + name = "JDBC query execution time", + accumulatorId = 35, + metricType = "nsTiming"))) +val dotNode = planGraphNode.makeDotNode(Map.empty[Long, String]) +val expectedDotNode = " 24 [id=\"node24\" labelType=\"html\" label=\"" + + "Scan JDBCRelation(\\\"test-schema\\\".tickets) [numPartitions=1]\" " + + "tooltip=\"Scan JDBCRelation(\\\"test-schema\\\".tickets) [numPartitions=1] [ticket_no#0] " + + "PushedFilters: [], ReadSchema: struct\"];" + +assertResult(expectedDotNode)(dotNode) + } +} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47497][SQL] Make `to_csv` support the output of `array/struct/map/binary` as pretty strings
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 18bb6a3aea82 [SPARK-47497][SQL] Make `to_csv` support the output of `array/struct/map/binary` as pretty strings 18bb6a3aea82 is described below commit 18bb6a3aea826c2e279457ab72ce6656646cda69 Author: panbingkun AuthorDate: Sun Mar 24 00:13:24 2024 -0700 [SPARK-47497][SQL] Make `to_csv` support the output of `array/struct/map/binary` as pretty strings ### What changes were proposed in this pull request? The pr aims make `to_csv` - support the output of `array/struct/map/binary` as `pretty strings`. - not support `variant`. ### Why are the changes needed? This PR was generated from follow-up comment suggestions https://github.com/apache/spark/pull/44665#issuecomment-2011239475, https://github.com/apache/spark/assets/15246973/04dd1497-da42-4b03-b21d-b041ead86f87";> ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? - Update existed UT. - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45657 from panbingkun/SPARK-47497. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/functions/builtin.py| 12 +- .../sql/catalyst/csv/UnivocityGenerator.scala | 126 ++-- .../sql/catalyst/expressions/csvExpressions.scala | 35 - .../org/apache/spark/sql/CsvFunctionsSuite.scala | 165 + 4 files changed, 308 insertions(+), 30 deletions(-) diff --git a/python/pyspark/sql/functions/builtin.py b/python/pyspark/sql/functions/builtin.py index a31465a77873..99a2375965c2 100644 --- a/python/pyspark/sql/functions/builtin.py +++ b/python/pyspark/sql/functions/builtin.py @@ -15591,12 +15591,12 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col >>> from pyspark.sql import Row, functions as sf >>> data = [(1, Row(age=2, name='Alice', scores=[100, 200, 300]))] >>> df = spark.createDataFrame(data, ("key", "value")) ->>> df.select(sf.to_csv(df.value)).show(truncate=False) # doctest: +SKIP -+---+ -|to_csv(value) | -+---+ -|2,Alice,"[100,200,300]"| -+---+ +>>> df.select(sf.to_csv(df.value)).show(truncate=False) ++-+ +|to_csv(value)| ++-+ +|2,Alice,"[100, 200, 300]"| ++-+ Example 3: Converting a StructType with null values to a CSV string diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala index b61652f4b523..f10a53bde5dd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala @@ -22,7 +22,8 @@ import java.io.Writer import com.univocity.parsers.csv.CsvWriter import org.apache.spark.sql.catalyst.InternalRow -import org.apache.spark.sql.catalyst.util.{DateFormatter, DateTimeUtils, IntervalStringStyles, IntervalUtils, TimestampFormatter} +import org.apache.spark.sql.catalyst.expressions.SpecializedGetters +import org.apache.spark.sql.catalyst.util.{DateFormatter, DateTimeUtils, IntervalStringStyles, IntervalUtils, SparkStringUtils, TimestampFormatter} import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ @@ -36,9 +37,9 @@ class UnivocityGenerator( 
writerSettings.setHeaders(schema.fieldNames: _*) private val gen = new CsvWriter(writer, writerSettings) - // A `ValueConverter` is responsible for converting a value of an `InternalRow` to `String`. - // When the value is null, this converter should not be called. - private type ValueConverter = (InternalRow, Int) => String + // A `ValueConverter` is responsible for converting a value of an `SpecializedGetters` + // to `String`. When the value is null, this converter should not be called. + private type ValueConverter = (SpecializedGetters, Int) => String // `ValueConverter`s for all values in the fields of the schema private val valueConverters: Array[ValueConverter] = @@ -64,33 +65,126 @@ class UnivocityGenerator( private val nullAsQuotedEmptyString = SQLConf.get.getConf(SQLConf.LEGACY_NULL_VALUE_WRITTEN_AS_QUOTED_EMPTY_STRING_CSV) - @scala.annotation.tailrec private
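A quick way to observe the new rendering from Scala (a sketch, assuming an active `SparkSession` named `spark`; the values mirror the updated doctest):

```scala
// Nested array/struct/map/binary values now render as pretty strings.
val out = spark.sql(
  """SELECT to_csv(named_struct(
    |  'age', 2, 'name', 'Alice', 'scores', array(100, 200, 300)))""".stripMargin)
out.show(truncate = false)
// Expected, per the updated doctest: 2,Alice,"[100, 200, 300]"
```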
(spark) branch master updated: [SPARK-47531][BUILD] Upgrade `Arrow` to 15.0.2
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 063638b4d5db [SPARK-47531][BUILD] Upgrade `Arrow` to 15.0.2 063638b4d5db is described below commit 063638b4d5dbc6732226cc98b8557db7d3578755 Author: panbingkun AuthorDate: Sun Mar 24 00:11:35 2024 -0700 [SPARK-47531][BUILD] Upgrade `Arrow` to 15.0.2 ### What changes were proposed in this pull request? The pr aims to upgrade `arrow-memory-netty` from `15.0.0` to `15.0.2`. ### Why are the changes needed? The release notes: https://arrow.apache.org/release/15.0.2.html https://arrow.apache.org/release/15.0.1.html ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45682 from panbingkun/SPARK-47531. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 pom.xml | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 903c7a245af3..eed298bc19c0 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -16,10 +16,10 @@ antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar aopalliance-repackaged/3.0.3//aopalliance-repackaged-3.0.3.jar arpack/3.0.3//arpack-3.0.3.jar arpack_combined_all/0.1//arpack_combined_all-0.1.jar -arrow-format/15.0.0//arrow-format-15.0.0.jar -arrow-memory-core/15.0.0//arrow-memory-core-15.0.0.jar -arrow-memory-netty/15.0.0//arrow-memory-netty-15.0.0.jar -arrow-vector/15.0.0//arrow-vector-15.0.0.jar +arrow-format/15.0.2//arrow-format-15.0.2.jar +arrow-memory-core/15.0.2//arrow-memory-core-15.0.2.jar +arrow-memory-netty/15.0.2//arrow-memory-netty-15.0.2.jar +arrow-vector/15.0.2//arrow-vector-15.0.2.jar audience-annotations/0.12.0//audience-annotations-0.12.0.jar avro-ipc/1.11.3//avro-ipc-1.11.3.jar avro-mapred/1.11.3//avro-mapred-1.11.3.jar diff --git a/pom.xml b/pom.xml index 637aa50f0314..83dbe5c23789 100644 --- a/pom.xml +++ b/pom.xml @@ -225,7 +225,7 @@ If you are changing Arrow version specification, please check ./python/pyspark/sql/pandas/utils.py, and ./python/setup.py too. --> -15.0.0 +15.0.2 3.0.0-M1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (9d6b9f7305f2 -> 11f5d3fa10b3)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9d6b9f7305f2 [SPARK-47529][DOCS] Use hadoop 3.4.0 in some docs add 11f5d3fa10b3 [SPARK-47530][BUILD][TESTS] Add `bcpkix-jdk18on` test dependencies to `hive` module for Hadoop 3.4.0 No new revisions were added by this update. Summary of changes: sql/hive/pom.xml | 5 + 1 file changed, 5 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47529][DOCS] Use hadoop 3.4.0 in some docs
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9d6b9f7305f2 [SPARK-47529][DOCS] Use hadoop 3.4.0 in some docs 9d6b9f7305f2 is described below commit 9d6b9f7305f208a8b66f5f390e063db86b5678cd Author: panbingkun AuthorDate: Sat Mar 23 18:27:29 2024 -0700 [SPARK-47529][DOCS] Use hadoop 3.4.0 in some docs ### What changes were proposed in this pull request? This PR aims to update `Hadoop` dependency in some docs. ### Why are the changes needed? Currently Spark codebase master is using Apache Hadoop `3.4.0` by default. ### Does this PR introduce _any_ user-facing change? No. This is a doc-only change. ### How was this patch tested? N/A. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45679 from panbingkun/minor_use_hadoop_3.4. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- assembly/README | 2 +- docs/building-spark.md | 2 +- docs/running-on-kubernetes.md| 2 +- resource-managers/kubernetes/integration-tests/README.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/assembly/README b/assembly/README index 3dde243d3e69..ad1305c5b4d5 100644 --- a/assembly/README +++ b/assembly/README @@ -9,4 +9,4 @@ This module is off by default. To activate it specify the profile in the command If you need to build an assembly for a different version of Hadoop the hadoop-version system property needs to be set as in this example: - -Dhadoop.version=3.3.6 + -Dhadoop.version=3.4.0 diff --git a/docs/building-spark.md b/docs/building-spark.md index 3d12b521c024..56efbc1a0110 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -79,7 +79,7 @@ from `hadoop.version`. Example: -./build/mvn -Pyarn -Dhadoop.version=3.3.0 -DskipTests clean package +./build/mvn -Pyarn -Dhadoop.version=3.4.0 -DskipTests clean package ## Building With Hive and JDBC Support diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 8c92bd9f8cb3..01e9d6382c18 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -236,7 +236,7 @@ A typical example of this using S3 is via passing the following options: ``` ... ---packages org.apache.hadoop:hadoop-aws:3.2.2 +--packages org.apache.hadoop:hadoop-aws:3.4.0 --conf spark.kubernetes.file.upload.path=s3a:///path --conf spark.hadoop.fs.s3a.access.key=... --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem diff --git a/resource-managers/kubernetes/integration-tests/README.md b/resource-managers/kubernetes/integration-tests/README.md index c0d92d988b1a..f8070ec4ce93 100644 --- a/resource-managers/kubernetes/integration-tests/README.md +++ b/resource-managers/kubernetes/integration-tests/README.md @@ -130,7 +130,7 @@ properties to Maven. For example: mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.13 \ -Pkubernetes -Pkubernetes-integration-tests \ --Phadoop-3 -Dhadoop.version=3.3.6 \ +-Phadoop-3 -Dhadoop.version=3.4.0 \ -Dspark.kubernetes.test.sparkTgz=spark-4.0.0-SNAPSHOT-bin-example.tgz \ -Dspark.kubernetes.test.imageTag=sometag \ -Dspark.kubernetes.test.imageRepo=docker.io/somerepo \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (b9335b90280a -> c29d132aeb5d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b9335b90280a [SPARK-47510][INFRA] Fix `DSTREAM` label pattern in `labeler.yml` add c29d132aeb5d [SPARK-47495][CORE] Fix primary resource jar added to spark.jars twice under k8s cluster mode No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/deploy/SparkSubmit.scala | 3 ++- .../org/apache/spark/deploy/SparkSubmitSuite.scala| 19 +++ 2 files changed, 21 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (39500a315166 -> b9335b90280a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 39500a315166 [SPARK-47522][SQL][FOLLOWUP] Add float(p) values for MySQLIntegrationSuite add b9335b90280a [SPARK-47510][INFRA] Fix `DSTREAM` label pattern in `labeler.yml` No new revisions were added by this update. Summary of changes: .github/labeler.yml | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47522][SQL][FOLLOWUP] Add float(p) values for MySQLIntegrationSuite
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 39500a315166 [SPARK-47522][SQL][FOLLOWUP] Add float(p) values for MySQLIntegrationSuite 39500a315166 is described below commit 39500a315166d8e342b678ef3038995a03ce84d6 Author: Kent Yao AuthorDate: Fri Mar 22 13:23:51 2024 -0700 [SPARK-47522][SQL][FOLLOWUP] Add float(p) values for MySQLIntegrationSuite ### What changes were proposed in this pull request? Add float(p) values for MySQLIntegrationSuite ### Why are the changes needed? test improvements ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new test cases ### Was this patch authored or co-authored using generative AI tooling? no Closes #45672 from yaooqinn/SPARK-47522-F. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index cd3001311b03..921e63acf7e1 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -79,8 +79,10 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { conn.prepareStatement("INSERT INTO strings VALUES ('the', 'quick', 'brown', 'fox', " + "'jumps', 'over', 'the', 'lazy', 'dog', '{\"status\": \"merrily\"}')").executeUpdate() -conn.prepareStatement("CREATE TABLE floats (f1 FLOAT, f2 FLOAT UNSIGNED)").executeUpdate() -conn.prepareStatement("INSERT INTO floats VALUES (1.23, 4.56)").executeUpdate() +conn.prepareStatement("CREATE TABLE floats (f1 FLOAT, f2 FLOAT(10), f3 FLOAT(53), " + + "f4 FLOAT UNSIGNED, f5 FLOAT(10) UNSIGNED, f6 FLOAT(53) UNSIGNED)").executeUpdate() +conn.prepareStatement("INSERT INTO floats VALUES (1.23, 4.56, 7.89, 1.23, 4.56, 7.89)") + .executeUpdate() } test("Basic test") { @@ -267,6 +269,6 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { test("SPARK-47522: Read MySQL FLOAT as FloatType to keep consistent with the write side") { val df = spark.read.jdbc(jdbcUrl, "floats", new Properties) -checkAnswer(df, Row(1.23f, 4.56d)) +checkAnswer(df, Row(1.23f, 4.56f, 7.89d, 1.23d, 4.56d, 7.89d)) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
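Background for the expected values (MySQL semantics, not part of the patch): MySQL stores `FLOAT(p)` as a 4-byte single-precision column for `p <= 23` and as an 8-byte double-precision column for `24 <= p <= 53`, which is why `f3 FLOAT(53)` round-trips as a double while `f1`/`f2` stay floats, and the test pins the `UNSIGNED` variants to doubles. A hedged read mirroring the test (assumes a reachable MySQL `jdbcUrl` and the `floats` table above):

```scala
// Sketch: schema expected after SPARK-47522 and this follow-up.
val floats = spark.read.jdbc(jdbcUrl, "floats", new java.util.Properties)
floats.printSchema()
// f1 FLOAT, f2 FLOAT(10)      -> float  (single-precision storage, p <= 23)
// f3 FLOAT(53)                -> double (double-precision storage, p >= 24)
// f4, f5, f6 (... UNSIGNED)   -> double (widened, per the test expectations)
```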
(spark) branch master updated (245669053a34 -> 36126a5c1821)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 245669053a34 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage add 36126a5c1821 [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala| 12 ++-- docs/sql-migration-guide.md | 1 + .../main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala | 2 ++ .../src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 8 4 files changed, 21 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (93f98c0a61dd -> 245669053a34)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 93f98c0a61dd [SPARK-47523][SQL] Replace deprecated `JsonParser#getCurrentName` with `JsonParser#currentName` add 245669053a34 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/storage/FallbackStorage.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 585845ec9ef8 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage 585845ec9ef8 is described below commit 585845ec9ef85a7b2ce10882e5e0d391702a0769 Author: maheshbehera AuthorDate: Fri Mar 22 10:44:55 2024 -0700 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage ### What changes were proposed in this pull request? In method FallbackStorage.open, file open is guarded by Utils.tryWithResource to avoid file handle leakage incase of failure during read. ### Why are the changes needed? To avoid file handle leakage in case of read failure. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing UTs ### Was this patch authored or co-authored using generative AI tooling? No Closes #45663 from maheshk114/SPARK-47521. Authored-by: maheshbehera Signed-off-by: Dongjoon Hyun (cherry picked from commit 245669053a34cb1d4a84689230e5bd1d163be5c6) Signed-off-by: Dongjoon Hyun --- .../main/scala/org/apache/spark/storage/FallbackStorage.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala b/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala index 5aa5c6eff7b2..98ed3167d119 100644 --- a/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala +++ b/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala @@ -187,15 +187,15 @@ private[spark] object FallbackStorage extends Logging { val name = ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID).name val hash = JavaUtils.nonNegativeHash(name) val dataFile = new Path(fallbackPath, s"$appId/$shuffleId/$hash/$name") -val f = fallbackFileSystem.open(dataFile) val size = nextOffset - offset logDebug(s"To byte array $size") val array = new Array[Byte](size.toInt) val startTimeNs = System.nanoTime() -f.seek(offset) -f.readFully(array) -logDebug(s"Took ${(System.nanoTime() - startTimeNs) / (1000 * 1000)}ms") -f.close() +Utils.tryWithResource(fallbackFileSystem.open(dataFile)) { f => + f.seek(offset) + f.readFully(array) + logDebug(s"Took ${(System.nanoTime() - startTimeNs) / (1000 * 1000)}ms") +} new NioManagedBuffer(ByteBuffer.wrap(array)) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 30cb7edecbf0 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage 30cb7edecbf0 is described below commit 30cb7edecbf0ef7aed1e216ad147ebb318aea09c Author: maheshbehera AuthorDate: Fri Mar 22 10:44:55 2024 -0700 [SPARK-47521][CORE] Use `Utils.tryWithResource` during reading shuffle data from external storage ### What changes were proposed in this pull request? In method FallbackStorage.open, file open is guarded by Utils.tryWithResource to avoid file handle leakage incase of failure during read. ### Why are the changes needed? To avoid file handle leakage in case of read failure. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing UTs ### Was this patch authored or co-authored using generative AI tooling? No Closes #45663 from maheshk114/SPARK-47521. Authored-by: maheshbehera Signed-off-by: Dongjoon Hyun (cherry picked from commit 245669053a34cb1d4a84689230e5bd1d163be5c6) Signed-off-by: Dongjoon Hyun --- .../main/scala/org/apache/spark/storage/FallbackStorage.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala b/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala index eb23fb4b1c84..161120393490 100644 --- a/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala +++ b/core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala @@ -188,15 +188,15 @@ private[spark] object FallbackStorage extends Logging { val name = ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID).name val hash = JavaUtils.nonNegativeHash(name) val dataFile = new Path(fallbackPath, s"$appId/$shuffleId/$hash/$name") -val f = fallbackFileSystem.open(dataFile) val size = nextOffset - offset logDebug(s"To byte array $size") val array = new Array[Byte](size.toInt) val startTimeNs = System.nanoTime() -f.seek(offset) -f.readFully(array) -logDebug(s"Took ${(System.nanoTime() - startTimeNs) / (1000 * 1000)}ms") -f.close() +Utils.tryWithResource(fallbackFileSystem.open(dataFile)) { f => + f.seek(offset) + f.readFully(array) + logDebug(s"Took ${(System.nanoTime() - startTimeNs) / (1000 * 1000)}ms") +} new NioManagedBuffer(ByteBuffer.wrap(array)) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
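The pattern itself is the standard Scala loan pattern; a minimal standalone sketch in the spirit of `Utils.tryWithResource` (not Spark's exact implementation, whose signature may differ):

```scala
import java.io.Closeable

// The resource is closed on every path, so a failure inside `f` (for example a
// short read in readFully) can no longer leak the open file handle the way the
// previous open -> read -> close sequence could.
def tryWithResource[R <: Closeable, T](createResource: => R)(f: R => T): T = {
  val resource = createResource
  try f(resource) finally resource.close()
}
```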
(spark) branch master updated: [SPARK-47523][SQL] Replace deprecated `JsonParser#getCurrentName` with `JsonParser#currentName`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 93f98c0a61dd [SPARK-47523][SQL] Replace deprecated `JsonParser#getCurrentName` with `JsonParser#currentName` 93f98c0a61dd is described below commit 93f98c0a61ddb66eb777c3940fbf29fc58e2d79b Author: yangjie01 AuthorDate: Fri Mar 22 08:37:09 2024 -0700 [SPARK-47523][SQL] Replace deprecated `JsonParser#getCurrentName` with `JsonParser#currentName` ### What changes were proposed in this pull request? This pr replaces the use of `JsonParser#getCurrentName` with `JsonParser#currentName` in Spark code, as `JsonParser#getCurrentName` has been deprecated since jackson 2.17. https://github.com/FasterXML/jackson-core/blob/8fba680579885bf9cdae72e93f16de557056d6e3/src/main/java/com/fasterxml/jackson/core/JsonParser.java#L1521-L1551 ```java /** * Deprecated alias of {link #currentName()}. * * return Name of the current field in the parsing context * * throws IOException for low-level read issues, or * {link JsonParseException} for decoding problems * * deprecated Since 2.17 use {link #currentName} instead. */ Deprecated public abstract String getCurrentName() throws IOException; /** * Method that can be called to get the name associated with * the current token: for {link JsonToken#FIELD_NAME}s it will * be the same as what {link #getText} returns; * for field values it will be preceding field name; * and for others (array values, root-level values) null. * * return Name of the current field in the parsing context * * throws IOException for low-level read issues, or * {link JsonParseException} for decoding problems * * since 2.10 */ public String currentName() throws IOException { // !!! TODO: switch direction in 2.18 or later return getCurrentName(); } ``` ### Why are the changes needed? Clean up deprecated API usage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #45668 from LuciferYang/SPARK-47523. 
Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- .../spark/sql/catalyst/expressions/jsonExpressions.scala | 6 +++--- .../org/apache/spark/sql/catalyst/json/JacksonParser.scala | 10 +- .../org/apache/spark/sql/catalyst/json/JsonInferSchema.scala | 2 +- .../org/apache/spark/sql/errors/QueryExecutionErrors.scala | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index 9fca09b46a99..b155987242b3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -411,7 +411,7 @@ class GetJsonObjectEvaluator(cachedPath: UTF8String) { p.nextToken() arrayIndex(p, () => evaluatePath(p, g, style, xs))(idx) - case (FIELD_NAME, Named(name) :: xs) if p.getCurrentName == name => + case (FIELD_NAME, Named(name) :: xs) if p.currentName == name => // exact field match if (p.nextToken() != JsonToken.VALUE_NULL) { evaluatePath(p, g, style, xs) @@ -546,7 +546,7 @@ case class JsonTuple(children: Seq[Expression]) while (parser.nextToken() != JsonToken.END_OBJECT) { if (parser.getCurrentToken == JsonToken.FIELD_NAME) { // check to see if this field is desired in the output -val jsonField = parser.getCurrentName +val jsonField = parser.currentName var idx = fieldNames.indexOf(jsonField) if (idx >= 0) { // it is, copy the child tree to the correct location in the output row @@ -1056,7 +1056,7 @@ case class JsonObjectKeys(child: Expression) extends UnaryExpression with Codege // traverse until the end of input and ensure it returns valid key while(parser.nextValue() != null && parser.currentName() != null) { // add current fieldName to the ArrayBuffer - arrayBufferOfKeys += UTF8String.fromString(parser.getCurrentName) + arrayBufferOfKeys += UTF8String.fromString(parser.currentName) // skip all the children of inner object or array parser.skipChildren() diff --git a/sql/catalyst/src/main/scala/
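A minimal Jackson walkthrough of the replacement accessor (plain `jackson-core`, independent of Spark internals):

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonToken}

// currentName() (available since Jackson 2.10) returns the same value as the
// getCurrentName() it replaces; the latter is deprecated since 2.17.
val parser = new JsonFactory().createParser("""{"a": 1, "b": 2}""")
try {
  while (parser.nextToken() != JsonToken.END_OBJECT) {
    if (parser.currentToken() == JsonToken.FIELD_NAME) {
      println(parser.currentName()) // prints a, then b
    }
  }
} finally {
  parser.close()
}
```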
(spark) branch master updated (32dfdd305aec -> d1be4fb61368)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 32dfdd305aec [SPARK-47517][CORE][SQL] Prefer Utils.bytesToString for size display add d1be4fb61368 [SPARK-47517][INFRA][FOLLOWUP] Prevent `byteCountToDisplaySize` via Scalastyle No new revisions were added by this update. Summary of changes: scalastyle-config.xml | 5 + 1 file changed, 5 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][DOCS] Fix typo in spark connect overview
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d7be50f122ed [MINOR][DOCS] Fix typo in spark connect overview d7be50f122ed is described below commit d7be50f122ed9eec5551a842e1a61a3503d308ed Author: rrueda AuthorDate: Fri Mar 22 07:41:46 2024 -0700 [MINOR][DOCS] Fix typo in spark connect overview ### What changes were proposed in this pull request? Fix a typo in the command to install the `spark-connect-repl` by replacing the `en-dash` character with a `hyphen-minus`. ### Why are the changes needed? The command doesn't work with the `en-dash`. ### Does this PR introduce _any_ user-facing change? Only documentation. ### How was this patch tested? The documentation was built and the output checked manually. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45650 from rorueda/fix-docs-connect. Authored-by: rrueda Signed-off-by: Dongjoon Hyun --- docs/spark-connect-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/spark-connect-overview.md b/docs/spark-connect-overview.md index 7a085df86c8d..268155360fcc 100644 --- a/docs/spark-connect-overview.md +++ b/docs/spark-connect-overview.md @@ -224,7 +224,7 @@ For the Scala shell, we use an Ammonite-based REPL that is currently not include To set up the new Scala shell, first download and install [Coursier CLI](https://get-coursier.io/docs/cli-installation). Then, install the REPL using the following command in a terminal window: {% highlight bash %} -cs install –-contrib spark-connect-repl +cs install --contrib spark-connect-repl {% endhighlight %} And now you can start the Ammonite-based Scala REPL/shell to connect to your Spark server like this: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
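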
(spark) branch master updated (aea13fca5d57 -> ca44489f4585)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from aea13fca5d57 [SPARK-47500][PYTHON][CONNECT] Factor column name handling out of `plan.py` add ca44489f4585 [SPARK-47499][PYTHON][CONNECT][TESTS] Enable `test_help_command` test No new revisions were added by this update. Summary of changes: python/pyspark/sql/tests/connect/test_parity_dataframe.py | 4 ++-- python/pyspark/sql/tests/test_dataframe.py| 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new e57a7d068839 [SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes e57a7d068839 is described below commit e57a7d068839d549afe08b4a79e82d027b56a5f5 Author: Kent Yao AuthorDate: Thu Mar 21 23:06:03 2024 -0700 [SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes ### What changes were proposed in this pull request? Add migration guide for TINYINT type mapping changes ### Why are the changes needed? behavior change doc ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? doc build ### Was this patch authored or co-authored using generative AI tooling? no Closes #45658 from yaooqinn/SPARK-47462-FB. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 8 1 file changed, 8 insertions(+) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index f788d89c4999..3bb83750ef92 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -22,6 +22,14 @@ license: | * Table of contents {:toc} +## Upgrading from Spark SQL 3.5.1 to 3.5.2 + +- Since 3.5.2, MySQL JDBC datasource will read TINYINT UNSIGNED as ShortType, while in 3.5.1, it was wrongly read as ByteType. + +## Upgrading from Spark SQL 3.5.0 to 3.5.1 + +- Since Spark 3.5.1, MySQL JDBC datasource will read TINYINT(n > 1) and TINYINT UNSIGNED as ByteType, while in Spark 3.5.0 and below, they were read as IntegerType. To restore the previous behavior, you can cast the column to the old type. + ## Upgrading from Spark SQL 3.4 to 3.5 - Since Spark 3.5, the JDBC options related to DS V2 pushdown are `true` by default. These options include: `pushDownAggregate`, `pushDownLimit`, `pushDownOffset` and `pushDownTableSample`. To restore the legacy behavior, please set them to `false`. e.g. set `spark.sql.catalog.your_catalog_name.pushDownAggregate` to `false`. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
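The "cast the column to the old type" workaround in concrete form (a sketch; `jdbcUrl`, the table, and the column name are hypothetical):

```scala
// Restores the pre-3.5.1 reading of a MySQL TINYINT column by casting the
// ShortType/ByteType column back to IntegerType after the read.
val df = spark.read.jdbc(jdbcUrl, "some_table", new java.util.Properties)
val legacy = df.withColumn("tiny_col", df("tiny_col").cast("int"))
```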
(spark) branch master updated (0ef7b771b33d -> 47bce8ececa8)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 0ef7b771b33d [SPARK-47501][SQL][FOLLOWUP] Rename convertDateToDate to convertJavaDateToDate add 47bce8ececa8 [MINOR][SQL] Fix a typo in `DelegateSymlinkTextInputSplit` comment No new revisions were added by this update. Summary of changes: .../org/apache/hadoop/hive/ql/io/DelegateSymlinkTextInputFormat.java| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (5042263f8668 -> 0ef7b771b33d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5042263f8668 [SPARK-47479][SQL] Optimize cannot write data to relations with multiple paths error log add 0ef7b771b33d [SPARK-47501][SQL][FOLLOWUP] Rename convertDateToDate to convertJavaDateToDate No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala | 4 ++-- sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala | 2 +- .../src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala| 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47514][SQL][TESTS] Add a test coverage for createTable method (partitioned-table) in CatalogSuite
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9ba70d6ed302 [SPARK-47514][SQL][TESTS] Add a test coverage for createTable method (partitioned-table) in CatalogSuite 9ba70d6ed302 is described below commit 9ba70d6ed3029b444d6a37835eb27c6916e5c78a Author: panbingkun AuthorDate: Thu Mar 21 20:57:25 2024 -0700 [SPARK-47514][SQL][TESTS] Add a test coverage for createTable method (partitioned-table) in CatalogSuite ### What changes were proposed in this pull request? The pr aims to add a test coverage for createTable method (`partitioned-table`) in `CatalogSuite`. ### Why are the changes needed? Currently, the UT about `createTable` the partitions are `empty`. Let's improve it. ### Does this PR introduce _any_ user-facing change? No, only for tests. ### How was this patch tested? - Manually test. - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45637 from panbingkun/minor_catalogsuites. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- .../spark/sql/connector/catalog/CatalogSuite.scala | 27 -- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogSuite.scala index 145bfd286123..e20dfd4f6051 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogSuite.scala @@ -28,7 +28,7 @@ import org.apache.spark.sql.catalyst.analysis.{NamespaceAlreadyExistsException, import org.apache.spark.sql.catalyst.parser.CatalystSqlParser import org.apache.spark.sql.catalyst.util.quoteIdentifier import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction} -import org.apache.spark.sql.connector.expressions.{LogicalExpressions, Transform} +import org.apache.spark.sql.connector.expressions.{Expressions, LogicalExpressions, Transform} import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType, LongType, StringType, StructType, TimestampType} import org.apache.spark.sql.util.CaseInsensitiveStringMap @@ -96,7 +96,7 @@ class CatalogSuite extends SparkFunSuite { assert(catalog.listTables(Array("ns2")).toSet == Set(ident3)) } - test("createTable") { + test("createTable: non-partitioned table") { val catalog = newCatalog() assert(!catalog.tableExists(testIdent)) @@ -111,6 +111,29 @@ class CatalogSuite extends SparkFunSuite { assert(catalog.tableExists(testIdent)) } + test("createTable: partitioned table") { +val partCatalog = new InMemoryPartitionTableCatalog +partCatalog.initialize("test", CaseInsensitiveStringMap.empty()) + +assert(!partCatalog.tableExists(testIdent)) + +val columns = Array( +Column.create("col0", IntegerType), +Column.create("part0", IntegerType)) +val table = partCatalog.createTable( + testIdent, + columns, + Array[Transform](Expressions.identity("part0")), + util.Collections.emptyMap[String, String]) + +val parsed = CatalystSqlParser.parseMultipartIdentifier(table.name) +assert(parsed == Seq("test", "`", ".", "test_table")) +assert(table.columns === columns) +assert(table.properties.asScala == Map()) + +assert(partCatalog.tableExists(testIdent)) + } + test("createTable: with 
properties") { val catalog = newCatalog() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (6a27789ad7d5 -> b4c09221b2e0)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6a27789ad7d5 [SPARK-47398][SQL] Extract a trait for InMemoryTableScanExec to allow for extending functionality add b4c09221b2e0 [SPARK-47502][INFRA] Make output the installation packages in `descending size`, add `titles`, and remove `unused` packages No new revisions were added by this update. Summary of changes: dev/free_disk_space | 5 +++-- dev/free_disk_space_container | 15 +-- 2 files changed, 16 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47507][BUILD][3.5] Upgrade ORC to 1.9.3
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 203f943efcb1 [SPARK-47507][BUILD][3.5] Upgrade ORC to 1.9.3 203f943efcb1 is described below commit 203f943efcb1ad699f1098c86d4bb4e46fc3bbc2 Author: Dongjoon Hyun AuthorDate: Thu Mar 21 12:16:25 2024 -0700 [SPARK-47507][BUILD][3.5] Upgrade ORC to 1.9.3 ### What changes were proposed in this pull request? This PR aims to upgrade ORC to 1.9.3 for Apache Spark 3.5.2. ### Why are the changes needed? Apache ORC 1.9.3 is the latest maintenance release. To bring the latest bug fixes, we had better upgrade. - https://orc.apache.org/news/2024/03/20/ORC-1.9.3/ - https://github.com/apache/orc/pull/1692 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45646 from dongjoon-hyun/SPARK-47507. Lead-authored-by: Dongjoon Hyun Co-authored-by: Gang Wu Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 8ecf931bf513..1cd7d5a8f2d7 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -212,9 +212,9 @@ opencsv/2.3//opencsv-2.3.jar opentracing-api/0.33.0//opentracing-api-0.33.0.jar opentracing-noop/0.33.0//opentracing-noop-0.33.0.jar opentracing-util/0.33.0//opentracing-util-0.33.0.jar -orc-core/1.9.2/shaded-protobuf/orc-core-1.9.2-shaded-protobuf.jar -orc-mapreduce/1.9.2/shaded-protobuf/orc-mapreduce-1.9.2-shaded-protobuf.jar -orc-shims/1.9.2//orc-shims-1.9.2.jar +orc-core/1.9.3/shaded-protobuf/orc-core-1.9.3-shaded-protobuf.jar +orc-mapreduce/1.9.3/shaded-protobuf/orc-mapreduce-1.9.3-shaded-protobuf.jar +orc-shims/1.9.3//orc-shims-1.9.3.jar oro/2.0.8//oro-2.0.8.jar osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar paranamer/2.8//paranamer-2.8.jar diff --git a/pom.xml b/pom.xml index fb6208777d3f..269a42d41f17 100644 --- a/pom.xml +++ b/pom.xml @@ -141,7 +141,7 @@ 10.14.2.0 1.13.1 -1.9.2 +1.9.3 shaded-protobuf 9.4.54.v20240208 4.0.3 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47505][INFRA][3.4] Fix `Pyspark-errors` test jobs for branch-3.4
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 47c698e0bac9 [SPARK-47505][INFRA][3.4] Fix `Pyspark-errors` test jobs for branch-3.4 47c698e0bac9 is described below commit 47c698e0bac9b0edecfc3f85801e4b4f8b57534a Author: panbingkun AuthorDate: Thu Mar 21 10:34:11 2024 -0700 [SPARK-47505][INFRA][3.4] Fix `Pyspark-errors` test jobs for branch-3.4 ### What changes were proposed in this pull request? The pr aims to fix `pyspark-errors` test jobs for branch-3.4. ### Why are the changes needed? Fix `pyspark-errors` test jobs for branch-3.4. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45624 from panbingkun/branch-3.4_fix_pyerrors. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 17 + dev/free_disk_space_container| 33 + 2 files changed, 50 insertions(+) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 2184577d5c44..8ae303178033 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -396,6 +396,12 @@ jobs: key: pyspark-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }} restore-keys: | pyspark-coursier- +- name: Free up disk space + shell: 'script -q -e -c "bash {0}"' + run: | +if [ -f ./dev/free_disk_space_container ]; then + ./dev/free_disk_space_container +fi - name: Install Java ${{ matrix.java }} uses: actions/setup-java@v3 with: @@ -493,6 +499,12 @@ jobs: key: sparkr-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }} restore-keys: | sparkr-coursier- +- name: Free up disk space + shell: 'script -q -e -c "bash {0}"' + run: | +if [ -f ./dev/free_disk_space_container ]; then + ./dev/free_disk_space_container +fi - name: Install Java ${{ inputs.java }} uses: actions/setup-java@v3 with: @@ -571,6 +583,11 @@ jobs: key: docs-maven-${{ hashFiles('**/pom.xml') }} restore-keys: | docs-maven- +- name: Free up disk space + run: | +if [ -f ./dev/free_disk_space_container ]; then + ./dev/free_disk_space_container +fi - name: Install Java 8 uses: actions/setup-java@v3 with: diff --git a/dev/free_disk_space_container b/dev/free_disk_space_container new file mode 100755 index ..cc3b74643e4f --- /dev/null +++ b/dev/free_disk_space_container @@ -0,0 +1,33 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +echo "==" +echo "Free up disk space on CI system" +echo "==" + +echo "Listing 100 largest packages" +dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -n | tail -n 100 +df -h + +echo "Removing large packages" +rm -rf /__t/CodeQL +rm -rf /__t/go +rm -rf /__t/node + +df -h - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47487][SQL] Simplify code in AnsiTypeCoercion
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 25ecde948beb [SPARK-47487][SQL] Simplify code in AnsiTypeCoercion 25ecde948beb is described below commit 25ecde948bebf01d2cb1e160516238e1d949ffdb Author: Wenchen Fan AuthorDate: Thu Mar 21 08:54:26 2024 -0700 [SPARK-47487][SQL] Simplify code in AnsiTypeCoercion ### What changes were proposed in this pull request? Simplify the code in `AnsiTypeCoercion.implicitCast`, to merge common code paths. ### Why are the changes needed? improve code readability ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #45612 from cloud-fan/type-coercion. Authored-by: Wenchen Fan Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/types/DataType.scala | 2 +- .../sql/catalyst/analysis/AnsiTypeCoercion.scala | 56 ++ 2 files changed, 16 insertions(+), 42 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala index b37924a6d353..16cf6224ce27 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala @@ -102,7 +102,7 @@ abstract class DataType extends AbstractDataType { */ private[spark] def existsRecursively(f: (DataType) => Boolean): Boolean = f(this) - override private[sql] def defaultConcreteType: DataType = this + final override private[sql] def defaultConcreteType: DataType = this override private[sql] def acceptsType(other: DataType): Boolean = sameType(other) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala index c70d6696ad06..92ea3ba1ca29 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala @@ -180,56 +180,30 @@ object AnsiTypeCoercion extends TypeCoercionBase { // cast the input to decimal. case (n: NumericType, DecimalType) => Some(DecimalType.forType(n)) - // Cast null type (usually from null literals) into target types - // By default, the result type is `target.defaultConcreteType`. When the target type is - // `TypeCollection`, there is another branch to find the "closet convertible data type" below. - case (NullType, target) if !target.isInstanceOf[TypeCollection] => -Some(target.defaultConcreteType) - // If a function expects a StringType, no StringType instance should be implicitly cast to // StringType with a collation that's not accepted (aka. lockdown unsupported collations). case (_: StringType, StringType) => None case (_: StringType, _: StringTypeCollated) => None - // This type coercion system will allow implicit converting String type as other - // primitive types, in case of breaking too many existing Spark SQL queries. - case (StringType, a: AtomicType) => -Some(a) - - // If the target type is any Numeric type, convert the String type as Double type. - case (StringType, NumericType) => -Some(DoubleType) - - // If the target type is any Decimal type, convert the String type as the default - // Decimal type. 
- case (StringType, DecimalType) => -Some(DecimalType.SYSTEM_DEFAULT) - - // If the target type is any timestamp type, convert the String type as the default - // Timestamp type. - case (StringType, AnyTimestampType) => -Some(AnyTimestampType.defaultConcreteType) - - case (DateType, AnyTimestampType) => -Some(AnyTimestampType.defaultConcreteType) - - case (_, target: DataType) => -if (Cast.canANSIStoreAssign(inType, target)) { - Some(target) + // If a function expects integral type, fractional input is not allowed. + case (_: FractionalType, IntegralType) => None + + // Ideally the implicit cast rule should be the same as `Cast.canANSIStoreAssign` so that it's + // consistent with table insertion. To avoid breaking too many existing Spark SQL queries, + // we make the system to allow implicitly converting String type as other primitive types. + case (StringType, a @ (_: AtomicType | NumericType | DecimalType | AnyTimestampType)) => +Some(a.defaultConcreteType) + +
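The hunk above is cut off mid-diff, but the shape of the refactor is already visible: the separate `(StringType, ...)` cases collapse into one branch that defers to the expected type's `defaultConcreteType`. Below is a toy, self-contained sketch of that idea, using stand-in types rather than the real Catalyst classes:

```scala
// Stand-ins for Catalyst's DataType / AbstractDataType hierarchy.
sealed trait AbstractType { def defaultConcreteType: ConcreteType }
sealed trait ConcreteType extends AbstractType {
  def defaultConcreteType: ConcreteType = this
}
case object StringT extends ConcreteType
case object DoubleT extends ConcreteType
case object TimestampT extends ConcreteType
case object NumericT extends AbstractType {
  def defaultConcreteType: ConcreteType = DoubleT
}
case object AnyTimestampT extends AbstractType {
  def defaultConcreteType: ConcreteType = TimestampT
}

object CoercionSketch extends App {
  def implicitCast(in: ConcreteType, expected: AbstractType): Option[ConcreteType] =
    (in, expected) match {
      // One case replaces the per-target String branches removed in the diff:
      // the expected type itself knows its default concrete type.
      case (StringT, t @ (NumericT | AnyTimestampT)) => Some(t.defaultConcreteType)
      case _ => None
    }

  assert(implicitCast(StringT, NumericT).contains(DoubleT))
  assert(implicitCast(StringT, AnyTimestampT).contains(TimestampT))
}
```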
(spark) branch master updated: [SPARK-47501][SQL] Add convertDateToDate like the existing convertTimestampToTimestamp for JdbcDialect
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 32f3d4dc0389 [SPARK-47501][SQL] Add convertDateToDate like the existing convertTimestampToTimestamp for JdbcDialect 32f3d4dc0389 is described below commit 32f3d4dc03892f23bc073205335ad0d4174c213a Author: Kent Yao AuthorDate: Thu Mar 21 07:48:41 2024 -0700 [SPARK-47501][SQL] Add convertDateToDate like the existing convertTimestampToTimestamp for JdbcDialect ### What changes were proposed in this pull request? Add convertDateToDate like the existing convertTimestampToTimestamp for JdbcDialect ### Why are the changes needed? The date '±infinity' values cause overflows like timestamp '±infinity' in #41843 ### Does this PR introduce _any_ user-facing change? fix expected overflow for dates to align with the timestamps of PostgreSQL ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45638 from yaooqinn/SPARK-47501. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 24 ++ .../sql/execution/datasources/jdbc/JdbcUtils.scala | 6 ++-- .../org/apache/spark/sql/jdbc/JdbcDialects.scala | 8 + .../apache/spark/sql/jdbc/PostgresDialect.scala| 37 +++--- 4 files changed, 54 insertions(+), 21 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala index 8d137ba88cb1..a47e834a4b3c 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala @@ -155,6 +155,14 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite { "('-infinity', ARRAY[TIMESTAMP '-infinity'])") .executeUpdate() +conn.prepareStatement("CREATE TABLE infinity_dates" + +"(id SERIAL PRIMARY KEY, date_column DATE, date_array DATE[])") + .executeUpdate() +conn.prepareStatement("INSERT INTO infinity_dates (date_column, date_array)" + +" VALUES ('infinity', ARRAY[DATE 'infinity']), " + +"('-infinity', ARRAY[DATE '-infinity'])") + .executeUpdate() + conn.prepareStatement("CREATE DOMAIN not_null_text AS TEXT DEFAULT ''").executeUpdate() conn.prepareStatement("create table custom_type(type_array not_null_text[]," + "type not_null_text)").executeUpdate() @@ -462,6 +470,22 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite { assert(negativeInfinitySeq.head.getTime == minTimeStamp) } + test("SPARK-47501: infinity date test") { +val df = sqlContext.read.jdbc(jdbcUrl, "infinity_dates", new Properties) +val row = df.collect() + +assert(row.length == 2) +val infinity = row(0).getDate(1) +val negativeInfinity = row(1).getDate(1) +val infinitySeq = row(0).getAs[scala.collection.Seq[Date]]("date_array") +val negativeInfinitySeq = row(1).getAs[scala.collection.Seq[Date]]("date_array") +val minDate = -6213565440L +val maxDate = 25340215680L +assert(infinity.getTime == maxDate) +assert(negativeInfinity.getTime == minDate) +assert(infinitySeq.head.getTime == maxDate) +assert(negativeInfinitySeq.head.getTime == minDate) + } test("SPARK-47407: Support java.sql.Types.NULL for NullType") { val df = spark.read.format("jdbc") 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index 84d87f008217..70fd9bd071e9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -404,7 +404,7 @@ object JdbcUtils extends Logging with SQLConfHelper { // DateTimeUtils.fromJavaDate does not handle null value, so we need to check it. val dateVal = rs.getDate(pos + 1) if (dateVal != null) { - row.setInt(pos, fromJavaDate(dateVal)) + row.setInt(pos, fromJavaDate(dialect.convertDateToDate(dateVal))) } else { row.update(pos, null)
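The PostgresDialect hunk itself is not visible in this excerpt, so the following is only a hypothetical sketch of a dialect-side `convertDateToDate` in the style of the existing `convertTimestampToTimestamp`. The sentinel epoch-milli constants and the clamp bounds are illustrative assumptions, not the committed code:

```scala
import java.sql.Date

object DateClampSketch {
  // Assumed sentinels for PostgreSQL's date 'infinity' / '-infinity';
  // the real driver constants may differ.
  private val PgDatePositiveInfinity = 9223372036825200000L
  private val PgDateNegativeInfinity = -9223372036832400000L
  private val MaxDate = Date.valueOf("9999-12-31")
  private val MinDate = Date.valueOf("0001-01-01")

  def convertDateToDate(d: Date): Date = d.getTime match {
    case PgDatePositiveInfinity => MaxDate // clamp 'infinity' before fromJavaDate
    case PgDateNegativeInfinity => MinDate // clamp '-infinity' likewise
    case _ => d                            // ordinary dates pass through
  }
}
```

With an override of this shape in place, the `JdbcUtils` change above simply threads every date through the dialect hook, so PostgreSQL's open-ended values no longer overflow the days-since-epoch conversion.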
(spark) branch master updated: [SPARK-45393][BUILD][FOLLOWUP] Update IsolatedClientLoader fallback Hadoop version to 3.4.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3a1609af41e4 [SPARK-45393][BUILD][FOLLOWUP] Update IsolatedClientLoader fallback Hadoop version to 3.4.0 3a1609af41e4 is described below commit 3a1609af41e46ffaefddd4faabb24c284c108254 Author: Cheng Pan AuthorDate: Wed Mar 20 23:58:00 2024 -0700 [SPARK-45393][BUILD][FOLLOWUP] Update IsolatedClientLoader fallback Hadoop version to 3.4.0 ### What changes were proposed in this pull request? Update IsolatedClientLoader fallback Hadoop version to 3.4.0 ### Why are the changes needed? Sync with the default Hadoop version ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45628 from pan3793/SPARK-45393-followup. Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala index 5693041d21f9..99fa0d46b903 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala @@ -66,7 +66,7 @@ private[hive] object IsolatedClientLoader extends Logging { case e: RuntimeException if e.getMessage.contains("hadoop") => // If the error message contains hadoop, it is probably because the hadoop // version cannot be resolved. -val fallbackVersion = "3.3.6" +val fallbackVersion = "3.4.0" logWarning(s"Failed to resolve Hadoop artifacts for the version $hadoopVersion. We " + s"will change the hadoop version from $hadoopVersion to $fallbackVersion and try " + "again. It is recommended to set jars used by Hive metastore client through " + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-41888][PYTHON][CONNECT][TESTS] Enable doctest for `DataFrame.observe`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 91df046e0825 [SPARK-41888][PYTHON][CONNECT][TESTS] Enable doctest for `DataFrame.observe` 91df046e0825 is described below commit 91df046e0825c8916107de1cdf7c4a022fb1a53d Author: Ruifeng Zheng AuthorDate: Wed Mar 20 23:26:30 2024 -0700 [SPARK-41888][PYTHON][CONNECT][TESTS] Enable doctest for `DataFrame.observe` ### What changes were proposed in this pull request? Enable doctest for `DataFrame.observe` ### Why are the changes needed? for test coverage ### Does this PR introduce _any_ user-facing change? no, test-only ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #45627 from zhengruifeng/enable_listener_doctest. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/connect/dataframe.py | 3 --- 1 file changed, 3 deletions(-) diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py index 7171bee24bd7..741606c89aa4 100644 --- a/python/pyspark/sql/connect/dataframe.py +++ b/python/pyspark/sql/connect/dataframe.py @@ -2243,9 +2243,6 @@ def _test() -> None: globs = pyspark.sql.connect.dataframe.__dict__.copy() -# TODO(SPARK-41888): Support StreamingQueryListener for DataFrame.observe -del pyspark.sql.connect.dataframe.DataFrame.observe.__doc__ - globs["spark"] = ( PySparkSession.builder.appName("sql.connect.dataframe tests") .remote("local[4]") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new a79101775c6f [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` a79101775c6f is described below commit a79101775c6fc83e61d0fd393dace7b96286bb38 Author: Dongjoon Hyun AuthorDate: Wed Mar 20 22:01:37 2024 -0700 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` ### What changes were proposed in this pull request? This PR aims to fix a typo `slf4j-to-jul` to `jul-to-slf4j`. There exists only one. ``` $ git grep slf4j-to-jul common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:// slf4j-to-jul bridge order to route their logs to JUL. ``` Apache Spark uses `jul-to-slf4j` which includes a `java.util.logging` (jul) handler, namely `SLF4JBridgeHandler`, which routes all incoming jul records to the SLF4j API. https://github.com/apache/spark/blob/bb3e27581887a094ead0d2f7b4a6b2a17ee84b6f/pom.xml#L735 ### Why are the changes needed? This typo was there since Apache Spark 1.0.0. ### Does this PR introduce _any_ user-facing change? No, this is a comment fix. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45625 from dongjoon-hyun/jul-to-slf4j. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit bb0867f54d437f6467274e854506aea2900bceb1) Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/internal/Logging.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/internal/Logging.scala b/core/src/main/scala/org/apache/spark/internal/Logging.scala index 614103dee7b7..22c51e999771 100644 --- a/core/src/main/scala/org/apache/spark/internal/Logging.scala +++ b/core/src/main/scala/org/apache/spark/internal/Logging.scala @@ -193,7 +193,7 @@ private[spark] object Logging { val initLock = new Object() try { // We use reflection here to handle the case where users remove the -// slf4j-to-jul bridge order to route their logs to JUL. +// jul-to-slf4j bridge order to route their logs to JUL. val bridgeClass = Utils.classForName("org.slf4j.bridge.SLF4JBridgeHandler") bridgeClass.getMethod("removeHandlersForRootLogger").invoke(null) val installed = bridgeClass.getMethod("isInstalled").invoke(null).asInstanceOf[Boolean] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new e17fdba1f507 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` e17fdba1f507 is described below commit e17fdba1f507fda816dcf5af0f15684399f5b7f8 Author: Dongjoon Hyun AuthorDate: Wed Mar 20 22:01:37 2024 -0700 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` ### What changes were proposed in this pull request? This PR aims to fix a typo `slf4j-to-jul` to `jul-to-slf4j`. There exists only one. ``` $ git grep slf4j-to-jul common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:// slf4j-to-jul bridge order to route their logs to JUL. ``` Apache Spark uses `jul-to-slf4j` which includes a `java.util.logging` (jul) handler, namely `SLF4JBridgeHandler`, which routes all incoming jul records to the SLF4j API. https://github.com/apache/spark/blob/bb3e27581887a094ead0d2f7b4a6b2a17ee84b6f/pom.xml#L735 ### Why are the changes needed? This typo was there since Apache Spark 1.0.0. ### Does this PR introduce _any_ user-facing change? No, this is a comment fix. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45625 from dongjoon-hyun/jul-to-slf4j. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit bb0867f54d437f6467274e854506aea2900bceb1) Signed-off-by: Dongjoon Hyun --- common/utils/src/main/scala/org/apache/spark/internal/Logging.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala index 83e01330ce3f..bd82ce962b8d 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala +++ b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala @@ -196,7 +196,7 @@ private[spark] object Logging { val initLock = new Object() try { // We use reflection here to handle the case where users remove the -// slf4j-to-jul bridge order to route their logs to JUL. +// jul-to-slf4j bridge order to route their logs to JUL. val bridgeClass = SparkClassUtils.classForName("org.slf4j.bridge.SLF4JBridgeHandler") bridgeClass.getMethod("removeHandlersForRootLogger").invoke(null) val installed = bridgeClass.getMethod("isInstalled").invoke(null).asInstanceOf[Boolean] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bb0867f54d43 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` bb0867f54d43 is described below commit bb0867f54d437f6467274e854506aea2900bceb1 Author: Dongjoon Hyun AuthorDate: Wed Mar 20 22:01:37 2024 -0700 [MINOR][CORE] Fix a comment typo `slf4j-to-jul` to `jul-to-slf4j` ### What changes were proposed in this pull request? This PR aims to fix a typo `slf4j-to-jul` to `jul-to-slf4j`. There exists only one. ``` $ git grep slf4j-to-jul common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:// slf4j-to-jul bridge order to route their logs to JUL. ``` Apache Spark uses `jul-to-slf4j` which includes a `java.util.logging` (jul) handler, namely `SLF4JBridgeHandler`, which routes all incoming jul records to the SLF4j API. https://github.com/apache/spark/blob/bb3e27581887a094ead0d2f7b4a6b2a17ee84b6f/pom.xml#L735 ### Why are the changes needed? This typo was there since Apache Spark 1.0.0. ### Does this PR introduce _any_ user-facing change? No, this is a comment fix. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45625 from dongjoon-hyun/jul-to-slf4j. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- common/utils/src/main/scala/org/apache/spark/internal/Logging.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala index 80c622bd5328..c2f61e4d7804 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala +++ b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala @@ -196,7 +196,7 @@ private[spark] object Logging { val initLock = new Object() try { // We use reflection here to handle the case where users remove the -// slf4j-to-jul bridge order to route their logs to JUL. +// jul-to-slf4j bridge order to route their logs to JUL. val bridgeClass = SparkClassUtils.classForName("org.slf4j.bridge.SLF4JBridgeHandler") bridgeClass.getMethod("removeHandlersForRootLogger").invoke(null) val installed = bridgeClass.getMethod("isInstalled").invoke(null).asInstanceOf[Boolean] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
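For readers who have not met the bridge: when `jul-to-slf4j` is simply on the classpath, installing `SLF4JBridgeHandler` is two static calls (Spark's code above goes through reflection only so the bridge stays optional). A small sketch, assuming the `org.slf4j:jul-to-slf4j` artifact is available:

```scala
import java.util.logging.{Logger => JulLogger}
import org.slf4j.bridge.SLF4JBridgeHandler

object JulBridgeDemo {
  def main(args: Array[String]): Unit = {
    SLF4JBridgeHandler.removeHandlersForRootLogger() // drop JUL's default console handler
    SLF4JBridgeHandler.install()                     // re-route JUL records to SLF4J
    JulLogger.getLogger("demo").info("delivered via the bound SLF4J backend")
  }
}
```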
(spark) branch branch-3.4 updated: [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 622ab532bb52 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 622ab532bb52 is described below commit 622ab532bb52a41d8e6ab0f6b4997350a63ec10b Author: Gengliang Wang AuthorDate: Wed Mar 20 15:17:23 2024 -0700 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 ### What changes were proposed in this pull request? Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 ### Why are the changes needed? Show the behavior change to users. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? It's just doc change ### Was this patch authored or co-authored using generative AI tooling? Yes, there are some doc suggestion from copilot in docs/sql-migration-guide.md Closes #45623 from gengliangwang/SPARK-47494. Authored-by: Gengliang Wang Signed-off-by: Dongjoon Hyun (cherry picked from commit 11247d804cd370aaeb88736a706c587e7f5c83b3) Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index b83745e75c79..b3b1fb2122e8 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -93,6 +93,8 @@ license: | - Since Spark 3.3, the `unbase64` function throws error for a malformed `str` input. Use `try_to_binary(<str>, 'base64')` to tolerate malformed input and return NULL instead. In Spark 3.2 and earlier, the `unbase64` function returns a best-efforts result for a malformed `str` input. + - Since Spark 3.3, when reading Parquet files that were not produced by Spark, Parquet timestamp columns with annotation `isAdjustedToUTC = false` are inferred as TIMESTAMP_NTZ type during schema inference. In Spark 3.2 and earlier, these columns are inferred as TIMESTAMP type. To restore the behavior before Spark 3.3, you can set `spark.sql.parquet.inferTimestampNTZ.enabled` to `false`. + - Since Spark 3.3.1 and 3.2.3, for `SELECT ... GROUP BY a GROUPING SETS (b)`-style SQL statements, `grouping__id` returns different values from Apache Spark 3.2.0, 3.2.1, 3.2.2, and 3.3.0. It computes based on user-given group-by expressions plus grouping set columns. To restore the behavior before 3.3.1 and 3.2.3, you can set `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`. For details, see [SPARK-40218](https://issues.apache.org/jira/browse/SPARK-40218) and [SPARK-40562](https:/ [...] ## Upgrading from Spark SQL 3.1 to 3.2 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 430a407c3963 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 430a407c3963 is described below commit 430a407c39633637dba738482877edf806561ba7 Author: Gengliang Wang AuthorDate: Wed Mar 20 15:17:23 2024 -0700 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 ### What changes were proposed in this pull request? Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 ### Why are the changes needed? Show the behavior change to users. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? It's just doc change ### Was this patch authored or co-authored using generative AI tooling? Yes, there are some doc suggestion from copilot in docs/sql-migration-guide.md Closes #45623 from gengliangwang/SPARK-47494. Authored-by: Gengliang Wang Signed-off-by: Dongjoon Hyun (cherry picked from commit 11247d804cd370aaeb88736a706c587e7f5c83b3) Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 0e54c33c6d12..f788d89c4999 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -99,6 +99,8 @@ license: | - Since Spark 3.3, the `unbase64` function throws error for a malformed `str` input. Use `try_to_binary(<str>, 'base64')` to tolerate malformed input and return NULL instead. In Spark 3.2 and earlier, the `unbase64` function returns a best-efforts result for a malformed `str` input. + - Since Spark 3.3, when reading Parquet files that were not produced by Spark, Parquet timestamp columns with annotation `isAdjustedToUTC = false` are inferred as TIMESTAMP_NTZ type during schema inference. In Spark 3.2 and earlier, these columns are inferred as TIMESTAMP type. To restore the behavior before Spark 3.3, you can set `spark.sql.parquet.inferTimestampNTZ.enabled` to `false`. + - Since Spark 3.3.1 and 3.2.3, for `SELECT ... GROUP BY a GROUPING SETS (b)`-style SQL statements, `grouping__id` returns different values from Apache Spark 3.2.0, 3.2.1, 3.2.2, and 3.3.0. It computes based on user-given group-by expressions plus grouping set columns. To restore the behavior before 3.3.1 and 3.2.3, you can set `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`. For details, see [SPARK-40218](https://issues.apache.org/jira/browse/SPARK-40218) and [SPARK-40562](https:/ [...] ## Upgrading from Spark SQL 3.1 to 3.2 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (b8e7d99d417a -> 11247d804cd3)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b8e7d99d417a [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning add 11247d804cd3 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3 No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 2 ++ 1 file changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
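A hedged illustration of the behavior the new migration note documents; the conf name comes from the doc text itself, while the session setup and file path are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object NtzInferenceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    // Since 3.3: a non-Spark Parquet file whose timestamp column carries
    // isAdjustedToUTC = false is inferred as TIMESTAMP_NTZ.
    spark.read.parquet("/path/to/external.parquet").printSchema()
    // Restoring the pre-3.3 TIMESTAMP inference:
    spark.conf.set("spark.sql.parquet.inferTimestampNTZ.enabled", "false")
    spark.read.parquet("/path/to/external.parquet").printSchema()
    spark.stop()
  }
}
```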
(spark) branch master updated: [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b8e7d99d417a [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning b8e7d99d417a is described below commit b8e7d99d417ab4bcc3e69d11a0eee5864cb083e3 Author: Anish Shrigondekar AuthorDate: Wed Mar 20 15:11:51 2024 -0700 [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning ### What changes were proposed in this pull request? Fix RocksDB Logger constructor use to avoid deprecation warning ### Why are the changes needed? With the latest RocksDB upgrade, the Logger constructor used was deprecated which was throwing a compiler warning. ``` [warn] val dbLogger = new Logger(dbOptions) { [warn]^ [warn] one warning found [warn] two warnings found [info] compiling 36 Scala sources and 16 Java sources to /Users/anish.shrigondekar/spark/spark/sql/core/target/scala-2.13/classes ... [warn] -target is deprecated: Use -release instead to compile against the correct platform API. [warn] Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation [warn] /Users/anish.shrigondekar/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:851:24: constructor Logger in class Logger is deprecated [warn] Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.execution.streaming.state.RocksDB.createLogger.dbLogger, origin=org.rocksdb.Logger. ``` Updated to use the new recommendation as mentioned here - https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Logger.html Recommendation: ``` [Logger](https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/Logger.html#Logger-org.rocksdb.DBOptions-)([DBOptions](https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/DBOptions.html) dboptions) Deprecated. Use [Logger(InfoLogLevel)](https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/Logger.html#Logger-org.rocksdb.InfoLogLevel-) instead, e.g. new Logger(dbOptions.infoLogLevel()). ``` After the fix, the warning is not seen. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #45616 from anishshri-db/task/SPARK-47490. Authored-by: Anish Shrigondekar Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala index 950baba9031b..8fad5ce7bd6a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala @@ -848,7 +848,7 @@ class RocksDB( /** Create a native RocksDB logger that forwards native logs to log4j with correct log levels. 
*/ private def createLogger(): Logger = { -val dbLogger = new Logger(dbOptions) { +val dbLogger = new Logger(dbOptions.infoLogLevel()) { override def log(infoLogLevel: InfoLogLevel, logMsg: String) = { // Map DB log level to log4j levels // Warn is mapped to info because RocksDB warn is too verbose - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
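A sketch of the forwarding pattern the one-line fix sits in: build the native logger with the options' level (the non-deprecated constructor quoted in the message) and translate RocksDB levels onto an application logger. Only the warn-to-info choice is grounded in the diff's comment; the rest of the mapping is an assumption:

```scala
import org.rocksdb.{DBOptions, InfoLogLevel, Logger}
import org.slf4j.LoggerFactory

object RocksDbLoggerSketch {
  private val slf4j = LoggerFactory.getLogger("RocksDB")

  def createLogger(dbOptions: DBOptions): Logger =
    new Logger(dbOptions.infoLogLevel()) { // non-deprecated constructor
      override def log(level: InfoLogLevel, msg: String): Unit = level match {
        case InfoLogLevel.FATAL_LEVEL | InfoLogLevel.ERROR_LEVEL => slf4j.error(msg)
        // warn maps to info because RocksDB warn is too verbose (per the diff)
        case InfoLogLevel.WARN_LEVEL | InfoLogLevel.INFO_LEVEL => slf4j.info(msg)
        case _ => slf4j.debug(msg)
      }
    }
}
```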
(spark) branch master updated: [SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f66274e92d1c [SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method f66274e92d1c is described below commit f66274e92d1ce6e65fecd45711da59eb08a9d296 Author: yangjie01 AuthorDate: Wed Mar 20 15:10:49 2024 -0700 [SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method ### What changes were proposed in this pull request? The private method `getString` in `ArrowDeserializers` is no longer used after SPARK-44449 | https://github.com/apache/spark/pull/42076, this pr removes it. ### Why are the changes needed? Code clean up. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #45610 from LuciferYang/SPARK-47486. Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- .../spark/sql/connect/client/arrow/ArrowDeserializer.scala | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala index ac9619487f02..eaf2927863ec 100644 --- a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala +++ b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala @@ -29,10 +29,9 @@ import scala.collection.mutable import scala.reflect.ClassTag import org.apache.arrow.memory.BufferAllocator -import org.apache.arrow.vector.{FieldVector, VarCharVector, VectorSchemaRoot} +import org.apache.arrow.vector.{FieldVector, VectorSchemaRoot} import org.apache.arrow.vector.complex.{ListVector, MapVector, StructVector} import org.apache.arrow.vector.ipc.ArrowReader -import org.apache.arrow.vector.util.Text import org.apache.spark.sql.catalyst.ScalaReflection import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder @@ -468,16 +467,6 @@ object ArrowDeserializers { private def isTuple(cls: Class[_]): Boolean = cls.getName.startsWith("scala.Tuple") - private def getString(v: VarCharVector, i: Int): String = { -// This is currently a bit heavy on allocations: -// - byte array created in VarCharVector.get -// - CharBuffer created CharSetEncoder -// - char array in String -// By using direct buffers and reusing the char buffer -// we could get rid of the first two allocations. -Text.decode(v.get(i)) - } - private def loadListIntoBuilder( v: ListVector, i: Int, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 49b4c3bc9c09 [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0 49b4c3bc9c09 is described below commit 49b4c3bc9c09325de941dfaf41e4fd3a4a4c345f Author: Dongjoon Hyun AuthorDate: Wed Mar 20 10:37:51 2024 -0700 [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0 ### What changes were proposed in this pull request? This PR aims to upgrade to Apache Hadoop 3.4.0 for Apache Spark 4.0.0. ### Why are the changes needed? To bring the new features like the following - https://hadoop.apache.org/docs/r3.4.0 - [HADOOP-18995](https://issues.apache.org/jira/browse/HADOOP-18995) Upgrade AWS SDK version to 2.21.33 for `S3 Express One Zone` - [HADOOP-18328](https://issues.apache.org/jira/browse/HADOOP-18328) Supports `S3 on Outposts` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45583 from dongjoon-hyun/SPARK-45393. Lead-authored-by: Dongjoon Hyun Co-authored-by: YangJie Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 27 -- pom.xml| 2 +- .../spark/deploy/yarn/YarnClusterSuite.scala | 3 ++- 3 files changed, 18 insertions(+), 14 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 86da61d89149..903c7a245af3 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -9,7 +9,7 @@ algebra_2.13/2.8.0//algebra_2.13-2.8.0.jar aliyun-java-sdk-core/4.5.10//aliyun-java-sdk-core-4.5.10.jar aliyun-java-sdk-kms/2.11.0//aliyun-java-sdk-kms-2.11.0.jar aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar -aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar +aliyun-sdk-oss/3.13.2//aliyun-sdk-oss-3.13.2.jar annotations/17.0.0//annotations-17.0.0.jar antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar @@ -24,7 +24,6 @@ audience-annotations/0.12.0//audience-annotations-0.12.0.jar avro-ipc/1.11.3//avro-ipc-1.11.3.jar avro-mapred/1.11.3//avro-mapred-1.11.3.jar avro/1.11.3//avro-1.11.3.jar -aws-java-sdk-bundle/1.12.367//aws-java-sdk-bundle-1.12.367.jar azure-data-lake-store-sdk/2.3.9//azure-data-lake-store-sdk-2.3.9.jar azure-keyvault-core/1.0.0//azure-keyvault-core-1.0.0.jar azure-storage/7.0.1//azure-storage-7.0.1.jar @@ -32,6 +31,7 @@ blas/3.0.3//blas-3.0.3.jar bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar +bundle/2.23.19//bundle-2.23.19.jar cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar chill-java/0.10.0//chill-java-0.10.0.jar chill_2.13/0.10.0//chill_2.13-0.10.0.jar @@ -65,21 +65,23 @@ derbytools/10.16.1.1//derbytools-10.16.1.1.jar dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar +esdk-obs-java/3.20.4.2//esdk-obs-java-3.20.4.2.jar flatbuffers-java/23.5.26//flatbuffers-java-23.5.26.jar gcs-connector/hadoop3-2.2.20/shaded/gcs-connector-hadoop3-2.2.20-shaded.jar gmetric4j/1.0.10//gmetric4j-1.0.10.jar gson/2.2.4//gson-2.2.4.jar guava/14.0.1//guava-14.0.1.jar -hadoop-aliyun/3.3.6//hadoop-aliyun-3.3.6.jar 
-hadoop-annotations/3.3.6//hadoop-annotations-3.3.6.jar -hadoop-aws/3.3.6//hadoop-aws-3.3.6.jar -hadoop-azure-datalake/3.3.6//hadoop-azure-datalake-3.3.6.jar -hadoop-azure/3.3.6//hadoop-azure-3.3.6.jar -hadoop-client-api/3.3.6//hadoop-client-api-3.3.6.jar -hadoop-client-runtime/3.3.6//hadoop-client-runtime-3.3.6.jar -hadoop-cloud-storage/3.3.6//hadoop-cloud-storage-3.3.6.jar -hadoop-shaded-guava/1.1.1//hadoop-shaded-guava-1.1.1.jar -hadoop-yarn-server-web-proxy/3.3.6//hadoop-yarn-server-web-proxy-3.3.6.jar +hadoop-aliyun/3.4.0//hadoop-aliyun-3.4.0.jar +hadoop-annotations/3.4.0//hadoop-annotations-3.4.0.jar +hadoop-aws/3.4.0//hadoop-aws-3.4.0.jar +hadoop-azure-datalake/3.4.0//hadoop-azure-datalake-3.4.0.jar +hadoop-azure/3.4.0//hadoop-azure-3.4.0.jar +hadoop-client-api/3.4.0//hadoop-client-api-3.4.0.jar +hadoop-client-runtime/3.4.0//hadoop-client-runtime-3.4.0.jar +hadoop-cloud-storage/3.4.0//hadoop-cloud-storage-3.4.0.jar +hadoop-huaweicloud/3.4.0//hadoop-huaweicloud-3.4.0.jar +hadoop-shaded-guava/1.2.0//hadoop-shaded-guava-1.2.0.jar +hadoop-yarn-server-web-proxy/3.4.0//hadoop-yarn-server-web-proxy-3.4.0.jar hive-beeline/2.3.9//hive-beeline-2.3.9.jar hive-cli/2.3.9//hive
(spark) branch master updated: [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a34c8ceb19bd [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect a34c8ceb19bd is described below commit a34c8ceb19bd1c1548a60bb144d1c587a2861cd8 Author: Kent Yao AuthorDate: Wed Mar 20 09:31:26 2024 -0700 [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect ### What changes were proposed in this pull request? Align mappings of other unsigned numeric types with TINYINT in MySQLDialect. TINYINT maps to ByteType and TINYINT UNSIGNED maps to ShortType. In this PR, we - map SMALLINT to ShortType, SMALLINT UNSIGNED to IntegerType. W/o this, both of them map to IntegerType - map MEDIUMINT UNSIGNED to IntegerType, and MEDIUMINT is AS-IS. W/o this, MEDIUMINT UNSIGNED uses LongType Other unsigned/signed types remain unchanged; we only improve their test coverage. ### Why are the changes needed? Consistency and efficiency while reading MySQL numeric values ### Does this PR introduce _any_ user-facing change? yes, the mappings described in the 1st section. ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45588 from yaooqinn/SPARK-47462. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 39 ++ .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 10 ++ 2 files changed, 42 insertions(+), 7 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index 3d65b4f305b3..5b2214f2efd6 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -53,11 +53,19 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits BIT(10), " + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci DECIMAL(40,20), flt FLOAT, " - + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate() + + "dbl DOUBLE, tiny TINYINT)").executeUpdate() conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', " + "17, 7, 123456789, 123456789012345, 123456789012345.123456789012345, " - + "42.75, 1.0002, -128, 255)").executeUpdate() + + "42.75, 1.0002, -128)").executeUpdate() + +conn.prepareStatement("CREATE TABLE unsigned_numbers (" + + "tiny TINYINT UNSIGNED, small SMALLINT UNSIGNED, med MEDIUMINT UNSIGNED," + + "nor INT UNSIGNED, big BIGINT UNSIGNED, deci DECIMAL(40,20) UNSIGNED," + + "dbl DOUBLE UNSIGNED)").executeUpdate() + +conn.prepareStatement("INSERT INTO unsigned_numbers VALUES (255, 65535, 16777215, 4294967295," + + "9223372036854775808, 123456789012345.123456789012345, 1.0002)").executeUpdate() conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " + "yr YEAR)").executeUpdate() @@ -87,10 +95,10 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { val rows = df.collect() assert(rows.length == 1) val types = rows(0).toSeq.map(x => x.getClass.toString) -assert(types.length == 11)
+assert(types.length == 10) assert(types(0).equals("class java.lang.Boolean")) assert(types(1).equals("class java.lang.Long")) -assert(types(2).equals("class java.lang.Integer")) +assert(types(2).equals("class java.lang.Short")) assert(types(3).equals("class java.lang.Integer")) assert(types(4).equals("class java.lang.Integer")) assert(types(5).equals("class java.lang.Long")) @@ -98,10 +106,9 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(types(7).equals("class java.lang.Double")) assert(types(8).equals("class java.lang.Double")) assert(types(9).equals("class java.lang.Byte")) -assert(types(10).equals("class java.lang.Short")) assert(rows(0).getBoolean(0) == false) assert(rows
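The MySQLDialect hunk itself is truncated above, so the following is only a sketch of the mapping rules the description spells out; the helper shape and the UNSIGNED detection via the JDBC type name are assumptions:

```scala
import java.sql.Types
import org.apache.spark.sql.types._

object MySqlTypeMappingSketch {
  // sqlType is the java.sql.Types code; typeName is the driver-reported name,
  // e.g. "SMALLINT UNSIGNED". Returning None falls back to the default mapping.
  def getCatalystType(sqlType: Int, typeName: String): Option[DataType] = {
    val isUnsigned = typeName.toUpperCase.contains("UNSIGNED")
    sqlType match {
      case Types.TINYINT  => Some(if (isUnsigned) ShortType else ByteType)
      case Types.SMALLINT => Some(if (isUnsigned) IntegerType else ShortType)
      // MEDIUMINT UNSIGNED maxes out at 16777215, which a signed Int holds.
      case Types.INTEGER if typeName.toUpperCase.startsWith("MEDIUMINT") =>
        Some(IntegerType)
      case _ => None
    }
  }
}
```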
(spark) branch branch-3.5 updated: [SPARK-47481][INFRA][3.5] Fix Python linter
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 9baf82b1c97a [SPARK-47481][INFRA][3.5] Fix Python linter 9baf82b1c97a is described below commit 9baf82b1c97a792a3733dedccf1c03737b592bbd Author: panbingkun AuthorDate: Wed Mar 20 07:19:29 2024 -0700 [SPARK-47481][INFRA][3.5] Fix Python linter ### What changes were proposed in this pull request? The pr aims to fix `python linter issue` on `branch-3.5` through pinning `matplotlib==3.7.2` ### Why are the changes needed? Fix `python linter issue` on `branch-3.5`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45550 from panbingkun/branch-3.5_scheduled_job. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/infra/Dockerfile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index d3fcd7ab3622..f0b88666c040 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -65,10 +65,10 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" RUN pypy3 -m pip install numpy 'pandas<=2.0.3' scipy coverage matplotlib -RUN python3.9 -m pip install numpy 'pyarrow==12.0.1' 'pandas<=2.0.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' +RUN python3.9 -m pip install 'numpy==1.25.1' 'pyarrow==12.0.1' 'pandas<=2.0.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage 'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' # Add Python deps for Spark Connect. RUN python3.9 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 'protobuf==3.20.3' 'googleapis-common-protos==1.56.4' # Add torch as a testing dependency for TorchDistributor -RUN python3.9 -m pip install torch torchvision torcheval +RUN python3.9 -m pip install 'torch==2.0.1' 'torchvision==0.15.2' torcheval - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 4de8000f21a4 [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure 4de8000f21a4 is described below commit 4de8000f21a48796d30af37bc57269395792a254 Author: panbingkun AuthorDate: Wed Mar 20 07:15:32 2024 -0700 [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure ### What changes were proposed in this pull request? The pr aims to fix `python linter issue` on branch-3.4 through pinning `matplotlib<3.3.0` ### Why are the changes needed? - Through this PR https://github.com/apache/spark/pull/45600, we found that the version of `matplotlib` in our Docker image was `3.8.2`, which clearly did not meet the original requirements for `branch-3.4`. https://github.com/panbingkun/spark/actions/runs/8354370179/job/22869580038 https://github.com/apache/spark/assets/15246973/dd425bfb-ce5f-4a99-a487-a462d6e9 https://github.com/apache/spark/blob/branch-3.4/dev/requirements.txt#L12 https://github.com/apache/spark/assets/15246973/70485648-b886-4218-bb21-c41a85d5eecf - Fix as follows: https://github.com/apache/spark/assets/15246973/db31d8fb-0b6c-4925-95e1-0ca0247bb9f5 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45608 from panbingkun/branch_3.4_pin_matplotlib. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/infra/Dockerfile | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 68d27052437b..5ebd10339be9 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -37,6 +37,7 @@ RUN add-apt-repository ppa:pypy/ppa RUN apt update RUN $APT_INSTALL gfortran libopenblas-dev liblapack-dev RUN $APT_INSTALL build-essential +RUN $APT_INSTALL python3-matplotlib RUN mkdir -p /usr/local/pypy/pypy3.7 && \ curl -sqL https://downloads.python.org/pypy/pypy3.7-v7.3.7-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.7 --strip-components=1 && \ @@ -64,8 +65,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht # See more in SPARK-39735 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" -RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib -RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' +RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage 'matplotlib<3.3.0' +RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage 'matplotlib<3.3.0' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' # Add Python deps for Spark Connect. RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new d25f49a14733 [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` d25f49a14733 is described below commit d25f49a14733c5a0e872498cab40a30a5ebc28b4 Author: Dongjoon Hyun AuthorDate: Tue Mar 19 20:53:45 2024 -0700 [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` ### What changes were proposed in this pull request? This PR aims to pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` to recover the following test failure. ### Why are the changes needed? `numpy==1.23.5` was the version of the last successful run. - https://github.com/apache/spark/actions/runs/8276453417/job/22725387782 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? Closes #45595 from dongjoon-hyun/pin-numpy. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/infra/Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 93d8793826ff..68d27052437b 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib -RUN python3.9 -m pip install numpy 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' +RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' # Add Python deps for Spark Connect. RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (bc378f4ff5e2 -> 61d7b0f24fc9)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from bc378f4ff5e2 [SPARK-47330][SQL][TESTS] XML: Added XmlExpressionsSuite add 61d7b0f24fc9 [SPARK-47470][SQL][TESTS] Ignore `IntentionallyFaultyConnectionProvider` error in `CliSuite` No new revisions were added by this update. Summary of changes: .../test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c32d27850e2e [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven c32d27850e2e is described below commit c32d27850e2ea5f8cb36099ab8453b09f4c70861 Author: Dongjoon Hyun AuthorDate: Tue Mar 19 17:52:38 2024 -0700 [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven ### What changes were proposed in this pull request? This PR aims to exclude `logback` from SBT dependency like Maven to fix the following SBT issue. ``` [info] stderr> SLF4J: Class path contains multiple SLF4J bindings. [info] stderr> SLF4J: Found binding in [jar:file:/home/runner/work/spark/spark/assembly/target/scala-2.13/jars/logback-classic-1.2.13.jar!/org/slf4j/impl/StaticLoggerBinder.class] [info] stderr> SLF4J: Found binding in [jar:file:/home/runner/.cache/coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/ch/qos/logback/logback-classic/1.2.13/logback-classic-1.2.13.jar!/org/slf4j/impl/StaticLoggerBinder.class] [info] stderr> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. [info] stderr> SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder] ``` ### Why are the changes needed? **Maven** ``` $ build/mvn dependency:tree --pl core | grep logback Using `mvn` from path: /opt/homebrew/bin/mvn Using SPARK_LOCAL_IP=localhost ``` **SBT (BEFORE)** ``` $ build/sbt "core/test:dependencyTree" | grep logback Using SPARK_LOCAL_IP=localhost [info] | +-ch.qos.logback:logback-classic:1.2.13 [info] | | +-ch.qos.logback:logback-core:1.2.13 [info] | +-ch.qos.logback:logback-core:1.2.13 [info] | | +-ch.qos.logback:logback-classic:1.2.13 [info] | | | +-ch.qos.logback:logback-core:1.2.13 [info] | | +-ch.qos.logback:logback-core:1.2.13 [info] | +-ch.qos.logback:logback-classic:1.2.13 [info] | | +-ch.qos.logback:logback-core:1.2.13 [info] | +-ch.qos.logback:logback-core:1.2.13 ``` **SBT (AFTER)** ``` $ build/sbt "core/test:dependencyTree" | grep logback Using SPARK_LOCAL_IP=localhost ``` ### Does this PR introduce _any_ user-facing change? No. This only fixes developer and CI issues. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45594 from dongjoon-hyun/SPARK-47468. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- project/SparkBuild.scala | 1 + 1 file changed, 1 insertion(+) diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index b7b9589568e1..3d89af2aa7b4 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -1078,6 +1078,7 @@ object ExcludedDependencies { // purpose only. Here we exclude them from the whole project scope and add them w/ yarn only. excludeDependencies ++= Seq( ExclusionRule(organization = "com.sun.jersey"), + ExclusionRule(organization = "ch.qos.logback"), ExclusionRule("javax.ws.rs", "jsr311-api")) ) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
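For readers outside the Spark build, the sketch below shows the same exclusion mechanism in a minimal standalone `build.sbt`; the project name, library, and versions are illustrative, not Spark's actual build definition.

```
// Minimal standalone build.sbt sketch (illustrative names and versions):
// a project-wide ExclusionRule drops every ch.qos.logback artifact from the
// dependency graph, so only one SLF4J binding can end up on the classpath.
lazy val root = (project in file("."))
  .settings(
    name := "single-slf4j-binding-demo",
    scalaVersion := "2.13.13",
    libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.36",
    // Applies to the transitive dependencies of everything added above.
    excludeDependencies += ExclusionRule(organization = "ch.qos.logback")
  )
```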
(spark) branch master updated: [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 32ee2d7936a5 [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant` 32ee2d7936a5 is described below commit 32ee2d7936a50a653e8ea599d622fbc550fa5eac Author: panbingkun AuthorDate: Tue Mar 19 16:27:15 2024 -0700 [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant` ### What changes were proposed in this pull request? The pr aims to update `labeler.yml` for module `common/sketch` and `common/variant`. ### Why are the changes needed? Currently, the above modules are not classified in the file `labeler.yml`, and the GitHub action label cannot automatically tag the submitted PR. ### Does this PR introduce _any_ user-facing change? Yes, only for dev. ### How was this patch tested? Manually test: after this PR is merged, continue to observe. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45590 from panbingkun/SPARK-47464. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- .github/labeler.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/labeler.yml b/.github/labeler.yml index 7d24390f2968..104eac99ec4d 100644 --- a/.github/labeler.yml +++ b/.github/labeler.yml @@ -101,6 +101,8 @@ SQL: ] - any-glob-to-any-file: [ 'common/unsafe/**/*', + 'common/sketch/**/*', + 'common/variant/**/*', 'bin/spark-sql*', 'bin/beeline*', 'sbin/*thriftserver*.sh', - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (90560dce85b0 -> db531c6ee719)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 90560dce85b0 [SPARK-47458][CORE] Fix the problem with calculating the maximum concurrent tasks for the barrier stage add db531c6ee719 [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala | 4 .../test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala | 4 +--- 2 files changed, 1 insertion(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (b6a836946311 -> a6bffcc3e5f0)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b6a836946311 [SPARK-47454][PYTHON][CONNECT][TESTS] Split `pyspark.sql.tests.test_dataframe` add a6bffcc3e5f0 [SPARK-47457][SQL] Fix `IsolatedClientLoader.supportsHadoopShadedClient` to handle Hadoop 3.4+ No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 2 ++ .../org/apache/spark/sql/hive/client/HadoopVersionInfoSuite.scala | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
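The summary above only lists touched files, so as background, here is a hedged sketch of the kind of version gate involved — not Spark's actual `IsolatedClientLoader` code, and the threshold below is illustrative. The failure mode it shows: matching a fixed list of version prefixes silently rejects a new Hadoop line such as 3.4, while a numeric comparison does not.

```
import scala.util.Try

object ShadedClientSupport {
  // Hypothetical check, not Spark's implementation: parse "major.minor" and
  // compare numerically so future lines (3.4.x, 4.x) are handled without
  // touching this code again.
  def supportsHadoopShadedClient(version: String): Boolean = {
    val parts = version.split("\\.")
    Try((parts(0).toInt, parts(1).toInt)).toOption.exists {
      case (major, minor) => major > 3 || (major == 3 && minor >= 2)
    }
  }
}
```

With this shape of check, `supportsHadoopShadedClient("3.4.0")` returns `true` instead of falling through a prefix list.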
(spark) branch master updated: [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ef94f7094989 [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile` ef94f7094989 is described below commit ef94f709498974cb31e805541e0803270cd5c39e Author: Dongjoon Hyun AuthorDate: Mon Mar 18 23:15:32 2024 -0700 [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile` ### What changes were proposed in this pull request? This PR aims to use `Ubuntu 22.04` in `dev/infra/Dockerfile` for Apache Spark 4.0.0.
| Installed SW | BEFORE | AFTER |
| --- | --- | --- |
| Ubuntu LTS | 20.04.5 | 22.04.4 |
| Java | 17.0.10 | 17.0.10 |
| PyPy 3.8 | 3.8.16 | 3.8.16 |
| Python 3.9 | 3.9.5 | 3.9.18 |
| Python 3.10 | 3.10.13 | 3.10.12 |
| Python 3.11 | 3.11.8 | 3.11.8 |
| Python 3.12 | 3.12.2 | 3.12.2 |
| R | 3.6.3 | 4.1.2 |
### Why are the changes needed? - Since Apache Spark 3.4.0, we use `Ubuntu 20.04` via SPARK-39522. - From Apache Spark 4.0.0, this PR aims to use `Ubuntu 22.04` mainly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45576 from dongjoon-hyun/SPARK-47452. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/infra/Dockerfile | 52 +--- 1 file changed, 25 insertions(+), 27 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 64adf33e6742..f17ee58c9d90 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -15,11 +15,11 @@ # limitations under the License. # -# Image for building and testing Spark branches. Based on Ubuntu 20.04. +# Image for building and testing Spark branches. Based on Ubuntu 22.04.
# See also in https://hub.docker.com/_/ubuntu -FROM ubuntu:focal-20221019 +FROM ubuntu:jammy-20240227 -ENV FULL_REFRESH_DATE 20240117 +ENV FULL_REFRESH_DATE 20240318 ENV DEBIAN_FRONTEND noninteractive ENV DEBCONF_NONINTERACTIVE_SEEN true @@ -50,10 +50,8 @@ RUN apt-get update && apt-get install -y \ openjdk-17-jdk-headless \ pandoc \ pkg-config \ -python3-pip \ -python3-setuptools \ -python3.8 \ -python3.9 \ +python3.10 \ +python3-psutil \ qpdf \ r-base \ ruby \ @@ -64,10 +62,10 @@ RUN apt-get update && apt-get install -y \ && rm -rf /var/lib/apt/lists/* -RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' >> /etc/apt/sources.list +RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/' >> /etc/apt/sources.list RUN gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9 RUN gpg -a --export E084DAB9 | apt-key add - -RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' +RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/' # See more in SPARK-39959, roxygen2 < 7.2.1 RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', \ @@ -82,9 +80,6 @@ RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', \ ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" -RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9 - - RUN add-apt-repository ppa:pypy/ppa RUN mkdir -p /usr/local/pypy/pypy3.8 && \ curl -sqL https://downloads.python.org/pypy/pypy3.8-v7.3.11-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.8 --strip-components=1 && \ @@ -98,41 +93,44 @@ ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.2.1 scipy plotly # Python deps for Spark Connect ARG CONNECT_PIP_PKGS="grpcio==1.62.0 grpcio-status==1.62.0 protobuf==4.25.1 googleapis-common-protos==1.56.4" -# Add torch as a testing dependency for TorchDistributor and DeepspeedTorchDistributor -RUN python3.9 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS && \ -python3.9 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu && \ -python3.9 -m pip install deepspeed torcheval && \ -python3.9 -m pip cache purge - -# Install Python 3.10 at the last stage to avoid breaking Python 3.9 -RUN add-apt-repository ppa:deadsnakes/ppa -RUN apt-get update && apt-get install -y \ -python3.10 python3.10-distut
(spark) branch master updated (5f48931fcdf7 -> 5e42ecc8163a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5f48931fcdf7 [SPARK-47453][SQL][DOCKER][BUILD][TESTS] Upgrade MySQL docker image version to 8.3.0 add 5e42ecc8163a [SPARK-47456][SQL] Support ORC Brotli codec No new revisions were added by this update. Summary of changes: docs/sql-data-sources-orc.md | 2 +- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 4 ++-- .../spark/sql/execution/datasources/orc/OrcCompressionCodec.java | 3 ++- .../org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala| 3 ++- .../spark/sql/execution/datasources/FileSourceCodecSuite.scala | 5 - 5 files changed, 11 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
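Since the change summary above only lists the touched files, here is a minimal usage sketch of the new codec, assuming the Brotli libraries are available to ORC at runtime; the output path is illustrative.

```
import org.apache.spark.sql.SparkSession

object OrcBrotliDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orc-brotli-demo").getOrCreate()
    import spark.implicits._

    // Session-wide default codec for ORC writes...
    spark.conf.set("spark.sql.orc.compression.codec", "brotli")

    // ...or per-write, via the DataFrameWriter option.
    Seq((1, "a"), (2, "b")).toDF("id", "value")
      .write.option("compression", "brotli").orc("/tmp/orc-brotli-demo")

    spark.stop()
  }
}
```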
(spark) branch master updated (681b41f0808e -> 5f48931fcdf7)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 681b41f0808e [SPARK-47422][SQL] Support collated strings in array operations add 5f48931fcdf7 [SPARK-47453][SQL][DOCKER][BUILD][TESTS] Upgrade MySQL docker image version to 8.3.0 No new revisions were added by this update. Summary of changes: ...baseOnDocker.scala => MySQLDatabaseOnDocker.scala} | 17 +++-- .../apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 15 +++ .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 19 --- .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala | 19 --- 4 files changed, 18 insertions(+), 52 deletions(-) copy connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/{DB2DatabaseOnDocker.scala => MySQLDatabaseOnDocker.scala} (66%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (9f8147c2a8d2 -> e01ed0da22f2)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9f8147c2a8d2 [SPARK-47329][SS][DOCS] Add note to persist dataframe while using foreachbatch and stateful streaming query to prevent state from being re-loaded in each batch add e01ed0da22f2 [SPARK-47345][SQL][TESTS][FOLLOW-UP] Rename JSON to XML within XmlFunctionsSuite No new revisions were added by this update. Summary of changes: sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (cb20fcae951d -> acf17fd67217)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cb20fcae951d [SPARK-47448][CORE] Enable `spark.shuffle.service.removeShuffle` by default add acf17fd67217 [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job No new revisions were added by this update. Summary of changes: .github/workflows/build_sparkr_window.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (51e8634a5883 -> cb20fcae951d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 51e8634a5883 [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same add cb20fcae951d [SPARK-47448][CORE] Enable `spark.shuffle.service.removeShuffle` by default No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +- core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala | 1 + docs/configuration.md | 2 +- docs/core-migration-guide.md | 2 ++ 4 files changed, 5 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
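As a usage note, the sketch below shows how a job could pin the old behavior if it relies on shuffle files outliving their shuffle; this is an illustration of opting out of the new default, not part of the patch.

```
import org.apache.spark.SparkConf

object KeepShuffleFiles {
  def main(args: Array[String]): Unit = {
    // Restore the pre-change behavior explicitly (the new default is true).
    val conf = new SparkConf()
      .setAppName("keep-shuffle-files")
      .set("spark.shuffle.service.removeShuffle", "false")
    // ... build the SparkContext / SparkSession from `conf` as usual.
  }
}
```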
(spark) branch master updated: [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a40940a0bc6d [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal` a40940a0bc6d is described below commit a40940a0bc6de58b5c56b8ad918f338c6e70572f Author: Dongjoon Hyun AuthorDate: Mon Mar 18 12:39:44 2024 -0700 [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal` ### What changes were proposed in this pull request? This PR aims to make `BlockManager` warn before invoking `removeBlockInternal` by switching the log position. To be clear, 1. For the case where `removeBlockInternal` succeeds, the log messages are identical before and after this PR. 2. For the case where `removeBlockInternal` fails, the user will see one additional warning message like the following which was hidden from the users before this PR. ``` logWarning(s"Putting block $blockId failed") ``` ### Why are the changes needed? When `Put` operation fails, Apache Spark currently tries `removeBlockInternal` first before logging. https://github.com/apache/spark/blob/ce93c9fd86715e2479552628398f6fc11e83b2af/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1554-L1567 On top of that, if `removeBlockInternal` fails consecutively, Spark shows the warning like the following and fails the job. ``` 24/03/18 18:40:46 WARN BlockManager: Putting block broadcast_0 failed due to exception java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e. 24/03/18 18:40:46 WARN BlockManager: Block broadcast_0 was not removed normally. 24/03/18 18:40:46 INFO TaskSchedulerImpl: Cancelling stage 0 24/03/18 18:40:46 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled 24/03/18 18:40:46 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) failed in 0.264 s due to Job aborted due to stage failure: Task serialization failed: java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e ``` It's misleading although they might share the same root cause. Since `Put` operation fails before the above failure, we had better switch WARN message to make it clear. ### Does this PR introduce _any_ user-facing change? No. This is a warning message change only. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45570 from dongjoon-hyun/SPARK-47446. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/storage/BlockManager.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala index 228ec5752e1b..89b3914e94af 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala @@ -1561,8 +1561,8 @@ private[spark] class BlockManager( blockInfoManager.unlock(blockId) } } else { -removeBlockInternal(blockId, tellMaster = false) logWarning(s"Putting block $blockId failed") +removeBlockInternal(blockId, tellMaster = false) } res } catch { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47383][CORE] Support `spark.shutdown.timeout` config
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ce93c9fd8671 [SPARK-47383][CORE] Support `spark.shutdown.timeout` config ce93c9fd8671 is described below commit ce93c9fd86715e2479552628398f6fc11e83b2af Author: Rob Reeves AuthorDate: Mon Mar 18 10:36:38 2024 -0700 [SPARK-47383][CORE] Support `spark.shutdown.timeout` config ### What changes were proposed in this pull request? Make the shutdown hook timeout configurable. If this is not defined it falls back to the existing behavior, which uses a default timeout of 30 seconds, or whatever is defined in core-site.xml for the hadoop.service.shutdown.timeout property. ### Why are the changes needed? Spark sometimes times out during the shutdown process. This can result in data left in the queues to be dropped and causes metadata loss (e.g. event logs, anything written by custom listeners). This is not easily configurable before this change. The underlying `org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30 seconds. It can be configured by setting hadoop.service.shutdown.timeout, but this must be done in the core-site.xml/core-default.xml because a new hadoop conf object is created and there is no opportunity to modify it. ### Does this PR introduce _any_ user-facing change? Yes, a new config `spark.shutdown.timeout` is added. ### How was this patch tested? Manual testing in spark-shell. This behavior is not practical to write a unit test for. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45504 from robreeves/sc_shutdown_timeout. Authored-by: Rob Reeves Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/internal/config/package.scala| 10 ++ .../org/apache/spark/util/ShutdownHookManager.scala | 19 +-- 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index aa240b5cc5b5..e72b9cb694eb 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -2683,4 +2683,14 @@ package object config { .version("4.0.0") .booleanConf .createWithDefault(false) + + private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS = +ConfigBuilder("spark.shutdown.timeout") + .internal() + .doc("Defines the timeout period to wait for all shutdown hooks to be executed. 
" + +"This must be passed as a system property argument in the Java options, for example " + +"spark.driver.extraJavaOptions=\"-Dspark.shutdown.timeout=60s\".") + .version("4.0.0") + .timeConf(TimeUnit.MILLISECONDS) + .createOptional } diff --git a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala index 4db268604a3e..c6cad9440168 100644 --- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala +++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala @@ -19,12 +19,16 @@ package org.apache.spark.util import java.io.File import java.util.PriorityQueue +import java.util.concurrent.TimeUnit import scala.util.Try import org.apache.hadoop.fs.FileSystem +import org.apache.spark.SparkConf import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.SPARK_SHUTDOWN_TIMEOUT_MS + /** * Various utility methods used by Spark. @@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager { val hookTask = new Runnable() { override def run(): Unit = runAll() } -org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( - hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30) +val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30 +// The timeout property must be passed as a Java system property because this +// is initialized before Spark configurations are registered as system +// properties later in initialization. +val timeout = new SparkConf().get(SPARK_SHUTDOWN_TIMEOUT_MS) + +timeout.fold { + org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( +hookTask, priority) +} { t => + org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( +hookTask, priority, t, TimeUnit.MILLISECONDS) +} } def runAll(): Unit = { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8bd42cbdb6bf [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561 8bd42cbdb6bf is described below commit 8bd42cbdb6bfa40aead94570b06e926f8e8aa9e1 Author: Kent Yao AuthorDate: Mon Mar 18 08:56:55 2024 -0700 [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561 ### What changes were proposed in this pull request? SPARK-45561 mapped java.sql.Types.TINYINT to ByteType in MySQLDialect, which caused unsigned TINYINT values to overflow, because java.sql.Types reports TINYINT for both signed and unsigned columns. In this PR, we put the signedness info into the metadata so that TINYINT can be mapped to short or byte accordingly. ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? Users can read MySQL UNSIGNED TINYINT values again after this PR, as in versions before 3.5.0; this had been broken since 3.5.1 ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45556 from yaooqinn/SPARK-47435. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 9 ++-- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 9 ++-- .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 ++- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 15 -- .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 9 ++-- .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 9 ++-- .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 26 ++ .../sql/execution/datasources/jdbc/JdbcUtils.scala | 5 +- .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 10 ++-- .../v2/jdbc/JDBCTableCatalogSuite.scala| 60 -- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 24 + 11 files changed, 114 insertions(+), 68 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index b1d239337aa0..79e88f109534 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -57,10 +57,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits BIT(10), " + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci DECIMAL(40,20), flt FLOAT, " - + "dbl DOUBLE, tiny TINYINT)").executeUpdate() + + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate() + conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', " + "17, 7, 123456789, 123456789012345, 123456789012345.123456789012345, " - + "42.75, 1.0002, -128)").executeUpdate() + + "42.75, 1.0002, -128, 255)").executeUpdate() conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " + "yr YEAR)").executeUpdate() @@ -90,7 +91,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { val rows = df.collect() assert(rows.length == 1) val types = rows(0).toSeq.map(x => x.getClass.toString) -assert(types.length == 10) +assert(types.length == 11) assert(types(0).equals("class java.lang.Boolean")) assert(types(1).equals("class
java.lang.Long")) assert(types(2).equals("class java.lang.Integer")) @@ -101,6 +102,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(types(7).equals("class java.lang.Double")) assert(types(8).equals("class java.lang.Double")) assert(types(9).equals("class java.lang.Byte")) +assert(types(10).equals("class java.lang.Short")) assert(rows(0).getBoolean(0) == false) assert(rows(0).getLong(1) == 0x225) assert(rows(0).getInt(2) == 17) @@ -112,6 +114,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(rows(0).getDouble(7) == 42.75) assert(rows(0).getDouble(8) == 1.0002) assert(rows(0).getByte(9) == 0x80.toByte) +assert(rows(0).getShort(10) == 0xff.toShort) } test("Date types") { diff --git a/connector/docker-integration-tests/src/test/scala/org/apa
(spark) branch master updated (4dc362dbc6c0 -> 1aafe60b3e76)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0 add 1aafe60b3e76 [SPARK-47442][CORE][TEST] Use port 0 to start worker servers in MasterSuite No new revisions were added by this update. Summary of changes: .../test/scala/org/apache/spark/deploy/master/MasterSuiteBase.scala| 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
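The one-line summary above hides the actual technique, so for context: binding a server socket to port 0 asks the kernel for any free ephemeral port, which removes "Address already in use" flakiness when several suites share a CI host. A self-contained illustration (not the `MasterSuite` code):

```
import java.net.ServerSocket

object EphemeralPortDemo {
  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(0) // port 0 = let the OS pick a free port
    try {
      println(s"listening on ephemeral port ${server.getLocalPort}")
    } finally {
      server.close()
    }
  }
}
```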
(spark) branch master updated: [SPARK-47438][BUILD] Upgrade jackson to 2.17.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0 4dc362dbc6c0 is described below commit 4dc362dbc6c039d955e4dceb87e53dfc76ef2a5c Author: panbingkun AuthorDate: Mon Mar 18 08:25:16 2024 -0700 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0 ### What changes were proposed in this pull request? The pr aims to upgrade jackson from `2.16.1` to `2.17.0`. ### Why are the changes needed? The full release notes: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45562 from panbingkun/SPARK-47438. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 14 +++--- pom.xml | 4 ++-- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index d4b7d38aea22..86da61d89149 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -103,15 +103,15 @@ icu4j/72.1//icu4j-72.1.jar ini4j/0.5.4//ini4j-0.5.4.jar istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar ivy/2.5.2//ivy-2.5.2.jar -jackson-annotations/2.16.1//jackson-annotations-2.16.1.jar +jackson-annotations/2.17.0//jackson-annotations-2.17.0.jar jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar -jackson-core/2.16.1//jackson-core-2.16.1.jar -jackson-databind/2.16.1//jackson-databind-2.16.1.jar -jackson-dataformat-cbor/2.16.1//jackson-dataformat-cbor-2.16.1.jar -jackson-dataformat-yaml/2.16.1//jackson-dataformat-yaml-2.16.1.jar -jackson-datatype-jsr310/2.16.1//jackson-datatype-jsr310-2.16.1.jar +jackson-core/2.17.0//jackson-core-2.17.0.jar +jackson-databind/2.17.0//jackson-databind-2.17.0.jar +jackson-dataformat-cbor/2.17.0//jackson-dataformat-cbor-2.17.0.jar +jackson-dataformat-yaml/2.17.0//jackson-dataformat-yaml-2.17.0.jar +jackson-datatype-jsr310/2.17.0//jackson-datatype-jsr310-2.17.0.jar jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar -jackson-module-scala_2.13/2.16.1//jackson-module-scala_2.13-2.16.1.jar +jackson-module-scala_2.13/2.17.0//jackson-module-scala_2.13-2.17.0.jar jakarta.annotation-api/2.0.0//jakarta.annotation-api-2.0.0.jar jakarta.inject-api/2.0.1//jakarta.inject-api-2.0.1.jar jakarta.servlet-api/5.0.0//jakarta.servlet-api-5.0.0.jar diff --git a/pom.xml b/pom.xml index 757d911c1229..5cc56a92999d 100644 --- a/pom.xml +++ b/pom.xml @@ -184,8 +184,8 @@ true true 1.9.13 -2.16.1 - 2.16.1 +2.17.0 + 2.17.0 2.3.1 3.0.2 1.1.10.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 57424b92c5b5 [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md 57424b92c5b5 is described below commit 57424b92c5b5e7c3de680a7d8a6b137911f45666 Author: Matt Braymer-Hayes AuthorDate: Mon Mar 18 07:53:11 2024 -0700 [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md ### What changes were proposed in this pull request? Adds the Web UI to the `Other Documents` list on the main page. ### Why are the changes needed? I found it difficult to find the Web UI docs: it's only linked inside the Monitoring docs. Adding it to the main page will make it easier for people to find and use the docs. ### Does this PR introduce _any_ user-facing change? Yes: adds another cross-reference on the main page. ### How was this patch tested? Visually verified that Markdown still rendered properly. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45534 from mattayes/patch-2. Authored-by: Matt Braymer-Hayes Signed-off-by: Dongjoon Hyun --- docs/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index 5f3858bec86b..12c53c40c8f7 100644 --- a/docs/index.md +++ b/docs/index.md @@ -138,6 +138,7 @@ options for deployment: * [Configuration](configuration.html): customize Spark via its configuration system * [Monitoring](monitoring.html): track the behavior of your applications +* [Web UI](web-ui.html): view useful information about your applications * [Tuning Guide](tuning.html): best practices to optimize performance and memory use * [Job Scheduling](job-scheduling.html): scheduling resources across and within Spark applications * [Security](security.html): Spark security support @@ -145,7 +146,7 @@ options for deployment: * Integration with other storage systems: * [Cloud Infrastructures](cloud-integration.html) * [OpenStack Swift](storage-openstack-swift.html) -* [Migration Guide](migration-guide.html): Migration guides for Spark components +* [Migration Guide](migration-guide.html): migration guides for Spark components * [Building Spark](building-spark.html): build Spark using the Maven system * [Contributing to Spark](https://spark.apache.org/contributing.html) * [Third Party Projects](https://spark.apache.org/third-party-projects.html): related third party Spark projects - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 7a899e219f5a [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` 7a899e219f5a is described below commit 7a899e219f5a17ab12aeb8d67738025b7e2b9d9c Author: Huw Campbell AuthorDate: Mon Mar 18 07:38:10 2024 -0700 [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` ### What changes were proposed in this pull request? Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when one is using proxy settings. Change the generated link to be consistent with other links and include a trailing slash. ### Why are the changes needed? When using a proxy, an invalid redirect is issued if the trailing slash is not included. ### Does this PR introduce _any_ user-facing change? Only that people will be able to use these links if they are using a proxy. ### How was this patch tested? With a proxy installed, I went to the location this link would generate and could reach the page, whereas the link as it exists today redirects. Edit: further tested by building a version of our application with this patch applied; the links work now. ### Was this patch authored or co-authored using generative AI tooling? No. Page with working link: https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3 Goes correctly to: https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5 Before, it would redirect and we'd get a 404: https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef Closes #45527 from HuwCampbell/patch-1. Authored-by: Huw Campbell Signed-off-by: Dongjoon Hyun (cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc) Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala index 7cd7db4088ac..ce3e7cde01b7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala @@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable( override def row(query: StructuredStreamingRow): Seq[Node] = { val streamingQuery = query.streamingUIData -val statisticsLink = "%s/%s/statistics?id=%s" +val statisticsLink = "%s/%s/statistics/?id=%s" .format(SparkUIUtils.prependBaseUri(request, parent.basePath), parent.prefix, streamingQuery.summary.runId) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new bb7a6138b827 [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` bb7a6138b827 is described below commit bb7a6138b827975fc827813ab42a2b9074bf8d5e Author: Huw Campbell AuthorDate: Mon Mar 18 07:38:10 2024 -0700 [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` ### What changes were proposed in this pull request? Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when one is using proxy settings. Change the generated link to be consistent with other links and include a trailing slash. ### Why are the changes needed? When using a proxy, an invalid redirect is issued if the trailing slash is not included. ### Does this PR introduce _any_ user-facing change? Only that people will be able to use these links if they are using a proxy. ### How was this patch tested? With a proxy installed, I went to the location this link would generate and could reach the page, whereas the link as it exists today redirects. Edit: further tested by building a version of our application with this patch applied; the links work now. ### Was this patch authored or co-authored using generative AI tooling? No. Page with working link: https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3 Goes correctly to: https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5 Before, it would redirect and we'd get a 404: https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef Closes #45527 from HuwCampbell/patch-1. Authored-by: Huw Campbell Signed-off-by: Dongjoon Hyun (cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc) Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala index 7cd7db4088ac..ce3e7cde01b7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala @@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable( override def row(query: StructuredStreamingRow): Seq[Node] = { val streamingQuery = query.streamingUIData -val statisticsLink = "%s/%s/statistics?id=%s" +val statisticsLink = "%s/%s/statistics/?id=%s" .format(SparkUIUtils.prependBaseUri(request, parent.basePath), parent.prefix, streamingQuery.summary.runId) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (d3f12df6e09e -> 9b466d329c3c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d3f12df6e09e [SPARK-47437][PYTHON][CONNECT] Correct the error class for `DataFrame.sort*` add 9b466d329c3c [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
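To make the one-character fix concrete, a runnable illustration of the two link formats with made-up path components; behind a reverse proxy, the form without the trailing slash typically triggers a 302 that the proxy may rewrite incorrectly.

```
object StatisticsLinkDemo {
  def main(args: Array[String]): Unit = {
    val (base, prefix, runId) = ("/proxy/app-42", "StreamingQuery", "run-1")
    val before = "%s/%s/statistics?id=%s".format(base, prefix, runId)
    val after = "%s/%s/statistics/?id=%s".format(base, prefix, runId)
    println(before) // /proxy/app-42/StreamingQuery/statistics?id=run-1 (may 302 behind a proxy)
    println(after)  // /proxy/app-42/StreamingQuery/statistics/?id=run-1 (served directly)
  }
}
```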
(spark) branch branch-3.4 updated (be0e44e59b3e -> b4e2c6750cb3)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from be0e44e59b3e [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI add b4e2c6750cb3 [SPARK-47433][PYTHON][DOCS][INFRA][3.4] Update PySpark package dependency with version ranges No new revisions were added by this update. Summary of changes: dev/requirements.txt | 2 +- python/docs/source/getting_started/install.rst | 16 2 files changed, 9 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound requirement, `<13.0.0`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new cc6912ec612c [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound requirement, `<13.0.0` cc6912ec612c is described below commit cc6912ec612c30e46e1595860a5519bb1caa221b Author: Dongjoon Hyun AuthorDate: Sun Mar 17 15:15:50 2024 -0700 [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound requirement, `<13.0.0` ### What changes were proposed in this pull request? This PR aims to add `pyarrow` upper bound requirement, `<13.0.0`, to Apache Spark 3.5.x. ### Why are the changes needed? PyArrow 13.0.0 has breaking changes mentioned by #42920 which is a part of Apache Spark 4.0.0. ### Does this PR introduce _any_ user-facing change? No, this only clarifies the upper bound. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45553 from dongjoon-hyun/SPARK-47432. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/requirements.txt | 2 +- python/docs/source/getting_started/install.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/requirements.txt b/dev/requirements.txt index 597417aba1f3..0749af75aa4b 100644 --- a/dev/requirements.txt +++ b/dev/requirements.txt @@ -3,7 +3,7 @@ py4j # PySpark dependencies (optional) numpy -pyarrow +pyarrow<13.0.0 pandas scipy plotly diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst index 6822285e9617..e97632a8b384 100644 --- a/python/docs/source/getting_started/install.rst +++ b/python/docs/source/getting_started/install.rst @@ -157,7 +157,7 @@ PackageSupported version Note == = == `py4j` >=0.10.9.7Required `pandas` >=1.0.5 Required for pandas API on Spark and Spark Connect; Optional for Spark SQL -`pyarrow` >=4.0.0 Required for pandas API on Spark and Spark Connect; Optional for Spark SQL +`pyarrow` >=4.0.0,<13.0.0 Required for pandas API on Spark and Spark Connect; Optional for Spark SQL `numpy`>=1.15Required for pandas API on Spark and MLLib DataFrame-based API; Optional for Spark SQL `grpcio` >=1.48,<1.57 Required for Spark Connect `grpcio-status`>=1.48,<1.57 Required for Spark Connect - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new be0e44e59b3e [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI be0e44e59b3e is described below commit be0e44e59b3e71cb11353e11f19146e0d1827432 Author: Ruifeng Zheng AuthorDate: Wed Sep 13 15:51:27 2023 +0800 [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI Pin `pyarrow==12.0.1` in CI to fix test failure, https://github.com/apache/spark/actions/runs/6167186123/job/16738683632 ``` == FAIL [0.095s]: test_from_to_pandas (pyspark.pandas.tests.data_type_ops.test_datetime_ops.DatetimeOpsTests) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 122, in _assert_pandas_equal assert_series_equal( File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 931, in assert_series_equal assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}") File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 415, in assert_attr_equal raise_assert_detail(obj, msg, left_attr, right_attr) File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 599, in raise_assert_detail raise AssertionError(msg) AssertionError: Attributes of Series are different Attribute "dtype" are different [left]: datetime64[ns] [right]: datetime64[us] ``` No CI and manually test No Closes #42897 from zhengruifeng/pin_pyarrow. Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng (cherry picked from commit e3d2dfa8b514f9358823c3cb1ad6523da8a6646b) Signed-off-by: Dongjoon Hyun (cherry picked from commit 8049a203b8c5f2f8045701916e66cfc786e16b57) Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 ++-- dev/infra/Dockerfile | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 33747fb5b61d..2184577d5c44 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -252,7 +252,7 @@ jobs: - name: Install Python packages (Python 3.8) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) run: | -python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5' +python3.8 -m pip install 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5' python3.8 -m pip list # Run the tests. - name: Run tests @@ -626,7 +626,7 @@ jobs: # See also https://issues.apache.org/jira/browse/SPARK-38279. 
python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 'alabaster==0.7.13' python3.9 -m pip install ipython_genutils # See SPARK-38517 -python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8' +python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas 'plotly>=4.8' python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421 apt-get update -y apt-get install -y ruby ruby-dev diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 2e78f4af2144..93d8793826ff 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib -RUN python3.9 -m pip install numpy pyarrow 'pandas<=1.5.3' scipy unittest-xml-reporting plotl
(spark) branch branch-3.5 updated: [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 8049a203b8c5 [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI 8049a203b8c5 is described below commit 8049a203b8c5f2f8045701916e66cfc786e16b57 Author: Ruifeng Zheng AuthorDate: Wed Sep 13 15:51:27 2023 +0800 [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI ### What changes were proposed in this pull request? Pin `pyarrow==12.0.1` in CI ### Why are the changes needed? to fix test failure, https://github.com/apache/spark/actions/runs/6167186123/job/16738683632 ``` == FAIL [0.095s]: test_from_to_pandas (pyspark.pandas.tests.data_type_ops.test_datetime_ops.DatetimeOpsTests) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 122, in _assert_pandas_equal assert_series_equal( File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 931, in assert_series_equal assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}") File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 415, in assert_attr_equal raise_assert_detail(obj, msg, left_attr, right_attr) File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 599, in raise_assert_detail raise AssertionError(msg) AssertionError: Attributes of Series are different Attribute "dtype" are different [left]: datetime64[ns] [right]: datetime64[us] ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI and manually test ### Was this patch authored or co-authored using generative AI tooling? No Closes #42897 from zhengruifeng/pin_pyarrow. Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng (cherry picked from commit e3d2dfa8b514f9358823c3cb1ad6523da8a6646b) Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 4 ++-- dev/infra/Dockerfile | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index b0760a955342..8488540b415d 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -258,7 +258,7 @@ jobs: - name: Install Python packages (Python 3.8) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) run: | -python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.20.3' +python3.8 -m pip install 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.20.3' python3.8 -m pip list # Run the tests. - name: Run tests @@ -684,7 +684,7 @@ jobs: # See also https://issues.apache.org/jira/browse/SPARK-38279. 
python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 'alabaster==0.7.13' python3.9 -m pip install ipython_genutils # See SPARK-38517 -python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8' +python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas 'plotly>=4.8' python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421 apt-get update -y apt-get install -y ruby ruby-dev diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index d3bae836cc63..d3fcd7ab3622 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" RUN pypy3 -m pip install numpy 'pandas
(spark) branch master updated: [SPARK-47426][BUILD] Upgrade Guava used by the connect module to `33.1.0-jre`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2dba72100e03 [SPARK-47426][BUILD] Upgrade Guava used by the connect module to `33.1.0-jre` 2dba72100e03 is described below commit 2dba72100e0326f1889ff0be2dc576b1e712ad15 Author: panbingkun AuthorDate: Sun Mar 17 13:52:14 2024 -0700 [SPARK-47426][BUILD] Upgrade Guava used by the connect module to `33.1.0-jre` ### What changes were proposed in this pull request? The pr aims to upgrade Guava used by the `connect` module to `33.1.0-jre`. ### Why are the changes needed? - The new version brings some bug fixes and optimizations, as follows: cache: fixed a bug (see https://github.com/google/guava/pull/6851#issuecomment-1931276822). hash: optimized Checksum-based hash functions for Java 9+. - The full release notes: https://github.com/google/guava/releases/tag/v33.1.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45540 from panbingkun/SPARK-47426. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index d67ab1c01273..757d911c1229 100644 --- a/pom.xml +++ b/pom.xml @@ -288,7 +288,7 @@ true -33.0.0-jre +33.1.0-jre 1.0.2 1.62.2 1.1.3 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-website) branch asf-site updated: Update the organization in committers.md (#509)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 3eae7010b9 Update the organization in committers.md (#509)
3eae7010b9 is described below

commit 3eae7010b9f3cc01ceabe5036c0bd8910ccb8c67
Author: Jerry Shao
AuthorDate: Sat Mar 16 20:53:28 2024 -0700

Update the organization in committers.md (#509)
---
 committers.md        | 2 +-
 site/committers.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/committers.md b/committers.md
index 58aedb94fd..17530a2411 100644
--- a/committers.md
+++ b/committers.md
@@ -73,7 +73,7 @@ navigation:
 |Josh Rosen|Stripe|
 |Sandy Ryza|Remix|
 |Kousuke Saruta|NTT Data|
-|Saisai Shao|Tencent|
+|Saisai Shao|Datastrato|
 |Prashant Sharma|IBM|
 |Gabor Somogyi|Apple|
 |Ram Sriharsha|Databricks|

diff --git a/site/committers.html b/site/committers.html
index 8a9839aa91..22e2f4c481 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -403,7 +403,7 @@
 Saisai Shao
-Tencent
+Datastrato
 Prashant Sharma
(spark) branch branch-3.4 updated: [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 3c41b1d97e1f [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208
3c41b1d97e1f is described below

commit 3c41b1d97e1f5ff9f74f9ea72f7ea92dcbca2122
Author: Dongjoon Hyun
AuthorDate: Fri Mar 15 22:42:17 2024 -0700

[SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208

### What changes were proposed in this pull request?
This PR aims to upgrade Jetty to 9.4.54.v20240208 for Apache Spark 3.4.3.

### Why are the changes needed?
To bring the latest bug fixes:
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.52.v20230823
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.51.v20230217

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45544 from dongjoon-hyun/SPARK-47428-3.4.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml                               | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 691c83632b38..a94fbcd0ca77 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -143,7 +143,7 @@ jersey-hk2/2.36//jersey-hk2-2.36.jar
 jersey-server/2.36//jersey-server-2.36.jar
 jetty-sslengine/6.1.26//jetty-sslengine-6.1.26.jar
 jetty-util/6.1.26//jetty-util-6.1.26.jar
-jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jetty/6.1.26//jetty-6.1.26.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.2//joda-time-2.12.2.jar

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 4d94cb5c699e..99665da7d16a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -128,8 +128,8 @@ jersey-container-servlet/2.36//jersey-container-servlet-2.36.jar
 jersey-hk2/2.36//jersey-hk2-2.36.jar
 jersey-server/2.36//jersey-server-2.36.jar
 jettison/1.1//jettison-1.1.jar
-jetty-util-ajax/9.4.50.v20221201//jetty-util-ajax-9.4.50.v20221201.jar
-jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar
+jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.2//joda-time-2.12.2.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar

diff --git a/pom.xml b/pom.xml
index 373d17b76c09..77218d162c41 100644
--- a/pom.xml
+++ b/pom.xml
@@ -143,7 +143,7 @@
 1.12.3
 1.8.6
 shaded-protobuf
-9.4.50.v20221201
+9.4.54.v20240208
 4.0.3
 0.10.0
 2.5.1
(spark) branch branch-3.4 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 210e80e8b7ba [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
210e80e8b7ba is described below

commit 210e80e8b7baa5fc1e6462615bc8134a4c90647c
Author: Dongjoon Hyun
AuthorDate: Tue Oct 17 23:38:56 2023 -0700

[SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

### What changes were proposed in this pull request?
This PR aims to skip the `Unidoc` and `MIMA` phases in the many general test pipelines. The `mima` test is moved to the `lint` job.

### Why are the changes needed?
With an independent documentation-generation and MIMA-checking GitHub Action job, we can skip those phases in the many jobs listed at https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually check the GitHub Action logs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43422 from dongjoon-hyun/SPARK-45587.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794)
Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 13527119e51a..33747fb5b61d 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -198,6 +198,8 @@ jobs:
       HIVE_PROFILE: ${{ matrix.hive }}
       GITHUB_PREV_SHA: ${{ github.event.before }}
       SPARK_LOCAL_IP: localhost
+      SKIP_UNIDOC: true
+      SKIP_MIMA: true
       SKIP_PACKAGING: true
     steps:
     - name: Checkout Spark repository
@@ -578,6 +580,8 @@ jobs:
       run: ./dev/check-license
     - name: Dependencies test
       run: ./dev/test-dependencies.sh
+    - name: MIMA test
+      run: ./dev/mima
     - name: Scala linter
       run: ./dev/lint-scala
     - name: Java linter
(spark) branch branch-3.5 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 8c6eeb8ab018 [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job
8c6eeb8ab018 is described below

commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794
Author: Dongjoon Hyun
AuthorDate: Tue Oct 17 23:38:56 2023 -0700

[SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

### What changes were proposed in this pull request?
This PR aims to skip the `Unidoc` and `MIMA` phases in the many general test pipelines. The `mima` test is moved to the `lint` job.

### Why are the changes needed?
With an independent documentation-generation and MIMA-checking GitHub Action job, we can skip those phases in the many jobs listed at https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually check the GitHub Action logs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43422 from dongjoon-hyun/SPARK-45587.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index ad8685754b31..b0760a955342 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -204,6 +204,8 @@ jobs:
       HIVE_PROFILE: ${{ matrix.hive }}
       GITHUB_PREV_SHA: ${{ github.event.before }}
       SPARK_LOCAL_IP: localhost
+      SKIP_UNIDOC: true
+      SKIP_MIMA: true
       SKIP_PACKAGING: true
     steps:
     - name: Checkout Spark repository
@@ -627,6 +629,8 @@ jobs:
       run: ./dev/check-license
     - name: Dependencies test
       run: ./dev/test-dependencies.sh
+    - name: MIMA test
+      run: ./dev/mima
     - name: Scala linter
       run: ./dev/lint-scala
     - name: Java linter
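Mechanically, flags like `SKIP_UNIDOC`, `SKIP_MIMA`, and `SKIP_PACKAGING` are plain environment variables that the test scripts consult before launching a phase. A minimal sketch of that pattern, with a hypothetical `run_mima()` helper standing in for the real `./dev/mima` invocation (the actual logic in Spark's dev tooling differs):

```python
import os
import subprocess

def env_flag(name: str) -> bool:
    # CI sets these via the job's `env:` block, e.g. `SKIP_MIMA: true`.
    return os.environ.get(name, "false").lower() == "true"

def run_mima() -> None:
    # Hypothetical stand-in for invoking the real ./dev/mima script.
    subprocess.run(["./dev/mima"], check=True)

if not env_flag("SKIP_MIMA"):
    run_mima()
```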
(spark) branch branch-3.5 updated: [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new d59425275cdd [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208
d59425275cdd is described below

commit d59425275cdd0ff678a5bcccef4c7b74fe8170cb
Author: Dongjoon Hyun
AuthorDate: Fri Mar 15 22:28:45 2024 -0700

[SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208

### What changes were proposed in this pull request?
This PR aims to upgrade Jetty to 9.4.54.v20240208.

### Why are the changes needed?
To bring the latest bug fixes:
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45543 from dongjoon-hyun/SPARK-47428.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml                               | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index c76702cd0af0..8ecf931bf513 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -130,8 +130,8 @@ jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar
 jersey-hk2/2.40//jersey-hk2-2.40.jar
 jersey-server/2.40//jersey-server-2.40.jar
 jettison/1.1//jettison-1.1.jar
-jetty-util-ajax/9.4.52.v20230823//jetty-util-ajax-9.4.52.v20230823.jar
-jetty-util/9.4.52.v20230823//jetty-util-9.4.52.v20230823.jar
+jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.5//joda-time-2.12.5.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar

diff --git a/pom.xml b/pom.xml
index 5db3c78e00eb..fb6208777d3f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -143,7 +143,7 @@
 1.13.1
 1.9.2
 shaded-protobuf
-9.4.52.v20230823
+9.4.54.v20240208
 4.0.3
 0.10.0
(spark) branch master updated (4437e6e21237 -> 6bf031796c8c)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` to `common/utils`
 add 6bf031796c8c [SPARK-44740][CONNECT][TESTS][FOLLOWUP] Deduplicate `test_metadata`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_connect_session.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (b7aa9740249b -> 4437e6e21237)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from b7aa9740249b [SPARK-47407][SQL] Support java.sql.Types.NULL map to NullType
 add 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` to `common/utils`

No new revisions were added by this update.

Summary of changes:
 .../utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {core => common/utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties (100%)
(spark) branch master updated: [SPARK-47234][BUILD] Upgrade Scala to 2.13.13
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 56cfc89e8f15 [SPARK-47234][BUILD] Upgrade Scala to 2.13.13
56cfc89e8f15 is described below

commit 56cfc89e8f1599fe859db1bd6628a9b07d53bed4
Author: panbingkun
AuthorDate: Thu Mar 14 22:40:54 2024 -0700

[SPARK-47234][BUILD] Upgrade Scala to 2.13.13

### What changes were proposed in this pull request?
This PR upgrades Scala from `2.13.12` to `2.13.13`.

### Why are the changes needed?
- The new version brings some bug fixes:
  https://github.com/scala/scala/pull/10525
  https://github.com/scala/scala/pull/10528
- Release notes: https://github.com/scala/scala/releases/tag/v2.13.13

### Does this PR introduce _any_ user-facing change?
Yes, the Scala version changes from `2.13.12` to `2.13.13`.

### How was this patch tested?
- Pass GA.
- After master is upgraded to `2.13.13`, we will continue to monitor.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45342 from panbingkun/SPARK-47234.

Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 ++++----
 docs/_config.yml                      | 2 +-
 pom.xml                               | 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 2e091cb3638e..d4b7d38aea22 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -139,7 +139,7 @@ jettison/1.5.4//jettison-1.5.4.jar
 jetty-util-ajax/11.0.20//jetty-util-ajax-11.0.20.jar
 jetty-util/11.0.20//jetty-util-11.0.20.jar
 jline/2.14.6//jline-2.14.6.jar
-jline/3.22.0//jline-3.22.0.jar
+jline/3.24.1//jline-3.24.1.jar
 jna/5.13.0//jna-5.13.0.jar
 joda-time/2.12.7//joda-time-2.12.7.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
@@ -245,11 +245,11 @@ py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
 rocksdbjni/8.11.3//rocksdbjni-8.11.3.jar
 scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar
-scala-compiler/2.13.12//scala-compiler-2.13.12.jar
-scala-library/2.13.12//scala-library-2.13.12.jar
+scala-compiler/2.13.13//scala-compiler-2.13.13.jar
+scala-library/2.13.13//scala-library-2.13.13.jar
 scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar
 scala-parser-combinators_2.13/2.3.0//scala-parser-combinators_2.13-2.3.0.jar
-scala-reflect/2.13.12//scala-reflect-2.13.12.jar
+scala-reflect/2.13.13//scala-reflect-2.13.13.jar
 scala-xml_2.13/2.2.0//scala-xml_2.13-2.2.0.jar
 slf4j-api/2.0.12//slf4j-api-2.0.12.jar
 snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar

diff --git a/docs/_config.yml b/docs/_config.yml
index 7a305ceea67b..19183f85df23 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -22,7 +22,7 @@ include:
 SPARK_VERSION: 4.0.0-SNAPSHOT
 SPARK_VERSION_SHORT: 4.0.0
 SCALA_BINARY_VERSION: "2.13"
-SCALA_VERSION: "2.13.12"
+SCALA_VERSION: "2.13.13"
 SPARK_ISSUE_TRACKER_URL: https://issues.apache.org/jira/browse/SPARK
 SPARK_GITHUB_URL: https://github.com/apache/spark
 # Before a new release, we should:

diff --git a/pom.xml b/pom.xml
index 6a811e74e7f8..d67ab1c01273 100644
--- a/pom.xml
+++ b/pom.xml
@@ -172,7 +172,7 @@
 3.2.2
 4.4
-2.13.12
+2.13.13
 2.13
 2.2.0
@@ -226,7 +226,7 @@
       ./python/pyspark/sql/pandas/utils.py, and ./python/setup.py too.
     -->
 15.0.0
-2.5.11
+3.0.0-M1
 org.fusesource.leveldbjni
(spark) branch master updated (213399b61de5 -> fe0aa1edff04)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from 213399b61de5 [SPARK-47396][SQL] Add a general mapping for TIME WITHOUT TIME ZONE to TimestampNTZType
 add fe0aa1edff04 [SPARK-47402][BUILD] Upgrade `ZooKeeper` to 3.9.2

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml                               | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
(spark) branch master updated (7b4ab4fa452d -> 213399b61de5)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from 7b4ab4fa452d [SPARK-47387][SQL] Remove some unused error classes
 add 213399b61de5 [SPARK-47396][SQL] Add a general mapping for TIME WITHOUT TIME ZONE to TimestampNTZType

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala     |  1 +
 .../src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 10 ++
 2 files changed, 11 insertions(+)
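At the user level, the new mapping is visible through the JDBC reader's inferred schema. A minimal usage sketch, assuming a reachable JDBC source and a hypothetical `schedules` table whose driver reports a column as `java.sql.Types.TIME` (all connection details are invented):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical source: any JDBC driver that reports a column as
# java.sql.Types.TIME (i.e. TIME WITHOUT TIME ZONE) exercises the mapping.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/testdb")
      .option("dbtable", "schedules")
      .load())

# If the general mapping applies, the TIME column should surface as
# timestamp_ntz in the printed schema.
df.printSchema()
```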
(spark) branch master updated: [SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in PostgresDialect
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d41d5ecda8c1 [SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in PostgresDialect
d41d5ecda8c1 is described below

commit d41d5ecda8c11d7e8f6a1fafa1d2be97c0f49f04
Author: Kent Yao
AuthorDate: Thu Mar 14 10:30:48 2024 -0700

[SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in PostgresDialect

### What changes were proposed in this pull request?
This PR fixes a bug introduced in SPARK-47390: the `timetz` mapping must be separated from the `Types.TIMESTAMP` case-match branch, because `timetz` columns arrive as `Types.TIME`, not `Types.TIMESTAMP`.

### Why are the changes needed?
Bugfix.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested locally together with #45519.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45522 from yaooqinn/SPARK-47390-F.

Authored-by: Kent Yao
Signed-off-by: Dongjoon Hyun
---
 .../src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
index 7d8ed70b2bd1..9b286620a140 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
@@ -58,10 +58,14 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelper {
       // See SPARK-34333 and https://github.com/pgjdbc/pgjdbc/issues/100
       Some(StringType)
     case Types.TIMESTAMP
-        if "timestamptz".equalsIgnoreCase(typeName) || "timetz".equalsIgnoreCase(typeName) =>
+        if "timestamptz".equalsIgnoreCase(typeName) =>
       // timestamptz represents timestamp with time zone, currently it maps to Types.TIMESTAMP.
       // We need to change to Types.TIMESTAMP_WITH_TIMEZONE if the upstream changes.
       Some(TimestampType)
+    case Types.TIME if "timetz".equalsIgnoreCase(typeName) =>
+      // timetz represents time with time zone, currently it maps to Types.TIME.
+      // We need to change to Types.TIME_WITH_TIMEZONE if the upstream changes.
+      Some(TimestampType)
     case Types.OTHER => Some(StringType)
     case _ if "text".equalsIgnoreCase(typeName) => Some(StringType) // sqlType is Types.VARCHAR
     case Types.ARRAY =>
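The effect of the fix is easiest to see from the reader side. A minimal sketch, assuming a local PostgreSQL instance and a hypothetical `events` table with a `TIME WITH TIME ZONE` column (connection details invented):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# PostgreSQL reports a timetz column as java.sql.Types.TIME with type name
# "timetz"; the corrected dialect branch maps it to Spark's TimestampType.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/testdb")
      .option("dbtable", "events")
      .option("user", "postgres")
      .load())

df.printSchema()  # the timetz column should print as "timestamp"
```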
(spark) branch master updated (481597cd2d79 -> b98accd9d931)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from 481597cd2d79 [SPARK-47400][BUILD] Upgrade `gcs-connector` to 2.2.20
 add b98accd9d931 [SPARK-47401][K8S][DOCS] Update `YuniKorn` docs with v1.5

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)