(spark) branch master updated: [SPARK-48574][SQL] Fix support for StructTypes with collations
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3ac31b1b6eaf [SPARK-48574][SQL] Fix support for StructTypes with collations 3ac31b1b6eaf is described below commit 3ac31b1b6eaf9c1a45859f4238a7f7e2c4ffb9dc Author: Mihailo Milosevic AuthorDate: Wed Jun 19 16:07:59 2024 +0800 [SPARK-48574][SQL] Fix support for StructTypes with collations ### What changes were proposed in this pull request? Fix the ExtractValue expression so it matches any string type, not only the default StringType. ### Why are the changes needed? This fix is needed in case we change the default collation. ### Does this PR introduce _any_ user-facing change? Yes, it fixes struct field extraction when a non-default collation is in effect. ### How was this patch tested? Added tests in `CollationSQLExpressionsSuite.scala` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46997 from mihailom-db/SPARK-48574. Authored-by: Mihailo Milosevic Signed-off-by: Kent Yao --- .../catalyst/expressions/complexTypeExtractors.scala | 4 ++-- .../spark/sql/CollationSQLExpressionsSuite.scala | 19 +++ 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala index a801d0367080..ff94322efdaa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala @@ -51,12 +51,12 @@ object ExtractValue { resolver: Resolver): Expression = { (child.dataType, extraction) match { - case (StructType(fields), NonNullLiteral(v, StringType)) => + case (StructType(fields), NonNullLiteral(v, _: StringType)) => val fieldName = v.toString val ordinal = findField(fields, fieldName, resolver) GetStructField(child, ordinal, Some(fieldName)) - case (ArrayType(StructType(fields), containsNull), NonNullLiteral(v, StringType)) => + case (ArrayType(StructType(fields), containsNull), NonNullLiteral(v, _: StringType)) => val fieldName = v.toString val ordinal = findField(fields, fieldName, resolver) GetArrayStructFields(child, fields(ordinal).copy(name = fieldName), diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala index a1c6f5f94317..0c54ccb7cfb1 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala @@ -1854,6 +1854,25 @@ class CollationSQLExpressionsSuite }) } + test("ExtractValue expression with collation") { +// Supported collations +testSuppCollations.foreach(collationName => { + withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) { +val query = + s""" + |select col['Field1'] + |from values (named_struct('Field1', 'Spark', 'Field2', 5)) as tab(col); + |""".stripMargin +// Result & data type check +val testQuery = sql(query) +val dataType = StringType(collationName) +val expectedResult = "Spark" +assert(testQuery.schema.fields.head.dataType.sameType(dataType)) +checkAnswer(testQuery, Row(expectedResult)) + } +}) + } + test("Lag expression with collation") { // Supported collations testSuppCollations.foreach(collationName => { - To unsubscribe, e-mail:
commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
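The root cause is worth spelling out: with collation support, `StringType` is no longer a single type but a family of instances (the test above constructs one with `StringType(collationName)`), so matching a literal's data type against the `StringType` object is an equality check that misses collated strings, while the type pattern `_: StringType` matches any of them. A minimal standalone sketch of the two match semantics, using simplified stand-in types rather than Spark's actual class hierarchy:

```scala
// Simplified stand-ins for Spark's types, just to contrast the two patterns.
class StringType(val collationId: Int)
object StringType extends StringType(0) // the default, non-collated instance

val collated = new StringType(1) // e.g. a column typed under a non-default collation

val matched = collated match {
  case StringType    => "default collation only" // stable-identifier pattern: equality check, no match
  case _: StringType => "any collation"          // type pattern: matches every instance
}
assert(matched == "any collation")
```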
(spark) branch master updated: [SPARK-48644][SQL] Do a length check and throw COLLECTION_SIZE_LIMIT_EXCEEDED error in Hex.hex
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 05c87e51a5e5 [SPARK-48644][SQL] Do a length check and throw COLLECTION_SIZE_LIMIT_EXCEEDED error in Hex.hex 05c87e51a5e5 is described below commit 05c87e51a5e50d1c156211848693b66937f12a8f Author: Kent Yao AuthorDate: Tue Jun 18 13:19:34 2024 +0800 [SPARK-48644][SQL] Do a length check and throw COLLECTION_SIZE_LIMIT_EXCEEDED error in Hex.hex ### What changes were proposed in this pull request? A length check is necessary when hex-encoding the byte array, as the new array is 2x the size of the original. If the length of the new byte array exceeds Int.MaxValue, we now report the actual length and the threshold instead of throwing a NegativeArraySizeException ### Why are the changes needed? Improve error handling ### Does this PR introduce _any_ user-facing change? Yes. When converting large strings or binary values to hex strings, if the maximum length is exceeded, the raised error changes ### How was this patch tested? Tested manually without adding a unit test, because such a test case is quite memory-consuming. ``` org.apache.spark.sql.catalyst.expressions.Hex.hex((" " * (Int.MaxValue / 2 + 1)).getBytes) org.apache.spark.SparkIllegalArgumentException: [COLLECTION_SIZE_LIMIT_EXCEEDED.INITIALIZE] Can't create array with 2147483648 elements which exceeding the array size limit 2147483647, cannot initialize an array with specified parameters. SQLSTATE: 54000 at org.apache.spark.sql.errors.QueryExecutionErrors$.tooManyArrayElementsError(QueryExecutionErrors.scala:2517) at org.apache.spark.sql.catalyst.expressions.Hex$.hex(mathExpressions.scala:1042) ... 42 elided ``` ### Was this patch authored or co-authored using generative AI tooling? no Closes #47001 from yaooqinn/SPARK-48644. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../apache/spark/sql/catalyst/expressions/mathExpressions.scala | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index 20bedeb04098..5981b42aead8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -1034,7 +1034,14 @@ object Hex { def hex(bytes: Array[Byte]): UTF8String = { val length = bytes.length -val value = new Array[Byte](length * 2) +if (length == 0) { + return UTF8String.EMPTY_UTF8 +} +val targetLength = length * 2L +if (targetLength > Int.MaxValue) { + throw QueryExecutionErrors.tooManyArrayElementsError(targetLength, Int.MaxValue) +} +val value = new Array[Byte](targetLength.toInt) var i = 0 while (i < length) { value(i * 2) = hexDigits((bytes(i) & 0xF0) >> 4) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
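To see the overflow concretely: for an input just over `Int.MaxValue / 2` bytes, doubling the length in 32-bit arithmetic wraps to a negative number (the source of the old `NegativeArraySizeException`), while widening to `Long` first makes the limit checkable. A small sketch of the arithmetic:

```scala
// Sketch of the overflow hazard the patch guards against: doubling a length
// close to Int.MaxValue wraps around in 32-bit arithmetic.
val length = Int.MaxValue / 2 + 1      // 1073741824 bytes of input
val wrapped = length * 2               // Int arithmetic: overflows to -2147483648
val widened = length * 2L              // Long arithmetic: 2147483648, can be range-checked

println(s"Int: $wrapped, Long: $widened")
assert(widened > Int.MaxValue)         // the condition that now raises the classified error
```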
(spark) branch master updated (9ef092f19aaa -> 8fdd85f09779)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9ef092f19aaa [SPARK-48641][BUILD] Upgrade `curator` to 5.7.0 add 8fdd85f09779 [SPARK-48603][TEST] Update *ParquetReadSchemaSuite to cover type widen capability No new revisions were added by this update. Summary of changes: .../spark/sql/execution/datasources/ReadSchemaSuite.scala| 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48641][BUILD] Upgrade `curator` to 5.7.0
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9ef092f19aaa [SPARK-48641][BUILD] Upgrade `curator` to 5.7.0 9ef092f19aaa is described below commit 9ef092f19aaa8c4afbf26d1c34336af328265bb0 Author: Wei Guo AuthorDate: Mon Jun 17 17:55:44 2024 +0800 [SPARK-48641][BUILD] Upgrade `curator` to 5.7.0 ### What changes were proposed in this pull request? This PR aims to upgrade `curator` to 5.7.0. ### Why are the changes needed? There are some bug fixes and improvements in Apache Curator 5.7.0: [[CURATOR-688](https://issues.apache.org/jira/browse/CURATOR-688)] - SharedCount will be never updated successful when version of ZNode is overflow [[CURATOR-696](https://issues.apache.org/jira/browse/CURATOR-696)] - Double leader for LeaderLatch [[CURATOR-704](https://issues.apache.org/jira/browse/CURATOR-704)] - Use server version to detect supported features https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12314425=12354115 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46998 from wayneguow/curator_upgrade. Authored-by: Wei Guo Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 5bdd31086bdf..c74482eb2fdb 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -51,9 +51,9 @@ commons-math3/3.6.1//commons-math3-3.6.1.jar commons-pool/1.5.4//commons-pool-1.5.4.jar commons-text/1.12.0//commons-text-1.12.0.jar compress-lzf/1.1.2//compress-lzf-1.1.2.jar -curator-client/5.6.0//curator-client-5.6.0.jar -curator-framework/5.6.0//curator-framework-5.6.0.jar -curator-recipes/5.6.0//curator-recipes-5.6.0.jar +curator-client/5.7.0//curator-client-5.7.0.jar +curator-framework/5.7.0//curator-framework-5.7.0.jar +curator-recipes/5.7.0//curator-recipes-5.7.0.jar datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar diff --git a/pom.xml b/pom.xml index fc372ba17278..0c2fa604902f 100644 --- a/pom.xml +++ b/pom.xml @@ -128,7 +128,7 @@ 3.11.4 ${hadoop.version} 3.9.2 -5.6.0 +5.7.0 org.apache.hive core - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48627][SQL] Perf improvement for binary to HEX_DISCRETE string
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0c16624c77ad [SPARK-48627][SQL] Perf improvement for binary to HEX_DISCRETE string 0c16624c77ad is described below commit 0c16624c77ad311a3d076c4cfb4451b1f19f8a9b Author: Kent Yao AuthorDate: Mon Jun 17 13:42:46 2024 +0800 [SPARK-48627][SQL] Perf improvement for binary to HEX_DISCRETE string ### What changes were proposed in this pull request? By replacing `String.format`, we can achieve a nearly 200x performance improvement. `SparkStringUtils.getHexString` is widely used by - the Spark Thrift Server to convert binary to string when sending results to clients - the Spark SQL shell for display - the Spark Shell when calling `show` - the Spark Connect scala client when stringifying binaries in arrow vectors ``` +OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 +Apple M2 Max +Cardinality 10: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative + +Spark 42210 43595 1207 0.0 422102.9 1.0X +Java238243 2 0.42381.9 177.2X ``` ### Why are the changes needed? perf improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? By existing binary*.sql's results ### Was this patch authored or co-authored using generative AI tooling? no Closes #46984 from yaooqinn/SPARK-48627. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../scala/org/apache/spark/sql/catalyst/util/StringUtils.scala| 8 +++- .../scala/org/apache/spark/sql/catalyst/util/StringUtils.scala| 6 -- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala index aa8826dd48b6..edb1ee371b15 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala @@ -16,6 +16,7 @@ */ package org.apache.spark.sql.catalyst.util +import java.util.HexFormat import java.util.concurrent.atomic.AtomicBoolean import org.apache.spark.internal.Logging @@ -101,11 +102,16 @@ object SparkStringUtils extends Logging { truncatedString(seq, "", sep, "", maxFields) } + private final lazy val SPACE_DELIMITED_UPPERCASE_HEX = +HexFormat.of().withDelimiter(" ").withUpperCase() + /** * Returns a pretty string of the byte array which prints each byte as a hex digit and add spaces * between them. For example, [1A C0]. */ - def getHexString(bytes: Array[Byte]): String = bytes.map("%02X".format(_)).mkString("[", " ", "]") + def getHexString(bytes: Array[Byte]): String = { +s"[${SPACE_DELIMITED_UPPERCASE_HEX.formatHex(bytes)}]" + } def sideBySide(left: String, right: String): Seq[String] = { sideBySide(left.split("\n").toImmutableArraySeq, right.split("\n").toImmutableArraySeq) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala index 2fecd9a23759..e2a5319cbe1a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala @@ -66,12 +66,6 @@ object StringUtils extends Logging { "(?s)" + out.result() // (?s) enables dotall mode, causing "."
to match new lines } - /** - * Returns a pretty string of the byte array which prints each byte as a hex digit and add spaces - * between them. For example, [1A C0]. - */ - def getHexString(bytes: Array[Byte]): String = bytes.map("%02X".format(_)).mkString("[", " ", "]") - private[this] val trueStrings = Set("t", "true", "y", "yes", "1").map(UTF8String.fromString) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
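For reference, `java.util.HexFormat` is a JDK 17+ API; a small sketch showing that the single-pass formatter produces the same `[1A C0]`-style output as the per-byte `String.format` path it replaces:

```scala
import java.util.HexFormat

// The JDK 17 HexFormat API the patch switches to: it formats the whole array
// in one pass instead of calling String.format once per byte.
val hexFmt = HexFormat.of().withDelimiter(" ").withUpperCase()
val bytes = Array[Byte](0x1A, 0xC0.toByte)

val slow = bytes.map("%02X".format(_)).mkString("[", " ", "]") // old per-byte path
val fast = s"[${hexFmt.formatHex(bytes)}]"                     // new single-pass path

assert(slow == fast) // both produce "[1A C0]"
```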
(spark) branch master updated: [SPARK-48621][SQL] Fix Like simplification in Optimizer for collated strings
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8ee8abaa599f [SPARK-48621][SQL] Fix Like simplification in Optimizer for collated strings 8ee8abaa599f is described below commit 8ee8abaa599fd6efea85018549f1ec135af319e0 Author: Uros Bojanic <157381213+uros...@users.noreply.github.com> AuthorDate: Fri Jun 14 17:32:16 2024 +0800 [SPARK-48621][SQL] Fix Like simplification in Optimizer for collated strings ### What changes were proposed in this pull request? Enable `LikeSimplification` optimizer rule for collated strings. ### Why are the changes needed? Optimize how `Like` expression works with collated strings and ensure collation awareness when replacing `Like` expressions with `StartsWith` / `EndsWith` / `Contains` / `EqualTo` under special conditions. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New e2e sql tests in `CollationSQLRegexpSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46976 from uros-db/like-simplification. Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Kent Yao --- .../spark/sql/catalyst/optimizer/expressions.scala | 17 +++ .../apache/spark/sql/CollationSQLRegexpSuite.scala | 56 ++ 2 files changed, 65 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala index 2c55e4c8fd37..2606dd2d7737 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala @@ -738,18 +738,19 @@ object LikeSimplification extends Rule[LogicalPlan] with PredicateHelper { } else { pattern match { case startsWith(prefix) => - Some(StartsWith(input, Literal(prefix))) + Some(StartsWith(input, Literal.create(prefix, input.dataType))) case endsWith(postfix) => - Some(EndsWith(input, Literal(postfix))) + Some(EndsWith(input, Literal.create(postfix, input.dataType))) // 'a%a' pattern is basically same with 'a%' && '%a'. // However, the additional `Length` condition is required to prevent 'a' match 'a%a'. 
-case startsAndEndsWith(prefix, postfix) => - Some(And(GreaterThanOrEqual(Length(input), Literal(prefix.length + postfix.length)), -And(StartsWith(input, Literal(prefix)), EndsWith(input, Literal(postfix) +case startsAndEndsWith(prefix, postfix) => Some( + And(GreaterThanOrEqual(Length(input), Literal.create(prefix.length + postfix.length)), + And(StartsWith(input, Literal.create(prefix, input.dataType)), +EndsWith(input, Literal.create(postfix, input.dataType) case contains(infix) => - Some(Contains(input, Literal(infix))) + Some(Contains(input, Literal.create(infix, input.dataType))) case equalTo(str) => - Some(EqualTo(input, Literal(str))) + Some(EqualTo(input, Literal.create(str, input.dataType))) case _ => None } } @@ -785,7 +786,7 @@ object LikeSimplification extends Rule[LogicalPlan] with PredicateHelper { def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressionsWithPruning( _.containsPattern(LIKE_FAMLIY), ruleId) { -case l @ Like(input, Literal(pattern, StringType), escapeChar) => +case l @ Like(input, Literal(pattern, _: StringType), escapeChar) => if (pattern == null) { // If pattern is null, return null value directly, since "col like null" == null. Literal(null, BooleanType) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLRegexpSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLRegexpSuite.scala index 740583064279..885ed3709868 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLRegexpSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLRegexpSuite.scala @@ -18,6 +18,8 @@ package org.apache.spark.sql import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.logical.Project +import org.apache.spark.sql.internal.SqlApiConf import org.apache.spark.sql.test.SharedSparkSession import org.apache.spark.sql.types.{ArrayType, BooleanType, IntegerType, StringType} @@ -55,6 +57,60 @@ class CollationSQLRegexpSuite }) } + test("Like simplification should work with collated strings") { +cas
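The crux of the change is visible in the diff: `Literal(prefix)` types the extracted pattern fragment with the default `StringType`, whereas `Literal.create(prefix, input.dataType)` carries the input column's collated string type into the rewritten `StartsWith`/`EndsWith`/`Contains`/`EqualTo`. A hedged sketch of that difference (assumes a Spark classpath; the collation name is illustrative):

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Literal, StartsWith}
import org.apache.spark.sql.types.StringType

// 'col LIKE "ab%"' simplifies to StartsWith(col, "ab"); creating the literal with
// the input's own data type keeps the prefix comparison collation-aware.
val input = AttributeReference("col", StringType("UNICODE_CI"))()
val simplified = StartsWith(input, Literal.create("ab", input.dataType))
assert(simplified.right.dataType == input.dataType)
```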
(spark) branch master updated: [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dd8b05f25fdc [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer` dd8b05f25fdc is described below commit dd8b05f25fdc2c964e351f4cbbf0dd474474783c Author: wayneguow AuthorDate: Fri Jun 14 15:11:33 2024 +0800 [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer` ### What changes were proposed in this pull request? Deprecate spark.shuffle.unsafe.file.output.buffer and add a new config spark.shuffle.localDisk.file.output.buffer instead. ### Why are the changes needed? The old config is designed to be used in UnsafeShuffleWriter, but it is now used in all local shuffle writers through LocalDiskShuffleMapOutputWriter, introduced by #25007. ### Does this PR introduce _any_ user-facing change? The old config still works, but users are advised to use the new one. ### How was this patch tested? Passed existing tests. Closes #39819 from wayneguow/shuffle_output_buffer. Authored-by: wayneguow Signed-off-by: Kent Yao --- .../shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java | 2 +- core/src/main/scala/org/apache/spark/SparkConf.scala | 4 +++- .../scala/org/apache/spark/internal/config/package.scala | 10 -- .../sort/io/LocalDiskShuffleMapOutputWriterSuite.scala | 2 +- docs/configuration.md| 12 ++-- docs/core-migration-guide.md | 2 ++ 6 files changed, 25 insertions(+), 7 deletions(-) diff --git a/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java b/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java index 606bb625f5b2..c0b9018c770a 100644 --- a/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java +++ b/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java @@ -74,7 +74,7 @@ public class LocalDiskShuffleMapOutputWriter implements ShuffleMapOutputWriter { this.blockResolver = blockResolver; this.bufferSize = (int) (long) sparkConf.get( -package$.MODULE$.SHUFFLE_UNSAFE_FILE_OUTPUT_BUFFER_SIZE()) * 1024; +package$.MODULE$.SHUFFLE_LOCAL_DISK_FILE_OUTPUT_BUFFER_SIZE()) * 1024; this.partitionLengths = new long[numPartitions]; this.outputFile = blockResolver.getDataFile(shuffleId, mapId); this.outputTempFile = null; diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala index 95955455a9d4..cfb514913694 100644 --- a/core/src/main/scala/org/apache/spark/SparkConf.scala +++ b/core/src/main/scala/org/apache/spark/SparkConf.scala @@ -647,7 +647,9 @@ private[spark] object SparkConf extends Logging { DeprecatedConfig("spark.yarn.blacklist.executor.launch.blacklisting.enabled", "3.1.0", "Please use spark.yarn.executor.launch.excludeOnFailure.enabled"), DeprecatedConfig("spark.network.remoteReadNioBufferConversion", "3.5.2", -"Please open a JIRA ticket to report it if you need to use this configuration.") +"Please open a JIRA ticket to report it if you need to use this configuration."), + DeprecatedConfig("spark.shuffle.unsafe.file.output.buffer", "4.0.0", +"Please use spark.shuffle.localDisk.file.output.buffer") ) Map(configs.map { cfg => (cfg.key -> cfg) } : _*) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala
b/core/src/main/scala/org/apache/spark/internal/config/package.scala index a7268c640991..9fcd9ba529c1 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -1463,8 +1463,7 @@ package object config { private[spark] val SHUFFLE_UNSAFE_FILE_OUTPUT_BUFFER_SIZE = ConfigBuilder("spark.shuffle.unsafe.file.output.buffer") - .doc("The file system for this buffer size after each partition " + -"is written in unsafe shuffle writer. In KiB unless otherwise specified.") + .doc("(Deprecated since Spark 4.0, please use 'spark.shuffle.localDisk.file.output.buffer'.)") .version("2.3.0") .bytesConf(ByteUnit.KiB) .checkValue(v => v > 0 && v <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH / 1024, @@ -1472,6 +1471,13 @@ package object config { s" ${B
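For users migrating, both keys continue to work in 4.0, but the old one now logs a deprecation warning. A sketch of setting the preferred key (the 64k value is illustrative, not a recommendation):

```scala
import org.apache.spark.SparkConf

// Prefer the new key; the old one remains a deprecated alias that is still honored.
val conf = new SparkConf()
  .set("spark.shuffle.localDisk.file.output.buffer", "64k") // preferred key since 4.0
// .set("spark.shuffle.unsafe.file.output.buffer", "64k")   // deprecated alias
```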
(spark) branch master updated: [SPARK-48625][BUILD] Upgrade `mssql-jdbc` to 12.6.2.jre11
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 38318863f641 [SPARK-48625][BUILD] Upgrade `mssql-jdbc` to 12.6.2.jre11 38318863f641 is described below commit 38318863f6411df7bb1ec105b8b2d3bd1dff3c6d Author: Wei Guo AuthorDate: Fri Jun 14 13:54:54 2024 +0800 [SPARK-48625][BUILD] Upgrade `mssql-jdbc` to 12.6.2.jre11 ### What changes were proposed in this pull request? Upgrade `mssql-jdbc` to 12.6.2.jre11 ### Why are the changes needed? There are some issue fixes and enhancements: https://github.com/microsoft/mssql-jdbc/releases/tag/v12.6.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passed GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46981 from wayneguow/mssql-jdbc. Authored-by: Wei Guo Signed-off-by: Kent Yao --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index a900cd993335..a80bb3c0a6c3 100644 --- a/pom.xml +++ b/pom.xml @@ -326,7 +326,7 @@ 8.4.0 42.7.3 11.5.9.0 -12.6.1.jre11 +12.6.2.jre11 23.4.0.24.05 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][DOCS][TESTS] Update repo name and link from `parquet-mr` to `parquet-java`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0b214f166a92 [MINOR][DOCS][TESTS] Update repo name and link from `parquet-mr` to `parquet-java` 0b214f166a92 is described below commit 0b214f166a92c4e6b4fdc102f7718903a1a152d5 Author: Wei Guo AuthorDate: Fri Jun 14 10:33:49 2024 +0800 [MINOR][DOCS][TESTS] Update repo name and link from `parquet-mr` to `parquet-java` ### What changes were proposed in this pull request? This pr replaces parquet related repo name from `parquet-mr` to `parquet-java` and repo link from `https://github.com/apache/parquet-mr` to `https://github.com/apache/parquet-java`. ### Why are the changes needed? The upstream repo name has made a change with [INFRA-25802](https://issues.apache.org/jira/browse/INFRA-25802), [PARQUET-2475](https://issues.apache.org/jira/browse/PARQUET-2475), it's better to update with the latest name and link. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passed GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46963 from wayneguow/parquet. Authored-by: Wei Guo Signed-off-by: Kent Yao --- docs/sql-data-sources-load-save-functions.md| 2 +- docs/sql-data-sources-parquet.md| 6 +++--- .../datasources/parquet/ParquetInteroperabilitySuite.scala | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/sql-data-sources-load-save-functions.md b/docs/sql-data-sources-load-save-functions.md index b42f6e84076d..70105c22e583 100644 --- a/docs/sql-data-sources-load-save-functions.md +++ b/docs/sql-data-sources-load-save-functions.md @@ -109,7 +109,7 @@ For example, you can control bloom filters and dictionary encodings for ORC data The following ORC example will create bloom filter and use dictionary encoding only for `favorite_color`. For Parquet, there exists `parquet.bloom.filter.enabled` and `parquet.enable.dictionary`, too. To find more detailed information about the extra ORC/Parquet options, -visit the official Apache [ORC](https://orc.apache.org/docs/spark-config.html) / [Parquet](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop) websites. +visit the official Apache [ORC](https://orc.apache.org/docs/spark-config.html) / [Parquet](https://github.com/apache/parquet-java/tree/master/parquet-hadoop) websites. ORC data source: diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md index f5c5ccd3b89a..5a0ca595fabb 100644 --- a/docs/sql-data-sources-parquet.md +++ b/docs/sql-data-sources-parquet.md @@ -350,7 +350,7 @@ Dataset df2 = spark.read().parquet("/path/to/table.parquet.encrypted"); KMS Client -The InMemoryKMS class is provided only for illustration and simple demonstration of Parquet encryption functionality. **It should not be used in a real deployment**. The master encryption keys must be kept and managed in a production-grade KMS system, deployed in user's organization. Rollout of Spark with Parquet encryption requires implementation of a client class for the KMS server. Parquet provides a plug-in [interface](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.1/p [...] +The InMemoryKMS class is provided only for illustration and simple demonstration of Parquet encryption functionality. **It should not be used in a real deployment**. 
The master encryption keys must be kept and managed in a production-grade KMS system, deployed in user's organization. Rollout of Spark with Parquet encryption requires implementation of a client class for the KMS server. Parquet provides a plug-in [interface](https://github.com/apache/parquet-java/blob/apache-parquet-1.13.1 [...] {% highlight java %} @@ -371,9 +371,9 @@ public interface KmsClient { -An [example](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/crypto/keytools/samples/VaultClient.java) of such class for an open source [KMS](https://www.vaultproject.io/api/secret/transit) can be found in the parquet-mr repository. The production KMS client should be designed in cooperation with organization's security administrators, and built by developers with an experience in access control management. Once such class is created, it c [...] +An [example](https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/crypto/keytools/samples/VaultClient.java) of such class for an open source [KMS](https://www.vaultproject.io/api/secret/transit) can be found in the parquet-java repository. The production KMS client should be designed in c
(spark) branch master updated (be154a371df0 -> 70bdcc97910e)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from be154a371df0 [SPARK-48622][SQL] get SQLConf once when resolving column names add 70bdcc97910e [MINOR][DOCS] Fix metrics info of shuffle service No new revisions were added by this update. Summary of changes: docs/monitoring.md | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48622][SQL] get SQLConf once when resolving column names
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new be154a371df0 [SPARK-48622][SQL] get SQLConf once when resolving column names be154a371df0 is described below commit be154a371df0401163deb221efc3b54fa089f49c Author: Andrew Xue AuthorDate: Fri Jun 14 10:04:24 2024 +0800 [SPARK-48622][SQL] get SQLConf once when resolving column names ### What changes were proposed in this pull request? `SQLConf.caseSensitiveAnalysis` is currently being retrieved for every column when resolving column names. This is expensive if there are many columns. We can instead retrieve it once before the loop, and reuse the result. ### Why are the changes needed? Performance improvement. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Profiles of adding 1 column on an empty 10k column table (hms-parquet): Before (55s): https://github.com/databricks/runtime/assets/169104436/58de6a56-943e-465a-9005-ae98f960779e After (13s): https://github.com/databricks/runtime/assets/169104436/e9bdabc4-6e29-4012-bb01-103fa0b640fc ### Was this patch authored or co-authored using generative AI tooling? No Closes #46979 from andrewxue-db/andrewxue-db/spark-48622. Authored-by: Andrew Xue Signed-off-by: Kent Yao --- .../org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index 7a19f276b513..0e0852d0a550 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -480,8 +480,9 @@ class SessionCatalog( val catalogTable = externalCatalog.getTable(db, table) val oldDataSchema = catalogTable.dataSchema // not supporting dropping columns yet +val resolver = conf.resolver val nonExistentColumnNames = - oldDataSchema.map(_.name).filterNot(columnNameResolved(newDataSchema, _)) + oldDataSchema.map(_.name).filterNot(columnNameResolved(resolver, newDataSchema, _)) if (nonExistentColumnNames.nonEmpty) { throw QueryCompilationErrors.dropNonExistentColumnsNotSupportedError(nonExistentColumnNames) } @@ -489,8 +490,11 @@ class SessionCatalog( externalCatalog.alterTableDataSchema(db, table, newDataSchema) } - private def columnNameResolved(schema: StructType, colName: String): Boolean = { -schema.fields.map(_.name).exists(conf.resolver(_, colName)) + private def columnNameResolved( + resolver: Resolver, + schema: StructType, + colName: String): Boolean = { +schema.fields.exists(f => resolver(f.name, colName)) } /** - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
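The pattern here is generic loop-invariant hoisting: anything derived from `SQLConf` that cannot change during resolution can be fetched once before iterating over the columns. A simplified sketch of the before/after shape (the resolver below is a stand-in, not Spark's actual code):

```scala
// Stand-in for SQLConf.get.resolver: obtaining it may involve a non-trivial
// lookup, so it should not be repeated per column.
def resolverFromConf(): (String, String) => Boolean =
  (a, b) => a.equalsIgnoreCase(b)

val columns = Seq.fill(10000)("someColumn")

// Before: one conf lookup per column.
columns.exists(c => resolverFromConf()(c, "target"))

// After: one conf lookup total, reused across the whole schema.
val resolver = resolverFromConf()
columns.exists(c => resolver(c, "target"))
```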
(spark) branch master updated (fd045c9887fe -> ea2bca74923e)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from fd045c9887fe [SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of `commons-io` called in Spark add ea2bca74923e [SPARK-48602][SQL] Make csv generator support different output style with spark.sql.binaryOutputStyle No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/csv/UnivocityGenerator.scala | 8 +--- .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 2 +- .../resources/sql-tests/analyzer-results/binary.sql.out| 7 +++ .../sql-tests/analyzer-results/binary_base64.sql.out | 7 +++ .../sql-tests/analyzer-results/binary_basic.sql.out| 7 +++ .../sql-tests/analyzer-results/binary_hex.sql.out | 7 +++ .../{binary_basic.sql.out => binary_hex_discrete.sql.out} | 7 +++ sql/core/src/test/resources/sql-tests/inputs/binary.sql| 1 + .../resources/sql-tests/inputs/binary_hex_discrete.sql | 3 +++ .../src/test/resources/sql-tests/results/binary.sql.out| 8 .../test/resources/sql-tests/results/binary_base64.sql.out | 8 .../test/resources/sql-tests/results/binary_basic.sql.out | 8 .../test/resources/sql-tests/results/binary_hex.sql.out| 8 .../{binary_basic.sql.out => binary_hex_discrete.sql.out} | 14 +++--- .../sql/hive/thriftserver/ThriftServerQueryTestSuite.scala | 1 + 15 files changed, 89 insertions(+), 7 deletions(-) copy sql/core/src/test/resources/sql-tests/analyzer-results/{binary_basic.sql.out => binary_hex_discrete.sql.out} (69%) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/binary_hex_discrete.sql copy sql/core/src/test/resources/sql-tests/results/{binary_basic.sql.out => binary_hex_discrete.sql.out} (55%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48596][SQL] Perf improvement for calculating hex string for long
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b5e1b7988031 [SPARK-48596][SQL] Perf improvement for calculating hex string for long b5e1b7988031 is described below commit b5e1b7988031044d3cbdb277668b775c08db1a74 Author: Kent Yao AuthorDate: Wed Jun 12 20:23:03 2024 +0800 [SPARK-48596][SQL] Perf improvement for calculating hex string for long ### What changes were proposed in this pull request? This pull request optimizes the `Hex.hex(num: Long)` method by removing leading zeros, thus eliminating the need to copy the array to remove them afterward. ### Why are the changes needed? - Unit tests added - Did a benchmark locally (30~50% speedup) ```scala Hex Long Tests: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative Legacy 1062 1094 16 9.4 106.2 1.0X New 739807 26 13.5 73.9 1.4X ``` ```scala object HexBenchmark extends BenchmarkBase { override def runBenchmarkSuite(mainArgs: Array[String]): Unit = { val N = 10_000_000 runBenchmark("Hex") { val benchmark = new Benchmark("Hex Long Tests", N, 10, output = output) val range = 1 to 12 benchmark.addCase("Legacy") { _ => (1 to N).foreach(x => range.foreach(y => hexLegacy(x - y))) } benchmark.addCase("New") { _ => (1 to N).foreach(x => range.foreach(y => Hex.hex(x - y))) } benchmark.run() } } def hexLegacy(num: Long): UTF8String = { // Extract the hex digits of num into value[] from right to left val value = new Array[Byte](16) var numBuf = num var len = 0 do { len += 1 // Hex.hexDigits need to be seen here value(value.length - len) = Hex.hexDigits((numBuf & 0xF).toInt) numBuf >>>= 4 } while (numBuf != 0) UTF8String.fromBytes(java.util.Arrays.copyOfRange(value, value.length - len, value.length)) } } ``` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? no Closes #46952 from yaooqinn/SPARK-48596. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../sql/catalyst/expressions/mathExpressions.scala | 28 --- .../spark/sql/catalyst/expressions/HexSuite.scala | 40 ++ 2 files changed, 55 insertions(+), 13 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index 8df46500ddcf..6801fc7c257c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -1018,9 +1018,9 @@ case class Bin(child: Expression) } object Hex { - val hexDigits = Array[Char]( -'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' - ).map(_.toByte) + private final val hexDigits = +Array[Byte]('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F') + private final val ZERO_UTF8 = UTF8String.fromBytes(Array[Byte]('0')) // lookup table to translate '0' -> 0 ... 
'F'/'f' -> 15 val unhexDigits = { @@ -1036,24 +1036,26 @@ object Hex { val value = new Array[Byte](length * 2) var i = 0 while (i < length) { - value(i * 2) = Hex.hexDigits((bytes(i) & 0xF0) >> 4) - value(i * 2 + 1) = Hex.hexDigits(bytes(i) & 0x0F) + value(i * 2) = hexDigits((bytes(i) & 0xF0) >> 4) + value(i * 2 + 1) = hexDigits(bytes(i) & 0x0F) i += 1 } UTF8String.fromBytes(value) } def hex(num: Long): UTF8String = { -// Extract the hex digits of num into value[] from right to left -val value = new Array[Byte](16) +val zeros = jl.Long.numberOfLeadingZeros(num) +if (zeros == jl.Long.SIZE) return ZERO_UTF8 +val len = (jl.Long.SIZE - zeros + 3) / 4 var numBuf = num -var len = 0 -do { - len += 1 - value(value.length - len) = Hex.hexDigits((numBuf & 0xF).toInt) +val value =
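The new length computation deserves a worked example: `numberOfLeadingZeros` gives the position of the highest set bit, from which the exact number of hex digits follows, so the output array is sized exactly and the old trailing `copyOfRange` disappears:

```scala
import java.lang.{Long => JLong}

// Worked example: for num = 255 there are 56 leading zero bits, so the hex
// string needs (64 - 56 + 3) / 4 = 2 digits ("FF").
val num = 255L
val zeros = JLong.numberOfLeadingZeros(num) // 56
val len = (JLong.SIZE - zeros + 3) / 4      // 2
assert(len == 2)

// Edge case the patch handles explicitly: numberOfLeadingZeros(0L) == 64 would
// give len == 0, hence the early return of "0".
assert(JLong.numberOfLeadingZeros(0L) == JLong.SIZE)
```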
(spark) branch master updated: [SPARK-48595][CORE] Cleanup deprecated api usage related to `commons-compress`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a3625a98e78c [SPARK-48595][CORE] Cleanup deprecated api usage related to `commons-compress` a3625a98e78c is described below commit a3625a98e78c43c64cbe4a21f7c70f46307df508 Author: yangjie01 AuthorDate: Wed Jun 12 17:11:22 2024 +0800 [SPARK-48595][CORE] Cleanup deprecated api usage related to `commons-compress` ### What changes were proposed in this pull request? This PR uses `org.apache.commons.io.output.CountingOutputStream` instead of `org.apache.commons.compress.utils.CountingOutputStream` to fix the following compilation warnings related to 'commons-compress': ``` [WARNING] [Warn] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala:308: class CountingOutputStream in package utils is deprecated Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.deploy.history.RollingEventLogFilesWriter.countingOutputStream, origin=org.apache.commons.compress.utils.CountingOutputStream [WARNING] [Warn] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala:351: class CountingOutputStream in package utils is deprecated Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.deploy.history.RollingEventLogFilesWriter.rollEventLogFile.$anonfun, origin=org.apache.commons.compress.utils.CountingOutputStream ``` The fix refers to: https://github.com/apache/commons-compress/blob/95727006cac0892c654951c4e7f1db142462f22a/src/main/java/org/apache/commons/compress/utils/CountingOutputStream.java#L25-L33 ``` /** * Stream that tracks the number of bytes read. * * @since 1.3 * @NotThreadSafe * @deprecated Use {@link org.apache.commons.io.output.CountingOutputStream}. */ @Deprecated public class CountingOutputStream extends FilterOutputStream { ``` ### Why are the changes needed? Cleanup deprecated api usage related to `commons-compress` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46950 from LuciferYang/SPARK-48595.
Authored-by: yangjie01 Signed-off-by: Kent Yao --- .../scala/org/apache/spark/deploy/history/EventLogFileWriters.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala b/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala index 963ed121547c..f3bb6d5af335 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala @@ -21,7 +21,7 @@ import java.io._ import java.net.URI import java.nio.charset.StandardCharsets -import org.apache.commons.compress.utils.CountingOutputStream +import org.apache.commons.io.output.CountingOutputStream import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{FileStatus, FileSystem, FSDataOutputStream, Path} import org.apache.hadoop.fs.permission.FsPermission @@ -330,7 +330,7 @@ class RollingEventLogFilesWriter( override def writeEvent(eventJson: String, flushLogger: Boolean = false): Unit = { writer.foreach { w => - val currentLen = countingOutputStream.get.getBytesWritten + val currentLen = countingOutputStream.get.getByteCount if (currentLen + eventJson.length > eventFileMaxLength) { rollEventLogFile() } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
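The two classes are drop-in equivalents apart from the accessor name; a quick sketch of the commons-io replacement used above:

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import org.apache.commons.io.output.CountingOutputStream

// Same counting behavior as the deprecated commons-compress class; only the
// accessor differs: getByteCount here vs. getBytesWritten there.
val counting = new CountingOutputStream(new ByteArrayOutputStream())
counting.write("event".getBytes(StandardCharsets.UTF_8))
assert(counting.getByteCount == 5L)
counting.close()
```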
(spark) branch master updated: [SPARK-48584][SQL] Perf improvement for unescapePathName
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new da81d8ecb802 [SPARK-48584][SQL] Perf improvement for unescapePathName da81d8ecb802 is described below commit da81d8ecb80226fa5fb2b6e50048f05d67fb5904 Author: Kent Yao AuthorDate: Wed Jun 12 16:39:49 2024 +0800 [SPARK-48584][SQL] Perf improvement for unescapePathName ### What changes were proposed in this pull request? This PR improves perf for unescapePathName with algorithms briefly described as: - If a path contains no '%' or contains '%' at `position > path.length-2`, we return the original identity instead of creating a new StringBuilder to append char by char - Otherwise, we loop with 2 indices, `plaintextStartIdx` which starts from 0 and then points to the next char after resolving `%xx`, and `plaintextEndIdx` which points to the next `'%'`. `plaintextStartIdx` moves to `plaintextEndIdx + 3` if `%xx` is valid, or moves to `plaintextEndIdx + 1` if `%xx` is invalid. - Instead of using Integer.parseInt with error capture, we identify the high and low characters manually. ### Why are the changes needed? performance improvement for hotspots ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? - new tests in ExternalCatalogUtilsSuite - Benchmark results (9-11x faster) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46938 from yaooqinn/SPARK-48584. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../EscapePathBenchmark-jdk21-results.txt | 16 ++- .../benchmarks/EscapePathBenchmark-results.txt | 16 ++- .../catalyst/catalog/ExternalCatalogUtils.scala| 52 +- .../spark/sql/catalyst/EscapePathBenchmark.scala | 52 +- .../catalog/ExternalCatalogUtilsSuite.scala| 26 ++- 5 files changed, 135 insertions(+), 27 deletions(-) diff --git a/sql/catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt b/sql/catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt index 4fffb9bfd49a..3d16c874e8c9 100644 --- a/sql/catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt +++ b/sql/catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt @@ -6,7 +6,19 @@ OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure AMD EPYC 7763 64-Core Processor Escape Tests: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Legacy 7128 7146 8 0.17127.9 1.0X -New 790795 5 1.3 789.7 9.0X +Legacy 6996 7009 9 0.16996.5 1.0X +New 771776 3 1.3 770.7 9.1X + + + +Unescape + + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure +AMD EPYC 7763 64-Core Processor +Unescape Tests: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative + +Legacy 5127 5137 6 0.25127.3 1.0X +New 579583 4 1.7 579.3 8.9X diff --git a/sql/catalyst/benchmarks/EscapePathBenchmark-results.txt b/sql/catalyst/benchmarks/EscapePathBenchmark-results.txt index 32e44f6e19ef..7cfa134652c2 100644 --- a/sql/catalyst/benchmarks/EscapePathBenchmark-results.txt +++ b/sql/catalyst/benchmarks/EscapePathBenchmark-results.txt @@ -6,7 +6,19 @@ OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1021-azure AMD EPYC 7763 64-Core Processor Escape Tests: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Legacy 6719 6726 6 0.16719.3 1.0X -New
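A hedged sketch of the fast path and the two-index loop described above (simplified; the real implementation lives in `ExternalCatalogUtils` and may differ in details):

```scala
// Resolve a single hex digit manually instead of Integer.parseInt with error capture.
def unhexDigit(c: Char): Int = c match {
  case c if c >= '0' && c <= '9' => c - '0'
  case c if c >= 'A' && c <= 'F' => c - 'A' + 10
  case c if c >= 'a' && c <= 'f' => c - 'a' + 10
  case _ => -1
}

def unescapePathName(path: String): String = {
  var pct = path.indexOf('%')
  // Fast path: no '%' that could start a decodable "%xx" -> return the input as-is.
  if (pct == -1 || pct > path.length - 3) return path
  val sb = new java.lang.StringBuilder(path.length)
  var start = 0 // next plaintext character to copy
  while (pct != -1 && pct <= path.length - 3) {
    val hi = unhexDigit(path.charAt(pct + 1))
    val lo = unhexDigit(path.charAt(pct + 2))
    sb.append(path, start, pct) // copy the plaintext run before '%'
    if (hi >= 0 && lo >= 0) {   // valid %xx: decode and skip three chars
      sb.append(((hi << 4) | lo).toChar)
      start = pct + 3
    } else {                    // invalid %xx: keep the '%' literally
      sb.append('%')
      start = pct + 1
    }
    pct = path.indexOf('%', start)
  }
  sb.append(path, start, path.length) // trailing plaintext (incl. undecodable '%')
  sb.toString
}

assert(unescapePathName("a%20b%") == "a b%")
```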
(spark) branch master updated: [SPARK-48581][BUILD] Upgrade dropwizard metrics to 4.2.26
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8870efce19f2 [SPARK-48581][BUILD] Upgrade dropwizard metrics to 4.2.26 8870efce19f2 is described below commit 8870efce19f2abb8419f835d29304ffa7cc53251 Author: Wei Guo AuthorDate: Wed Jun 12 15:41:40 2024 +0800 [SPARK-48581][BUILD] Upgrade dropwizard metrics to 4.2.26 ### What changes were proposed in this pull request? Upgrade dropwizard metrics to 4.2.26. ### Why are the changes needed? There are some bug fixes as belows: - Correction for the Jetty-12 QTP metrics by dkaukov in https://github.com/dropwizard/metrics/pull/4181 - Fix metrics for InstrumentedEE10Handler by zUniQueX in https://github.com/dropwizard/metrics/pull/3928 The full release notes: https://github.com/dropwizard/metrics/releases/tag/v4.2.26 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passed GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46932 from wayneguow/codahale. Authored-by: Wei Guo Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 10 +- pom.xml | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 4585b534e908..f1a575fb7446 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -190,11 +190,11 @@ log4j-layout-template-json/2.22.1//log4j-layout-template-json-2.22.1.jar log4j-slf4j2-impl/2.22.1//log4j-slf4j2-impl-2.22.1.jar logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar lz4-java/1.8.0//lz4-java-1.8.0.jar -metrics-core/4.2.25//metrics-core-4.2.25.jar -metrics-graphite/4.2.25//metrics-graphite-4.2.25.jar -metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar -metrics-json/4.2.25//metrics-json-4.2.25.jar -metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar +metrics-core/4.2.26//metrics-core-4.2.26.jar +metrics-graphite/4.2.26//metrics-graphite-4.2.26.jar +metrics-jmx/4.2.26//metrics-jmx-4.2.26.jar +metrics-json/4.2.26//metrics-json-4.2.26.jar +metrics-jvm/4.2.26//metrics-jvm-4.2.26.jar minlog/1.3.0//minlog-1.3.0.jar netty-all/4.1.110.Final//netty-all-4.1.110.Final.jar netty-buffer/4.1.110.Final//netty-buffer-4.1.110.Final.jar diff --git a/pom.xml b/pom.xml index bc81b810715b..c006a5a3234f 100644 --- a/pom.xml +++ b/pom.xml @@ -151,7 +151,7 @@ If you change codahale.metrics.version, you also need to change the link to metrics.dropwizard.io in docs/monitoring.md. --> -4.2.25 +4.2.26 1.11.3 1.12.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48582][BUILD] Upgrade `braces` from 3.0.2 to 3.0.3 in ui-test
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 72df3cb1a43b [SPARK-48582][BUILD] Upgrade `braces` from 3.0.2 to 3.0.3 in ui-test 72df3cb1a43b is described below commit 72df3cb1a43bd3cc0b20456733228dbb0b403305 Author: yangjie01 AuthorDate: Wed Jun 12 10:14:38 2024 +0800 [SPARK-48582][BUILD] Upgrade `braces` from 3.0.2 to 3.0.3 in ui-test ### What changes were proposed in this pull request? This pr aims to upgrade `braces` from 3.0.2 to 3.0.3 in ui-test. The original pr was submitted by `dependabot`: https://github.com/apache/spark/pull/46931 ### Why are the changes needed? The new version fix vulnerability https://security.snyk.io/vuln/SNYK-JS-BRACES-6838727 - https://github.com/micromatch/braces/commit/9f5b4cf47329351bcb64287223ffb6ecc9a5e6d3 The complete list of changes is as follows: - https://github.com/micromatch/braces/compare/3.0.2...3.0.3 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46933 from LuciferYang/SPARK-48582. Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Kent Yao --- ui-test/package-lock.json | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/ui-test/package-lock.json b/ui-test/package-lock.json index 23ff8ede6515..ec870dfa4801 100644 --- a/ui-test/package-lock.json +++ b/ui-test/package-lock.json @@ -1392,12 +1392,12 @@ } }, "node_modules/braces": { - "version": "3.0.2", - "resolved": "https://registry.npmjs.org/braces/-/braces-3.0.2.tgz;, - "integrity": "sha512-b8um+L1RzM3WDSzvhm6gIz1yfTbBt6YTlcEKAvsmqCZZFw46z626lVj9j1yEPW33H5H+lBQpZMP1k8l+78Ha0A==", + "version": "3.0.3", + "resolved": "https://registry.npmjs.org/braces/-/braces-3.0.3.tgz;, + "integrity": "sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA==", "dev": true, "dependencies": { -"fill-range": "^7.0.1" +"fill-range": "^7.1.1" }, "engines": { "node": ">=8" @@ -1911,9 +1911,9 @@ } }, "node_modules/fill-range": { - "version": "7.0.1", - "resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.0.1.tgz;, - "integrity": "sha512-qOo9F+dMUmC2Lcb4BbVvnKJxTPjCm+RRpe4gDuGrzkL7mEVl/djYSu2OdQ2Pa302N4oqkSg9ir6jaLWJ2USVpQ==", + "version": "7.1.1", + "resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.1.1.tgz;, + "integrity": "sha512-YsGpe3WHLK8ZYi4tWDg2Jy3ebRz2rXowDxnld4bkQB00cc/1Zw9AWnC0i9ztDJitivtQvaI9KaLyKrc+hBW0yg==", "dev": true, "dependencies": { "to-regex-range": "^5.0.1" - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-website) branch asf-site updated: Update docker links on the download page (#522)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new c6fb226d71 Update docker links on the download page (#522) c6fb226d71 is described below commit c6fb226d7197e6c2e3a3922c04c506cd0cc6cee1 Author: Kent Yao AuthorDate: Tue Jun 11 18:52:26 2024 +0800 Update docker links on the download page (#522) --- downloads.md| 6 -- site/downloads.html | 6 -- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/downloads.md b/downloads.md index 6598534668..43312a2895 100644 --- a/downloads.md +++ b/downloads.md @@ -41,9 +41,11 @@ Spark artifacts are [hosted in Maven Central](https://search.maven.org/search?q= https://pypi.org/project/pyspark/;>PySpark is now available in pypi. To install just run `pip install pyspark`. -### Convenience Docker Container Images +### Installing with Docker -[Spark Docker Container images are available from DockerHub](https://hub.docker.com/r/apache/spark-py/tags), these images contain non-ASF software and may be subject to different license terms. +Spark docker images are available from Dockerhub under the accounts of both [The Apache Software Foundation](https://hub.docker.com/r/apache/spark/) and [Official Images](https://hub.docker.com/_/spark). + +Note that, these images contain non-ASF software and may be subject to different license terms. Please check their [Dockerfiles](https://github.com/apache/spark-docker) to verify whether they are compatible with your deployment. ### Release notes for stable releases diff --git a/site/downloads.html b/site/downloads.html index 77baa1a1fe..fad86b7f58 100644 --- a/site/downloads.html +++ b/site/downloads.html @@ -182,9 +182,11 @@ version: 3.5.1 Installing with PyPi https://pypi.org/project/pyspark/;>PySpark is now available in pypi. To install just run pip install pyspark. -Convenience Docker Container Images +Installing with Docker -https://hub.docker.com/r/apache/spark-py/tags;>Spark Docker Container images are available from DockerHub, these images contain non-ASF software and may be subject to different license terms. +Spark docker images are available from Dockerhub under the accounts of both https://hub.docker.com/r/apache/spark/;>The Apache Software Foundation and https://hub.docker.com/_/spark;>Official Images. + +Note that, these images contain non-ASF software and may be subject to different license terms. Please check their https://github.com/apache/spark-docker;>Dockerfiles to verify whether they are compatible with your deployment. Release notes for stable releases - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48565][UI] Fix thread dump display in UI
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 53d65fd12dd9 [SPARK-48565][UI] Fix thread dump display in UI 53d65fd12dd9 is described below commit 53d65fd12dd9231139188227ef9040d40d759021 Author: Cheng Pan AuthorDate: Tue Jun 11 11:28:50 2024 +0800 [SPARK-48565][UI] Fix thread dump display in UI ### What changes were proposed in this pull request? The thread dump display in the UI is not as pretty as before; this is a side effect introduced by SPARK-44863 ### Why are the changes needed? Restore the thread dump display in the UI. ### Does this PR introduce _any_ user-facing change? Yes, it only affects UI display. ### How was this patch tested? Current master: https://github.com/apache/spark/assets/26535726/5c6fd770-467f-481c-a635-2855a2853633 With this patch applied: https://github.com/apache/spark/assets/26535726/3998c2aa-671f-4921-8444-b7bca8667202 ### Was this patch authored or co-authored using generative AI tooling? No Closes #46916 from pan3793/SPARK-48565. Authored-by: Cheng Pan Signed-off-by: Kent Yao --- core/src/main/scala/org/apache/spark/status/api/v1/api.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/status/api/v1/api.scala b/core/src/main/scala/org/apache/spark/status/api/v1/api.scala index 7a0c69e29488..6ae1dce57f31 100644 --- a/core/src/main/scala/org/apache/spark/status/api/v1/api.scala +++ b/core/src/main/scala/org/apache/spark/status/api/v1/api.scala @@ -510,7 +510,7 @@ case class StackTrace(elems: Seq[String]) { override def toString: String = elems.mkString def html: NodeSeq = { -val withNewLine = elems.foldLeft(NodeSeq.Empty) { (acc, elem) => +val withNewLine = elems.map(_.stripLineEnd).foldLeft(NodeSeq.Empty) { (acc, elem) => if (acc.isEmpty) { acc :+ Text(elem) } else { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
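The one-line fix suggests each stack-trace element carried a trailing line terminator, which broke the per-frame layout; a tiny illustration of `stripLineEnd` (the frame string is illustrative):

```scala
// Each elem arrived with a trailing newline; stripping it restores one frame per row.
val frame = "java.base@17.0.10/java.lang.Thread.run(Thread.java:840)\n"
assert(frame.stripLineEnd == "java.base@17.0.10/java.lang.Thread.run(Thread.java:840)")
```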
(spark) branch master updated: Revert "[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable"
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b7d9c317aa2e Revert "[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable" b7d9c317aa2e is described below commit b7d9c317aa2e4de8024e44db895fa8b0cbbb36db Author: Kent Yao AuthorDate: Fri Jun 7 16:31:47 2024 +0800 Revert "[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable" This reverts commit 82b4ad2af64845503604da70ff02748c3969c991. --- common/utils/src/main/resources/error/error-conditions.json | 5 - .../execution/datasources/v2/jdbc/JDBCTableCatalog.scala| 13 + 2 files changed, 5 insertions(+), 13 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 36d8fe1daa37..7b8830073770 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -1255,11 +1255,6 @@ "List namespaces." ] }, - "LOAD_TABLE" : { -"message" : [ - "Load the table ." -] - }, "NAMESPACE_EXISTS" : { "message" : [ "Check that the namespace exists." diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala index e7a3fe0f8aa7..dbd8ee5981da 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala @@ -131,16 +131,13 @@ class JDBCTableCatalog extends TableCatalog checkNamespace(ident.namespace()) val optionsWithTableName = new JDBCOptions( options.parameters + (JDBCOptions.JDBC_TABLE_NAME -> getTableName(ident))) -JdbcUtils.classifyException( - errorClass = "FAILED_JDBC.LOAD_TABLE", - messageParameters = Map( -"url" -> options.getRedactUrl(), -"tableName" -> toSQLId(ident)), - dialect, - description = s"Failed to load table: $ident" -) { +try { val schema = JDBCRDD.resolveTable(optionsWithTableName) JDBCTable(ident, schema, optionsWithTableName) +} catch { + case e: SQLException => +logWarning("Failed to load table", e) +throw QueryCompilationErrors.noSuchTableError(ident) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (82b4ad2af648 -> 94912920b0e9)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 82b4ad2af648 [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable add 94912920b0e9 [SPARK-48548][BUILD] Add LICENSE/NOTICE for spark-core with shaded dependencies No new revisions were added by this update. Summary of changes: core/pom.xml | 2 + .../src/main/resources/META-INF/LICENSE | 49 ++ core/src/main/resources/META-INF/NOTICE | 29 + 3 files changed, 43 insertions(+), 37 deletions(-) copy LICENSE => core/src/main/resources/META-INF/LICENSE (92%) create mode 100644 core/src/main/resources/META-INF/NOTICE - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 82b4ad2af648 [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable 82b4ad2af648 is described below commit 82b4ad2af64845503604da70ff02748c3969c991 Author: Wenchen Fan AuthorDate: Fri Jun 7 10:11:40 2024 +0800 [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/44335, which missed handling `loadTable`. ### Why are the changes needed? Better error messages. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing test ### Was this patch authored or co-authored using generative AI tooling? No Closes #46905 from cloud-fan/jdbc. Authored-by: Wenchen Fan Signed-off-by: Kent Yao --- common/utils/src/main/resources/error/error-conditions.json | 5 + .../execution/datasources/v2/jdbc/JDBCTableCatalog.scala | 13 - 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 7b8830073770..36d8fe1daa37 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -1255,6 +1255,11 @@ "List namespaces." ] }, + "LOAD_TABLE" : { +"message" : [ + "Load the table ." +] + }, "NAMESPACE_EXISTS" : { "message" : [ "Check that the namespace exists." diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala index dbd8ee5981da..e7a3fe0f8aa7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala @@ -131,13 +131,16 @@ class JDBCTableCatalog extends TableCatalog checkNamespace(ident.namespace()) val optionsWithTableName = new JDBCOptions( options.parameters + (JDBCOptions.JDBC_TABLE_NAME -> getTableName(ident))) -try { +JdbcUtils.classifyException( + errorClass = "FAILED_JDBC.LOAD_TABLE", + messageParameters = Map( +"url" -> options.getRedactUrl(), +"tableName" -> toSQLId(ident)), + dialect, + description = s"Failed to load table: $ident" +) { val schema = JDBCRDD.resolveTable(optionsWithTableName) JDBCTable(ident, schema, optionsWithTableName) -} catch { - case e: SQLException => -logWarning("Failed to load table", e) -throw QueryCompilationErrors.noSuchTableError(ident) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
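The `JdbcUtils.classifyException` call above replaces an ad-hoc try/catch. Roughly sketched (an illustrative simplification with a made-up error type, not Spark's actual signature), the wrapper runs the JDBC action and maps any `SQLException` to a classified, parameterized error instead of a generic one:

```scala
import java.sql.SQLException

// Sketch: execute the action; on SQLException, surface a named error
// condition carrying the message parameters rather than hiding the
// cause behind a generic "no such table" error.
def classifyException[T](
    errorClass: String,
    messageParameters: Map[String, String])(action: => T): T = {
  try action catch {
    case e: SQLException =>
      val params = messageParameters.map { case (k, v) => s"$k=$v" }.mkString(", ")
      throw new RuntimeException(s"[$errorClass] $params", e)
  }
}
```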
(spark) branch master updated: [SPARK-48540][CORE] Avoid ivy output loading settings to stdout
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f4434c36cc4f [SPARK-48540][CORE] Avoid ivy output loading settings to stdout f4434c36cc4f is described below commit f4434c36cc4f7b0147e0e8fe26ac0f177a5199cd Author: sychen AuthorDate: Thu Jun 6 14:35:52 2024 +0800 [SPARK-48540][CORE] Avoid ivy output loading settings to stdout ### What changes were proposed in this pull request? This PR aims to avoid Ivy printing its settings-loading message to stdout. ### Why are the changes needed? Currently, `org.apache.spark.util.MavenUtils#getModuleDescriptor` will output the following string to stdout. This is due to the modified code order in SPARK-32596. ``` :: loading settings :: url = jar:file:/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml ``` Stack trace: ```java at org.apache.ivy.core.settings.IvySettings.load(IvySettings.java:404) at org.apache.ivy.core.settings.IvySettings.loadDefault(IvySettings.java:443) at org.apache.ivy.Ivy.configureDefault(Ivy.java:435) at org.apache.ivy.core.IvyContext.getDefaultIvy(IvyContext.java:201) at org.apache.ivy.core.IvyContext.getIvy(IvyContext.java:180) at org.apache.ivy.core.IvyContext.getSettings(IvyContext.java:216) at org.apache.ivy.core.module.status.StatusManager.getCurrent(StatusManager.java:40) at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.<init>(DefaultModuleDescriptor.java:206) at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.newDefaultInstance(DefaultModuleDescriptor.java:107) at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.newDefaultInstance(DefaultModuleDescriptor.java:66) at org.apache.spark.deploy.SparkSubmitUtils$.getModuleDescriptor(SparkSubmit.scala:1413) at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1460) at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:327) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:942) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:181) ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Local test ### Was this patch authored or co-authored using generative AI tooling? No Closes #46882 from cxzl25/SPARK-48540. Authored-by: sychen Signed-off-by: Kent Yao --- .../src/main/scala/org/apache/spark/util/MavenUtils.scala | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala index 08291859a32c..ae00987cd69f 100644 --- a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala +++ b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala @@ -462,14 +462,13 @@ private[spark] object MavenUtils extends Logging { val sysOut = System.out // Default configuration name for ivy val ivyConfName = "default" - - // A Module descriptor must be specified. Entries are dummy strings - val md = getModuleDescriptor - - md.setDefaultConf(ivyConfName) + var md: DefaultModuleDescriptor = null try { // To prevent ivy from logging to system out System.setOut(printStream) +// A Module descriptor must be specified. Entries are dummy strings +md = getModuleDescriptor +md.setDefaultConf(ivyConfName) val artifacts = extractMavenCoordinates(coordinates) // Directories for caching downloads through ivy and storing the jars when maven coordinates // are supplied to spark-submit @@ -548,7 +547,9 @@ private[spark] object MavenUtils extends Logging { } } finally { System.setOut(sysOut) -clearIvyResolutionFiles(md.getModuleRevisionId, ivySettings.getDefaultCache, ivyConfName) +if (md != null) { + clearIvyResolutionFiles(md.getModuleRevisionId, ivySettings.getDefaultCache, ivyConfName) +} } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
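The core idea of the fix above is that any Ivy call that may print to stdout — including the first `IvySettings` load triggered by `DefaultModuleDescriptor.newDefaultInstance` — must run after stdout has been redirected, and stdout must be restored afterwards. A minimal sketch of that pattern (the helper name is illustrative, not Spark's API):

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

// Redirect System.out for the duration of `body`, restoring it even if
// the body throws; anything the body prints lands in `sink` instead.
def withRedirectedStdOut[T](sink: PrintStream)(body: => T): T = {
  val savedOut = System.out
  try {
    System.setOut(sink)
    body
  } finally {
    System.setOut(savedOut)
  }
}

val quiet = new PrintStream(new ByteArrayOutputStream())
val result = withRedirectedStdOut(quiet) {
  // the module descriptor creation and Ivy resolution would go here
  "resolved"
}
```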
(spark) branch master updated (966c3d9ef1ed -> b3700ac09861)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 966c3d9ef1ed [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric add b3700ac09861 [SPARK-48539][BUILD][TESTS] Upgrade docker-java to 3.3.6 No new revisions were added by this update. Summary of changes: pom.xml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48538][SQL] Avoid HMS memory leak caused by bonecp
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 31ce2db6d208 [SPARK-48538][SQL] Avoid HMS memory leak caused by bonecp 31ce2db6d208 is described below commit 31ce2db6d20828844d0acab464346d7e3a4206e8 Author: Kent Yao AuthorDate: Thu Jun 6 10:22:24 2024 +0800 [SPARK-48538][SQL] Avoid HMS memory leak caused by bonecp ### What changes were proposed in this pull request? As described in [HIVE-15551](https://issues.apache.org/jira/browse/HIVE-15551), HMS will leak memory when directsql is enabled for a MySQL metastore DB. Although HIVE-15551 has been resolved already, the bug can still occur on our side as we have multiple Hive versions supported. Considering bonecp has been removed from Hive since 4.0.0 and HikariCP is not supported by all Hive versions we support, we replace bonecp with `DBCP` to avoid the memory leak. ### Why are the changes needed? Fix the memory leak of HMS. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Ran `org.apache.spark.sql.hive.execution.SQLQuerySuite`; it passed without linkage errors. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46879 from yaooqinn/SPARK-48538. Authored-by: Kent Yao Signed-off-by: Kent Yao --- LICENSE-binary | 1 - dev/deps/spark-deps-hadoop-3-hive-2.3 | 1 - pom.xml | 4 ++++ .../scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala | 9 +++++++++ 4 files changed, 13 insertions(+), 2 deletions(-) diff --git a/LICENSE-binary b/LICENSE-binary index 456b07484257..b6971798e557 100644 --- a/LICENSE-binary +++ b/LICENSE-binary @@ -218,7 +218,6 @@ com.google.crypto.tink:tink com.google.flatbuffers:flatbuffers-java com.google.guava:guava com.jamesmurty.utils:java-xmlbuilder -com.jolbox:bonecp com.ning:compress-lzf com.squareup.okhttp3:logging-interceptor com.squareup.okhttp3:okhttp diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index acb236e1c4e0..8ab76b5787b8 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -29,7 +29,6 @@ azure-data-lake-store-sdk/2.3.9//azure-data-lake-store-sdk-2.3.9.jar azure-keyvault-core/1.0.0//azure-keyvault-core-1.0.0.jar azure-storage/7.0.1//azure-storage-7.0.1.jar blas/3.0.3//blas-3.0.3.jar -bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar bundle/2.24.6//bundle-2.24.6.jar diff --git a/pom.xml b/pom.xml index bd384e42b0ec..585b8b193b32 100644 --- a/pom.xml +++ b/pom.xml @@ -2332,6 +2332,10 @@ co.cask.tephra * +<exclusion> +<groupId>com.jolbox</groupId> +<artifactId>bonecp</artifactId> +</exclusion> diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala index 2bb2fe970a11..11e077e891bd 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala @@ -1340,6 +1340,15 @@ private[hive] object HiveClientImpl extends Logging { log"will be reset to 'mr' to disable useless hive logic") hiveConf.set("hive.execution.engine", "mr", SOURCE_SPARK) } +val cpType = hiveConf.get("datanucleus.connectionPoolingType") +// Bonecp might cause a memory leak; it could affect some hive client versions we support +// See more details in HIVE-15551 +// Also, Bonecp is removed in Hive 4.0.0, see HIVE-23258 +// Here we use DBCP to replace bonecp instead of HikariCP as HikariCP was introduced in +// Hive 2.2.0 (see HIVE-13931) while the minimum Hive we support is 2.0.0. +if ("bonecp".equalsIgnoreCase(cpType)) { + hiveConf.set("datanucleus.connectionPoolingType", "DBCP", SOURCE_SPARK) +} hiveConf } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 88b8dc29e100 [SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement 88b8dc29e100 is described below commit 88b8dc29e100a51501701ffdffbcd0eff1f97c98 Author: Wenchen Fan AuthorDate: Wed Jun 5 17:40:59 2024 +0800 [SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement ### What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/44976. `ConcurrentHashMap#put` has different semantics from the Scala map: it returns null if the key is new. We should update the checking code accordingly. ### Why are the changes needed? Avoid wrong warning messages. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual ### Was this patch authored or co-authored using generative AI tooling? No Closes #46876 from cloud-fan/log. Authored-by: Wenchen Fan Signed-off-by: Kent Yao --- .../scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index a52feaa41acf..588752f3fc17 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -222,7 +222,7 @@ trait SimpleFunctionRegistryBase[T] extends FunctionRegistryBase[T] with Logging builder: FunctionBuilder): Unit = { val newFunction = (info, builder) functionBuilders.put(name, newFunction) match { - case previousFunction if previousFunction != newFunction => + case previousFunction if previousFunction != null => logWarning(log"The function ${MDC(FUNCTION_NAME, name)} replaced a " + log"previously registered function.") case _ => - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
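The `ConcurrentHashMap#put` semantics the fix relies on are easy to check in isolation (a minimal sketch, not Spark code):

```scala
import java.util.concurrent.ConcurrentHashMap

// Unlike scala.collection.mutable.Map#put, which returns an Option,
// ConcurrentHashMap#put returns the previous value directly — null when
// the key was absent. So "replaced an existing function" is signalled
// by a non-null return, not by comparing against the new value.
val registry = new ConcurrentHashMap[String, String]()
assert(registry.put("abs", "builder-v1") == null)         // new key
assert(registry.put("abs", "builder-v2") == "builder-v1") // replaced
```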
(spark) branch branch-3.5 updated: [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d3a324d63f82 [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled d3a324d63f82 is described below commit d3a324d63f82ffc4a4818bb1bfe7485d12f1dada Author: Anish Shrigondekar AuthorDate: Wed Jun 5 16:34:45 2024 +0800 [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled ### What changes were proposed in this pull request? Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled ### Why are the changes needed? Clarifying the implications of turning off this config after a certain Spark version ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A - config doc only change ### Was this patch authored or co-authored using generative AI tooling? No Closes #46875 from anishshri-db/task/SPARK-48535. Authored-by: Anish Shrigondekar Signed-off-by: Kent Yao (cherry picked from commit c4f720dfb41919dade7002b49462b3ff6b91eb22) Signed-off-by: Kent Yao --- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 74ff4f09a157..ba27a03fdc31 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2120,7 +2120,9 @@ object SQLConf { buildConf("spark.sql.streaming.stateStore.skipNullsForStreamStreamJoins.enabled") .internal() .doc("When true, this config will skip null values in hash based stream-stream joins. " + - "The number of skipped null values will be shown as custom metric of stream join operator.") + "The number of skipped null values will be shown as custom metric of stream join operator. " + + "If the streaming query was started with Spark 3.5 or above, please exercise caution " + + "before enabling this config since it may hide potential data loss/corruption issues.") .version("3.3.0") .booleanConf .createWithDefault(false) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c4f720dfb419 [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled c4f720dfb419 is described below commit c4f720dfb41919dade7002b49462b3ff6b91eb22 Author: Anish Shrigondekar AuthorDate: Wed Jun 5 16:34:45 2024 +0800 [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled ### What changes were proposed in this pull request? Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled ### Why are the changes needed? Clarifying the implications of turning off this config after a certain Spark version ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A - config doc only change ### Was this patch authored or co-authored using generative AI tooling? No Closes #46875 from anishshri-db/task/SPARK-48535. Authored-by: Anish Shrigondekar Signed-off-by: Kent Yao --- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 88c2228e640c..c4e584b9e31d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2301,7 +2301,9 @@ object SQLConf { buildConf("spark.sql.streaming.stateStore.skipNullsForStreamStreamJoins.enabled") .internal() .doc("When true, this config will skip null values in hash based stream-stream joins. " + - "The number of skipped null values will be shown as custom metric of stream join operator.") + "The number of skipped null values will be shown as custom metric of stream join operator. " + + "If the streaming query was started with Spark 3.5 or above, please exercise caution " + + "before enabling this config since it may hide potential data loss/corruption issues.") .version("3.3.0") .booleanConf .createWithDefault(false) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
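For reference, opting in to the behavior that the updated doc warns about looks like this (a sketch assuming a running `SparkSession` named `spark`; the config defaults to false):

```scala
// Per the updated doc: exercise caution before enabling this on queries
// started with Spark 3.5 or above, since skipping nulls in hash-based
// stream-stream joins may hide data loss/corruption issues.
spark.conf.set(
  "spark.sql.streaming.stateStore.skipNullsForStreamStreamJoins.enabled",
  "true")
```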
(spark) branch master updated: Revert "[SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`"
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new db527ac346f2 Revert "[SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`" db527ac346f2 is described below commit db527ac346f2f6f6dbddefe292a24848d1120172 Author: Kent Yao AuthorDate: Wed Jun 5 13:20:30 2024 +0800 Revert "[SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`" This reverts commit abbe301d7645217f22641cf3a5c41502680e65be. --- core/src/main/scala/org/apache/spark/util/Utils.scala | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index 991fb074d246..0ac1405abe6c 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -19,7 +19,7 @@ package org.apache.spark.util import java.io._ import java.lang.{Byte => JByte} -import java.lang.management.{LockInfo, ManagementFactory, MonitorInfo, ThreadInfo} +import java.lang.management.{LockInfo, ManagementFactory, MonitorInfo, PlatformManagedObject, ThreadInfo} import java.lang.reflect.InvocationTargetException import java.math.{MathContext, RoundingMode} import java.net._ @@ -3058,8 +3058,16 @@ private[spark] object Utils */ lazy val isG1GC: Boolean = { Try { - ManagementFactory.getGarbageCollectorMXBeans.asScala -.exists(_.getName.contains("G1")) + val clazz = Utils.classForName("com.sun.management.HotSpotDiagnosticMXBean") +.asInstanceOf[Class[_ <: PlatformManagedObject]] + val vmOptionClazz = Utils.classForName("com.sun.management.VMOption") + val hotSpotDiagnosticMXBean = ManagementFactory.getPlatformMXBean(clazz) + val vmOptionMethod = clazz.getMethod("getVMOption", classOf[String]) + val valueMethod = vmOptionClazz.getMethod("getValue") + + val useG1GCObject = vmOptionMethod.invoke(hotSpotDiagnosticMXBean, "UseG1GC") + val useG1GC = valueMethod.invoke(useG1GCObject).asInstanceOf[String] + "true".equals(useG1GC) }.getOrElse(false) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48518][CORE] Make LZF compression be able to run in parallel
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 90ee29992522 [SPARK-48518][CORE] Make LZF compression be able to run in parallel 90ee29992522 is described below commit 90ee299925220fa564c90e1f688a0d13ba0ac79d Author: Kent Yao AuthorDate: Tue Jun 4 18:58:33 2024 +0800 [SPARK-48518][CORE] Make LZF compression be able to run in parallel ### What changes were proposed in this pull request? This PR introduces a config that turns LZF compression to parallel mode using PLZFOutputStream. FYI, https://github.com/ning/compress?tab=readme-ov-file#parallel-processing ### Why are the changes needed? Improve performance. ``` [info] OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 [info] Apple M2 Max [info] Compress large objects: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ------------------------------------------------------------------------------------------------ [info] Compression 1024 array values in 7 threads 12 13 1 0.1 11788.2 1.0X [info] Compression 1024 array values single-threaded 23 23 0 0.0 22512.7 0.5X ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Benchmark ### Was this patch authored or co-authored using generative AI tooling? No Closes #46858 from yaooqinn/SPARK-48518. Authored-by: Kent Yao Signed-off-by: Kent Yao --- core/benchmarks/LZFBenchmark-jdk21-results.txt | 19 + core/benchmarks/LZFBenchmark-results.txt | 19 + .../org/apache/spark/internal/config/package.scala | 7 ++ .../org/apache/spark/io/CompressionCodec.scala | 8 +- .../scala/org/apache/spark/io/LZFBenchmark.scala | 93 ++ docs/configuration.md | 8 ++ 6 files changed, 153 insertions(+), 1 deletion(-) diff --git a/core/benchmarks/LZFBenchmark-jdk21-results.txt b/core/benchmarks/LZFBenchmark-jdk21-results.txt new file mode 100644 index ..e1566f201a1f --- /dev/null +++ b/core/benchmarks/LZFBenchmark-jdk21-results.txt @@ -0,0 +1,19 @@ +================================================ +Benchmark LZFCompressionCodec +================================================ + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure +AMD EPYC 7763 64-Core Processor +Compress small objects: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------ +Compression 25600 int values in parallel 598 600 2 428.2 2.3 1.0X +Compression 25600 int values single-threaded 568 570 2 451.0 2.2 1.1X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure +AMD EPYC 7763 64-Core Processor +Compress large objects: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------ +Compression 1024 array values in 1 threads 39 45 5 0.0 38475.4 1.0X +Compression 1024 array values single-threaded 32 33 1 0.0 31154.5 1.2X + + diff --git a/core/benchmarks/LZFBenchmark-results.txt b/core/benchmarks/LZFBenchmark-results.txt new file mode 100644 index ..facc67f9cf4a --- /dev/null +++ b/core/benchmarks/LZFBenchmark-results.txt @@ -0,0 +1,19 @@ +================================================ +Benchmark LZFCompressionCodec +================================================ + +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1021-azure +AMD EPYC 7763 64-Core Processor +Compress small objects: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------ +Compression 25600 int values in parallel 602 612 6 425.1 2.4 1.0X +Compression 25600 int
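A sketch of the codec switch the new config toggles, assuming the com.ning compress-lzf artifact on the classpath (the helper is illustrative, not Spark's `CompressionCodec` API):

```scala
import java.io.OutputStream
import com.ning.compress.lzf.LZFOutputStream
import com.ning.compress.lzf.parallel.PLZFOutputStream

// PLZFOutputStream hands fixed-size chunks to a pool of worker threads,
// so large writes compress on multiple cores, while LZFOutputStream
// compresses everything on the calling thread.
def lzfStream(out: OutputStream, parallel: Boolean): OutputStream =
  if (parallel) new PLZFOutputStream(out) else new LZFOutputStream(out)
```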
(spark) branch master updated: [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new abbe301d7645 [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC` abbe301d7645 is described below commit abbe301d7645217f22641cf3a5c41502680e65be Author: yangjie01 AuthorDate: Tue Jun 4 15:41:41 2024 +0800 [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC` ### What changes were proposed in this pull request? This PR changes `Utils#isG1GC` to use the result of `ManagementFactory.getGarbageCollectorMXBeans` to determine whether G1GC is used. When G1GC is used, `ManagementFactory.getGarbageCollectorMXBeans` will return two instances of `GarbageCollectorExtImpl`, whose names are `G1 Young Generation` and `G1 Old Generation`, respectively. ### Why are the changes needed? Simplify the implementation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46783 from LuciferYang/refactor-isG1GC. Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Kent Yao --- core/src/main/scala/org/apache/spark/util/Utils.scala | 14 +++--- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index 0ac1405abe6c..991fb074d246 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -19,7 +19,7 @@ package org.apache.spark.util import java.io._ import java.lang.{Byte => JByte} -import java.lang.management.{LockInfo, ManagementFactory, MonitorInfo, PlatformManagedObject, ThreadInfo} +import java.lang.management.{LockInfo, ManagementFactory, MonitorInfo, ThreadInfo} import java.lang.reflect.InvocationTargetException import java.math.{MathContext, RoundingMode} import java.net._ @@ -3058,16 +3058,8 @@ private[spark] object Utils */ lazy val isG1GC: Boolean = { Try { - val clazz = Utils.classForName("com.sun.management.HotSpotDiagnosticMXBean") -.asInstanceOf[Class[_ <: PlatformManagedObject]] - val vmOptionClazz = Utils.classForName("com.sun.management.VMOption") - val hotSpotDiagnosticMXBean = ManagementFactory.getPlatformMXBean(clazz) - val vmOptionMethod = clazz.getMethod("getVMOption", classOf[String]) - val valueMethod = vmOptionClazz.getMethod("getValue") - - val useG1GCObject = vmOptionMethod.invoke(hotSpotDiagnosticMXBean, "UseG1GC") - val useG1GC = valueMethod.invoke(useG1GCObject).asInstanceOf[String] - "true".equals(useG1GC) + ManagementFactory.getGarbageCollectorMXBeans.asScala +.exists(_.getName.contains("G1")) }.getOrElse(false) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
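The name-based check above works because a G1 JVM registers collector MXBeans named "G1 Young Generation" and "G1 Old Generation". It can be tried standalone (a minimal sketch):

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._
import scala.util.Try

// No reflection and no HotSpot-only classes: just inspect the names of
// the registered garbage collector MXBeans.
val isG1ByName: Boolean = Try {
  ManagementFactory.getGarbageCollectorMXBeans.asScala
    .exists(_.getName.contains("G1"))
}.getOrElse(false)

println(s"Running under G1: $isG1ByName")
```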
(spark) branch master updated: [SPARK-48514][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.0
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6475ddfed7f4 [SPARK-48514][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.0 6475ddfed7f4 is described below commit 6475ddfed7f4fc13ac362181c2a9d28f8f2454f7 Author: Bjørn Jørgensen AuthorDate: Tue Jun 4 14:51:15 2024 +0800 [SPARK-48514][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.0 ### What changes were proposed in this pull request? Upgrade kubernetes-client from 6.12.1 to 6.13.0 ### Why are the changes needed? Upgrade Fabric8 Kubernetes Model to Kubernetes v1.30.0 [Release log 6.13.0](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.13.0) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46854 from bjornjorgensen/kubclient6.13.0. Authored-by: Bjørn Jørgensen Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +-- pom.xml | 2 +- 2 files changed, 26 insertions(+), 26 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index b7fdc6f670bd..65e627b1854f 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -155,31 +155,31 @@ jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar jul-to-slf4j/2.0.13//jul-to-slf4j-2.0.13.jar kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar -kubernetes-client-api/6.12.1//kubernetes-client-api-6.12.1.jar -kubernetes-client/6.12.1//kubernetes-client-6.12.1.jar -kubernetes-httpclient-okhttp/6.12.1//kubernetes-httpclient-okhttp-6.12.1.jar -kubernetes-model-admissionregistration/6.12.1//kubernetes-model-admissionregistration-6.12.1.jar -kubernetes-model-apiextensions/6.12.1//kubernetes-model-apiextensions-6.12.1.jar -kubernetes-model-apps/6.12.1//kubernetes-model-apps-6.12.1.jar -kubernetes-model-autoscaling/6.12.1//kubernetes-model-autoscaling-6.12.1.jar -kubernetes-model-batch/6.12.1//kubernetes-model-batch-6.12.1.jar -kubernetes-model-certificates/6.12.1//kubernetes-model-certificates-6.12.1.jar -kubernetes-model-common/6.12.1//kubernetes-model-common-6.12.1.jar -kubernetes-model-coordination/6.12.1//kubernetes-model-coordination-6.12.1.jar -kubernetes-model-core/6.12.1//kubernetes-model-core-6.12.1.jar -kubernetes-model-discovery/6.12.1//kubernetes-model-discovery-6.12.1.jar -kubernetes-model-events/6.12.1//kubernetes-model-events-6.12.1.jar -kubernetes-model-extensions/6.12.1//kubernetes-model-extensions-6.12.1.jar -kubernetes-model-flowcontrol/6.12.1//kubernetes-model-flowcontrol-6.12.1.jar -kubernetes-model-gatewayapi/6.12.1//kubernetes-model-gatewayapi-6.12.1.jar -kubernetes-model-metrics/6.12.1//kubernetes-model-metrics-6.12.1.jar -kubernetes-model-networking/6.12.1//kubernetes-model-networking-6.12.1.jar -kubernetes-model-node/6.12.1//kubernetes-model-node-6.12.1.jar -kubernetes-model-policy/6.12.1//kubernetes-model-policy-6.12.1.jar -kubernetes-model-rbac/6.12.1//kubernetes-model-rbac-6.12.1.jar -kubernetes-model-resource/6.12.1//kubernetes-model-resource-6.12.1.jar -kubernetes-model-scheduling/6.12.1//kubernetes-model-scheduling-6.12.1.jar -kubernetes-model-storageclass/6.12.1//kubernetes-model-storageclass-6.12.1.jar +kubernetes-client-api/6.13.0//kubernetes-client-api-6.13.0.jar +kubernetes-client/6.13.0//kubernetes-client-6.13.0.jar 
+kubernetes-httpclient-okhttp/6.13.0//kubernetes-httpclient-okhttp-6.13.0.jar +kubernetes-model-admissionregistration/6.13.0//kubernetes-model-admissionregistration-6.13.0.jar +kubernetes-model-apiextensions/6.13.0//kubernetes-model-apiextensions-6.13.0.jar +kubernetes-model-apps/6.13.0//kubernetes-model-apps-6.13.0.jar +kubernetes-model-autoscaling/6.13.0//kubernetes-model-autoscaling-6.13.0.jar +kubernetes-model-batch/6.13.0//kubernetes-model-batch-6.13.0.jar +kubernetes-model-certificates/6.13.0//kubernetes-model-certificates-6.13.0.jar +kubernetes-model-common/6.13.0//kubernetes-model-common-6.13.0.jar +kubernetes-model-coordination/6.13.0//kubernetes-model-coordination-6.13.0.jar +kubernetes-model-core/6.13.0//kubernetes-model-core-6.13.0.jar +kubernetes-model-discovery/6.13.0//kubernetes-model-discovery-6.13.0.jar +kubernetes-model-events/6.13.0//kubernetes-model-events-6.13.0.jar +kubernetes-model-extensions/6.13.0//kubernetes-model-extensions-6.13.0.jar +kubernetes-model-flowcontrol/6.13.0//kubernetes-model-flowcontrol-6.13.0.jar +kubernetes-model-gatewayapi/6.13.0//kubernetes-model-gatewayapi-6.13.0.jar +kubernetes-model-metrics/6.13.0//kubernetes-model-metrics-6.13.0.jar +kubernetes-model-networking/6.13.0//kubernetes-model-networking-6.13.0.jar +kubernetes-model-node
(spark) branch branch-3.5 updated (7e0c31445c31 -> 7f99f2cbd7d2)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git from 7e0c31445c31 [SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset add 7f99f2cbd7d2 [SPARK-48394][3.5][CORE] Cleanup mapIdToMapIndex on mapoutput unregister No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/MapOutputTracker.scala | 26 ++ .../org/apache/spark/MapOutputTrackerSuite.scala | 55 ++ 2 files changed, 72 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48487][INFRA] Update License & Notice according to the dependency changes
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8d534c048866 [SPARK-48487][INFRA] Update License & Notice according to the dependency changes 8d534c048866 is described below commit 8d534c048866d55256d5db6d437f6682e0051f80 Author: Kent Yao AuthorDate: Mon Jun 3 10:04:19 2024 +0800 [SPARK-48487][INFRA] Update License & Notice according to the dependency changes ### What changes were proposed in this pull request? This PR updated License & Notice files according to the dependency changes I also did a little refactoring to make it in alphabetical order ### Why are the changes needed? to meet apache release policy ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually check ### Was this patch authored or co-authored using generative AI tooling? no Closes #46821 from yaooqinn/SPARK-48487. Authored-by: Kent Yao Signed-off-by: Kent Yao --- LICENSE-binary | 399 +- NOTICE-binary | 323 --- licenses-binary/LICENSE-check-qual.txt | 413 +++ licenses-binary/LICENSE-icu4j.txt | 519 licenses-binary/LICENSE-jakarta-servlet-api.txt | 277 + licenses-binary/LICENSE-jline3.txt | 34 ++ licenses-binary/LICENSE-loose-version.txt | 279 + licenses-binary/LICENSE-txw2.txt| 28 ++ licenses/LICENSE-loose-version.txt | 279 + pom.xml | 5 - 10 files changed, 2085 insertions(+), 471 deletions(-) diff --git a/LICENSE-binary b/LICENSE-binary index 40271c9924bc..456b07484257 100644 --- a/LICENSE-binary +++ b/LICENSE-binary @@ -204,171 +204,168 @@ This project bundles some components that are also licensed under the Apache License Version 2.0: -org.apache.zookeeper:zookeeper -oro:oro -commons-configuration:commons-configuration -commons-digester:commons-digester -com.chuusai:shapeless_2.13 -com.googlecode.javaewah:JavaEWAH -com.twitter:chill-java -com.twitter:chill_2.13 -com.univocity:univocity-parsers -javax.jdo:jdo-api -joda-time:joda-time -net.sf.opencsv:opencsv -org.apache.derby:derby -org.objenesis:objenesis -org.roaringbitmap:RoaringBitmap -org.scalanlp:breeze-macros_2.13 -org.scalanlp:breeze_2.13 -org.typelevel:macro-compat_2.13 -org.yaml:snakeyaml -org.apache.xbean:xbean-asm7-shaded -com.squareup.okhttp3:logging-interceptor -com.squareup.okhttp3:okhttp -com.squareup.okio:okio -org.apache.spark:spark-catalyst_2.13 -org.apache.spark:spark-kvstore_2.13 -org.apache.spark:spark-launcher_2.13 -org.apache.spark:spark-mllib-local_2.13 -org.apache.spark:spark-network-common_2.13 -org.apache.spark:spark-network-shuffle_2.13 -org.apache.spark:spark-sketch_2.13 -org.apache.spark:spark-tags_2.13 -org.apache.spark:spark-unsafe_2.13 -commons-httpclient:commons-httpclient -com.vlkan:flatbuffers -com.ning:compress-lzf -io.airlift:aircompressor -io.dropwizard.metrics:metrics-core -io.dropwizard.metrics:metrics-graphite -io.dropwizard.metrics:metrics-json -io.dropwizard.metrics:metrics-jvm -io.dropwizard.metrics:metrics-jmx -org.iq80.snappy:snappy com.clearspring.analytics:stream -com.jamesmurty.utils:java-xmlbuilder -commons-codec:commons-codec -commons-collections:commons-collections -io.fabric8:kubernetes-client -io.fabric8:kubernetes-model -io.fabric8:kubernetes-model-common -io.netty:netty-all -net.hydromatic:eigenbase-properties -net.sf.supercsv:super-csv -org.apache.arrow:arrow-format -org.apache.arrow:arrow-memory -org.apache.arrow:arrow-vector -org.apache.commons:commons-crypto 
-org.apache.commons:commons-lang3 -org.apache.hadoop:hadoop-annotations -org.apache.hadoop:hadoop-auth -org.apache.hadoop:hadoop-client -org.apache.hadoop:hadoop-common -org.apache.hadoop:hadoop-hdfs -org.apache.hadoop:hadoop-hdfs-client -org.apache.hadoop:hadoop-mapreduce-client-app -org.apache.hadoop:hadoop-mapreduce-client-common -org.apache.hadoop:hadoop-mapreduce-client-core -org.apache.hadoop:hadoop-mapreduce-client-jobclient -org.apache.hadoop:hadoop-mapreduce-client-shuffle -org.apache.hadoop:hadoop-yarn-api -org.apache.hadoop:hadoop-yarn-client -org.apache.hadoop:hadoop-yarn-common -org.apache.hadoop:hadoop-yarn-server-common -org.apache.hadoop:hadoop-yarn-server-web-proxy -org.apache.httpcomponents:httpclient -org.apache.httpcomponents:httpcore -org.apache.kerby:kerb-admin -org.apache.kerby:kerb-client -org.apache.kerby:kerb-common -org.apache.kerby:kerb-core -org.apache.kerby:kerb-crypto -org.apache.kerby:kerb-identity -org.apache.kerby:kerb-server -org.apache.kerby:kerb-simplekdc -org.apache.ker
(spark) branch branch-3.4 updated: [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 0d4e1fa5dbb1 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects 0d4e1fa5dbb1 is described below commit 0d4e1fa5dbb129fd05cbdd61324cfc3e9389c1c4 Author: Mihailo Milosevic AuthorDate: Fri May 31 13:33:02 2024 +0800 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects ### What changes were proposed in this pull request? Removal of stripMargin from the code in `DockerJDBCIntegrationV2Suite`. ### Why are the changes needed? The PR https://github.com/apache/spark/pull/46588 was merged to master/3.5/3.4. It broke the daily jobs for `OracleIntegrationSuite`. Upon inspection, it was noted that 3.4 and 3.5 are run with JDK8 while master is run with JDK21, and stripMargin was behaving differently in those cases. Upon removing stripMargin and splitting the `INSERT INTO` statements into multiple statements, all integration tests passed. ### Does this PR introduce _any_ user-facing change? No, only loading of the test data was changed to follow language requirements. ### How was this patch tested? The existing suite was aborted in the job and is now running. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46806 Closes #46807 from mihailom-db/FixOracleMaster. Authored-by: Mihailo Milosevic Signed-off-by: Kent Yao (cherry picked from commit 4360ec733d248b62798a191301e2b671f7bcfbd5) Signed-off-by: Kent Yao --- .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 28 ++ 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 5f4f0b7a3afb..60345257f2dc 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -39,16 +39,24 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote''_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote''_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent%_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscore_present')") + .executeUpdate()
+connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscorenot_present')") + .executeUpdate() } def tablePreparation(connection: Connection): Unit - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d64f96cbacd9 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects d64f96cbacd9 is described below commit d64f96cbacd9d98b89f31c27cf4aa79262399659 Author: Mihailo Milosevic AuthorDate: Fri May 31 13:33:02 2024 +0800 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects ### What changes were proposed in this pull request? Removal of stripMargin from the code in `DockerJDBCIntegrationV2Suite`. ### Why are the changes needed? The PR https://github.com/apache/spark/pull/46588 was merged to master/3.5/3.4. It broke the daily jobs for `OracleIntegrationSuite`. Upon inspection, it was noted that 3.4 and 3.5 are run with JDK8 while master is run with JDK21, and stripMargin was behaving differently in those cases. Upon removing stripMargin and splitting the `INSERT INTO` statements into multiple statements, all integration tests passed. ### Does this PR introduce _any_ user-facing change? No, only loading of the test data was changed to follow language requirements. ### How was this patch tested? The existing suite was aborted in the job and is now running. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46806 Closes #46807 from mihailom-db/FixOracleMaster. Authored-by: Mihailo Milosevic Signed-off-by: Kent Yao (cherry picked from commit 4360ec733d248b62798a191301e2b671f7bcfbd5) Signed-off-by: Kent Yao --- .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 28 ++ 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 5f4f0b7a3afb..60345257f2dc 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -39,16 +39,24 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote''_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote''_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent%_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscore_present')") + .executeUpdate()
+connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscorenot_present')") + .executeUpdate() } def tablePreparation(connection: Connection): Unit - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4360ec733d24 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects 4360ec733d24 is described below commit 4360ec733d248b62798a191301e2b671f7bcfbd5 Author: Mihailo Milosevic AuthorDate: Fri May 31 13:33:02 2024 +0800 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects ### What changes were proposed in this pull request? Removal of stripMargin from the code in `DockerJDBCIntegrationV2Suite`. ### Why are the changes needed? The PR https://github.com/apache/spark/pull/46588 was merged to master/3.5/3.4. It broke the daily jobs for `OracleIntegrationSuite`. Upon inspection, it was noted that 3.4 and 3.5 are run with JDK8 while master is run with JDK21, and stripMargin was behaving differently in those cases. Upon removing stripMargin and splitting the `INSERT INTO` statements into multiple statements, all integration tests passed. ### Does this PR introduce _any_ user-facing change? No, only loading of the test data was changed to follow language requirements. ### How was this patch tested? The existing suite was aborted in the job and is now running. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46806 Closes #46807 from mihailom-db/FixOracleMaster. Authored-by: Mihailo Milosevic Signed-off-by: Kent Yao --- .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 28 ++ 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 5f4f0b7a3afb..60345257f2dc 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -39,16 +39,24 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote''_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote''_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent%_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscore_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscorenot_present')") +
.executeUpdate() } def tablePreparation(connection: Connection): Unit - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
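The three commits above land the same fix on branch-3.4, branch-3.5, and master. The design choice — one simple JDBC statement per row instead of a single multi-row statement built with stripMargin — can be sketched as follows (a simplified illustration assuming an open `java.sql.Connection` named `connection`, not the suite's exact code):

```scala
import java.sql.Connection

// Fragile variant: one statement containing a multi-line, multi-row
// VALUES list built with stripMargin; some dialects/drivers reject or
// mangle it, and margin handling proved environment-sensitive in CI.
def insertAllAtOnce(connection: Connection): Unit =
  connection.prepareStatement(
    s"""INSERT INTO pattern_testing_table VALUES
       |('special_character_quote''_present'),
       |('special_character_quote_not_present')""".stripMargin
  ).executeUpdate()

// Robust variant (what the fix does): one simple statement per row.
def insertOneByOne(connection: Connection): Unit =
  Seq(
    "VALUES ('special_character_quote''_present')",
    "VALUES ('special_character_quote_not_present')"
  ).foreach { values =>
    connection
      .prepareStatement("INSERT INTO pattern_testing_table " + values)
      .executeUpdate()
  }
```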
(spark) branch master updated: [SPARK-48471][CORE] Improve documentation and usage guide for history server
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new df15c8d7744b [SPARK-48471][CORE] Improve documentation and usage guide for history server df15c8d7744b is described below commit df15c8d7744becfd44cd4a447c362e8e007bd574 Author: Kent Yao AuthorDate: Thu May 30 17:16:31 2024 +0800 [SPARK-48471][CORE] Improve documentation and usage guide for history server ### What changes were proposed in this pull request? In this PR, we improve the documentation and usage guide for the history server by: - Identifying and printing **unrecognized options** specified by users - Obtaining and printing all history server-related configurations dynamically instead of using an incomplete, outdated hardcoded list. - Ensuring all configurations are documented for the usage guide ### Why are the changes needed? - Revise the help guide for the history server to make it more user-friendly. Configurations missing from the help guide are not always reachable in our official documentation; e.g., spark.history.fs.safemodeCheck.interval has been missing from the docs since it was added in 1.6. - Misusage should be reported to users. ### Does this PR introduce _any_ user-facing change? No, the print style is kept as-is, with more items included. ### How was this patch tested? Without this PR: ``` Usage: ./sbin/start-history-server.sh [options] 24/05/30 15:37:23 INFO SignalUtils: Registering signal handler for TERM 24/05/30 15:37:23 INFO SignalUtils: Registering signal handler for HUP 24/05/30 15:37:23 INFO SignalUtils: Registering signal handler for INT Options: --properties-file FILE Path to a custom Spark properties file. Default is conf/spark-defaults.conf. Configuration options can be set by setting the corresponding JVM system property. History Server options are always available; additional options depend on the provider. History Server options: spark.history.ui.port Port where server will listen for connections (default 18080) spark.history.acls.enable Whether to enable view acls for all applications (default false) spark.history.provider Name of history provider class (defaults to file system-based provider) spark.history.retainedApplications Max number of application UIs to keep loaded in memory (default 50) FsHistoryProvider options: spark.history.fs.logDirectory Directory where app logs are stored (default: file:/tmp/spark-events) spark.history.fs.update.interval How often to reload log data from storage (in seconds, default: 10) ``` For an error: ```java Unrecognized options: --conf spark.history.ui.port=1 Usage: HistoryServer [options] Options: --properties-file FILE Path to a custom Spark properties file. Default is conf/spark-defaults.conf. ``` For help: ```java sbin/start-history-server.sh --help Usage: ./sbin/start-history-server.sh [options] {"ts":"2024-05-30T07:15:29.740Z","level":"INFO","msg":"Registering signal handler for TERM","context":{"signal":"TERM"},"logger":"SignalUtils"} {"ts":"2024-05-30T07:15:29.741Z","level":"INFO","msg":"Registering signal handler for HUP","context":{"signal":"HUP"},"logger":"SignalUtils"} {"ts":"2024-05-30T07:15:29.741Z","level":"INFO","msg":"Registering signal handler for INT","context":{"signal":"INT"},"logger":"SignalUtils"} Options: --properties-file FILE Path to a custom Spark properties file. Default is conf/spark-defaults.conf. Configuration options can be set by setting the corresponding JVM system property. 
History Server options are always available; additional options depend on the provider. History Server options: spark.history.custom.executor.log.url
(spark) branch master updated (910c3733bfdd -> b477ef4fa992)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 910c3733bfdd Revert "[SPARK-48415][PYTHON] Refactor TypeName to support parameterized datatypes" add b477ef4fa992 [SPARK-47260][SQL] Assign name to error class _LEGACY_ERROR_TEMP_3250 No new revisions were added by this update. Summary of changes: common/utils/src/main/resources/error/error-conditions.json | 5 - sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala | 4 ++-- .../src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala| 4 ++-- .../src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala | 4 ++-- 4 files changed, 6 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48426][SQL][DOCS] Add documentation for SQL operator precedence
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8bbbde7cb3c3 [SPARK-48426][SQL][DOCS] Add documentation for SQL operator precedence 8bbbde7cb3c3 is described below commit 8bbbde7cb3c396bc369c06853ed3a2ec021a2530 Author: Kent Yao AuthorDate: Wed May 29 13:31:38 2024 +0800 [SPARK-48426][SQL][DOCS] Add documentation for SQL operator precedence ### What changes were proposed in this pull request? This PR adds a doc for SQL operator precedence based on the current definition of `SqlBaseParser.g4` Not related to this PR, I have found that our `^` and `!` operators have quite different precedences than other modern systems. https://docs.oracle.com/cd/A58617_01/server.804/a58225/ch3all.htm https://learn.microsoft.com/en-us/sql/t-sql/language-elements/operator-precedence-transact-sql?view=sql-server-ver16 https://dev.mysql.com/doc/refman/8.0/en/operator-precedence.html https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-PRECEDENCE https://mariadb.com/kb/en/operator-precedence/ https://docs.databricks.com/en/sql/language-manual/sql-ref-functions-builtin.html#operator-precedence ### Why are the changes needed? doc improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? doc build ![image](https://github.com/apache/spark/assets/8326978/dd612740-dd8a-4dc9-af2c-488938f00dff) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46757 from yaooqinn/SPARK-48426. Authored-by: Kent Yao Signed-off-by: Kent Yao --- docs/_data/menu-sql.yaml | 2 + docs/sql-ref-operators.md | 124 ++ 2 files changed, 126 insertions(+) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 46dc4f3388cb..059a9bdc1af4 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -85,6 +85,8 @@ url: sql-ref-datetime-pattern.html - text: Number Pattern url: sql-ref-number-pattern.html +- text: Operators + url: sql-ref-operators.html - text: Functions url: sql-ref-functions.html - text: Identifiers diff --git a/docs/sql-ref-operators.md b/docs/sql-ref-operators.md new file mode 100644 index ..102e45fba8d2 --- /dev/null +++ b/docs/sql-ref-operators.md @@ -0,0 +1,124 @@ +--- +layout: global +title: Operators +displayTitle: Operators +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +An SQL operator is a symbol specifying an action that is performed on one or more expressions. Operators are represented by special characters or by keywords. + +### Operator Precedence + +When a complex expression has multiple operators, operator precedence determines the sequence of operations in the expression, +e.g. 
in expression `1 + 2 * 3`, `*` has higher precedence than `+`, so the expression is evaluated as `1 + (2 * 3) = 7`. +The order of execution can significantly affect the resulting value. + +Operators have the precedence levels shown in the following table. +An operator at a higher precedence level is evaluated before an operator at a lower level. +The table lists the operators in descending order of precedence, i.e. level 1 is the highest. +Operators listed in the same table cell have the same precedence and are evaluated from left to right or right to left based on their associativity.

| Precedence | Operator | Operation | Associativity |
|------------|----------|-----------|---------------|
| 1 | `.` `[]` `::` | member access, element access, cast | Left to right |
| 2 | `+` `-` `~` | unary plus, unary minus, bitwise NOT | Right to left |
| 3 | `*` `/` `%` `DIV` | multiplication, division, modulo, integral division | Left to right |
| 4 | `+` `-` `\|\|` | addition, subtraction, concatenation | Left to right |
| 5 | `<<` `>>` `>>>` | bitwise shift | Left to right |
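The precedence and associativity rules above are easy to verify interactively. The following spark-shell sketch is illustrative only; it assumes the default `spark` session of the shell and is not part of the patch:

```scala
// Each query exercises a rule from the precedence table above.
spark.sql("SELECT 1 + 2 * 3").show()    // 7: `*` (level 3) binds tighter than `+` (level 4)
spark.sql("SELECT (1 + 2) * 3").show()  // 9: parentheses override precedence
spark.sql("SELECT 2 - 3 - 1").show()    // -2: same level, left-to-right associativity
```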
(spark) branch master updated: [SPARK-48436][SQL][TESTS] Use `c.m.c.j.Driver` instead of `c.m.j.Driver` in `MySQLNamespaceSuite`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 99ffcaa13b36 [SPARK-48436][SQL][TESTS] Use `c.m.c.j.Driver` instead of `c.m.j.Driver` in `MySQLNamespaceSuite` 99ffcaa13b36 is described below commit 99ffcaa13b36c9ffa5582dfeec29438fa58c3e73 Author: panbingkun AuthorDate: Wed May 29 10:04:45 2024 +0800 [SPARK-48436][SQL][TESTS] Use `c.m.c.j.Driver` instead of `c.m.j.Driver` in `MySQLNamespaceSuite` ### What changes were proposed in this pull request? The PR aims to use `com.mysql.cj.jdbc.Driver` instead of `com.mysql.jdbc.Driver` in `MySQLNamespaceSuite` ### Why are the changes needed? - The full class name of the MySQL driver has changed from `com.mysql.jdbc.Driver` (which is deprecated) to `com.mysql.cj.jdbc.Driver`. - Eliminate warnings: https://github.com/apache/spark/assets/15246973/8b135f30-4f89-4d10-a57a-35574e2331a9 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46773 from panbingkun/SPARK-48436. Authored-by: panbingkun Signed-off-by: Kent Yao --- .../test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala index d2a7aa775826..2b607fccd171 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala @@ -40,7 +40,7 @@ class MySQLNamespaceSuite extends DockerJDBCIntegrationSuite with V2JDBCNamespac val map = new CaseInsensitiveStringMap( Map("url" -> db.getJdbcUrl(dockerIp, externalPort), - "driver" -> "com.mysql.jdbc.Driver").asJava) + "driver" -> "com.mysql.cj.jdbc.Driver").asJava) catalog.initialize("mysql", map) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
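Application code that hits the same deprecation warning needs the same one-line change: pass the modern class name through the JDBC `driver` option. The snippet below is a minimal sketch; the URL, table name, and credentials are placeholders, not taken from the patch:

```scala
// Reading a MySQL table with the non-deprecated Connector/J driver class.
val people = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/testdb") // placeholder URL
  .option("dbtable", "people")                         // placeholder table
  .option("driver", "com.mysql.cj.jdbc.Driver")        // instead of com.mysql.jdbc.Driver
  .option("user", "root")
  .option("password", "rootpass")
  .load()
```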
(spark) branch master updated (f164e4ae53ca -> a78ef738af02)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f164e4ae53ca [SPARK-48425][INFRA][FOLLOWUP] Do not copy the base spark folder add a78ef738af02 [SPARK-48168][SQL][FOLLOWUP] Match expression strings of shift operators & functions with user inputs No new revisions were added by this update. Summary of changes: .../explain-results/function_shiftleft.explain | 2 +- .../explain-results/function_shiftright.explain| 2 +- .../function_shiftrightunsigned.explain| 2 +- .../grouping_and_grouping_id.explain | 2 +- .../sql/catalyst/expressions/mathExpressions.scala | 8 ++-- .../spark/sql/catalyst/parser/AstBuilder.scala | 5 ++- .../sql-functions/sql-expression-schema.md | 12 +++--- .../analyzer-results/group-analytics.sql.out | 10 ++--- .../analyzer-results/grouping_set.sql.out | 6 +-- .../postgreSQL/groupingsets.sql.out| 44 +++--- .../analyzer-results/postgreSQL/int2.sql.out | 4 +- .../analyzer-results/postgreSQL/int4.sql.out | 4 +- .../analyzer-results/postgreSQL/int8.sql.out | 4 +- .../udf/udf-group-analytics.sql.out| 10 ++--- .../sql-tests/results/postgreSQL/int2.sql.out | 4 +- .../sql-tests/results/postgreSQL/int4.sql.out | 4 +- .../sql-tests/results/postgreSQL/int8.sql.out | 2 +- .../approved-plans-v1_4/q17/explain.txt| 2 +- .../approved-plans-v1_4/q25/explain.txt| 2 +- .../approved-plans-v1_4/q27.sf100/explain.txt | 2 +- .../approved-plans-v1_4/q27/explain.txt| 2 +- .../approved-plans-v1_4/q29/explain.txt| 2 +- .../approved-plans-v1_4/q36.sf100/explain.txt | 2 +- .../approved-plans-v1_4/q36/explain.txt| 2 +- .../approved-plans-v1_4/q39a/explain.txt | 2 +- .../approved-plans-v1_4/q39b/explain.txt | 2 +- .../approved-plans-v1_4/q49/explain.txt| 6 +-- .../approved-plans-v1_4/q5/explain.txt | 2 +- .../approved-plans-v1_4/q64/explain.txt| 4 +- .../approved-plans-v1_4/q70.sf100/explain.txt | 2 +- .../approved-plans-v1_4/q70/explain.txt| 2 +- .../approved-plans-v1_4/q72/explain.txt| 2 +- .../approved-plans-v1_4/q85/explain.txt| 2 +- .../approved-plans-v1_4/q86.sf100/explain.txt | 2 +- .../approved-plans-v1_4/q86/explain.txt| 2 +- .../approved-plans-v2_7/q24.sf100/explain.txt | 2 +- .../approved-plans-v2_7/q49/explain.txt| 6 +-- .../approved-plans-v2_7/q5a/explain.txt| 2 +- .../approved-plans-v2_7/q64/explain.txt| 4 +- .../approved-plans-v2_7/q72/explain.txt| 2 +- 40 files changed, 93 insertions(+), 90 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48168][SQL][FOLLOWUP] Fix bitwise shifting operator's precedence
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b52645652eff [SPARK-48168][SQL][FOLLOWUP] Fix bitwise shifting operator's precedence b52645652eff is described below commit b52645652eff35345c868dc47e50b3970f3a7002 Author: Kent Yao AuthorDate: Mon May 27 17:55:17 2024 +0800 [SPARK-48168][SQL][FOLLOWUP] Fix bitwise shifting operator's precedence ### What changes were proposed in this pull request? After referencing both the `C` and `MySQL` docs, https://en.cppreference.com/w/c/language/operator_precedence https://dev.mysql.com/doc/refman/8.0/en/operator-precedence.html and doing some experiments in the Scala shell ```scala scala> 1 & 2 >> 1 val res0: Int = 1 scala> 2 >> 1 << 1 val res1: Int = 2 scala> 1 << 1 + 2 val res2: Int = 8 ``` the suitable precedence for `<< >> >>>` is between '+/-' and '&' with a left-to-right associativity. ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? No; the original change is unreleased yet ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46753 from yaooqinn/SPARK-48168-F. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 2 +- .../sql-tests/analyzer-results/bitwise.sql.out | 21 +++ .../test/resources/sql-tests/inputs/bitwise.sql| 6 +- .../resources/sql-tests/results/bitwise.sql.out| 24 ++ 4 files changed, 51 insertions(+), 2 deletions(-) diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 index f0c0adb88121..4552c17e0cf1 100644 --- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 +++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 @@ -986,11 +986,11 @@ valueExpression | operator=(MINUS | PLUS | TILDE) valueExpression #arithmeticUnary | left=valueExpression operator=(ASTERISK | SLASH | PERCENT | DIV) right=valueExpression #arithmeticBinary | left=valueExpression operator=(PLUS | MINUS | CONCAT_PIPE) right=valueExpression #arithmeticBinary +| left=valueExpression shiftOperator right=valueExpression #shiftExpression | left=valueExpression operator=AMPERSAND right=valueExpression #arithmeticBinary | left=valueExpression operator=HAT right=valueExpression #arithmeticBinary | left=valueExpression operator=PIPE right=valueExpression #arithmeticBinary | left=valueExpression comparisonOperator right=valueExpression #comparison -| left=valueExpression shiftOperator right=valueExpression #shiftExpression ; shiftOperator diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/bitwise.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/bitwise.sql.out index fee226c0c341..1267a984565a 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/bitwise.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/bitwise.sql.out @@ -418,3 +418,24 @@ select cast(null as map>), 20181117 >> 2 -- !query analysis Project [cast(null as map>) AS NULL#x, (20181117 >> 2) AS (20181117 >> 2)#x] +- OneRowRelation + + +-- !query +select 1 << 1 + 2 as plus_over_shift +-- !query analysis +Project [(1 << (1 + 2)) AS plus_over_shift#x] ++- OneRowRelation + + +-- !query +select 2 >> 1 << 1 as left_to_right +-- !query analysis +Project [((2 >> 1) << 1) AS
left_to_right#x] ++- OneRowRelation + + +-- !query +select 1 & 2 >> 1 as shift_over_ampersand +-- !query analysis +Project [(1 & (2 >> 1)) AS shift_over_ampersand#x] ++- OneRowRelation diff --git a/sql/core/src/test/resources/sql-tests/inputs/bitwise.sql b/sql/core/src/test/resources/sql-tests/inputs/bitwise.sql index 5823b22ef645..e080fdd32a4a 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/bitwise.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/bitwise.sql @@ -86,4 +86,8 @@ SELECT 20181117 <<< 2; SELECT 20181117 >>>> 2; select cast(null as array>), 20181117 >> 2; select cast(null as array>), 20181117 >>> 2; -select cast(null as map>), 20181117 >> 2; \ No newline at
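The three new test queries can also be checked from a spark-shell session built with this fix; a minimal sketch, assuming an active `spark` session:

```scala
// Shift operators now bind tighter than `&` but looser than `+`/`-`.
spark.sql("SELECT 1 << 1 + 2").show()   // 8: parsed as 1 << (1 + 2)
spark.sql("SELECT 2 >> 1 << 1").show()  // 2: left-to-right, (2 >> 1) << 1
spark.sql("SELECT 1 & 2 >> 1").show()   // 1: parsed as 1 & (2 >> 1)
```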
(spark) branch master updated: [SPARK-48427][BUILD] Upgrade `scala-parser-combinators` to 2.4
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 48a4bdb9eacb [SPARK-48427][BUILD] Upgrade `scala-parser-combinators` to 2.4 48a4bdb9eacb is described below commit 48a4bdb9eacb4c7a5c56812171a9093d120b98b7 Author: yangjie01 AuthorDate: Mon May 27 17:53:53 2024 +0800 [SPARK-48427][BUILD] Upgrade `scala-parser-combinators` to 2.4 ### What changes were proposed in this pull request? This PR aims to upgrade `scala-parser-combinators` from 2.3.0 to 2.4.0 ### Why are the changes needed? Starting with this version, the build and tests are validated against Java 21. The full release notes are as follows: - https://github.com/scala/scala-parser-combinators/releases/tag/v2.4.0 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46754 from LuciferYang/SPARK-48427. Authored-by: yangjie01 Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 61d7861f4469..10d812c9fd8a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -250,7 +250,7 @@ scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar scala-compiler/2.13.14//scala-compiler-2.13.14.jar scala-library/2.13.14//scala-library-2.13.14.jar scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar -scala-parser-combinators_2.13/2.3.0//scala-parser-combinators_2.13-2.3.0.jar +scala-parser-combinators_2.13/2.4.0//scala-parser-combinators_2.13-2.4.0.jar scala-reflect/2.13.14//scala-reflect-2.13.14.jar scala-xml_2.13/2.2.0//scala-xml_2.13-2.2.0.jar slf4j-api/2.0.13//slf4j-api-2.0.13.jar diff --git a/pom.xml b/pom.xml index eef7237ac12f..5b088db7b20b 100644 --- a/pom.xml +++ b/pom.xml @@ -1151,7 +1151,7 @@ org.scala-lang.modules scala-parser-combinators_${scala.binary.version} -2.3.0 +2.4.0 jline - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b15b6cf1f537 [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version b15b6cf1f537 is described below commit b15b6cf1f537756eafbe8dd31a3b03dc500077f3 Author: panbingkun AuthorDate: Fri May 24 17:04:38 2024 +0800 [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version ### What changes were proposed in this pull request? The PR aims to upgrade some DB docker image versions, including: - `MySQL` from `8.3.0` to `8.4.0` - `MariaDB` from `10.5.12` to `10.5.25` - `Postgres` from `16.2-alpine` to `16.3-alpine` ### Why are the changes needed? Test dependency upgrades. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46704 from panbingkun/db_images_upgrade. Authored-by: panbingkun Signed-off-by: Kent Yao --- .../org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala | 6 +++--- .../scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala | 2 +- .../scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala | 6 +++--- .../org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala | 6 +++--- .../apache/spark/sql/jdbc/querytest/GeneratedSubquerySuite.scala| 6 +++--- .../apache/spark/sql/jdbc/querytest/PostgreSQLQueryTestSuite.scala | 6 +++--- .../org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 +++--- .../scala/org/apache/spark/sql/jdbc/v2/PostgresNamespaceSuite.scala | 6 +++--- 8 files changed, 22 insertions(+), 22 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala index 6825c001f767..efb2fa09f6a3 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala @@ -25,9 +25,9 @@ import org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnecti import org.apache.spark.tags.DockerTest /** - * To run this test suite for a specific version (e.g., mariadb:10.5.12): + * To run this test suite for a specific version (e.g., mariadb:10.5.25): * {{{ - * ENABLE_DOCKER_INTEGRATION_TESTS=1 MARIADB_DOCKER_IMAGE_NAME=mariadb:10.5.12 + * ENABLE_DOCKER_INTEGRATION_TESTS=1 MARIADB_DOCKER_IMAGE_NAME=mariadb:10.5.25 * ./build/sbt -Pdocker-integration-tests * "docker-integration-tests/testOnly org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite" * }}} @@ -38,7 +38,7 @@ class MariaDBKrbIntegrationSuite extends DockerKrbJDBCIntegrationSuite { override protected val keytabFileName = "mariadb.keytab" override val db = new DatabaseOnDocker { -override val imageName = sys.env.getOrElse("MARIADB_DOCKER_IMAGE_NAME", "mariadb:10.5.12") +override val imageName = sys.env.getOrElse("MARIADB_DOCKER_IMAGE_NAME", "mariadb:10.5.25") override val env = Map( "MYSQL_ROOT_PASSWORD" -> "rootpass" ) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala
index 568eb5f10973..570a81ac3947 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala @@ -18,7 +18,7 @@ package org.apache.spark.sql.jdbc class MySQLDatabaseOnDocker extends DatabaseOnDocker { - override val imageName = sys.env.getOrElse("MYSQL_DOCKER_IMAGE_NAME", "mysql:8.3.0") + override val imageName = sys.env.getOrElse("MYSQL_DOCKER_IMAGE_NAME", "mysql:8.4.0") override val env = Map( "MYSQL_ROOT_PASSWORD" -> "rootpass" ) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala index 5ad4f15216b7..12a71dbd7c7f 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/
(spark) branch master updated: [SPARK-46090][SQL][FOLLOWUP] Add DeveloperApi import
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3346afd4b250 [SPARK-46090][SQL][FOLLOWUP] Add DeveloperApi import 3346afd4b250 is described below commit 3346afd4b250c3aead5a237666d4942018a463e0 Author: ulysses-you AuthorDate: Fri May 24 14:53:26 2024 +0800 [SPARK-46090][SQL][FOLLOWUP] Add DeveloperApi import ### What changes were proposed in this pull request? Add DeveloperApi import ### Why are the changes needed? Fix compile issue ### Does this PR introduce _any_ user-facing change? No, it fixes a compile issue ### How was this patch tested? pass CI ### Was this patch authored or co-authored using generative AI tooling? no Closes #46730 from ulysses-you/hot-fix. Authored-by: ulysses-you Signed-off-by: Kent Yao --- .../org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala index fce20b79e113..23817be71c89 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala @@ -19,7 +19,7 @@ package org.apache.spark.sql.execution.adaptive import scala.collection.mutable -import org.apache.spark.annotation.Experimental +import org.apache.spark.annotation.{DeveloperApi, Experimental} import org.apache.spark.sql.catalyst.SQLConfHelper /** - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48406][BUILD] Upgrade commons-cli to 1.8.0
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f42ed6c76004 [SPARK-48406][BUILD] Upgrade commons-cli to 1.8.0 f42ed6c76004 is described below commit f42ed6c760043b0213ebf0348a22dec7c0bb8244 Author: yangjie01 AuthorDate: Fri May 24 14:23:23 2024 +0800 [SPARK-48406][BUILD] Upgrade commons-cli to 1.8.0 ### What changes were proposed in this pull request? This pr aims to upgrade Apache `commons-cli` from 1.6.0 to 1.8.0. ### Why are the changes needed? The full release notes as follows: - https://commons.apache.org/proper/commons-cli/changes-report.html#a1.7.0 - https://commons.apache.org/proper/commons-cli/changes-report.html#a1.8.0 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46727 from LuciferYang/commons-cli-180. Authored-by: yangjie01 Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 35f6103e9fa4..46c5108e4eba 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -37,7 +37,7 @@ cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar checker-qual/3.42.0//checker-qual-3.42.0.jar chill-java/0.10.0//chill-java-0.10.0.jar chill_2.13/0.10.0//chill_2.13-0.10.0.jar -commons-cli/1.6.0//commons-cli-1.6.0.jar +commons-cli/1.8.0//commons-cli-1.8.0.jar commons-codec/1.17.0//commons-codec-1.17.0.jar commons-collections/3.2.2//commons-collections-3.2.2.jar commons-collections4/4.4//commons-collections4-4.4.jar diff --git a/pom.xml b/pom.xml index ecd05ee996e1..e8d47afa1cca 100644 --- a/pom.xml +++ b/pom.xml @@ -210,7 +210,7 @@ 4.17.0 3.1.0 1.1.0 -1.6.0 +1.8.0 1.78 1.13.0 6.0.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48405][BUILD] Upgrade `commons-compress` to 1.26.2
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3b9b52dff614 [SPARK-48405][BUILD] Upgrade `commons-compress` to 1.26.2 3b9b52dff614 is described below commit 3b9b52dff6149e499c59bb30641df777bd712d9b Author: panbingkun AuthorDate: Fri May 24 11:52:37 2024 +0800 [SPARK-48405][BUILD] Upgrade `commons-compress` to 1.26.2 ### What changes were proposed in this pull request? The pr aims to upgrade `commons-compress` to `1.26.2`. ### Why are the changes needed? The full release notes: https://commons.apache.org/proper/commons-compress/changes-report.html#a1.26.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46725 from panbingkun/SPARK-48405. Authored-by: panbingkun Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 79ce883dc672..35f6103e9fa4 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -42,7 +42,7 @@ commons-codec/1.17.0//commons-codec-1.17.0.jar commons-collections/3.2.2//commons-collections-3.2.2.jar commons-collections4/4.4//commons-collections4-4.4.jar commons-compiler/3.1.9//commons-compiler-3.1.9.jar -commons-compress/1.26.1//commons-compress-1.26.1.jar +commons-compress/1.26.2//commons-compress-1.26.2.jar commons-crypto/1.1.0//commons-crypto-1.1.0.jar commons-dbcp/1.4//commons-dbcp-1.4.jar commons-io/2.16.1//commons-io-2.16.1.jar diff --git a/pom.xml b/pom.xml index 6bbcf05b59e5..ecd05ee996e1 100644 --- a/pom.xml +++ b/pom.xml @@ -187,7 +187,7 @@ 1.1.10.5 3.0.3 1.17.0 -1.26.1 +1.26.2 2.16.1 2.6 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48399][SQL] Teradata: ByteType should map to BYTEINT instead of BYTE(binary)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6afa6cc3c16e [SPARK-48399][SQL] Teradata: ByteType should map to BYTEINT instead of BYTE(binary) 6afa6cc3c16e is described below commit 6afa6cc3c16e21f94087ebb6adb01bd1ff397086 Author: Kent Yao AuthorDate: Fri May 24 10:13:49 2024 +0800 [SPARK-48399][SQL] Teradata: ByteType should map to BYTEINT instead of BYTE(binary) ### What changes were proposed in this pull request? According to the Teradata and Teradata JDBC docs, BYTE represents a binary type in Teradata, while BYTEINT is used for tinyint. - https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Data-Types-and-Literals/Numeric-Data-Types/BYTEINT-Data-Type - https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/frameset.html ### Why are the changes needed? Bugfix ### Does this PR introduce _any_ user-facing change? Yes. ByteType used to be stored as a binary type in Teradata; now it is stored as BYTEINT. (The use case seems rare; a migration guide or legacy config is pending reviewers' comments.) ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46715 from yaooqinn/SPARK-48399. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../scala/org/apache/spark/sql/jdbc/TeradataDialect.scala | 1 + .../test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 15 +-- 2 files changed, 6 insertions(+), 10 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala index 7acd22a3f10b..95a9f60b64ed 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala @@ -42,6 +42,7 @@ private case class TeradataDialect() extends JdbcDialect { override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case StringType => Some(JdbcType("VARCHAR(255)", java.sql.Types.VARCHAR)) case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR)) +case ByteType => Option(JdbcType("BYTEINT", java.sql.Types.TINYINT)) case _ => None } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala index 0a792f44d3e2..e4116b565818 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala @@ -1477,16 +1477,11 @@ class JDBCSuite extends QueryTest with SharedSparkSession { } } - test("SPARK-15648: teradataDialect StringType data mapping") { -val teradataDialect = JdbcDialects.get("jdbc:teradata://127.0.0.1/db") -assert(teradataDialect.getJDBCType(StringType). - map(_.databaseTypeDefinition).get == "VARCHAR(255)") - } - - test("SPARK-15648: teradataDialect BooleanType data mapping") { -val teradataDialect = JdbcDialects.get("jdbc:teradata://127.0.0.1/db") -assert(teradataDialect.getJDBCType(BooleanType).
- map(_.databaseTypeDefinition).get == "CHAR(1)") + test("SPARK-48399: TeradataDialect jdbc data mapping") { +val dialect = JdbcDialects.get("jdbc:teradata://127.0.0.1/db") +assert(dialect.getJDBCType(StringType).map(_.databaseTypeDefinition).get == "VARCHAR(255)") +assert(dialect.getJDBCType(BooleanType).map(_.databaseTypeDefinition).get == "CHAR(1)") +assert(dialect.getJDBCType(ByteType).map(_.databaseTypeDefinition).get == "BYTEINT") } test("SPARK-38846: TeradataDialect catalyst type mapping") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
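The new mapping can be observed without a live Teradata instance, since it only affects the generated DDL type. A minimal sketch mirroring the updated suite:

```scala
import org.apache.spark.sql.jdbc.JdbcDialects
import org.apache.spark.sql.types.ByteType

// The dialect is resolved from the URL prefix; no connection is opened.
val dialect = JdbcDialects.get("jdbc:teradata://127.0.0.1/db")
// With this fix, ByteType maps to BYTEINT rather than the binary BYTE type.
assert(dialect.getJDBCType(ByteType).map(_.databaseTypeDefinition).contains("BYTEINT"))
```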
(spark) branch master updated: [SPARK-48387][SQL] Postgres: Map TimestampType to TIMESTAMP WITH TIME ZONE
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a48365dd98c9 [SPARK-48387][SQL] Postgres: Map TimestampType to TIMESTAMP WITH TIME ZONE a48365dd98c9 is described below commit a48365dd98c9e52b5648d1cc0af203a7290cb1dc Author: Kent Yao AuthorDate: Thu May 23 10:27:16 2024 +0800 [SPARK-48387][SQL] Postgres: Map TimestampType to TIMESTAMP WITH TIME ZONE ### What changes were proposed in this pull request? Currently, both TimestampType and TimestampNTZType are mapped to TIMESTAMP WITHOUT TIME ZONE for writing, while being differentiated for reading. In this PR, we map TimestampType to TIMESTAMP WITH TIME ZONE to differentiate TimestampType/TimestampNTZType for writing against Postgres. ### Why are the changes needed? TimestampType <-> TIMESTAMP WITHOUT TIME ZONE is incorrect and ambiguous with TimestampNTZType ### Does this PR introduce _any_ user-facing change? Yes, a migration guide and a legacy configuration are provided ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46701 from yaooqinn/SPARK-48387. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 46 ++ docs/sql-data-sources-jdbc.md | 4 +- docs/sql-migration-guide.md| 3 +- .../org/apache/spark/sql/internal/SQLConf.scala| 14 +++ .../apache/spark/sql/jdbc/PostgresDialect.scala| 6 ++- 5 files changed, 68 insertions(+), 5 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala index dd6f1bfd3b3f..5ad4f15216b7 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala @@ -27,6 +27,7 @@ import org.apache.spark.SparkException import org.apache.spark.sql.{Column, DataFrame, Row} import org.apache.spark.sql.catalyst.expressions.Literal import org.apache.spark.sql.catalyst.util.DateTimeUtils +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ import org.apache.spark.tags.DockerTest @@ -583,4 +584,49 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite { assert(cause.getSQLState === "22003") } } + + test("SPARK-48387: Timestamp write as timestamp with time zone") { +val df = spark.sql("select TIMESTAMP '2018-11-17 13:33:33' as col0") +// write timestamps for preparation +withSQLConf(SQLConf.LEGACY_POSTGRES_DATETIME_MAPPING_ENABLED.key -> "false") { + // write timestamp as timestamp with time zone + df.write.jdbc(jdbcUrl, "ts_with_timezone_copy_false", new Properties) +} +withSQLConf(SQLConf.LEGACY_POSTGRES_DATETIME_MAPPING_ENABLED.key -> "true") { + // write timestamp as timestamp without time zone + df.write.jdbc(jdbcUrl, "ts_with_timezone_copy_true", new Properties) +} + +// read timestamps for test +withSQLConf(SQLConf.LEGACY_POSTGRES_DATETIME_MAPPING_ENABLED.key -> "true") { + val df1 = spark.read.option("preferTimestampNTZ", false) +.jdbc(jdbcUrl, "ts_with_timezone_copy_false", new Properties) + checkAnswer(df1, Row(Timestamp.valueOf("2018-11-17 13:33:33"))) + val df2 = spark.read.option("preferTimestampNTZ", true) +.jdbc(jdbcUrl,
"ts_with_timezone_copy_false", new Properties) + checkAnswer(df2, Row(LocalDateTime.of(2018, 11, 17, 13, 33, 33))) + + val df3 = spark.read.option("preferTimestampNTZ", false) +.jdbc(jdbcUrl, "ts_with_timezone_copy_true", new Properties) + checkAnswer(df3, Row(Timestamp.valueOf("2018-11-17 13:33:33"))) + val df4 = spark.read.option("preferTimestampNTZ", true) +.jdbc(jdbcUrl, "ts_with_timezone_copy_true", new Properties) + checkAnswer(df4, Row(LocalDateTime.of(2018, 11, 17, 13, 33, 33))) +} +withSQLConf(SQLConf.LEGACY_POSTGRES_DATETIME_MAPPING_ENABLED.key -> "false") { + Seq("true", "false").foreach { prefer => +val prop = new Properties +prop.setProperty("preferTimestampNTZ", prefer) +val dfCopy = spark.read.jdbc(jdbcUrl, "ts_with_timezone_copy_false", prop) +checkA
(spark) branch master updated (f4958ba9587c -> bf7f664296c5)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f4958ba9587c [SPARK-48367][CONNECT][FOLLOWUP] Replace keywords that identify `lint-scala` detection results add bf7f664296c5 [SPARK-47515][SPARK-47406][SQL][FOLLOWUP] Add legacy config spark.sql.legacy.mysql.timestampNTZMapping.enabled No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 3 ++- .../org/apache/spark/sql/internal/SQLConf.scala | 14 ++ .../org/apache/spark/sql/jdbc/MySQLDialect.scala| 5 +++-- .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 21 + 4 files changed, 40 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
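A minimal sketch of restoring the old MySQL mapping with the new legacy flag; only the config key comes from the commit title, while the URL and table name are placeholders:

```scala
// Opt back into the pre-change TIMESTAMP mapping for MySQL reads/writes.
spark.conf.set("spark.sql.legacy.mysql.timestampNTZMapping.enabled", "true")

val props = new java.util.Properties
props.setProperty("preferTimestampNTZ", "true") // existing JDBC reader option
val events = spark.read.jdbc("jdbc:mysql://localhost:3306/db", "events", props) // placeholders
```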
(spark) branch master updated: [SPARK-48365][DOCS] DB2: Document Mapping Spark SQL Data Types to DB2
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 664c8c19dae7 [SPARK-48365][DOCS] DB2: Document Mapping Spark SQL Data Types to DB2 664c8c19dae7 is described below commit 664c8c19dae7ca23dc9142133471d96501093bed Author: Kent Yao AuthorDate: Tue May 21 17:16:21 2024 +0800 [SPARK-48365][DOCS] DB2: Document Mapping Spark SQL Data Types to DB2 ### What changes were proposed in this pull request? In this PR, we document the mapping rules for Spark SQL Data Types to DB2 ones ### Why are the changes needed? doc improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? doc build ![image](https://github.com/apache/spark/assets/8326978/40092f80-1392-48a0-96e9-8ef9cf9516e2) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46677 from yaooqinn/SPARK-48365. Authored-by: Kent Yao Signed-off-by: Kent Yao --- docs/sql-data-sources-jdbc.md | 106 ++ 1 file changed, 106 insertions(+) diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md index 0c929fece679..54a8506bff51 100644 --- a/docs/sql-data-sources-jdbc.md +++ b/docs/sql-data-sources-jdbc.md @@ -1885,3 +1885,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to DB2 + +The below table describes the data type conversions from Spark SQL Data Types to DB2 data types, +when creating, altering, or writing data to a DB2 table using the built-in jdbc data source with +the [IBM Data Server Driver For JDBC and SQLJ](https://mvnrepository.com/artifact/com.ibm.db2/jcc) as the activated JDBC Driver.

| Spark SQL Data Type | DB2 Data Type | Remarks |
|---------------------|---------------|---------|
| BooleanType | BOOLEAN | |
| ByteType | SMALLINT | |
| ShortType | SMALLINT | |
| IntegerType | INTEGER | |
| LongType | BIGINT | |
| FloatType | REAL | |
| DoubleType | DOUBLE PRECISION | |
| DecimalType(p, s) | DECIMAL(p,s) | The maximum value for 'p' is 31 in DB2, while it is 38 in Spark. It might fail when storing DecimalType(p>=32, s) to DB2 |
| DateType | DATE | |
| TimestampType | TIMESTAMP | |
| TimestampNTZType | TIMESTAMP | |
| StringType | CLOB | |
| BinaryType | BLOB | |
| CharType(n) | CHAR(n) | The maximum value for 'n' is 255 in DB2, while it is unlimited in Spark. |
| VarcharType(n) | VARCHAR(n) | The maximum value for 'n' is 255 in DB2, while it is unlimited in Spark. |

The Spark Catalyst data types below are not supported with suitable DB2 types.

- DayTimeIntervalType
- YearMonthIntervalType
- CalendarIntervalType
- ArrayType
- MapType
- StructType
- UserDefinedType
- NullType
- ObjectType
- VariantType - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
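Two rows of the table can be spot-checked through the public dialect API without a DB2 connection; a sketch under the assumption that the dialect resolves from a `jdbc:db2` URL as in Spark's own tests:

```scala
import org.apache.spark.sql.jdbc.JdbcDialects
import org.apache.spark.sql.types.{BooleanType, StringType}

val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1:50000/testdb") // placeholder URL
db2.getJDBCType(BooleanType).map(_.databaseTypeDefinition) // Some("BOOLEAN"), per the table
db2.getJDBCType(StringType).map(_.databaseTypeDefinition)  // Some("CLOB"), per the table
```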
(spark) branch master updated: [SPARK-48300][SQL] Codegen Support for `from_xml`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6213fa661ffe [SPARK-48300][SQL] Codegen Support for `from_xml` 6213fa661ffe is described below commit 6213fa661ffeff073f3f1a6253f7039a45f284c7 Author: panbingkun AuthorDate: Tue May 21 13:47:42 2024 +0800 [SPARK-48300][SQL] Codegen Support for `from_xml` ### What changes were proposed in this pull request? The PR aims to add `Codegen Support` for `from_xml` ### Why are the changes needed? - Improve codegen coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Add a new UT & pass existing UTs. - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46609 from panbingkun/from_xml_codegen. Lead-authored-by: panbingkun Co-authored-by: Kent Yao Signed-off-by: Kent Yao --- .../apache/spark/sql/catalyst/expressions/xmlExpressions.scala | 6 +- .../test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala| 10 ++ 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala index 564a6fce1b80..48a87db291a8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala @@ -58,7 +58,6 @@ case class XmlToStructs( timeZoneId: Option[String] = None) extends UnaryExpression with TimeZoneAwareExpression - with CodegenFallback with ExpectsInputTypes with NullIntolerant with QueryErrorsBase { @@ -120,6 +119,11 @@ case class XmlToStructs( override def nullSafeEval(xml: Any): Any = converter(parser.parse(xml.asInstanceOf[UTF8String].toString)) + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val expr = ctx.addReferenceObj("this", this) +defineCodeGen(ctx, ev, input => s"(InternalRow) $expr.nullSafeEval($input)") + } + override def inputTypes: Seq[AbstractDataType] = StringTypeAnyCollation :: Nil override def prettyName: String = "from_xml" diff --git a/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala index bc910d7f30fb..1364fab3138e 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala @@ -40,6 +40,16 @@ class XmlFunctionsSuite extends QueryTest with SharedSparkSession { Row(Row(1)) :: Nil) } + test("SPARK-48300: from_xml - Codegen Support") { +withTempView("XmlToStructsTable") { + val dataDF = Seq("""<a>1</a>""").toDF("value") + dataDF.createOrReplaceTempView("XmlToStructsTable") + val df = sql("SELECT from_xml(value, 'a INT') FROM XmlToStructsTable") + assert(df.queryExecution.executedPlan.isInstanceOf[WholeStageCodegenExec]) + checkAnswer(df, Row(Row(1)) :: Nil) +} + } + test("from_xml with option (timestampFormat)") { val df = Seq("""<time>26/08/2015 18:00</time>""").toDS() val schema = new StructType().add("time", TimestampType) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
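A minimal usage sketch mirroring the new test, assuming an active `spark` session; with this patch the projection compiles via whole-stage codegen instead of falling back to interpreted evaluation:

```scala
// Parse an XML string column into a struct with the given DDL schema.
val df = spark.sql("SELECT from_xml(value, 'a INT') FROM VALUES ('<a>1</a>') AS t(value)")
df.show() // expected: a single row containing the struct {1}
```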
(spark) branch master updated: [MINOR][DOCS] correct the doc error in configuration page (fix rest to reset)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0e0134a9a48d [MINOR][DOCS] correct the doc error in configuration page (fix rest to reset) 0e0134a9a48d is described below commit 0e0134a9a48d3f58e81d26d01637dca6f2b05a92 Author: NOTHING AuthorDate: Tue May 21 13:46:29 2024 +0800 [MINOR][DOCS] correct the doc error in configuration page (fix rest to reset) ### What changes were proposed in this pull request? 1. Correct the doc error on the `configuration` page of the Spark docs: it should be ```reset to their initial values by RESET command```, not ```rest to their initial values by RESET command``` ### Why are the changes needed? 1. Correct the doc error to make the doc clearer ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No need to test; the change just fixes a misspelled word ### Was this patch authored or co-authored using generative AI tooling? No Closes #46663 from Justontheway/patch-1. Authored-by: NOTHING Signed-off-by: Kent Yao --- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index cb1fb6fba958..ecd9cd75487f 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -3396,7 +3396,7 @@ Spark subsystems. Runtime SQL configurations are per-session, mutable Spark SQL configurations. They can be set with initial values by the config file and command-line options with `--conf/-c` prefixed, or by setting `SparkConf` that are used to create `SparkSession`. -Also, they can be set and queried by SET commands and rest to their initial values by RESET command, +Also, they can be set and queried by SET commands and reset to their initial values by RESET command, or by `SparkSession.conf`'s setter and getter methods in runtime. {% include_api_gen generated-runtime-sql-config-table.html %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (839efb1d72f6 -> 2a1bdc3eda8a)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 839efb1d72f6 [SPARK-48363][SQL] Cleanup some redundant codes in `from_xml` add 2a1bdc3eda8a [SPARK-48337][SQL] Fix precision loss for JDBC TIME values No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 31 +- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 13 + .../sql/execution/datasources/jdbc/JdbcUtils.scala | 23 +++- 3 files changed, 29 insertions(+), 38 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48332][BUILD][TESTS] Upgrade `jdbc` related test dependencies
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6fcdaab27ae9 [SPARK-48332][BUILD][TESTS] Upgrade `jdbc` related test dependencies 6fcdaab27ae9 is described below commit 6fcdaab27ae900ee120e80c75bafe243a7e80765 Author: panbingkun AuthorDate: Mon May 20 18:53:17 2024 +0800 [SPARK-48332][BUILD][TESTS] Upgrade `jdbc` related test dependencies ### What changes were proposed in this pull request? The PR aims to upgrade JDBC-related test dependencies, including: - com.mysql:mysql-connector-j from `8.3.0` to `8.4.0` - com.oracle.database.jdbc:ojdbc11 from `23.3.0.23.09` to `23.4.0.24.05` ### Why are the changes needed? - com.mysql:mysql-connector-j release notes: https://dev.mysql.com/doc/relnotes/connector-j/en/news-8-4-0.html - com.oracle.database.jdbc:ojdbc11 release notes: https://download.oracle.com/otn-pub/otn_software/jdbc/23c/JDBC-UCP-ReleaseNotes-23ai.txt?AuthParam=1716161887_dbf7a096828486d544bee00a2383f42a === Known Problems Fixed in the Patch Release 23.4.0.24.05 === Bug 36279736 - CONNECTION TO A PROXY USER FAILS WITH INVALID USERNAME/PASSWORD WHEN WALLET IS PROVIDED Bug 36187019 - LOB PROCESSING FAIL WHEN DB SET WITH EL8ISO8859P7 CHARACTER SET Bug 36152805 - GET CONNECTION AGAINST NORMAL PDB WITH TFO=ON SHOULD FAIL WITH ORA-18739 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46653 from panbingkun/jdbc_driver_upgrade. Authored-by: panbingkun Signed-off-by: Kent Yao --- pom.xml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pom.xml b/pom.xml index 5811e5b7716d..d92d210a5ffc 100644 --- a/pom.xml +++ b/pom.xml @@ -323,11 +323,11 @@ -Dio.netty.tryReflectionSetAccessible=true 2.7.12 -8.3.0 +8.4.0 42.7.3 11.5.9.0 12.6.1.jre11 -23.3.0.23.09 +23.4.0.24.05 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48323][SQL] DB2: Map BooleanType to BOOLEAN instead of CHAR(1)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3a888f725315 [SPARK-48323][SQL] DB2: Map BooleanType to BOOLEAN instead of CHAR(1) 3a888f725315 is described below commit 3a888f7253155bdee52439bafbdf2b04fe2f186a Author: Kent Yao AuthorDate: Mon May 20 18:52:07 2024 +0800 [SPARK-48323][SQL] DB2: Map BooleanType to BOOLEAN instead of CHAR(1) ### What changes were proposed in this pull request? This PR maps BooleanType to BOOLEAN instead of CHAR(1) when writing DB2 tables; users can restore the old behavior by setting spark.sql.legacy.db2.booleanTypeMapping.enabled to true ### Why are the changes needed? DB2 has supported boolean since v9.7, which is already EOL. It's reasonable to map BooleanType to BOOLEAN ### Does this PR introduce _any_ user-facing change? Yes; spark.sql.legacy.db2.booleanTypeMapping.enabled is provided to restore the old behavior ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46637 from yaooqinn/SPARK-48323. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../apache/spark/sql/jdbc/DB2IntegrationSuite.scala| 18 ++ docs/sql-migration-guide.md| 1 + .../scala/org/apache/spark/sql/internal/SQLConf.scala | 11 +++ .../scala/org/apache/spark/sql/jdbc/DB2Dialect.scala | 10 ++ .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala| 2 +- 5 files changed, 33 insertions(+), 9 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala index dbf3eae5e655..72b2ac8074f4 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.{Row, SaveMode} import org.apache.spark.sql.catalyst.util.CharVarcharUtils import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._ import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, StructType} +import org.apache.spark.sql.types.{ByteType, ShortType, StructType} import org.apache.spark.tags.DockerTest /** @@ -174,13 +174,12 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationSuite { df3.write.jdbc(jdbcUrl, "stringscopy", new Properties) // spark types that does not have exact matching db2 table types.
val df4 = sqlContext.createDataFrame( - sparkContext.parallelize(Seq(Row("1".toShort, "20".toByte, true))), - new StructType().add("c1", ShortType).add("b", ByteType).add("c3", BooleanType)) + sparkContext.parallelize(Seq(Row("1".toShort, "20".toByte))), + new StructType().add("c1", ShortType).add("b", ByteType)) df4.write.jdbc(jdbcUrl, "otherscopy", new Properties) val rows = sqlContext.read.jdbc(jdbcUrl, "otherscopy", new Properties).collect() assert(rows(0).getShort(0) == 1) assert(rows(0).getShort(1) == 20) -assert(rows(0).getString(2) == "1") } test("query JDBC option") { @@ -252,6 +251,17 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationSuite { test("SPARK-48269: boolean type") { val df = sqlContext.read.jdbc(jdbcUrl, "booleans", new Properties) checkAnswer(df, Row(true)) +Seq(true, false).foreach { legacy => + withSQLConf(SQLConf.LEGACY_DB2_BOOLEAN_MAPPING_ENABLED.key -> legacy.toString) { +val tbl = "booleanscopy" + legacy +df.write.jdbc(jdbcUrl, tbl, new Properties) +if (legacy) { + checkAnswer(sqlContext.read.jdbc(jdbcUrl, tbl, new Properties), Row("1")) +} else { + checkAnswer(sqlContext.read.jdbc(jdbcUrl, tbl, new Properties), Row(true)) +} + } +} } test("SPARK-48269: GRAPHIC types") { diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 02a4fae5d262..98075d019585 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -51,6 +51,7 @@ license: | - Since Spark 4.0, MsSQL Server JDBC datasource will read TINYINT as ShortType, while in Spark 3.5 and previous, read as IntegerType. To restore the previous behavior, set `spark.sql.legacy.mssqlserver.numericMapping.enabled` to `true`. - Since Spark 4.0, MsSQL Server JDBC datasource will read DATETIMEOFF
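A rough sketch of the toggle; the config key is quoted from the commit message, while the URL and table name are placeholders:

```scala
import spark.implicits._ // assumes `spark` is an active SparkSession

val jdbcUrl = "jdbc:db2://127.0.0.1:50000/testdb" // placeholder
spark.conf.set("spark.sql.legacy.db2.booleanTypeMapping.enabled", "false")
// Default behavior: the column round-trips as a real BOOLEAN.
Seq(true).toDF("flag").write.jdbc(jdbcUrl, "booleans_copy", new java.util.Properties)
// With the flag set to "true", it is written as CHAR(1) and reads back as "1"/"0".
```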
(spark) branch master updated (b0e535217bf8 -> 403619a3974c)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b0e535217bf8 [SPARK-48301][SQL][FOLLOWUP] Update the error message add 403619a3974c [SPARK-48306][SQL] Improve UDT in error message No new revisions were added by this update. Summary of changes: .../src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala | 2 +- .../scala/org/apache/spark/sql/errors/DataTypeErrorsBase.scala | 3 ++- .../scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala | 10 +- .../main/scala/org/apache/spark/sql/hive/HiveInspectors.scala | 5 +++-- .../sql/hive/execution/HiveScriptTransformationSuite.scala | 9 - .../org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala | 4 ++-- 6 files changed, 17 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (3bd845ea930a -> fa83d0f8fce7)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3bd845ea930a [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar add fa83d0f8fce7 [SPARK-48296][SQL] Codegen Support for `to_xml` No new revisions were added by this update. Summary of changes: .../sql/catalyst/expressions/xmlExpressions.scala | 11 ++- .../org/apache/spark/sql/XmlFunctionsSuite.scala | 19 ++- 2 files changed, 24 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
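Since the diff itself is elided in this summary, here is only a rough usage sketch of the now codegen-enabled `to_xml`, assuming an active `spark` session and the default row tag (assumed here to be `ROW`):

```scala
// Serialize a struct to an XML string; the exact output shape is an assumption.
spark.sql("SELECT to_xml(named_struct('a', 1))").show(truncate = false)
// expected along the lines of: <ROW><a>1</a></ROW>
```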
(spark) branch branch-3.5 updated: [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new c1dd4a5df693 [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar c1dd4a5df693 is described below commit c1dd4a5df69340884f3f0f0c28ce916bf9e30159 Author: Kent Yao AuthorDate: Thu May 16 17:29:47 2024 +0800 [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar ### What changes were proposed in this pull request? TRANSFORM with char/varchar has been accidentally broken since 3.1, failing with a scala.MatchError; this PR fixes it ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46603 from yaooqinn/SPARK-48297. Authored-by: Kent Yao Signed-off-by: Kent Yao (cherry picked from commit 3bd845ea930a4709b7a2f0447b5f8af64c697239) Signed-off-by: Kent Yao --- .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 4 +++- .../resources/sql-tests/analyzer-results/transform.sql.out| 11 +++ sql/core/src/test/resources/sql-tests/inputs/transform.sql| 6 +- .../src/test/resources/sql-tests/results/transform.sql.out| 10 ++ 4 files changed, 29 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 5d68aed9245a..f38d41af445e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -787,7 +787,9 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging { // Create the attributes. val (attributes, schemaLess) = if (transformClause.colTypeList != null) { // Typed return columns. - (DataTypeUtils.toAttributes(createSchema(transformClause.colTypeList)), false) + val schema = createSchema(transformClause.colTypeList) + val replacedSchema = CharVarcharUtils.replaceCharVarcharWithStringInSchema(schema) + (DataTypeUtils.toAttributes(replacedSchema), false) } else if (transformClause.identifierSeq != null) { // Untyped return columns.
val attrs = visitIdentifierSeq(transformClause.identifierSeq).map { name => diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out index ceca433a1c91..aa595c551f79 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out @@ -1035,3 +1035,14 @@ ScriptTransformation cat, [a#x, b#x], ScriptInputOutputSchema(List(),List(),None +- Project [a#x, b#x] +- SubqueryAlias complex_trans +- LocalRelation [a#x, b#x] + + +-- !query +SELECT TRANSFORM (a, b) + USING 'cat' AS (a CHAR(10), b VARCHAR(10)) +FROM VALUES('apache', 'spark') t(a, b) +-- !query analysis +ScriptTransformation cat, [a#x, b#x], ScriptInputOutputSchema(List(),List(),None,None,List(),List(),None,None,false) ++- Project [a#x, b#x] + +- SubqueryAlias t + +- LocalRelation [a#x, b#x] diff --git a/sql/core/src/test/resources/sql-tests/inputs/transform.sql b/sql/core/src/test/resources/sql-tests/inputs/transform.sql index 922a1d817778..8570496d439e 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/transform.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/transform.sql @@ -415,4 +415,8 @@ FROM ( ORDER BY a ) map_output SELECT TRANSFORM(a, b) - USING 'cat' AS (a, b); \ No newline at end of file + USING 'cat' AS (a, b); + +SELECT TRANSFORM (a, b) + USING 'cat' AS (a CHAR(10), b VARCHAR(10)) +FROM VALUES('apache', 'spark') t(a, b); diff --git a/sql/core/src/test/resources/sql-tests/results/transform.sql.out b/sql/core/src/test/resources/sql-tests/results/transform.sql.out index ab726b93c07c..7975392fd014 100644 --- a/sql/core/src/test/resources/sql-tests/results/transform.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/transform.sql.out @@ -837,3 +837,13 @@ struct 3 3 3 3 3 3 + + +-- !query +SELECT TRANSFORM (a, b) + USING 'cat' AS (a CHAR(10), b VARCHAR(10)) +FROM VALUES('apache', 'spark') t(a, b) +-- !query schema +struct +-- !query output +apache spark
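For readers skimming the diff, a hedged sketch of the schema rewrite the new `AstBuilder` lines perform (the declared schema below mirrors the added test; `CharVarcharUtils` is an internal Catalyst utility, so this is illustration rather than public API):

```scala
import org.apache.spark.sql.catalyst.util.CharVarcharUtils
import org.apache.spark.sql.types.{CharType, StructField, StructType, VarcharType}

// The TRANSFORM output schema as parsed from `AS (a CHAR(10), b VARCHAR(10))`.
val declared = StructType(Seq(
  StructField("a", CharType(10)),
  StructField("b", VarcharType(10))))

// Replacing char/varchar with plain strings before attributes are created
// is what avoids the scala.MatchError this commit fixes.
val replaced = CharVarcharUtils.replaceCharVarcharWithStringInSchema(declared)
// replaced: struct<a:string,b:string>
```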
(spark) branch master updated (b53d78e94f6e -> 3bd845ea930a)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b53d78e94f6e [SPARK-48036][DOCS][FOLLOWUP] Update sql-ref-ansi-compliance.md add 3bd845ea930a [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 4 +++- .../resources/sql-tests/analyzer-results/transform.sql.out| 11 +++ sql/core/src/test/resources/sql-tests/inputs/transform.sql| 6 +- .../src/test/resources/sql-tests/results/transform.sql.out| 10 ++ 4 files changed, 29 insertions(+), 2 deletions(-)
(spark) branch master updated (0ba8ddc9ce5b -> b53d78e94f6e)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 0ba8ddc9ce5b [SPARK-48293][SS] Add test for when ForeachBatchUserFuncException wraps interrupted exception due to query stop add b53d78e94f6e [SPARK-48036][DOCS][FOLLOWUP] Update sql-ref-ansi-compliance.md No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
(spark) branch master updated: [SPARK-48264][BUILD] Upgrade `datasketches-java` to 6.0.0
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 97717363abae [SPARK-48264][BUILD] Upgrade `datasketches-java` to 6.0.0 97717363abae is described below commit 97717363abae0526f4a6f8c577f539da2d4ea314 Author: panbingkun AuthorDate: Thu May 16 14:14:36 2024 +0800 [SPARK-48264][BUILD] Upgrade `datasketches-java` to 6.0.0 ### What changes were proposed in this pull request? This PR aims to upgrade `datasketches-java` from `5.0.1` to `6.0.0` ### Why are the changes needed? The full release notes: - https://github.com/apache/datasketches-java/releases/tag/6.0.0 - https://github.com/apache/datasketches-java/releases/tag/5.0.2 (screenshot: https://github.com/apache/spark/assets/15246973/fff5905a-25e8-4e2f-9492-1b6099b2bd05) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46563 from panbingkun/SPARK-48264. Authored-by: panbingkun Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 1bd135e05b58..4b6f5dda585b 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -58,7 +58,7 @@ curator-recipes/5.6.0//curator-recipes-5.6.0.jar datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar -datasketches-java/5.0.1//datasketches-java-5.0.1.jar +datasketches-java/6.0.0//datasketches-java-6.0.0.jar datasketches-memory/2.2.0//datasketches-memory-2.2.0.jar derby/10.16.1.1//derby-10.16.1.1.jar derbyshared/10.16.1.1//derbyshared-10.16.1.1.jar diff --git a/pom.xml b/pom.xml index da9f878b33b8..611e82f343d8 100644 --- a/pom.xml +++ b/pom.xml @@ -213,7 +213,7 @@ 1.6.0 1.78 1.13.0 -5.0.1 +6.0.0 4.1.109.Final 2.0.65.Final 72.1
(spark) branch master updated: [SPARK-47607] Add documentation for Structured logging framework
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9130f78fb12e [SPARK-47607] Add documentation for Structured logging framework 9130f78fb12e is described below commit 9130f78fb12eed94f48e1fd9ccedb6fe651a4440 Author: Gengliang Wang AuthorDate: Thu May 16 14:13:13 2024 +0800 [SPARK-47607] Add documentation for Structured logging framework ### What changes were proposed in this pull request? Add documentation for Structured logging framework ### Why are the changes needed? Provide documentation for Spark developers ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Doc preview (screenshot: https://github.com/apache/spark/assets/1097932/d3c4fcdc-57e4-4af2-8b05-6b4f6731a8c0) ### Was this patch authored or co-authored using generative AI tooling? No Closes #46605 from gengliangwang/updateGuideline. Authored-by: Gengliang Wang Signed-off-by: Kent Yao --- .../main/scala/org/apache/spark/internal/README.md | 33 ++ 1 file changed, 33 insertions(+) diff --git a/common/utils/src/main/scala/org/apache/spark/internal/README.md b/common/utils/src/main/scala/org/apache/spark/internal/README.md index c0190b965834..28d279485187 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/README.md +++ b/common/utils/src/main/scala/org/apache/spark/internal/README.md @@ -1,5 +1,38 @@ # Guidelines for the Structured Logging Framework +## Scala Logging +Use the `org.apache.spark.internal.Logging` trait for logging in Scala code: +* **Logging Messages with Variables**: When logging a message with variables, wrap all the variables with `MDC`s and they will be automatically added to the Mapped Diagnostic Context (MDC). This allows for structured logging and better log analysis. +```scala +logInfo(log"Trying to recover app: ${MDC(LogKeys.APP_ID, app.id)}") +``` +* **Constant String Messages**: If you are logging a constant string message, use the log methods that accept a constant string. +```scala +logInfo("StateStore stopped") +``` + +## Java Logging +Use the `org.apache.spark.internal.SparkLoggerFactory` to get the logger instance in Java code: +* **Getting Logger Instance**: Instead of using `org.slf4j.LoggerFactory`, use `org.apache.spark.internal.SparkLoggerFactory` to ensure structured logging. +```java +import org.apache.spark.internal.SparkLogger; +import org.apache.spark.internal.SparkLoggerFactory; + +private static final SparkLogger logger = SparkLoggerFactory.getLogger(JavaUtils.class); +``` +* **Logging Messages with Variables**: When logging messages with variables, wrap all the variables with `MDC`s and they will be automatically added to the Mapped Diagnostic Context (MDC). +```java +import org.apache.spark.internal.LogKeys; +import org.apache.spark.internal.MDC; + +logger.error("Unable to delete file for partition {}", MDC.of(LogKeys.PARTITION_ID$.MODULE$, i)); +``` + +* **Constant String Messages**: For logging constant string messages, use the standard logging methods. +```java +logger.error("Failed to abort the writer after failing to write map output.", e); +``` + ## LogKey `LogKey`s serve as identifiers for mapped diagnostic contexts (MDC) within logs. Follow these guidelines when adding a new LogKey:
(spark) branch master updated (dec910ba3c36 -> 726f2c95d4dc)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from dec910ba3c36 [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` add 726f2c95d4dc [SPARK-48299][BUILD] Upgrade `scala-maven-plugin` to 4.9.1 No new revisions were added by this update. Summary of changes: pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-48219][CORE] StreamReader Charset fix with UTF8
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5e8322150a05 [SPARK-48219][CORE] StreamReader Charset fix with UTF8 5e8322150a05 is described below commit 5e8322150a050ad4d0c3962d62c9a2b3e9a937c1 Author: xuyu <11161...@vivo.com> AuthorDate: Thu May 16 12:11:44 2024 +0800 [SPARK-48219][CORE] StreamReader Charset fix with UTF8 ### What changes were proposed in this pull request? Fix a StreamReader that is created without an explicit UTF-8 charset. If the platform default charset cannot represent Chinese characters (for example Latin-1) and the configuration file contains Chinese characters, the file does not resolve correctly, so the reader needs to be created with UTF-8 explicitly. Other compute frameworks such as Calcite, Hive, and Hudi consistently create their StreamReaders with a UTF-8 charset. ### Why are the changes needed? Without the fix, strings may be decoded incorrectly. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? Not needed ### Was this patch authored or co-authored using generative AI tooling? No Closes #46509 from xuzifu666/SPARK-48219. Authored-by: xuyu <11161...@vivo.com> Signed-off-by: Kent Yao --- .../main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java index 410d010a79bd..4b55453ec7a8 100644 --- a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java +++ b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java @@ -22,6 +22,7 @@ import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.io.InputStreamReader; +import java.nio.charset.StandardCharsets; import java.util.HashSet; import java.util.List; import java.util.Map; @@ -171,7 +172,7 @@ public class HiveSessionImpl implements HiveSession { FileInputStream initStream = null; BufferedReader bufferedReader = null; initStream = new FileInputStream(fileName); - bufferedReader = new BufferedReader(new InputStreamReader(initStream)); + bufferedReader = new BufferedReader(new InputStreamReader(initStream, StandardCharsets.UTF_8)); return bufferedReader; }
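The same pattern in Scala, as a minimal sketch (the file name is a placeholder): readers built on a bare `InputStreamReader` inherit the platform default charset, so the charset should be passed explicitly.

```scala
import java.io.{BufferedReader, FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

// A reader built on the platform default charset silently mis-decodes
// files whose characters (e.g. Chinese) the default (e.g. Latin-1)
// cannot represent; passing UTF-8 explicitly avoids that.
def utf8Reader(fileName: String): BufferedReader =
  new BufferedReader(
    new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))
```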
(spark) branch master updated: [SPARK-48289][DOCKER][TEST] Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4ffaa2e89a8a [SPARK-48289][DOCKER][TEST] Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset 4ffaa2e89a8a is described below commit 4ffaa2e89a8a777a374b7f5b22166ef9bac8b99f Author: Kent Yao AuthorDate: Thu May 16 10:09:15 2024 +0800 [SPARK-48289][DOCKER][TEST] Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset ### What changes were proposed in this pull request? This pull request improves the Oracle JDBC tests by skipping the redundant SYSTEM password reset. ### Why are the changes needed? These changes are necessary to clean up the Oracle JDBC tests. This pull request effectively reverts the modifications introduced in [SPARK-46592](https://issues.apache.org/jira/browse/SPARK-46592) and [PR #44594](https://github.com/apache/spark/pull/44594), which attempted to work around the sporadic occurrence of ORA-65048 and ORA-04021 errors by setting the Oracle parameter DDL_LOCK_TIMEOUT. As discussed in [issue #35](https://github.com/gvenzl/oci-oracle-free/issues/35), setting DDL_LOCK_TIMEOUT did not resolve the issue. The root cause appears to be an Oracle bug or unwanted behavior related to the use of Pluggable Database (PDB) rather than the expected functionality of Oracle itself. Additionally, with [SPARK-48141](https://issues.apache.org/jira/browse/SPARK-48141), we have upgraded the Oracle version used in the tests to Oracle Free 23ai, version 23.4. This upgrade should help address some of the issues observed with the previous Oracle version. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? This patch was tested using the existing test suite, with a particular focus on Oracle JDBC tests. The following steps were executed: ``` export ENABLE_DOCKER_INTEGRATION_TESTS=1 ./build/sbt -Pdocker-integration-tests "docker-integration-tests/testOnly org.apache.spark.sql.jdbc.OracleIntegrationSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #46598 from LucaCanali/fixOracleIntegrationTests. 
Lead-authored-by: Kent Yao Co-authored-by: Luca Canali Signed-off-by: Kent Yao --- .../spark/sql/jdbc/OracleDatabaseOnDocker.scala| 31 -- .../spark/sql/jdbc/OracleIntegrationSuite.scala| 8 +++--- .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 10 +++ .../spark/sql/jdbc/v2/OracleNamespaceSuite.scala | 8 +++--- 4 files changed, 13 insertions(+), 44 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala index 88bb23f9c653..dd6bbf0af8a3 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala @@ -17,12 +17,7 @@ package org.apache.spark.sql.jdbc -import java.io.{File, PrintWriter} - -import com.github.dockerjava.api.model._ - import org.apache.spark.internal.Logging -import org.apache.spark.util.Utils class OracleDatabaseOnDocker extends DatabaseOnDocker with Logging { lazy override val imageName = @@ -38,30 +33,4 @@ class OracleDatabaseOnDocker extends DatabaseOnDocker with Logging { override def getJdbcUrl(ip: String, port: Int): String = { s"jdbc:oracle:thin:system/$oracle_password@//$ip:$port/freepdb1" } - - override def beforeContainerStart( - hostConfigBuilder: HostConfig, - containerConfigBuilder: ContainerConfig): Unit = { -try { - val dir = Utils.createTempDir() - val writer = new PrintWriter(new File(dir, "install.sql")) - // SPARK-46592: gvenzl/oracle-free occasionally fails to start with the following error: - // 'ORA-04021: timeout occurred while waiting to lock object', when initializing the - // SYSTEM user. This is due to the fact that the default DDL_LOCK_TIMEOUT is 0, which - // means that the lock will no wait. We set the timeout to 30 seconds to try again. - // TODO: This workaround should be removed once the issue is fixed in the image. - // https://github.com/gvenzl/oci-oracle-free/issues/35 - writer.write("ALTER SESSION SET DDL_LOCK_TIMEOUT = 30;\n") - writer.write(s"""ALTER USER SYSTEM IDENTIFIED BY "$oracle_password";""") - writer.close() - val newBind = new Bind( -dir.getAbsolutePath,
(spark) branch master updated: [SPARK-46759][AVRO][FOLLOWUP] Fix configuration name for spark.sql.avro.xz.level
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d0385c4a99c1 [SPARK-46759][AVRO][FOLLOWUP] Fix configuration name for spark.sql.avro.xz.level d0385c4a99c1 is described below commit d0385c4a99c172fa3e1ba2d72a65c8632b5c72a9 Author: Kent Yao AuthorDate: Wed May 15 16:48:55 2024 +0800 [SPARK-46759][AVRO][FOLLOWUP] Fix configuration name for spark.sql.avro.xz.level ### What changes were proposed in this pull request? `spark.sql.avro.xz.level` is wrongly defined as `spark.sql.avro.zx.level` ### Why are the changes needed? Bugfix ### Does this PR introduce _any_ user-facing change? no, it is not exposed via releases ### How was this patch tested? manually ### Was this patch authored or co-authored using generative AI tooling? no Closes #46590 from yaooqinn/SPARK-46759-F. Authored-by: Kent Yao Signed-off-by: Kent Yao --- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 9edef5a1f3ca..afae4ebb5395 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -3815,7 +3815,7 @@ object SQLConf { .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION) .createOptional - val AVRO_XZ_LEVEL = buildConf("spark.sql.avro.zx.level") + val AVRO_XZ_LEVEL = buildConf("spark.sql.avro.xz.level") .doc("Compression level for the xz codec used in writing of AVRO files. " + "Valid value must be in the range of from 1 to 9 inclusive " + "The default value is 6.")
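A hedged usage sketch with the corrected key (this assumes the standard `spark.sql.avro.compression.codec` conf selects the codec; the level and output path are arbitrary):

```scala
// Select the xz codec for Avro writes, then tune its level with the
// (now correctly named) key; valid levels are 1..9, default 6.
spark.conf.set("spark.sql.avro.compression.codec", "xz")
spark.conf.set("spark.sql.avro.xz.level", "9")

spark.range(10).write.format("avro").save("/tmp/avro-xz") // path is a placeholder
```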
(spark) branch master updated: [SPARK-48271][SQL] Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ad5fcae0b0ed [SPARK-48271][SQL] Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER ad5fcae0b0ed is described below commit ad5fcae0b0ed41f7e97ab419b32068e5adf71064 Author: Wenchen Fan AuthorDate: Wed May 15 16:45:20 2024 +0800 [SPARK-48271][SQL] Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER ### What changes were proposed in this pull request? Today we can't create `RowEncoder` with char/varchar data type, because we believe this can't happen. Spark will turn char/varchar into string type in leaf nodes. However, advanced users can even create custom logical plans and it's hard to guarantee no char/varchar data type in the entire query plan tree. UDF return type can also be char/varchar. This PR adds UNSUPPORTED_DATA_TYPE_FOR_ENCODER instead of throwing scala match error. ### Why are the changes needed? better error ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new test ### Was this patch authored or co-authored using generative AI tooling? no Closes #46586 from cloud-fan/error. Authored-by: Wenchen Fan Signed-off-by: Kent Yao --- .../src/main/resources/error/error-conditions.json | 6 ++ .../apache/spark/sql/catalyst/encoders/RowEncoder.scala | 12 +--- .../src/test/scala/org/apache/spark/sql/UDFSuite.scala | 17 + 3 files changed, 32 insertions(+), 3 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 730999085de9..75067a1920f7 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -4207,6 +4207,12 @@ ], "sqlState" : "0A000" }, + "UNSUPPORTED_DATA_TYPE_FOR_ENCODER" : { +"message" : [ + "Cannot create encoder for . Please use a different output data type for your UDF or DataFrame." +], +"sqlState" : "0A000" + }, "UNSUPPORTED_DEFAULT_VALUE" : { "message" : [ "DEFAULT column values is not supported." diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala index 16ac283eccb1..c507e952630f 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala @@ -20,9 +20,9 @@ package org.apache.spark.sql.catalyst.encoders import scala.collection.mutable import scala.reflect.classTag -import org.apache.spark.sql.Row +import org.apache.spark.sql.{AnalysisException, Row} import org.apache.spark.sql.catalyst.encoders.AgnosticEncoders.{BinaryEncoder, BoxedBooleanEncoder, BoxedByteEncoder, BoxedDoubleEncoder, BoxedFloatEncoder, BoxedIntEncoder, BoxedLongEncoder, BoxedShortEncoder, CalendarIntervalEncoder, DateEncoder, DayTimeIntervalEncoder, EncoderField, InstantEncoder, IterableEncoder, JavaDecimalEncoder, LocalDateEncoder, LocalDateTimeEncoder, MapEncoder, NullEncoder, RowEncoder => AgnosticRowEncoder, StringEncoder, TimestampEncoder, UDTEncoder, VariantE [...] 
-import org.apache.spark.sql.errors.ExecutionErrors +import org.apache.spark.sql.errors.{DataTypeErrorsBase, ExecutionErrors} import org.apache.spark.sql.internal.SqlApiConf import org.apache.spark.sql.types._ import org.apache.spark.util.ArrayImplicits._ @@ -59,7 +59,7 @@ import org.apache.spark.util.ArrayImplicits._ * StructType -> org.apache.spark.sql.Row * }}} */ -object RowEncoder { +object RowEncoder extends DataTypeErrorsBase { def encoderFor(schema: StructType): AgnosticEncoder[Row] = { encoderFor(schema, lenient = false) } @@ -124,5 +124,11 @@ object RowEncoder { field.nullable, field.metadata) }.toImmutableArraySeq) + +case _ => + throw new AnalysisException( +errorClass = "UNSUPPORTED_DATA_TYPE_FOR_ENCODER", +messageParameters = Map("dataType" -> toSQLType(dataType)) +) } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala index fe47d6c68555..32ad5a94984b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala @@ -1194,4 +1194,21 @@ class UDFSuite extends QueryTest with SharedSparkSession { .sele
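A minimal sketch of the new behavior (the schema is made up; `RowEncoder` is an internal Catalyst API, so this is illustration rather than recommended usage):

```scala
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{CharType, StructField, StructType}

// char/varchar should normally be replaced with string before an encoder
// is requested; when they leak through (custom plans, UDF return types),
// this now fails with UNSUPPORTED_DATA_TYPE_FOR_ENCODER instead of a
// bare scala.MatchError.
val schema = StructType(Seq(StructField("c", CharType(10))))
RowEncoder.encoderFor(schema) // throws AnalysisException
```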
(spark) branch master updated (5e87e9fbd6e6 -> da78949eee04)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5e87e9fbd6e6 [SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage add da78949eee04 [SPARK-48269][DOCS][TESTS] DB2: Document Mapping Spark SQL Data Types from DB2 and add tests No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/DB2IntegrationSuite.scala | 37 + docs/sql-data-sources-jdbc.md | 149 + 2 files changed, 186 insertions(+)
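For context, a hedged sketch of the kind of JDBC read the new mapping documentation and tests cover (URL, credentials, and table name are all placeholders):

```scala
// Read a DB2 table over JDBC; the new documentation spells out how DB2
// types such as SMALLINT, DECIMAL, and XML map onto Spark SQL types.
val db2 = spark.read
  .format("jdbc")
  .option("url", "jdbc:db2://db2host:50000/testdb") // placeholder
  .option("dbtable", "numbers")                     // placeholder
  .option("user", "db2inst1")                       // placeholder
  .option("password", "...")                        // placeholder
  .load()
db2.printSchema()
```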
(spark) branch branch-3.4 updated: Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects"
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 7a0c72ff7724 Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" 7a0c72ff7724 is described below commit 7a0c72ff7724b2ee40843e5bd4f83833bfa56052 Author: Kent Yao AuthorDate: Wed May 15 10:10:03 2024 +0800 Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" This reverts commit a848e2790cba0b7ee77d391dc534146bd35ee50a. --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 - .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 - .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 - .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 - .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 - .../sql/connector/util/V2ExpressionSQLBuilder.java | 3 + .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 + .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 -- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 14 insertions(+), 291 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 11ddce68aecd..1a25cd2802dd 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -67,12 +67,6 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index a42caeafe6fe..72edfc9f1bf1 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,17 +38,6 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() - -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote\\'_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index 6658b5ed6c77..a527c6f8cb5b 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala @@ -66,12 +66,6 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JD connection.prepareStatement( "CREATE TABLE employee (dept INT, name VARCHAR(32), salary NUMERIC(20, 2), bonus FLOAT)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def notSupportsTableComment: Boolean = true diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v
(spark) branch branch-3.5 updated: Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects"
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 74724d61c3d0 Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" 74724d61c3d0 is described below commit 74724d61c3d04925da6faa5d49643619aa14f206 Author: Kent Yao AuthorDate: Wed May 15 10:09:09 2024 +0800 Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" This reverts commit f37fa436cd4e0ef9f486a60f9af91a3ce0195df9. --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 - .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 - .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 - .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 - .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 - .../sql/connector/util/V2ExpressionSQLBuilder.java | 3 + .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 + .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 -- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 14 insertions(+), 291 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 9b4916ddd36b..9a78244f5326 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -80,12 +80,6 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index a42caeafe6fe..72edfc9f1bf1 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,17 +38,6 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() - -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote\\'_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index 57a2667557fa..0dc3a39f4db5 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala @@ -86,12 +86,6 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JD connection.prepareStatement( "CREATE TABLE employee (dept INT, name VARCHAR(32), salary NUMERIC(20, 2), bonus FLOAT)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def notSupportsTableComment: Boolean = true diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v
(spark) branch master updated: Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects"
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4ff5ca81cffb Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" 4ff5ca81cffb is described below commit 4ff5ca81cffbd1940c864144ca8fbba54b605e4e Author: Kent Yao AuthorDate: Wed May 15 10:05:31 2024 +0800 Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" This reverts commit 47006a493f98ca85196194d16d58b5847177b1a3. --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 - .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 - .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 - .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 - .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 - .../sql/connector/util/V2ExpressionSQLBuilder.java | 1 + .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 + .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 -- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 12 insertions(+), 291 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 36795747319d..3642094d11b2 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -62,12 +62,6 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index a42caeafe6fe..72edfc9f1bf1 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,17 +38,6 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() - -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote\\'_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index 46530fe5419a..b1b8aec5ad33 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala @@ -70,12 +70,6 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JD connection.prepareStatement( "CREATE TABLE employee (dept INT, name VARCHAR(32), salary NUMERIC(20, 2), bonus FLOAT)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def notSupportsTableComment: Boolean = true diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQ
(spark) branch master updated: [SPARK-48211][SQL] DB2: Read SMALLINT as ShortType
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 207d675110e6 [SPARK-48211][SQL] DB2: Read SMALLINT as ShortType 207d675110e6 is described below commit 207d675110e6fa699a434e81296f6f050eb0304b Author: Kent Yao AuthorDate: Thu May 9 17:27:04 2024 +0800 [SPARK-48211][SQL] DB2: Read SMALLINT as ShortType ### What changes were proposed in this pull request? This PR supports reading SMALLINT from DB2 as ShortType ### Why are the changes needed? - 15 bits is sufficient - we write ShortType as SMALLINT - we read smallint from other built-in JDBC sources as ShortType ### Does this PR introduce _any_ user-facing change? yes, we add a migration guide for this ### How was this patch tested? changed tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46497 from yaooqinn/SPARK-48211. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../spark/sql/jdbc/DB2IntegrationSuite.scala | 69 +- docs/sql-migration-guide.md| 1 + .../org/apache/spark/sql/internal/SQLConf.scala| 11 .../org/apache/spark/sql/jdbc/DB2Dialect.scala | 3 + 4 files changed, 56 insertions(+), 28 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala index cedb33d491fb..aca174cce194 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala @@ -25,6 +25,7 @@ import org.scalatest.time.SpanSugar._ import org.apache.spark.sql.{Row, SaveMode} import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._ +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, StructType} import org.apache.spark.tags.DockerTest @@ -77,32 +78,44 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationSuite { } test("Numeric types") { -val df = sqlContext.read.jdbc(jdbcUrl, "numbers", new Properties) -val rows = df.collect() -assert(rows.length == 1) -val types = rows(0).toSeq.map(x => x.getClass.toString) -assert(types.length == 10) -assert(types(0).equals("class java.lang.Integer")) -assert(types(1).equals("class java.lang.Integer")) -assert(types(2).equals("class java.lang.Long")) -assert(types(3).equals("class java.math.BigDecimal")) -assert(types(4).equals("class java.lang.Double")) -assert(types(5).equals("class java.lang.Double")) -assert(types(6).equals("class java.lang.Float")) -assert(types(7).equals("class java.math.BigDecimal")) -assert(types(8).equals("class java.math.BigDecimal")) -assert(types(9).equals("class java.math.BigDecimal")) -assert(rows(0).getInt(0) == 17) -assert(rows(0).getInt(1) == 7) -assert(rows(0).getLong(2) == 922337203685477580L) -val bd = new BigDecimal("123456745.567890123450") -assert(rows(0).getAs[BigDecimal](3).equals(bd)) -assert(rows(0).getDouble(4) == 42.75) -assert(rows(0).getDouble(5) == 5.4E-70) -assert(rows(0).getFloat(6) == 3.4028234663852886e+38) -assert(rows(0).getDecimal(7) == new BigDecimal("4.299900")) -assert(rows(0).getDecimal(8) == new BigDecimal(".00")) -assert(rows(0).getDecimal(9) == new BigDecimal("1234567891234567.123456789123456789")) +Seq(true, false).foreach { legacy => 
withSQLConf(SQLConf.LEGACY_DB2_TIMESTAMP_MAPPING_ENABLED.key -> legacy.toString) { +val df = sqlContext.read.jdbc(jdbcUrl, "numbers", new Properties) +val rows = df.collect() +assert(rows.length == 1) +val types = rows(0).toSeq.map(x => x.getClass.toString) +assert(types.length == 10) +if (legacy) { + assert(types(0).equals("class java.lang.Integer")) +} else { + assert(types(0).equals("class java.lang.Short")) +} +assert(types(1).equals("class java.lang.Integer")) +assert(types(2).equals("class java.lang.Long")) +assert(types(3).equals("class java.math.BigDecimal")) +assert(types(4).equals("class java.lang.Double")) +assert(types(5).equals("class java.lang.Double
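A hedged sketch of the user-visible effect (connection details are placeholders; the legacy flag quoted in the truncated test above is left out here):

```scala
import org.apache.spark.sql.types.ShortType

// With this change, a DB2 SMALLINT column arrives as ShortType; it was
// previously widened to IntegerType.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:db2://db2host:50000/testdb") // placeholder
  .option("dbtable", "small_tbl")                   // placeholder
  .load()
assert(df.schema.head.dataType == ShortType)
```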
(spark) branch master updated: [SPARK-48188][SQL] Consistently use normalized plan for cache
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8950add773e6 [SPARK-48188][SQL] Consistently use normalized plan for cache 8950add773e6 is described below commit 8950add773e63a910900f796950a6a58e40a8577 Author: Wenchen Fan AuthorDate: Wed May 8 20:11:24 2024 +0800 [SPARK-48188][SQL] Consistently use normalized plan for cache ### What changes were proposed in this pull request? We must consistently use normalized plans for cache filling and lookup, or inconsistency will lead to cache misses. To guarantee this, this PR makes `CacheManager` the central place to do plan normalization, so that callers don't need to care about it. Now most APIs in `CacheManager` take either `Dataset` or `LogicalPlan`. For `Dataset`, we get the normalized plan directly. For `LogicalPlan`, we normalize it before further use. The caller side should pass `Dataset` when invoking `CacheManager`, if it already creates `Dataset`. This is to reduce the impact, as extra creation of `Dataset` may have perf issues or introduce unexpected analysis exception. ### Why are the changes needed? Avoid unnecessary cache misses for users who add custom normalization rules ### Does this PR introduce _any_ user-facing change? No, perf only ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46465 from cloud-fan/cache. Authored-by: Wenchen Fan Signed-off-by: Kent Yao --- .../main/scala/org/apache/spark/sql/Dataset.scala | 3 +- .../apache/spark/sql/execution/CacheManager.scala | 160 + .../spark/sql/execution/QueryExecution.scala | 37 +++-- .../execution/command/AnalyzeColumnCommand.scala | 4 +- .../spark/sql/execution/command/CommandUtils.scala | 2 +- .../execution/datasources/v2/CacheTableExec.scala | 30 ++-- .../datasources/v2/DataSourceV2Strategy.scala | 2 +- .../apache/spark/sql/internal/CatalogImpl.scala| 5 +- .../org/apache/spark/sql/CachedTableSuite.scala| 2 +- .../org/apache/spark/sql/test/SQLTestUtils.scala | 3 +- .../apache/spark/sql/hive/CachedTableSuite.scala | 9 +- 11 files changed, 150 insertions(+), 107 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index 18c9704afdf8..3e843e64ebbf 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -3904,8 +3904,7 @@ class Dataset[T] private[sql]( * @since 1.6.0 */ def unpersist(blocking: Boolean): this.type = { -sparkSession.sharedState.cacheManager.uncacheQuery( - sparkSession, logicalPlan, cascade = false, blocking) +sparkSession.sharedState.cacheManager.uncacheQuery(this, cascade = false, blocking) this } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala index ae99873a9f77..b96f257e6b5b 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.{Dataset, SparkSession} import org.apache.spark.sql.catalyst.catalog.HiveTableRelation import org.apache.spark.sql.catalyst.expressions.{Attribute, SubqueryExpression} import 
org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint -import org.apache.spark.sql.catalyst.plans.logical.{IgnoreCachedData, LogicalPlan, ResolvedHint, SubqueryAlias, View} +import org.apache.spark.sql.catalyst.plans.logical.{IgnoreCachedData, LogicalPlan, ResolvedHint, View} import org.apache.spark.sql.catalyst.trees.TreePattern.PLAN_EXPRESSION import org.apache.spark.sql.catalyst.util.sideBySide import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper @@ -38,7 +38,10 @@ import org.apache.spark.storage.StorageLevel import org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK /** Holds a cached logical plan and its data */ -case class CachedData(plan: LogicalPlan, cachedRepresentation: InMemoryRelation) { +case class CachedData( +// A normalized resolved plan (See QueryExecution#normalized). +plan: LogicalPlan, +cachedRepresentation: InMemoryRelation) { override def toString: String = s""" |CachedData( @@ -53,7 +56,9 @@ case class CachedData(plan: LogicalPlan, cachedRepresentation: InMemoryRelation) * InMemoryRelation. This relation is automatically substituted query plans that return the * `sameResult` as the original
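A hedged illustration of why consistent normalization matters for cache hits (the plan shapes are hypothetical; only public API is used):

```scala
// Two formulations of the same query. The cache is keyed on (normalized)
// logical plans, so the second query can reuse the InMemoryRelation built
// by the first only if fill and lookup normalize consistently -- which is
// exactly what routing normalization through CacheManager guarantees.
val cached = spark.range(100).selectExpr("id + 1 AS x")
cached.cache()
cached.count() // materializes the cached relation

spark.range(100).selectExpr("id + 1 AS x").count() // expected cache hit
```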
(spark) branch master updated (d7f69e7003a3 -> 003823b39d35)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d7f69e7003a3 [SPARK-48190][PYTHON][PS][TESTS] Introduce a helper function to drop metadata add 003823b39d35 [SPARK-48191][SQL] Support UTF-32 for string encode and decode No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md| 2 +- .../spark/sql/catalyst/expressions/stringExpressions.scala | 10 +- .../sql/catalyst/expressions/StringExpressionsSuite.scala | 2 ++ .../sql-tests/analyzer-results/ansi/string-functions.sql.out | 7 +++ .../sql-tests/analyzer-results/string-functions.sql.out| 7 +++ .../src/test/resources/sql-tests/inputs/string-functions.sql | 1 + .../resources/sql-tests/results/ansi/string-functions.sql.out | 8 .../test/resources/sql-tests/results/string-functions.sql.out | 8 8 files changed, 39 insertions(+), 6 deletions(-)
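A small usage sketch of the newly supported charset (any Scala shell on a build containing this commit):

```scala
// UTF-32 now joins the charsets accepted by encode/decode; the round
// trip returns the original string.
spark.sql("SELECT decode(encode('Spark', 'UTF-32'), 'UTF-32') AS s").show()
```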
(spark) branch master updated (f5401bab23c0 -> fe8b18b776f5)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f5401bab23c0 [MINOR][INFRA] Rename builds to have consistent names add fe8b18b776f5 [SPARK-48185][SQL] Fix 'symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo' No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-47914][SQL] Do not display the splits parameter in Range
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5f883117203d [SPARK-47914][SQL] Do not display the splits parameter in Range 5f883117203d is described below commit 5f883117203d823cb9914f483e314633845ecaa5 Author: guihuawen AuthorDate: Wed May 8 12:04:35 2024 +0800 [SPARK-47914][SQL] Do not display the splits parameter in Range ### What changes were proposed in this pull request? [SQL] explain extended select * from range(0, 4); Before this PR, splits was displayed in the logical plans as None even when it was not set: == Parsed Logical Plan == 'Project [*] +- 'UnresolvedTableValuedFunction [range], [0, 4] == Analyzed Logical Plan == id: bigint Project [id#11L] +- Range (0, 4, step=1, splits=None) == Optimized Logical Plan == Range (0, 4, step=1, splits=None) == Physical Plan == *(1) Range (0, 4, step=1, splits=1) After this PR, splits is no longer displayed in the logical plans when it is not set, and it is still displayed when it is set: == Parsed Logical Plan == 'Project [*] +- 'UnresolvedTableValuedFunction [range], [0, 4] == Analyzed Logical Plan == id: bigint Project [id#11L] +- Range (0, 4, step=1) == Optimized Logical Plan == Range (0, 4, step=1) == Physical Plan == *(1) Range (0, 4, step=1, splits=1) ### Why are the changes needed? If splits is not set, displaying it in the logical plans as None is not user-friendly. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? Closes #46136 from guixiaowen/SPARK-47914. 
Authored-by: guihuawen Signed-off-by: Kent Yao --- .../plans/logical/basicLogicalOperators.scala | 3 +- .../sql-tests/analyzer-results/group-by.sql.out| 8 ++-- .../analyzer-results/identifier-clause.sql.out | 2 +- .../analyzer-results/join-lateral.sql.out | 2 +- .../sql-tests/analyzer-results/limit.sql.out | 2 +- .../named-function-arguments.sql.out | 4 +- .../analyzer-results/non-excludable-rule.sql.out | 12 +++--- .../postgreSQL/aggregates_part1.sql.out| 20 - .../analyzer-results/postgreSQL/int8.sql.out | 4 +- .../analyzer-results/postgreSQL/join.sql.out | 6 +-- .../analyzer-results/postgreSQL/numeric.sql.out| 10 ++--- .../analyzer-results/postgreSQL/text.sql.out | 2 +- .../analyzer-results/postgreSQL/union.sql.out | 50 +++--- .../postgreSQL/window_part1.sql.out| 4 +- .../postgreSQL/window_part2.sql.out| 20 - .../postgreSQL/window_part3.sql.out| 8 ++-- .../sql-compatibility-functions.sql.out| 2 +- .../analyzer-results/sql-session-variables.sql.out | 2 +- .../scalar-subquery-predicate.sql.out | 4 +- .../scalar-subquery/scalar-subquery-select.sql.out | 4 +- .../table-valued-functions.sql.out | 14 +++--- .../typeCoercion/native/concat.sql.out | 18 .../typeCoercion/native/elt.sql.out| 8 ++-- .../udf/postgreSQL/udf-aggregates_part1.sql.out| 20 - .../udf/postgreSQL/udf-join.sql.out| 2 +- .../analyzer-results/udf/udf-group-by.sql.out | 4 +- .../results/named-function-arguments.sql.out | 2 +- .../scala/org/apache/spark/sql/ExplainSuite.scala | 6 +-- 28 files changed, 122 insertions(+), 121 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index 4fd640afe3b2..9242a06cf1d6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -1071,7 +1071,8 @@ case class Range( override def newInstance(): Range = copy(output = output.map(_.newInstance())) override def simpleString(maxFields: Int): String = { -s"Range ($start, $end, step=$step, splits=$numSlices)" +val splits = if (numSlices.isDefined) { s", sp
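The diff above is cut off mid-expression; below is a hypothetical reconstruction of the new rendering logic, matching the described before/after behavior (not the verbatim patch):

```scala
// numSlices is an Option[Int]; the ", splits=N" suffix is emitted only
// when the caller actually set it.
override def simpleString(maxFields: Int): String = {
  val splits = numSlices.map(n => s", splits=$n").getOrElse("")
  s"Range ($start, $end, step=$step$splits)"
}
```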
(spark) branch master updated: [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3f15ad40640c [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer 3f15ad40640c is described below commit 3f15ad40640ce71764d1d00b8fae7d88df5e2194 Author: Stefan Bukorovic AuthorDate: Mon Apr 29 19:42:16 2024 +0800 [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer ### What changes were proposed in this pull request? In this PR I propose a change in QueryBuilder for SQLServer. It modifies the push down of predicates that filter on a column generated by a CASE WHEN construct, appending a simple ` = 1` comparison to the generated query so that it works on SQLServer. ### Why are the changes needed? SQLServer does not accept 0 or 1 as boolean values. In certain situations the Spark optimizer rewrites filters that contain CASE WHEN columns in a way that uses 1 or 0 as boolean values, which fails on the SQLServer side with the error "An expression of non-boolean type specified in a context where a condition is expected". With these changes the filters are pushed down differently and the error no longer occurs. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A new test case is added, which fails without these changes. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46231 from stefanbuk-db/SQLServer_case_when_bugfix. Authored-by: Stefan Bukorovic Signed-off-by: Kent Yao --- .../spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 13 + .../spark/sql/connector/util/V2ExpressionSQLBuilder.java| 2 +- .../org/apache/spark/sql/jdbc/MsSqlServerDialect.scala | 1 + 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index f5f5d00d6bda..65f7579de820 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala @@ -131,4 +131,17 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JD "WHERE (dept > 1 AND ((name LIKE 'am%') = (name LIKE '%y')))") assert(df3.collect().length == 3) } + + test("SPARK-47994: SQLServer does not support 1 or 0 as boolean type in CASE WHEN filter") { +val df = sql( + s""" +|WITH tbl AS ( +|SELECT CASE +|WHEN e.dept = 1 THEN 'first' WHEN e.dept = 2 THEN 'second' ELSE 'third' END +|AS deptString FROM $catalogName.employee as e) +|SELECT * FROM tbl +|WHERE deptString = 'first' +|""".stripMargin) +assert(df.collect().length == 2) + } } diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java index 61d68d4a3e88..e42d9193ea39 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java @@ -356,7 +356,7 @@ public class V2ExpressionSQLBuilder { return joiner.toString(); } - private String[] 
expressionsToStringArray(Expression[] expressions) { + protected String[] expressionsToStringArray(Expression[] expressions) { String[] result = new String[expressions.length]; for (int i = 0; i < expressions.length; i++) { result[i] = build(expressions[i]); diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala index 5535545efba8..e341bf3720f4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala @@ -92,6 +92,7 @@ private case class MsSqlServerDialect() extends JdbcDialect { case o => inputToSQL(o) } visitBinaryComparison(e.name(), l, r) + case "CASE_WHEN" => visitCaseWhen(expressionsToStringArray(e.children())) + " = 1" case _ => super.build(expr) } case _ => super.build(expr)
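For illustration, a minimal sketch of the rewrite the dialect now performs; the SQL strings are hand-written rather than captured from Spark's query builder, and `employee`/`dept` come from the test above:

```scala
// Hand-written illustration (not actual dialect output) of why the dialect appends
// " = 1": SQL Server rejects a boolean-valued CASE WHEN standing alone as a WHERE
// condition, but accepts it once it participates in a comparison.
object CaseWhenFilterSketch {
  def main(args: Array[String]): Unit = {
    // Rejected with "An expression of non-boolean type specified in a context
    // where a condition is expected".
    val rejected = "SELECT * FROM employee WHERE CASE WHEN dept = 1 THEN 1 ELSE 0 END"
    // Accepted: the CASE WHEN is now the left side of a boolean comparison.
    val accepted = "SELECT * FROM employee WHERE CASE WHEN dept = 1 THEN 1 ELSE 0 END = 1"
    Seq(rejected, accepted).foreach(println)
  }
}
```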
(spark) branch branch-3.4 updated: [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new e2f34c75a6ea [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark e2f34c75a6ea is described below commit e2f34c75a6ea686eb6fa4260584bc32b558ce01f Author: Kent Yao AuthorDate: Mon Apr 29 11:40:39 2024 +0800 [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark ### What changes were proposed in this pull request? This PR fixes an NPE in MapStatusesSerDeserBenchmark. The cause is that we try to stop the tracker twice. ``` 3197java.lang.NullPointerException: Cannot invoke "org.apache.spark.rpc.RpcEndpointRef.askSync(Object, scala.reflect.ClassTag)" because the return value of "org.apache.spark.MapOutputTracker.trackerEndpoint()" is null 3198at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:541) 3199at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:551) 3200at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:1242) 3201at org.apache.spark.SparkEnv.stop(SparkEnv.scala:112) 3202at org.apache.spark.SparkContext.$anonfun$stop$25(SparkContext.scala:2354) 3203at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1294) 3204at org.apache.spark.SparkContext.stop(SparkContext.scala:2354) 3205at org.apache.spark.SparkContext.stop(SparkContext.scala:2259) 3206at org.apache.spark.MapStatusesSerDeserBenchmark$.afterAll(MapStatusesSerDeserBenchmark.scala:128) 3207at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:80) 3208at org.apache.spark.MapStatusesSerDeserBenchmark.main(MapStatusesSerDeserBenchmark.scala) 3209at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 3210at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 3211at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 3212at java.base/java.lang.reflect.Method.invoke(Method.java:568) 3213at org.apache.spark.benchmark.Benchmarks$.$anonfun$main$7(Benchmarks.scala:128) 3214at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323) 3215at org.apache.spark.benchmark.Benchmarks$.main(Benchmarks.scala:91) 3216at org.apache.spark.benchmark.Benchmarks.main(Benchmarks.scala) ``` ### Why are the changes needed? test bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually ### Was this patch authored or co-authored using generative AI tooling? no Closes #46270 from yaooqinn/SPARK-48034. 
Authored-by: Kent Yao Signed-off-by: Kent Yao (cherry picked from commit 59d5946cfd377e9203ccf572deb34f87fab7510c) Signed-off-by: Kent Yao --- core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala index 797b650799ea..795da65079d6 100644 --- a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala +++ b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala @@ -123,7 +123,6 @@ object MapStatusesSerDeserBenchmark extends BenchmarkBase { } override def afterAll(): Unit = { -tracker.stop() if (sc != null) { sc.stop() }
(spark) branch branch-3.5 updated: [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 616c2162242f [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark 616c2162242f is described below commit 616c2162242f99a3217caa0b7e4344e2979a5e54 Author: Kent Yao AuthorDate: Mon Apr 29 11:40:39 2024 +0800 [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark ### What changes were proposed in this pull request? This PR fixes an NPE in MapStatusesSerDeserBenchmark. The cause is that we try to stop the tracker twice. ``` 3197java.lang.NullPointerException: Cannot invoke "org.apache.spark.rpc.RpcEndpointRef.askSync(Object, scala.reflect.ClassTag)" because the return value of "org.apache.spark.MapOutputTracker.trackerEndpoint()" is null 3198at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:541) 3199at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:551) 3200at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:1242) 3201at org.apache.spark.SparkEnv.stop(SparkEnv.scala:112) 3202at org.apache.spark.SparkContext.$anonfun$stop$25(SparkContext.scala:2354) 3203at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1294) 3204at org.apache.spark.SparkContext.stop(SparkContext.scala:2354) 3205at org.apache.spark.SparkContext.stop(SparkContext.scala:2259) 3206at org.apache.spark.MapStatusesSerDeserBenchmark$.afterAll(MapStatusesSerDeserBenchmark.scala:128) 3207at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:80) 3208at org.apache.spark.MapStatusesSerDeserBenchmark.main(MapStatusesSerDeserBenchmark.scala) 3209at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 3210at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 3211at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 3212at java.base/java.lang.reflect.Method.invoke(Method.java:568) 3213at org.apache.spark.benchmark.Benchmarks$.$anonfun$main$7(Benchmarks.scala:128) 3214at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323) 3215at org.apache.spark.benchmark.Benchmarks$.main(Benchmarks.scala:91) 3216at org.apache.spark.benchmark.Benchmarks.main(Benchmarks.scala) ``` ### Why are the changes needed? test bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually ### Was this patch authored or co-authored using generative AI tooling? no Closes #46270 from yaooqinn/SPARK-48034. 
Authored-by: Kent Yao Signed-off-by: Kent Yao (cherry picked from commit 59d5946cfd377e9203ccf572deb34f87fab7510c) Signed-off-by: Kent Yao --- core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala index 797b650799ea..795da65079d6 100644 --- a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala +++ b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala @@ -123,7 +123,6 @@ object MapStatusesSerDeserBenchmark extends BenchmarkBase { } override def afterAll(): Unit = { -tracker.stop() if (sc != null) { sc.stop() }
(spark) branch master updated: [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 59d5946cfd37 [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark 59d5946cfd37 is described below commit 59d5946cfd377e9203ccf572deb34f87fab7510c Author: Kent Yao AuthorDate: Mon Apr 29 11:40:39 2024 +0800 [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark ### What changes were proposed in this pull request? This PR fixes an NPE in MapStatusesSerDeserBenchmark. The cause is that we try to stop the tracker twice. ``` 3197java.lang.NullPointerException: Cannot invoke "org.apache.spark.rpc.RpcEndpointRef.askSync(Object, scala.reflect.ClassTag)" because the return value of "org.apache.spark.MapOutputTracker.trackerEndpoint()" is null 3198at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:541) 3199at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:551) 3200at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:1242) 3201at org.apache.spark.SparkEnv.stop(SparkEnv.scala:112) 3202at org.apache.spark.SparkContext.$anonfun$stop$25(SparkContext.scala:2354) 3203at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1294) 3204at org.apache.spark.SparkContext.stop(SparkContext.scala:2354) 3205at org.apache.spark.SparkContext.stop(SparkContext.scala:2259) 3206at org.apache.spark.MapStatusesSerDeserBenchmark$.afterAll(MapStatusesSerDeserBenchmark.scala:128) 3207at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:80) 3208at org.apache.spark.MapStatusesSerDeserBenchmark.main(MapStatusesSerDeserBenchmark.scala) 3209at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 3210at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 3211at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 3212at java.base/java.lang.reflect.Method.invoke(Method.java:568) 3213at org.apache.spark.benchmark.Benchmarks$.$anonfun$main$7(Benchmarks.scala:128) 3214at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323) 3215at org.apache.spark.benchmark.Benchmarks$.main(Benchmarks.scala:91) 3216at org.apache.spark.benchmark.Benchmarks.main(Benchmarks.scala) ``` ### Why are the changes needed? test bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually ### Was this patch authored or co-authored using generative AI tooling? no Closes #46270 from yaooqinn/SPARK-48034. Authored-by: Kent Yao Signed-off-by: Kent Yao --- core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala index ca85ffda4e60..75f952d063d3 100644 --- a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala +++ b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala @@ -123,7 +123,6 @@ object MapStatusesSerDeserBenchmark extends BenchmarkBase { } override def afterAll(): Unit = { -tracker.stop() if (sc != null) { sc.stop() }
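The shape of the bug, reduced to a standalone sketch; the names below are illustrative stand-ins, not Spark internals, but the sequence mirrors `afterAll`, where the explicit `tracker.stop()` was followed by `sc.stop()` stopping the tracker again:

```scala
// A component whose stop() clears its endpoint; a second stop() then dereferences
// null, which is the NullPointerException pattern in the trace above.
class TrackerSketch {
  private var endpoint: AnyRef = new Object
  def stop(): Unit = {
    endpoint.hashCode()  // NPEs on a second call, like askSync on a null trackerEndpoint
    endpoint = null
  }
}

object DoubleStopSketch {
  def main(args: Array[String]): Unit = {
    val tracker = new TrackerSketch
    tracker.stop()    // the explicit call the patch removes
    // sc.stop() would reach the same stop() again via SparkEnv.stop();
    // the fix leaves shutdown to SparkContext alone.
    // tracker.stop() // uncommenting this reproduces the failure
  }
}
```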
(spark) branch master updated: [SPARK-48025][SQL][TESTS] Fix org.apache.spark.sql.execution.benchmark.DateTimeBenchmark
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0bf39459e435 [SPARK-48025][SQL][TESTS] Fix org.apache.spark.sql.execution.benchmark.DateTimeBenchmark 0bf39459e435 is described below commit 0bf39459e4354c0881c1329ec550c357726a1761 Author: Kent Yao AuthorDate: Sun Apr 28 17:53:43 2024 +0800 [SPARK-48025][SQL][TESTS] Fix org.apache.spark.sql.execution.benchmark.DateTimeBenchmark ### What changes were proposed in this pull request? This PR fixes several issues in org.apache.spark.sql.execution.benchmark.DateTimeBenchmark - Misuse of the `trunc` function: its parameters were passed in reverse order - Some benchmarks were not compatible with ANSI mode, which is on by default ### Why are the changes needed? Restore the benchmark cases ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? benchmark ### Was this patch authored or co-authored using generative AI tooling? no Closes #46261 from yaooqinn/SPARK-48025. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../benchmarks/DateTimeBenchmark-jdk21-results.txt | 372 ++--- sql/core/benchmarks/DateTimeBenchmark-results.txt | 372 ++--- .../execution/benchmark/DateTimeBenchmark.scala | 14 +- 3 files changed, 382 insertions(+), 376 deletions(-) diff --git a/sql/core/benchmarks/DateTimeBenchmark-jdk21-results.txt b/sql/core/benchmarks/DateTimeBenchmark-jdk21-results.txt index 143f433a3160..4b2d34ba4915 100644 --- a/sql/core/benchmarks/DateTimeBenchmark-jdk21-results.txt +++ b/sql/core/benchmarks/DateTimeBenchmark-jdk21-results.txt @@ -2,460 +2,460 @@ datetime +/- interval -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor datetime +/- interval: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -date + interval(m) 850 887 33 11.8 85.0 1.0X -date + interval(m, d) 863 864 2 11.6 86.3 1.0X -date + interval(m, d, ms) 3507 3511 5 2.9 350.7 0.2X -date - interval(m) 841 851 9 11.9 84.1 1.0X -date - interval(m, d) 864 870 5 11.6 86.4 1.0X -date - interval(m, d, ms) 3518 3519 2 2.8 351.8 0.2X -timestamp + interval(m) 1756 1759 5 5.7 175.6 0.5X -timestamp + interval(m, d) 1802 1805 4 5.5 180.2 0.5X -timestamp + interval(m, d, ms) 1958 1961 4 5.1 195.8 0.4X -timestamp - interval(m) 1744 1745 2 5.7 174.4 0.5X -timestamp - interval(m, d) 1796 1799 4 5.6 179.6 0.5X -timestamp - interval(m, d, ms) 1944 1947 5 5.1 194.4 0.4X +date + interval(m) 1149 1158 12 8.7 114.9 1.0X +date + interval(m, d) 1136 1137 1 8.8 113.6 1.0X +date + interval(m, d, ms) 3779 3799 29 2.6 377.9 0.3X +date - interval(m) 1113 1116 4 9.0 111.3 1.0X +date - interval(m, d) 1124 1141 25 8.9 112.4 1.0X +date - interval(m, d, ms) 3795 3796 1 2.6 379.5 0.3X +timestamp + interval(m) 1528 1530 3 6.5 152.8 0.8X +timestamp + interval(m, d) 1581 1585 6 6.3 158.1 0.7X +timestamp + interval(m, d, ms) 2037 2044 10 4.9 203.7
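For reference, the corrected call shape as a hedged sketch (a local SparkSession and an arbitrary date literal are assumed): in Spark SQL, `trunc` takes the date first and the format second, so the reversed order degrades silently instead of measuring anything useful.

```scala
// A minimal sketch of the argument order that the benchmark had reversed.
import org.apache.spark.sql.SparkSession

object TruncArgOrderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("trunc-order").getOrCreate()
    // Correct order: trunc(date, fmt) truncates to the start of the month -> 2024-04-01.
    spark.sql("SELECT trunc(DATE'2024-04-28', 'MONTH')").show()
    // Reversed order -- trunc('MONTH', DATE'2024-04-28') -- tries to treat 'MONTH' as
    // the date, yielding NULL (or an error under ANSI mode) and skewing the benchmark.
    spark.stop()
  }
}
```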
(spark) branch master updated: [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2'
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 356830ada6c6 [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2' 356830ada6c6 is described below commit 356830ada6c6ebbf54e7852c37266c32bfa137ea Author: Ruifeng Zheng AuthorDate: Sat Apr 27 22:57:37 2024 +0800 [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2' ### What changes were proposed in this pull request? 1. Pin 'pandas==2.2.2' for `pypy3.9` 2. Also change `pandas<=2.2.2` to 'pandas==2.2.2' to avoid unexpected version installation (e.g., for pypy3.8, `pandas<=2.2.2` actually installs version 2.0.3) ### Why are the changes needed? PyPy had been upgraded ### Does this PR introduce _any_ user-facing change? no, test only ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46256 from zhengruifeng/pip_pandas_version. Authored-by: Ruifeng Zheng Signed-off-by: Kent Yao --- dev/infra/Dockerfile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 870fb694045c..cdaa2f8b7c09 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -86,10 +86,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.9 && \ ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3.8 && \ ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3 -RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.0.3' scipy coverage matplotlib lxml +RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.2.2' scipy coverage matplotlib lxml -ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.2.2 scipy plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2" +ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas==2.2.2 scipy plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2" # Python deps for Spark Connect ARG CONNECT_PIP_PKGS="grpcio==1.62.0 grpcio-status==1.62.0 protobuf==4.25.1 googleapis-common-protos==1.56.4"
(spark) branch master updated: [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new beda1a4615c7 [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer beda1a4615c7 is described below commit beda1a4615c7f33110e360c150cb78832b0fe420 Author: Kent Yao AuthorDate: Fri Apr 26 23:51:58 2024 +0800 [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer ### What changes were proposed in this pull request? In https://github.com/apache/spark/pull/45564, predicate pushdown with boolean comparison syntax in MsSqlServer was disabled because MsSqlServer does not support such a feature. In this PR, we reenable the feature by converting the boolean comparison to an equivalent comparison over 1 and 0. ### Why are the changes needed? Avoid performance regressions ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing test ``` [info] MsSqlServerIntegrationSuite: [info] - SPARK-47440: SQLServer does not support boolean expression in binary comparison (2 seconds, 206 milliseconds) ``` ### Was this patch authored or co-authored using generative AI tooling? no Closes #46236 from yaooqinn/SPARK-47440. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java | 2 +- .../scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala | 9 ++--- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java index fd1b8f5dd1ee..61d68d4a3e88 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java @@ -212,7 +212,7 @@ public class V2ExpressionSQLBuilder { return l + " LIKE '%" + escapeSpecialCharsForLikePattern(value) + "%' ESCAPE '\\'"; } - private String inputToSQL(Expression input) { + protected String inputToSQL(Expression input) { if (input.children().length > 1) { return "(" + build(input) + ")"; } else { diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala index a1492d81bf53..5535545efba8 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala @@ -86,9 +86,12 @@ private case class MsSqlServerDialect() extends JdbcDialect { // We shouldn't propagate these queries to MsSqlServer expr match { case e: Predicate => e.name() match { - case "=" | "<>" | "<=>" | "<" | "<=" | ">" | ">=" - if e.children().exists(_.isInstanceOf[Predicate]) => -super.visitUnexpectedExpr(expr) + case "=" | "<>" | "<=>" | "<" | "<=" | ">" | ">=" => +val Array(l, r) = e.children().map { + case p: Predicate => s"CASE WHEN ${inputToSQL(p)} THEN 1 ELSE 0 END" + case o => inputToSQL(o) +} +visitBinaryComparison(e.name(), l, r) case _ => super.build(expr) } case _ => super.build(expr)
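A hand-written illustration of the conversion; the strings are not captured dialect output, and the `name` column comes from the integration tests earlier in this digest:

```scala
// Spark may push down a comparison whose operands are themselves predicates.
// SQL Server cannot compare booleans directly, so each predicate operand is
// wrapped in CASE WHEN ... THEN 1 ELSE 0 END before the comparison is emitted.
object BooleanComparisonSketch {
  def main(args: Array[String]): Unit = {
    val sparkPredicate = "(name LIKE 'am%') = (name LIKE '%y')"
    val sqlServerForm =
      "CASE WHEN name LIKE 'am%' THEN 1 ELSE 0 END = " +
        "CASE WHEN name LIKE '%y' THEN 1 ELSE 0 END"
    println(s"pushed by Spark:    $sparkPredicate")
    println(s"sent to SQL Server: $sqlServerForm")
  }
}
```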
(spark) branch master updated: [SPARK-47968][SQL] MsSQLServer: Map datetimeoffset to TimestampType
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 733e53a4ff03 [SPARK-47968][SQL] MsSQLServer: Map datetimeoffset to TimestampType 733e53a4ff03 is described below commit 733e53a4ff035b71a4865e1a88271af067d4765d Author: Kent Yao AuthorDate: Fri Apr 26 23:42:20 2024 +0800 [SPARK-47968][SQL] MsSQLServer: Map datetimeoffset to TimestampType ### What changes were proposed in this pull request? This PR changes the `datetimeoffset -> StringType` mapping to `datetimeoffset -> TimestampType` mapping as we use `mssql-jdbc` for Microsoft SQL Server. `spark.sql.legacy.mssqlserver.datetimeoffsetMapping.enabled` is provided for users to restore the old behavior. ### Why are the changes needed? With the official SQL Server client, it's more reasonable to read it as TimestampType, which is also much more consistent with other JDBC data sources ### Does this PR introduce _any_ user-facing change? Yes, (please refer to the first section) ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46239 from yaooqinn/SPARK-47968. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 59 +- docs/sql-data-sources-jdbc.md | 2 +- docs/sql-migration-guide.md | 1 + .../org/apache/spark/sql/internal/SQLConf.scala | 12 + .../apache/spark/sql/jdbc/MsSqlServerDialect.scala | 7 ++- 5 files changed, 55 insertions(+), 26 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala index a39dcb60406e..623f404339e9 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala @@ -223,29 +223,42 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite { test("Date types") { withDefaultTimeZone(UTC) { - { -val df = spark.read - .option("preferTimestampNTZ", "false") - .jdbc(jdbcUrl, "dates", new Properties) -checkAnswer(df, Row( - Date.valueOf("1991-11-09"), - Timestamp.valueOf("1999-01-01 13:23:35"), - Timestamp.valueOf("9999-12-31 23:59:59"), - "1901-05-09 23:59:59.000 +14:00", - Timestamp.valueOf("1996-01-01 23:24:00"), - Timestamp.valueOf("1970-01-01 13:31:24"))) - } - { -val df = spark.read - .option("preferTimestampNTZ", "true") - .jdbc(jdbcUrl, "dates", new Properties) -checkAnswer(df, Row( - Date.valueOf("1991-11-09"), - LocalDateTime.of(1999, 1, 1, 13, 23, 35), - LocalDateTime.of(9999, 12, 31, 23, 59, 59), - "1901-05-09 23:59:59.000 +14:00", - LocalDateTime.of(1996, 1, 1, 23, 24, 0), - LocalDateTime.of(1970, 1, 1, 13, 31, 24))) + Seq(true, false).foreach { ntz => +Seq(true, false).foreach { legacy => + withSQLConf( +SQLConf.LEGACY_MSSQLSERVER_DATETIMEOFFSET_MAPPING_ENABLED.key -> legacy.toString) { +val df = spark.read + .option("preferTimestampNTZ", ntz) + .jdbc(jdbcUrl, "dates", new Properties) +checkAnswer(df, Row( + Date.valueOf("1991-11-09"), + if (ntz) { +LocalDateTime.of(1999, 1, 1, 13, 23, 35) + } else { +Timestamp.valueOf("1999-01-01 13:23:35") + }, + if (ntz) { +LocalDateTime.of(9999, 12, 31, 23, 59, 59) + } else { +Timestamp.valueOf("9999-12-31 23:59:59") + }, + 
if (legacy) { +"1901-05-09 23:59:59.000 +14:00" + } else { +Timestamp.valueOf("1901-05-09 09:59:59") + }, + if (ntz) { +LocalDateTime.of(1996, 1, 1, 23, 24, 0) + } else { +Timestamp.valueOf("1996-01-01 23:24:00") + }, + if (ntz) { +LocalDateTime.of(1970, 1, 1, 13, 31, 24) + } else { +Timestamp.valueOf(&qu
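A short usage sketch of the two mappings; the JDBC URL is a placeholder, the `dates` table comes from the test above, and only the config key is taken from the patch itself:

```scala
// Reading a SQL Server table containing a datetimeoffset column. With the new
// default it arrives as TimestampType; flipping the legacy flag restores the
// old StringType mapping that preserves the original offset text.
import java.util.Properties
import org.apache.spark.sql.SparkSession

object DateTimeOffsetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("dto").getOrCreate()
    val url = "jdbc:sqlserver://host:1433;databaseName=db"  // placeholder URL
    spark.read.jdbc(url, "dates", new Properties).printSchema()  // timestamp column
    spark.conf.set("spark.sql.legacy.mssqlserver.datetimeoffsetMapping.enabled", "true")
    spark.read.jdbc(url, "dates", new Properties).printSchema()  // string column
    spark.stop()
  }
}
```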
(spark) branch master updated: [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b8b6d17ad8e4 [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write b8b6d17ad8e4 is described below commit b8b6d17ad8e472307fb4c03ca388efcc4ac7059e Author: ulysses-you AuthorDate: Fri Apr 26 18:32:18 2024 +0800 [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write ### What changes were proposed in this pull request? This PR adds a new trait `WriteFilesExecBase` for v1 writes, so that downstream projects can inherit `WriteFilesExecBase` rather than `WriteFilesExec`. The reason is that inheriting a `case class` is bad practice in the Scala world. ### Why are the changes needed? Make downstream projects easier to develop. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Pass CI ### Was this patch authored or co-authored using generative AI tooling? no Closes #46240 from ulysses-you/WriteFilesExecBase. Authored-by: ulysses-you Signed-off-by: Kent Yao --- .../spark/sql/execution/datasources/V1Writes.scala | 4 ++-- .../spark/sql/execution/datasources/WriteFiles.scala | 16 +--- .../apache/spark/sql/SparkSessionExtensionSuite.scala | 13 ++--- .../sql/execution/datasources/V1WriteCommandSuite.scala | 8 4 files changed, 21 insertions(+), 20 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala index d7a8d7aec0b7..1d6c2a6f8112 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala @@ -213,9 +213,9 @@ object V1WritesUtils { } } - def getWriteFilesOpt(child: SparkPlan): Option[WriteFilesExec] = { + def getWriteFilesOpt(child: SparkPlan): Option[WriteFilesExecBase] = { child.collectFirst { - case w: WriteFilesExec => w + case w: WriteFilesExecBase => w } } } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala index a4fd57e7dffa..c6c34b7fcea3 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala @@ -58,6 +58,14 @@ case class WriteFiles( copy(child = newChild) } +trait WriteFilesExecBase extends UnaryExecNode { + override def output: Seq[Attribute] = Seq.empty + + override protected def doExecute(): RDD[InternalRow] = { +throw SparkException.internalError(s"$nodeName does not support doExecute") + } +} + /** * Responsible for writing files.
*/ @@ -67,9 +75,7 @@ case class WriteFilesExec( partitionColumns: Seq[Attribute], bucketSpec: Option[BucketSpec], options: Map[String, String], -staticPartitions: TablePartitionSpec) extends UnaryExecNode { - override def output: Seq[Attribute] = Seq.empty - +staticPartitions: TablePartitionSpec) extends WriteFilesExecBase { override protected def doExecuteWrite( writeFilesSpec: WriteFilesSpec): RDD[WriterCommitMessage] = { val rdd = child.execute() @@ -105,10 +111,6 @@ case class WriteFilesExec( } } - override protected def doExecute(): RDD[InternalRow] = { -throw SparkException.internalError(s"$nodeName does not support doExecute") - } - override protected def stringArgs: Iterator[Any] = Iterator(child) override protected def withNewChildInternal(newChild: SparkPlan): WriteFilesExec = diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala index 1c44d0c3b4ea..4d38e360f438 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala @@ -40,7 +40,7 @@ import org.apache.spark.sql.connector.write.WriterCommitMessage import org.apache.spark.sql.execution._ import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanExec, AdaptiveSparkPlanHelper, AQEShuffleReadExec, QueryStageExec, ShuffleQueryStageExec} import org.apache.spark.sql.execution.aggregate.HashAggregateExec -import org.apache.spark.sql.execution.datasources.{FileFormat, WriteFilesExec, WriteFilesSpec} +import org.apache.spark.sql.execution.datasources.{FileFormat, WriteFilesExec, WriteFilesExecBase, WriteFilesSpec} import org.apache.sp
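To make the extension point concrete, a hedged sketch from a downstream project's perspective; `MyWriteFilesExec` and its body are invented for illustration, not taken from Spark:

```scala
// A custom v1-write physical node mixes in the new trait instead of subclassing
// the WriteFilesExec case class; output and the doExecute guard are inherited.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.connector.write.WriterCommitMessage
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.{WriteFilesExecBase, WriteFilesSpec}

case class MyWriteFilesExec(child: SparkPlan) extends WriteFilesExecBase {
  override protected def doExecuteWrite(spec: WriteFilesSpec): RDD[WriterCommitMessage] = {
    // a real implementation would drive its own writer here
    child.executeWrite(spec)
  }
  override protected def withNewChildInternal(newChild: SparkPlan): MyWriteFilesExec =
    copy(child = newChild)
}
```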
(spark) branch master updated: [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a7150074f0a5 [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` a7150074f0a5 is described below commit a7150074f0a5983374b7dcd7ea8710cdcd050cd6 Author: yangjie01 AuthorDate: Fri Apr 26 15:32:58 2024 +0800 [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` ### What changes were proposed in this pull request? The private implicit function `arrayToArrayWritable` was introduced alongside other implicit functions such as `rddToPairRDDExtras`, `rddToSequencePairRDDExtras`, `intToIntWritable`, etc., in the commit https://github.com/apache/spark/commit/38f2ba99cc32584f0645f3875f051516b4b738d2. Apart from `arrayToArrayWritable`, the other implicit functions were deprecated in SPARK-4397 and SPARK-4795, and subsequently removed in SPARK-12615. `arrayToArrayWritable` remained as a leftover private [...] ### Why are the changes needed? Clean up useless code. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46238 from LuciferYang/remove-arrayToArrayWritable. Authored-by: yangjie01 Signed-off-by: Kent Yao --- core/src/main/scala/org/apache/spark/SparkContext.scala | 11 +-- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index 58472a1b3a44..5e231544e249 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -28,14 +28,13 @@ import scala.collection.concurrent.{Map => ScalaConcurrentMap} import scala.collection.immutable import scala.collection.mutable.HashMap import scala.jdk.CollectionConverters._ -import scala.language.implicitConversions import scala.reflect.{classTag, ClassTag} import scala.util.control.NonFatal import com.google.common.collect.MapMaker import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{FileSystem, Path} -import org.apache.hadoop.io.{ArrayWritable, BooleanWritable, BytesWritable, DoubleWritable, FloatWritable, IntWritable, LongWritable, NullWritable, Text, Writable} +import org.apache.hadoop.io.{BooleanWritable, BytesWritable, DoubleWritable, FloatWritable, IntWritable, LongWritable, NullWritable, Text, Writable} import org.apache.hadoop.mapred.{FileInputFormat, InputFormat, JobConf, SequenceFileInputFormat, TextInputFormat} import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat, Job => NewHadoopJob} import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat => NewFileInputFormat} @@ -3046,14 +3045,6 @@ object SparkContext extends Logging { } } - private implicit def arrayToArrayWritable[T <: Writable : ClassTag](arr: Iterable[T]) -: ArrayWritable = { -def anyToWritable[U <: Writable](u: U): Writable = u - -new ArrayWritable(classTag[T].runtimeClass.asInstanceOf[Class[Writable]], -arr.map(x => anyToWritable(x)).toArray) - } - /** * Find the JAR from which a given class was loaded, to make it easy for users to pass * their JARs to SparkContext.
(spark) branch master updated: [SPARK-47982][BUILD] Update some code style plugins to their latest versions
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a84cffd8b3da [SPARK-47982][BUILD] Update some code style plugins to their latest versions a84cffd8b3da is described below commit a84cffd8b3dac777350a78896794ca726e91b080 Author: panbingkun AuthorDate: Thu Apr 25 16:12:14 2024 +0800 [SPARK-47982][BUILD] Update some code style plugins to their latest versions ### What changes were proposed in this pull request? The PR aims to update some code style plugins to their latest versions, including: - `mvn-scalafmt_2.13` from `1.1.1684076452.9f83818` to `1.1.1713302731.c3d0074`. - `checkstyle` from `10.14.0` to `10.15.0`. - `scalafmt` from `3.8.0` to `3.8.1`. ### Why are the changes needed? 1.`mvn-scalafmt_2.13` https://github.com/SimonJPegg/mvn_scalafmt/releases/tag/v2.13-1.1.1713302731.c3d0074 Minor version bumps and dropping 2.12 2.`checkstyle` https://checkstyle.org/releasenotes.html#Release_10.15.0 https://checkstyle.org/releasenotes.html#Release_10.14.2 https://checkstyle.org/releasenotes.html#Release_10.14.1 3.`scalafmt` https://github.com/scalameta/scalafmt/releases/tag/v3.8.1 https://github.com/apache/spark/assets/15246973/657fb99a-2267-4d4f-84bb-1d2006818fdd ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46216 from panbingkun/SPARK-47982. Authored-by: panbingkun Signed-off-by: Kent Yao --- dev/.scalafmt.conf | 2 +- pom.xml | 4 ++-- project/plugins.sbt | 6 +++--- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/dev/.scalafmt.conf b/dev/.scalafmt.conf index 9a01136dfaf8..43be4717c9ab 100644 --- a/dev/.scalafmt.conf +++ b/dev/.scalafmt.conf @@ -27,4 +27,4 @@ danglingParentheses.preset = false docstrings.style = Asterisk maxColumn = 98 runner.dialect = scala213 -version = 3.8.0 +version = 3.8.1 diff --git a/pom.xml b/pom.xml index 338b4050e0f8..c98514efa356 100644 --- a/pom.xml +++ b/pom.xml @@ -3548,7 +3548,7 @@ --> com.puppycrawl.tools checkstyle -10.14.0 +10.15.0 @@ -3610,7 +3610,7 @@ org.antipathy mvn-scalafmt_${scala.binary.version} -1.1.1684076452.9f83818 +1.1.1713302731.c3d0074 ${scalafmt.validateOnly} ${scalafmt.skip} diff --git a/project/plugins.sbt b/project/plugins.sbt index 8f422ca07cbb..44b357d95eb9 100644 --- a/project/plugins.sbt +++ b/project/plugins.sbt @@ -20,10 +20,10 @@ addSbtPlugin("software.purpledragon" % "sbt-checkstyle-plugin" % "4.0.1") // sbt-checkstyle-plugin uses an old version of checkstyle. Match it to Maven's. // If you are changing the dependency setting for checkstyle plugin, // please check pom.xml in the root of the source tree too. -libraryDependencies += "com.puppycrawl.tools" % "checkstyle" % "10.14.0" +libraryDependencies += "com.puppycrawl.tools" % "checkstyle" % "10.15.0" // checkstyle uses guava 31.0.1-jre. -libraryDependencies += "com.google.guava" % "guava" % "31.0.1-jre" +// checkstyle uses guava 33.1.0-jre. +libraryDependencies += "com.google.guava" % "guava" % "33.1.0-jre" addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.2.0")
(spark) branch master updated: [SPARK-47984][ML][SQL] Change `MetricsAggregate/V2Aggregator#serialize/deserialize` to call `SparkSerDeUtils#serialize/deserialize`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5f730c84abd7 [SPARK-47984][ML][SQL] Change `MetricsAggregate/V2Aggregator#serialize/deserialize` to call `SparkSerDeUtils#serialize/deserialize` 5f730c84abd7 is described below commit 5f730c84abd789360157585ba623537c23b08f78 Author: yangjie01 AuthorDate: Thu Apr 25 16:05:41 2024 +0800 [SPARK-47984][ML][SQL] Change `MetricsAggregate/V2Aggregator#serialize/deserialize` to call `SparkSerDeUtils#serialize/deserialize` ### What changes were proposed in this pull request? The utility methods `serialize` and `deserialize` exist in `SparkSerDeUtils`: https://github.com/apache/spark/blob/08caa567fb29e762f3f7f9d94cd42c02f1e47247/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala#L23-L36 This PR changes the implementation of `serialize/deserialize` methods in `MetricsAggregate` and `V2Aggregator` to call the `serialize/deserialize` methods in `SparkSerDeUtils` to eliminate duplicate code. ### Why are the changes needed? Eliminate duplicate code. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46218 from LuciferYang/utils-sede. Authored-by: yangjie01 Signed-off-by: Kent Yao --- .../src/main/scala/org/apache/spark/ml/stat/Summarizer.scala | 10 +++--- .../sql/catalyst/expressions/aggregate/V2Aggregator.scala| 12 +++- 2 files changed, 6 insertions(+), 16 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala b/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala index 4697bfbe4b09..7a27b32aa24c 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala @@ -31,6 +31,7 @@ import org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggreg import org.apache.spark.sql.catalyst.trees.BinaryLike import org.apache.spark.sql.functions.lit import org.apache.spark.sql.types._ +import org.apache.spark.util.Utils /** * A builder object that provides summary statistics about a given column. 
@@ -397,17 +398,12 @@ private[spark] object SummaryBuilderImpl extends Logging { override def serialize(state: SummarizerBuffer): Array[Byte] = { // TODO: Use ByteBuffer to optimize - val bos = new ByteArrayOutputStream() - val oos = new ObjectOutputStream(bos) - oos.writeObject(state) - bos.toByteArray + Utils.serialize(state) } override def deserialize(bytes: Array[Byte]): SummarizerBuffer = { // TODO: Use ByteBuffer to optimize - val bis = new ByteArrayInputStream(bytes) - val ois = new ObjectInputStream(bis) - ois.readObject().asInstanceOf[SummarizerBuffer] + Utils.deserialize(bytes) } override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): MetricsAggregate = { diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/V2Aggregator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/V2Aggregator.scala index bb94421bc7d4..49ba2ec8b904 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/V2Aggregator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/V2Aggregator.scala @@ -17,13 +17,12 @@ package org.apache.spark.sql.catalyst.expressions.aggregate -import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream} - import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions.{Expression, ImplicitCastInputTypes, UnsafeProjection} import org.apache.spark.sql.connector.catalog.functions.{AggregateFunction => V2AggregateFunction} import org.apache.spark.sql.types.{AbstractDataType, DataType} import org.apache.spark.util.ArrayImplicits._ +import org.apache.spark.util.Utils case class V2Aggregator[BUF <: java.io.Serializable, OUT]( aggrFunc: V2AggregateFunction[BUF, OUT], @@ -50,16 +49,11 @@ case class V2Aggregator[BUF <: java.io.Serializable, OUT]( } override def serialize(buffer: BUF): Array[Byte] = { -val bos = new ByteArrayOutputStream() -val out = new ObjectOutputStream(bos) -out.writeObject(buffer) -out.close() -bos.toByteArray +Utils.serialize(buffer) } override def deserialize(bytes: Array[Byte]): BUF = { -val in = new ObjectInputStream(new ByteArrayInputStream(bytes)) -in.readObject().asInstanceOf[BUF] +Utils.deserialize(bytes) }
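A small round-trip with the shared helpers the patch switches to, assuming `spark-core` on the classpath; the payload is arbitrary:

```scala
// Both MetricsAggregate and V2Aggregator now delegate to these helpers
// (exposed via org.apache.spark.util.Utils), which wrap plain Java serialization.
import org.apache.spark.util.Utils

object SerDeRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val original = Map("a" -> 1, "b" -> 2)
    val bytes: Array[Byte] = Utils.serialize(original)         // Java serialization
    val restored = Utils.deserialize[Map[String, Int]](bytes)  // symmetric read-back
    assert(restored == original)
  }
}
```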
(spark) branch master updated: [SPARK-47981][BUILD] Upgrade `Arrow` to 16.0.0
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7090bc1f43fd [SPARK-47981][BUILD] Upgrade `Arrow` to 16.0.0 7090bc1f43fd is described below commit 7090bc1f43fd5ae5e67214f84de0276ee6e5df79 Author: sychen AuthorDate: Thu Apr 25 16:02:01 2024 +0800 [SPARK-47981][BUILD] Upgrade `Arrow` to 16.0.0 ### What changes were proposed in this pull request? The pr aims to upgrade `Arrow` from `15.0.2` to `16.0.0`. ### Why are the changes needed? https://arrow.apache.org/release/16.0.0.html SPARK-46718 and SPARK-47531 upgraded the arrow version from 14 to 15, and 15 introduced the `eclipse-collections` dependency. https://github.com/apache/arrow/issues/40896 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? GA ### Was this patch authored or co-authored using generative AI tooling? No Closes #46214 from cxzl25/SPARK-47981. Authored-by: sychen Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 12 ++-- pom.xml | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index c1adff73d339..f6adb6d18b85 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -16,10 +16,11 @@ antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar aopalliance-repackaged/3.0.3//aopalliance-repackaged-3.0.3.jar arpack/3.0.3//arpack-3.0.3.jar arpack_combined_all/0.1//arpack_combined_all-0.1.jar -arrow-format/15.0.2//arrow-format-15.0.2.jar -arrow-memory-core/15.0.2//arrow-memory-core-15.0.2.jar -arrow-memory-netty/15.0.2//arrow-memory-netty-15.0.2.jar -arrow-vector/15.0.2//arrow-vector-15.0.2.jar +arrow-format/16.0.0//arrow-format-16.0.0.jar +arrow-memory-core/16.0.0//arrow-memory-core-16.0.0.jar +arrow-memory-netty-buffer-patch/16.0.0//arrow-memory-netty-buffer-patch-16.0.0.jar +arrow-memory-netty/16.0.0//arrow-memory-netty-16.0.0.jar +arrow-vector/16.0.0//arrow-vector-16.0.0.jar audience-annotations/0.12.0//audience-annotations-0.12.0.jar avro-ipc/1.11.3//avro-ipc-1.11.3.jar avro-mapred/1.11.3//avro-mapred-1.11.3.jar @@ -33,6 +34,7 @@ breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar bundle/2.24.6//bundle-2.24.6.jar cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar +checker-qual/3.42.0//checker-qual-3.42.0.jar chill-java/0.10.0//chill-java-0.10.0.jar chill_2.13/0.10.0//chill_2.13-0.10.0.jar commons-cli/1.6.0//commons-cli-1.6.0.jar @@ -62,8 +64,6 @@ derby/10.16.1.1//derby-10.16.1.1.jar derbyshared/10.16.1.1//derbyshared-10.16.1.1.jar derbytools/10.16.1.1//derbytools-10.16.1.1.jar dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar -eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar -eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar esdk-obs-java/3.20.4.2//esdk-obs-java-3.20.4.2.jar flatbuffers-java/23.5.26//flatbuffers-java-23.5.26.jar gcs-connector/hadoop3-2.2.21/shaded/gcs-connector-hadoop3-2.2.21-shaded.jar diff --git a/pom.xml b/pom.xml index 03c6b757ab02..338b4050e0f8 100644 --- a/pom.xml +++ b/pom.xml @@ -228,7 +228,7 @@ ./python/pyspark/sql/pandas/utils.py, ./python/packaging/classic/setup.py and ./python/packaging/connect/setup.py too. 
--> -15.0.2 +16.0.0 3.0.0-M1
(spark) branch master updated: [SPARK-47983][SQL] Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to internal
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a066d0c17853 [SPARK-47983][SQL] Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to internal a066d0c17853 is described below commit a066d0c178535489f461abebae9f84abbdc04891 Author: Kent Yao AuthorDate: Thu Apr 25 15:59:03 2024 +0800 [SPARK-47983][SQL] Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to internal ### What changes were proposed in this pull request? Make `spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled` internal, like other legacy configurations ### Why are the changes needed? Legacy configurations are not exposed to end users, except for this one. ### Does this PR introduce _any_ user-facing change? doc change only ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46217 from yaooqinn/SPARK-47983. Authored-by: Kent Yao Signed-off-by: Kent Yao --- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 1 + 1 file changed, 1 insertion(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 974810133859..f49e48dd3fa0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -4535,6 +4535,7 @@ object SQLConf { val LEGACY_INFER_ARRAY_TYPE_FROM_FIRST_ELEMENT = buildConf("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled") + .internal() .doc("PySpark's SparkSession.createDataFrame infers the element type of an array from all " + "values in the array by default. If this config is set to true, it restores the legacy " + "behavior of only inferring the type from the first array element.")
(spark) branch master updated: [SPARK-47980][SQL][TESTS] Reactivate test 'Empty float/double array columns raise EOFException'
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 08caa567fb29 [SPARK-47980][SQL][TESTS] Reactivate test 'Empty float/double array columns raise EOFException' 08caa567fb29 is described below commit 08caa567fb29e762f3f7f9d94cd42c02f1e47247 Author: Kent Yao AuthorDate: Thu Apr 25 14:16:29 2024 +0800 [SPARK-47980][SQL][TESTS] Reactivate test 'Empty float/double array columns raise EOFException' ### What changes were proposed in this pull request? [SPARK-29462](https://issues.apache.org/jira/browse/SPARK-29462) has been resolved, so let's re-enable this test ### Why are the changes needed? test coverage ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? this test ### Was this patch authored or co-authored using generative AI tooling? no Closes #46213 from yaooqinn/SPARK-47980. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala index 610fc246cd84..284717739a81 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala @@ -207,10 +207,7 @@ class HiveOrcQuerySuite extends OrcQueryTest with TestHiveSingleton { } } - // SPARK-28885 String value is not allowed to be stored as numeric type with - // ANSI store assignment policy. - // TODO: re-enable the test case when SPARK-29462 is fixed. - ignore("SPARK-23340 Empty float/double array columns raise EOFException") { + test("SPARK-23340 Empty float/double array columns raise EOFException") { withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "false") { withTable("spark_23340") { sql("CREATE TABLE spark_23340(a array<float>, b array<double>) STORED AS ORC")
(spark) branch master updated (0fcced63be99 -> d23389252a7d)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 0fcced63be99 [SPARK-47979][SQL][TESTS] Use Hive tables explicitly for Hive table capability tests add d23389252a7d [SPARK-47967][SQL] Make `JdbcUtils.makeGetter` handle reading time type as NTZ correctly No new revisions were added by this update. Summary of changes: .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 43 +- .../sql/execution/datasources/jdbc/JdbcUtils.scala | 8 +++- 2 files changed, 32 insertions(+), 19 deletions(-)
(spark) branch master updated (d1298e73a8d5 -> e74221e6525e)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d1298e73a8d5 [SPARK-47931][SQL] Remove unused and leaked threadlocal/session sessionHive add e74221e6525e [SPARK-47945][SQL] MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests No new revisions were added by this update. Summary of changes: .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 37 docs/sql-data-sources-jdbc.md | 189 + 2 files changed, 226 insertions(+)
(spark) branch master updated (885e98ecbe64 -> d1298e73a8d5)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 885e98ecbe64 [SPARK-47412][SQL] Add Collation Support for LPad/RPad add d1298e73a8d5 [SPARK-47931][SQL] Remove unused and leaked threadlocal/session sessionHive No new revisions were added by this update. Summary of changes: .../service/cli/session/HiveSessionImplwithUGI.java | 21 - 1 file changed, 21 deletions(-)
(spark) branch master updated: [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2bf43460b923 [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException 2bf43460b923 is described below commit 2bf43460b923c95fb8debc7f0421d9a9e10531b0 Author: Cheng Pan AuthorDate: Fri Apr 19 15:57:17 2024 +0800 [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException ### What changes were proposed in this pull request? SPARK-29089 parallelized `checkAndGlobPathIfNecessary` by leveraging ForkJoinPool, but it also introduced a side effect: if something goes wrong, the reported error message loses the caller-side stack trace. For example, I met the following error on a Spark job; without the caller stack trace, I have no idea what happened. ``` 2024-04-12 14:31:21 CST ApplicationMaster INFO - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://xyz-cluster/user/abc/hive_db/tmp.db/tmp_lskkh_1 at org.apache.spark.sql.errors.QueryCompilationErrors$.dataPathNotExistError(QueryCompilationErrors.scala:1011) at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:785) at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:782) at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372) at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) at scala.util.Success.$anonfun$map$1(Try.scala:255) at scala.util.Success.map(Try.scala:213) at scala.concurrent.Future.$anonfun$map$1(Future.scala:292) at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) ) ``` ### Why are the changes needed? Improve error message. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? A new UT is added, and the exception stacktrace differences are shown below: raw stacktrace ``` java.lang.RuntimeException: Error occurred on Thread-9 at org.apache.spark.util.ThreadUtilsSuite$$anon$3.internalMethod(ThreadUtilsSuite.scala:141) at org.apache.spark.util.ThreadUtilsSuite$$anon$3.run(ThreadUtilsSuite.scala:138) ``` enhanced exception stacktrace ``` java.lang.RuntimeException: Error occurred on Thread-9 at org.apache.spark.util.ThreadUtilsSuite$$anon$3.internalMethod(ThreadUtilsSuite.scala:141) at org.apache.spark.util.ThreadUtilsSuite$$anon$3.run(ThreadUtilsSuite.scala:138) at ... run in separate thread: Thread-9 ... 
() at org.apache.spark.util.ThreadUtilsSuite.$anonfun$new$16(ThreadUtilsSuite.scala:151) at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) at org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) (... other scalatest callsites) ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #46028 from pan3793/SPARK-47833. Authored-by: Cheng Pan Signed-off-by: Kent Yao --- .../scala/org/apache/spark/util/ThreadUtils.scala | 62 ++ .../org/apache/spark/util/ThreadUtilsSuite.scala | 39 +- .../sql/execution/datasources/DataSource.scala | 4 +- 3 files changed, 80 insertions(+), 25 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
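The core technique, reduced to a standalone hedged sketch; Spark's actual `ThreadUtils` change differs in detail, but the idea is the same: record the caller's frames before handing work to another thread, then graft them onto any exception that crosses back.

```scala
// Run `body` on a worker thread; if it throws, append the caller-side frames so
// the surfaced stack trace shows both where the failure happened and who started it.
object CallerStackSketch {
  def runInThread[T](body: => T): T = {
    val callerStack = Thread.currentThread().getStackTrace
    var outcome: Either[Throwable, T] = Left(new IllegalStateException("no result"))
    val worker = new Thread(() => {
      outcome = try Right(body) catch { case e: Throwable => Left(e) }
    })
    worker.start()
    worker.join()  // join() also gives us visibility of `outcome`
    outcome match {
      case Right(value) => value
      case Left(e) =>
        e.setStackTrace(e.getStackTrace ++ callerStack)  // graft caller frames
        throw e
    }
  }

  def main(args: Array[String]): Unit =
    try runInThread(throw new RuntimeException("boom"))
    catch { case e: RuntimeException => e.printStackTrace() }
}
```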
(spark) branch branch-3.4 updated: [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new bcaf61b975d6 [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12
bcaf61b975d6 is described below

commit bcaf61b975d6e24222e483597eb2232aff822a98
Author: Zhen Wang <643348...@qq.com>
AuthorDate: Fri Apr 19 10:53:16 2024 +0800

    [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12

    ### What changes were proposed in this pull request?

    Fix the `ExpressionSet` performance regression in Scala 2.12.

    ### Why are the changes needed?

    In Scala 2.12, `SetLike.++` is implemented by iteratively invoking the `+` method. `ExpressionSet.+` first clones a new object and then adds the element, which is very expensive.

    https://github.com/scala/scala/blob/ceaf7e68ac93e9bbe8642d06164714b2de709c27/src/library/scala/collection/SetLike.scala#L186

    After https://github.com/apache/spark/pull/36121, the `++` and `--` overrides in the Scala 2.12 `ExpressionSet` were removed, causing the performance regression.

    ### Does this PR introduce _any_ user-facing change?

    ### How was this patch tested?

    Benchmark code:

    ```
    object TestBenchmark {
      def main(args: Array[String]): Unit = {
        val count = 300
        val benchmark = new Benchmark("Test ExpressionSetV2 ++ ", count)
        val aUpper = AttributeReference("A", IntegerType)(exprId = ExprId(1))
        var initialSet = ExpressionSet((0 until 300).map(i => aUpper + i))
        val setToAddWithSameDeterministicExpression = ExpressionSet((0 until 300).map(i => aUpper + i))

        benchmark.addCase("Test ++", 10) { _: Int =>
          for (_ <- 0L until count) {
            initialSet ++= setToAddWithSameDeterministicExpression
          }
        }
        benchmark.run()
      }
    }
    ```

    Before this change:

    ```
    OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64
    Intel Core Processor (Skylake, IBRS)
    Test ExpressionSetV2 ++ :    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    Test ++                               1577           1691          61         0.0     5255516.0       1.0X
    ```

    After this change:

    ```
    OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64
    Intel Core Processor (Skylake, IBRS)
    Test ExpressionSetV2 ++ :    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    Test ++                                 14             14           0         0.0       45395.2       1.0X
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No

Closes #46114 from wForget/SPARK-47897.

Authored-by: Zhen Wang <643348...@qq.com>
Signed-off-by: Kent Yao
(cherry picked from commit afd99d19a2b85dda2245d3557506d1090187c5f4)
Signed-off-by: Kent Yao
---
 .../sql/catalyst/expressions/ExpressionSet.scala | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala b/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
index 3e545f745bae..c18679330f3a 100644
--- a/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
+++ b/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.sql.catalyst.expressions
 
-import scala.collection.mutable
+import scala.collection.{mutable, GenTraversableOnce}
 import scala.collection.mutable.ArrayBuffer
 
 object ExpressionSet {
@@ -108,12 +108,31 @@ class ExpressionSet protected(
     newSet
   }
 
+  /**
+   * SPARK-47897: In Scala 2.12, the `SetLike.++` method iteratively calls the `+` method.
+   * `ExpressionSet.+` is expensive, so we override `++`.
+   */
+  override def ++(elems: GenTraversableOnce[Expression]): ExpressionSet = {
+    val newSet = clone()
+    elems.foreach(newSet.add)
+    newSet
+  }
+
   override def -(elem: Expression): ExpressionSet = {
     val newSet = clone()
     newSet.remove(elem)
     newSet
   }
 
+  /**
+   * SPARK-47897: We need to override `--` like `++`.
+   */
+  override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = {
+    val newSet = clone()
+    elems.foreach(newSet.remove)
+    newSet
+  }
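To see why the removed overrides matter, here is a self-contained sketch — a hypothetical `CloningSet`, not Spark's `ExpressionSet` — contrasting bulk insertion through repeated `+` (each call clones the backing set, which is what Scala 2.12's default `SetLike.++` does) with a `++` override that clones once and then mutates the copy:

```
import scala.collection.mutable

// Hypothetical CloningSet (not Spark's ExpressionSet): `+` clones the whole
// backing set before adding one element, so building a set of n elements via
// repeated `+` copies O(n^2) entries in total, while an overridden `++`
// clones once and adds in place, copying only O(n) entries.
class CloningSet[A] private (private val underlying: mutable.HashSet[A]) {
  def this() = this(mutable.HashSet.empty[A])

  // Expensive: one full copy of the backing set per added element.
  def +(elem: A): CloningSet[A] = {
    val newSet = new CloningSet(underlying.clone())
    newSet.underlying += elem
    newSet
  }

  // The SPARK-47897 fix in spirit: clone once, then mutate the copy.
  def ++(elems: Iterable[A]): CloningSet[A] = {
    val newSet = new CloningSet(underlying.clone())
    elems.foreach(newSet.underlying += _)
    newSet
  }

  def size: Int = underlying.size
}

object CloningSetDemo {
  def main(args: Array[String]): Unit = {
    val n = 20000
    var viaPlus = new CloningSet[Int]
    val t0 = System.nanoTime()
    (0 until n).foreach(i => viaPlus = viaPlus + i)        // n clones
    val t1 = System.nanoTime()
    val viaConcat = (new CloningSet[Int]) ++ (0 until n)   // one clone
    val t2 = System.nanoTime()
    assert(viaPlus.size == viaConcat.size)
    println(s"repeated +: ${(t1 - t0) / 1e6} ms, bulk ++: ${(t2 - t1) / 1e6} ms")
  }
}
```

On a typical JVM the repeated-`+` loop is orders of magnitude slower than the bulk `++`, consistent with the 1577 ms vs 14 ms benchmark numbers reported above.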
(spark) branch branch-3.5 updated: [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new afd99d19a2b8 [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12
afd99d19a2b8 is described below

commit afd99d19a2b85dda2245d3557506d1090187c5f4
Author: Zhen Wang <643348...@qq.com>
AuthorDate: Fri Apr 19 10:53:16 2024 +0800

    [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12

    ### What changes were proposed in this pull request?

    Fix the `ExpressionSet` performance regression in Scala 2.12.

    ### Why are the changes needed?

    In Scala 2.12, `SetLike.++` is implemented by iteratively invoking the `+` method. `ExpressionSet.+` first clones a new object and then adds the element, which is very expensive.

    https://github.com/scala/scala/blob/ceaf7e68ac93e9bbe8642d06164714b2de709c27/src/library/scala/collection/SetLike.scala#L186

    After https://github.com/apache/spark/pull/36121, the `++` and `--` overrides in the Scala 2.12 `ExpressionSet` were removed, causing the performance regression.

    ### Does this PR introduce _any_ user-facing change?

    ### How was this patch tested?

    Benchmark code:

    ```
    object TestBenchmark {
      def main(args: Array[String]): Unit = {
        val count = 300
        val benchmark = new Benchmark("Test ExpressionSetV2 ++ ", count)
        val aUpper = AttributeReference("A", IntegerType)(exprId = ExprId(1))
        var initialSet = ExpressionSet((0 until 300).map(i => aUpper + i))
        val setToAddWithSameDeterministicExpression = ExpressionSet((0 until 300).map(i => aUpper + i))

        benchmark.addCase("Test ++", 10) { _: Int =>
          for (_ <- 0L until count) {
            initialSet ++= setToAddWithSameDeterministicExpression
          }
        }
        benchmark.run()
      }
    }
    ```

    Before this change:

    ```
    OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64
    Intel Core Processor (Skylake, IBRS)
    Test ExpressionSetV2 ++ :    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    Test ++                               1577           1691          61         0.0     5255516.0       1.0X
    ```

    After this change:

    ```
    OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64
    Intel Core Processor (Skylake, IBRS)
    Test ExpressionSetV2 ++ :    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    Test ++                                 14             14           0         0.0       45395.2       1.0X
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No

Closes #46114 from wForget/SPARK-47897.

Authored-by: Zhen Wang <643348...@qq.com>
Signed-off-by: Kent Yao
---
 .../sql/catalyst/expressions/ExpressionSet.scala | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala b/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
index 3e545f745bae..c18679330f3a 100644
--- a/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
+++ b/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.sql.catalyst.expressions
 
-import scala.collection.mutable
+import scala.collection.{mutable, GenTraversableOnce}
 import scala.collection.mutable.ArrayBuffer
 
 object ExpressionSet {
@@ -108,12 +108,31 @@ class ExpressionSet protected(
     newSet
   }
 
+  /**
+   * SPARK-47897: In Scala 2.12, the `SetLike.++` method iteratively calls the `+` method.
+   * `ExpressionSet.+` is expensive, so we override `++`.
+   */
+  override def ++(elems: GenTraversableOnce[Expression]): ExpressionSet = {
+    val newSet = clone()
+    elems.foreach(newSet.add)
+    newSet
+  }
+
   override def -(elem: Expression): ExpressionSet = {
     val newSet = clone()
     newSet.remove(elem)
     newSet
   }
 
+  /**
+   * SPARK-47897: We need to override `--` like `++`.
+   */
+  override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = {
+    val newSet = clone()
+    elems.foreach(newSet.remove)
+    newSet
+  }
+
(spark) branch master updated (268856da31c1 -> 2054ab0fb03f)
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 268856da31c1 [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping
     add 2054ab0fb03f [SPARK-47880][SQL][DOCS] Oracle: Document Mapping Spark SQL Data Types to Oracle

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-jdbc.md | 106 ++
 1 file changed, 106 insertions(+)