(spark) branch master updated: [SPARK-48574][SQL] Fix support for StructTypes with collations
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3ac31b1b6eaf [SPARK-48574][SQL] Fix support for StructTypes with collations 3ac31b1b6eaf is described below commit 3ac31b1b6eaf9c1a45859f4238a7f7e2c4ffb9dc Author: Mihailo Milosevic AuthorDate: Wed Jun 19 16:07:59 2024 +0800 [SPARK-48574][SQL] Fix support for StructTypes with collations ### What changes were proposed in this pull request? Fix the ExtractValue expression so it matches any string type, not only the default StringType. ### Why are the changes needed? This fix is needed in case we change the default collation. ### Does this PR introduce _any_ user-facing change? Yes, it fixes struct field extraction when a non-default collation is in effect. ### How was this patch tested? Added tests in `CollationSQLExpressionsSuite.scala` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46997 from mihailom-db/SPARK-48574. Authored-by: Mihailo Milosevic Signed-off-by: Kent Yao --- .../catalyst/expressions/complexTypeExtractors.scala | 4 ++-- .../spark/sql/CollationSQLExpressionsSuite.scala | 19 +++ 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala index a801d0367080..ff94322efdaa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala @@ -51,12 +51,12 @@ object ExtractValue { resolver: Resolver): Expression = { (child.dataType, extraction) match { - case (StructType(fields), NonNullLiteral(v, StringType)) => + case (StructType(fields), NonNullLiteral(v, _: StringType)) => val fieldName = v.toString val ordinal = findField(fields, fieldName, resolver) GetStructField(child, ordinal, Some(fieldName)) - case (ArrayType(StructType(fields), containsNull), NonNullLiteral(v, StringType)) => + case (ArrayType(StructType(fields), containsNull), NonNullLiteral(v, _: StringType)) => val fieldName = v.toString val ordinal = findField(fields, fieldName, resolver) GetArrayStructFields(child, fields(ordinal).copy(name = fieldName), diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala index a1c6f5f94317..0c54ccb7cfb1 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala @@ -1854,6 +1854,25 @@ class CollationSQLExpressionsSuite }) } + test("ExtractValue expression with collation") { +// Supported collations +testSuppCollations.foreach(collationName => { + withSQLConf(SqlApiConf.DEFAULT_COLLATION -> collationName) { +val query = + s""" + |select col['Field1'] + |from values (named_struct('Field1', 'Spark', 'Field2', 5)) as tab(col); + |""".stripMargin +// Result & data type check +val testQuery = sql(query) +val dataType = StringType(collationName) +val expectedResult = "Spark" +assert(testQuery.schema.fields.head.dataType.sameType(dataType)) +checkAnswer(testQuery, Row(expectedResult)) + } +}) + } + test("Lag expression with collation") { // Supported collations testSuppCollations.foreach(collationName => { - To unsubscribe, e-mail:
commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
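The root cause is worth spelling out: with collation support, `StringType` is no longer a single type but a family of instances (the test above constructs one with `StringType(collationName)`), so matching a literal's data type against the `StringType` object is an equality check that misses collated strings, while the type pattern `_: StringType` matches any of them. A minimal standalone sketch of the two match semantics, using simplified stand-in types rather than Spark's actual class hierarchy:

```scala
// Simplified stand-ins for Spark's types, just to contrast the two patterns.
class StringType(val collationId: Int)
object StringType extends StringType(0) // the default, non-collated instance

val collated = new StringType(1) // e.g. a column typed under a non-default collation

val matched = collated match {
  case StringType    => "default collation only" // stable-identifier pattern: equality check, no match
  case _: StringType => "any collation"          // type pattern: matches every instance
}
assert(matched == "any collation")
```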
(spark) branch master updated: [SPARK-48644][SQL] Do a length check and throw COLLECTION_SIZE_LIMIT_EXCEEDED error in Hex.hex
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 05c87e51a5e5 [SPARK-48644][SQL] Do a length check and throw COLLECTION_SIZE_LIMIT_EXCEEDED error in Hex.hex 05c87e51a5e5 is described below commit 05c87e51a5e50d1c156211848693b66937f12a8f Author: Kent Yao AuthorDate: Tue Jun 18 13:19:34 2024 +0800 [SPARK-48644][SQL] Do a length check and throw COLLECTION_SIZE_LIMIT_EXCEEDED error in Hex.hex ### What changes were proposed in this pull request? A length check is necessary when hex-encoding the byte array, as the new array is 2x the size of the original. If the length of the new byte array exceeds Int.MaxValue, we now report the actual length and the threshold instead of throwing a NegativeArraySizeException ### Why are the changes needed? Improve error handling ### Does this PR introduce _any_ user-facing change? Yes. When converting large strings or binary values to hex strings, if the maximum length is exceeded, the raised error changes ### How was this patch tested? Tested manually without adding a unit test, because such a test case is quite memory-consuming. ``` org.apache.spark.sql.catalyst.expressions.Hex.hex((" " * (Int.MaxValue / 2 + 1)).getBytes) org.apache.spark.SparkIllegalArgumentException: [COLLECTION_SIZE_LIMIT_EXCEEDED.INITIALIZE] Can't create array with 2147483648 elements which exceeding the array size limit 2147483647, cannot initialize an array with specified parameters. SQLSTATE: 54000 at org.apache.spark.sql.errors.QueryExecutionErrors$.tooManyArrayElementsError(QueryExecutionErrors.scala:2517) at org.apache.spark.sql.catalyst.expressions.Hex$.hex(mathExpressions.scala:1042) ... 42 elided ``` ### Was this patch authored or co-authored using generative AI tooling? no Closes #47001 from yaooqinn/SPARK-48644. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../apache/spark/sql/catalyst/expressions/mathExpressions.scala | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index 20bedeb04098..5981b42aead8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -1034,7 +1034,14 @@ object Hex { def hex(bytes: Array[Byte]): UTF8String = { val length = bytes.length -val value = new Array[Byte](length * 2) +if (length == 0) { + return UTF8String.EMPTY_UTF8 +} +val targetLength = length * 2L +if (targetLength > Int.MaxValue) { + throw QueryExecutionErrors.tooManyArrayElementsError(targetLength, Int.MaxValue) +} +val value = new Array[Byte](targetLength.toInt) var i = 0 while (i < length) { value(i * 2) = hexDigits((bytes(i) & 0xF0) >> 4) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
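To see the overflow concretely: for an input just over `Int.MaxValue / 2` bytes, doubling the length in 32-bit arithmetic wraps to a negative number (the source of the old `NegativeArraySizeException`), while widening to `Long` first makes the limit checkable. A small sketch of the arithmetic:

```scala
// Sketch of the overflow hazard the patch guards against: doubling a length
// close to Int.MaxValue wraps around in 32-bit arithmetic.
val length = Int.MaxValue / 2 + 1      // 1073741824 bytes of input
val wrapped = length * 2               // Int arithmetic: overflows to -2147483648
val widened = length * 2L              // Long arithmetic: 2147483648, can be range-checked

println(s"Int: $wrapped, Long: $widened")
assert(widened > Int.MaxValue)         // the condition that now raises the classified error
```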
(spark) branch master updated (9ef092f19aaa -> 8fdd85f09779)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9ef092f19aaa [SPARK-48641][BUILD] Upgrade `curator` to 5.7.0 add 8fdd85f09779 [SPARK-48603][TEST] Update *ParquetReadSchemaSuite to cover type widen capability No new revisions were added by this update. Summary of changes: .../spark/sql/execution/datasources/ReadSchemaSuite.scala| 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48641][BUILD] Upgrade `curator` to 5.7.0
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9ef092f19aaa [SPARK-48641][BUILD] Upgrade `curator` to 5.7.0 9ef092f19aaa is described below commit 9ef092f19aaa8c4afbf26d1c34336af328265bb0 Author: Wei Guo AuthorDate: Mon Jun 17 17:55:44 2024 +0800 [SPARK-48641][BUILD] Upgrade `curator` to 5.7.0 ### What changes were proposed in this pull request? This PR aims to upgrade `curator` to 5.7.0. ### Why are the changes needed? There are some bug fixes and improvements in Apache Curator 5.7.0: [[CURATOR-688](https://issues.apache.org/jira/browse/CURATOR-688)] - SharedCount will be never updated successful when version of ZNode is overflow [[CURATOR-696](https://issues.apache.org/jira/browse/CURATOR-696)] - Double leader for LeaderLatch [[CURATOR-704](https://issues.apache.org/jira/browse/CURATOR-704)] - Use server version to detect supported features https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12314425=12354115 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46998 from wayneguow/curator_upgrade. Authored-by: Wei Guo Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 5bdd31086bdf..c74482eb2fdb 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -51,9 +51,9 @@ commons-math3/3.6.1//commons-math3-3.6.1.jar commons-pool/1.5.4//commons-pool-1.5.4.jar commons-text/1.12.0//commons-text-1.12.0.jar compress-lzf/1.1.2//compress-lzf-1.1.2.jar -curator-client/5.6.0//curator-client-5.6.0.jar -curator-framework/5.6.0//curator-framework-5.6.0.jar -curator-recipes/5.6.0//curator-recipes-5.6.0.jar +curator-client/5.7.0//curator-client-5.7.0.jar +curator-framework/5.7.0//curator-framework-5.7.0.jar +curator-recipes/5.7.0//curator-recipes-5.7.0.jar datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar diff --git a/pom.xml b/pom.xml index fc372ba17278..0c2fa604902f 100644 --- a/pom.xml +++ b/pom.xml @@ -128,7 +128,7 @@ 3.11.4 ${hadoop.version} 3.9.2 -5.6.0 +5.7.0 org.apache.hive core - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48627][SQL] Perf improvement for binary to HEX_DISCRETE string
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0c16624c77ad [SPARK-48627][SQL] Perf improvement for binary to HEX_DISCRETE string 0c16624c77ad is described below commit 0c16624c77ad311a3d076c4cfb4451b1f19f8a9b Author: Kent Yao AuthorDate: Mon Jun 17 13:42:46 2024 +0800 [SPARK-48627][SQL] Perf improvement for binary to HEX_DISCRETE string ### What changes were proposed in this pull request? By replacing `String.format`, we can achieve a nearly 200x performance improvement. `SparkStringUtils.getHexString` is widely used by - the Spark Thrift Server to convert binary to string when sending results to clients - the Spark SQL shell for display - the Spark Shell when calling `show` - the Spark Connect scala client when stringifying binaries in arrow vectors ``` +OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 +Apple M2 Max +Cardinality 10: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative + +Spark 42210 43595 1207 0.0 422102.9 1.0X +Java238243 2 0.42381.9 177.2X ``` ### Why are the changes needed? perf improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? By existing binary*.sql's results ### Was this patch authored or co-authored using generative AI tooling? no Closes #46984 from yaooqinn/SPARK-48627. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../scala/org/apache/spark/sql/catalyst/util/StringUtils.scala| 8 +++- .../scala/org/apache/spark/sql/catalyst/util/StringUtils.scala| 6 -- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala index aa8826dd48b6..edb1ee371b15 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala @@ -16,6 +16,7 @@ */ package org.apache.spark.sql.catalyst.util +import java.util.HexFormat import java.util.concurrent.atomic.AtomicBoolean import org.apache.spark.internal.Logging @@ -101,11 +102,16 @@ object SparkStringUtils extends Logging { truncatedString(seq, "", sep, "", maxFields) } + private final lazy val SPACE_DELIMITED_UPPERCASE_HEX = +HexFormat.of().withDelimiter(" ").withUpperCase() + /** * Returns a pretty string of the byte array which prints each byte as a hex digit and add spaces * between them. For example, [1A C0]. */ - def getHexString(bytes: Array[Byte]): String = bytes.map("%02X".format(_)).mkString("[", " ", "]") + def getHexString(bytes: Array[Byte]): String = { +s"[${SPACE_DELIMITED_UPPERCASE_HEX.formatHex(bytes)}]" + } def sideBySide(left: String, right: String): Seq[String] = { sideBySide(left.split("\n").toImmutableArraySeq, right.split("\n").toImmutableArraySeq) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala index 2fecd9a23759..e2a5319cbe1a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala @@ -66,12 +66,6 @@ object StringUtils extends Logging { "(?s)" + out.result() // (?s) enables dotall mode, causing "."
to match new lines } - /** - * Returns a pretty string of the byte array which prints each byte as a hex digit and add spaces - * between them. For example, [1A C0]. - */ - def getHexString(bytes: Array[Byte]): String = bytes.map("%02X".format(_)).mkString("[", " ", "]") - private[this] val trueStrings = Set("t", "true", "y", "yes", "1").map(UTF8String.fromString) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
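For reference, `java.util.HexFormat` is a JDK 17+ API; a small sketch showing that the single-pass formatter produces the same `[1A C0]`-style output as the per-byte `String.format` path it replaces:

```scala
import java.util.HexFormat

// The JDK 17 HexFormat API the patch switches to: it formats the whole array
// in one pass instead of calling String.format once per byte.
val hexFmt = HexFormat.of().withDelimiter(" ").withUpperCase()
val bytes = Array[Byte](0x1A, 0xC0.toByte)

val slow = bytes.map("%02X".format(_)).mkString("[", " ", "]") // old per-byte path
val fast = s"[${hexFmt.formatHex(bytes)}]"                     // new single-pass path

assert(slow == fast) // both produce "[1A C0]"
```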
(spark) branch master updated: [SPARK-48621][SQL] Fix Like simplification in Optimizer for collated strings
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8ee8abaa599f [SPARK-48621][SQL] Fix Like simplification in Optimizer for collated strings 8ee8abaa599f is described below commit 8ee8abaa599fd6efea85018549f1ec135af319e0 Author: Uros Bojanic <157381213+uros...@users.noreply.github.com> AuthorDate: Fri Jun 14 17:32:16 2024 +0800 [SPARK-48621][SQL] Fix Like simplification in Optimizer for collated strings ### What changes were proposed in this pull request? Enable `LikeSimplification` optimizer rule for collated strings. ### Why are the changes needed? Optimize how `Like` expression works with collated strings and ensure collation awareness when replacing `Like` expressions with `StartsWith` / `EndsWith` / `Contains` / `EqualTo` under special conditions. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New e2e sql tests in `CollationSQLRegexpSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46976 from uros-db/like-simplification. Authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Kent Yao --- .../spark/sql/catalyst/optimizer/expressions.scala | 17 +++ .../apache/spark/sql/CollationSQLRegexpSuite.scala | 56 ++ 2 files changed, 65 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala index 2c55e4c8fd37..2606dd2d7737 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala @@ -738,18 +738,19 @@ object LikeSimplification extends Rule[LogicalPlan] with PredicateHelper { } else { pattern match { case startsWith(prefix) => - Some(StartsWith(input, Literal(prefix))) + Some(StartsWith(input, Literal.create(prefix, input.dataType))) case endsWith(postfix) => - Some(EndsWith(input, Literal(postfix))) + Some(EndsWith(input, Literal.create(postfix, input.dataType))) // 'a%a' pattern is basically same with 'a%' && '%a'. // However, the additional `Length` condition is required to prevent 'a' match 'a%a'. 
-case startsAndEndsWith(prefix, postfix) => - Some(And(GreaterThanOrEqual(Length(input), Literal(prefix.length + postfix.length)), -And(StartsWith(input, Literal(prefix)), EndsWith(input, Literal(postfix) +case startsAndEndsWith(prefix, postfix) => Some( + And(GreaterThanOrEqual(Length(input), Literal.create(prefix.length + postfix.length)), + And(StartsWith(input, Literal.create(prefix, input.dataType)), +EndsWith(input, Literal.create(postfix, input.dataType) case contains(infix) => - Some(Contains(input, Literal(infix))) + Some(Contains(input, Literal.create(infix, input.dataType))) case equalTo(str) => - Some(EqualTo(input, Literal(str))) + Some(EqualTo(input, Literal.create(str, input.dataType))) case _ => None } } @@ -785,7 +786,7 @@ object LikeSimplification extends Rule[LogicalPlan] with PredicateHelper { def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressionsWithPruning( _.containsPattern(LIKE_FAMLIY), ruleId) { -case l @ Like(input, Literal(pattern, StringType), escapeChar) => +case l @ Like(input, Literal(pattern, _: StringType), escapeChar) => if (pattern == null) { // If pattern is null, return null value directly, since "col like null" == null. Literal(null, BooleanType) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLRegexpSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLRegexpSuite.scala index 740583064279..885ed3709868 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLRegexpSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CollationSQLRegexpSuite.scala @@ -18,6 +18,8 @@ package org.apache.spark.sql import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.logical.Project +import org.apache.spark.sql.internal.SqlApiConf import org.apache.spark.sql.test.SharedSparkSession import org.apache.spark.sql.types.{ArrayType, BooleanType, IntegerType, StringType} @@ -55,6 +57,60 @@ class CollationSQLRegexpSuite }) } + test("Like simplification should work with collated strings") { +cas
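The crux of the change is visible in the diff: `Literal(prefix)` types the extracted pattern fragment with the default `StringType`, whereas `Literal.create(prefix, input.dataType)` carries the input column's collated string type into the rewritten `StartsWith`/`EndsWith`/`Contains`/`EqualTo`. A hedged sketch of that difference (assumes a Spark classpath; the collation name is illustrative):

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Literal, StartsWith}
import org.apache.spark.sql.types.StringType

// 'col LIKE "ab%"' simplifies to StartsWith(col, "ab"); creating the literal with
// the input's own data type keeps the prefix comparison collation-aware.
val input = AttributeReference("col", StringType("UNICODE_CI"))()
val simplified = StartsWith(input, Literal.create("ab", input.dataType))
assert(simplified.right.dataType == input.dataType)
```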
(spark) branch master updated: [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dd8b05f25fdc [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer` dd8b05f25fdc is described below commit dd8b05f25fdc2c964e351f4cbbf0dd474474783c Author: wayneguow AuthorDate: Fri Jun 14 15:11:33 2024 +0800 [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer` ### What changes were proposed in this pull request? Deprecate spark.shuffle.unsafe.file.output.buffer and add a new config spark.shuffle.localDisk.file.output.buffer instead. ### Why are the changes needed? The old config is designed to be used in UnsafeShuffleWriter, but it is now used in all local shuffle writers through LocalDiskShuffleMapOutputWriter, introduced by #25007. ### Does this PR introduce _any_ user-facing change? The old config still works, but users are advised to use the new one. ### How was this patch tested? Passed existing tests. Closes #39819 from wayneguow/shuffle_output_buffer. Authored-by: wayneguow Signed-off-by: Kent Yao --- .../shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java | 2 +- core/src/main/scala/org/apache/spark/SparkConf.scala | 4 +++- .../scala/org/apache/spark/internal/config/package.scala | 10 -- .../sort/io/LocalDiskShuffleMapOutputWriterSuite.scala | 2 +- docs/configuration.md| 12 ++-- docs/core-migration-guide.md | 2 ++ 6 files changed, 25 insertions(+), 7 deletions(-) diff --git a/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java b/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java index 606bb625f5b2..c0b9018c770a 100644 --- a/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java +++ b/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriter.java @@ -74,7 +74,7 @@ public class LocalDiskShuffleMapOutputWriter implements ShuffleMapOutputWriter { this.blockResolver = blockResolver; this.bufferSize = (int) (long) sparkConf.get( -package$.MODULE$.SHUFFLE_UNSAFE_FILE_OUTPUT_BUFFER_SIZE()) * 1024; +package$.MODULE$.SHUFFLE_LOCAL_DISK_FILE_OUTPUT_BUFFER_SIZE()) * 1024; this.partitionLengths = new long[numPartitions]; this.outputFile = blockResolver.getDataFile(shuffleId, mapId); this.outputTempFile = null; diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala index 95955455a9d4..cfb514913694 100644 --- a/core/src/main/scala/org/apache/spark/SparkConf.scala +++ b/core/src/main/scala/org/apache/spark/SparkConf.scala @@ -647,7 +647,9 @@ private[spark] object SparkConf extends Logging { DeprecatedConfig("spark.yarn.blacklist.executor.launch.blacklisting.enabled", "3.1.0", "Please use spark.yarn.executor.launch.excludeOnFailure.enabled"), DeprecatedConfig("spark.network.remoteReadNioBufferConversion", "3.5.2", -"Please open a JIRA ticket to report it if you need to use this configuration.") +"Please open a JIRA ticket to report it if you need to use this configuration."), + DeprecatedConfig("spark.shuffle.unsafe.file.output.buffer", "4.0.0", +"Please use spark.shuffle.localDisk.file.output.buffer") ) Map(configs.map { cfg => (cfg.key -> cfg) } : _*) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala
b/core/src/main/scala/org/apache/spark/internal/config/package.scala index a7268c640991..9fcd9ba529c1 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -1463,8 +1463,7 @@ package object config { private[spark] val SHUFFLE_UNSAFE_FILE_OUTPUT_BUFFER_SIZE = ConfigBuilder("spark.shuffle.unsafe.file.output.buffer") - .doc("The file system for this buffer size after each partition " + -"is written in unsafe shuffle writer. In KiB unless otherwise specified.") + .doc("(Deprecated since Spark 4.0, please use 'spark.shuffle.localDisk.file.output.buffer'.)") .version("2.3.0") .bytesConf(ByteUnit.KiB) .checkValue(v => v > 0 && v <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH / 1024, @@ -1472,6 +1471,13 @@ package object config { s" ${B
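For users migrating, both keys continue to work in 4.0, but the old one now logs a deprecation warning. A sketch of setting the preferred key (the 64k value is illustrative, not a recommendation):

```scala
import org.apache.spark.SparkConf

// Prefer the new key; the old one remains a deprecated alias that is still honored.
val conf = new SparkConf()
  .set("spark.shuffle.localDisk.file.output.buffer", "64k") // preferred key since 4.0
// .set("spark.shuffle.unsafe.file.output.buffer", "64k")   // deprecated alias
```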
(spark) branch master updated: [SPARK-48625][BUILD] Upgrade `mssql-jdbc` to 12.6.2.jre11
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 38318863f641 [SPARK-48625][BUILD] Upgrade `mssql-jdbc` to 12.6.2.jre11 38318863f641 is described below commit 38318863f6411df7bb1ec105b8b2d3bd1dff3c6d Author: Wei Guo AuthorDate: Fri Jun 14 13:54:54 2024 +0800 [SPARK-48625][BUILD] Upgrade `mssql-jdbc` to 12.6.2.jre11 ### What changes were proposed in this pull request? Upgrade `mssql-jdbc` to 12.6.2.jre11 ### Why are the changes needed? There are some issue fixes and enhancements: https://github.com/microsoft/mssql-jdbc/releases/tag/v12.6.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passed GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46981 from wayneguow/mssql-jdbc. Authored-by: Wei Guo Signed-off-by: Kent Yao --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index a900cd993335..a80bb3c0a6c3 100644 --- a/pom.xml +++ b/pom.xml @@ -326,7 +326,7 @@ 8.4.0 42.7.3 11.5.9.0 -12.6.1.jre11 +12.6.2.jre11 23.4.0.24.05 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][DOCS][TESTS] Update repo name and link from `parquet-mr` to `parquet-java`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0b214f166a92 [MINOR][DOCS][TESTS] Update repo name and link from `parquet-mr` to `parquet-java` 0b214f166a92 is described below commit 0b214f166a92c4e6b4fdc102f7718903a1a152d5 Author: Wei Guo AuthorDate: Fri Jun 14 10:33:49 2024 +0800 [MINOR][DOCS][TESTS] Update repo name and link from `parquet-mr` to `parquet-java` ### What changes were proposed in this pull request? This pr replaces parquet related repo name from `parquet-mr` to `parquet-java` and repo link from `https://github.com/apache/parquet-mr` to `https://github.com/apache/parquet-java`. ### Why are the changes needed? The upstream repo name has made a change with [INFRA-25802](https://issues.apache.org/jira/browse/INFRA-25802), [PARQUET-2475](https://issues.apache.org/jira/browse/PARQUET-2475), it's better to update with the latest name and link. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passed GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46963 from wayneguow/parquet. Authored-by: Wei Guo Signed-off-by: Kent Yao --- docs/sql-data-sources-load-save-functions.md| 2 +- docs/sql-data-sources-parquet.md| 6 +++--- .../datasources/parquet/ParquetInteroperabilitySuite.scala | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/sql-data-sources-load-save-functions.md b/docs/sql-data-sources-load-save-functions.md index b42f6e84076d..70105c22e583 100644 --- a/docs/sql-data-sources-load-save-functions.md +++ b/docs/sql-data-sources-load-save-functions.md @@ -109,7 +109,7 @@ For example, you can control bloom filters and dictionary encodings for ORC data The following ORC example will create bloom filter and use dictionary encoding only for `favorite_color`. For Parquet, there exists `parquet.bloom.filter.enabled` and `parquet.enable.dictionary`, too. To find more detailed information about the extra ORC/Parquet options, -visit the official Apache [ORC](https://orc.apache.org/docs/spark-config.html) / [Parquet](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop) websites. +visit the official Apache [ORC](https://orc.apache.org/docs/spark-config.html) / [Parquet](https://github.com/apache/parquet-java/tree/master/parquet-hadoop) websites. ORC data source: diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md index f5c5ccd3b89a..5a0ca595fabb 100644 --- a/docs/sql-data-sources-parquet.md +++ b/docs/sql-data-sources-parquet.md @@ -350,7 +350,7 @@ Dataset df2 = spark.read().parquet("/path/to/table.parquet.encrypted"); KMS Client -The InMemoryKMS class is provided only for illustration and simple demonstration of Parquet encryption functionality. **It should not be used in a real deployment**. The master encryption keys must be kept and managed in a production-grade KMS system, deployed in user's organization. Rollout of Spark with Parquet encryption requires implementation of a client class for the KMS server. Parquet provides a plug-in [interface](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.1/p [...] +The InMemoryKMS class is provided only for illustration and simple demonstration of Parquet encryption functionality. **It should not be used in a real deployment**. 
The master encryption keys must be kept and managed in a production-grade KMS system, deployed in user's organization. Rollout of Spark with Parquet encryption requires implementation of a client class for the KMS server. Parquet provides a plug-in [interface](https://github.com/apache/parquet-java/blob/apache-parquet-1.13.1 [...] {% highlight java %} @@ -371,9 +371,9 @@ public interface KmsClient { -An [example](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/crypto/keytools/samples/VaultClient.java) of such class for an open source [KMS](https://www.vaultproject.io/api/secret/transit) can be found in the parquet-mr repository. The production KMS client should be designed in cooperation with organization's security administrators, and built by developers with an experience in access control management. Once such class is created, it c [...] +An [example](https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/crypto/keytools/samples/VaultClient.java) of such class for an open source [KMS](https://www.vaultproject.io/api/secret/transit) can be found in the parquet-java repository. The production KMS client should be designed in c
(spark) branch master updated (be154a371df0 -> 70bdcc97910e)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from be154a371df0 [SPARK-48622][SQL] get SQLConf once when resolving column names add 70bdcc97910e [MINOR][DOCS] Fix metrics info of shuffle service No new revisions were added by this update. Summary of changes: docs/monitoring.md | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48622][SQL] get SQLConf once when resolving column names
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new be154a371df0 [SPARK-48622][SQL] get SQLConf once when resolving column names be154a371df0 is described below commit be154a371df0401163deb221efc3b54fa089f49c Author: Andrew Xue AuthorDate: Fri Jun 14 10:04:24 2024 +0800 [SPARK-48622][SQL] get SQLConf once when resolving column names ### What changes were proposed in this pull request? `SQLConf.caseSensitiveAnalysis` is currently being retrieved for every column when resolving column names. This is expensive if there are many columns. We can instead retrieve it once before the loop, and reuse the result. ### Why are the changes needed? Performance improvement. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Profiles of adding 1 column on an empty 10k column table (hms-parquet): Before (55s): https://github.com/databricks/runtime/assets/169104436/58de6a56-943e-465a-9005-ae98f960779e After (13s): https://github.com/databricks/runtime/assets/169104436/e9bdabc4-6e29-4012-bb01-103fa0b640fc ### Was this patch authored or co-authored using generative AI tooling? No Closes #46979 from andrewxue-db/andrewxue-db/spark-48622. Authored-by: Andrew Xue Signed-off-by: Kent Yao --- .../org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index 7a19f276b513..0e0852d0a550 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -480,8 +480,9 @@ class SessionCatalog( val catalogTable = externalCatalog.getTable(db, table) val oldDataSchema = catalogTable.dataSchema // not supporting dropping columns yet +val resolver = conf.resolver val nonExistentColumnNames = - oldDataSchema.map(_.name).filterNot(columnNameResolved(newDataSchema, _)) + oldDataSchema.map(_.name).filterNot(columnNameResolved(resolver, newDataSchema, _)) if (nonExistentColumnNames.nonEmpty) { throw QueryCompilationErrors.dropNonExistentColumnsNotSupportedError(nonExistentColumnNames) } @@ -489,8 +490,11 @@ class SessionCatalog( externalCatalog.alterTableDataSchema(db, table, newDataSchema) } - private def columnNameResolved(schema: StructType, colName: String): Boolean = { -schema.fields.map(_.name).exists(conf.resolver(_, colName)) + private def columnNameResolved( + resolver: Resolver, + schema: StructType, + colName: String): Boolean = { +schema.fields.exists(f => resolver(f.name, colName)) } /** - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
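The pattern here is generic loop-invariant hoisting: anything derived from `SQLConf` that cannot change during resolution can be fetched once before iterating over the columns. A simplified sketch of the before/after shape (the resolver below is a stand-in, not Spark's actual code):

```scala
// Stand-in for SQLConf.get.resolver: obtaining it may involve a non-trivial
// lookup, so it should not be repeated per column.
def resolverFromConf(): (String, String) => Boolean =
  (a, b) => a.equalsIgnoreCase(b)

val columns = Seq.fill(10000)("someColumn")

// Before: one conf lookup per column.
columns.exists(c => resolverFromConf()(c, "target"))

// After: one conf lookup total, reused across the whole schema.
val resolver = resolverFromConf()
columns.exists(c => resolver(c, "target"))
```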
(spark) branch master updated (fd045c9887fe -> ea2bca74923e)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from fd045c9887fe [SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of `commons-io` called in Spark add ea2bca74923e [SPARK-48602][SQL] Make csv generator support different output style with spark.sql.binaryOutputStyle No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/csv/UnivocityGenerator.scala | 8 +--- .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 2 +- .../resources/sql-tests/analyzer-results/binary.sql.out| 7 +++ .../sql-tests/analyzer-results/binary_base64.sql.out | 7 +++ .../sql-tests/analyzer-results/binary_basic.sql.out| 7 +++ .../sql-tests/analyzer-results/binary_hex.sql.out | 7 +++ .../{binary_basic.sql.out => binary_hex_discrete.sql.out} | 7 +++ sql/core/src/test/resources/sql-tests/inputs/binary.sql| 1 + .../resources/sql-tests/inputs/binary_hex_discrete.sql | 3 +++ .../src/test/resources/sql-tests/results/binary.sql.out| 8 .../test/resources/sql-tests/results/binary_base64.sql.out | 8 .../test/resources/sql-tests/results/binary_basic.sql.out | 8 .../test/resources/sql-tests/results/binary_hex.sql.out| 8 .../{binary_basic.sql.out => binary_hex_discrete.sql.out} | 14 +++--- .../sql/hive/thriftserver/ThriftServerQueryTestSuite.scala | 1 + 15 files changed, 89 insertions(+), 7 deletions(-) copy sql/core/src/test/resources/sql-tests/analyzer-results/{binary_basic.sql.out => binary_hex_discrete.sql.out} (69%) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/binary_hex_discrete.sql copy sql/core/src/test/resources/sql-tests/results/{binary_basic.sql.out => binary_hex_discrete.sql.out} (55%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48596][SQL] Perf improvement for calculating hex string for long
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b5e1b7988031 [SPARK-48596][SQL] Perf improvement for calculating hex string for long b5e1b7988031 is described below commit b5e1b7988031044d3cbdb277668b775c08db1a74 Author: Kent Yao AuthorDate: Wed Jun 12 20:23:03 2024 +0800 [SPARK-48596][SQL] Perf improvement for calculating hex string for long ### What changes were proposed in this pull request? This pull request optimizes the `Hex.hex(num: Long)` method by removing leading zeros, thus eliminating the need to copy the array to remove them afterward. ### Why are the changes needed? - Unit tests added - Did a benchmark locally (30~50% speedup) ```scala Hex Long Tests: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative Legacy 1062 1094 16 9.4 106.2 1.0X New 739807 26 13.5 73.9 1.4X ``` ```scala object HexBenchmark extends BenchmarkBase { override def runBenchmarkSuite(mainArgs: Array[String]): Unit = { val N = 10_000_000 runBenchmark("Hex") { val benchmark = new Benchmark("Hex Long Tests", N, 10, output = output) val range = 1 to 12 benchmark.addCase("Legacy") { _ => (1 to N).foreach(x => range.foreach(y => hexLegacy(x - y))) } benchmark.addCase("New") { _ => (1 to N).foreach(x => range.foreach(y => Hex.hex(x - y))) } benchmark.run() } } def hexLegacy(num: Long): UTF8String = { // Extract the hex digits of num into value[] from right to left val value = new Array[Byte](16) var numBuf = num var len = 0 do { len += 1 // Hex.hexDigits need to be seen here value(value.length - len) = Hex.hexDigits((numBuf & 0xF).toInt) numBuf >>>= 4 } while (numBuf != 0) UTF8String.fromBytes(java.util.Arrays.copyOfRange(value, value.length - len, value.length)) } } ``` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? no Closes #46952 from yaooqinn/SPARK-48596. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../sql/catalyst/expressions/mathExpressions.scala | 28 --- .../spark/sql/catalyst/expressions/HexSuite.scala | 40 ++ 2 files changed, 55 insertions(+), 13 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index 8df46500ddcf..6801fc7c257c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -1018,9 +1018,9 @@ case class Bin(child: Expression) } object Hex { - val hexDigits = Array[Char]( -'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' - ).map(_.toByte) + private final val hexDigits = +Array[Byte]('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F') + private final val ZERO_UTF8 = UTF8String.fromBytes(Array[Byte]('0')) // lookup table to translate '0' -> 0 ... 
'F'/'f' -> 15 val unhexDigits = { @@ -1036,24 +1036,26 @@ object Hex { val value = new Array[Byte](length * 2) var i = 0 while (i < length) { - value(i * 2) = Hex.hexDigits((bytes(i) & 0xF0) >> 4) - value(i * 2 + 1) = Hex.hexDigits(bytes(i) & 0x0F) + value(i * 2) = hexDigits((bytes(i) & 0xF0) >> 4) + value(i * 2 + 1) = hexDigits(bytes(i) & 0x0F) i += 1 } UTF8String.fromBytes(value) } def hex(num: Long): UTF8String = { -// Extract the hex digits of num into value[] from right to left -val value = new Array[Byte](16) +val zeros = jl.Long.numberOfLeadingZeros(num) +if (zeros == jl.Long.SIZE) return ZERO_UTF8 +val len = (jl.Long.SIZE - zeros + 3) / 4 var numBuf = num -var len = 0 -do { - len += 1 - value(value.length - len) = Hex.hexDigits((numBuf & 0xF).toInt) +val value =
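The new length computation deserves a worked example: `numberOfLeadingZeros` gives the position of the highest set bit, from which the exact number of hex digits follows, so the output array is sized exactly and the old trailing `copyOfRange` disappears:

```scala
import java.lang.{Long => JLong}

// Worked example: for num = 255 there are 56 leading zero bits, so the hex
// string needs (64 - 56 + 3) / 4 = 2 digits ("FF").
val num = 255L
val zeros = JLong.numberOfLeadingZeros(num) // 56
val len = (JLong.SIZE - zeros + 3) / 4      // 2
assert(len == 2)

// Edge case the patch handles explicitly: numberOfLeadingZeros(0L) == 64 would
// give len == 0, hence the early return of "0".
assert(JLong.numberOfLeadingZeros(0L) == JLong.SIZE)
```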
(spark) branch master updated: [SPARK-48595][CORE] Cleanup deprecated api usage related to `commons-compress`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a3625a98e78c [SPARK-48595][CORE] Cleanup deprecated api usage related to `commons-compress` a3625a98e78c is described below commit a3625a98e78c43c64cbe4a21f7c70f46307df508 Author: yangjie01 AuthorDate: Wed Jun 12 17:11:22 2024 +0800 [SPARK-48595][CORE] Cleanup deprecated api usage related to `commons-compress` ### What changes were proposed in this pull request? This PR uses `org.apache.commons.io.output.CountingOutputStream` instead of `org.apache.commons.compress.utils.CountingOutputStream` to fix the following compilation warnings related to 'commons-compress': ``` [WARNING] [Warn] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala:308: class CountingOutputStream in package utils is deprecated Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.deploy.history.RollingEventLogFilesWriter.countingOutputStream, origin=org.apache.commons.compress.utils.CountingOutputStream [WARNING] [Warn] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala:351: class CountingOutputStream in package utils is deprecated Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.deploy.history.RollingEventLogFilesWriter.rollEventLogFile.$anonfun, origin=org.apache.commons.compress.utils.CountingOutputStream ``` The fix refers to: https://github.com/apache/commons-compress/blob/95727006cac0892c654951c4e7f1db142462f22a/src/main/java/org/apache/commons/compress/utils/CountingOutputStream.java#L25-L33 ``` /** * Stream that tracks the number of bytes read. * * @since 1.3 * @NotThreadSafe * @deprecated Use {@link org.apache.commons.io.output.CountingOutputStream}. */ @Deprecated public class CountingOutputStream extends FilterOutputStream { ``` ### Why are the changes needed? Cleanup deprecated api usage related to `commons-compress` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46950 from LuciferYang/SPARK-48595.
Authored-by: yangjie01 Signed-off-by: Kent Yao --- .../scala/org/apache/spark/deploy/history/EventLogFileWriters.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala b/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala index 963ed121547c..f3bb6d5af335 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala @@ -21,7 +21,7 @@ import java.io._ import java.net.URI import java.nio.charset.StandardCharsets -import org.apache.commons.compress.utils.CountingOutputStream +import org.apache.commons.io.output.CountingOutputStream import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{FileStatus, FileSystem, FSDataOutputStream, Path} import org.apache.hadoop.fs.permission.FsPermission @@ -330,7 +330,7 @@ class RollingEventLogFilesWriter( override def writeEvent(eventJson: String, flushLogger: Boolean = false): Unit = { writer.foreach { w => - val currentLen = countingOutputStream.get.getBytesWritten + val currentLen = countingOutputStream.get.getByteCount if (currentLen + eventJson.length > eventFileMaxLength) { rollEventLogFile() } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
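The two classes are drop-in equivalents apart from the accessor name; a quick sketch of the commons-io replacement used above:

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import org.apache.commons.io.output.CountingOutputStream

// Same counting behavior as the deprecated commons-compress class; only the
// accessor differs: getByteCount here vs. getBytesWritten there.
val counting = new CountingOutputStream(new ByteArrayOutputStream())
counting.write("event".getBytes(StandardCharsets.UTF_8))
assert(counting.getByteCount == 5L)
counting.close()
```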
(spark) branch master updated: [SPARK-48584][SQL] Perf improvement for unescapePathName
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new da81d8ecb802 [SPARK-48584][SQL] Perf improvement for unescapePathName da81d8ecb802 is described below commit da81d8ecb80226fa5fb2b6e50048f05d67fb5904 Author: Kent Yao AuthorDate: Wed Jun 12 16:39:49 2024 +0800 [SPARK-48584][SQL] Perf improvement for unescapePathName ### What changes were proposed in this pull request? This PR improves perf for unescapePathName with algorithms briefly described as: - If a path contains no '%' or contains '%' at `position > path.length-2`, we return the original identity instead of creating a new StringBuilder to append char by char - Otherwise, we loop with 2 indices, `plaintextStartIdx` which starts from 0 and then points to the next char after resolving `%xx`, and `plaintextEndIdx` which points to the next `'%'`. `plaintextStartIdx` moves to `plaintextEndIdx + 3` if `%xx` is valid, or moves to `plaintextEndIdx + 1` if `%xx` is invalid. - Instead of using Integer.parseInt with error capture, we identify the high and low characters manually. ### Why are the changes needed? performance improvement for hotspots ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? - new tests in ExternalCatalogUtilsSuite - Benchmark results (9-11x faster) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46938 from yaooqinn/SPARK-48584. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../EscapePathBenchmark-jdk21-results.txt | 16 ++- .../benchmarks/EscapePathBenchmark-results.txt | 16 ++- .../catalyst/catalog/ExternalCatalogUtils.scala| 52 +- .../spark/sql/catalyst/EscapePathBenchmark.scala | 52 +- .../catalog/ExternalCatalogUtilsSuite.scala| 26 ++- 5 files changed, 135 insertions(+), 27 deletions(-) diff --git a/sql/catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt b/sql/catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt index 4fffb9bfd49a..3d16c874e8c9 100644 --- a/sql/catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt +++ b/sql/catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt @@ -6,7 +6,19 @@ OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure AMD EPYC 7763 64-Core Processor Escape Tests: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Legacy 7128 7146 8 0.17127.9 1.0X -New 790795 5 1.3 789.7 9.0X +Legacy 6996 7009 9 0.16996.5 1.0X +New 771776 3 1.3 770.7 9.1X + + + +Unescape + + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure +AMD EPYC 7763 64-Core Processor +Unescape Tests: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative + +Legacy 5127 5137 6 0.25127.3 1.0X +New 579583 4 1.7 579.3 8.9X diff --git a/sql/catalyst/benchmarks/EscapePathBenchmark-results.txt b/sql/catalyst/benchmarks/EscapePathBenchmark-results.txt index 32e44f6e19ef..7cfa134652c2 100644 --- a/sql/catalyst/benchmarks/EscapePathBenchmark-results.txt +++ b/sql/catalyst/benchmarks/EscapePathBenchmark-results.txt @@ -6,7 +6,19 @@ OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1021-azure AMD EPYC 7763 64-Core Processor Escape Tests: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Legacy 6719 6726 6 0.16719.3 1.0X -New
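A hedged sketch of the fast path and the two-index loop described above (simplified; the real implementation lives in `ExternalCatalogUtils` and may differ in details):

```scala
// Resolve a single hex digit manually instead of Integer.parseInt with error capture.
def unhexDigit(c: Char): Int = c match {
  case c if c >= '0' && c <= '9' => c - '0'
  case c if c >= 'A' && c <= 'F' => c - 'A' + 10
  case c if c >= 'a' && c <= 'f' => c - 'a' + 10
  case _ => -1
}

def unescapePathName(path: String): String = {
  var pct = path.indexOf('%')
  // Fast path: no '%' that could start a decodable "%xx" -> return the input as-is.
  if (pct == -1 || pct > path.length - 3) return path
  val sb = new java.lang.StringBuilder(path.length)
  var start = 0 // next plaintext character to copy
  while (pct != -1 && pct <= path.length - 3) {
    val hi = unhexDigit(path.charAt(pct + 1))
    val lo = unhexDigit(path.charAt(pct + 2))
    sb.append(path, start, pct) // copy the plaintext run before '%'
    if (hi >= 0 && lo >= 0) {   // valid %xx: decode and skip three chars
      sb.append(((hi << 4) | lo).toChar)
      start = pct + 3
    } else {                    // invalid %xx: keep the '%' literally
      sb.append('%')
      start = pct + 1
    }
    pct = path.indexOf('%', start)
  }
  sb.append(path, start, path.length) // trailing plaintext (incl. undecodable '%')
  sb.toString
}

assert(unescapePathName("a%20b%") == "a b%")
```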
(spark) branch master updated: [SPARK-48581][BUILD] Upgrade dropwizard metrics to 4.2.26
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8870efce19f2 [SPARK-48581][BUILD] Upgrade dropwizard metrics to 4.2.26 8870efce19f2 is described below commit 8870efce19f2abb8419f835d29304ffa7cc53251 Author: Wei Guo AuthorDate: Wed Jun 12 15:41:40 2024 +0800 [SPARK-48581][BUILD] Upgrade dropwizard metrics to 4.2.26 ### What changes were proposed in this pull request? Upgrade dropwizard metrics to 4.2.26. ### Why are the changes needed? There are some bug fixes as belows: - Correction for the Jetty-12 QTP metrics by dkaukov in https://github.com/dropwizard/metrics/pull/4181 - Fix metrics for InstrumentedEE10Handler by zUniQueX in https://github.com/dropwizard/metrics/pull/3928 The full release notes: https://github.com/dropwizard/metrics/releases/tag/v4.2.26 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passed GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46932 from wayneguow/codahale. Authored-by: Wei Guo Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 10 +- pom.xml | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 4585b534e908..f1a575fb7446 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -190,11 +190,11 @@ log4j-layout-template-json/2.22.1//log4j-layout-template-json-2.22.1.jar log4j-slf4j2-impl/2.22.1//log4j-slf4j2-impl-2.22.1.jar logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar lz4-java/1.8.0//lz4-java-1.8.0.jar -metrics-core/4.2.25//metrics-core-4.2.25.jar -metrics-graphite/4.2.25//metrics-graphite-4.2.25.jar -metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar -metrics-json/4.2.25//metrics-json-4.2.25.jar -metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar +metrics-core/4.2.26//metrics-core-4.2.26.jar +metrics-graphite/4.2.26//metrics-graphite-4.2.26.jar +metrics-jmx/4.2.26//metrics-jmx-4.2.26.jar +metrics-json/4.2.26//metrics-json-4.2.26.jar +metrics-jvm/4.2.26//metrics-jvm-4.2.26.jar minlog/1.3.0//minlog-1.3.0.jar netty-all/4.1.110.Final//netty-all-4.1.110.Final.jar netty-buffer/4.1.110.Final//netty-buffer-4.1.110.Final.jar diff --git a/pom.xml b/pom.xml index bc81b810715b..c006a5a3234f 100644 --- a/pom.xml +++ b/pom.xml @@ -151,7 +151,7 @@ If you change codahale.metrics.version, you also need to change the link to metrics.dropwizard.io in docs/monitoring.md. --> -4.2.25 +4.2.26 1.11.3 1.12.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48582][BUILD] Upgrade `braces` from 3.0.2 to 3.0.3 in ui-test
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 72df3cb1a43b [SPARK-48582][BUILD] Upgrade `braces` from 3.0.2 to 3.0.3 in ui-test 72df3cb1a43b is described below commit 72df3cb1a43bd3cc0b20456733228dbb0b403305 Author: yangjie01 AuthorDate: Wed Jun 12 10:14:38 2024 +0800 [SPARK-48582][BUILD] Upgrade `braces` from 3.0.2 to 3.0.3 in ui-test ### What changes were proposed in this pull request? This pr aims to upgrade `braces` from 3.0.2 to 3.0.3 in ui-test. The original pr was submitted by `dependabot`: https://github.com/apache/spark/pull/46931 ### Why are the changes needed? The new version fix vulnerability https://security.snyk.io/vuln/SNYK-JS-BRACES-6838727 - https://github.com/micromatch/braces/commit/9f5b4cf47329351bcb64287223ffb6ecc9a5e6d3 The complete list of changes is as follows: - https://github.com/micromatch/braces/compare/3.0.2...3.0.3 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46933 from LuciferYang/SPARK-48582. Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Kent Yao --- ui-test/package-lock.json | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/ui-test/package-lock.json b/ui-test/package-lock.json index 23ff8ede6515..ec870dfa4801 100644 --- a/ui-test/package-lock.json +++ b/ui-test/package-lock.json @@ -1392,12 +1392,12 @@ } }, "node_modules/braces": { - "version": "3.0.2", - "resolved": "https://registry.npmjs.org/braces/-/braces-3.0.2.tgz;, - "integrity": "sha512-b8um+L1RzM3WDSzvhm6gIz1yfTbBt6YTlcEKAvsmqCZZFw46z626lVj9j1yEPW33H5H+lBQpZMP1k8l+78Ha0A==", + "version": "3.0.3", + "resolved": "https://registry.npmjs.org/braces/-/braces-3.0.3.tgz;, + "integrity": "sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA==", "dev": true, "dependencies": { -"fill-range": "^7.0.1" +"fill-range": "^7.1.1" }, "engines": { "node": ">=8" @@ -1911,9 +1911,9 @@ } }, "node_modules/fill-range": { - "version": "7.0.1", - "resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.0.1.tgz;, - "integrity": "sha512-qOo9F+dMUmC2Lcb4BbVvnKJxTPjCm+RRpe4gDuGrzkL7mEVl/djYSu2OdQ2Pa302N4oqkSg9ir6jaLWJ2USVpQ==", + "version": "7.1.1", + "resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.1.1.tgz;, + "integrity": "sha512-YsGpe3WHLK8ZYi4tWDg2Jy3ebRz2rXowDxnld4bkQB00cc/1Zw9AWnC0i9ztDJitivtQvaI9KaLyKrc+hBW0yg==", "dev": true, "dependencies": { "to-regex-range": "^5.0.1" - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-website) branch asf-site updated: Update docker links on the download page (#522)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new c6fb226d71 Update docker links on the download page (#522) c6fb226d71 is described below commit c6fb226d7197e6c2e3a3922c04c506cd0cc6cee1 Author: Kent Yao AuthorDate: Tue Jun 11 18:52:26 2024 +0800 Update docker links on the download page (#522) --- downloads.md| 6 -- site/downloads.html | 6 -- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/downloads.md b/downloads.md index 6598534668..43312a2895 100644 --- a/downloads.md +++ b/downloads.md @@ -41,9 +41,11 @@ Spark artifacts are [hosted in Maven Central](https://search.maven.org/search?q= https://pypi.org/project/pyspark/;>PySpark is now available in pypi. To install just run `pip install pyspark`. -### Convenience Docker Container Images +### Installing with Docker -[Spark Docker Container images are available from DockerHub](https://hub.docker.com/r/apache/spark-py/tags), these images contain non-ASF software and may be subject to different license terms. +Spark docker images are available from Dockerhub under the accounts of both [The Apache Software Foundation](https://hub.docker.com/r/apache/spark/) and [Official Images](https://hub.docker.com/_/spark). + +Note that, these images contain non-ASF software and may be subject to different license terms. Please check their [Dockerfiles](https://github.com/apache/spark-docker) to verify whether they are compatible with your deployment. ### Release notes for stable releases diff --git a/site/downloads.html b/site/downloads.html index 77baa1a1fe..fad86b7f58 100644 --- a/site/downloads.html +++ b/site/downloads.html @@ -182,9 +182,11 @@ version: 3.5.1 Installing with PyPi https://pypi.org/project/pyspark/;>PySpark is now available in pypi. To install just run pip install pyspark. -Convenience Docker Container Images +Installing with Docker -https://hub.docker.com/r/apache/spark-py/tags;>Spark Docker Container images are available from DockerHub, these images contain non-ASF software and may be subject to different license terms. +Spark docker images are available from Dockerhub under the accounts of both https://hub.docker.com/r/apache/spark/;>The Apache Software Foundation and https://hub.docker.com/_/spark;>Official Images. + +Note that, these images contain non-ASF software and may be subject to different license terms. Please check their https://github.com/apache/spark-docker;>Dockerfiles to verify whether they are compatible with your deployment. Release notes for stable releases - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48565][UI] Fix thread dump display in UI
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 53d65fd12dd9 [SPARK-48565][UI] Fix thread dump display in UI 53d65fd12dd9 is described below commit 53d65fd12dd9231139188227ef9040d40d759021 Author: Cheng Pan AuthorDate: Tue Jun 11 11:28:50 2024 +0800 [SPARK-48565][UI] Fix thread dump display in UI ### What changes were proposed in this pull request? The thread dump display in the UI is not as pretty as before; this is a side effect introduced by SPARK-44863 ### Why are the changes needed? Restore the thread dump display in the UI. ### Does this PR introduce _any_ user-facing change? Yes, it only affects UI display. ### How was this patch tested? Current master: https://github.com/apache/spark/assets/26535726/5c6fd770-467f-481c-a635-2855a2853633 With this patch applied: https://github.com/apache/spark/assets/26535726/3998c2aa-671f-4921-8444-b7bca8667202 ### Was this patch authored or co-authored using generative AI tooling? No Closes #46916 from pan3793/SPARK-48565. Authored-by: Cheng Pan Signed-off-by: Kent Yao --- core/src/main/scala/org/apache/spark/status/api/v1/api.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/status/api/v1/api.scala b/core/src/main/scala/org/apache/spark/status/api/v1/api.scala index 7a0c69e29488..6ae1dce57f31 100644 --- a/core/src/main/scala/org/apache/spark/status/api/v1/api.scala +++ b/core/src/main/scala/org/apache/spark/status/api/v1/api.scala @@ -510,7 +510,7 @@ case class StackTrace(elems: Seq[String]) { override def toString: String = elems.mkString def html: NodeSeq = { -val withNewLine = elems.foldLeft(NodeSeq.Empty) { (acc, elem) => +val withNewLine = elems.map(_.stripLineEnd).foldLeft(NodeSeq.Empty) { (acc, elem) => if (acc.isEmpty) { acc :+ Text(elem) } else { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
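The one-line fix suggests each stack-trace element carried a trailing line terminator, which broke the per-frame layout; a tiny illustration of `stripLineEnd` (the frame string is illustrative):

```scala
// Each elem arrived with a trailing newline; stripping it restores one frame per row.
val frame = "java.base@17.0.10/java.lang.Thread.run(Thread.java:840)\n"
assert(frame.stripLineEnd == "java.base@17.0.10/java.lang.Thread.run(Thread.java:840)")
```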
(spark) branch master updated: Revert "[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable"
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b7d9c317aa2e Revert "[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable" b7d9c317aa2e is described below commit b7d9c317aa2e4de8024e44db895fa8b0cbbb36db Author: Kent Yao AuthorDate: Fri Jun 7 16:31:47 2024 +0800 Revert "[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable" This reverts commit 82b4ad2af64845503604da70ff02748c3969c991. --- common/utils/src/main/resources/error/error-conditions.json | 5 - .../execution/datasources/v2/jdbc/JDBCTableCatalog.scala| 13 + 2 files changed, 5 insertions(+), 13 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 36d8fe1daa37..7b8830073770 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -1255,11 +1255,6 @@ "List namespaces." ] }, - "LOAD_TABLE" : { -"message" : [ - "Load the table ." -] - }, "NAMESPACE_EXISTS" : { "message" : [ "Check that the namespace exists." diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala index e7a3fe0f8aa7..dbd8ee5981da 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala @@ -131,16 +131,13 @@ class JDBCTableCatalog extends TableCatalog checkNamespace(ident.namespace()) val optionsWithTableName = new JDBCOptions( options.parameters + (JDBCOptions.JDBC_TABLE_NAME -> getTableName(ident))) -JdbcUtils.classifyException( - errorClass = "FAILED_JDBC.LOAD_TABLE", - messageParameters = Map( -"url" -> options.getRedactUrl(), -"tableName" -> toSQLId(ident)), - dialect, - description = s"Failed to load table: $ident" -) { +try { val schema = JDBCRDD.resolveTable(optionsWithTableName) JDBCTable(ident, schema, optionsWithTableName) +} catch { + case e: SQLException => +logWarning("Failed to load table", e) +throw QueryCompilationErrors.noSuchTableError(ident) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (82b4ad2af648 -> 94912920b0e9)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 82b4ad2af648 [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable add 94912920b0e9 [SPARK-48548][BUILD] Add LICENSE/NOTICE for spark-core with shaded dependencies No new revisions were added by this update. Summary of changes: core/pom.xml | 2 + .../src/main/resources/META-INF/LICENSE | 49 ++ core/src/main/resources/META-INF/NOTICE | 29 + 3 files changed, 43 insertions(+), 37 deletions(-) copy LICENSE => core/src/main/resources/META-INF/LICENSE (92%) create mode 100644 core/src/main/resources/META-INF/NOTICE - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 82b4ad2af648 [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable 82b4ad2af648 is described below commit 82b4ad2af64845503604da70ff02748c3969c991 Author: Wenchen Fan AuthorDate: Fri Jun 7 10:11:40 2024 +0800 [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/44335, which missed handling `loadTable`. ### Why are the changes needed? Better error messages. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing test ### Was this patch authored or co-authored using generative AI tooling? No Closes #46905 from cloud-fan/jdbc. Authored-by: Wenchen Fan Signed-off-by: Kent Yao --- common/utils/src/main/resources/error/error-conditions.json | 5 + .../execution/datasources/v2/jdbc/JDBCTableCatalog.scala | 13 - 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 7b8830073770..36d8fe1daa37 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -1255,6 +1255,11 @@ "List namespaces." ] }, + "LOAD_TABLE" : { +"message" : [ + "Load the table ." +] + }, "NAMESPACE_EXISTS" : { "message" : [ "Check that the namespace exists." diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala index dbd8ee5981da..e7a3fe0f8aa7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala @@ -131,13 +131,16 @@ class JDBCTableCatalog extends TableCatalog checkNamespace(ident.namespace()) val optionsWithTableName = new JDBCOptions( options.parameters + (JDBCOptions.JDBC_TABLE_NAME -> getTableName(ident))) -try { +JdbcUtils.classifyException( + errorClass = "FAILED_JDBC.LOAD_TABLE", + messageParameters = Map( +"url" -> options.getRedactUrl(), +"tableName" -> toSQLId(ident)), + dialect, + description = s"Failed to load table: $ident" +) { val schema = JDBCRDD.resolveTable(optionsWithTableName) JDBCTable(ident, schema, optionsWithTableName) -} catch { - case e: SQLException => -logWarning("Failed to load table", e) -throw QueryCompilationErrors.noSuchTableError(ident) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
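The `JdbcUtils.classifyException` call above replaces an ad-hoc try/catch. Roughly sketched (an illustrative simplification with a made-up error type, not Spark's actual signature), the wrapper runs the JDBC action and maps any `SQLException` to a classified, parameterized error instead of a generic one:

```scala
import java.sql.SQLException

// Sketch: execute the action; on SQLException, surface a named error
// condition carrying the message parameters rather than hiding the
// cause behind a generic "no such table" error.
def classifyException[T](
    errorClass: String,
    messageParameters: Map[String, String])(action: => T): T = {
  try action catch {
    case e: SQLException =>
      val params = messageParameters.map { case (k, v) => s"$k=$v" }.mkString(", ")
      throw new RuntimeException(s"[$errorClass] $params", e)
  }
}
```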
(spark) branch master updated: [SPARK-48540][CORE] Avoid ivy output loading settings to stdout
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f4434c36cc4f [SPARK-48540][CORE] Avoid ivy output loading settings to stdout f4434c36cc4f is described below commit f4434c36cc4f7b0147e0e8fe26ac0f177a5199cd Author: sychen AuthorDate: Thu Jun 6 14:35:52 2024 +0800 [SPARK-48540][CORE] Avoid ivy output loading settings to stdout ### What changes were proposed in this pull request? This PR aims to avoid Ivy printing its settings-loading message to stdout. ### Why are the changes needed? Currently, `org.apache.spark.util.MavenUtils#getModuleDescriptor` will output the following string to stdout. This is due to the modified code order in SPARK-32596. ``` :: loading settings :: url = jar:file:/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml ``` Stack trace: ```java at org.apache.ivy.core.settings.IvySettings.load(IvySettings.java:404) at org.apache.ivy.core.settings.IvySettings.loadDefault(IvySettings.java:443) at org.apache.ivy.Ivy.configureDefault(Ivy.java:435) at org.apache.ivy.core.IvyContext.getDefaultIvy(IvyContext.java:201) at org.apache.ivy.core.IvyContext.getIvy(IvyContext.java:180) at org.apache.ivy.core.IvyContext.getSettings(IvyContext.java:216) at org.apache.ivy.core.module.status.StatusManager.getCurrent(StatusManager.java:40) at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.<init>(DefaultModuleDescriptor.java:206) at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.newDefaultInstance(DefaultModuleDescriptor.java:107) at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.newDefaultInstance(DefaultModuleDescriptor.java:66) at org.apache.spark.deploy.SparkSubmitUtils$.getModuleDescriptor(SparkSubmit.scala:1413) at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1460) at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:327) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:942) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:181) ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Local test ### Was this patch authored or co-authored using generative AI tooling? No Closes #46882 from cxzl25/SPARK-48540. Authored-by: sychen Signed-off-by: Kent Yao --- .../src/main/scala/org/apache/spark/util/MavenUtils.scala | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala index 08291859a32c..ae00987cd69f 100644 --- a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala +++ b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala @@ -462,14 +462,13 @@ private[spark] object MavenUtils extends Logging { val sysOut = System.out // Default configuration name for ivy val ivyConfName = "default" - - // A Module descriptor must be specified. Entries are dummy strings - val md = getModuleDescriptor - - md.setDefaultConf(ivyConfName) + var md: DefaultModuleDescriptor = null try { // To prevent ivy from logging to system out System.setOut(printStream) +// A Module descriptor must be specified. Entries are dummy strings +md = getModuleDescriptor +md.setDefaultConf(ivyConfName) val artifacts = extractMavenCoordinates(coordinates) // Directories for caching downloads through ivy and storing the jars when maven coordinates // are supplied to spark-submit @@ -548,7 +547,9 @@ private[spark] object MavenUtils extends Logging { } } finally { System.setOut(sysOut) -clearIvyResolutionFiles(md.getModuleRevisionId, ivySettings.getDefaultCache, ivyConfName) +if (md != null) { + clearIvyResolutionFiles(md.getModuleRevisionId, ivySettings.getDefaultCache, ivyConfName) +} } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
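The core idea of the fix above is that any Ivy call that may print to stdout — including the first `IvySettings` load triggered by `DefaultModuleDescriptor.newDefaultInstance` — must run after stdout has been redirected, and stdout must be restored afterwards. A minimal sketch of that pattern (the helper name is illustrative, not Spark's API):

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

// Redirect System.out for the duration of `body`, restoring it even if
// the body throws; anything the body prints lands in `sink` instead.
def withRedirectedStdOut[T](sink: PrintStream)(body: => T): T = {
  val savedOut = System.out
  try {
    System.setOut(sink)
    body
  } finally {
    System.setOut(savedOut)
  }
}

val quiet = new PrintStream(new ByteArrayOutputStream())
val result = withRedirectedStdOut(quiet) {
  // the module descriptor creation and Ivy resolution would go here
  "resolved"
}
```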
(spark) branch master updated (966c3d9ef1ed -> b3700ac09861)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 966c3d9ef1ed [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric add b3700ac09861 [SPARK-48539][BUILD][TESTS] Upgrade docker-java to 3.3.6 No new revisions were added by this update. Summary of changes: pom.xml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48538][SQL] Avoid HMS memory leak caused by bonecp
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 31ce2db6d208 [SPARK-48538][SQL] Avoid HMS memory leak caused by bonecp 31ce2db6d208 is described below commit 31ce2db6d20828844d0acab464346d7e3a4206e8 Author: Kent Yao AuthorDate: Thu Jun 6 10:22:24 2024 +0800 [SPARK-48538][SQL] Avoid HMS memory leak caused by bonecp ### What changes were proposed in this pull request? As described in [HIVE-15551](https://issues.apache.org/jira/browse/HIVE-15551), HMS will leak memory when directsql is enabled for a MySQL metastore DB. Although HIVE-15551 has been resolved already, the bug can still occur on our side as we have multiple Hive versions supported. Considering bonecp has been removed from Hive since 4.0.0 and HikariCP is not supported by all Hive versions we support, we replace bonecp with `DBCP` to avoid the memory leak. ### Why are the changes needed? Fix the memory leak of HMS. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Ran `org.apache.spark.sql.hive.execution.SQLQuerySuite`; it passed without linkage errors. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46879 from yaooqinn/SPARK-48538. Authored-by: Kent Yao Signed-off-by: Kent Yao --- LICENSE-binary | 1 - dev/deps/spark-deps-hadoop-3-hive-2.3 | 1 - pom.xml | 4 ++++ .../scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala | 9 +++++++++ 4 files changed, 13 insertions(+), 2 deletions(-) diff --git a/LICENSE-binary b/LICENSE-binary index 456b07484257..b6971798e557 100644 --- a/LICENSE-binary +++ b/LICENSE-binary @@ -218,7 +218,6 @@ com.google.crypto.tink:tink com.google.flatbuffers:flatbuffers-java com.google.guava:guava com.jamesmurty.utils:java-xmlbuilder -com.jolbox:bonecp com.ning:compress-lzf com.squareup.okhttp3:logging-interceptor com.squareup.okhttp3:okhttp diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index acb236e1c4e0..8ab76b5787b8 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -29,7 +29,6 @@ azure-data-lake-store-sdk/2.3.9//azure-data-lake-store-sdk-2.3.9.jar azure-keyvault-core/1.0.0//azure-keyvault-core-1.0.0.jar azure-storage/7.0.1//azure-storage-7.0.1.jar blas/3.0.3//blas-3.0.3.jar -bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar bundle/2.24.6//bundle-2.24.6.jar diff --git a/pom.xml b/pom.xml index bd384e42b0ec..585b8b193b32 100644 --- a/pom.xml +++ b/pom.xml @@ -2332,6 +2332,10 @@ co.cask.tephra * +<exclusion> +<groupId>com.jolbox</groupId> +<artifactId>bonecp</artifactId> +</exclusion> diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala index 2bb2fe970a11..11e077e891bd 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala @@ -1340,6 +1340,15 @@ private[hive] object HiveClientImpl extends Logging { log"will be reset to 'mr' to disable useless hive logic") hiveConf.set("hive.execution.engine", "mr", SOURCE_SPARK) } +val cpType = hiveConf.get("datanucleus.connectionPoolingType") +// Bonecp might cause a memory leak; it could affect some hive client versions we support +// See more details in HIVE-15551 +// Also, Bonecp is removed in Hive 4.0.0, see HIVE-23258 +// Here we use DBCP to replace bonecp instead of HikariCP as HikariCP was introduced in +// Hive 2.2.0 (see HIVE-13931) while the minimum Hive we support is 2.0.0. +if ("bonecp".equalsIgnoreCase(cpType)) { + hiveConf.set("datanucleus.connectionPoolingType", "DBCP", SOURCE_SPARK) +} hiveConf } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 88b8dc29e100 [SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement 88b8dc29e100 is described below commit 88b8dc29e100a51501701ffdffbcd0eff1f97c98 Author: Wenchen Fan AuthorDate: Wed Jun 5 17:40:59 2024 +0800 [SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement ### What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/44976. `ConcurrentHashMap#put` has different semantics from the Scala map: it returns null if the key is new. We should update the checking code accordingly. ### Why are the changes needed? Avoid wrong warning messages. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual ### Was this patch authored or co-authored using generative AI tooling? No Closes #46876 from cloud-fan/log. Authored-by: Wenchen Fan Signed-off-by: Kent Yao --- .../scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index a52feaa41acf..588752f3fc17 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -222,7 +222,7 @@ trait SimpleFunctionRegistryBase[T] extends FunctionRegistryBase[T] with Logging builder: FunctionBuilder): Unit = { val newFunction = (info, builder) functionBuilders.put(name, newFunction) match { - case previousFunction if previousFunction != newFunction => + case previousFunction if previousFunction != null => logWarning(log"The function ${MDC(FUNCTION_NAME, name)} replaced a " + log"previously registered function.") case _ => - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
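The `ConcurrentHashMap#put` semantics the fix relies on are easy to check in isolation (a minimal sketch, not Spark code):

```scala
import java.util.concurrent.ConcurrentHashMap

// Unlike scala.collection.mutable.Map#put, which returns an Option,
// ConcurrentHashMap#put returns the previous value directly — null when
// the key was absent. So "replaced an existing function" is signalled
// by a non-null return, not by comparing against the new value.
val registry = new ConcurrentHashMap[String, String]()
assert(registry.put("abs", "builder-v1") == null)         // new key
assert(registry.put("abs", "builder-v2") == "builder-v1") // replaced
```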
(spark) branch branch-3.5 updated: [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d3a324d63f82 [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled d3a324d63f82 is described below commit d3a324d63f82ffc4a4818bb1bfe7485d12f1dada Author: Anish Shrigondekar AuthorDate: Wed Jun 5 16:34:45 2024 +0800 [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled ### What changes were proposed in this pull request? Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled ### Why are the changes needed? Clarifying the implications of turning off this config after a certain Spark version ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A - config doc only change ### Was this patch authored or co-authored using generative AI tooling? No Closes #46875 from anishshri-db/task/SPARK-48535. Authored-by: Anish Shrigondekar Signed-off-by: Kent Yao (cherry picked from commit c4f720dfb41919dade7002b49462b3ff6b91eb22) Signed-off-by: Kent Yao --- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 74ff4f09a157..ba27a03fdc31 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2120,7 +2120,9 @@ object SQLConf { buildConf("spark.sql.streaming.stateStore.skipNullsForStreamStreamJoins.enabled") .internal() .doc("When true, this config will skip null values in hash based stream-stream joins. " + - "The number of skipped null values will be shown as custom metric of stream join operator.") + "The number of skipped null values will be shown as custom metric of stream join operator. " + + "If the streaming query was started with Spark 3.5 or above, please exercise caution " + + "before enabling this config since it may hide potential data loss/corruption issues.") .version("3.3.0") .booleanConf .createWithDefault(false) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c4f720dfb419 [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled c4f720dfb419 is described below commit c4f720dfb41919dade7002b49462b3ff6b91eb22 Author: Anish Shrigondekar AuthorDate: Wed Jun 5 16:34:45 2024 +0800 [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled ### What changes were proposed in this pull request? Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled ### Why are the changes needed? Clarifying the implications of turning off this config after a certain Spark version ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A - config doc only change ### Was this patch authored or co-authored using generative AI tooling? No Closes #46875 from anishshri-db/task/SPARK-48535. Authored-by: Anish Shrigondekar Signed-off-by: Kent Yao --- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 88c2228e640c..c4e584b9e31d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2301,7 +2301,9 @@ object SQLConf { buildConf("spark.sql.streaming.stateStore.skipNullsForStreamStreamJoins.enabled") .internal() .doc("When true, this config will skip null values in hash based stream-stream joins. " + - "The number of skipped null values will be shown as custom metric of stream join operator.") + "The number of skipped null values will be shown as custom metric of stream join operator. " + + "If the streaming query was started with Spark 3.5 or above, please exercise caution " + + "before enabling this config since it may hide potential data loss/corruption issues.") .version("3.3.0") .booleanConf .createWithDefault(false) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
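For reference, opting in to the behavior that the updated doc warns about looks like this (a sketch assuming a running `SparkSession` named `spark`; the config defaults to false):

```scala
// Per the updated doc: exercise caution before enabling this on queries
// started with Spark 3.5 or above, since skipping nulls in hash-based
// stream-stream joins may hide data loss/corruption issues.
spark.conf.set(
  "spark.sql.streaming.stateStore.skipNullsForStreamStreamJoins.enabled",
  "true")
```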
(spark) branch master updated: Revert "[SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`"
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new db527ac346f2 Revert "[SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`" db527ac346f2 is described below commit db527ac346f2f6f6dbddefe292a24848d1120172 Author: Kent Yao AuthorDate: Wed Jun 5 13:20:30 2024 +0800 Revert "[SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`" This reverts commit abbe301d7645217f22641cf3a5c41502680e65be. --- core/src/main/scala/org/apache/spark/util/Utils.scala | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index 991fb074d246..0ac1405abe6c 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -19,7 +19,7 @@ package org.apache.spark.util import java.io._ import java.lang.{Byte => JByte} -import java.lang.management.{LockInfo, ManagementFactory, MonitorInfo, ThreadInfo} +import java.lang.management.{LockInfo, ManagementFactory, MonitorInfo, PlatformManagedObject, ThreadInfo} import java.lang.reflect.InvocationTargetException import java.math.{MathContext, RoundingMode} import java.net._ @@ -3058,8 +3058,16 @@ private[spark] object Utils */ lazy val isG1GC: Boolean = { Try { - ManagementFactory.getGarbageCollectorMXBeans.asScala -.exists(_.getName.contains("G1")) + val clazz = Utils.classForName("com.sun.management.HotSpotDiagnosticMXBean") +.asInstanceOf[Class[_ <: PlatformManagedObject]] + val vmOptionClazz = Utils.classForName("com.sun.management.VMOption") + val hotSpotDiagnosticMXBean = ManagementFactory.getPlatformMXBean(clazz) + val vmOptionMethod = clazz.getMethod("getVMOption", classOf[String]) + val valueMethod = vmOptionClazz.getMethod("getValue") + + val useG1GCObject = vmOptionMethod.invoke(hotSpotDiagnosticMXBean, "UseG1GC") + val useG1GC = valueMethod.invoke(useG1GCObject).asInstanceOf[String] + "true".equals(useG1GC) }.getOrElse(false) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48518][CORE] Make LZF compression be able to run in parallel
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 90ee29992522 [SPARK-48518][CORE] Make LZF compression be able to run in parallel 90ee29992522 is described below commit 90ee299925220fa564c90e1f688a0d13ba0ac79d Author: Kent Yao AuthorDate: Tue Jun 4 18:58:33 2024 +0800 [SPARK-48518][CORE] Make LZF compression be able to run in parallel ### What changes were proposed in this pull request? This PR introduces a config that turns LZF compression to parallel mode using PLZFOutputStream. FYI, https://github.com/ning/compress?tab=readme-ov-file#parallel-processing ### Why are the changes needed? Improve performance. ``` [info] OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5 [info] Apple M2 Max [info] Compress large objects: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ------------------------------------------------------------------------------------------------ [info] Compression 1024 array values in 7 threads 12 13 1 0.1 11788.2 1.0X [info] Compression 1024 array values single-threaded 23 23 0 0.0 22512.7 0.5X ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Benchmark ### Was this patch authored or co-authored using generative AI tooling? No Closes #46858 from yaooqinn/SPARK-48518. Authored-by: Kent Yao Signed-off-by: Kent Yao --- core/benchmarks/LZFBenchmark-jdk21-results.txt | 19 + core/benchmarks/LZFBenchmark-results.txt | 19 + .../org/apache/spark/internal/config/package.scala | 7 ++ .../org/apache/spark/io/CompressionCodec.scala | 8 +- .../scala/org/apache/spark/io/LZFBenchmark.scala | 93 ++ docs/configuration.md | 8 ++ 6 files changed, 153 insertions(+), 1 deletion(-) diff --git a/core/benchmarks/LZFBenchmark-jdk21-results.txt b/core/benchmarks/LZFBenchmark-jdk21-results.txt new file mode 100644 index ..e1566f201a1f --- /dev/null +++ b/core/benchmarks/LZFBenchmark-jdk21-results.txt @@ -0,0 +1,19 @@ +================================================ +Benchmark LZFCompressionCodec +================================================ + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure +AMD EPYC 7763 64-Core Processor +Compress small objects: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------ +Compression 25600 int values in parallel 598 600 2 428.2 2.3 1.0X +Compression 25600 int values single-threaded 568 570 2 451.0 2.2 1.1X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure +AMD EPYC 7763 64-Core Processor +Compress large objects: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------ +Compression 1024 array values in 1 threads 39 45 5 0.0 38475.4 1.0X +Compression 1024 array values single-threaded 32 33 1 0.0 31154.5 1.2X + + diff --git a/core/benchmarks/LZFBenchmark-results.txt b/core/benchmarks/LZFBenchmark-results.txt new file mode 100644 index ..facc67f9cf4a --- /dev/null +++ b/core/benchmarks/LZFBenchmark-results.txt @@ -0,0 +1,19 @@ +================================================ +Benchmark LZFCompressionCodec +================================================ + +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1021-azure +AMD EPYC 7763 64-Core Processor +Compress small objects: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------ +Compression 25600 int values in parallel 602 612 6 425.1 2.4 1.0X +Compression 25600 int
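A sketch of the codec switch the new config toggles, assuming the com.ning compress-lzf artifact on the classpath (the helper is illustrative, not Spark's `CompressionCodec` API):

```scala
import java.io.OutputStream
import com.ning.compress.lzf.LZFOutputStream
import com.ning.compress.lzf.parallel.PLZFOutputStream

// PLZFOutputStream hands fixed-size chunks to a pool of worker threads,
// so large writes compress on multiple cores, while LZFOutputStream
// compresses everything on the calling thread.
def lzfStream(out: OutputStream, parallel: Boolean): OutputStream =
  if (parallel) new PLZFOutputStream(out) else new LZFOutputStream(out)
```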
(spark) branch master updated: [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new abbe301d7645 [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC` abbe301d7645 is described below commit abbe301d7645217f22641cf3a5c41502680e65be Author: yangjie01 AuthorDate: Tue Jun 4 15:41:41 2024 +0800 [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC` ### What changes were proposed in this pull request? This PR changes `Utils#isG1GC` to use the result of `ManagementFactory.getGarbageCollectorMXBeans` to determine whether G1GC is used. When G1GC is used, `ManagementFactory.getGarbageCollectorMXBeans` will return two instances of `GarbageCollectorExtImpl`, whose names are `G1 Young Generation` and `G1 Old Generation`, respectively. ### Why are the changes needed? Simplify the implementation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46783 from LuciferYang/refactor-isG1GC. Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Kent Yao --- core/src/main/scala/org/apache/spark/util/Utils.scala | 14 +++--- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index 0ac1405abe6c..991fb074d246 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -19,7 +19,7 @@ package org.apache.spark.util import java.io._ import java.lang.{Byte => JByte} -import java.lang.management.{LockInfo, ManagementFactory, MonitorInfo, PlatformManagedObject, ThreadInfo} +import java.lang.management.{LockInfo, ManagementFactory, MonitorInfo, ThreadInfo} import java.lang.reflect.InvocationTargetException import java.math.{MathContext, RoundingMode} import java.net._ @@ -3058,16 +3058,8 @@ private[spark] object Utils */ lazy val isG1GC: Boolean = { Try { - val clazz = Utils.classForName("com.sun.management.HotSpotDiagnosticMXBean") -.asInstanceOf[Class[_ <: PlatformManagedObject]] - val vmOptionClazz = Utils.classForName("com.sun.management.VMOption") - val hotSpotDiagnosticMXBean = ManagementFactory.getPlatformMXBean(clazz) - val vmOptionMethod = clazz.getMethod("getVMOption", classOf[String]) - val valueMethod = vmOptionClazz.getMethod("getValue") - - val useG1GCObject = vmOptionMethod.invoke(hotSpotDiagnosticMXBean, "UseG1GC") - val useG1GC = valueMethod.invoke(useG1GCObject).asInstanceOf[String] - "true".equals(useG1GC) + ManagementFactory.getGarbageCollectorMXBeans.asScala +.exists(_.getName.contains("G1")) }.getOrElse(false) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
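The name-based check above works because a G1 JVM registers collector MXBeans named "G1 Young Generation" and "G1 Old Generation". It can be tried standalone (a minimal sketch):

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._
import scala.util.Try

// No reflection and no HotSpot-only classes: just inspect the names of
// the registered garbage collector MXBeans.
val isG1ByName: Boolean = Try {
  ManagementFactory.getGarbageCollectorMXBeans.asScala
    .exists(_.getName.contains("G1"))
}.getOrElse(false)

println(s"Running under G1: $isG1ByName")
```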
(spark) branch master updated: [SPARK-48514][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.0
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6475ddfed7f4 [SPARK-48514][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.0 6475ddfed7f4 is described below commit 6475ddfed7f4fc13ac362181c2a9d28f8f2454f7 Author: Bjørn Jørgensen AuthorDate: Tue Jun 4 14:51:15 2024 +0800 [SPARK-48514][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.0 ### What changes were proposed in this pull request? Upgrade kubernetes-client from 6.12.1 to 6.13.0 ### Why are the changes needed? Upgrade Fabric8 Kubernetes Model to Kubernetes v1.30.0 [Release log 6.13.0](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.13.0) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46854 from bjornjorgensen/kubclient6.13.0. Authored-by: Bjørn Jørgensen Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +-- pom.xml | 2 +- 2 files changed, 26 insertions(+), 26 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index b7fdc6f670bd..65e627b1854f 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -155,31 +155,31 @@ jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar jul-to-slf4j/2.0.13//jul-to-slf4j-2.0.13.jar kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar -kubernetes-client-api/6.12.1//kubernetes-client-api-6.12.1.jar -kubernetes-client/6.12.1//kubernetes-client-6.12.1.jar -kubernetes-httpclient-okhttp/6.12.1//kubernetes-httpclient-okhttp-6.12.1.jar -kubernetes-model-admissionregistration/6.12.1//kubernetes-model-admissionregistration-6.12.1.jar -kubernetes-model-apiextensions/6.12.1//kubernetes-model-apiextensions-6.12.1.jar -kubernetes-model-apps/6.12.1//kubernetes-model-apps-6.12.1.jar -kubernetes-model-autoscaling/6.12.1//kubernetes-model-autoscaling-6.12.1.jar -kubernetes-model-batch/6.12.1//kubernetes-model-batch-6.12.1.jar -kubernetes-model-certificates/6.12.1//kubernetes-model-certificates-6.12.1.jar -kubernetes-model-common/6.12.1//kubernetes-model-common-6.12.1.jar -kubernetes-model-coordination/6.12.1//kubernetes-model-coordination-6.12.1.jar -kubernetes-model-core/6.12.1//kubernetes-model-core-6.12.1.jar -kubernetes-model-discovery/6.12.1//kubernetes-model-discovery-6.12.1.jar -kubernetes-model-events/6.12.1//kubernetes-model-events-6.12.1.jar -kubernetes-model-extensions/6.12.1//kubernetes-model-extensions-6.12.1.jar -kubernetes-model-flowcontrol/6.12.1//kubernetes-model-flowcontrol-6.12.1.jar -kubernetes-model-gatewayapi/6.12.1//kubernetes-model-gatewayapi-6.12.1.jar -kubernetes-model-metrics/6.12.1//kubernetes-model-metrics-6.12.1.jar -kubernetes-model-networking/6.12.1//kubernetes-model-networking-6.12.1.jar -kubernetes-model-node/6.12.1//kubernetes-model-node-6.12.1.jar -kubernetes-model-policy/6.12.1//kubernetes-model-policy-6.12.1.jar -kubernetes-model-rbac/6.12.1//kubernetes-model-rbac-6.12.1.jar -kubernetes-model-resource/6.12.1//kubernetes-model-resource-6.12.1.jar -kubernetes-model-scheduling/6.12.1//kubernetes-model-scheduling-6.12.1.jar -kubernetes-model-storageclass/6.12.1//kubernetes-model-storageclass-6.12.1.jar +kubernetes-client-api/6.13.0//kubernetes-client-api-6.13.0.jar +kubernetes-client/6.13.0//kubernetes-client-6.13.0.jar 
+kubernetes-httpclient-okhttp/6.13.0//kubernetes-httpclient-okhttp-6.13.0.jar +kubernetes-model-admissionregistration/6.13.0//kubernetes-model-admissionregistration-6.13.0.jar +kubernetes-model-apiextensions/6.13.0//kubernetes-model-apiextensions-6.13.0.jar +kubernetes-model-apps/6.13.0//kubernetes-model-apps-6.13.0.jar +kubernetes-model-autoscaling/6.13.0//kubernetes-model-autoscaling-6.13.0.jar +kubernetes-model-batch/6.13.0//kubernetes-model-batch-6.13.0.jar +kubernetes-model-certificates/6.13.0//kubernetes-model-certificates-6.13.0.jar +kubernetes-model-common/6.13.0//kubernetes-model-common-6.13.0.jar +kubernetes-model-coordination/6.13.0//kubernetes-model-coordination-6.13.0.jar +kubernetes-model-core/6.13.0//kubernetes-model-core-6.13.0.jar +kubernetes-model-discovery/6.13.0//kubernetes-model-discovery-6.13.0.jar +kubernetes-model-events/6.13.0//kubernetes-model-events-6.13.0.jar +kubernetes-model-extensions/6.13.0//kubernetes-model-extensions-6.13.0.jar +kubernetes-model-flowcontrol/6.13.0//kubernetes-model-flowcontrol-6.13.0.jar +kubernetes-model-gatewayapi/6.13.0//kubernetes-model-gatewayapi-6.13.0.jar +kubernetes-model-metrics/6.13.0//kubernetes-model-metrics-6.13.0.jar +kubernetes-model-networking/6.13.0//kubernetes-model-networking-6.13.0.jar +kubernetes-model-node
(spark) branch branch-3.5 updated (7e0c31445c31 -> 7f99f2cbd7d2)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git from 7e0c31445c31 [SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset add 7f99f2cbd7d2 [SPARK-48394][3.5][CORE] Cleanup mapIdToMapIndex on mapoutput unregister No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/MapOutputTracker.scala | 26 ++ .../org/apache/spark/MapOutputTrackerSuite.scala | 55 ++ 2 files changed, 72 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48487][INFRA] Update License & Notice according to the dependency changes
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8d534c048866 [SPARK-48487][INFRA] Update License & Notice according to the dependency changes 8d534c048866 is described below commit 8d534c048866d55256d5db6d437f6682e0051f80 Author: Kent Yao AuthorDate: Mon Jun 3 10:04:19 2024 +0800 [SPARK-48487][INFRA] Update License & Notice according to the dependency changes ### What changes were proposed in this pull request? This PR updated License & Notice files according to the dependency changes I also did a little refactoring to make it in alphabetical order ### Why are the changes needed? to meet apache release policy ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually check ### Was this patch authored or co-authored using generative AI tooling? no Closes #46821 from yaooqinn/SPARK-48487. Authored-by: Kent Yao Signed-off-by: Kent Yao --- LICENSE-binary | 399 +- NOTICE-binary | 323 --- licenses-binary/LICENSE-check-qual.txt | 413 +++ licenses-binary/LICENSE-icu4j.txt | 519 licenses-binary/LICENSE-jakarta-servlet-api.txt | 277 + licenses-binary/LICENSE-jline3.txt | 34 ++ licenses-binary/LICENSE-loose-version.txt | 279 + licenses-binary/LICENSE-txw2.txt| 28 ++ licenses/LICENSE-loose-version.txt | 279 + pom.xml | 5 - 10 files changed, 2085 insertions(+), 471 deletions(-) diff --git a/LICENSE-binary b/LICENSE-binary index 40271c9924bc..456b07484257 100644 --- a/LICENSE-binary +++ b/LICENSE-binary @@ -204,171 +204,168 @@ This project bundles some components that are also licensed under the Apache License Version 2.0: -org.apache.zookeeper:zookeeper -oro:oro -commons-configuration:commons-configuration -commons-digester:commons-digester -com.chuusai:shapeless_2.13 -com.googlecode.javaewah:JavaEWAH -com.twitter:chill-java -com.twitter:chill_2.13 -com.univocity:univocity-parsers -javax.jdo:jdo-api -joda-time:joda-time -net.sf.opencsv:opencsv -org.apache.derby:derby -org.objenesis:objenesis -org.roaringbitmap:RoaringBitmap -org.scalanlp:breeze-macros_2.13 -org.scalanlp:breeze_2.13 -org.typelevel:macro-compat_2.13 -org.yaml:snakeyaml -org.apache.xbean:xbean-asm7-shaded -com.squareup.okhttp3:logging-interceptor -com.squareup.okhttp3:okhttp -com.squareup.okio:okio -org.apache.spark:spark-catalyst_2.13 -org.apache.spark:spark-kvstore_2.13 -org.apache.spark:spark-launcher_2.13 -org.apache.spark:spark-mllib-local_2.13 -org.apache.spark:spark-network-common_2.13 -org.apache.spark:spark-network-shuffle_2.13 -org.apache.spark:spark-sketch_2.13 -org.apache.spark:spark-tags_2.13 -org.apache.spark:spark-unsafe_2.13 -commons-httpclient:commons-httpclient -com.vlkan:flatbuffers -com.ning:compress-lzf -io.airlift:aircompressor -io.dropwizard.metrics:metrics-core -io.dropwizard.metrics:metrics-graphite -io.dropwizard.metrics:metrics-json -io.dropwizard.metrics:metrics-jvm -io.dropwizard.metrics:metrics-jmx -org.iq80.snappy:snappy com.clearspring.analytics:stream -com.jamesmurty.utils:java-xmlbuilder -commons-codec:commons-codec -commons-collections:commons-collections -io.fabric8:kubernetes-client -io.fabric8:kubernetes-model -io.fabric8:kubernetes-model-common -io.netty:netty-all -net.hydromatic:eigenbase-properties -net.sf.supercsv:super-csv -org.apache.arrow:arrow-format -org.apache.arrow:arrow-memory -org.apache.arrow:arrow-vector -org.apache.commons:commons-crypto 
-org.apache.commons:commons-lang3 -org.apache.hadoop:hadoop-annotations -org.apache.hadoop:hadoop-auth -org.apache.hadoop:hadoop-client -org.apache.hadoop:hadoop-common -org.apache.hadoop:hadoop-hdfs -org.apache.hadoop:hadoop-hdfs-client -org.apache.hadoop:hadoop-mapreduce-client-app -org.apache.hadoop:hadoop-mapreduce-client-common -org.apache.hadoop:hadoop-mapreduce-client-core -org.apache.hadoop:hadoop-mapreduce-client-jobclient -org.apache.hadoop:hadoop-mapreduce-client-shuffle -org.apache.hadoop:hadoop-yarn-api -org.apache.hadoop:hadoop-yarn-client -org.apache.hadoop:hadoop-yarn-common -org.apache.hadoop:hadoop-yarn-server-common -org.apache.hadoop:hadoop-yarn-server-web-proxy -org.apache.httpcomponents:httpclient -org.apache.httpcomponents:httpcore -org.apache.kerby:kerb-admin -org.apache.kerby:kerb-client -org.apache.kerby:kerb-common -org.apache.kerby:kerb-core -org.apache.kerby:kerb-crypto -org.apache.kerby:kerb-identity -org.apache.kerby:kerb-server -org.apache.kerby:kerb-simplekdc -org.apache.ker
(spark) branch branch-3.4 updated: [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 0d4e1fa5dbb1 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects 0d4e1fa5dbb1 is described below commit 0d4e1fa5dbb129fd05cbdd61324cfc3e9389c1c4 Author: Mihailo Milosevic AuthorDate: Fri May 31 13:33:02 2024 +0800 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects ### What changes were proposed in this pull request? Removal of stripMargin from the code in `DockerJDBCIntegrationV2Suite`. ### Why are the changes needed? The PR https://github.com/apache/spark/pull/46588 was merged to master/3.5/3.4. It broke the daily jobs for `OracleIntegrationSuite`. Upon inspection, it was noted that 3.4 and 3.5 are run with JDK8 while master is run with JDK21, and stripMargin was behaving differently in those cases. Upon removing stripMargin and splitting the `INSERT INTO` statements into multiple statements, all integration tests passed. ### Does this PR introduce _any_ user-facing change? No, only loading of the test data was changed to follow language requirements. ### How was this patch tested? The existing suite was aborted in the job and is now running. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46806 Closes #46807 from mihailom-db/FixOracleMaster. Authored-by: Mihailo Milosevic Signed-off-by: Kent Yao (cherry picked from commit 4360ec733d248b62798a191301e2b671f7bcfbd5) Signed-off-by: Kent Yao --- .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 28 ++ 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 5f4f0b7a3afb..60345257f2dc 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -39,16 +39,24 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote''_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote''_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent%_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscore_present')") + .executeUpdate()
+connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscorenot_present')") + .executeUpdate() } def tablePreparation(connection: Connection): Unit - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d64f96cbacd9 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects d64f96cbacd9 is described below commit d64f96cbacd9d98b89f31c27cf4aa79262399659 Author: Mihailo Milosevic AuthorDate: Fri May 31 13:33:02 2024 +0800 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects ### What changes were proposed in this pull request? Removal of stripMargin from the code in `DockerJDBCIntegrationV2Suite`. ### Why are the changes needed? The PR https://github.com/apache/spark/pull/46588 was merged to master/3.5/3.4. It broke the daily jobs for `OracleIntegrationSuite`. Upon inspection, it was noted that 3.4 and 3.5 are run with JDK8 while master is run with JDK21, and stripMargin was behaving differently in those cases. Upon removing stripMargin and splitting the `INSERT INTO` statements into multiple statements, all integration tests passed. ### Does this PR introduce _any_ user-facing change? No, only loading of the test data was changed to follow language requirements. ### How was this patch tested? The existing suite was aborted in the job and is now running. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46806 Closes #46807 from mihailom-db/FixOracleMaster. Authored-by: Mihailo Milosevic Signed-off-by: Kent Yao (cherry picked from commit 4360ec733d248b62798a191301e2b671f7bcfbd5) Signed-off-by: Kent Yao --- .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 28 ++ 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 5f4f0b7a3afb..60345257f2dc 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -39,16 +39,24 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote''_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote''_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent%_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscore_present')") + .executeUpdate()
+connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscorenot_present')") + .executeUpdate() } def tablePreparation(connection: Connection): Unit - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4360ec733d24 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects 4360ec733d24 is described below commit 4360ec733d248b62798a191301e2b671f7bcfbd5 Author: Mihailo Milosevic AuthorDate: Fri May 31 13:33:02 2024 +0800 [SPARK-48172][SQL][FOLLOWUP] Fix escaping issues in JDBCDialects ### What changes were proposed in this pull request? Removal of stripMargin from the code in `DockerJDBCIntegrationV2Suite`. ### Why are the changes needed? The PR https://github.com/apache/spark/pull/46588 was merged to master/3.5/3.4. It broke the daily jobs for `OracleIntegrationSuite`. Upon inspection, it was noted that 3.4 and 3.5 are run with JDK8 while master is run with JDK21, and stripMargin was behaving differently in those cases. Upon removing stripMargin and splitting the `INSERT INTO` statements into multiple statements, all integration tests passed. ### Does this PR introduce _any_ user-facing change? No, only loading of the test data was changed to follow language requirements. ### How was this patch tested? The existing suite was aborted in the job and is now running. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46806 Closes #46807 from mihailom-db/FixOracleMaster. Authored-by: Mihailo Milosevic Signed-off-by: Kent Yao --- .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 28 ++ 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index 5f4f0b7a3afb..60345257f2dc 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -39,16 +39,24 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote''_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote''_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_quote_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent%_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_percent_not_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscore_present')") + .executeUpdate() +connection.prepareStatement("INSERT INTO pattern_testing_table " ++ "VALUES ('special_character_underscorenot_present')") +
.executeUpdate() } def tablePreparation(connection: Connection): Unit - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
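The three commits above land the same fix on branch-3.4, branch-3.5, and master. The design choice — one simple JDBC statement per row instead of a single multi-row statement built with stripMargin — can be sketched as follows (a simplified illustration assuming an open `java.sql.Connection` named `connection`, not the suite's exact code):

```scala
import java.sql.Connection

// Fragile variant: one statement containing a multi-line, multi-row
// VALUES list built with stripMargin; some dialects/drivers reject or
// mangle it, and margin handling proved environment-sensitive in CI.
def insertAllAtOnce(connection: Connection): Unit =
  connection.prepareStatement(
    s"""INSERT INTO pattern_testing_table VALUES
       |('special_character_quote''_present'),
       |('special_character_quote_not_present')""".stripMargin
  ).executeUpdate()

// Robust variant (what the fix does): one simple statement per row.
def insertOneByOne(connection: Connection): Unit =
  Seq(
    "VALUES ('special_character_quote''_present')",
    "VALUES ('special_character_quote_not_present')"
  ).foreach { values =>
    connection
      .prepareStatement("INSERT INTO pattern_testing_table " + values)
      .executeUpdate()
  }
```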
(spark) branch master updated: [SPARK-48471][CORE] Improve documentation and usage guide for history server
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new df15c8d7744b [SPARK-48471][CORE] Improve documentation and usage guide for history server df15c8d7744b is described below commit df15c8d7744becfd44cd4a447c362e8e007bd574 Author: Kent Yao AuthorDate: Thu May 30 17:16:31 2024 +0800 [SPARK-48471][CORE] Improve documentation and usage guide for history server ### What changes were proposed in this pull request? In this PR, we improve the documentation and usage guide for the history server by: - Identifying and printing **unrecognized options** specified by users - Obtaining and printing all history server-related configurations dynamically instead of using an incomplete, outdated hardcoded list. - Ensuring all configurations are documented for the usage guide ### Why are the changes needed? - Revise the help guide for the history server to make it more user-friendly. Configurations missing from the help guide are not always reachable in our official documentation; e.g., spark.history.fs.safemodeCheck.interval has been missing from the docs since it was added in 1.6. - Misusage should be reported to users. ### Does this PR introduce _any_ user-facing change? No, the print style is kept as-is, with more items included. ### How was this patch tested? Without this PR: ``` Usage: ./sbin/start-history-server.sh [options] 24/05/30 15:37:23 INFO SignalUtils: Registering signal handler for TERM 24/05/30 15:37:23 INFO SignalUtils: Registering signal handler for HUP 24/05/30 15:37:23 INFO SignalUtils: Registering signal handler for INT Options: --properties-file FILE Path to a custom Spark properties file. Default is conf/spark-defaults.conf. Configuration options can be set by setting the corresponding JVM system property. History Server options are always available; additional options depend on the provider. History Server options: spark.history.ui.port Port where server will listen for connections (default 18080) spark.history.acls.enable Whether to enable view acls for all applications (default false) spark.history.provider Name of history provider class (defaults to file system-based provider) spark.history.retainedApplications Max number of application UIs to keep loaded in memory (default 50) FsHistoryProvider options: spark.history.fs.logDirectory Directory where app logs are stored (default: file:/tmp/spark-events) spark.history.fs.update.interval How often to reload log data from storage (in seconds, default: 10) ``` For an error: ```java Unrecognized options: --conf spark.history.ui.port=1 Usage: HistoryServer [options] Options: --properties-file FILE Path to a custom Spark properties file. Default is conf/spark-defaults.conf. ``` For help: ```java sbin/start-history-server.sh --help Usage: ./sbin/start-history-server.sh [options] {"ts":"2024-05-30T07:15:29.740Z","level":"INFO","msg":"Registering signal handler for TERM","context":{"signal":"TERM"},"logger":"SignalUtils"} {"ts":"2024-05-30T07:15:29.741Z","level":"INFO","msg":"Registering signal handler for HUP","context":{"signal":"HUP"},"logger":"SignalUtils"} {"ts":"2024-05-30T07:15:29.741Z","level":"INFO","msg":"Registering signal handler for INT","context":{"signal":"INT"},"logger":"SignalUtils"} Options: --properties-file FILE Path to a custom Spark properties file. Default is conf/spark-defaults.conf. Configuration options can be set by setting the corresponding JVM system property. 
History Server options are always available; additional options depend on the provider. History Server options: spark.history.custom.executor.log.url
(spark) branch master updated (910c3733bfdd -> b477ef4fa992)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 910c3733bfdd Revert "[SPARK-48415][PYTHON] Refactor TypeName to support parameterized datatypes" add b477ef4fa992 [SPARK-47260][SQL] Assign name to error class _LEGACY_ERROR_TEMP_3250 No new revisions were added by this update. Summary of changes: common/utils/src/main/resources/error/error-conditions.json | 5 - sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala | 4 ++-- .../src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala| 4 ++-- .../src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala | 4 ++-- 4 files changed, 6 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48426][SQL][DOCS] Add documentation for SQL operator precedence
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8bbbde7cb3c3 [SPARK-48426][SQL][DOCS] Add documentation for SQL operator precedence 8bbbde7cb3c3 is described below commit 8bbbde7cb3c396bc369c06853ed3a2ec021a2530 Author: Kent Yao AuthorDate: Wed May 29 13:31:38 2024 +0800 [SPARK-48426][SQL][DOCS] Add documentation for SQL operator precedence ### What changes were proposed in this pull request? This PR adds a doc for SQL operator precedence based on the current definition of `SqlBaseParser.g4` Not related to this PR, I have found that our `^` and `!` operators have quite different precedences than other modern systems. https://docs.oracle.com/cd/A58617_01/server.804/a58225/ch3all.htm https://learn.microsoft.com/en-us/sql/t-sql/language-elements/operator-precedence-transact-sql?view=sql-server-ver16 https://dev.mysql.com/doc/refman/8.0/en/operator-precedence.html https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-PRECEDENCE https://mariadb.com/kb/en/operator-precedence/ https://docs.databricks.com/en/sql/language-manual/sql-ref-functions-builtin.html#operator-precedence ### Why are the changes needed? doc improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? doc build ![image](https://github.com/apache/spark/assets/8326978/dd612740-dd8a-4dc9-af2c-488938f00dff) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46757 from yaooqinn/SPARK-48426. Authored-by: Kent Yao Signed-off-by: Kent Yao --- docs/_data/menu-sql.yaml | 2 + docs/sql-ref-operators.md | 124 ++ 2 files changed, 126 insertions(+) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 46dc4f3388cb..059a9bdc1af4 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -85,6 +85,8 @@ url: sql-ref-datetime-pattern.html - text: Number Pattern url: sql-ref-number-pattern.html +- text: Operators + url: sql-ref-operators.html - text: Functions url: sql-ref-functions.html - text: Identifiers diff --git a/docs/sql-ref-operators.md b/docs/sql-ref-operators.md new file mode 100644 index ..102e45fba8d2 --- /dev/null +++ b/docs/sql-ref-operators.md @@ -0,0 +1,124 @@ +--- +layout: global +title: Operators +displayTitle: Operators +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +An SQL operator is a symbol specifying an action that is performed on one or more expressions. Operators are represented by special characters or by keywords. + +### Operator Precedence + +When a complex expression has multiple operators, operator precedence determines the sequence of operations in the expression, +e.g. 
in expression `1 + 2 * 3`, `*` has higher precedence than `+`, so the expression is evaluated as `1 + (2 * 3) = 7`. +The order of execution can significantly affect the resulting value. + +Operators have the precedence levels shown in the following table. +An operator at a higher precedence level is evaluated before an operator at a lower level. +The table lists the operators in descending order of precedence, i.e. level 1 is the highest. +Operators listed in the same table cell have the same precedence and are evaluated from left to right or right to left based on their associativity.

| Precedence | Operator | Operation | Associativity |
|------------|----------|-----------|---------------|
| 1 | `.` `[]` `::` | member access, element access, cast | Left to right |
| 2 | `+` `-` `~` | unary plus, unary minus, bitwise NOT | Right to left |
| 3 | `*` `/` `%` `DIV` | multiplication, division, modulo, integral division | Left to right |
| 4 | `+` `-` `\|\|` | addition, subtraction, concatenation | Left to right |
| 5 | `<<` `>>` `>>>` | bitwise shift | Left to right |
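The precedence and associativity rules above are easy to verify interactively. The following spark-shell sketch is illustrative only; it assumes the default `spark` session of the shell and is not part of the patch:

```scala
// Each query exercises a rule from the precedence table above.
spark.sql("SELECT 1 + 2 * 3").show()    // 7: `*` (level 3) binds tighter than `+` (level 4)
spark.sql("SELECT (1 + 2) * 3").show()  // 9: parentheses override precedence
spark.sql("SELECT 2 - 3 - 1").show()    // -2: same level, left-to-right associativity
```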
(spark) branch master updated: [SPARK-48436][SQL][TESTS] Use `c.m.c.j.Driver` instead of `c.m.j.Driver` in `MySQLNamespaceSuite`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 99ffcaa13b36 [SPARK-48436][SQL][TESTS] Use `c.m.c.j.Driver` instead of `c.m.j.Driver` in `MySQLNamespaceSuite` 99ffcaa13b36 is described below commit 99ffcaa13b36c9ffa5582dfeec29438fa58c3e73 Author: panbingkun AuthorDate: Wed May 29 10:04:45 2024 +0800 [SPARK-48436][SQL][TESTS] Use `c.m.c.j.Driver` instead of `c.m.j.Driver` in `MySQLNamespaceSuite` ### What changes were proposed in this pull request? The PR aims to use `com.mysql.cj.jdbc.Driver` instead of `com.mysql.jdbc.Driver` in `MySQLNamespaceSuite` ### Why are the changes needed? - The full class name of the MySQL driver has changed from `com.mysql.jdbc.Driver` (which is deprecated) to `com.mysql.cj.jdbc.Driver`. - Eliminate warnings: https://github.com/apache/spark/assets/15246973/8b135f30-4f89-4d10-a57a-35574e2331a9 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46773 from panbingkun/SPARK-48436. Authored-by: panbingkun Signed-off-by: Kent Yao --- .../test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala index d2a7aa775826..2b607fccd171 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala @@ -40,7 +40,7 @@ class MySQLNamespaceSuite extends DockerJDBCIntegrationSuite with V2JDBCNamespac val map = new CaseInsensitiveStringMap( Map("url" -> db.getJdbcUrl(dockerIp, externalPort), - "driver" -> "com.mysql.jdbc.Driver").asJava) + "driver" -> "com.mysql.cj.jdbc.Driver").asJava) catalog.initialize("mysql", map) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
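Application code that hits the same deprecation warning needs the same one-line change: pass the modern class name through the JDBC `driver` option. The snippet below is a minimal sketch; the URL, table name, and credentials are placeholders, not taken from the patch:

```scala
// Reading a MySQL table with the non-deprecated Connector/J driver class.
val people = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/testdb") // placeholder URL
  .option("dbtable", "people")                         // placeholder table
  .option("driver", "com.mysql.cj.jdbc.Driver")        // instead of com.mysql.jdbc.Driver
  .option("user", "root")
  .option("password", "rootpass")
  .load()
```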
(spark) branch master updated (f164e4ae53ca -> a78ef738af02)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f164e4ae53ca [SPARK-48425][INFRA][FOLLOWUP] Do not copy the base spark folder add a78ef738af02 [SPARK-48168][SQL][FOLLOWUP] Match expression strings of shift operators & functions with user inputs No new revisions were added by this update. Summary of changes: .../explain-results/function_shiftleft.explain | 2 +- .../explain-results/function_shiftright.explain| 2 +- .../function_shiftrightunsigned.explain| 2 +- .../grouping_and_grouping_id.explain | 2 +- .../sql/catalyst/expressions/mathExpressions.scala | 8 ++-- .../spark/sql/catalyst/parser/AstBuilder.scala | 5 ++- .../sql-functions/sql-expression-schema.md | 12 +++--- .../analyzer-results/group-analytics.sql.out | 10 ++--- .../analyzer-results/grouping_set.sql.out | 6 +-- .../postgreSQL/groupingsets.sql.out| 44 +++--- .../analyzer-results/postgreSQL/int2.sql.out | 4 +- .../analyzer-results/postgreSQL/int4.sql.out | 4 +- .../analyzer-results/postgreSQL/int8.sql.out | 4 +- .../udf/udf-group-analytics.sql.out| 10 ++--- .../sql-tests/results/postgreSQL/int2.sql.out | 4 +- .../sql-tests/results/postgreSQL/int4.sql.out | 4 +- .../sql-tests/results/postgreSQL/int8.sql.out | 2 +- .../approved-plans-v1_4/q17/explain.txt| 2 +- .../approved-plans-v1_4/q25/explain.txt| 2 +- .../approved-plans-v1_4/q27.sf100/explain.txt | 2 +- .../approved-plans-v1_4/q27/explain.txt| 2 +- .../approved-plans-v1_4/q29/explain.txt| 2 +- .../approved-plans-v1_4/q36.sf100/explain.txt | 2 +- .../approved-plans-v1_4/q36/explain.txt| 2 +- .../approved-plans-v1_4/q39a/explain.txt | 2 +- .../approved-plans-v1_4/q39b/explain.txt | 2 +- .../approved-plans-v1_4/q49/explain.txt| 6 +-- .../approved-plans-v1_4/q5/explain.txt | 2 +- .../approved-plans-v1_4/q64/explain.txt| 4 +- .../approved-plans-v1_4/q70.sf100/explain.txt | 2 +- .../approved-plans-v1_4/q70/explain.txt| 2 +- .../approved-plans-v1_4/q72/explain.txt| 2 +- .../approved-plans-v1_4/q85/explain.txt| 2 +- .../approved-plans-v1_4/q86.sf100/explain.txt | 2 +- .../approved-plans-v1_4/q86/explain.txt| 2 +- .../approved-plans-v2_7/q24.sf100/explain.txt | 2 +- .../approved-plans-v2_7/q49/explain.txt| 6 +-- .../approved-plans-v2_7/q5a/explain.txt| 2 +- .../approved-plans-v2_7/q64/explain.txt| 4 +- .../approved-plans-v2_7/q72/explain.txt| 2 +- 40 files changed, 93 insertions(+), 90 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48168][SQL][FOLLOWUP] Fix bitwise shifting operator's precedence
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b52645652eff [SPARK-48168][SQL][FOLLOWUP] Fix bitwise shifting operator's precedence b52645652eff is described below commit b52645652eff35345c868dc47e50b3970f3a7002 Author: Kent Yao AuthorDate: Mon May 27 17:55:17 2024 +0800 [SPARK-48168][SQL][FOLLOWUP] Fix bitwise shifting operator's precedence ### What changes were proposed in this pull request? After referencing both the `C` and `MySQL` docs, https://en.cppreference.com/w/c/language/operator_precedence https://dev.mysql.com/doc/refman/8.0/en/operator-precedence.html and doing some experiments in the Scala shell ```scala scala> 1 & 2 >> 1 val res0: Int = 1 scala> 2 >> 1 << 1 val res1: Int = 2 scala> 1 << 1 + 2 val res2: Int = 8 ``` the suitable precedence for `<< >> >>>` is between '+/-' and '&' with a left-to-right associativity. ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? No; the original change is unreleased yet ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46753 from yaooqinn/SPARK-48168-F. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 2 +- .../sql-tests/analyzer-results/bitwise.sql.out | 21 +++ .../test/resources/sql-tests/inputs/bitwise.sql| 6 +- .../resources/sql-tests/results/bitwise.sql.out| 24 ++ 4 files changed, 51 insertions(+), 2 deletions(-) diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 index f0c0adb88121..4552c17e0cf1 100644 --- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 +++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 @@ -986,11 +986,11 @@ valueExpression | operator=(MINUS | PLUS | TILDE) valueExpression #arithmeticUnary | left=valueExpression operator=(ASTERISK | SLASH | PERCENT | DIV) right=valueExpression #arithmeticBinary | left=valueExpression operator=(PLUS | MINUS | CONCAT_PIPE) right=valueExpression #arithmeticBinary +| left=valueExpression shiftOperator right=valueExpression #shiftExpression | left=valueExpression operator=AMPERSAND right=valueExpression #arithmeticBinary | left=valueExpression operator=HAT right=valueExpression #arithmeticBinary | left=valueExpression operator=PIPE right=valueExpression #arithmeticBinary | left=valueExpression comparisonOperator right=valueExpression #comparison -| left=valueExpression shiftOperator right=valueExpression #shiftExpression ; shiftOperator diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/bitwise.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/bitwise.sql.out index fee226c0c341..1267a984565a 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/bitwise.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/bitwise.sql.out @@ -418,3 +418,24 @@ select cast(null as map>), 20181117 >> 2 -- !query analysis Project [cast(null as map>) AS NULL#x, (20181117 >> 2) AS (20181117 >> 2)#x] +- OneRowRelation + + +-- !query +select 1 << 1 + 2 as plus_over_shift +-- !query analysis +Project [(1 << (1 + 2)) AS plus_over_shift#x] ++- OneRowRelation + + +-- !query +select 2 >> 1 << 1 as left_to_right +-- !query analysis +Project [((2 >> 1) << 1) AS
left_to_right#x] ++- OneRowRelation + + +-- !query +select 1 & 2 >> 1 as shift_over_ampersand +-- !query analysis +Project [(1 & (2 >> 1)) AS shift_over_ampersand#x] ++- OneRowRelation diff --git a/sql/core/src/test/resources/sql-tests/inputs/bitwise.sql b/sql/core/src/test/resources/sql-tests/inputs/bitwise.sql index 5823b22ef645..e080fdd32a4a 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/bitwise.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/bitwise.sql @@ -86,4 +86,8 @@ SELECT 20181117 <<< 2; SELECT 20181117 >>>> 2; select cast(null as array>), 20181117 >> 2; select cast(null as array>), 20181117 >>> 2; -select cast(null as map>), 20181117 >> 2; \ No newline at
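The three new test queries can also be checked from a spark-shell session built with this fix; a minimal sketch, assuming an active `spark` session:

```scala
// Shift operators now bind tighter than `&` but looser than `+`/`-`.
spark.sql("SELECT 1 << 1 + 2").show()   // 8: parsed as 1 << (1 + 2)
spark.sql("SELECT 2 >> 1 << 1").show()  // 2: left-to-right, (2 >> 1) << 1
spark.sql("SELECT 1 & 2 >> 1").show()   // 1: parsed as 1 & (2 >> 1)
```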
(spark) branch master updated: [SPARK-48427][BUILD] Upgrade `scala-parser-combinators` to 2.4
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 48a4bdb9eacb [SPARK-48427][BUILD] Upgrade `scala-parser-combinators` to 2.4 48a4bdb9eacb is described below commit 48a4bdb9eacb4c7a5c56812171a9093d120b98b7 Author: yangjie01 AuthorDate: Mon May 27 17:53:53 2024 +0800 [SPARK-48427][BUILD] Upgrade `scala-parser-combinators` to 2.4 ### What changes were proposed in this pull request? This PR aims to upgrade `scala-parser-combinators` from 2.3.0 to 2.4.0 ### Why are the changes needed? Starting with this version, the build and tests are validated against Java 21. The full release notes are as follows: - https://github.com/scala/scala-parser-combinators/releases/tag/v2.4.0 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46754 from LuciferYang/SPARK-48427. Authored-by: yangjie01 Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 61d7861f4469..10d812c9fd8a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -250,7 +250,7 @@ scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar scala-compiler/2.13.14//scala-compiler-2.13.14.jar scala-library/2.13.14//scala-library-2.13.14.jar scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar -scala-parser-combinators_2.13/2.3.0//scala-parser-combinators_2.13-2.3.0.jar +scala-parser-combinators_2.13/2.4.0//scala-parser-combinators_2.13-2.4.0.jar scala-reflect/2.13.14//scala-reflect-2.13.14.jar scala-xml_2.13/2.2.0//scala-xml_2.13-2.2.0.jar slf4j-api/2.0.13//slf4j-api-2.0.13.jar diff --git a/pom.xml b/pom.xml index eef7237ac12f..5b088db7b20b 100644 --- a/pom.xml +++ b/pom.xml @@ -1151,7 +1151,7 @@ org.scala-lang.modules scala-parser-combinators_${scala.binary.version} -2.3.0 +2.4.0 jline - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b15b6cf1f537 [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version b15b6cf1f537 is described below commit b15b6cf1f537756eafbe8dd31a3b03dc500077f3 Author: panbingkun AuthorDate: Fri May 24 17:04:38 2024 +0800 [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version ### What changes were proposed in this pull request? The PR aims to upgrade some DB docker image versions, including: - `MySQL` from `8.3.0` to `8.4.0` - `MariaDB` from `10.5.12` to `10.5.25` - `Postgres` from `16.2-alpine` to `16.3-alpine` ### Why are the changes needed? Test dependency upgrades. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46704 from panbingkun/db_images_upgrade. Authored-by: panbingkun Signed-off-by: Kent Yao --- .../org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala | 6 +++--- .../scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala | 2 +- .../scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala | 6 +++--- .../org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala | 6 +++--- .../apache/spark/sql/jdbc/querytest/GeneratedSubquerySuite.scala| 6 +++--- .../apache/spark/sql/jdbc/querytest/PostgreSQLQueryTestSuite.scala | 6 +++--- .../org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 +++--- .../scala/org/apache/spark/sql/jdbc/v2/PostgresNamespaceSuite.scala | 6 +++--- 8 files changed, 22 insertions(+), 22 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala index 6825c001f767..efb2fa09f6a3 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala @@ -25,9 +25,9 @@ import org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnecti import org.apache.spark.tags.DockerTest /** - * To run this test suite for a specific version (e.g., mariadb:10.5.12): + * To run this test suite for a specific version (e.g., mariadb:10.5.25): * {{{ - * ENABLE_DOCKER_INTEGRATION_TESTS=1 MARIADB_DOCKER_IMAGE_NAME=mariadb:10.5.12 + * ENABLE_DOCKER_INTEGRATION_TESTS=1 MARIADB_DOCKER_IMAGE_NAME=mariadb:10.5.25 * ./build/sbt -Pdocker-integration-tests * "docker-integration-tests/testOnly org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite" * }}} @@ -38,7 +38,7 @@ class MariaDBKrbIntegrationSuite extends DockerKrbJDBCIntegrationSuite { override protected val keytabFileName = "mariadb.keytab" override val db = new DatabaseOnDocker { -override val imageName = sys.env.getOrElse("MARIADB_DOCKER_IMAGE_NAME", "mariadb:10.5.12") +override val imageName = sys.env.getOrElse("MARIADB_DOCKER_IMAGE_NAME", "mariadb:10.5.25") override val env = Map( "MYSQL_ROOT_PASSWORD" -> "rootpass" ) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala
index 568eb5f10973..570a81ac3947 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLDatabaseOnDocker.scala @@ -18,7 +18,7 @@ package org.apache.spark.sql.jdbc class MySQLDatabaseOnDocker extends DatabaseOnDocker { - override val imageName = sys.env.getOrElse("MYSQL_DOCKER_IMAGE_NAME", "mysql:8.3.0") + override val imageName = sys.env.getOrElse("MYSQL_DOCKER_IMAGE_NAME", "mysql:8.4.0") override val env = Map( "MYSQL_ROOT_PASSWORD" -> "rootpass" ) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala index 5ad4f15216b7..12a71dbd7c7f 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/
(spark) branch master updated: [SPARK-46090][SQL][FOLLOWUP] Add DeveloperApi import
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3346afd4b250 [SPARK-46090][SQL][FOLLOWUP] Add DeveloperApi import 3346afd4b250 is described below commit 3346afd4b250c3aead5a237666d4942018a463e0 Author: ulysses-you AuthorDate: Fri May 24 14:53:26 2024 +0800 [SPARK-46090][SQL][FOLLOWUP] Add DeveloperApi import ### What changes were proposed in this pull request? Add DeveloperApi import ### Why are the changes needed? Fix compile issue ### Does this PR introduce _any_ user-facing change? No, it fixes a compile issue ### How was this patch tested? pass CI ### Was this patch authored or co-authored using generative AI tooling? no Closes #46730 from ulysses-you/hot-fix. Authored-by: ulysses-you Signed-off-by: Kent Yao --- .../org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala index fce20b79e113..23817be71c89 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala @@ -19,7 +19,7 @@ package org.apache.spark.sql.execution.adaptive import scala.collection.mutable -import org.apache.spark.annotation.Experimental +import org.apache.spark.annotation.{DeveloperApi, Experimental} import org.apache.spark.sql.catalyst.SQLConfHelper /** - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48406][BUILD] Upgrade commons-cli to 1.8.0
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f42ed6c76004 [SPARK-48406][BUILD] Upgrade commons-cli to 1.8.0 f42ed6c76004 is described below commit f42ed6c760043b0213ebf0348a22dec7c0bb8244 Author: yangjie01 AuthorDate: Fri May 24 14:23:23 2024 +0800 [SPARK-48406][BUILD] Upgrade commons-cli to 1.8.0 ### What changes were proposed in this pull request? This pr aims to upgrade Apache `commons-cli` from 1.6.0 to 1.8.0. ### Why are the changes needed? The full release notes as follows: - https://commons.apache.org/proper/commons-cli/changes-report.html#a1.7.0 - https://commons.apache.org/proper/commons-cli/changes-report.html#a1.8.0 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46727 from LuciferYang/commons-cli-180. Authored-by: yangjie01 Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 35f6103e9fa4..46c5108e4eba 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -37,7 +37,7 @@ cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar checker-qual/3.42.0//checker-qual-3.42.0.jar chill-java/0.10.0//chill-java-0.10.0.jar chill_2.13/0.10.0//chill_2.13-0.10.0.jar -commons-cli/1.6.0//commons-cli-1.6.0.jar +commons-cli/1.8.0//commons-cli-1.8.0.jar commons-codec/1.17.0//commons-codec-1.17.0.jar commons-collections/3.2.2//commons-collections-3.2.2.jar commons-collections4/4.4//commons-collections4-4.4.jar diff --git a/pom.xml b/pom.xml index ecd05ee996e1..e8d47afa1cca 100644 --- a/pom.xml +++ b/pom.xml @@ -210,7 +210,7 @@ 4.17.0 3.1.0 1.1.0 -1.6.0 +1.8.0 1.78 1.13.0 6.0.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48405][BUILD] Upgrade `commons-compress` to 1.26.2
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3b9b52dff614 [SPARK-48405][BUILD] Upgrade `commons-compress` to 1.26.2 3b9b52dff614 is described below commit 3b9b52dff6149e499c59bb30641df777bd712d9b Author: panbingkun AuthorDate: Fri May 24 11:52:37 2024 +0800 [SPARK-48405][BUILD] Upgrade `commons-compress` to 1.26.2 ### What changes were proposed in this pull request? The pr aims to upgrade `commons-compress` to `1.26.2`. ### Why are the changes needed? The full release notes: https://commons.apache.org/proper/commons-compress/changes-report.html#a1.26.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46725 from panbingkun/SPARK-48405. Authored-by: panbingkun Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 79ce883dc672..35f6103e9fa4 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -42,7 +42,7 @@ commons-codec/1.17.0//commons-codec-1.17.0.jar commons-collections/3.2.2//commons-collections-3.2.2.jar commons-collections4/4.4//commons-collections4-4.4.jar commons-compiler/3.1.9//commons-compiler-3.1.9.jar -commons-compress/1.26.1//commons-compress-1.26.1.jar +commons-compress/1.26.2//commons-compress-1.26.2.jar commons-crypto/1.1.0//commons-crypto-1.1.0.jar commons-dbcp/1.4//commons-dbcp-1.4.jar commons-io/2.16.1//commons-io-2.16.1.jar diff --git a/pom.xml b/pom.xml index 6bbcf05b59e5..ecd05ee996e1 100644 --- a/pom.xml +++ b/pom.xml @@ -187,7 +187,7 @@ 1.1.10.5 3.0.3 1.17.0 -1.26.1 +1.26.2 2.16.1 2.6 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48399][SQL] Teradata: ByteType should map to BYTEINT instead of BYTE(binary)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6afa6cc3c16e [SPARK-48399][SQL] Teradata: ByteType should map to BYTEINT instead of BYTE(binary) 6afa6cc3c16e is described below commit 6afa6cc3c16e21f94087ebb6adb01bd1ff397086 Author: Kent Yao AuthorDate: Fri May 24 10:13:49 2024 +0800 [SPARK-48399][SQL] Teradata: ByteType should map to BYTEINT instead of BYTE(binary) ### What changes were proposed in this pull request? According to the Teradata and Teradata JDBC docs, BYTE represents a binary type in Teradata, while BYTEINT is used for tinyint. - https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Data-Types-and-Literals/Numeric-Data-Types/BYTEINT-Data-Type - https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/frameset.html ### Why are the changes needed? Bugfix ### Does this PR introduce _any_ user-facing change? Yes. ByteType used to be stored as a binary type in Teradata; now it is stored as BYTEINT. (The use case seems rare; a migration guide or legacy config is pending reviewers' comments.) ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46715 from yaooqinn/SPARK-48399. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../scala/org/apache/spark/sql/jdbc/TeradataDialect.scala | 1 + .../test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 15 +-- 2 files changed, 6 insertions(+), 10 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala index 7acd22a3f10b..95a9f60b64ed 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala @@ -42,6 +42,7 @@ private case class TeradataDialect() extends JdbcDialect { override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case StringType => Some(JdbcType("VARCHAR(255)", java.sql.Types.VARCHAR)) case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR)) +case ByteType => Option(JdbcType("BYTEINT", java.sql.Types.TINYINT)) case _ => None } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala index 0a792f44d3e2..e4116b565818 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala @@ -1477,16 +1477,11 @@ class JDBCSuite extends QueryTest with SharedSparkSession { } } - test("SPARK-15648: teradataDialect StringType data mapping") { -val teradataDialect = JdbcDialects.get("jdbc:teradata://127.0.0.1/db") -assert(teradataDialect.getJDBCType(StringType). - map(_.databaseTypeDefinition).get == "VARCHAR(255)") - } - - test("SPARK-15648: teradataDialect BooleanType data mapping") { -val teradataDialect = JdbcDialects.get("jdbc:teradata://127.0.0.1/db") -assert(teradataDialect.getJDBCType(BooleanType).
- map(_.databaseTypeDefinition).get == "CHAR(1)") + test("SPARK-48399: TeradataDialect jdbc data mapping") { +val dialect = JdbcDialects.get("jdbc:teradata://127.0.0.1/db") +assert(dialect.getJDBCType(StringType).map(_.databaseTypeDefinition).get == "VARCHAR(255)") +assert(dialect.getJDBCType(BooleanType).map(_.databaseTypeDefinition).get == "CHAR(1)") +assert(dialect.getJDBCType(ByteType).map(_.databaseTypeDefinition).get == "BYTEINT") } test("SPARK-38846: TeradataDialect catalyst type mapping") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
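The new mapping can be observed without a live Teradata instance, since it only affects the generated DDL type. A minimal sketch mirroring the updated suite:

```scala
import org.apache.spark.sql.jdbc.JdbcDialects
import org.apache.spark.sql.types.ByteType

// The dialect is resolved from the URL prefix; no connection is opened.
val dialect = JdbcDialects.get("jdbc:teradata://127.0.0.1/db")
// With this fix, ByteType maps to BYTEINT rather than the binary BYTE type.
assert(dialect.getJDBCType(ByteType).map(_.databaseTypeDefinition).contains("BYTEINT"))
```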
(spark) branch master updated: [SPARK-48387][SQL] Postgres: Map TimestampType to TIMESTAMP WITH TIME ZONE
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a48365dd98c9 [SPARK-48387][SQL] Postgres: Map TimestampType to TIMESTAMP WITH TIME ZONE a48365dd98c9 is described below commit a48365dd98c9e52b5648d1cc0af203a7290cb1dc Author: Kent Yao AuthorDate: Thu May 23 10:27:16 2024 +0800 [SPARK-48387][SQL] Postgres: Map TimestampType to TIMESTAMP WITH TIME ZONE ### What changes were proposed in this pull request? Currently, both TimestampType and TimestampNTZType are mapped to TIMESTAMP WITHOUT TIME ZONE for writing, while being differentiated for reading. In this PR, we map TimestampType to TIMESTAMP WITH TIME ZONE to differentiate TimestampType/TimestampNTZType for writing against Postgres. ### Why are the changes needed? TimestampType <-> TIMESTAMP WITHOUT TIME ZONE is incorrect and ambiguous with TimestampNTZType ### Does this PR introduce _any_ user-facing change? Yes, a migration guide and a legacy configuration are provided ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46701 from yaooqinn/SPARK-48387. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 46 ++ docs/sql-data-sources-jdbc.md | 4 +- docs/sql-migration-guide.md| 3 +- .../org/apache/spark/sql/internal/SQLConf.scala| 14 +++ .../apache/spark/sql/jdbc/PostgresDialect.scala| 6 ++- 5 files changed, 68 insertions(+), 5 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala index dd6f1bfd3b3f..5ad4f15216b7 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala @@ -27,6 +27,7 @@ import org.apache.spark.SparkException import org.apache.spark.sql.{Column, DataFrame, Row} import org.apache.spark.sql.catalyst.expressions.Literal import org.apache.spark.sql.catalyst.util.DateTimeUtils +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ import org.apache.spark.tags.DockerTest @@ -583,4 +584,49 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite { assert(cause.getSQLState === "22003") } } + + test("SPARK-48387: Timestamp write as timestamp with time zone") { +val df = spark.sql("select TIMESTAMP '2018-11-17 13:33:33' as col0") +// write timestamps for preparation +withSQLConf(SQLConf.LEGACY_POSTGRES_DATETIME_MAPPING_ENABLED.key -> "false") { + // write timestamp as timestamp with time zone + df.write.jdbc(jdbcUrl, "ts_with_timezone_copy_false", new Properties) +} +withSQLConf(SQLConf.LEGACY_POSTGRES_DATETIME_MAPPING_ENABLED.key -> "true") { + // write timestamp as timestamp without time zone + df.write.jdbc(jdbcUrl, "ts_with_timezone_copy_true", new Properties) +} + +// read timestamps for test +withSQLConf(SQLConf.LEGACY_POSTGRES_DATETIME_MAPPING_ENABLED.key -> "true") { + val df1 = spark.read.option("preferTimestampNTZ", false) +.jdbc(jdbcUrl, "ts_with_timezone_copy_false", new Properties) + checkAnswer(df1, Row(Timestamp.valueOf("2018-11-17 13:33:33"))) + val df2 = spark.read.option("preferTimestampNTZ", true) +.jdbc(jdbcUrl,
"ts_with_timezone_copy_false", new Properties) + checkAnswer(df2, Row(LocalDateTime.of(2018, 11, 17, 13, 33, 33))) + + val df3 = spark.read.option("preferTimestampNTZ", false) +.jdbc(jdbcUrl, "ts_with_timezone_copy_true", new Properties) + checkAnswer(df3, Row(Timestamp.valueOf("2018-11-17 13:33:33"))) + val df4 = spark.read.option("preferTimestampNTZ", true) +.jdbc(jdbcUrl, "ts_with_timezone_copy_true", new Properties) + checkAnswer(df4, Row(LocalDateTime.of(2018, 11, 17, 13, 33, 33))) +} +withSQLConf(SQLConf.LEGACY_POSTGRES_DATETIME_MAPPING_ENABLED.key -> "false") { + Seq("true", "false").foreach { prefer => +val prop = new Properties +prop.setProperty("preferTimestampNTZ", prefer) +val dfCopy = spark.read.jdbc(jdbcUrl, "ts_with_timezone_copy_false", prop) +checkA
(spark) branch master updated (f4958ba9587c -> bf7f664296c5)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f4958ba9587c [SPARK-48367][CONNECT][FOLLOWUP] Replace keywords that identify `lint-scala` detection results add bf7f664296c5 [SPARK-47515][SPARK-47406][SQL][FOLLOWUP] Add legacy config spark.sql.legacy.mysql.timestampNTZMapping.enabled No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 3 ++- .../org/apache/spark/sql/internal/SQLConf.scala | 14 ++ .../org/apache/spark/sql/jdbc/MySQLDialect.scala| 5 +++-- .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 21 + 4 files changed, 40 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
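A minimal sketch of restoring the old MySQL mapping with the new legacy flag; only the config key comes from the commit title, while the URL and table name are placeholders:

```scala
// Opt back into the pre-change TIMESTAMP mapping for MySQL reads/writes.
spark.conf.set("spark.sql.legacy.mysql.timestampNTZMapping.enabled", "true")

val props = new java.util.Properties
props.setProperty("preferTimestampNTZ", "true") // existing JDBC reader option
val events = spark.read.jdbc("jdbc:mysql://localhost:3306/db", "events", props) // placeholders
```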
(spark) branch master updated: [SPARK-48365][DOCS] DB2: Document Mapping Spark SQL Data Types to DB2
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 664c8c19dae7 [SPARK-48365][DOCS] DB2: Document Mapping Spark SQL Data Types to DB2 664c8c19dae7 is described below commit 664c8c19dae7ca23dc9142133471d96501093bed Author: Kent Yao AuthorDate: Tue May 21 17:16:21 2024 +0800 [SPARK-48365][DOCS] DB2: Document Mapping Spark SQL Data Types to DB2 ### What changes were proposed in this pull request? In this PR, we document the mapping rules for Spark SQL Data Types to DB2 ones ### Why are the changes needed? doc improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? doc build ![image](https://github.com/apache/spark/assets/8326978/40092f80-1392-48a0-96e9-8ef9cf9516e2) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46677 from yaooqinn/SPARK-48365. Authored-by: Kent Yao Signed-off-by: Kent Yao --- docs/sql-data-sources-jdbc.md | 106 ++ 1 file changed, 106 insertions(+) diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md index 0c929fece679..54a8506bff51 100644 --- a/docs/sql-data-sources-jdbc.md +++ b/docs/sql-data-sources-jdbc.md @@ -1885,3 +1885,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to DB2 + +The below table describes the data type conversions from Spark SQL Data Types to DB2 data types, +when creating, altering, or writing data to a DB2 table using the built-in jdbc data source with +the [IBM Data Server Driver For JDBC and SQLJ](https://mvnrepository.com/artifact/com.ibm.db2/jcc) as the activated JDBC Driver.

| Spark SQL Data Type | DB2 Data Type | Remarks |
|---------------------|---------------|---------|
| BooleanType | BOOLEAN | |
| ByteType | SMALLINT | |
| ShortType | SMALLINT | |
| IntegerType | INTEGER | |
| LongType | BIGINT | |
| FloatType | REAL | |
| DoubleType | DOUBLE PRECISION | |
| DecimalType(p, s) | DECIMAL(p,s) | The maximum value for 'p' is 31 in DB2, while it is 38 in Spark. It might fail when storing DecimalType(p>=32, s) to DB2 |
| DateType | DATE | |
| TimestampType | TIMESTAMP | |
| TimestampNTZType | TIMESTAMP | |
| StringType | CLOB | |
| BinaryType | BLOB | |
| CharType(n) | CHAR(n) | The maximum value for 'n' is 255 in DB2, while it is unlimited in Spark. |
| VarcharType(n) | VARCHAR(n) | The maximum value for 'n' is 255 in DB2, while it is unlimited in Spark. |

The Spark Catalyst data types below are not supported with suitable DB2 types.

- DayTimeIntervalType
- YearMonthIntervalType
- CalendarIntervalType
- ArrayType
- MapType
- StructType
- UserDefinedType
- NullType
- ObjectType
- VariantType - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
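Two rows of the table can be spot-checked through the public dialect API without a DB2 connection; a sketch under the assumption that the dialect resolves from a `jdbc:db2` URL as in Spark's own tests:

```scala
import org.apache.spark.sql.jdbc.JdbcDialects
import org.apache.spark.sql.types.{BooleanType, StringType}

val db2 = JdbcDialects.get("jdbc:db2://127.0.0.1:50000/testdb") // placeholder URL
db2.getJDBCType(BooleanType).map(_.databaseTypeDefinition) // Some("BOOLEAN"), per the table
db2.getJDBCType(StringType).map(_.databaseTypeDefinition)  // Some("CLOB"), per the table
```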
(spark) branch master updated: [SPARK-48300][SQL] Codegen Support for `from_xml`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6213fa661ffe [SPARK-48300][SQL] Codegen Support for `from_xml` 6213fa661ffe is described below commit 6213fa661ffeff073f3f1a6253f7039a45f284c7 Author: panbingkun AuthorDate: Tue May 21 13:47:42 2024 +0800 [SPARK-48300][SQL] Codegen Support for `from_xml` ### What changes were proposed in this pull request? The PR aims to add `Codegen Support` for `from_xml` ### Why are the changes needed? - Improve codegen coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Add a new UT & pass existing UTs. - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46609 from panbingkun/from_xml_codegen. Lead-authored-by: panbingkun Co-authored-by: Kent Yao Signed-off-by: Kent Yao --- .../apache/spark/sql/catalyst/expressions/xmlExpressions.scala | 6 +- .../test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala| 10 ++ 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala index 564a6fce1b80..48a87db291a8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala @@ -58,7 +58,6 @@ case class XmlToStructs( timeZoneId: Option[String] = None) extends UnaryExpression with TimeZoneAwareExpression - with CodegenFallback with ExpectsInputTypes with NullIntolerant with QueryErrorsBase { @@ -120,6 +119,11 @@ case class XmlToStructs( override def nullSafeEval(xml: Any): Any = converter(parser.parse(xml.asInstanceOf[UTF8String].toString)) + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val expr = ctx.addReferenceObj("this", this) +defineCodeGen(ctx, ev, input => s"(InternalRow) $expr.nullSafeEval($input)") + } + override def inputTypes: Seq[AbstractDataType] = StringTypeAnyCollation :: Nil override def prettyName: String = "from_xml" diff --git a/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala index bc910d7f30fb..1364fab3138e 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala @@ -40,6 +40,16 @@ class XmlFunctionsSuite extends QueryTest with SharedSparkSession { Row(Row(1)) :: Nil) } + test("SPARK-48300: from_xml - Codegen Support") { +withTempView("XmlToStructsTable") { + val dataDF = Seq("""<a>1</a>""").toDF("value") + dataDF.createOrReplaceTempView("XmlToStructsTable") + val df = sql("SELECT from_xml(value, 'a INT') FROM XmlToStructsTable") + assert(df.queryExecution.executedPlan.isInstanceOf[WholeStageCodegenExec]) + checkAnswer(df, Row(Row(1)) :: Nil) +} + } + test("from_xml with option (timestampFormat)") { val df = Seq("""<time>26/08/2015 18:00</time>""").toDS() val schema = new StructType().add("time", TimestampType) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
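A minimal usage sketch mirroring the new test, assuming an active `spark` session; with this patch the projection compiles via whole-stage codegen instead of falling back to interpreted evaluation:

```scala
// Parse an XML string column into a struct with the given DDL schema.
val df = spark.sql("SELECT from_xml(value, 'a INT') FROM VALUES ('<a>1</a>') AS t(value)")
df.show() // expected: a single row containing the struct {1}
```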
(spark) branch master updated: [MINOR][DOCS] correct the doc error in configuration page (fix rest to reset)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0e0134a9a48d [MINOR][DOCS] correct the doc error in configuration page (fix rest to reset) 0e0134a9a48d is described below commit 0e0134a9a48d3f58e81d26d01637dca6f2b05a92 Author: NOTHING AuthorDate: Tue May 21 13:46:29 2024 +0800 [MINOR][DOCS] correct the doc error in configuration page (fix rest to reset) ### What changes were proposed in this pull request? 1. Correct the doc error on the `configuration` page of the Spark docs: it should be ```reset to their initial values by RESET command```, not ```rest to their initial values by RESET command``` ### Why are the changes needed? 1. Correct the doc error to make the doc clearer ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No need to test; the change just fixes a misspelled word ### Was this patch authored or co-authored using generative AI tooling? No Closes #46663 from Justontheway/patch-1. Authored-by: NOTHING Signed-off-by: Kent Yao --- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index cb1fb6fba958..ecd9cd75487f 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -3396,7 +3396,7 @@ Spark subsystems. Runtime SQL configurations are per-session, mutable Spark SQL configurations. They can be set with initial values by the config file and command-line options with `--conf/-c` prefixed, or by setting `SparkConf` that are used to create `SparkSession`. -Also, they can be set and queried by SET commands and rest to their initial values by RESET command, +Also, they can be set and queried by SET commands and reset to their initial values by RESET command, or by `SparkSession.conf`'s setter and getter methods in runtime. {% include_api_gen generated-runtime-sql-config-table.html %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (839efb1d72f6 -> 2a1bdc3eda8a)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 839efb1d72f6 [SPARK-48363][SQL] Cleanup some redundant codes in `from_xml` add 2a1bdc3eda8a [SPARK-48337][SQL] Fix precision loss for JDBC TIME values No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 31 +- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 13 + .../sql/execution/datasources/jdbc/JdbcUtils.scala | 23 +++- 3 files changed, 29 insertions(+), 38 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48332][BUILD][TESTS] Upgrade `jdbc` related test dependencies
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6fcdaab27ae9 [SPARK-48332][BUILD][TESTS] Upgrade `jdbc` related test dependencies 6fcdaab27ae9 is described below commit 6fcdaab27ae900ee120e80c75bafe243a7e80765 Author: panbingkun AuthorDate: Mon May 20 18:53:17 2024 +0800 [SPARK-48332][BUILD][TESTS] Upgrade `jdbc` related test dependencies ### What changes were proposed in this pull request? The PR aims to upgrade JDBC-related test dependencies, including: - com.mysql:mysql-connector-j from `8.3.0` to `8.4.0` - com.oracle.database.jdbc:ojdbc11 from `23.3.0.23.09` to `23.4.0.24.05` ### Why are the changes needed? - com.mysql:mysql-connector-j release notes: https://dev.mysql.com/doc/relnotes/connector-j/en/news-8-4-0.html - com.oracle.database.jdbc:ojdbc11 release notes: https://download.oracle.com/otn-pub/otn_software/jdbc/23c/JDBC-UCP-ReleaseNotes-23ai.txt?AuthParam=1716161887_dbf7a096828486d544bee00a2383f42a === Known Problems Fixed in the Patch Release 23.4.0.24.05 === Bug 36279736 - CONNECTION TO A PROXY USER FAILS WITH INVALID USERNAME/PASSWORD WHEN WALLET IS PROVIDED Bug 36187019 - LOB PROCESSING FAIL WHEN DB SET WITH EL8ISO8859P7 CHARACTER SET Bug 36152805 - GET CONNECTION AGAINST NORMAL PDB WITH TFO=ON SHOULD FAIL WITH ORA-18739 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46653 from panbingkun/jdbc_driver_upgrade. Authored-by: panbingkun Signed-off-by: Kent Yao --- pom.xml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pom.xml b/pom.xml index 5811e5b7716d..d92d210a5ffc 100644 --- a/pom.xml +++ b/pom.xml @@ -323,11 +323,11 @@ -Dio.netty.tryReflectionSetAccessible=true 2.7.12 -8.3.0 +8.4.0 42.7.3 11.5.9.0 12.6.1.jre11 -23.3.0.23.09 +23.4.0.24.05 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48323][SQL] DB2: Map BooleanType to BOOLEAN instead of CHAR(1)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3a888f725315 [SPARK-48323][SQL] DB2: Map BooleanType to BOOLEAN instead of CHAR(1) 3a888f725315 is described below commit 3a888f7253155bdee52439bafbdf2b04fe2f186a Author: Kent Yao AuthorDate: Mon May 20 18:52:07 2024 +0800 [SPARK-48323][SQL] DB2: Map BooleanType to BOOLEAN instead of CHAR(1) ### What changes were proposed in this pull request? This PR maps BooleanType to BOOLEAN instead of CHAR(1) when writing DB2 tables; users can restore the old behavior by setting spark.sql.legacy.db2.booleanTypeMapping.enabled to true ### Why are the changes needed? DB2 has supported boolean since v9.7, which is already EOL. It's reasonable to map BooleanType to BOOLEAN ### Does this PR introduce _any_ user-facing change? Yes; spark.sql.legacy.db2.booleanTypeMapping.enabled is provided to restore the old behavior ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46637 from yaooqinn/SPARK-48323. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../apache/spark/sql/jdbc/DB2IntegrationSuite.scala| 18 ++ docs/sql-migration-guide.md| 1 + .../scala/org/apache/spark/sql/internal/SQLConf.scala | 11 +++ .../scala/org/apache/spark/sql/jdbc/DB2Dialect.scala | 10 ++ .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala| 2 +- 5 files changed, 33 insertions(+), 9 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala index dbf3eae5e655..72b2ac8074f4 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.{Row, SaveMode} import org.apache.spark.sql.catalyst.util.CharVarcharUtils import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._ import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, StructType} +import org.apache.spark.sql.types.{ByteType, ShortType, StructType} import org.apache.spark.tags.DockerTest /** @@ -174,13 +174,12 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationSuite { df3.write.jdbc(jdbcUrl, "stringscopy", new Properties) // spark types that does not have exact matching db2 table types.
val df4 = sqlContext.createDataFrame( - sparkContext.parallelize(Seq(Row("1".toShort, "20".toByte, true))), - new StructType().add("c1", ShortType).add("b", ByteType).add("c3", BooleanType)) + sparkContext.parallelize(Seq(Row("1".toShort, "20".toByte))), + new StructType().add("c1", ShortType).add("b", ByteType)) df4.write.jdbc(jdbcUrl, "otherscopy", new Properties) val rows = sqlContext.read.jdbc(jdbcUrl, "otherscopy", new Properties).collect() assert(rows(0).getShort(0) == 1) assert(rows(0).getShort(1) == 20) -assert(rows(0).getString(2) == "1") } test("query JDBC option") { @@ -252,6 +251,17 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationSuite { test("SPARK-48269: boolean type") { val df = sqlContext.read.jdbc(jdbcUrl, "booleans", new Properties) checkAnswer(df, Row(true)) +Seq(true, false).foreach { legacy => + withSQLConf(SQLConf.LEGACY_DB2_BOOLEAN_MAPPING_ENABLED.key -> legacy.toString) { +val tbl = "booleanscopy" + legacy +df.write.jdbc(jdbcUrl, tbl, new Properties) +if (legacy) { + checkAnswer(sqlContext.read.jdbc(jdbcUrl, tbl, new Properties), Row("1")) +} else { + checkAnswer(sqlContext.read.jdbc(jdbcUrl, tbl, new Properties), Row(true)) +} + } +} } test("SPARK-48269: GRAPHIC types") { diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 02a4fae5d262..98075d019585 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -51,6 +51,7 @@ license: | - Since Spark 4.0, MsSQL Server JDBC datasource will read TINYINT as ShortType, while in Spark 3.5 and previous, read as IntegerType. To restore the previous behavior, set `spark.sql.legacy.mssqlserver.numericMapping.enabled` to `true`. - Since Spark 4.0, MsSQL Server JDBC datasource will read DATETIMEOFF
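A rough sketch of the toggle; the config key is quoted from the commit message, while the URL and table name are placeholders:

```scala
import spark.implicits._ // assumes `spark` is an active SparkSession

val jdbcUrl = "jdbc:db2://127.0.0.1:50000/testdb" // placeholder
spark.conf.set("spark.sql.legacy.db2.booleanTypeMapping.enabled", "false")
// Default behavior: the column round-trips as a real BOOLEAN.
Seq(true).toDF("flag").write.jdbc(jdbcUrl, "booleans_copy", new java.util.Properties)
// With the flag set to "true", it is written as CHAR(1) and reads back as "1"/"0".
```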
(spark) branch master updated (b0e535217bf8 -> 403619a3974c)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b0e535217bf8 [SPARK-48301][SQL][FOLLOWUP] Update the error message add 403619a3974c [SPARK-48306][SQL] Improve UDT in error message No new revisions were added by this update. Summary of changes: .../src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala | 2 +- .../scala/org/apache/spark/sql/errors/DataTypeErrorsBase.scala | 3 ++- .../scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala | 10 +- .../main/scala/org/apache/spark/sql/hive/HiveInspectors.scala | 5 +++-- .../sql/hive/execution/HiveScriptTransformationSuite.scala | 9 - .../org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala | 4 ++-- 6 files changed, 17 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (3bd845ea930a -> fa83d0f8fce7)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3bd845ea930a [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar add fa83d0f8fce7 [SPARK-48296][SQL] Codegen Support for `to_xml` No new revisions were added by this update. Summary of changes: .../sql/catalyst/expressions/xmlExpressions.scala | 11 ++- .../org/apache/spark/sql/XmlFunctionsSuite.scala | 19 ++- 2 files changed, 24 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
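Since the diff itself is elided in this summary, here is only a rough usage sketch of the now codegen-enabled `to_xml`, assuming an active `spark` session and the default row tag (assumed here to be `ROW`):

```scala
// Serialize a struct to an XML string; the exact output shape is an assumption.
spark.sql("SELECT to_xml(named_struct('a', 1))").show(truncate = false)
// expected along the lines of: <ROW><a>1</a></ROW>
```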
(spark) branch branch-3.5 updated: [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new c1dd4a5df693 [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar c1dd4a5df693 is described below commit c1dd4a5df69340884f3f0f0c28ce916bf9e30159 Author: Kent Yao AuthorDate: Thu May 16 17:29:47 2024 +0800 [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar ### What changes were proposed in this pull request? TRANSFORM with char/varchar has been accidentally broken since 3.1, failing with a scala.MatchError; this PR fixes it ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46603 from yaooqinn/SPARK-48297. Authored-by: Kent Yao Signed-off-by: Kent Yao (cherry picked from commit 3bd845ea930a4709b7a2f0447b5f8af64c697239) Signed-off-by: Kent Yao --- .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 4 +++- .../resources/sql-tests/analyzer-results/transform.sql.out| 11 +++ sql/core/src/test/resources/sql-tests/inputs/transform.sql| 6 +- .../src/test/resources/sql-tests/results/transform.sql.out| 10 ++ 4 files changed, 29 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 5d68aed9245a..f38d41af445e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -787,7 +787,9 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging { // Create the attributes. val (attributes, schemaLess) = if (transformClause.colTypeList != null) { // Typed return columns. - (DataTypeUtils.toAttributes(createSchema(transformClause.colTypeList)), false) + val schema = createSchema(transformClause.colTypeList) + val replacedSchema = CharVarcharUtils.replaceCharVarcharWithStringInSchema(schema) + (DataTypeUtils.toAttributes(replacedSchema), false) } else if (transformClause.identifierSeq != null) { // Untyped return columns.
val attrs = visitIdentifierSeq(transformClause.identifierSeq).map { name => diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out index ceca433a1c91..aa595c551f79 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/transform.sql.out @@ -1035,3 +1035,14 @@ ScriptTransformation cat, [a#x, b#x], ScriptInputOutputSchema(List(),List(),None +- Project [a#x, b#x] +- SubqueryAlias complex_trans +- LocalRelation [a#x, b#x] + + +-- !query +SELECT TRANSFORM (a, b) + USING 'cat' AS (a CHAR(10), b VARCHAR(10)) +FROM VALUES('apache', 'spark') t(a, b) +-- !query analysis +ScriptTransformation cat, [a#x, b#x], ScriptInputOutputSchema(List(),List(),None,None,List(),List(),None,None,false) ++- Project [a#x, b#x] + +- SubqueryAlias t + +- LocalRelation [a#x, b#x] diff --git a/sql/core/src/test/resources/sql-tests/inputs/transform.sql b/sql/core/src/test/resources/sql-tests/inputs/transform.sql index 922a1d817778..8570496d439e 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/transform.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/transform.sql @@ -415,4 +415,8 @@ FROM ( ORDER BY a ) map_output SELECT TRANSFORM(a, b) - USING 'cat' AS (a, b); \ No newline at end of file + USING 'cat' AS (a, b); + +SELECT TRANSFORM (a, b) + USING 'cat' AS (a CHAR(10), b VARCHAR(10)) +FROM VALUES('apache', 'spark') t(a, b); diff --git a/sql/core/src/test/resources/sql-tests/results/transform.sql.out b/sql/core/src/test/resources/sql-tests/results/transform.sql.out index ab726b93c07c..7975392fd014 100644 --- a/sql/core/src/test/resources/sql-tests/results/transform.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/transform.sql.out @@ -837,3 +837,13 @@ struct 3 3 3 3 3 3 + + +-- !query +SELECT TRANSFORM (a, b) + USING 'cat' AS (a CHAR(10), b VARCHAR(10)) +FROM VALUES('apache', 'spark') t(a, b) +-- !query schema +struct +-- !query output +apache spark
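For readers skimming the diff, a hedged sketch of the schema rewrite the new `AstBuilder` lines perform (the declared schema below mirrors the added test; `CharVarcharUtils` is an internal Catalyst utility, so this is illustration rather than public API):

```scala
import org.apache.spark.sql.catalyst.util.CharVarcharUtils
import org.apache.spark.sql.types.{CharType, StructField, StructType, VarcharType}

// The TRANSFORM output schema as parsed from `AS (a CHAR(10), b VARCHAR(10))`.
val declared = StructType(Seq(
  StructField("a", CharType(10)),
  StructField("b", VarcharType(10))))

// Replacing char/varchar with plain strings before attributes are created
// is what avoids the scala.MatchError this commit fixes.
val replaced = CharVarcharUtils.replaceCharVarcharWithStringInSchema(declared)
// replaced: struct<a:string,b:string>
```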
(spark) branch master updated (b53d78e94f6e -> 3bd845ea930a)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b53d78e94f6e [SPARK-48036][DOCS][FOLLOWUP] Update sql-ref-ansi-compliance.md add 3bd845ea930a [SPARK-48297][SQL] Fix a regression TRANSFORM clause with char/varchar No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 4 +++- .../resources/sql-tests/analyzer-results/transform.sql.out| 11 +++ sql/core/src/test/resources/sql-tests/inputs/transform.sql| 6 +- .../src/test/resources/sql-tests/results/transform.sql.out| 10 ++ 4 files changed, 29 insertions(+), 2 deletions(-)
(spark) branch master updated (0ba8ddc9ce5b -> b53d78e94f6e)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 0ba8ddc9ce5b [SPARK-48293][SS] Add test for when ForeachBatchUserFuncException wraps interrupted exception due to query stop add b53d78e94f6e [SPARK-48036][DOCS][FOLLOWUP] Update sql-ref-ansi-compliance.md No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
(spark) branch master updated: [SPARK-48264][BUILD] Upgrade `datasketches-java` to 6.0.0
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 97717363abae [SPARK-48264][BUILD] Upgrade `datasketches-java` to 6.0.0 97717363abae is described below commit 97717363abae0526f4a6f8c577f539da2d4ea314 Author: panbingkun AuthorDate: Thu May 16 14:14:36 2024 +0800 [SPARK-48264][BUILD] Upgrade `datasketches-java` to 6.0.0 ### What changes were proposed in this pull request? This PR aims to upgrade `datasketches-java` from `5.0.1` to `6.0.0` ### Why are the changes needed? The full release notes: - https://github.com/apache/datasketches-java/releases/tag/6.0.0 - https://github.com/apache/datasketches-java/releases/tag/5.0.2 (screenshot: https://github.com/apache/spark/assets/15246973/fff5905a-25e8-4e2f-9492-1b6099b2bd05) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46563 from panbingkun/SPARK-48264. Authored-by: panbingkun Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 1bd135e05b58..4b6f5dda585b 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -58,7 +58,7 @@ curator-recipes/5.6.0//curator-recipes-5.6.0.jar datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar -datasketches-java/5.0.1//datasketches-java-5.0.1.jar +datasketches-java/6.0.0//datasketches-java-6.0.0.jar datasketches-memory/2.2.0//datasketches-memory-2.2.0.jar derby/10.16.1.1//derby-10.16.1.1.jar derbyshared/10.16.1.1//derbyshared-10.16.1.1.jar diff --git a/pom.xml b/pom.xml index da9f878b33b8..611e82f343d8 100644 --- a/pom.xml +++ b/pom.xml @@ -213,7 +213,7 @@ 1.6.0 1.78 1.13.0 -5.0.1 +6.0.0 4.1.109.Final 2.0.65.Final 72.1
(spark) branch master updated: [SPARK-47607] Add documentation for Structured logging framework
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9130f78fb12e [SPARK-47607] Add documentation for Structured logging framework 9130f78fb12e is described below commit 9130f78fb12eed94f48e1fd9ccedb6fe651a4440 Author: Gengliang Wang AuthorDate: Thu May 16 14:13:13 2024 +0800 [SPARK-47607] Add documentation for Structured logging framework ### What changes were proposed in this pull request? Add documentation for Structured logging framework ### Why are the changes needed? Provide documentation for Spark developers ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Doc preview (screenshot: https://github.com/apache/spark/assets/1097932/d3c4fcdc-57e4-4af2-8b05-6b4f6731a8c0) ### Was this patch authored or co-authored using generative AI tooling? No Closes #46605 from gengliangwang/updateGuideline. Authored-by: Gengliang Wang Signed-off-by: Kent Yao --- .../main/scala/org/apache/spark/internal/README.md | 33 ++ 1 file changed, 33 insertions(+) diff --git a/common/utils/src/main/scala/org/apache/spark/internal/README.md b/common/utils/src/main/scala/org/apache/spark/internal/README.md index c0190b965834..28d279485187 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/README.md +++ b/common/utils/src/main/scala/org/apache/spark/internal/README.md @@ -1,5 +1,38 @@ # Guidelines for the Structured Logging Framework +## Scala Logging +Use the `org.apache.spark.internal.Logging` trait for logging in Scala code: +* **Logging Messages with Variables**: When logging a message with variables, wrap all the variables with `MDC`s and they will be automatically added to the Mapped Diagnostic Context (MDC). This allows for structured logging and better log analysis. +```scala +logInfo(log"Trying to recover app: ${MDC(LogKeys.APP_ID, app.id)}") +``` +* **Constant String Messages**: If you are logging a constant string message, use the log methods that accept a constant string. +```scala +logInfo("StateStore stopped") +``` + +## Java Logging +Use the `org.apache.spark.internal.SparkLoggerFactory` to get the logger instance in Java code: +* **Getting Logger Instance**: Instead of using `org.slf4j.LoggerFactory`, use `org.apache.spark.internal.SparkLoggerFactory` to ensure structured logging. +```java +import org.apache.spark.internal.SparkLogger; +import org.apache.spark.internal.SparkLoggerFactory; + +private static final SparkLogger logger = SparkLoggerFactory.getLogger(JavaUtils.class); +``` +* **Logging Messages with Variables**: When logging messages with variables, wrap all the variables with `MDC`s and they will be automatically added to the Mapped Diagnostic Context (MDC). +```java +import org.apache.spark.internal.LogKeys; +import org.apache.spark.internal.MDC; + +logger.error("Unable to delete file for partition {}", MDC.of(LogKeys.PARTITION_ID$.MODULE$, i)); +``` + +* **Constant String Messages**: For logging constant string messages, use the standard logging methods. +```java +logger.error("Failed to abort the writer after failing to write map output.", e); +``` + ## LogKey `LogKey`s serve as identifiers for mapped diagnostic contexts (MDC) within logs. Follow these guidelines when adding a new LogKey:
(spark) branch master updated (dec910ba3c36 -> 726f2c95d4dc)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from dec910ba3c36 [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` add 726f2c95d4dc [SPARK-48299][BUILD] Upgrade `scala-maven-plugin` to 4.9.1 No new revisions were added by this update. Summary of changes: pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-48219][CORE] StreamReader Charset fix with UTF8
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5e8322150a05 [SPARK-48219][CORE] StreamReader Charset fix with UTF8 5e8322150a05 is described below commit 5e8322150a050ad4d0c3962d62c9a2b3e9a937c1 Author: xuyu <11161...@vivo.com> AuthorDate: Thu May 16 12:11:44 2024 +0800 [SPARK-48219][CORE] StreamReader Charset fix with UTF8 ### What changes were proposed in this pull request? Fix a StreamReader that is created without an explicit UTF-8 charset. If the platform default charset cannot represent Chinese characters (for example Latin-1) and the configuration file contains Chinese characters, the file does not resolve correctly, so the reader needs to be created with UTF-8 explicitly. Other compute frameworks such as Calcite, Hive, and Hudi consistently create their StreamReaders with a UTF-8 charset. ### Why are the changes needed? Without the fix, strings may be decoded incorrectly. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? Not needed ### Was this patch authored or co-authored using generative AI tooling? No Closes #46509 from xuzifu666/SPARK-48219. Authored-by: xuyu <11161...@vivo.com> Signed-off-by: Kent Yao --- .../main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java index 410d010a79bd..4b55453ec7a8 100644 --- a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java +++ b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java @@ -22,6 +22,7 @@ import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.io.InputStreamReader; +import java.nio.charset.StandardCharsets; import java.util.HashSet; import java.util.List; import java.util.Map; @@ -171,7 +172,7 @@ public class HiveSessionImpl implements HiveSession { FileInputStream initStream = null; BufferedReader bufferedReader = null; initStream = new FileInputStream(fileName); - bufferedReader = new BufferedReader(new InputStreamReader(initStream)); + bufferedReader = new BufferedReader(new InputStreamReader(initStream, StandardCharsets.UTF_8)); return bufferedReader; }
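The same pattern in Scala, as a minimal sketch (the file name is a placeholder): readers built on a bare `InputStreamReader` inherit the platform default charset, so the charset should be passed explicitly.

```scala
import java.io.{BufferedReader, FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

// A reader built on the platform default charset silently mis-decodes
// files whose characters (e.g. Chinese) the default (e.g. Latin-1)
// cannot represent; passing UTF-8 explicitly avoids that.
def utf8Reader(fileName: String): BufferedReader =
  new BufferedReader(
    new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))
```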
(spark) branch master updated: [SPARK-48289][DOCKER][TEST] Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4ffaa2e89a8a [SPARK-48289][DOCKER][TEST] Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset 4ffaa2e89a8a is described below commit 4ffaa2e89a8a777a374b7f5b22166ef9bac8b99f Author: Kent Yao AuthorDate: Thu May 16 10:09:15 2024 +0800 [SPARK-48289][DOCKER][TEST] Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset ### What changes were proposed in this pull request? This pull request improves the Oracle JDBC tests by skipping the redundant SYSTEM password reset. ### Why are the changes needed? These changes are necessary to clean up the Oracle JDBC tests. This pull request effectively reverts the modifications introduced in [SPARK-46592](https://issues.apache.org/jira/browse/SPARK-46592) and [PR #44594](https://github.com/apache/spark/pull/44594), which attempted to work around the sporadic occurrence of ORA-65048 and ORA-04021 errors by setting the Oracle parameter DDL_LOCK_TIMEOUT. As discussed in [issue #35](https://github.com/gvenzl/oci-oracle-free/issues/35), setting DDL_LOCK_TIMEOUT did not resolve the issue. The root cause appears to be an Oracle bug or unwanted behavior related to the use of Pluggable Database (PDB) rather than the expected functionality of Oracle itself. Additionally, with [SPARK-48141](https://issues.apache.org/jira/browse/SPARK-48141), we have upgraded the Oracle version used in the tests to Oracle Free 23ai, version 23.4. This upgrade should help address some of the issues observed with the previous Oracle version. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? This patch was tested using the existing test suite, with a particular focus on Oracle JDBC tests. The following steps were executed: ``` export ENABLE_DOCKER_INTEGRATION_TESTS=1 ./build/sbt -Pdocker-integration-tests "docker-integration-tests/testOnly org.apache.spark.sql.jdbc.OracleIntegrationSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #46598 from LucaCanali/fixOracleIntegrationTests. 
Lead-authored-by: Kent Yao Co-authored-by: Luca Canali Signed-off-by: Kent Yao --- .../spark/sql/jdbc/OracleDatabaseOnDocker.scala| 31 -- .../spark/sql/jdbc/OracleIntegrationSuite.scala| 8 +++--- .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 10 +++ .../spark/sql/jdbc/v2/OracleNamespaceSuite.scala | 8 +++--- 4 files changed, 13 insertions(+), 44 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala index 88bb23f9c653..dd6bbf0af8a3 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala @@ -17,12 +17,7 @@ package org.apache.spark.sql.jdbc -import java.io.{File, PrintWriter} - -import com.github.dockerjava.api.model._ - import org.apache.spark.internal.Logging -import org.apache.spark.util.Utils class OracleDatabaseOnDocker extends DatabaseOnDocker with Logging { lazy override val imageName = @@ -38,30 +33,4 @@ class OracleDatabaseOnDocker extends DatabaseOnDocker with Logging { override def getJdbcUrl(ip: String, port: Int): String = { s"jdbc:oracle:thin:system/$oracle_password@//$ip:$port/freepdb1" } - - override def beforeContainerStart( - hostConfigBuilder: HostConfig, - containerConfigBuilder: ContainerConfig): Unit = { -try { - val dir = Utils.createTempDir() - val writer = new PrintWriter(new File(dir, "install.sql")) - // SPARK-46592: gvenzl/oracle-free occasionally fails to start with the following error: - // 'ORA-04021: timeout occurred while waiting to lock object', when initializing the - // SYSTEM user. This is due to the fact that the default DDL_LOCK_TIMEOUT is 0, which - // means that the lock will no wait. We set the timeout to 30 seconds to try again. - // TODO: This workaround should be removed once the issue is fixed in the image. - // https://github.com/gvenzl/oci-oracle-free/issues/35 - writer.write("ALTER SESSION SET DDL_LOCK_TIMEOUT = 30;\n") - writer.write(s"""ALTER USER SYSTEM IDENTIFIED BY "$oracle_password";""") - writer.close() - val newBind = new Bind( -dir.getAbsolutePath,
(spark) branch master updated: [SPARK-46759][AVRO][FOLLOWUP] Fix configuration name for spark.sql.avro.xz.level
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d0385c4a99c1 [SPARK-46759][AVRO][FOLLOWUP] Fix configuration name for spark.sql.avro.xz.level d0385c4a99c1 is described below commit d0385c4a99c172fa3e1ba2d72a65c8632b5c72a9 Author: Kent Yao AuthorDate: Wed May 15 16:48:55 2024 +0800 [SPARK-46759][AVRO][FOLLOWUP] Fix configuration name for spark.sql.avro.xz.level ### What changes were proposed in this pull request? `spark.sql.avro.xz.level` is wrongly defined as `spark.sql.avro.zx.level` ### Why are the changes needed? Bugfix ### Does this PR introduce _any_ user-facing change? no, it is not exposed via releases ### How was this patch tested? manually ### Was this patch authored or co-authored using generative AI tooling? no Closes #46590 from yaooqinn/SPARK-46759-F. Authored-by: Kent Yao Signed-off-by: Kent Yao --- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 9edef5a1f3ca..afae4ebb5395 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -3815,7 +3815,7 @@ object SQLConf { .checkValues((1 to 9).toSet + Deflater.DEFAULT_COMPRESSION) .createOptional - val AVRO_XZ_LEVEL = buildConf("spark.sql.avro.zx.level") + val AVRO_XZ_LEVEL = buildConf("spark.sql.avro.xz.level") .doc("Compression level for the xz codec used in writing of AVRO files. " + "Valid value must be in the range of from 1 to 9 inclusive " + "The default value is 6.")
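A hedged usage sketch with the corrected key (this assumes the standard `spark.sql.avro.compression.codec` conf selects the codec; the level and output path are arbitrary):

```scala
// Select the xz codec for Avro writes, then tune its level with the
// (now correctly named) key; valid levels are 1..9, default 6.
spark.conf.set("spark.sql.avro.compression.codec", "xz")
spark.conf.set("spark.sql.avro.xz.level", "9")

spark.range(10).write.format("avro").save("/tmp/avro-xz") // path is a placeholder
```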
(spark) branch master updated: [SPARK-48271][SQL] Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ad5fcae0b0ed [SPARK-48271][SQL] Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER ad5fcae0b0ed is described below commit ad5fcae0b0ed41f7e97ab419b32068e5adf71064 Author: Wenchen Fan AuthorDate: Wed May 15 16:45:20 2024 +0800 [SPARK-48271][SQL] Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER ### What changes were proposed in this pull request? Today we can't create `RowEncoder` with char/varchar data type, because we believe this can't happen. Spark will turn char/varchar into string type in leaf nodes. However, advanced users can even create custom logical plans and it's hard to guarantee no char/varchar data type in the entire query plan tree. UDF return type can also be char/varchar. This PR adds UNSUPPORTED_DATA_TYPE_FOR_ENCODER instead of throwing scala match error. ### Why are the changes needed? better error ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new test ### Was this patch authored or co-authored using generative AI tooling? no Closes #46586 from cloud-fan/error. Authored-by: Wenchen Fan Signed-off-by: Kent Yao --- .../src/main/resources/error/error-conditions.json | 6 ++ .../apache/spark/sql/catalyst/encoders/RowEncoder.scala | 12 +--- .../src/test/scala/org/apache/spark/sql/UDFSuite.scala | 17 + 3 files changed, 32 insertions(+), 3 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 730999085de9..75067a1920f7 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -4207,6 +4207,12 @@ ], "sqlState" : "0A000" }, + "UNSUPPORTED_DATA_TYPE_FOR_ENCODER" : { +"message" : [ + "Cannot create encoder for . Please use a different output data type for your UDF or DataFrame." +], +"sqlState" : "0A000" + }, "UNSUPPORTED_DEFAULT_VALUE" : { "message" : [ "DEFAULT column values is not supported." diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala index 16ac283eccb1..c507e952630f 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala @@ -20,9 +20,9 @@ package org.apache.spark.sql.catalyst.encoders import scala.collection.mutable import scala.reflect.classTag -import org.apache.spark.sql.Row +import org.apache.spark.sql.{AnalysisException, Row} import org.apache.spark.sql.catalyst.encoders.AgnosticEncoders.{BinaryEncoder, BoxedBooleanEncoder, BoxedByteEncoder, BoxedDoubleEncoder, BoxedFloatEncoder, BoxedIntEncoder, BoxedLongEncoder, BoxedShortEncoder, CalendarIntervalEncoder, DateEncoder, DayTimeIntervalEncoder, EncoderField, InstantEncoder, IterableEncoder, JavaDecimalEncoder, LocalDateEncoder, LocalDateTimeEncoder, MapEncoder, NullEncoder, RowEncoder => AgnosticRowEncoder, StringEncoder, TimestampEncoder, UDTEncoder, VariantE [...] 
-import org.apache.spark.sql.errors.ExecutionErrors +import org.apache.spark.sql.errors.{DataTypeErrorsBase, ExecutionErrors} import org.apache.spark.sql.internal.SqlApiConf import org.apache.spark.sql.types._ import org.apache.spark.util.ArrayImplicits._ @@ -59,7 +59,7 @@ import org.apache.spark.util.ArrayImplicits._ * StructType -> org.apache.spark.sql.Row * }}} */ -object RowEncoder { +object RowEncoder extends DataTypeErrorsBase { def encoderFor(schema: StructType): AgnosticEncoder[Row] = { encoderFor(schema, lenient = false) } @@ -124,5 +124,11 @@ object RowEncoder { field.nullable, field.metadata) }.toImmutableArraySeq) + +case _ => + throw new AnalysisException( +errorClass = "UNSUPPORTED_DATA_TYPE_FOR_ENCODER", +messageParameters = Map("dataType" -> toSQLType(dataType)) +) } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala index fe47d6c68555..32ad5a94984b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala @@ -1194,4 +1194,21 @@ class UDFSuite extends QueryTest with SharedSparkSession { .sele
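A minimal sketch of the new behavior (the schema is made up; `RowEncoder` is an internal Catalyst API, so this is illustration rather than recommended usage):

```scala
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{CharType, StructField, StructType}

// char/varchar should normally be replaced with string before an encoder
// is requested; when they leak through (custom plans, UDF return types),
// this now fails with UNSUPPORTED_DATA_TYPE_FOR_ENCODER instead of a
// bare scala.MatchError.
val schema = StructType(Seq(StructField("c", CharType(10))))
RowEncoder.encoderFor(schema) // throws AnalysisException
```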
(spark) branch master updated (5e87e9fbd6e6 -> da78949eee04)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5e87e9fbd6e6 [SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage add da78949eee04 [SPARK-48269][DOCS][TESTS] DB2: Document Mapping Spark SQL Data Types from DB2 and add tests No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/DB2IntegrationSuite.scala | 37 + docs/sql-data-sources-jdbc.md | 149 + 2 files changed, 186 insertions(+)
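For context, a hedged sketch of the kind of JDBC read the new mapping documentation and tests cover (URL, credentials, and table name are all placeholders):

```scala
// Read a DB2 table over JDBC; the new documentation spells out how DB2
// types such as SMALLINT, DECIMAL, and XML map onto Spark SQL types.
val db2 = spark.read
  .format("jdbc")
  .option("url", "jdbc:db2://db2host:50000/testdb") // placeholder
  .option("dbtable", "numbers")                     // placeholder
  .option("user", "db2inst1")                       // placeholder
  .option("password", "...")                        // placeholder
  .load()
db2.printSchema()
```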
(spark) branch branch-3.4 updated: Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects"
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 7a0c72ff7724 Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" 7a0c72ff7724 is described below commit 7a0c72ff7724b2ee40843e5bd4f83833bfa56052 Author: Kent Yao AuthorDate: Wed May 15 10:10:03 2024 +0800 Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" This reverts commit a848e2790cba0b7ee77d391dc534146bd35ee50a. --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 - .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 - .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 - .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 - .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 - .../sql/connector/util/V2ExpressionSQLBuilder.java | 3 + .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 + .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 -- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 14 insertions(+), 291 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 11ddce68aecd..1a25cd2802dd 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -67,12 +67,6 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index a42caeafe6fe..72edfc9f1bf1 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,17 +38,6 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() - -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote\\'_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index 6658b5ed6c77..a527c6f8cb5b 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala @@ -66,12 +66,6 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JD connection.prepareStatement( "CREATE TABLE employee (dept INT, name VARCHAR(32), salary NUMERIC(20, 2), bonus FLOAT)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def notSupportsTableComment: Boolean = true diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v
(spark) branch branch-3.5 updated: Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects"
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 74724d61c3d0 Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" 74724d61c3d0 is described below commit 74724d61c3d04925da6faa5d49643619aa14f206 Author: Kent Yao AuthorDate: Wed May 15 10:09:09 2024 +0800 Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" This reverts commit f37fa436cd4e0ef9f486a60f9af91a3ce0195df9. --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 - .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 - .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 - .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 - .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 - .../sql/connector/util/V2ExpressionSQLBuilder.java | 3 + .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 + .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 -- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 14 insertions(+), 291 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 9b4916ddd36b..9a78244f5326 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -80,12 +80,6 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index a42caeafe6fe..72edfc9f1bf1 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,17 +38,6 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() - -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote\\'_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index 57a2667557fa..0dc3a39f4db5 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala @@ -86,12 +86,6 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JD connection.prepareStatement( "CREATE TABLE employee (dept INT, name VARCHAR(32), salary NUMERIC(20, 2), bonus FLOAT)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def notSupportsTableComment: Boolean = true diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v
(spark) branch master updated: Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects"
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4ff5ca81cffb Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" 4ff5ca81cffb is described below commit 4ff5ca81cffbd1940c864144ca8fbba54b605e4e Author: Kent Yao AuthorDate: Wed May 15 10:05:31 2024 +0800 Revert "[SPARK-48172][SQL] Fix escaping issues in JDBC Dialects" This reverts commit 47006a493f98ca85196194d16d58b5847177b1a3. --- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 6 - .../sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala | 11 - .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 6 - .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 6 - .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 - .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 229 - .../sql/connector/util/V2ExpressionSQLBuilder.java | 1 + .../sql/connector/expressions/expressions.scala| 4 +- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 7 + .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 15 -- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 6 +- 12 files changed, 12 insertions(+), 291 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 36795747319d..3642094d11b2 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala @@ -62,12 +62,6 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest { connection.prepareStatement( "CREATE TABLE employee (dept INTEGER, name VARCHAR(10), salary DECIMAL(20, 2), bonus DOUBLE)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def testUpdateColumnType(tbl: String): Unit = { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala index a42caeafe6fe..72edfc9f1bf1 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DockerJDBCIntegrationV2Suite.scala @@ -38,17 +38,6 @@ abstract class DockerJDBCIntegrationV2Suite extends DockerJDBCIntegrationSuite { .executeUpdate() connection.prepareStatement("INSERT INTO employee VALUES (6, 'jen', 12000, 1200)") .executeUpdate() - -connection.prepareStatement( - s""" - |INSERT INTO pattern_testing_table VALUES - |('special_character_quote\\'_present'), - |('special_character_quote_not_present'), - |('special_character_percent%_present'), - |('special_character_percent_not_present'), - |('special_character_underscore_present'), - |('special_character_underscorenot_present') - """.stripMargin).executeUpdate() } def tablePreparation(connection: Connection): Unit diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index 46530fe5419a..b1b8aec5ad33 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala @@ -70,12 +70,6 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JD connection.prepareStatement( "CREATE TABLE employee (dept INT, name VARCHAR(32), salary NUMERIC(20, 2), bonus FLOAT)") .executeUpdate() -connection.prepareStatement( - s"""CREATE TABLE pattern_testing_table ( - |pattern_testing_col LONGTEXT - |) - """.stripMargin -).executeUpdate() } override def notSupportsTableComment: Boolean = true diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQ
(spark) branch master updated: [SPARK-48211][SQL] DB2: Read SMALLINT as ShortType
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 207d675110e6 [SPARK-48211][SQL] DB2: Read SMALLINT as ShortType 207d675110e6 is described below commit 207d675110e6fa699a434e81296f6f050eb0304b Author: Kent Yao AuthorDate: Thu May 9 17:27:04 2024 +0800 [SPARK-48211][SQL] DB2: Read SMALLINT as ShortType ### What changes were proposed in this pull request? This PR supports reading SMALLINT from DB2 as ShortType ### Why are the changes needed? - 15 bits is sufficient - we write ShortType as SMALLINT - we read smallint from other built-in JDBC sources as ShortType ### Does this PR introduce _any_ user-facing change? yes, we add a migration guide for this ### How was this patch tested? changed tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46497 from yaooqinn/SPARK-48211. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../spark/sql/jdbc/DB2IntegrationSuite.scala | 69 +- docs/sql-migration-guide.md| 1 + .../org/apache/spark/sql/internal/SQLConf.scala| 11 .../org/apache/spark/sql/jdbc/DB2Dialect.scala | 3 + 4 files changed, 56 insertions(+), 28 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala index cedb33d491fb..aca174cce194 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala @@ -25,6 +25,7 @@ import org.scalatest.time.SpanSugar._ import org.apache.spark.sql.{Row, SaveMode} import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._ +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, StructType} import org.apache.spark.tags.DockerTest @@ -77,32 +78,44 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationSuite { } test("Numeric types") { -val df = sqlContext.read.jdbc(jdbcUrl, "numbers", new Properties) -val rows = df.collect() -assert(rows.length == 1) -val types = rows(0).toSeq.map(x => x.getClass.toString) -assert(types.length == 10) -assert(types(0).equals("class java.lang.Integer")) -assert(types(1).equals("class java.lang.Integer")) -assert(types(2).equals("class java.lang.Long")) -assert(types(3).equals("class java.math.BigDecimal")) -assert(types(4).equals("class java.lang.Double")) -assert(types(5).equals("class java.lang.Double")) -assert(types(6).equals("class java.lang.Float")) -assert(types(7).equals("class java.math.BigDecimal")) -assert(types(8).equals("class java.math.BigDecimal")) -assert(types(9).equals("class java.math.BigDecimal")) -assert(rows(0).getInt(0) == 17) -assert(rows(0).getInt(1) == 7) -assert(rows(0).getLong(2) == 922337203685477580L) -val bd = new BigDecimal("123456745.567890123450") -assert(rows(0).getAs[BigDecimal](3).equals(bd)) -assert(rows(0).getDouble(4) == 42.75) -assert(rows(0).getDouble(5) == 5.4E-70) -assert(rows(0).getFloat(6) == 3.4028234663852886e+38) -assert(rows(0).getDecimal(7) == new BigDecimal("4.299900")) -assert(rows(0).getDecimal(8) == new BigDecimal(".00")) -assert(rows(0).getDecimal(9) == new BigDecimal("1234567891234567.123456789123456789")) +Seq(true, false).foreach { legacy => 
withSQLConf(SQLConf.LEGACY_DB2_TIMESTAMP_MAPPING_ENABLED.key -> legacy.toString) { +val df = sqlContext.read.jdbc(jdbcUrl, "numbers", new Properties) +val rows = df.collect() +assert(rows.length == 1) +val types = rows(0).toSeq.map(x => x.getClass.toString) +assert(types.length == 10) +if (legacy) { + assert(types(0).equals("class java.lang.Integer")) +} else { + assert(types(0).equals("class java.lang.Short")) +} +assert(types(1).equals("class java.lang.Integer")) +assert(types(2).equals("class java.lang.Long")) +assert(types(3).equals("class java.math.BigDecimal")) +assert(types(4).equals("class java.lang.Double")) +assert(types(5).equals("class java.lang.Double
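A hedged sketch of the user-visible effect (connection details are placeholders; the legacy flag quoted in the truncated test above is left out here):

```scala
import org.apache.spark.sql.types.ShortType

// With this change, a DB2 SMALLINT column arrives as ShortType; it was
// previously widened to IntegerType.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:db2://db2host:50000/testdb") // placeholder
  .option("dbtable", "small_tbl")                   // placeholder
  .load()
assert(df.schema.head.dataType == ShortType)
```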
(spark) branch master updated: [SPARK-48188][SQL] Consistently use normalized plan for cache
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8950add773e6 [SPARK-48188][SQL] Consistently use normalized plan for cache 8950add773e6 is described below commit 8950add773e63a910900f796950a6a58e40a8577 Author: Wenchen Fan AuthorDate: Wed May 8 20:11:24 2024 +0800 [SPARK-48188][SQL] Consistently use normalized plan for cache ### What changes were proposed in this pull request? We must consistently use normalized plans for cache filling and lookup, or inconsistency will lead to cache misses. To guarantee this, this PR makes `CacheManager` the central place to do plan normalization, so that callers don't need to care about it. Now most APIs in `CacheManager` take either `Dataset` or `LogicalPlan`. For `Dataset`, we get the normalized plan directly. For `LogicalPlan`, we normalize it before further use. The caller side should pass `Dataset` when invoking `CacheManager`, if it already creates `Dataset`. This is to reduce the impact, as extra creation of `Dataset` may have perf issues or introduce unexpected analysis exception. ### Why are the changes needed? Avoid unnecessary cache misses for users who add custom normalization rules ### Does this PR introduce _any_ user-facing change? No, perf only ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46465 from cloud-fan/cache. Authored-by: Wenchen Fan Signed-off-by: Kent Yao --- .../main/scala/org/apache/spark/sql/Dataset.scala | 3 +- .../apache/spark/sql/execution/CacheManager.scala | 160 + .../spark/sql/execution/QueryExecution.scala | 37 +++-- .../execution/command/AnalyzeColumnCommand.scala | 4 +- .../spark/sql/execution/command/CommandUtils.scala | 2 +- .../execution/datasources/v2/CacheTableExec.scala | 30 ++-- .../datasources/v2/DataSourceV2Strategy.scala | 2 +- .../apache/spark/sql/internal/CatalogImpl.scala| 5 +- .../org/apache/spark/sql/CachedTableSuite.scala| 2 +- .../org/apache/spark/sql/test/SQLTestUtils.scala | 3 +- .../apache/spark/sql/hive/CachedTableSuite.scala | 9 +- 11 files changed, 150 insertions(+), 107 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index 18c9704afdf8..3e843e64ebbf 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -3904,8 +3904,7 @@ class Dataset[T] private[sql]( * @since 1.6.0 */ def unpersist(blocking: Boolean): this.type = { -sparkSession.sharedState.cacheManager.uncacheQuery( - sparkSession, logicalPlan, cascade = false, blocking) +sparkSession.sharedState.cacheManager.uncacheQuery(this, cascade = false, blocking) this } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala index ae99873a9f77..b96f257e6b5b 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.{Dataset, SparkSession} import org.apache.spark.sql.catalyst.catalog.HiveTableRelation import org.apache.spark.sql.catalyst.expressions.{Attribute, SubqueryExpression} import 
org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint -import org.apache.spark.sql.catalyst.plans.logical.{IgnoreCachedData, LogicalPlan, ResolvedHint, SubqueryAlias, View} +import org.apache.spark.sql.catalyst.plans.logical.{IgnoreCachedData, LogicalPlan, ResolvedHint, View} import org.apache.spark.sql.catalyst.trees.TreePattern.PLAN_EXPRESSION import org.apache.spark.sql.catalyst.util.sideBySide import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper @@ -38,7 +38,10 @@ import org.apache.spark.storage.StorageLevel import org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK /** Holds a cached logical plan and its data */ -case class CachedData(plan: LogicalPlan, cachedRepresentation: InMemoryRelation) { +case class CachedData( +// A normalized resolved plan (See QueryExecution#normalized). +plan: LogicalPlan, +cachedRepresentation: InMemoryRelation) { override def toString: String = s""" |CachedData( @@ -53,7 +56,9 @@ case class CachedData(plan: LogicalPlan, cachedRepresentation: InMemoryRelation) * InMemoryRelation. This relation is automatically substituted query plans that return the * `sameResult` as the original
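A hedged illustration of why consistent normalization matters for cache hits (the plan shapes are hypothetical; only public API is used):

```scala
// Two formulations of the same query. The cache is keyed on (normalized)
// logical plans, so the second query can reuse the InMemoryRelation built
// by the first only if fill and lookup normalize consistently -- which is
// exactly what routing normalization through CacheManager guarantees.
val cached = spark.range(100).selectExpr("id + 1 AS x")
cached.cache()
cached.count() // materializes the cached relation

spark.range(100).selectExpr("id + 1 AS x").count() // expected cache hit
```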
(spark) branch master updated (d7f69e7003a3 -> 003823b39d35)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d7f69e7003a3 [SPARK-48190][PYTHON][PS][TESTS] Introduce a helper function to drop metadata add 003823b39d35 [SPARK-48191][SQL] Support UTF-32 for string encode and decode No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md| 2 +- .../spark/sql/catalyst/expressions/stringExpressions.scala | 10 +- .../sql/catalyst/expressions/StringExpressionsSuite.scala | 2 ++ .../sql-tests/analyzer-results/ansi/string-functions.sql.out | 7 +++ .../sql-tests/analyzer-results/string-functions.sql.out| 7 +++ .../src/test/resources/sql-tests/inputs/string-functions.sql | 1 + .../resources/sql-tests/results/ansi/string-functions.sql.out | 8 .../test/resources/sql-tests/results/string-functions.sql.out | 8 8 files changed, 39 insertions(+), 6 deletions(-)
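A small usage sketch of the newly supported charset (any Scala shell on a build containing this commit):

```scala
// UTF-32 now joins the charsets accepted by encode/decode; the round
// trip returns the original string.
spark.sql("SELECT decode(encode('Spark', 'UTF-32'), 'UTF-32') AS s").show()
```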
(spark) branch master updated (f5401bab23c0 -> fe8b18b776f5)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f5401bab23c0 [MINOR][INFRA] Rename builds to have consistent names add fe8b18b776f5 [SPARK-48185][SQL] Fix 'symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo' No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-47914][SQL] Do not display the splits parameter in Range
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5f883117203d [SPARK-47914][SQL] Do not display the splits parameter in Range 5f883117203d is described below commit 5f883117203d823cb9914f483e314633845ecaa5 Author: guihuawen AuthorDate: Wed May 8 12:04:35 2024 +0800 [SPARK-47914][SQL] Do not display the splits parameter in Range ### What changes were proposed in this pull request? [SQL] explain extended select * from range(0, 4); Before this PR, splits was displayed in the logical plans as None even when it was not set: == Parsed Logical Plan == 'Project [*] +- 'UnresolvedTableValuedFunction [range], [0, 4] == Analyzed Logical Plan == id: bigint Project [id#11L] +- Range (0, 4, step=1, splits=None) == Optimized Logical Plan == Range (0, 4, step=1, splits=None) == Physical Plan == *(1) Range (0, 4, step=1, splits=1) After this PR, splits is no longer displayed in the logical plans when it is not set, and it is still displayed when it is set: == Parsed Logical Plan == 'Project [*] +- 'UnresolvedTableValuedFunction [range], [0, 4] == Analyzed Logical Plan == id: bigint Project [id#11L] +- Range (0, 4, step=1) == Optimized Logical Plan == Range (0, 4, step=1) == Physical Plan == *(1) Range (0, 4, step=1, splits=1) ### Why are the changes needed? If splits is not set, displaying it in the logical plans as None is not user-friendly. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? Closes #46136 from guixiaowen/SPARK-47914. 
Authored-by: guihuawen Signed-off-by: Kent Yao --- .../plans/logical/basicLogicalOperators.scala | 3 +- .../sql-tests/analyzer-results/group-by.sql.out| 8 ++-- .../analyzer-results/identifier-clause.sql.out | 2 +- .../analyzer-results/join-lateral.sql.out | 2 +- .../sql-tests/analyzer-results/limit.sql.out | 2 +- .../named-function-arguments.sql.out | 4 +- .../analyzer-results/non-excludable-rule.sql.out | 12 +++--- .../postgreSQL/aggregates_part1.sql.out| 20 - .../analyzer-results/postgreSQL/int8.sql.out | 4 +- .../analyzer-results/postgreSQL/join.sql.out | 6 +-- .../analyzer-results/postgreSQL/numeric.sql.out| 10 ++--- .../analyzer-results/postgreSQL/text.sql.out | 2 +- .../analyzer-results/postgreSQL/union.sql.out | 50 +++--- .../postgreSQL/window_part1.sql.out| 4 +- .../postgreSQL/window_part2.sql.out| 20 - .../postgreSQL/window_part3.sql.out| 8 ++-- .../sql-compatibility-functions.sql.out| 2 +- .../analyzer-results/sql-session-variables.sql.out | 2 +- .../scalar-subquery-predicate.sql.out | 4 +- .../scalar-subquery/scalar-subquery-select.sql.out | 4 +- .../table-valued-functions.sql.out | 14 +++--- .../typeCoercion/native/concat.sql.out | 18 .../typeCoercion/native/elt.sql.out| 8 ++-- .../udf/postgreSQL/udf-aggregates_part1.sql.out| 20 - .../udf/postgreSQL/udf-join.sql.out| 2 +- .../analyzer-results/udf/udf-group-by.sql.out | 4 +- .../results/named-function-arguments.sql.out | 2 +- .../scala/org/apache/spark/sql/ExplainSuite.scala | 6 +-- 28 files changed, 122 insertions(+), 121 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index 4fd640afe3b2..9242a06cf1d6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -1071,7 +1071,8 @@ case class Range( override def newInstance(): Range = copy(output = output.map(_.newInstance())) override def simpleString(maxFields: Int): String = { -s"Range ($start, $end, step=$step, splits=$numSlices)" +val splits = if (numSlices.isDefined) { s", sp
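The diff above is cut off mid-expression; below is a hypothetical reconstruction of the new rendering logic, matching the described before/after behavior (not the verbatim patch):

```scala
// numSlices is an Option[Int]; the ", splits=N" suffix is emitted only
// when the caller actually set it.
override def simpleString(maxFields: Int): String = {
  val splits = numSlices.map(n => s", splits=$n").getOrElse("")
  s"Range ($start, $end, step=$step$splits)"
}
```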
(spark) branch master updated: [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3f15ad40640c [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer 3f15ad40640c is described below commit 3f15ad40640ce71764d1d00b8fae7d88df5e2194 Author: Stefan Bukorovic AuthorDate: Mon Apr 29 19:42:16 2024 +0800 [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer ### What changes were proposed in this pull request? In this PR I propose a change in QueryBuilder for SQLServer. It modifies the push down of predicates that filter on a column generated by a CASE WHEN construct, appending a simple ` = 1` comparison to the generated query so that it works on SQLServer. ### Why are the changes needed? SQLServer does not accept 0 or 1 as boolean values. In certain situations the Spark optimizer rewrites filters that contain CASE WHEN columns in a way that uses 1 or 0 as boolean values, which fails on the SQLServer side with the error "An expression of non-boolean type specified in a context where a condition is expected". With these changes the filters are pushed down differently and the error no longer occurs. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A new test case is added, which fails without these changes. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46231 from stefanbuk-db/SQLServer_case_when_bugfix. Authored-by: Stefan Bukorovic Signed-off-by: Kent Yao --- .../spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 13 + .../spark/sql/connector/util/V2ExpressionSQLBuilder.java| 2 +- .../org/apache/spark/sql/jdbc/MsSqlServerDialect.scala | 1 + 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala index f5f5d00d6bda..65f7579de820 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala @@ -131,4 +131,17 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JD "WHERE (dept > 1 AND ((name LIKE 'am%') = (name LIKE '%y')))") assert(df3.collect().length == 3) } + + test("SPARK-47994: SQLServer does not support 1 or 0 as boolean type in CASE WHEN filter") { +val df = sql( + s""" +|WITH tbl AS ( +|SELECT CASE +|WHEN e.dept = 1 THEN 'first' WHEN e.dept = 2 THEN 'second' ELSE 'third' END +|AS deptString FROM $catalogName.employee as e) +|SELECT * FROM tbl +|WHERE deptString = 'first' +|""".stripMargin) +assert(df.collect().length == 2) + } } diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java index 61d68d4a3e88..e42d9193ea39 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java @@ -356,7 +356,7 @@ public class V2ExpressionSQLBuilder { return joiner.toString(); } - private String[] 
expressionsToStringArray(Expression[] expressions) { + protected String[] expressionsToStringArray(Expression[] expressions) { String[] result = new String[expressions.length]; for (int i = 0; i < expressions.length; i++) { result[i] = build(expressions[i]); diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala index 5535545efba8..e341bf3720f4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala @@ -92,6 +92,7 @@ private case class MsSqlServerDialect() extends JdbcDialect { case o => inputToSQL(o) } visitBinaryComparison(e.name(), l, r) + case "CASE_WHEN" => visitCaseWhen(expressionsToStringArray(e.children())) + " = 1" case _ => super.build(expr) } case _ => super.build(expr)
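For illustration, a minimal sketch of the rewrite the dialect now performs; the SQL strings are hand-written rather than captured from Spark's query builder, and `employee`/`dept` come from the test above:

```scala
// Hand-written illustration (not actual dialect output) of why the dialect appends
// " = 1": SQL Server rejects a boolean-valued CASE WHEN standing alone as a WHERE
// condition, but accepts it once it participates in a comparison.
object CaseWhenFilterSketch {
  def main(args: Array[String]): Unit = {
    // Rejected with "An expression of non-boolean type specified in a context
    // where a condition is expected".
    val rejected = "SELECT * FROM employee WHERE CASE WHEN dept = 1 THEN 1 ELSE 0 END"
    // Accepted: the CASE WHEN is now the left side of a boolean comparison.
    val accepted = "SELECT * FROM employee WHERE CASE WHEN dept = 1 THEN 1 ELSE 0 END = 1"
    Seq(rejected, accepted).foreach(println)
  }
}
```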
(spark) branch branch-3.4 updated: [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new e2f34c75a6ea [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark e2f34c75a6ea is described below commit e2f34c75a6ea686eb6fa4260584bc32b558ce01f Author: Kent Yao AuthorDate: Mon Apr 29 11:40:39 2024 +0800 [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark ### What changes were proposed in this pull request? This PR fixes an NPE in MapStatusesSerDeserBenchmark. The cause is that we try to stop the tracker twice. ``` 3197java.lang.NullPointerException: Cannot invoke "org.apache.spark.rpc.RpcEndpointRef.askSync(Object, scala.reflect.ClassTag)" because the return value of "org.apache.spark.MapOutputTracker.trackerEndpoint()" is null 3198at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:541) 3199at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:551) 3200at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:1242) 3201at org.apache.spark.SparkEnv.stop(SparkEnv.scala:112) 3202at org.apache.spark.SparkContext.$anonfun$stop$25(SparkContext.scala:2354) 3203at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1294) 3204at org.apache.spark.SparkContext.stop(SparkContext.scala:2354) 3205at org.apache.spark.SparkContext.stop(SparkContext.scala:2259) 3206at org.apache.spark.MapStatusesSerDeserBenchmark$.afterAll(MapStatusesSerDeserBenchmark.scala:128) 3207at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:80) 3208at org.apache.spark.MapStatusesSerDeserBenchmark.main(MapStatusesSerDeserBenchmark.scala) 3209at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 3210at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 3211at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 3212at java.base/java.lang.reflect.Method.invoke(Method.java:568) 3213at org.apache.spark.benchmark.Benchmarks$.$anonfun$main$7(Benchmarks.scala:128) 3214at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323) 3215at org.apache.spark.benchmark.Benchmarks$.main(Benchmarks.scala:91) 3216at org.apache.spark.benchmark.Benchmarks.main(Benchmarks.scala) ``` ### Why are the changes needed? test bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually ### Was this patch authored or co-authored using generative AI tooling? no Closes #46270 from yaooqinn/SPARK-48034. 
Authored-by: Kent Yao Signed-off-by: Kent Yao (cherry picked from commit 59d5946cfd377e9203ccf572deb34f87fab7510c) Signed-off-by: Kent Yao --- core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala index 797b650799ea..795da65079d6 100644 --- a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala +++ b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala @@ -123,7 +123,6 @@ object MapStatusesSerDeserBenchmark extends BenchmarkBase { } override def afterAll(): Unit = { -tracker.stop() if (sc != null) { sc.stop() }
(spark) branch branch-3.5 updated: [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 616c2162242f [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark 616c2162242f is described below commit 616c2162242f99a3217caa0b7e4344e2979a5e54 Author: Kent Yao AuthorDate: Mon Apr 29 11:40:39 2024 +0800 [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark ### What changes were proposed in this pull request? This PR fixes an NPE in MapStatusesSerDeserBenchmark. The cause is that we try to stop the tracker twice. ``` 3197java.lang.NullPointerException: Cannot invoke "org.apache.spark.rpc.RpcEndpointRef.askSync(Object, scala.reflect.ClassTag)" because the return value of "org.apache.spark.MapOutputTracker.trackerEndpoint()" is null 3198at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:541) 3199at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:551) 3200at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:1242) 3201at org.apache.spark.SparkEnv.stop(SparkEnv.scala:112) 3202at org.apache.spark.SparkContext.$anonfun$stop$25(SparkContext.scala:2354) 3203at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1294) 3204at org.apache.spark.SparkContext.stop(SparkContext.scala:2354) 3205at org.apache.spark.SparkContext.stop(SparkContext.scala:2259) 3206at org.apache.spark.MapStatusesSerDeserBenchmark$.afterAll(MapStatusesSerDeserBenchmark.scala:128) 3207at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:80) 3208at org.apache.spark.MapStatusesSerDeserBenchmark.main(MapStatusesSerDeserBenchmark.scala) 3209at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 3210at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 3211at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 3212at java.base/java.lang.reflect.Method.invoke(Method.java:568) 3213at org.apache.spark.benchmark.Benchmarks$.$anonfun$main$7(Benchmarks.scala:128) 3214at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323) 3215at org.apache.spark.benchmark.Benchmarks$.main(Benchmarks.scala:91) 3216at org.apache.spark.benchmark.Benchmarks.main(Benchmarks.scala) ``` ### Why are the changes needed? test bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually ### Was this patch authored or co-authored using generative AI tooling? no Closes #46270 from yaooqinn/SPARK-48034. 
Authored-by: Kent Yao Signed-off-by: Kent Yao (cherry picked from commit 59d5946cfd377e9203ccf572deb34f87fab7510c) Signed-off-by: Kent Yao --- core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala index 797b650799ea..795da65079d6 100644 --- a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala +++ b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala @@ -123,7 +123,6 @@ object MapStatusesSerDeserBenchmark extends BenchmarkBase { } override def afterAll(): Unit = { -tracker.stop() if (sc != null) { sc.stop() }
(spark) branch master updated: [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 59d5946cfd37 [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark 59d5946cfd37 is described below commit 59d5946cfd377e9203ccf572deb34f87fab7510c Author: Kent Yao AuthorDate: Mon Apr 29 11:40:39 2024 +0800 [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark ### What changes were proposed in this pull request? This PR fixes an NPE in MapStatusesSerDeserBenchmark. The cause is that we try to stop the tracker twice. ``` 3197java.lang.NullPointerException: Cannot invoke "org.apache.spark.rpc.RpcEndpointRef.askSync(Object, scala.reflect.ClassTag)" because the return value of "org.apache.spark.MapOutputTracker.trackerEndpoint()" is null 3198at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:541) 3199at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:551) 3200at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:1242) 3201at org.apache.spark.SparkEnv.stop(SparkEnv.scala:112) 3202at org.apache.spark.SparkContext.$anonfun$stop$25(SparkContext.scala:2354) 3203at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1294) 3204at org.apache.spark.SparkContext.stop(SparkContext.scala:2354) 3205at org.apache.spark.SparkContext.stop(SparkContext.scala:2259) 3206at org.apache.spark.MapStatusesSerDeserBenchmark$.afterAll(MapStatusesSerDeserBenchmark.scala:128) 3207at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:80) 3208at org.apache.spark.MapStatusesSerDeserBenchmark.main(MapStatusesSerDeserBenchmark.scala) 3209at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 3210at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 3211at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 3212at java.base/java.lang.reflect.Method.invoke(Method.java:568) 3213at org.apache.spark.benchmark.Benchmarks$.$anonfun$main$7(Benchmarks.scala:128) 3214at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323) 3215at org.apache.spark.benchmark.Benchmarks$.main(Benchmarks.scala:91) 3216at org.apache.spark.benchmark.Benchmarks.main(Benchmarks.scala) ``` ### Why are the changes needed? test bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually ### Was this patch authored or co-authored using generative AI tooling? no Closes #46270 from yaooqinn/SPARK-48034. Authored-by: Kent Yao Signed-off-by: Kent Yao --- core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala index ca85ffda4e60..75f952d063d3 100644 --- a/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala +++ b/core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala @@ -123,7 +123,6 @@ object MapStatusesSerDeserBenchmark extends BenchmarkBase { } override def afterAll(): Unit = { -tracker.stop() if (sc != null) { sc.stop() }
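The shape of the bug, reduced to a standalone sketch; the names below are illustrative stand-ins, not Spark internals, but the sequence mirrors `afterAll`, where the explicit `tracker.stop()` was followed by `sc.stop()` stopping the tracker again:

```scala
// A component whose stop() clears its endpoint; a second stop() then dereferences
// null, which is the NullPointerException pattern in the trace above.
class TrackerSketch {
  private var endpoint: AnyRef = new Object
  def stop(): Unit = {
    endpoint.hashCode()  // NPEs on a second call, like askSync on a null trackerEndpoint
    endpoint = null
  }
}

object DoubleStopSketch {
  def main(args: Array[String]): Unit = {
    val tracker = new TrackerSketch
    tracker.stop()    // the explicit call the patch removes
    // sc.stop() would reach the same stop() again via SparkEnv.stop();
    // the fix leaves shutdown to SparkContext alone.
    // tracker.stop() // uncommenting this reproduces the failure
  }
}
```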
(spark) branch master updated: [SPARK-48025][SQL][TESTS] Fix org.apache.spark.sql.execution.benchmark.DateTimeBenchmark
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0bf39459e435 [SPARK-48025][SQL][TESTS] Fix org.apache.spark.sql.execution.benchmark.DateTimeBenchmark 0bf39459e435 is described below commit 0bf39459e4354c0881c1329ec550c357726a1761 Author: Kent Yao AuthorDate: Sun Apr 28 17:53:43 2024 +0800 [SPARK-48025][SQL][TESTS] Fix org.apache.spark.sql.execution.benchmark.DateTimeBenchmark ### What changes were proposed in this pull request? This PR fixes several issues in org.apache.spark.sql.execution.benchmark.DateTimeBenchmark - Misuse of the `trunc` function: its parameters were passed in reverse order - Some benchmarks were not compatible with ANSI mode, which is on by default ### Why are the changes needed? Restore the benchmark cases ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? benchmark ### Was this patch authored or co-authored using generative AI tooling? no Closes #46261 from yaooqinn/SPARK-48025. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../benchmarks/DateTimeBenchmark-jdk21-results.txt | 372 ++--- sql/core/benchmarks/DateTimeBenchmark-results.txt | 372 ++--- .../execution/benchmark/DateTimeBenchmark.scala | 14 +- 3 files changed, 382 insertions(+), 376 deletions(-) diff --git a/sql/core/benchmarks/DateTimeBenchmark-jdk21-results.txt b/sql/core/benchmarks/DateTimeBenchmark-jdk21-results.txt index 143f433a3160..4b2d34ba4915 100644 --- a/sql/core/benchmarks/DateTimeBenchmark-jdk21-results.txt +++ b/sql/core/benchmarks/DateTimeBenchmark-jdk21-results.txt @@ -2,460 +2,460 @@ datetime +/- interval -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor datetime +/- interval: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -date + interval(m) 850 887 33 11.8 85.0 1.0X -date + interval(m, d) 863 864 2 11.6 86.3 1.0X -date + interval(m, d, ms) 3507 3511 5 2.9 350.7 0.2X -date - interval(m) 841 851 9 11.9 84.1 1.0X -date - interval(m, d) 864 870 5 11.6 86.4 1.0X -date - interval(m, d, ms) 3518 3519 2 2.8 351.8 0.2X -timestamp + interval(m) 1756 1759 5 5.7 175.6 0.5X -timestamp + interval(m, d) 1802 1805 4 5.5 180.2 0.5X -timestamp + interval(m, d, ms) 1958 1961 4 5.1 195.8 0.4X -timestamp - interval(m) 1744 1745 2 5.7 174.4 0.5X -timestamp - interval(m, d) 1796 1799 4 5.6 179.6 0.5X -timestamp - interval(m, d, ms) 1944 1947 5 5.1 194.4 0.4X +date + interval(m) 1149 1158 12 8.7 114.9 1.0X +date + interval(m, d) 1136 1137 1 8.8 113.6 1.0X +date + interval(m, d, ms) 3779 3799 29 2.6 377.9 0.3X +date - interval(m) 1113 1116 4 9.0 111.3 1.0X +date - interval(m, d) 1124 1141 25 8.9 112.4 1.0X +date - interval(m, d, ms) 3795 3796 1 2.6 379.5 0.3X +timestamp + interval(m) 1528 1530 3 6.5 152.8 0.8X +timestamp + interval(m, d) 1581 1585 6 6.3 158.1 0.7X +timestamp + interval(m, d, ms) 2037 2044 10 4.9 203.7
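For reference, the corrected call shape as a hedged sketch (a local SparkSession and an arbitrary date literal are assumed): in Spark SQL, `trunc` takes the date first and the format second, so the reversed order degrades silently instead of measuring anything useful.

```scala
// A minimal sketch of the argument order that the benchmark had reversed.
import org.apache.spark.sql.SparkSession

object TruncArgOrderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("trunc-order").getOrCreate()
    // Correct order: trunc(date, fmt) truncates to the start of the month -> 2024-04-01.
    spark.sql("SELECT trunc(DATE'2024-04-28', 'MONTH')").show()
    // Reversed order -- trunc('MONTH', DATE'2024-04-28') -- tries to treat 'MONTH' as
    // the date, yielding NULL (or an error under ANSI mode) and skewing the benchmark.
    spark.stop()
  }
}
```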
(spark) branch master updated: [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2'
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 356830ada6c6 [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2' 356830ada6c6 is described below commit 356830ada6c6ebbf54e7852c37266c32bfa137ea Author: Ruifeng Zheng AuthorDate: Sat Apr 27 22:57:37 2024 +0800 [SPARK-48020][INFRA][PYTHON] Pin 'pandas==2.2.2' ### What changes were proposed in this pull request? 1. Pin 'pandas==2.2.2' for `pypy3.9` 2. Also change `pandas<=2.2.2` to 'pandas==2.2.2' to avoid unexpected version installation (e.g., for pypy3.8, `pandas<=2.2.2` actually installs version 2.0.3) ### Why are the changes needed? PyPy had been upgraded ### Does this PR introduce _any_ user-facing change? no, test only ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46256 from zhengruifeng/pip_pandas_version. Authored-by: Ruifeng Zheng Signed-off-by: Kent Yao --- dev/infra/Dockerfile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 870fb694045c..cdaa2f8b7c09 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -86,10 +86,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.9 && \ ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3.8 && \ ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3 -RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.0.3' scipy coverage matplotlib lxml +RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.2.2' scipy coverage matplotlib lxml -ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.2.2 scipy plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2" +ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas==2.2.2 scipy plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2" # Python deps for Spark Connect ARG CONNECT_PIP_PKGS="grpcio==1.62.0 grpcio-status==1.62.0 protobuf==4.25.1 googleapis-common-protos==1.56.4"
(spark) branch master updated: [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new beda1a4615c7 [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer beda1a4615c7 is described below commit beda1a4615c7f33110e360c150cb78832b0fe420 Author: Kent Yao AuthorDate: Fri Apr 26 23:51:58 2024 +0800 [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer ### What changes were proposed in this pull request? In https://github.com/apache/spark/pull/45564, predicate pushdown with boolean comparison syntax in MsSqlServer was disabled because MsSqlServer does not support such a feature. In this PR, we reenable the feature by converting the boolean comparison to an equivalent comparison over 1 and 0. ### Why are the changes needed? Avoid performance regressions ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing test ``` [info] MsSqlServerIntegrationSuite: [info] - SPARK-47440: SQLServer does not support boolean expression in binary comparison (2 seconds, 206 milliseconds) ``` ### Was this patch authored or co-authored using generative AI tooling? no Closes #46236 from yaooqinn/SPARK-47440. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java | 2 +- .../scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala | 9 ++--- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java index fd1b8f5dd1ee..61d68d4a3e88 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java @@ -212,7 +212,7 @@ public class V2ExpressionSQLBuilder { return l + " LIKE '%" + escapeSpecialCharsForLikePattern(value) + "%' ESCAPE '\\'"; } - private String inputToSQL(Expression input) { + protected String inputToSQL(Expression input) { if (input.children().length > 1) { return "(" + build(input) + ")"; } else { diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala index a1492d81bf53..5535545efba8 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala @@ -86,9 +86,12 @@ private case class MsSqlServerDialect() extends JdbcDialect { // We shouldn't propagate these queries to MsSqlServer expr match { case e: Predicate => e.name() match { - case "=" | "<>" | "<=>" | "<" | "<=" | ">" | ">=" - if e.children().exists(_.isInstanceOf[Predicate]) => -super.visitUnexpectedExpr(expr) + case "=" | "<>" | "<=>" | "<" | "<=" | ">" | ">=" => +val Array(l, r) = e.children().map { + case p: Predicate => s"CASE WHEN ${inputToSQL(p)} THEN 1 ELSE 0 END" + case o => inputToSQL(o) +} +visitBinaryComparison(e.name(), l, r) case _ => super.build(expr) } case _ => super.build(expr)
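A hand-written illustration of the conversion; the strings are not captured dialect output, and the `name` column comes from the integration tests earlier in this digest:

```scala
// Spark may push down a comparison whose operands are themselves predicates.
// SQL Server cannot compare booleans directly, so each predicate operand is
// wrapped in CASE WHEN ... THEN 1 ELSE 0 END before the comparison is emitted.
object BooleanComparisonSketch {
  def main(args: Array[String]): Unit = {
    val sparkPredicate = "(name LIKE 'am%') = (name LIKE '%y')"
    val sqlServerForm =
      "CASE WHEN name LIKE 'am%' THEN 1 ELSE 0 END = " +
        "CASE WHEN name LIKE '%y' THEN 1 ELSE 0 END"
    println(s"pushed by Spark:    $sparkPredicate")
    println(s"sent to SQL Server: $sqlServerForm")
  }
}
```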
(spark) branch master updated: [SPARK-47968][SQL] MsSQLServer: Map datetimeoffset to TimestampType
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 733e53a4ff03 [SPARK-47968][SQL] MsSQLServer: Map datetimeoffset to TimestampType 733e53a4ff03 is described below commit 733e53a4ff035b71a4865e1a88271af067d4765d Author: Kent Yao AuthorDate: Fri Apr 26 23:42:20 2024 +0800 [SPARK-47968][SQL] MsSQLServer: Map datetimeoffset to TimestampType ### What changes were proposed in this pull request? This PR changes the `datetimeoffset -> StringType` mapping to `datetimeoffset -> TimestampType` mapping as we use `mssql-jdbc` for Microsoft SQL Server. `spark.sql.legacy.mssqlserver.datetimeoffsetMapping.enabled` is provided for users to restore the old behavior. ### Why are the changes needed? With the official SQL Server client, it's more reasonable to read it as TimestampType, which is also much more consistent with other JDBC data sources ### Does this PR introduce _any_ user-facing change? Yes, (please refer to the first section) ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46239 from yaooqinn/SPARK-47968. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 59 +- docs/sql-data-sources-jdbc.md | 2 +- docs/sql-migration-guide.md | 1 + .../org/apache/spark/sql/internal/SQLConf.scala | 12 + .../apache/spark/sql/jdbc/MsSqlServerDialect.scala | 7 ++- 5 files changed, 55 insertions(+), 26 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala index a39dcb60406e..623f404339e9 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala @@ -223,29 +223,42 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite { test("Date types") { withDefaultTimeZone(UTC) { - { -val df = spark.read - .option("preferTimestampNTZ", "false") - .jdbc(jdbcUrl, "dates", new Properties) -checkAnswer(df, Row( - Date.valueOf("1991-11-09"), - Timestamp.valueOf("1999-01-01 13:23:35"), - Timestamp.valueOf("9999-12-31 23:59:59"), - "1901-05-09 23:59:59.000 +14:00", - Timestamp.valueOf("1996-01-01 23:24:00"), - Timestamp.valueOf("1970-01-01 13:31:24"))) - } - { -val df = spark.read - .option("preferTimestampNTZ", "true") - .jdbc(jdbcUrl, "dates", new Properties) -checkAnswer(df, Row( - Date.valueOf("1991-11-09"), - LocalDateTime.of(1999, 1, 1, 13, 23, 35), - LocalDateTime.of(9999, 12, 31, 23, 59, 59), - "1901-05-09 23:59:59.000 +14:00", - LocalDateTime.of(1996, 1, 1, 23, 24, 0), - LocalDateTime.of(1970, 1, 1, 13, 31, 24))) + Seq(true, false).foreach { ntz => +Seq(true, false).foreach { legacy => + withSQLConf( +SQLConf.LEGACY_MSSQLSERVER_DATETIMEOFFSET_MAPPING_ENABLED.key -> legacy.toString) { +val df = spark.read + .option("preferTimestampNTZ", ntz) + .jdbc(jdbcUrl, "dates", new Properties) +checkAnswer(df, Row( + Date.valueOf("1991-11-09"), + if (ntz) { +LocalDateTime.of(1999, 1, 1, 13, 23, 35) + } else { +Timestamp.valueOf("1999-01-01 13:23:35") + }, + if (ntz) { +LocalDateTime.of(9999, 12, 31, 23, 59, 59) + } else { +Timestamp.valueOf("9999-12-31 23:59:59") + }, + 
if (legacy) { +"1901-05-09 23:59:59.000 +14:00" + } else { +Timestamp.valueOf("1901-05-09 09:59:59") + }, + if (ntz) { +LocalDateTime.of(1996, 1, 1, 23, 24, 0) + } else { +Timestamp.valueOf("1996-01-01 23:24:00") + }, + if (ntz) { +LocalDateTime.of(1970, 1, 1, 13, 31, 24) + } else { +Timestamp.valueOf(&qu
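A short usage sketch of the two mappings; the JDBC URL is a placeholder, the `dates` table comes from the test above, and only the config key is taken from the patch itself:

```scala
// Reading a SQL Server table containing a datetimeoffset column. With the new
// default it arrives as TimestampType; flipping the legacy flag restores the
// old StringType mapping that preserves the original offset text.
import java.util.Properties
import org.apache.spark.sql.SparkSession

object DateTimeOffsetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("dto").getOrCreate()
    val url = "jdbc:sqlserver://host:1433;databaseName=db"  // placeholder URL
    spark.read.jdbc(url, "dates", new Properties).printSchema()  // timestamp column
    spark.conf.set("spark.sql.legacy.mssqlserver.datetimeoffsetMapping.enabled", "true")
    spark.read.jdbc(url, "dates", new Properties).printSchema()  // string column
    spark.stop()
  }
}
```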
(spark) branch master updated: [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b8b6d17ad8e4 [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write b8b6d17ad8e4 is described below commit b8b6d17ad8e472307fb4c03ca388efcc4ac7059e Author: ulysses-you AuthorDate: Fri Apr 26 18:32:18 2024 +0800 [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write ### What changes were proposed in this pull request? This PR adds a new trait `WriteFilesExecBase` for v1 writes, so that downstream projects can inherit `WriteFilesExecBase` rather than `WriteFilesExec`. The reason is that inheriting a `case class` is bad practice in the Scala world. ### Why are the changes needed? Make downstream projects easier to develop. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Pass CI ### Was this patch authored or co-authored using generative AI tooling? no Closes #46240 from ulysses-you/WriteFilesExecBase. Authored-by: ulysses-you Signed-off-by: Kent Yao --- .../spark/sql/execution/datasources/V1Writes.scala | 4 ++-- .../spark/sql/execution/datasources/WriteFiles.scala | 16 +--- .../apache/spark/sql/SparkSessionExtensionSuite.scala | 13 ++--- .../sql/execution/datasources/V1WriteCommandSuite.scala | 8 4 files changed, 21 insertions(+), 20 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala index d7a8d7aec0b7..1d6c2a6f8112 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala @@ -213,9 +213,9 @@ object V1WritesUtils { } } - def getWriteFilesOpt(child: SparkPlan): Option[WriteFilesExec] = { + def getWriteFilesOpt(child: SparkPlan): Option[WriteFilesExecBase] = { child.collectFirst { - case w: WriteFilesExec => w + case w: WriteFilesExecBase => w } } } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala index a4fd57e7dffa..c6c34b7fcea3 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala @@ -58,6 +58,14 @@ case class WriteFiles( copy(child = newChild) } +trait WriteFilesExecBase extends UnaryExecNode { + override def output: Seq[Attribute] = Seq.empty + + override protected def doExecute(): RDD[InternalRow] = { +throw SparkException.internalError(s"$nodeName does not support doExecute") + } +} + /** * Responsible for writing files.
*/ @@ -67,9 +75,7 @@ case class WriteFilesExec( partitionColumns: Seq[Attribute], bucketSpec: Option[BucketSpec], options: Map[String, String], -staticPartitions: TablePartitionSpec) extends UnaryExecNode { - override def output: Seq[Attribute] = Seq.empty - +staticPartitions: TablePartitionSpec) extends WriteFilesExecBase { override protected def doExecuteWrite( writeFilesSpec: WriteFilesSpec): RDD[WriterCommitMessage] = { val rdd = child.execute() @@ -105,10 +111,6 @@ case class WriteFilesExec( } } - override protected def doExecute(): RDD[InternalRow] = { -throw SparkException.internalError(s"$nodeName does not support doExecute") - } - override protected def stringArgs: Iterator[Any] = Iterator(child) override protected def withNewChildInternal(newChild: SparkPlan): WriteFilesExec = diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala index 1c44d0c3b4ea..4d38e360f438 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala @@ -40,7 +40,7 @@ import org.apache.spark.sql.connector.write.WriterCommitMessage import org.apache.spark.sql.execution._ import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanExec, AdaptiveSparkPlanHelper, AQEShuffleReadExec, QueryStageExec, ShuffleQueryStageExec} import org.apache.spark.sql.execution.aggregate.HashAggregateExec -import org.apache.spark.sql.execution.datasources.{FileFormat, WriteFilesExec, WriteFilesSpec} +import org.apache.spark.sql.execution.datasources.{FileFormat, WriteFilesExec, WriteFilesExecBase, WriteFilesSpec} import org.apache.sp
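To make the extension point concrete, a hedged sketch from a downstream project's perspective; `MyWriteFilesExec` and its body are invented for illustration, not taken from Spark:

```scala
// A custom v1-write physical node mixes in the new trait instead of subclassing
// the WriteFilesExec case class; output and the doExecute guard are inherited.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.connector.write.WriterCommitMessage
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.{WriteFilesExecBase, WriteFilesSpec}

case class MyWriteFilesExec(child: SparkPlan) extends WriteFilesExecBase {
  override protected def doExecuteWrite(spec: WriteFilesSpec): RDD[WriterCommitMessage] = {
    // a real implementation would drive its own writer here
    child.executeWrite(spec)
  }
  override protected def withNewChildInternal(newChild: SparkPlan): MyWriteFilesExec =
    copy(child = newChild)
}
```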
(spark) branch master updated: [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a7150074f0a5 [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` a7150074f0a5 is described below commit a7150074f0a5983374b7dcd7ea8710cdcd050cd6 Author: yangjie01 AuthorDate: Fri Apr 26 15:32:58 2024 +0800 [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` ### What changes were proposed in this pull request? The private implicit function `arrayToArrayWritable` was introduced alongside other implicit functions such as `rddToPairRDDExtras`, `rddToSequencePairRDDExtras`, `intToIntWritable`, etc., in the commit https://github.com/apache/spark/commit/38f2ba99cc32584f0645f3875f051516b4b738d2. Apart from `arrayToArrayWritable`, the other implicit functions were deprecated in SPARK-4397 and SPARK-4795, and subsequently removed in SPARK-12615. `arrayToArrayWritable` remained as a leftover private [...] ### Why are the changes needed? Clean up useless code. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46238 from LuciferYang/remove-arrayToArrayWritable. Authored-by: yangjie01 Signed-off-by: Kent Yao --- core/src/main/scala/org/apache/spark/SparkContext.scala | 11 +-- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index 58472a1b3a44..5e231544e249 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -28,14 +28,13 @@ import scala.collection.concurrent.{Map => ScalaConcurrentMap} import scala.collection.immutable import scala.collection.mutable.HashMap import scala.jdk.CollectionConverters._ -import scala.language.implicitConversions import scala.reflect.{classTag, ClassTag} import scala.util.control.NonFatal import com.google.common.collect.MapMaker import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{FileSystem, Path} -import org.apache.hadoop.io.{ArrayWritable, BooleanWritable, BytesWritable, DoubleWritable, FloatWritable, IntWritable, LongWritable, NullWritable, Text, Writable} +import org.apache.hadoop.io.{BooleanWritable, BytesWritable, DoubleWritable, FloatWritable, IntWritable, LongWritable, NullWritable, Text, Writable} import org.apache.hadoop.mapred.{FileInputFormat, InputFormat, JobConf, SequenceFileInputFormat, TextInputFormat} import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat, Job => NewHadoopJob} import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat => NewFileInputFormat} @@ -3046,14 +3045,6 @@ object SparkContext extends Logging { } } - private implicit def arrayToArrayWritable[T <: Writable : ClassTag](arr: Iterable[T]) -: ArrayWritable = { -def anyToWritable[U <: Writable](u: U): Writable = u - -new ArrayWritable(classTag[T].runtimeClass.asInstanceOf[Class[Writable]], -arr.map(x => anyToWritable(x)).toArray) - } - /** * Find the JAR from which a given class was loaded, to make it easy for users to pass * their JARs to SparkContext.
(spark) branch master updated: [SPARK-47982][BUILD] Update some code style plugins to their latest versions
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a84cffd8b3da [SPARK-47982][BUILD] Update some code style plugins to their latest versions a84cffd8b3da is described below commit a84cffd8b3dac777350a78896794ca726e91b080 Author: panbingkun AuthorDate: Thu Apr 25 16:12:14 2024 +0800 [SPARK-47982][BUILD] Update some code style plugins to their latest versions ### What changes were proposed in this pull request? The PR aims to update some code style plugins to their latest versions, including: - `mvn-scalafmt_2.13` from `1.1.1684076452.9f83818` to `1.1.1713302731.c3d0074`. - `checkstyle` from `10.14.0` to `10.15.0`. - `scalafmt` from `3.8.0` to `3.8.1`. ### Why are the changes needed? 1.`mvn-scalafmt_2.13` https://github.com/SimonJPegg/mvn_scalafmt/releases/tag/v2.13-1.1.1713302731.c3d0074 Minor version bumps and dropping 2.12 2.`checkstyle` https://checkstyle.org/releasenotes.html#Release_10.15.0 https://checkstyle.org/releasenotes.html#Release_10.14.2 https://checkstyle.org/releasenotes.html#Release_10.14.1 3.`scalafmt` https://github.com/scalameta/scalafmt/releases/tag/v3.8.1 https://github.com/apache/spark/assets/15246973/657fb99a-2267-4d4f-84bb-1d2006818fdd ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46216 from panbingkun/SPARK-47982. Authored-by: panbingkun Signed-off-by: Kent Yao --- dev/.scalafmt.conf | 2 +- pom.xml | 4 ++-- project/plugins.sbt | 6 +++--- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/dev/.scalafmt.conf b/dev/.scalafmt.conf index 9a01136dfaf8..43be4717c9ab 100644 --- a/dev/.scalafmt.conf +++ b/dev/.scalafmt.conf @@ -27,4 +27,4 @@ danglingParentheses.preset = false docstrings.style = Asterisk maxColumn = 98 runner.dialect = scala213 -version = 3.8.0 +version = 3.8.1 diff --git a/pom.xml b/pom.xml index 338b4050e0f8..c98514efa356 100644 --- a/pom.xml +++ b/pom.xml @@ -3548,7 +3548,7 @@ --> com.puppycrawl.tools checkstyle -10.14.0 +10.15.0 @@ -3610,7 +3610,7 @@ org.antipathy mvn-scalafmt_${scala.binary.version} -1.1.1684076452.9f83818 +1.1.1713302731.c3d0074 ${scalafmt.validateOnly} ${scalafmt.skip} diff --git a/project/plugins.sbt b/project/plugins.sbt index 8f422ca07cbb..44b357d95eb9 100644 --- a/project/plugins.sbt +++ b/project/plugins.sbt @@ -20,10 +20,10 @@ addSbtPlugin("software.purpledragon" % "sbt-checkstyle-plugin" % "4.0.1") // sbt-checkstyle-plugin uses an old version of checkstyle. Match it to Maven's. // If you are changing the dependency setting for checkstyle plugin, // please check pom.xml in the root of the source tree too. -libraryDependencies += "com.puppycrawl.tools" % "checkstyle" % "10.14.0" +libraryDependencies += "com.puppycrawl.tools" % "checkstyle" % "10.15.0" // checkstyle uses guava 31.0.1-jre. -libraryDependencies += "com.google.guava" % "guava" % "31.0.1-jre" +// checkstyle uses guava 33.1.0-jre. +libraryDependencies += "com.google.guava" % "guava" % "33.1.0-jre" addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.2.0")
(spark) branch master updated: [SPARK-47984][ML][SQL] Change `MetricsAggregate/V2Aggregator#serialize/deserialize` to call `SparkSerDeUtils#serialize/deserialize`
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5f730c84abd7 [SPARK-47984][ML][SQL] Change `MetricsAggregate/V2Aggregator#serialize/deserialize` to call `SparkSerDeUtils#serialize/deserialize` 5f730c84abd7 is described below commit 5f730c84abd789360157585ba623537c23b08f78 Author: yangjie01 AuthorDate: Thu Apr 25 16:05:41 2024 +0800 [SPARK-47984][ML][SQL] Change `MetricsAggregate/V2Aggregator#serialize/deserialize` to call `SparkSerDeUtils#serialize/deserialize` ### What changes were proposed in this pull request? The utility methods `serialize` and `deserialize` exist in `SparkSerDeUtils`: https://github.com/apache/spark/blob/08caa567fb29e762f3f7f9d94cd42c02f1e47247/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala#L23-L36 This PR changes the implementation of `serialize/deserialize` methods in `MetricsAggregate` and `V2Aggregator` to call the `serialize/deserialize` methods in `SparkSerDeUtils` to eliminate duplicate code. ### Why are the changes needed? Eliminate duplicate code. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46218 from LuciferYang/utils-sede. Authored-by: yangjie01 Signed-off-by: Kent Yao --- .../src/main/scala/org/apache/spark/ml/stat/Summarizer.scala | 10 +++--- .../sql/catalyst/expressions/aggregate/V2Aggregator.scala| 12 +++- 2 files changed, 6 insertions(+), 16 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala b/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala index 4697bfbe4b09..7a27b32aa24c 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala @@ -31,6 +31,7 @@ import org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggreg import org.apache.spark.sql.catalyst.trees.BinaryLike import org.apache.spark.sql.functions.lit import org.apache.spark.sql.types._ +import org.apache.spark.util.Utils /** * A builder object that provides summary statistics about a given column. 
@@ -397,17 +398,12 @@ private[spark] object SummaryBuilderImpl extends Logging { override def serialize(state: SummarizerBuffer): Array[Byte] = { // TODO: Use ByteBuffer to optimize - val bos = new ByteArrayOutputStream() - val oos = new ObjectOutputStream(bos) - oos.writeObject(state) - bos.toByteArray + Utils.serialize(state) } override def deserialize(bytes: Array[Byte]): SummarizerBuffer = { // TODO: Use ByteBuffer to optimize - val bis = new ByteArrayInputStream(bytes) - val ois = new ObjectInputStream(bis) - ois.readObject().asInstanceOf[SummarizerBuffer] + Utils.deserialize(bytes) } override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): MetricsAggregate = { diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/V2Aggregator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/V2Aggregator.scala index bb94421bc7d4..49ba2ec8b904 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/V2Aggregator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/V2Aggregator.scala @@ -17,13 +17,12 @@ package org.apache.spark.sql.catalyst.expressions.aggregate -import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream} - import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions.{Expression, ImplicitCastInputTypes, UnsafeProjection} import org.apache.spark.sql.connector.catalog.functions.{AggregateFunction => V2AggregateFunction} import org.apache.spark.sql.types.{AbstractDataType, DataType} import org.apache.spark.util.ArrayImplicits._ +import org.apache.spark.util.Utils case class V2Aggregator[BUF <: java.io.Serializable, OUT]( aggrFunc: V2AggregateFunction[BUF, OUT], @@ -50,16 +49,11 @@ case class V2Aggregator[BUF <: java.io.Serializable, OUT]( } override def serialize(buffer: BUF): Array[Byte] = { -val bos = new ByteArrayOutputStream() -val out = new ObjectOutputStream(bos) -out.writeObject(buffer) -out.close() -bos.toByteArray +Utils.serialize(buffer) } override def deserialize(bytes: Array[Byte]): BUF = { -val in = new ObjectInputStream(new ByteArrayInputStream(bytes)) -in.readObject().asInstanceOf[BUF] +Utils.deserialize(bytes) }
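A small round-trip with the shared helpers the patch switches to, assuming `spark-core` on the classpath; the payload is arbitrary:

```scala
// Both MetricsAggregate and V2Aggregator now delegate to these helpers
// (exposed via org.apache.spark.util.Utils), which wrap plain Java serialization.
import org.apache.spark.util.Utils

object SerDeRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val original = Map("a" -> 1, "b" -> 2)
    val bytes: Array[Byte] = Utils.serialize(original)         // Java serialization
    val restored = Utils.deserialize[Map[String, Int]](bytes)  // symmetric read-back
    assert(restored == original)
  }
}
```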
(spark) branch master updated: [SPARK-47981][BUILD] Upgrade `Arrow` to 16.0.0
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7090bc1f43fd [SPARK-47981][BUILD] Upgrade `Arrow` to 16.0.0 7090bc1f43fd is described below commit 7090bc1f43fd5ae5e67214f84de0276ee6e5df79 Author: sychen AuthorDate: Thu Apr 25 16:02:01 2024 +0800 [SPARK-47981][BUILD] Upgrade `Arrow` to 16.0.0 ### What changes were proposed in this pull request? The pr aims to upgrade `Arrow` from `15.0.2` to `16.0.0`. ### Why are the changes needed? https://arrow.apache.org/release/16.0.0.html SPARK-46718 and SPARK-47531 upgraded the arrow version from 14 to 15, and 15 introduced the `eclipse-collections` dependency. https://github.com/apache/arrow/issues/40896 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? GA ### Was this patch authored or co-authored using generative AI tooling? No Closes #46214 from cxzl25/SPARK-47981. Authored-by: sychen Signed-off-by: Kent Yao --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 12 ++-- pom.xml | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index c1adff73d339..f6adb6d18b85 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -16,10 +16,11 @@ antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar aopalliance-repackaged/3.0.3//aopalliance-repackaged-3.0.3.jar arpack/3.0.3//arpack-3.0.3.jar arpack_combined_all/0.1//arpack_combined_all-0.1.jar -arrow-format/15.0.2//arrow-format-15.0.2.jar -arrow-memory-core/15.0.2//arrow-memory-core-15.0.2.jar -arrow-memory-netty/15.0.2//arrow-memory-netty-15.0.2.jar -arrow-vector/15.0.2//arrow-vector-15.0.2.jar +arrow-format/16.0.0//arrow-format-16.0.0.jar +arrow-memory-core/16.0.0//arrow-memory-core-16.0.0.jar +arrow-memory-netty-buffer-patch/16.0.0//arrow-memory-netty-buffer-patch-16.0.0.jar +arrow-memory-netty/16.0.0//arrow-memory-netty-16.0.0.jar +arrow-vector/16.0.0//arrow-vector-16.0.0.jar audience-annotations/0.12.0//audience-annotations-0.12.0.jar avro-ipc/1.11.3//avro-ipc-1.11.3.jar avro-mapred/1.11.3//avro-mapred-1.11.3.jar @@ -33,6 +34,7 @@ breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar bundle/2.24.6//bundle-2.24.6.jar cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar +checker-qual/3.42.0//checker-qual-3.42.0.jar chill-java/0.10.0//chill-java-0.10.0.jar chill_2.13/0.10.0//chill_2.13-0.10.0.jar commons-cli/1.6.0//commons-cli-1.6.0.jar @@ -62,8 +64,6 @@ derby/10.16.1.1//derby-10.16.1.1.jar derbyshared/10.16.1.1//derbyshared-10.16.1.1.jar derbytools/10.16.1.1//derbytools-10.16.1.1.jar dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar -eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar -eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar esdk-obs-java/3.20.4.2//esdk-obs-java-3.20.4.2.jar flatbuffers-java/23.5.26//flatbuffers-java-23.5.26.jar gcs-connector/hadoop3-2.2.21/shaded/gcs-connector-hadoop3-2.2.21-shaded.jar diff --git a/pom.xml b/pom.xml index 03c6b757ab02..338b4050e0f8 100644 --- a/pom.xml +++ b/pom.xml @@ -228,7 +228,7 @@ ./python/pyspark/sql/pandas/utils.py, ./python/packaging/classic/setup.py and ./python/packaging/connect/setup.py too. 
--> -15.0.2 +16.0.0 3.0.0-M1
(spark) branch master updated: [SPARK-47983][SQL] Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to internal
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a066d0c17853 [SPARK-47983][SQL] Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to internal a066d0c17853 is described below commit a066d0c178535489f461abebae9f84abbdc04891 Author: Kent Yao AuthorDate: Thu Apr 25 15:59:03 2024 +0800 [SPARK-47983][SQL] Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to internal ### What changes were proposed in this pull request? Make `spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled` internal, like other legacy configurations ### Why are the changes needed? Legacy configurations are not exposed to end users, except for this one. ### Does this PR introduce _any_ user-facing change? doc change only ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46217 from yaooqinn/SPARK-47983. Authored-by: Kent Yao Signed-off-by: Kent Yao --- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 1 + 1 file changed, 1 insertion(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 974810133859..f49e48dd3fa0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -4535,6 +4535,7 @@ object SQLConf { val LEGACY_INFER_ARRAY_TYPE_FROM_FIRST_ELEMENT = buildConf("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled") + .internal() .doc("PySpark's SparkSession.createDataFrame infers the element type of an array from all " + "values in the array by default. If this config is set to true, it restores the legacy " + "behavior of only inferring the type from the first array element.")
(spark) branch master updated: [SPARK-47980][SQL][TESTS] Reactivate test 'Empty float/double array columns raise EOFException'
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 08caa567fb29 [SPARK-47980][SQL][TESTS] Reactivate test 'Empty float/double array columns raise EOFException' 08caa567fb29 is described below commit 08caa567fb29e762f3f7f9d94cd42c02f1e47247 Author: Kent Yao AuthorDate: Thu Apr 25 14:16:29 2024 +0800 [SPARK-47980][SQL][TESTS] Reactivate test 'Empty float/double array columns raise EOFException' ### What changes were proposed in this pull request? [SPARK-29462](https://issues.apache.org/jira/browse/SPARK-29462) has been resolved, so let's re-enable this test ### Why are the changes needed? test coverage ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? this test ### Was this patch authored or co-authored using generative AI tooling? no Closes #46213 from yaooqinn/SPARK-47980. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala index 610fc246cd84..284717739a81 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala @@ -207,10 +207,7 @@ class HiveOrcQuerySuite extends OrcQueryTest with TestHiveSingleton { } } - // SPARK-28885 String value is not allowed to be stored as numeric type with - // ANSI store assignment policy. - // TODO: re-enable the test case when SPARK-29462 is fixed. - ignore("SPARK-23340 Empty float/double array columns raise EOFException") { + test("SPARK-23340 Empty float/double array columns raise EOFException") { withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "false") { withTable("spark_23340") { sql("CREATE TABLE spark_23340(a array<float>, b array<double>) STORED AS ORC")
(spark) branch master updated (0fcced63be99 -> d23389252a7d)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 0fcced63be99 [SPARK-47979][SQL][TESTS] Use Hive tables explicitly for Hive table capability tests add d23389252a7d [SPARK-47967][SQL] Make `JdbcUtils.makeGetter` handle reading time type as NTZ correctly No new revisions were added by this update. Summary of changes: .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 43 +- .../sql/execution/datasources/jdbc/JdbcUtils.scala | 8 +++- 2 files changed, 32 insertions(+), 19 deletions(-)
(spark) branch master updated (d1298e73a8d5 -> e74221e6525e)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d1298e73a8d5 [SPARK-47931][SQL] Remove unused and leaked threadlocal/session sessionHive add e74221e6525e [SPARK-47945][SQL] MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests No new revisions were added by this update. Summary of changes: .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 37 docs/sql-data-sources-jdbc.md | 189 + 2 files changed, 226 insertions(+)
(spark) branch master updated (885e98ecbe64 -> d1298e73a8d5)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 885e98ecbe64 [SPARK-47412][SQL] Add Collation Support for LPad/RPad add d1298e73a8d5 [SPARK-47931][SQL] Remove unused and leaked threadlocal/session sessionHive No new revisions were added by this update. Summary of changes: .../service/cli/session/HiveSessionImplwithUGI.java | 21 - 1 file changed, 21 deletions(-)
(spark) branch master updated: [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2bf43460b923 [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException 2bf43460b923 is described below commit 2bf43460b923c95fb8debc7f0421d9a9e10531b0 Author: Cheng Pan AuthorDate: Fri Apr 19 15:57:17 2024 +0800 [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException ### What changes were proposed in this pull request? SPARK-29089 parallelized `checkAndGlobPathIfNecessary` by leveraging ForkJoinPool, but it also introduced a side effect: if something goes wrong, the reported error message loses the caller-side stack trace. For example, I met the following error on a Spark job; without the caller stack trace, I have no idea what happened. ``` 2024-04-12 14:31:21 CST ApplicationMaster INFO - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://xyz-cluster/user/abc/hive_db/tmp.db/tmp_lskkh_1 at org.apache.spark.sql.errors.QueryCompilationErrors$.dataPathNotExistError(QueryCompilationErrors.scala:1011) at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:785) at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:782) at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372) at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) at scala.util.Success.$anonfun$map$1(Try.scala:255) at scala.util.Success.map(Try.scala:213) at scala.concurrent.Future.$anonfun$map$1(Future.scala:292) at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) ) ``` ### Why are the changes needed? Improve error message. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? A new UT is added, and the exception stacktrace differences are shown below: raw stacktrace ``` java.lang.RuntimeException: Error occurred on Thread-9 at org.apache.spark.util.ThreadUtilsSuite$$anon$3.internalMethod(ThreadUtilsSuite.scala:141) at org.apache.spark.util.ThreadUtilsSuite$$anon$3.run(ThreadUtilsSuite.scala:138) ``` enhanced exception stacktrace ``` java.lang.RuntimeException: Error occurred on Thread-9 at org.apache.spark.util.ThreadUtilsSuite$$anon$3.internalMethod(ThreadUtilsSuite.scala:141) at org.apache.spark.util.ThreadUtilsSuite$$anon$3.run(ThreadUtilsSuite.scala:138) at ... run in separate thread: Thread-9 ... 
() at org.apache.spark.util.ThreadUtilsSuite.$anonfun$new$16(ThreadUtilsSuite.scala:151) at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) at org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) (... other scalatest callsites) ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #46028 from pan3793/SPARK-47833. Authored-by: Cheng Pan Signed-off-by: Kent Yao --- .../scala/org/apache/spark/util/ThreadUtils.scala | 62 ++ .../org/apache/spark/util/ThreadUtilsSuite.scala | 39 +- .../sql/execution/datasources/DataSource.scala | 4 +- 3 files changed, 80 insertions(+), 25 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
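The core technique, reduced to a standalone hedged sketch; Spark's actual `ThreadUtils` change differs in detail, but the idea is the same: record the caller's frames before handing work to another thread, then graft them onto any exception that crosses back.

```scala
// Run `body` on a worker thread; if it throws, append the caller-side frames so
// the surfaced stack trace shows both where the failure happened and who started it.
object CallerStackSketch {
  def runInThread[T](body: => T): T = {
    val callerStack = Thread.currentThread().getStackTrace
    var outcome: Either[Throwable, T] = Left(new IllegalStateException("no result"))
    val worker = new Thread(() => {
      outcome = try Right(body) catch { case e: Throwable => Left(e) }
    })
    worker.start()
    worker.join()  // join() also gives us visibility of `outcome`
    outcome match {
      case Right(value) => value
      case Left(e) =>
        e.setStackTrace(e.getStackTrace ++ callerStack)  // graft caller frames
        throw e
    }
  }

  def main(args: Array[String]): Unit =
    try runInThread(throw new RuntimeException("boom"))
    catch { case e: RuntimeException => e.printStackTrace() }
}
```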
(spark) branch branch-3.4 updated: [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new bcaf61b975d6 [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12
bcaf61b975d6 is described below

commit bcaf61b975d6e24222e483597eb2232aff822a98
Author: Zhen Wang <643348...@qq.com>
AuthorDate: Fri Apr 19 10:53:16 2024 +0800

    [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12

    ### What changes were proposed in this pull request?

    Fix the `ExpressionSet` performance regression in Scala 2.12.

    ### Why are the changes needed?

    In Scala 2.12, `SetLike.++` is implemented by iteratively invoking the `+` method. `ExpressionSet.+` first clones a new object and then adds the element, which is very expensive.

    https://github.com/scala/scala/blob/ceaf7e68ac93e9bbe8642d06164714b2de709c27/src/library/scala/collection/SetLike.scala#L186

    After https://github.com/apache/spark/pull/36121, the `++` and `--` overrides in the Scala 2.12 `ExpressionSet` were removed, causing the performance regression.

    ### Does this PR introduce _any_ user-facing change?

    ### How was this patch tested?

    Benchmark code:

    ```
    object TestBenchmark {
      def main(args: Array[String]): Unit = {
        val count = 300
        val benchmark = new Benchmark("Test ExpressionSetV2 ++ ", count)
        val aUpper = AttributeReference("A", IntegerType)(exprId = ExprId(1))
        var initialSet = ExpressionSet((0 until 300).map(i => aUpper + i))
        val setToAddWithSameDeterministicExpression = ExpressionSet((0 until 300).map(i => aUpper + i))

        benchmark.addCase("Test ++", 10) { _: Int =>
          for (_ <- 0L until count) {
            initialSet ++= setToAddWithSameDeterministicExpression
          }
        }
        benchmark.run()
      }
    }
    ```

    Before this change:

    ```
    OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64
    Intel Core Processor (Skylake, IBRS)
    Test ExpressionSetV2 ++ :    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    Test ++                               1577           1691          61         0.0     5255516.0       1.0X
    ```

    After this change:

    ```
    OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64
    Intel Core Processor (Skylake, IBRS)
    Test ExpressionSetV2 ++ :    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    Test ++                                 14             14           0         0.0       45395.2       1.0X
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No

Closes #46114 from wForget/SPARK-47897.

Authored-by: Zhen Wang <643348...@qq.com>
Signed-off-by: Kent Yao
(cherry picked from commit afd99d19a2b85dda2245d3557506d1090187c5f4)
Signed-off-by: Kent Yao
---
 .../sql/catalyst/expressions/ExpressionSet.scala | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala b/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
index 3e545f745bae..c18679330f3a 100644
--- a/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
+++ b/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.sql.catalyst.expressions
 
-import scala.collection.mutable
+import scala.collection.{mutable, GenTraversableOnce}
 import scala.collection.mutable.ArrayBuffer
 
 object ExpressionSet {
@@ -108,12 +108,31 @@ class ExpressionSet protected(
     newSet
   }
 
+  /**
+   * SPARK-47897: In Scala 2.12, the `SetLike.++` method iteratively calls the `+` method.
+   * `ExpressionSet.+` is expensive, so we override `++`.
+   */
+  override def ++(elems: GenTraversableOnce[Expression]): ExpressionSet = {
+    val newSet = clone()
+    elems.foreach(newSet.add)
+    newSet
+  }
+
   override def -(elem: Expression): ExpressionSet = {
     val newSet = clone()
     newSet.remove(elem)
     newSet
   }
 
+  /**
+   * SPARK-47897: We need to override `--` like `++`.
+   */
+  override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = {
+    val newSet = clone()
+    elems.foreach(newSet.remove)
+    newSet
+  }
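To see why the removed overrides matter, here is a self-contained sketch — a hypothetical `CloningSet`, not Spark's `ExpressionSet` — contrasting bulk insertion through repeated `+` (each call clones the backing set, which is what Scala 2.12's default `SetLike.++` does) with a `++` override that clones once and then mutates the copy:

```
import scala.collection.mutable

// Hypothetical CloningSet (not Spark's ExpressionSet): `+` clones the whole
// backing set before adding one element, so building a set of n elements via
// repeated `+` copies O(n^2) entries in total, while an overridden `++`
// clones once and adds in place, copying only O(n) entries.
class CloningSet[A] private (private val underlying: mutable.HashSet[A]) {
  def this() = this(mutable.HashSet.empty[A])

  // Expensive: one full copy of the backing set per added element.
  def +(elem: A): CloningSet[A] = {
    val newSet = new CloningSet(underlying.clone())
    newSet.underlying += elem
    newSet
  }

  // The SPARK-47897 fix in spirit: clone once, then mutate the copy.
  def ++(elems: Iterable[A]): CloningSet[A] = {
    val newSet = new CloningSet(underlying.clone())
    elems.foreach(newSet.underlying += _)
    newSet
  }

  def size: Int = underlying.size
}

object CloningSetDemo {
  def main(args: Array[String]): Unit = {
    val n = 20000
    var viaPlus = new CloningSet[Int]
    val t0 = System.nanoTime()
    (0 until n).foreach(i => viaPlus = viaPlus + i)        // n clones
    val t1 = System.nanoTime()
    val viaConcat = (new CloningSet[Int]) ++ (0 until n)   // one clone
    val t2 = System.nanoTime()
    assert(viaPlus.size == viaConcat.size)
    println(s"repeated +: ${(t1 - t0) / 1e6} ms, bulk ++: ${(t2 - t1) / 1e6} ms")
  }
}
```

On a typical JVM the repeated-`+` loop is orders of magnitude slower than the bulk `++`, consistent with the 1577 ms vs 14 ms benchmark numbers reported above.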
(spark) branch branch-3.5 updated: [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new afd99d19a2b8 [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12
afd99d19a2b8 is described below

commit afd99d19a2b85dda2245d3557506d1090187c5f4
Author: Zhen Wang <643348...@qq.com>
AuthorDate: Fri Apr 19 10:53:16 2024 +0800

    [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12

    ### What changes were proposed in this pull request?

    Fix the `ExpressionSet` performance regression in Scala 2.12.

    ### Why are the changes needed?

    In Scala 2.12, `SetLike.++` is implemented by iteratively invoking the `+` method. `ExpressionSet.+` first clones a new object and then adds the element, which is very expensive.

    https://github.com/scala/scala/blob/ceaf7e68ac93e9bbe8642d06164714b2de709c27/src/library/scala/collection/SetLike.scala#L186

    After https://github.com/apache/spark/pull/36121, the `++` and `--` overrides in the Scala 2.12 `ExpressionSet` were removed, causing the performance regression.

    ### Does this PR introduce _any_ user-facing change?

    ### How was this patch tested?

    Benchmark code:

    ```
    object TestBenchmark {
      def main(args: Array[String]): Unit = {
        val count = 300
        val benchmark = new Benchmark("Test ExpressionSetV2 ++ ", count)
        val aUpper = AttributeReference("A", IntegerType)(exprId = ExprId(1))
        var initialSet = ExpressionSet((0 until 300).map(i => aUpper + i))
        val setToAddWithSameDeterministicExpression = ExpressionSet((0 until 300).map(i => aUpper + i))

        benchmark.addCase("Test ++", 10) { _: Int =>
          for (_ <- 0L until count) {
            initialSet ++= setToAddWithSameDeterministicExpression
          }
        }
        benchmark.run()
      }
    }
    ```

    Before this change:

    ```
    OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64
    Intel Core Processor (Skylake, IBRS)
    Test ExpressionSetV2 ++ :    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    Test ++                               1577           1691          61         0.0     5255516.0       1.0X
    ```

    After this change:

    ```
    OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64
    Intel Core Processor (Skylake, IBRS)
    Test ExpressionSetV2 ++ :    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    Test ++                                 14             14           0         0.0       45395.2       1.0X
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No

Closes #46114 from wForget/SPARK-47897.

Authored-by: Zhen Wang <643348...@qq.com>
Signed-off-by: Kent Yao
---
 .../sql/catalyst/expressions/ExpressionSet.scala | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala b/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
index 3e545f745bae..c18679330f3a 100644
--- a/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
+++ b/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.sql.catalyst.expressions
 
-import scala.collection.mutable
+import scala.collection.{mutable, GenTraversableOnce}
 import scala.collection.mutable.ArrayBuffer
 
 object ExpressionSet {
@@ -108,12 +108,31 @@ class ExpressionSet protected(
     newSet
   }
 
+  /**
+   * SPARK-47897: In Scala 2.12, the `SetLike.++` method iteratively calls the `+` method.
+   * `ExpressionSet.+` is expensive, so we override `++`.
+   */
+  override def ++(elems: GenTraversableOnce[Expression]): ExpressionSet = {
+    val newSet = clone()
+    elems.foreach(newSet.add)
+    newSet
+  }
+
   override def -(elem: Expression): ExpressionSet = {
     val newSet = clone()
     newSet.remove(elem)
     newSet
   }
 
+  /**
+   * SPARK-47897: We need to override `--` like `++`.
+   */
+  override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = {
+    val newSet = clone()
+    elems.foreach(newSet.remove)
+    newSet
+  }
+
(spark) branch master updated (268856da31c1 -> 2054ab0fb03f)
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 268856da31c1 [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping
     add 2054ab0fb03f [SPARK-47880][SQL][DOCS] Oracle: Document Mapping Spark SQL Data Types to Oracle

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-jdbc.md | 106 ++
 1 file changed, 106 insertions(+)