[spark] branch branch-2.4 updated: [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 90db0ab  [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing
90db0ab is described below

commit 90db0ab9e3adb65d0df5bebf45b9822327d1
Author: Prashant Sharma
AuthorDate: Wed Feb 3 15:02:35 2021 +0900

    [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing

    ### What changes were proposed in this pull request?
    Strip passwords so that they are not inadvertently inlined into the build information:
    `https://user:pass@domain/foo -> https://domain/foo`

    ### Why are the changes needed?
    Leaking credentials into build metadata can be a serious security issue, especially during a release.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Tested by executing the following command on both macOS and Ubuntu:
    ```
    echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
    ```

    Closes #31436 from ScrapCodes/strip_pass.

    Authored-by: Prashant Sharma
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 89bf2afb3337a44f34009a36cae16dd0ff86b353)
    Signed-off-by: HyukjinKwon
---
 build/spark-build-info | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build/spark-build-info b/build/spark-build-info
index ad0ec67..eb0e3d7 100755
--- a/build/spark-build-info
+++ b/build/spark-build-info
@@ -32,7 +32,7 @@ echo_build_properties() {
   echo revision=$(git rev-parse HEAD)
   echo branch=$(git rev-parse --abbrev-ref HEAD)
   echo date=$(date -u +%Y-%m-%dT%H:%M:%SZ)
-  echo url=$(git config --get remote.origin.url)
+  echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
 }

 echo_build_properties $2 > "$SPARK_BUILD_INFO"
-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
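The substitution added to `build/spark-build-info` can be sketched outside the build script. Below is a hypothetical Python analogue of the sed expression (the function name and the use of `re` are illustrative, not part of the patch); like the sed basic-regex groups, the greedy match treats everything up to the last `@` as credentials:

```python
import re

def strip_credentials(url: str) -> str:
    # Analogue of sed 's|https://\(.*\)@\(.*\)|https://\2|': drop any
    # user:password@ prefix from an https URL. The greedy .* consumes
    # everything up to the last '@', mirroring the sed groups.
    return re.sub(r"https://.*@(.*)", r"https://\1", url)

print(strip_credentials("https://user:pass@domain/foo"))  # https://domain/foo
```

A URL without credentials is returned unchanged, since the pattern requires an `@` to match.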
[spark] branch branch-3.0 updated: [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 602caba  [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing
602caba is described below

commit 602caba35c0d370a925e20fd43b68e9259e71d21
Author: Prashant Sharma
AuthorDate: Wed Feb 3 15:02:35 2021 +0900

    [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing

    ### What changes were proposed in this pull request?
    Strip passwords so that they are not inadvertently inlined into the build information:
    `https://user:pass@domain/foo -> https://domain/foo`

    ### Why are the changes needed?
    Leaking credentials into build metadata can be a serious security issue, especially during a release.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Tested by executing the following command on both macOS and Ubuntu:
    ```
    echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
    ```

    Closes #31436 from ScrapCodes/strip_pass.

    Authored-by: Prashant Sharma
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 89bf2afb3337a44f34009a36cae16dd0ff86b353)
    Signed-off-by: HyukjinKwon
---
 build/spark-build-info | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build/spark-build-info b/build/spark-build-info
index ad0ec67..eb0e3d7 100755
--- a/build/spark-build-info
+++ b/build/spark-build-info
@@ -32,7 +32,7 @@ echo_build_properties() {
   echo revision=$(git rev-parse HEAD)
   echo branch=$(git rev-parse --abbrev-ref HEAD)
   echo date=$(date -u +%Y-%m-%dT%H:%M:%SZ)
-  echo url=$(git config --get remote.origin.url)
+  echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
 }

 echo_build_properties $2 > "$SPARK_BUILD_INFO"
[spark] branch branch-3.1 updated: [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 94245c4  [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing
94245c4 is described below

commit 94245c45b8a6b94ae2670cacc89d944116a376f9
Author: Prashant Sharma
AuthorDate: Wed Feb 3 15:02:35 2021 +0900

    [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing

    ### What changes were proposed in this pull request?
    Strip passwords so that they are not inadvertently inlined into the build information:
    `https://user:pass@domain/foo -> https://domain/foo`

    ### Why are the changes needed?
    Leaking credentials into build metadata can be a serious security issue, especially during a release.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Tested by executing the following command on both macOS and Ubuntu:
    ```
    echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
    ```

    Closes #31436 from ScrapCodes/strip_pass.

    Authored-by: Prashant Sharma
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 89bf2afb3337a44f34009a36cae16dd0ff86b353)
    Signed-off-by: HyukjinKwon
---
 build/spark-build-info | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build/spark-build-info b/build/spark-build-info
index ad0ec67..eb0e3d7 100755
--- a/build/spark-build-info
+++ b/build/spark-build-info
@@ -32,7 +32,7 @@ echo_build_properties() {
   echo revision=$(git rev-parse HEAD)
   echo branch=$(git rev-parse --abbrev-ref HEAD)
   echo date=$(date -u +%Y-%m-%dT%H:%M:%SZ)
-  echo url=$(git config --get remote.origin.url)
+  echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
 }

 echo_build_properties $2 > "$SPARK_BUILD_INFO"
[spark] branch master updated: [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 89bf2af  [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing
89bf2af is described below

commit 89bf2afb3337a44f34009a36cae16dd0ff86b353
Author: Prashant Sharma
AuthorDate: Wed Feb 3 15:02:35 2021 +0900

    [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing

    ### What changes were proposed in this pull request?
    Strip passwords so that they are not inadvertently inlined into the build information:
    `https://user:pass@domain/foo -> https://domain/foo`

    ### Why are the changes needed?
    Leaking credentials into build metadata can be a serious security issue, especially during a release.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Tested by executing the following command on both macOS and Ubuntu:
    ```
    echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
    ```

    Closes #31436 from ScrapCodes/strip_pass.

    Authored-by: Prashant Sharma
    Signed-off-by: HyukjinKwon
---
 build/spark-build-info | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build/spark-build-info b/build/spark-build-info
index ad0ec67..eb0e3d7 100755
--- a/build/spark-build-info
+++ b/build/spark-build-info
@@ -32,7 +32,7 @@ echo_build_properties() {
   echo revision=$(git rev-parse HEAD)
   echo branch=$(git rev-parse --abbrev-ref HEAD)
   echo date=$(date -u +%Y-%m-%dT%H:%M:%SZ)
-  echo url=$(git config --get remote.origin.url)
+  echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
 }

 echo_build_properties $2 > "$SPARK_BUILD_INFO"
[spark] branch master updated (fc80a5b -> a1d4bb3)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from fc80a5b  [SPARK-34307][SQL] TakeOrderedAndProjectExec avoid shuffle if input rdd has single partition
     add a1d4bb3  [SPARK-34313][SQL] Migrate ALTER TABLE SET/UNSET TBLPROPERTIES commands to use UnresolvedTable to resolve the identifier

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/ResolveCatalogs.scala    | 13 ---
 .../spark/sql/catalyst/parser/AstBuilder.scala     | 18 ++---
 .../sql/catalyst/plans/logical/statements.scala    | 15 ----
 .../sql/catalyst/plans/logical/v2Commands.scala    | 19 +
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 18 ++---
 .../catalyst/analysis/ResolveSessionCatalog.scala  | 25 ++--
 .../apache/spark/sql/execution/command/ddl.scala   |  2 -
 .../datasources/v2/DataSourceV2Strategy.scala      | 11 ++
 .../apache/spark/sql/execution/SQLViewSuite.scala  |  6 +++
 .../execution/command/PlanResolutionSuite.scala    | 45 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala    | 10 -
 11 files changed, 91 insertions(+), 91 deletions(-)
[spark] branch master updated (e927bf9 -> fc80a5b)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from e927bf9  Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path"
     add fc80a5b  [SPARK-34307][SQL] TakeOrderedAndProjectExec avoid shuffle if input rdd has single partition

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/limit.scala | 27 --
 1 file changed, 15 insertions(+), 12 deletions(-)
[spark] branch branch-3.1 updated: Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path"
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 3eb94de  Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path"
3eb94de is described below

commit 3eb94de8ad11e535351fd04a780f1f832f8c39f6
Author: HyukjinKwon
AuthorDate: Wed Feb 3 12:33:16 2021 +0900

    Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path"

    This reverts commit d9e54381e32bbc86247cf18b7d2ca1e3126bd917.
---
 .../scala/org/apache/spark/util/UtilsSuite.scala |  6 --
 .../DataSourceScanExecRedactionSuite.scala       | 21 +++--
 2 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 18ff960..8fb4080 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1308,12 +1308,6 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     assert(Utils.buildLocationMetadata(paths, 10) == "[path0, path1]")
     assert(Utils.buildLocationMetadata(paths, 15) == "[path0, path1, path2]")
     assert(Utils.buildLocationMetadata(paths, 25) == "[path0, path1, path2, path3]")
-
-    // edge-case: we should consider the fact non-path chars including '[' and ", " are accounted
-    // 1. second path is not added due to the addition of '['
-    assert(Utils.buildLocationMetadata(paths, 6) == "[path0]")
-    // 2. third path is not added due to the addition of ", "
-    assert(Utils.buildLocationMetadata(paths, 13) == "[path0, path1]")
   }

   test("checkHost supports both IPV4 and IPV6") {

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
index 07bacad..c99be98 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
@@ -137,24 +137,9 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
     assert(location.isDefined)
     // The location metadata should at least contain one path
     assert(location.get.contains(paths.head))
-
-    // The location metadata should have bracket wrapping paths
-    assert(location.get.indexOf('[') > -1)
-    assert(location.get.indexOf(']') > -1)
-
-    // extract paths in location metadata (removing classname, brackets, separators)
-    val pathsInLocation = location.get.substring(
-      location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
-
-    // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
-    // location should include more than one paths. Otherwise location should include only one
-    // path.
-    // (Note we apply subtraction with 1 to count start bracket '['.)
-    if (paths.head.length < 99) {
-      assert(pathsInLocation.size >= 2)
-    } else {
-      assert(pathsInLocation.size == 1)
-    }
+    // If the temp path length is larger than 100, the metadata length should not exceed
+    // twice of the length; otherwise, the metadata length should be controlled within 200.
+    assert(location.get.length < Math.max(paths.head.length, 100) * 2)
   }
 }
[spark] branch master updated (603a7fd -> e927bf9)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 603a7fd  [SPARK-34308][SQL] Escape meta-characters in printSchema
     add e927bf9  Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path"

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/util/UtilsSuite.scala |  6 --
 .../DataSourceScanExecRedactionSuite.scala       | 21 +++--
 2 files changed, 3 insertions(+), 24 deletions(-)
[spark] branch master updated (60c71c6 -> 603a7fd)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 60c71c6  [SPARK-34325][CORE] Remove unused shuffleBlockResolver variable in SortShuffleWriter
     add 603a7fd  [SPARK-34308][SQL] Escape meta-characters in printSchema

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/types/StructField.scala |  4 +-
 .../org/apache/spark/sql/util/SchemaUtils.scala  | 14 ++
 .../main/scala/org/apache/spark/sql/Dataset.scala | 14 +-
 .../org/apache/spark/sql/DataFrameSuite.scala    | 53 ++
 4 files changed, 72 insertions(+), 13 deletions(-)
[spark] branch master updated (00120ea -> 60c71c6)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 00120ea  [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision
     add 60c71c6  [SPARK-34325][CORE] Remove unused shuffleBlockResolver variable in SortShuffleWriter

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala     | 3 +--
 .../main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala      | 3 +--
 .../scala/org/apache/spark/shuffle/sort/SortShuffleWriterSuite.scala      | 2 --
 3 files changed, 2 insertions(+), 6 deletions(-)
[spark] branch branch-3.1 updated: [SPARK-33591][3.1][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 18def59  [SPARK-33591][3.1][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
18def59 is described below

commit 18def5955dbde1fdddfed78a691d9adc97cfe7d7
Author: Gengliang Wang
AuthorDate: Wed Feb 3 09:29:35 2021 +0900

    [SPARK-33591][3.1][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

    ### What changes were proposed in this pull request?
    This PR backports https://github.com/apache/spark/pull/31421 and https://github.com/apache/spark/pull/31434 to branch-3.1.

    This is a follow-up for https://github.com/apache/spark/pull/30538. It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users want the legacy behavior. It also documents the behavior change.

    ### Why are the changes needed?
    In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` to true.

    ### Does this PR introduce _any_ user-facing change?
    Yes, adding a legacy configuration to restore the old behavior.

    ### How was this patch tested?
    Unit test.

    Closes #31439 from gengliangwang/backportLegacyConf3.1.

    Authored-by: Gengliang Wang
    Signed-off-by: HyukjinKwon
---
 docs/sql-migration-guide.md                                    |  2 ++
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala      | 10 +++---
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala     | 10 ++
 .../scala/org/apache/spark/sql/execution/SparkSqlParser.scala  |  2 +-
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala    | 11 +++
 5 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 2beddcb..36dccf9 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -70,6 +70,8 @@ license: |
     * `ALTER TABLE .. ADD PARTITION` throws `PartitionsAlreadyExistException` if new partition exists already
     * `ALTER TABLE .. DROP PARTITION` throws `NoSuchPartitionsException` for not existing partitions

+  - In Spark 3.0.2, `PARTITION(col=null)` is always parsed as a null literal in the partition spec. In Spark 3.0.1 or earlier, it is parsed as a string literal of its text representation, e.g., string "null", if the partition column is string type. To restore the legacy behavior, you can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.
+
 ## Upgrading from Spark SQL 3.0 to 3.0.1

 - In Spark 3.0, JSON datasource and JSON function `schema_of_json` infer TimestampType from string values if they match to the pattern defined by the JSON option `timestampFormat`. Since version 3.0.1, the timestamp type inference is disabled by default. Set the JSON option `inferTimestamp` to `true` to enable such type inference.

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 34f56e9..c7ca4b5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -481,9 +481,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
    */
   override def visitPartitionSpec(
       ctx: PartitionSpecContext): Map[String, Option[String]] = withOrigin(ctx) {
+    val legacyNullAsString =
+      conf.getConf(SQLConf.LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL)
     val parts = ctx.partitionVal.asScala.map { pVal =>
       val name = pVal.identifier.getText
-      val value = Option(pVal.constant).map(visitStringConstant)
+      val value = Option(pVal.constant).map(v => visitStringConstant(v, legacyNullAsString))
       name -> value
     }
     // Before calling `toMap`, we check duplicated keys to avoid silently ignore partition values
@@ -509,9 +511,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
    * main purpose is to prevent slight differences due to back to back conversions i.e.:
    * String -> Literal -> String.
    */
-  protected def visitStringConstant(ctx: ConstantContext): String = withOrigin(ctx) {
+  protected def visitStringConstant(
+      ctx: ConstantContext,
+      legacyNullAsString: Boolean): String = withOrigin(ctx) {
     ctx match {
-      case _: NullLiteralContext => null
+      case _: NullLiteralContext if !legacyNullAsString => null
       case s: StringLiteralContext => createString(s)
       case o => o.getText
     }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark
[spark] branch branch-2.4 updated: [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 5f4e9ea  [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision
5f4e9ea is described below

commit 5f4e9ea7a1a70b7ba3c5ff1a4977f019ab43a3a1
Author: Wenchen Fan
AuthorDate: Wed Feb 3 09:26:36 2021 +0900

    [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision

    ### What changes were proposed in this pull request?
    This is a followup of https://github.com/apache/spark/pull/31357

    #31357 added a very strong restriction to the vectorized parquet reader: the Spark data type must exactly match the physical Parquet type when reading decimal fields. This restriction is not actually necessary, as we can safely read Parquet decimals with a larger precision. This PR relaxes the restriction a little.

    ### Why are the changes needed?
    To avoid failing queries unnecessarily.

    ### Does this PR introduce _any_ user-facing change?
    Yes, users can now read Parquet decimals with a mismatched `DecimalType` as long as the scale is the same and the precision is larger.

    ### How was this patch tested?
    Updated test.

    Closes #31443 from cloud-fan/improve.

    Authored-by: Wenchen Fan
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 00120ea53748d84976e549969f43cf2a50778c1c)
    Signed-off-by: HyukjinKwon
---
 .../sql/execution/datasources/parquet/VectorizedColumnReader.java | 4 +++-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 8
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 4739089..ed8755c 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -106,7 +106,9 @@ public class VectorizedColumnReader {
   private boolean isDecimalTypeMatched(DataType dt) {
     DecimalType d = (DecimalType) dt;
     DecimalMetadata dm = descriptor.getPrimitiveType().getDecimalMetadata();
-    return dm != null && dm.getPrecision() == d.precision() && dm.getScale() == d.scale();
+    // It's OK if the required decimal precision is larger than or equal to the physical decimal
+    // precision in the Parquet metadata, as long as the decimal scale is the same.
+    return dm != null && dm.getPrecision() <= d.precision() && dm.getScale() == d.scale();
   }

   private boolean canReadAsIntDecimal(DataType dt) {

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index a2efed6..f262eab 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -3152,6 +3152,14 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     val df = sql("SELECT 1.0 a, CAST(1.23 AS DECIMAL(17, 2)) b, CAST(1.23 AS DECIMAL(36, 2)) c")
     df.write.parquet(path.toString)

+    Seq(true, false).foreach { vectorizedReader =>
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorizedReader.toString) {
+        // We can read the decimal parquet field with a larger precision, if scale is the same.
+        val schema = "a DECIMAL(9, 1), b DECIMAL(18, 2), c DECIMAL(38, 2)"
+        checkAnswer(readParquet(schema, path), df)
+      }
+    }
+
     withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
       val schema1 = "a DECIMAL(3, 2), b DECIMAL(18, 3), c DECIMAL(37, 3)"
       checkAnswer(readParquet(schema1, path), df)
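The relaxed check in `isDecimalTypeMatched` boils down to a simple rule. Below is a small Python sketch of that rule (a hypothetical stand-in for the Java method, ignoring the null-metadata guard): a Parquet decimal column is readable as a Spark `DecimalType` when the requested precision is at least the physical precision and the scales are equal.

```python
def is_decimal_type_matched(parquet_precision: int, parquet_scale: int,
                            spark_precision: int, spark_scale: int) -> bool:
    # After this change: the requested (Spark) precision may be larger than or
    # equal to the physical (Parquet) precision, but the scale must match exactly.
    return parquet_precision <= spark_precision and parquet_scale == spark_scale
```

For example, a physical DECIMAL(17, 2) column can now be read as DECIMAL(18, 2), but still not as DECIMAL(18, 3) or DECIMAL(16, 2).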
[spark] branch branch-3.0 updated: [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 240016b  [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision
240016b is described below

commit 240016ba60b9f08983214f7bfe4a62c3e4ca7de5
Author: Wenchen Fan
AuthorDate: Wed Feb 3 09:26:36 2021 +0900

    [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision

    ### What changes were proposed in this pull request?
    This is a followup of https://github.com/apache/spark/pull/31357

    #31357 added a very strong restriction to the vectorized parquet reader: the Spark data type must exactly match the physical Parquet type when reading decimal fields. This restriction is not actually necessary, as we can safely read Parquet decimals with a larger precision. This PR relaxes the restriction a little.

    ### Why are the changes needed?
    To avoid failing queries unnecessarily.

    ### Does this PR introduce _any_ user-facing change?
    Yes, users can now read Parquet decimals with a mismatched `DecimalType` as long as the scale is the same and the precision is larger.

    ### How was this patch tested?
    Updated test.

    Closes #31443 from cloud-fan/improve.

    Authored-by: Wenchen Fan
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 00120ea53748d84976e549969f43cf2a50778c1c)
    Signed-off-by: HyukjinKwon
---
 .../sql/execution/datasources/parquet/VectorizedColumnReader.java | 4 +++-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 8
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 7681ba9..eeff12b 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -110,7 +110,9 @@ public class VectorizedColumnReader {
   private boolean isDecimalTypeMatched(DataType dt) {
     DecimalType d = (DecimalType) dt;
     DecimalMetadata dm = descriptor.getPrimitiveType().getDecimalMetadata();
-    return dm != null && dm.getPrecision() == d.precision() && dm.getScale() == d.scale();
+    // It's OK if the required decimal precision is larger than or equal to the physical decimal
+    // precision in the Parquet metadata, as long as the decimal scale is the same.
+    return dm != null && dm.getPrecision() <= d.precision() && dm.getScale() == d.scale();
   }

   private boolean canReadAsIntDecimal(DataType dt) {

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 0b78258..409e645 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -3598,6 +3598,14 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     val df = sql("SELECT 1.0 a, CAST(1.23 AS DECIMAL(17, 2)) b, CAST(1.23 AS DECIMAL(36, 2)) c")
     df.write.parquet(path.toString)

+    Seq(true, false).foreach { vectorizedReader =>
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorizedReader.toString) {
+        // We can read the decimal parquet field with a larger precision, if scale is the same.
+        val schema = "a DECIMAL(9, 1), b DECIMAL(18, 2), c DECIMAL(38, 2)"
+        checkAnswer(readParquet(schema, path), df)
+      }
+    }
+
     withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
       val schema1 = "a DECIMAL(3, 2), b DECIMAL(18, 3), c DECIMAL(37, 3)"
       checkAnswer(readParquet(schema1, path), df)
[spark] branch branch-3.1 updated: [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new bb0efc1 [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision bb0efc1 is described below commit bb0efc16a435346db8d4a6a0bae7f3e647f9f186 Author: Wenchen Fan AuthorDate: Wed Feb 3 09:26:36 2021 +0900 [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/31357 #31357 added a very strong restriction to the vectorized parquet reader, that the spark data type must exactly match the physical parquet type, when reading decimal fields. This restriction is actually not necessary, as we can safely read parquet decimals with a larger precision. This PR releases this restriction a little bit. ### Why are the changes needed? To not fail queries unnecessarily. ### Does this PR introduce _any_ user-facing change? Yes, now users can read parquet decimals with mismatched `DecimalType` as long as the scale is the same and precision is larger. ### How was this patch tested? updated test. Closes #31443 from cloud-fan/improve. 
Authored-by: Wenchen Fan
Signed-off-by: HyukjinKwon
(cherry picked from commit 00120ea53748d84976e549969f43cf2a50778c1c)
Signed-off-by: HyukjinKwon
---
 .../sql/execution/datasources/parquet/VectorizedColumnReader.java | 4 +++-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 8 ++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 7a10aa0..119af8d 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -111,7 +111,9 @@ public class VectorizedColumnReader {
   private boolean isDecimalTypeMatched(DataType dt) {
     DecimalType d = (DecimalType) dt;
     DecimalMetadata dm = descriptor.getPrimitiveType().getDecimalMetadata();
-    return dm != null && dm.getPrecision() == d.precision() && dm.getScale() == d.scale();
+    // It's OK if the required decimal precision is larger than or equal to the physical decimal
+    // precision in the Parquet metadata, as long as the decimal scale is the same.
+    return dm != null && dm.getPrecision() <= d.precision() && dm.getScale() == d.scale();
   }

   private boolean canReadAsIntDecimal(DataType dt) {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index d2a578b..5ce236c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -3785,6 +3785,14 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       val df = sql("SELECT 1.0 a, CAST(1.23 AS DECIMAL(17, 2)) b, CAST(1.23 AS DECIMAL(36, 2)) c")
       df.write.parquet(path.toString)

+      Seq(true, false).foreach { vectorizedReader =>
+        withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorizedReader.toString) {
+          // We can read the decimal parquet field with a larger precision, if scale is the same.
+          val schema = "a DECIMAL(9, 1), b DECIMAL(18, 2), c DECIMAL(38, 2)"
+          checkAnswer(readParquet(schema, path), df)
+        }
+      }
+
       withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
         val schema1 = "a DECIMAL(3, 2), b DECIMAL(18, 3), c DECIMAL(37, 3)"
         checkAnswer(readParquet(schema1, path), df)
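The one-character change above (`==` to `<=`) is the whole fix. As a standalone illustration of why it is safe, here is a minimal sketch of the strict vs. relaxed matching rule. The class and method names are hypothetical, not Spark's API; the real check lives in `VectorizedColumnReader.isDecimalTypeMatched`.

```java
// Hypothetical standalone sketch of the decimal-matching rule from the patch.
// "phys*" stands for the Parquet file's decimal metadata, "req*" for the
// Spark read schema's DecimalType.
class DecimalCheck {
    // Before the patch: the physical and required decimals had to match
    // exactly in both precision and scale.
    public static boolean strictMatch(int physPrec, int physScale,
                                      int reqPrec, int reqScale) {
        return physPrec == reqPrec && physScale == reqScale;
    }

    // After the patch: a larger required precision is accepted. Every value
    // representable at the file's precision also fits in the wider type, so
    // the read is lossless as long as the scale is unchanged.
    public static boolean relaxedMatch(int physPrec, int physScale,
                                       int reqPrec, int reqScale) {
        return physPrec <= reqPrec && physScale == reqScale;
    }
}
```

This mirrors the test in the diff: a file written as `DECIMAL(17, 2)` becomes readable as `DECIMAL(18, 2)`, while a scale mismatch is still rejected.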
[spark] branch master updated (63866025 -> 00120ea)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 63866025  [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path
     add 00120ea   [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/datasources/parquet/VectorizedColumnReader.java | 4 +++-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 8 ++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)
[spark] branch branch-3.0 updated (8637205 -> aae6091)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 8637205  [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas
     add aae6091  [SPARK-33591][3.0][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md                        |   2 +
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  10 +-
 .../org/apache/spark/sql/internal/SQLConf.scala    |  10 ++
 .../spark/sql/execution/SparkSqlParser.scala       |   2 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  11 ++
 .../command/ShowPartitionsSuiteBase.scala          | 193 +
 6 files changed, 224 insertions(+), 4 deletions(-)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala
[spark] branch branch-3.1 updated: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new d9e5438  [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path
d9e5438 is described below

commit d9e54381e32bbc86247cf18b7d2ca1e3126bd917
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Wed Feb 3 07:35:22 2021 +0900

    [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

    ### What changes were proposed in this pull request?

    This PR proposes to fix the UTs added in SPARK-31793, so that everything contributing to the length limit is properly accounted for.

    ### Why are the changes needed?

    The test `DataSourceScanExecRedactionSuite.SPARK-31793: FileSourceScanExec metadata should contain limited file paths` fails conditionally, depending on the length of the temp directory.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    The modified UTs document the previously missing points and also perform the test.

    Closes #31435 from HeartSaVioR/SPARK-34326.
Authored-by: Jungtaek Lim (HeartSaVioR)
Signed-off-by: Jungtaek Lim
(cherry picked from commit 63866025d2e4bb89251ba7e29160fb30dd48ddf7)
Signed-off-by: Jungtaek Lim
---
 .../scala/org/apache/spark/util/UtilsSuite.scala   |  6 ++
 .../DataSourceScanExecRedactionSuite.scala         | 21 ++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 8fb4080..18ff960 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1308,6 +1308,12 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     assert(Utils.buildLocationMetadata(paths, 10) == "[path0, path1]")
     assert(Utils.buildLocationMetadata(paths, 15) == "[path0, path1, path2]")
     assert(Utils.buildLocationMetadata(paths, 25) == "[path0, path1, path2, path3]")
+
+    // edge-case: we should consider the fact non-path chars including '[' and ", " are accounted
+    // 1. second path is not added due to the addition of '['
+    assert(Utils.buildLocationMetadata(paths, 6) == "[path0]")
+    // 2. third path is not added due to the addition of ", "
+    assert(Utils.buildLocationMetadata(paths, 13) == "[path0, path1]")
   }

   test("checkHost supports both IPV4 and IPV6") {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
index c99be98..07bacad 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
       assert(location.isDefined)
       // The location metadata should at least contain one path
       assert(location.get.contains(paths.head))
-      // If the temp path length is larger than 100, the metadata length should not exceed
-      // twice of the length; otherwise, the metadata length should be controlled within 200.
-      assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+      // The location metadata should have bracket wrapping paths
+      assert(location.get.indexOf('[') > -1)
+      assert(location.get.indexOf(']') > -1)
+
+      // extract paths in location metadata (removing classname, brackets, separators)
+      val pathsInLocation = location.get.substring(
+        location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq
+
+      // If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
+      // location should include more than one paths. Otherwise location should include only one
+      // path.
+      // (Note we apply subtraction with 1 to count start bracket '['.)
+      if (paths.head.length < 99) {
+        assert(pathsInLocation.size >= 2)
+      } else {
+        assert(pathsInLocation.size == 1)
+      }
     }
   }
 }
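The point of the fixed assertions is that the opening `[` and each `", "` separator count toward the length check, not just the paths themselves. A small sketch of truncation logic consistent with those assertions (this is a hypothetical re-implementation, not Spark's `Utils.buildLocationMetadata` itself):

```java
import java.util.List;

// Hypothetical sketch of path-list truncation whose behavior matches the
// assertions in the UtilsSuite diff: keep appending paths while the
// accumulated metadata length (leading '[' and ", " separators included)
// is still below the threshold; the final appended path may overshoot.
class LocationMetadata {
    public static String build(List<String> paths, int stopAppendingThreshold) {
        StringBuilder metadata = new StringBuilder("[");
        int index = 0;
        while (index < paths.size() && metadata.length() < stopAppendingThreshold) {
            if (index > 0) {
                metadata.append(", "); // separator also consumes budget
            }
            metadata.append(paths.get(index));
            index++;
        }
        return metadata.append(']').toString();
    }
}
```

With `paths = path0..path3`, a threshold of 6 yields `[path0]` (the bracket uses one character of budget) and a threshold of 13 yields `[path0, path1]` (the separator pushes `path2` over), exactly the two edge cases the new test pins down.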
[spark] branch master updated (cadca8d -> 63866025)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from cadca8d  [SPARK-34324][SQL] FileTable should not list TRUNCATE in capabilities by default
     add 63866025  [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/util/UtilsSuite.scala   |  6 ++
 .../DataSourceScanExecRedactionSuite.scala         | 21 ++---
 2 files changed, 24 insertions(+), 3 deletions(-)
[spark] branch master updated (d308794 -> cadca8d)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from d308794  [SPARK-34263][SQL] Simplify the code for treating unicode/octal/escaped characters in string literals
     add cadca8d  [SPARK-34324][SQL] FileTable should not list TRUNCATE in capabilities by default

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated: [SPARK-34263][SQL] Simplify the code for treating unicode/octal/escaped characters in string literals
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d308794  [SPARK-34263][SQL] Simplify the code for treating unicode/octal/escaped characters in string literals
d308794 is described below

commit d308794adb821d301847772de3ee1ef3166aaf5b
Author: Kousuke Saruta
AuthorDate: Wed Feb 3 01:07:12 2021 +0900

    [SPARK-34263][SQL] Simplify the code for treating unicode/octal/escaped characters in string literals

    ### What changes were proposed in this pull request?

    In the current master, the code for treating unicode/octal/escaped characters in string literals is a little bit complex, so let's simplify it.

    ### Why are the changes needed?

    To keep it easy to maintain.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    `ParserUtilsSuite` passes.

    Closes #31362 from sarutak/refactor-unicode-escapes.

    Authored-by: Kousuke Saruta
    Signed-off-by: Kousuke Saruta
---
 .../spark/sql/catalyst/parser/ParserUtils.scala    | 77 --
 .../sql/catalyst/parser/ParserUtilsSuite.scala     |  7 ++
 2 files changed, 34 insertions(+), 50 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
index 711b507..f7cf2ba 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
@@ -17,6 +17,7 @@ package org.apache.spark.sql.catalyst.parser

 import java.lang.{Long => JLong}
+import java.nio.CharBuffer
 import java.util

 import scala.collection.mutable.StringBuilder
@@ -33,6 +34,12 @@ import org.apache.spark.sql.errors.QueryParsingErrors
  * A collection of utility methods for use during the parsing process.
 */
 object ParserUtils {
+
+  val U16_CHAR_PATTERN = """\\u([a-fA-F0-9]{4})(?s).*""".r
+  val U32_CHAR_PATTERN = """\\U([a-fA-F0-9]{8})(?s).*""".r
+  val OCTAL_CHAR_PATTERN = """\\([01][0-7]{2})(?s).*""".r
+  val ESCAPED_CHAR_PATTERN = """\\((?s).)(?s).*""".r
+
   /** Get the command which created the token. */
   def command(ctx: ParserRuleContext): String = {
     val stream = ctx.getStart.getInputStream
@@ -131,7 +138,6 @@ object ParserUtils {

   /** Unescape backslash-escaped string enclosed by quotes. */
   def unescapeSQLString(b: String): String = {
-    var enclosure: Character = null
     val sb = new StringBuilder(b.length())

     def appendEscapedChar(n: Char): Unit = {
@@ -152,34 +158,19 @@ object ParserUtils {
       }
     }

-    var i = 0
-    val strLength = b.length
-    while (i < strLength) {
-      val currentChar = b.charAt(i)
-      if (enclosure == null) {
-        if (currentChar == '\'' || currentChar == '\"') {
-          enclosure = currentChar
-        }
-      } else if (enclosure == currentChar) {
-        enclosure = null
-      } else if (currentChar == '\\') {
-
-        if ((i + 6 < strLength) && b.charAt(i + 1) == 'u') {
-          // \u0000 style 16-bit unicode character literals.
+    // Skip the first and last quotations enclosing the string literal.
+    val charBuffer = CharBuffer.wrap(b, 1, b.length - 1)

-          val base = i + 2
-          val code = (0 until 4).foldLeft(0) { (mid, j) =>
-            val digit = Character.digit(b.charAt(j + base), 16)
-            (mid << 4) + digit
-          }
-          sb.append(code.asInstanceOf[Char])
-          i += 5
-        } else if ((i + 10 < strLength) && b.charAt(i + 1) == 'U' &&
-            (2 until 10).forall(j => Character.digit(b.charAt(i + j), 16) != -1)) {
+    while (charBuffer.remaining() > 0) {
+      charBuffer match {
+        case U16_CHAR_PATTERN(cp) =>
+          // \u0000 style 16-bit unicode character literals.
+          sb.append(Integer.parseInt(cp, 16).toChar)
+          charBuffer.position(charBuffer.position() + 6)
+        case U32_CHAR_PATTERN(cp) =>
           // \U00000000 style 32-bit unicode character literals.
-          // Use Long to treat codePoint as unsigned in the range of 32-bit.
-          val codePoint = JLong.parseLong(b.substring(i + 2, i + 10), 16)
+          val codePoint = JLong.parseLong(cp, 16)
           if (codePoint < 0x10000) {
             sb.append((codePoint & 0xFFFF).toChar)
           } else {
@@ -188,33 +179,19 @@ object ParserUtils {
             sb.append(highSurrogate.toChar)
             sb.append(lowSurrogate.toChar)
           }
-          i += 9
-        } else if (i + 4 < strLength) {
+          charBuffer.position(charBuffer.position() + 10)
+        case OCTAL_CHAR_PATTERN(cp) =>
           // \000 style character literals.
-
-          val i1 = b.c
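The refactor above replaces manual index arithmetic with anchored regular expressions matched against the head of the remaining input: recognize an escape, emit the decoded character, then advance past the escape's fixed width. A small sketch of the same idea, as an illustration only (the regex shapes follow the patch's `U16_CHAR_PATTERN` and `OCTAL_CHAR_PATTERN`; the driver loop, class name, and `\U` surrogate handling are simplified or omitted and are not Spark's code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of pattern-based unescaping: match an escape at the start of the
// remaining input, append the decoded character, and skip the escape's
// fixed length. Only the \\uXXXX and octal \\ooo forms are shown.
class UnescapeSketch {
    // Anchored versions of the patch's 16-bit unicode and octal patterns.
    private static final Pattern U16 = Pattern.compile("^\\\\u([a-fA-F0-9]{4})");
    private static final Pattern OCTAL = Pattern.compile("^\\\\([01][0-7]{2})");

    public static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            String rest = s.substring(i);
            Matcher m;
            if ((m = U16.matcher(rest)).find()) {
                sb.append((char) Integer.parseInt(m.group(1), 16));
                i += 6; // the "\\uXXXX" escape is 6 characters wide
            } else if ((m = OCTAL.matcher(rest)).find()) {
                sb.append((char) Integer.parseInt(m.group(1), 8));
                i += 4; // the "\\ooo" escape is 4 characters wide
            } else {
                sb.append(s.charAt(i)); // ordinary character: copy through
                i += 1;
            }
        }
        return sb.toString();
    }
}
```

The patch applies this shape over a `CharBuffer` wrapped around the literal (minus its enclosing quotes), advancing the buffer's position instead of a substring index, which lets it drop the old `enclosure` state machine entirely.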
[spark] branch master updated (ff1b6ec -> 79515b8)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ff1b6ec  [SPARK-33591][SQL][FOLLOW-UP] Revise the version and doc of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`
     add 79515b8  [SPARK-34282][SQL][TESTS] Unify v1 and v2 TRUNCATE TABLE tests

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  12 -
 .../spark/sql/StatisticsCollectionSuite.scala      |  50 ---
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |  15 -
 .../apache/spark/sql/execution/SQLViewSuite.scala  |  21 +-
 .../spark/sql/execution/command/DDLSuite.scala     | 186 +--
 .../command/TruncateTableParserSuite.scala         |  55 +++
 .../command/TruncateTableSuiteBase.scala}          |  26 +-
 .../execution/command/v1/TruncateTableSuite.scala  | 368 +
 ...titionsSuite.scala => TruncateTableSuite.scala} |  27 +-
 .../apache/spark/sql/hive/CachedTableSuite.scala   |  13 -
 .../sql/hive/execution/HiveCommandSuite.scala      |  66
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |  63 +---
 ...titionsSuite.scala => TruncateTableSuite.scala} |   6 +-
 13 files changed, 461 insertions(+), 447 deletions(-)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/command/TruncateTableParserSuite.scala
 copy sql/core/src/{main/scala/org/apache/spark/sql/execution/command/CommandCheck.scala => test/scala/org/apache/spark/sql/execution/command/TruncateTableSuiteBase.scala} (53%)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/TruncateTableSuite.scala
 copy sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/{AlterTableRecoverPartitionsSuite.scala => TruncateTableSuite.scala} (56%)
 copy sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/{AlterTableRecoverPartitionsSuite.scala => TruncateTableSuite.scala} (82%)
[spark] branch master updated (5b2ad59 -> ff1b6ec)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5b2ad59  [SPARK-33599][SQL] Restore the assert-like in catalyst/analysis
     add ff1b6ec  [SPARK-33591][SQL][FOLLOW-UP] Revise the version and doc of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md                                  |  4 ++--
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala    | 12 ++--
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala   |  8
 .../org/apache/spark/sql/execution/SparkSqlParser.scala      |  2 +-
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 11 +++
 .../sql/execution/command/ShowPartitionsSuiteBase.scala      | 11 ---
 6 files changed, 24 insertions(+), 24 deletions(-)
[spark] branch master updated (66f3480 -> 5b2ad59)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 66f3480  [SPARK-34318][SQL] Dataset.colRegex should work with column names and qualifiers which contain newlines
     add 5b2ad59  [SPARK-33599][SQL] Restore the assert-like in catalyst/analysis

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala       |  8
 .../org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala  |  3 ++-
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala      | 10 --
 3 files changed, 6 insertions(+), 15 deletions(-)
[spark] branch master updated (5acc5b8 -> 66f3480)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5acc5b8  [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3
     add 66f3480  [SPARK-34318][SQL] Dataset.colRegex should work with column names and qualifiers which contain newlines

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala | 4 ++--
 .../src/test/scala/org/apache/spark/sql/DataFrameSuite.scala     | 9 +
 2 files changed, 11 insertions(+), 2 deletions(-)
[spark] branch master updated (6d3674b -> 5acc5b8)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6d3674b  [SPARK-34312][SQL] Support partition(s) truncation by `Supports(Atomic)PartitionManagement`
     add 5acc5b8  [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml                                 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
[spark] branch master updated (f024d30 -> 6d3674b)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f024d30  [SPARK-34317][SQL] Introduce relationTypeMismatchHint to UnresolvedTable for a better error message
     add 6d3674b  [SPARK-34312][SQL] Support partition(s) truncation by `Supports(Atomic)PartitionManagement`

No new revisions were added by this update.

Summary of changes:
 .../catalog/SupportsAtomicPartitionManagement.java | 20 +
 .../catalog/SupportsPartitionManagement.java       | 17 +++
 .../connector/InMemoryAtomicPartitionTable.scala   | 12 +++-
 .../sql/connector/InMemoryPartitionTable.scala     |  9 ++
 .../apache/spark/sql/connector/InMemoryTable.scala |  7 +
 .../SupportsAtomicPartitionManagementSuite.scala   | 33 --
 .../catalog/SupportsPartitionManagementSuite.scala | 24 +++-
 7 files changed, 118 insertions(+), 4 deletions(-)
[spark] branch master updated (bb9bf66 -> f024d30)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from bb9bf66  [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines
     add f024d30  [SPARK-34317][SQL] Introduce relationTypeMismatchHint to UnresolvedTable for a better error message

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 18 +++---
 .../sql/catalyst/analysis/v2ResolutionPlans.scala  |  3 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     | 31 ++
 .../spark/sql/errors/QueryCompilationErrors.scala  | 15 -
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 51 ++--
 .../apache/spark/sql/internal/CatalogImpl.scala    |  3 +-
 .../AlterTableAddPartitionParserSuite.scala        | 10 +++-
 .../AlterTableDropPartitionParserSuite.scala       | 20 +--
 .../AlterTableRecoverPartitionsParserSuite.scala   | 18 --
 .../AlterTableRenamePartitionParserSuite.scala     | 10 +++-
 .../command/ShowPartitionsParserSuite.scala        | 10 ++--
 .../spark/sql/hive/execution/HiveDDLSuite.scala    | 67 +++---
 12 files changed, 177 insertions(+), 79 deletions(-)