[spark] branch branch-2.4 updated: [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 90db0ab  [SPARK-34327][BUILD] Strip passwords from inlining into build 
information while releasing
90db0ab is described below

commit 90db0ab9e3adb65d0df5bebf45b9822327d1
Author: Prashant Sharma 
AuthorDate: Wed Feb 3 15:02:35 2021 +0900

[SPARK-34327][BUILD] Strip passwords from inlining into build information 
while releasing

### What changes were proposed in this pull request?

Strip passwords so they are not inadvertently inlined into the build information.

`https://user:pass@domain/foo -> https://domain/foo`

### Why are the changes needed?
This can be a serious security issue, especially during a release.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested by executing the following command on both macOS and Ubuntu.

```
echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
```
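
For reference, a minimal Scala sketch of the same substitution (illustrative only, not part of the patch; the example URL is an assumption):

```scala
object StripCredentials {
  // Mirrors the sed expression above: drop everything between "https://" and the
  // last '@' so user:password credentials never reach the build info.
  def strip(url: String): String = url.replaceAll("https://.*@", "https://")

  def main(args: Array[String]): Unit = {
    println(strip("https://user:pass@domain/foo"))  // https://domain/foo
    println(strip("https://domain/foo"))            // unchanged, no credentials present
  }
}
```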

Closes #31436 from ScrapCodes/strip_pass.

Authored-by: Prashant Sharma 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 89bf2afb3337a44f34009a36cae16dd0ff86b353)
Signed-off-by: HyukjinKwon 
---
 build/spark-build-info | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build/spark-build-info b/build/spark-build-info
index ad0ec67..eb0e3d7 100755
--- a/build/spark-build-info
+++ b/build/spark-build-info
@@ -32,7 +32,7 @@ echo_build_properties() {
   echo revision=$(git rev-parse HEAD)
   echo branch=$(git rev-parse --abbrev-ref HEAD)
   echo date=$(date -u +%Y-%m-%dT%H:%M:%SZ)
-  echo url=$(git config --get remote.origin.url)
+  echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
 }
 
 echo_build_properties $2 > "$SPARK_BUILD_INFO"


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 602caba  [SPARK-34327][BUILD] Strip passwords from inlining into build 
information while releasing
602caba is described below

commit 602caba35c0d370a925e20fd43b68e9259e71d21
Author: Prashant Sharma 
AuthorDate: Wed Feb 3 15:02:35 2021 +0900

[SPARK-34327][BUILD] Strip passwords from inlining into build information 
while releasing

### What changes were proposed in this pull request?

Strip passwords so they are not inadvertently inlined into the build information.

`https://user:pass@domain/foo -> https://domain/foo`

### Why are the changes needed?
This can be a serious security issue, especially during a release.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested by executing the following command on both macOS and Ubuntu.

```
echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
```

Closes #31436 from ScrapCodes/strip_pass.

Authored-by: Prashant Sharma 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 89bf2afb3337a44f34009a36cae16dd0ff86b353)
Signed-off-by: HyukjinKwon 
---
 build/spark-build-info | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build/spark-build-info b/build/spark-build-info
index ad0ec67..eb0e3d7 100755
--- a/build/spark-build-info
+++ b/build/spark-build-info
@@ -32,7 +32,7 @@ echo_build_properties() {
   echo revision=$(git rev-parse HEAD)
   echo branch=$(git rev-parse --abbrev-ref HEAD)
   echo date=$(date -u +%Y-%m-%dT%H:%M:%SZ)
-  echo url=$(git config --get remote.origin.url)
+  echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
 }
 
 echo_build_properties $2 > "$SPARK_BUILD_INFO"


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 94245c4  [SPARK-34327][BUILD] Strip passwords from inlining into build 
information while releasing
94245c4 is described below

commit 94245c45b8a6b94ae2670cacc89d944116a376f9
Author: Prashant Sharma 
AuthorDate: Wed Feb 3 15:02:35 2021 +0900

[SPARK-34327][BUILD] Strip passwords from inlining into build information 
while releasing

### What changes were proposed in this pull request?

Strip passwords so they are not inadvertently inlined into the build information.

`https://user:pass@domain/foo -> https://domain/foo`

### Why are the changes needed?
This can be a serious security issue, especially during a release.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested by executing the following command on both macOS and Ubuntu.

```
echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
```

Closes #31436 from ScrapCodes/strip_pass.

Authored-by: Prashant Sharma 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 89bf2afb3337a44f34009a36cae16dd0ff86b353)
Signed-off-by: HyukjinKwon 
---
 build/spark-build-info | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build/spark-build-info b/build/spark-build-info
index ad0ec67..eb0e3d7 100755
--- a/build/spark-build-info
+++ b/build/spark-build-info
@@ -32,7 +32,7 @@ echo_build_properties() {
   echo revision=$(git rev-parse HEAD)
   echo branch=$(git rev-parse --abbrev-ref HEAD)
   echo date=$(date -u +%Y-%m-%dT%H:%M:%SZ)
-  echo url=$(git config --get remote.origin.url)
+  echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
 }
 
 echo_build_properties $2 > "$SPARK_BUILD_INFO"


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 89bf2af  [SPARK-34327][BUILD] Strip passwords from inlining into build 
information while releasing
89bf2af is described below

commit 89bf2afb3337a44f34009a36cae16dd0ff86b353
Author: Prashant Sharma 
AuthorDate: Wed Feb 3 15:02:35 2021 +0900

[SPARK-34327][BUILD] Strip passwords from inlining into build information 
while releasing

### What changes were proposed in this pull request?

Strip passwords so they are not inadvertently inlined into the build information.

`https://user:pass@domain/foo -> https://domain/foo`

### Why are the changes needed?
This can be a serious security issue, especially during a release.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested by executing the following command on both macOS and Ubuntu.

```
echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
```

Closes #31436 from ScrapCodes/strip_pass.

Authored-by: Prashant Sharma 
Signed-off-by: HyukjinKwon 
---
 build/spark-build-info | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/build/spark-build-info b/build/spark-build-info
index ad0ec67..eb0e3d7 100755
--- a/build/spark-build-info
+++ b/build/spark-build-info
@@ -32,7 +32,7 @@ echo_build_properties() {
   echo revision=$(git rev-parse HEAD)
   echo branch=$(git rev-parse --abbrev-ref HEAD)
   echo date=$(date -u +%Y-%m-%dT%H:%M:%SZ)
-  echo url=$(git config --get remote.origin.url)
+  echo url=$(git config --get remote.origin.url | sed 's|https://\(.*\)@\(.*\)|https://\2|')
 }
 
 echo_build_properties $2 > "$SPARK_BUILD_INFO"


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (fc80a5b -> a1d4bb3)

2021-02-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fc80a5b  [SPARK-34307][SQL] TakeOrderedAndProjectExec avoid shuffle if 
input rdd has single partition
 add a1d4bb3  [SPARK-34313][SQL] Migrate ALTER TABLE SET/UNSET 
TBLPROPERTIES commands to use UnresolvedTable to resolve the identifier

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/ResolveCatalogs.scala| 13 ---
 .../spark/sql/catalyst/parser/AstBuilder.scala | 18 ++---
 .../sql/catalyst/plans/logical/statements.scala| 15 
 .../sql/catalyst/plans/logical/v2Commands.scala| 19 +
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 18 ++---
 .../catalyst/analysis/ResolveSessionCatalog.scala  | 25 ++--
 .../apache/spark/sql/execution/command/ddl.scala   |  2 -
 .../datasources/v2/DataSourceV2Strategy.scala  | 11 ++
 .../apache/spark/sql/execution/SQLViewSuite.scala  |  6 +++
 .../execution/command/PlanResolutionSuite.scala| 45 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala| 10 -
 11 files changed, 91 insertions(+), 91 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e927bf9 -> fc80a5b)

2021-02-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e927bf9  Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 
depending on the length of temp path"
 add fc80a5b  [SPARK-34307][SQL] TakeOrderedAndProjectExec avoid shuffle if 
input rdd has single partition

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/limit.scala | 27 --
 1 file changed, 15 insertions(+), 12 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path"

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 3eb94de  Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 
depending on the length of temp path"
3eb94de is described below

commit 3eb94de8ad11e535351fd04a780f1f832f8c39f6
Author: HyukjinKwon 
AuthorDate: Wed Feb 3 12:33:16 2021 +0900

Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on 
the length of temp path"

This reverts commit d9e54381e32bbc86247cf18b7d2ca1e3126bd917.
---
 .../scala/org/apache/spark/util/UtilsSuite.scala|  6 --
 .../DataSourceScanExecRedactionSuite.scala  | 21 +++--
 2 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala 
b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 18ff960..8fb4080 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1308,12 +1308,6 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 assert(Utils.buildLocationMetadata(paths, 10) == "[path0, path1]")
 assert(Utils.buildLocationMetadata(paths, 15) == "[path0, path1, path2]")
 assert(Utils.buildLocationMetadata(paths, 25) == "[path0, path1, path2, 
path3]")
-
-// edge-case: we should consider the fact non-path chars including '[' and 
", " are accounted
-// 1. second path is not added due to the addition of '['
-assert(Utils.buildLocationMetadata(paths, 6) == "[path0]")
-// 2. third path is not added due to the addition of ", "
-assert(Utils.buildLocationMetadata(paths, 13) == "[path0, path1]")
   }
 
   test("checkHost supports both IPV4 and IPV6") {
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
index 07bacad..c99be98 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
@@ -137,24 +137,9 @@ class DataSourceScanExecRedactionSuite extends 
DataSourceScanRedactionTest {
   assert(location.isDefined)
   // The location metadata should at least contain one path
   assert(location.get.contains(paths.head))
-
-  // The location metadata should have bracket wrapping paths
-  assert(location.get.indexOf('[') > -1)
-  assert(location.get.indexOf(']') > -1)
-
-  // extract paths in location metadata (removing classname, brackets, 
separators)
-  val pathsInLocation = location.get.substring(
-location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", 
").toSeq
-
-  // If the temp path length is less than (stop appending threshold - 1), 
say, 100 - 1 = 99,
-  // location should include more than one paths. Otherwise location 
should include only one
-  // path.
-  // (Note we apply subtraction with 1 to count start bracket '['.)
-  if (paths.head.length < 99) {
-assert(pathsInLocation.size >= 2)
-  } else {
-assert(pathsInLocation.size == 1)
-  }
+  // If the temp path length is larger than 100, the metadata length 
should not exceed
+  // twice of the length; otherwise, the metadata length should be 
controlled within 200.
+  assert(location.get.length < Math.max(paths.head.length, 100) * 2)
 }
   }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (603a7fd -> e927bf9)

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 603a7fd  [SPARK-34308][SQL] Escape meta-characters in printSchema
 add e927bf9  Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 
depending on the length of temp path"

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/util/UtilsSuite.scala|  6 --
 .../DataSourceScanExecRedactionSuite.scala  | 21 +++--
 2 files changed, 3 insertions(+), 24 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (60c71c6 -> 603a7fd)

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 60c71c6  [SPARK-34325][CORE] Remove unused shuffleBlockResolver 
variable inSortShuffleWriter
 add 603a7fd  [SPARK-34308][SQL] Escape meta-characters in printSchema

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/types/StructField.scala   |  4 +-
 .../org/apache/spark/sql/util/SchemaUtils.scala| 14 ++
 .../main/scala/org/apache/spark/sql/Dataset.scala  | 14 +-
 .../org/apache/spark/sql/DataFrameSuite.scala  | 53 ++
 4 files changed, 72 insertions(+), 13 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (00120ea -> 60c71c6)

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 00120ea  [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can 
read decimal fields with a larger precision
 add 60c71c6  [SPARK-34325][CORE] Remove unused shuffleBlockResolver 
variable inSortShuffleWriter

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala  | 3 +--
 .../main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala   | 3 +--
 .../scala/org/apache/spark/shuffle/sort/SortShuffleWriterSuite.scala   | 2 --
 3 files changed, 2 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-33591][3.1][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 18def59  [SPARK-33591][3.1][SQL][FOLLOWUP] Add legacy config for 
recognizing null partition spec values
18def59 is described below

commit 18def5955dbde1fdddfed78a691d9adc97cfe7d7
Author: Gengliang Wang 
AuthorDate: Wed Feb 3 09:29:35 2021 +0900

[SPARK-33591][3.1][SQL][FOLLOWUP] Add legacy config for recognizing null 
partition spec values

### What changes were proposed in this pull request?

This PR backports https://github.com/apache/spark/pull/31421 and https://github.com/apache/spark/pull/31434 to branch-3.1.
This is a follow-up for https://github.com/apache/spark/pull/30538.
It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users want the legacy behavior.
It also adds documentation for the behavior change.

### Why are the changes needed?

In case users want the legacy behavior, they can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` to true.
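
For example (a minimal sketch, assuming an active `SparkSession` named `spark` and a table `t` partitioned by a string column `p`):

```scala
// Restore the pre-3.0.2 behavior: PARTITION(p = null) is parsed as the string
// literal "null" rather than as a null partition value.
spark.conf.set("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral", "true")
spark.sql("ALTER TABLE t DROP PARTITION (p = null)")
```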

### Does this PR introduce _any_ user-facing change?

Yes, adding a legacy configuration to restore the old behavior.

### How was this patch tested?

Unit test.

Closes #31439 from gengliangwang/backportLegacyConf3.1.

Authored-by: Gengliang Wang 
Signed-off-by: HyukjinKwon 
---
 docs/sql-migration-guide.md   |  2 ++
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 10 +++---
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala| 10 ++
 .../scala/org/apache/spark/sql/execution/SparkSqlParser.scala |  2 +-
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala   | 11 +++
 5 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 2beddcb..36dccf9 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -70,6 +70,8 @@ license: |
 * `ALTER TABLE .. ADD PARTITION` throws `PartitionsAlreadyExistException` 
if new partition exists already
 * `ALTER TABLE .. DROP PARTITION` throws `NoSuchPartitionsException` for 
not existing partitions
 
+  - In Spark 3.0.2, `PARTITION(col=null)` is always parsed as a null literal 
in the partition spec. In Spark 3.0.1 or earlier, it is parsed as a string 
literal of its text representation, e.g., string "null", if the partition 
column is string type. To restore the legacy behavior, you can set 
`spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.
+
 ## Upgrading from Spark SQL 3.0 to 3.0.1
 
 - In Spark 3.0, JSON datasource and JSON function `schema_of_json` infer 
TimestampType from string values if they match to the pattern defined by the 
JSON option `timestampFormat`. Since version 3.0.1, the timestamp type 
inference is disabled by default. Set the JSON option `inferTimestamp` to 
`true` to enable such type inference.
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 34f56e9..c7ca4b5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -481,9 +481,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
*/
   override def visitPartitionSpec(
   ctx: PartitionSpecContext): Map[String, Option[String]] = 
withOrigin(ctx) {
+val legacyNullAsString =
+  conf.getConf(SQLConf.LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL)
 val parts = ctx.partitionVal.asScala.map { pVal =>
   val name = pVal.identifier.getText
-  val value = Option(pVal.constant).map(visitStringConstant)
+  val value = Option(pVal.constant).map(v => visitStringConstant(v, 
legacyNullAsString))
   name -> value
 }
 // Before calling `toMap`, we check duplicated keys to avoid silently 
ignore partition values
@@ -509,9 +511,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
* main purpose is to prevent slight differences due to back to back 
conversions i.e.:
* String -> Literal -> String.
*/
-  protected def visitStringConstant(ctx: ConstantContext): String = 
withOrigin(ctx) {
+  protected def visitStringConstant(
+  ctx: ConstantContext,
+  legacyNullAsString: Boolean): String = withOrigin(ctx) {
 ctx match {
-  case _: NullLiteralContext => null
+  case _: NullLiteralContext if !legacyNullAsString => null
   case s: StringLiteralContext => createString(s)
   case o => o.getText
 }
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark

[spark] branch branch-2.4 updated: [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 5f4e9ea  [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can 
read decimal fields with a larger precision
5f4e9ea is described below

commit 5f4e9ea7a1a70b7ba3c5ff1a4977f019ab43a3a1
Author: Wenchen Fan 
AuthorDate: Wed Feb 3 09:26:36 2021 +0900

[SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal 
fields with a larger precision

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/31357

#31357 added a very strong restriction to the vectorized Parquet reader: when reading decimal fields, the Spark data type must exactly match the physical Parquet type. This restriction is not actually necessary, as we can safely read Parquet decimals with a larger precision. This PR relaxes the restriction a little bit.

### Why are the changes needed?

To not fail queries unnecessarily.

### Does this PR introduce _any_ user-facing change?

Yes, now users can read parquet decimals with mismatched `DecimalType` as 
long as the scale is the same and precision is larger.
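
For illustration, a minimal sketch (assuming an active `SparkSession` named `spark` and a scratch path `/tmp/decimals`) of the newly allowed mismatch:

```scala
// Write a column whose physical Parquet type is DECIMAL(17, 2)...
val df = spark.sql("SELECT CAST(1.23 AS DECIMAL(17, 2)) AS b")
df.write.mode("overwrite").parquet("/tmp/decimals")

// ...and read it back with a larger precision but the same scale. With this
// patch the vectorized reader accepts the mismatch instead of failing.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")
spark.read.schema("b DECIMAL(18, 2)").parquet("/tmp/decimals").show()
```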

### How was this patch tested?

updated test.

Closes #31443 from cloud-fan/improve.

Authored-by: Wenchen Fan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 00120ea53748d84976e549969f43cf2a50778c1c)
Signed-off-by: HyukjinKwon 
---
 .../sql/execution/datasources/parquet/VectorizedColumnReader.java | 4 +++-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 4739089..ed8755c 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -106,7 +106,9 @@ public class VectorizedColumnReader {
   private boolean isDecimalTypeMatched(DataType dt) {
 DecimalType d = (DecimalType) dt;
 DecimalMetadata dm = descriptor.getPrimitiveType().getDecimalMetadata();
-return dm != null && dm.getPrecision() == d.precision() && dm.getScale() 
== d.scale();
+// It's OK if the required decimal precision is larger than or equal to 
the physical decimal
+// precision in the Parquet metadata, as long as the decimal scale is the 
same.
+return dm != null && dm.getPrecision() <= d.precision() && dm.getScale() 
== d.scale();
   }
 
   private boolean canReadAsIntDecimal(DataType dt) {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index a2efed6..f262eab 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -3152,6 +3152,14 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   val df = sql("SELECT 1.0 a, CAST(1.23 AS DECIMAL(17, 2)) b, CAST(1.23 AS 
DECIMAL(36, 2)) c")
   df.write.parquet(path.toString)
 
+  Seq(true, false).foreach { vectorizedReader =>
+withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> 
vectorizedReader.toString) {
+  // We can read the decimal parquet field with a larger precision, if 
scale is the same.
+  val schema = "a DECIMAL(9, 1), b DECIMAL(18, 2), c DECIMAL(38, 2)"
+  checkAnswer(readParquet(schema, path), df)
+}
+  }
+
   withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
 val schema1 = "a DECIMAL(3, 2), b DECIMAL(18, 3), c DECIMAL(37, 3)"
 checkAnswer(readParquet(schema1, path), df)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 240016b  [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can 
read decimal fields with a larger precision
240016b is described below

commit 240016ba60b9f08983214f7bfe4a62c3e4ca7de5
Author: Wenchen Fan 
AuthorDate: Wed Feb 3 09:26:36 2021 +0900

[SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal 
fields with a larger precision

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/31357

#31357 added a very strong restriction to the vectorized Parquet reader: when reading decimal fields, the Spark data type must exactly match the physical Parquet type. This restriction is not actually necessary, as we can safely read Parquet decimals with a larger precision. This PR relaxes the restriction a little bit.

### Why are the changes needed?

To not fail queries unnecessarily.

### Does this PR introduce _any_ user-facing change?

Yes, now users can read parquet decimals with mismatched `DecimalType` as 
long as the scale is the same and precision is larger.

### How was this patch tested?

updated test.

Closes #31443 from cloud-fan/improve.

Authored-by: Wenchen Fan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 00120ea53748d84976e549969f43cf2a50778c1c)
Signed-off-by: HyukjinKwon 
---
 .../sql/execution/datasources/parquet/VectorizedColumnReader.java | 4 +++-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 7681ba9..eeff12b 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -110,7 +110,9 @@ public class VectorizedColumnReader {
   private boolean isDecimalTypeMatched(DataType dt) {
 DecimalType d = (DecimalType) dt;
 DecimalMetadata dm = descriptor.getPrimitiveType().getDecimalMetadata();
-return dm != null && dm.getPrecision() == d.precision() && dm.getScale() 
== d.scale();
+// It's OK if the required decimal precision is larger than or equal to 
the physical decimal
+// precision in the Parquet metadata, as long as the decimal scale is the 
same.
+return dm != null && dm.getPrecision() <= d.precision() && dm.getScale() 
== d.scale();
   }
 
   private boolean canReadAsIntDecimal(DataType dt) {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 0b78258..409e645 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -3598,6 +3598,14 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
   val df = sql("SELECT 1.0 a, CAST(1.23 AS DECIMAL(17, 2)) b, CAST(1.23 AS 
DECIMAL(36, 2)) c")
   df.write.parquet(path.toString)
 
+  Seq(true, false).foreach { vectorizedReader =>
+withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> 
vectorizedReader.toString) {
+  // We can read the decimal parquet field with a larger precision, if 
scale is the same.
+  val schema = "a DECIMAL(9, 1), b DECIMAL(18, 2), c DECIMAL(38, 2)"
+  checkAnswer(readParquet(schema, path), df)
+}
+  }
+
   withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
 val schema1 = "a DECIMAL(3, 2), b DECIMAL(18, 3), c DECIMAL(37, 3)"
 checkAnswer(readParquet(schema1, path), df)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new bb0efc1  [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can 
read decimal fields with a larger precision
bb0efc1 is described below

commit bb0efc16a435346db8d4a6a0bae7f3e647f9f186
Author: Wenchen Fan 
AuthorDate: Wed Feb 3 09:26:36 2021 +0900

[SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal 
fields with a larger precision

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/31357

#31357 added a very strong restriction to the vectorized Parquet reader: when reading decimal fields, the Spark data type must exactly match the physical Parquet type. This restriction is not actually necessary, as we can safely read Parquet decimals with a larger precision. This PR relaxes the restriction a little bit.

### Why are the changes needed?

To not fail queries unnecessarily.

### Does this PR introduce _any_ user-facing change?

Yes, now users can read parquet decimals with mismatched `DecimalType` as 
long as the scale is the same and precision is larger.

### How was this patch tested?

updated test.

Closes #31443 from cloud-fan/improve.

Authored-by: Wenchen Fan 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 00120ea53748d84976e549969f43cf2a50778c1c)
Signed-off-by: HyukjinKwon 
---
 .../sql/execution/datasources/parquet/VectorizedColumnReader.java | 4 +++-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 7a10aa0..119af8d 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -111,7 +111,9 @@ public class VectorizedColumnReader {
   private boolean isDecimalTypeMatched(DataType dt) {
 DecimalType d = (DecimalType) dt;
 DecimalMetadata dm = descriptor.getPrimitiveType().getDecimalMetadata();
-return dm != null && dm.getPrecision() == d.precision() && dm.getScale() 
== d.scale();
+// It's OK if the required decimal precision is larger than or equal to 
the physical decimal
+// precision in the Parquet metadata, as long as the decimal scale is the 
same.
+return dm != null && dm.getPrecision() <= d.precision() && dm.getScale() 
== d.scale();
   }
 
   private boolean canReadAsIntDecimal(DataType dt) {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index d2a578b..5ce236c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -3785,6 +3785,14 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
   val df = sql("SELECT 1.0 a, CAST(1.23 AS DECIMAL(17, 2)) b, CAST(1.23 AS 
DECIMAL(36, 2)) c")
   df.write.parquet(path.toString)
 
+  Seq(true, false).foreach { vectorizedReader =>
+withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> 
vectorizedReader.toString) {
+  // We can read the decimal parquet field with a larger precision, if 
scale is the same.
+  val schema = "a DECIMAL(9, 1), b DECIMAL(18, 2), c DECIMAL(38, 2)"
+  checkAnswer(readParquet(schema, path), df)
+}
+  }
+
   withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
 val schema1 = "a DECIMAL(3, 2), b DECIMAL(18, 3), c DECIMAL(37, 3)"
 checkAnswer(readParquet(schema1, path), df)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (63866025 -> 00120ea)

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 63866025 [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 
depending on the length of temp path
 add 00120ea  [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can 
read decimal fields with a larger precision

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/datasources/parquet/VectorizedColumnReader.java | 4 +++-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (8637205 -> aae6091)

2021-02-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8637205  [SPARK-34319][SQL] Resolve duplicate attributes for 
FlatMapCoGroupsInPandas/MapInPandas
 add aae6091  [SPARK-33591][3.0][SQL][FOLLOWUP] Add legacy config for 
recognizing null partition spec values

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md|   2 +
 .../spark/sql/catalyst/parser/AstBuilder.scala |  10 +-
 .../org/apache/spark/sql/internal/SQLConf.scala|  10 ++
 .../spark/sql/execution/SparkSqlParser.scala   |   2 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  11 ++
 .../command/ShowPartitionsSuiteBase.scala  | 193 +
 6 files changed, 224 insertions(+), 4 deletions(-)
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

2021-02-02 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new d9e5438  [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 
depending on the length of temp path
d9e5438 is described below

commit d9e54381e32bbc86247cf18b7d2ca1e3126bd917
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Wed Feb 3 07:35:22 2021 +0900

[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the 
length of temp path

### What changes were proposed in this pull request?

This PR proposes to fix the UTs added in SPARK-31793, so that everything contributing to the length limit is properly accounted for.

### Why are the changes needed?

The test `DataSourceScanExecRedactionSuite.SPARK-31793: FileSourceScanExec 
metadata should contain limited file paths` is failing conditionally, depending 
on the length of the temp directory.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The modified UTs spell out the previously missed accounting and also exercise it.
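
A standalone sketch of the length accounting the fixed assertions rely on (an approximation for illustration, not Spark's actual `Utils.buildLocationMetadata`):

```scala
// Paths keep being appended while the accumulated metadata, including the
// leading '[' and the ", " separators, is still below the threshold.
def buildLocationMetadata(paths: Seq[String], stopAppendingThreshold: Int): String = {
  val metadata = new StringBuilder("[")
  var i = 0
  while (i < paths.length && metadata.length < stopAppendingThreshold) {
    if (i > 0) metadata.append(", ")
    metadata.append(paths(i))
    i += 1
  }
  metadata.append("]")
  metadata.toString
}

val paths = Seq("path0", "path1", "path2", "path3")
// '[' plus "path0" already reaches 6 characters, so the second path is not added.
assert(buildLocationMetadata(paths, 6) == "[path0]")
// the ", " separator brings "[path0, path1" to 13 characters, so the third path is not added.
assert(buildLocationMetadata(paths, 13) == "[path0, path1]")
```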

Closes #31435 from HeartSaVioR/SPARK-34326.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Jungtaek Lim 
(cherry picked from commit 63866025d2e4bb89251ba7e29160fb30dd48ddf7)
Signed-off-by: Jungtaek Lim 
---
 .../scala/org/apache/spark/util/UtilsSuite.scala|  6 ++
 .../DataSourceScanExecRedactionSuite.scala  | 21 ++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala 
b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 8fb4080..18ff960 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1308,6 +1308,12 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 assert(Utils.buildLocationMetadata(paths, 10) == "[path0, path1]")
 assert(Utils.buildLocationMetadata(paths, 15) == "[path0, path1, path2]")
 assert(Utils.buildLocationMetadata(paths, 25) == "[path0, path1, path2, 
path3]")
+
+// edge-case: we should consider the fact non-path chars including '[' and 
", " are accounted
+// 1. second path is not added due to the addition of '['
+assert(Utils.buildLocationMetadata(paths, 6) == "[path0]")
+// 2. third path is not added due to the addition of ", "
+assert(Utils.buildLocationMetadata(paths, 13) == "[path0, path1]")
   }
 
   test("checkHost supports both IPV4 and IPV6") {
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
index c99be98..07bacad 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends 
DataSourceScanRedactionTest {
   assert(location.isDefined)
   // The location metadata should at least contain one path
   assert(location.get.contains(paths.head))
-  // If the temp path length is larger than 100, the metadata length 
should not exceed
-  // twice of the length; otherwise, the metadata length should be 
controlled within 200.
-  assert(location.get.length < Math.max(paths.head.length, 100) * 2)
+
+  // The location metadata should have bracket wrapping paths
+  assert(location.get.indexOf('[') > -1)
+  assert(location.get.indexOf(']') > -1)
+
+  // extract paths in location metadata (removing classname, brackets, 
separators)
+  val pathsInLocation = location.get.substring(
+location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", 
").toSeq
+
+  // If the temp path length is less than (stop appending threshold - 1), 
say, 100 - 1 = 99,
+  // location should include more than one paths. Otherwise location 
should include only one
+  // path.
+  // (Note we apply subtraction with 1 to count start bracket '['.)
+  if (paths.head.length < 99) {
+assert(pathsInLocation.size >= 2)
+  } else {
+assert(pathsInLocation.size == 1)
+  }
 }
   }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (cadca8d -> 63866025)

2021-02-02 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cadca8d  [SPARK-34324][SQL] FileTable should not list TRUNCATE in 
capabilities by default
 add 63866025 [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 
depending on the length of temp path

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/util/UtilsSuite.scala|  6 ++
 .../DataSourceScanExecRedactionSuite.scala  | 21 ++---
 2 files changed, 24 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d308794 -> cadca8d)

2021-02-02 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d308794  [SPARK-34263][SQL] Simplify the code for treating 
unicode/octal/escaped characters in string literals
 add cadca8d  [SPARK-34324][SQL] FileTable should not list TRUNCATE in 
capabilities by default

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-34263][SQL] Simplify the code for treating unicode/octal/escaped characters in string literals

2021-02-02 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d308794  [SPARK-34263][SQL] Simplify the code for treating 
unicode/octal/escaped characters in string literals
d308794 is described below

commit d308794adb821d301847772de3ee1ef3166aaf5b
Author: Kousuke Saruta 
AuthorDate: Wed Feb 3 01:07:12 2021 +0900

[SPARK-34263][SQL] Simplify the code for treating unicode/octal/escaped 
characters in string literals

### What changes were proposed in this pull request?

In the current master, the code for treating unicode/octal/escaped characters in string literals is a little bit complex, so let's simplify it.

### Why are the changes needed?

To keep it easy to maintain.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`ParserUtilsSuite` passes.
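
A minimal, self-contained sketch of the regex-driven approach the patch adopts for `\u`-style escapes (the demo object and input string are assumptions for illustration):

```scala
object UnicodeEscapeDemo {
  // Same pattern as in ParserUtils: a backslash, 'u', and four hex digits at the
  // start of the remaining input; "(?s).*" lets the full-string match succeed.
  val U16_CHAR_PATTERN = """\\u([a-fA-F0-9]{4})(?s).*""".r

  def main(args: Array[String]): Unit = {
    // Built by concatenation so the compiler does not pre-process the escape itself.
    val input = "\\" + "u0041 and the rest"
    input match {
      case U16_CHAR_PATTERN(cp) => println(Integer.parseInt(cp, 16).toChar) // prints: A
      case _ => println("no unicode escape at the start")
    }
  }
}
```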

Closes #31362 from sarutak/refactor-unicode-escapes.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 .../spark/sql/catalyst/parser/ParserUtils.scala| 77 --
 .../sql/catalyst/parser/ParserUtilsSuite.scala |  7 ++
 2 files changed, 34 insertions(+), 50 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
index 711b507..f7cf2ba 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
@@ -17,6 +17,7 @@
 package org.apache.spark.sql.catalyst.parser
 
 import java.lang.{Long => JLong}
+import java.nio.CharBuffer
 import java.util
 
 import scala.collection.mutable.StringBuilder
@@ -33,6 +34,12 @@ import org.apache.spark.sql.errors.QueryParsingErrors
  * A collection of utility methods for use during the parsing process.
  */
 object ParserUtils {
+
+  val U16_CHAR_PATTERN = """\\u([a-fA-F0-9]{4})(?s).*""".r
+  val U32_CHAR_PATTERN = """\\U([a-fA-F0-9]{8})(?s).*""".r
+  val OCTAL_CHAR_PATTERN = """\\([01][0-7]{2})(?s).*""".r
+  val ESCAPED_CHAR_PATTERN = """\\((?s).)(?s).*""".r
+
   /** Get the command which created the token. */
   def command(ctx: ParserRuleContext): String = {
 val stream = ctx.getStart.getInputStream
@@ -131,7 +138,6 @@ object ParserUtils {
 
   /** Unescape backslash-escaped string enclosed by quotes. */
   def unescapeSQLString(b: String): String = {
-var enclosure: Character = null
 val sb = new StringBuilder(b.length())
 
 def appendEscapedChar(n: Char): Unit = {
@@ -152,34 +158,19 @@ object ParserUtils {
   }
 }
 
-var i = 0
-val strLength = b.length
-while (i < strLength) {
-  val currentChar = b.charAt(i)
-  if (enclosure == null) {
-if (currentChar == '\'' || currentChar == '\"') {
-  enclosure = currentChar
-}
-  } else if (enclosure == currentChar) {
-enclosure = null
-  } else if (currentChar == '\\') {
-
-if ((i + 6 < strLength) && b.charAt(i + 1) == 'u') {
-  // \u style 16-bit unicode character literals.
+// Skip the first and last quotations enclosing the string literal.
+val charBuffer = CharBuffer.wrap(b, 1, b.length - 1)
 
-  val base = i + 2
-  val code = (0 until 4).foldLeft(0) { (mid, j) =>
-val digit = Character.digit(b.charAt(j + base), 16)
-(mid << 4) + digit
-  }
-  sb.append(code.asInstanceOf[Char])
-  i += 5
-} else if ((i + 10 < strLength) && b.charAt(i + 1) == 'U' &&
-   (2 until 10).forall(j => Character.digit(b.charAt(i + j), 
16) != -1)) {
+while (charBuffer.remaining() > 0) {
+  charBuffer match {
+case U16_CHAR_PATTERN(cp) =>
+  // \u style 16-bit unicode character literals.
+  sb.append(Integer.parseInt(cp, 16).toChar)
+  charBuffer.position(charBuffer.position() + 6)
+case U32_CHAR_PATTERN(cp) =>
   // \U style 32-bit unicode character literals.
-
   // Use Long to treat codePoint as unsigned in the range of 32-bit.
-  val codePoint = JLong.parseLong(b.substring(i + 2, i + 10), 16)
+  val codePoint = JLong.parseLong(cp, 16)
   if (codePoint < 0x1) {
 sb.append((codePoint & 0x).toChar)
   } else {
@@ -188,33 +179,19 @@ object ParserUtils {
 sb.append(highSurrogate.toChar)
 sb.append(lowSurrogate.toChar)
   }
-  i += 9
-} else if (i + 4 < strLength) {
+  charBuffer.position(charBuffer.position() + 10)
+case OCTAL_CHAR_PATTERN(cp) =>
   // \000 style character literals.
-
-  val i1 = b.c

[spark] branch master updated (ff1b6ec -> 79515b8)

2021-02-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ff1b6ec  [SPARK-33591][SQL][FOLLOW-UP] Revise the version and doc of 
`spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`
 add 79515b8  [SPARK-34282][SQL][TESTS] Unify v1 and v2 TRUNCATE TABLE tests

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  12 -
 .../spark/sql/StatisticsCollectionSuite.scala  |  50 ---
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |  15 -
 .../apache/spark/sql/execution/SQLViewSuite.scala  |  21 +-
 .../spark/sql/execution/command/DDLSuite.scala | 186 +--
 .../command/TruncateTableParserSuite.scala |  55 +++
 .../command/TruncateTableSuiteBase.scala}  |  26 +-
 .../execution/command/v1/TruncateTableSuite.scala  | 368 +
 ...titionsSuite.scala => TruncateTableSuite.scala} |  27 +-
 .../apache/spark/sql/hive/CachedTableSuite.scala   |  13 -
 .../sql/hive/execution/HiveCommandSuite.scala  |  66 
 .../spark/sql/hive/execution/HiveDDLSuite.scala|  63 +---
 ...titionsSuite.scala => TruncateTableSuite.scala} |   6 +-
 13 files changed, 461 insertions(+), 447 deletions(-)
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/TruncateTableParserSuite.scala
 copy 
sql/core/src/{main/scala/org/apache/spark/sql/execution/command/CommandCheck.scala
 => 
test/scala/org/apache/spark/sql/execution/command/TruncateTableSuiteBase.scala} 
(53%)
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/TruncateTableSuite.scala
 copy 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/{AlterTableRecoverPartitionsSuite.scala
 => TruncateTableSuite.scala} (56%)
 copy 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/{AlterTableRecoverPartitionsSuite.scala
 => TruncateTableSuite.scala} (82%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (5b2ad59 -> ff1b6ec)

2021-02-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5b2ad59  [SPARK-33599][SQL] Restore the assert-like in 
catalyst/analysis
 add ff1b6ec  [SPARK-33591][SQL][FOLLOW-UP] Revise the version and doc of 
`spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md  |  4 ++--
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala| 12 ++--
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala   |  8 
 .../org/apache/spark/sql/execution/SparkSqlParser.scala  |  2 +-
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 11 +++
 .../sql/execution/command/ShowPartitionsSuiteBase.scala  | 11 ---
 6 files changed, 24 insertions(+), 24 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (66f3480 -> 5b2ad59)

2021-02-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 66f3480  [SPARK-34318][SQL] Dataset.colRegex should work with column 
names and qualifiers which contain newlines
 add 5b2ad59  [SPARK-33599][SQL] Restore the assert-like in 
catalyst/analysis

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala  |  8 
 .../org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala |  3 ++-
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala | 10 --
 3 files changed, 6 insertions(+), 15 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (5acc5b8 -> 66f3480)

2021-02-02 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5acc5b8  [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3
 add 66f3480  [SPARK-34318][SQL] Dataset.colRegex should work with column 
names and qualifiers which contain newlines

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala | 4 ++--
 .../src/test/scala/org/apache/spark/sql/DataFrameSuite.scala | 9 +
 2 files changed, 11 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6d3674b -> 5acc5b8)

2021-02-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6d3674b  [SPARK-34312][SQL] Support partition(s) truncation by 
`Supports(Atomic)PartitionManagement`
 add 5acc5b8  [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f024d30 -> 6d3674b)

2021-02-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f024d30  [SPARK-34317][SQL] Introduce relationTypeMismatchHint to 
UnresolvedTable for a better error message
 add 6d3674b  [SPARK-34312][SQL] Support partition(s) truncation by 
`Supports(Atomic)PartitionManagement`

No new revisions were added by this update.

Summary of changes:
 .../catalog/SupportsAtomicPartitionManagement.java | 20 +
 .../catalog/SupportsPartitionManagement.java   | 17 +++
 .../connector/InMemoryAtomicPartitionTable.scala   | 12 +++-
 .../sql/connector/InMemoryPartitionTable.scala |  9 ++
 .../apache/spark/sql/connector/InMemoryTable.scala |  7 +
 .../SupportsAtomicPartitionManagementSuite.scala   | 33 --
 .../catalog/SupportsPartitionManagementSuite.scala | 24 +++-
 7 files changed, 118 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (bb9bf66 -> f024d30)

2021-02-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bb9bf66  [SPARK-34199][SQL] Block `table.*` inside function to follow 
ANSI standard and other SQL engines
 add f024d30  [SPARK-34317][SQL] Introduce relationTypeMismatchHint to 
UnresolvedTable for a better error message

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 18 +++---
 .../sql/catalyst/analysis/v2ResolutionPlans.scala  |  3 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala | 31 ++
 .../spark/sql/errors/QueryCompilationErrors.scala  | 15 -
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 51 ++--
 .../apache/spark/sql/internal/CatalogImpl.scala|  3 +-
 .../AlterTableAddPartitionParserSuite.scala| 10 +++-
 .../AlterTableDropPartitionParserSuite.scala   | 20 +--
 .../AlterTableRecoverPartitionsParserSuite.scala   | 18 --
 .../AlterTableRenamePartitionParserSuite.scala | 10 +++-
 .../command/ShowPartitionsParserSuite.scala| 10 ++--
 .../spark/sql/hive/execution/HiveDDLSuite.scala| 67 +++---
 12 files changed, 177 insertions(+), 79 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org