[spark] branch branch-3.2 updated: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new a74a3e382e2 [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build
a74a3e382e2 is described below

commit a74a3e382e28a36c552fd689e390275bb1d9811a
Author: Hyukjin Kwon
AuthorDate: Thu Jun 9 14:26:45 2022 +0900

[SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build

### What changes were proposed in this pull request?

This PR fixes the Sphinx build failure below (see https://github.com/singhpk234/spark/runs/6799026458?check_suite_focus=true):

```
Moving to python/docs directory and building sphinx.
Running Sphinx v3.0.4
WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
/__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: Warning: Latest version of pandas(>=1.4.0) is required to generate the documentation; however, your version was 1.3.5
  warnings.warn(
Warning, treated as error: node class 'meta' is already registered, its visitors will be overridden
make: *** [Makefile:35: html] Error 2
Jekyll 4.2.1
Please append `--trace` to the `build` command for any additional information or backtrace.
```

The Sphinx build apparently fails with the latest docutils (see also https://issues.apache.org/jira/browse/FLINK-24662), so we should pin the version.

### Why are the changes needed?

To recover the CI.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR should test it out.

Closes #36813 from HyukjinKwon/SPARK-39421.
Lead-authored-by: Hyukjin Kwon
Co-authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
(cherry picked from commit c196ff4dfa1d9f1a8e20b884ee5b4a4e6e65a6e3)
Signed-off-by: Hyukjin Kwon
---
 .github/workflows/build_and_test.yml | 1 +
 dev/requirements.txt                 | 1 +
 2 files changed, 2 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 1329b5ea27c..68424dc 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -399,6 +399,7 @@ jobs:
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install sphinx_plotly_directive 'pyarrow<5.0.0' pandas 'plotly>=4.8'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
+python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
 apt-get update -y
 apt-get install -y ruby ruby-dev
 Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 273294a96af..7f40737af91 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -32,6 +32,7 @@ numpydoc
 jinja2<3.0.0
 sphinx<3.1.0
 sphinx-plotly-directive
+docutils<0.18.0
 # Development scripts
 jira

---
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
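The fix itself is just the version pin `docutils<0.18.0`. For context, a strict upper-bound pin reduces to a component-wise numeric comparison of versions. A minimal sketch, not part of the commit and deliberately simpler than pip's real PEP 440 handling (no pre-releases or epochs):

```python
def parse_version(s):
    # Split "0.17.1" into the integer tuple (0, 17, 1) for component-wise comparison.
    return tuple(int(part) for part in s.split("."))

def satisfies_upper_bound(version, bound):
    # True when `version` is strictly below `bound`, i.e. the "<bound" pin holds.
    return parse_version(version) < parse_version(bound)

# docutils 0.17.1 satisfies the pin; 0.18.0 (which broke the Sphinx build) does not.
print(satisfies_upper_bound("0.17.1", "0.18.0"))  # True
print(satisfies_upper_bound("0.18.0", "0.18.0"))  # False
```

Real pip resolution goes through the `packaging` library's specifier rules; this tuple comparison only illustrates why `0.17.x` installs stay allowed while `0.18` and later are excluded.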
[spark] branch branch-3.3 updated: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 66826567fa1 [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build
66826567fa1 is described below

commit 66826567fa12e57119acc97f9971e36fe834df21
Author: Hyukjin Kwon
AuthorDate: Thu Jun 9 14:26:45 2022 +0900

[SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build

### What changes were proposed in this pull request?

This PR fixes the Sphinx build failure below (see https://github.com/singhpk234/spark/runs/6799026458?check_suite_focus=true):

```
Moving to python/docs directory and building sphinx.
Running Sphinx v3.0.4
WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
/__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: Warning: Latest version of pandas(>=1.4.0) is required to generate the documentation; however, your version was 1.3.5
  warnings.warn(
Warning, treated as error: node class 'meta' is already registered, its visitors will be overridden
make: *** [Makefile:35: html] Error 2
Jekyll 4.2.1
Please append `--trace` to the `build` command for any additional information or backtrace.
```

The Sphinx build apparently fails with the latest docutils (see also https://issues.apache.org/jira/browse/FLINK-24662), so we should pin the version.

### Why are the changes needed?

To recover the CI.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR should test it out.

Closes #36813 from HyukjinKwon/SPARK-39421.
Lead-authored-by: Hyukjin Kwon
Co-authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
(cherry picked from commit c196ff4dfa1d9f1a8e20b884ee5b4a4e6e65a6e3)
Signed-off-by: Hyukjin Kwon
---
 .github/workflows/build_and_test.yml | 1 +
 dev/requirements.txt                 | 1 +
 2 files changed, 2 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 1f5df70cde9..e0e9f70556c 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -528,6 +528,7 @@ jobs:
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
 python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8'
+python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
 apt-get update -y
 apt-get install -y ruby ruby-dev
 Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'markdown', 'e1071', 'roxygen2'), repos='https://cloud.r-project.org/')"
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 22e72d55543..e7e0a4b4274 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -35,6 +35,7 @@ numpydoc
 jinja2<3.0.0
 sphinx<3.1.0
 sphinx-plotly-directive
+docutils<0.18.0
 # Development scripts
 jira
[spark] branch master updated (cb55efadea1 -> c196ff4dfa1)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from cb55efadea1 [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace
     add c196ff4dfa1 [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 1 +
 dev/requirements.txt                 | 1 +
 2 files changed, 2 insertions(+)
[spark] branch master updated: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new cb55efadea1 [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace
cb55efadea1 is described below

commit cb55efadea1399e1ce6daae5d9ec7896ffce1b93
Author: Rui Wang
AuthorDate: Thu Jun 9 11:08:00 2022 +0800

[SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

### What changes were proposed in this pull request?

1. Change the `CreateTable` API to make it support a 3-layer namespace.
2. Change the `ListTables` API so that a) it supports a `database` parameter, and b) if that `database` does not exist, it further checks whether the parameter is of the form `catalog.database`.

### Why are the changes needed?

`CreateTable` and `ListTables` do not support a 3-layer namespace.

### Does this PR introduce _any_ user-facing change?

Yes. The API change here is backward compatible, and it extends the API to further support a 3-layer namespace (e.g. `catalog.database.table`).

### How was this patch tested?

Unit tests.

Closes #36586 from amaliujia/catalogapi.
Authored-by: Rui Wang
Signed-off-by: Wenchen Fan
---
 R/pkg/tests/fulltests/test_sparkSQL.R              |   5 +-
 .../sql/catalyst/catalog/SessionCatalog.scala      |   4 +
 .../spark/sql/errors/QueryCompilationErrors.scala  |   4 +
 .../org/apache/spark/sql/catalog/interface.scala   |  26 +++-
 .../apache/spark/sql/internal/CatalogImpl.scala    |  92 +++---
 .../spark/sql/execution/GlobalTempViewSuite.scala  |   4 +-
 .../apache/spark/sql/internal/CatalogSuite.scala   | 140 -
 .../spark/sql/hive/MetastoreDataSourcesSuite.scala |   2 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |   4 +-
 9 files changed, 251 insertions(+), 30 deletions(-)

diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index df1094bacef..f0abc96613d 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -663,7 +663,7 @@ test_that("test tableNames and tables", {
   expect_equal(count(tables), count + 1)
   expect_equal(count(tables()), count(tables))
   expect_true("tableName" %in% colnames(tables()))
-  expect_true(all(c("tableName", "database", "isTemporary") %in% colnames(tables(
+  expect_true(all(c("tableName", "namespace", "isTemporary") %in% colnames(tables(
   suppressWarnings(registerTempTable(df, "table2"))
   tables <- listTables()
@@ -4026,7 +4026,8 @@ test_that("catalog APIs, listTables, listColumns, listFunctions", {
   tb <- listTables()
   count <- count(tables())
   expect_equal(nrow(tb), count)
-  expect_equal(colnames(tb), c("name", "database", "description", "tableType", "isTemporary"))
+  expect_equal(colnames(tb),
+    c("name", "catalog", "namespace", "description", "tableType", "isTemporary"))

   createOrReplaceTempView(as.DataFrame(cars), "cars")
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index d6c80f98bf7..0152f49c798 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -965,6 +965,10 @@ class SessionCatalog(
     isTempView(nameParts.asTableIdentifier)
   }

+  def isGlobalTempViewDB(dbName: String): Boolean = {
+    globalTempViewManager.database.equals(dbName)
+  }
+
   def lookupTempView(name: TableIdentifier): Option[View] = {
     val tableName = formatTableName(name.table)
     if (name.database.isEmpty) {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 551eaa6aeb7..68f4320ff67 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -2188,6 +2188,10 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {
     new AnalysisException(s"Table or view '$tableName' not found in database '$dbName'")
   }

+  def tableOrViewNotFound(ident: Seq[String]): Throwable = {
+    new AnalysisException(s"Table or view '${ident.quoted}' not found")
+  }
+
   def unexpectedTypeOfRelationError(relation: LogicalPlan, tableName: String): Throwable = {
     new AnalysisException(
       s"Unexpected type ${relation.getClass.getCanonicalName} of the relation $tableName")
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala
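The `ListTables` fallback described in the commit message — treat the argument as a database in the current catalog first, and only if no such database exists reinterpret it as `catalog.database` — can be sketched outside Spark as plain identifier resolution. This is a simplified illustration with hypothetical helper names (the real logic lives in `CatalogImpl.scala`), assuming `spark_catalog` as the current-catalog name:

```python
def resolve_namespace(ident, existing_dbs, existing_catalogs):
    """Resolve a listTables() argument against a 3-layer namespace.

    First treat `ident` as a database in the current catalog; only if no
    such database exists, try splitting it into `catalog.database`.
    """
    if ident in existing_dbs:
        return ("spark_catalog", ident)          # current catalog, known database
    if "." in ident:
        catalog, database = ident.split(".", 1)  # reinterpret as catalog.database
        if catalog in existing_catalogs:
            return (catalog, database)
    raise ValueError(f"Table or view namespace '{ident}' not found")

print(resolve_namespace("default", {"default"}, {"testcat"}))      # ('spark_catalog', 'default')
print(resolve_namespace("testcat.ns1", {"default"}, {"testcat"}))  # ('testcat', 'ns1')
```

The ordering matters: checking the database name before splitting keeps the change backward compatible, since any argument that resolved before the PR still resolves to the same namespace after it.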
[spark] branch master updated: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6f9997ca9f3 [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190
6f9997ca9f3 is described below

commit 6f9997ca9f3639f01b25a9cff4985a5b3b224578
Author: sychen
AuthorDate: Wed Jun 8 19:59:26 2022 -0700

[SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

### What changes were proposed in this pull request?

Add a unit test that checks whether the overflow of `newLength` is fixed.

### Why are the changes needed?

https://github.com/apache/spark/pull/36772#pullrequestreview-996975725

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a unit test.

Closes #36787 from cxzl25/SPARK-39387-FOLLOWUP.

Authored-by: sychen
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/execution/datasources/orc/OrcQuerySuite.scala | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
index a289a94fdce..2c1120baa7c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {
+    withTempPath { dir =>
+      val path = dir.getCanonicalPath
+      val df = spark.range(1, 22, 1, 1).map { _ =>
+        val byteData = Array.fill[Byte](1024 * 1024)('X')
+        val mapData = (1 to 100).map(i => (i, byteData))
+        mapData
+      }.toDF()
+      df.write.format("orc").save(path)
+    }
+  }
 }

 class OrcV1QuerySuite extends OrcQuerySuite {
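The sizes in the test above are chosen to push the total byte length written past `Int.MaxValue`: 21 rows (`spark.range(1, 22, 1, 1)`) times 100 map entries times 1 MiB each. A quick check of the arithmetic behind the `newLength` overflow the test guards against (this sketch is an illustration of the sizing, not part of the commit):

```python
INT_MAX = 2**31 - 1            # Java Int.MaxValue
rows = 21                      # spark.range(1, 22, 1, 1) produces 21 rows
entries_per_row = 100          # (1 to 100) map entries per row
bytes_per_entry = 1024 * 1024  # each entry carries 1 MiB of 'X' bytes

total = rows * entries_per_row * bytes_per_entry
print(total)            # 2202009600
print(total > INT_MAX)  # True: a 32-bit buffer length would overflow
```

With only 20 rows the total (2,097,152,000 bytes) would still fit in a signed 32-bit int, so the test deliberately uses 21 rows to cross the boundary.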
[spark] branch master updated: [SPARK-39349] Add a centralized CheckError method for QA of error path
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d8e9ac01f8e [SPARK-39349] Add a centralized CheckError method for QA of error path
d8e9ac01f8e is described below

commit d8e9ac01f8e42f10707efc8a7579d32ff88dbd58
Author: Serge Rielau
AuthorDate: Thu Jun 9 09:40:08 2022 +0800

[SPARK-39349] Add a centralized CheckError method for QA of error path

### What changes were proposed in this pull request?

Pulling error messages out of the code base into error-classes.json solves only one half of the problem. This change aims to lay the infrastructure to pull error messages out of QA. We do this by adding a central checkError() method in SparkFunSuite which is geared towards verifying only the payload of an error:

- ERROR_CLASS
- Optional ERROR_SUBCLASS
- Optional SQLSTATE (derived from error-classes.json, so debatable)
- Parameter values (with optional parameter names for extra points)

The method allows regex matching of parameter values.

### Why are the changes needed?

Pulling error messages out of code and QA makes for a central place to fine-tune error messages for language and formatting.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A subset of QA tests has been rewritten to exercise the code.

Closes #36693 from srielau/textless-error-check.
Lead-authored-by: Serge Rielau
Co-authored-by: Serge Rielau
Co-authored-by: Gengliang Wang
Signed-off-by: Wenchen Fan
---
 .../main/java/org/apache/spark/SparkThrowable.java |  13 +
 .../apache/spark/memory/SparkOutOfMemoryError.java |   8 +-
 core/src/main/resources/error/error-classes.json   |  37 ++-
 .../main/scala/org/apache/spark/ErrorInfo.scala    |  36 ++-
 .../scala/org/apache/spark/SparkException.scala    | 199 ---
 .../scala/org/apache/spark/SparkFunSuite.scala     |  63 +
 .../org/apache/spark/SparkThrowableSuite.scala     |  10 +-
 .../org/apache/spark/sql/AnalysisException.scala   |  52 +++-
 .../spark/sql/catalyst/analysis/Analyzer.scala     |   2 +-
 .../catalog/InvalidUDFClassException.scala         |   2 +-
 .../spark/sql/catalyst/parser/ParseDriver.scala    |  24 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  40 +--
 .../spark/sql/errors/QueryExecutionErrors.scala    |  61 +++--
 .../spark/sql/errors/QueryParsingErrors.scala      |  32 ++-
 .../catalyst/encoders/EncoderResolutionSuite.scala | 146 ++-
 .../test/resources/sql-tests/results/date.sql.out  |   4 +-
 .../sql-tests/results/datetime-legacy.sql.out      |   4 +-
 .../resources/sql-tests/results/describe.sql.out   |   4 +-
 .../errors/QueryCompilationErrorsDSv2Suite.scala   |  17 +-
 .../sql/errors/QueryCompilationErrorsSuite.scala   | 174 +++--
 .../spark/sql/errors/QueryErrorsSuiteBase.scala    |   1 +
 .../sql/errors/QueryExecutionAnsiErrorsSuite.scala |  32 +--
 .../sql/errors/QueryExecutionErrorsSuite.scala     | 273 +++--
 23 files changed, 791 insertions(+), 443 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/SparkThrowable.java b/core/src/main/java/org/apache/spark/SparkThrowable.java
index 2be0c3c0f94..581e1f6eebb 100644
--- a/core/src/main/java/org/apache/spark/SparkThrowable.java
+++ b/core/src/main/java/org/apache/spark/SparkThrowable.java
@@ -36,6 +36,10 @@ public interface SparkThrowable {
   // If null, error class is not set
   String getErrorClass();

+  default String getErrorSubClass() {
+    return null;
+  }
+
   // Portable error identifier across SQL engines
   // If null, error class or SQLSTATE is not set
   default String getSqlState() {
@@ -46,4 +50,13 @@ public interface SparkThrowable {
   default boolean isInternalError() {
     return SparkThrowableHelper.isInternalError(this.getErrorClass());
   }
+
+  default String[] getMessageParameters() {
+    return new String[]{};
+  }
+
+  // Returns a string array of all parameters that need to be passed to this error message.
+  default String[] getParameterNames() {
+    return SparkThrowableHelper.getParameterNames(this.getErrorClass(), this.getErrorSubClass());
+  }
 }
diff --git a/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java b/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
index c5f19a0c201..9d2739018a0 100644
--- a/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
+++ b/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
@@ -39,11 +39,17 @@ public final class SparkOutOfMemoryError extends OutOfMemoryError implements Spa
   }

   public SparkOutOfMemoryError(String errorClass, String[] messageParameters) {
-    super(SparkThrowableHelper.getMessage(errorClass,
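The checkError() idea — assert on an error's structured payload (error class, optional subclass, SQLSTATE, parameter values with regex matching) rather than its rendered message text — can be sketched generically. This is a hedged illustration with hypothetical names; the real helper is the Scala `checkError()` in `SparkFunSuite`:

```python
import re

def check_error(err, error_class, sqlstate=None, parameters=None):
    """Assert only on an error's structured payload, never on message text.

    `err` is any object exposing error_class / sqlstate / parameters;
    parameter values are matched as regular expressions (full match).
    """
    assert err.error_class == error_class, \
        f"expected {error_class}, got {err.error_class}"
    if sqlstate is not None:
        assert err.sqlstate == sqlstate
    if parameters is not None:
        for name, pattern in parameters.items():
            assert re.fullmatch(pattern, err.parameters[name]), \
                f"parameter {name}={err.parameters[name]!r} does not match {pattern!r}"

class FakeError:
    # Stand-in for a SparkThrowable-like error payload.
    error_class = "DIVIDE_BY_ZERO"
    sqlstate = "22012"
    parameters = {"config": "spark.sql.ansi.enabled"}

check_error(FakeError(), "DIVIDE_BY_ZERO", sqlstate="22012",
            parameters={"config": r"spark\.sql\.ansi\.enabled"})
print("payload checks passed")
```

Decoupling tests from message text is what lets the wording in error-classes.json be fine-tuned later without rewriting every QA test.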
[spark] branch master updated: [SPARK-39400][SQL] spark-sql should remove hive resource dir in all case
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4e4eb6f1ed5 [SPARK-39400][SQL] spark-sql should remove hive resource dir in all case
4e4eb6f1ed5 is described below

commit 4e4eb6f1ed5ff0d3caa7f424d2df23f186bf32a2
Author: Angerszh
AuthorDate: Thu Jun 9 08:06:13 2022 +0800

[SPARK-39400][SQL] spark-sql should remove hive resource dir in all case

### What changes were proposed in this pull request?

In the current code, when a `spark-sql` session is closed via `-e`, `-f`, or `ctrl + c`, the Hive session resource directory is left behind under the `/tmp` path. This PR cleans up those files.

### Why are the changes needed?

Clean up leftover files.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually tested.

Closes #36786 from AngersZh/SPARK-39400.

Lead-authored-by: Angerszh
Co-authored-by: AngersZh
Signed-off-by: Yuming Wang
---
 .../apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
index fccb2a65273..d40cf73be63 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
@@ -103,10 +103,13 @@ private[hive] object SparkSQLCLIDriver extends Logging {
       sessionState.info = new PrintStream(System.err, true, UTF_8.name())
       sessionState.err = new PrintStream(System.err, true, UTF_8.name())
     } catch {
-      case e: UnsupportedEncodingException => System.exit(ERROR_PATH_NOT_FOUND)
+      case e: UnsupportedEncodingException =>
+        sessionState.close()
+        System.exit(ERROR_PATH_NOT_FOUND)
     }

     if (!oproc.process_stage2(sessionState)) {
+      sessionState.close()
       System.exit(ERROR_MISUSE_SHELL_BUILTIN)
     }
@@ -140,7 +143,10 @@ private[hive] object SparkSQLCLIDriver extends Logging {
     SessionState.setCurrentSessionState(sessionState)

     // Clean up after we exit
-    ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() }
+    ShutdownHookManager.addShutdownHook { () =>
+      sessionState.close()
+      SparkSQLEnv.stop()
+    }

     if (isRemoteMode(sessionState)) {
       // Hive 1.2 + not supported in CLI
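The pattern in the patch — close the session on every early `System.exit` path and additionally in a shutdown hook, so the resource directory is removed however the process ends — can be sketched generically. A minimal illustration with hypothetical names, where Python's `atexit` plays the role of `ShutdownHookManager` and `Session.close()` stands in for `sessionState.close()`:

```python
import atexit
import os
import shutil
import tempfile

class Session:
    """Toy stand-in for Hive's SessionState: owns a resource dir under /tmp."""

    def __init__(self):
        self.resource_dir = tempfile.mkdtemp(prefix="hive-resources-")
        self.closed = False

    def close(self):
        # Idempotent: safe to call from both an error path and the exit hook.
        if not self.closed:
            shutil.rmtree(self.resource_dir, ignore_errors=True)
            self.closed = True

session = Session()
atexit.register(session.close)  # covers normal exit and sys.exit()

# An early error path must also clean up explicitly before exiting:
session.close()
print(os.path.exists(session.resource_dir))  # False: directory was removed
```

Making `close()` idempotent is the key detail: the same cleanup can then be wired into every exit path (including the hook) without worrying about double-free behavior.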
[spark] branch branch-3.1 updated: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 512d337abf1 [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
512d337abf1 is described below

commit 512d337abf1387a81ac47e50656e330eb3f51b22
Author: Amin Borjian
AuthorDate: Wed Jun 8 13:30:44 2022 -0700

[SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

### What changes were proposed in this pull request?

In Spark version 3.1.0 and newer, Spark creates extra filter predicate conditions for repeated Parquet columns. These fields cannot have a filter predicate, according to the [PARQUET-34](https://issues.apache.org/jira/browse/PARQUET-34) issue in the Parquet library. This PR works around the problem until the appropriate functionality is provided by Parquet.

Before this PR:

Assume the following Protocol Buffers schema:

```
message Model {
  string name = 1;
  repeated string keywords = 2;
}
```

Suppose a Parquet file is created from a set of records in the above format with the help of the parquet-protobuf library. Using Spark version 3.1.0 or newer, we get the following exception when running the following query using spark-shell:

```
val data = spark.read.parquet("/path/to/parquet")
data.registerTempTable("models")
spark.sql("select * from models where array_contains(keywords, 'X')").show(false)
```

```
Caused by: java.lang.IllegalArgumentException: FilterPredicates do not currently support repeated columns. Column keywords is repeated.
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
  at org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
  at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
  at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
  at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
  at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
  at org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
  at org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789)
  at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
  at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
  at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
  ...
```

The cause of the problem is a change in the data filtering conditions:

```
spark.sql("select * from log where array_contains(keywords, 'X')").explain(true);

// Spark 3.0.2 and older
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1] DataFilters: [array_contains(keywords#1, Google)] PushedFilters: []
...

// Spark 3.1.0 and newer
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1] DataFilters: [isnotnull(keywords#1), array_contains(keywords#1, Google)] PushedFilters: [IsNotNull(keywords)]
...
```

Pushing filters down for repeated Parquet columns is not necessary because it is not supported by the Parquet library for now, so we can exclude them from the pushed predicate filters and solve the issue.

### Why are the changes needed?

Predicate filters that are pushed down to Parquet should not be created on repeated-type fields.

### Does this PR introduce any user-facing change?

No. It only fixes a bug, and before this, due to the limitations of the Parquet library, no more work
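The fix described above boils down to one extra guard when translating Catalyst filters into Parquet predicates: skip any source column whose repetition is `REPEATED`. A schema-level sketch with a hypothetical field representation (the real check sits in Spark's Parquet filter translation, and the constraint itself comes from PARQUET-34):

```python
# Each field maps to its Parquet repetition: OPTIONAL, REQUIRED, or REPEATED.
SCHEMA = {"name": "REQUIRED", "keywords": "REPEATED"}

def pushable_filters(filters, schema):
    """Keep only predicates on non-repeated columns; Parquet's
    FilterPredicate API rejects repeated columns (PARQUET-34)."""
    return [(col, op) for col, op in filters if schema.get(col) != "REPEATED"]

# Mirrors the explain() output above: IsNotNull(keywords) must not be pushed.
data_filters = [("keywords", "IsNotNull"), ("name", "IsNotNull")]
print(pushable_filters(data_filters, SCHEMA))  # [('name', 'IsNotNull')]
```

Dropping a filter from the pushed set is always safe for correctness: the predicate is still applied by Spark as a data filter after the scan, so only the row-group skipping optimization is lost.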
[spark] branch branch-3.2 updated: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new d42f53b5ec4 [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
d42f53b5ec4 is described below

commit d42f53b5ec4d3442acadaa0f2737a8430172a562
Author: Amin Borjian
AuthorDate: Wed Jun 8 13:30:44 2022 -0700

[SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

### What changes were proposed in this pull request?

In Spark version 3.1.0 and newer, Spark creates extra filter predicate conditions for repeated Parquet columns. These fields cannot have a filter predicate, according to the [PARQUET-34](https://issues.apache.org/jira/browse/PARQUET-34) issue in the Parquet library. This PR works around the problem until the appropriate functionality is provided by Parquet.

Before this PR:

Assume the following Protocol Buffers schema:

```
message Model {
  string name = 1;
  repeated string keywords = 2;
}
```

Suppose a Parquet file is created from a set of records in the above format with the help of the parquet-protobuf library. Using Spark version 3.1.0 or newer, we get the following exception when running the following query using spark-shell:

```
val data = spark.read.parquet("/path/to/parquet")
data.registerTempTable("models")
spark.sql("select * from models where array_contains(keywords, 'X')").show(false)
```

```
Caused by: java.lang.IllegalArgumentException: FilterPredicates do not currently support repeated columns. Column keywords is repeated.
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
  at org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
  at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
  at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
  at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
  at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
  at org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
  at org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789)
  at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
  at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
  at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
  ...
```

The cause of the problem is a change in the data filtering conditions:

```
spark.sql("select * from log where array_contains(keywords, 'X')").explain(true);

// Spark 3.0.2 and older
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1] DataFilters: [array_contains(keywords#1, Google)] PushedFilters: []
...

// Spark 3.1.0 and newer
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1] DataFilters: [isnotnull(keywords#1), array_contains(keywords#1, Google)] PushedFilters: [IsNotNull(keywords)]
...
```

Pushing filters down for repeated Parquet columns is not necessary because it is not supported by the Parquet library for now, so we can exclude them from the pushed predicate filters and solve the issue.

### Why are the changes needed?

Predicate filters that are pushed down to Parquet should not be created on repeated-type fields.

### Does this PR introduce any user-facing change?

No. It only fixes a bug, and before this, due to the limitations of the Parquet library, no more work
[spark] branch branch-3.3 updated: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 5847014fc3f [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types 5847014fc3f is described below commit 5847014fc3fe08b8a59c107a99c1540fbb2c2208 Author: Amin Borjian AuthorDate: Wed Jun 8 13:30:44 2022 -0700 [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types ### What changes were proposed in this pull request? In Spark version 3.1.0 and newer, Spark creates extra filter predicate conditions for repeated parquet columns. These fields do not have the ability to have a filter predicate, according to the [PARQUET-34](https://issues.apache.org/jira/browse/PARQUET-34) issue in the parquet library. This PR solves this problem until the appropriate functionality is provided by the parquet. Before this PR: Assume follow Protocol buffer schema: ``` message Model { string name = 1; repeated string keywords = 2; } ``` Suppose a parquet file is created from a set of records in the above format with the help of the parquet-protobuf library. Using Spark version 3.1.0 or newer, we get following exception when run the following query using spark-shell: ``` val data = spark.read.parquet("/path/to/parquet") data.registerTempTable("models") spark.sql("select * from models where array_contains(keywords, 'X')").show(false) ``` ``` Caused by: java.lang.IllegalArgumentException: FilterPredicates do not currently support repeated columns. Column keywords is repeated. 
at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
at org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
at org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:789)
at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
...
```

The cause of the problem is a change in the data filtering conditions:

```
spark.sql("select * from log where array_contains(keywords, 'X')").explain(true);

// Spark 3.0.2 and older
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1]
   DataFilters: [array_contains(keywords#1, Google)]
   PushedFilters: []
...

// Spark 3.1.0 and newer
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1]
   DataFilters: [isnotnull(keywords#1), array_contains(keywords#1, Google)]
   PushedFilters: [IsNotNull(keywords)]
...
```

Pushing filters down for repeated parquet columns is not necessary because the parquet library does not support it for now, so we can exclude them from the pushed predicate filters and solve the issue.

### Why are the changes needed?

Predicate filters that are pushed down to parquet should not be created on repeated-type fields.

### Does this PR introduce any user-facing change?

No. It only fixes a bug, and before this, due to the limitations of the parquet library, no more work
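The fix (the change summary lists `ParquetFilters.scala`) boils down to skipping repeated fields when collecting pushdown candidates. A minimal Python sketch of that eligibility check follows; the `ParquetField` class and its attribute names are hypothetical stand-ins for the parquet-mr schema API, not Spark's actual code:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a parquet primitive-field descriptor.
@dataclass
class ParquetField:
    name: str
    primitive: bool   # True for primitive types (no nested group)
    repetition: str   # "required", "optional", or "repeated"

def pushdown_eligible(field: ParquetField) -> bool:
    # Parquet's SchemaCompatibilityValidator rejects FilterPredicates on
    # repeated columns (PARQUET-34), so only non-repeated primitive
    # fields may become pushed filters.
    return field.primitive and field.repetition != "repeated"

fields = [
    ParquetField("link", primitive=True, repetition="optional"),
    ParquetField("keywords", primitive=True, repetition="repeated"),
]
candidates = [f.name for f in fields if pushdown_eligible(f)]
```

With the schema from the example above, `candidates` contains only `link`; `keywords` is excluded, so no `IsNotNull(keywords)` filter reaches the parquet reader and the validator is never tripped.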
[spark] branch master updated (19afe1341d2 -> ac2881a8c3c)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 19afe1341d2 [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors add ac2881a8c3c [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types No new revisions were added by this update. Summary of changes: .../datasources/parquet/ParquetFilters.scala | 6 - .../datasources/parquet/ParquetFilterSuite.scala | 29 ++ 2 files changed, 34 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 19afe1341d2 [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors 19afe1341d2 is described below

commit 19afe1341d277bc2d7dd47175d142a8c71141138
Author: Max Gekk
AuthorDate: Wed Jun 8 21:20:55 2022 +0300

[SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors

### What changes were proposed in this pull request?

In the PR, I propose to exclude `IllegalStateException` from the list of exceptions that are wrapped by `SparkException` with the `INTERNAL_ERROR` error class.

### Why are the changes needed?

See the explanation in SPARK-39412.

### Does this PR introduce _any_ user-facing change?

No, the reverted changes haven't been released yet.

### How was this patch tested?

By running the modified test suites:

```
$ build/sbt "test:testOnly *ContinuousSuite"
$ build/sbt "test:testOnly *MicroBatchExecutionSuite"
$ build/sbt "test:testOnly *KafkaMicroBatchV1SourceSuite"
$ build/sbt "test:testOnly *KafkaMicroBatchV2SourceSuite"
$ build/sbt "test:testOnly *.WholeStageCodegenSuite"
```

Closes #36804 from MaxGekk/exclude-IllegalStateException.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala | 11 --- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala| 2 +- .../scala/org/apache/spark/sql/execution/QueryExecution.scala | 7 +++ .../apache/spark/sql/execution/WholeStageCodegenSuite.scala | 11 --- .../sql/execution/streaming/MicroBatchExecutionSuite.scala| 6 ++ .../spark/sql/streaming/continuous/ContinuousSuite.scala | 7 +++ 6 files changed, 17 insertions(+), 27 deletions(-) diff --git a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala index 0a32b1b54d0..2396f31b954 100644 --- a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala +++ b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala @@ -34,7 +34,6 @@ import org.apache.kafka.common.TopicPartition import org.scalatest.concurrent.PatienceConfiguration.Timeout import org.scalatest.time.SpanSugar._ -import org.apache.spark.{SparkException, SparkThrowable} import org.apache.spark.sql.{Dataset, ForeachWriter, Row, SparkSession} import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap import org.apache.spark.sql.connector.read.streaming.SparkDataStream @@ -667,10 +666,9 @@ abstract class KafkaMicroBatchSourceSuiteBase extends KafkaSourceSuiteBase { testUtils.sendMessages(topic2, Array("6")) }, StartStream(), - ExpectFailure[SparkException](e => { -assert(e.asInstanceOf[SparkThrowable].getErrorClass === "INTERNAL_ERROR") + ExpectFailure[IllegalStateException](e => { // The offset of `topic2` should be changed from 2 to 1 -assert(e.getCause.getMessage.contains("was changed from 2 to 1")) +assert(e.getMessage.contains("was changed from 2 to 1")) }) ) } @@ -766,13 +764,12 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase { testStream(df)( StartStream(checkpointLocation = metadataPath.getAbsolutePath), -ExpectFailure[SparkException](e => { - assert(e.asInstanceOf[SparkThrowable].getErrorClass === "INTERNAL_ERROR") +ExpectFailure[IllegalStateException](e => { Seq( s"maximum supported log version is v1, but encountered v9", "produced by a newer version of Spark and cannot be read by this version" ).foreach { message => -assert(e.getCause.toString.contains(message)) +assert(e.toString.contains(message)) } })) } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index 0a45cf92c6e..97a5318b3ed 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -3916,7 +3916,7 @@ class Dataset[T] private[sql]( /** * Wrap a Dataset action to track the QueryExecution and time cost, then report to the - * user-registered callback functions, and also to convert asserts/illegal states to + * user-registered callback functions, and also to convert asserts/NPE to * the internal error exception. */ private def withAction[U](name: String, qe: QueryExecution)(action:
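The behavioral change in the diff above can be pictured as a wrapper that converts low-level failures (asserts, NPEs) into an internal-error exception while letting illegal-state failures propagate unchanged, which is why the tests now expect `IllegalStateException` directly instead of a wrapped `SparkException`. A rough Python analogue of that pattern (all names here are illustrative stand-ins, not Spark's API):

```python
class InternalError(Exception):
    """Analogue of SparkException carrying the INTERNAL_ERROR error class."""

class IllegalStateError(RuntimeError):
    """Analogue of java.lang.IllegalStateException."""

def with_action(action):
    # Post-SPARK-39412 behavior: let illegal-state failures escape
    # unwrapped so callers can assert on them directly, but still
    # convert assertion/NPE-like failures into the internal error.
    try:
        return action()
    except IllegalStateError:
        raise
    except (AssertionError, TypeError) as e:
        raise InternalError("INTERNAL_ERROR") from e

def classify(action):
    # Helper for the examples: run an action and report what escaped.
    try:
        with_action(action)
        return "ok"
    except InternalError:
        return "internal"
    except IllegalStateError:
        return "illegal-state"

def offset_changed():
    raise IllegalStateError("offset was changed from 2 to 1")

def broken_invariant():
    assert False, "broken invariant"
```

Here `classify(offset_changed)` reports `"illegal-state"` while `classify(broken_invariant)` reports `"internal"`, mirroring how the Kafka streaming tests can now match the raw exception and its message without unwrapping a cause.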
[spark] branch branch-3.3 updated: [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 94f3e4113ef [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors 94f3e4113ef is described below

commit 94f3e4113ef6fbf0940578bcb279f233e43c27f1
Author: Max Gekk
AuthorDate: Wed Jun 8 21:20:55 2022 +0300

[SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors

### What changes were proposed in this pull request?

In the PR, I propose to exclude `IllegalStateException` from the list of exceptions that are wrapped by `SparkException` with the `INTERNAL_ERROR` error class.

### Why are the changes needed?

See the explanation in SPARK-39412.

### Does this PR introduce _any_ user-facing change?

No, the reverted changes haven't been released yet.

### How was this patch tested?

By running the modified test suites:

```
$ build/sbt "test:testOnly *ContinuousSuite"
$ build/sbt "test:testOnly *MicroBatchExecutionSuite"
$ build/sbt "test:testOnly *KafkaMicroBatchV1SourceSuite"
$ build/sbt "test:testOnly *KafkaMicroBatchV2SourceSuite"
$ build/sbt "test:testOnly *.WholeStageCodegenSuite"
```

Closes #36804 from MaxGekk/exclude-IllegalStateException.
Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit 19afe1341d277bc2d7dd47175d142a8c71141138) Signed-off-by: Max Gekk --- .../spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala | 11 --- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala| 2 +- .../scala/org/apache/spark/sql/execution/QueryExecution.scala | 7 +++ .../apache/spark/sql/execution/WholeStageCodegenSuite.scala | 11 --- .../sql/execution/streaming/MicroBatchExecutionSuite.scala| 6 ++ .../spark/sql/streaming/continuous/ContinuousSuite.scala | 7 +++ 6 files changed, 17 insertions(+), 27 deletions(-) diff --git a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala index 41277a535f5..db71f0fd918 100644 --- a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala +++ b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala @@ -34,7 +34,6 @@ import org.apache.kafka.common.TopicPartition import org.scalatest.concurrent.PatienceConfiguration.Timeout import org.scalatest.time.SpanSugar._ -import org.apache.spark.{SparkException, SparkThrowable} import org.apache.spark.sql.{Dataset, ForeachWriter, Row, SparkSession} import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap import org.apache.spark.sql.connector.read.streaming.SparkDataStream @@ -667,10 +666,9 @@ abstract class KafkaMicroBatchSourceSuiteBase extends KafkaSourceSuiteBase { testUtils.sendMessages(topic2, Array("6")) }, StartStream(), - ExpectFailure[SparkException](e => { -assert(e.asInstanceOf[SparkThrowable].getErrorClass === "INTERNAL_ERROR") + ExpectFailure[IllegalStateException](e => { // The offset of `topic2` should be changed from 2 to 1 -assert(e.getCause.getMessage.contains("was changed from 2 to 1")) +assert(e.getMessage.contains("was changed from 2 to 1")) }) ) } @@ 
-766,13 +764,12 @@ abstract class KafkaMicroBatchSourceSuiteBase extends KafkaSourceSuiteBase { testStream(df)( StartStream(checkpointLocation = metadataPath.getAbsolutePath), -ExpectFailure[SparkException](e => { - assert(e.asInstanceOf[SparkThrowable].getErrorClass === "INTERNAL_ERROR") +ExpectFailure[IllegalStateException](e => { Seq( s"maximum supported log version is v1, but encountered v9", "produced by a newer version of Spark and cannot be read by this version" ).foreach { message => -assert(e.getCause.toString.contains(message)) +assert(e.toString.contains(message)) } })) } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index a4a40cc0e69..6ef9bc2a703 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -3848,7 +3848,7 @@ class Dataset[T] private[sql]( /** * Wrap a Dataset action to track the QueryExecution and time cost, then report to the - * user-registered callback functions, and also to convert asserts/illegal states to + * user-registered callback functions, and also to convert asserts/NPE to * the
[spark] branch master updated: [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7d44b47596a [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite 7d44b47596a is described below

commit 7d44b47596a14269c4199ccf86aebf4e6c9e7ca4
Author: Jiaan Geng
AuthorDate: Wed Jun 8 07:34:36 2022 -0700

[SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

### What changes were proposed in this pull request?

Some test cases in `JDBCV2Suite` use SQL keywords that are not capitalized. This PR capitalizes the SQL keywords in `JDBCV2Suite`.

### Why are the changes needed?

To capitalize the SQL keywords in `JDBCV2Suite` consistently.

### Does this PR introduce _any_ user-facing change?

'No'. Just updates test cases.

### How was this patch tested?

N/A.

Closes #36805 from beliefer/SPARK-39413. Authored-by: Jiaan Geng Signed-off-by: huaxingao --- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 66 +++--- 1 file changed, 33 insertions(+), 33 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala index 9de4872fd60..cf96c35d8ae 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala @@ -679,7 +679,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with filter push-down with string functions") { -val df1 = sql("select * FROM h2.test.employee where " + +val df1 = sql("SELECT * FROM h2.test.employee WHERE " + "substr(name, 2, 1) = 'e'" + " AND upper(name) = 'JEN' AND lower(name) = 'jen' ") checkFiltersRemoved(df1) @@ -689,7 +689,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel checkPushedInfo(df1, expectedPlanFragment1) checkAnswer(df1, Seq(Row(6, "jen",
12000, 1200, true))) -val df2 = sql("select * FROM h2.test.employee where " + +val df2 = sql("SELECT * FROM h2.test.employee WHERE " + "trim(name) = 'jen' AND trim('j', name) = 'en'" + "AND translate(name, 'e', 1) = 'j1n'") checkFiltersRemoved(df2) @@ -699,7 +699,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel checkPushedInfo(df2, expectedPlanFragment2) checkAnswer(df2, Seq(Row(6, "jen", 12000, 1200, true))) -val df3 = sql("select * FROM h2.test.employee where " + +val df3 = sql("SELECT * FROM h2.test.employee WHERE " + "ltrim(name) = 'jen' AND ltrim('j', name) = 'en'") checkFiltersRemoved(df3) val expectedPlanFragment3 = @@ -708,7 +708,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel checkPushedInfo(df3, expectedPlanFragment3) checkAnswer(df3, Seq(Row(6, "jen", 12000, 1200, true))) -val df4 = sql("select * FROM h2.test.employee where " + +val df4 = sql("SELECT * FROM h2.test.employee WHERE " + "rtrim(name) = 'jen' AND rtrim('n', name) = 'je'") checkFiltersRemoved(df4) val expectedPlanFragment4 = @@ -718,7 +718,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel checkAnswer(df4, Seq(Row(6, "jen", 12000, 1200, true))) // H2 does not support OVERLAY -val df5 = sql("select * FROM h2.test.employee where OVERLAY(NAME, '1', 2, 1) = 'j1n'") +val df5 = sql("SELECT * FROM h2.test.employee WHERE OVERLAY(NAME, '1', 2, 1) = 'j1n'") checkFiltersRemoved(df5, false) val expectedPlanFragment5 = "PushedFilters: [NAME IS NOT NULL]" @@ -727,8 +727,8 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with aggregate push-down: MAX AVG with filter and group by") { -val df = sql("select MAX(SaLaRY), AVG(BONUS) FROM h2.test.employee where dept > 0" + - " group by DePt") +val df = sql("SELECT MAX(SaLaRY), AVG(BONUS) FROM h2.test.employee WHERE dept > 0" + + " GROUP BY DePt") checkFiltersRemoved(df) checkAggregateRemoved(df) 
checkPushedInfo(df, "PushedAggregates: [MAX(SALARY), AVG(BONUS)], " + @@ -749,7 +749,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with aggregate push-down: MAX AVG with filter without group by") { -val df = sql("select MAX(ID), AVG(ID) FROM h2.test.people where id > 0") +val df = sql("SELECT MAX(ID), AVG(ID) FROM h2.test.people WHERE id > 0") checkFiltersRemoved(df) checkAggregateRemoved(df) checkPushedInfo(df, "PushedAggregates: [MAX(ID), AVG(ID)], " + @@ -776,7 +776,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel }
[spark] branch branch-3.3 updated: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 376c14ac8cf [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py 376c14ac8cf is described below

commit 376c14ac8cfb6d51c29755b5ee951e5e41981a1a
Author: Hyukjin Kwon
AuthorDate: Wed Jun 8 17:14:29 2022 +0900

[SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

This PR proposes to address the type hint `__version__: str` correctly in each release. The type hint was added in Spark 3.3.0 at https://github.com/apache/spark/commit/f59e1d548e2e7c97195697910c40c5383a76ca48.

For PySpark to have the correct version in releases.

No, dev-only.

Manually tested by setting environment variables and running the changed shell commands locally.

Closes #36803 from HyukjinKwon/SPARK-39411. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 87b0a41cfb46ba9389c6f5abb9628415a72c4f93) Signed-off-by: Hyukjin Kwon --- dev/create-release/release-build.sh | 7 ++- dev/create-release/release-tag.sh | 13 ++--- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 78fd06ba2be..ddeb4d322ce 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -265,7 +265,12 @@ if [[ "$1" == "package" ]]; then # Write out the VERSION to PySpark version info we rewrite the - into a . and SNAPSHOT # to dev0 to be closer to PEP440.
PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"` -echo "__version__='$PYSPARK_VERSION'" > python/pyspark/version.py + +if [[ $SPARK_VERSION == 3.0* ]] || [[ $SPARK_VERSION == 3.1* ]] || [[ $SPARK_VERSION == 3.2* ]]; then + echo "__version__ = '$PYSPARK_VERSION'" > python/pyspark/version.py +else + echo "__version__: str = '$PYSPARK_VERSION'" > python/pyspark/version.py +fi # Get maven home set by MVN MVN_HOME=`$MVN -version 2>&1 | grep 'Maven home' | awk '{print $NF}'` diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index 55aa2e569fc..255bda37ad8 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -85,7 +85,11 @@ fi sed -i".tmp1" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$RELEASE_VERSION"'/g' docs/_config.yml sed -i".tmp2" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$RELEASE_VERSION"'/g' docs/_config.yml sed -i".tmp3" "s/'facetFilters':.*$/'facetFilters': [\"version:$RELEASE_VERSION\"]/g" docs/_config.yml -sed -i".tmp4" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +if [[ $RELEASE_VERSION == 3.0* ]] || [[ $RELEASE_VERSION == 3.1* ]] || [[ $RELEASE_VERSION == 3.2* ]]; then + sed -i".tmp4" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +else + sed -i".tmp4" 's/__version__: str = .*$/__version__: str = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +fi git commit -a -m "Preparing Spark release $RELEASE_TAG" echo "Creating tag $RELEASE_TAG at the head of $GIT_BRANCH" @@ -98,8 +102,11 @@ R_NEXT_VERSION=`echo $NEXT_VERSION | sed 's/-SNAPSHOT//g'` sed -i".tmp5" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION # Write out the R_NEXT_VERSION to PySpark version info we use dev0 instead of SNAPSHOT to be closer # to PEP440. 
-sed -i".tmp6" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py - +if [[ $RELEASE_VERSION == 3.0* ]] || [[ $RELEASE_VERSION == 3.1* ]] || [[ $RELEASE_VERSION == 3.2* ]]; then + sed -i".tmp6" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py +else + sed -i".tmp6" 's/__version__: str = .*$/__version__: str = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py +fi # Update docs with next version sed -i".tmp7" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' docs/_config.yml - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (87b0a41cfb4 -> 12b7e61e16c)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 87b0a41cfb4 [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py add 12b7e61e16c [SPARK-39404][SS] Minor fix for querying `_metadata` in streaming No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 87b0a41cfb4 [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py 87b0a41cfb4 is described below

commit 87b0a41cfb46ba9389c6f5abb9628415a72c4f93
Author: Hyukjin Kwon
AuthorDate: Wed Jun 8 17:14:29 2022 +0900

[SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

### What changes were proposed in this pull request?

This PR proposes to address the type hint `__version__: str` correctly in each release. The type hint was added in Spark 3.3.0 at https://github.com/apache/spark/commit/f59e1d548e2e7c97195697910c40c5383a76ca48.

### Why are the changes needed?

For PySpark to have the correct version in releases.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested by setting environment variables and running the changed shell commands locally.

Closes #36803 from HyukjinKwon/SPARK-39411. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- dev/create-release/release-build.sh | 7 ++- dev/create-release/release-tag.sh | 13 ++--- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 78fd06ba2be..ddeb4d322ce 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -265,7 +265,12 @@ if [[ "$1" == "package" ]]; then # Write out the VERSION to PySpark version info we rewrite the - into a . and SNAPSHOT # to dev0 to be closer to PEP440.
PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"` -echo "__version__='$PYSPARK_VERSION'" > python/pyspark/version.py + +if [[ $SPARK_VERSION == 3.0* ]] || [[ $SPARK_VERSION == 3.1* ]] || [[ $SPARK_VERSION == 3.2* ]]; then + echo "__version__ = '$PYSPARK_VERSION'" > python/pyspark/version.py +else + echo "__version__: str = '$PYSPARK_VERSION'" > python/pyspark/version.py +fi # Get maven home set by MVN MVN_HOME=`$MVN -version 2>&1 | grep 'Maven home' | awk '{print $NF}'` diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index 55aa2e569fc..255bda37ad8 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -85,7 +85,11 @@ fi sed -i".tmp1" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$RELEASE_VERSION"'/g' docs/_config.yml sed -i".tmp2" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$RELEASE_VERSION"'/g' docs/_config.yml sed -i".tmp3" "s/'facetFilters':.*$/'facetFilters': [\"version:$RELEASE_VERSION\"]/g" docs/_config.yml -sed -i".tmp4" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +if [[ $RELEASE_VERSION == 3.0* ]] || [[ $RELEASE_VERSION == 3.1* ]] || [[ $RELEASE_VERSION == 3.2* ]]; then + sed -i".tmp4" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +else + sed -i".tmp4" 's/__version__: str = .*$/__version__: str = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +fi git commit -a -m "Preparing Spark release $RELEASE_TAG" echo "Creating tag $RELEASE_TAG at the head of $GIT_BRANCH" @@ -98,8 +102,11 @@ R_NEXT_VERSION=`echo $NEXT_VERSION | sed 's/-SNAPSHOT//g'` sed -i".tmp5" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION # Write out the R_NEXT_VERSION to PySpark version info we use dev0 instead of SNAPSHOT to be closer # to PEP440. 
-sed -i".tmp6" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py - +if [[ $RELEASE_VERSION == 3.0* ]] || [[ $RELEASE_VERSION == 3.1* ]] || [[ $RELEASE_VERSION == 3.2* ]]; then + sed -i".tmp6" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py +else + sed -i".tmp6" 's/__version__: str = .*$/__version__: str = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py +fi # Update docs with next version sed -i".tmp7" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' docs/_config.yml - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
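The version-prefix branching the release script performs with `[[ $SPARK_VERSION == 3.0* ]] || ...` globs and `sed` one-liners can be summarized in a few lines. The following Python sketch mirrors that decision (function names are made up for illustration; the `sed` substitutions in the script apply to the first occurrence only, which the `count=1` replacements reproduce):

```python
def pep440_version(spark_version: str) -> str:
    # Mirror sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/":
    # each substitution rewrites only the first occurrence.
    v = spark_version.replace("-", ".", 1)
    v = v.replace("SNAPSHOT", "dev0", 1)
    return v.replace("preview", "dev", 1)

def version_py_line(spark_version: str) -> str:
    # Branches before 3.3 have no type hint in pyspark/version.py;
    # Spark 3.3.0 added the `__version__: str` annotation (SPARK-39411),
    # so the line written out must match the branch being released.
    pyspark_version = pep440_version(spark_version)
    if spark_version.startswith(("3.0", "3.1", "3.2")):
        return f"__version__ = '{pyspark_version}'"
    return f"__version__: str = '{pyspark_version}'"
```

For example, `version_py_line("3.2.2")` yields the unannotated form, while `version_py_line("3.3.0-SNAPSHOT")` yields `__version__: str = '3.3.0.dev0'`.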
[spark] branch master updated: [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2'
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 25f38b12c06 [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2' 25f38b12c06 is described below

commit 25f38b12c06daa108f2367e5244a5053e281df21
Author: Hyukjin Kwon
AuthorDate: Wed Jun 8 17:13:35 2022 +0900

[SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2'

### What changes were proposed in this pull request?

This PR fixes the test to make `CastWithAnsiOffSuite` properly respect `ansiEnabled` in the `cast string to date #2` test by using `CastWithAnsiOffSuite.cast` instead of the `Cast` expression.

### Why are the changes needed?

To make the tests pass. Currently it fails when ANSI mode is on: https://github.com/apache/spark/runs/6786744647

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually tested in my IDE.

Closes #36802 from HyukjinKwon/SPARK-39321-followup.
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .../spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala index 4e4bc096dea..56e586da2a3 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala @@ -65,11 +65,11 @@ class CastWithAnsiOffSuite extends CastSuiteBase { } test("cast string to date #2") { -checkEvaluation(Cast(Literal("2015-03-18X"), DateType), null) -checkEvaluation(Cast(Literal("2015/03/18"), DateType), null) -checkEvaluation(Cast(Literal("2015.03.18"), DateType), null) -checkEvaluation(Cast(Literal("20150318"), DateType), null) -checkEvaluation(Cast(Literal("2015-031-8"), DateType), null) +checkEvaluation(cast(Literal("2015-03-18X"), DateType), null) +checkEvaluation(cast(Literal("2015/03/18"), DateType), null) +checkEvaluation(cast(Literal("2015.03.18"), DateType), null) +checkEvaluation(cast(Literal("20150318"), DateType), null) +checkEvaluation(cast(Literal("2015-031-8"), DateType), null) } test("casting to fixed-precision decimals") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
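With ANSI mode off, casting a malformed date string yields `null` rather than raising an error, which is exactly what the rewritten assertions check. A small Python sketch of that lenient semantics (an illustration only, not Spark's parser, which also accepts partial dates such as `2015-03` and `2015`):

```python
from datetime import date, datetime
from typing import Optional

def lenient_cast_to_date(s: str) -> Optional[date]:
    # ANSI off: return None (SQL NULL) on any parse failure instead of
    # raising, mirroring checkEvaluation(cast(Literal(...), DateType), null).
    try:
        return datetime.strptime(s.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None

# The malformed inputs from the test all come back as None (NULL).
for bad in ["2015-03-18X", "2015/03/18", "2015.03.18", "20150318", "2015-031-8"]:
    assert lenient_cast_to_date(bad) is None

# A well-formed ISO date still parses.
assert lenient_cast_to_date("2015-03-18") == date(2015, 3, 18)
```

The suite's `cast` helper applies the same idea at the expression level: it builds the cast with `ansiEnabled` taken from the suite, so the non-ANSI NULL-on-failure path is actually the one under test.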
[spark] branch master updated: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 54aabb012e8 [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties 54aabb012e8 is described below

commit 54aabb012e85b5c46773b57960d57de580fa8bba
Author: Daniel Tenedorio
AuthorDate: Wed Jun 8 15:56:13 2022 +0900

[SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties

### What changes were proposed in this pull request?

Add a flag to control the breaking change process for: DESC NAMESPACE EXTENDED should redact properties.

### Why are the changes needed?

This lets Spark users control how the new behavior rolls out.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR extends unit test coverage.

Closes #36799 from dtenedor/desc-namespace-breaking-change.
Authored-by: Daniel Tenedorio Signed-off-by: Hyukjin Kwon --- .../org/apache/spark/sql/internal/SQLConf.scala| 10 .../apache/spark/sql/execution/command/ddl.scala | 2 + .../datasources/v2/DescribeNamespaceExec.scala | 3 + .../command/v2/DescribeNamespaceSuite.scala| 64 +++--- 4 files changed, 58 insertions(+), 21 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 8c7702efd47..4b0d110b077 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -3818,6 +3818,16 @@ object SQLConf { .booleanConf .createWithDefault(false) + val LEGACY_DESC_NAMESPACE_REDACT_PROPERTIES = +buildConf("spark.sql.legacy.descNamespaceRedactProperties") + .internal() + .doc("When set to false, redact sensitive information in the result of DESC NAMESPACE " + +"EXTENDED. If set to true, it restores the legacy behavior that this sensitive " + +"information was included in the output.") + .version("3.4.0") + .booleanConf + .createWithDefault(false) + /** * Holds information about keys that have been deprecated. 
   *
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
index 5cdcf33d6cd..19b737d7d80 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
@@ -190,6 +190,8 @@ case class DescribeDatabaseCommand(
     val propertiesStr = if (properties.isEmpty) {
       ""
+    } else if (SQLConf.get.getConf(SQLConf.LEGACY_DESC_NAMESPACE_REDACT_PROPERTIES)) {
+      properties.toSeq.sortBy(_._1).mkString("(", ", ", ")")
     } else {
       conf.redactOptions(properties).toSeq.sortBy(_._1).mkString("(", ", ", ")")
     }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala
index 75c12ea4201..950511e16c8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala
@@ -23,6 +23,7 @@ import scala.collection.mutable.ArrayBuffer
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.connector.catalog.{CatalogV2Util, SupportsNamespaces}
+import org.apache.spark.sql.internal.SQLConf

 /**
  * Physical plan node for describing a namespace.
@@ -48,6 +49,8 @@ case class DescribeNamespaceExec(
     val propertiesStr = if (properties.isEmpty) {
       ""
+    } else if (SQLConf.get.getConf(SQLConf.LEGACY_DESC_NAMESPACE_REDACT_PROPERTIES)) {
+      properties.toSeq.sortBy(_._1).mkString("(", ", ", ")")
     } else {
       conf.redactOptions(properties.toMap).toSeq.sortBy(_._1).mkString("(", ", ", ")")
     }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala
index 645399b9026..3f1108f379e 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala
@@
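The new toggle appears with the same shape at both call sites above (`DescribeDatabaseCommand` and `DescribeNamespaceExec`): legacy flag on means properties are printed verbatim; flag off means sensitive values are masked. The following standalone Scala sketch mimics that branching without a Spark dependency. `redactionPattern` and the `*********(redacted)` placeholder are illustrative stand-ins for Spark's `conf.redactOptions`, which matches keys against a configurable regex; they are assumptions, not Spark's actual code.

```scala
// Minimal sketch of the branching added in this commit: format namespace
// properties either verbatim (legacy) or with sensitive values masked.
object DescNamespaceRedactionSketch {
  // Assumed stand-in for Spark's redaction: any key matching this pattern
  // has its value replaced. Spark's real pattern is configurable.
  private val redactionPattern = "(?i)secret|password|token".r

  def formatProperties(properties: Map[String, String], legacyNoRedact: Boolean): String = {
    if (properties.isEmpty) {
      ""
    } else if (legacyNoRedact) {
      // Legacy behavior: print all properties as-is, sorted by key.
      properties.toSeq.sortBy(_._1).mkString("(", ", ", ")")
    } else {
      // New default: mask values whose keys look sensitive, then sort and join.
      properties.map { case (k, v) =>
        if (redactionPattern.findFirstIn(k).isDefined) (k, "*********(redacted)") else (k, v)
      }.toSeq.sortBy(_._1).mkString("(", ", ", ")")
    }
  }

  def main(args: Array[String]): Unit = {
    val props = Map("owner" -> "alice", "password" -> "hunter2")
    println(formatProperties(props, legacyNoRedact = true))
    println(formatProperties(props, legacyNoRedact = false))
  }
}
```

At the SQL level, a user who needs the old output can opt back in for a session with `SET spark.sql.legacy.descNamespaceRedactProperties=true;` before running `DESC NAMESPACE EXTENDED`, per the flag added in this commit.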
[spark] branch branch-3.3 updated (86f1b6bfe39 -> 3a952933c34)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 86f1b6bfe39 [SPARK-39394][DOCS][SS][3.3] Improve PySpark Structured Streaming page more readable
     add 3a952933c34 [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json   | 10 ++--
 .../org/apache/spark/SparkThrowableSuite.scala     |  3 +-
 .../spark/sql/errors/QueryExecutionErrors.scala    |  7 ++-
 .../resources/sql-tests/results/ansi/array.sql.out |  8 +--
 .../resources/sql-tests/results/ansi/cast.sql.out  | 68 +++---
 .../resources/sql-tests/results/ansi/date.sql.out  |  6 +-
 .../results/ansi/datetime-parsing-invalid.sql.out  |  4 +-
 .../sql-tests/results/ansi/interval.sql.out        | 32 +-
 .../resources/sql-tests/results/ansi/map.sql.out   |  8 +--
 .../results/ansi/string-functions.sql.out          |  8 +--
 .../resources/sql-tests/results/interval.sql.out   | 12 ++--
 .../sql-tests/results/postgreSQL/boolean.sql.out   | 32 +-
 .../sql-tests/results/postgreSQL/float4.sql.out    | 14 ++---
 .../sql-tests/results/postgreSQL/float8.sql.out    | 10 ++--
 .../sql-tests/results/postgreSQL/int8.sql.out      | 14 ++---
 .../results/postgreSQL/select_having.sql.out       |  2 +-
 .../sql-tests/results/postgreSQL/text.sql.out      |  4 +-
 .../results/postgreSQL/window_part2.sql.out        |  2 +-
 .../results/postgreSQL/window_part3.sql.out        |  2 +-
 .../results/postgreSQL/window_part4.sql.out        |  2 +-
 .../results/timestampNTZ/timestamp-ansi.sql.out    |  2 +-
 .../udf/postgreSQL/udf-select_having.sql.out       |  2 +-
 22 files changed, 128 insertions(+), 124 deletions(-)