[spark] branch branch-3.2 updated: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new a74a3e382e2 [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build
a74a3e382e2 is described below

commit a74a3e382e28a36c552fd689e390275bb1d9811a
Author: Hyukjin Kwon
AuthorDate: Thu Jun 9 14:26:45 2022 +0900

[SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build

### What changes were proposed in this pull request?

This PR fixes the Sphinx build failure below (see https://github.com/singhpk234/spark/runs/6799026458?check_suite_focus=true):

```
Moving to python/docs directory and building sphinx.
Running Sphinx v3.0.4
WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
/__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: Warning: Latest version of pandas(>=1.4.0) is required to generate the documentation; however, your version was 1.3.5
  warnings.warn(
Warning, treated as error: node class 'meta' is already registered, its visitors will be overridden
make: *** [Makefile:35: html] Error 2
Jekyll 4.2.1
Please append `--trace` to the `build` command for any additional information or backtrace.
```

The Sphinx build apparently fails with the latest docutils (see also https://issues.apache.org/jira/browse/FLINK-24662), so we should pin the version.

### Why are the changes needed?

To recover the CI.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR should test it out.

Closes #36813 from HyukjinKwon/SPARK-39421.
Lead-authored-by: Hyukjin Kwon
Co-authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
(cherry picked from commit c196ff4dfa1d9f1a8e20b884ee5b4a4e6e65a6e3)
Signed-off-by: Hyukjin Kwon
---
 .github/workflows/build_and_test.yml | 1 +
 dev/requirements.txt                 | 1 +
 2 files changed, 2 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 1329b5ea27c..68424dc 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -399,6 +399,7 @@ jobs:
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install sphinx_plotly_directive 'pyarrow<5.0.0' pandas 'plotly>=4.8'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
+python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
 apt-get update -y
 apt-get install -y ruby ruby-dev
 Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 273294a96af..7f40737af91 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -32,6 +32,7 @@ numpydoc
 jinja2<3.0.0
 sphinx<3.1.0
 sphinx-plotly-directive
+docutils<0.18.0
 # Development scripts
 jira

---
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
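The fix itself is just the version pin `docutils<0.18.0`. For context, a strict upper-bound pin reduces to a component-wise numeric comparison of versions. A minimal sketch, not part of the commit and deliberately simpler than pip's real PEP 440 handling (no pre-releases or epochs):

```python
def parse_version(s):
    # Split "0.17.1" into the integer tuple (0, 17, 1) for component-wise comparison.
    return tuple(int(part) for part in s.split("."))

def satisfies_upper_bound(version, bound):
    # True when `version` is strictly below `bound`, i.e. the "<bound" pin holds.
    return parse_version(version) < parse_version(bound)

# docutils 0.17.1 satisfies the pin; 0.18.0 (which broke the Sphinx build) does not.
print(satisfies_upper_bound("0.17.1", "0.18.0"))  # True
print(satisfies_upper_bound("0.18.0", "0.18.0"))  # False
```

Real pip resolution goes through the `packaging` library's specifier rules; this tuple comparison only illustrates why `0.17.x` installs stay allowed while `0.18` and later are excluded.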
[spark] branch branch-3.3 updated: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 66826567fa1 [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build
66826567fa1 is described below

commit 66826567fa12e57119acc97f9971e36fe834df21
Author: Hyukjin Kwon
AuthorDate: Thu Jun 9 14:26:45 2022 +0900

[SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build

### What changes were proposed in this pull request?

This PR fixes the Sphinx build failure below (see https://github.com/singhpk234/spark/runs/6799026458?check_suite_focus=true):

```
Moving to python/docs directory and building sphinx.
Running Sphinx v3.0.4
WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
/__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: UserWarning: Warning: Latest version of pandas(>=1.4.0) is required to generate the documentation; however, your version was 1.3.5
  warnings.warn(
Warning, treated as error: node class 'meta' is already registered, its visitors will be overridden
make: *** [Makefile:35: html] Error 2
Jekyll 4.2.1
Please append `--trace` to the `build` command for any additional information or backtrace.
```

The Sphinx build apparently fails with the latest docutils (see also https://issues.apache.org/jira/browse/FLINK-24662), so we should pin the version.

### Why are the changes needed?

To recover the CI.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR should test it out.

Closes #36813 from HyukjinKwon/SPARK-39421.
Lead-authored-by: Hyukjin Kwon
Co-authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
(cherry picked from commit c196ff4dfa1d9f1a8e20b884ee5b4a4e6e65a6e3)
Signed-off-by: Hyukjin Kwon
---
 .github/workflows/build_and_test.yml | 1 +
 dev/requirements.txt                 | 1 +
 2 files changed, 2 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 1f5df70cde9..e0e9f70556c 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -528,6 +528,7 @@ jobs:
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
 python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8'
+python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
 apt-get update -y
 apt-get install -y ruby ruby-dev
 Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'markdown', 'e1071', 'roxygen2'), repos='https://cloud.r-project.org/')"
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 22e72d55543..e7e0a4b4274 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -35,6 +35,7 @@ numpydoc
 jinja2<3.0.0
 sphinx<3.1.0
 sphinx-plotly-directive
+docutils<0.18.0
 # Development scripts
 jira
[spark] branch master updated (cb55efadea1 -> c196ff4dfa1)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from cb55efadea1 [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace
     add c196ff4dfa1 [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 1 +
 dev/requirements.txt                 | 1 +
 2 files changed, 2 insertions(+)
[spark] branch master updated: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new cb55efadea1 [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace
cb55efadea1 is described below

commit cb55efadea1399e1ce6daae5d9ec7896ffce1b93
Author: Rui Wang
AuthorDate: Thu Jun 9 11:08:00 2022 +0800

[SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

### What changes were proposed in this pull request?

1. Change the `CreateTable` API to make it support a 3-layer namespace.
2. Change the `ListTables` API so that a) it supports a `database` parameter, and b) if that `database` does not exist, it further checks whether the parameter is of the form `catalog.database`.

### Why are the changes needed?

`CreateTable` and `ListTables` do not support a 3-layer namespace.

### Does this PR introduce _any_ user-facing change?

Yes. The API change here is backward compatible, and it extends the API to further support a 3-layer namespace (e.g. `catalog.database.table`).

### How was this patch tested?

Unit tests.

Closes #36586 from amaliujia/catalogapi.
Authored-by: Rui Wang
Signed-off-by: Wenchen Fan
---
 R/pkg/tests/fulltests/test_sparkSQL.R              |   5 +-
 .../sql/catalyst/catalog/SessionCatalog.scala      |   4 +
 .../spark/sql/errors/QueryCompilationErrors.scala  |   4 +
 .../org/apache/spark/sql/catalog/interface.scala   |  26 +++-
 .../apache/spark/sql/internal/CatalogImpl.scala    |  92 +++---
 .../spark/sql/execution/GlobalTempViewSuite.scala  |   4 +-
 .../apache/spark/sql/internal/CatalogSuite.scala   | 140 -
 .../spark/sql/hive/MetastoreDataSourcesSuite.scala |   2 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |   4 +-
 9 files changed, 251 insertions(+), 30 deletions(-)

diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index df1094bacef..f0abc96613d 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -663,7 +663,7 @@ test_that("test tableNames and tables", {
   expect_equal(count(tables), count + 1)
   expect_equal(count(tables()), count(tables))
   expect_true("tableName" %in% colnames(tables()))
-  expect_true(all(c("tableName", "database", "isTemporary") %in% colnames(tables(
+  expect_true(all(c("tableName", "namespace", "isTemporary") %in% colnames(tables(
   suppressWarnings(registerTempTable(df, "table2"))
   tables <- listTables()
@@ -4026,7 +4026,8 @@ test_that("catalog APIs, listTables, listColumns, listFunctions", {
   tb <- listTables()
   count <- count(tables())
   expect_equal(nrow(tb), count)
-  expect_equal(colnames(tb), c("name", "database", "description", "tableType", "isTemporary"))
+  expect_equal(colnames(tb),
+    c("name", "catalog", "namespace", "description", "tableType", "isTemporary"))

   createOrReplaceTempView(as.DataFrame(cars), "cars")
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index d6c80f98bf7..0152f49c798 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -965,6 +965,10 @@ class SessionCatalog(
     isTempView(nameParts.asTableIdentifier)
   }

+  def isGlobalTempViewDB(dbName: String): Boolean = {
+    globalTempViewManager.database.equals(dbName)
+  }
+
   def lookupTempView(name: TableIdentifier): Option[View] = {
     val tableName = formatTableName(name.table)
     if (name.database.isEmpty) {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 551eaa6aeb7..68f4320ff67 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -2188,6 +2188,10 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {
     new AnalysisException(s"Table or view '$tableName' not found in database '$dbName'")
   }

+  def tableOrViewNotFound(ident: Seq[String]): Throwable = {
+    new AnalysisException(s"Table or view '${ident.quoted}' not found")
+  }
+
   def unexpectedTypeOfRelationError(relation: LogicalPlan, tableName: String): Throwable = {
     new AnalysisException(
       s"Unexpected type ${relation.getClass.getCanonicalName} of the relation $tableName")
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala
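The `ListTables` fallback described in the commit message — treat the argument as a database in the current catalog first, and only if no such database exists reinterpret it as `catalog.database` — can be sketched outside Spark as plain identifier resolution. This is a simplified illustration with hypothetical helper names (the real logic lives in `CatalogImpl.scala`), assuming `spark_catalog` as the current-catalog name:

```python
def resolve_namespace(ident, existing_dbs, existing_catalogs):
    """Resolve a listTables() argument against a 3-layer namespace.

    First treat `ident` as a database in the current catalog; only if no
    such database exists, try splitting it into `catalog.database`.
    """
    if ident in existing_dbs:
        return ("spark_catalog", ident)          # current catalog, known database
    if "." in ident:
        catalog, database = ident.split(".", 1)  # reinterpret as catalog.database
        if catalog in existing_catalogs:
            return (catalog, database)
    raise ValueError(f"Table or view namespace '{ident}' not found")

print(resolve_namespace("default", {"default"}, {"testcat"}))      # ('spark_catalog', 'default')
print(resolve_namespace("testcat.ns1", {"default"}, {"testcat"}))  # ('testcat', 'ns1')
```

The ordering matters: checking the database name before splitting keeps the change backward compatible, since any argument that resolved before the PR still resolves to the same namespace after it.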
[spark] branch master updated: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6f9997ca9f3 [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190
6f9997ca9f3 is described below

commit 6f9997ca9f3639f01b25a9cff4985a5b3b224578
Author: sychen
AuthorDate: Wed Jun 8 19:59:26 2022 -0700

[SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

### What changes were proposed in this pull request?

Add a unit test that checks whether the overflow of `newLength` is fixed.

### Why are the changes needed?

https://github.com/apache/spark/pull/36772#pullrequestreview-996975725

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a unit test.

Closes #36787 from cxzl25/SPARK-39387-FOLLOWUP.

Authored-by: sychen
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/execution/datasources/orc/OrcQuerySuite.scala | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
index a289a94fdce..2c1120baa7c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {
+    withTempPath { dir =>
+      val path = dir.getCanonicalPath
+      val df = spark.range(1, 22, 1, 1).map { _ =>
+        val byteData = Array.fill[Byte](1024 * 1024)('X')
+        val mapData = (1 to 100).map(i => (i, byteData))
+        mapData
+      }.toDF()
+      df.write.format("orc").save(path)
+    }
+  }
 }

 class OrcV1QuerySuite extends OrcQuerySuite {
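The sizes in the test above are chosen to push the total byte length written past `Int.MaxValue`: 21 rows (`spark.range(1, 22, 1, 1)`) times 100 map entries times 1 MiB each. A quick check of the arithmetic behind the `newLength` overflow the test guards against (this sketch is an illustration of the sizing, not part of the commit):

```python
INT_MAX = 2**31 - 1            # Java Int.MaxValue
rows = 21                      # spark.range(1, 22, 1, 1) produces 21 rows
entries_per_row = 100          # (1 to 100) map entries per row
bytes_per_entry = 1024 * 1024  # each entry carries 1 MiB of 'X' bytes

total = rows * entries_per_row * bytes_per_entry
print(total)            # 2202009600
print(total > INT_MAX)  # True: a 32-bit buffer length would overflow
```

With only 20 rows the total (2,097,152,000 bytes) would still fit in a signed 32-bit int, so the test deliberately uses 21 rows to cross the boundary.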
[spark] branch master updated: [SPARK-39349] Add a centralized CheckError method for QA of error path
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d8e9ac01f8e [SPARK-39349] Add a centralized CheckError method for QA of error path
d8e9ac01f8e is described below

commit d8e9ac01f8e42f10707efc8a7579d32ff88dbd58
Author: Serge Rielau
AuthorDate: Thu Jun 9 09:40:08 2022 +0800

[SPARK-39349] Add a centralized CheckError method for QA of error path

### What changes were proposed in this pull request?

Pulling error messages out of the code base into error-classes.json solves only one half of the problem. This change aims to lay the infrastructure to pull error messages out of QA. We do this by adding a central checkError() method in SparkFunSuite which is geared towards verifying only the payload of an error:

- ERROR_CLASS
- Optional ERROR_SUBCLASS
- Optional SQLSTATE (derived from error-classes.json, so debatable)
- Parameter values (with optional parameter names for extra points)

The method allows regex matching of parameter values.

### Why are the changes needed?

Pulling error messages out of code and QA makes for a central place to fine-tune error messages for language and formatting.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A subset of QA tests has been rewritten to exercise the code.

Closes #36693 from srielau/textless-error-check.
Lead-authored-by: Serge Rielau
Co-authored-by: Serge Rielau
Co-authored-by: Gengliang Wang
Signed-off-by: Wenchen Fan
---
 .../main/java/org/apache/spark/SparkThrowable.java |  13 +
 .../apache/spark/memory/SparkOutOfMemoryError.java |   8 +-
 core/src/main/resources/error/error-classes.json   |  37 ++-
 .../main/scala/org/apache/spark/ErrorInfo.scala    |  36 ++-
 .../scala/org/apache/spark/SparkException.scala    | 199 ---
 .../scala/org/apache/spark/SparkFunSuite.scala     |  63 +
 .../org/apache/spark/SparkThrowableSuite.scala     |  10 +-
 .../org/apache/spark/sql/AnalysisException.scala   |  52 +++-
 .../spark/sql/catalyst/analysis/Analyzer.scala     |   2 +-
 .../catalog/InvalidUDFClassException.scala         |   2 +-
 .../spark/sql/catalyst/parser/ParseDriver.scala    |  24 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  40 +--
 .../spark/sql/errors/QueryExecutionErrors.scala    |  61 +++--
 .../spark/sql/errors/QueryParsingErrors.scala      |  32 ++-
 .../catalyst/encoders/EncoderResolutionSuite.scala | 146 ++-
 .../test/resources/sql-tests/results/date.sql.out  |   4 +-
 .../sql-tests/results/datetime-legacy.sql.out      |   4 +-
 .../resources/sql-tests/results/describe.sql.out   |   4 +-
 .../errors/QueryCompilationErrorsDSv2Suite.scala   |  17 +-
 .../sql/errors/QueryCompilationErrorsSuite.scala   | 174 +++--
 .../spark/sql/errors/QueryErrorsSuiteBase.scala    |   1 +
 .../sql/errors/QueryExecutionAnsiErrorsSuite.scala |  32 +--
 .../sql/errors/QueryExecutionErrorsSuite.scala     | 273 +++--
 23 files changed, 791 insertions(+), 443 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/SparkThrowable.java b/core/src/main/java/org/apache/spark/SparkThrowable.java
index 2be0c3c0f94..581e1f6eebb 100644
--- a/core/src/main/java/org/apache/spark/SparkThrowable.java
+++ b/core/src/main/java/org/apache/spark/SparkThrowable.java
@@ -36,6 +36,10 @@ public interface SparkThrowable {
   // If null, error class is not set
   String getErrorClass();

+  default String getErrorSubClass() {
+    return null;
+  }
+
   // Portable error identifier across SQL engines
   // If null, error class or SQLSTATE is not set
   default String getSqlState() {
@@ -46,4 +50,13 @@ public interface SparkThrowable {
   default boolean isInternalError() {
     return SparkThrowableHelper.isInternalError(this.getErrorClass());
   }
+
+  default String[] getMessageParameters() {
+    return new String[]{};
+  }
+
+  // Returns a string array of all parameters that need to be passed to this error message.
+  default String[] getParameterNames() {
+    return SparkThrowableHelper.getParameterNames(this.getErrorClass(), this.getErrorSubClass());
+  }
 }
diff --git a/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java b/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
index c5f19a0c201..9d2739018a0 100644
--- a/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
+++ b/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
@@ -39,11 +39,17 @@ public final class SparkOutOfMemoryError extends OutOfMemoryError implements Spa
   }

   public SparkOutOfMemoryError(String errorClass, String[] messageParameters) {
-    super(SparkThrowableHelper.getMessage(errorClass,
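The checkError() idea — assert on an error's structured payload (error class, optional subclass, SQLSTATE, parameter values with regex matching) rather than its rendered message text — can be sketched generically. This is a hedged illustration with hypothetical names; the real helper is the Scala `checkError()` in `SparkFunSuite`:

```python
import re

def check_error(err, error_class, sqlstate=None, parameters=None):
    """Assert only on an error's structured payload, never on message text.

    `err` is any object exposing error_class / sqlstate / parameters;
    parameter values are matched as regular expressions (full match).
    """
    assert err.error_class == error_class, \
        f"expected {error_class}, got {err.error_class}"
    if sqlstate is not None:
        assert err.sqlstate == sqlstate
    if parameters is not None:
        for name, pattern in parameters.items():
            assert re.fullmatch(pattern, err.parameters[name]), \
                f"parameter {name}={err.parameters[name]!r} does not match {pattern!r}"

class FakeError:
    # Stand-in for a SparkThrowable-like error payload.
    error_class = "DIVIDE_BY_ZERO"
    sqlstate = "22012"
    parameters = {"config": "spark.sql.ansi.enabled"}

check_error(FakeError(), "DIVIDE_BY_ZERO", sqlstate="22012",
            parameters={"config": r"spark\.sql\.ansi\.enabled"})
print("payload checks passed")
```

Decoupling tests from message text is what lets the wording in error-classes.json be fine-tuned later without rewriting every QA test.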
[spark] branch master updated: [SPARK-39400][SQL] spark-sql should remove hive resource dir in all case
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4e4eb6f1ed5 [SPARK-39400][SQL] spark-sql should remove hive resource dir in all case
4e4eb6f1ed5 is described below

commit 4e4eb6f1ed5ff0d3caa7f424d2df23f186bf32a2
Author: Angerszh
AuthorDate: Thu Jun 9 08:06:13 2022 +0800

[SPARK-39400][SQL] spark-sql should remove hive resource dir in all case

### What changes were proposed in this pull request?

In the current code, when a `spark-sql` session is closed via `-e`, `-f`, or `ctrl + c`, the Hive session resource directory is left behind under the `/tmp` path. This PR cleans up those files.

### Why are the changes needed?

Clean up leftover files.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually tested.

Closes #36786 from AngersZh/SPARK-39400.

Lead-authored-by: Angerszh
Co-authored-by: AngersZh
Signed-off-by: Yuming Wang
---
 .../apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
index fccb2a65273..d40cf73be63 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
@@ -103,10 +103,13 @@ private[hive] object SparkSQLCLIDriver extends Logging {
       sessionState.info = new PrintStream(System.err, true, UTF_8.name())
       sessionState.err = new PrintStream(System.err, true, UTF_8.name())
     } catch {
-      case e: UnsupportedEncodingException => System.exit(ERROR_PATH_NOT_FOUND)
+      case e: UnsupportedEncodingException =>
+        sessionState.close()
+        System.exit(ERROR_PATH_NOT_FOUND)
     }

     if (!oproc.process_stage2(sessionState)) {
+      sessionState.close()
       System.exit(ERROR_MISUSE_SHELL_BUILTIN)
     }
@@ -140,7 +143,10 @@ private[hive] object SparkSQLCLIDriver extends Logging {
     SessionState.setCurrentSessionState(sessionState)

     // Clean up after we exit
-    ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() }
+    ShutdownHookManager.addShutdownHook { () =>
+      sessionState.close()
+      SparkSQLEnv.stop()
+    }

     if (isRemoteMode(sessionState)) {
       // Hive 1.2 + not supported in CLI
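The pattern in the patch — close the session on every early `System.exit` path and additionally in a shutdown hook, so the resource directory is removed however the process ends — can be sketched generically. A minimal illustration with hypothetical names, where Python's `atexit` plays the role of `ShutdownHookManager` and `Session.close()` stands in for `sessionState.close()`:

```python
import atexit
import os
import shutil
import tempfile

class Session:
    """Toy stand-in for Hive's SessionState: owns a resource dir under /tmp."""

    def __init__(self):
        self.resource_dir = tempfile.mkdtemp(prefix="hive-resources-")
        self.closed = False

    def close(self):
        # Idempotent: safe to call from both an error path and the exit hook.
        if not self.closed:
            shutil.rmtree(self.resource_dir, ignore_errors=True)
            self.closed = True

session = Session()
atexit.register(session.close)  # covers normal exit and sys.exit()

# An early error path must also clean up explicitly before exiting:
session.close()
print(os.path.exists(session.resource_dir))  # False: directory was removed
```

Making `close()` idempotent is the key detail: the same cleanup can then be wired into every exit path (including the hook) without worrying about double-free behavior.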
[spark] branch branch-3.1 updated: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 512d337abf1 [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
512d337abf1 is described below

commit 512d337abf1387a81ac47e50656e330eb3f51b22
Author: Amin Borjian
AuthorDate: Wed Jun 8 13:30:44 2022 -0700

[SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

### What changes were proposed in this pull request?

In Spark version 3.1.0 and newer, Spark creates extra filter predicate conditions for repeated Parquet columns. These fields cannot have a filter predicate, according to the [PARQUET-34](https://issues.apache.org/jira/browse/PARQUET-34) issue in the Parquet library. This PR works around the problem until the appropriate functionality is provided by Parquet.

Before this PR:

Assume the following Protocol Buffers schema:

```
message Model {
  string name = 1;
  repeated string keywords = 2;
}
```

Suppose a Parquet file is created from a set of records in the above format with the help of the parquet-protobuf library. Using Spark version 3.1.0 or newer, we get the following exception when running the following query using spark-shell:

```
val data = spark.read.parquet("/path/to/parquet")
data.registerTempTable("models")
spark.sql("select * from models where array_contains(keywords, 'X')").show(false)
```

```
Caused by: java.lang.IllegalArgumentException: FilterPredicates do not currently support repeated columns. Column keywords is repeated.
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
  at org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
  at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
  at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
  at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
  at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
  at org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
  at org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789)
  at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
  at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
  at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
  ...
```

The cause of the problem is a change in the data filtering conditions:

```
spark.sql("select * from log where array_contains(keywords, 'X')").explain(true);

// Spark 3.0.2 and older
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1] DataFilters: [array_contains(keywords#1, Google)] PushedFilters: []
...

// Spark 3.1.0 and newer
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1] DataFilters: [isnotnull(keywords#1), array_contains(keywords#1, Google)] PushedFilters: [IsNotNull(keywords)]
...
```

Pushing filters down for repeated Parquet columns is not necessary because it is not supported by the Parquet library for now, so we can exclude them from the pushed predicate filters and solve the issue.

### Why are the changes needed?

Predicate filters that are pushed down to Parquet should not be created on repeated-type fields.

### Does this PR introduce any user-facing change?

No. It only fixes a bug, and before this, due to the limitations of the Parquet library, no more work
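The fix described above boils down to one extra guard when translating Catalyst filters into Parquet predicates: skip any source column whose repetition is `REPEATED`. A schema-level sketch with a hypothetical field representation (the real check sits in Spark's Parquet filter translation, and the constraint itself comes from PARQUET-34):

```python
# Each field maps to its Parquet repetition: OPTIONAL, REQUIRED, or REPEATED.
SCHEMA = {"name": "REQUIRED", "keywords": "REPEATED"}

def pushable_filters(filters, schema):
    """Keep only predicates on non-repeated columns; Parquet's
    FilterPredicate API rejects repeated columns (PARQUET-34)."""
    return [(col, op) for col, op in filters if schema.get(col) != "REPEATED"]

# Mirrors the explain() output above: IsNotNull(keywords) must not be pushed.
data_filters = [("keywords", "IsNotNull"), ("name", "IsNotNull")]
print(pushable_filters(data_filters, SCHEMA))  # [('name', 'IsNotNull')]
```

Dropping a filter from the pushed set is always safe for correctness: the predicate is still applied by Spark as a data filter after the scan, so only the row-group skipping optimization is lost.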
[spark] branch branch-3.2 updated: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new d42f53b5ec4 [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
d42f53b5ec4 is described below

commit d42f53b5ec4d3442acadaa0f2737a8430172a562
Author: Amin Borjian
AuthorDate: Wed Jun 8 13:30:44 2022 -0700

[SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

### What changes were proposed in this pull request?

In Spark version 3.1.0 and newer, Spark creates extra filter predicate conditions for repeated Parquet columns. These fields cannot have a filter predicate, according to the [PARQUET-34](https://issues.apache.org/jira/browse/PARQUET-34) issue in the Parquet library. This PR works around the problem until the appropriate functionality is provided by Parquet.

Before this PR:

Assume the following Protocol Buffers schema:

```
message Model {
  string name = 1;
  repeated string keywords = 2;
}
```

Suppose a Parquet file is created from a set of records in the above format with the help of the parquet-protobuf library. Using Spark version 3.1.0 or newer, we get the following exception when running the following query using spark-shell:

```
val data = spark.read.parquet("/path/to/parquet")
data.registerTempTable("models")
spark.sql("select * from models where array_contains(keywords, 'X')").show(false)
```

```
Caused by: java.lang.IllegalArgumentException: FilterPredicates do not currently support repeated columns. Column keywords is repeated.
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
  at org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
  at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
  at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
  at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
  at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
  at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
  at org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
  at org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789)
  at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
  at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
  at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
  ...
```

The cause of the problem is a change in the data filtering conditions:

```
spark.sql("select * from log where array_contains(keywords, 'X')").explain(true);

// Spark 3.0.2 and older
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1] DataFilters: [array_contains(keywords#1, Google)] PushedFilters: []
...

// Spark 3.1.0 and newer
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1] DataFilters: [isnotnull(keywords#1), array_contains(keywords#1, Google)] PushedFilters: [IsNotNull(keywords)]
...
```

Pushing filters down for repeated Parquet columns is not necessary because it is not supported by the Parquet library for now, so we can exclude them from the pushed predicate filters and solve the issue.

### Why are the changes needed?

Predicate filters that are pushed down to Parquet should not be created on repeated-type fields.

### Does this PR introduce any user-facing change?

No. It only fixes a bug, and before this, due to the limitations of the Parquet library, no more work
[spark] branch branch-3.3 updated: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 5847014fc3f [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types 5847014fc3f is described below commit 5847014fc3fe08b8a59c107a99c1540fbb2c2208 Author: Amin Borjian AuthorDate: Wed Jun 8 13:30:44 2022 -0700 [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types ### What changes were proposed in this pull request? In Spark version 3.1.0 and newer, Spark creates extra filter predicate conditions for repeated parquet columns. These fields do not have the ability to have a filter predicate, according to the [PARQUET-34](https://issues.apache.org/jira/browse/PARQUET-34) issue in the parquet library. This PR solves this problem until the appropriate functionality is provided by the parquet. Before this PR: Assume follow Protocol buffer schema: ``` message Model { string name = 1; repeated string keywords = 2; } ``` Suppose a parquet file is created from a set of records in the above format with the help of the parquet-protobuf library. Using Spark version 3.1.0 or newer, we get following exception when run the following query using spark-shell: ``` val data = spark.read.parquet("/path/to/parquet") data.registerTempTable("models") spark.sql("select * from models where array_contains(keywords, 'X')").show(false) ``` ``` Caused by: java.lang.IllegalArgumentException: FilterPredicates do not currently support repeated columns. Column keywords is repeated. 
at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
at org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
at org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
at org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
at org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
at org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:789)
at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
...
```

The cause of the problem is a change in the data filtering conditions:

```
spark.sql("select * from log where array_contains(keywords, 'X')").explain(true);

// Spark 3.0.2 and older
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1]
   DataFilters: [array_contains(keywords#1, Google)]
   PushedFilters: []
...

// Spark 3.1.0 and newer
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1]
   DataFilters: [isnotnull(keywords#1), array_contains(keywords#1, Google)]
   PushedFilters: [IsNotNull(keywords)]
...
```

Pushing filters down for repeated parquet columns is not necessary because the parquet library does not support it for now, so we can exclude them from the pushed predicate filters and solve the issue.

### Why are the changes needed?

Predicate filters that are pushed down to parquet should not be created on repeated-type fields.

### Does this PR introduce any user-facing change?

No. It only fixes a bug, and before this, due to the limitations of the parquet library, no more work
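The fix (the change summary lists `ParquetFilters.scala`) boils down to skipping repeated fields when collecting pushdown candidates. A minimal Python sketch of that eligibility check follows; the `ParquetField` class and its attribute names are hypothetical stand-ins for the parquet-mr schema API, not Spark's actual code:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a parquet primitive-field descriptor.
@dataclass
class ParquetField:
    name: str
    primitive: bool   # True for primitive types (no nested group)
    repetition: str   # "required", "optional", or "repeated"

def pushdown_eligible(field: ParquetField) -> bool:
    # Parquet's SchemaCompatibilityValidator rejects FilterPredicates on
    # repeated columns (PARQUET-34), so only non-repeated primitive
    # fields may become pushed filters.
    return field.primitive and field.repetition != "repeated"

fields = [
    ParquetField("link", primitive=True, repetition="optional"),
    ParquetField("keywords", primitive=True, repetition="repeated"),
]
candidates = [f.name for f in fields if pushdown_eligible(f)]
```

With the schema from the example above, `candidates` contains only `link`; `keywords` is excluded, so no `IsNotNull(keywords)` filter reaches the parquet reader and the validator is never tripped.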
[spark] branch master updated (19afe1341d2 -> ac2881a8c3c)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 19afe1341d2 [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors add ac2881a8c3c [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types No new revisions were added by this update. Summary of changes: .../datasources/parquet/ParquetFilters.scala | 6 - .../datasources/parquet/ParquetFilterSuite.scala | 29 ++ 2 files changed, 34 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 19afe1341d2 [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors 19afe1341d2 is described below

commit 19afe1341d277bc2d7dd47175d142a8c71141138
Author: Max Gekk
AuthorDate: Wed Jun 8 21:20:55 2022 +0300

[SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors

### What changes were proposed in this pull request?

In the PR, I propose to exclude `IllegalStateException` from the list of exceptions that are wrapped by `SparkException` with the `INTERNAL_ERROR` error class.

### Why are the changes needed?

See the explanation in SPARK-39412.

### Does this PR introduce _any_ user-facing change?

No, the reverted changes haven't been released yet.

### How was this patch tested?

By running the modified test suites:

```
$ build/sbt "test:testOnly *ContinuousSuite"
$ build/sbt "test:testOnly *MicroBatchExecutionSuite"
$ build/sbt "test:testOnly *KafkaMicroBatchV1SourceSuite"
$ build/sbt "test:testOnly *KafkaMicroBatchV2SourceSuite"
$ build/sbt "test:testOnly *.WholeStageCodegenSuite"
```

Closes #36804 from MaxGekk/exclude-IllegalStateException.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala | 11 --- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala| 2 +- .../scala/org/apache/spark/sql/execution/QueryExecution.scala | 7 +++ .../apache/spark/sql/execution/WholeStageCodegenSuite.scala | 11 --- .../sql/execution/streaming/MicroBatchExecutionSuite.scala| 6 ++ .../spark/sql/streaming/continuous/ContinuousSuite.scala | 7 +++ 6 files changed, 17 insertions(+), 27 deletions(-) diff --git a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala index 0a32b1b54d0..2396f31b954 100644 --- a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala +++ b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala @@ -34,7 +34,6 @@ import org.apache.kafka.common.TopicPartition import org.scalatest.concurrent.PatienceConfiguration.Timeout import org.scalatest.time.SpanSugar._ -import org.apache.spark.{SparkException, SparkThrowable} import org.apache.spark.sql.{Dataset, ForeachWriter, Row, SparkSession} import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap import org.apache.spark.sql.connector.read.streaming.SparkDataStream @@ -667,10 +666,9 @@ abstract class KafkaMicroBatchSourceSuiteBase extends KafkaSourceSuiteBase { testUtils.sendMessages(topic2, Array("6")) }, StartStream(), - ExpectFailure[SparkException](e => { -assert(e.asInstanceOf[SparkThrowable].getErrorClass === "INTERNAL_ERROR") + ExpectFailure[IllegalStateException](e => { // The offset of `topic2` should be changed from 2 to 1 -assert(e.getCause.getMessage.contains("was changed from 2 to 1")) +assert(e.getMessage.contains("was changed from 2 to 1")) }) ) } @@ -766,13 +764,12 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase { testStream(df)( StartStream(checkpointLocation = metadataPath.getAbsolutePath), -ExpectFailure[SparkException](e => { - assert(e.asInstanceOf[SparkThrowable].getErrorClass === "INTERNAL_ERROR") +ExpectFailure[IllegalStateException](e => { Seq( s"maximum supported log version is v1, but encountered v9", "produced by a newer version of Spark and cannot be read by this version" ).foreach { message => -assert(e.getCause.toString.contains(message)) +assert(e.toString.contains(message)) } })) } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index 0a45cf92c6e..97a5318b3ed 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -3916,7 +3916,7 @@ class Dataset[T] private[sql]( /** * Wrap a Dataset action to track the QueryExecution and time cost, then report to the - * user-registered callback functions, and also to convert asserts/illegal states to + * user-registered callback functions, and also to convert asserts/NPE to * the internal error exception. */ private def withAction[U](name: String, qe: QueryExecution)(action:
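The behavioral change in the diff above can be pictured as a wrapper that converts low-level failures (asserts, NPEs) into an internal-error exception while letting illegal-state failures propagate unchanged, which is why the tests now expect `IllegalStateException` directly instead of a wrapped `SparkException`. A rough Python analogue of that pattern (all names here are illustrative stand-ins, not Spark's API):

```python
class InternalError(Exception):
    """Analogue of SparkException carrying the INTERNAL_ERROR error class."""

class IllegalStateError(RuntimeError):
    """Analogue of java.lang.IllegalStateException."""

def with_action(action):
    # Post-SPARK-39412 behavior: let illegal-state failures escape
    # unwrapped so callers can assert on them directly, but still
    # convert assertion/NPE-like failures into the internal error.
    try:
        return action()
    except IllegalStateError:
        raise
    except (AssertionError, TypeError) as e:
        raise InternalError("INTERNAL_ERROR") from e

def classify(action):
    # Helper for the examples: run an action and report what escaped.
    try:
        with_action(action)
        return "ok"
    except InternalError:
        return "internal"
    except IllegalStateError:
        return "illegal-state"

def offset_changed():
    raise IllegalStateError("offset was changed from 2 to 1")

def broken_invariant():
    assert False, "broken invariant"
```

Here `classify(offset_changed)` reports `"illegal-state"` while `classify(broken_invariant)` reports `"internal"`, mirroring how the Kafka streaming tests can now match the raw exception and its message without unwrapping a cause.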
[spark] branch branch-3.3 updated: [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 94f3e4113ef [SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors 94f3e4113ef is described below

commit 94f3e4113ef6fbf0940578bcb279f233e43c27f1
Author: Max Gekk
AuthorDate: Wed Jun 8 21:20:55 2022 +0300

[SPARK-39412][SQL] Exclude IllegalStateException from Spark's internal errors

### What changes were proposed in this pull request?

In the PR, I propose to exclude `IllegalStateException` from the list of exceptions that are wrapped by `SparkException` with the `INTERNAL_ERROR` error class.

### Why are the changes needed?

See the explanation in SPARK-39412.

### Does this PR introduce _any_ user-facing change?

No, the reverted changes haven't been released yet.

### How was this patch tested?

By running the modified test suites:

```
$ build/sbt "test:testOnly *ContinuousSuite"
$ build/sbt "test:testOnly *MicroBatchExecutionSuite"
$ build/sbt "test:testOnly *KafkaMicroBatchV1SourceSuite"
$ build/sbt "test:testOnly *KafkaMicroBatchV2SourceSuite"
$ build/sbt "test:testOnly *.WholeStageCodegenSuite"
```

Closes #36804 from MaxGekk/exclude-IllegalStateException.
Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit 19afe1341d277bc2d7dd47175d142a8c71141138) Signed-off-by: Max Gekk --- .../spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala | 11 --- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala| 2 +- .../scala/org/apache/spark/sql/execution/QueryExecution.scala | 7 +++ .../apache/spark/sql/execution/WholeStageCodegenSuite.scala | 11 --- .../sql/execution/streaming/MicroBatchExecutionSuite.scala| 6 ++ .../spark/sql/streaming/continuous/ContinuousSuite.scala | 7 +++ 6 files changed, 17 insertions(+), 27 deletions(-) diff --git a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala index 41277a535f5..db71f0fd918 100644 --- a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala +++ b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala @@ -34,7 +34,6 @@ import org.apache.kafka.common.TopicPartition import org.scalatest.concurrent.PatienceConfiguration.Timeout import org.scalatest.time.SpanSugar._ -import org.apache.spark.{SparkException, SparkThrowable} import org.apache.spark.sql.{Dataset, ForeachWriter, Row, SparkSession} import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap import org.apache.spark.sql.connector.read.streaming.SparkDataStream @@ -667,10 +666,9 @@ abstract class KafkaMicroBatchSourceSuiteBase extends KafkaSourceSuiteBase { testUtils.sendMessages(topic2, Array("6")) }, StartStream(), - ExpectFailure[SparkException](e => { -assert(e.asInstanceOf[SparkThrowable].getErrorClass === "INTERNAL_ERROR") + ExpectFailure[IllegalStateException](e => { // The offset of `topic2` should be changed from 2 to 1 -assert(e.getCause.getMessage.contains("was changed from 2 to 1")) +assert(e.getMessage.contains("was changed from 2 to 1")) }) ) } @@ 
-766,13 +764,12 @@ abstract class KafkaMicroBatchSourceSuiteBase extends KafkaSourceSuiteBase { testStream(df)( StartStream(checkpointLocation = metadataPath.getAbsolutePath), -ExpectFailure[SparkException](e => { - assert(e.asInstanceOf[SparkThrowable].getErrorClass === "INTERNAL_ERROR") +ExpectFailure[IllegalStateException](e => { Seq( s"maximum supported log version is v1, but encountered v9", "produced by a newer version of Spark and cannot be read by this version" ).foreach { message => -assert(e.getCause.toString.contains(message)) +assert(e.toString.contains(message)) } })) } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index a4a40cc0e69..6ef9bc2a703 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -3848,7 +3848,7 @@ class Dataset[T] private[sql]( /** * Wrap a Dataset action to track the QueryExecution and time cost, then report to the - * user-registered callback functions, and also to convert asserts/illegal states to + * user-registered callback functions, and also to convert asserts/NPE to * the
[spark] branch master updated: [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7d44b47596a [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite 7d44b47596a is described below

commit 7d44b47596a14269c4199ccf86aebf4e6c9e7ca4
Author: Jiaan Geng
AuthorDate: Wed Jun 8 07:34:36 2022 -0700

[SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

### What changes were proposed in this pull request?

Some test cases in `JDBCV2Suite` use SQL keywords that are not capitalized. This PR capitalizes the SQL keywords in `JDBCV2Suite`.

### Why are the changes needed?

To capitalize the SQL keywords in `JDBCV2Suite` consistently.

### Does this PR introduce _any_ user-facing change?

'No'. Just updates test cases.

### How was this patch tested?

N/A.

Closes #36805 from beliefer/SPARK-39413. Authored-by: Jiaan Geng Signed-off-by: huaxingao --- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 66 +++--- 1 file changed, 33 insertions(+), 33 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala index 9de4872fd60..cf96c35d8ae 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala @@ -679,7 +679,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with filter push-down with string functions") { -val df1 = sql("select * FROM h2.test.employee where " + +val df1 = sql("SELECT * FROM h2.test.employee WHERE " + "substr(name, 2, 1) = 'e'" + " AND upper(name) = 'JEN' AND lower(name) = 'jen' ") checkFiltersRemoved(df1) @@ -689,7 +689,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel checkPushedInfo(df1, expectedPlanFragment1) checkAnswer(df1, Seq(Row(6, "jen",
12000, 1200, true))) -val df2 = sql("select * FROM h2.test.employee where " + +val df2 = sql("SELECT * FROM h2.test.employee WHERE " + "trim(name) = 'jen' AND trim('j', name) = 'en'" + "AND translate(name, 'e', 1) = 'j1n'") checkFiltersRemoved(df2) @@ -699,7 +699,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel checkPushedInfo(df2, expectedPlanFragment2) checkAnswer(df2, Seq(Row(6, "jen", 12000, 1200, true))) -val df3 = sql("select * FROM h2.test.employee where " + +val df3 = sql("SELECT * FROM h2.test.employee WHERE " + "ltrim(name) = 'jen' AND ltrim('j', name) = 'en'") checkFiltersRemoved(df3) val expectedPlanFragment3 = @@ -708,7 +708,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel checkPushedInfo(df3, expectedPlanFragment3) checkAnswer(df3, Seq(Row(6, "jen", 12000, 1200, true))) -val df4 = sql("select * FROM h2.test.employee where " + +val df4 = sql("SELECT * FROM h2.test.employee WHERE " + "rtrim(name) = 'jen' AND rtrim('n', name) = 'je'") checkFiltersRemoved(df4) val expectedPlanFragment4 = @@ -718,7 +718,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel checkAnswer(df4, Seq(Row(6, "jen", 12000, 1200, true))) // H2 does not support OVERLAY -val df5 = sql("select * FROM h2.test.employee where OVERLAY(NAME, '1', 2, 1) = 'j1n'") +val df5 = sql("SELECT * FROM h2.test.employee WHERE OVERLAY(NAME, '1', 2, 1) = 'j1n'") checkFiltersRemoved(df5, false) val expectedPlanFragment5 = "PushedFilters: [NAME IS NOT NULL]" @@ -727,8 +727,8 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with aggregate push-down: MAX AVG with filter and group by") { -val df = sql("select MAX(SaLaRY), AVG(BONUS) FROM h2.test.employee where dept > 0" + - " group by DePt") +val df = sql("SELECT MAX(SaLaRY), AVG(BONUS) FROM h2.test.employee WHERE dept > 0" + + " GROUP BY DePt") checkFiltersRemoved(df) checkAggregateRemoved(df) 
checkPushedInfo(df, "PushedAggregates: [MAX(SALARY), AVG(BONUS)], " + @@ -749,7 +749,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with aggregate push-down: MAX AVG with filter without group by") { -val df = sql("select MAX(ID), AVG(ID) FROM h2.test.people where id > 0") +val df = sql("SELECT MAX(ID), AVG(ID) FROM h2.test.people WHERE id > 0") checkFiltersRemoved(df) checkAggregateRemoved(df) checkPushedInfo(df, "PushedAggregates: [MAX(ID), AVG(ID)], " + @@ -776,7 +776,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel }
[spark] branch branch-3.3 updated: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 376c14ac8cf [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py 376c14ac8cf is described below

commit 376c14ac8cfb6d51c29755b5ee951e5e41981a1a
Author: Hyukjin Kwon
AuthorDate: Wed Jun 8 17:14:29 2022 +0900

[SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

This PR proposes to address the type hint `__version__: str` correctly in each release. The type hint was added in Spark 3.3.0 at https://github.com/apache/spark/commit/f59e1d548e2e7c97195697910c40c5383a76ca48.

For PySpark to have the correct version in releases.

No, dev-only.

Manually tested by setting environment variables and running the changed shell commands locally.

Closes #36803 from HyukjinKwon/SPARK-39411. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 87b0a41cfb46ba9389c6f5abb9628415a72c4f93) Signed-off-by: Hyukjin Kwon --- dev/create-release/release-build.sh | 7 ++- dev/create-release/release-tag.sh | 13 ++--- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 78fd06ba2be..ddeb4d322ce 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -265,7 +265,12 @@ if [[ "$1" == "package" ]]; then # Write out the VERSION to PySpark version info we rewrite the - into a . and SNAPSHOT # to dev0 to be closer to PEP440.
PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"` -echo "__version__='$PYSPARK_VERSION'" > python/pyspark/version.py + +if [[ $SPARK_VERSION == 3.0* ]] || [[ $SPARK_VERSION == 3.1* ]] || [[ $SPARK_VERSION == 3.2* ]]; then + echo "__version__ = '$PYSPARK_VERSION'" > python/pyspark/version.py +else + echo "__version__: str = '$PYSPARK_VERSION'" > python/pyspark/version.py +fi # Get maven home set by MVN MVN_HOME=`$MVN -version 2>&1 | grep 'Maven home' | awk '{print $NF}'` diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index 55aa2e569fc..255bda37ad8 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -85,7 +85,11 @@ fi sed -i".tmp1" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$RELEASE_VERSION"'/g' docs/_config.yml sed -i".tmp2" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$RELEASE_VERSION"'/g' docs/_config.yml sed -i".tmp3" "s/'facetFilters':.*$/'facetFilters': [\"version:$RELEASE_VERSION\"]/g" docs/_config.yml -sed -i".tmp4" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +if [[ $RELEASE_VERSION == 3.0* ]] || [[ $RELEASE_VERSION == 3.1* ]] || [[ $RELEASE_VERSION == 3.2* ]]; then + sed -i".tmp4" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +else + sed -i".tmp4" 's/__version__: str = .*$/__version__: str = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +fi git commit -a -m "Preparing Spark release $RELEASE_TAG" echo "Creating tag $RELEASE_TAG at the head of $GIT_BRANCH" @@ -98,8 +102,11 @@ R_NEXT_VERSION=`echo $NEXT_VERSION | sed 's/-SNAPSHOT//g'` sed -i".tmp5" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION # Write out the R_NEXT_VERSION to PySpark version info we use dev0 instead of SNAPSHOT to be closer # to PEP440. 
-sed -i".tmp6" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py - +if [[ $RELEASE_VERSION == 3.0* ]] || [[ $RELEASE_VERSION == 3.1* ]] || [[ $RELEASE_VERSION == 3.2* ]]; then + sed -i".tmp6" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py +else + sed -i".tmp6" 's/__version__: str = .*$/__version__: str = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py +fi # Update docs with next version sed -i".tmp7" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' docs/_config.yml - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (87b0a41cfb4 -> 12b7e61e16c)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 87b0a41cfb4 [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py add 12b7e61e16c [SPARK-39404][SS] Minor fix for querying `_metadata` in streaming No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 87b0a41cfb4 [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py 87b0a41cfb4 is described below

commit 87b0a41cfb46ba9389c6f5abb9628415a72c4f93
Author: Hyukjin Kwon
AuthorDate: Wed Jun 8 17:14:29 2022 +0900

[SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

### What changes were proposed in this pull request?

This PR proposes to address the type hint `__version__: str` correctly in each release. The type hint was added in Spark 3.3.0 at https://github.com/apache/spark/commit/f59e1d548e2e7c97195697910c40c5383a76ca48.

### Why are the changes needed?

For PySpark to have the correct version in releases.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested by setting environment variables and running the changed shell commands locally.

Closes #36803 from HyukjinKwon/SPARK-39411. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- dev/create-release/release-build.sh | 7 ++- dev/create-release/release-tag.sh | 13 ++--- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 78fd06ba2be..ddeb4d322ce 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -265,7 +265,12 @@ if [[ "$1" == "package" ]]; then # Write out the VERSION to PySpark version info we rewrite the - into a . and SNAPSHOT # to dev0 to be closer to PEP440.
PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"` -echo "__version__='$PYSPARK_VERSION'" > python/pyspark/version.py + +if [[ $SPARK_VERSION == 3.0* ]] || [[ $SPARK_VERSION == 3.1* ]] || [[ $SPARK_VERSION == 3.2* ]]; then + echo "__version__ = '$PYSPARK_VERSION'" > python/pyspark/version.py +else + echo "__version__: str = '$PYSPARK_VERSION'" > python/pyspark/version.py +fi # Get maven home set by MVN MVN_HOME=`$MVN -version 2>&1 | grep 'Maven home' | awk '{print $NF}'` diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index 55aa2e569fc..255bda37ad8 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -85,7 +85,11 @@ fi sed -i".tmp1" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$RELEASE_VERSION"'/g' docs/_config.yml sed -i".tmp2" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$RELEASE_VERSION"'/g' docs/_config.yml sed -i".tmp3" "s/'facetFilters':.*$/'facetFilters': [\"version:$RELEASE_VERSION\"]/g" docs/_config.yml -sed -i".tmp4" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +if [[ $RELEASE_VERSION == 3.0* ]] || [[ $RELEASE_VERSION == 3.1* ]] || [[ $RELEASE_VERSION == 3.2* ]]; then + sed -i".tmp4" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +else + sed -i".tmp4" 's/__version__: str = .*$/__version__: str = "'"$RELEASE_VERSION"'"/' python/pyspark/version.py +fi git commit -a -m "Preparing Spark release $RELEASE_TAG" echo "Creating tag $RELEASE_TAG at the head of $GIT_BRANCH" @@ -98,8 +102,11 @@ R_NEXT_VERSION=`echo $NEXT_VERSION | sed 's/-SNAPSHOT//g'` sed -i".tmp5" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION # Write out the R_NEXT_VERSION to PySpark version info we use dev0 instead of SNAPSHOT to be closer # to PEP440. 
-sed -i".tmp6" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py - +if [[ $RELEASE_VERSION == 3.0* ]] || [[ $RELEASE_VERSION == 3.1* ]] || [[ $RELEASE_VERSION == 3.2* ]]; then + sed -i".tmp6" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py +else + sed -i".tmp6" 's/__version__: str = .*$/__version__: str = "'"$R_NEXT_VERSION.dev0"'"/' python/pyspark/version.py +fi # Update docs with next version sed -i".tmp7" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' docs/_config.yml - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
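The version-prefix branching the release script performs with `[[ $SPARK_VERSION == 3.0* ]] || ...` globs and `sed` one-liners can be summarized in a few lines. The following Python sketch mirrors that decision (function names are made up for illustration; the `sed` substitutions in the script apply to the first occurrence only, which the `count=1` replacements reproduce):

```python
def pep440_version(spark_version: str) -> str:
    # Mirror sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/":
    # each substitution rewrites only the first occurrence.
    v = spark_version.replace("-", ".", 1)
    v = v.replace("SNAPSHOT", "dev0", 1)
    return v.replace("preview", "dev", 1)

def version_py_line(spark_version: str) -> str:
    # Branches before 3.3 have no type hint in pyspark/version.py;
    # Spark 3.3.0 added the `__version__: str` annotation (SPARK-39411),
    # so the line written out must match the branch being released.
    pyspark_version = pep440_version(spark_version)
    if spark_version.startswith(("3.0", "3.1", "3.2")):
        return f"__version__ = '{pyspark_version}'"
    return f"__version__: str = '{pyspark_version}'"
```

For example, `version_py_line("3.2.2")` yields the unannotated form, while `version_py_line("3.3.0-SNAPSHOT")` yields `__version__: str = '3.3.0.dev0'`.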
[spark] branch master updated: [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2'
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 25f38b12c06 [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2' 25f38b12c06 is described below

commit 25f38b12c06daa108f2367e5244a5053e281df21
Author: Hyukjin Kwon
AuthorDate: Wed Jun 8 17:13:35 2022 +0900

[SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2'

### What changes were proposed in this pull request?

This PR fixes the test to make `CastWithAnsiOffSuite` properly respect `ansiEnabled` in the `cast string to date #2` test by using `CastWithAnsiOffSuite.cast` instead of the `Cast` expression.

### Why are the changes needed?

To make the tests pass. Currently it fails when ANSI mode is on: https://github.com/apache/spark/runs/6786744647

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually tested in my IDE.

Closes #36802 from HyukjinKwon/SPARK-39321-followup.
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .../spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala index 4e4bc096dea..56e586da2a3 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala @@ -65,11 +65,11 @@ class CastWithAnsiOffSuite extends CastSuiteBase { } test("cast string to date #2") { -checkEvaluation(Cast(Literal("2015-03-18X"), DateType), null) -checkEvaluation(Cast(Literal("2015/03/18"), DateType), null) -checkEvaluation(Cast(Literal("2015.03.18"), DateType), null) -checkEvaluation(Cast(Literal("20150318"), DateType), null) -checkEvaluation(Cast(Literal("2015-031-8"), DateType), null) +checkEvaluation(cast(Literal("2015-03-18X"), DateType), null) +checkEvaluation(cast(Literal("2015/03/18"), DateType), null) +checkEvaluation(cast(Literal("2015.03.18"), DateType), null) +checkEvaluation(cast(Literal("20150318"), DateType), null) +checkEvaluation(cast(Literal("2015-031-8"), DateType), null) } test("casting to fixed-precision decimals") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
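With ANSI mode off, casting a malformed date string yields `null` rather than raising an error, which is exactly what the rewritten assertions check. A small Python sketch of that lenient semantics (an illustration only, not Spark's parser, which also accepts partial dates such as `2015-03` and `2015`):

```python
from datetime import date, datetime
from typing import Optional

def lenient_cast_to_date(s: str) -> Optional[date]:
    # ANSI off: return None (SQL NULL) on any parse failure instead of
    # raising, mirroring checkEvaluation(cast(Literal(...), DateType), null).
    try:
        return datetime.strptime(s.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None

# The malformed inputs from the test all come back as None (NULL).
for bad in ["2015-03-18X", "2015/03/18", "2015.03.18", "20150318", "2015-031-8"]:
    assert lenient_cast_to_date(bad) is None

# A well-formed ISO date still parses.
assert lenient_cast_to_date("2015-03-18") == date(2015, 3, 18)
```

The suite's `cast` helper applies the same idea at the expression level: it builds the cast with `ansiEnabled` taken from the suite, so the non-ANSI NULL-on-failure path is actually the one under test.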
[spark] branch master updated: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 54aabb012e8 [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties 54aabb012e8 is described below

commit 54aabb012e85b5c46773b57960d57de580fa8bba
Author: Daniel Tenedorio
AuthorDate: Wed Jun 8 15:56:13 2022 +0900

[SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties

### What changes were proposed in this pull request?

Add a flag to control the breaking change process for: DESC NAMESPACE EXTENDED should redact properties.

### Why are the changes needed?

This lets Spark users control how the new behavior rolls out.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR extends unit test coverage.

Closes #36799 from dtenedor/desc-namespace-breaking-change.
Authored-by: Daniel Tenedorio Signed-off-by: Hyukjin Kwon --- .../org/apache/spark/sql/internal/SQLConf.scala| 10 .../apache/spark/sql/execution/command/ddl.scala | 2 + .../datasources/v2/DescribeNamespaceExec.scala | 3 + .../command/v2/DescribeNamespaceSuite.scala| 64 +++--- 4 files changed, 58 insertions(+), 21 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 8c7702efd47..4b0d110b077 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -3818,6 +3818,16 @@ object SQLConf { .booleanConf .createWithDefault(false) + val LEGACY_DESC_NAMESPACE_REDACT_PROPERTIES = +buildConf("spark.sql.legacy.descNamespaceRedactProperties") + .internal() + .doc("When set to false, redact sensitive information in the result of DESC NAMESPACE " + +"EXTENDED. If set to true, it restores the legacy behavior that this sensitive " + +"information was included in the output.") + .version("3.4.0") + .booleanConf + .createWithDefault(false) + /** * Holds information about keys that have been deprecated. 
   *
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
index 5cdcf33d6cd..19b737d7d80 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
@@ -190,6 +190,8 @@ case class DescribeDatabaseCommand(
     val propertiesStr = if (properties.isEmpty) {
       ""
+    } else if (SQLConf.get.getConf(SQLConf.LEGACY_DESC_NAMESPACE_REDACT_PROPERTIES)) {
+      properties.toSeq.sortBy(_._1).mkString("(", ", ", ")")
     } else {
       conf.redactOptions(properties).toSeq.sortBy(_._1).mkString("(", ", ", ")")
     }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala
index 75c12ea4201..950511e16c8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala
@@ -23,6 +23,7 @@ import scala.collection.mutable.ArrayBuffer
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.connector.catalog.{CatalogV2Util, SupportsNamespaces}
+import org.apache.spark.sql.internal.SQLConf

 /**
  * Physical plan node for describing a namespace.
@@ -48,6 +49,8 @@ case class DescribeNamespaceExec(
     val propertiesStr = if (properties.isEmpty) {
       ""
+    } else if (SQLConf.get.getConf(SQLConf.LEGACY_DESC_NAMESPACE_REDACT_PROPERTIES)) {
+      properties.toSeq.sortBy(_._1).mkString("(", ", ", ")")
     } else {
       conf.redactOptions(properties.toMap).toSeq.sortBy(_._1).mkString("(", ", ", ")")
     }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala
index 645399b9026..3f1108f379e 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala
@@
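The new toggle appears with the same shape at both call sites above (`DescribeDatabaseCommand` and `DescribeNamespaceExec`): legacy flag on means properties are printed verbatim; flag off means sensitive values are masked. The following standalone Scala sketch mimics that branching without a Spark dependency. `redactionPattern` and the `*********(redacted)` placeholder are illustrative stand-ins for Spark's `conf.redactOptions`, which matches keys against a configurable regex; they are assumptions, not Spark's actual code.

```scala
// Minimal sketch of the branching added in this commit: format namespace
// properties either verbatim (legacy) or with sensitive values masked.
object DescNamespaceRedactionSketch {
  // Assumed stand-in for Spark's redaction: any key matching this pattern
  // has its value replaced. Spark's real pattern is configurable.
  private val redactionPattern = "(?i)secret|password|token".r

  def formatProperties(properties: Map[String, String], legacyNoRedact: Boolean): String = {
    if (properties.isEmpty) {
      ""
    } else if (legacyNoRedact) {
      // Legacy behavior: print all properties as-is, sorted by key.
      properties.toSeq.sortBy(_._1).mkString("(", ", ", ")")
    } else {
      // New default: mask values whose keys look sensitive, then sort and join.
      properties.map { case (k, v) =>
        if (redactionPattern.findFirstIn(k).isDefined) (k, "*********(redacted)") else (k, v)
      }.toSeq.sortBy(_._1).mkString("(", ", ", ")")
    }
  }

  def main(args: Array[String]): Unit = {
    val props = Map("owner" -> "alice", "password" -> "hunter2")
    println(formatProperties(props, legacyNoRedact = true))
    println(formatProperties(props, legacyNoRedact = false))
  }
}
```

At the SQL level, a user who needs the old output can opt back in for a session with `SET spark.sql.legacy.descNamespaceRedactProperties=true;` before running `DESC NAMESPACE EXTENDED`, per the flag added in this commit.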
[spark] branch branch-3.3 updated (86f1b6bfe39 -> 3a952933c34)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 86f1b6bfe39 [SPARK-39394][DOCS][SS][3.3] Improve PySpark Structured Streaming page more readable
     add 3a952933c34 [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json   | 10 ++--
 .../org/apache/spark/SparkThrowableSuite.scala     |  3 +-
 .../spark/sql/errors/QueryExecutionErrors.scala    |  7 ++-
 .../resources/sql-tests/results/ansi/array.sql.out |  8 +--
 .../resources/sql-tests/results/ansi/cast.sql.out  | 68 +++---
 .../resources/sql-tests/results/ansi/date.sql.out  |  6 +-
 .../results/ansi/datetime-parsing-invalid.sql.out  |  4 +-
 .../sql-tests/results/ansi/interval.sql.out        | 32 +-
 .../resources/sql-tests/results/ansi/map.sql.out   |  8 +--
 .../results/ansi/string-functions.sql.out          |  8 +--
 .../resources/sql-tests/results/interval.sql.out   | 12 ++--
 .../sql-tests/results/postgreSQL/boolean.sql.out   | 32 +-
 .../sql-tests/results/postgreSQL/float4.sql.out    | 14 ++---
 .../sql-tests/results/postgreSQL/float8.sql.out    | 10 ++--
 .../sql-tests/results/postgreSQL/int8.sql.out      | 14 ++---
 .../results/postgreSQL/select_having.sql.out       |  2 +-
 .../sql-tests/results/postgreSQL/text.sql.out      |  4 +-
 .../results/postgreSQL/window_part2.sql.out        |  2 +-
 .../results/postgreSQL/window_part3.sql.out        |  2 +-
 .../results/postgreSQL/window_part4.sql.out        |  2 +-
 .../results/timestampNTZ/timestamp-ansi.sql.out    |  2 +-
 .../udf/postgreSQL/udf-select_having.sql.out       |  2 +-
 22 files changed, 128 insertions(+), 124 deletions(-)