[spark] branch master updated: [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 24293cab2de [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled`

24293cab2de is described below

commit 24293cab2de06a50ffd9f4871073e75481665bb8
Author: Max Gekk
AuthorDate: Tue Aug 22 15:32:32 2023 +0300

    [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled`

    ### What changes were proposed in this pull request?
    Add `s"` to the doc of the SQL config `spark.sql.optimizer.decorrelateJoinPredicate.enabled`.

    ### Why are the changes needed?
    To output the desired config name.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    By running CI.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #42607 from MaxGekk/followup-agubichev_spark-43780-corr-predicate.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 9b421251cf6..ca155683ec0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -4363,7 +4363,7 @@ object SQLConf {
       .internal()
       .doc("Decorrelate scalar and lateral subqueries with correlated references in join " +
         "predicates. This configuration is only effective when " +
-        "'${DECORRELATE_INNER_QUERY_ENABLED.key}' is true.")
+        s"'${DECORRELATE_INNER_QUERY_ENABLED.key}' is true.")
       .version("4.0.0")
       .booleanConf
       .createWithDefault(true)
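For context, a minimal sketch of the bug class this one-character commit fixes: in Scala, `${...}` is substituted only inside an `s`-interpolated string; in a plain string literal it stays verbatim, which is why the config doc printed the placeholder instead of the config name. The object and value names below are hypothetical, not the actual SQLConf code.

```scala
object InterpolationDemo extends App {
  val key = "spark.sql.optimizer.decorrelateInnerQuery.enabled"

  // Plain literal: `${key}` is NOT substituted, it appears in the output as-is.
  val broken = "This configuration is only effective when '${key}' is true."
  // s-interpolated literal: `${key}` is replaced by the value of `key`.
  val fixed  = s"This configuration is only effective when '${key}' is true."

  println(broken) // ... when '${key}' is true.
  println(fixed)  // ... when 'spark.sql.optimizer.decorrelateInnerQuery.enabled' is true.
}
```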
[spark] branch master updated: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 295c615b16b [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

295c615b16b is described below

commit 295c615b16b8a77f242ffa99006b4fb95f8f3487
Author: panbingkun
AuthorDate: Sat Aug 12 12:22:28 2023 +0500

    [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

    ### What changes were proposed in this pull request?
    The PR aims to assign names to the error classes, including:
    - _LEGACY_ERROR_TEMP_1009 => VIEW_EXCEED_MAX_NESTED_DEPTH
    - _LEGACY_ERROR_TEMP_1010 => UNSUPPORTED_VIEW_OPERATION.WITHOUT_SUGGESTION
    - _LEGACY_ERROR_TEMP_1013 => UNSUPPORTED_VIEW_OPERATION.WITH_SUGGESTION / UNSUPPORTED_TEMP_VIEW_OPERATION.WITH_SUGGESTION
    - _LEGACY_ERROR_TEMP_1014 => UNSUPPORTED_TEMP_VIEW_OPERATION.WITHOUT_SUGGESTION
    - _LEGACY_ERROR_TEMP_1015 => UNSUPPORTED_TABLE_OPERATION.WITH_SUGGESTION
    - _LEGACY_ERROR_TEMP_1016 => UNSUPPORTED_TEMP_VIEW_OPERATION.WITHOUT_SUGGESTION
    - _LEGACY_ERROR_TEMP_1278 => UNSUPPORTED_TABLE_OPERATION.WITHOUT_SUGGESTION

    ### Why are the changes needed?
    The changes improve the error framework.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Pass GA.
    - Manually test.
    - Update UT.

    Closes #42109 from panbingkun/SPARK-44404.

    Lead-authored-by: panbingkun
    Co-authored-by: panbingkun <84731...@qq.com>
    Signed-off-by: Max Gekk
---
 R/pkg/tests/fulltests/test_sparkSQL.R              |   3 +-
 .../src/main/resources/error/error-classes.json    |  91 ---
 ...ions-unsupported-table-operation-error-class.md |  36 +++
 ...-unsupported-temp-view-operation-error-class.md |  36 +++
 ...tions-unsupported-view-operation-error-class.md |  36 +++
 docs/sql-error-conditions.md                       |  30 +++
 .../spark/sql/catalyst/analysis/Analyzer.scala     |   9 +-
 .../sql/catalyst/analysis/v2ResolutionPlans.scala  |   4 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  32 ++-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  90 ---
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 104
 .../apache/spark/sql/execution/command/views.scala |   2 +-
 .../apache/spark/sql/internal/CatalogImpl.scala    |   2 +-
 .../analyzer-results/change-column.sql.out         |  16 +-
 .../sql-tests/results/change-column.sql.out        |  16 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |   7 +-
 .../apache/spark/sql/execution/SQLViewSuite.scala  | 267 ++---
 .../spark/sql/execution/SQLViewTestSuite.scala     |  23 +-
 .../AlterTableAddPartitionParserSuite.scala        |   4 +-
 .../AlterTableDropPartitionParserSuite.scala       |   8 +-
 .../AlterTableRecoverPartitionsParserSuite.scala   |   8 +-
 .../AlterTableRenamePartitionParserSuite.scala     |   4 +-
 .../command/AlterTableSetLocationParserSuite.scala |   6 +-
 .../command/AlterTableSetSerdeParserSuite.scala    |  16 +-
 .../spark/sql/execution/command/DDLSuite.scala     |  36 ++-
 .../command/MsckRepairTableParserSuite.scala       |  13 +-
 .../command/ShowPartitionsParserSuite.scala        |  10 +-
 .../command/TruncateTableParserSuite.scala         |   6 +-
 .../execution/command/TruncateTableSuiteBase.scala |  45 +++-
 .../execution/command/v1/ShowPartitionsSuite.scala |  57 -
 .../apache/spark/sql/internal/CatalogSuite.scala   |  13 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |  94 +++-
 32 files changed, 717 insertions(+), 407 deletions(-)

diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index d61501d248a..47688d7560c 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -4193,8 +4193,7 @@ test_that("catalog APIs, listTables, getTable, listColumns, listFunctions, funct
   # recoverPartitions does not work with temporary view
   expect_error(recoverPartitions("cars"),
-               paste("Error in recoverPartitions : analysis error - cars is a temp view.",
-                     "'recoverPartitions()' expects a table"), fixed = TRUE)
+               "[UNSUPPORTED_TEMP_VIEW_OPERATION.WITH_SUGGESTION]*`cars`*")
   expect_error(refreshTable("cars"), NA)
   expect_error(refreshByPath("/"), NA)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 133c2dd826c..08f79bcecbb 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -3394,12 +3394,63 @
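The R test change above shows the user-visible effect: the error condition name now appears in the message. As a hedged sketch of why named conditions matter to callers, any `SparkThrowable` (here `AnalysisException`) exposes the machine-readable name, so tests and tooling can match on it rather than on free-form text. This assumes an active SparkSession `spark`; the view name is hypothetical.

```scala
import org.apache.spark.sql.AnalysisException

try {
  // Running a table-only command against a temp view triggers the error.
  spark.sql("ALTER TABLE some_temp_view RECOVER PARTITIONS")
} catch {
  case e: AnalysisException =>
    // e.g. "UNSUPPORTED_TEMP_VIEW_OPERATION.WITH_SUGGESTION" after this change,
    // instead of an opaque "_LEGACY_ERROR_TEMP_101x" identifier.
    println(e.getErrorClass)
    println(e.getMessageParameters)
}
```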
[spark] branch master updated: [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b9fc5c03ed6 [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`

b9fc5c03ed6 is described below

commit b9fc5c03ed69e91d9c4cbe7ff5a1522c7b849568
Author: Max Gekk
AuthorDate: Sat Aug 12 11:08:39 2023 +0500

    [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`

    ### What changes were proposed in this pull request?
    In the PR, I propose to extend the `primaryExpression` rules in `SqlBaseParser.g4` with one more function, `TIMEDIFF`, which accepts 3 args in the same way as the existing expression `TIMESTAMPDIFF`.

    ### Why are the changes needed?
    To achieve feature parity w/ other systems and make the migration to Spark SQL from such systems easier:
    1. Snowflake: https://docs.snowflake.com/en/sql-reference/functions/timediff
    2. MySQL/MariaDB: https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_timediff

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    By running the existing test suites:
    ```
    $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
    ```

    Closes #42435 from MaxGekk/timediff.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 docs/sql-ref-ansi-compliance.md                    |  1 +
 .../spark/sql/catalyst/parser/SqlBaseLexer.g4      |  1 +
 .../spark/sql/catalyst/parser/SqlBaseParser.g4     |  4 +-
 .../analyzer-results/ansi/timestamp.sql.out        | 68 ++
 .../analyzer-results/datetime-legacy.sql.out       | 68 ++
 .../sql-tests/analyzer-results/timestamp.sql.out   | 68 ++
 .../timestampNTZ/timestamp-ansi.sql.out            | 70 +++
 .../timestampNTZ/timestamp.sql.out                 | 70 +++
 .../test/resources/sql-tests/inputs/timestamp.sql  |  8 +++
 .../sql-tests/results/ansi/keywords.sql.out        |  1 +
 .../sql-tests/results/ansi/timestamp.sql.out       | 80 ++
 .../sql-tests/results/datetime-legacy.sql.out      | 80 ++
 .../resources/sql-tests/results/keywords.sql.out   |  1 +
 .../resources/sql-tests/results/timestamp.sql.out  | 80 ++
 .../results/timestampNTZ/timestamp-ansi.sql.out    | 80 ++
 .../results/timestampNTZ/timestamp.sql.out         | 80 ++
 .../ThriftServerWithSparkContextSuite.scala        |  2 +-
 17 files changed, 760 insertions(+), 2 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index f3a0e8f9afb..09c38a00995 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -636,6 +636,7 @@ Below is a list of all the keywords in Spark SQL.
 |TERMINATED|non-reserved|non-reserved|non-reserved|
 |THEN|reserved|non-reserved|reserved|
 |TIME|reserved|non-reserved|reserved|
+|TIMEDIFF|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP_LTZ|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP_NTZ|non-reserved|non-reserved|non-reserved|

diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
index bf6370575a1..d9128de0f5d 100644
--- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
+++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
@@ -373,6 +373,7 @@ TEMPORARY: 'TEMPORARY' | 'TEMP';
 TERMINATED: 'TERMINATED';
 THEN: 'THEN';
 TIME: 'TIME';
+TIMEDIFF: 'TIMEDIFF';
 TIMESTAMP: 'TIMESTAMP';
 TIMESTAMP_LTZ: 'TIMESTAMP_LTZ';
 TIMESTAMP_NTZ: 'TIMESTAMP_NTZ';

diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
index a45ebee3106..7a69b10dadb 100644
--- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
+++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
@@ -953,7 +953,7 @@ datetimeUnit
 primaryExpression
     : name=(CURRENT_DATE | CURRENT_TIMESTAMP | CURRENT_USER | USER) #currentLike
     | name=(TIMESTAMPADD | DATEADD | DATE_ADD) LEFT_PAREN (unit=datetimeUnit | invalidUnit=stringLit) COMMA unitsAmount=valueExpression COMMA timestamp=valueExpression RIGHT_PAREN #timestampadd
-    | name=(TIMESTAMPDIFF | DATEDIFF | DATE_DIFF) LEFT_PAREN (unit=datetimeUnit | invalidUnit=stringLit) COMMA startTimestamp=valueExpression COMMA endTimestamp=valueExpression RIGHT_PAREN #timestampdiff
+    | name=(TIMESTAMPDIFF
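A hedged usage sketch of the new alias (assuming an active SparkSession `spark`; the timestamps are made up): after this change `TIMEDIFF` parses in the same 3-argument form as `TIMESTAMPDIFF`.

```scala
// TIMEDIFF(unit, startTimestamp, endTimestamp) as a 3-arg alias of TIMESTAMPDIFF.
spark.sql(
  """SELECT timediff(HOUR,
    |                TIMESTAMP'2023-08-12 09:00:00',
    |                TIMESTAMP'2023-08-12 12:30:00') AS diff""".stripMargin
).show()
// Expected: a single row with diff = 3 (whole hours elapsed), matching
// timestampdiff(HOUR, ...) on the same arguments.
```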
[spark] branch master updated: [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f7879b4c250 [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

f7879b4c250 is described below

commit f7879b4c2500046cd7d889ba94adedd3000f8c41
Author: Max Gekk
AuthorDate: Tue Aug 8 13:26:19 2023 +0500

    [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

    ### What changes were proposed in this pull request?
    In the PR, I propose to check whether the `DEFAULT` clause contains a parameter. If so, raise an appropriate error saying that the feature is not supported. Currently, table creation with `DEFAULT` containing any parameters finishes successfully even though parameters are not supported in this case:
    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    ++
    ||
    ++
    ++

    scala> spark.sql("describe t12");
    org.apache.spark.sql.AnalysisException: [INVALID_DEFAULT_VALUE.UNRESOLVED_EXPRESSION] Failed to execute EXISTS_DEFAULT command because the destination table column `c1` has a DEFAULT value :parm, which fails to resolve as a valid expression.
    ```

    ### Why are the changes needed?
    This improves user experience with Spark SQL by reporting the root cause of the issue.

    ### Does this PR introduce _any_ user-facing change?
    Yes. After the change, the table creation fails w/ the error:
    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    org.apache.spark.sql.catalyst.parser.ParseException:
    [UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] The feature is not supported: Parameter markers are not allowed in DEFAULT.(line 1, pos 32)

    == SQL ==
    CREATE TABLE t12(c1 int default :parm)
    ^^^
    ```

    ### How was this patch tested?
    By running new test:
    ```
    $ build/sbt "test:testOnly *ParametersSuite"
    ```

    Closes #42365 from MaxGekk/fix-param-in-DEFAULT.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 12 
 .../test/scala/org/apache/spark/sql/ParametersSuite.scala | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 1b9dda51bf0..0635e6a1b44 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -40,6 +40,7 @@ import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.TreePattern.PARAMETER
 import org.apache.spark.sql.catalyst.types.DataTypeUtils
 import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, GeneratedColumn, IntervalUtils, ResolveDefaultColumns}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, convertSpecialTimestampNTZ, getZoneId, stringToDate, stringToTimestamp, stringToTimestampWithoutTimeZone}
@@ -3153,9 +3154,12 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     ctx.asScala.headOption.map(visitLocationSpec)
   }

-  private def verifyAndGetExpression(exprCtx: ExpressionContext): String = {
+  private def verifyAndGetExpression(exprCtx: ExpressionContext, place: String): String = {
     // Make sure it can be converted to Catalyst expressions.
-    expression(exprCtx)
+    val expr = expression(exprCtx)
+    if (expr.containsPattern(PARAMETER)) {
+      throw QueryParsingErrors.parameterMarkerNotAllowed(place, expr.origin)
+    }
     // Extract the raw expression text so that we can save the user provided text. We don't
     // use `Expression.sql` to avoid storing incorrect text caused by bugs in any expression's
     // `sql` method. Note: `exprCtx.getText` returns a string without spaces, so we need to
@@ -3170,7 +3174,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
   override def visitDefaultExpression(ctx: DefaultExpressionContext): String =
     withOrigin(ctx) {
-      verifyAndGetExpression(ctx.expression())
+      verifyAndGetExpression(ctx.expression(), "DEFAULT")
     }

   /**
@@ -3178,7 +3182,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
   override def v
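As a contrast to the failing example above, a hedged sketch of where named parameter markers remain legal (assuming an active SparkSession `spark`):

```scala
// Parameter markers still work in query positions...
spark.sql("SELECT :parm + 1 AS v", args = Map("parm" -> 5)).show()  // v = 6

// ...but after this change a marker inside DEFAULT fails at parse time:
spark.sql("CREATE TABLE t12(c1 INT DEFAULT :parm)", args = Map("parm" -> 5))
// org.apache.spark.sql.catalyst.parser.ParseException:
// [UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] ...
```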
[spark] branch branch-3.5 updated: [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new b623c28f521 [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

b623c28f521 is described below

commit b623c28f521e350b0f4bf15bfb911ca6bf0b1a80
Author: Max Gekk
AuthorDate: Tue Aug 8 13:26:19 2023 +0500

    [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

    ### What changes were proposed in this pull request?
    In the PR, I propose to check whether the `DEFAULT` clause contains a parameter. If so, raise an appropriate error saying that the feature is not supported. Currently, table creation with `DEFAULT` containing any parameters finishes successfully even though parameters are not supported in this case:
    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    ++
    ||
    ++
    ++

    scala> spark.sql("describe t12");
    org.apache.spark.sql.AnalysisException: [INVALID_DEFAULT_VALUE.UNRESOLVED_EXPRESSION] Failed to execute EXISTS_DEFAULT command because the destination table column `c1` has a DEFAULT value :parm, which fails to resolve as a valid expression.
    ```

    ### Why are the changes needed?
    This improves user experience with Spark SQL by reporting the root cause of the issue.

    ### Does this PR introduce _any_ user-facing change?
    Yes. After the change, the table creation fails w/ the error:
    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    org.apache.spark.sql.catalyst.parser.ParseException:
    [UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] The feature is not supported: Parameter markers are not allowed in DEFAULT.(line 1, pos 32)

    == SQL ==
    CREATE TABLE t12(c1 int default :parm)
    ^^^
    ```

    ### How was this patch tested?
    By running new test:
    ```
    $ build/sbt "test:testOnly *ParametersSuite"
    ```

    Closes #42365 from MaxGekk/fix-param-in-DEFAULT.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
    (cherry picked from commit f7879b4c2500046cd7d889ba94adedd3000f8c41)
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 12 
 .../test/scala/org/apache/spark/sql/ParametersSuite.scala | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 7a28efa3e42..83938632e53 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -40,6 +40,7 @@ import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.TreePattern.PARAMETER
 import org.apache.spark.sql.catalyst.types.DataTypeUtils
 import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, GeneratedColumn, IntervalUtils, ResolveDefaultColumns}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, convertSpecialTimestampNTZ, getZoneId, stringToDate, stringToTimestamp, stringToTimestampWithoutTimeZone}
@@ -3130,9 +3131,12 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     ctx.asScala.headOption.map(visitLocationSpec)
   }

-  private def verifyAndGetExpression(exprCtx: ExpressionContext): String = {
+  private def verifyAndGetExpression(exprCtx: ExpressionContext, place: String): String = {
     // Make sure it can be converted to Catalyst expressions.
-    expression(exprCtx)
+    val expr = expression(exprCtx)
+    if (expr.containsPattern(PARAMETER)) {
+      throw QueryParsingErrors.parameterMarkerNotAllowed(place, expr.origin)
+    }
     // Extract the raw expression text so that we can save the user provided text. We don't
     // use `Expression.sql` to avoid storing incorrect text caused by bugs in any expression's
     // `sql` method. Note: `exprCtx.getText` returns a string without spaces, so we need to
@@ -3147,7 +3151,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
   override def visitDefaultExpression(ctx: DefaultExpressionContext): String =
     withOrigin(ctx) {
-      verifyAndGetExpression(ctx.expression())
+      verifyAndGetExpression(ctx.expression(), "DEFAULT")
     }

   /**
@@ -3155,7 +3
[spark] branch master updated: [SPARK-38475][CORE] Use error class in org.apache.spark.serializer
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2a23c7a18a0 [SPARK-38475][CORE] Use error class in org.apache.spark.serializer

2a23c7a18a0 is described below

commit 2a23c7a18a0ba75d95ee1d898896a8f0dc2c5531
Author: Bo Zhang
AuthorDate: Mon Aug 7 22:10:01 2023 +0500

    [SPARK-38475][CORE] Use error class in org.apache.spark.serializer

    ### What changes were proposed in this pull request?
    This PR aims to change exceptions created in package org.apache.spark.serializer to use error classes.

    ### Why are the changes needed?
    This is to move exceptions created in package org.apache.spark.serializer onto error classes.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Existing tests.

    Closes #42243 from bozhang2820/spark-38475.

    Lead-authored-by: Bo Zhang
    Co-authored-by: Bo Zhang
    Signed-off-by: Max Gekk
---
 .../src/main/resources/error/error-classes.json    | 21 +
 .../spark/serializer/GenericAvroSerializer.scala   |  6 ++---
 .../apache/spark/serializer/KryoSerializer.scala   | 27 --
 docs/sql-error-conditions.md                       | 24 +++
 4 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 680f787429c..0ea1eed35e4 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -831,6 +831,11 @@
         "Not found an encoder of the type <typeName> to Spark SQL internal representation. Consider to change the input type to one of supported at '<docroot>/sql-ref-datatypes.html'."
       ]
     },
+    "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT" : {
+      "message" : [
+        "Error reading avro data -- encountered an unknown fingerprint: <fingerprint>, not sure what schema to use. This could happen if you registered additional schemas after starting your spark context."
+      ]
+    },
     "EVENT_TIME_IS_NOT_ON_TIMESTAMP_TYPE" : {
       "message" : [
         "The event time <eventName> has the invalid type <eventType>, but expected \"TIMESTAMP\"."
       ]
     },
@@ -864,6 +869,11 @@
       ],
       "sqlState" : "22018"
     },
+    "FAILED_REGISTER_CLASS_WITH_KRYO" : {
+      "message" : [
+        "Failed to register classes with Kryo."
+      ]
+    },
     "FAILED_RENAME_PATH" : {
       "message" : [
         "Failed to rename <sourcePath> to <targetPath> as destination already exists."
@@ -1564,6 +1574,12 @@
       ],
       "sqlState" : "22032"
     },
+    "INVALID_KRYO_SERIALIZER_BUFFER_SIZE" : {
+      "message" : [
+        "The value of the config \"<bufferSizeConfKey>\" must be less than 2048 MiB, but got <bufferSize> MiB."
+      ],
+      "sqlState" : "F0000"
+    },
     "INVALID_LAMBDA_FUNCTION_CALL" : {
       "message" : [
         "Invalid lambda function call."
@@ -2006,6 +2022,11 @@
         "The join condition <joinCondition> has the invalid type <conditionType>, expected \"BOOLEAN\"."
       ]
     },
+    "KRYO_BUFFER_OVERFLOW" : {
+      "message" : [
+        "Kryo serialization failed: <exceptionMsg>. To avoid this, increase \"<bufferSizeConfKey>\" value."
+      ]
+    },
     "LOAD_DATA_PATH_NOT_EXISTS" : {
       "message" : [
         "LOAD DATA input path does not exist: <path>."

diff --git a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
index 7d2923fdf37..d09abff2773 100644
--- a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
@@ -140,9 +140,9 @@ private[serializer] class GenericAvroSerializer[D <: GenericContainer]
           case Some(s) => new Schema.Parser().setValidateDefaults(false).parse(s)
           case None => throw new SparkException(
-            "Error reading attempting to read avro data -- encountered an unknown " +
-              s"fingerprint: $fingerprint, not sure what schema to use. This could happen " +
-              "if you registered additional schemas after starting your spark context.")
+            errorClass = "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT",
+            messageParameters = Map("fingerprint" -> fingerprint.toString),
+            cause = null)
         }
       })
     } else {

diff --git a/core/src/main/scala/org/apache/spark/serializer/KryoSe
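The visible part of the diff already shows the migration pattern; as a hedged sketch, this is the general shape of raising an error-class-based exception (the parameter value below is made up):

```scala
import org.apache.spark.SparkException

// Instead of concatenating an English sentence inline, the exception carries
// a stable error class plus named message parameters; the user-facing text is
// rendered from the template in error-classes.json.
throw new SparkException(
  errorClass = "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT",
  messageParameters = Map("fingerprint" -> 1234567890L.toString),
  cause = null)
```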
[spark] branch master updated (1f10cc4a594 -> f139733b92d)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 1f10cc4a594 [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic
     add f139733b92d [SPARK-42321][SQL] Assign name to _LEGACY_ERROR_TEMP_2133

No new revisions were added by this update.

Summary of changes:
 .../utils/src/main/resources/error/error-classes.json  | 10 +-
 ...ditions-malformed-record-in-parsing-error-class.md  |  4
 .../spark/sql/catalyst/json/JacksonParser.scala        |  8
 .../spark/sql/catalyst/util/BadRecordException.scala   |  9 +
 .../spark/sql/catalyst/util/FailureSafeParser.scala    |  3 +++
 .../spark/sql/errors/QueryExecutionErrors.scala        | 19 ---
 .../spark/sql/errors/QueryExecutionErrorsSuite.scala   | 17 +
 7 files changed, 54 insertions(+), 16 deletions(-)
[spark] branch master updated: [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1f10cc4a594 [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic

1f10cc4a594 is described below

commit 1f10cc4a59457ed0de0fd4dc0a1c61514d77261a
Author: panbingkun
AuthorDate: Mon Aug 7 12:01:47 2023 +0500

    [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic

    ### What changes were proposed in this pull request?
    The PR aims to clear some unused code in "***Errors" and extract some common logic.

    ### Why are the changes needed?
    Make the code clearer.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass GA.

    Closes #42238 from panbingkun/clear_error.

    Authored-by: panbingkun
    Signed-off-by: Max Gekk
---
 .../apache/spark/sql/errors/DataTypeErrors.scala   | 18 ++---
 .../apache/spark/sql/errors/QueryErrorsBase.scala  |  6 +-
 .../spark/sql/errors/QueryExecutionErrors.scala    | 86 --
 3 files changed, 10 insertions(+), 100 deletions(-)

diff --git a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
index 7a34a386cd8..5e52e283338 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
@@ -192,15 +192,7 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {
       decimalPrecision: Int,
       decimalScale: Int,
       context: SQLQueryContext = null): ArithmeticException = {
-    new SparkArithmeticException(
-      errorClass = "NUMERIC_VALUE_OUT_OF_RANGE",
-      messageParameters = Map(
-        "value" -> value.toPlainString,
-        "precision" -> decimalPrecision.toString,
-        "scale" -> decimalScale.toString,
-        "config" -> toSQLConf("spark.sql.ansi.enabled")),
-      context = getQueryContext(context),
-      summary = getSummary(context))
+    numericValueOutOfRange(value, decimalPrecision, decimalScale, context)
   }

   def cannotChangeDecimalPrecisionError(
@@ -208,6 +200,14 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {
       decimalPrecision: Int,
       decimalScale: Int,
       context: SQLQueryContext = null): ArithmeticException = {
+    numericValueOutOfRange(value, decimalPrecision, decimalScale, context)
+  }
+
+  private def numericValueOutOfRange(
+      value: Decimal,
+      decimalPrecision: Int,
+      decimalScale: Int,
+      context: SQLQueryContext): ArithmeticException = {
     new SparkArithmeticException(
       errorClass = "NUMERIC_VALUE_OUT_OF_RANGE",
       messageParameters = Map(

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
index db256fbee87..26600117a0c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
@@ -18,7 +18,7 @@ package org.apache.spark.sql.errors

 import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
-import org.apache.spark.sql.catalyst.util.{toPrettySQL, QuotingUtils}
+import org.apache.spark.sql.catalyst.util.toPrettySQL
 import org.apache.spark.sql.types.{DataType, DoubleType, FloatType}

 /**
@@ -55,10 +55,6 @@ private[sql] trait QueryErrorsBase extends DataTypeErrorsBase {
     quoteByDefault(toPrettySQL(e))
   }

-  def toSQLSchema(schema: String): String = {
-    QuotingUtils.toSQLSchema(schema)
-  }
-
   // Converts an error class parameter to its SQL representation
   def toSQLValue(v: Any, t: DataType): String = Literal.create(v, t) match {
     case Literal(null, _) => "NULL"

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 45b5d6b6692..f960a091ec0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -32,7 +32,6 @@ import org.apache.spark._
 import org.apache.spark.launcher.SparkLauncher
 import org.apache.spark.memory.SparkOutOfMemoryError
 import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.ScalaReflection.Schema
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.UnresolvedGenerator
 import org.ap
[spark] branch branch-3.5 updated: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new a1ca1e6e763 [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

a1ca1e6e763 is described below

commit a1ca1e6e7633c3fbb36427a82635cda7d21f1dab
Author: Koray Beyaz
AuthorDate: Thu Aug 3 10:57:26 2023 +0500

    [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

    ### What changes were proposed in this pull request?
    - Rename _LEGACY_ERROR_TEMP_2175 as RULE_ID_NOT_FOUND
    - Add a test case for the error class.

    ### Why are the changes needed?
    We are migrating onto error classes

    ### Does this PR introduce _any_ user-facing change?
    Yes, the error message will include the error class name

    ### How was this patch tested?
    `testOnly *RuleIdCollectionSuite` and Github Actions

    Closes #40991 from kori73/SPARK-42330.

    Lead-authored-by: Koray Beyaz
    Co-authored-by: Koray Beyaz
    Signed-off-by: Max Gekk
    (cherry picked from commit f824d058b14e3c58b1c90f64fefc45fac105c7dd)
    Signed-off-by: Max Gekk
---
 common/utils/src/main/resources/error/error-classes.json      | 11 ++-
 docs/sql-error-conditions.md                                  |  6 ++
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala    |  5 ++---
 .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala   | 11 +++
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index df425d7b2df..d9d1963c958 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2412,6 +2412,12 @@
     ],
     "sqlState" : "42883"
   },
+  "RULE_ID_NOT_FOUND" : {
+    "message" : [
+      "Not found an id for the rule name \"<ruleName>\". Please modify RuleIdCollection.scala if you are adding a new rule."
+    ],
+    "sqlState" : "22023"
+  },
   "SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION" : {
     "message" : [
       "The correlated scalar subquery '<sqlExpr>' is neither present in GROUP BY, nor in an aggregate function. Add it to GROUP BY using ordinal position or wrap it in `first()` (or `first_value`) if you don't care which value you get."
@@ -5425,11 +5431,6 @@
       "."
     ]
   },
-  "_LEGACY_ERROR_TEMP_2175" : {
-    "message" : [
-      "Rule id not found for <ruleName>. Please modify RuleIdCollection.scala if you are adding a new rule."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_2176" : {
     "message" : [
       "Cannot create array with elements of data due to exceeding the limit elements for ArrayData. "

diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 9e2a484d057..e1430e94db5 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1578,6 +1578,12 @@ The function `<routineName>` cannot be found. Verify the spelling and correctnes
 If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
 To tolerate the error on drop use DROP FUNCTION IF EXISTS.

+### RULE_ID_NOT_FOUND
+
+[SQLSTATE: 22023](sql-error-conditions-sqlstates.html#class-22-data-exception)
+
+Not found an id for the rule name "`<ruleName>`". Please modify RuleIdCollection.scala if you are adding a new rule.
+
 ### SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION

 SQLSTATE: none assigned

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 89c080409e2..7685e0f907c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -1584,9 +1584,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
   def ruleIdNotFoundForRuleError(ruleName: String): Throwable = {
     new SparkException(
-      errorClass = "_LEGACY_ERROR_TEMP_2175",
-      messageParameters = Map(
-        "ruleName" -> ruleName),
+      errorClass = "RULE_ID_NOT_FOUND",
+      messageParameters = Map("ruleName" -> ruleName),
       cause = null)
   }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/
[spark] branch master updated: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f824d058b14 [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

f824d058b14 is described below

commit f824d058b14e3c58b1c90f64fefc45fac105c7dd
Author: Koray Beyaz
AuthorDate: Thu Aug 3 10:57:26 2023 +0500

    [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

    ### What changes were proposed in this pull request?
    - Rename _LEGACY_ERROR_TEMP_2175 as RULE_ID_NOT_FOUND
    - Add a test case for the error class.

    ### Why are the changes needed?
    We are migrating onto error classes

    ### Does this PR introduce _any_ user-facing change?
    Yes, the error message will include the error class name

    ### How was this patch tested?
    `testOnly *RuleIdCollectionSuite` and Github Actions

    Closes #40991 from kori73/SPARK-42330.

    Lead-authored-by: Koray Beyaz
    Co-authored-by: Koray Beyaz
    Signed-off-by: Max Gekk
---
 common/utils/src/main/resources/error/error-classes.json      | 11 ++-
 docs/sql-error-conditions.md                                  |  6 ++
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala    |  5 ++---
 .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala   | 11 +++
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index a9619b97bd9..20f2ab4eb24 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2471,6 +2471,12 @@
     ],
     "sqlState" : "42883"
   },
+  "RULE_ID_NOT_FOUND" : {
+    "message" : [
+      "Not found an id for the rule name \"<ruleName>\". Please modify RuleIdCollection.scala if you are adding a new rule."
+    ],
+    "sqlState" : "22023"
+  },
   "SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION" : {
     "message" : [
       "The correlated scalar subquery '<sqlExpr>' is neither present in GROUP BY, nor in an aggregate function. Add it to GROUP BY using ordinal position or wrap it in `first()` (or `first_value`) if you don't care which value you get."
@@ -5489,11 +5495,6 @@
       "."
     ]
   },
-  "_LEGACY_ERROR_TEMP_2175" : {
-    "message" : [
-      "Rule id not found for <ruleName>. Please modify RuleIdCollection.scala if you are adding a new rule."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_2176" : {
     "message" : [
       "Cannot create array with elements of data due to exceeding the limit elements for ArrayData. "

diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 161f3bdbef1..5609d60f974 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1586,6 +1586,12 @@ The function `<routineName>` cannot be found. Verify the spelling and correctnes
 If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
 To tolerate the error on drop use DROP FUNCTION IF EXISTS.

+### RULE_ID_NOT_FOUND
+
+[SQLSTATE: 22023](sql-error-conditions-sqlstates.html#class-22-data-exception)
+
+Not found an id for the rule name "`<ruleName>`". Please modify RuleIdCollection.scala if you are adding a new rule.
+
 ### SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION

 SQLSTATE: none assigned

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 3622ffebb74..45b5d6b6692 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -1584,9 +1584,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
   def ruleIdNotFoundForRuleError(ruleName: String): Throwable = {
     new SparkException(
-      errorClass = "_LEGACY_ERROR_TEMP_2175",
-      messageParameters = Map(
-        "ruleName" -> ruleName),
+      errorClass = "RULE_ID_NOT_FOUND",
+      messageParameters = Map("ruleName" -> ruleName),
       cause = null)
   }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/s
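The test-suite diff is truncated above; as a hedged sketch of the kind of test this PR adds (the actual test lives in QueryExecutionErrorsSuite; the rule name below is deliberately bogus, and `checkError()`/`intercept` come from Spark's test bases):

```scala
import org.apache.spark.SparkException
import org.apache.spark.sql.catalyst.rules.RuleIdCollection

// Looking up an unknown rule name should now raise the named error class
// with its "ruleName" parameter, which checkError() matches structurally.
checkError(
  exception = intercept[SparkException] {
    RuleIdCollection.getRuleId("DefinitelyNotARealRule")
  },
  errorClass = "RULE_ID_NOT_FOUND",
  parameters = Map("ruleName" -> "DefinitelyNotARealRule"))
```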
[spark] branch branch-3.5 updated: [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new c47d9b1bcf6 [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names

c47d9b1bcf6 is described below

commit c47d9b1bcf61f65a7078d43361b438fd56d0af81
Author: panbingkun
AuthorDate: Wed Aug 2 10:51:16 2023 +0500

    [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names

    ### What changes were proposed in this pull request?
    The pr aims to
    1. Use `checkError()` to check Exception in `command` Suite.
    2. Assign some error class names, include: `UNSUPPORTED_FEATURE.PURGE_PARTITION` and `UNSUPPORTED_FEATURE.PURGE_TABLE`.

    ### Why are the changes needed?
    The changes improve the error framework.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Manually test.
    - Pass GA.

    Closes #42169 from panbingkun/checkError_for_command.

    Authored-by: panbingkun
    Signed-off-by: Max Gekk
    (cherry picked from commit 4ec27c3801aaa0cbba3e086c278a0ff96260b84a)
    Signed-off-by: Max Gekk
---
 .../src/main/resources/error/error-classes.json    | 10
 ...r-conditions-unsupported-feature-error-class.md |  8 ++
 .../catalog/SupportsAtomicPartitionManagement.java |  3 ++-
 .../catalog/SupportsPartitionManagement.java       |  3 ++-
 .../spark/sql/connector/catalog/TableCatalog.java  |  3 ++-
 .../spark/sql/errors/QueryExecutionErrors.scala    | 12 +
 .../SupportsAtomicPartitionManagementSuite.scala   | 13 ++
 .../catalog/SupportsPartitionManagementSuite.scala | 13 ++
 .../command/v1/AlterTableAddPartitionSuite.scala   | 14 ++
 .../command/v1/AlterTableDropPartitionSuite.scala  | 12 +
 .../command/v1/AlterTableRenameSuite.scala         | 11 +---
 .../command/v1/AlterTableSetLocationSuite.scala    | 11 +---
 .../command/v1/ShowCreateTableSuite.scala          | 12 +
 .../sql/execution/command/v1/ShowTablesSuite.scala | 22 ++--
 .../execution/command/v1/TruncateTableSuite.scala  | 11 +---
 .../command/v2/AlterTableDropPartitionSuite.scala  | 12 ++---
 .../v2/AlterTableRecoverPartitionsSuite.scala      | 11 +---
 .../command/v2/AlterTableSetLocationSuite.scala    | 12 +
 .../sql/execution/command/v2/DropTableSuite.scala  | 12 ++---
 .../command/v2/MsckRepairTableSuite.scala          | 11 +---
 .../sql/execution/command/v2/ShowTablesSuite.scala | 11 +---
 .../execution/command/ShowCreateTableSuite.scala   | 30 +-
 22 files changed, 172 insertions(+), 85 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 385435c740e..480ec636283 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2956,6 +2956,16 @@
         "Pivoting by the value '<value>' of the column data type <type>."
       ]
     },
+    "PURGE_PARTITION" : {
+      "message" : [
+        "Partition purge."
+      ]
+    },
+    "PURGE_TABLE" : {
+      "message" : [
+        "Purge table."
+      ]
+    },
     "PYTHON_UDF_IN_ON_CLAUSE" : {
       "message" : [
         "Python UDF in the ON clause of a <joinType> JOIN. In case of an INNNER JOIN consider rewriting to a CROSS JOIN with a WHERE clause."

diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md b/docs/sql-error-conditions-unsupported-feature-error-class.md
index aa1c622c458..7a60dc76fa6 100644
--- a/docs/sql-error-conditions-unsupported-feature-error-class.md
+++ b/docs/sql-error-conditions-unsupported-feature-error-class.md
@@ -141,6 +141,14 @@ PIVOT clause following a GROUP BY clause. Consider pushing the GROUP BY into a s
 Pivoting by the value '`<value>`' of the column data type `<type>`.

+## PURGE_PARTITION
+
+Partition purge.
+
+## PURGE_TABLE
+
+Purge table.
+
 ## PYTHON_UDF_IN_ON_CLAUSE

 Python UDF in the ON clause of a `<joinType>` JOIN. In case of an INNNER JOIN consider rewriting to a CROSS JOIN with a WHERE clause.

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
index 3eb9bf9f913..48c6392d2b8 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
@@ -23,6
[spark] branch master updated: [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4ec27c3801a [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names

4ec27c3801a is described below

commit 4ec27c3801aaa0cbba3e086c278a0ff96260b84a
Author: panbingkun
AuthorDate: Wed Aug 2 10:51:16 2023 +0500

    [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names

    ### What changes were proposed in this pull request?
    The pr aims to
    1. Use `checkError()` to check Exception in `command` Suite.
    2. Assign some error class names, include: `UNSUPPORTED_FEATURE.PURGE_PARTITION` and `UNSUPPORTED_FEATURE.PURGE_TABLE`.

    ### Why are the changes needed?
    The changes improve the error framework.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Manually test.
    - Pass GA.

    Closes #42169 from panbingkun/checkError_for_command.

    Authored-by: panbingkun
    Signed-off-by: Max Gekk
---
 .../src/main/resources/error/error-classes.json    | 10
 ...r-conditions-unsupported-feature-error-class.md |  8 ++
 .../catalog/SupportsAtomicPartitionManagement.java |  3 ++-
 .../catalog/SupportsPartitionManagement.java       |  3 ++-
 .../spark/sql/connector/catalog/TableCatalog.java  |  3 ++-
 .../spark/sql/errors/QueryExecutionErrors.scala    | 12 +
 .../SupportsAtomicPartitionManagementSuite.scala   | 13 ++
 .../catalog/SupportsPartitionManagementSuite.scala | 13 ++
 .../command/v1/AlterTableAddPartitionSuite.scala   | 14 ++
 .../command/v1/AlterTableDropPartitionSuite.scala  | 12 +
 .../command/v1/AlterTableRenameSuite.scala         | 11 +---
 .../command/v1/AlterTableSetLocationSuite.scala    | 11 +---
 .../command/v1/ShowCreateTableSuite.scala          | 12 +
 .../sql/execution/command/v1/ShowTablesSuite.scala | 22 ++--
 .../execution/command/v1/TruncateTableSuite.scala  | 11 +---
 .../command/v2/AlterTableDropPartitionSuite.scala  | 12 ++---
 .../v2/AlterTableRecoverPartitionsSuite.scala      | 11 +---
 .../command/v2/AlterTableSetLocationSuite.scala    | 12 +
 .../sql/execution/command/v2/DropTableSuite.scala  | 12 ++---
 .../command/v2/MsckRepairTableSuite.scala          | 11 +---
 .../sql/execution/command/v2/ShowTablesSuite.scala | 11 +---
 .../execution/command/ShowCreateTableSuite.scala   | 30 +-
 22 files changed, 172 insertions(+), 85 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 7012c66c895..06350522834 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -3020,6 +3020,16 @@
         "Pivoting by the value '<value>' of the column data type <type>."
       ]
     },
+    "PURGE_PARTITION" : {
+      "message" : [
+        "Partition purge."
+      ]
+    },
+    "PURGE_TABLE" : {
+      "message" : [
+        "Purge table."
+      ]
+    },
     "PYTHON_UDF_IN_ON_CLAUSE" : {
       "message" : [
         "Python UDF in the ON clause of a <joinType> JOIN. In case of an INNNER JOIN consider rewriting to a CROSS JOIN with a WHERE clause."

diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md b/docs/sql-error-conditions-unsupported-feature-error-class.md
index aa1c622c458..7a60dc76fa6 100644
--- a/docs/sql-error-conditions-unsupported-feature-error-class.md
+++ b/docs/sql-error-conditions-unsupported-feature-error-class.md
@@ -141,6 +141,14 @@ PIVOT clause following a GROUP BY clause. Consider pushing the GROUP BY into a s
 Pivoting by the value '`<value>`' of the column data type `<type>`.

+## PURGE_PARTITION
+
+Partition purge.
+
+## PURGE_TABLE
+
+Purge table.
+
 ## PYTHON_UDF_IN_ON_CLAUSE

 Python UDF in the ON clause of a `<joinType>` JOIN. In case of an INNNER JOIN consider rewriting to a CROSS JOIN with a WHERE clause.

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
index 3eb9bf9f913..48c6392d2b8 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
@@ -23,6 +23,7 @@ import org.apache.spark.annotation.Experimental;
 import org.apache.spar
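A hedged sketch of how one of the new names surfaces, modeled on the v2 command suites touched above (the catalog and table names are hypothetical, and the exact exception type may differ by code path; `sql`, `intercept`, and `checkError` come from Spark's test bases):

```scala
import org.apache.spark.SparkUnsupportedOperationException

// Dropping a v2 table with PURGE, where the catalog does not implement
// purge support, should now report the named UNSUPPORTED_FEATURE condition.
checkError(
  exception = intercept[SparkUnsupportedOperationException] {
    sql("DROP TABLE testcat.ns.tbl PURGE")
  },
  errorClass = "UNSUPPORTED_FEATURE.PURGE_TABLE",
  parameters = Map.empty)
```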
[spark] branch branch-3.4 updated: [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 53383fcd2be [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`

53383fcd2be is described below

commit 53383fcd2be178f4f0d231334ee36f1c3d67f64d
Author: Max Gekk
AuthorDate: Fri Jul 14 08:37:29 2023 +0300

    [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`

    ### What changes were proposed in this pull request?
    In the PR, I propose to check the number of argument types in the `InvokeLike` expressions. If the input types are provided, the number of types should be exactly the same as the number of argument expressions.

    This is a backport of https://github.com/apache/spark/pull/41954.

    ### Why are the changes needed?
    1. This PR checks the contract described in the comment explicitly: https://github.com/apache/spark/blob/d9248e83bbb3af49333608bebe7149b1aaeca738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L247 that can prevent the errors of expression implementations, and improve code maintainability.
    2. Also it fixes the issue in the `UrlEncode` and `UrlDecode`.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    By running the related tests:
    ```
    $ build/sbt "test:testOnly *UrlFunctionsSuite"
    $ build/sbt "test:testOnly *DataSourceV2FunctionSuite"
    ```

    Authored-by: Max Gekk
    (cherry picked from commit 3e82ac6ea3d9f87c8ac09e481235beefaa1bf758)

    Closes #41985 from MaxGekk/fix-url_decode-3.4.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json           |  5 +
 .../sql-error-conditions-datatype-mismatch-error-class.md  |  4
 .../spark/sql/catalyst/analysis/CheckAnalysis.scala        |  5 +++--
 .../spark/sql/catalyst/expressions/objects/objects.scala   | 15 +++
 .../spark/sql/catalyst/expressions/urlExpressions.scala    |  4 ++--
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index febed9283d8..90dec2ee45e 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -468,6 +468,11 @@
         "The <exprName> must be between <valueRange> (current value = <currentValue>)."
       ]
     },
+    "WRONG_NUM_ARG_TYPES" : {
+      "message" : [
+        "The expression requires <expectedNum> argument types but the actual number is <actualNum>."
+      ]
+    },
     "WRONG_NUM_ENDPOINTS" : {
       "message" : [
         "The number of endpoints must be >= 2 to construct intervals but the actual number is <actualNumber>."

diff --git a/docs/sql-error-conditions-datatype-mismatch-error-class.md b/docs/sql-error-conditions-datatype-mismatch-error-class.md
index 6ccd63e6ee9..2178deca4f2 100644
--- a/docs/sql-error-conditions-datatype-mismatch-error-class.md
+++ b/docs/sql-error-conditions-datatype-mismatch-error-class.md
@@ -231,6 +231,10 @@ The input of `` can't be `` type data.

 The `<exprName>` must be between `<valueRange>` (current value = `<currentValue>`).

+## WRONG_NUM_ARG_TYPES
+
+The expression requires `<expectedNum>` argument types but the actual number is `<actualNum>`.
+
 ## WRONG_NUM_ENDPOINTS

 The number of endpoints must be >= 2 to construct intervals but the actual number is `<actualNumber>`.

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 223fdf12d6d..e717483ec94 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -288,8 +288,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
             "srcType" -> c.child.dataType.catalogString,
             "targetType" -> c.dataType.catalogString))
         case e: RuntimeReplaceable if !e.replacement.resolved =>
-          throw new IllegalStateException("Illegal RuntimeReplaceable: " + e +
-            "\nReplacement is unresolved: " + e.replacement)
+          throw SparkException.internalError(
+            s"Cannot resolve the runtime replaceable expression ${toSQLExpr(e)}. " +
+            s"The replacement is unresolved: ${toSQLExpr(e.replacement)}.")
         case g: Grouping =>
           g.failAnalysis(errorClass = "_LEGACY_ERROR_TEMP_2445", messageParameters = Map.empty)
 d
[spark] branch master updated: [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new efed39516c0 [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes

efed39516c0 is described below

commit efed39516c0c4e9654aec447ce91676026368384
Author: itholic
AuthorDate: Thu Jul 13 17:21:29 2023 +0300

    [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes

    ### What changes were proposed in this pull request?
    This PR proposes to assign the name "INCOMPATIBLE_DATA_TO_TABLE" to _LEGACY_ERROR_TEMP_1204, together with its sub-classes:
    - CANNOT_FIND_DATA
    - AMBIGUOUS_COLUMN_NAME
    - EXTRA_STRUCT_FIELDS
    - NULLABLE_COLUMN
    - NULLABLE_ARRAY_ELEMENTS
    - NULLABLE_MAP_VALUES
    - CANNOT_SAFELY_CAST
    - STRUCT_MISSING_FIELDS
    - UNEXPECTED_COLUMN_NAME

    ### Why are the changes needed?
    We should assign proper names to _LEGACY_ERROR_TEMP_*

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    `./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*`

    Closes #39937 from itholic/LEGACY_1204.

    Authored-by: itholic
    Signed-off-by: Max Gekk
---
 common/utils/src/main/resources/error/README.md    |  14 +
 .../src/main/resources/error/error-classes.json    |  59 ++-
 docs/_data/menu-sql.yaml                           |   2 +-
 ...ions-incompatible-data-for-table-error-class.md |  64 +++
 ...tions-incompatible-data-to-table-error-class.md |  64 +++
 docs/sql-error-conditions.md                       |   8 +
 docs/sql-ref-ansi-compliance.md                    |   3 +-
 .../sql/catalyst/analysis/AssignmentUtils.scala    |   5 +-
 .../catalyst/analysis/TableOutputResolver.scala    |  97 +++--
 .../spark/sql/catalyst/types/DataTypeUtils.scala   |  59 +--
 .../spark/sql/errors/QueryCompilationErrors.scala  | 110 +-
 .../catalyst/analysis/V2WriteAnalysisSuite.scala   | 267 ++---
 .../types/DataTypeWriteCompatibilitySuite.scala    | 429 -
 .../apache/spark/sql/DataFrameWriterV2Suite.scala  |  39 +-
 .../org/apache/spark/sql/SQLInsertTestSuite.scala  |   5 +-
 .../command/AlignMergeAssignmentsSuite.scala       |  78 +++-
 .../command/AlignUpdateAssignmentsSuite.scala      |  54 ++-
 .../org/apache/spark/sql/sources/InsertSuite.scala |  98 +++--
 .../sql/test/DataFrameReaderWriterSuite.scala      |  47 ++-
 .../spark/sql/hive/client/HiveClientSuite.scala    |  22 +-
 20 files changed, 1100 insertions(+), 424 deletions(-)

diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md
index 838991c2b6a..dfcb42d49e7 100644
--- a/common/utils/src/main/resources/error/README.md
+++ b/common/utils/src/main/resources/error/README.md
@@ -1294,6 +1294,20 @@ The following SQLSTATEs are collated from:
 |IM013|IM |ODBC driver |013 |Trace file error|SQL Server |N |SQL Server |
 |IM014|IM |ODBC driver |014 |Invalid name of File DSN|SQL Server |N |SQL Server |
 |IM015|IM |ODBC driver |015 |Corrupt file data source|SQL Server |N |SQL Server |
+|KD000 |KD |datasource specific errors|000 |datasource specific errors |Databricks |N |Databricks |
+|KD001 |KD |datasource specific errors|001 |Cannot read file footer |Databricks |N |Databricks |
+|KD002 |KD |datasource specific errors|002 |Unexpected version |Databricks |N |Databricks |
+|KD003 |KD |datasource specific errors|003 |Incorrect access to data type |Databricks |N |Databricks |
+|KD004 |KD |datasource specific errors|004 |Delta protocol version error |Databricks |N
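A hedged sketch of a write that should now land in one of the new sub-classes (assuming the default ANSI store-assignment policy and an active SparkSession `spark`; the table name is hypothetical): a string cannot be safely stored into an INT column, so the INSERT fails at analysis time.

```scala
spark.sql("CREATE TABLE target(i INT) USING parquet")
spark.sql("INSERT INTO target VALUES ('not a number')")
// Expected after this change: an AnalysisException carrying the named
// condition INCOMPATIBLE_DATA_TO_TABLE.CANNOT_SAFELY_CAST instead of the
// opaque _LEGACY_ERROR_TEMP_1204.
```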
[spark] branch master updated: [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4dedb4ad2c9 [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite 4dedb4ad2c9 is described below commit 4dedb4ad2c9b2ecd75dd9ccec5f565805752ad8e Author: panbingkun AuthorDate: Thu Jul 13 16:26:34 2023 +0300 [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite ### What changes were proposed in this pull request? The pr aims to use `checkError()` to check `Exception` in `*View*Suite`, `*Namespace*Suite`, `*DataSource*Suite`, include: - sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/NestedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2FunctionSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite - sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite - sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowNamespacesSuite - sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/streaming/sources/StreamingDataSourceV2Suite - sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterNamespaceSetLocationSuite ### Why are the changes needed? Migration on checkError() will make the tests independent from the text of error messages. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41952 from panbingkun/view_and_namespace_checkerror. 
Authored-by: panbingkun Signed-off-by: Max Gekk --- .../spark/sql/FileBasedDataSourceSuite.scala | 67 +++-- .../apache/spark/sql/NestedDataSourceSuite.scala | 24 +- .../sql/connector/DataSourceV2DataFrameSuite.scala | 21 +- .../sql/connector/DataSourceV2FunctionSuite.scala | 189 +++-- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 54 +++- .../spark/sql/connector/DataSourceV2Suite.scala| 38 ++- .../apache/spark/sql/execution/SQLViewSuite.scala | 313 ++--- .../execution/command/v2/ShowNamespacesSuite.scala | 28 +- .../sql/sources/ResolvedDataSourceSuite.scala | 24 +- .../sources/StreamingDataSourceV2Suite.scala | 68 +++-- .../spark/sql/hive/MetastoreDataSourcesSuite.scala | 88 +++--- .../sql/hive/execution/HiveSQLViewSuite.scala | 31 +- .../command/AlterNamespaceSetLocationSuite.scala | 11 +- 13 files changed, 671 insertions(+), 285 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala index e7e53285d62..d69a68f5726 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala @@ -26,7 +26,7 @@ import scala.collection.mutable import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{LocalFileSystem, Path} -import org.apache.spark.SparkException +import org.apache.spark.{SparkException, SparkFileNotFoundException, SparkRuntimeException} import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd} import org.apache.spark.sql.TestingUDT.{IntervalUDT, NullData, NullUDT} import org.apache.spark.sql.catalyst.expressions.{AttributeReference, GreaterThan, Literal} @@ -129,11 +129,13 @@ class FileBasedDataSourceSuite extends QueryTest allFileBasedDataSources.foreach { format => test(s"SPARK-23372 error while writing empty schema files using $format") { withTempPath { outputPath => -val errMsg = intercept[AnalysisException] { - spark.emptyDataFrame.write.format(format).save(outputPath.toString) -} -assert(errMsg.getMessage.contains( - "Datasource does not support writing empty or nested empty schemas")) +checkError( + exception = intercept[AnalysisException] { +spark.emptyDataFrame.write.format(format).save(outputPath.toString) + }, + errorClass = "_LEGACY_ERROR_TEMP_1142", + parameters = Map.empty +) } // Nested empt
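The hunk above is representative of the whole migration: instead of asserting on message substrings, `checkError()` matches the error class and the message parameters, so tests survive message rewording. A minimal sketch of the idiom; the query, error class, and expected parameters are illustrative only:

```
// Hypothetical test body; TABLE_OR_VIEW_NOT_FOUND and its "relationName"
// parameter are used here only to show the shape of checkError().
checkError(
  exception = intercept[AnalysisException] {
    sql("DROP VIEW no_such_view")
  },
  errorClass = "TABLE_OR_VIEW_NOT_FOUND",
  parameters = Map("relationName" -> "`no_such_view`"))
```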
[spark] branch master updated: [SPARK-44391][SQL] Check the number of argument types in `InvokeLike`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3e82ac6ea3d [SPARK-44391][SQL] Check the number of argument types in `InvokeLike` 3e82ac6ea3d is described below commit 3e82ac6ea3d9f87c8ac09e481235beefaa1bf758 Author: Max Gekk AuthorDate: Thu Jul 13 12:17:20 2023 +0300 [SPARK-44391][SQL] Check the number of argument types in `InvokeLike` ### What changes were proposed in this pull request? In the PR, I propose to check the number of argument types in the `InvokeLike` expressions. If the input types are provided, the number of types should be exactly the same as the number of argument expressions. ### Why are the changes needed? 1. This PR checks the contract described in the comment explicitly: https://github.com/apache/spark/blob/d9248e83bbb3af49333608bebe7149b1aaeca738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L247 that can prevent the errors of expression implementations, and improve code maintainability. 2. Also it fixes the issue in the `UrlEncode` and `UrlDecode`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the related tests: ``` $ build/sbt "test:testOnly *UrlFunctionsSuite" $ build/sbt "test:testOnly *DataSourceV2FunctionSuite" ``` Closes #41954 from MaxGekk/fix-url_decode. Authored-by: Max Gekk Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/error-classes.json | 5 + .../explain-results/function_url_decode.explain | 2 +- .../explain-results/function_url_encode.explain | 2 +- .../sql-error-conditions-datatype-mismatch-error-class.md | 4 .../spark/sql/catalyst/analysis/CheckAnalysis.scala | 5 +++-- .../spark/sql/catalyst/expressions/objects/objects.scala | 15 +++ .../spark/sql/catalyst/expressions/urlExpressions.scala | 4 ++-- 7 files changed, 31 insertions(+), 6 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 347ce026476..2c4d2b533a6 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -657,6 +657,11 @@ "The must be between (current value = )." ] }, + "WRONG_NUM_ARG_TYPES" : { +"message" : [ + "The expression requires argument types but the actual number is ." +] + }, "WRONG_NUM_ENDPOINTS" : { "message" : [ "The number of endpoints must be >= 2 to construct intervals but the actual number is ." 
diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain index 36b21e27c10..d612190396d 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain @@ -1,2 +1,2 @@ -Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, decode, g#0, UTF-8, StringType, true, true, true) AS url_decode(g)#0] +Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, decode, g#0, UTF-8, StringType, StringType, true, true, true) AS url_decode(g)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain index 70a0f628fc9..bd2c63e19c6 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain @@ -1,2 +1,2 @@ -Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, encode, g#0, UTF-8, StringType, true, true, true) AS url_encode(g)#0] +Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, encode, g#0, UTF-8, StringType, StringType, true, true, true) AS url_encode(g)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/docs/sql-error-conditions-datatype-mismatch-error-class.md b/docs/sql-error-conditions-datatype-mismatch-error-class.md index 3bd63925323..ddc3e0c2b1b 100644 --- a/docs/sql-error-
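A rough sketch of the invariant this adds, not the exact Spark code: when an `InvokeLike` expression declares input types, their count must equal the number of argument expressions, otherwise a `DATATYPE_MISMATCH.WRONG_NUM_ARG_TYPES` is reported (the parameter names below are assumptions):

```
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch

// Sketch only: validates the contract that declared input types, when
// present, line up one-to-one with the argument expressions.
def checkArgTypeCount(inputTypes: Seq[_], arguments: Seq[_]): TypeCheckResult = {
  if (inputTypes.nonEmpty && inputTypes.length != arguments.length) {
    DataTypeMismatch(
      errorSubClass = "WRONG_NUM_ARG_TYPES",
      messageParameters = Map(
        "expectedNum" -> inputTypes.length.toString,
        "actualNum" -> arguments.length.toString))
  } else {
    TypeCheckResult.TypeCheckSuccess
  }
}
```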
[spark] branch master updated: [SPARK-38476][CORE] Use error class in org.apache.spark.storage
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0f6a4a737ee [SPARK-38476][CORE] Use error class in org.apache.spark.storage 0f6a4a737ee is described below commit 0f6a4a737ee9457a0b0c336b7d079cdd878d20e8 Author: Bo Zhang AuthorDate: Tue Jul 11 13:06:52 2023 +0300 [SPARK-38476][CORE] Use error class in org.apache.spark.storage ### What changes were proposed in this pull request? This PR aims to change exceptions created in package org.apache.spark.storage to use error classes. This also adds an error class INTERNAL_ERROR_STORAGE and uses that for the internal errors in the package. ### Why are the changes needed? This is to move exceptions created in package org.apache.spark.storage to error classes. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Updated existing tests. Closes #41923 from bozhang2820/spark-38476. Authored-by: Bo Zhang Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/error-classes.json | 6 ++ .../org/apache/spark/storage/BlockInfoManager.scala | 7 --- .../scala/org/apache/spark/storage/BlockManager.scala| 4 ++-- .../org/apache/spark/storage/DiskBlockManager.scala | 10 +- .../org/apache/spark/storage/DiskBlockObjectWriter.scala | 4 +++- .../main/scala/org/apache/spark/storage/DiskStore.scala | 7 --- .../scala/org/apache/spark/storage/FallbackStorage.scala | 5 +++-- .../spark/storage/ShuffleBlockFetcherIterator.scala | 5 +++-- .../org/apache/spark/storage/memory/MemoryStore.scala| 16 ++-- .../org/apache/spark/storage/BlockInfoManagerSuite.scala | 4 ++-- .../spark/storage/DiskBlockObjectWriterSuite.scala | 4 ++-- .../spark/storage/PartiallySerializedBlockSuite.scala| 14 +++--- docs/sql-error-conditions.md | 6 ++ 13 files changed, 57 insertions(+), 35 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 66305c20112..347ce026476 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1089,6 +1089,12 @@ ], "sqlState" : "XX000" }, + "INTERNAL_ERROR_STORAGE" : { +"message" : [ + "" +], +"sqlState" : "XX000" + }, "INTERVAL_ARITHMETIC_OVERFLOW" : { "message" : [ "."
diff --git a/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala index fb532dd0736..45ebb6eafa6 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala @@ -29,7 +29,7 @@ import scala.reflect.ClassTag import com.google.common.collect.{ConcurrentHashMultiset, ImmutableMultiset} import com.google.common.util.concurrent.Striped -import org.apache.spark.TaskContext +import org.apache.spark.{SparkException, TaskContext} import org.apache.spark.errors.SparkCoreErrors import org.apache.spark.internal.Logging @@ -543,8 +543,9 @@ private[storage] class BlockInfoManager(trackingCacheVisibility: Boolean = false logTrace(s"Task $taskAttemptId trying to remove block $blockId") blockInfo(blockId) { (info, condition) => if (info.writerTask != taskAttemptId) { -throw new IllegalStateException( - s"Task $taskAttemptId called remove() on block $blockId without a write lock") +throw SparkException.internalError( + s"Task $taskAttemptId called remove() on block $blockId without a write lock", + category = "STORAGE") } else { invisibleRDDBlocks.synchronized { blockInfoWrappers.remove(blockId) diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala index b4453b4d35e..05d57c67576 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala @@ -1171,8 +1171,8 @@ private[spark] class BlockManager( val buf = blockTransferService.fetchBlockSync(loc.host, loc.port, loc.executorId, blockId.toString, tempFileManager) if (blockSize > 0 && buf.size() == 0) { - throw new IllegalStateException("Empty buffer received for non empty block " + -s"when fetching remote block $blockId from $loc") +
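The hunks above all follow one pattern: `SparkException.internalError(msg, category = "STORAGE")` tags the message with the new class instead of hand-building an `IllegalStateException`. A rough sketch of what the helper amounts to, not Spark's exact implementation:

```
import org.apache.spark.SparkException

// Sketch only: the category suffix selects the INTERNAL_ERROR_<CATEGORY>
// error class, so callers never spell out the class name by hand.
def internalError(msg: String, category: String): SparkException =
  new SparkException(
    errorClass = s"INTERNAL_ERROR_$category",
    messageParameters = Map("message" -> msg),
    cause = null)
```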
[spark] branch master updated: [SPARK-44320][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c5a23e9c23f [SPARK-44320][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277] c5a23e9c23f is described below commit c5a23e9c23f7bd7066060d0791f290ad38fca76f Author: panbingkun AuthorDate: Tue Jul 11 11:16:03 2023 +0300 [SPARK-44320][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277] ### What changes were proposed in this pull request? The pr aims to assign names to the error class, include: - _LEGACY_ERROR_TEMP_1067 => UNSUPPORTED_FEATURE.DROP_DATABASE - _LEGACY_ERROR_TEMP_1150 => UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE - _LEGACY_ERROR_TEMP_1220 => UNSUPPORTED_FEATURE.HIVE_TABLE_TYPE - _LEGACY_ERROR_TEMP_1265 => LOAD_DATA_PATH_NOT_EXISTS - _LEGACY_ERROR_TEMP_1277 => CREATE_VIEW_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS / CREATE_VIEW_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update & Add new UT. - Manually test. - Pass GA. Closes #41909 from panbingkun/SPARK-44320. Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 69 ++-- .../org/apache/spark/sql/avro/AvroSuite.scala | 34 +- ...reate-view-column-arity-mismatch-error-class.md | 40 ++ ...-error-conditions-invalid-format-error-class.md | 2 +- ...r-conditions-unsupported-feature-error-class.md | 8 + docs/sql-error-conditions.md | 20 + .../sql/catalyst/catalog/SessionCatalog.scala | 2 +- .../spark/sql/errors/QueryCompilationErrors.scala | 50 ++- .../apache/spark/sql/execution/command/views.scala | 12 +- .../spark/sql/FileBasedDataSourceSuite.scala | 456 + .../apache/spark/sql/execution/SQLViewSuite.scala | 27 +- .../spark/sql/execution/command/DDLSuite.scala | 7 +- .../execution/command/v1/DropNamespaceSuite.scala | 11 +- .../sql/execution/datasources/csv/CSVSuite.scala | 7 +- .../spark/sql/hive/client/HiveClientImpl.scala | 2 +- .../spark/sql/hive/client/HiveClientSuite.scala| 12 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 54 ++- .../spark/sql/hive/orc/HiveOrcSourceSuite.scala| 92 +++-- 18 files changed, 610 insertions(+), 295 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 9f0ed7ace3a..66305c20112 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -370,6 +370,28 @@ ], "sqlState" : "42710" }, + "CREATE_VIEW_COLUMN_ARITY_MISMATCH" : { +"message" : [ + "Cannot create view , the reason is" +], +"subClass" : { + "NOT_ENOUGH_DATA_COLUMNS" : { +"message" : [ + "not enough data columns:", + "View columns: .", + "Data columns: ." +] + }, + "TOO_MANY_DATA_COLUMNS" : { +"message" : [ + "too many data columns:", + "View columns: .", + "Data columns: ." +] + } +}, +"sqlState" : "21S01" + }, "DATATYPE_MISMATCH" : { "message" : [ "Cannot resolve due to data type mismatch:" ] }, @@ -1247,7 +1269,7 @@ }, "MISMATCH_INPUT" : { "message" : [ - "The input '' does not match the format."
] }, "THOUSANDS_SEPS_MUST_BEFORE_DEC" : { @@ -1772,6 +1794,11 @@ "The join condition has the invalid type , expected \"BOOLEAN\"." ] }, + "LOAD_DATA_PATH_NOT_EXISTS" : { +"message" : [ + "LOAD DATA input path does not exist: ." +] + }, "LOCAL_MUST_WITH_SCHEMA_FILE" : { "message" : [ "LOCAL must be used together with the schema of `file`, but got: ``." @@ -2544,6 +2571,11 @@ "The direct query on files does not support the data source type: . Please try a different data source type or consider using a different query method." ] }, + "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE" : { +&q
[spark] branch master updated (990affdd503 -> b7c6c846c08)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 990affdd503 [SPARK-44290][CONNECT][FOLLOW-UP] Skip flaky tests, and fix a typo in session UUID together add b7c6c846c08 [SPARK-44328][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328] No new revisions were added by this update. Summary of changes: .../src/main/resources/error/error-classes.json| 57 + ...-conditions-cannot-update-field-error-class.md} | 26 docs/sql-error-conditions.md | 14 ++-- .../sql/catalyst/analysis/CheckAnalysis.scala | 18 -- .../spark/sql/connector/AlterTableTests.scala | 74 -- 5 files changed, 120 insertions(+), 69 deletions(-) copy docs/{sql-error-conditions-invalid-limit-like-expression-error-class.md => sql-error-conditions-cannot-update-field-error-class.md} (65%)
[spark] branch master updated (fdeb8d8551e -> 5e31f4dfc20)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from fdeb8d8551e [SPARK-44321][CONNECT] Decouple ParseException from AnalysisException add 5e31f4dfc20 [SPARK-38477][CORE] Use error class in org.apache.spark.shuffle No new revisions were added by this update. Summary of changes: .../src/main/resources/error/error-classes.json| 16 + .../org/apache/spark/errors/SparkCoreErrors.scala | 11 - .../spark/shuffle/IndexShuffleBlockResolver.scala | 27 -- .../shuffle/ShufflePartitionPairsWriter.scala | 5 ++-- docs/sql-error-conditions.md | 12 ++ .../spark/sql/errors/QueryExecutionErrors.scala| 2 +- 6 files changed, 52 insertions(+), 21 deletions(-)
[spark] branch master updated (1fbb94b87c0 -> 1adf2866915)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 1fbb94b87c0 [SPARK-44284][CONNECT] Create simple conf system for sql/api add 1adf2866915 [SPARK-44303][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324] No new revisions were added by this update. Summary of changes: .../src/main/resources/error/error-classes.json| 50 ++-- .../connect/planner/SparkConnectProtoSuite.scala | 8 +- .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 36 +++-- ...ditions-invalid-observed-metrics-error-class.md | 12 +++ docs/sql-error-conditions.md | 12 +++ .../sql/catalyst/analysis/CheckAnalysis.scala | 21 +++-- .../sql/catalyst/analysis/AnalysisSuite.scala | 27 +-- .../spark/sql/connector/AlterTableTests.scala | 92 +- .../connector/V2CommandsCaseSensitivitySuite.scala | 34 +++- .../v2/jdbc/JDBCTableCatalogSuite.scala| 36 +++-- 10 files changed, 243 insertions(+), 85 deletions(-)
[spark] branch master updated: Revert "[SPARK-43851][SQL] Support LCA in grouping expressions"
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a68e362dca1 Revert "[SPARK-43851][SQL] Support LCA in grouping expressions" a68e362dca1 is described below commit a68e362dca10f1c0173fbe51bf321428378e4602 Author: Jia Fan AuthorDate: Thu Jul 6 15:20:38 2023 +0300 Revert "[SPARK-43851][SQL] Support LCA in grouping expressions" ### What changes were proposed in this pull request? This reverts commit 9353d67f9290bae1e7d7e16a2caf5256cc4e2f92. After discussion in #41817, we should revert LCA in grouping expressions, because the current solution has problems. ### Why are the changes needed? Revert the PR. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. Closes #41869 from Hisoka-X/SPARK-43851_revert. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 5 + ...r-conditions-unsupported-feature-error-class.md | 4 .../analysis/ResolveReferencesInAggregate.scala| 22 ++ .../column-resolution-aggregate.sql.out| 26 +- .../results/column-resolution-aggregate.sql.out| 16 + 5 files changed, 44 insertions(+), 29 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 44bec5e8ced..a3b12022b66 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -2613,6 +2613,11 @@ "Referencing lateral column alias in the aggregate query both with window expressions and with having clause. Please rewrite the aggregate query by removing the having clause or removing lateral alias reference in the SELECT list." ] }, + "LATERAL_COLUMN_ALIAS_IN_GROUP_BY" : { +"message" : [ + "Referencing a lateral column alias via GROUP BY alias/ALL is not supported yet." +] + }, "LATERAL_COLUMN_ALIAS_IN_WINDOW" : { "message" : [ "Referencing a lateral column alias in window expression ." diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md b/docs/sql-error-conditions-unsupported-feature-error-class.md index 25f09118f74..a41502b609a 100644 --- a/docs/sql-error-conditions-unsupported-feature-error-class.md +++ b/docs/sql-error-conditions-unsupported-feature-error-class.md @@ -85,6 +85,10 @@ Referencing a lateral column alias `` in the aggregate function `` Referencing lateral column alias `` in the aggregate query both with window expressions and with having clause. Please rewrite the aggregate query by removing the having clause or removing lateral alias reference in the SELECT list. +## LATERAL_COLUMN_ALIAS_IN_GROUP_BY + +Referencing a lateral column alias via GROUP BY alias/ALL is not supported yet. + ## LATERAL_COLUMN_ALIAS_IN_WINDOW Referencing a lateral column alias `` in window expression ``.
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala index 41bcb337c67..09ae87b071f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala @@ -17,8 +17,9 @@ package org.apache.spark.sql.catalyst.analysis +import org.apache.spark.sql.AnalysisException import org.apache.spark.sql.catalyst.SQLConfHelper -import org.apache.spark.sql.catalyst.expressions.{AliasHelper, Attribute, Expression, LateralColumnAliasReference, NamedExpression} +import org.apache.spark.sql.catalyst.expressions.{AliasHelper, Attribute, Expression, NamedExpression} import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, AppendColumns, LogicalPlan} import org.apache.spark.sql.catalyst.trees.TreePattern.{LATERAL_COLUMN_ALIAS_REFERENCE, UNRESOLVED_ATTRIBUTE} @@ -73,6 +74,12 @@ object ResolveReferencesInAggregate extends SQLConfHelper resolvedAggExprsWithOuter, resolveGroupByAlias(resolvedAggExprsWithOuter, resolvedGroupExprsNoOuter) ).map(resolveOuterRef) + // TODO: currently we don't support LCA in `groupingExpressions` yet. + if (resolved.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE))) { +throw new AnalysisException( + errorClass = "
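For context, the class of query that is rejected again after this revert looks roughly like the following (the schema is an assumption for illustration); resolving `b` in GROUP BY would require resolving the lateral alias reference `a` inside a grouping expression:

```
// After the revert this should fail with
// UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY, since the alias b
// expands to a + 1, where a is itself a lateral column alias.
spark.sql("""
  SELECT dept AS a, a + 1 AS b, sum(salary)
  FROM employees
  GROUP BY a, b
""")
```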
[spark] branch master updated: [SPARK-44299][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5d840eb4553 [SPARK-44299][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8] 5d840eb4553 is described below commit 5d840eb455350ef3f6235a031a1689bf4a51007d Author: panbingkun AuthorDate: Thu Jul 6 10:08:45 2023 +0300 [SPARK-44299][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8] ### What changes were proposed in this pull request? The pr aims to assign names to the error class, include: - _LEGACY_ERROR_TEMP_2274 => UNSUPPORTED_FEATURE.REPLACE_NESTED_COLUMN - _LEGACY_ERROR_TEMP_2275 => CANNOT_INVOKE_IN_TRANSFORMATIONS - _LEGACY_ERROR_TEMP_2276 => UNSUPPORTED_FEATURE .HIVE_WITH_ANSI_INTERVALS - _LEGACY_ERROR_TEMP_2278 => INVALID_FORMAT.MISMATCH_INPUT ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update & Add new UT. - Manually test. - Pass GA. Closes #41858 from panbingkun/SPARK-44299. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 40 +++--- ...-error-conditions-invalid-format-error-class.md | 4 +++ ...r-conditions-unsupported-feature-error-class.md | 8 + docs/sql-error-conditions.md | 6 .../spark/sql/catalyst/util/ToNumberParser.scala | 4 +-- .../spark/sql/errors/QueryExecutionErrors.scala| 20 +-- .../expressions/StringExpressionsSuite.scala | 9 +++-- .../apache/spark/sql/execution/command/ddl.scala | 2 +- .../sql-tests/results/postgreSQL/numeric.sql.out | 10 +++--- .../results/postgreSQL/numeric.sql.out.java21 | 10 +++--- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 13 +++ .../spark/sql/DataFrameNaFunctionsSuite.scala | 12 --- .../spark/sql/hive/execution/HiveDDLSuite.scala| 2 +- .../command/AlterTableAddColumnsSuite.scala| 13 --- 14 files changed, 101 insertions(+), 52 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 8bdb02470ef..44bec5e8ced 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -128,6 +128,11 @@ ], "sqlState" : "22546" }, + "CANNOT_INVOKE_IN_TRANSFORMATIONS" : { +"message" : [ + "Dataset transformations and actions can only be invoked by the driver, not inside of other Dataset transformations; for example, dataset1.map(x => dataset2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the dataset1.map transformation. For more information, see SPARK-28702." +] + }, "CANNOT_LOAD_FUNCTION_CLASS" : { "message" : [ "Cannot load class when registering the function , please make sure it is on the classpath." @@ -1192,6 +1197,11 @@ "The escape character is not allowed to precede ." ] }, + "MISMATCH_INPUT" : { +"message" : [ + "The input '' does not match the format." +] + }, "THOUSANDS_SEPS_MUST_BEFORE_DEC" : { "message" : [ "Thousands separators (, or G) may not appear after the decimal point in the number format." @@ -2583,6 +2593,11 @@ "Drop the namespace ." ] }, + "HIVE_WITH_ANSI_INTERVALS" : { +"message" : [ + "Hive table with ANSI intervals." +] + }, "INSERT_PARTITION_SPEC_IF_NOT_EXISTS" : { "message" : [ "INSERT INTO with IF NOT EXISTS in the PARTITION spec." 
@@ -2663,6 +2678,11 @@ "Remove a comment from the namespace ." ] }, + "REPLACE_NESTED_COLUMN" : { +"message" : [ + "The replace function does not support nested column ." +] + }, "SET_NAMESPACE_PROPERTY" : { "message" : [ " is a reserved namespace property, ." @@ -5627,31 +5647,11 @@ "" ] }, - "_LEGACY_ERROR_TEMP_2274" : { -"message" : [ - "Nested field is not supported." -] - }, - "_LEGACY_ERROR_TEMP_2275" : { -"message" : [ - "Dataset transformations and actions can only be invoke
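An illustrative trigger for the renamed `INVALID_FORMAT.MISMATCH_INPUT`: a `to_number` input that does not match its format string (the exact exception subtype is left aside here):

```
spark.sql("SELECT to_number('abc', '999')").collect()
// => fails with errorClass = "INVALID_FORMAT.MISMATCH_INPUT"
```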
[spark] branch master updated (68862589a0c -> d53585c91b2)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 68862589a0c [SPARK-44296][BUILD] Upgrade dropwizard metrics 4.2.19 add d53585c91b2 [SPARK-44292][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319] No new revisions were added by this update. Summary of changes: .../src/main/resources/error/error-classes.json| 57 -- ...ror-conditions-datatype-mismatch-error-class.md | 4 ++ ...itions-invalid-observed-metrics-error-class.md} | 24 - .../sql/catalyst/analysis/CheckAnalysis.scala | 24 + .../sql/catalyst/analysis/AnalysisSuite.scala | 40 ++- 5 files changed, 92 insertions(+), 57 deletions(-) copy docs/{sql-error-conditions-invalid-schema-error-class.md => sql-error-conditions-invalid-observed-metrics-error-class.md} (61%)
[spark] branch master updated (b573cca90ea -> 7bc28d54f83)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b573cca90ea [SPARK-44288][SS] Set the column family options before passing to DBOptions in RocksDB state store provider add 7bc28d54f83 [SPARK-44269][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2310-2314] No new revisions were added by this update. Summary of changes: .../src/main/resources/error/error-classes.json| 25 +- docs/sql-error-conditions.md | 6 ++ .../sql/catalyst/analysis/CheckAnalysis.scala | 11 -- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 11 +- .../main/scala/org/apache/spark/sql/Dataset.scala | 8 +++ .../apache/spark/sql/DataFrameWriterV2Suite.scala | 19 .../test/DataStreamReaderWriterSuite.scala | 20 - 7 files changed, 54 insertions(+), 46 deletions(-)
[spark] branch master updated: [SPARK-42169][SQL] Implement code generation for to_csv function (StructsToCsv)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 45ae9c5cc67 [SPARK-42169][SQL] Implement code generation for to_csv function (StructsToCsv) 45ae9c5cc67 is described below commit 45ae9c5cc67d379f5bbeadf8c56c032f2bdaaac0 Author: narek_karapetian AuthorDate: Mon Jul 3 10:13:12 2023 +0300 [SPARK-42169][SQL] Implement code generation for to_csv function (StructsToCsv) ### What changes were proposed in this pull request? This PR enhances the `StructsToCsv` class with a `doGenCode` function instead of extending it from the `CodegenFallback` trait (performance improvement). ### Why are the changes needed? It will improve performance. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? An additional test case was added to the `org.apache.spark.sql.catalyst.expressions.CsvExpressionsSuite` class. Closes #39719 from NarekDW/SPARK-42169. Authored-by: narek_karapetian Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/csvExpressions.scala | 11 ++- .../catalyst/expressions/CsvExpressionsSuite.scala | 7 ++ sql/core/benchmarks/CSVBenchmark-jdk11-results.txt | 82 +-- sql/core/benchmarks/CSVBenchmark-jdk17-results.txt | 82 +-- sql/core/benchmarks/CSVBenchmark-results.txt | 94 +++--- 5 files changed, 144 insertions(+), 132 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala index e47cf493d4c..cdab9faacd4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch import org.apache.spark.sql.catalyst.csv._ -import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback +import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodegenFallback, ExprCode} import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryErrorsBase} import org.apache.spark.sql.internal.SQLConf @@ -245,8 +245,7 @@ case class StructsToCsv( options: Map[String, String], child: Expression, timeZoneId: Option[String] = None) - extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes -with NullIntolerant { + extends UnaryExpression with TimeZoneAwareExpression with ExpectsInputTypes with NullIntolerant { override def nullable: Boolean = true def this(options: Map[String, String], child: Expression) = this(options, child, None) @@ -293,4 +292,10 @@ case class StructsToCsv( override protected def withNewChildInternal(newChild: Expression): StructsToCsv = copy(child = newChild) + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val structsToCsv = ctx.addReferenceObj("structsToCsv", this) +nullSafeCodeGen(ctx, ev, + eval => s"${ev.value} = (UTF8String) $structsToCsv.converter().apply($eval);") + } } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala
index 1d174ed2145..a89cb58c3e0 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala @@ -246,4 +246,11 @@ class CsvExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with P CsvToStructs(schema, Map.empty, Literal.create("1 day")), InternalRow(new CalendarInterval(0, 1, 0))) } + + test("StructsToCsv should not generate codes beyond 64KB") { +val range = Range.inclusive(1, 5000) +val struct = CreateStruct.create(range.map(Literal.apply)) +val expected = range.mkString(",") +checkEvaluation(StructsToCsv(Map.empty, struct), expected) + } } diff --git a/sql/core/benchmarks/CSVBenchmark-jdk11-results.txt b/sql/core/benchmarks/CSVBenchmark-jdk11-results.txt index 7b5ea10bc4e..7fca105a8c2 100644 --- a/sql/core/benchmarks/CSVBenchmark-jdk11-results.txt +++ b/sql/core/benchmarks/CSVBenchmark-jdk11-results.txt @@ -2,69 +2,69 @@ Benchmark to measure CSV read/write performance =
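The user-facing API is unchanged; only evaluation switches from the interpreted `CodegenFallback` path to generated code that calls back into the expression's converter via `ctx.addReferenceObj`. A small usage example (column names are illustrative):

```
import org.apache.spark.sql.functions.{struct, to_csv}
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
// Evaluated through the newly generated code path rather than the
// interpreted fallback; results are "1,a" and "2,b".
df.select(to_csv(struct($"id", $"name"))).show()
```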
[spark] branch master updated: [SPARK-44268][CORE][TEST] Add tests to ensure error-classes.json and docs are in sync
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b557f5752af [SPARK-44268][CORE][TEST] Add tests to ensure error-classes.json and docs are in sync b557f5752af is described below commit b557f5752afc32d614b37be610dbbca44519664b Author: Jia Fan AuthorDate: Sun Jul 2 18:51:09 2023 +0300 [SPARK-44268][CORE][TEST] Add tests to ensure error-classes.json and docs are in sync ### What changes were proposed in this pull request? Add a new test to make sure that `error-classes.json` matches the series of `sql-error-conditions.md` docs. After this PR, any difference between `error-classes.json` and the documents will report an error during testing. Note: only error class names are compared for now. Also fix all differences found by the new test case. ### Why are the changes needed? Make sure our error-classes.json always stays in sync with the docs. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New test. Closes #41813 from Hisoka-X/SPARK-44268_sync_error_classes_to_doc. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../org/apache/spark/SparkThrowableSuite.scala | 51 ++ ...ror-conditions-datatype-mismatch-error-class.md | 4 + ...tions-incompatible-data-to-table-error-class.md | 64 --- ...ror-conditions-insert-column-arity-mismatch.md} | 30 +- ...rror-conditions-insufficient-table-property.md} | 26 +- ... => sql-error-conditions-invalid-as-of-join.md} | 26 +- ...md => sql-error-conditions-invalid-boundary.md} | 26 +- ... sql-error-conditions-invalid-default-value.md} | 26 +- ...> sql-error-conditions-invalid-inline-table.md} | 26 +- ...ror-conditions-invalid-lamdba-function-call.md} | 26 +- ...or-conditions-invalid-limit-like-expression.md} | 26 +- ...nditions-invalid-parameter-value-error-class.md | 14 +- ...rror-conditions-invalid-partition-operation.md} | 26 +- docs/sql-error-conditions-invalid-sql-syntax.md| 92 ...nditions-invalid-time-travel-timestamp-expr.md} | 26 +- ...error-conditions-invalid-write-distribution.md} | 26 +- ...rror-conditions-malformed-record-in-parsing.md} | 24 +- ... => sql-error-conditions-missing-attributes.md} | 26 +- ...onditions-not-a-constant-string-error-class.md} | 26 +- ...=> sql-error-conditions-not-allowed-in-from.md} | 26 +- ...or-conditions-not-supported-in-jdbc-catalog.md} | 26 +- ...> sql-error-conditions-unsupported-add-file.md} | 26 +- ...-error-conditions-unsupported-default-value.md} | 26 +- ...r-conditions-unsupported-feature-error-class.md | 36 ++ ... => sql-error-conditions-unsupported-insert.md} | 26 +- ...rror-conditions-unsupported-merge-condition.md} | 26 +- ...
sql-error-conditions-unsupported-overwrite.md} | 26 +- docs/sql-error-conditions.md | 609 - 28 files changed, 984 insertions(+), 434 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index 96c4e3b8ab7..034a782e533 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -141,6 +141,57 @@ class SparkThrowableSuite extends SparkFunSuite { checkIfUnique(messageFormats) } + test("SPARK-44268: Error classes match with document") { +val sqlstateDoc = "sql-error-conditions-sqlstates.md" +val errors = errorReader.errorInfoMap +val errorDocPaths = getWorkspaceFilePath("docs").toFile + .listFiles(_.getName.startsWith("sql-error-conditions-")) + .filter(!_.getName.equals(sqlstateDoc)) + .map(f => IOUtils.toString(f.toURI, StandardCharsets.UTF_8)).map(_.split("\n")) +// check the error classes in document should be in error-classes.json +val linkInDocRegex = "\\[(.*)\\]\\((.*)\\)".r +val commonErrorsInDoc = IOUtils.toString(getWorkspaceFilePath("docs", + "sql-error-conditions.md").toUri, StandardCharsets.UTF_8).split("\n") + .filter(_.startsWith("###")).map(s => s.replace("###", "").trim) + .filter(linkInDocRegex.findFirstMatchIn(_).isEmpty) + +commonErrorsInDoc.foreach(s => assert(errors.contains(s), + s"Error class: $s is not in error-classes.json")) + +val titlePrefix = "title:" +val errorsInDoc = errorDocPaths.map(lines => { + val errorClass = lines.filter(_.startsWith(titlePrefix)) +.map(s => s.replace("error class", "").replace(titlePrefix, "").trim).head + assert(errors.contains(errorClass), s&
[spark] branch master updated: [SPARK-44254][SQL] Move QueryExecutionErrors that used by DataType to sql/api as DataTypeErrors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cf852b284d5 [SPARK-44254][SQL] Move QueryExecutionErrors that used by DataType to sql/api as DataTypeErrors cf852b284d5 is described below commit cf852b284d550f9425ae7893796ae0042be6010f Author: Rui Wang AuthorDate: Sun Jul 2 10:18:43 2023 +0300 [SPARK-44254][SQL] Move QueryExecutionErrors that used by DataType to sql/api as DataTypeErrors ### What changes were proposed in this pull request? Moving some QueryExecutionErrors that are used by data types to `sql/api` and naming those DataType errors, so that DataType can use them once DataType lives only in the `sql/api` module. ### Why are the changes needed? Towards a simpler DataType interface. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing test Closes #41794 from amaliujia/datatype_more_refactors. Authored-by: Rui Wang Signed-off-by: Max Gekk --- sql/api/pom.xml| 5 ++ .../apache/spark/sql/errors/DataTypeErrors.scala | 95 ++ .../spark/sql/errors/QueryExecutionErrors.scala| 42 ++ .../apache/spark/sql/types/AbstractDataType.scala | 4 +- .../scala/org/apache/spark/sql/types/Decimal.scala | 9 +- .../org/apache/spark/sql/types/DecimalType.scala | 4 +- .../org/apache/spark/sql/types/Metadata.scala | 10 +-- .../org/apache/spark/sql/types/ObjectType.scala| 4 +- .../apache/spark/sql/types/UDTRegistration.scala | 6 +- 9 files changed, 127 insertions(+), 52 deletions(-) diff --git a/sql/api/pom.xml b/sql/api/pom.xml index 9b7917e0343..41a5b85d4c6 100644 --- a/sql/api/pom.xml +++ b/sql/api/pom.xml @@ -40,6 +40,11 @@ spark-common-utils_${scala.binary.version} ${project.version} + +org.apache.spark +spark-unsafe_${scala.binary.version} +${project.version} + target/scala-${scala.binary.version}/classes diff --git a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala new file mode 100644 index 000..02e8b12c707 --- /dev/null +++ b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.errors + +import org.apache.spark.{SparkArithmeticException, SparkException, SparkRuntimeException, SparkUnsupportedOperationException} +import org.apache.spark.unsafe.types.UTF8String + +/** + * Object for grouping error messages from (most) exceptions thrown during query execution. + * This does not include exceptions thrown during the eager execution of commands, which are + * grouped into [[QueryCompilationErrors]].
+ */ +private[sql] object DataTypeErrors { + def unsupportedOperationExceptionError(): SparkUnsupportedOperationException = { +new SparkUnsupportedOperationException( + errorClass = "_LEGACY_ERROR_TEMP_2225", + messageParameters = Map.empty) + } + + def decimalPrecisionExceedsMaxPrecisionError( + precision: Int, maxPrecision: Int): SparkArithmeticException = { +new SparkArithmeticException( + errorClass = "DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION", + messageParameters = Map( +"precision" -> precision.toString, +"maxPrecision" -> maxPrecision.toString + ), + context = Array.empty, + summary = "") + } + + def unsupportedRoundingMode(roundMode: BigDecimal.RoundingMode.Value): SparkException = { +SparkException.internalError(s"Not supported rounding mode: ${roundMode.toString}.") + } + + def outOfDecimalTypeRangeError(str: UTF8String): SparkArithmeticException = { +new SparkArithmeticException( + errorClass = "NUMERIC_OUT_OF_SUPPORTE
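An illustrative call that ends up in one of the relocated errors: constructing a decimal type beyond the maximum precision raises `DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION`, now built by `DataTypeErrors` instead of `QueryExecutionErrors`:

```
import org.apache.spark.sql.types.DecimalType

// DecimalType supports precision up to 38; this should raise a
// SparkArithmeticException with DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION.
DecimalType(50, 2)
```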
[spark] branch master updated: [SPARK-44255][SQL] Relocate StorageLevel to common/utils
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f985e3e84a2 [SPARK-44255][SQL] Relocate StorageLevel to common/utils f985e3e84a2 is described below commit f985e3e84a23ab5a83842047408e3fd92887447a Author: Rui Wang AuthorDate: Sat Jul 1 12:10:22 2023 +0300 [SPARK-44255][SQL] Relocate StorageLevel to common/utils ### What changes were proposed in this pull request? Relocate `StorageLevel` to `common/utils`. ### Why are the changes needed? Scala client needs `StorageLevel` so this can be shared in the `common/utils`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests Closes #41797 from amaliujia/move_storage_level_to_common_utils. Authored-by: Rui Wang Signed-off-by: Max Gekk --- .../java/org/apache/spark/memory/MemoryMode.java | 0 .../org/apache/spark/storage/StorageLevel.scala| 6 ++--- .../org/apache/spark/util/SparkErrorUtils.scala| 30 +- .../main/scala/org/apache/spark/util/Utils.scala | 11 +--- project/MimaExcludes.scala | 4 +++ 5 files changed, 32 insertions(+), 19 deletions(-) diff --git a/core/src/main/java/org/apache/spark/memory/MemoryMode.java b/common/utils/src/main/java/org/apache/spark/memory/MemoryMode.java similarity index 100% copy from core/src/main/java/org/apache/spark/memory/MemoryMode.java copy to common/utils/src/main/java/org/apache/spark/memory/MemoryMode.java diff --git a/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala b/common/utils/src/main/scala/org/apache/spark/storage/StorageLevel.scala similarity index 97% rename from core/src/main/scala/org/apache/spark/storage/StorageLevel.scala rename to common/utils/src/main/scala/org/apache/spark/storage/StorageLevel.scala index 4a2b705e069..73bc53dab89 100644 --- a/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala +++ b/common/utils/src/main/scala/org/apache/spark/storage/StorageLevel.scala @@ -22,7 +22,7 @@ import java.util.concurrent.ConcurrentHashMap import org.apache.spark.annotation.DeveloperApi import org.apache.spark.memory.MemoryMode -import org.apache.spark.util.Utils +import org.apache.spark.util.SparkErrorUtils /** * :: DeveloperApi :: @@ -98,12 +98,12 @@ class StorageLevel private( ret } - override def writeExternal(out: ObjectOutput): Unit = Utils.tryOrIOException { + override def writeExternal(out: ObjectOutput): Unit = SparkErrorUtils.tryOrIOException { out.writeByte(toInt) out.writeByte(_replication) } - override def readExternal(in: ObjectInput): Unit = Utils.tryOrIOException { + override def readExternal(in: ObjectInput): Unit = SparkErrorUtils.tryOrIOException { val flags = in.readByte() _useDisk = (flags & 8) != 0 _useMemory = (flags & 4) != 0 diff --git a/core/src/main/java/org/apache/spark/memory/MemoryMode.java b/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala similarity index 50% rename from core/src/main/java/org/apache/spark/memory/MemoryMode.java rename to common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala index 3a5e72d8aae..8e4de01885e 100644 --- a/core/src/main/java/org/apache/spark/memory/MemoryMode.java +++ b/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala @@ -14,13 +14,31 @@ * See the License for the specific language governing permissions and * limitations under the License. 
*/ +package org.apache.spark.util -package org.apache.spark.memory; +import java.io.IOException -import org.apache.spark.annotation.Private; +import scala.util.control.NonFatal -@Private -public enum MemoryMode { - ON_HEAP, - OFF_HEAP +import org.apache.spark.internal.Logging + +object SparkErrorUtils extends Logging { + /** + * Execute a block of code that returns a value, re-throwing any non-fatal uncaught + * exceptions as IOException. This is used when implementing Externalizable and Serializable's + * read and write methods, since Java's serializer will not report non-IOExceptions properly; + * see SPARK-4080 for more context. + */ + def tryOrIOException[T](block: => T): T = { +try { + block +} catch { + case e: IOException => +logError("Exception encountered", e) +throw e + case NonFatal(e) => +logError("Exception encountered", e) +throw new IOException(e) +} + } } diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index ada0cffd2b0..60895c791b5 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark
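A sketch of the `Externalizable` pattern the helper exists for (the class below is hypothetical): wrapping the bodies in `tryOrIOException` keeps Java serialization from swallowing non-IOExceptions, per SPARK-4080:

```
import java.io.{Externalizable, ObjectInput, ObjectOutput}
import org.apache.spark.util.SparkErrorUtils

class MyFlags extends Externalizable {
  private var flags: Byte = 0
  // Non-fatal failures in either body are logged and rethrown as IOException.
  override def writeExternal(out: ObjectOutput): Unit =
    SparkErrorUtils.tryOrIOException { out.writeByte(flags) }
  override def readExternal(in: ObjectInput): Unit =
    SparkErrorUtils.tryOrIOException { flags = in.readByte() }
}
```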
[spark] branch master updated: [SPARK-44244][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3baf7f7b710 [SPARK-44244][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309] 3baf7f7b710 is described below commit 3baf7f7b7106f3fd30257b793ff4908d0f1ec427 Author: Jiaan Geng AuthorDate: Sat Jul 1 12:03:42 2023 +0300 [SPARK-44244][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309] ### What changes were proposed in this pull request? The pr aims to assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Existing test cases updated and new test cases added. Closes #41788 from beliefer/SPARK-44244. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 35 ++ .../spark/sql/catalyst/analysis/Analyzer.scala | 14 - .../catalyst/analysis/ResolveInlineTables.scala| 10 +++ .../sql/catalyst/analysis/AnalysisSuite.scala | 6 ++-- .../ansi/higher-order-functions.sql.out| 2 +- .../higher-order-functions.sql.out | 2 +- .../analyzer-results/inline-table.sql.out | 16 +- .../table-valued-functions.sql.out | 20 ++--- .../analyzer-results/udf/udf-inline-table.sql.out | 16 +- .../results/ansi/higher-order-functions.sql.out| 2 +- .../results/higher-order-functions.sql.out | 2 +- .../sql-tests/results/inline-table.sql.out | 16 +- .../results/table-valued-functions.sql.out | 20 ++--- .../sql-tests/results/udf/udf-inline-table.sql.out | 16 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 13 .../execution/command/PlanResolutionSuite.scala| 19 16 files changed, 105 insertions(+), 104 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 14bd3bc6bac..027d09eae10 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1241,6 +1241,11 @@ "message" : [ "Found incompatible types in the column for inline table." ] + }, + "NUM_COLUMNS_MISMATCH" : { +"message" : [ + "Inline table expected columns but found columns in row ." +] + } } }, @@ -1266,6 +1271,11 @@ "The lambda function has duplicate arguments . Please, consider to rename the argument names or set to \"true\"." ] }, + "NON_HIGHER_ORDER_FUNCTION" : { +"message" : [ + "A lambda function should only be used in a higher order function. However, its class is , which is not a higher order function." +] + }, "NUM_ARGS_MISMATCH" : { "message" : [ "A higher order function expects arguments, but got ." ] }, @@ -1939,6 +1949,11 @@ ], "sqlState" : "42826" }, + "NUM_TABLE_VALUE_ALIASES_MISMATCH" : { +"message" : [ + "Number of given aliases does not match number of output columns. Function name: ; number of aliases: ; number of output columns: ." +] + }, "ORDER_BY_POS_OUT_OF_RANGE" : { "message" : [ "ORDER BY position is not in select list (valid range is [1, ])." ] }, @@ -5589,26 +5604,6 @@ "The input '' does not match the given number format: ''." ] }, - "_LEGACY_ERROR_TEMP_2305" : { -"message" : [ - "expected columns but found columns in row ." -] - }, - "_LEGACY_ERROR_TEMP_2306" : { -"message" : [ - "A lambda function should only be used in a higher order function. However, its class is , which is not a higher order function."
-] - }, - "_LEGACY_ERROR_TEMP_2307" : { -"message" : [ - "Number of given aliases does not match number of output columns. Function name: ; number of aliases: ; number of output columns: ." -] - }, - "_LEGACY_ERROR_TEMP_2309" : { -"message" : [ - "cannot resolve in MERGE command given columns []." -] - }, "_LEGACY_ERROR_TEMP_2311" : { "message" : [ "'writeTo' can not be called on streaming Dataset/DataFrame." diff --git
[spark] branch master updated: [SPARK-44044][SS] Improve Error message for Window functions with streaming
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f406b54b2a8 [SPARK-44044][SS] Improve Error message for Window functions with streaming f406b54b2a8 is described below commit f406b54b2a899d03bae2e6f70eef7fedfed63d65 Author: Siying Dong AuthorDate: Sat Jul 1 08:51:22 2023 +0300 [SPARK-44044][SS] Improve Error message for Window functions with streaming ### What changes were proposed in this pull request? Replace the existing error message when a non-time window function is used with streaming, to include the aggregation function and column. The error message now looks like the following: org.apache.spark.sql.AnalysisException: Window function is not supported in 'row_number()' as column 'rn_col' on streaming DataFrames/Datasets. Structured Streaming only supports time-window aggregation using the `window` function. (window specification: '(PARTITION BY col1 ORDER BY col2 ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)') Note that the message is a little bit unnatural, as the existing unit test requires that the exception message include "not supported", "streaming", "DataFrames" and "Dataset". ### Why are the changes needed? The existing error message is vague and a full logical plan is included. A user reports that they aren't able to identify what the problem is. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added a unit test Closes #41578 from siying/window_error. Lead-authored-by: Siying Dong Co-authored-by: Siying Dong Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 5 .../analysis/UnsupportedOperationChecker.scala | 17 ++--- .../spark/sql/errors/QueryExecutionErrors.scala| 16 - .../analysis/UnsupportedOperationsSuite.scala | 24 ++- .../apache/spark/sql/streaming/StreamSuite.scala | 28 ++ 5 files changed, 80 insertions(+), 10 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index eabd5533e13..14bd3bc6bac 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1775,6 +1775,11 @@ ], "sqlState" : "42000" }, + "NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING" : { +"message" : [ + "Window function is not supported in (as column ) on streaming DataFrames/Datasets. Structured Streaming only supports time-window aggregation using the WINDOW function.
(window specification: )" +] + }, "NOT_ALLOWED_IN_FROM" : { "message" : [ "Not allowed in the FROM clause:" diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala index daa7c0d54b7..2a09d85d8f2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala @@ -19,11 +19,12 @@ package org.apache.spark.sql.catalyst.analysis import org.apache.spark.internal.Logging import org.apache.spark.sql.AnalysisException -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, BinaryComparison, CurrentDate, CurrentTimestampLike, Expression, GreaterThan, GreaterThanOrEqual, GroupingSets, LessThan, LessThanOrEqual, LocalTimestamp, MonotonicallyIncreasingID, SessionWindow} +import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, BinaryComparison, CurrentDate, CurrentTimestampLike, Expression, GreaterThan, GreaterThanOrEqual, GroupingSets, LessThan, LessThanOrEqual, LocalTimestamp, MonotonicallyIncreasingID, SessionWindow, WindowExpression} import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression import org.apache.spark.sql.catalyst.plans._ import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.catalyst.streaming.InternalOutputModes +import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.streaming.{GroupStateTimeout, OutputMode} @@ -508,8 +509,18 @@ object UnsupportedOperationChecker extends Logging { case Sample(_, _, _, _, child) if child.isStreaming => throwError("Sampling is not supported on streaming DataFrames/Datasets&
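A minimal streaming query that should now produce the targeted message; the source, column name, and sink below are assumptions for illustration:

```
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val stream = spark.readStream.format("rate").load()
val w = Window.partitionBy("value").orderBy("timestamp")
// Analysis fails at start() with
// NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING, naming row_number() and rn_col.
stream.withColumn("rn_col", row_number().over(w))
  .writeStream.format("console").start()
```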
[spark] branch master updated: [SPARK-43851][SQL] Support LCA in grouping expressions
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9353d67f929 [SPARK-43851][SQL] Support LCA in grouping expressions 9353d67f929 is described below commit 9353d67f9290bae1e7d7e16a2caf5256cc4e2f92 Author: Jia Fan AuthorDate: Sat Jul 1 08:48:10 2023 +0300 [SPARK-43851][SQL] Support LCA in grouping expressions ### What changes were proposed in this pull request? This PR brings support for lateral column alias references in grouping expressions. ### Why are the changes needed? Adds a new feature for LCA. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. Closes #41804 from Hisoka-X/SPARK-43851_LCA_in_group. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 5 - ...r-conditions-unsupported-feature-error-class.md | 4 .../analysis/ResolveReferencesInAggregate.scala| 22 -- .../column-resolution-aggregate.sql.out| 26 +- .../results/column-resolution-aggregate.sql.out| 16 - 5 files changed, 29 insertions(+), 44 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 3cc35d668e0..eabd5533e13 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -2530,11 +2530,6 @@ "Referencing lateral column alias <lca> in the aggregate query both with window expressions and with having clause. Please rewrite the aggregate query by removing the having clause or removing lateral alias reference in the SELECT list." ] }, - "LATERAL_COLUMN_ALIAS_IN_GROUP_BY" : { -"message" : [ - "Referencing a lateral column alias via GROUP BY alias/ALL is not supported yet." -] - }, "LATERAL_COLUMN_ALIAS_IN_WINDOW" : { "message" : [ "Referencing a lateral column alias <lca> in window expression <windowExpr>." diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md b/docs/sql-error-conditions-unsupported-feature-error-class.md index 64d7eb347e5..78bf301c49d 100644 --- a/docs/sql-error-conditions-unsupported-feature-error-class.md +++ b/docs/sql-error-conditions-unsupported-feature-error-class.md @@ -65,10 +65,6 @@ Referencing a lateral column alias `<lca>` in the aggregate function `<aggFunc>` Referencing lateral column alias `<lca>` in the aggregate query both with window expressions and with having clause. Please rewrite the aggregate query by removing the having clause or removing lateral alias reference in the SELECT list. -## LATERAL_COLUMN_ALIAS_IN_GROUP_BY - -Referencing a lateral column alias via GROUP BY alias/ALL is not supported yet. - ## LATERAL_COLUMN_ALIAS_IN_WINDOW Referencing a lateral column alias `<lca>` in window expression `<windowExpr>`. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala index 09ae87b071f..41bcb337c67 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala @@ -17,9 +17,8 @@ package org.apache.spark.sql.catalyst.analysis -import org.apache.spark.sql.AnalysisException import org.apache.spark.sql.catalyst.SQLConfHelper -import org.apache.spark.sql.catalyst.expressions.{AliasHelper, Attribute, Expression, NamedExpression} +import org.apache.spark.sql.catalyst.expressions.{AliasHelper, Attribute, Expression, LateralColumnAliasReference, NamedExpression} import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, AppendColumns, LogicalPlan} import org.apache.spark.sql.catalyst.trees.TreePattern.{LATERAL_COLUMN_ALIAS_REFERENCE, UNRESOLVED_ATTRIBUTE} @@ -74,12 +73,6 @@ object ResolveReferencesInAggregate extends SQLConfHelper resolvedAggExprsWithOuter, resolveGroupByAlias(resolvedAggExprsWithOuter, resolvedGroupExprsNoOuter) ).map(resolveOuterRef) - // TODO: currently we don't support LCA in `groupingExpressions` yet. - if (resolved.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE))) { -throw new AnalysisException( - errorClass = "UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY", - messageParameters = Map.empty) - } resolved } else {
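As a rough illustration (hypothetical data; assuming an active SparkSession named `spark`), the pattern this change enables is a GROUP BY over an alias that itself references another alias from the same SELECT list:

```scala
// `bonus` is a lateral column alias: it refers to `base`, an alias defined
// earlier in the same SELECT list. Referencing it from GROUP BY used to fail
// with UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY.
spark.sql(
  """SELECT salary AS base, base * 2 AS bonus, count(*) AS cnt
    |FROM VALUES (100), (100), (200) AS t(salary)
    |GROUP BY base, bonus""".stripMargin).show()
```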
[spark] branch master updated: [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 706829d9731 [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020 706829d9731 is described below commit 706829d97312c6812bf791d9893d0a70d81676ae Author: itholic AuthorDate: Fri Jun 30 21:25:04 2023 +0300 [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020 ### What changes were proposed in this pull request? This PR proposes to assign a name to _LEGACY_ERROR_TEMP_1020: "INVALID_USAGE_OF_STAR_OR_REGEX". ### Why are the changes needed? We should assign proper names to _LEGACY_ERROR_TEMP_* ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*` Closes #39702 from itholic/LEGACY_1020. Authored-by: itholic Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 11 +-- .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../spark/sql/errors/QueryCompilationErrors.scala | 2 +- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 86 -- .../catalyst/analysis/ResolveSubquerySuite.scala | 6 +- .../org/apache/spark/sql/DataFrameSuite.scala | 11 ++- .../scala/org/apache/spark/sql/DatasetSuite.scala | 4 +- 7 files changed, 82 insertions(+), 40 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index abe88db1267..3cc35d668e0 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1596,6 +1596,12 @@ "The url is invalid: <url>. If necessary set <ansiConfig> to \"false\" to bypass this error." ] }, + "INVALID_USAGE_OF_STAR_OR_REGEX" : { +"message" : [ + "Invalid usage of <elem> in <prettyName>." +], +"sqlState" : "42000" + }, "INVALID_VIEW_TEXT" : { "message" : [ "The view <viewName> cannot be displayed due to invalid view text: <viewText>. This may be caused by an unauthorized modification of the view or an incorrect query syntax. Please check your query syntax and verify that the view has not been tampered with." ] }, @@ -3169,11 +3175,6 @@ " is a permanent view, which is not supported by streaming reading API such as `DataStreamReader.table` yet." ] }, - "_LEGACY_ERROR_TEMP_1020" : { -"message" : [ - "Invalid usage of <elem> in <prettyName>." -] - }, "_LEGACY_ERROR_TEMP_1021" : { "message" : [ "count(.*) is not allowed. Please use count(*) or expand the columns manually, e.g. count(col1, col2)." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 32cec909401..b61dbae686b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -1897,7 +1897,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor }) // count(*) has been replaced by count(1) case o if containsStar(o.children) => - throw QueryCompilationErrors.invalidStarUsageError(s"expression '${o.prettyName}'", + throw QueryCompilationErrors.invalidStarUsageError(s"expression `${o.prettyName}`", extractStar(o.children)) } } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 94cbf880b57..e02708105d2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -475,7 +475,7 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase { } val elem = Seq(starMsg, resExprMsg).flatten.mkString(" and ") new AnalysisException( - errorClass = "_LEGACY_ERROR_TEMP_1020", + errorClass = "INVALID_USAGE_OF_STAR_OR_REGEX", messageParameters = Map("elem" -> elem, "prettyName" -> prettyName)) } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala index f994c03..fdaeadc5445 100644 -
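A hedged sketch (assuming a SparkSession `spark`) of one way to hit the renamed error class:

```scala
import org.apache.spark.sql.AnalysisException

try {
  // count(*) is special-cased to count(1), but sum(*) is an invalid use of `*`.
  spark.sql("SELECT sum(*) FROM VALUES (1), (2) AS t(a)").collect()
} catch {
  // Expected to carry the new name instead of _LEGACY_ERROR_TEMP_1020.
  case e: AnalysisException => println(e.getErrorClass) // INVALID_USAGE_OF_STAR_OR_REGEX
}
```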
[spark] branch master updated: [SPARK-43986][SQL] Create error classes for HyperLogLog function call failures
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ab67f461987 [SPARK-43986][SQL] Create error classes for HyperLogLog function call failures ab67f461987 is described below commit ab67f4619873f21b5dcf7f67658afce7e1028657 Author: Daniel Tenedorio AuthorDate: Fri Jun 30 19:44:14 2023 +0300 [SPARK-43986][SQL] Create error classes for HyperLogLog function call failures ### What changes were proposed in this pull request? This PR creates error classes for HyperLogLog function call failures. ### Why are the changes needed? These replace previously raw Java exceptions and similar cases, in order to improve the user experience and bring consistency with other parts of Spark. ### Does this PR introduce _any_ user-facing change? Yes, error messages change slightly. ### How was this patch tested? This PR also adds SQL query test files for the HLL functions. Closes #41486 from dtenedor/hll-error-classes. Authored-by: Daniel Tenedorio Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 15 + .../aggregate/datasketchesAggregates.scala | 71 +++-- .../expressions/datasketchesExpressions.scala | 29 +- .../spark/sql/errors/QueryExecutionErrors.scala| 26 ++ .../sql-tests/analyzer-results/hll.sql.out | 215 + .../src/test/resources/sql-tests/inputs/hll.sql| 76 + .../test/resources/sql-tests/results/hll.sql.out | 262 .../apache/spark/sql/DataFrameAggregateSuite.scala | 338 - 8 files changed, 850 insertions(+), 182 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index db6b9a97012..abe88db1267 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -782,6 +782,21 @@ "The expression <sqlExpr> cannot be used as a grouping expression because its data type <dataType> is not an orderable data type." ] }, + "HLL_INVALID_INPUT_SKETCH_BUFFER" : { +"message" : [ + "Invalid call to <function>; only valid HLL sketch buffers are supported as inputs (such as those produced by the `hll_sketch_agg` function)." +] + }, + "HLL_INVALID_LG_K" : { +"message" : [ + "Invalid call to <function>; the `lgConfigK` value must be between <min> and <max>, inclusive: <value>." +] + }, + "HLL_UNION_DIFFERENT_LG_K" : { +"message" : [ + "Sketches have different `lgConfigK` values: <left> and <right>. Set the `allowDifferentLgConfigK` parameter to true to call <function> with different `lgConfigK` values." +] + }, "IDENTIFIER_TOO_MANY_NAME_PARTS" : { "message" : [ "<identifier> is not a valid identifier as it has more than 2 name parts." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala index 8b24efe12b4..17c69f798d8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala @@ -17,23 +17,23 @@ package org.apache.spark.sql.catalyst.expressions.aggregate -import org.apache.datasketches.SketchesArgumentException import org.apache.datasketches.hll.{HllSketch, TgtHllType, Union} import org.apache.datasketches.memory.Memory import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, ExpressionDescription, Literal} import org.apache.spark.sql.catalyst.trees.BinaryLike +import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.types.{AbstractDataType, BinaryType, BooleanType, DataType, IntegerType, LongType, StringType, TypeCollection} import org.apache.spark.unsafe.types.UTF8String /** - * The HllSketchAgg function utilizes a Datasketches HllSketch instance to - * count a probabilistic approximation of the number of unique values in - * a given column, and outputs the binary representation of the HllSketch. + * The HllSketchAgg function utilizes a Datasketches HllSketch instance to count a probabilistic + * approximation of the number of unique values in a given column, and outputs the binary + * representation of the HllSketch. * - * See [[https://datasketches.apache.org/docs/HLL/HLL.html]] for more information + * See [[https://datasketches.apache.org/docs/HLL/HLL.html]] for mor
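To see the new classes in action, a small sketch (hypothetical data; assuming a SparkSession `spark`; the 4–21 bound comes from the DataSketches HLL library):

```scala
// A valid lgConfigK produces a distinct-count estimate as expected.
spark.sql(
  "SELECT hll_sketch_estimate(hll_sketch_agg(col, 12)) " +
  "FROM VALUES (1), (1), (2) AS t(col)").show()

// An out-of-range lgConfigK now fails with HLL_INVALID_LG_K rather than a
// raw SketchesArgumentException bubbling up from the library.
spark.sql("SELECT hll_sketch_agg(col, 2) FROM VALUES (1), (2) AS t(col)").show()
```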
[spark] branch master updated: [SPARK-44260][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1215-1245-2329] & Use checkError() to check Exception in *CharVarchar*Suite
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3fb9a2c6135 [SPARK-44260][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1215-1245-2329] & Use checkError() to check Exception in *CharVarchar*Suite 3fb9a2c6135 is described below commit 3fb9a2c6135d49cc7b80546c0f228d7d2bc78bf6 Author: panbingkun AuthorDate: Fri Jun 30 18:36:46 2023 +0300 [SPARK-44260][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1215-1245-2329] & Use checkError() to check Exception in *CharVarchar*Suite ### What changes were proposed in this pull request? The pr aims to: 1. Assign clear error class names for some logic in `CharVarcharCodegenUtils` that directly uses exceptions - EXCEED_LIMIT_LENGTH 2. Assign names to the error class - _LEGACY_ERROR_TEMP_1215 -> UNSUPPORTED_CHAR_OR_VARCHAR_AS_STRING - _LEGACY_ERROR_TEMP_1245 -> NOT_SUPPORTED_CHANGE_COLUMN - _LEGACY_ERROR_TEMP_2329 -> merge to NOT_SUPPORTED_CHANGE_COLUMN(_LEGACY_ERROR_TEMP_1245) 3. Use checkError() to check Exception in `*CharVarchar*Suite` ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update UT - Pass GA - Manually test. Closes #41768 from panbingkun/CharVarchar_checkError. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 30 +-- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 19 +- .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 19 +- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 19 +- .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 19 +- .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 19 +- .../sql/catalyst/util/CharVarcharCodegenUtils.java | 3 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 11 +- .../spark/sql/errors/QueryCompilationErrors.scala | 17 +- .../spark/sql/errors/QueryExecutionErrors.scala| 7 + .../apache/spark/sql/execution/command/ddl.scala | 3 +- .../analyzer-results/change-column.sql.out | 11 +- .../sql-tests/analyzer-results/charvarchar.sql.out | 11 +- .../sql-tests/results/change-column.sql.out| 11 +- .../sql-tests/results/charvarchar.sql.out | 11 +- .../apache/spark/sql/CharVarcharTestSuite.scala| 291 + .../spark/sql/connector/AlterTableTests.scala | 25 +- .../execution/command/CharVarcharDDLTestBase.scala | 120 +++-- .../spark/sql/HiveCharVarcharTestSuite.scala | 12 +- 19 files changed, 494 insertions(+), 164 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 1b2a1ce305a..db6b9a97012 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -680,6 +680,11 @@ "The event time <eventName> has the invalid type <eventType>, but expected \"TIMESTAMP\"." ] }, + "EXCEED_LIMIT_LENGTH" : { +"message" : [ + "Exceeds char/varchar type length limitation: <limit>." +] + }, "EXPRESSION_TYPE_IS_NOT_ORDERABLE" : { "message" : [ "Column expression <expr> cannot be sorted because its type <exprType> is not orderable." ] }, @@ -1817,6 +1822,11 @@ }, "sqlState" : "42000" }, + "NOT_SUPPORTED_CHANGE_COLUMN" : { +"message" : [ + "ALTER TABLE ALTER/CHANGE COLUMN is not supported for changing <table>'s column <originName> with type <originType> to <newName> with type <newType>." +] + }, "NOT_SUPPORTED_COMMAND_FOR_V2_TABLE" : { "message" : [ "<cmd> is not supported for v2 tables." 
@@ -2351,6 +2361,11 @@ ], "sqlState" : "0A000" }, + "UNSUPPORTED_CHAR_OR_VARCHAR_AS_STRING" : { +"message" : [ + "The char/varchar type can't be used in the table schema. If you want Spark treat them as string type as same as Spark 3.0 and earlier, please set \"spark.sql.legacy.charVarcharAsString\" to \"true\"." +] + }, "UNSUPPORTED_DATASOURCE_FOR_DIRECT_QUERY" : { "message" : [ "Unsupported data source type for direct query on files: <dataSourceType>" @@ -3875,11 +3890,6 @@ "Found different window function type in <windowExpressions>." ] }, - "_LEGACY_ERROR_TEMP_1215" : { -"message" : [ - "char/varchar type can only be used in the table schema. You can set <config> to true, so that Spark treat them as string type as same as Spark 3.0 and
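For illustration, a minimal sketch of the renamed length check (hypothetical table name; assuming a SparkSession `spark`):

```scala
spark.sql("CREATE TABLE chars_demo (c CHAR(3)) USING parquet")
spark.sql("INSERT INTO chars_demo VALUES ('ab')")   // ok: padded to length 3
// Fails at runtime with EXCEED_LIMIT_LENGTH (limit = 3) instead of a plain
// RuntimeException from CharVarcharCodegenUtils.
spark.sql("INSERT INTO chars_demo VALUES ('abcd')")
```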
[spark] branch master updated: [SPARK-43922][SQL] Add named parameter support in parser for function calls
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 91c45812520 [SPARK-43922][SQL] Add named parameter support in parser for function calls 91c45812520 is described below commit 91c458125203d2feefd1e7443a9315c480dfaa00 Author: Richard Yu AuthorDate: Fri Jun 30 13:09:12 2023 +0300 [SPARK-43922][SQL] Add named parameter support in parser for function calls ### What changes were proposed in this pull request? We plan on adding two new tokens called ```namedArgumentExpression``` and ```functionArgument``` which would enable this feature. When parsing this logic, we also make changes to ASTBuilder such that it can detect if the argument passed is a named argument or a positional one. Here is the link for the design document: https://docs.google.com/document/d/1uOTX0MICxqu8fNanIsiyB8FV68CceGGpa8BJLP2u9o4/edit ### Why are the changes needed? This is part of a larger project to implement named parameter support for user defined functions, built-in functions, and table valued functions. ### Does this PR introduce _any_ user-facing change? Yes, the user would be able to call functions with argument lists that contain named arguments. ### How was this patch tested? We add tests in the PlanParserSuite that will verify that the plan parsed is as intended. Closes #41796 from learningchess2003/43922-new. Authored-by: Richard Yu Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 5 + .../spark/sql/catalyst/parser/SqlBaseLexer.g4 | 1 + .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 14 ++- .../expressions/NamedArgumentExpression.scala | 58 ++ .../spark/sql/catalyst/parser/AstBuilder.scala | 37 +-- .../spark/sql/errors/QueryCompilationErrors.scala | 9 ++ .../org/apache/spark/sql/internal/SQLConf.scala| 7 ++ .../catalyst/parser/ExpressionParserSuite.scala| 18 +++ .../sql/catalyst/parser/PlanParserSuite.scala | 29 + .../named-function-arguments.sql.out | 112 +++ .../sql-tests/inputs/named-function-arguments.sql | 5 + .../results/named-function-arguments.sql.out | 122 + .../spark/sql/errors/QueryParsingErrorsSuite.scala | 38 ++- 13 files changed, 443 insertions(+), 12 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 6db8c5e3bf1..1b2a1ce305a 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1708,6 +1708,11 @@ "Not allowed to implement multiple UDF interfaces, UDF class <className>." ] }, + "NAMED_ARGUMENTS_SUPPORT_DISABLED" : { +"message" : [ + "Cannot call function <functionName> because named argument references are not enabled here. In this case, the named argument reference was <argument>. Set \"spark.sql.allowNamedFunctionArguments\" to \"true\" to turn on feature." +] + }, "NESTED_AGGREGATE_FUNCTION" : { "message" : [ "It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query." 
diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 index 6c9b3a71266..fb440ef8d37 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 @@ -443,6 +443,7 @@ CONCAT_PIPE: '||'; HAT: '^'; COLON: ':'; ARROW: '->'; +FAT_ARROW : '=>'; HENT_START: '/*+'; HENT_END: '*/'; QUESTION: '?'; diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 index d1e672e9472..ab6c0d0861f 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 @@ -789,7 +789,7 @@ inlineTable ; functionTable -: funcName=functionName LEFT_PAREN (expression (COMMA expression)*)? RIGHT_PAREN tableAlias +: funcName=functionName LEFT_PAREN (functionArgument (COMMA functionArgument)*)? RIGHT_PAREN tableAlias ; tableAlias @@ -862,6 +862,15 @@ expression : booleanExpression ; +namedArgumentExpression +: key=identifier FAT_ARROW value=expression +; + +functionArgument +: expre
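A hedged sketch of the new surface syntax (assuming a SparkSession `spark`; `mask` gained named-argument support in follow-up work and is used here purely for illustration):

```scala
// The parser now accepts `name => value` in argument lists, gated by the
// config referenced in the error message above.
spark.conf.set("spark.sql.allowNamedFunctionArguments", "true")
spark.sql("SELECT mask('AbCD123-@$#', upperChar => 'Q')").show()
```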
[spark] branch master updated: [SPARK-44030][SQL][FOLLOW-UP] Move unapply from AnyTimestampType to AnyTimestampTypeExpression
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 618b52097c0 [SPARK-44030][SQL][FOLLOW-UP] Move unapply from AnyTimestampType to AnyTimestampTypeExpression 618b52097c0 is described below commit 618b52097c07105d734aaf9b2a22b372920b3f31 Author: Rui Wang AuthorDate: Fri Jun 30 08:38:39 2023 +0300 [SPARK-44030][SQL][FOLLOW-UP] Move unapply from AnyTimestampType to AnyTimestampTypeExpression ### What changes were proposed in this pull request? Move unapply from AnyTimestampType to AnyTimestampTypeExpression. ### Why are the changes needed? To align with the effort that we use separate type expression class to host `unapply`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing Test Closes #41771 from amaliujia/atomic_datatype_expression. Authored-by: Rui Wang Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala | 4 ++-- .../spark/sql/catalyst/analysis/AnsiTypeCoercion.scala | 14 -- .../apache/spark/sql/catalyst/analysis/TypeCoercion.scala | 12 +++- .../org/apache/spark/sql/types/AbstractDataType.scala | 3 --- .../org/apache/spark/sql/types/DataTypeExpression.scala| 5 + 5 files changed, 22 insertions(+), 16 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 8a192a4c132..32cec909401 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -428,8 +428,8 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor UnaryMinus(r, mode == EvalMode.ANSI), ansiEnabled = mode == EvalMode.ANSI)) case (_, CalendarIntervalType | _: DayTimeIntervalType) => Cast(DatetimeSub(l, r, TimeAdd(l, UnaryMinus(r, mode == EvalMode.ANSI))), l.dataType) - case _ if AnyTimestampType.unapply(l) || AnyTimestampType.unapply(r) => -SubtractTimestamps(l, r) + case _ if AnyTimestampTypeExpression.unapply(l) || +AnyTimestampTypeExpression.unapply(r) => SubtractTimestamps(l, r) case (_, DateType) => SubtractDates(l, r) case (DateType, dt) if dt != StringType => DateSub(l, r) case _ => s diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala index d3f20f87493..5854f42a061 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala @@ -284,7 +284,7 @@ object AnsiTypeCoercion extends TypeCoercionBase { // Skip nodes who's children have not been resolved yet. case g if !g.childrenResolved => g - case g: GetDateField if AnyTimestampType.unapply(g.child) => + case g: GetDateField if AnyTimestampTypeExpression.unapply(g.child) => g.withNewChildren(Seq(Cast(g.child, DateType))) } } @@ -294,14 +294,16 @@ object AnsiTypeCoercion extends TypeCoercionBase { // Skip nodes who's children have not been resolved yet. 
case e if !e.childrenResolved => e - case d @ DateAdd(AnyTimestampType(), _) => d.copy(startDate = Cast(d.startDate, DateType)) - case d @ DateSub(AnyTimestampType(), _) => d.copy(startDate = Cast(d.startDate, DateType)) + case d @ DateAdd(AnyTimestampTypeExpression(), _) => +d.copy(startDate = Cast(d.startDate, DateType)) + case d @ DateSub(AnyTimestampTypeExpression(), _) => +d.copy(startDate = Cast(d.startDate, DateType)) - case s @ SubtractTimestamps(DateTypeExpression(), AnyTimestampType(), _, _) => + case s @ SubtractTimestamps(DateTypeExpression(), AnyTimestampTypeExpression(), _, _) => s.copy(left = Cast(s.left, s.right.dataType)) - case s @ SubtractTimestamps(AnyTimestampType(), DateTypeExpression(), _, _) => + case s @ SubtractTimestamps(AnyTimestampTypeExpression(), DateTypeExpression(), _, _) => s.copy(right = Cast(s.right, s.left.dataType)) - case s @ SubtractTimestamps(AnyTimestampType(), AnyTimestampType(), _, _) + case s @ SubtractTimestamps(AnyTimestampTypeExpression(), AnyTimestampTypeExpression(), _, _) if s.left.dataType != s.right.dataType => val newLeft = castIfNotSameType(s.left, TimestampN
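For readers unfamiliar with the convention, here is a minimal sketch (not the Spark source) of what such a type-expression extractor looks like: it matches on an expression's data type, so the `AbstractDataType` hierarchy stays free of expression-level logic.

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.{TimestampNTZType, TimestampType}

// Sketch only: mirrors the shape of AnyTimestampTypeExpression.unapply.
object AnyTimestampTypeExpressionSketch {
  def unapply(e: Expression): Boolean = e.dataType match {
    case TimestampType | TimestampNTZType => true
    case _ => false
  }
}
```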
[spark] branch master updated: [SPARK-44208][CORE][SQL] Assign clear error class names for some logic that directly uses exceptions
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a9129defc0e [SPARK-44208][CORE][SQL] Assign clear error class names for some logic that directly uses exceptions a9129defc0e is described below commit a9129defc0ebbe68f20ec888352c30a90925d7ea Author: panbingkun AuthorDate: Thu Jun 29 17:31:03 2023 +0300 [SPARK-44208][CORE][SQL] Assign clear error class names for some logic that directly uses exceptions ### What changes were proposed in this pull request? The pr aims to assign clear error class names for some logic that directly uses exceptions, include: - ALL_PARTITION_COLUMNS_NOT_ALLOWED - INVALID_HIVE_COLUMN_NAME - SPECIFY_BUCKETING_IS_NOT_ALLOWED - SPECIFY_PARTITION_IS_NOT_ALLOWED - UNSUPPORTED_ADD_FILE.DIRECTORY - UNSUPPORTED_ADD_FILE.LOCAL_DIRECTORY ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update UT. - Pass GA. Closes #41740 from panbingkun/assign_new_name. Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 42 +++--- .../main/scala/org/apache/spark/SparkContext.scala | 7 ++-- .../org/apache/spark/errors/SparkCoreErrors.scala | 14 .../spark/sql/errors/QueryCompilationErrors.scala | 2 +- .../spark/sql/execution/datasources/rules.scala| 16 + .../spark/sql/execution/command/DDLSuite.scala | 34 +- .../spark/sql/hive/HiveExternalCatalog.scala | 12 --- .../spark/sql/hive/execution/HiveDDLSuite.scala| 12 +++ 8 files changed, 97 insertions(+), 42 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 192a0747dfd..6db8c5e3bf1 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -4,6 +4,11 @@ "Non-deterministic expression should not appear in the arguments of an aggregate function." ] }, + "ALL_PARTITION_COLUMNS_NOT_ALLOWED" : { +"message" : [ + "Cannot use all columns for partition columns." +] + }, "ALTER_TABLE_COLUMN_DESCRIPTOR_DUPLICATE" : { "message" : [ "ALTER TABLE column specifies descriptor \"\" more than once, which is invalid." @@ -1180,6 +1185,11 @@ ], "sqlState" : "22023" }, + "INVALID_HIVE_COLUMN_NAME" : { +"message" : [ + "Cannot create the table having the nested column whose name contains invalid characters in Hive metastore." +] + }, "INVALID_IDENTIFIER" : { "message" : [ "The identifier is invalid. Please, consider quoting it with back-quotes as ``." @@ -2081,6 +2091,16 @@ "sortBy must be used together with bucketBy." ] }, + "SPECIFY_BUCKETING_IS_NOT_ALLOWED" : { +"message" : [ + "Cannot specify bucketing information if the table schema is not specified when creating and will be inferred at runtime." +] + }, + "SPECIFY_PARTITION_IS_NOT_ALLOWED" : { +"message" : [ + "It is not allowed to specify partition columns when the table schema is not defined. When the table schema is not provided, schema and partition columns will be inferred." +] + }, "SQL_CONF_NOT_FOUND" : { "message" : [ "The SQL config cannot be found. Please verify that the config exists." @@ -2303,6 +2323,23 @@ "Attempted to unset non-existent properties [] in table ." 
] }, + "UNSUPPORTED_ADD_FILE" : { +"message" : [ + "Don't support add file." +], +"subClass" : { + "DIRECTORY" : { +"message" : [ + "The file is a directory, consider to set \"spark.sql.legacy.addSingleFileInAddFile\" to \"false\"." +] + }, + "LOCAL_DIRECTORY" : { +"message" : [ + "The local directory is not supported in a non-local master mode." +] + } +} + }, "UNSUPPORTED_ARROWTYPE" : { "message" : [ "Unsupported arrow type ." @@ -3588,11 +3625,6 @@ "Cannot use for partition column." ] }, - "_LEGACY_ERROR_TEMP_1154" : { -
[spark] branch branch-3.4 updated: [SPARK-44079][SQL][3.4] Fix `ArrayIndexOutOfBoundsException` when parse array as struct using PERMISSIVE mode with corrupt record
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new ad29290a02f [SPARK-44079][SQL][3.4] Fix `ArrayIndexOutOfBoundsException` when parse array as struct using PERMISSIVE mode with corrupt record ad29290a02f is described below commit ad29290a02fb94a958fd21e301100338c9f5b82a Author: Jia Fan AuthorDate: Thu Jun 29 16:38:02 2023 +0300 [SPARK-44079][SQL][3.4] Fix `ArrayIndexOutOfBoundsException` when parse array as struct using PERMISSIVE mode with corrupt record ### What changes were proposed in this pull request? cherry pick #41662 , fix parse array as struct bug on branch 3.4 ### Why are the changes needed? Fix the bug when parse array as struct using PERMISSIVE mode with corrupt record ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. Closes #41784 from Hisoka-X/SPARK-44079_3.4_cherry_pick. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../spark/sql/catalyst/csv/UnivocityParser.scala | 4 ++-- .../spark/sql/catalyst/json/JacksonParser.scala | 20 +++- .../spark/sql/catalyst/util/BadRecordException.scala | 14 -- .../spark/sql/catalyst/util/FailureSafeParser.scala | 9 +++-- .../sql/execution/datasources/json/JsonSuite.scala | 15 +++ 5 files changed, 51 insertions(+), 11 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala index 42e03630b14..b58649da61c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala @@ -318,7 +318,7 @@ class UnivocityParser( if (tokens == null) { throw BadRecordException( () => getCurrentInput, -() => None, +() => Array.empty, QueryExecutionErrors.malformedCSVRecordError("")) } @@ -362,7 +362,7 @@ class UnivocityParser( } else { if (badRecordException.isDefined) { throw BadRecordException( - () => currentInput, () => requiredRow.headOption, badRecordException.get) + () => currentInput, () => Array(requiredRow.get), badRecordException.get) } else { requiredRow } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala index bf07d65caa0..d9bff3dc7ec 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala @@ -135,7 +135,7 @@ class JacksonParser( // List([str_a_2,null], [null,str_b_3]) // case START_ARRAY if allowArrayAsStructs => -val array = convertArray(parser, elementConverter, isRoot = true) +val array = convertArray(parser, elementConverter, isRoot = true, arrayAsStructs = true) // Here, as we support reading top level JSON arrays and take every element // in such an array as a row, this case is possible. 
if (array.numElements() == 0) { @@ -517,7 +517,8 @@ class JacksonParser( private def convertArray( parser: JsonParser, fieldConverter: ValueConverter, - isRoot: Boolean = false): ArrayData = { + isRoot: Boolean = false, + arrayAsStructs: Boolean = false): ArrayData = { val values = ArrayBuffer.empty[Any] var badRecordException: Option[Throwable] = None @@ -537,6 +538,9 @@ class JacksonParser( if (badRecordException.isEmpty) { arrayData +} else if (arrayAsStructs) { + throw PartialResultArrayException(arrayData.toArray[InternalRow](schema), +badRecordException.get) } else { throw PartialResultException(InternalRow(arrayData), badRecordException.get) } @@ -570,7 +574,7 @@ class JacksonParser( // JSON parser currently doesn't support partial results for corrupted records. // For such records, all fields other than the field configured by // `columnNameOfCorruptRecord` are set to `null`. -throw BadRecordException(() => recordLiteral(record), () => None, e) +throw BadRecordException(() => recordLiteral(record), () => Array.empty, e) case e: CharConversionException if options.encoding.isEmpty => val msg = """JSON parser cannot handle a character in its input. @@ -578,11 +582,17 @@ class JacksonParser( |""".stripMargin + e
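A sketch of the fixed scenario (hypothetical schema and data; assuming a SparkSession `spark`): a top-level JSON array read as rows of a struct schema, where one element has a field that cannot be converted.

```scala
import spark.implicits._

val data = Seq("""[{"a": 1}, {"a": "oops"}]""").toDS()

// Before the fix, PERMISSIVE mode could throw ArrayIndexOutOfBoundsException
// here; now each array element becomes a row, with the unparsable field
// nulled and the raw text kept in the corrupt-record column.
spark.read
  .schema("a INT, _corrupt_record STRING")
  .option("mode", "PERMISSIVE")
  .json(data)
  .show(false)
```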
[spark] branch master updated: [MINOR][TESTS] Fix potential bug for AlterTableTest
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6511a3e9020 [MINOR][TESTS] Fix potential bug for AlterTableTest 6511a3e9020 is described below commit 6511a3e90206473985c2d6fd28d06eb7bcf8c98f Author: panbingkun AuthorDate: Thu Jun 29 12:28:03 2023 +0300 [MINOR][TESTS] Fix potential bug for AlterTableTest ### What changes were proposed in this pull request? The pr aims to fix potential bug for `AlterTableTest`. ### Why are the changes needed? Fix bug. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41783 from panbingkun/AlterTableTests_fix. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../spark/sql/connector/AlterTableTests.scala | 373 + 1 file changed, 164 insertions(+), 209 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala index 2047212a4ea..122b3ab07e6 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala @@ -42,7 +42,7 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { if (catalogAndNamespace.isEmpty) { s"default.$tableName" } else { - s"${catalogAndNamespace}table_name" + s"$catalogAndNamespace$tableName" } } @@ -63,7 +63,7 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { } test("AlterTable: change rejected by implementation") { -val t = s"${catalogAndNamespace}table_name" +val t = fullTableName("table_name") withTable(t) { sql(s"CREATE TABLE $t (id int) USING $v2Format") @@ -74,38 +74,35 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { assert(exc.getMessage.contains("Unsupported table change")) assert(exc.getMessage.contains("Cannot drop all fields")) // from the implementation - val tableName = fullTableName(t) - val table = getTableMetadata(tableName) + val table = getTableMetadata(t) - assert(table.name === tableName) + assert(table.name === t) assert(table.schema === new StructType().add("id", IntegerType)) } } test("AlterTable: add top-level column") { -val t = s"${catalogAndNamespace}table_name" +val t = fullTableName("table_name") withTable(t) { sql(s"CREATE TABLE $t (id int) USING $v2Format") sql(s"ALTER TABLE $t ADD COLUMN data string") - val tableName = fullTableName(t) - val table = getTableMetadata(tableName) + val table = getTableMetadata(t) - assert(table.name === tableName) + assert(table.name === t) assert(table.schema === new StructType().add("id", IntegerType).add("data", StringType)) } } test("AlterTable: add column with NOT NULL") { -val t = s"${catalogAndNamespace}table_name" +val t = fullTableName("table_name") withTable(t) { sql(s"CREATE TABLE $t (id int) USING $v2Format") sql(s"ALTER TABLE $t ADD COLUMN data string NOT NULL") - val tableName = fullTableName(t) - val table = getTableMetadata(tableName) + val table = getTableMetadata(t) - assert(table.name === tableName) + assert(table.name === t) assert(table.schema === StructType(Seq( StructField("id", IntegerType), StructField("data", StringType, nullable = false @@ -113,15 +110,14 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { } test("AlterTable: add column with comment") { -val t = 
s"${catalogAndNamespace}table_name" +val t = fullTableName("table_name") withTable(t) { sql(s"CREATE TABLE $t (id int) USING $v2Format") sql(s"ALTER TABLE $t ADD COLUMN data string COMMENT 'doc'") - val tableName = fullTableName(t) - val table = getTableMetadata(tableName) + val table = getTableMetadata(t) - assert(table.name === tableName) + assert(table.name === t) assert(table.schema === StructType(Seq( StructField("id", IntegerType), StructField("data", StringType).withComment("doc" @@ -129,7 +125,7 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { }
[spark] branch master updated: [SPARK-44169][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ffbd1a3b5b1 [SPARK-44169][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304] ffbd1a3b5b1 is described below commit ffbd1a3b5b17386759a378dee5ef5cf6df7f2d09 Author: Jiaan Geng AuthorDate: Thu Jun 29 12:26:24 2023 +0300 [SPARK-44169][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304] ### What changes were proposed in this pull request? The pr aims to assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Existing test cases updated and new test cases added. Closes #41719 from beliefer/SPARK-44169. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 74 --- .../catalyst/analysis/ResolveInlineTables.scala| 12 +- .../catalyst/analysis/higherOrderFunctions.scala | 14 +- .../analysis/ResolveLambdaVariablesSuite.scala | 18 +- .../spark/sql/execution/datasources/rules.scala| 4 +- .../sql-tests/analyzer-results/cte.sql.out | 4 +- .../analyzer-results/inline-table.sql.out | 12 +- .../analyzer-results/postgreSQL/boolean.sql.out| 2 +- .../postgreSQL/window_part3.sql.out| 2 +- .../postgreSQL/window_part4.sql.out| 2 +- .../analyzer-results/udf/udf-inline-table.sql.out | 12 +- .../test/resources/sql-tests/results/cte.sql.out | 4 +- .../sql-tests/results/inline-table.sql.out | 12 +- .../sql-tests/results/postgreSQL/boolean.sql.out | 2 +- .../results/postgreSQL/window_part3.sql.out| 2 +- .../results/postgreSQL/window_part4.sql.out| 2 +- .../sql-tests/results/udf/udf-inline-table.sql.out | 12 +- .../apache/spark/sql/ColumnExpressionSuite.scala | 33 +++- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 219 +++-- 19 files changed, 297 insertions(+), 145 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index e441686432a..192a0747dfd 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -704,11 +704,6 @@ ], "sqlState" : "42K04" }, - "FAILED_SQL_EXPRESSION_EVALUATION" : { -"message" : [ - "Failed to evaluate the SQL expression: <sqlExpr>. Please check your syntax and ensure all required tables and columns are available." -] - }, "FIELD_NOT_FOUND" : { "message" : [ "No such struct field <fieldName> in <fields>." ] }, @@ -1197,6 +1192,28 @@ ], "sqlState" : "22003" }, + "INVALID_INLINE_TABLE" : { +"message" : [ + "Invalid inline table." +], +"subClass" : { + "CANNOT_EVALUATE_EXPRESSION_IN_INLINE_TABLE" : { +"message" : [ + "Cannot evaluate the expression <expr> in inline table definition." +] + }, + "FAILED_SQL_EXPRESSION_EVALUATION" : { +"message" : [ + "Failed to evaluate the SQL expression <sqlExpr>. Please check your syntax and ensure all required tables and columns are available." +] + }, + "INCOMPATIBLE_TYPES_IN_INLINE_TABLE" : { +"message" : [ + "Found incompatible types in the column <colName> for inline table." +] + } +} + }, "INVALID_JSON_ROOT_FIELD" : { "message" : [ "Cannot convert JSON root field to target Spark type." ], "sqlState" : "22032" }, @@ -1209,6 +1226,23 @@ ], "sqlState" : "22032" }, + "INVALID_LAMBDA_FUNCTION_CALL" : { +"message" : [ + "Invalid lambda function call." 
+], +"subClass" : { + "DUPLICATE_ARG_NAMES" : { +"message" : [ + "The lambda function has duplicate arguments . Please, consider to rename the argument names or set to \"true\"." +] + }, + "NUM_ARGS_MISMATCH" : { +"message" : [ + "A higher order function expects arguments, but got ." +] + } +} + }, "INVALID_LATERAL_JOIN_TYPE" : { "message" : [ "The JOIN with LATERAL correlation is not allowed because an OUTER subquery cannot correlate to its join partner. Remove the LATERAL co
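Hedged sketches (assuming a SparkSession `spark`) of queries expected to surface the newly named classes:

```scala
// A lambda with the wrong arity for a higher-order function should raise
// INVALID_LAMBDA_FUNCTION_CALL.NUM_ARGS_MISMATCH (transform takes 1 or 2 args).
spark.sql("SELECT transform(array(1, 2), (x, y, z) -> x)").show()

// Columns without a common type should raise
// INVALID_INLINE_TABLE.INCOMPATIBLE_TYPES_IN_INLINE_TABLE.
spark.sql("SELECT * FROM VALUES (1), (array(1)) AS t(col)").show()
```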
[spark] branch master updated (af536459501 -> 70f34278cbf)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from af536459501 [SPARK-44237][CORE] Simplify DirectByteBuffer constructor lookup logic add 70f34278cbf [SPARK-44079][SQL] Fix `ArrayIndexOutOfBoundsException` when parse array as struct using PERMISSIVE mode with corrupt record No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/csv/UnivocityParser.scala | 4 ++-- .../spark/sql/catalyst/json/JacksonParser.scala | 20 +++- .../spark/sql/catalyst/util/BadRecordException.scala | 14 -- .../spark/sql/catalyst/util/FailureSafeParser.scala | 9 +++-- .../sql/execution/datasources/json/JsonSuite.scala | 15 +++ 5 files changed, 51 insertions(+), 11 deletions(-)
[spark] branch master updated (f26bdb7bfde -> d14a6ecd9e1)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f26bdb7bfde [SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0 add d14a6ecd9e1 [SPARK-40850][SQL] Fix test case interpreted queries may execute Codegen No new revisions were added by this update. Summary of changes: .../src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-43914][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1c8c47cb55d [SPARK-43914][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437] 1c8c47cb55d is described below commit 1c8c47cb55da75526fef4dd41ed0734b01e71814 Author: Jiaan Geng AuthorDate: Wed Jun 28 08:22:01 2023 +0300 [SPARK-43914][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437] ### What changes were proposed in this pull request? The pr aims to assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Existing test cases updated. Closes #41476 from beliefer/SPARK-43914. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 34 -- .../sql/catalyst/analysis/CheckAnalysis.scala | 65 +++ .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 74 -- .../org/apache/spark/sql/DataFrameSuite.scala | 14 4 files changed, 120 insertions(+), 67 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 342af0ffa6c..e441686432a 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -5637,40 +5637,6 @@ "Cannot change nullable column to non-nullable: ." ] }, - "_LEGACY_ERROR_TEMP_2433" : { -"message" : [ - "Only a single table generating function is allowed in a SELECT clause, found:", - "<sqlExprs>." -] - }, - "_LEGACY_ERROR_TEMP_2434" : { -"message" : [ - "Failure when resolving conflicting references in Join:", - "<plan>", - "Conflicting attributes: <conflictingAttributes>." -] - }, - "_LEGACY_ERROR_TEMP_2435" : { -"message" : [ - "Failure when resolving conflicting references in Intersect:", - "<plan>", - "Conflicting attributes: <conflictingAttributes>." -] - }, - "_LEGACY_ERROR_TEMP_2436" : { -"message" : [ - "Failure when resolving conflicting references in Except:", - "<plan>", - "Conflicting attributes: <conflictingAttributes>." -] - }, - "_LEGACY_ERROR_TEMP_2437" : { -"message" : [ - "Failure when resolving conflicting references in AsOfJoin:", - "<plan>", - "Conflicting attributes: <conflictingAttributes>." 
-] - }, "_LEGACY_ERROR_TEMP_2446" : { "message" : [ "Operation not allowed: only works on table with location provided: " diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 7c0e8f1490d..a0296d27361 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -674,9 +674,8 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB } case p @ Project(exprs, _) if containsMultipleGenerators(exprs) => -p.failAnalysis( - errorClass = "_LEGACY_ERROR_TEMP_2433", - messageParameters = Map("sqlExprs" -> exprs.map(_.sql).mkString(","))) +val generators = exprs.filter(expr => expr.exists(_.isInstanceOf[Generator])) +throw QueryCompilationErrors.moreThanOneGeneratorError(generators, "SELECT") case p @ Project(projectList, _) => projectList.foreach(_.transformDownWithPruning( @@ -686,36 +685,48 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB }) case j: Join if !j.duplicateResolved => -val conflictingAttributes = j.left.outputSet.intersect(j.right.outputSet) -j.failAnalysis( - errorClass = "_LEGACY_ERROR_TEMP_2434", - messageParameters = Map( -"plan" -> plan.toString, -"conflictingAttributes" -> conflictingAttributes.mkString(","))) +val conflictingAttributes = + j.left.outputSet.intersect(j.right.outputSet).map(toSQLExpr(_)).mkString(", ") +throw SparkException.internalError( + msg = s""" +
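The relocated single-generator check can be triggered with a one-liner (assuming a SparkSession `spark`); `moreThanOneGeneratorError` is expected to raise UNSUPPORTED_GENERATOR.MORE_THAN_ONE_GENERATOR in place of the legacy _LEGACY_ERROR_TEMP_2433 text:

```scala
// Two generators in one SELECT list are rejected; only a single table
// generating function is allowed per SELECT clause.
spark.sql("SELECT explode(array(1, 2)), explode(array(3, 4))").show()
```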
[spark] branch master updated: [SPARK-44171][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new be8b07a1534 [SPARK-44171][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes be8b07a1534 is described below commit be8b07a15348d8fea15c33d35a75969ca1693ff6 Author: panbingkun AuthorDate: Tue Jun 27 19:31:30 2023 +0300 [SPARK-44171][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes ### What changes were proposed in this pull request? The pr aims to assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] and delete some unused error classes, details as follows: _LEGACY_ERROR_TEMP_0036 -> `Delete` _LEGACY_ERROR_TEMP_1341 -> `Delete` _LEGACY_ERROR_TEMP_1342 -> `Delete` _LEGACY_ERROR_TEMP_1304 -> `Delete` _LEGACY_ERROR_TEMP_2072 -> `Delete` _LEGACY_ERROR_TEMP_2279 -> `Delete` _LEGACY_ERROR_TEMP_2280 -> UNSUPPORTED_FEATURE.COMMENT_NAMESPACE _LEGACY_ERROR_TEMP_2281 -> UNSUPPORTED_FEATURE.REMOVE_NAMESPACE_COMMENT _LEGACY_ERROR_TEMP_2282 -> UNSUPPORTED_FEATURE.DROP_NAMESPACE_RESTRICT ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #41721 from panbingkun/SPARK-44171. Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala| 19 +-- core/src/main/resources/error/error-classes.json | 60 ++ .../spark/sql/errors/QueryCompilationErrors.scala | 16 -- .../spark/sql/errors/QueryExecutionErrors.scala| 30 +-- .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 6 +-- 5 files changed, 47 insertions(+), 84 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala index a7ef8d4e104..d58146fecdf 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala @@ -73,7 +73,8 @@ class MySQLNamespaceSuite extends DockerJDBCIntegrationSuite with V2JDBCNamespac exception = intercept[SparkSQLFeatureNotSupportedException] { catalog.createNamespace(Array("foo"), Map("comment" -> "test comment").asJava) }, - errorClass = "_LEGACY_ERROR_TEMP_2280" + errorClass = "UNSUPPORTED_FEATURE.COMMENT_NAMESPACE", + parameters = Map("namespace" -> "`foo`") ) assert(catalog.namespaceExists(Array("foo")) === false) catalog.createNamespace(Array("foo"), Map.empty[String, String].asJava) @@ -84,13 +85,25 @@ class MySQLNamespaceSuite extends DockerJDBCIntegrationSuite with V2JDBCNamespac Array("foo"), NamespaceChange.setProperty("comment", "comment for foo")) }, - errorClass = "_LEGACY_ERROR_TEMP_2280") + errorClass = "UNSUPPORTED_FEATURE.COMMENT_NAMESPACE", + parameters = Map("namespace" -> "`foo`") +) checkError( exception = intercept[SparkSQLFeatureNotSupportedException] { catalog.alterNamespace(Array("foo"), NamespaceChange.removeProperty("comment")) }, - errorClass = "_LEGACY_ERROR_TEMP_2281") + errorClass = "UNSUPPORTED_FEATURE.REMOVE_NAMESPACE_COMMENT", + parameters = Map("namespace" -> "`foo`") +) + +checkError( + exception = 
intercept[SparkSQLFeatureNotSupportedException] { +catalog.dropNamespace(Array("foo"), cascade = false) + }, + errorClass = "UNSUPPORTED_FEATURE.DROP_NAMESPACE", + parameters = Map("namespace" -> "`foo`") +) catalog.dropNamespace(Array("foo"), cascade = true) assert(catalog.namespaceExists(Array("foo")) === false) } diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 78b54d5230d..342af0ffa6c 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -2383,11 +2383,21 @@ "Combination of ORDER BY/SORT BY/DISTRIBUTE BY/CLUSTER BY." ] }, +
[spark] branch master updated: [SPARK-44189][CONNECT][PYTHON] Support positional parameters by `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e98987220ae [SPARK-44189][CONNECT][PYTHON] Support positional parameters by `sql()` e98987220ae is described below commit e98987220ae191ecc10944026fee9c57ddf478c1 Author: Max Gekk AuthorDate: Mon Jun 26 19:42:17 2023 +0300 [SPARK-44189][CONNECT][PYTHON] Support positional parameters by `sql()` ### What changes were proposed in this pull request? In the PR, I propose to extend the `sql()` method of the Python connect client, and support positional parameters as a list of Python objects that can be converted to literal expressions. ```python def sql(self, sqlQuery: str, args: Optional[Union[Dict[str, Any], List]] = None) -> DataFrame: ``` where - **args** is a dictionary of parameter names to Python objects or a list of Python objects that can be converted to SQL literal expressions. See the [link](https://spark.apache.org/docs/latest/sql-ref-datatypes.html) regarding the supported value types in PySpark. For example: _1, "Steven", datetime.date(2023, 4, 2)_. The same as in the Scala/Java API, a value can also be a `Column` of a literal expression, in which case it is taken as is. For example: ```python
>>> connect.sql("SELECT * FROM {df} WHERE {df[B]} > ? and ? < {df[A]}", [5, 2], df=mydf).show()
+---+---+
| A| B|
+---+---+
| 3| 6|
+---+---+
``` ### Why are the changes needed? To achieve feature parity with the PySpark API. ### Does this PR introduce _any_ user-facing change? No, the PR just extends the existing API. ### How was this patch tested? By running new test: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.tests.connect.test_connect_basic SparkConnectBasicTests.test_sql_with_pos_args' ``` and the renamed test: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.tests.connect.test_connect_basic SparkConnectBasicTests.test_sql_with_named_args' ``` Closes #41739 from MaxGekk/positional-params-python-connect. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- python/pyspark/sql/connect/plan.py | 36 -- python/pyspark/sql/connect/session.py | 2 +- .../sql/tests/connect/test_connect_basic.py| 7 - 3 files changed, 34 insertions(+), 11 deletions(-) diff --git a/python/pyspark/sql/connect/plan.py b/python/pyspark/sql/connect/plan.py index 406f65080d1..fabab98d9b2 100644 --- a/python/pyspark/sql/connect/plan.py +++ b/python/pyspark/sql/connect/plan.py @@ -1019,12 +1019,15 @@ class SubqueryAlias(LogicalPlan): class SQL(LogicalPlan): -def __init__(self, query: str, args: Optional[Dict[str, Any]] = None) -> None: +def __init__(self, query: str, args: Optional[Union[Dict[str, Any], List]] = None) -> None: super().__init__(None) if args is not None: -for k, v in args.items(): -assert isinstance(k, str) +if isinstance(args, Dict): +for k, v in args.items(): +assert isinstance(k, str) +else: +assert isinstance(args, List) self._query = query self._args = args @@ -1034,8 +1037,16 @@ class SQL(LogicalPlan): plan.sql.query = self._query if self._args is not None and len(self._args) > 0: -for k, v in self._args.items(): - plan.sql.args[k].CopyFrom(LiteralExpression._from_value(v).to_plan(session).literal) +if isinstance(self._args, Dict): +for k, v in self._args.items(): +plan.sql.args[k].CopyFrom( + LiteralExpression._from_value(v).to_plan(session).literal +) +else: +for v in self._args: +plan.sql.pos_args.append( + LiteralExpression._from_value(v).to_plan(session).literal +) return plan @@ -1043,10 +1054,17 @@ class SQL(LogicalPlan): cmd = proto.Command() cmd.sql_command.sql = self._query if self._args is not None and len(self._args) > 0: -for k, v in self._args.items(): -cmd.sql_command.args[k].CopyFrom( -LiteralExpression._from_value(v).to_plan(session).literal -) +if isinstance(self._args, Dict): +for k, v in self._args.items(): +cmd.sql_command.args[k].CopyFrom( + LiteralExpression._from_value(v).to_plan(session).literal +) +
[spark] branch master updated: [SPARK-44143][SQL][TESTS] Use checkError() to check Exception in *DDL*Suite
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 67abc430140 [SPARK-44143][SQL][TESTS] Use checkError() to check Exception in *DDL*Suite 67abc430140 is described below commit 67abc430140558e60c785b158e9199dc884fb15c Author: panbingkun AuthorDate: Mon Jun 26 09:28:02 2023 +0300 [SPARK-44143][SQL][TESTS] Use checkError() to check Exception in *DDL*Suite ### What changes were proposed in this pull request? The pr aims to use `checkError()` to check `Exception` in `*DDL*Suite`, include: - sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite - sql/core/src/test/scala/org/apache/spark/sql/sources/DDLSourceLoadSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/Hive_2_1_DDLSuite ### Why are the changes needed? Migration on checkError() will make the tests independent from the text of error messages. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41699 from panbingkun/DDLSuite. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../spark/sql/execution/command/DDLSuite.scala | 454 .../spark/sql/sources/DDLSourceLoadSuite.scala | 30 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 769 ++--- .../sql/hive/execution/Hive_2_1_DDLSuite.scala | 17 +- 4 files changed, 865 insertions(+), 405 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala index 21e6980db8f..dd126027b36 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala @@ -189,10 +189,13 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSparkSession { sql("CREATE TABLE s(a INT, b INT) USING parquet") val source = catalog.getTableMetadata(TableIdentifier("s")) assert(source.provider == Some("parquet")) - val e = intercept[AnalysisException] { -sql("CREATE TABLE t LIKE s USING org.apache.spark.sql.hive.orc") - }.getMessage - assert(e.contains("Hive built-in ORC data source must be used with Hive support enabled")) + checkError( +exception = intercept[AnalysisException] { + sql("CREATE TABLE t LIKE s USING org.apache.spark.sql.hive.orc") +}, +errorClass = "_LEGACY_ERROR_TEMP_1138", +parameters = Map.empty + ) } } @@ -284,13 +287,6 @@ trait DDLSuiteBase extends SQLTestUtils { } } - protected def assertUnsupported(query: String): Unit = { -val e = intercept[AnalysisException] { - sql(query) -} -assert(e.getMessage.toLowerCase(Locale.ROOT).contains("operation not allowed")) - } - protected def maybeWrapException[T](expectException: Boolean)(body: => T): Unit = { if (expectException) intercept[AnalysisException] { body } else body } @@ -431,9 +427,11 @@ abstract class DDLSuite extends QueryTest with DDLSuiteBase { |$partitionClause """.stripMargin if (userSpecifiedSchema.isEmpty && userSpecifiedPartitionCols.nonEmpty) { -val e = intercept[AnalysisException](sql(sqlCreateTable)).getMessage -assert(e.contains( - "not allowed to specify partition columns when the table schema is not defined")) +checkError( + exception = intercept[AnalysisException](sql(sqlCreateTable)), + errorClass = null, + parameters = Map.empty +) } 
else { sql(sqlCreateTable) val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName)) @@ -615,17 +613,21 @@ abstract class DDLSuite extends QueryTest with DDLSuiteBase { .option("path", dir1.getCanonicalPath) .saveAsTable("path_test") - val ex = intercept[AnalysisException] { -Seq((3L, "c")).toDF("v1", "v2") - .write - .mode(SaveMode.Append) - .format("json") - .option("path", dir2.getCanonicalPath) - .saveAsTable("path_test") - }.getMessage - assert(ex.contains( -s"The location of the existing table `$SESSION_CATALOG_NAME`.`default`.`path_test`")) - + checkErrorMatchPVals( +exception =
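`checkError()` is Scala-side test infrastructure with no direct PySpark equivalent; the closest analogue, sketched below under the assumption of PySpark 3.4+, is asserting on the structured error class via `getErrorClass()` instead of matching message text (the query and the expected class are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.errors import AnalysisException

spark = SparkSession.builder.getOrCreate()

# Pinning the test to the error class keeps it stable when the
# human-readable wording of the message changes.
try:
    spark.sql("SELECT missing_col FROM range(1)").collect()
except AnalysisException as e:
    assert e.getErrorClass() == "UNRESOLVED_COLUMN.WITH_SUGGESTION"
```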
[spark] branch master updated: [MINOR][CONNECT][TESTS] Check named parameters in `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 37c898c63b1 [MINOR][CONNECT][TESTS] Check named parameters in `sql()` 37c898c63b1 is described below commit 37c898c63b1fd9fcb9773313246ff28e631eb28f Author: Max Gekk AuthorDate: Mon Jun 26 09:17:56 2023 +0300 [MINOR][CONNECT][TESTS] Check named parameters in `sql()` ### What changes were proposed in this pull request? In the PR, I propose to add new tests to check named parameters in `sql()` of Scala connect client. ### Why are the changes needed? To improve test coverage. Before the PR, the feature has not been tested at all. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.ClientE2ETestSuite" ``` Closes #41726 from MaxGekk/test-named-params-proto. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala | 11 +++ 1 file changed, 11 insertions(+) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala index b24e445964a..0ababaa0af1 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala @@ -960,6 +960,17 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper with PrivateM assert(result2(0).getInt(0) === 1) assert(result2(0).getString(1) === "abc") } + + test("sql() with named parameters") { +val result0 = spark.sql("select 1", Map.empty[String, Any]).collect() +assert(result0.length == 1 && result0(0).getInt(0) === 1) + +val result1 = spark.sql("select :abc", Map("abc" -> 1)).collect() +assert(result1.length == 1 && result1(0).getInt(0) === 1) + +val result2 = spark.sql("select :c0 limit :l0", Map("l0" -> 1, "c0" -> "abc")).collect() +assert(result2.length == 1 && result2(0).getString(0) === "abc") + } } private[sql] case class MyType(id: Long, a: Double, b: Double) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
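For reference, the same named-parameter binding is available from PySpark; a short sketch mirroring the new Scala assertions (assumes PySpark 3.4+):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# `:name` markers in the query are bound from the dict passed as `args`;
# this prints a single row containing "abc".
spark.sql("SELECT :c0 LIMIT :l0", args={"l0": 1, "c0": "abc"}).show()
```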
[spark] branch master updated: [SPARK-44140][SQL][PYTHON] Support positional parameters in Python `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 532a8325f5a [SPARK-44140][SQL][PYTHON] Support positional parameters in Python `sql()` 532a8325f5a is described below commit 532a8325f5a3f11974c383cd1e344bb2ed56e9d8 Author: Max Gekk AuthorDate: Thu Jun 22 16:38:36 2023 +0300 [SPARK-44140][SQL][PYTHON] Support positional parameters in Python `sql()` ### What changes were proposed in this pull request? In the PR, I propose to extend PySpark API and extend the `sql` method by: ```python def sql( self, sqlQuery: str, args: Optional[Union[Dict[str, Any], List]] = None, **kwargs: Any ) -> DataFrame: ``` which accepts an list of Python objects that can be converted to SQL literal expressions. For example: ```python spark.sql("SELECT * FROM {df} WHERE {df[B]} > ? and ? < {df[A]}", args=[5, 2], df=mydf).show() ``` The `sql()` method parses the input SQL statement and replaces the positional parameters by the literal values. ### Why are the changes needed? 1. To conform the SQL standard and JDBC/ODBC protocol. 2. To improve user experience with PySpark via - Using Spark as remote service (microservice). - Write SQL code that will power reports, dashboards, charts and other data presentation solutions that need to account for criteria modifiable by users through an interface. - Build a generic integration layer based on the PySpark API. The goal is to expose managed data to a wide application ecosystem with a microservice architecture. It is only natural in such a setup to ask for modular and reusable SQL code, that can be executed repeatedly with different parameter values. 3. To achieve feature parity with other systems that support positional parameters. ### Does this PR introduce _any_ user-facing change? No, the changes extend the existing API. ### How was this patch tested? By running new checks: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.session SparkSession.sql' $ python/run-tests --parallelism=1 --testnames 'pyspark.pandas.sql_formatter' ``` Closes #41695 from MaxGekk/parametrized-query-pos-param-python. Authored-by: Max Gekk Signed-off-by: Max Gekk --- python/pyspark/pandas/sql_formatter.py | 20 - python/pyspark/sql/session.py | 40 +++--- 2 files changed, 47 insertions(+), 13 deletions(-) diff --git a/python/pyspark/pandas/sql_formatter.py b/python/pyspark/pandas/sql_formatter.py index 4387a1e0909..350152a2cdb 100644 --- a/python/pyspark/pandas/sql_formatter.py +++ b/python/pyspark/pandas/sql_formatter.py @@ -43,7 +43,7 @@ _CAPTURE_SCOPES = 3 def sql( query: str, index_col: Optional[Union[str, List[str]]] = None, -args: Optional[Dict[str, Any]] = None, +args: Optional[Union[Dict[str, Any], List]] = None, **kwargs: Any, ) -> DataFrame: """ @@ -102,18 +102,21 @@ def sql( e f 3 6 Also note that the index name(s) should be matched to the existing name. -args : dict -A dictionary of parameter names to Python objects that can be converted to -SQL literal expressions. See +args : dict or list +A dictionary of parameter names to Python objects or a list of Python objects +that can be converted to SQL literal expressions. See https://spark.apache.org/docs/latest/sql-ref-datatypes.html;> Supported Data Types for supported value types in Python. For example, dictionary keys: "rank", "name", "birthdate"; dictionary values: 1, "Steven", datetime.date(2023, 4, 2). 
-Dict value can be also a `Column` of literal expression, in that case it is taken as is. +A value can be also a `Column` of literal expression, in that case it is taken as is. .. versionadded:: 3.4.0 +.. versionchanged:: 3.5.0 +Added positional parameters. + kwargs other variables that the user want to set that can be referenced in the query @@ -174,6 +177,13 @@ def sql( id 0 8 1 9 + +Or positional parameters marked by `?` in the SQL query by SQL literals. + +>>> ps.sql("SELECT * FROM range(10) WHERE id > ?", args=[7]) + id +0 8 +1 9 """ if os.environ.get("PYSPARK_PANDAS_SQL_LEGACY") == "1": from pyspark.pandas import sql_processor diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py index 823164475ea..47b73700f0c 100644 --- a/python/pyspark/sql/sess
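The pandas-on-Spark entry point gets the same extension; a one-liner matching the doctest added above (assumes PySpark 3.5+):

```python
import pyspark.pandas as ps

# A positional `?` bound from the `args` list, exactly as in pyspark.sql.
print(ps.sql("SELECT * FROM range(10) WHERE id > ?", args=[7]))
#    id
# 0   8
# 1   9
```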
[spark] branch master updated: [SPARK-44066][SQL] Support positional parameters in Scala/Java `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1b4048bf62d [SPARK-44066][SQL] Support positional parameters in Scala/Java `sql()` 1b4048bf62d is described below commit 1b4048bf62dddae7d324c4b12aa409a1bd456dc5 Author: Max Gekk AuthorDate: Thu Jun 22 09:40:30 2023 +0300 [SPARK-44066][SQL] Support positional parameters in Scala/Java `sql()` ### What changes were proposed in this pull request? In the PR, I propose to extend SparkSession API and override the `sql` method by: ```scala def sql(sqlText: String, args: Array[_]): DataFrame ``` which accepts an array of Java/Scala objects that can be converted to SQL literal expressions. And the first argument `sqlText` might have named parameters in the positions of constants like literal values. A value can be also a `Column` of literal expression, in that case it is taken as is. For example: ```scala spark.sql( sqlText = "SELECT * FROM tbl WHERE date > ? LIMIT ?", args = Array(LocalDate.of(2023, 6, 15), 100)) ``` The new `sql()` method parses the input SQL statement and replaces the positional parameters by the literal values. ### Why are the changes needed? 1. To conform the SQL standard and JDBC/ODBC protocol. 2. To improve user experience with Spark SQL via - Using Spark as remote service (microservice). - Write SQL code that will power reports, dashboards, charts and other data presentation solutions that need to account for criteria modifiable by users through an interface. - Build a generic integration layer based on the SQL API. The goal is to expose managed data to a wide application ecosystem with a microservice architecture. It is only natural in such a setup to ask for modular and reusable SQL code, that can be executed repeatedly with different parameter values. 3. To achieve feature parity with other systems that support positional parameters. ### Does this PR introduce _any_ user-facing change? No, the changes extend the existing API. ### How was this patch tested? By running new tests: ``` $ build/sbt "test:testOnly *AnalysisSuite" $ build/sbt "test:testOnly *PlanParserSuite" $ build/sbt "test:testOnly *ParametersSuite" ``` and the affected test suites: ``` $ build/sbt "sql/testOnly *QueryExecutionErrorsSuite" ``` Closes #41568 from MaxGekk/parametrized-query-pos-param. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../CheckConnectJvmClientCompatibility.scala | 2 + .../sql/connect/planner/SparkConnectPlanner.scala | 4 +- .../spark/sql/catalyst/parser/SqlBaseLexer.g4 | 1 + .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 5 +- .../spark/sql/catalyst/analysis/parameters.scala | 95 ++-- .../spark/sql/catalyst/parser/AstBuilder.scala | 14 +- .../sql/catalyst/analysis/AnalysisSuite.scala | 22 +- .../sql/catalyst/parser/PlanParserSuite.scala | 25 +- .../scala/org/apache/spark/sql/SparkSession.scala | 34 ++- .../apache/spark/sql/JavaSparkSessionSuite.java| 28 +++ .../org/apache/spark/sql/ParametersSuite.scala | 265 +++-- .../sql/errors/QueryExecutionErrorsSuite.scala | 10 +- 12 files changed, 448 insertions(+), 57 deletions(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala index 6b648fd152b..acc469672b4 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala @@ -227,6 +227,8 @@ object CheckConnectJvmClientCompatibility { ProblemFilters.exclude[Problem]("org.apache.spark.sql.SparkSession.createDataset"), ProblemFilters.exclude[Problem]("org.apache.spark.sql.SparkSession.executeCommand"), ProblemFilters.exclude[Problem]("org.apache.spark.sql.SparkSession.this"), + // TODO(SPARK-44068): Support positional parameters in Scala connect client + ProblemFilters.exclude[Problem]("org.apache.spark.sql.SparkSession.sql"), // RuntimeConfig ProblemFilters.exclude[Problem]("org.apache.spark.sql.RuntimeConfig.this"), diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/sca
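The overload above is Scala/Java; the snippet below is a hedged Python rendering of the commit's own example (Python support is the separate SPARK-44140), with a throwaway temp view standing in for the hypothetical `tbl`:

```python
import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative stand-in for the `tbl` used in the commit message.
spark.sql("CREATE OR REPLACE TEMP VIEW tbl AS SELECT DATE'2023-06-20' AS date")

# Both constants are bound positionally and converted to typed literals.
spark.sql("SELECT * FROM tbl WHERE date > ? LIMIT ?",
          args=[datetime.date(2023, 6, 15), 100]).show()
```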
[spark] branch master updated: [SPARK-43915][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bbcc438e5b3 [SPARK-43915][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445] bbcc438e5b3 is described below commit bbcc438e5b3aef67bf430b6bb6e4f893d8e66d13 Author: Jiaan Geng AuthorDate: Wed Jun 21 21:20:01 2023 +0300 [SPARK-43915][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445] ### What changes were proposed in this pull request? The pr aims to assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Exists test cases updated. Closes #41553 from beliefer/SPARK-43915. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 47 +- python/pyspark/sql/tests/test_udtf.py | 8 +++- .../spark/sql/catalyst/analysis/Analyzer.scala | 4 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 23 +-- .../sql/catalyst/analysis/AnalysisSuite.scala | 28 - .../analyzer-results/group-analytics.sql.out | 2 +- .../analyzer-results/join-lateral.sql.out | 4 +- .../udf/udf-group-analytics.sql.out| 2 +- .../sql-tests/results/group-analytics.sql.out | 2 +- .../sql-tests/results/join-lateral.sql.out | 4 +- .../results/udf/udf-group-analytics.sql.out| 2 +- .../spark/sql/DataFrameSetOperationsSuite.scala| 44 ++-- .../sql/connector/DataSourceV2FunctionSuite.scala | 13 +- .../sql/connector/DeleteFromTableSuiteBase.scala | 15 +-- .../connector/DeltaBasedDeleteFromTableSuite.scala | 20 + .../sql/connector/DeltaBasedUpdateTableSuite.scala | 21 ++ .../connector/GroupBasedDeleteFromTableSuite.scala | 22 +- .../sql/connector/GroupBasedUpdateTableSuite.scala | 23 ++- .../spark/sql/connector/UpdateTableSuiteBase.scala | 15 +-- 19 files changed, 195 insertions(+), 104 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 1d2f25b72f3..264d9b7c3a0 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -643,6 +643,11 @@ ], "sqlState" : "23505" }, + "DUPLICATED_METRICS_NAME" : { +"message" : [ + "The metric name is not unique: . The same name cannot be used for metrics with different results. However multiple instances of metrics with with same result and name are allowed (e.g. self-joins)." +] + }, "DUPLICATE_CLAUSES" : { "message" : [ "Found duplicate clauses: . Please, remove one of them." @@ -1237,6 +1242,11 @@ } } }, + "INVALID_NON_DETERMINISTIC_EXPRESSIONS" : { +"message" : [ + "The operator expects a deterministic expression, but the actual expression is ." +] + }, "INVALID_NUMERIC_LITERAL_RANGE" : { "message" : [ "Numeric literal is outside the valid range for with minimum value of and maximum value of . Please adjust the value accordingly." @@ -1512,6 +1522,11 @@ ], "sqlState" : "42604" }, + "INVALID_UDF_IMPLEMENTATION" : { +"message" : [ + "Function does not implement ScalarFunction or AggregateFunction." +] + }, "INVALID_URL" : { "message" : [ "The url is invalid: . If necessary set to \"false\" to bypass this error." @@ -2458,6 +2473,11 @@ " is a reserved namespace property, ." 
] }, + "SET_OPERATION_ON_MAP_TYPE" : { +"message" : [ + "Cannot have MAP type columns in DataFrame which calls set operations (INTERSECT, EXCEPT, etc.), but the type of column is ." +] + }, "SET_PROPERTIES_AND_DBPROPERTIES" : { "message" : [ "set PROPERTIES and DBPROPERTIES at the same time." @@ -5659,33 +5679,6 @@ "Conflicting attributes: ." ] }, - "_LEGACY_ERROR_TEMP_2438" : { -"message" : [ - "Cannot have map type columns in DataFrame which calls set operations(intersect, except, etc.), but the type of column is ." -] - }, - "_LEGACY_ERROR_TEMP_2439" : { -"message" : [ - "nondeterministic expressions are
[spark] branch master updated: [SPARK-44056][SQL] Include UDF name in UDF execution failure error message when available
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6165f316063 [SPARK-44056][SQL] Include UDF name in UDF execution failure error message when available 6165f316063 is described below commit 6165f31606344efdf35f060d07cee46b85948e38 Author: Rob Reeves AuthorDate: Wed Jun 21 18:00:36 2023 +0300 [SPARK-44056][SQL] Include UDF name in UDF execution failure error message when available ### What changes were proposed in this pull request? This modifies the error message when a Scala UDF fails to execute by including the UDF name if it is available. ### Why are the changes needed? If there are multiple UDFs defined in the same location with the same method signature it can be hard to identify which UDF causes the issue. The current function class alone does not give enough information on its own. Adding the UDF name, if available, makes it easier to identify the exact problematic UDF. This is particularly helpful when the exception stack trace is not emitted due to a JVM performance optimization and codegen is enabled. Example in 3.1.1: ``` Caused by: org.apache.spark.SparkException: Failed to execute user defined function(UDFRegistration$$Lambda$666/1969461119: (bigint, string) => string) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.subExpr_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source) at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3(basicPhysicalOperators.scala:249) at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3$adapted(basicPhysicalOperators.scala:248) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:131) at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at org.apache.spark.scheduler.Task.run(Task.scala:131) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:523) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1535) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:526) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException ``` ### Does this PR introduce _any_ user-facing change? Yes, it adds the UDF name to the UDF failure error message. 
Before this change: > [FAILED_EXECUTE_UDF] Failed to execute user defined function (QueryExecutionErrorsSuite$$Lambda$970/181260145: (string, int) => string). After this change: > [FAILED_EXECUTE_UDF] Failed to execute user defined function (nextChar in QueryExecutionErrorsSuite$$Lambda$970/181260145: (string, int) => string). ### How was this patch tested? Unit test added. Closes #41599 from robreeves/roreeves/roreeves/udf_error. Lead-authored-by: Rob Reeves Co-authored-by: Rob Reeves Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/ScalaUDF.scala | 6 ++-- .../spark/sql/errors/QueryExecutionErrors.scala| 4 +-- .../sql/errors/QueryExecutionErrorsSuite.scala | 35 ++ .../spark/sql/hive/execution/HiveUDFSuite.scala| 6 ++-- 4 files changed, 39 insertions(+), 12 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala index 137a8976a40..40274a83340 100644 --- a/sql/catalyst/src
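The change targets JVM-side Scala UDFs, so it cannot be reproduced exactly from Python (a failing Python UDF surfaces as a PythonException instead); the hedged sketch below only illustrates the scenario of a UDF registered under an explicit name, which is what the improved FAILED_EXECUTE_UDF message now reports:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

def next_char(s, i):
    return chr(ord(s[i]) + 1)  # raises IndexError when i is out of range

# "nextChar" is the registered name that a Scala UDF's failure message
# would now include alongside the implementing class.
spark.udf.register("nextChar", next_char, StringType())
spark.sql("SELECT nextChar('abc', 10)").show()  # fails inside the UDF
```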
[spark] branch master updated: [SPARK-44004][SQL] Assign name & improve error message for frequent LEGACY errors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 94031ead786 [SPARK-44004][SQL] Assign name & improve error message for frequent LEGACY errors 94031ead786 is described below commit 94031ead78682bd5c1adab8b87e61055968c8998 Author: itholic AuthorDate: Wed Jun 21 10:36:04 2023 +0300 [SPARK-44004][SQL] Assign name & improve error message for frequent LEGACY errors ### What changes were proposed in this pull request? This PR proposes to assign name & improve error message for frequent LEGACY errors. ### Why are the changes needed? To improve the errors that most frequently occurring. ### Does this PR introduce _any_ user-facing change? No API changes, it's only for errors. ### How was this patch tested? The existing CI should passed. Closes #41504 from itholic/naming_top_error_class. Authored-by: itholic Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 80 +++--- .../spark/sql/catalyst/analysis/Analyzer.scala | 4 +- .../catalyst/analysis/ResolveInlineTables.scala| 5 +- .../spark/sql/catalyst/analysis/unresolved.scala | 3 +- .../spark/sql/errors/QueryCompilationErrors.scala | 22 +++--- .../spark/sql/errors/QueryParsingErrors.scala | 2 +- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 5 +- .../catalyst/analysis/ResolveSubquerySuite.scala | 11 ++- .../catalyst/parser/ExpressionParserSuite.scala| 10 +-- .../analyzer-results/ansi/literals.sql.out | 10 +-- .../columnresolution-negative.sql.out | 6 +- .../analyzer-results/join-lateral.sql.out | 6 +- .../sql-tests/analyzer-results/literals.sql.out| 10 +-- .../analyzer-results/postgreSQL/boolean.sql.out| 5 +- .../postgreSQL/window_part3.sql.out| 5 +- .../postgreSQL/window_part4.sql.out| 5 +- .../table-valued-functions.sql.out | 4 +- .../sql-tests/results/ansi/literals.sql.out| 10 +-- .../results/columnresolution-negative.sql.out | 6 +- .../sql-tests/results/join-lateral.sql.out | 6 +- .../resources/sql-tests/results/literals.sql.out | 10 +-- .../sql-tests/results/postgreSQL/boolean.sql.out | 5 +- .../results/postgreSQL/window_part3.sql.out| 5 +- .../results/postgreSQL/window_part4.sql.out| 5 +- .../results/table-valued-functions.sql.out | 4 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 12 ++-- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 6 +- .../spark/sql/execution/SQLViewTestSuite.scala | 4 +- 28 files changed, 134 insertions(+), 132 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index d9e729effeb..e35adcfbb5a 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -157,6 +157,11 @@ ], "sqlState" : "22018" }, + "CANNOT_PARSE_INTERVAL" : { +"message" : [ + "Unable to parse . Please ensure that the value provided is in a valid format for defining an interval. You can reference the documentation for the correct format. If the issue persists, please double check that the input value is not null or empty and try again." +] + }, "CANNOT_PARSE_JSON_FIELD" : { "message" : [ "Cannot parse the field name and the value of the JSON token type to target Spark data type ." @@ -191,6 +196,11 @@ ], "sqlState" : "0AKD0" }, + "CANNOT_RESOLVE_STAR_EXPAND" : { +"message" : [ + "Cannot resolve .* given input columns . 
Please check that the specified table or struct exists and is accessible in the input columns." +] + }, "CANNOT_RESTORE_PERMISSIONS_FOR_PATH" : { "message" : [ "Failed to set permissions on created path back to ." @@ -689,6 +699,11 @@ ], "sqlState" : "42K04" }, + "FAILED_SQL_EXPRESSION_EVALUATION" : { +"message" : [ + "Failed to evaluate the SQL expression: . Please check your syntax and ensure all required tables and columns are available." +] + }, "FIELD_NOT_FOUND" : { "message" : [ "No such struct field in ." @@ -1222,6 +1237,11 @@ } } }, + "INVALID_NUMERIC_LITERAL_RANGE" : { +"message" : [ + "Numeric literal is outside the valid range for with minimum value of and maximum value of . Please adjust the value acc
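A quick way to hit one of the newly named conditions from PySpark (the query and the expected class are assumptions derived from the diff above):

```python
from pyspark.sql import SparkSession
from pyspark.errors import AnalysisException

spark = SparkSession.builder.getOrCreate()

# The qualifier `a` matches no input relation, so `a.*` cannot be expanded.
try:
    spark.sql("SELECT a.* FROM VALUES (1) AS t(col1)").collect()
except AnalysisException as e:
    print(e.getErrorClass())  # expected: CANNOT_RESOLVE_STAR_EXPAND
```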
[spark] branch master updated: [SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f3db20c17df [SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170 f3db20c17df is described below commit f3db20c17dfdc1cb5daa42c154afa732e5e3800b Author: panbingkun AuthorDate: Tue Jun 20 01:43:32 2023 +0300 [SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170 ### What changes were proposed in this pull request? The pr aims to: - Refactor `PreWriteCheck` to use error framework. - Make `INSERT_COLUMN_ARITY_MISMATCH` more generic & avoiding to embed error's text in source code. - Assign name to _LEGACY_ERROR_TEMP_1170. - In `INSERT_PARTITION_COLUMN_ARITY_MISMATCH` error message, replace '' with `toSQLId` for table column name. ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41458 from panbingkun/refactor_PreWriteCheck. Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 62 --- python/pyspark/sql/tests/test_readwriter.py| 4 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../catalyst/analysis/ResolveInsertionBase.scala | 13 ++- .../catalyst/analysis/TableOutputResolver.scala| 4 +- .../spark/sql/errors/QueryCompilationErrors.scala | 40 +++ .../catalyst/analysis/V2WriteAnalysisSuite.scala | 48 +--- .../spark/sql/execution/datasources/rules.scala| 32 -- .../analyzer-results/postgreSQL/numeric.sql.out| 7 +- .../sql-tests/results/postgreSQL/numeric.sql.out | 7 +- .../org/apache/spark/sql/DataFrameSuite.scala | 33 -- .../org/apache/spark/sql/SQLInsertTestSuite.scala | 31 -- .../spark/sql/connector/InsertIntoTests.scala | 34 -- .../apache/spark/sql/execution/SQLViewSuite.scala | 11 +- .../spark/sql/execution/command/DDLSuite.scala | 54 + .../org/apache/spark/sql/sources/InsertSuite.scala | 122 + .../spark/sql/hive/thriftserver/CliSuite.scala | 2 +- .../org/apache/spark/sql/hive/InsertSuite.scala| 11 +- 18 files changed, 324 insertions(+), 193 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 54b920cc36f..d9e729effeb 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -888,10 +888,24 @@ }, "INSERT_COLUMN_ARITY_MISMATCH" : { "message" : [ - "Cannot write to '', :", - "Table columns: .", - "Data columns: ." + "Cannot write to , the reason is" ], +"subClass" : { + "NOT_ENOUGH_DATA_COLUMNS" : { +"message" : [ + "not enough data columns:", + "Table columns: .", + "Data columns: ." +] + }, + "TOO_MANY_DATA_COLUMNS" : { +"message" : [ + "too many data columns:", + "Table columns: .", + "Data columns: ." +] + } +}, "sqlState" : "21S01" }, "INSERT_PARTITION_COLUMN_ARITY_MISMATCH" : { @@ -1715,6 +1729,11 @@ ], "sqlState" : "46110" }, + "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT" : { +"message" : [ + " is not supported, if you want to enable it, please set \"spark.sql.catalogImplementation\" to \"hive\"." +] + }, "NOT_SUPPORTED_IN_JDBC_CATALOG" : { "message" : [ "Not supported command in JDBC catalog:" @@ -2464,6 +2483,33 @@ "grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup." 
] }, + "UNSUPPORTED_INSERT" : { +"message" : [ + "Can't insert into the target." +], +"subClass" : { + "NOT_ALLOWED" : { +"message" : [ + "The target relation does not allow insertion." +] + }, + "NOT_PARTITIONED" : { +"message" : [ + "The target relation is not partitioned." +] + }, + "RDD_BASED" : { +"message" : [ + "An RDD-based table is not allowed." +] + }, +
[spark] branch master updated: [SPARK-44096][PYTHON][DOCS] Make examples copy-pastable by adding a newline in all modules
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0fc7eeb39aa [SPARK-44096][PYTHOM][DOCS] Make examples copy-pastable by adding a newline in all modules 0fc7eeb39aa is described below commit 0fc7eeb39aad5997912c8a3f82aea089a4985898 Author: Hyukjin Kwon AuthorDate: Mon Jun 19 13:34:42 2023 +0300 [SPARK-44096][PYTHOM][DOCS] Make examples copy-pastable by adding a newline in all modules ### What changes were proposed in this pull request? I found that there are many instances same as https://github.com/apache/spark/pull/41655. This PR aims to address all the examples in all components in PySpark. ### Why are the changes needed? See https://github.com/apache/spark/pull/41655. ### Does this PR introduce _any_ user-facing change? Yes, it changes the documentation and makes the example copy-pastable, see also https://github.com/apache/spark/pull/41655. ### How was this patch tested? CI in this PR should validate them. This is logically the same as https://github.com/apache/spark/pull/41655. I will also build the documentation locally and test. Closes #41657 from HyukjinKwon/minor-newlines. Authored-by: Hyukjin Kwon Signed-off-by: Max Gekk --- python/pyspark/accumulators.py | 4 python/pyspark/context.py | 4 python/pyspark/ml/functions.py | 21 +++-- python/pyspark/ml/torch/distributor.py | 2 ++ python/pyspark/mllib/clustering.py | 2 ++ python/pyspark/rdd.py | 9 + python/pyspark/sql/dataframe.py| 4 python/pyspark/sql/functions.py| 4 python/pyspark/sql/pandas/group_ops.py | 6 ++ python/pyspark/sql/streaming/query.py | 2 ++ python/pyspark/sql/types.py| 1 + python/pyspark/sql/udtf.py | 1 + 12 files changed, 46 insertions(+), 14 deletions(-) diff --git a/python/pyspark/accumulators.py b/python/pyspark/accumulators.py index dc8520a844d..a95bd9debfc 100644 --- a/python/pyspark/accumulators.py +++ b/python/pyspark/accumulators.py @@ -88,12 +88,14 @@ class Accumulator(Generic[T]): >>> def f(x): ... global a ... a += x +... >>> rdd.foreach(f) >>> a.value 13 >>> b = sc.accumulator(0) >>> def g(x): ... b.add(x) +... >>> rdd.foreach(g) >>> b.value 6 @@ -106,6 +108,7 @@ class Accumulator(Generic[T]): >>> def h(x): ... global a ... a.value = 7 +... >>> rdd.foreach(h) # doctest: +IGNORE_EXCEPTION_DETAIL Traceback (most recent call last): ... @@ -198,6 +201,7 @@ class AccumulatorParam(Generic[T]): >>> def g(x): ... global va ... va += [x] * 3 +... >>> rdd = sc.parallelize([1,2,3]) >>> rdd.foreach(g) >>> va.value diff --git a/python/pyspark/context.py b/python/pyspark/context.py index 6f5094963be..51a4db67e8c 100644 --- a/python/pyspark/context.py +++ b/python/pyspark/context.py @@ -1802,6 +1802,7 @@ class SparkContext: >>> def f(x): ... global acc ... acc += 1 +... >>> rdd.foreach(f) >>> acc.value 15 @@ -2140,6 +2141,7 @@ class SparkContext: >>> def map_func(x): ... sleep(100) ... raise RuntimeError("Task should have been cancelled") +... >>> def start_job(x): ... global result ... try: @@ -2148,9 +2150,11 @@ class SparkContext: ... except Exception as e: ... result = "Cancelled" ... lock.release() +... >>> def stop_job(): ... sleep(5) ... sc.cancelJobGroup("job_to_cancel") +... 
>>> suppress = lock.acquire() >>> suppress = InheritableThread(target=start_job, args=(10,)).start() >>> suppress = InheritableThread(target=stop_job).start() diff --git a/python/pyspark/ml/functions.py b/python/pyspark/ml/functions.py index bce4101df1e..89b05b692ea 100644 --- a/python/pyspark/ml/functions.py +++ b/python/pyspark/ml/functions.py @@ -512,11 +512,10 @@ def predict_batch_udf( ... # outputs.shape = [batch_size] ... return inputs * 2 ... return predict ->>> +... >>> times_two_udf = predict_batch_udf(make_times_two_fn, ... return_type=FloatType(), ...
[spark] branch master updated: [SPARK-44093][SQL][TESTS] Make `catalyst` module passes in Java 21
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cbfc920c2d7 [SPARK-44093][SQL][TESTS] Make `catalyst` module passes in Java 21 cbfc920c2d7 is described below commit cbfc920c2d75451e898ff5e00622a2af4eed3709 Author: Dongjoon Hyun AuthorDate: Sun Jun 18 17:34:08 2023 +0300 [SPARK-44093][SQL][TESTS] Make `catalyst` module passes in Java 21 ### What changes were proposed in this pull request? This PR aims to make `catalyst` module passes in Java 21. ### Why are the changes needed? https://bugs.openjdk.org/browse/JDK-8267125 changes the error message at Java 18. **JAVA** ``` $ java -version openjdk version "21-ea" 2023-09-19 OpenJDK Runtime Environment (build 21-ea+27-2343) OpenJDK 64-Bit Server VM (build 21-ea+27-2343, mixed mode, sharing) ``` **BEFORE** ``` $ build/sbt "catalyst/test" ... [info] *** 1 TEST FAILED *** [error] Failed: Total 7122, Failed 1, Errors 0, Passed 7121, Ignored 5, Canceled 1 [error] Failed tests: [error] org.apache.spark.sql.catalyst.expressions.ExpressionImplUtilsSuite [error] (catalyst / Test / test) sbt.TestsFailedException: Tests unsuccessful [error] Total time: 212 s (03:32), completed Jun 18, 2023, 1:11:17 AM ``` **AFTER** ``` $ build/sbt "catalyst/test" ... [info] All tests passed. [info] Passed: Total 7122, Failed 0, Errors 0, Passed 7122, Ignored 5, Canceled 1 [success] Total time: 213 s (03:33), completed Jun 18, 2023, 1:15:37 AM ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs and manual test on Java 21. Closes #41649 from dongjoon-hyun/SPARK-44093. Authored-by: Dongjoon Hyun Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/ExpressionImplUtilsSuite.scala | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtilsSuite.scala b/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtilsSuite.scala index 3b0dd82c173..4b33f9bc527 100644 --- a/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtilsSuite.scala +++ b/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtilsSuite.scala @@ -17,6 +17,8 @@ package org.apache.spark.sql.catalyst.expressions +import org.apache.commons.lang3.{JavaVersion, SystemUtils} + import org.apache.spark.{SparkFunSuite, SparkRuntimeException} import org.apache.spark.unsafe.types.UTF8String @@ -285,6 +287,12 @@ class ExpressionImplUtilsSuite extends SparkFunSuite { } } + // JDK-8267125 changes tag error message at Java 18 + val msgTagMismatch = if (SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17)) { +"Tag mismatch!" + } else { +"Tag mismatch" + } val corruptedCiphertexts = Seq( // This is truncated TestCase( @@ -310,7 +318,7 @@ class ExpressionImplUtilsSuite extends SparkFunSuite { errorParamsMap = Map( "parameter" -> "`expr`, `key`", "functionName" -> "`aes_encrypt`/`aes_decrypt`", -"detailMessage" -> "Tag mismatch!" +"detailMessage" -> msgTagMismatch ) ), // Valid ciphertext, wrong AAD @@ -324,7 +332,7 @@ class ExpressionImplUtilsSuite extends SparkFunSuite { errorParamsMap = Map( "parameter" -> "`expr`, `key`", "functionName" -> "`aes_encrypt`/`aes_decrypt`", -"detailMessage" -> "Tag mismatch!" 
+"detailMessage" -> msgTagMismatch ) ) ) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44089][SQL][TESTS] Remove the `@ignore` identifier from `AlterTableRenamePartitionSuite`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d05091eb8f0 [SPARK-44089][SQL][TESTS] Remove the `@ignore` identifier from `AlterTableRenamePartitionSuite` d05091eb8f0 is described below commit d05091eb8f0f6ee1398fae90fd7b593ac3314e44 Author: yangjie01 AuthorDate: Sun Jun 18 17:24:36 2023 +0300 [SPARK-44089][SQL][TESTS] Remove the `@ignore` identifier from `AlterTableRenamePartitionSuite` ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/41533 ignore `AlterTableRenamePartitionSuite` try to restore stability of `sql-others` test task, but it seems that it is not the root cause that affects stability, so this pr has removed the previously added `ignore` identifier to restore testing. ### Why are the changes needed? Resume testing of `AlterTableRenamePartitionSuite` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? should monitor ci Closes #41647 from LuciferYang/SPARK-44089. Authored-by: yangjie01 Signed-off-by: Max Gekk --- .../sql/execution/command/v2/AlterTableRenamePartitionSuite.scala | 3 --- 1 file changed, 3 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableRenamePartitionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableRenamePartitionSuite.scala index 764596685b5..bb06818da48 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableRenamePartitionSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableRenamePartitionSuite.scala @@ -17,8 +17,6 @@ package org.apache.spark.sql.execution.command.v2 -import org.scalatest.Ignore - import org.apache.spark.sql.Row import org.apache.spark.sql.execution.command @@ -26,7 +24,6 @@ import org.apache.spark.sql.execution.command * The class contains tests for the `ALTER TABLE .. RENAME PARTITION` command * to check V2 table catalogs. */ -@Ignore class AlterTableRenamePartitionSuite extends command.AlterTableRenamePartitionSuiteBase with CommandSuiteBase { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44071] Define and use Unresolved[Leaf|Unary]Node traits
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 747953eb5c4 [SPARK-44071] Define and use Unresolved[Leaf|Unary]Node traits 747953eb5c4 is described below commit 747953eb5c46e121faf476a060049f1423ae7e91 Author: Ryan Johnson AuthorDate: Fri Jun 16 23:30:08 2023 +0300 [SPARK-44071] Define and use Unresolved[Leaf|Unary]Node traits ### What changes were proposed in this pull request? Looking at [unresolved.scala](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala), catalyst would benefit from an `UnresolvedNode` trait that various `UnresolvedFoo` classes could inherit: ```scala trait UnresolvedNode extends LogicalPlan { override def output: Seq[Attribute] = Nil override lazy val resolved = false } ``` Today, the code is duplicated in ~20 locations (7 of them in that one file). ### Why are the changes needed? Reduces redundancy, improves readability, documents programmer intent better. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Mild refactor, existing unit tests suffice. Closes #41617 from ryan-johnson-databricks/unresolved-node-trait. Authored-by: Ryan Johnson Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/RelationTimeTravel.scala | 8 ++-- .../spark/sql/catalyst/analysis/parameters.scala | 10 ++--- .../spark/sql/catalyst/analysis/unresolved.scala | 48 +- .../sql/catalyst/analysis/v2ResolutionPlans.scala | 32 +++ .../spark/sql/catalyst/catalog/interface.scala | 11 ++--- .../plans/logical/basicLogicalOperators.scala | 6 +-- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 5 +-- 7 files changed, 39 insertions(+), 81 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RelationTimeTravel.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RelationTimeTravel.scala index 4daefa816a5..6e0d0998883 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RelationTimeTravel.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RelationTimeTravel.scala @@ -17,8 +17,8 @@ package org.apache.spark.sql.catalyst.analysis -import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression} -import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan} +import org.apache.spark.sql.catalyst.expressions.Expression +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.catalyst.trees.TreePattern.{RELATION_TIME_TRAVEL, TreePattern} /** @@ -29,8 +29,6 @@ import org.apache.spark.sql.catalyst.trees.TreePattern.{RELATION_TIME_TRAVEL, Tr case class RelationTimeTravel( relation: LogicalPlan, timestamp: Option[Expression], -version: Option[String]) extends LeafNode { - override def output: Seq[Attribute] = Nil - override lazy val resolved: Boolean = false +version: Option[String]) extends UnresolvedLeafNode { override val nodePatterns: Seq[TreePattern] = Seq(RELATION_TIME_TRAVEL) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala index 2a31e90465c..a00f9cec92c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala +++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala @@ -18,8 +18,8 @@ package org.apache.spark.sql.catalyst.analysis import org.apache.spark.SparkException -import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, LeafExpression, Literal, SubqueryExpression, Unevaluable} -import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode} +import org.apache.spark.sql.catalyst.expressions.{Expression, LeafExpression, Literal, SubqueryExpression, Unevaluable} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.catalyst.trees.TreePattern.{PARAMETER, PARAMETERIZED_QUERY, TreePattern, UNRESOLVED_WITH} import org.apache.spark.sql.errors.QueryErrorsBase @@ -47,10 +47,10 @@ case class Parameter(name: String) extends LeafExpression with Unevaluable { * The logical plan representing a parameterized query. It will be removed during analysis after * the parameters are bind. */ -case class ParameterizedQuery(child: LogicalPlan, args: Map[String, Expression]) extends UnaryNode { +case class ParameterizedQuery(child: LogicalPlan, args: Map[String, Expression]) + extends
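A hedged Python rendering of the consolidation (the Scala diff is authoritative; this only shows the shape of the refactor, where the duplicated `output`/`resolved` overrides collapse into one base):

```python
class LogicalPlan:
    @property
    def output(self):
        raise NotImplementedError

    @property
    def resolved(self) -> bool:
        return True

class UnresolvedLeafNode(LogicalPlan):
    # The boilerplate previously copied into ~20 classes lives here once.
    @property
    def output(self):
        return []

    @property
    def resolved(self) -> bool:
        return False

class RelationTimeTravel(UnresolvedLeafNode):
    def __init__(self, relation, timestamp=None, version=None):
        self.relation, self.timestamp, self.version = relation, timestamp, version
```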
[spark] branch master updated: [SPARK-43290][SQL] Adds support for aes_encrypt IVs and AAD
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fb1ee25a89e [SPARK-43290][SQL] Adds support for aes_encrypt IVs and AAD fb1ee25a89e is described below commit fb1ee25a89e8b42178b7f55718859ab5117c2320 Author: Steve Weis AuthorDate: Fri Jun 16 15:42:05 2023 +0300 [SPARK-43290][SQL] Adds support for aes_encrypt IVs and AAD ### What changes were proposed in this pull request? This change adds support for user-provided initialization vectors (IVs) or authenticated additional data (AAD) to `aes_encrypt` / `aes_decrypt`. 12-byte IVs may optionally be passed if the mode is "GCM" and 16-byte IVs may be passed if the mode is "CBC". An arbitrary binary value may be passed as additional authenticated data only if "GCM" mode is used. ### Why are the changes needed? Callers may wish to provide their own IV values so that the output ciphertext matches a ciphertext generated outside of Spark. AAD is used to bind some input to a ciphertext and ensure that it is presented during decryption -- often used to scope an operation to a specific context. ### Does this PR introduce _any_ user-facing change? Yes, this change introduces two optional parameters to `aes_encrypt` and one optional parameter to `aes_decrypt`: ``` aes_encrypt(expr, key[, mode[, padding[, iv[, aad) aes_decrypt(expr, key[, mode[, padding[, iv]]]) ``` ### How was this patch tested? ``` build/sbt "sql/test:testOnly org.apache.spark.sql.DataFrameFunctionsSuite -- -z aes" ``` Closes #41488 from sweisdb/SPARK-43290. Authored-by: Steve Weis Signed-off-by: Max Gekk --- .../catalyst/expressions/ExpressionImplUtils.java | 14 + .../spark/sql/catalyst/expressions/misc.scala | 64 +- .../expressions/ExpressionImplUtilsSuite.scala | 23 +++- .../sql-functions/sql-expression-schema.md | 6 +- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 50 + 5 files changed, 127 insertions(+), 30 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java index 6aae649718a..a604e6bf225 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java @@ -111,14 +111,6 @@ public class ExpressionImplUtils { return checkSum % 10 == 0; } - public static byte[] aesEncrypt(byte[] input, byte[] key, UTF8String mode, UTF8String padding) { -return aesEncrypt(input, key, mode, padding, null, null); - } - - public static byte[] aesDecrypt(byte[] input, byte[] key, UTF8String mode, UTF8String padding) { -return aesDecrypt(input, key, mode, padding, null); - } - public static byte[] aesEncrypt(byte[] input, byte[] key, UTF8String mode, @@ -192,7 +184,7 @@ public class ExpressionImplUtils { Cipher cipher = Cipher.getInstance(cipherMode.transformation); if (opmode == Cipher.ENCRYPT_MODE) { // This may be 0-length for ECB -if (iv == null) { +if (iv == null || iv.length == 0) { iv = generateIv(cipherMode); } else if (!cipherMode.usesSpec) { // If the caller passes an IV, ensure the mode actually uses it. 
@@ -210,7 +202,7 @@ public class ExpressionImplUtils { } // If the cipher mode supports additional authenticated data and it is provided, update it -if (aad != null) { +if (aad != null && aad.length != 0) { if (cipherMode.supportsAad != true) { throw QueryExecutionErrors.aesUnsupportedAad(mode); } @@ -231,7 +223,7 @@ public class ExpressionImplUtils { if (cipherMode.usesSpec) { AlgorithmParameterSpec algSpec = getParamSpec(cipherMode, input); cipher.init(opmode, secretKey, algSpec); - if (aad != null) { + if (aad != null && aad.length != 0) { if (cipherMode.supportsAad != true) { throw QueryExecutionErrors.aesUnsupportedAad(mode); } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala index 67328cde71a..92ed0843521 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.sca
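An end-to-end round trip with an explicit IV and AAD, runnable from PySpark SQL (this assumes the released signatures, where `aes_decrypt` takes the AAD as its optional fifth argument and recovers the IV from the ciphertext prefix):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 16-byte key, 12-byte GCM IV (hex-encoded), and an AAD string that must
# be presented again at decryption time. Fixing the IV makes the
# ciphertext reproducible outside of Spark.
row = spark.sql("""
    SELECT cast(aes_decrypt(
             aes_encrypt('Apache Spark', '0000111122223333', 'GCM', 'DEFAULT',
                         unhex('000000000000000000000000'), 'my-aad'),
             '0000111122223333', 'GCM', 'DEFAULT', 'my-aad') AS STRING) AS msg
""").head()
print(row.msg)  # Apache Spark
```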
[spark] branch master updated: [SPARK-42298][SQL] Assign name to _LEGACY_ERROR_TEMP_2132
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c41be4ec0ad [SPARK-42298][SQL] Assign name to _LEGACY_ERROR_TEMP_2132 c41be4ec0ad is described below commit c41be4ec0ad97f587a0581d5583b2ca9975b2a0f Author: Hisoka AuthorDate: Mon Jun 12 23:54:02 2023 +0300 [SPARK-42298][SQL] Assign name to _LEGACY_ERROR_TEMP_2132 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_2132, "CANNOT_PARSE_JSON_ARRAYS_AS_STRUCTS". ### Why are the changes needed? Assign proper name to LEGACY_ERROR_TEMP ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? ./build/sbt "testOnly org.apache.spark.sql.errors.QueryExecutionErrorsSuite" Closes #40632 from Hisoka-X/_LEGACY_ERROR_TEMP_2132. Lead-authored-by: Hisoka Co-authored-by: Jia Fan Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 20 ++-- .../spark/sql/catalyst/json/JacksonParser.scala | 2 +- .../spark/sql/catalyst/util/BadRecordException.scala | 5 + .../spark/sql/catalyst/util/FailureSafeParser.scala | 10 -- .../spark/sql/errors/QueryExecutionErrors.scala | 10 ++ .../catalyst/expressions/JsonExpressionsSuite.scala | 2 +- .../org/apache/spark/sql/CsvFunctionsSuite.scala | 2 +- .../org/apache/spark/sql/JsonFunctionsSuite.scala| 12 ++-- .../spark/sql/errors/QueryExecutionErrorsSuite.scala | 15 +++ .../sql/execution/datasources/csv/CSVSuite.scala | 2 +- .../sql/execution/datasources/json/JsonSuite.scala | 4 ++-- .../spark/sql/hive/thriftserver/CliSuite.scala | 4 ++-- .../ThriftServerWithSparkContextSuite.scala | 4 ++-- 13 files changed, 64 insertions(+), 28 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index a12a8000870..183ea31a7cb 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -1542,7 +1542,20 @@ "message" : [ "Malformed records are detected in record parsing: .", "Parse Mode: . To process malformed records as null result, try setting the option 'mode' as 'PERMISSIVE'." -] +], +"subClass" : { + "CANNOT_PARSE_JSON_ARRAYS_AS_STRUCTS" : { +"message" : [ + "Parsing JSON arrays as structs is forbidden." +] + }, + "WITHOUT_SUGGESTION" : { +"message" : [ + "" +] + } +}, +"sqlState" : "22023" }, "MISSING_AGGREGATION" : { "message" : [ @@ -4692,11 +4705,6 @@ "Exception when registering StreamingQueryListener." ] }, - "_LEGACY_ERROR_TEMP_2132" : { -"message" : [ - "Parsing JSON arrays as structs is forbidden." -] - }, "_LEGACY_ERROR_TEMP_2133" : { "message" : [ "Cannot parse field name , field value , [] as target spark data type []." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala index bf07d65caa0..48ee50938cd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala @@ -144,7 +144,7 @@ class JacksonParser( array.toArray[InternalRow](schema) } case START_ARRAY => -throw QueryExecutionErrors.cannotParseJsonArraysAsStructsError() +throw JsonArraysAsStructsException() } } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala index 67defe78a6c..cfbe9da6ec5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala @@ -41,3 +41,8 @@ case class BadRecordException( @transient record: () => UTF8String, @transient partialResult: () => Option[InternalRow], cause: Throwable) extends Exception(cause) + +/** + * Exception thrown when the underlying parser parses a JSON array as a struct. + */ +case class JsonArraysAsStructsException() extends RuntimeException() diff --git a/sql/catalyst/src/main/scala/org/apache/spark/s
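Triggering the newly named subclass from PySpark, sketched under the assumption that FAILFAST mode surfaces it through `from_json` (input and schema are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("[1, 2]",)], ["js"])

# A JSON array parsed against a struct schema; FAILFAST raises
# MALFORMED_RECORD_IN_PARSING.CANNOT_PARSE_JSON_ARRAYS_AS_STRUCTS
# instead of the old _LEGACY_ERROR_TEMP_2132.
df.select(from_json("js", "a INT", {"mode": "FAILFAST"})).show()
```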
[spark] branch master updated: [SPARK-43971][CONNECT][PYTHON] Support Python's createDataFrame in streaming manner
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 93e0acbf7d9 [SPARK-43971][CONNECT][PYTHON] Support Python's createDataFrame in streaming manner 93e0acbf7d9 is described below commit 93e0acbf7d9fcf3422860b2a5d39379bebf7bc43 Author: Max Gekk AuthorDate: Sat Jun 10 01:25:04 2023 +0300 [SPARK-43971][CONNECT][PYTHON] Support Python's createDataFrame in streaming manner ### What changes were proposed in this pull request? In the PR, I propose to transfer a local relation from **the Python connect client** to the server in a streaming way when it exceeds the size defined by the SQL config `spark.sql.session.localRelationCacheThreshold`. The implementation is similar to https://github.com/apache/spark/pull/40827. In particular (a standalone sketch of the client-side flow follows this commit's diff below): 1. The client applies the `sha256` function over **the proto form** of the local relation; 2. It checks the presence of the relation at the server side by sending the relation hash to the server; 3. If the server doesn't have the local relation, the client transfers it as an artefact with the name `cache/`; 4. Once the relation is already present at the server, or has just been transferred, the client transforms the logical plan by replacing the `LocalRelation` node with a `CachedLocalRelation` carrying the hash. 5. On the other hand, the server converts `CachedLocalRelation` back to `LocalRelation` by retrieving the relation body from the local cache. ### Why are the changes needed? To fix failures when creating a large dataframe from a local collection: ```python pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.RESOURCE_EXHAUSTED details = "Sent message larger than max (134218508 vs. 134217728)" debug_error_string = "UNKNOWN:Error received from peer localhost:50982 {grpc_message:"Sent message larger than max (134218508 vs. 134217728)", grpc_status:8, created_time:"2023-06-09T15:34:08.362797+03:00"} ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.tests.connect.test_connect_basic SparkConnectBasicTests.test_streaming_local_relation' ``` Closes #41537 from MaxGekk/streaming-createDataFrame-python-4.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- python/pyspark/sql/connect/client/core.py | 3 ++ python/pyspark/sql/connect/plan.py | 34 ++ python/pyspark/sql/connect/session.py | 26 +++-- .../sql/tests/connect/test_connect_basic.py| 19 4 files changed, 79 insertions(+), 3 deletions(-) diff --git a/python/pyspark/sql/connect/client/core.py b/python/pyspark/sql/connect/client/core.py index 25e395356d5..7368521259a 100644 --- a/python/pyspark/sql/connect/client/core.py +++ b/python/pyspark/sql/connect/client/core.py @@ -1257,6 +1257,9 @@ class SparkConnectClient(object): def copy_from_local_to_fs(self, local_path: str, dest_path: str) -> None: self._artifact_manager._add_forward_to_fs_artifacts(local_path, dest_path) +def cache_artifact(self, blob: bytes) -> str: +return self._artifact_manager.cache_artifact(blob) + class RetryState: """ diff --git a/python/pyspark/sql/connect/plan.py b/python/pyspark/sql/connect/plan.py index fc8b37b102c..406f65080d1 100644 --- a/python/pyspark/sql/connect/plan.py +++ b/python/pyspark/sql/connect/plan.py @@ -363,6 +363,10 @@ class LocalRelation(LogicalPlan): plan.local_relation.schema = self._schema return plan +def serialize(self, session: "SparkConnectClient") -> bytes: +p = self.plan(session) +return bytes(p.local_relation.SerializeToString()) + def print(self, indent: int = 0) -> str: return f"{' ' * indent}\n" @@ -374,6 +378,36 @@ class LocalRelation(LogicalPlan): """ +class CachedLocalRelation(LogicalPlan): +"""Creates a CachedLocalRelation plan object based on a hash of a LocalRelation.""" + +def __init__(self, hash: str) -> None: +super().__init__(None) + +self._hash = hash + +def plan(self, session: "SparkConnectClient") -> proto.Relation: +plan = self._create_proto_relation() +clr = plan.cached_local_relation + +if session._user_id: +clr.userId = session._user_id +clr.sessionId = session._session_id +clr.hash = self._h
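Illustrative sketch (not part of the commit): the client-side flow described in the commit message, written in Scala for consistency with the rest of this archive. `isCached` and `transfer` are hypothetical stand-ins for the artifact-manager RPCs; only the shape of the protocol is shown.
```scala
import org.apache.commons.codec.digest.DigestUtils.sha256Hex

// Hash the proto form of a large local relation, probe the server-side
// cache, and only ship the bytes when the server has not seen them yet.
def planLargeLocalRelation(
    serializedRelation: Array[Byte],
    isCached: String => Boolean,
    transfer: (String, Array[Byte]) => Unit): String = {
  // 1. The relation is identified by the sha256 of its serialized form.
  val hash = sha256Hex(serializedRelation)
  // 2-3. Check presence at the server; upload under "cache/<hash>" on a miss.
  if (!isCached(hash)) {
    transfer(s"cache/$hash", serializedRelation)
  }
  // 4. The plan then references CachedLocalRelation(hash) instead of
  //    carrying the rows inline.
  hash
}
```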
[spark] branch master updated (3cae38b4f10 -> 958b8541803)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3cae38b4f10 [SPARK-43612][PYTHON][CONNECT][FOLLOW-UP] Copy dependent data files to data directory add 958b8541803 [SPARK-44006][CONNECT][PYTHON] Support cache artifacts No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/client/artifact.py | 52 +- .../sql/tests/connect/client/test_artifact.py | 10 + 2 files changed, 61 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-43993][SQL][TESTS] Add tests for cache artifacts
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fead8a7962a [SPARK-43993][SQL][TESTS] Add tests for cache artifacts fead8a7962a is described below commit fead8a7962a717aae5cab9eef51eed2ac684f070 Author: Max Gekk AuthorDate: Wed Jun 7 16:00:49 2023 +0300 [SPARK-43993][SQL][TESTS] Add tests for cache artifacts ### What changes were proposed in this pull request? In the PR, I propose to add a test to check two methods of the artifact manager: - `isCachedArtifact()` - `cacheArtifact()` ### Why are the changes needed? To improve test coverage of Artifacts API. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.ArtifactSuite" ``` Closes #41493 from MaxGekk/test-cache-artifact. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../spark/sql/connect/client/ArtifactManager.scala | 2 +- .../spark/sql/connect/client/ArtifactSuite.scala | 14 .../connect/client/SparkConnectClientSuite.scala | 25 +- 3 files changed, 39 insertions(+), 2 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala index acd9f279c6d..6d0d16df946 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala @@ -108,7 +108,7 @@ class ArtifactManager( */ def addArtifacts(uris: Seq[URI]): Unit = addArtifacts(uris.flatMap(parseArtifacts)) - private def isCachedArtifact(hash: String): Boolean = { + private[client] def isCachedArtifact(hash: String): Boolean = { val artifactName = CACHE_PREFIX + "/" + hash val request = proto.ArtifactStatusesRequest .newBuilder() diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/ArtifactSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/ArtifactSuite.scala index 506ad3625b0..39ab0eef412 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/ArtifactSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/ArtifactSuite.scala @@ -25,6 +25,7 @@ import scala.collection.JavaConverters._ import com.google.protobuf.ByteString import io.grpc.{ManagedChannel, Server} import io.grpc.inprocess.{InProcessChannelBuilder, InProcessServerBuilder} +import org.apache.commons.codec.digest.DigestUtils.sha256Hex import org.scalatest.BeforeAndAfterEach import org.apache.spark.connect.proto @@ -248,4 +249,17 @@ class ArtifactSuite extends ConnectFunSuite with BeforeAndAfterEach { assertFileDataEquality(remainingArtifacts.get(0).getData, Paths.get(file3)) assertFileDataEquality(remainingArtifacts.get(1).getData, Paths.get(file4)) } + + test("cache an artifact and check its presence") { +val s = "Hello, World!" 
+val blob = s.getBytes("UTF-8") +val expectedHash = sha256Hex(blob) +assert(artifactManager.isCachedArtifact(expectedHash) === false) +val actualHash = artifactManager.cacheArtifact(blob) +assert(actualHash === expectedHash) +assert(artifactManager.isCachedArtifact(expectedHash) === true) + +val receivedRequests = service.getAndClearLatestAddArtifactRequests() +assert(receivedRequests.size == 1) + } } diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala index 7a0ad1a9e2a..7e0b687054d 100755 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.connect.client import java.util.concurrent.TimeUnit +import scala.collection.JavaConverters._ import scala.collection.mutable import io.grpc.{Server, StatusRuntimeException} @@ -26,7 +27,7 @@ import io.grpc.stub.StreamObserver import org.scalatest.BeforeAndAfterEach import org.apache.spark.connect.proto -import org.apache.spark.connect.proto.{AddArtifactsRequest, AddArtifactsResponse, AnalyzePlanRequest, AnalyzePlanResponse, ExecutePlanRequest, ExecutePlanResponse, SparkConnectServiceGrpc} +import o
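Standalone sanity check of the digest used by the new test (assuming commons-codec on the classpath); the expected value is the well-known SHA-256 test vector for "Hello, World!".
```scala
import org.apache.commons.codec.digest.DigestUtils.sha256Hex

// sha256Hex returns the lowercase hex SHA-256 digest of the given bytes,
// which is exactly what isCachedArtifact()/cacheArtifact() exchange.
val blob = "Hello, World!".getBytes("UTF-8")
assert(sha256Hex(blob) ==
  "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f")
```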
[spark] branch master updated: [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0cd5ca5a7b3 [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432] 0cd5ca5a7b3 is described below commit 0cd5ca5a7b31f65a005c8ee2e90a6b4a29623ba7 Author: Jiaan Geng AuthorDate: Tue Jun 6 10:28:48 2023 +0300 [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432] ### What changes were proposed in this pull request? The pr aims to assign names to the error class `_LEGACY_ERROR_TEMP_[2426-2432]`. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Exists test cases. Closes #41424 from beliefer/SPARK-43913. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 58 -- .../sql/catalyst/analysis/CheckAnalysis.scala | 51 +++ .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 20 .../CreateTablePartitioningValidationSuite.scala | 22 .../negative-cases/invalid-correlation.sql.out | 6 ++- .../negative-cases/invalid-correlation.sql.out | 6 ++- 6 files changed, 93 insertions(+), 70 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index de80415d85b..8c3c076ce74 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -660,6 +660,11 @@ "The event time has the invalid type , but expected \"TIMESTAMP\"." ] }, + "EXPRESSION_TYPE_IS_NOT_ORDERABLE" : { +"message" : [ + "Column expression cannot be sorted because its type is not orderable." +] + }, "FAILED_EXECUTE_UDF" : { "message" : [ "Failed to execute user defined function (: () => )." @@ -1541,6 +1546,24 @@ ], "sqlState" : "42803" }, + "MISSING_ATTRIBUTES" : { +"message" : [ + "Resolved attribute(s) missing from in operator ." +], +"subClass" : { + "RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION" : { +"message" : [ + "Attribute(s) with the same name appear in the operation: .", + "Please check if the right attribute(s) are used." +] + }, + "RESOLVED_ATTRIBUTE_MISSING_FROM_INPUT" : { +"message" : [ + "" +] + } +} + }, "MISSING_GROUP_BY" : { "message" : [ "The query does not include a GROUP BY clause. Add GROUP BY or turn it into the window functions using OVER clauses." @@ -1945,6 +1968,11 @@ "Query [id = , runId = ] terminated with exception: " ] }, + "SUM_OF_LIMIT_AND_OFFSET_EXCEEDS_MAX_INT" : { +"message" : [ + "The sum of the LIMIT clause and the OFFSET clause must not be greater than the maximum 32-bit integer value (2,147,483,647) but found limit = , offset = ." +] + }, "TABLE_OR_VIEW_ALREADY_EXISTS" : { "message" : [ "Cannot create table or view because it already exists.", @@ -2310,6 +2338,11 @@ "Parameter markers in unexpected statement: . Parameter markers must only be used in a query, or DML statement." ] }, + "PARTITION_WITH_NESTED_COLUMN_IS_UNSUPPORTED" : { +"message" : [ + "Invalid partitioning: is missing or is in a map or array." +] + }, "PIVOT_AFTER_GROUP_BY" : { "message" : [ "PIVOT clause following a GROUP BY clause. Consider pushing the GROUP BY into a subquery." @@ -5525,31 +5558,6 @@ "failed to evaluate expression : " ] }, - "_LEGACY_ERROR_TEMP_2426" : { -"message" : [ - "nondeterministic expression should not appear in grouping expression." 
-] - }, - "_LEGACY_ERROR_TEMP_2427" : { -"message" : [ - "sorting is not supported for columns of type ." -] - }, - "_LEGACY_ERROR_TEMP_2428" : { -"message" : [ - "The sum of the LIMIT clause and the OFFSET clause must not be greater than the maximum 32-bit integer value (2,147,483,647) but found limit = , offset = ." -] - }, - "_LEGACY_ERROR_TEMP_2431" : { -"message" : [ - "Invalid pa
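Illustrative sketch: one query that hits the renamed `SUM_OF_LIMIT_AND_OFFSET_EXCEEDS_MAX_INT` check, assuming an active `spark` session and any existing table `t`.
```scala
// LIMIT + OFFSET must fit in a 32-bit signed integer, so a max-int limit
// combined with any positive offset fails at analysis time.
spark.sql("SELECT * FROM t LIMIT 2147483647 OFFSET 1")
// => AnalysisException: [SUM_OF_LIMIT_AND_OFFSET_EXCEEDS_MAX_INT]
//    ... found limit = 2147483647, offset = 1.
```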
[spark] branch master updated: [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNI
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61e6227fb62 [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNIZE_HIVE_TYPE` 61e6227fb62 is described below commit 61e6227fb62c2452b01ac595c2bc43d4492686a0 Author: itholic AuthorDate: Tue Jun 6 10:25:24 2023 +0300 [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNIZE_HIVE_TYPE` ### What changes were proposed in this pull request? This PR proposes to improve error messages for `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNIZE_HIVE_TYPE`. **NOTE:** This PR is an experimental work that utilizes LLM to enhance error messages. The script was created using the `openai` Python library from OpenAI, and minimal review was conducted by author after executing the script. The five improved error messages were selected by the author. ### Why are the changes needed? For improving errors to make them more actionable and usable. ### Does this PR introduce _any_ user-facing change? No API changes, only error message improvement. ### How was this patch tested? The existing CI should pass. Closes #41455 from itholic/emi_1-5. Authored-by: itholic Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index bceea072e92..de80415d85b 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -114,7 +114,7 @@ }, "CANNOT_DECODE_URL" : { "message" : [ - "Cannot decode url : ." + "The provided URL cannot be decoded: . Please ensure that the URL is properly formatted and try again." ], "sqlState" : "22546" }, @@ -130,7 +130,7 @@ }, "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE" : { "message" : [ - "Failed to merge incompatible data types and ." + "Failed to merge incompatible data types and . Please check the data types of the columns being merged and ensure that they are compatible. If necessary, consider casting the columns to compatible data types before attempting the merge." ], "sqlState" : "42825" }, @@ -153,7 +153,7 @@ }, "CANNOT_PARSE_DECIMAL" : { "message" : [ - "Cannot parse decimal." + "Cannot parse decimal. Please ensure that the input is a valid number with optional decimal point or comma separators." ], "sqlState" : "22018" }, @@ -176,12 +176,12 @@ }, "CANNOT_READ_FILE_FOOTER" : { "message" : [ - "Could not read footer for file: ." + "Could not read footer for file: . Please ensure that the file is in either ORC or Parquet format. If not, please convert it to a valid format. If the file is in the valid format, please check if it is corrupt. If it is, you can choose to either ignore it or fix the corruption." ] }, "CANNOT_RECOGNIZE_HIVE_TYPE" : { "message" : [ - "Cannot recognize hive type string: , column: ." + "Cannot recognize hive type string: , column: . The specified data type for the field cannot be recognized by Spark SQL. 
Please check the data type of the specified field and ensure that it is a valid Spark SQL data type. Refer to the Spark SQL documentation for a list of valid data types and their format. If the data type is correct, please ensure that you are using a supported version of Spark SQL." ], "sqlState" : "429BB" },
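Illustrative sketch of the reworded `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE` message in action; the output path and the `spark` session are assumptions.
```scala
// Two Parquet directories whose shared column disagrees on type; asking the
// reader to merge schemas surfaces the incompatible-merge error.
val dir = "/tmp/merge-demo"
spark.range(1).selectExpr("CAST(id AS INT) AS c").write.parquet(s"$dir/p=1")
spark.range(1).selectExpr("CAST(id AS STRING) AS c").write.parquet(s"$dir/p=2")
spark.read.option("mergeSchema", "true").parquet(dir).printSchema()
// => Failed to merge incompatible data types "INT" and "STRING". ...
```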
[spark] branch master updated (1df1d7661a3 -> d0fe6d4b796)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 1df1d7661a3 [SPARK-43516][ML][PYTHON] Update MLv2 Transformer interfaces add d0fe6d4b796 [SPARK-43948][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059] No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 47 +- .../spark/sql/errors/QueryParsingErrors.scala | 15 --- .../spark/sql/catalyst/parser/DDLParserSuite.scala | 2 +- .../spark/sql/execution/SparkSqlParser.scala | 2 +- .../command/v2/AlterTableReplaceColumnsSuite.scala | 17 +++- .../org/apache/spark/sql/sources/InsertSuite.scala | 12 +++--- 6 files changed, 61 insertions(+), 34 deletions(-)
[spark] branch master updated: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 16ee478a9de [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206 16ee478a9de is described below commit 16ee478a9debe94eadbf62ead072c2ded10220c7 Author: Amanda Liu AuthorDate: Mon Jun 5 22:19:38 2023 +0300 [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206 ### What changes were proposed in this pull request? The PR assigns a more descriptive name to the error class `_LEGACY_ERROR_TEMP_2206` -> `BATCH_METADATA_NOT_FOUND` ### Why are the changes needed? This change improves the error framework by making the error name more descriptive. ### Does this PR introduce any user-facing change? No ### How was this patch tested? The error test will be handled in a future PR (see JIRA ticket: https://issues.apache.org/jira/browse/SPARK-43940) Closes #41387 from asl3/_LEGACY_ERROR_TEMP_2206. Lead-authored-by: Amanda Liu Co-authored-by: asl3 Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 11 ++- .../org/apache/spark/sql/errors/QueryExecutionErrors.scala| 2 +- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index c73223fba39..2da08829862 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -74,6 +74,12 @@ "Cannot convert Avro to SQL because the original encoded data type is , however you're trying to read the field as , which leads to data being read as null. Please provide a wider decimal type to get the correct result. To allow reading null to this field, enable the SQL configuration: ." ] }, + "BATCH_METADATA_NOT_FOUND" : { +"message" : [ + "Unable to find batch ." +], +"sqlState" : "42K03" + }, "BINARY_ARITHMETIC_OVERFLOW" : { "message" : [ " caused overflow." @@ -4978,11 +4984,6 @@ "Cannot set timeout timestamp without enabling event time timeout in [map|flatMapGroupsWithState." ] }, - "_LEGACY_ERROR_TEMP_2206" : { -"message" : [ - "Unable to find batch ." -] - }, "_LEGACY_ERROR_TEMP_2207" : { "message" : [ "Multiple streaming queries are concurrently using ." diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index 7ce3e7a9e7e..fd09e99b9ee 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -2011,7 +2011,7 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase { def batchMetadataFileNotFoundError(batchMetadataFile: Path): SparkFileNotFoundException = { new SparkFileNotFoundException( - errorClass = "_LEGACY_ERROR_TEMP_2206", + errorClass = "BATCH_METADATA_NOT_FOUND", messageParameters = Map( "batchMetadataFile" -> batchMetadataFile.toString())) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 90ec7eaddb6 [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite` 90ec7eaddb6 is described below commit 90ec7eaddb66b6b2fe3afb8cdb68a9cf88f714de Author: panbingkun AuthorDate: Sat Jun 3 22:22:20 2023 +0300 [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite` ### What changes were proposed in this pull request? The pr aims to use `checkError()` to check `Exception` in `*Insert*Suite`, include: - sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedUpdateAsDeleteAndInsertTableSuite - sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite Note: But this pr does not include some of these cases, which directly throw AnalysisExecution, such as: https://github.com/apache/spark/blob/898ad77900d887ac64800a616bd382def816eea6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala#L505-L515 After this PR, I will refactor these, assign them a name, and use the error framework. As these tasks are completed, all exceptions checks in `*Insert*Suite` will eventually be migrated to `checkError`. ### Why are the changes needed? Migration on checkError() will make the tests independent from the text of error messages. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41447 from panbingkun/check_error_for_insert_suites. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../org/apache/spark/sql/SQLInsertTestSuite.scala | 82 ++- ...ltaBasedUpdateAsDeleteAndInsertTableSuite.scala | 11 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 570 ++--- .../org/apache/spark/sql/hive/InsertSuite.scala| 50 +- 4 files changed, 477 insertions(+), 236 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala index 904980d58d6..af85e44519b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql import org.apache.spark.SparkConf +import org.apache.spark.SparkNumberFormatException import org.apache.spark.sql.catalyst.expressions.Hex import org.apache.spark.sql.connector.catalog.InMemoryPartitionTableCatalog import org.apache.spark.sql.internal.SQLConf @@ -181,16 +182,28 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils { } test("insert with column list - mismatched column list size") { -val msgs = Seq("Cannot write to table due to mismatched user specified column size", - "expected 3 columns but found") def test: Unit = { withTable("t1") { val cols = Seq("c1", "c2", "c3") createTable("t1", cols, Seq("int", "long", "string")) -val e1 = intercept[AnalysisException](sql(s"INSERT INTO t1 (c1, c2) values(1, 2, 3)")) -assert(e1.getMessage.contains(msgs(0)) || e1.getMessage.contains(msgs(1))) -val e2 = intercept[AnalysisException](sql(s"INSERT INTO t1 (c1, c2, c3) values(1, 2)")) -assert(e2.getMessage.contains(msgs(0)) || e2.getMessage.contains(msgs(1))) +checkError( + exception = intercept[AnalysisException] { 
+sql(s"INSERT INTO t1 (c1, c2) values(1, 2, 3)") + }, + sqlState = None, + errorClass = "_LEGACY_ERROR_TEMP_1038", + parameters = Map("columnSize" -> "2", "outputSize" -> "3"), + context = ExpectedContext("values(1, 2, 3)", 24, 38) +) +checkError( + exception = intercept[AnalysisException] { +sql(s"INSERT INTO t1 (c1, c2, c3) values(1, 2)") + }, + sqlState = None, + errorClass = "_LEGACY_ERROR_TEMP_1038", + parameters = Map("columnSize" -> "3", "outputSize" -> "2"), + context = ExpectedContext("values(1, 2)", 28, 39) +) } } withSQLConf(SQLConf.ENABLE_DEFAULT_COLUMNS.key -> "false") { @@ -259,10 +272,15 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils { "che
[spark] branch branch-3.4 updated: [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new e140bf719e3 [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc] e140bf719e3 is described below commit e140bf719e3e8d7347f5d00b2ebaf77d6a5b2210 Author: Jiaan Geng AuthorDate: Sat Jun 3 22:15:15 2023 +0300 [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc] ### What changes were proposed in this pull request? This PR used to backport https://github.com/apache/spark/pull/41436 to 3.4 ### Why are the changes needed? Fix the bug doesn't display column's sql for Percentile[Cont|Disc]. ### Does this PR introduce _any_ user-facing change? 'Yes'. Users could see the correct sql information. ### How was this patch tested? Test cases updated. Closes #41445 from beliefer/SPARK-43956_followup. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- .../expressions/aggregate/percentiles.scala| 4 ++-- .../sql-tests/results/percentiles.sql.out | 24 +++--- .../results/postgreSQL/aggregates_part4.sql.out| 8 .../udf/postgreSQL/udf-aggregates_part4.sql.out| 8 4 files changed, 22 insertions(+), 22 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala index 81bc7e51499..8447a5f9b51 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala @@ -368,7 +368,7 @@ case class PercentileCont(left: Expression, right: Expression, reverse: Boolean override def sql(isDistinct: Boolean): String = { val distinct = if (isDistinct) "DISTINCT " else "" val direction = if (reverse) " DESC" else "" -s"$prettyName($distinct${right.sql}) WITHIN GROUP (ORDER BY v$direction)" +s"$prettyName($distinct${right.sql}) WITHIN GROUP (ORDER BY ${left.sql}$direction)" } override protected def withNewChildrenInternal( newLeft: Expression, newRight: Expression): PercentileCont = @@ -408,7 +408,7 @@ case class PercentileDisc( override def sql(isDistinct: Boolean): String = { val distinct = if (isDistinct) "DISTINCT " else "" val direction = if (reverse) " DESC" else "" -s"$prettyName($distinct${right.sql}) WITHIN GROUP (ORDER BY v$direction)" +s"$prettyName($distinct${right.sql}) WITHIN GROUP (ORDER BY ${left.sql}$direction)" } override protected def withNewChildrenInternal( diff --git a/sql/core/src/test/resources/sql-tests/results/percentiles.sql.out b/sql/core/src/test/resources/sql-tests/results/percentiles.sql.out index 38319875c71..cd99ded56bf 100644 --- a/sql/core/src/test/resources/sql-tests/results/percentiles.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/percentiles.sql.out @@ -144,7 +144,7 @@ SELECT FROM basic_pays ORDER BY salary -- !query schema -struct 8900 WINDOW w AS (PARTITION BY department) ORDER BY salary -- !query schema -struct +struct -- !query output 0-10 2-6 @@ -608,7 +608,7 @@ FROM intervals GROUP BY k ORDER BY k -- !query schema -struct +struct -- !query output 0 0 00:00:10.00 00:00:30.0 1 0 00:00:12.50 00:00:17.5 @@ -626,7 +626,7 @@ FROM intervals GROUP BY k ORDER BY k -- !query schema -struct +struct -- !query output 0 0 00:10:00.00 00:30:00.0 1 0 00:12:30.00 
00:17:30.0 @@ -641,7 +641,7 @@ SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY dt DESC) FROM intervals -- !query schema -struct +struct -- !query output 0-10 2-6 @@ -655,7 +655,7 @@ FROM intervals GROUP BY k ORDER BY k -- !query schema -struct +struct -- !query output 0 0 00:00:10.00 00:00:30.0 1 0 00:00:10.00 00:00:20.0 @@ -673,7 +673,7 @@ FROM intervals GROUP BY k ORDER BY k -- !query schema -struct +struct -- !query output 0 0 00:10:00.00 00:30:00.0 1 0 00:10:00.00 00:20:00.0 @@ -689,7 +689,7 @@ SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY dt) FROM intervals -- !query schema -struct +struct -- !query output 1-81-8 1-8 @@ -704,7 +704,7 @@ FROM intervals GROUP BY k ORDER BY k -- !query schema -struct +struct -- !query output 0 0 00:00:20.00 00:00:20.00 00:00:20.
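Illustrative sketch of the fix against the `basic_pays` table used in the golden files above; the generated SQL is visible in the output column name.
```scala
// With the fix, the aggregate's generated SQL carries the real sort column
// instead of the hard-coded placeholder `v`.
val df = spark.sql(
  "SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY salary) FROM basic_pays")
println(df.schema.fieldNames.head)
// before: percentile_cont(0.25) WITHIN GROUP (ORDER BY v)
// after:  percentile_cont(0.25) WITHIN GROUP (ORDER BY salary)
```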
[spark] branch master updated (c3b62708cd6 -> 18b9bd9dcb0)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from c3b62708cd6 [SPARK-43516][ML][FOLLOW-UP] Drop vector type support in Distributed ML for spark connect add 18b9bd9dcb0 [SPARK-43945][SQL][TESTS] Fix bug for `SQLQueryTestSuite` when run on local env No new revisions were added by this update. Summary of changes: sql/core/src/test/resources/sql-tests/results/identifier-clause.sql.out | 2 +- sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-43910][SQL] Strip `__auto_generated_subquery_name` from ids in errors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new adabbb50053 [SPARK-43910][SQL] Strip `__auto_generated_subquery_name` from ids in errors adabbb50053 is described below commit adabbb50053d442c0852c0c39c125a02d777d04e Author: Max Gekk AuthorDate: Thu Jun 1 10:14:12 2023 +0300 [SPARK-43910][SQL] Strip `__auto_generated_subquery_name` from ids in errors ### What changes were proposed in this pull request? In the PR, I propose the drop the prefix `__auto_generated_subquery_name` from SQL ids in errors. ### Why are the changes needed? The changes should improve user experience with Spark SQL by making error messages shorter and more clear. ### Does this PR introduce _any_ user-facing change? Should not. ### How was this patch tested? By running the affected test suites: ``` $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" $ build/sbt "test:testOnly *QueryCompilationErrorsSuite" $ build/sbt "sql/testOnly *QueryExecutionErrorsSuite" ``` Closes #41411 from MaxGekk/strip__auto_generated_subquery_name. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala| 6 +- .../test/resources/sql-tests/analyzer-results/natural-join.sql.out | 2 +- .../src/test/resources/sql-tests/analyzer-results/pivot.sql.out | 2 +- .../test/resources/sql-tests/analyzer-results/udf/udf-pivot.sql.out | 2 +- sql/core/src/test/resources/sql-tests/results/natural-join.sql.out | 2 +- sql/core/src/test/resources/sql-tests/results/pivot.sql.out | 2 +- sql/core/src/test/resources/sql-tests/results/udf/udf-pivot.sql.out | 2 +- .../org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala | 3 ++- .../org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 2 +- 9 files changed, 14 insertions(+), 9 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala index 5460de77a14..885b2f775e0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala @@ -71,7 +71,11 @@ private[sql] trait QueryErrorsBase { } def toSQLId(parts: Seq[String]): String = { -parts.map(quoteIdentifier).mkString(".") +val cleaned = parts match { + case "__auto_generated_subquery_name" :: rest if rest != Nil => rest + case other => other +} +cleaned.map(quoteIdentifier).mkString(".") } def toSQLId(parts: String): String = { diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/natural-join.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/natural-join.sql.out index 8fe2ba77855..987fb3e0a09 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/natural-join.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/natural-join.sql.out @@ -494,7 +494,7 @@ org.apache.spark.sql.AnalysisException "sqlState" : "42703", "messageParameters" : { "objectName" : "`nt2`.`k`", -"proposal" : "`__auto_generated_subquery_name`.`k`, `__auto_generated_subquery_name`.`v1`, `__auto_generated_subquery_name`.`v2`" +"proposal" : "`k`, `v1`, `v2`" }, "queryContext" : [ { "objectType" : "", diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out 
b/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out index e5560c04ff1..d7b77f8ce01 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out @@ -743,7 +743,7 @@ org.apache.spark.sql.AnalysisException "errorClass" : "INCOMPARABLE_PIVOT_COLUMN", "sqlState" : "42818", "messageParameters" : { -"columnName" : "`__auto_generated_subquery_name`.`m`" +"columnName" : "`m`" } } diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/udf/udf-pivot.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/udf/udf-pivot.sql.out index b5f4a6be3b2..fa94f77207b 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/udf/udf-pivot.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/udf/udf-pivot.sql.out @@ -683,7 +683,7 @@ org.apache.spark.sql.AnalysisException "errorClas
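Illustrative reproduction (a sketch): an unresolved column over an unaliased subquery; the proposal no longer leaks the internal alias.
```scala
// The subquery gets the internal alias __auto_generated_subquery_name,
// which used to prefix every suggested column in the error message.
spark.sql("SELECT k2 FROM (SELECT 1 AS k, 2 AS v1)")
// before: [UNRESOLVED_COLUMN] ... proposal: `__auto_generated_subquery_name`.`k`, ...
// after:  [UNRESOLVED_COLUMN] ... proposal: `k`, `v1`
```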
[spark] branch master updated: [SPARK-43867][SQL] Improve suggested candidates for unresolved attribute
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a8893422752 [SPARK-43867][SQL] Improve suggested candidates for unresolved attribute a8893422752 is described below commit a88934227523334550e451e437ce013772001079 Author: Max Gekk AuthorDate: Wed May 31 21:04:44 2023 +0300 [SPARK-43867][SQL] Improve suggested candidates for unresolved attribute ### What changes were proposed in this pull request? In the PR, I propose to change the approach of stripping the common part of candidate qualifiers in `StringUtils.orderSuggestedIdentifiersBySimilarity`: 1. If all candidates have the same qualifier including namespace and table name, drop it. It should be dropped if the base string (unresolved attribute) doesn't include a namespace and table name. For example: - `[ns1.table1.col1, ns1.table1.col2] -> [col1, col2]` for unresolved attribute `col0` - `[ns1.table1.col1, ns1.table1.col2] -> [table1.col1, table1.col2]` for unresolved attribute `table1.col0` 2. If all candidates belong to the same namespace, just drop it. It should be dropped for any non-fully qualified unresolved attribute. For example: - `[ns1.table1.col1, ns1.table2.col2] -> [table1.col1, table2.col2]` for unresolved attribute `col0` or `table0.col0` - `[ns1.table1.col1, ns1.table1.col2] -> [ns1.table1.col1, ns1.table1.col2]` for unresolved attribute `ns0.table0.col0` (fully qualified, so nothing is dropped) 3. Otherwise take the suggested candidates AS IS. 4. Sort the candidate list using the levenshtein distance (a standalone sketch of this heuristic follows the diff below). ### Why are the changes needed? This should improve user experience with Spark SQL by simplifying the error message about an unresolved attribute. ### Does this PR introduce _any_ user-facing change? Yes, it changes the error message. ### How was this patch tested? By running the existing test suites: ``` $ build/sbt "test:testOnly *AnalysisErrorSuite" $ build/sbt "test:testOnly *QueryCompilationErrorsSuite" $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" $ build/sbt "test:testOnly *DatasetUnpivotSuite" $ build/sbt "test:testOnly *DatasetSuite" ``` Closes #41368 from MaxGekk/fix-suggested-column-list.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/CheckAnalysis.scala | 3 +- .../plans/logical/basicLogicalOperators.scala | 2 +- .../spark/sql/catalyst/util/StringUtils.scala | 46 +- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 4 +- .../spark/sql/catalyst/util/StringUtilsSuite.scala | 5 ++- .../columnresolution-negative.sql.out | 2 +- .../analyzer-results/group-by-all.sql.out | 2 +- .../analyzer-results/join-lateral.sql.out | 2 +- .../postgreSQL/aggregates_part1.sql.out| 2 +- .../analyzer-results/postgreSQL/join.sql.out | 6 +-- .../udf/postgreSQL/udf-aggregates_part1.sql.out| 2 +- .../udf/postgreSQL/udf-join.sql.out| 6 +-- .../results/columnresolution-negative.sql.out | 2 +- .../sql-tests/results/group-by-all.sql.out | 2 +- .../sql-tests/results/join-lateral.sql.out | 2 +- .../results/postgreSQL/aggregates_part1.sql.out| 2 +- .../sql-tests/results/postgreSQL/join.sql.out | 6 +-- .../udf/postgreSQL/udf-aggregates_part1.sql.out| 2 +- .../results/udf/postgreSQL/udf-join.sql.out| 6 +-- .../org/apache/spark/sql/DatasetUnpivotSuite.scala | 2 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 4 +- .../sql/errors/QueryCompilationErrorsSuite.scala | 3 +- 22 files changed, 53 insertions(+), 60 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index c46dff1c4bf..594c0b666e8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -139,7 +139,8 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB a: Attribute, errorClass: String): Nothing = { val missingCol = a.sql -val candidates = operator.inputSet.toSeq.map(_.qualifiedName) +val candidates = operator.inputSet.toSeq + .map(attr => attr.qualifier :+ attr.name) val orderedCandidates = StringUtils.orderSuggestedIdentifiersBySimilarity(missingCol, candidates) throw QueryCompilationErrors.unresolvedAttributeError( diff --git a/sql/catalyst/src/main/scala/org/apache/spa
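A self-contained sketch of the heuristic described in the commit message, not the actual `StringUtils` code; it assumes commons-text for the edit distance and represents identifiers as name-part sequences such as `Seq("ns1", "table1", "col1")`.
```scala
import org.apache.commons.text.similarity.LevenshteinDistance

// Strip a qualifier shared by every candidate -- the whole table qualifier
// when the unresolved name is bare, the namespace only when it is
// table-qualified -- then rank the survivors by edit distance.
def orderBySimilarity(base: Seq[String],
                      candidates: Seq[Seq[String]]): Seq[String] = {
  val qualifiers = candidates.map(_.dropRight(1)).distinct
  val stripped =
    if (base.size == 1 && qualifiers.size == 1) {
      candidates.map(_.takeRight(1))   // [ns1.t1.c1, ns1.t1.c2] -> [c1, c2]
    } else if (base.size <= 2 && qualifiers.map(_.headOption).distinct.size == 1) {
      candidates.map(_.takeRight(2))   // [ns1.t1.c1, ns1.t2.c2] -> [t1.c1, t2.c2]
    } else {
      candidates                       // fully qualified base: keep as-is
    }
  val dist = LevenshteinDistance.getDefaultInstance
  stripped.map(_.mkString("."))
    .sortBy(s => dist.apply(s, base.mkString(".")).intValue)
}
```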
[spark] branch master updated (c2060e7c0a3 -> 3457b4be356)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from c2060e7c0a3 [SPARK-43081][ML][FOLLOW-UP] Improve torch distributor data loader code add 3457b4be356 [SPARK-43852][SPARK-43853][SPARK-43854][SPARK-43855][SPARK-43856] Assign names to the error class _LEGACY_ERROR_TEMP_2418-2425 No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 57 -- .../sql/tests/pandas/test_pandas_udf_scalar.py | 4 +- python/pyspark/sql/tests/test_udf.py | 4 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 18 +++ .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 28 --- .../apache/spark/sql/DataFrameAsOfJoinSuite.scala | 29 ++- .../apache/spark/sql/LateralColumnAliasSuite.scala | 32 .../sql/hive/execution/AggregationQuerySuite.scala | 25 ++ .../spark/sql/hive/execution/HiveUDAFSuite.scala | 15 -- .../spark/sql/hive/execution/UDAQuerySuite.scala | 25 ++ 10 files changed, 145 insertions(+), 92 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43882][SQL] Assign name to _LEGACY_ERROR_TEMP_2122
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2687d784fe4 [SPARK-43882][SQL] Assign name to _LEGACY_ERROR_TEMP_2122 2687d784fe4 is described below commit 2687d784fe4d20af321f11074139c0ce382bbaef Author: Jia Fan AuthorDate: Wed May 31 10:26:15 2023 +0300 [SPARK-43882][SQL] Assign name to _LEGACY_ERROR_TEMP_2122 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_2122, "FAILED_PARSE_STRUCT_TYPE". ### Why are the changes needed? Assign proper name to _LEGACY_ERROR_TEMP_2122 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add new test Closes #41381 from Hisoka-X/SPARK-43882_LEGACY_ERROR_TEMP_2122. Authored-by: Jia Fan Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 11 ++- .../org/apache/spark/sql/errors/QueryExecutionErrors.scala| 4 ++-- .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 10 ++ 3 files changed, 18 insertions(+), 7 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 8c3ba1e190d..7f2b1975855 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -634,6 +634,12 @@ ], "sqlState" : "38000" }, + "FAILED_PARSE_STRUCT_TYPE" : { +"message" : [ + "Failed parsing struct: ." +], +"sqlState" : "22018" + }, "FAILED_RENAME_PATH" : { "message" : [ "Failed to rename to as destination already exists." @@ -4563,11 +4569,6 @@ "Do not support type ." ] }, - "_LEGACY_ERROR_TEMP_2122" : { -"message" : [ - "Failed parsing : ." -] - }, "_LEGACY_ERROR_TEMP_2124" : { "message" : [ "Failed to merge decimal types with incompatible scale and ." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index 5daa8ed3b7f..7ce3e7a9e7e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -1305,8 +1305,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase { def failedParsingStructTypeError(raw: String): SparkRuntimeException = { new SparkRuntimeException( - errorClass = "_LEGACY_ERROR_TEMP_2122", - messageParameters = Map("simpleString" -> StructType.simpleString, "raw" -> raw)) + errorClass = "FAILED_PARSE_STRUCT_TYPE", + messageParameters = Map("raw" -> toSQLValue(raw, StringType))) } def cannotMergeDecimalTypesWithIncompatibleScaleError( diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala index 4bcb1d115b7..6d2c2600cbb 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala @@ -633,6 +633,16 @@ class QueryExecutionErrorsSuite "config" -> s""""${SQLConf.ANSI_ENABLED.key}"""")) } + test("FAILED_PARSE_STRUCT_TYPE: parsing invalid struct type") { +val raw = """{"type":"array","elementType":"integer","containsNull":false}""" +checkError( + exception = intercept[SparkRuntimeException] { +StructType.fromString(raw) + }, + errorClass = "FAILED_PARSE_STRUCT_TYPE", + parameters = Map("raw" -> s"'$raw'")) + } + test("CAST_OVERFLOW: from long to ANSI intervals") { Seq( LongType -> "9223372036854775807L", - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
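Usage sketch of the method under test; note that `fromString` is package-private to `org.apache.spark.sql`, so this only compiles from Spark-internal code such as the suite above.
```scala
import org.apache.spark.sql.types.StructType

// Round-trip of a struct's JSON form succeeds...
val ok = StructType.fromString(
  """{"type":"struct","fields":[{"name":"a","type":"integer","nullable":false,"metadata":{}}]}""")
println(ok.simpleString) // struct<a:int>

// ...while any non-struct type string now fails under the new class.
StructType.fromString(
  """{"type":"array","elementType":"integer","containsNull":false}""")
// => SparkRuntimeException: [FAILED_PARSE_STRUCT_TYPE] Failed parsing struct: '...'
```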
[spark] branch master updated (7d87fecda70 -> 11390c50972)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 7d87fecda70 [SPARK-43878][BUILD] Upgrade `cyclonedx-maven-plugin` from 2.7.6 to 2.7.9 add 11390c50972 [SPARK-43815][SQL] Add `to_varchar` alias for `to_char` No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../sql-functions/sql-expression-schema.md | 1 + .../sql-tests/analyzer-results/charvarchar.sql.out | 21 +++ .../resources/sql-tests/inputs/charvarchar.sql | 5 + .../sql-tests/results/charvarchar.sql.out | 24 ++ 5 files changed, 52 insertions(+)
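Usage sketch: the new alias is interchangeable with `to_char`; the format and result follow the built-in function examples.
```scala
// Both expressions format the decimal with the same template.
spark.sql("SELECT to_char(78.12, '$99.99'), to_varchar(78.12, '$99.99')").show()
// both columns render $78.12
```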
[spark] branch master updated: [SPARK-43862][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_(1254 & 1315)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0c6ea478d6b [SPARK-43862][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_(1254 & 1315) 0c6ea478d6b is described below commit 0c6ea478d6b448caab5c969be122159acef2bbeb Author: panbingkun AuthorDate: Tue May 30 14:18:26 2023 +0300 [SPARK-43862][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_(1254 & 1315) ### What changes were proposed in this pull request? The pr aims to 1. Assign a name to the error class, include: - _LEGACY_ERROR_TEMP_1254 -> UNSUPPORTED_OVERWRITE.PATH - _LEGACY_ERROR_TEMP_1315 -> UNSUPPORTED_OVERWRITE.TABLE 2. Convert _LEGACY_ERROR_TEMP_0002 to INTERNAL_ERROR. ### Why are the changes needed? - The changes improve the error framework. - Because the subclass `SparkSqlAstBuilder` of `AstBuilder` has already override methods `visitInsertOverwriteDir` and `visitInsertOverwriteHiveDir`. In reality, `SparkSqlParser` is used in the Spark base code , and `SparkSqlAstBuilder` is used, The two exceptions mentioned above in AstBuilder will not be thrown through the user's perspective. https://github.com/apache/spark/blob/88f69d6f92860823b1a90bc162ebca2b7c8132fc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L46-L47 - visitInsertOverwriteDir https://github.com/apache/spark/blob/88f69d6f92860823b1a90bc162ebca2b7c8132fc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L802-L834 - visitInsertOverwriteHiveDir https://github.com/apache/spark/blob/88f69d6f92860823b1a90bc162ebca2b7c8132fc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L848-L866 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manual testing: $ build/sbt "test:testOnly *DDLParserSuite" $ build/sbt "test:testOnly *InsertSuite" $ build/sbt "test:testOnly *MetastoreDataSourcesSuite" $ build/sbt "test:testOnly *HiveDDLSuite" - Pass GA. Closes #41367 from panbingkun/LEGACY_ERROR_TEMP_1254. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 32 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 4 +- .../spark/sql/errors/QueryCompilationErrors.scala | 18 +++--- .../spark/sql/errors/QueryParsingErrors.scala | 5 +- .../spark/sql/catalyst/analysis/AnalysisTest.scala | 5 ++ .../spark/sql/catalyst/parser/DDLParserSuite.scala | 20 +++ .../org/apache/spark/sql/DataFrameWriter.scala | 6 +- .../apache/spark/sql/execution/command/ddl.scala | 12 +++- .../execution/datasources/DataSourceStrategy.scala | 2 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 70 -- .../spark/sql/hive/MetastoreDataSourcesSuite.scala | 35 ++- .../spark/sql/hive/execution/HiveDDLSuite.scala| 11 ++-- 12 files changed, 149 insertions(+), 71 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 07ff6e1c7c2..8c3ba1e190d 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -2320,6 +2320,23 @@ "grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup." ] }, + "UNSUPPORTED_OVERWRITE" : { +"message" : [ + "Can't overwrite the target that is also being read from." +], +"subClass" : { + "PATH" : { +"message" : [ + "The target path is ." 
+] + }, + "TABLE" : { +"message" : [ + "The target table is ." +] + } +} + }, "UNSUPPORTED_SAVE_MODE" : { "message" : [ "The save mode is not supported for:" @@ -2477,11 +2494,6 @@ "Invalid InsertIntoContext." ] }, - "_LEGACY_ERROR_TEMP_0002" : { -"message" : [ - "INSERT OVERWRITE DIRECTORY is not supported." -] - }, "_LEGACY_ERROR_TEMP_0004" : { "message" : [ "Empty source for merge: you should specify a source table/subquery in merge." @@ -3669,11 +3681,6 @@ "Cannot alter a table with ALTER VIEW. Please use ALTER TABLE instead." ] }, - "_LEGACY_ERROR_TEMP_1254" : { -"message" : [ - "Cannot overwrite a path that is also being read from." -] - }, "_LEGACY_ERROR_TEMP_1255" : { "message
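Illustrative sketch of the renamed error; whether the `TABLE` or `PATH` leg fires depends on how the relation resolves, so the exact subclass in the comment is an assumption.
```scala
// Overwriting a table from a query that reads the same table.
spark.sql("CREATE TABLE src(i INT) USING parquet")
spark.sql("INSERT OVERWRITE TABLE src SELECT * FROM src")
// => AnalysisException: [UNSUPPORTED_OVERWRITE.TABLE] Can't overwrite the
//    target that is also being read from. The target table is
//    `spark_catalog`.`default`.`src`. (or the PATH leg, for path-based writes)
```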
[spark] branch master updated (27bb384947e -> 8b464df9fcf)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 27bb384947e [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity` add 8b464df9fcf [SPARK-43846][SQL][TESTS] Use checkError() to check Exception in SessionCatalogSuite No new revisions were added by this update. Summary of changes: .../sql/catalyst/catalog/SessionCatalogSuite.scala | 462 - 1 file changed, 277 insertions(+), 185 deletions(-)
[spark] branch master updated (31a8ef803a8 -> 27bb384947e)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 31a8ef803a8 [SPARK-43821][CONNECT][TESTS] Make the prompt for `findJar` method in IntegrationTestUtils clearer add 27bb384947e [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity` No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/StringUtils.scala | 2 +- .../spark/sql/catalyst/util/StringUtilsSuite.scala | 7 ++ .../sql/errors/QueryCompilationErrorsSuite.scala | 27 ++ 3 files changed, 35 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e6e242e0181 [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL` e6e242e0181 is described below commit e6e242e01813ddcc735f61a668059ed648a6cefb Author: panbingkun AuthorDate: Sun May 28 21:15:24 2023 +0300 [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL` ### What changes were proposed in this pull request? The pr aims to convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL` and remove `_LEGACY_ERROR_TEMP_1335`. ### Why are the changes needed? - The changes improve the error framework. - In the Spark base code, `_LEGACY_ERROR_TEMP_1335` is no longer used anywhere. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Add new UT - Pass GA Closes #41349 from panbingkun/SPARK-43839. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json| 10 -- .../apache/spark/sql/errors/QueryCompilationErrors.scala| 6 -- .../sql/execution/datasources/v2/V2SessionCatalog.scala | 6 -- .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 13 + 4 files changed, 17 insertions(+), 18 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 36125d2cbae..f7c0879e1a2 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -4015,16 +4015,6 @@ "Cannot specify both version and timestamp when time travelling the table." ] }, - "_LEGACY_ERROR_TEMP_1335" : { "message" : [ " is not a valid timestamp expression for time travel." ] }, - "_LEGACY_ERROR_TEMP_1337" : { "message" : [ "Table does not support time travel." ] }, "_LEGACY_ERROR_TEMP_1338" : { "message" : [ "Sinks cannot request distribution and ordering in continuous execution mode."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 05b829838aa..45a9a03df4d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -3152,12 +3152,6 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase { messageParameters = Map("relationId" -> relationId)) } - def tableNotSupportTimeTravelError(tableName: Identifier): Throwable = { -new AnalysisException( - errorClass = "_LEGACY_ERROR_TEMP_1337", - messageParameters = Map("tableName" -> tableName.toString)) - } - def writeDistributionAndOrderingNotSupportedInContinuousExecution(): Throwable = { new AnalysisException( errorClass = "_LEGACY_ERROR_TEMP_1338", diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala index 437194b7b5b..8234fb5a0b1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala @@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalog) throw QueryCompilationErrors.timeTravelUnsupportedError( toSQLId(catalogTable.identifier.nameParts)) } else { - throw QueryCompilationErrors.tableNotSupportTimeTravelError(ident) + throw QueryCompilationErrors.timeTravelUnsupportedError( +toSQLId(catalogTable.identifier.nameParts)) } - case _ => throw QueryCompilationErrors.tableNotSupportTimeTravelError(ident) + case _ => throw QueryCompilationErrors.timeTravelUnsupportedError( +toSQLId(ident.asTableIdentifier.nameParts)) } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala index 377596466db..4bcb1d115b7 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala @@ -886,6 +886,19 @@ class QueryExecutio
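Illustrative sketch of the converted error: time travel over a plain V1 table; the exact message text is an assumption based on the existing `UNSUPPORTED_FEATURE.TIME_TRAVEL` template.
```scala
// A plain parquet table has no versions to travel to.
spark.sql("CREATE TABLE t(id INT) USING parquet")
spark.sql("SELECT * FROM t VERSION AS OF 1")
// => AnalysisException: [UNSUPPORTED_FEATURE.TIME_TRAVEL]
//    Time travel on the relation: `spark_catalog`.`default`.`t` ...
```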
[spark] branch master updated: [SPARK-43834][SQL] Use error classes in the compilation errors of `ResolveDefaultColumns`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 588188f481d [SPARK-43834][SQL] Use error classes in the compilation errors of `ResolveDefaultColumns` 588188f481d is described below commit 588188f481db899317bdc398438d6bd749224f9f Author: panbingkun AuthorDate: Sun May 28 19:08:25 2023 +0300 [SPARK-43834][SQL] Use error classes in the compilation errors of `ResolveDefaultColumns` ### What changes were proposed in this pull request? The pr aims to use error classes in the compilation errors of `ResolveDefaultColumns`. ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update UT. - Pass GA. Closes #41345 from panbingkun/SPARK-43834. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 27 ++- .../catalyst/util/ResolveDefaultColumnsUtil.scala | 21 +-- .../spark/sql/errors/QueryCompilationErrors.scala | 43 - .../sql/catalyst/catalog/SessionCatalogSuite.scala | 38 +++- .../analysis/ResolveDefaultColumnsSuite.scala | 53 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 206 +++-- 6 files changed, 290 insertions(+), 98 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index c8e11e6e55e..36125d2cbae 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -948,6 +948,28 @@ ], "sqlState" : "42000" }, + "INVALID_DEFAULT_VALUE" : { +"message" : [ + "Failed to execute command because the destination table column has a DEFAULT value ," +], +"subClass" : { + "DATA_TYPE" : { +"message" : [ + "which requires type, but the statement provided a value of incompatible type." +] + }, + "SUBQUERY_EXPRESSION" : { +"message" : [ + "which contains subquery expressions." +] + }, + "UNRESOLVED_EXPRESSION" : { +"message" : [ + "which fails to resolve as a valid expression." +] + } +} + }, "INVALID_DRIVER_MEMORY" : { "message" : [ "System memory must be at least . Please increase heap size using the --driver-memory option or \"\" in Spark configuration." @@ -4048,11 +4070,6 @@ "Failed to execute command because DEFAULT values are not supported when adding new columns to previously existing target data source with table provider: \"\"." ] }, - "_LEGACY_ERROR_TEMP_1347" : { -"message" : [ - "Failed to execute command because subquery expressions are not allowed in DEFAULT values." -] - }, "_LEGACY_ERROR_TEMP_2000" : { "message" : [ ". If necessary set to false to bypass this error." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala index 8c7e2ad4f1d..0f5c413ed78 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala @@ -188,14 +188,13 @@ object ResolveDefaultColumns { parser.parseExpression(defaultSQL) } catch { case ex: ParseException => -throw new AnalysisException( - s"Failed to execute $statementType command because the destination table column " + -s"$colName has a DEFAULT value of $defaultSQL which fails to parse as a valid " + -s"expression: ${ex.getMessage}") +throw QueryCompilationErrors.defaultValuesUnresolvedExprError( + statementType, colName, defaultSQL, ex) } // Check invariants before moving on to analysis. if (parsed.containsPattern(PLAN_EXPRESSION)) { - throw QueryCompilationErrors.defaultValuesMayNotContainSubQueryExpressions() + throw QueryCompilationErrors.defaultValuesMayNotContainSubQueryExpressions( +statementType, colName, defaultSQL) } // Analyze the parse result. val plan = try { @@ -205,10 +204,8 @@ object ResolveDefaultColumns { ConstantFolding(analyzed) } catch { case ex: AnalysisEx
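For illustration, a hypothetical DDL statement that lands in one of the new subclasses (the table name and provider are made up, and the exact message parameters are elided in the JSON above):

```
// A DEFAULT value must be a constant expression; a subquery inside it is now
// reported as INVALID_DEFAULT_VALUE.SUBQUERY_EXPRESSION instead of a legacy code.
sql("CREATE TABLE t (a INT DEFAULT (SELECT 42)) USING parquet")
// => AnalysisException, errorClass = "INVALID_DEFAULT_VALUE.SUBQUERY_EXPRESSION"
```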
[spark] branch master updated: [SPARK-43837][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_103[1-2]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2c0a206a89f [SPARK-43837][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_103[1-2] 2c0a206a89f is described below commit 2c0a206a89ff9042a0577a7f5f30fa20fb8c984a Author: panbingkun AuthorDate: Sun May 28 18:59:20 2023 +0300 [SPARK-43837][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_103[1-2] ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_103[1-2]. ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update existed UT. - Pass GA. Closes #41346 from panbingkun/SPARK-43837. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 27 +++--- .../spark/sql/errors/QueryCompilationErrors.scala | 18 +--- .../spark/sql/DataFrameWindowFramesSuite.scala | 33 ++ 3 files changed, 58 insertions(+), 20 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 3a11001ad9d..c8e11e6e55e 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -909,6 +909,23 @@ ], "sqlState" : "22003" }, + "INVALID_BOUNDARY" : { +"message" : [ + "The boundary is invalid: ." +], +"subClass" : { + "END" : { +"message" : [ + "Expected the value is '0', '', '[, ]'." +] + }, + "START" : { +"message" : [ + "Expected the value is '0', '', '[, ]'." +] + } +} + }, "INVALID_BUCKET_FILE" : { "message" : [ "Invalid bucket file: ." @@ -3840,16 +3857,6 @@ "Unable to find the column `` given []." ] }, - "_LEGACY_ERROR_TEMP_1301" : { -"message" : [ - "Boundary start is not a valid integer: ." -] - }, - "_LEGACY_ERROR_TEMP_1302" : { -"message" : [ - "Boundary end is not a valid integer: ." -] - }, "_LEGACY_ERROR_TEMP_1304" : { "message" : [ "Unexpected type of the relation ." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 3cb22491aed..18ace731dd4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -2877,14 +2877,24 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase { def invalidBoundaryStartError(start: Long): Throwable = { new AnalysisException( - errorClass = "_LEGACY_ERROR_TEMP_1301", - messageParameters = Map("start" -> start.toString)) + errorClass = "INVALID_BOUNDARY.START", + messageParameters = Map( +"boundary" -> toSQLId("start"), +"invalidValue" -> toSQLValue(start, LongType), +"longMinValue" -> toSQLValue(Long.MinValue, LongType), +"intMinValue" -> toSQLValue(Int.MinValue, IntegerType), +"intMaxValue" -> toSQLValue(Int.MaxValue, IntegerType))) } def invalidBoundaryEndError(end: Long): Throwable = { new AnalysisException( - errorClass = "_LEGACY_ERROR_TEMP_1302", - messageParameters = Map("end" -> end.toString)) + errorClass = "INVALID_BOUNDARY.END", + messageParameters = Map( +"boundary" -> toSQLId("end"), +"invalidValue" -> toSQLValue(end, LongType), +"longMaxValue" -> toSQLValue(Long.MaxValue, LongType), +"intMinValue" -> toSQLValue(Int.MinValue, IntegerType), +"intMaxValue" -> toSQLValue(Int.MaxValue, IntegerType))) } def tableOrViewNotFound(ident: Seq[String]): Throwable = { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala index 48a3d740559..2a81f7e7c2f 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala +++ b/sql/core
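The boundaries in question are the frame bounds of the DataFrame window API. A sketch of input that now trips the named class (assuming an active Spark session; per the diff above, the check fires eagerly while the frame is being built):

```
import org.apache.spark.sql.expressions.Window

// A ROWS frame bound must be 0, Long.MinValue/Long.MaxValue (unbounded), or fit
// into [Int.MinValue, Int.MaxValue]; anything else fails immediately.
Window.orderBy("v").rowsBetween(Int.MinValue.toLong - 1, 0)
// => AnalysisException, errorClass = "INVALID_BOUNDARY.START"
```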
[spark] branch master updated: [SPARK-43820][SPARK-43822][SPARK-43823][SPARK-43826][SPARK-43827] Assign names to the error class _LEGACY_ERROR_TEMP_241[1-7]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fe7bdce8d12 [SPARK-43820][SPARK-43822][SPARK-43823][SPARK-43826][SPARK-43827] Assign names to the error class _LEGACY_ERROR_TEMP_241[1-7] fe7bdce8d12 is described below commit fe7bdce8d121e2733e82706177d34f0342db0cbe Author: Jiaan Geng AuthorDate: Sun May 28 13:50:59 2023 +0300 [SPARK-43820][SPARK-43822][SPARK-43823][SPARK-43826][SPARK-43827] Assign names to the error class _LEGACY_ERROR_TEMP_241[1-7] ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_241[1-7]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Exists test cases. Closes #41339 from beliefer/2411-2417. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 45 +--- .../sql/catalyst/analysis/CheckAnalysis.scala | 32 --- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 20 ++--- .../sql-tests/analyzer-results/percentiles.sql.out | 48 +++--- .../sql-tests/results/percentiles.sql.out | 48 +++--- 5 files changed, 99 insertions(+), 94 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 10a483396e6..3a11001ad9d 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -617,6 +617,11 @@ "Not found an encoder of the type to Spark SQL internal representation. Consider to change the input type to one of supported at '/sql-ref-datatypes.html'." ] }, + "EVENT_TIME_IS_NOT_ON_TIMESTAMP_TYPE" : { +"message" : [ + "The event time has the invalid type , but expected \"TIMESTAMP\"." +] + }, "FAILED_EXECUTE_UDF" : { "message" : [ "Failed to execute user defined function (: () => )." @@ -1371,6 +1376,11 @@ ], "sqlState" : "42903" }, + "INVALID_WINDOW_SPEC_FOR_AGGREGATION_FUNC" : { +"message" : [ + "Cannot specify ORDER BY or a window frame for ." +] + }, "INVALID_WRITE_DISTRIBUTION" : { "message" : [ "The requested write distribution is invalid." @@ -1393,6 +1403,11 @@ } } }, + "JOIN_CONDITION_IS_NOT_BOOLEAN_TYPE" : { +"message" : [ + "The join condition has the invalid type , expected \"BOOLEAN\"." +] + }, "LOCATION_ALREADY_EXISTS" : { "message" : [ "Cannot name the managed table as , as its associated location already exists. Please pick a different table name, or remove the existing location first." @@ -1785,6 +1800,11 @@ ], "sqlState" : "22023" }, + "SEED_EXPRESSION_IS_UNFOLDABLE" : { +"message" : [ + "The seed expression of the expression must be foldable." +] + }, "SORT_BY_WITHOUT_BUCKETING" : { "message" : [ "sortBy must be used together with bucketBy." @@ -5441,31 +5461,6 @@ "failed to evaluate expression : " ] }, - "_LEGACY_ERROR_TEMP_2411" : { -"message" : [ - "Cannot specify order by or frame for ''." -] - }, - "_LEGACY_ERROR_TEMP_2413" : { -"message" : [ - "Input argument to must be a constant." -] - }, - "_LEGACY_ERROR_TEMP_2414" : { -"message" : [ - "Event time must be defined on a window or a timestamp, but is of type ." -] - }, - "_LEGACY_ERROR_TEMP_2416" : { -"message" : [ - "join condition '' of type is not a boolean." -] - }, - "_LEGACY_ERROR_TEMP_2417" : { -"message" : [ - "join condition '' of type is not a boolean." 
-] - }, "_LEGACY_ERROR_TEMP_2418" : { "message" : [ "Input argument tolerance must be a constant." diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 43f12fabf70..cafabb22d10 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/m
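One of the renamed checks, sketched end to end (tables `t1` and `t2` are hypothetical): a join condition that is not BOOLEAN now fails under `JOIN_CONDITION_IS_NOT_BOOLEAN_TYPE` rather than `_LEGACY_ERROR_TEMP_2416`.

```
sql("CREATE TABLE t1 (id INT) USING parquet")
sql("CREATE TABLE t2 (id INT) USING parquet")
sql("SELECT * FROM t1 JOIN t2 ON t1.id + t2.id")
// => AnalysisException, errorClass = "JOIN_CONDITION_IS_NOT_BOOLEAN_TYPE":
//    the condition has the invalid type "INT" where "BOOLEAN" is expected
```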
[spark] branch master updated (7ce4dc64273 -> d052a454fda)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 7ce4dc64273 [SPARK-41775][PYTHON][FOLLOWUP] Use pyspark.cloudpickle instead of `cloudpickle` in torch distributor add d052a454fda [SPARK-43824][SPARK-43825] [SQL] Assign names to the error class _LEGACY_ERROR_TEMP_128[1-2] No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 20 ++-- .../spark/sql/errors/QueryCompilationErrors.scala| 14 +++--- .../apache/spark/sql/execution/command/views.scala | 3 +-- .../spark/sql/execution/SQLViewTestSuite.scala | 16 4 files changed, 26 insertions(+), 27 deletions(-)
[spark] branch master updated (24901bf187f -> bccfe71a32f)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 24901bf187f [SPARK-43808][SQL][TESTS] Use `checkError()` to check `Exception` in `SQLViewTestSuite` add bccfe71a32f [SPARK-43762][SPARK-43763][SPARK-43764][SPARK-43765][SPARK-43766][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_24[06-10] No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 40 - .../sql/catalyst/analysis/CheckAnalysis.scala | 26 +++--- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 9 +++-- .../sql/catalyst/analysis/AnalysisSuite.scala | 42 ++ .../analyzer-results/group-analytics.sql.out | 2 +- .../udf/udf-group-analytics.sql.out| 2 +- .../sql-tests/results/group-analytics.sql.out | 2 +- .../results/udf/udf-group-analytics.sql.out| 2 +- 8 files changed, 66 insertions(+), 59 deletions(-)
[spark] branch master updated (f718b025d87 -> 24901bf187f)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f718b025d87 [SPARK-43802][SQL] Fix codegen for unhex and unbase64 with failOnError=true add 24901bf187f [SPARK-43808][SQL][TESTS] Use `checkError()` to check `Exception` in `SQLViewTestSuite` No new revisions were added by this update. Summary of changes: .../spark/sql/execution/SQLViewTestSuite.scala | 145 ++--- 1 file changed, 95 insertions(+), 50 deletions(-)
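The migration pattern in that suite, sketched with placeholder names (the concrete error classes and parameters vary per test):

```
// Before: brittle assertion on fragments of the rendered message.
val e = intercept[AnalysisException] { sql("ALTER VIEW v AS SELECT 1") }
assert(e.getMessage.contains("some message fragment"))

// After: structured assertion via the checkError() helper from SparkFunSuite.
checkError(
  exception = intercept[AnalysisException] { sql("ALTER VIEW v AS SELECT 1") },
  errorClass = "SOME_ERROR_CLASS",          // placeholder, not a real class name
  parameters = Map("viewName" -> "`v`"))
```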
[spark] branch master updated: [SPARK-43802][SQL] Fix codegen for unhex and unbase64 with failOnError=true
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f718b025d87 [SPARK-43802][SQL] Fix codegen for unhex and unbase64 with failOnError=true f718b025d87 is described below commit f718b025d87ae3726210c60ff71cb34917b32f51 Author: Adam Binford AuthorDate: Fri May 26 20:37:14 2023 +0300 [SPARK-43802][SQL] Fix codegen for unhex and unbase64 with failOnError=true ### What changes were proposed in this pull request? Fixes an error with codegen for unhex and unbase64 expression when failOnError is enabled introduced in https://github.com/apache/spark/pull/37483. ### Why are the changes needed? Codegen fails and Spark falls back to interpreted evaluation: ``` Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 47, Column 1: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 47, Column 1: Unknown variable or type "BASE64" ``` in the code block: ``` /* 107 */ if (!org.apache.spark.sql.catalyst.expressions.UnBase64.isValidBase64(project_value_1)) { /* 108 */ throw QueryExecutionErrors.invalidInputInConversionError( /* 109 */ ((org.apache.spark.sql.types.BinaryType$) references[1] /* to */), /* 110 */ project_value_1, /* 111 */ BASE64, /* 112 */ "try_to_binary"); /* 113 */ } ``` ### Does this PR introduce _any_ user-facing change? Bug fix. ### How was this patch tested? Added to the existing tests so evaluate an expression with failOnError enabled to test that path of the codegen. Closes #41317 from Kimahriman/bug-to-binary-codegen. Authored-by: Adam Binford Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/mathExpressions.scala | 3 +- .../catalyst/expressions/stringExpressions.scala | 3 +- .../expressions/MathExpressionsSuite.scala | 3 ++ .../expressions/StringExpressionsSuite.scala | 4 +- .../sql/errors/QueryExecutionErrorsSuite.scala | 46 -- 5 files changed, 43 insertions(+), 16 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index dcc821a24ea..add59a38b72 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -1172,14 +1172,13 @@ case class Unhex(child: Expression, failOnError: Boolean = false) nullSafeCodeGen(ctx, ev, c => { val hex = Hex.getClass.getName.stripSuffix("$") val maybeFailOnErrorCode = if (failOnError) { -val format = UTF8String.fromString("BASE64"); val binaryType = ctx.addReferenceObj("to", BinaryType, BinaryType.getClass.getName) s""" |if (${ev.value} == null) { | throw QueryExecutionErrors.invalidInputInConversionError( |$binaryType, |$c, - |$format, + |UTF8String.fromString("HEX"), |"try_to_binary"); |} |""".stripMargin diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 347dff0f4c4..03596ac40b1 100755 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -2472,14 +2472,13 @@ case class 
UnBase64(child: Expression, failOnError: Boolean = false) nullSafeCodeGen(ctx, ev, child => { val maybeValidateInputCode = if (failOnError) { val unbase64 = UnBase64.getClass.getName.stripSuffix("$") -val format = UTF8String.fromString("BASE64"); val binaryType = ctx.addReferenceObj("to", BinaryType, BinaryType.getClass.getName) s""" |if (!$unbase64.isValidBase64($child)) { | throw QueryExecutionErrors.invalidInputInConversionError( |$binaryType, |$child, - |$format, + |UTF8String.fromString("BASE64"), |"try_to_binary"); |} """.stripMargin diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ex
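The root cause deserves spelling out: `$format` interpolated a Scala-side `UTF8String` value into the generated Java source, and `UTF8String.toString` renders the payload, so the emitted source contained the bare token `BASE64` where a Java expression was needed. A condensed before/after sketch of the template (here `binaryType` and `child` stand for the codegen variables visible inside `nullSafeCodeGen`):

```
// Before (broken): the value's toString leaks into the generated Java source.
val format = UTF8String.fromString("BASE64")
s"""throw QueryExecutionErrors.invalidInputInConversionError(
   |  $binaryType, $child, $format, "try_to_binary");""".stripMargin
// emits: ...invalidInputInConversionError(references[1], value, BASE64, "try_to_binary");

// After (fixed): emit a Java expression that builds the UTF8String at runtime.
s"""throw QueryExecutionErrors.invalidInputInConversionError(
   |  $binaryType, $child, UTF8String.fromString("BASE64"), "try_to_binary");""".stripMargin
```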
[spark] branch master updated: [SPARK-43794][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1335
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5590c9a4654 [SPARK-43794][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1335 5590c9a4654 is described below commit 5590c9a4654607488379703581e341d4062f9666 Author: panbingkun AuthorDate: Fri May 26 16:37:01 2023 +0300 [SPARK-43794][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1335 ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_1335. ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Update existed UT. Pass GA. Closes #41314 from panbingkun/SPARK-43794. Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 22 ++ .../sql/catalyst/analysis/TimeTravelSpec.scala | 12 .../spark/sql/errors/QueryCompilationErrors.scala | 6 +++--- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 17 + 4 files changed, 42 insertions(+), 15 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index bbf0368ac59..738e037c39d 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -1326,6 +1326,28 @@ "Cannot create the persistent object of the type because it references to the temporary object of the type . Please make the temporary object persistent, or make the persistent object temporary." ] }, + "INVALID_TIME_TRAVEL_TIMESTAMP_EXPR" : { +"message" : [ + "The time travel timestamp expression is invalid." +], +"subClass" : { + "INPUT" : { +"message" : [ + "Cannot be casted to the \"TIMESTAMP\" type." +] + }, + "NON_DETERMINISTIC" : { +"message" : [ + "Must be deterministic." +] + }, + "UNEVALUABLE" : { +"message" : [ + "Must be evaluable." +] + } +} + }, "INVALID_TYPED_LITERAL" : { "message" : [ "The value of the typed literal is invalid: ." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala index e33ddbb3213..26856d9a5e0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala @@ -38,21 +38,25 @@ object TimeTravelSpec { val ts = timestamp.get assert(ts.resolved && ts.references.isEmpty && !SubqueryExpression.hasSubquery(ts)) if (!Cast.canAnsiCast(ts.dataType, TimestampType)) { -throw QueryCompilationErrors.invalidTimestampExprForTimeTravel(ts) +throw QueryCompilationErrors.invalidTimestampExprForTimeTravel( + "INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.INPUT", ts) } val tsToEval = ts.transform { case r: RuntimeReplaceable => r.replacement case _: Unevaluable => - throw QueryCompilationErrors.invalidTimestampExprForTimeTravel(ts) + throw QueryCompilationErrors.invalidTimestampExprForTimeTravel( +"INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.UNEVALUABLE", ts) case e if !e.deterministic => - throw QueryCompilationErrors.invalidTimestampExprForTimeTravel(ts) + throw QueryCompilationErrors.invalidTimestampExprForTimeTravel( +"INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.NON_DETERMINISTIC", ts) } val tz = Some(conf.sessionLocalTimeZone) // Set `ansiEnabled` to false, so that it can return null for invalid input and we can provide // better error message. val value = Cast(tsToEval, TimestampType, tz, ansiEnabled = false).eval() if (value == null) { -throw QueryCompilationErrors.invalidTimestampExprForTimeTravel(ts) +throw QueryCompilationErrors.invalidTimestampExprForTimeTravel( + "INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.INPUT", ts) } Some(AsOfTimestamp(value.asInstanceOf[Long])) } else if (version.nonEmpty) { diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sq
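Hypothetical expressions that map onto the new subclasses (assuming a table `t` whose source supports time travel):

```
sql("SELECT * FROM t TIMESTAMP AS OF map()")   // INPUT: a MAP cannot be cast to TIMESTAMP
sql("SELECT * FROM t TIMESTAMP AS OF rand()")  // NON_DETERMINISTIC: rand() is not deterministic
// UNEVALUABLE covers expressions the analyzer cannot evaluate eagerly.
```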
[spark] branch master updated: [SPARK-43807][SQL] Migrate _LEGACY_ERROR_TEMP_1269 to PARTITION_SCHEMA_IS_EMPTY
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 45e5f2e375b [SPARK-43807][SQL] Migrate _LEGACY_ERROR_TEMP_1269 to PARTITION_SCHEMA_IS_EMPTY 45e5f2e375b is described below commit 45e5f2e375bec915e1683e6d2a222488ba831c91 Author: Jiaan Geng AuthorDate: Fri May 26 10:58:51 2023 +0300 [SPARK-43807][SQL] Migrate _LEGACY_ERROR_TEMP_1269 to PARTITION_SCHEMA_IS_EMPTY ### What changes were proposed in this pull request? Currently, DS V1 uses `_LEGACY_ERROR_TEMP_1269` and DS V2 uses `INVALID_PARTITION_OPERATION.PARTITION_SCHEMA_IS_EMPTY` when a partition operation is applied to a non-partitioned table. This PR migrates `_LEGACY_ERROR_TEMP_1269` to `PARTITION_SCHEMA_IS_EMPTY`. ### Why are the changes needed? Migrate `_LEGACY_ERROR_TEMP_1269` to `PARTITION_SCHEMA_IS_EMPTY`. ### Does this PR introduce _any_ user-facing change? 'Yes'. The error message changes slightly. ### How was this patch tested? Test case updated. Closes #41325 from beliefer/SPARK-43807. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 5 - .../scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala | 4 ++-- .../apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala | 4 ++-- 3 files changed, 4 insertions(+), 9 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 0246d4f378e..bbf0368ac59 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -3618,11 +3618,6 @@ "Failed to truncate table when removing data of the path: ." ] }, - "_LEGACY_ERROR_TEMP_1269" : { -"message" : [ - "SHOW PARTITIONS is not allowed on a table that is not partitioned: ." -] - }, "_LEGACY_ERROR_TEMP_1270" : { "message" : [ "SHOW CREATE TABLE is not supported on a temporary view: ."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 879bf620188..9921f50014d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -2630,8 +2630,8 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase { def showPartitionNotAllowedOnTableNotPartitionedError(tableIdentWithDB: String): Throwable = { new AnalysisException( - errorClass = "_LEGACY_ERROR_TEMP_1269", - messageParameters = Map("tableIdentWithDB" -> tableIdentWithDB)) + errorClass = "INVALID_PARTITION_OPERATION.PARTITION_SCHEMA_IS_EMPTY", + messageParameters = Map("name" -> toSQLId(tableIdentWithDB))) } def showCreateTableNotSupportedOnTempView(table: String): Throwable = { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala index e67ed807a87..c423bfb9f24 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala @@ -130,8 +130,8 @@ class ShowPartitionsSuite extends ShowPartitionsSuiteBase with CommandSuiteBase exception = intercept[AnalysisException] { sql(sqlText) }, -errorClass = "_LEGACY_ERROR_TEMP_1269", -parameters = Map("tableIdentWithDB" -> tableName)) +errorClass = "INVALID_PARTITION_OPERATION.PARTITION_SCHEMA_IS_EMPTY", +parameters = Map("name" -> tableName)) } }
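End to end, the unified V1 behavior now reads as follows (sketch; the table name is hypothetical, and the identifier in the message is rendered via `toSQLId`):

```
sql("CREATE TABLE t (a INT) USING parquet")  // note: no PARTITIONED BY clause
sql("SHOW PARTITIONS t")
// => AnalysisException,
//    errorClass = "INVALID_PARTITION_OPERATION.PARTITION_SCHEMA_IS_EMPTY",
//    message: "Table <quoted identifier of t> is not partitioned."
```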
[spark] branch master updated: [SPARK-43576][CORE] Remove unused declarations from Core module
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 82bf3fcc81a [SPARK-43576][CORE] Remove unused declarations from Core module 82bf3fcc81a is described below commit 82bf3fcc81ae0be8ce945242ae966cee4fae4104 Author: panbingkun AuthorDate: Fri May 26 10:19:46 2023 +0300 [SPARK-43576][CORE] Remove unused declarations from Core module ### What changes were proposed in this pull request? The pr aims to remove unused declarations from `Core` module ### Why are the changes needed? Make code clean. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #41218 from panbingkun/remove_unused_declaration_core. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../src/main/resources/org/apache/spark/ui/static/executorspage.js | 1 - .../scala/org/apache/spark/deploy/history/ApplicationCache.scala | 1 - core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala | 3 --- core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 5 - core/src/main/scala/org/apache/spark/ui/ToolTips.scala | 7 --- 5 files changed, 17 deletions(-) diff --git a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js index 8c2dc13c35b..92d75c18e49 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js @@ -126,7 +126,6 @@ function totalDurationAlpha(totalGCTime, totalDuration) { (Math.min(totalGCTime / totalDuration + 0.5, 1)) : 1; } -// When GCTimePercent is edited change ToolTips.TASK_TIME to match var GCTimePercent = 0.1; function totalDurationStyle(totalGCTime, totalDuration) { diff --git a/core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala b/core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala index 829631a0454..909f5ea937c 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala @@ -394,7 +394,6 @@ private[history] class ApplicationCacheCheckFilter( val httpRequest = request.asInstanceOf[HttpServletRequest] val httpResponse = response.asInstanceOf[HttpServletResponse] val requestURI = httpRequest.getRequestURI -val operation = httpRequest.getMethod // if the request is for an attempt, check to see if it is in need of delete/refresh // and have the cache update the UI if so diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala index 0d905b46953..cad107256c5 100644 --- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala @@ -404,9 +404,6 @@ private[spark] object HadoopRDD extends Logging { */ val CONFIGURATION_INSTANTIATION_LOCK = new Object() - /** Update the input bytes read metric each time this number of records has been read */ - val RECORDS_BETWEEN_BYTES_READ_METRIC_UPDATES = 256 - /** * The three methods below are helpers for accessing the local map, a property of the SparkEnv of * the local process. 
diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala index d8119fb9498..9582bdbf526 100644 --- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala +++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala @@ -590,11 +590,6 @@ private class ProxyRedirectHandler(_proxyUri: String) extends HandlerWrapper { override def sendRedirect(location: String): Unit = { val newTarget = if (location != null) { val target = new URI(location) -val path = if (target.getPath().startsWith("/")) { - target.getPath() -} else { - req.getRequestURI().stripSuffix("/") + "/" + target.getPath() -} // The target path should already be encoded, so don't re-encode it, just the // proxy address part. val proxyBase = UIUtils.uiRoot(req) diff --git a/core/src/main/scala/org/apache/spark/ui/ToolTips.scala b/core/src/main/scala/org/apache/spark/ui/ToolTips.scala index 587046676ff..b80fba396b3 100644 --- a/core/src/main/scala/org/apache/spark/ui/ToolTips.scala +++ b/core/src/main/scala/org/apache/spark/ui/ToolTips.scala @@ -35,10 +35,6 @@ private[spark] object ToolTips { val OUTPUT = "Bytes written to Hadoop." - val STORAGE_MEMORY = -"Memory used / total available memory for storage of data " + - "like RD
[spark] branch master updated: [SPARK-43791][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1336
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 69803fb0244 [SPARK-43791][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1336 69803fb0244 is described below commit 69803fb0244c9fc110653092bcfab7c221448bce Author: panbingkun AuthorDate: Fri May 26 09:29:21 2023 +0300 [SPARK-43791][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1336 ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_1336. ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Update existed UT. Pass GA. Closes #41309 from panbingkun/LEGACY_ERROR_TEMP_1336. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 10 .../spark/sql/catalyst/analysis/Analyzer.scala | 3 +-- .../sql/catalyst/analysis/CTESubstitution.scala| 3 ++- .../spark/sql/errors/QueryCompilationErrors.scala | 6 ++--- .../spark/sql/execution/datasources/rules.scala| 3 ++- .../datasources/v2/V2SessionCatalog.scala | 4 +++- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 8 +++ .../spark/sql/execution/SQLViewTestSuite.scala | 27 -- 8 files changed, 40 insertions(+), 24 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 7683e7b8650..0246d4f378e 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -2139,6 +2139,11 @@ "Table does not support . Please check the current catalog and namespace to make sure the qualified table name is expected, and also check the catalog implementation which is configured by \"spark.sql.catalog\"." ] }, + "TIME_TRAVEL" : { +"message" : [ + "Time travel on the relation: ." +] + }, "TOO_MANY_TYPE_ARGUMENTS_FOR_UDF_CLASS" : { "message" : [ "UDF class with type arguments." @@ -3916,11 +3921,6 @@ " is not a valid timestamp expression for time travel." ] }, - "_LEGACY_ERROR_TEMP_1336" : { -"message" : [ - "Cannot time travel ." -] - }, "_LEGACY_ERROR_TEMP_1337" : { "message" : [ "Table does not support time travel." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 604fc3f84c8..dc7134a9605 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -1169,8 +1169,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor throw QueryCompilationErrors.readNonStreamingTempViewError(identifier.quoted) } if (isTimeTravel) { - val target = if (tempViewPlan.isStreaming) "streams" else "views" - throw QueryCompilationErrors.timeTravelUnsupportedError(target) + throw QueryCompilationErrors.timeTravelUnsupportedError(toSQLId(identifier)) } tempViewPlan } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala index 77c687843c3..4e3234f9c0d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala @@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.expressions.SubqueryExpression import org.apache.spark.sql.catalyst.plans.logical.{Command, CTERelationDef, CTERelationRef, InsertIntoDir, LogicalPlan, ParsedStatement, SubqueryAlias, UnresolvedWith, WithCTE} import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.catalyst.trees.TreePattern._ +import org.apache.spark.sql.catalyst.util.TypeUtils._ import org.apache.spark.sql.errors.QueryCompilationErrors import org.apache.spark.sql.internal.SQLConf.{LEGACY_CTE_PRECEDENCE_POLICY, LegacyBehaviorPolicy} @@ -253,7 +254,7 @@ object CTESubstitution extends Rule[LogicalPlan] { _.containsAnyPattern(RELATION_TIME_TRAVEL, UNRESOLVED_RELATION, PLAN_EXPRESS
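A sketch of one of the now-covered paths (the view name is hypothetical): time travel over a temporary view reports the view's identifier under the relation-oriented error class instead of the old hard-coded "views"/"streams" target.

```
spark.range(1).createOrReplaceTempView("v")
sql("SELECT * FROM v VERSION AS OF 1")
// => AnalysisException, errorClass = "UNSUPPORTED_FEATURE.TIME_TRAVEL",
//    parameters = Map("relationId" -> "`v`")
```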
[spark] branch master updated: [SPARK-43749][SPARK-43750][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[4-5]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3a6d2153b93 [SPARK-43749][SPARK-43750][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[4-5] 3a6d2153b93 is described below commit 3a6d2153b93c759b68e5827905d1867ba93ec9cf Author: Jiaan Geng AuthorDate: Thu May 25 20:14:00 2023 +0300 [SPARK-43749][SPARK-43750][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[4-5] ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_240[4-5]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? N/A Closes #41279 from beliefer/INVALID_PARTITION_OPERATION. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 29 +-- .../sql/catalyst/analysis/CheckAnalysis.scala | 8 ++--- .../command/ShowPartitionsSuiteBase.scala | 12 --- .../execution/command/v1/ShowPartitionsSuite.scala | 18 ++ .../command/v2/AlterTableAddPartitionSuite.scala | 20 --- .../command/v2/AlterTableDropPartitionSuite.scala | 19 +++--- .../execution/command/v2/ShowPartitionsSuite.scala | 41 +++--- .../execution/command/v2/TruncateTableSuite.scala | 20 --- 8 files changed, 122 insertions(+), 45 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 1ccbdfdc6eb..7683e7b8650 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -1156,6 +1156,23 @@ }, "sqlState" : "22023" }, + "INVALID_PARTITION_OPERATION" : { +"message" : [ + "The partition command is invalid." +], +"subClass" : { + "PARTITION_MANAGEMENT_IS_UNSUPPORTED" : { +"message" : [ + "Table does not support partition management." +] + }, + "PARTITION_SCHEMA_IS_EMPTY" : { +"message" : [ + "Table is not partitioned." +] + } +} + }, "INVALID_PROPERTY_KEY" : { "message" : [ " is an invalid property key, please use quotes, e.g. SET =." @@ -5374,16 +5391,6 @@ "failed to evaluate expression : " ] }, - "_LEGACY_ERROR_TEMP_2404" : { -"message" : [ - "Table is not partitioned." -] - }, - "_LEGACY_ERROR_TEMP_2405" : { -"message" : [ - "Table does not support partition management." -] - }, "_LEGACY_ERROR_TEMP_2406" : { "message" : [ "invalid cast from to ." 
@@ -5772,4 +5779,4 @@ "Failed to get block , which is not a shuffle block" ] } -} \ No newline at end of file +} diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 407a9d363f4..fac3f491200 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -211,13 +211,13 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB case t: SupportsPartitionManagement => if (t.partitionSchema.isEmpty) { r.failAnalysis( - errorClass = "_LEGACY_ERROR_TEMP_2404", - messageParameters = Map("name" -> r.name)) + errorClass = "INVALID_PARTITION_OPERATION.PARTITION_SCHEMA_IS_EMPTY", + messageParameters = Map("name" -> toSQLId(r.name))) } case _ => r.failAnalysis( -errorClass = "_LEGACY_ERROR_TEMP_2405", -messageParameters = Map("name" -> r.name)) +errorClass = "INVALID_PARTITION_OPERATION.PARTITION_MANAGEMENT_IS_UNSUPPORTED", +messageParameters = Map("name" -> toSQLId(r.name))) } case _ => } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala index 27d2eb98543..462b967a759 100644 --- a/sql/
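The sibling subclass applies when a V2 table does not implement `SupportsPartitionManagement` at all; a sketch (the catalog `testcat` and the table are assumptions):

```
// With an in-memory test catalog whose tables lack partition management:
sql("ALTER TABLE testcat.ns.tbl ADD PARTITION (id = 1)")
// => AnalysisException,
//    errorClass = "INVALID_PARTITION_OPERATION.PARTITION_MANAGEMENT_IS_UNSUPPORTED"
```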
[spark] branch master updated: [SPARK-43786][SQL][TESTS] Add a test for nullability about 'levenshtein' function
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 295f540a92f [SPARK-43786][SQL][TESTS] Add a test for nullability about 'levenshtein' function 295f540a92f is described below commit 295f540a92f9a4bde1da1244901b844223777a78 Author: panbingkun AuthorDate: Thu May 25 15:34:25 2023 +0300 [SPARK-43786][SQL][TESTS] Add a test for nullability about 'levenshtein' function ### What changes were proposed in this pull request? The pr aims to add a test for nullability about 'levenshtein' function. ### Why are the changes needed? Make testing more robust about 'levenshtein' function. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manual testing Closes #41303 from panbingkun/SPARK-43786. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala | 6 ++ 1 file changed, 6 insertions(+) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala index e887c570944..f612c5903dc 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala @@ -129,12 +129,18 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSession { val df = Seq(("kitten", "sitting"), ("frog", "fog")).toDF("l", "r") checkAnswer(df.select(levenshtein($"l", $"r")), Seq(Row(3), Row(1))) checkAnswer(df.selectExpr("levenshtein(l, r)"), Seq(Row(3), Row(1))) +checkAnswer(df.select(levenshtein($"l", lit(null))), Seq(Row(null), Row(null))) +checkAnswer(df.selectExpr("levenshtein(l, null)"), Seq(Row(null), Row(null))) checkAnswer(df.select(levenshtein($"l", $"r", 3)), Seq(Row(3), Row(1))) checkAnswer(df.selectExpr("levenshtein(l, r, 3)"), Seq(Row(3), Row(1))) +checkAnswer(df.select(levenshtein(lit(null), $"r", 3)), Seq(Row(null), Row(null))) +checkAnswer(df.selectExpr("levenshtein(null, r, 3)"), Seq(Row(null), Row(null))) checkAnswer(df.select(levenshtein($"l", $"r", 0)), Seq(Row(-1), Row(-1))) checkAnswer(df.selectExpr("levenshtein(l, r, 0)"), Seq(Row(-1), Row(-1))) +checkAnswer(df.select(levenshtein($"l", lit(null), 0)), Seq(Row(null), Row(null))) +checkAnswer(df.selectExpr("levenshtein(l, null, 0)"), Seq(Row(null), Row(null))) } test("string regex_replace / regex_extract") {
[spark] branch master updated (46949e692e8 -> 0db1f002c09)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 46949e692e8 [SPARK-43545][SQL][PYTHON] Support nested timestamp type add 0db1f002c09 [SPARK-43549][SQL] Convert _LEGACY_ERROR_TEMP_0036 to INVALID_SQL_SYNTAX.ANALYZE_TABLE_UNEXPECTED_NOSCAN No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 5 + .../org/apache/spark/sql/errors/QueryParsingErrors.scala | 4 ++-- .../apache/spark/sql/catalyst/parser/DDLParserSuite.scala| 12 ++-- 3 files changed, 13 insertions(+), 8 deletions(-)
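For illustration, a statement that hits the renamed class (the table name is hypothetical): only `NOSCAN` is accepted after `COMPUTE STATISTICS`.

```
sql("ANALYZE TABLE t COMPUTE STATISTICS xxxx")
// => ParseException, errorClass = "INVALID_SQL_SYNTAX.ANALYZE_TABLE_UNEXPECTED_NOSCAN"
```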
[spark] branch master updated: [SPARK-38464][CORE] Use error classes in org.apache.spark.io
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 76f82bd8c54 [SPARK-38464][CORE] Use error classes in org.apache.spark.io 76f82bd8c54 is described below commit 76f82bd8c54352a0b38c3e1d8de5b24627446b9c Author: Bo Zhang AuthorDate: Wed May 24 14:21:42 2023 +0300 [SPARK-38464][CORE] Use error classes in org.apache.spark.io ### What changes were proposed in this pull request? This PR aims to change exceptions created in package org.apache.spark.io to use error class. This PR also adds `toConf` and `toConfVal` in `SparkCoreErrors`. ### Why are the changes needed? This is to move exceptions created in package org.apache.spark.io to error class. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Updated existing tests. Closes #41277 from bozhang2820/spark-38464. Authored-by: Bo Zhang Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 10 ++ .../scala/org/apache/spark/errors/SparkCoreErrors.scala | 12 .../scala/org/apache/spark/io/CompressionCodec.scala | 15 +++ .../org/apache/spark/io/CompressionCodecSuite.scala | 16 4 files changed, 45 insertions(+), 8 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index fcb9ec249db..1b75f89cc10 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -187,6 +187,16 @@ ], "sqlState" : "22003" }, + "CODEC_NOT_AVAILABLE" : { +"message" : [ + "The codec is not available. Consider to set the config to ." +] + }, + "CODEC_SHORT_NAME_NOT_FOUND" : { +"message" : [ + "Cannot find a short name for the codec ." +] + }, "COLUMN_ALIASES_IS_NOT_ALLOWED" : { "message" : [ "Columns aliases are not allowed in ." 
diff --git a/core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala b/core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala index 8abb2564328..f8e7f2db259 100644 --- a/core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala +++ b/core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala @@ -466,4 +466,16 @@ private[spark] object SparkCoreErrors { "requestedBytes" -> requestedBytes.toString, "receivedBytes" -> receivedBytes.toString).asJava) } + + private def quoteByDefault(elem: String): String = { +"\"" + elem + "\"" + } + + def toConf(conf: String): String = { +quoteByDefault(conf) + } + + def toConfVal(conf: String): String = { +quoteByDefault(conf) + } } diff --git a/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala b/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala index eb3dc938d4d..0bb392deb39 100644 --- a/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala +++ b/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala @@ -26,8 +26,9 @@ import net.jpountz.lz4.{LZ4BlockInputStream, LZ4BlockOutputStream, LZ4Factory} import net.jpountz.xxhash.XXHashFactory import org.xerial.snappy.{Snappy, SnappyInputStream, SnappyOutputStream} -import org.apache.spark.SparkConf +import org.apache.spark.{SparkConf, SparkIllegalArgumentException} import org.apache.spark.annotation.DeveloperApi +import org.apache.spark.errors.SparkCoreErrors.{toConf, toConfVal} import org.apache.spark.internal.config._ import org.apache.spark.util.Utils @@ -88,8 +89,12 @@ private[spark] object CompressionCodec { } catch { case _: ClassNotFoundException | _: IllegalArgumentException => None } -codec.getOrElse(throw new IllegalArgumentException(s"Codec [$codecName] is not available. " + - s"Consider setting $configKey=$FALLBACK_COMPRESSION_CODEC")) +codec.getOrElse(throw new SparkIllegalArgumentException( + errorClass = "CODEC_NOT_AVAILABLE", + messageParameters = Map( +"codecName" -> codecName, +"configKey" -> toConf(configKey), +"configVal" -> toConfVal(FALLBACK_COMPRESSION_CODEC } /** @@ -102,7 +107,9 @@ private[spark] object CompressionCodec { } else { shortCompressionCodecNames .collectFirst { case (k, v) if v == codecName => k } -.getOrElse { throw new IllegalArgumentException(s"No short name for codec $codecName.") } +.getOrElse { throw new SparkIllegalArgumentException( +
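A sketch of the new failure mode (note that the `CompressionCodec` object is `private[spark]`, so this code runs inside Spark itself; the fallback codec named in the rendered message is an assumption):

```
import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

val conf = new SparkConf().set("spark.io.compression.codec", "nonexistent")
CompressionCodec.createCodec(conf)
// => SparkIllegalArgumentException, errorClass = "CODEC_NOT_AVAILABLE", roughly:
//    The codec nonexistent is not available. Consider to set the config
//    "spark.io.compression.codec" to "snappy".
```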
[spark] branch master updated (5f325ec917c -> 85f2cb03c62)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5f325ec917c [SPARK-43747][PYTHON][CONNECT] Implement the pyfile support in SparkSession.addArtifacts add 85f2cb03c62 [SPARK-43493][SQL] Add a max distance argument to the levenshtein() function No new revisions were added by this update. Summary of changes: .../org/apache/spark/unsafe/types/UTF8String.java | 93 +- .../CheckConnectJvmClientCompatibility.scala | 1 + .../explain-results/function_levenshtein.explain | 2 +- .../catalyst/expressions/stringExpressions.scala | 136 +++-- .../expressions/StringExpressionsSuite.scala | 89 ++ .../scala/org/apache/spark/sql/functions.scala | 13 +- .../apache/spark/sql/StringFunctionsSuite.scala| 6 + 7 files changed, 327 insertions(+), 13 deletions(-)
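Usage sketch of the new threshold overload, mirroring the values exercised in the `StringFunctionsSuite` tests above (assumes an active Spark session with implicits imported): distances within the threshold are returned exactly, and -1 is returned once the distance exceeds it, which lets the computation bail out early.

```
import org.apache.spark.sql.functions.levenshtein
import spark.implicits._

val df = Seq(("kitten", "sitting")).toDF("l", "r")
df.select(levenshtein($"l", $"r")).head     // Row(3): the exact distance
df.select(levenshtein($"l", $"r", 3)).head  // Row(3): 3 <= threshold, still exact
df.select(levenshtein($"l", $"r", 0)).head  // Row(-1): distance exceeds threshold
```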
[spark] branch master updated: [SPARK-43649][SPARK-43650][SPARK-43651][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[1-3]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 32d42bbe98d [SPARK-43649][SPARK-43650][SPARK-43651][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[1-3] 32d42bbe98d is described below commit 32d42bbe98da9a7e8c38b9c3187c75dbbfbb Author: Jiaan Geng AuthorDate: Tue May 23 12:41:06 2023 +0300 [SPARK-43649][SPARK-43650][SPARK-43651][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[1-3] ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_240[1-3]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Exists test cases. Closes #41252 from beliefer/offset-limit-error-improve. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 49 -- .../sql/catalyst/analysis/CheckAnalysis.scala | 18 +++--- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 74 ++ .../sql-tests/analyzer-results/limit.sql.out | 24 --- .../analyzer-results/postgreSQL/limit.sql.out | 8 +-- .../test/resources/sql-tests/results/limit.sql.out | 24 --- .../sql-tests/results/postgreSQL/limit.sql.out | 8 +-- 7 files changed, 136 insertions(+), 69 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index af0471199b7..5d19d180053 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -1052,6 +1052,33 @@ ], "sqlState" : "42613" }, + "INVALID_LIMIT_LIKE_EXPRESSION" : { +"message" : [ + "The limit like expression is invalid." +], +"subClass" : { + "DATA_TYPE" : { +"message" : [ + "The expression must be integer type, but got ." +] + }, + "IS_NEGATIVE" : { +"message" : [ + "The expression must be equal to or greater than 0, but got ." +] + }, + "IS_NULL" : { +"message" : [ + "The evaluated expression must not be null." +] + }, + "IS_UNFOLDABLE" : { +"message" : [ + "The expression must evaluate to a constant value." +] + } +} + }, "INVALID_OPTIONS" : { "message" : [ "Invalid options:" @@ -1230,11 +1257,6 @@ } } }, - "LIMIT_LIKE_EXPRESSION_IS_UNFOLDABLE" : { -"message" : [ - "The expression must evaluate to a constant value, but got ." -] - }, "LOCATION_ALREADY_EXISTS" : { "message" : [ "Cannot name the managed table as , as its associated location already exists. Please pick a different table name, or remove the existing location first." @@ -5260,21 +5282,6 @@ "failed to evaluate expression : " ] }, - "_LEGACY_ERROR_TEMP_2401" : { -"message" : [ - "The expression must be integer type, but got ." -] - }, - "_LEGACY_ERROR_TEMP_2402" : { -"message" : [ - "The evaluated expression must not be null, but got ." -] - }, - "_LEGACY_ERROR_TEMP_2403" : { -"message" : [ - "The expression must be equal to or greater than 0, but got ." -] - }, "_LEGACY_ERROR_TEMP_2404" : { "message" : [ "Table is not partitioned." 
@@ -5673,4 +5680,4 @@ "Failed to get block , which is not a shuffle block" ] } -} +} \ No newline at end of file diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 3240f9bee56..407a9d363f4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -85,27 +85,29 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB private def checkLimitLikeClause(name: String, limitExpr: Expression): Unit = { limitExpr match { case e if !e.foldable => limitExpr.failAnalysis( -errorClass = "LIMIT_LIKE_EXPRESSION_IS_UNFOLDABLE", +errorClass = "INVALI
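Hypothetical queries mapping onto the four new subclasses (the same checks guard LIMIT, OFFSET and TAIL, hence "limit like"; foldability is checked first, per the `CheckAnalysis` diff above):

```
sql("SELECT * FROM t LIMIT true")               // DATA_TYPE: "BOOLEAN" is not an integer type
sql("SELECT * FROM t LIMIT -1")                 // IS_NEGATIVE
sql("SELECT * FROM t LIMIT CAST(NULL AS INT)")  // IS_NULL
sql("SELECT * FROM t LIMIT rand()")             // IS_UNFOLDABLE: not a constant
```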
[spark] branch master updated: [SPARK-43714][SQL][TESTS] When formatting `error-classes.json` file with `SparkThrowableSuite` , the last line of the file should be empty line
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c97a4b55e1d [SPARK-43714][SQL][TESTS] When formatting `error-classes.json` file with `SparkThrowableSuite` , the last line of the file should be empty line c97a4b55e1d is described below commit c97a4b55e1d2f29e576463dbc822f53e9f86a251 Author: panbingkun AuthorDate: Tue May 23 10:36:11 2023 +0300 [SPARK-43714][SQL][TESTS] When formatting `error-classes.json` file with `SparkThrowableSuite` , the last line of the file should be empty line ### What changes were proposed in this pull request? The pr aims to generate a blank line when formatting `error-classes.json` file using `SparkThrowableSuite`. ### Why are the changes needed? - When I format `error-classes.json` file using `SparkThrowableSuite`, I found the last blank line of the file will be erased, which does not comply with universal underlying code specifications, similar: python: https://www.flake8rules.com/rules/W391.html - Promote developer experience. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual testing. Closes #41256 from panbingkun/SPARK-43714. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index f5b5ad2ab10..e9554da082a 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -21,6 +21,8 @@ import java.io.File import java.nio.charset.StandardCharsets import java.nio.file.Files +import scala.util.Properties.lineSeparator + import com.fasterxml.jackson.annotation.JsonInclude.Include import com.fasterxml.jackson.core.JsonParser.Feature.STRICT_DUPLICATE_DETECTION import com.fasterxml.jackson.core.`type`.TypeReference @@ -92,7 +94,10 @@ class SparkThrowableSuite extends SparkFunSuite { val errorClassesFile = errorJsonFilePath.toFile logInfo(s"Regenerating error class file $errorClassesFile") Files.delete(errorClassesFile.toPath) -FileUtils.writeStringToFile(errorClassesFile, rewrittenString, StandardCharsets.UTF_8) +FileUtils.writeStringToFile( + errorClassesFile, + rewrittenString + lineSeparator, + StandardCharsets.UTF_8) } } else { assert(rewrittenString.trim == errorClassFileContents.trim)
[spark] branch master updated: [SPARK-43591][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0013
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0900419de8c [SPARK-43591][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0013
0900419de8c is described below

commit 0900419de8ca5d98b9921ec9ad2a8783e995f09c
Author: panbingkun
AuthorDate: Mon May 22 23:49:50 2023 +0300

[SPARK-43591][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0013

### What changes were proposed in this pull request?
This PR assigns a name, `NOT_ALLOWED_IN_FROM`, to the error class `_LEGACY_ERROR_TEMP_0013`.

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #41236 from panbingkun/SPARK-43591.

Lead-authored-by: panbingkun
Co-authored-by: panbingkun <84731...@qq.com>
Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json  | 27 --
 .../spark/sql/errors/QueryParsingErrors.scala     |  6 +--
 .../sql/catalyst/parser/PlanParserSuite.scala     | 62 --
 3 files changed, 82 insertions(+), 13 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index fbb94c59e0e..af0471199b7 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -1311,6 +1311,28 @@
     ],
     "sqlState" : "42000"
   },
+  "NOT_ALLOWED_IN_FROM" : {
+    "message" : [
+      "Not allowed in the FROM clause:"
+    ],
+    "subClass" : {
+      "LATERAL_WITH_PIVOT" : {
+        "message" : [
+          "LATERAL together with PIVOT."
+        ]
+      },
+      "LATERAL_WITH_UNPIVOT" : {
+        "message" : [
+          "LATERAL together with UNPIVOT."
+        ]
+      },
+      "UNPIVOT_WITH_PIVOT" : {
+        "message" : [
+          "UNPIVOT together with PIVOT."
+        ]
+      }
+    }
+  },
   "NOT_A_PARTITIONED_TABLE" : {
     "message" : [
       "Operation is not allowed for because it is not a partitioned table."
     ]
@@ -2209,11 +2231,6 @@
       "DISTRIBUTE BY is not supported."
     ]
   },
-  "_LEGACY_ERROR_TEMP_0013" : {
-    "message" : [
-      "LATERAL cannot be used together with PIVOT in FROM clause."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_0014" : {
     "message" : [
       "TABLESAMPLE does not accept empty inputs."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index 4b6c3645916..28abaeb70ec 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -102,15 +102,15 @@ private[sql] object QueryParsingErrors extends QueryErrorsBase {
   }
 
   def unpivotWithPivotInFromClauseNotAllowedError(ctx: ParserRuleContext): Throwable = {
-    new ParseException("UNPIVOT cannot be used together with PIVOT in FROM clause", ctx)
+    new ParseException(errorClass = "NOT_ALLOWED_IN_FROM.UNPIVOT_WITH_PIVOT", ctx)
   }
 
   def lateralWithPivotInFromClauseNotAllowedError(ctx: ParserRuleContext): Throwable = {
-    new ParseException(errorClass = "_LEGACY_ERROR_TEMP_0013", ctx)
+    new ParseException(errorClass = "NOT_ALLOWED_IN_FROM.LATERAL_WITH_PIVOT", ctx)
   }
 
   def lateralWithUnpivotInFromClauseNotAllowedError(ctx: ParserRuleContext): Throwable = {
-    new ParseException("LATERAL cannot be used together with UNPIVOT in FROM clause", ctx)
+    new ParseException(errorClass = "NOT_ALLOWED_IN_FROM.LATERAL_WITH_UNPIVOT", ctx)
   }
 
   def lateralJoinWithUsingJoinUnsupportedError(ctx: ParserRuleContext): Throwable = {
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
index 76be620f7bc..41e941da908 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
@@ -567,7 +567,7 @@ class PlanParserSuite extends AnalysisTest {
       "select * from t lateral view posexplode(x) posexpl as x, y",
       expected)
 
-    val sql =
+    val sql1 =
      """select *
        |from t
        |lateral vi
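A hedged sketch of what the rename means for callers: parsing a FROM clause that mixes LATERAL VIEW with PIVOT should now surface the named error class. The SQL shape below is illustrative, not taken from the PR's tests:

```scala
import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}

object LateralPivotDemo {
  def main(args: Array[String]): Unit = {
    // LATERAL VIEW combined with PIVOT in one FROM clause is rejected at
    // parse time. Table and column names here are made up for the example.
    val badSql =
      """SELECT * FROM t
        |LATERAL VIEW explode(xs) ex AS x
        |PIVOT (SUM(x) FOR y IN (1, 2))""".stripMargin
    try {
      CatalystSqlParser.parsePlan(badSql)
    } catch {
      case e: ParseException =>
        // Expected: NOT_ALLOWED_IN_FROM.LATERAL_WITH_PIVOT
        println(s"errorClass = ${e.getErrorClass}")
    }
  }
}
```

The same pattern applies to the `UNPIVOT_WITH_PIVOT` and `LATERAL_WITH_UNPIVOT` subclasses.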
[spark] branch master updated (ba2d785b994 -> 6d0607f94de)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

    from ba2d785b994 [SPARK-43290][SQL] Adds AES IV and AAD support to ExpressionImplUtils
     add 6d0607f94de [SPARK-43487][SQL] Fix Nested CTE error message

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json   |  7 +++
 .../spark/sql/errors/QueryCompilationErrors.scala  | 17 +++
 .../sql-tests/analyzer-results/cte-nested.sql.out  | 56 --
 .../resources/sql-tests/results/cte-nested.sql.out | 56 --
 .../sql/errors/QueryCompilationErrorsSuite.scala   | 27 +++
 5 files changed, 107 insertions(+), 56 deletions(-)
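For orientation, a hedged sketch of the nested-CTE shape whose error reporting this commit touches; the query, names, and behavior notes are illustrative and not taken from the PR:

```scala
import org.apache.spark.sql.SparkSession

object NestedCteDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("cte-demo").getOrCreate()
    // An inner CTE shadowing an outer CTE of the same name: the general shape
    // that nested-CTE analysis and its error messages deal with. The exact
    // conditions that raise the reworked error may differ from this example.
    spark.sql("""
      WITH t AS (SELECT 1 AS x),
           u AS (
             WITH t AS (SELECT 2 AS x)
             SELECT * FROM t
           )
      SELECT * FROM u
    """).show()  // the inner t takes precedence, so this prints x = 2
    spark.stop()
  }
}
```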
[spark] branch master updated: [SPARK-43290][SQL] Adds AES IV and AAD support to ExpressionImplUtils
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ba2d785b994 [SPARK-43290][SQL] Adds AES IV and AAD support to ExpressionImplUtils
ba2d785b994 is described below

commit ba2d785b99461871f588de6a8260f3201204f313
Author: Steve Weis
AuthorDate: Mon May 22 22:43:46 2023 +0300

[SPARK-43290][SQL] Adds AES IV and AAD support to ExpressionImplUtils

### What changes were proposed in this pull request?
This change adds support for optional IV and AAD fields to ExpressionImplUtils, the library underlying `aes_encrypt` and `aes_decrypt`. It lets callers specify their own initialization vectors (IVs) for specific use cases and take advantage of AES-GCM's optional additional authenticated data (AAD) input. This change does **not** yet add the support to the user-facing `aes_encrypt` and `aes_decrypt`; that will be added in a follow-up rather than in a single complex change.

### Why are the changes needed?
There are use cases where callers of ExpressionImplUtils via aes_encrypt may want to provide initialization vectors (IVs) or additional authenticated data (AAD). The most common cases are:
1. Ensuring that ciphertext matches values that have been encrypted by external tools. In those cases, the caller must provide an identical IV value.
2. For AES-CBC mode, some callers want to generate deterministic encrypted output.
3. For AES-GCM mode, providing an AAD field binds additional data to a ciphertext so that it can only be decrypted by a caller providing the same value. This is often used to enforce some context.

### Does this PR introduce _any_ user-facing change?
Not yet. This change adds support to the underlying implementation, but does not yet update the SQL support to include the new parameters.

### How was this patch tested?
All existing unit tests still pass, and new tests in `ExpressionImplUtilsSuite` exercise the new code paths:
```
build/sbt "sql/test:testOnly org.apache.spark.sql.DataFrameFunctionsSuite"
build/sbt "catalyst/test:testOnly org.apache.spark.sql.catalyst.expressions.ExpressionImplUtilsSuite"
```

Closes #40970 from sweisdb/SPARK-43290.

Lead-authored-by: Steve Weis
Co-authored-by: sweisdb <60895808+swei...@users.noreply.github.com>
Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json   |  17 +-
 .../catalyst/expressions/ExpressionImplUtils.java  |  98 ++--
 .../spark/sql/errors/QueryExecutionErrors.scala    |  28 ++-
 .../expressions/ExpressionImplUtilsSuite.scala     | 268 ++---
 .../sql/errors/QueryExecutionErrorsSuite.scala     |   4 +-
 5 files changed, 368 insertions(+), 47 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index b5b33758341..b3023fad83b 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -1074,11 +1074,16 @@
       "The value of parameter(s) in is invalid:"
     ],
     "subClass" : {
-      "AES_KEY" : {
+      "AES_CRYPTO_ERROR" : {
         "message" : [
           "detail message: "
         ]
       },
+      "AES_IV_LENGTH" : {
+        "message" : [
+          "supports 16-byte CBC IVs and 12-byte GCM IVs, but got bytes for ."
+        ]
+      },
       "AES_KEY_LENGTH" : {
         "message" : [
           "expects a binary value with 16, 24 or 32 bytes, but got bytes."
@@ -1839,6 +1844,16 @@
          "AES- with the padding by the function."
        ]
      },
+     "AES_MODE_AAD" : {
+       "message" : [
+         " with AES- does not support additional authenticate data (AAD)."
+       ]
+     },
+     "AES_MODE_IV" : {
+       "message" : [
+         " with AES- does not support initialization vectors (IVs)."
+       ]
+     },
      "ANALYZE_UNCACHED_TEMP_VIEW" : {
        "message" : [
          "The ANALYZE TABLE FOR COLUMNS command can operate on temporary views that have been cached already. Consider to cache the view ."
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
index 6843a348006..6aae649718a 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.jav
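To make the new IV and AAD parameters concrete, a standalone JDK-crypto sketch of AES-GCM with a caller-supplied IV and AAD. It mirrors the behavior the commit enables but is not the ExpressionImplUtils code itself:

```scala
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

// Standalone sketch of caller-supplied IVs and AAD for AES-GCM using the JDK
// crypto API. Illustrative only; not the ExpressionImplUtils implementation.
object AesGcmSketch {
  private val TagLengthBits = 128 // GCM authentication tag length
  private val GcmIvBytes = 12     // the 12-byte GCM IV size the new check enforces

  def encrypt(key: Array[Byte], plaintext: Array[Byte],
              iv: Array[Byte], aad: Array[Byte]): Array[Byte] = {
    require(iv.length == GcmIvBytes, s"GCM expects a $GcmIvBytes-byte IV")
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
      new GCMParameterSpec(TagLengthBits, iv))
    cipher.updateAAD(aad) // binds the AAD: decryption fails unless it matches
    cipher.doFinal(plaintext)
  }

  def decrypt(key: Array[Byte], ciphertext: Array[Byte],
              iv: Array[Byte], aad: Array[Byte]): Array[Byte] = {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
      new GCMParameterSpec(TagLengthBits, iv))
    cipher.updateAAD(aad)
    cipher.doFinal(ciphertext) // throws AEADBadTagException on IV/AAD mismatch
  }

  def main(args: Array[String]): Unit = {
    val rng = new SecureRandom()
    val key = new Array[Byte](16); rng.nextBytes(key)
    val iv = new Array[Byte](GcmIvBytes); rng.nextBytes(iv)
    val aad = "tenant-42".getBytes("UTF-8") // context bound into the ciphertext
    val ct = encrypt(key, "hello".getBytes("UTF-8"), iv, aad)
    println(new String(decrypt(key, ct, iv, aad), "UTF-8")) // prints: hello
  }
}
```

Reusing an IV with the same key is unsafe in GCM, which is why random per-message IVs are the default and caller-supplied IVs are reserved for interoperability cases like the ones listed in the commit message.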