[spark] branch master updated: [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 24293cab2de [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled`

24293cab2de is described below

commit 24293cab2de06a50ffd9f4871073e75481665bb8
Author: Max Gekk
AuthorDate: Tue Aug 22 15:32:32 2023 +0300

    [SPARK-43780][SQL][FOLLOWUP] Fix the config doc `spark.sql.optimizer.decorrelateJoinPredicate.enabled`

    ### What changes were proposed in this pull request?
    Add `s"` to the doc of the SQL config `spark.sql.optimizer.decorrelateJoinPredicate.enabled`.

    ### Why are the changes needed?
    To output the desired config name.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    By running CI.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #42607 from MaxGekk/followup-agubichev_spark-43780-corr-predicate.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 9b421251cf6..ca155683ec0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -4363,7 +4363,7 @@ object SQLConf {
       .internal()
       .doc("Decorrelate scalar and lateral subqueries with correlated references in join " +
         "predicates. This configuration is only effective when " +
-        "'${DECORRELATE_INNER_QUERY_ENABLED.key}' is true.")
+        s"'${DECORRELATE_INNER_QUERY_ENABLED.key}' is true.")
       .version("4.0.0")
       .booleanConf
       .createWithDefault(true)
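For context, a minimal sketch of the bug class this one-character commit fixes: in Scala, `${...}` is substituted only inside an `s`-interpolated string; in a plain string literal it stays verbatim, which is why the config doc printed the placeholder instead of the config name. The object and value names below are hypothetical, not the actual SQLConf code.

```scala
object InterpolationDemo extends App {
  val key = "spark.sql.optimizer.decorrelateInnerQuery.enabled"

  // Plain literal: `${key}` is NOT substituted, it appears in the output as-is.
  val broken = "This configuration is only effective when '${key}' is true."
  // s-interpolated literal: `${key}` is replaced by the value of `key`.
  val fixed  = s"This configuration is only effective when '${key}' is true."

  println(broken) // ... when '${key}' is true.
  println(fixed)  // ... when 'spark.sql.optimizer.decorrelateInnerQuery.enabled' is true.
}
```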
[spark] branch master updated: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 295c615b16b [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

295c615b16b is described below

commit 295c615b16b8a77f242ffa99006b4fb95f8f3487
Author: panbingkun
AuthorDate: Sat Aug 12 12:22:28 2023 +0500

    [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

    ### What changes were proposed in this pull request?
    The PR aims to assign names to the error classes, including:
    - _LEGACY_ERROR_TEMP_1009 => VIEW_EXCEED_MAX_NESTED_DEPTH
    - _LEGACY_ERROR_TEMP_1010 => UNSUPPORTED_VIEW_OPERATION.WITHOUT_SUGGESTION
    - _LEGACY_ERROR_TEMP_1013 => UNSUPPORTED_VIEW_OPERATION.WITH_SUGGESTION / UNSUPPORTED_TEMP_VIEW_OPERATION.WITH_SUGGESTION
    - _LEGACY_ERROR_TEMP_1014 => UNSUPPORTED_TEMP_VIEW_OPERATION.WITHOUT_SUGGESTION
    - _LEGACY_ERROR_TEMP_1015 => UNSUPPORTED_TABLE_OPERATION.WITH_SUGGESTION
    - _LEGACY_ERROR_TEMP_1016 => UNSUPPORTED_TEMP_VIEW_OPERATION.WITHOUT_SUGGESTION
    - _LEGACY_ERROR_TEMP_1278 => UNSUPPORTED_TABLE_OPERATION.WITHOUT_SUGGESTION

    ### Why are the changes needed?
    The changes improve the error framework.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Pass GA.
    - Manually test.
    - Update UT.

    Closes #42109 from panbingkun/SPARK-44404.

    Lead-authored-by: panbingkun
    Co-authored-by: panbingkun <84731...@qq.com>
    Signed-off-by: Max Gekk
---
 R/pkg/tests/fulltests/test_sparkSQL.R              |   3 +-
 .../src/main/resources/error/error-classes.json    |  91 ---
 ...ions-unsupported-table-operation-error-class.md |  36 +++
 ...-unsupported-temp-view-operation-error-class.md |  36 +++
 ...tions-unsupported-view-operation-error-class.md |  36 +++
 docs/sql-error-conditions.md                       |  30 +++
 .../spark/sql/catalyst/analysis/Analyzer.scala     |   9 +-
 .../sql/catalyst/analysis/v2ResolutionPlans.scala  |   4 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  32 ++-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  90 ---
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 104
 .../apache/spark/sql/execution/command/views.scala |   2 +-
 .../apache/spark/sql/internal/CatalogImpl.scala    |   2 +-
 .../analyzer-results/change-column.sql.out         |  16 +-
 .../sql-tests/results/change-column.sql.out        |  16 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |   7 +-
 .../apache/spark/sql/execution/SQLViewSuite.scala  | 267 ++---
 .../spark/sql/execution/SQLViewTestSuite.scala     |  23 +-
 .../AlterTableAddPartitionParserSuite.scala        |   4 +-
 .../AlterTableDropPartitionParserSuite.scala       |   8 +-
 .../AlterTableRecoverPartitionsParserSuite.scala   |   8 +-
 .../AlterTableRenamePartitionParserSuite.scala     |   4 +-
 .../command/AlterTableSetLocationParserSuite.scala |   6 +-
 .../command/AlterTableSetSerdeParserSuite.scala    |  16 +-
 .../spark/sql/execution/command/DDLSuite.scala     |  36 ++-
 .../command/MsckRepairTableParserSuite.scala       |  13 +-
 .../command/ShowPartitionsParserSuite.scala        |  10 +-
 .../command/TruncateTableParserSuite.scala         |   6 +-
 .../execution/command/TruncateTableSuiteBase.scala |  45 +++-
 .../execution/command/v1/ShowPartitionsSuite.scala |  57 -
 .../apache/spark/sql/internal/CatalogSuite.scala   |  13 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |  94 +++-
 32 files changed, 717 insertions(+), 407 deletions(-)

diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index d61501d248a..47688d7560c 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -4193,8 +4193,7 @@ test_that("catalog APIs, listTables, getTable, listColumns, listFunctions, funct
   # recoverPartitions does not work with temporary view
   expect_error(recoverPartitions("cars"),
-               paste("Error in recoverPartitions : analysis error - cars is a temp view.",
-                     "'recoverPartitions()' expects a table"), fixed = TRUE)
+               "[UNSUPPORTED_TEMP_VIEW_OPERATION.WITH_SUGGESTION]*`cars`*")
   expect_error(refreshTable("cars"), NA)
   expect_error(refreshByPath("/"), NA)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 133c2dd826c..08f79bcecbb 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -3394,12 +3394,63 @
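The R test change above shows the user-visible effect: the error condition name now appears in the message. As a hedged sketch of why named conditions matter to callers, any `SparkThrowable` (here `AnalysisException`) exposes the machine-readable name, so tests and tooling can match on it rather than on free-form text. This assumes an active SparkSession `spark`; the view name is hypothetical.

```scala
import org.apache.spark.sql.AnalysisException

try {
  // Running a table-only command against a temp view triggers the error.
  spark.sql("ALTER TABLE some_temp_view RECOVER PARTITIONS")
} catch {
  case e: AnalysisException =>
    // e.g. "UNSUPPORTED_TEMP_VIEW_OPERATION.WITH_SUGGESTION" after this change,
    // instead of an opaque "_LEGACY_ERROR_TEMP_101x" identifier.
    println(e.getErrorClass)
    println(e.getMessageParameters)
}
```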
[spark] branch master updated: [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b9fc5c03ed6 [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`

b9fc5c03ed6 is described below

commit b9fc5c03ed69e91d9c4cbe7ff5a1522c7b849568
Author: Max Gekk
AuthorDate: Sat Aug 12 11:08:39 2023 +0500

    [SPARK-44778][SQL] Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`

    ### What changes were proposed in this pull request?
    In the PR, I propose to extend the `primaryExpression` rules in `SqlBaseParser.g4` with one more function, `TIMEDIFF`, which accepts 3 args in the same way as the existing expression `TIMESTAMPDIFF`.

    ### Why are the changes needed?
    To achieve feature parity w/ other systems and make the migration to Spark SQL from such systems easier:
    1. Snowflake: https://docs.snowflake.com/en/sql-reference/functions/timediff
    2. MySQL/MariaDB: https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_timediff

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    By running the existing test suites:
    ```
    $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
    ```

    Closes #42435 from MaxGekk/timediff.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 docs/sql-ref-ansi-compliance.md                    |  1 +
 .../spark/sql/catalyst/parser/SqlBaseLexer.g4      |  1 +
 .../spark/sql/catalyst/parser/SqlBaseParser.g4     |  4 +-
 .../analyzer-results/ansi/timestamp.sql.out        | 68 ++
 .../analyzer-results/datetime-legacy.sql.out       | 68 ++
 .../sql-tests/analyzer-results/timestamp.sql.out   | 68 ++
 .../timestampNTZ/timestamp-ansi.sql.out            | 70 +++
 .../timestampNTZ/timestamp.sql.out                 | 70 +++
 .../test/resources/sql-tests/inputs/timestamp.sql  |  8 +++
 .../sql-tests/results/ansi/keywords.sql.out        |  1 +
 .../sql-tests/results/ansi/timestamp.sql.out       | 80 ++
 .../sql-tests/results/datetime-legacy.sql.out      | 80 ++
 .../resources/sql-tests/results/keywords.sql.out   |  1 +
 .../resources/sql-tests/results/timestamp.sql.out  | 80 ++
 .../results/timestampNTZ/timestamp-ansi.sql.out    | 80 ++
 .../results/timestampNTZ/timestamp.sql.out         | 80 ++
 .../ThriftServerWithSparkContextSuite.scala        |  2 +-
 17 files changed, 760 insertions(+), 2 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index f3a0e8f9afb..09c38a00995 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -636,6 +636,7 @@ Below is a list of all the keywords in Spark SQL.
 |TERMINATED|non-reserved|non-reserved|non-reserved|
 |THEN|reserved|non-reserved|reserved|
 |TIME|reserved|non-reserved|reserved|
+|TIMEDIFF|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP_LTZ|non-reserved|non-reserved|non-reserved|
 |TIMESTAMP_NTZ|non-reserved|non-reserved|non-reserved|

diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
index bf6370575a1..d9128de0f5d 100644
--- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
+++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
@@ -373,6 +373,7 @@ TEMPORARY: 'TEMPORARY' | 'TEMP';
 TERMINATED: 'TERMINATED';
 THEN: 'THEN';
 TIME: 'TIME';
+TIMEDIFF: 'TIMEDIFF';
 TIMESTAMP: 'TIMESTAMP';
 TIMESTAMP_LTZ: 'TIMESTAMP_LTZ';
 TIMESTAMP_NTZ: 'TIMESTAMP_NTZ';

diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
index a45ebee3106..7a69b10dadb 100644
--- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
+++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
@@ -953,7 +953,7 @@ datetimeUnit
 primaryExpression
     : name=(CURRENT_DATE | CURRENT_TIMESTAMP | CURRENT_USER | USER) #currentLike
     | name=(TIMESTAMPADD | DATEADD | DATE_ADD) LEFT_PAREN (unit=datetimeUnit | invalidUnit=stringLit) COMMA unitsAmount=valueExpression COMMA timestamp=valueExpression RIGHT_PAREN #timestampadd
-    | name=(TIMESTAMPDIFF | DATEDIFF | DATE_DIFF) LEFT_PAREN (unit=datetimeUnit | invalidUnit=stringLit) COMMA startTimestamp=valueExpression COMMA endTimestamp=valueExpression RIGHT_PAREN #timestampdiff
+    | name=(TIMESTAMPDIFF
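A hedged usage sketch of the new alias (assuming an active SparkSession `spark`; the timestamps are made up): after this change `TIMEDIFF` parses in the same 3-argument form as `TIMESTAMPDIFF`.

```scala
// TIMEDIFF(unit, startTimestamp, endTimestamp) as a 3-arg alias of TIMESTAMPDIFF.
spark.sql(
  """SELECT timediff(HOUR,
    |                TIMESTAMP'2023-08-12 09:00:00',
    |                TIMESTAMP'2023-08-12 12:30:00') AS diff""".stripMargin
).show()
// Expected: a single row with diff = 3 (whole hours elapsed), matching
// timestampdiff(HOUR, ...) on the same arguments.
```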
[spark] branch master updated: [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f7879b4c250 [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

f7879b4c250 is described below

commit f7879b4c2500046cd7d889ba94adedd3000f8c41
Author: Max Gekk
AuthorDate: Tue Aug 8 13:26:19 2023 +0500

    [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

    ### What changes were proposed in this pull request?
    In the PR, I propose to check whether the `DEFAULT` clause contains a parameter. If so, raise an appropriate error saying that the feature is not supported. Currently, table creation with `DEFAULT` containing any parameters finishes successfully even though parameters are not supported in this case:
    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    ++
    ||
    ++
    ++

    scala> spark.sql("describe t12");
    org.apache.spark.sql.AnalysisException: [INVALID_DEFAULT_VALUE.UNRESOLVED_EXPRESSION] Failed to execute EXISTS_DEFAULT command because the destination table column `c1` has a DEFAULT value :parm, which fails to resolve as a valid expression.
    ```

    ### Why are the changes needed?
    This improves user experience with Spark SQL by reporting the root cause of the issue.

    ### Does this PR introduce _any_ user-facing change?
    Yes. After the change, the table creation fails w/ the error:
    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    org.apache.spark.sql.catalyst.parser.ParseException:
    [UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] The feature is not supported: Parameter markers are not allowed in DEFAULT.(line 1, pos 32)

    == SQL ==
    CREATE TABLE t12(c1 int default :parm)
    ^^^
    ```

    ### How was this patch tested?
    By running new test:
    ```
    $ build/sbt "test:testOnly *ParametersSuite"
    ```

    Closes #42365 from MaxGekk/fix-param-in-DEFAULT.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 12 
 .../test/scala/org/apache/spark/sql/ParametersSuite.scala | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 1b9dda51bf0..0635e6a1b44 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -40,6 +40,7 @@ import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.TreePattern.PARAMETER
 import org.apache.spark.sql.catalyst.types.DataTypeUtils
 import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, GeneratedColumn, IntervalUtils, ResolveDefaultColumns}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, convertSpecialTimestampNTZ, getZoneId, stringToDate, stringToTimestamp, stringToTimestampWithoutTimeZone}
@@ -3153,9 +3154,12 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     ctx.asScala.headOption.map(visitLocationSpec)
   }

-  private def verifyAndGetExpression(exprCtx: ExpressionContext): String = {
+  private def verifyAndGetExpression(exprCtx: ExpressionContext, place: String): String = {
     // Make sure it can be converted to Catalyst expressions.
-    expression(exprCtx)
+    val expr = expression(exprCtx)
+    if (expr.containsPattern(PARAMETER)) {
+      throw QueryParsingErrors.parameterMarkerNotAllowed(place, expr.origin)
+    }
     // Extract the raw expression text so that we can save the user provided text. We don't
     // use `Expression.sql` to avoid storing incorrect text caused by bugs in any expression's
     // `sql` method. Note: `exprCtx.getText` returns a string without spaces, so we need to
@@ -3170,7 +3174,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
   override def visitDefaultExpression(ctx: DefaultExpressionContext): String =
     withOrigin(ctx) {
-      verifyAndGetExpression(ctx.expression())
+      verifyAndGetExpression(ctx.expression(), "DEFAULT")
     }

   /**
@@ -3178,7 +3182,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
   override def v
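As a contrast to the failing example above, a hedged sketch of where named parameter markers remain legal (assuming an active SparkSession `spark`):

```scala
// Parameter markers still work in query positions...
spark.sql("SELECT :parm + 1 AS v", args = Map("parm" -> 5)).show()  // v = 6

// ...but after this change a marker inside DEFAULT fails at parse time:
spark.sql("CREATE TABLE t12(c1 INT DEFAULT :parm)", args = Map("parm" -> 5))
// org.apache.spark.sql.catalyst.parser.ParseException:
// [UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] ...
```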
[spark] branch branch-3.5 updated: [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new b623c28f521 [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

b623c28f521 is described below

commit b623c28f521e350b0f4bf15bfb911ca6bf0b1a80
Author: Max Gekk
AuthorDate: Tue Aug 8 13:26:19 2023 +0500

    [SPARK-44680][SQL] Improve the error for parameters in `DEFAULT`

    ### What changes were proposed in this pull request?
    In the PR, I propose to check whether the `DEFAULT` clause contains a parameter. If so, raise an appropriate error saying that the feature is not supported. Currently, table creation with `DEFAULT` containing any parameters finishes successfully even though parameters are not supported in this case:
    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    ++
    ||
    ++
    ++

    scala> spark.sql("describe t12");
    org.apache.spark.sql.AnalysisException: [INVALID_DEFAULT_VALUE.UNRESOLVED_EXPRESSION] Failed to execute EXISTS_DEFAULT command because the destination table column `c1` has a DEFAULT value :parm, which fails to resolve as a valid expression.
    ```

    ### Why are the changes needed?
    This improves user experience with Spark SQL by reporting the root cause of the issue.

    ### Does this PR introduce _any_ user-facing change?
    Yes. After the change, the table creation fails w/ the error:
    ```sql
    scala> spark.sql("CREATE TABLE t12(c1 int default :parm)", args = Map("parm" -> 5)).show()
    org.apache.spark.sql.catalyst.parser.ParseException:
    [UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT] The feature is not supported: Parameter markers are not allowed in DEFAULT.(line 1, pos 32)

    == SQL ==
    CREATE TABLE t12(c1 int default :parm)
    ^^^
    ```

    ### How was this patch tested?
    By running new test:
    ```
    $ build/sbt "test:testOnly *ParametersSuite"
    ```

    Closes #42365 from MaxGekk/fix-param-in-DEFAULT.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
    (cherry picked from commit f7879b4c2500046cd7d889ba94adedd3000f8c41)
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 12 
 .../test/scala/org/apache/spark/sql/ParametersSuite.scala | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 7a28efa3e42..83938632e53 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -40,6 +40,7 @@ import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.TreePattern.PARAMETER
 import org.apache.spark.sql.catalyst.types.DataTypeUtils
 import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, GeneratedColumn, IntervalUtils, ResolveDefaultColumns}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, convertSpecialTimestampNTZ, getZoneId, stringToDate, stringToTimestamp, stringToTimestampWithoutTimeZone}
@@ -3130,9 +3131,12 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
     ctx.asScala.headOption.map(visitLocationSpec)
   }

-  private def verifyAndGetExpression(exprCtx: ExpressionContext): String = {
+  private def verifyAndGetExpression(exprCtx: ExpressionContext, place: String): String = {
     // Make sure it can be converted to Catalyst expressions.
-    expression(exprCtx)
+    val expr = expression(exprCtx)
+    if (expr.containsPattern(PARAMETER)) {
+      throw QueryParsingErrors.parameterMarkerNotAllowed(place, expr.origin)
+    }
     // Extract the raw expression text so that we can save the user provided text. We don't
     // use `Expression.sql` to avoid storing incorrect text caused by bugs in any expression's
     // `sql` method. Note: `exprCtx.getText` returns a string without spaces, so we need to
@@ -3147,7 +3151,7 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging {
    */
   override def visitDefaultExpression(ctx: DefaultExpressionContext): String =
     withOrigin(ctx) {
-      verifyAndGetExpression(ctx.expression())
+      verifyAndGetExpression(ctx.expression(), "DEFAULT")
     }

   /**
@@ -3155,7 +3
[spark] branch master updated: [SPARK-38475][CORE] Use error class in org.apache.spark.serializer
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2a23c7a18a0 [SPARK-38475][CORE] Use error class in org.apache.spark.serializer

2a23c7a18a0 is described below

commit 2a23c7a18a0ba75d95ee1d898896a8f0dc2c5531
Author: Bo Zhang
AuthorDate: Mon Aug 7 22:10:01 2023 +0500

    [SPARK-38475][CORE] Use error class in org.apache.spark.serializer

    ### What changes were proposed in this pull request?
    This PR aims to change exceptions created in package org.apache.spark.serializer to use error classes.

    ### Why are the changes needed?
    This is to move exceptions created in package org.apache.spark.serializer onto error classes.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Existing tests.

    Closes #42243 from bozhang2820/spark-38475.

    Lead-authored-by: Bo Zhang
    Co-authored-by: Bo Zhang
    Signed-off-by: Max Gekk
---
 .../src/main/resources/error/error-classes.json    | 21 +
 .../spark/serializer/GenericAvroSerializer.scala   |  6 ++---
 .../apache/spark/serializer/KryoSerializer.scala   | 27 --
 docs/sql-error-conditions.md                       | 24 +++
 4 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 680f787429c..0ea1eed35e4 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -831,6 +831,11 @@
         "Not found an encoder of the type <typeName> to Spark SQL internal representation. Consider to change the input type to one of supported at '<docroot>/sql-ref-datatypes.html'."
       ]
     },
+    "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT" : {
+      "message" : [
+        "Error reading avro data -- encountered an unknown fingerprint: <fingerprint>, not sure what schema to use. This could happen if you registered additional schemas after starting your spark context."
+      ]
+    },
     "EVENT_TIME_IS_NOT_ON_TIMESTAMP_TYPE" : {
       "message" : [
         "The event time <eventName> has the invalid type <eventType>, but expected \"TIMESTAMP\"."
       ]
     },
@@ -864,6 +869,11 @@
       ],
       "sqlState" : "22018"
     },
+    "FAILED_REGISTER_CLASS_WITH_KRYO" : {
+      "message" : [
+        "Failed to register classes with Kryo."
+      ]
+    },
     "FAILED_RENAME_PATH" : {
       "message" : [
         "Failed to rename <sourcePath> to <targetPath> as destination already exists."
@@ -1564,6 +1574,12 @@
       ],
       "sqlState" : "22032"
     },
+    "INVALID_KRYO_SERIALIZER_BUFFER_SIZE" : {
+      "message" : [
+        "The value of the config \"<bufferSizeConfKey>\" must be less than 2048 MiB, but got <bufferSize> MiB."
+      ],
+      "sqlState" : "F0000"
+    },
     "INVALID_LAMBDA_FUNCTION_CALL" : {
       "message" : [
         "Invalid lambda function call."
@@ -2006,6 +2022,11 @@
         "The join condition <joinCondition> has the invalid type <conditionType>, expected \"BOOLEAN\"."
       ]
     },
+    "KRYO_BUFFER_OVERFLOW" : {
+      "message" : [
+        "Kryo serialization failed: <exceptionMsg>. To avoid this, increase \"<bufferSizeConfKey>\" value."
+      ]
+    },
     "LOAD_DATA_PATH_NOT_EXISTS" : {
       "message" : [
         "LOAD DATA input path does not exist: <path>."

diff --git a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
index 7d2923fdf37..d09abff2773 100644
--- a/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/GenericAvroSerializer.scala
@@ -140,9 +140,9 @@ private[serializer] class GenericAvroSerializer[D <: GenericContainer]
           case Some(s) => new Schema.Parser().setValidateDefaults(false).parse(s)
           case None => throw new SparkException(
-            "Error reading attempting to read avro data -- encountered an unknown " +
-              s"fingerprint: $fingerprint, not sure what schema to use. This could happen " +
-              "if you registered additional schemas after starting your spark context.")
+            errorClass = "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT",
+            messageParameters = Map("fingerprint" -> fingerprint.toString),
+            cause = null)
         }
       })
     } else {

diff --git a/core/src/main/scala/org/apache/spark/serializer/KryoSe
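The visible part of the diff already shows the migration pattern; as a hedged sketch, this is the general shape of raising an error-class-based exception (the parameter value below is made up):

```scala
import org.apache.spark.SparkException

// Instead of concatenating an English sentence inline, the exception carries
// a stable error class plus named message parameters; the user-facing text is
// rendered from the template in error-classes.json.
throw new SparkException(
  errorClass = "ERROR_READING_AVRO_UNKNOWN_FINGERPRINT",
  messageParameters = Map("fingerprint" -> 1234567890L.toString),
  cause = null)
```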
[spark] branch master updated (1f10cc4a594 -> f139733b92d)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 1f10cc4a594 [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic
     add f139733b92d [SPARK-42321][SQL] Assign name to _LEGACY_ERROR_TEMP_2133

No new revisions were added by this update.

Summary of changes:
 .../utils/src/main/resources/error/error-classes.json  | 10 +-
 ...ditions-malformed-record-in-parsing-error-class.md  |  4
 .../spark/sql/catalyst/json/JacksonParser.scala        |  8
 .../spark/sql/catalyst/util/BadRecordException.scala   |  9 +
 .../spark/sql/catalyst/util/FailureSafeParser.scala    |  3 +++
 .../spark/sql/errors/QueryExecutionErrors.scala        | 19 ---
 .../spark/sql/errors/QueryExecutionErrorsSuite.scala   | 17 +
 7 files changed, 54 insertions(+), 16 deletions(-)
[spark] branch master updated: [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1f10cc4a594 [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic

1f10cc4a594 is described below

commit 1f10cc4a59457ed0de0fd4dc0a1c61514d77261a
Author: panbingkun
AuthorDate: Mon Aug 7 12:01:47 2023 +0500

    [SPARK-44628][SQL] Clear some unused codes in "***Errors" and extract some common logic

    ### What changes were proposed in this pull request?
    The PR aims to clear some unused code in "***Errors" and extract some common logic.

    ### Why are the changes needed?
    Make the code clearer.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass GA.

    Closes #42238 from panbingkun/clear_error.

    Authored-by: panbingkun
    Signed-off-by: Max Gekk
---
 .../apache/spark/sql/errors/DataTypeErrors.scala   | 18 ++---
 .../apache/spark/sql/errors/QueryErrorsBase.scala  |  6 +-
 .../spark/sql/errors/QueryExecutionErrors.scala    | 86 --
 3 files changed, 10 insertions(+), 100 deletions(-)

diff --git a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
index 7a34a386cd8..5e52e283338 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
@@ -192,15 +192,7 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {
       decimalPrecision: Int,
       decimalScale: Int,
       context: SQLQueryContext = null): ArithmeticException = {
-    new SparkArithmeticException(
-      errorClass = "NUMERIC_VALUE_OUT_OF_RANGE",
-      messageParameters = Map(
-        "value" -> value.toPlainString,
-        "precision" -> decimalPrecision.toString,
-        "scale" -> decimalScale.toString,
-        "config" -> toSQLConf("spark.sql.ansi.enabled")),
-      context = getQueryContext(context),
-      summary = getSummary(context))
+    numericValueOutOfRange(value, decimalPrecision, decimalScale, context)
   }

   def cannotChangeDecimalPrecisionError(
@@ -208,6 +200,14 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {
       decimalPrecision: Int,
       decimalScale: Int,
       context: SQLQueryContext = null): ArithmeticException = {
+    numericValueOutOfRange(value, decimalPrecision, decimalScale, context)
+  }
+
+  private def numericValueOutOfRange(
+      value: Decimal,
+      decimalPrecision: Int,
+      decimalScale: Int,
+      context: SQLQueryContext): ArithmeticException = {
     new SparkArithmeticException(
       errorClass = "NUMERIC_VALUE_OUT_OF_RANGE",
       messageParameters = Map(

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
index db256fbee87..26600117a0c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
@@ -18,7 +18,7 @@ package org.apache.spark.sql.errors

 import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
-import org.apache.spark.sql.catalyst.util.{toPrettySQL, QuotingUtils}
+import org.apache.spark.sql.catalyst.util.toPrettySQL
 import org.apache.spark.sql.types.{DataType, DoubleType, FloatType}

 /**
@@ -55,10 +55,6 @@ private[sql] trait QueryErrorsBase extends DataTypeErrorsBase {
     quoteByDefault(toPrettySQL(e))
   }

-  def toSQLSchema(schema: String): String = {
-    QuotingUtils.toSQLSchema(schema)
-  }
-
   // Converts an error class parameter to its SQL representation
   def toSQLValue(v: Any, t: DataType): String = Literal.create(v, t) match {
     case Literal(null, _) => "NULL"

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 45b5d6b6692..f960a091ec0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -32,7 +32,6 @@ import org.apache.spark._
 import org.apache.spark.launcher.SparkLauncher
 import org.apache.spark.memory.SparkOutOfMemoryError
 import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.ScalaReflection.Schema
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.UnresolvedGenerator
 import org.ap
[spark] branch branch-3.5 updated: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new a1ca1e6e763 [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

a1ca1e6e763 is described below

commit a1ca1e6e7633c3fbb36427a82635cda7d21f1dab
Author: Koray Beyaz
AuthorDate: Thu Aug 3 10:57:26 2023 +0500

    [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

    ### What changes were proposed in this pull request?
    - Rename _LEGACY_ERROR_TEMP_2175 as RULE_ID_NOT_FOUND
    - Add a test case for the error class.

    ### Why are the changes needed?
    We are migrating onto error classes

    ### Does this PR introduce _any_ user-facing change?
    Yes, the error message will include the error class name

    ### How was this patch tested?
    `testOnly *RuleIdCollectionSuite` and Github Actions

    Closes #40991 from kori73/SPARK-42330.

    Lead-authored-by: Koray Beyaz
    Co-authored-by: Koray Beyaz
    Signed-off-by: Max Gekk
    (cherry picked from commit f824d058b14e3c58b1c90f64fefc45fac105c7dd)
    Signed-off-by: Max Gekk
---
 common/utils/src/main/resources/error/error-classes.json      | 11 ++-
 docs/sql-error-conditions.md                                  |  6 ++
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala    |  5 ++---
 .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala   | 11 +++
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index df425d7b2df..d9d1963c958 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2412,6 +2412,12 @@
     ],
     "sqlState" : "42883"
   },
+  "RULE_ID_NOT_FOUND" : {
+    "message" : [
+      "Not found an id for the rule name \"<ruleName>\". Please modify RuleIdCollection.scala if you are adding a new rule."
+    ],
+    "sqlState" : "22023"
+  },
   "SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION" : {
     "message" : [
       "The correlated scalar subquery '<sqlExpr>' is neither present in GROUP BY, nor in an aggregate function. Add it to GROUP BY using ordinal position or wrap it in `first()` (or `first_value`) if you don't care which value you get."
@@ -5425,11 +5431,6 @@
       "."
     ]
   },
-  "_LEGACY_ERROR_TEMP_2175" : {
-    "message" : [
-      "Rule id not found for <ruleName>. Please modify RuleIdCollection.scala if you are adding a new rule."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_2176" : {
     "message" : [
       "Cannot create array with elements of data due to exceeding the limit elements for ArrayData. "

diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 9e2a484d057..e1430e94db5 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1578,6 +1578,12 @@ The function `<routineName>` cannot be found. Verify the spelling and correctnes
 If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
 To tolerate the error on drop use DROP FUNCTION IF EXISTS.

+### RULE_ID_NOT_FOUND
+
+[SQLSTATE: 22023](sql-error-conditions-sqlstates.html#class-22-data-exception)
+
+Not found an id for the rule name "`<ruleName>`". Please modify RuleIdCollection.scala if you are adding a new rule.
+
 ### SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION

 SQLSTATE: none assigned

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 89c080409e2..7685e0f907c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -1584,9 +1584,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
   def ruleIdNotFoundForRuleError(ruleName: String): Throwable = {
     new SparkException(
-      errorClass = "_LEGACY_ERROR_TEMP_2175",
-      messageParameters = Map(
-        "ruleName" -> ruleName),
+      errorClass = "RULE_ID_NOT_FOUND",
+      messageParameters = Map("ruleName" -> ruleName),
       cause = null)
   }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/
[spark] branch master updated: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f824d058b14 [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

f824d058b14 is described below

commit f824d058b14e3c58b1c90f64fefc45fac105c7dd
Author: Koray Beyaz
AuthorDate: Thu Aug 3 10:57:26 2023 +0500

    [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

    ### What changes were proposed in this pull request?
    - Rename _LEGACY_ERROR_TEMP_2175 as RULE_ID_NOT_FOUND
    - Add a test case for the error class.

    ### Why are the changes needed?
    We are migrating onto error classes

    ### Does this PR introduce _any_ user-facing change?
    Yes, the error message will include the error class name

    ### How was this patch tested?
    `testOnly *RuleIdCollectionSuite` and Github Actions

    Closes #40991 from kori73/SPARK-42330.

    Lead-authored-by: Koray Beyaz
    Co-authored-by: Koray Beyaz
    Signed-off-by: Max Gekk
---
 common/utils/src/main/resources/error/error-classes.json      | 11 ++-
 docs/sql-error-conditions.md                                  |  6 ++
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala    |  5 ++---
 .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala   | 11 +++
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index a9619b97bd9..20f2ab4eb24 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2471,6 +2471,12 @@
     ],
     "sqlState" : "42883"
   },
+  "RULE_ID_NOT_FOUND" : {
+    "message" : [
+      "Not found an id for the rule name \"<ruleName>\". Please modify RuleIdCollection.scala if you are adding a new rule."
+    ],
+    "sqlState" : "22023"
+  },
   "SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION" : {
     "message" : [
       "The correlated scalar subquery '<sqlExpr>' is neither present in GROUP BY, nor in an aggregate function. Add it to GROUP BY using ordinal position or wrap it in `first()` (or `first_value`) if you don't care which value you get."
@@ -5489,11 +5495,6 @@
       "."
     ]
   },
-  "_LEGACY_ERROR_TEMP_2175" : {
-    "message" : [
-      "Rule id not found for <ruleName>. Please modify RuleIdCollection.scala if you are adding a new rule."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_2176" : {
     "message" : [
       "Cannot create array with elements of data due to exceeding the limit elements for ArrayData. "

diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 161f3bdbef1..5609d60f974 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1586,6 +1586,12 @@ The function `<routineName>` cannot be found. Verify the spelling and correctnes
 If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
 To tolerate the error on drop use DROP FUNCTION IF EXISTS.

+### RULE_ID_NOT_FOUND
+
+[SQLSTATE: 22023](sql-error-conditions-sqlstates.html#class-22-data-exception)
+
+Not found an id for the rule name "`<ruleName>`". Please modify RuleIdCollection.scala if you are adding a new rule.
+
 ### SCALAR_SUBQUERY_IS_IN_GROUP_BY_OR_AGGREGATE_FUNCTION

 SQLSTATE: none assigned

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 3622ffebb74..45b5d6b6692 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -1584,9 +1584,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionE
   def ruleIdNotFoundForRuleError(ruleName: String): Throwable = {
     new SparkException(
-      errorClass = "_LEGACY_ERROR_TEMP_2175",
-      messageParameters = Map(
-        "ruleName" -> ruleName),
+      errorClass = "RULE_ID_NOT_FOUND",
+      messageParameters = Map("ruleName" -> ruleName),
       cause = null)
   }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/s
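The test-suite diff is truncated above; as a hedged sketch of the kind of test this PR adds (the actual test lives in QueryExecutionErrorsSuite; the rule name below is deliberately bogus, and `checkError()`/`intercept` come from Spark's test bases):

```scala
import org.apache.spark.SparkException
import org.apache.spark.sql.catalyst.rules.RuleIdCollection

// Looking up an unknown rule name should now raise the named error class
// with its "ruleName" parameter, which checkError() matches structurally.
checkError(
  exception = intercept[SparkException] {
    RuleIdCollection.getRuleId("DefinitelyNotARealRule")
  },
  errorClass = "RULE_ID_NOT_FOUND",
  parameters = Map("ruleName" -> "DefinitelyNotARealRule"))
```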
[spark] branch branch-3.5 updated: [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new c47d9b1bcf6 [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names

c47d9b1bcf6 is described below

commit c47d9b1bcf61f65a7078d43361b438fd56d0af81
Author: panbingkun
AuthorDate: Wed Aug 2 10:51:16 2023 +0500

    [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names

    ### What changes were proposed in this pull request?
    The pr aims to
    1. Use `checkError()` to check Exception in `command` Suite.
    2. Assign some error class names, include: `UNSUPPORTED_FEATURE.PURGE_PARTITION` and `UNSUPPORTED_FEATURE.PURGE_TABLE`.

    ### Why are the changes needed?
    The changes improve the error framework.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Manually test.
    - Pass GA.

    Closes #42169 from panbingkun/checkError_for_command.

    Authored-by: panbingkun
    Signed-off-by: Max Gekk
    (cherry picked from commit 4ec27c3801aaa0cbba3e086c278a0ff96260b84a)
    Signed-off-by: Max Gekk
---
 .../src/main/resources/error/error-classes.json    | 10
 ...r-conditions-unsupported-feature-error-class.md |  8 ++
 .../catalog/SupportsAtomicPartitionManagement.java |  3 ++-
 .../catalog/SupportsPartitionManagement.java       |  3 ++-
 .../spark/sql/connector/catalog/TableCatalog.java  |  3 ++-
 .../spark/sql/errors/QueryExecutionErrors.scala    | 12 +
 .../SupportsAtomicPartitionManagementSuite.scala   | 13 ++
 .../catalog/SupportsPartitionManagementSuite.scala | 13 ++
 .../command/v1/AlterTableAddPartitionSuite.scala   | 14 ++
 .../command/v1/AlterTableDropPartitionSuite.scala  | 12 +
 .../command/v1/AlterTableRenameSuite.scala         | 11 +---
 .../command/v1/AlterTableSetLocationSuite.scala    | 11 +---
 .../command/v1/ShowCreateTableSuite.scala          | 12 +
 .../sql/execution/command/v1/ShowTablesSuite.scala | 22 ++--
 .../execution/command/v1/TruncateTableSuite.scala  | 11 +---
 .../command/v2/AlterTableDropPartitionSuite.scala  | 12 ++---
 .../v2/AlterTableRecoverPartitionsSuite.scala      | 11 +---
 .../command/v2/AlterTableSetLocationSuite.scala    | 12 +
 .../sql/execution/command/v2/DropTableSuite.scala  | 12 ++---
 .../command/v2/MsckRepairTableSuite.scala          | 11 +---
 .../sql/execution/command/v2/ShowTablesSuite.scala | 11 +---
 .../execution/command/ShowCreateTableSuite.scala   | 30 +-
 22 files changed, 172 insertions(+), 85 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 385435c740e..480ec636283 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2956,6 +2956,16 @@
         "Pivoting by the value '<value>' of the column data type <type>."
       ]
     },
+    "PURGE_PARTITION" : {
+      "message" : [
+        "Partition purge."
+      ]
+    },
+    "PURGE_TABLE" : {
+      "message" : [
+        "Purge table."
+      ]
+    },
     "PYTHON_UDF_IN_ON_CLAUSE" : {
       "message" : [
         "Python UDF in the ON clause of a <joinType> JOIN. In case of an INNNER JOIN consider rewriting to a CROSS JOIN with a WHERE clause."

diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md b/docs/sql-error-conditions-unsupported-feature-error-class.md
index aa1c622c458..7a60dc76fa6 100644
--- a/docs/sql-error-conditions-unsupported-feature-error-class.md
+++ b/docs/sql-error-conditions-unsupported-feature-error-class.md
@@ -141,6 +141,14 @@ PIVOT clause following a GROUP BY clause. Consider pushing the GROUP BY into a s
 Pivoting by the value '`<value>`' of the column data type `<type>`.

+## PURGE_PARTITION
+
+Partition purge.
+
+## PURGE_TABLE
+
+Purge table.
+
 ## PYTHON_UDF_IN_ON_CLAUSE

 Python UDF in the ON clause of a `<joinType>` JOIN. In case of an INNNER JOIN consider rewriting to a CROSS JOIN with a WHERE clause.

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
index 3eb9bf9f913..48c6392d2b8 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
@@ -23,6
[spark] branch master updated: [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4ec27c3801a [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names

4ec27c3801a is described below

commit 4ec27c3801aaa0cbba3e086c278a0ff96260b84a
Author: panbingkun
AuthorDate: Wed Aug 2 10:51:16 2023 +0500

    [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names

    ### What changes were proposed in this pull request?
    The pr aims to
    1. Use `checkError()` to check Exception in `command` Suite.
    2. Assign some error class names, include: `UNSUPPORTED_FEATURE.PURGE_PARTITION` and `UNSUPPORTED_FEATURE.PURGE_TABLE`.

    ### Why are the changes needed?
    The changes improve the error framework.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Manually test.
    - Pass GA.

    Closes #42169 from panbingkun/checkError_for_command.

    Authored-by: panbingkun
    Signed-off-by: Max Gekk
---
 .../src/main/resources/error/error-classes.json    | 10
 ...r-conditions-unsupported-feature-error-class.md |  8 ++
 .../catalog/SupportsAtomicPartitionManagement.java |  3 ++-
 .../catalog/SupportsPartitionManagement.java       |  3 ++-
 .../spark/sql/connector/catalog/TableCatalog.java  |  3 ++-
 .../spark/sql/errors/QueryExecutionErrors.scala    | 12 +
 .../SupportsAtomicPartitionManagementSuite.scala   | 13 ++
 .../catalog/SupportsPartitionManagementSuite.scala | 13 ++
 .../command/v1/AlterTableAddPartitionSuite.scala   | 14 ++
 .../command/v1/AlterTableDropPartitionSuite.scala  | 12 +
 .../command/v1/AlterTableRenameSuite.scala         | 11 +---
 .../command/v1/AlterTableSetLocationSuite.scala    | 11 +---
 .../command/v1/ShowCreateTableSuite.scala          | 12 +
 .../sql/execution/command/v1/ShowTablesSuite.scala | 22 ++--
 .../execution/command/v1/TruncateTableSuite.scala  | 11 +---
 .../command/v2/AlterTableDropPartitionSuite.scala  | 12 ++---
 .../v2/AlterTableRecoverPartitionsSuite.scala      | 11 +---
 .../command/v2/AlterTableSetLocationSuite.scala    | 12 +
 .../sql/execution/command/v2/DropTableSuite.scala  | 12 ++---
 .../command/v2/MsckRepairTableSuite.scala          | 11 +---
 .../sql/execution/command/v2/ShowTablesSuite.scala | 11 +---
 .../execution/command/ShowCreateTableSuite.scala   | 30 +-
 22 files changed, 172 insertions(+), 85 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 7012c66c895..06350522834 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -3020,6 +3020,16 @@
         "Pivoting by the value '<value>' of the column data type <type>."
       ]
     },
+    "PURGE_PARTITION" : {
+      "message" : [
+        "Partition purge."
+      ]
+    },
+    "PURGE_TABLE" : {
+      "message" : [
+        "Purge table."
+      ]
+    },
     "PYTHON_UDF_IN_ON_CLAUSE" : {
       "message" : [
         "Python UDF in the ON clause of a <joinType> JOIN. In case of an INNNER JOIN consider rewriting to a CROSS JOIN with a WHERE clause."

diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md b/docs/sql-error-conditions-unsupported-feature-error-class.md
index aa1c622c458..7a60dc76fa6 100644
--- a/docs/sql-error-conditions-unsupported-feature-error-class.md
+++ b/docs/sql-error-conditions-unsupported-feature-error-class.md
@@ -141,6 +141,14 @@ PIVOT clause following a GROUP BY clause. Consider pushing the GROUP BY into a s
 Pivoting by the value '`<value>`' of the column data type `<type>`.

+## PURGE_PARTITION
+
+Partition purge.
+
+## PURGE_TABLE
+
+Purge table.
+
 ## PYTHON_UDF_IN_ON_CLAUSE

 Python UDF in the ON clause of a `<joinType>` JOIN. In case of an INNNER JOIN consider rewriting to a CROSS JOIN with a WHERE clause.

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
index 3eb9bf9f913..48c6392d2b8 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
@@ -23,6 +23,7 @@ import org.apache.spark.annotation.Experimental;
 import org.apache.spar
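A hedged sketch of how one of the new names surfaces, modeled on the v2 command suites touched above (the catalog and table names are hypothetical, and the exact exception type may differ by code path; `sql`, `intercept`, and `checkError` come from Spark's test bases):

```scala
import org.apache.spark.SparkUnsupportedOperationException

// Dropping a v2 table with PURGE, where the catalog does not implement
// purge support, should now report the named UNSUPPORTED_FEATURE condition.
checkError(
  exception = intercept[SparkUnsupportedOperationException] {
    sql("DROP TABLE testcat.ns.tbl PURGE")
  },
  errorClass = "UNSUPPORTED_FEATURE.PURGE_TABLE",
  parameters = Map.empty)
```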
[spark] branch branch-3.4 updated: [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 53383fcd2be [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`

53383fcd2be is described below

commit 53383fcd2be178f4f0d231334ee36f1c3d67f64d
Author: Max Gekk
AuthorDate: Fri Jul 14 08:37:29 2023 +0300

    [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`

    ### What changes were proposed in this pull request?
    In the PR, I propose to check the number of argument types in the `InvokeLike` expressions. If the input types are provided, the number of types should be exactly the same as the number of argument expressions.

    This is a backport of https://github.com/apache/spark/pull/41954.

    ### Why are the changes needed?
    1. This PR checks the contract described in the comment explicitly: https://github.com/apache/spark/blob/d9248e83bbb3af49333608bebe7149b1aaeca738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L247 that can prevent the errors of expression implementations, and improve code maintainability.
    2. Also it fixes the issue in the `UrlEncode` and `UrlDecode`.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    By running the related tests:
    ```
    $ build/sbt "test:testOnly *UrlFunctionsSuite"
    $ build/sbt "test:testOnly *DataSourceV2FunctionSuite"
    ```

    Authored-by: Max Gekk
    (cherry picked from commit 3e82ac6ea3d9f87c8ac09e481235beefaa1bf758)

    Closes #41985 from MaxGekk/fix-url_decode-3.4.

    Authored-by: Max Gekk
    Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json           |  5 +
 .../sql-error-conditions-datatype-mismatch-error-class.md  |  4
 .../spark/sql/catalyst/analysis/CheckAnalysis.scala        |  5 +++--
 .../spark/sql/catalyst/expressions/objects/objects.scala   | 15 +++
 .../spark/sql/catalyst/expressions/urlExpressions.scala    |  4 ++--
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index febed9283d8..90dec2ee45e 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -468,6 +468,11 @@
         "The <exprName> must be between <valueRange> (current value = <currentValue>)."
       ]
     },
+    "WRONG_NUM_ARG_TYPES" : {
+      "message" : [
+        "The expression requires <expectedNum> argument types but the actual number is <actualNum>."
+      ]
+    },
     "WRONG_NUM_ENDPOINTS" : {
       "message" : [
         "The number of endpoints must be >= 2 to construct intervals but the actual number is <actualNumber>."

diff --git a/docs/sql-error-conditions-datatype-mismatch-error-class.md b/docs/sql-error-conditions-datatype-mismatch-error-class.md
index 6ccd63e6ee9..2178deca4f2 100644
--- a/docs/sql-error-conditions-datatype-mismatch-error-class.md
+++ b/docs/sql-error-conditions-datatype-mismatch-error-class.md
@@ -231,6 +231,10 @@ The input of `` can't be `` type data.

 The `<exprName>` must be between `<valueRange>` (current value = `<currentValue>`).

+## WRONG_NUM_ARG_TYPES
+
+The expression requires `<expectedNum>` argument types but the actual number is `<actualNum>`.
+
 ## WRONG_NUM_ENDPOINTS

 The number of endpoints must be >= 2 to construct intervals but the actual number is `<actualNumber>`.

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 223fdf12d6d..e717483ec94 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -288,8 +288,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
             "srcType" -> c.child.dataType.catalogString,
             "targetType" -> c.dataType.catalogString))
         case e: RuntimeReplaceable if !e.replacement.resolved =>
-          throw new IllegalStateException("Illegal RuntimeReplaceable: " + e +
-            "\nReplacement is unresolved: " + e.replacement)
+          throw SparkException.internalError(
+            s"Cannot resolve the runtime replaceable expression ${toSQLExpr(e)}. " +
+            s"The replacement is unresolved: ${toSQLExpr(e.replacement)}.")
         case g: Grouping =>
           g.failAnalysis(errorClass = "_LEGACY_ERROR_TEMP_2445", messageParameters = Map.empty)
 d
[spark] branch master updated: [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new efed39516c0 [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes

efed39516c0 is described below

commit efed39516c0c4e9654aec447ce91676026368384
Author: itholic
AuthorDate: Thu Jul 13 17:21:29 2023 +0300

    [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes

    ### What changes were proposed in this pull request?
    This PR proposes to assign the name "INCOMPATIBLE_DATA_TO_TABLE" to _LEGACY_ERROR_TEMP_1204, together with its sub-classes:
    - CANNOT_FIND_DATA
    - AMBIGUOUS_COLUMN_NAME
    - EXTRA_STRUCT_FIELDS
    - NULLABLE_COLUMN
    - NULLABLE_ARRAY_ELEMENTS
    - NULLABLE_MAP_VALUES
    - CANNOT_SAFELY_CAST
    - STRUCT_MISSING_FIELDS
    - UNEXPECTED_COLUMN_NAME

    ### Why are the changes needed?
    We should assign proper names to _LEGACY_ERROR_TEMP_*

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    `./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*`

    Closes #39937 from itholic/LEGACY_1204.

    Authored-by: itholic
    Signed-off-by: Max Gekk
---
 common/utils/src/main/resources/error/README.md    |  14 +
 .../src/main/resources/error/error-classes.json    |  59 ++-
 docs/_data/menu-sql.yaml                           |   2 +-
 ...ions-incompatible-data-for-table-error-class.md |  64 +++
 ...tions-incompatible-data-to-table-error-class.md |  64 +++
 docs/sql-error-conditions.md                       |   8 +
 docs/sql-ref-ansi-compliance.md                    |   3 +-
 .../sql/catalyst/analysis/AssignmentUtils.scala    |   5 +-
 .../catalyst/analysis/TableOutputResolver.scala    |  97 +++--
 .../spark/sql/catalyst/types/DataTypeUtils.scala   |  59 +--
 .../spark/sql/errors/QueryCompilationErrors.scala  | 110 +-
 .../catalyst/analysis/V2WriteAnalysisSuite.scala   | 267 ++---
 .../types/DataTypeWriteCompatibilitySuite.scala    | 429 -
 .../apache/spark/sql/DataFrameWriterV2Suite.scala  |  39 +-
 .../org/apache/spark/sql/SQLInsertTestSuite.scala  |   5 +-
 .../command/AlignMergeAssignmentsSuite.scala       |  78 +++-
 .../command/AlignUpdateAssignmentsSuite.scala      |  54 ++-
 .../org/apache/spark/sql/sources/InsertSuite.scala |  98 +++--
 .../sql/test/DataFrameReaderWriterSuite.scala      |  47 ++-
 .../spark/sql/hive/client/HiveClientSuite.scala    |  22 +-
 20 files changed, 1100 insertions(+), 424 deletions(-)

diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md
index 838991c2b6a..dfcb42d49e7 100644
--- a/common/utils/src/main/resources/error/README.md
+++ b/common/utils/src/main/resources/error/README.md
@@ -1294,6 +1294,20 @@ The following SQLSTATEs are collated from:
 |IM013|IM |ODBC driver |013 |Trace file error|SQL Server |N |SQL Server |
 |IM014|IM |ODBC driver |014 |Invalid name of File DSN|SQL Server |N |SQL Server |
 |IM015|IM |ODBC driver |015 |Corrupt file data source|SQL Server |N |SQL Server |
+|KD000 |KD |datasource specific errors|000 |datasource specific errors |Databricks |N |Databricks |
+|KD001 |KD |datasource specific errors|001 |Cannot read file footer |Databricks |N |Databricks |
+|KD002 |KD |datasource specific errors|002 |Unexpected version |Databricks |N |Databricks |
+|KD003 |KD |datasource specific errors|003 |Incorrect access to data type |Databricks |N |Databricks |
+|KD004 |KD |datasource specific errors|004 |Delta protocol version error |Databricks |N
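A hedged sketch of a write that should now land in one of the new sub-classes (assuming the default ANSI store-assignment policy and an active SparkSession `spark`; the table name is hypothetical): a string cannot be safely stored into an INT column, so the INSERT fails at analysis time.

```scala
spark.sql("CREATE TABLE target(i INT) USING parquet")
spark.sql("INSERT INTO target VALUES ('not a number')")
// Expected after this change: an AnalysisException carrying the named
// condition INCOMPATIBLE_DATA_TO_TABLE.CANNOT_SAFELY_CAST instead of the
// opaque _LEGACY_ERROR_TEMP_1204.
```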
[spark] branch master updated: [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4dedb4ad2c9 [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite 4dedb4ad2c9 is described below commit 4dedb4ad2c9b2ecd75dd9ccec5f565805752ad8e Author: panbingkun AuthorDate: Thu Jul 13 16:26:34 2023 +0300 [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite ### What changes were proposed in this pull request? The pr aims to use `checkError()` to check `Exception` in `*View*Suite`, `*Namespace*Suite`, `*DataSource*Suite`, include: - sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/NestedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2FunctionSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite - sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite - sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowNamespacesSuite - sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/streaming/sources/StreamingDataSourceV2Suite - sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterNamespaceSetLocationSuite ### Why are the changes needed? Migration on checkError() will make the tests independent from the text of error messages. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41952 from panbingkun/view_and_namespace_checkerror. 
Authored-by: panbingkun Signed-off-by: Max Gekk --- .../spark/sql/FileBasedDataSourceSuite.scala | 67 +++-- .../apache/spark/sql/NestedDataSourceSuite.scala | 24 +- .../sql/connector/DataSourceV2DataFrameSuite.scala | 21 +- .../sql/connector/DataSourceV2FunctionSuite.scala | 189 +++-- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 54 +++- .../spark/sql/connector/DataSourceV2Suite.scala| 38 ++- .../apache/spark/sql/execution/SQLViewSuite.scala | 313 ++--- .../execution/command/v2/ShowNamespacesSuite.scala | 28 +- .../sql/sources/ResolvedDataSourceSuite.scala | 24 +- .../sources/StreamingDataSourceV2Suite.scala | 68 +++-- .../spark/sql/hive/MetastoreDataSourcesSuite.scala | 88 +++--- .../sql/hive/execution/HiveSQLViewSuite.scala | 31 +- .../command/AlterNamespaceSetLocationSuite.scala | 11 +- 13 files changed, 671 insertions(+), 285 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala index e7e53285d62..d69a68f5726 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala @@ -26,7 +26,7 @@ import scala.collection.mutable import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{LocalFileSystem, Path} -import org.apache.spark.SparkException +import org.apache.spark.{SparkException, SparkFileNotFoundException, SparkRuntimeException} import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd} import org.apache.spark.sql.TestingUDT.{IntervalUDT, NullData, NullUDT} import org.apache.spark.sql.catalyst.expressions.{AttributeReference, GreaterThan, Literal} @@ -129,11 +129,13 @@ class FileBasedDataSourceSuite extends QueryTest allFileBasedDataSources.foreach { format => test(s"SPARK-23372 error while writing empty schema files using $format") { withTempPath { outputPath => -val errMsg = intercept[AnalysisException] { - spark.emptyDataFrame.write.format(format).save(outputPath.toString) -} -assert(errMsg.getMessage.contains( - "Datasource does not support writing empty or nested empty schemas")) +checkError( + exception = intercept[AnalysisException] { +spark.emptyDataFrame.write.format(format).save(outputPath.toString) + }, + errorClass = "_LEGACY_ERROR_TEMP_1142", + parameters = Map.empty +) } // Nested empt
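The hunk above is representative of the whole migration: instead of asserting on message substrings, `checkError()` matches the error class and the message parameters, so tests survive message rewording. A minimal sketch of the idiom; the query, error class, and expected parameters are illustrative only:

```
// Hypothetical test body; TABLE_OR_VIEW_NOT_FOUND and its "relationName"
// parameter are used here only to show the shape of checkError().
checkError(
  exception = intercept[AnalysisException] {
    sql("DROP VIEW no_such_view")
  },
  errorClass = "TABLE_OR_VIEW_NOT_FOUND",
  parameters = Map("relationName" -> "`no_such_view`"))
```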
[spark] branch master updated: [SPARK-44391][SQL] Check the number of argument types in `InvokeLike`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3e82ac6ea3d [SPARK-44391][SQL] Check the number of argument types in `InvokeLike` 3e82ac6ea3d is described below commit 3e82ac6ea3d9f87c8ac09e481235beefaa1bf758 Author: Max Gekk AuthorDate: Thu Jul 13 12:17:20 2023 +0300 [SPARK-44391][SQL] Check the number of argument types in `InvokeLike` ### What changes were proposed in this pull request? In the PR, I propose to check the number of argument types in the `InvokeLike` expressions. If the input types are provided, the number of types should be exactly the same as the number of argument expressions. ### Why are the changes needed? 1. This PR checks the contract described in the comment explicitly: https://github.com/apache/spark/blob/d9248e83bbb3af49333608bebe7149b1aaeca738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L247 that can prevent the errors of expression implementations, and improve code maintainability. 2. Also it fixes the issue in the `UrlEncode` and `UrlDecode`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the related tests: ``` $ build/sbt "test:testOnly *UrlFunctionsSuite" $ build/sbt "test:testOnly *DataSourceV2FunctionSuite" ``` Closes #41954 from MaxGekk/fix-url_decode. Authored-by: Max Gekk Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/error-classes.json | 5 + .../explain-results/function_url_decode.explain | 2 +- .../explain-results/function_url_encode.explain | 2 +- .../sql-error-conditions-datatype-mismatch-error-class.md | 4 .../spark/sql/catalyst/analysis/CheckAnalysis.scala | 5 +++-- .../spark/sql/catalyst/expressions/objects/objects.scala | 15 +++ .../spark/sql/catalyst/expressions/urlExpressions.scala | 4 ++-- 7 files changed, 31 insertions(+), 6 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 347ce026476..2c4d2b533a6 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -657,6 +657,11 @@ "The must be between (current value = )." ] }, + "WRONG_NUM_ARG_TYPES" : { +"message" : [ + "The expression requires argument types but the actual number is ." +] + }, "WRONG_NUM_ENDPOINTS" : { "message" : [ "The number of endpoints must be >= 2 to construct intervals but the actual number is ." 
diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain index 36b21e27c10..d612190396d 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain @@ -1,2 +1,2 @@ -Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, decode, g#0, UTF-8, StringType, true, true, true) AS url_decode(g)#0] +Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, decode, g#0, UTF-8, StringType, StringType, true, true, true) AS url_decode(g)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain index 70a0f628fc9..bd2c63e19c6 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain @@ -1,2 +1,2 @@ -Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, encode, g#0, UTF-8, StringType, true, true, true) AS url_encode(g)#0] +Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, encode, g#0, UTF-8, StringType, StringType, true, true, true) AS url_encode(g)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/docs/sql-error-conditions-datatype-mismatch-error-class.md b/docs/sql-error-conditions-datatype-mismatch-error-class.md index 3bd63925323..ddc3e0c2b1b 100644 --- a/docs/sql-error-
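A rough sketch of the invariant this adds, not the exact Spark code: when an `InvokeLike` expression declares input types, their count must equal the number of argument expressions, otherwise a `DATATYPE_MISMATCH.WRONG_NUM_ARG_TYPES` is reported (the parameter names below are assumptions):

```
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch

// Sketch only: validates the contract that declared input types, when
// present, line up one-to-one with the argument expressions.
def checkArgTypeCount(inputTypes: Seq[_], arguments: Seq[_]): TypeCheckResult = {
  if (inputTypes.nonEmpty && inputTypes.length != arguments.length) {
    DataTypeMismatch(
      errorSubClass = "WRONG_NUM_ARG_TYPES",
      messageParameters = Map(
        "expectedNum" -> inputTypes.length.toString,
        "actualNum" -> arguments.length.toString))
  } else {
    TypeCheckResult.TypeCheckSuccess
  }
}
```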
[spark] branch master updated: [SPARK-38476][CORE] Use error class in org.apache.spark.storage
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0f6a4a737ee [SPARK-38476][CORE] Use error class in org.apache.spark.storage 0f6a4a737ee is described below commit 0f6a4a737ee9457a0b0c336b7d079cdd878d20e8 Author: Bo Zhang AuthorDate: Tue Jul 11 13:06:52 2023 +0300 [SPARK-38476][CORE] Use error class in org.apache.spark.storage ### What changes were proposed in this pull request? This PR aims to change exceptions created in package org.apache.spark.storage to use error classes. This also adds an error class INTERNAL_ERROR_STORAGE and uses that for the internal errors in the package. ### Why are the changes needed? This is to move exceptions created in package org.apache.spark.storage to error classes. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Updated existing tests. Closes #41923 from bozhang2820/spark-38476. Authored-by: Bo Zhang Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/error-classes.json | 6 ++ .../org/apache/spark/storage/BlockInfoManager.scala | 7 --- .../scala/org/apache/spark/storage/BlockManager.scala| 4 ++-- .../org/apache/spark/storage/DiskBlockManager.scala | 10 +- .../org/apache/spark/storage/DiskBlockObjectWriter.scala | 4 +++- .../main/scala/org/apache/spark/storage/DiskStore.scala | 7 --- .../scala/org/apache/spark/storage/FallbackStorage.scala | 5 +++-- .../spark/storage/ShuffleBlockFetcherIterator.scala | 5 +++-- .../org/apache/spark/storage/memory/MemoryStore.scala| 16 ++-- .../org/apache/spark/storage/BlockInfoManagerSuite.scala | 4 ++-- .../spark/storage/DiskBlockObjectWriterSuite.scala | 4 ++-- .../spark/storage/PartiallySerializedBlockSuite.scala| 14 +++--- docs/sql-error-conditions.md | 6 ++ 13 files changed, 57 insertions(+), 35 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 66305c20112..347ce026476 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1089,6 +1089,12 @@ ], "sqlState" : "XX000" }, + "INTERNAL_ERROR_STORAGE" : { +"message" : [ + "" +], +"sqlState" : "XX000" + }, "INTERVAL_ARITHMETIC_OVERFLOW" : { "message" : [ "."
diff --git a/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala index fb532dd0736..45ebb6eafa6 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala @@ -29,7 +29,7 @@ import scala.reflect.ClassTag import com.google.common.collect.{ConcurrentHashMultiset, ImmutableMultiset} import com.google.common.util.concurrent.Striped -import org.apache.spark.TaskContext +import org.apache.spark.{SparkException, TaskContext} import org.apache.spark.errors.SparkCoreErrors import org.apache.spark.internal.Logging @@ -543,8 +543,9 @@ private[storage] class BlockInfoManager(trackingCacheVisibility: Boolean = false logTrace(s"Task $taskAttemptId trying to remove block $blockId") blockInfo(blockId) { (info, condition) => if (info.writerTask != taskAttemptId) { -throw new IllegalStateException( - s"Task $taskAttemptId called remove() on block $blockId without a write lock") +throw SparkException.internalError( + s"Task $taskAttemptId called remove() on block $blockId without a write lock", + category = "STORAGE") } else { invisibleRDDBlocks.synchronized { blockInfoWrappers.remove(blockId) diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala index b4453b4d35e..05d57c67576 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala @@ -1171,8 +1171,8 @@ private[spark] class BlockManager( val buf = blockTransferService.fetchBlockSync(loc.host, loc.port, loc.executorId, blockId.toString, tempFileManager) if (blockSize > 0 && buf.size() == 0) { - throw new IllegalStateException("Empty buffer received for non empty block " + -s"when fetching remote block $blockId from $loc") +
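The hunks above all follow one pattern: `SparkException.internalError(msg, category = "STORAGE")` tags the message with the new class instead of hand-building an `IllegalStateException`. A rough sketch of what the helper amounts to, not Spark's exact implementation:

```
import org.apache.spark.SparkException

// Sketch only: the category suffix selects the INTERNAL_ERROR_<CATEGORY>
// error class, so callers never spell out the class name by hand.
def internalError(msg: String, category: String): SparkException =
  new SparkException(
    errorClass = s"INTERNAL_ERROR_$category",
    messageParameters = Map("message" -> msg),
    cause = null)
```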
[spark] branch master updated: [SPARK-44320][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c5a23e9c23f [SPARK-44320][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277] c5a23e9c23f is described below commit c5a23e9c23f7bd7066060d0791f290ad38fca76f Author: panbingkun AuthorDate: Tue Jul 11 11:16:03 2023 +0300 [SPARK-44320][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277] ### What changes were proposed in this pull request? The pr aims to assign names to the error class, include: - _LEGACY_ERROR_TEMP_1067 => UNSUPPORTED_FEATURE.DROP_DATABASE - _LEGACY_ERROR_TEMP_1150 => UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE - _LEGACY_ERROR_TEMP_1220 => UNSUPPORTED_FEATURE.HIVE_TABLE_TYPE - _LEGACY_ERROR_TEMP_1265 => LOAD_DATA_PATH_NOT_EXISTS - _LEGACY_ERROR_TEMP_1277 => CREATE_VIEW_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS / CREATE_VIEW_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update & Add new UT. - Manually test. - Pass GA. Closes #41909 from panbingkun/SPARK-44320. Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 69 ++-- .../org/apache/spark/sql/avro/AvroSuite.scala | 34 +- ...reate-view-column-arity-mismatch-error-class.md | 40 ++ ...-error-conditions-invalid-format-error-class.md | 2 +- ...r-conditions-unsupported-feature-error-class.md | 8 + docs/sql-error-conditions.md | 20 + .../sql/catalyst/catalog/SessionCatalog.scala | 2 +- .../spark/sql/errors/QueryCompilationErrors.scala | 50 ++- .../apache/spark/sql/execution/command/views.scala | 12 +- .../spark/sql/FileBasedDataSourceSuite.scala | 456 + .../apache/spark/sql/execution/SQLViewSuite.scala | 27 +- .../spark/sql/execution/command/DDLSuite.scala | 7 +- .../execution/command/v1/DropNamespaceSuite.scala | 11 +- .../sql/execution/datasources/csv/CSVSuite.scala | 7 +- .../spark/sql/hive/client/HiveClientImpl.scala | 2 +- .../spark/sql/hive/client/HiveClientSuite.scala| 12 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 54 ++- .../spark/sql/hive/orc/HiveOrcSourceSuite.scala| 92 +++-- 18 files changed, 610 insertions(+), 295 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 9f0ed7ace3a..66305c20112 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -370,6 +370,28 @@ ], "sqlState" : "42710" }, + "CREATE_VIEW_COLUMN_ARITY_MISMATCH" : { +"message" : [ + "Cannot create view , the reason is" +], +"subClass" : { + "NOT_ENOUGH_DATA_COLUMNS" : { +"message" : [ + "not enough data columns:", + "View columns: .", + "Data columns: ." +] + }, + "TOO_MANY_DATA_COLUMNS" : { +"message" : [ + "too many data columns:", + "View columns: .", + "Data columns: ." +] + } +}, +"sqlState" : "21S01" + }, "DATATYPE_MISMATCH" : { "message" : [ "Cannot resolve due to data type mismatch:" ] }, @@ -1247,7 +1269,7 @@ }, "MISMATCH_INPUT" : { "message" : [ - "The input '' does not match the format."
] }, "THOUSANDS_SEPS_MUST_BEFORE_DEC" : { @@ -1772,6 +1794,11 @@ "The join condition has the invalid type , expected \"BOOLEAN\"." ] }, + "LOAD_DATA_PATH_NOT_EXISTS" : { +"message" : [ + "LOAD DATA input path does not exist: ." +] + }, "LOCAL_MUST_WITH_SCHEMA_FILE" : { "message" : [ "LOCAL must be used together with the schema of `file`, but got: ``." @@ -2544,6 +2571,11 @@ "The direct query on files does not support the data source type: . Please try a different data source type or consider using a different query method." ] }, + "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE" : { +&q
[spark] branch master updated (990affdd503 -> b7c6c846c08)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 990affdd503 [SPARK-44290][CONNECT][FOLLOW-UP] Skip flaky tests, and fix a typo in session UUID together add b7c6c846c08 [SPARK-44328][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328] No new revisions were added by this update. Summary of changes: .../src/main/resources/error/error-classes.json| 57 + ...-conditions-cannot-update-field-error-class.md} | 26 docs/sql-error-conditions.md | 14 ++-- .../sql/catalyst/analysis/CheckAnalysis.scala | 18 -- .../spark/sql/connector/AlterTableTests.scala | 74 -- 5 files changed, 120 insertions(+), 69 deletions(-) copy docs/{sql-error-conditions-invalid-limit-like-expression-error-class.md => sql-error-conditions-cannot-update-field-error-class.md} (65%)
[spark] branch master updated (fdeb8d8551e -> 5e31f4dfc20)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from fdeb8d8551e [SPARK-44321][CONNECT] Decouple ParseException from AnalysisException add 5e31f4dfc20 [SPARK-38477][CORE] Use error class in org.apache.spark.shuffle No new revisions were added by this update. Summary of changes: .../src/main/resources/error/error-classes.json| 16 + .../org/apache/spark/errors/SparkCoreErrors.scala | 11 - .../spark/shuffle/IndexShuffleBlockResolver.scala | 27 -- .../shuffle/ShufflePartitionPairsWriter.scala | 5 ++-- docs/sql-error-conditions.md | 12 ++ .../spark/sql/errors/QueryExecutionErrors.scala| 2 +- 6 files changed, 52 insertions(+), 21 deletions(-)
[spark] branch master updated (1fbb94b87c0 -> 1adf2866915)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 1fbb94b87c0 [SPARK-44284][CONNECT] Create simple conf system for sql/api add 1adf2866915 [SPARK-44303][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324] No new revisions were added by this update. Summary of changes: .../src/main/resources/error/error-classes.json| 50 ++-- .../connect/planner/SparkConnectProtoSuite.scala | 8 +- .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 36 +++-- ...ditions-invalid-observed-metrics-error-class.md | 12 +++ docs/sql-error-conditions.md | 12 +++ .../sql/catalyst/analysis/CheckAnalysis.scala | 21 +++-- .../sql/catalyst/analysis/AnalysisSuite.scala | 27 +-- .../spark/sql/connector/AlterTableTests.scala | 92 +- .../connector/V2CommandsCaseSensitivitySuite.scala | 34 +++- .../v2/jdbc/JDBCTableCatalogSuite.scala| 36 +++-- 10 files changed, 243 insertions(+), 85 deletions(-)
[spark] branch master updated: Revert "[SPARK-43851][SQL] Support LCA in grouping expressions"
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a68e362dca1 Revert "[SPARK-43851][SQL] Support LCA in grouping expressions" a68e362dca1 is described below commit a68e362dca10f1c0173fbe51bf321428378e4602 Author: Jia Fan AuthorDate: Thu Jul 6 15:20:38 2023 +0300 Revert "[SPARK-43851][SQL] Support LCA in grouping expressions" ### What changes were proposed in this pull request? This reverts commit 9353d67f9290bae1e7d7e16a2caf5256cc4e2f92. After discussion in #41817, we should revert LCA in grouping expressions, because the current solution has problems. ### Why are the changes needed? Revert the PR. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. Closes #41869 from Hisoka-X/SPARK-43851_revert. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 5 + ...r-conditions-unsupported-feature-error-class.md | 4 .../analysis/ResolveReferencesInAggregate.scala| 22 ++ .../column-resolution-aggregate.sql.out| 26 +- .../results/column-resolution-aggregate.sql.out| 16 + 5 files changed, 44 insertions(+), 29 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 44bec5e8ced..a3b12022b66 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -2613,6 +2613,11 @@ "Referencing lateral column alias in the aggregate query both with window expressions and with having clause. Please rewrite the aggregate query by removing the having clause or removing lateral alias reference in the SELECT list." ] }, + "LATERAL_COLUMN_ALIAS_IN_GROUP_BY" : { +"message" : [ + "Referencing a lateral column alias via GROUP BY alias/ALL is not supported yet." +] + }, "LATERAL_COLUMN_ALIAS_IN_WINDOW" : { "message" : [ "Referencing a lateral column alias in window expression ." diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md b/docs/sql-error-conditions-unsupported-feature-error-class.md index 25f09118f74..a41502b609a 100644 --- a/docs/sql-error-conditions-unsupported-feature-error-class.md +++ b/docs/sql-error-conditions-unsupported-feature-error-class.md @@ -85,6 +85,10 @@ Referencing a lateral column alias `` in the aggregate function `` Referencing lateral column alias `` in the aggregate query both with window expressions and with having clause. Please rewrite the aggregate query by removing the having clause or removing lateral alias reference in the SELECT list. +## LATERAL_COLUMN_ALIAS_IN_GROUP_BY + +Referencing a lateral column alias via GROUP BY alias/ALL is not supported yet. + ## LATERAL_COLUMN_ALIAS_IN_WINDOW Referencing a lateral column alias `` in window expression ``.
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala index 41bcb337c67..09ae87b071f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala @@ -17,8 +17,9 @@ package org.apache.spark.sql.catalyst.analysis +import org.apache.spark.sql.AnalysisException import org.apache.spark.sql.catalyst.SQLConfHelper -import org.apache.spark.sql.catalyst.expressions.{AliasHelper, Attribute, Expression, LateralColumnAliasReference, NamedExpression} +import org.apache.spark.sql.catalyst.expressions.{AliasHelper, Attribute, Expression, NamedExpression} import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, AppendColumns, LogicalPlan} import org.apache.spark.sql.catalyst.trees.TreePattern.{LATERAL_COLUMN_ALIAS_REFERENCE, UNRESOLVED_ATTRIBUTE} @@ -73,6 +74,12 @@ object ResolveReferencesInAggregate extends SQLConfHelper resolvedAggExprsWithOuter, resolveGroupByAlias(resolvedAggExprsWithOuter, resolvedGroupExprsNoOuter) ).map(resolveOuterRef) + // TODO: currently we don't support LCA in `groupingExpressions` yet. + if (resolved.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE))) { +throw new AnalysisException( + errorClass = "
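For context, the class of query that is rejected again after this revert looks roughly like the following (the schema is an assumption for illustration); resolving `b` in GROUP BY would require resolving the lateral alias reference `a` inside a grouping expression:

```
// After the revert this should fail with
// UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY, since the alias b
// expands to a + 1, where a is itself a lateral column alias.
spark.sql("""
  SELECT dept AS a, a + 1 AS b, sum(salary)
  FROM employees
  GROUP BY a, b
""")
```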
[spark] branch master updated: [SPARK-44299][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5d840eb4553 [SPARK-44299][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8] 5d840eb4553 is described below commit 5d840eb455350ef3f6235a031a1689bf4a51007d Author: panbingkun AuthorDate: Thu Jul 6 10:08:45 2023 +0300 [SPARK-44299][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8] ### What changes were proposed in this pull request? The pr aims to assign names to the error class, include: - _LEGACY_ERROR_TEMP_2274 => UNSUPPORTED_FEATURE.REPLACE_NESTED_COLUMN - _LEGACY_ERROR_TEMP_2275 => CANNOT_INVOKE_IN_TRANSFORMATIONS - _LEGACY_ERROR_TEMP_2276 => UNSUPPORTED_FEATURE .HIVE_WITH_ANSI_INTERVALS - _LEGACY_ERROR_TEMP_2278 => INVALID_FORMAT.MISMATCH_INPUT ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update & Add new UT. - Manually test. - Pass GA. Closes #41858 from panbingkun/SPARK-44299. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 40 +++--- ...-error-conditions-invalid-format-error-class.md | 4 +++ ...r-conditions-unsupported-feature-error-class.md | 8 + docs/sql-error-conditions.md | 6 .../spark/sql/catalyst/util/ToNumberParser.scala | 4 +-- .../spark/sql/errors/QueryExecutionErrors.scala| 20 +-- .../expressions/StringExpressionsSuite.scala | 9 +++-- .../apache/spark/sql/execution/command/ddl.scala | 2 +- .../sql-tests/results/postgreSQL/numeric.sql.out | 10 +++--- .../results/postgreSQL/numeric.sql.out.java21 | 10 +++--- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 13 +++ .../spark/sql/DataFrameNaFunctionsSuite.scala | 12 --- .../spark/sql/hive/execution/HiveDDLSuite.scala| 2 +- .../command/AlterTableAddColumnsSuite.scala| 13 --- 14 files changed, 101 insertions(+), 52 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 8bdb02470ef..44bec5e8ced 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -128,6 +128,11 @@ ], "sqlState" : "22546" }, + "CANNOT_INVOKE_IN_TRANSFORMATIONS" : { +"message" : [ + "Dataset transformations and actions can only be invoked by the driver, not inside of other Dataset transformations; for example, dataset1.map(x => dataset2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the dataset1.map transformation. For more information, see SPARK-28702." +] + }, "CANNOT_LOAD_FUNCTION_CLASS" : { "message" : [ "Cannot load class when registering the function , please make sure it is on the classpath." @@ -1192,6 +1197,11 @@ "The escape character is not allowed to precede ." ] }, + "MISMATCH_INPUT" : { +"message" : [ + "The input '' does not match the format." +] + }, "THOUSANDS_SEPS_MUST_BEFORE_DEC" : { "message" : [ "Thousands separators (, or G) may not appear after the decimal point in the number format." @@ -2583,6 +2593,11 @@ "Drop the namespace ." ] }, + "HIVE_WITH_ANSI_INTERVALS" : { +"message" : [ + "Hive table with ANSI intervals." +] + }, "INSERT_PARTITION_SPEC_IF_NOT_EXISTS" : { "message" : [ "INSERT INTO with IF NOT EXISTS in the PARTITION spec." 
@@ -2663,6 +2678,11 @@ "Remove a comment from the namespace ." ] }, + "REPLACE_NESTED_COLUMN" : { +"message" : [ + "The replace function does not support nested column ." +] + }, "SET_NAMESPACE_PROPERTY" : { "message" : [ " is a reserved namespace property, ." @@ -5627,31 +5647,11 @@ "" ] }, - "_LEGACY_ERROR_TEMP_2274" : { -"message" : [ - "Nested field is not supported." -] - }, - "_LEGACY_ERROR_TEMP_2275" : { -"message" : [ - "Dataset transformations and actions can only be invoke
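An illustrative trigger for the renamed `INVALID_FORMAT.MISMATCH_INPUT`: a `to_number` input that does not match its format string (the exact exception subtype is left aside here):

```
spark.sql("SELECT to_number('abc', '999')").collect()
// => fails with errorClass = "INVALID_FORMAT.MISMATCH_INPUT"
```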
[spark] branch master updated (68862589a0c -> d53585c91b2)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 68862589a0c [SPARK-44296][BUILD] Upgrade dropwizard metrics 4.2.19 add d53585c91b2 [SPARK-44292][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319] No new revisions were added by this update. Summary of changes: .../src/main/resources/error/error-classes.json| 57 -- ...ror-conditions-datatype-mismatch-error-class.md | 4 ++ ...itions-invalid-observed-metrics-error-class.md} | 24 - .../sql/catalyst/analysis/CheckAnalysis.scala | 24 + .../sql/catalyst/analysis/AnalysisSuite.scala | 40 ++- 5 files changed, 92 insertions(+), 57 deletions(-) copy docs/{sql-error-conditions-invalid-schema-error-class.md => sql-error-conditions-invalid-observed-metrics-error-class.md} (61%)
[spark] branch master updated (b573cca90ea -> 7bc28d54f83)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b573cca90ea [SPARK-44288][SS] Set the column family options before passing to DBOptions in RocksDB state store provider add 7bc28d54f83 [SPARK-44269][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2310-2314] No new revisions were added by this update. Summary of changes: .../src/main/resources/error/error-classes.json| 25 +- docs/sql-error-conditions.md | 6 ++ .../sql/catalyst/analysis/CheckAnalysis.scala | 11 -- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 11 +- .../main/scala/org/apache/spark/sql/Dataset.scala | 8 +++ .../apache/spark/sql/DataFrameWriterV2Suite.scala | 19 .../test/DataStreamReaderWriterSuite.scala | 20 - 7 files changed, 54 insertions(+), 46 deletions(-)
[spark] branch master updated: [SPARK-42169][SQL] Implement code generation for to_csv function (StructsToCsv)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 45ae9c5cc67 [SPARK-42169][SQL] Implement code generation for to_csv function (StructsToCsv) 45ae9c5cc67 is described below commit 45ae9c5cc67d379f5bbeadf8c56c032f2bdaaac0 Author: narek_karapetian AuthorDate: Mon Jul 3 10:13:12 2023 +0300 [SPARK-42169][SQL] Implement code generation for to_csv function (StructsToCsv) ### What changes were proposed in this pull request? This PR enhances the `StructsToCsv` class with a `doGenCode` function instead of extending it from the `CodegenFallback` trait (performance improvement). ### Why are the changes needed? It will improve performance. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? An additional test case was added to the `org.apache.spark.sql.catalyst.expressions.CsvExpressionsSuite` class. Closes #39719 from NarekDW/SPARK-42169. Authored-by: narek_karapetian Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/csvExpressions.scala | 11 ++- .../catalyst/expressions/CsvExpressionsSuite.scala | 7 ++ sql/core/benchmarks/CSVBenchmark-jdk11-results.txt | 82 +-- sql/core/benchmarks/CSVBenchmark-jdk17-results.txt | 82 +-- sql/core/benchmarks/CSVBenchmark-results.txt | 94 +++--- 5 files changed, 144 insertions(+), 132 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala index e47cf493d4c..cdab9faacd4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch import org.apache.spark.sql.catalyst.csv._ -import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback +import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodegenFallback, ExprCode} import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryErrorsBase} import org.apache.spark.sql.internal.SQLConf @@ -245,8 +245,7 @@ case class StructsToCsv( options: Map[String, String], child: Expression, timeZoneId: Option[String] = None) - extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes -with NullIntolerant { + extends UnaryExpression with TimeZoneAwareExpression with ExpectsInputTypes with NullIntolerant { override def nullable: Boolean = true def this(options: Map[String, String], child: Expression) = this(options, child, None) @@ -293,4 +292,10 @@ case class StructsToCsv( override protected def withNewChildInternal(newChild: Expression): StructsToCsv = copy(child = newChild) + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val structsToCsv = ctx.addReferenceObj("structsToCsv", this) +nullSafeCodeGen(ctx, ev, + eval => s"${ev.value} = (UTF8String) $structsToCsv.converter().apply($eval);") + } } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala
index 1d174ed2145..a89cb58c3e0 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala @@ -246,4 +246,11 @@ class CsvExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with P CsvToStructs(schema, Map.empty, Literal.create("1 day")), InternalRow(new CalendarInterval(0, 1, 0))) } + + test("StructsToCsv should not generate codes beyond 64KB") { +val range = Range.inclusive(1, 5000) +val struct = CreateStruct.create(range.map(Literal.apply)) +val expected = range.mkString(",") +checkEvaluation(StructsToCsv(Map.empty, struct), expected) + } } diff --git a/sql/core/benchmarks/CSVBenchmark-jdk11-results.txt b/sql/core/benchmarks/CSVBenchmark-jdk11-results.txt index 7b5ea10bc4e..7fca105a8c2 100644 --- a/sql/core/benchmarks/CSVBenchmark-jdk11-results.txt +++ b/sql/core/benchmarks/CSVBenchmark-jdk11-results.txt @@ -2,69 +2,69 @@ Benchmark to measure CSV read/write performance =
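The user-facing API is unchanged; only evaluation switches from the interpreted `CodegenFallback` path to generated code that calls back into the expression's converter via `ctx.addReferenceObj`. A small usage example (column names are illustrative):

```
import org.apache.spark.sql.functions.{struct, to_csv}
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
// Evaluated through the newly generated code path rather than the
// interpreted fallback; results are "1,a" and "2,b".
df.select(to_csv(struct($"id", $"name"))).show()
```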
[spark] branch master updated: [SPARK-44268][CORE][TEST] Add tests to ensure error-classes.json and docs are in sync
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b557f5752af [SPARK-44268][CORE][TEST] Add tests to ensure error-classes.json and docs are in sync b557f5752af is described below commit b557f5752afc32d614b37be610dbbca44519664b Author: Jia Fan AuthorDate: Sun Jul 2 18:51:09 2023 +0300 [SPARK-44268][CORE][TEST] Add tests to ensure error-classes.json and docs are in sync ### What changes were proposed in this pull request? Add a new test to make sure that `error-classes.json` matches the series of `sql-error-conditions.md` docs. After this PR, any difference between `error-classes.json` and the documents will report an error during testing. Note: only error class names are compared for now. Also fix all differences found by the new test case. ### Why are the changes needed? Make sure our error-classes.json always stays in sync with the docs. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New test. Closes #41813 from Hisoka-X/SPARK-44268_sync_error_classes_to_doc. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../org/apache/spark/SparkThrowableSuite.scala | 51 ++ ...ror-conditions-datatype-mismatch-error-class.md | 4 + ...tions-incompatible-data-to-table-error-class.md | 64 --- ...ror-conditions-insert-column-arity-mismatch.md} | 30 +- ...rror-conditions-insufficient-table-property.md} | 26 +- ... => sql-error-conditions-invalid-as-of-join.md} | 26 +- ...md => sql-error-conditions-invalid-boundary.md} | 26 +- ... sql-error-conditions-invalid-default-value.md} | 26 +- ...> sql-error-conditions-invalid-inline-table.md} | 26 +- ...ror-conditions-invalid-lamdba-function-call.md} | 26 +- ...or-conditions-invalid-limit-like-expression.md} | 26 +- ...nditions-invalid-parameter-value-error-class.md | 14 +- ...rror-conditions-invalid-partition-operation.md} | 26 +- docs/sql-error-conditions-invalid-sql-syntax.md| 92 ...nditions-invalid-time-travel-timestamp-expr.md} | 26 +- ...error-conditions-invalid-write-distribution.md} | 26 +- ...rror-conditions-malformed-record-in-parsing.md} | 24 +- ... => sql-error-conditions-missing-attributes.md} | 26 +- ...onditions-not-a-constant-string-error-class.md} | 26 +- ...=> sql-error-conditions-not-allowed-in-from.md} | 26 +- ...or-conditions-not-supported-in-jdbc-catalog.md} | 26 +- ...> sql-error-conditions-unsupported-add-file.md} | 26 +- ...-error-conditions-unsupported-default-value.md} | 26 +- ...r-conditions-unsupported-feature-error-class.md | 36 ++ ... => sql-error-conditions-unsupported-insert.md} | 26 +- ...rror-conditions-unsupported-merge-condition.md} | 26 +- ...
sql-error-conditions-unsupported-overwrite.md} | 26 +- docs/sql-error-conditions.md | 609 - 28 files changed, 984 insertions(+), 434 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index 96c4e3b8ab7..034a782e533 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -141,6 +141,57 @@ class SparkThrowableSuite extends SparkFunSuite { checkIfUnique(messageFormats) } + test("SPARK-44268: Error classes match with document") { +val sqlstateDoc = "sql-error-conditions-sqlstates.md" +val errors = errorReader.errorInfoMap +val errorDocPaths = getWorkspaceFilePath("docs").toFile + .listFiles(_.getName.startsWith("sql-error-conditions-")) + .filter(!_.getName.equals(sqlstateDoc)) + .map(f => IOUtils.toString(f.toURI, StandardCharsets.UTF_8)).map(_.split("\n")) +// check the error classes in document should be in error-classes.json +val linkInDocRegex = "\\[(.*)\\]\\((.*)\\)".r +val commonErrorsInDoc = IOUtils.toString(getWorkspaceFilePath("docs", + "sql-error-conditions.md").toUri, StandardCharsets.UTF_8).split("\n") + .filter(_.startsWith("###")).map(s => s.replace("###", "").trim) + .filter(linkInDocRegex.findFirstMatchIn(_).isEmpty) + +commonErrorsInDoc.foreach(s => assert(errors.contains(s), + s"Error class: $s is not in error-classes.json")) + +val titlePrefix = "title:" +val errorsInDoc = errorDocPaths.map(lines => { + val errorClass = lines.filter(_.startsWith(titlePrefix)) +.map(s => s.replace("error class", "").replace(titlePrefix, "").trim).head + assert(errors.contains(errorClass), s&
[spark] branch master updated: [SPARK-44254][SQL] Move QueryExecutionErrors that used by DataType to sql/api as DataTypeErrors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cf852b284d5 [SPARK-44254][SQL] Move QueryExecutionErrors that used by DataType to sql/api as DataTypeErrors cf852b284d5 is described below commit cf852b284d550f9425ae7893796ae0042be6010f Author: Rui Wang AuthorDate: Sun Jul 2 10:18:43 2023 +0300 [SPARK-44254][SQL] Move QueryExecutionErrors that used by DataType to sql/api as DataTypeErrors ### What changes were proposed in this pull request? Moving some QueryExecutionErrors that are used by data types to `sql/api` and naming those DataType errors, so that DataType can use them once DataType lives only in the `sql/api` module. ### Why are the changes needed? Towards a simpler DataType interface. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing test Closes #41794 from amaliujia/datatype_more_refactors. Authored-by: Rui Wang Signed-off-by: Max Gekk --- sql/api/pom.xml| 5 ++ .../apache/spark/sql/errors/DataTypeErrors.scala | 95 ++ .../spark/sql/errors/QueryExecutionErrors.scala| 42 ++ .../apache/spark/sql/types/AbstractDataType.scala | 4 +- .../scala/org/apache/spark/sql/types/Decimal.scala | 9 +- .../org/apache/spark/sql/types/DecimalType.scala | 4 +- .../org/apache/spark/sql/types/Metadata.scala | 10 +-- .../org/apache/spark/sql/types/ObjectType.scala| 4 +- .../apache/spark/sql/types/UDTRegistration.scala | 6 +- 9 files changed, 127 insertions(+), 52 deletions(-) diff --git a/sql/api/pom.xml b/sql/api/pom.xml index 9b7917e0343..41a5b85d4c6 100644 --- a/sql/api/pom.xml +++ b/sql/api/pom.xml @@ -40,6 +40,11 @@ spark-common-utils_${scala.binary.version} ${project.version} + +org.apache.spark +spark-unsafe_${scala.binary.version} +${project.version} + target/scala-${scala.binary.version}/classes diff --git a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala new file mode 100644 index 000..02e8b12c707 --- /dev/null +++ b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.errors + +import org.apache.spark.{SparkArithmeticException, SparkException, SparkRuntimeException, SparkUnsupportedOperationException} +import org.apache.spark.unsafe.types.UTF8String + +/** + * Object for grouping error messages from (most) exceptions thrown during query execution. + * This does not include exceptions thrown during the eager execution of commands, which are + * grouped into [[QueryCompilationErrors]].
+ */ +private[sql] object DataTypeErrors { + def unsupportedOperationExceptionError(): SparkUnsupportedOperationException = { +new SparkUnsupportedOperationException( + errorClass = "_LEGACY_ERROR_TEMP_2225", + messageParameters = Map.empty) + } + + def decimalPrecisionExceedsMaxPrecisionError( + precision: Int, maxPrecision: Int): SparkArithmeticException = { +new SparkArithmeticException( + errorClass = "DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION", + messageParameters = Map( +"precision" -> precision.toString, +"maxPrecision" -> maxPrecision.toString + ), + context = Array.empty, + summary = "") + } + + def unsupportedRoundingMode(roundMode: BigDecimal.RoundingMode.Value): SparkException = { +SparkException.internalError(s"Not supported rounding mode: ${roundMode.toString}.") + } + + def outOfDecimalTypeRangeError(str: UTF8String): SparkArithmeticException = { +new SparkArithmeticException( + errorClass = "NUMERIC_OUT_OF_SUPPORTE
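An illustrative call that ends up in one of the relocated errors: constructing a decimal type beyond the maximum precision raises `DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION`, now built by `DataTypeErrors` instead of `QueryExecutionErrors`:

```
import org.apache.spark.sql.types.DecimalType

// DecimalType supports precision up to 38; this should raise a
// SparkArithmeticException with DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION.
DecimalType(50, 2)
```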
[spark] branch master updated: [SPARK-44255][SQL] Relocate StorageLevel to common/utils
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f985e3e84a2 [SPARK-44255][SQL] Relocate StorageLevel to common/utils f985e3e84a2 is described below commit f985e3e84a23ab5a83842047408e3fd92887447a Author: Rui Wang AuthorDate: Sat Jul 1 12:10:22 2023 +0300 [SPARK-44255][SQL] Relocate StorageLevel to common/utils ### What changes were proposed in this pull request? Relocate `StorageLevel` to `common/utils`. ### Why are the changes needed? Scala client needs `StorageLevel` so this can be shared in the `common/utils`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests Closes #41797 from amaliujia/move_storage_level_to_common_utils. Authored-by: Rui Wang Signed-off-by: Max Gekk --- .../java/org/apache/spark/memory/MemoryMode.java | 0 .../org/apache/spark/storage/StorageLevel.scala| 6 ++--- .../org/apache/spark/util/SparkErrorUtils.scala| 30 +- .../main/scala/org/apache/spark/util/Utils.scala | 11 +--- project/MimaExcludes.scala | 4 +++ 5 files changed, 32 insertions(+), 19 deletions(-) diff --git a/core/src/main/java/org/apache/spark/memory/MemoryMode.java b/common/utils/src/main/java/org/apache/spark/memory/MemoryMode.java similarity index 100% copy from core/src/main/java/org/apache/spark/memory/MemoryMode.java copy to common/utils/src/main/java/org/apache/spark/memory/MemoryMode.java diff --git a/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala b/common/utils/src/main/scala/org/apache/spark/storage/StorageLevel.scala similarity index 97% rename from core/src/main/scala/org/apache/spark/storage/StorageLevel.scala rename to common/utils/src/main/scala/org/apache/spark/storage/StorageLevel.scala index 4a2b705e069..73bc53dab89 100644 --- a/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala +++ b/common/utils/src/main/scala/org/apache/spark/storage/StorageLevel.scala @@ -22,7 +22,7 @@ import java.util.concurrent.ConcurrentHashMap import org.apache.spark.annotation.DeveloperApi import org.apache.spark.memory.MemoryMode -import org.apache.spark.util.Utils +import org.apache.spark.util.SparkErrorUtils /** * :: DeveloperApi :: @@ -98,12 +98,12 @@ class StorageLevel private( ret } - override def writeExternal(out: ObjectOutput): Unit = Utils.tryOrIOException { + override def writeExternal(out: ObjectOutput): Unit = SparkErrorUtils.tryOrIOException { out.writeByte(toInt) out.writeByte(_replication) } - override def readExternal(in: ObjectInput): Unit = Utils.tryOrIOException { + override def readExternal(in: ObjectInput): Unit = SparkErrorUtils.tryOrIOException { val flags = in.readByte() _useDisk = (flags & 8) != 0 _useMemory = (flags & 4) != 0 diff --git a/core/src/main/java/org/apache/spark/memory/MemoryMode.java b/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala similarity index 50% rename from core/src/main/java/org/apache/spark/memory/MemoryMode.java rename to common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala index 3a5e72d8aae..8e4de01885e 100644 --- a/core/src/main/java/org/apache/spark/memory/MemoryMode.java +++ b/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala @@ -14,13 +14,31 @@ * See the License for the specific language governing permissions and * limitations under the License. 
*/ +package org.apache.spark.util -package org.apache.spark.memory; +import java.io.IOException -import org.apache.spark.annotation.Private; +import scala.util.control.NonFatal -@Private -public enum MemoryMode { - ON_HEAP, - OFF_HEAP +import org.apache.spark.internal.Logging + +object SparkErrorUtils extends Logging { + /** + * Execute a block of code that returns a value, re-throwing any non-fatal uncaught + * exceptions as IOException. This is used when implementing Externalizable and Serializable's + * read and write methods, since Java's serializer will not report non-IOExceptions properly; + * see SPARK-4080 for more context. + */ + def tryOrIOException[T](block: => T): T = { +try { + block +} catch { + case e: IOException => +logError("Exception encountered", e) +throw e + case NonFatal(e) => +logError("Exception encountered", e) +throw new IOException(e) +} + } } diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index ada0cffd2b0..60895c791b5 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark
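A sketch of the `Externalizable` pattern the helper exists for (the class below is hypothetical): wrapping the bodies in `tryOrIOException` keeps Java serialization from swallowing non-IOExceptions, per SPARK-4080:

```
import java.io.{Externalizable, ObjectInput, ObjectOutput}
import org.apache.spark.util.SparkErrorUtils

class MyFlags extends Externalizable {
  private var flags: Byte = 0
  // Non-fatal failures in either body are logged and rethrown as IOException.
  override def writeExternal(out: ObjectOutput): Unit =
    SparkErrorUtils.tryOrIOException { out.writeByte(flags) }
  override def readExternal(in: ObjectInput): Unit =
    SparkErrorUtils.tryOrIOException { flags = in.readByte() }
}
```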
[spark] branch master updated: [SPARK-44244][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3baf7f7b710 [SPARK-44244][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309] 3baf7f7b710 is described below commit 3baf7f7b7106f3fd30257b793ff4908d0f1ec427 Author: Jiaan Geng AuthorDate: Sat Jul 1 12:03:42 2023 +0300 [SPARK-44244][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309] ### What changes were proposed in this pull request? The pr aims to assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Existing test cases updated and new test cases added. Closes #41788 from beliefer/SPARK-44244. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 35 ++ .../spark/sql/catalyst/analysis/Analyzer.scala | 14 - .../catalyst/analysis/ResolveInlineTables.scala| 10 +++ .../sql/catalyst/analysis/AnalysisSuite.scala | 6 ++-- .../ansi/higher-order-functions.sql.out| 2 +- .../higher-order-functions.sql.out | 2 +- .../analyzer-results/inline-table.sql.out | 16 +- .../table-valued-functions.sql.out | 20 ++--- .../analyzer-results/udf/udf-inline-table.sql.out | 16 +- .../results/ansi/higher-order-functions.sql.out| 2 +- .../results/higher-order-functions.sql.out | 2 +- .../sql-tests/results/inline-table.sql.out | 16 +- .../results/table-valued-functions.sql.out | 20 ++--- .../sql-tests/results/udf/udf-inline-table.sql.out | 16 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 13 .../execution/command/PlanResolutionSuite.scala| 19 16 files changed, 105 insertions(+), 104 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 14bd3bc6bac..027d09eae10 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1241,6 +1241,11 @@ "message" : [ "Found incompatible types in the column for inline table." ] + }, + "NUM_COLUMNS_MISMATCH" : { +"message" : [ + "Inline table expected columns but found columns in row ." +] + } } }, @@ -1266,6 +1271,11 @@ "The lambda function has duplicate arguments . Please, consider to rename the argument names or set to \"true\"." ] }, + "NON_HIGHER_ORDER_FUNCTION" : { +"message" : [ + "A lambda function should only be used in a higher order function. However, its class is , which is not a higher order function." +] + }, "NUM_ARGS_MISMATCH" : { "message" : [ "A higher order function expects arguments, but got ." ] }, @@ -1939,6 +1949,11 @@ ], "sqlState" : "42826" }, + "NUM_TABLE_VALUE_ALIASES_MISMATCH" : { +"message" : [ + "Number of given aliases does not match number of output columns. Function name: ; number of aliases: ; number of output columns: ." +] + }, "ORDER_BY_POS_OUT_OF_RANGE" : { "message" : [ "ORDER BY position is not in select list (valid range is [1, ])." ] }, @@ -5589,26 +5604,6 @@ "The input '' does not match the given number format: ''." ] }, - "_LEGACY_ERROR_TEMP_2305" : { -"message" : [ - "expected columns but found columns in row ." -] - }, - "_LEGACY_ERROR_TEMP_2306" : { -"message" : [ - "A lambda function should only be used in a higher order function. However, its class is , which is not a higher order function."
-] - }, - "_LEGACY_ERROR_TEMP_2307" : { -"message" : [ - "Number of given aliases does not match number of output columns. Function name: ; number of aliases: ; number of output columns: ." -] - }, - "_LEGACY_ERROR_TEMP_2309" : { -"message" : [ - "cannot resolve in MERGE command given columns []." -] - }, "_LEGACY_ERROR_TEMP_2311" : { "message" : [ "'writeTo' can not be called on streaming Dataset/DataFrame." diff --git
[spark] branch master updated: [SPARK-44044][SS] Improve Error message for Window functions with streaming
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f406b54b2a8 [SPARK-44044][SS] Improve Error message for Window functions with streaming f406b54b2a8 is described below commit f406b54b2a899d03bae2e6f70eef7fedfed63d65 Author: Siying Dong AuthorDate: Sat Jul 1 08:51:22 2023 +0300 [SPARK-44044][SS] Improve Error message for Window functions with streaming ### What changes were proposed in this pull request? Replace the existing error message when a non-time window function is used with streaming, to include the aggregation function and column. The error message now looks like the following: org.apache.spark.sql.AnalysisException: Window function is not supported in 'row_number()' as column 'rn_col' on streaming DataFrames/Datasets. Structured Streaming only supports time-window aggregation using the `window` function. (window specification: '(PARTITION BY col1 ORDER BY col2 ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)') Note that the message is a little bit unnatural, as the existing unit test requires that the exception message include "not supported", "streaming", "DataFrames" and "Dataset". ### Why are the changes needed? The existing error message is vague and a full logical plan is included. A user reports that they aren't able to identify what the problem is. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added a unit test Closes #41578 from siying/window_error. Lead-authored-by: Siying Dong Co-authored-by: Siying Dong Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 5 .../analysis/UnsupportedOperationChecker.scala | 17 ++--- .../spark/sql/errors/QueryExecutionErrors.scala| 16 - .../analysis/UnsupportedOperationsSuite.scala | 24 ++- .../apache/spark/sql/streaming/StreamSuite.scala | 28 ++ 5 files changed, 80 insertions(+), 10 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index eabd5533e13..14bd3bc6bac 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1775,6 +1775,11 @@ ], "sqlState" : "42000" }, + "NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING" : { +"message" : [ + "Window function is not supported in (as column ) on streaming DataFrames/Datasets. Structured Streaming only supports time-window aggregation using the WINDOW function.
(window specification: )" +] + }, "NOT_ALLOWED_IN_FROM" : { "message" : [ "Not allowed in the FROM clause:" diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala index daa7c0d54b7..2a09d85d8f2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala @@ -19,11 +19,12 @@ package org.apache.spark.sql.catalyst.analysis import org.apache.spark.internal.Logging import org.apache.spark.sql.AnalysisException -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, BinaryComparison, CurrentDate, CurrentTimestampLike, Expression, GreaterThan, GreaterThanOrEqual, GroupingSets, LessThan, LessThanOrEqual, LocalTimestamp, MonotonicallyIncreasingID, SessionWindow} +import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, BinaryComparison, CurrentDate, CurrentTimestampLike, Expression, GreaterThan, GreaterThanOrEqual, GroupingSets, LessThan, LessThanOrEqual, LocalTimestamp, MonotonicallyIncreasingID, SessionWindow, WindowExpression} import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression import org.apache.spark.sql.catalyst.plans._ import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.catalyst.streaming.InternalOutputModes +import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.streaming.{GroupStateTimeout, OutputMode} @@ -508,8 +509,18 @@ object UnsupportedOperationChecker extends Logging { case Sample(_, _, _, _, child) if child.isStreaming => throwError("Sampling is not supported on streaming DataFrames/Datasets&
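A minimal streaming query that should now produce the targeted message; the source, column name, and sink below are assumptions for illustration:

```
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val stream = spark.readStream.format("rate").load()
val w = Window.partitionBy("value").orderBy("timestamp")
// Analysis fails at start() with
// NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING, naming row_number() and rn_col.
stream.withColumn("rn_col", row_number().over(w))
  .writeStream.format("console").start()
```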
[spark] branch master updated: [SPARK-43851][SQL] Support LCA in grouping expressions
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9353d67f929 [SPARK-43851][SQL] Support LCA in grouping expressions 9353d67f929 is described below commit 9353d67f9290bae1e7d7e16a2caf5256cc4e2f92 Author: Jia Fan AuthorDate: Sat Jul 1 08:48:10 2023 +0300 [SPARK-43851][SQL] Support LCA in grouping expressions ### What changes were proposed in this pull request? This PR brings support for lateral column alias references in grouping expressions. ### Why are the changes needed? Adds a new feature for LCA. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. Closes #41804 from Hisoka-X/SPARK-43851_LCA_in_group. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 5 - ...r-conditions-unsupported-feature-error-class.md | 4 .../analysis/ResolveReferencesInAggregate.scala| 22 -- .../column-resolution-aggregate.sql.out| 26 +- .../results/column-resolution-aggregate.sql.out| 16 - 5 files changed, 29 insertions(+), 44 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 3cc35d668e0..eabd5533e13 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -2530,11 +2530,6 @@ "Referencing lateral column alias <lca> in the aggregate query both with window expressions and with having clause. Please rewrite the aggregate query by removing the having clause or removing lateral alias reference in the SELECT list." ] }, - "LATERAL_COLUMN_ALIAS_IN_GROUP_BY" : { -"message" : [ - "Referencing a lateral column alias via GROUP BY alias/ALL is not supported yet." -] - }, "LATERAL_COLUMN_ALIAS_IN_WINDOW" : { "message" : [ "Referencing a lateral column alias <lca> in window expression <windowExpr>." diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md b/docs/sql-error-conditions-unsupported-feature-error-class.md index 64d7eb347e5..78bf301c49d 100644 --- a/docs/sql-error-conditions-unsupported-feature-error-class.md +++ b/docs/sql-error-conditions-unsupported-feature-error-class.md @@ -65,10 +65,6 @@ Referencing a lateral column alias `<lca>` in the aggregate function `<aggFunc>` Referencing lateral column alias `<lca>` in the aggregate query both with window expressions and with having clause. Please rewrite the aggregate query by removing the having clause or removing lateral alias reference in the SELECT list. -## LATERAL_COLUMN_ALIAS_IN_GROUP_BY - -Referencing a lateral column alias via GROUP BY alias/ALL is not supported yet. - ## LATERAL_COLUMN_ALIAS_IN_WINDOW Referencing a lateral column alias `<lca>` in window expression `<windowExpr>`. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala index 09ae87b071f..41bcb337c67 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala @@ -17,9 +17,8 @@ package org.apache.spark.sql.catalyst.analysis -import org.apache.spark.sql.AnalysisException import org.apache.spark.sql.catalyst.SQLConfHelper -import org.apache.spark.sql.catalyst.expressions.{AliasHelper, Attribute, Expression, NamedExpression} +import org.apache.spark.sql.catalyst.expressions.{AliasHelper, Attribute, Expression, LateralColumnAliasReference, NamedExpression} import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, AppendColumns, LogicalPlan} import org.apache.spark.sql.catalyst.trees.TreePattern.{LATERAL_COLUMN_ALIAS_REFERENCE, UNRESOLVED_ATTRIBUTE} @@ -74,12 +73,6 @@ object ResolveReferencesInAggregate extends SQLConfHelper resolvedAggExprsWithOuter, resolveGroupByAlias(resolvedAggExprsWithOuter, resolvedGroupExprsNoOuter) ).map(resolveOuterRef) - // TODO: currently we don't support LCA in `groupingExpressions` yet. - if (resolved.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE))) { -throw new AnalysisException( - errorClass = "UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY", - messageParameters = Map.empty) - } resolved } else {
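As a rough illustration (hypothetical data; assuming an active SparkSession named `spark`), the pattern this change enables is a GROUP BY over an alias that itself references another alias from the same SELECT list:

```scala
// `bonus` is a lateral column alias: it refers to `base`, an alias defined
// earlier in the same SELECT list. Referencing it from GROUP BY used to fail
// with UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY.
spark.sql(
  """SELECT salary AS base, base * 2 AS bonus, count(*) AS cnt
    |FROM VALUES (100), (100), (200) AS t(salary)
    |GROUP BY base, bonus""".stripMargin).show()
```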
[spark] branch master updated: [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 706829d9731 [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020 706829d9731 is described below commit 706829d97312c6812bf791d9893d0a70d81676ae Author: itholic AuthorDate: Fri Jun 30 21:25:04 2023 +0300 [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020 ### What changes were proposed in this pull request? This PR proposes to assign a name to _LEGACY_ERROR_TEMP_1020: "INVALID_USAGE_OF_STAR_OR_REGEX". ### Why are the changes needed? We should assign proper names to _LEGACY_ERROR_TEMP_* ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*` Closes #39702 from itholic/LEGACY_1020. Authored-by: itholic Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 11 +-- .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../spark/sql/errors/QueryCompilationErrors.scala | 2 +- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 86 -- .../catalyst/analysis/ResolveSubquerySuite.scala | 6 +- .../org/apache/spark/sql/DataFrameSuite.scala | 11 ++- .../scala/org/apache/spark/sql/DatasetSuite.scala | 4 +- 7 files changed, 82 insertions(+), 40 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index abe88db1267..3cc35d668e0 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1596,6 +1596,12 @@ "The url is invalid: <url>. If necessary set <ansiConfig> to \"false\" to bypass this error." ] }, + "INVALID_USAGE_OF_STAR_OR_REGEX" : { +"message" : [ + "Invalid usage of <elem> in <prettyName>." +], +"sqlState" : "42000" + }, "INVALID_VIEW_TEXT" : { "message" : [ "The view <viewName> cannot be displayed due to invalid view text: <viewText>. This may be caused by an unauthorized modification of the view or an incorrect query syntax. Please check your query syntax and verify that the view has not been tampered with." ] }, @@ -3169,11 +3175,6 @@ " is a permanent view, which is not supported by streaming reading API such as `DataStreamReader.table` yet." ] }, - "_LEGACY_ERROR_TEMP_1020" : { -"message" : [ - "Invalid usage of <elem> in <prettyName>." -] - }, "_LEGACY_ERROR_TEMP_1021" : { "message" : [ "count(.*) is not allowed. Please use count(*) or expand the columns manually, e.g. count(col1, col2)." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 32cec909401..b61dbae686b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -1897,7 +1897,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor }) // count(*) has been replaced by count(1) case o if containsStar(o.children) => - throw QueryCompilationErrors.invalidStarUsageError(s"expression '${o.prettyName}'", + throw QueryCompilationErrors.invalidStarUsageError(s"expression `${o.prettyName}`", extractStar(o.children)) } } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 94cbf880b57..e02708105d2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -475,7 +475,7 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase { } val elem = Seq(starMsg, resExprMsg).flatten.mkString(" and ") new AnalysisException( - errorClass = "_LEGACY_ERROR_TEMP_1020", + errorClass = "INVALID_USAGE_OF_STAR_OR_REGEX", messageParameters = Map("elem" -> elem, "prettyName" -> prettyName)) } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala index f994c03..fdaeadc5445 100644 -
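A hedged sketch (assuming a SparkSession `spark`) of one way to hit the renamed error class:

```scala
import org.apache.spark.sql.AnalysisException

try {
  // count(*) is special-cased to count(1), but sum(*) is an invalid use of `*`.
  spark.sql("SELECT sum(*) FROM VALUES (1), (2) AS t(a)").collect()
} catch {
  // Expected to carry the new name instead of _LEGACY_ERROR_TEMP_1020.
  case e: AnalysisException => println(e.getErrorClass) // INVALID_USAGE_OF_STAR_OR_REGEX
}
```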
[spark] branch master updated: [SPARK-43986][SQL] Create error classes for HyperLogLog function call failures
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ab67f461987 [SPARK-43986][SQL] Create error classes for HyperLogLog function call failures ab67f461987 is described below commit ab67f4619873f21b5dcf7f67658afce7e1028657 Author: Daniel Tenedorio AuthorDate: Fri Jun 30 19:44:14 2023 +0300 [SPARK-43986][SQL] Create error classes for HyperLogLog function call failures ### What changes were proposed in this pull request? This PR creates error classes for HyperLogLog function call failures. ### Why are the changes needed? These replace previously raw Java exceptions and similar cases, in order to improve the user experience and bring consistency with other parts of Spark. ### Does this PR introduce _any_ user-facing change? Yes, error messages change slightly. ### How was this patch tested? This PR also adds SQL query test files for the HLL functions. Closes #41486 from dtenedor/hll-error-classes. Authored-by: Daniel Tenedorio Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 15 + .../aggregate/datasketchesAggregates.scala | 71 +++-- .../expressions/datasketchesExpressions.scala | 29 +- .../spark/sql/errors/QueryExecutionErrors.scala| 26 ++ .../sql-tests/analyzer-results/hll.sql.out | 215 + .../src/test/resources/sql-tests/inputs/hll.sql| 76 + .../test/resources/sql-tests/results/hll.sql.out | 262 .../apache/spark/sql/DataFrameAggregateSuite.scala | 338 - 8 files changed, 850 insertions(+), 182 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index db6b9a97012..abe88db1267 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -782,6 +782,21 @@ "The expression <sqlExpr> cannot be used as a grouping expression because its data type <dataType> is not an orderable data type." ] }, + "HLL_INVALID_INPUT_SKETCH_BUFFER" : { +"message" : [ + "Invalid call to <function>; only valid HLL sketch buffers are supported as inputs (such as those produced by the `hll_sketch_agg` function)." +] + }, + "HLL_INVALID_LG_K" : { +"message" : [ + "Invalid call to <function>; the `lgConfigK` value must be between <min> and <max>, inclusive: <value>." +] + }, + "HLL_UNION_DIFFERENT_LG_K" : { +"message" : [ + "Sketches have different `lgConfigK` values: <left> and <right>. Set the `allowDifferentLgConfigK` parameter to true to call <function> with different `lgConfigK` values." +] + }, "IDENTIFIER_TOO_MANY_NAME_PARTS" : { "message" : [ "<identifier> is not a valid identifier as it has more than 2 name parts." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala index 8b24efe12b4..17c69f798d8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala @@ -17,23 +17,23 @@ package org.apache.spark.sql.catalyst.expressions.aggregate -import org.apache.datasketches.SketchesArgumentException import org.apache.datasketches.hll.{HllSketch, TgtHllType, Union} import org.apache.datasketches.memory.Memory import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, ExpressionDescription, Literal} import org.apache.spark.sql.catalyst.trees.BinaryLike +import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.types.{AbstractDataType, BinaryType, BooleanType, DataType, IntegerType, LongType, StringType, TypeCollection} import org.apache.spark.unsafe.types.UTF8String /** - * The HllSketchAgg function utilizes a Datasketches HllSketch instance to - * count a probabilistic approximation of the number of unique values in - * a given column, and outputs the binary representation of the HllSketch. + * The HllSketchAgg function utilizes a Datasketches HllSketch instance to count a probabilistic + * approximation of the number of unique values in a given column, and outputs the binary + * representation of the HllSketch. * - * See [[https://datasketches.apache.org/docs/HLL/HLL.html]] for more information + * See [[https://datasketches.apache.org/docs/HLL/HLL.html]] for mor
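To see the new classes in action, a small sketch (hypothetical data; assuming a SparkSession `spark`; the 4–21 bound comes from the DataSketches HLL library):

```scala
// A valid lgConfigK produces a distinct-count estimate as expected.
spark.sql(
  "SELECT hll_sketch_estimate(hll_sketch_agg(col, 12)) " +
  "FROM VALUES (1), (1), (2) AS t(col)").show()

// An out-of-range lgConfigK now fails with HLL_INVALID_LG_K rather than a
// raw SketchesArgumentException bubbling up from the library.
spark.sql("SELECT hll_sketch_agg(col, 2) FROM VALUES (1), (2) AS t(col)").show()
```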
[spark] branch master updated: [SPARK-44260][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1215-1245-2329] & Use checkError() to check Exception in *CharVarchar*Suite
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3fb9a2c6135 [SPARK-44260][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1215-1245-2329] & Use checkError() to check Exception in *CharVarchar*Suite 3fb9a2c6135 is described below commit 3fb9a2c6135d49cc7b80546c0f228d7d2bc78bf6 Author: panbingkun AuthorDate: Fri Jun 30 18:36:46 2023 +0300 [SPARK-44260][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1215-1245-2329] & Use checkError() to check Exception in *CharVarchar*Suite ### What changes were proposed in this pull request? The pr aims to: 1. Assign clear error class names for some logic in `CharVarcharCodegenUtils` that directly uses exceptions - EXCEED_LIMIT_LENGTH 2. Assign names to the error class - _LEGACY_ERROR_TEMP_1215 -> UNSUPPORTED_CHAR_OR_VARCHAR_AS_STRING - _LEGACY_ERROR_TEMP_1245 -> NOT_SUPPORTED_CHANGE_COLUMN - _LEGACY_ERROR_TEMP_2329 -> merge to NOT_SUPPORTED_CHANGE_COLUMN(_LEGACY_ERROR_TEMP_1245) 3. Use checkError() to check Exception in `*CharVarchar*Suite` ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update UT - Pass GA - Manually test. Closes #41768 from panbingkun/CharVarchar_checkError. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 30 +-- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 19 +- .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 19 +- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 19 +- .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 19 +- .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 19 +- .../sql/catalyst/util/CharVarcharCodegenUtils.java | 3 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 11 +- .../spark/sql/errors/QueryCompilationErrors.scala | 17 +- .../spark/sql/errors/QueryExecutionErrors.scala| 7 + .../apache/spark/sql/execution/command/ddl.scala | 3 +- .../analyzer-results/change-column.sql.out | 11 +- .../sql-tests/analyzer-results/charvarchar.sql.out | 11 +- .../sql-tests/results/change-column.sql.out| 11 +- .../sql-tests/results/charvarchar.sql.out | 11 +- .../apache/spark/sql/CharVarcharTestSuite.scala| 291 + .../spark/sql/connector/AlterTableTests.scala | 25 +- .../execution/command/CharVarcharDDLTestBase.scala | 120 +++-- .../spark/sql/HiveCharVarcharTestSuite.scala | 12 +- 19 files changed, 494 insertions(+), 164 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 1b2a1ce305a..db6b9a97012 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -680,6 +680,11 @@ "The event time <eventName> has the invalid type <eventType>, but expected \"TIMESTAMP\"." ] }, + "EXCEED_LIMIT_LENGTH" : { +"message" : [ + "Exceeds char/varchar type length limitation: <limit>." +] + }, "EXPRESSION_TYPE_IS_NOT_ORDERABLE" : { "message" : [ "Column expression <expr> cannot be sorted because its type <exprType> is not orderable." ] }, @@ -1817,6 +1822,11 @@ }, "sqlState" : "42000" }, + "NOT_SUPPORTED_CHANGE_COLUMN" : { +"message" : [ + "ALTER TABLE ALTER/CHANGE COLUMN is not supported for changing <table>'s column <originName> with type <originType> to <newName> with type <newType>." +] + }, "NOT_SUPPORTED_COMMAND_FOR_V2_TABLE" : { "message" : [ "<cmd> is not supported for v2 tables." 
@@ -2351,6 +2361,11 @@ ], "sqlState" : "0A000" }, + "UNSUPPORTED_CHAR_OR_VARCHAR_AS_STRING" : { +"message" : [ + "The char/varchar type can't be used in the table schema. If you want Spark treat them as string type as same as Spark 3.0 and earlier, please set \"spark.sql.legacy.charVarcharAsString\" to \"true\"." +] + }, "UNSUPPORTED_DATASOURCE_FOR_DIRECT_QUERY" : { "message" : [ "Unsupported data source type for direct query on files: <dataSourceType>" @@ -3875,11 +3890,6 @@ "Found different window function type in <windowExpressions>." ] }, - "_LEGACY_ERROR_TEMP_1215" : { -"message" : [ - "char/varchar type can only be used in the table schema. You can set <config> to true, so that Spark treat them as string type as same as Spark 3.0 and
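For illustration, a minimal sketch of the renamed length check (hypothetical table name; assuming a SparkSession `spark`):

```scala
spark.sql("CREATE TABLE chars_demo (c CHAR(3)) USING parquet")
spark.sql("INSERT INTO chars_demo VALUES ('ab')")   // ok: padded to length 3
// Fails at runtime with EXCEED_LIMIT_LENGTH (limit = 3) instead of a plain
// RuntimeException from CharVarcharCodegenUtils.
spark.sql("INSERT INTO chars_demo VALUES ('abcd')")
```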
[spark] branch master updated: [SPARK-43922][SQL] Add named parameter support in parser for function calls
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 91c45812520 [SPARK-43922][SQL] Add named parameter support in parser for function calls 91c45812520 is described below commit 91c458125203d2feefd1e7443a9315c480dfaa00 Author: Richard Yu AuthorDate: Fri Jun 30 13:09:12 2023 +0300 [SPARK-43922][SQL] Add named parameter support in parser for function calls ### What changes were proposed in this pull request? We plan on adding two new tokens called ```namedArgumentExpression``` and ```functionArgument``` which would enable this feature. When parsing this logic, we also make changes to ASTBuilder such that it can detect if the argument passed is a named argument or a positional one. Here is the link for the design document: https://docs.google.com/document/d/1uOTX0MICxqu8fNanIsiyB8FV68CceGGpa8BJLP2u9o4/edit ### Why are the changes needed? This is part of a larger project to implement named parameter support for user defined functions, built-in functions, and table valued functions. ### Does this PR introduce _any_ user-facing change? Yes, the user would be able to call functions with argument lists that contain named arguments. ### How was this patch tested? We add tests in the PlanParserSuite that will verify that the plan parsed is as intended. Closes #41796 from learningchess2003/43922-new. Authored-by: Richard Yu Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 5 + .../spark/sql/catalyst/parser/SqlBaseLexer.g4 | 1 + .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 14 ++- .../expressions/NamedArgumentExpression.scala | 58 ++ .../spark/sql/catalyst/parser/AstBuilder.scala | 37 +-- .../spark/sql/errors/QueryCompilationErrors.scala | 9 ++ .../org/apache/spark/sql/internal/SQLConf.scala| 7 ++ .../catalyst/parser/ExpressionParserSuite.scala| 18 +++ .../sql/catalyst/parser/PlanParserSuite.scala | 29 + .../named-function-arguments.sql.out | 112 +++ .../sql-tests/inputs/named-function-arguments.sql | 5 + .../results/named-function-arguments.sql.out | 122 + .../spark/sql/errors/QueryParsingErrorsSuite.scala | 38 ++- 13 files changed, 443 insertions(+), 12 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 6db8c5e3bf1..1b2a1ce305a 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1708,6 +1708,11 @@ "Not allowed to implement multiple UDF interfaces, UDF class <className>." ] }, + "NAMED_ARGUMENTS_SUPPORT_DISABLED" : { +"message" : [ + "Cannot call function <functionName> because named argument references are not enabled here. In this case, the named argument reference was <argument>. Set \"spark.sql.allowNamedFunctionArguments\" to \"true\" to turn on feature." +] + }, "NESTED_AGGREGATE_FUNCTION" : { "message" : [ "It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query." 
diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 index 6c9b3a71266..fb440ef8d37 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 @@ -443,6 +443,7 @@ CONCAT_PIPE: '||'; HAT: '^'; COLON: ':'; ARROW: '->'; +FAT_ARROW : '=>'; HENT_START: '/*+'; HENT_END: '*/'; QUESTION: '?'; diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 index d1e672e9472..ab6c0d0861f 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 @@ -789,7 +789,7 @@ inlineTable ; functionTable -: funcName=functionName LEFT_PAREN (expression (COMMA expression)*)? RIGHT_PAREN tableAlias +: funcName=functionName LEFT_PAREN (functionArgument (COMMA functionArgument)*)? RIGHT_PAREN tableAlias ; tableAlias @@ -862,6 +862,15 @@ expression : booleanExpression ; +namedArgumentExpression +: key=identifier FAT_ARROW value=expression +; + +functionArgument +: expre
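A hedged sketch of the new surface syntax (assuming a SparkSession `spark`; `mask` gained named-argument support in follow-up work and is used here purely for illustration):

```scala
// The parser now accepts `name => value` in argument lists, gated by the
// config referenced in the error message above.
spark.conf.set("spark.sql.allowNamedFunctionArguments", "true")
spark.sql("SELECT mask('AbCD123-@$#', upperChar => 'Q')").show()
```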
[spark] branch master updated: [SPARK-44030][SQL][FOLLOW-UP] Move unapply from AnyTimestampType to AnyTimestampTypeExpression
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 618b52097c0 [SPARK-44030][SQL][FOLLOW-UP] Move unapply from AnyTimestampType to AnyTimestampTypeExpression 618b52097c0 is described below commit 618b52097c07105d734aaf9b2a22b372920b3f31 Author: Rui Wang AuthorDate: Fri Jun 30 08:38:39 2023 +0300 [SPARK-44030][SQL][FOLLOW-UP] Move unapply from AnyTimestampType to AnyTimestampTypeExpression ### What changes were proposed in this pull request? Move unapply from AnyTimestampType to AnyTimestampTypeExpression. ### Why are the changes needed? To align with the effort that we use separate type expression class to host `unapply`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing Test Closes #41771 from amaliujia/atomic_datatype_expression. Authored-by: Rui Wang Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala | 4 ++-- .../spark/sql/catalyst/analysis/AnsiTypeCoercion.scala | 14 -- .../apache/spark/sql/catalyst/analysis/TypeCoercion.scala | 12 +++- .../org/apache/spark/sql/types/AbstractDataType.scala | 3 --- .../org/apache/spark/sql/types/DataTypeExpression.scala| 5 + 5 files changed, 22 insertions(+), 16 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 8a192a4c132..32cec909401 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -428,8 +428,8 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor UnaryMinus(r, mode == EvalMode.ANSI), ansiEnabled = mode == EvalMode.ANSI)) case (_, CalendarIntervalType | _: DayTimeIntervalType) => Cast(DatetimeSub(l, r, TimeAdd(l, UnaryMinus(r, mode == EvalMode.ANSI))), l.dataType) - case _ if AnyTimestampType.unapply(l) || AnyTimestampType.unapply(r) => -SubtractTimestamps(l, r) + case _ if AnyTimestampTypeExpression.unapply(l) || +AnyTimestampTypeExpression.unapply(r) => SubtractTimestamps(l, r) case (_, DateType) => SubtractDates(l, r) case (DateType, dt) if dt != StringType => DateSub(l, r) case _ => s diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala index d3f20f87493..5854f42a061 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala @@ -284,7 +284,7 @@ object AnsiTypeCoercion extends TypeCoercionBase { // Skip nodes who's children have not been resolved yet. case g if !g.childrenResolved => g - case g: GetDateField if AnyTimestampType.unapply(g.child) => + case g: GetDateField if AnyTimestampTypeExpression.unapply(g.child) => g.withNewChildren(Seq(Cast(g.child, DateType))) } } @@ -294,14 +294,16 @@ object AnsiTypeCoercion extends TypeCoercionBase { // Skip nodes who's children have not been resolved yet. 
case e if !e.childrenResolved => e - case d @ DateAdd(AnyTimestampType(), _) => d.copy(startDate = Cast(d.startDate, DateType)) - case d @ DateSub(AnyTimestampType(), _) => d.copy(startDate = Cast(d.startDate, DateType)) + case d @ DateAdd(AnyTimestampTypeExpression(), _) => +d.copy(startDate = Cast(d.startDate, DateType)) + case d @ DateSub(AnyTimestampTypeExpression(), _) => +d.copy(startDate = Cast(d.startDate, DateType)) - case s @ SubtractTimestamps(DateTypeExpression(), AnyTimestampType(), _, _) => + case s @ SubtractTimestamps(DateTypeExpression(), AnyTimestampTypeExpression(), _, _) => s.copy(left = Cast(s.left, s.right.dataType)) - case s @ SubtractTimestamps(AnyTimestampType(), DateTypeExpression(), _, _) => + case s @ SubtractTimestamps(AnyTimestampTypeExpression(), DateTypeExpression(), _, _) => s.copy(right = Cast(s.right, s.left.dataType)) - case s @ SubtractTimestamps(AnyTimestampType(), AnyTimestampType(), _, _) + case s @ SubtractTimestamps(AnyTimestampTypeExpression(), AnyTimestampTypeExpression(), _, _) if s.left.dataType != s.right.dataType => val newLeft = castIfNotSameType(s.left, TimestampN
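For readers unfamiliar with the convention, here is a minimal sketch (not the Spark source) of what such a type-expression extractor looks like: it matches on an expression's data type, so the `AbstractDataType` hierarchy stays free of expression-level logic.

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.{TimestampNTZType, TimestampType}

// Sketch only: mirrors the shape of AnyTimestampTypeExpression.unapply.
object AnyTimestampTypeExpressionSketch {
  def unapply(e: Expression): Boolean = e.dataType match {
    case TimestampType | TimestampNTZType => true
    case _ => false
  }
}
```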
[spark] branch master updated: [SPARK-44208][CORE][SQL] Assign clear error class names for some logic that directly uses exceptions
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a9129defc0e [SPARK-44208][CORE][SQL] Assign clear error class names for some logic that directly uses exceptions a9129defc0e is described below commit a9129defc0ebbe68f20ec888352c30a90925d7ea Author: panbingkun AuthorDate: Thu Jun 29 17:31:03 2023 +0300 [SPARK-44208][CORE][SQL] Assign clear error class names for some logic that directly uses exceptions ### What changes were proposed in this pull request? The pr aims to assign clear error class names for some logic that directly uses exceptions, include: - ALL_PARTITION_COLUMNS_NOT_ALLOWED - INVALID_HIVE_COLUMN_NAME - SPECIFY_BUCKETING_IS_NOT_ALLOWED - SPECIFY_PARTITION_IS_NOT_ALLOWED - UNSUPPORTED_ADD_FILE.DIRECTORY - UNSUPPORTED_ADD_FILE.LOCAL_DIRECTORY ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update UT. - Pass GA. Closes #41740 from panbingkun/assign_new_name. Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 42 +++--- .../main/scala/org/apache/spark/SparkContext.scala | 7 ++-- .../org/apache/spark/errors/SparkCoreErrors.scala | 14 .../spark/sql/errors/QueryCompilationErrors.scala | 2 +- .../spark/sql/execution/datasources/rules.scala| 16 + .../spark/sql/execution/command/DDLSuite.scala | 34 +- .../spark/sql/hive/HiveExternalCatalog.scala | 12 --- .../spark/sql/hive/execution/HiveDDLSuite.scala| 12 +++ 8 files changed, 97 insertions(+), 42 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 192a0747dfd..6db8c5e3bf1 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -4,6 +4,11 @@ "Non-deterministic expression should not appear in the arguments of an aggregate function." ] }, + "ALL_PARTITION_COLUMNS_NOT_ALLOWED" : { +"message" : [ + "Cannot use all columns for partition columns." +] + }, "ALTER_TABLE_COLUMN_DESCRIPTOR_DUPLICATE" : { "message" : [ "ALTER TABLE column specifies descriptor \"\" more than once, which is invalid." @@ -1180,6 +1185,11 @@ ], "sqlState" : "22023" }, + "INVALID_HIVE_COLUMN_NAME" : { +"message" : [ + "Cannot create the table having the nested column whose name contains invalid characters in Hive metastore." +] + }, "INVALID_IDENTIFIER" : { "message" : [ "The identifier is invalid. Please, consider quoting it with back-quotes as ``." @@ -2081,6 +2091,16 @@ "sortBy must be used together with bucketBy." ] }, + "SPECIFY_BUCKETING_IS_NOT_ALLOWED" : { +"message" : [ + "Cannot specify bucketing information if the table schema is not specified when creating and will be inferred at runtime." +] + }, + "SPECIFY_PARTITION_IS_NOT_ALLOWED" : { +"message" : [ + "It is not allowed to specify partition columns when the table schema is not defined. When the table schema is not provided, schema and partition columns will be inferred." +] + }, "SQL_CONF_NOT_FOUND" : { "message" : [ "The SQL config cannot be found. Please verify that the config exists." @@ -2303,6 +2323,23 @@ "Attempted to unset non-existent properties [] in table ." 
] }, + "UNSUPPORTED_ADD_FILE" : { +"message" : [ + "Don't support add file." +], +"subClass" : { + "DIRECTORY" : { +"message" : [ + "The file is a directory, consider to set \"spark.sql.legacy.addSingleFileInAddFile\" to \"false\"." +] + }, + "LOCAL_DIRECTORY" : { +"message" : [ + "The local directory is not supported in a non-local master mode." +] + } +} + }, "UNSUPPORTED_ARROWTYPE" : { "message" : [ "Unsupported arrow type ." @@ -3588,11 +3625,6 @@ "Cannot use for partition column." ] }, - "_LEGACY_ERROR_TEMP_1154" : { -
[spark] branch branch-3.4 updated: [SPARK-44079][SQL][3.4] Fix `ArrayIndexOutOfBoundsException` when parse array as struct using PERMISSIVE mode with corrupt record
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new ad29290a02f [SPARK-44079][SQL][3.4] Fix `ArrayIndexOutOfBoundsException` when parse array as struct using PERMISSIVE mode with corrupt record ad29290a02f is described below commit ad29290a02fb94a958fd21e301100338c9f5b82a Author: Jia Fan AuthorDate: Thu Jun 29 16:38:02 2023 +0300 [SPARK-44079][SQL][3.4] Fix `ArrayIndexOutOfBoundsException` when parse array as struct using PERMISSIVE mode with corrupt record ### What changes were proposed in this pull request? cherry pick #41662 , fix parse array as struct bug on branch 3.4 ### Why are the changes needed? Fix the bug when parse array as struct using PERMISSIVE mode with corrupt record ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. Closes #41784 from Hisoka-X/SPARK-44079_3.4_cherry_pick. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../spark/sql/catalyst/csv/UnivocityParser.scala | 4 ++-- .../spark/sql/catalyst/json/JacksonParser.scala | 20 +++- .../spark/sql/catalyst/util/BadRecordException.scala | 14 -- .../spark/sql/catalyst/util/FailureSafeParser.scala | 9 +++-- .../sql/execution/datasources/json/JsonSuite.scala | 15 +++ 5 files changed, 51 insertions(+), 11 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala index 42e03630b14..b58649da61c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala @@ -318,7 +318,7 @@ class UnivocityParser( if (tokens == null) { throw BadRecordException( () => getCurrentInput, -() => None, +() => Array.empty, QueryExecutionErrors.malformedCSVRecordError("")) } @@ -362,7 +362,7 @@ class UnivocityParser( } else { if (badRecordException.isDefined) { throw BadRecordException( - () => currentInput, () => requiredRow.headOption, badRecordException.get) + () => currentInput, () => Array(requiredRow.get), badRecordException.get) } else { requiredRow } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala index bf07d65caa0..d9bff3dc7ec 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala @@ -135,7 +135,7 @@ class JacksonParser( // List([str_a_2,null], [null,str_b_3]) // case START_ARRAY if allowArrayAsStructs => -val array = convertArray(parser, elementConverter, isRoot = true) +val array = convertArray(parser, elementConverter, isRoot = true, arrayAsStructs = true) // Here, as we support reading top level JSON arrays and take every element // in such an array as a row, this case is possible. 
if (array.numElements() == 0) { @@ -517,7 +517,8 @@ class JacksonParser( private def convertArray( parser: JsonParser, fieldConverter: ValueConverter, - isRoot: Boolean = false): ArrayData = { + isRoot: Boolean = false, + arrayAsStructs: Boolean = false): ArrayData = { val values = ArrayBuffer.empty[Any] var badRecordException: Option[Throwable] = None @@ -537,6 +538,9 @@ class JacksonParser( if (badRecordException.isEmpty) { arrayData +} else if (arrayAsStructs) { + throw PartialResultArrayException(arrayData.toArray[InternalRow](schema), +badRecordException.get) } else { throw PartialResultException(InternalRow(arrayData), badRecordException.get) } @@ -570,7 +574,7 @@ class JacksonParser( // JSON parser currently doesn't support partial results for corrupted records. // For such records, all fields other than the field configured by // `columnNameOfCorruptRecord` are set to `null`. -throw BadRecordException(() => recordLiteral(record), () => None, e) +throw BadRecordException(() => recordLiteral(record), () => Array.empty, e) case e: CharConversionException if options.encoding.isEmpty => val msg = """JSON parser cannot handle a character in its input. @@ -578,11 +582,17 @@ class JacksonParser( |""".stripMargin + e
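A sketch of the fixed scenario (hypothetical schema and data; assuming a SparkSession `spark`): a top-level JSON array read as rows of a struct schema, where one element has a field that cannot be converted.

```scala
import spark.implicits._

val data = Seq("""[{"a": 1}, {"a": "oops"}]""").toDS()

// Before the fix, PERMISSIVE mode could throw ArrayIndexOutOfBoundsException
// here; now each array element becomes a row, with the unparsable field
// nulled and the raw text kept in the corrupt-record column.
spark.read
  .schema("a INT, _corrupt_record STRING")
  .option("mode", "PERMISSIVE")
  .json(data)
  .show(false)
```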
[spark] branch master updated: [MINOR][TESTS] Fix potential bug for AlterTableTest
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6511a3e9020 [MINOR][TESTS] Fix potential bug for AlterTableTest 6511a3e9020 is described below commit 6511a3e90206473985c2d6fd28d06eb7bcf8c98f Author: panbingkun AuthorDate: Thu Jun 29 12:28:03 2023 +0300 [MINOR][TESTS] Fix potential bug for AlterTableTest ### What changes were proposed in this pull request? The pr aims to fix potential bug for `AlterTableTest`. ### Why are the changes needed? Fix bug. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41783 from panbingkun/AlterTableTests_fix. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../spark/sql/connector/AlterTableTests.scala | 373 + 1 file changed, 164 insertions(+), 209 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala index 2047212a4ea..122b3ab07e6 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala @@ -42,7 +42,7 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { if (catalogAndNamespace.isEmpty) { s"default.$tableName" } else { - s"${catalogAndNamespace}table_name" + s"$catalogAndNamespace$tableName" } } @@ -63,7 +63,7 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { } test("AlterTable: change rejected by implementation") { -val t = s"${catalogAndNamespace}table_name" +val t = fullTableName("table_name") withTable(t) { sql(s"CREATE TABLE $t (id int) USING $v2Format") @@ -74,38 +74,35 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { assert(exc.getMessage.contains("Unsupported table change")) assert(exc.getMessage.contains("Cannot drop all fields")) // from the implementation - val tableName = fullTableName(t) - val table = getTableMetadata(tableName) + val table = getTableMetadata(t) - assert(table.name === tableName) + assert(table.name === t) assert(table.schema === new StructType().add("id", IntegerType)) } } test("AlterTable: add top-level column") { -val t = s"${catalogAndNamespace}table_name" +val t = fullTableName("table_name") withTable(t) { sql(s"CREATE TABLE $t (id int) USING $v2Format") sql(s"ALTER TABLE $t ADD COLUMN data string") - val tableName = fullTableName(t) - val table = getTableMetadata(tableName) + val table = getTableMetadata(t) - assert(table.name === tableName) + assert(table.name === t) assert(table.schema === new StructType().add("id", IntegerType).add("data", StringType)) } } test("AlterTable: add column with NOT NULL") { -val t = s"${catalogAndNamespace}table_name" +val t = fullTableName("table_name") withTable(t) { sql(s"CREATE TABLE $t (id int) USING $v2Format") sql(s"ALTER TABLE $t ADD COLUMN data string NOT NULL") - val tableName = fullTableName(t) - val table = getTableMetadata(tableName) + val table = getTableMetadata(t) - assert(table.name === tableName) + assert(table.name === t) assert(table.schema === StructType(Seq( StructField("id", IntegerType), StructField("data", StringType, nullable = false @@ -113,15 +110,14 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { } test("AlterTable: add column with comment") { -val t = 
s"${catalogAndNamespace}table_name" +val t = fullTableName("table_name") withTable(t) { sql(s"CREATE TABLE $t (id int) USING $v2Format") sql(s"ALTER TABLE $t ADD COLUMN data string COMMENT 'doc'") - val tableName = fullTableName(t) - val table = getTableMetadata(tableName) + val table = getTableMetadata(t) - assert(table.name === tableName) + assert(table.name === t) assert(table.schema === StructType(Seq( StructField("id", IntegerType), StructField("data", StringType).withComment("doc" @@ -129,7 +125,7 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { }
[spark] branch master updated: [SPARK-44169][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ffbd1a3b5b1 [SPARK-44169][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304] ffbd1a3b5b1 is described below commit ffbd1a3b5b17386759a378dee5ef5cf6df7f2d09 Author: Jiaan Geng AuthorDate: Thu Jun 29 12:26:24 2023 +0300 [SPARK-44169][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304] ### What changes were proposed in this pull request? The pr aims to assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Existing test cases updated and new test cases added. Closes #41719 from beliefer/SPARK-44169. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 74 --- .../catalyst/analysis/ResolveInlineTables.scala| 12 +- .../catalyst/analysis/higherOrderFunctions.scala | 14 +- .../analysis/ResolveLambdaVariablesSuite.scala | 18 +- .../spark/sql/execution/datasources/rules.scala| 4 +- .../sql-tests/analyzer-results/cte.sql.out | 4 +- .../analyzer-results/inline-table.sql.out | 12 +- .../analyzer-results/postgreSQL/boolean.sql.out| 2 +- .../postgreSQL/window_part3.sql.out| 2 +- .../postgreSQL/window_part4.sql.out| 2 +- .../analyzer-results/udf/udf-inline-table.sql.out | 12 +- .../test/resources/sql-tests/results/cte.sql.out | 4 +- .../sql-tests/results/inline-table.sql.out | 12 +- .../sql-tests/results/postgreSQL/boolean.sql.out | 2 +- .../results/postgreSQL/window_part3.sql.out| 2 +- .../results/postgreSQL/window_part4.sql.out| 2 +- .../sql-tests/results/udf/udf-inline-table.sql.out | 12 +- .../apache/spark/sql/ColumnExpressionSuite.scala | 33 +++- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 219 +++-- 19 files changed, 297 insertions(+), 145 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index e441686432a..192a0747dfd 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -704,11 +704,6 @@ ], "sqlState" : "42K04" }, - "FAILED_SQL_EXPRESSION_EVALUATION" : { -"message" : [ - "Failed to evaluate the SQL expression: <sqlExpr>. Please check your syntax and ensure all required tables and columns are available." -] - }, "FIELD_NOT_FOUND" : { "message" : [ "No such struct field <fieldName> in <fields>." ] }, @@ -1197,6 +1192,28 @@ ], "sqlState" : "22003" }, + "INVALID_INLINE_TABLE" : { +"message" : [ + "Invalid inline table." +], +"subClass" : { + "CANNOT_EVALUATE_EXPRESSION_IN_INLINE_TABLE" : { +"message" : [ + "Cannot evaluate the expression <expr> in inline table definition." +] + }, + "FAILED_SQL_EXPRESSION_EVALUATION" : { +"message" : [ + "Failed to evaluate the SQL expression <sqlExpr>. Please check your syntax and ensure all required tables and columns are available." +] + }, + "INCOMPATIBLE_TYPES_IN_INLINE_TABLE" : { +"message" : [ + "Found incompatible types in the column <colName> for inline table." +] + } +} + }, "INVALID_JSON_ROOT_FIELD" : { "message" : [ "Cannot convert JSON root field to target Spark type." ], "sqlState" : "22032" }, @@ -1209,6 +1226,23 @@ ], "sqlState" : "22032" }, + "INVALID_LAMBDA_FUNCTION_CALL" : { +"message" : [ + "Invalid lambda function call." 
+], +"subClass" : { + "DUPLICATE_ARG_NAMES" : { +"message" : [ + "The lambda function has duplicate arguments . Please, consider to rename the argument names or set to \"true\"." +] + }, + "NUM_ARGS_MISMATCH" : { +"message" : [ + "A higher order function expects arguments, but got ." +] + } +} + }, "INVALID_LATERAL_JOIN_TYPE" : { "message" : [ "The JOIN with LATERAL correlation is not allowed because an OUTER subquery cannot correlate to its join partner. Remove the LATERAL co
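Hedged sketches (assuming a SparkSession `spark`) of queries expected to surface the newly named classes:

```scala
// A lambda with the wrong arity for a higher-order function should raise
// INVALID_LAMBDA_FUNCTION_CALL.NUM_ARGS_MISMATCH (transform takes 1 or 2 args).
spark.sql("SELECT transform(array(1, 2), (x, y, z) -> x)").show()

// Columns without a common type should raise
// INVALID_INLINE_TABLE.INCOMPATIBLE_TYPES_IN_INLINE_TABLE.
spark.sql("SELECT * FROM VALUES (1), (array(1)) AS t(col)").show()
```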
[spark] branch master updated (af536459501 -> 70f34278cbf)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from af536459501 [SPARK-44237][CORE] Simplify DirectByteBuffer constructor lookup logic add 70f34278cbf [SPARK-44079][SQL] Fix `ArrayIndexOutOfBoundsException` when parse array as struct using PERMISSIVE mode with corrupt record No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/csv/UnivocityParser.scala | 4 ++-- .../spark/sql/catalyst/json/JacksonParser.scala | 20 +++- .../spark/sql/catalyst/util/BadRecordException.scala | 14 -- .../spark/sql/catalyst/util/FailureSafeParser.scala | 9 +++-- .../sql/execution/datasources/json/JsonSuite.scala | 15 +++ 5 files changed, 51 insertions(+), 11 deletions(-)
[spark] branch master updated (f26bdb7bfde -> d14a6ecd9e1)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f26bdb7bfde [SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0 add d14a6ecd9e1 [SPARK-40850][SQL] Fix test case interpreted queries may execute Codegen No new revisions were added by this update. Summary of changes: .../src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-43914][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1c8c47cb55d [SPARK-43914][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437] 1c8c47cb55d is described below commit 1c8c47cb55da75526fef4dd41ed0734b01e71814 Author: Jiaan Geng AuthorDate: Wed Jun 28 08:22:01 2023 +0300 [SPARK-43914][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437] ### What changes were proposed in this pull request? The pr aims to assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Existing test cases updated. Closes #41476 from beliefer/SPARK-43914. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 34 -- .../sql/catalyst/analysis/CheckAnalysis.scala | 65 +++ .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 74 -- .../org/apache/spark/sql/DataFrameSuite.scala | 14 4 files changed, 120 insertions(+), 67 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 342af0ffa6c..e441686432a 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -5637,40 +5637,6 @@ "Cannot change nullable column to non-nullable: ." ] }, - "_LEGACY_ERROR_TEMP_2433" : { -"message" : [ - "Only a single table generating function is allowed in a SELECT clause, found:", - "<sqlExprs>." -] - }, - "_LEGACY_ERROR_TEMP_2434" : { -"message" : [ - "Failure when resolving conflicting references in Join:", - "<plan>", - "Conflicting attributes: <conflictingAttributes>." -] - }, - "_LEGACY_ERROR_TEMP_2435" : { -"message" : [ - "Failure when resolving conflicting references in Intersect:", - "<plan>", - "Conflicting attributes: <conflictingAttributes>." -] - }, - "_LEGACY_ERROR_TEMP_2436" : { -"message" : [ - "Failure when resolving conflicting references in Except:", - "<plan>", - "Conflicting attributes: <conflictingAttributes>." -] - }, - "_LEGACY_ERROR_TEMP_2437" : { -"message" : [ - "Failure when resolving conflicting references in AsOfJoin:", - "<plan>", - "Conflicting attributes: <conflictingAttributes>." 
-] - }, "_LEGACY_ERROR_TEMP_2446" : { "message" : [ "Operation not allowed: only works on table with location provided: " diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 7c0e8f1490d..a0296d27361 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -674,9 +674,8 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB } case p @ Project(exprs, _) if containsMultipleGenerators(exprs) => -p.failAnalysis( - errorClass = "_LEGACY_ERROR_TEMP_2433", - messageParameters = Map("sqlExprs" -> exprs.map(_.sql).mkString(","))) +val generators = exprs.filter(expr => expr.exists(_.isInstanceOf[Generator])) +throw QueryCompilationErrors.moreThanOneGeneratorError(generators, "SELECT") case p @ Project(projectList, _) => projectList.foreach(_.transformDownWithPruning( @@ -686,36 +685,48 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB }) case j: Join if !j.duplicateResolved => -val conflictingAttributes = j.left.outputSet.intersect(j.right.outputSet) -j.failAnalysis( - errorClass = "_LEGACY_ERROR_TEMP_2434", - messageParameters = Map( -"plan" -> plan.toString, -"conflictingAttributes" -> conflictingAttributes.mkString(","))) +val conflictingAttributes = + j.left.outputSet.intersect(j.right.outputSet).map(toSQLExpr(_)).mkString(", ") +throw SparkException.internalError( + msg = s""" +
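The relocated single-generator check can be triggered with a one-liner (assuming a SparkSession `spark`); `moreThanOneGeneratorError` is expected to raise UNSUPPORTED_GENERATOR.MORE_THAN_ONE_GENERATOR in place of the legacy _LEGACY_ERROR_TEMP_2433 text:

```scala
// Two generators in one SELECT list are rejected; only a single table
// generating function is allowed per SELECT clause.
spark.sql("SELECT explode(array(1, 2)), explode(array(3, 4))").show()
```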
[spark] branch master updated: [SPARK-44171][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new be8b07a1534 [SPARK-44171][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes be8b07a1534 is described below commit be8b07a15348d8fea15c33d35a75969ca1693ff6 Author: panbingkun AuthorDate: Tue Jun 27 19:31:30 2023 +0300 [SPARK-44171][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes ### What changes were proposed in this pull request? The pr aims to assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] and delete some unused error classes, details as follows: _LEGACY_ERROR_TEMP_0036 -> `Delete` _LEGACY_ERROR_TEMP_1341 -> `Delete` _LEGACY_ERROR_TEMP_1342 -> `Delete` _LEGACY_ERROR_TEMP_1304 -> `Delete` _LEGACY_ERROR_TEMP_2072 -> `Delete` _LEGACY_ERROR_TEMP_2279 -> `Delete` _LEGACY_ERROR_TEMP_2280 -> UNSUPPORTED_FEATURE.COMMENT_NAMESPACE _LEGACY_ERROR_TEMP_2281 -> UNSUPPORTED_FEATURE.REMOVE_NAMESPACE_COMMENT _LEGACY_ERROR_TEMP_2282 -> UNSUPPORTED_FEATURE.DROP_NAMESPACE_RESTRICT ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #41721 from panbingkun/SPARK-44171. Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala| 19 +-- core/src/main/resources/error/error-classes.json | 60 ++ .../spark/sql/errors/QueryCompilationErrors.scala | 16 -- .../spark/sql/errors/QueryExecutionErrors.scala| 30 +-- .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 6 +-- 5 files changed, 47 insertions(+), 84 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala index a7ef8d4e104..d58146fecdf 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLNamespaceSuite.scala @@ -73,7 +73,8 @@ class MySQLNamespaceSuite extends DockerJDBCIntegrationSuite with V2JDBCNamespac exception = intercept[SparkSQLFeatureNotSupportedException] { catalog.createNamespace(Array("foo"), Map("comment" -> "test comment").asJava) }, - errorClass = "_LEGACY_ERROR_TEMP_2280" + errorClass = "UNSUPPORTED_FEATURE.COMMENT_NAMESPACE", + parameters = Map("namespace" -> "`foo`") ) assert(catalog.namespaceExists(Array("foo")) === false) catalog.createNamespace(Array("foo"), Map.empty[String, String].asJava) @@ -84,13 +85,25 @@ class MySQLNamespaceSuite extends DockerJDBCIntegrationSuite with V2JDBCNamespac Array("foo"), NamespaceChange.setProperty("comment", "comment for foo")) }, - errorClass = "_LEGACY_ERROR_TEMP_2280") + errorClass = "UNSUPPORTED_FEATURE.COMMENT_NAMESPACE", + parameters = Map("namespace" -> "`foo`") +) checkError( exception = intercept[SparkSQLFeatureNotSupportedException] { catalog.alterNamespace(Array("foo"), NamespaceChange.removeProperty("comment")) }, - errorClass = "_LEGACY_ERROR_TEMP_2281") + errorClass = "UNSUPPORTED_FEATURE.REMOVE_NAMESPACE_COMMENT", + parameters = Map("namespace" -> "`foo`") +) + +checkError( + exception = 
intercept[SparkSQLFeatureNotSupportedException] { +catalog.dropNamespace(Array("foo"), cascade = false) + }, + errorClass = "UNSUPPORTED_FEATURE.DROP_NAMESPACE", + parameters = Map("namespace" -> "`foo`") +) catalog.dropNamespace(Array("foo"), cascade = true) assert(catalog.namespaceExists(Array("foo")) === false) } diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 78b54d5230d..342af0ffa6c 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -2383,11 +2383,21 @@ "Combination of ORDER BY/SORT BY/DISTRIBUTE BY/CLUSTER BY." ] }, +
[spark] branch master updated: [SPARK-44189][CONNECT][PYTHON] Support positional parameters by `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e98987220ae [SPARK-44189][CONNECT][PYTHON] Support positional parameters by `sql()` e98987220ae is described below commit e98987220ae191ecc10944026fee9c57ddf478c1 Author: Max Gekk AuthorDate: Mon Jun 26 19:42:17 2023 +0300 [SPARK-44189][CONNECT][PYTHON] Support positional parameters by `sql()` ### What changes were proposed in this pull request? In the PR, I propose to extend the `sql()` method of the Python connect client, and support positional parameters as a list of Python objects that can be converted to literal expressions. ```python def sql(self, sqlQuery: str, args: Optional[Union[Dict[str, Any], List]] = None) -> DataFrame: ``` where - **args** is a dictionary of parameter names to Python objects or a list of Python objects that can be converted to SQL literal expressions. See the [link](https://spark.apache.org/docs/latest/sql-ref-datatypes.html) regarding the supported value types in PySpark. For example: _1, "Steven", datetime.date(2023, 4, 2)_. The same as in the Scala/Java API, a value can also be a `Column` of a literal expression, in which case it is taken as is. For example: ```python
>>> connect.sql("SELECT * FROM {df} WHERE {df[B]} > ? and ? < {df[A]}", [5, 2], df=mydf).show()
+---+---+
| A| B|
+---+---+
| 3| 6|
+---+---+
``` ### Why are the changes needed? To achieve feature parity with the PySpark API. ### Does this PR introduce _any_ user-facing change? No, the PR just extends the existing API. ### How was this patch tested? By running new test: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.tests.connect.test_connect_basic SparkConnectBasicTests.test_sql_with_pos_args' ``` and the renamed test: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.tests.connect.test_connect_basic SparkConnectBasicTests.test_sql_with_named_args' ``` Closes #41739 from MaxGekk/positional-params-python-connect. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- python/pyspark/sql/connect/plan.py | 36 -- python/pyspark/sql/connect/session.py | 2 +- .../sql/tests/connect/test_connect_basic.py| 7 - 3 files changed, 34 insertions(+), 11 deletions(-) diff --git a/python/pyspark/sql/connect/plan.py b/python/pyspark/sql/connect/plan.py index 406f65080d1..fabab98d9b2 100644 --- a/python/pyspark/sql/connect/plan.py +++ b/python/pyspark/sql/connect/plan.py @@ -1019,12 +1019,15 @@ class SubqueryAlias(LogicalPlan): class SQL(LogicalPlan): -def __init__(self, query: str, args: Optional[Dict[str, Any]] = None) -> None: +def __init__(self, query: str, args: Optional[Union[Dict[str, Any], List]] = None) -> None: super().__init__(None) if args is not None: -for k, v in args.items(): -assert isinstance(k, str) +if isinstance(args, Dict): +for k, v in args.items(): +assert isinstance(k, str) +else: +assert isinstance(args, List) self._query = query self._args = args @@ -1034,8 +1037,16 @@ class SQL(LogicalPlan): plan.sql.query = self._query if self._args is not None and len(self._args) > 0: -for k, v in self._args.items(): - plan.sql.args[k].CopyFrom(LiteralExpression._from_value(v).to_plan(session).literal) +if isinstance(self._args, Dict): +for k, v in self._args.items(): +plan.sql.args[k].CopyFrom( + LiteralExpression._from_value(v).to_plan(session).literal +) +else: +for v in self._args: +plan.sql.pos_args.append( + LiteralExpression._from_value(v).to_plan(session).literal +) return plan @@ -1043,10 +1054,17 @@ class SQL(LogicalPlan): cmd = proto.Command() cmd.sql_command.sql = self._query if self._args is not None and len(self._args) > 0: -for k, v in self._args.items(): -cmd.sql_command.args[k].CopyFrom( -LiteralExpression._from_value(v).to_plan(session).literal -) +if isinstance(self._args, Dict): +for k, v in self._args.items(): +cmd.sql_command.args[k].CopyFrom( + LiteralExpression._from_value(v).to_plan(session).literal +) +
[spark] branch master updated: [SPARK-44143][SQL][TESTS] Use checkError() to check Exception in *DDL*Suite
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 67abc430140 [SPARK-44143][SQL][TESTS] Use checkError() to check Exception in *DDL*Suite 67abc430140 is described below commit 67abc430140558e60c785b158e9199dc884fb15c Author: panbingkun AuthorDate: Mon Jun 26 09:28:02 2023 +0300 [SPARK-44143][SQL][TESTS] Use checkError() to check Exception in *DDL*Suite ### What changes were proposed in this pull request? The pr aims to use `checkError()` to check `Exception` in `*DDL*Suite`, include: - sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite - sql/core/src/test/scala/org/apache/spark/sql/sources/DDLSourceLoadSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/Hive_2_1_DDLSuite ### Why are the changes needed? Migration on checkError() will make the tests independent from the text of error messages. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41699 from panbingkun/DDLSuite. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../spark/sql/execution/command/DDLSuite.scala | 454 .../spark/sql/sources/DDLSourceLoadSuite.scala | 30 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 769 ++--- .../sql/hive/execution/Hive_2_1_DDLSuite.scala | 17 +- 4 files changed, 865 insertions(+), 405 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala index 21e6980db8f..dd126027b36 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala @@ -189,10 +189,13 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSparkSession { sql("CREATE TABLE s(a INT, b INT) USING parquet") val source = catalog.getTableMetadata(TableIdentifier("s")) assert(source.provider == Some("parquet")) - val e = intercept[AnalysisException] { -sql("CREATE TABLE t LIKE s USING org.apache.spark.sql.hive.orc") - }.getMessage - assert(e.contains("Hive built-in ORC data source must be used with Hive support enabled")) + checkError( +exception = intercept[AnalysisException] { + sql("CREATE TABLE t LIKE s USING org.apache.spark.sql.hive.orc") +}, +errorClass = "_LEGACY_ERROR_TEMP_1138", +parameters = Map.empty + ) } } @@ -284,13 +287,6 @@ trait DDLSuiteBase extends SQLTestUtils { } } - protected def assertUnsupported(query: String): Unit = { -val e = intercept[AnalysisException] { - sql(query) -} -assert(e.getMessage.toLowerCase(Locale.ROOT).contains("operation not allowed")) - } - protected def maybeWrapException[T](expectException: Boolean)(body: => T): Unit = { if (expectException) intercept[AnalysisException] { body } else body } @@ -431,9 +427,11 @@ abstract class DDLSuite extends QueryTest with DDLSuiteBase { |$partitionClause """.stripMargin if (userSpecifiedSchema.isEmpty && userSpecifiedPartitionCols.nonEmpty) { -val e = intercept[AnalysisException](sql(sqlCreateTable)).getMessage -assert(e.contains( - "not allowed to specify partition columns when the table schema is not defined")) +checkError( + exception = intercept[AnalysisException](sql(sqlCreateTable)), + errorClass = null, + parameters = Map.empty +) } 
else { sql(sqlCreateTable) val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName)) @@ -615,17 +613,21 @@ abstract class DDLSuite extends QueryTest with DDLSuiteBase { .option("path", dir1.getCanonicalPath) .saveAsTable("path_test") - val ex = intercept[AnalysisException] { -Seq((3L, "c")).toDF("v1", "v2") - .write - .mode(SaveMode.Append) - .format("json") - .option("path", dir2.getCanonicalPath) - .saveAsTable("path_test") - }.getMessage - assert(ex.contains( -s"The location of the existing table `$SESSION_CATALOG_NAME`.`default`.`path_test`")) - + checkErrorMatchPVals( +exception =
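`checkError()` is Scala-side test infrastructure with no direct PySpark equivalent; the closest analogue, sketched below under the assumption of PySpark 3.4+, is asserting on the structured error class via `getErrorClass()` instead of matching message text (the query and the expected class are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.errors import AnalysisException

spark = SparkSession.builder.getOrCreate()

# Pinning the test to the error class keeps it stable when the
# human-readable wording of the message changes.
try:
    spark.sql("SELECT missing_col FROM range(1)").collect()
except AnalysisException as e:
    assert e.getErrorClass() == "UNRESOLVED_COLUMN.WITH_SUGGESTION"
```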
[spark] branch master updated: [MINOR][CONNECT][TESTS] Check named parameters in `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 37c898c63b1 [MINOR][CONNECT][TESTS] Check named parameters in `sql()` 37c898c63b1 is described below commit 37c898c63b1fd9fcb9773313246ff28e631eb28f Author: Max Gekk AuthorDate: Mon Jun 26 09:17:56 2023 +0300 [MINOR][CONNECT][TESTS] Check named parameters in `sql()` ### What changes were proposed in this pull request? In the PR, I propose to add new tests to check named parameters in `sql()` of Scala connect client. ### Why are the changes needed? To improve test coverage. Before the PR, the feature has not been tested at all. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.ClientE2ETestSuite" ``` Closes #41726 from MaxGekk/test-named-params-proto. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala | 11 +++ 1 file changed, 11 insertions(+) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala index b24e445964a..0ababaa0af1 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala @@ -960,6 +960,17 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper with PrivateM assert(result2(0).getInt(0) === 1) assert(result2(0).getString(1) === "abc") } + + test("sql() with named parameters") { +val result0 = spark.sql("select 1", Map.empty[String, Any]).collect() +assert(result0.length == 1 && result0(0).getInt(0) === 1) + +val result1 = spark.sql("select :abc", Map("abc" -> 1)).collect() +assert(result1.length == 1 && result1(0).getInt(0) === 1) + +val result2 = spark.sql("select :c0 limit :l0", Map("l0" -> 1, "c0" -> "abc")).collect() +assert(result2.length == 1 && result2(0).getString(0) === "abc") + } } private[sql] case class MyType(id: Long, a: Double, b: Double) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
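For reference, the same named-parameter binding is available from PySpark; a short sketch mirroring the new Scala assertions (assumes PySpark 3.4+):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# `:name` markers in the query are bound from the dict passed as `args`;
# this prints a single row containing "abc".
spark.sql("SELECT :c0 LIMIT :l0", args={"l0": 1, "c0": "abc"}).show()
```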
[spark] branch master updated: [SPARK-44140][SQL][PYTHON] Support positional parameters in Python `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 532a8325f5a [SPARK-44140][SQL][PYTHON] Support positional parameters in Python `sql()` 532a8325f5a is described below commit 532a8325f5a3f11974c383cd1e344bb2ed56e9d8 Author: Max Gekk AuthorDate: Thu Jun 22 16:38:36 2023 +0300 [SPARK-44140][SQL][PYTHON] Support positional parameters in Python `sql()` ### What changes were proposed in this pull request? In the PR, I propose to extend PySpark API and extend the `sql` method by: ```python def sql( self, sqlQuery: str, args: Optional[Union[Dict[str, Any], List]] = None, **kwargs: Any ) -> DataFrame: ``` which accepts an list of Python objects that can be converted to SQL literal expressions. For example: ```python spark.sql("SELECT * FROM {df} WHERE {df[B]} > ? and ? < {df[A]}", args=[5, 2], df=mydf).show() ``` The `sql()` method parses the input SQL statement and replaces the positional parameters by the literal values. ### Why are the changes needed? 1. To conform the SQL standard and JDBC/ODBC protocol. 2. To improve user experience with PySpark via - Using Spark as remote service (microservice). - Write SQL code that will power reports, dashboards, charts and other data presentation solutions that need to account for criteria modifiable by users through an interface. - Build a generic integration layer based on the PySpark API. The goal is to expose managed data to a wide application ecosystem with a microservice architecture. It is only natural in such a setup to ask for modular and reusable SQL code, that can be executed repeatedly with different parameter values. 3. To achieve feature parity with other systems that support positional parameters. ### Does this PR introduce _any_ user-facing change? No, the changes extend the existing API. ### How was this patch tested? By running new checks: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.session SparkSession.sql' $ python/run-tests --parallelism=1 --testnames 'pyspark.pandas.sql_formatter' ``` Closes #41695 from MaxGekk/parametrized-query-pos-param-python. Authored-by: Max Gekk Signed-off-by: Max Gekk --- python/pyspark/pandas/sql_formatter.py | 20 - python/pyspark/sql/session.py | 40 +++--- 2 files changed, 47 insertions(+), 13 deletions(-) diff --git a/python/pyspark/pandas/sql_formatter.py b/python/pyspark/pandas/sql_formatter.py index 4387a1e0909..350152a2cdb 100644 --- a/python/pyspark/pandas/sql_formatter.py +++ b/python/pyspark/pandas/sql_formatter.py @@ -43,7 +43,7 @@ _CAPTURE_SCOPES = 3 def sql( query: str, index_col: Optional[Union[str, List[str]]] = None, -args: Optional[Dict[str, Any]] = None, +args: Optional[Union[Dict[str, Any], List]] = None, **kwargs: Any, ) -> DataFrame: """ @@ -102,18 +102,21 @@ def sql( e f 3 6 Also note that the index name(s) should be matched to the existing name. -args : dict -A dictionary of parameter names to Python objects that can be converted to -SQL literal expressions. See +args : dict or list +A dictionary of parameter names to Python objects or a list of Python objects +that can be converted to SQL literal expressions. See https://spark.apache.org/docs/latest/sql-ref-datatypes.html;> Supported Data Types for supported value types in Python. For example, dictionary keys: "rank", "name", "birthdate"; dictionary values: 1, "Steven", datetime.date(2023, 4, 2). 
-Dict value can be also a `Column` of literal expression, in that case it is taken as is. +A value can be also a `Column` of literal expression, in that case it is taken as is. .. versionadded:: 3.4.0 +.. versionchanged:: 3.5.0 +Added positional parameters. + kwargs other variables that the user want to set that can be referenced in the query @@ -174,6 +177,13 @@ def sql( id 0 8 1 9 + +Or positional parameters marked by `?` in the SQL query by SQL literals. + +>>> ps.sql("SELECT * FROM range(10) WHERE id > ?", args=[7]) + id +0 8 +1 9 """ if os.environ.get("PYSPARK_PANDAS_SQL_LEGACY") == "1": from pyspark.pandas import sql_processor diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py index 823164475ea..47b73700f0c 100644 --- a/python/pyspark/sql/sess
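The pandas-on-Spark entry point gets the same extension; a one-liner matching the doctest added above (assumes PySpark 3.5+):

```python
import pyspark.pandas as ps

# A positional `?` bound from the `args` list, exactly as in pyspark.sql.
print(ps.sql("SELECT * FROM range(10) WHERE id > ?", args=[7]))
#    id
# 0   8
# 1   9
```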
[spark] branch master updated: [SPARK-44066][SQL] Support positional parameters in Scala/Java `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1b4048bf62d [SPARK-44066][SQL] Support positional parameters in Scala/Java `sql()` 1b4048bf62d is described below commit 1b4048bf62dddae7d324c4b12aa409a1bd456dc5 Author: Max Gekk AuthorDate: Thu Jun 22 09:40:30 2023 +0300 [SPARK-44066][SQL] Support positional parameters in Scala/Java `sql()` ### What changes were proposed in this pull request? In the PR, I propose to extend SparkSession API and override the `sql` method by: ```scala def sql(sqlText: String, args: Array[_]): DataFrame ``` which accepts an array of Java/Scala objects that can be converted to SQL literal expressions. And the first argument `sqlText` might have named parameters in the positions of constants like literal values. A value can be also a `Column` of literal expression, in that case it is taken as is. For example: ```scala spark.sql( sqlText = "SELECT * FROM tbl WHERE date > ? LIMIT ?", args = Array(LocalDate.of(2023, 6, 15), 100)) ``` The new `sql()` method parses the input SQL statement and replaces the positional parameters by the literal values. ### Why are the changes needed? 1. To conform the SQL standard and JDBC/ODBC protocol. 2. To improve user experience with Spark SQL via - Using Spark as remote service (microservice). - Write SQL code that will power reports, dashboards, charts and other data presentation solutions that need to account for criteria modifiable by users through an interface. - Build a generic integration layer based on the SQL API. The goal is to expose managed data to a wide application ecosystem with a microservice architecture. It is only natural in such a setup to ask for modular and reusable SQL code, that can be executed repeatedly with different parameter values. 3. To achieve feature parity with other systems that support positional parameters. ### Does this PR introduce _any_ user-facing change? No, the changes extend the existing API. ### How was this patch tested? By running new tests: ``` $ build/sbt "test:testOnly *AnalysisSuite" $ build/sbt "test:testOnly *PlanParserSuite" $ build/sbt "test:testOnly *ParametersSuite" ``` and the affected test suites: ``` $ build/sbt "sql/testOnly *QueryExecutionErrorsSuite" ``` Closes #41568 from MaxGekk/parametrized-query-pos-param. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../CheckConnectJvmClientCompatibility.scala | 2 + .../sql/connect/planner/SparkConnectPlanner.scala | 4 +- .../spark/sql/catalyst/parser/SqlBaseLexer.g4 | 1 + .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 5 +- .../spark/sql/catalyst/analysis/parameters.scala | 95 ++-- .../spark/sql/catalyst/parser/AstBuilder.scala | 14 +- .../sql/catalyst/analysis/AnalysisSuite.scala | 22 +- .../sql/catalyst/parser/PlanParserSuite.scala | 25 +- .../scala/org/apache/spark/sql/SparkSession.scala | 34 ++- .../apache/spark/sql/JavaSparkSessionSuite.java| 28 +++ .../org/apache/spark/sql/ParametersSuite.scala | 265 +++-- .../sql/errors/QueryExecutionErrorsSuite.scala | 10 +- 12 files changed, 448 insertions(+), 57 deletions(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala index 6b648fd152b..acc469672b4 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala @@ -227,6 +227,8 @@ object CheckConnectJvmClientCompatibility { ProblemFilters.exclude[Problem]("org.apache.spark.sql.SparkSession.createDataset"), ProblemFilters.exclude[Problem]("org.apache.spark.sql.SparkSession.executeCommand"), ProblemFilters.exclude[Problem]("org.apache.spark.sql.SparkSession.this"), + // TODO(SPARK-44068): Support positional parameters in Scala connect client + ProblemFilters.exclude[Problem]("org.apache.spark.sql.SparkSession.sql"), // RuntimeConfig ProblemFilters.exclude[Problem]("org.apache.spark.sql.RuntimeConfig.this"), diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/sca
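The overload above is Scala/Java; the snippet below is a hedged Python rendering of the commit's own example (Python support is the separate SPARK-44140), with a throwaway temp view standing in for the hypothetical `tbl`:

```python
import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative stand-in for the `tbl` used in the commit message.
spark.sql("CREATE OR REPLACE TEMP VIEW tbl AS SELECT DATE'2023-06-20' AS date")

# Both constants are bound positionally and converted to typed literals.
spark.sql("SELECT * FROM tbl WHERE date > ? LIMIT ?",
          args=[datetime.date(2023, 6, 15), 100]).show()
```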
[spark] branch master updated: [SPARK-43915][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bbcc438e5b3 [SPARK-43915][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445] bbcc438e5b3 is described below commit bbcc438e5b3aef67bf430b6bb6e4f893d8e66d13 Author: Jiaan Geng AuthorDate: Wed Jun 21 21:20:01 2023 +0300 [SPARK-43915][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445] ### What changes were proposed in this pull request? The pr aims to assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Exists test cases updated. Closes #41553 from beliefer/SPARK-43915. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 47 +- python/pyspark/sql/tests/test_udtf.py | 8 +++- .../spark/sql/catalyst/analysis/Analyzer.scala | 4 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 23 +-- .../sql/catalyst/analysis/AnalysisSuite.scala | 28 - .../analyzer-results/group-analytics.sql.out | 2 +- .../analyzer-results/join-lateral.sql.out | 4 +- .../udf/udf-group-analytics.sql.out| 2 +- .../sql-tests/results/group-analytics.sql.out | 2 +- .../sql-tests/results/join-lateral.sql.out | 4 +- .../results/udf/udf-group-analytics.sql.out| 2 +- .../spark/sql/DataFrameSetOperationsSuite.scala| 44 ++-- .../sql/connector/DataSourceV2FunctionSuite.scala | 13 +- .../sql/connector/DeleteFromTableSuiteBase.scala | 15 +-- .../connector/DeltaBasedDeleteFromTableSuite.scala | 20 + .../sql/connector/DeltaBasedUpdateTableSuite.scala | 21 ++ .../connector/GroupBasedDeleteFromTableSuite.scala | 22 +- .../sql/connector/GroupBasedUpdateTableSuite.scala | 23 ++- .../spark/sql/connector/UpdateTableSuiteBase.scala | 15 +-- 19 files changed, 195 insertions(+), 104 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 1d2f25b72f3..264d9b7c3a0 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -643,6 +643,11 @@ ], "sqlState" : "23505" }, + "DUPLICATED_METRICS_NAME" : { +"message" : [ + "The metric name is not unique: . The same name cannot be used for metrics with different results. However multiple instances of metrics with with same result and name are allowed (e.g. self-joins)." +] + }, "DUPLICATE_CLAUSES" : { "message" : [ "Found duplicate clauses: . Please, remove one of them." @@ -1237,6 +1242,11 @@ } } }, + "INVALID_NON_DETERMINISTIC_EXPRESSIONS" : { +"message" : [ + "The operator expects a deterministic expression, but the actual expression is ." +] + }, "INVALID_NUMERIC_LITERAL_RANGE" : { "message" : [ "Numeric literal is outside the valid range for with minimum value of and maximum value of . Please adjust the value accordingly." @@ -1512,6 +1522,11 @@ ], "sqlState" : "42604" }, + "INVALID_UDF_IMPLEMENTATION" : { +"message" : [ + "Function does not implement ScalarFunction or AggregateFunction." +] + }, "INVALID_URL" : { "message" : [ "The url is invalid: . If necessary set to \"false\" to bypass this error." @@ -2458,6 +2473,11 @@ " is a reserved namespace property, ." 
] }, + "SET_OPERATION_ON_MAP_TYPE" : { +"message" : [ + "Cannot have MAP type columns in DataFrame which calls set operations (INTERSECT, EXCEPT, etc.), but the type of column is ." +] + }, "SET_PROPERTIES_AND_DBPROPERTIES" : { "message" : [ "set PROPERTIES and DBPROPERTIES at the same time." @@ -5659,33 +5679,6 @@ "Conflicting attributes: ." ] }, - "_LEGACY_ERROR_TEMP_2438" : { -"message" : [ - "Cannot have map type columns in DataFrame which calls set operations(intersect, except, etc.), but the type of column is ." -] - }, - "_LEGACY_ERROR_TEMP_2439" : { -"message" : [ - "nondeterministic expressions are
[spark] branch master updated: [SPARK-44056][SQL] Include UDF name in UDF execution failure error message when available
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6165f316063 [SPARK-44056][SQL] Include UDF name in UDF execution failure error message when available 6165f316063 is described below commit 6165f31606344efdf35f060d07cee46b85948e38 Author: Rob Reeves AuthorDate: Wed Jun 21 18:00:36 2023 +0300 [SPARK-44056][SQL] Include UDF name in UDF execution failure error message when available ### What changes were proposed in this pull request? This modifies the error message when a Scala UDF fails to execute by including the UDF name if it is available. ### Why are the changes needed? If there are multiple UDFs defined in the same location with the same method signature it can be hard to identify which UDF causes the issue. The current function class alone does not give enough information on its own. Adding the UDF name, if available, makes it easier to identify the exact problematic UDF. This is particularly helpful when the exception stack trace is not emitted due to a JVM performance optimization and codegen is enabled. Example in 3.1.1: ``` Caused by: org.apache.spark.SparkException: Failed to execute user defined function(UDFRegistration$$Lambda$666/1969461119: (bigint, string) => string) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.subExpr_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source) at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3(basicPhysicalOperators.scala:249) at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3$adapted(basicPhysicalOperators.scala:248) at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:131) at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at org.apache.spark.scheduler.Task.run(Task.scala:131) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:523) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1535) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:526) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException ``` ### Does this PR introduce _any_ user-facing change? Yes, it adds the UDF name to the UDF failure error message. 
Before this change: > [FAILED_EXECUTE_UDF] Failed to execute user defined function (QueryExecutionErrorsSuite$$Lambda$970/181260145: (string, int) => string). After this change: > [FAILED_EXECUTE_UDF] Failed to execute user defined function (nextChar in QueryExecutionErrorsSuite$$Lambda$970/181260145: (string, int) => string). ### How was this patch tested? Unit test added. Closes #41599 from robreeves/roreeves/roreeves/udf_error. Lead-authored-by: Rob Reeves Co-authored-by: Rob Reeves Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/ScalaUDF.scala | 6 ++-- .../spark/sql/errors/QueryExecutionErrors.scala| 4 +-- .../sql/errors/QueryExecutionErrorsSuite.scala | 35 ++ .../spark/sql/hive/execution/HiveUDFSuite.scala| 6 ++-- 4 files changed, 39 insertions(+), 12 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala index 137a8976a40..40274a83340 100644 --- a/sql/catalyst/src
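The change targets JVM-side Scala UDFs, so it cannot be reproduced exactly from Python (a failing Python UDF surfaces as a PythonException instead); the hedged sketch below only illustrates the scenario of a UDF registered under an explicit name, which is what the improved FAILED_EXECUTE_UDF message now reports:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

def next_char(s, i):
    return chr(ord(s[i]) + 1)  # raises IndexError when i is out of range

# "nextChar" is the registered name that a Scala UDF's failure message
# would now include alongside the implementing class.
spark.udf.register("nextChar", next_char, StringType())
spark.sql("SELECT nextChar('abc', 10)").show()  # fails inside the UDF
```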
[spark] branch master updated: [SPARK-44004][SQL] Assign name & improve error message for frequent LEGACY errors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 94031ead786 [SPARK-44004][SQL] Assign name & improve error message for frequent LEGACY errors 94031ead786 is described below commit 94031ead78682bd5c1adab8b87e61055968c8998 Author: itholic AuthorDate: Wed Jun 21 10:36:04 2023 +0300 [SPARK-44004][SQL] Assign name & improve error message for frequent LEGACY errors ### What changes were proposed in this pull request? This PR proposes to assign name & improve error message for frequent LEGACY errors. ### Why are the changes needed? To improve the errors that most frequently occurring. ### Does this PR introduce _any_ user-facing change? No API changes, it's only for errors. ### How was this patch tested? The existing CI should passed. Closes #41504 from itholic/naming_top_error_class. Authored-by: itholic Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 80 +++--- .../spark/sql/catalyst/analysis/Analyzer.scala | 4 +- .../catalyst/analysis/ResolveInlineTables.scala| 5 +- .../spark/sql/catalyst/analysis/unresolved.scala | 3 +- .../spark/sql/errors/QueryCompilationErrors.scala | 22 +++--- .../spark/sql/errors/QueryParsingErrors.scala | 2 +- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 5 +- .../catalyst/analysis/ResolveSubquerySuite.scala | 11 ++- .../catalyst/parser/ExpressionParserSuite.scala| 10 +-- .../analyzer-results/ansi/literals.sql.out | 10 +-- .../columnresolution-negative.sql.out | 6 +- .../analyzer-results/join-lateral.sql.out | 6 +- .../sql-tests/analyzer-results/literals.sql.out| 10 +-- .../analyzer-results/postgreSQL/boolean.sql.out| 5 +- .../postgreSQL/window_part3.sql.out| 5 +- .../postgreSQL/window_part4.sql.out| 5 +- .../table-valued-functions.sql.out | 4 +- .../sql-tests/results/ansi/literals.sql.out| 10 +-- .../results/columnresolution-negative.sql.out | 6 +- .../sql-tests/results/join-lateral.sql.out | 6 +- .../resources/sql-tests/results/literals.sql.out | 10 +-- .../sql-tests/results/postgreSQL/boolean.sql.out | 5 +- .../results/postgreSQL/window_part3.sql.out| 5 +- .../results/postgreSQL/window_part4.sql.out| 5 +- .../results/table-valued-functions.sql.out | 4 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 12 ++-- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 6 +- .../spark/sql/execution/SQLViewTestSuite.scala | 4 +- 28 files changed, 134 insertions(+), 132 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index d9e729effeb..e35adcfbb5a 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -157,6 +157,11 @@ ], "sqlState" : "22018" }, + "CANNOT_PARSE_INTERVAL" : { +"message" : [ + "Unable to parse . Please ensure that the value provided is in a valid format for defining an interval. You can reference the documentation for the correct format. If the issue persists, please double check that the input value is not null or empty and try again." +] + }, "CANNOT_PARSE_JSON_FIELD" : { "message" : [ "Cannot parse the field name and the value of the JSON token type to target Spark data type ." @@ -191,6 +196,11 @@ ], "sqlState" : "0AKD0" }, + "CANNOT_RESOLVE_STAR_EXPAND" : { +"message" : [ + "Cannot resolve .* given input columns . 
Please check that the specified table or struct exists and is accessible in the input columns." +] + }, "CANNOT_RESTORE_PERMISSIONS_FOR_PATH" : { "message" : [ "Failed to set permissions on created path back to ." @@ -689,6 +699,11 @@ ], "sqlState" : "42K04" }, + "FAILED_SQL_EXPRESSION_EVALUATION" : { +"message" : [ + "Failed to evaluate the SQL expression: . Please check your syntax and ensure all required tables and columns are available." +] + }, "FIELD_NOT_FOUND" : { "message" : [ "No such struct field in ." @@ -1222,6 +1237,11 @@ } } }, + "INVALID_NUMERIC_LITERAL_RANGE" : { +"message" : [ + "Numeric literal is outside the valid range for with minimum value of and maximum value of . Please adjust the value acc
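A quick way to hit one of the newly named conditions from PySpark (the query and the expected class are assumptions derived from the diff above):

```python
from pyspark.sql import SparkSession
from pyspark.errors import AnalysisException

spark = SparkSession.builder.getOrCreate()

# The qualifier `a` matches no input relation, so `a.*` cannot be expanded.
try:
    spark.sql("SELECT a.* FROM VALUES (1) AS t(col1)").collect()
except AnalysisException as e:
    print(e.getErrorClass())  # expected: CANNOT_RESOLVE_STAR_EXPAND
```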
[spark] branch master updated: [SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f3db20c17df [SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170 f3db20c17df is described below commit f3db20c17dfdc1cb5daa42c154afa732e5e3800b Author: panbingkun AuthorDate: Tue Jun 20 01:43:32 2023 +0300 [SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170 ### What changes were proposed in this pull request? The pr aims to: - Refactor `PreWriteCheck` to use error framework. - Make `INSERT_COLUMN_ARITY_MISMATCH` more generic & avoiding to embed error's text in source code. - Assign name to _LEGACY_ERROR_TEMP_1170. - In `INSERT_PARTITION_COLUMN_ARITY_MISMATCH` error message, replace '' with `toSQLId` for table column name. ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41458 from panbingkun/refactor_PreWriteCheck. Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 62 --- python/pyspark/sql/tests/test_readwriter.py| 4 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../catalyst/analysis/ResolveInsertionBase.scala | 13 ++- .../catalyst/analysis/TableOutputResolver.scala| 4 +- .../spark/sql/errors/QueryCompilationErrors.scala | 40 +++ .../catalyst/analysis/V2WriteAnalysisSuite.scala | 48 +--- .../spark/sql/execution/datasources/rules.scala| 32 -- .../analyzer-results/postgreSQL/numeric.sql.out| 7 +- .../sql-tests/results/postgreSQL/numeric.sql.out | 7 +- .../org/apache/spark/sql/DataFrameSuite.scala | 33 -- .../org/apache/spark/sql/SQLInsertTestSuite.scala | 31 -- .../spark/sql/connector/InsertIntoTests.scala | 34 -- .../apache/spark/sql/execution/SQLViewSuite.scala | 11 +- .../spark/sql/execution/command/DDLSuite.scala | 54 + .../org/apache/spark/sql/sources/InsertSuite.scala | 122 + .../spark/sql/hive/thriftserver/CliSuite.scala | 2 +- .../org/apache/spark/sql/hive/InsertSuite.scala| 11 +- 18 files changed, 324 insertions(+), 193 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 54b920cc36f..d9e729effeb 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -888,10 +888,24 @@ }, "INSERT_COLUMN_ARITY_MISMATCH" : { "message" : [ - "Cannot write to '', :", - "Table columns: .", - "Data columns: ." + "Cannot write to , the reason is" ], +"subClass" : { + "NOT_ENOUGH_DATA_COLUMNS" : { +"message" : [ + "not enough data columns:", + "Table columns: .", + "Data columns: ." +] + }, + "TOO_MANY_DATA_COLUMNS" : { +"message" : [ + "too many data columns:", + "Table columns: .", + "Data columns: ." +] + } +}, "sqlState" : "21S01" }, "INSERT_PARTITION_COLUMN_ARITY_MISMATCH" : { @@ -1715,6 +1729,11 @@ ], "sqlState" : "46110" }, + "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT" : { +"message" : [ + " is not supported, if you want to enable it, please set \"spark.sql.catalogImplementation\" to \"hive\"." +] + }, "NOT_SUPPORTED_IN_JDBC_CATALOG" : { "message" : [ "Not supported command in JDBC catalog:" @@ -2464,6 +2483,33 @@ "grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup." 
] }, + "UNSUPPORTED_INSERT" : { +"message" : [ + "Can't insert into the target." +], +"subClass" : { + "NOT_ALLOWED" : { +"message" : [ + "The target relation does not allow insertion." +] + }, + "NOT_PARTITIONED" : { +"message" : [ + "The target relation is not partitioned." +] + }, + "RDD_BASED" : { +"message" : [ + "An RDD-based table is not allowed." +] + }, +
[spark] branch master updated: [SPARK-44096][PYTHON][DOCS] Make examples copy-pastable by adding a newline in all modules
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0fc7eeb39aa [SPARK-44096][PYTHOM][DOCS] Make examples copy-pastable by adding a newline in all modules 0fc7eeb39aa is described below commit 0fc7eeb39aad5997912c8a3f82aea089a4985898 Author: Hyukjin Kwon AuthorDate: Mon Jun 19 13:34:42 2023 +0300 [SPARK-44096][PYTHOM][DOCS] Make examples copy-pastable by adding a newline in all modules ### What changes were proposed in this pull request? I found that there are many instances same as https://github.com/apache/spark/pull/41655. This PR aims to address all the examples in all components in PySpark. ### Why are the changes needed? See https://github.com/apache/spark/pull/41655. ### Does this PR introduce _any_ user-facing change? Yes, it changes the documentation and makes the example copy-pastable, see also https://github.com/apache/spark/pull/41655. ### How was this patch tested? CI in this PR should validate them. This is logically the same as https://github.com/apache/spark/pull/41655. I will also build the documentation locally and test. Closes #41657 from HyukjinKwon/minor-newlines. Authored-by: Hyukjin Kwon Signed-off-by: Max Gekk --- python/pyspark/accumulators.py | 4 python/pyspark/context.py | 4 python/pyspark/ml/functions.py | 21 +++-- python/pyspark/ml/torch/distributor.py | 2 ++ python/pyspark/mllib/clustering.py | 2 ++ python/pyspark/rdd.py | 9 + python/pyspark/sql/dataframe.py| 4 python/pyspark/sql/functions.py| 4 python/pyspark/sql/pandas/group_ops.py | 6 ++ python/pyspark/sql/streaming/query.py | 2 ++ python/pyspark/sql/types.py| 1 + python/pyspark/sql/udtf.py | 1 + 12 files changed, 46 insertions(+), 14 deletions(-) diff --git a/python/pyspark/accumulators.py b/python/pyspark/accumulators.py index dc8520a844d..a95bd9debfc 100644 --- a/python/pyspark/accumulators.py +++ b/python/pyspark/accumulators.py @@ -88,12 +88,14 @@ class Accumulator(Generic[T]): >>> def f(x): ... global a ... a += x +... >>> rdd.foreach(f) >>> a.value 13 >>> b = sc.accumulator(0) >>> def g(x): ... b.add(x) +... >>> rdd.foreach(g) >>> b.value 6 @@ -106,6 +108,7 @@ class Accumulator(Generic[T]): >>> def h(x): ... global a ... a.value = 7 +... >>> rdd.foreach(h) # doctest: +IGNORE_EXCEPTION_DETAIL Traceback (most recent call last): ... @@ -198,6 +201,7 @@ class AccumulatorParam(Generic[T]): >>> def g(x): ... global va ... va += [x] * 3 +... >>> rdd = sc.parallelize([1,2,3]) >>> rdd.foreach(g) >>> va.value diff --git a/python/pyspark/context.py b/python/pyspark/context.py index 6f5094963be..51a4db67e8c 100644 --- a/python/pyspark/context.py +++ b/python/pyspark/context.py @@ -1802,6 +1802,7 @@ class SparkContext: >>> def f(x): ... global acc ... acc += 1 +... >>> rdd.foreach(f) >>> acc.value 15 @@ -2140,6 +2141,7 @@ class SparkContext: >>> def map_func(x): ... sleep(100) ... raise RuntimeError("Task should have been cancelled") +... >>> def start_job(x): ... global result ... try: @@ -2148,9 +2150,11 @@ class SparkContext: ... except Exception as e: ... result = "Cancelled" ... lock.release() +... >>> def stop_job(): ... sleep(5) ... sc.cancelJobGroup("job_to_cancel") +... 
>>> suppress = lock.acquire() >>> suppress = InheritableThread(target=start_job, args=(10,)).start() >>> suppress = InheritableThread(target=stop_job).start() diff --git a/python/pyspark/ml/functions.py b/python/pyspark/ml/functions.py index bce4101df1e..89b05b692ea 100644 --- a/python/pyspark/ml/functions.py +++ b/python/pyspark/ml/functions.py @@ -512,11 +512,10 @@ def predict_batch_udf( ... # outputs.shape = [batch_size] ... return inputs * 2 ... return predict ->>> +... >>> times_two_udf = predict_batch_udf(make_times_two_fn, ... return_type=FloatType(), ...
[spark] branch master updated: [SPARK-44093][SQL][TESTS] Make `catalyst` module passes in Java 21
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cbfc920c2d7 [SPARK-44093][SQL][TESTS] Make `catalyst` module passes in Java 21 cbfc920c2d7 is described below commit cbfc920c2d75451e898ff5e00622a2af4eed3709 Author: Dongjoon Hyun AuthorDate: Sun Jun 18 17:34:08 2023 +0300 [SPARK-44093][SQL][TESTS] Make `catalyst` module passes in Java 21 ### What changes were proposed in this pull request? This PR aims to make `catalyst` module passes in Java 21. ### Why are the changes needed? https://bugs.openjdk.org/browse/JDK-8267125 changes the error message at Java 18. **JAVA** ``` $ java -version openjdk version "21-ea" 2023-09-19 OpenJDK Runtime Environment (build 21-ea+27-2343) OpenJDK 64-Bit Server VM (build 21-ea+27-2343, mixed mode, sharing) ``` **BEFORE** ``` $ build/sbt "catalyst/test" ... [info] *** 1 TEST FAILED *** [error] Failed: Total 7122, Failed 1, Errors 0, Passed 7121, Ignored 5, Canceled 1 [error] Failed tests: [error] org.apache.spark.sql.catalyst.expressions.ExpressionImplUtilsSuite [error] (catalyst / Test / test) sbt.TestsFailedException: Tests unsuccessful [error] Total time: 212 s (03:32), completed Jun 18, 2023, 1:11:17 AM ``` **AFTER** ``` $ build/sbt "catalyst/test" ... [info] All tests passed. [info] Passed: Total 7122, Failed 0, Errors 0, Passed 7122, Ignored 5, Canceled 1 [success] Total time: 213 s (03:33), completed Jun 18, 2023, 1:15:37 AM ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs and manual test on Java 21. Closes #41649 from dongjoon-hyun/SPARK-44093. Authored-by: Dongjoon Hyun Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/ExpressionImplUtilsSuite.scala | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtilsSuite.scala b/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtilsSuite.scala index 3b0dd82c173..4b33f9bc527 100644 --- a/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtilsSuite.scala +++ b/sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtilsSuite.scala @@ -17,6 +17,8 @@ package org.apache.spark.sql.catalyst.expressions +import org.apache.commons.lang3.{JavaVersion, SystemUtils} + import org.apache.spark.{SparkFunSuite, SparkRuntimeException} import org.apache.spark.unsafe.types.UTF8String @@ -285,6 +287,12 @@ class ExpressionImplUtilsSuite extends SparkFunSuite { } } + // JDK-8267125 changes tag error message at Java 18 + val msgTagMismatch = if (SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17)) { +"Tag mismatch!" + } else { +"Tag mismatch" + } val corruptedCiphertexts = Seq( // This is truncated TestCase( @@ -310,7 +318,7 @@ class ExpressionImplUtilsSuite extends SparkFunSuite { errorParamsMap = Map( "parameter" -> "`expr`, `key`", "functionName" -> "`aes_encrypt`/`aes_decrypt`", -"detailMessage" -> "Tag mismatch!" +"detailMessage" -> msgTagMismatch ) ), // Valid ciphertext, wrong AAD @@ -324,7 +332,7 @@ class ExpressionImplUtilsSuite extends SparkFunSuite { errorParamsMap = Map( "parameter" -> "`expr`, `key`", "functionName" -> "`aes_encrypt`/`aes_decrypt`", -"detailMessage" -> "Tag mismatch!" 
+"detailMessage" -> msgTagMismatch ) ) ) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44089][SQL][TESTS] Remove the `@ignore` identifier from `AlterTableRenamePartitionSuite`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d05091eb8f0 [SPARK-44089][SQL][TESTS] Remove the `@ignore` identifier from `AlterTableRenamePartitionSuite` d05091eb8f0 is described below commit d05091eb8f0f6ee1398fae90fd7b593ac3314e44 Author: yangjie01 AuthorDate: Sun Jun 18 17:24:36 2023 +0300 [SPARK-44089][SQL][TESTS] Remove the `@ignore` identifier from `AlterTableRenamePartitionSuite` ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/41533 ignore `AlterTableRenamePartitionSuite` try to restore stability of `sql-others` test task, but it seems that it is not the root cause that affects stability, so this pr has removed the previously added `ignore` identifier to restore testing. ### Why are the changes needed? Resume testing of `AlterTableRenamePartitionSuite` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? should monitor ci Closes #41647 from LuciferYang/SPARK-44089. Authored-by: yangjie01 Signed-off-by: Max Gekk --- .../sql/execution/command/v2/AlterTableRenamePartitionSuite.scala | 3 --- 1 file changed, 3 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableRenamePartitionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableRenamePartitionSuite.scala index 764596685b5..bb06818da48 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableRenamePartitionSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableRenamePartitionSuite.scala @@ -17,8 +17,6 @@ package org.apache.spark.sql.execution.command.v2 -import org.scalatest.Ignore - import org.apache.spark.sql.Row import org.apache.spark.sql.execution.command @@ -26,7 +24,6 @@ import org.apache.spark.sql.execution.command * The class contains tests for the `ALTER TABLE .. RENAME PARTITION` command * to check V2 table catalogs. */ -@Ignore class AlterTableRenamePartitionSuite extends command.AlterTableRenamePartitionSuiteBase with CommandSuiteBase { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44071] Define and use Unresolved[Leaf|Unary]Node traits
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 747953eb5c4 [SPARK-44071] Define and use Unresolved[Leaf|Unary]Node traits 747953eb5c4 is described below commit 747953eb5c46e121faf476a060049f1423ae7e91 Author: Ryan Johnson AuthorDate: Fri Jun 16 23:30:08 2023 +0300 [SPARK-44071] Define and use Unresolved[Leaf|Unary]Node traits ### What changes were proposed in this pull request? Looking at [unresolved.scala](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala), catalyst would benefit from an `UnresolvedNode` trait that various `UnresolvedFoo` classes could inherit: ```scala trait UnresolvedNode extends LogicalPlan { override def output: Seq[Attribute] = Nil override lazy val resolved = false } ``` Today, the code is duplicated in ~20 locations (7 of them in that one file). ### Why are the changes needed? Reduces redundancy, improves readability, documents programmer intent better. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Mild refactor, existing unit tests suffice. Closes #41617 from ryan-johnson-databricks/unresolved-node-trait. Authored-by: Ryan Johnson Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/RelationTimeTravel.scala | 8 ++-- .../spark/sql/catalyst/analysis/parameters.scala | 10 ++--- .../spark/sql/catalyst/analysis/unresolved.scala | 48 +- .../sql/catalyst/analysis/v2ResolutionPlans.scala | 32 +++ .../spark/sql/catalyst/catalog/interface.scala | 11 ++--- .../plans/logical/basicLogicalOperators.scala | 6 +-- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 5 +-- 7 files changed, 39 insertions(+), 81 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RelationTimeTravel.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RelationTimeTravel.scala index 4daefa816a5..6e0d0998883 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RelationTimeTravel.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RelationTimeTravel.scala @@ -17,8 +17,8 @@ package org.apache.spark.sql.catalyst.analysis -import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression} -import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan} +import org.apache.spark.sql.catalyst.expressions.Expression +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.catalyst.trees.TreePattern.{RELATION_TIME_TRAVEL, TreePattern} /** @@ -29,8 +29,6 @@ import org.apache.spark.sql.catalyst.trees.TreePattern.{RELATION_TIME_TRAVEL, Tr case class RelationTimeTravel( relation: LogicalPlan, timestamp: Option[Expression], -version: Option[String]) extends LeafNode { - override def output: Seq[Attribute] = Nil - override lazy val resolved: Boolean = false +version: Option[String]) extends UnresolvedLeafNode { override val nodePatterns: Seq[TreePattern] = Seq(RELATION_TIME_TRAVEL) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala index 2a31e90465c..a00f9cec92c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala +++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala @@ -18,8 +18,8 @@ package org.apache.spark.sql.catalyst.analysis import org.apache.spark.SparkException -import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, LeafExpression, Literal, SubqueryExpression, Unevaluable} -import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode} +import org.apache.spark.sql.catalyst.expressions.{Expression, LeafExpression, Literal, SubqueryExpression, Unevaluable} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.catalyst.trees.TreePattern.{PARAMETER, PARAMETERIZED_QUERY, TreePattern, UNRESOLVED_WITH} import org.apache.spark.sql.errors.QueryErrorsBase @@ -47,10 +47,10 @@ case class Parameter(name: String) extends LeafExpression with Unevaluable { * The logical plan representing a parameterized query. It will be removed during analysis after * the parameters are bind. */ -case class ParameterizedQuery(child: LogicalPlan, args: Map[String, Expression]) extends UnaryNode { +case class ParameterizedQuery(child: LogicalPlan, args: Map[String, Expression]) + extends
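A hedged Python rendering of the consolidation (the Scala diff is authoritative; this only shows the shape of the refactor, where the duplicated `output`/`resolved` overrides collapse into one base):

```python
class LogicalPlan:
    @property
    def output(self):
        raise NotImplementedError

    @property
    def resolved(self) -> bool:
        return True

class UnresolvedLeafNode(LogicalPlan):
    # The boilerplate previously copied into ~20 classes lives here once.
    @property
    def output(self):
        return []

    @property
    def resolved(self) -> bool:
        return False

class RelationTimeTravel(UnresolvedLeafNode):
    def __init__(self, relation, timestamp=None, version=None):
        self.relation, self.timestamp, self.version = relation, timestamp, version
```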
[spark] branch master updated: [SPARK-43290][SQL] Adds support for aes_encrypt IVs and AAD
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fb1ee25a89e [SPARK-43290][SQL] Adds support for aes_encrypt IVs and AAD fb1ee25a89e is described below commit fb1ee25a89e8b42178b7f55718859ab5117c2320 Author: Steve Weis AuthorDate: Fri Jun 16 15:42:05 2023 +0300 [SPARK-43290][SQL] Adds support for aes_encrypt IVs and AAD ### What changes were proposed in this pull request? This change adds support for user-provided initialization vectors (IVs) or authenticated additional data (AAD) to `aes_encrypt` / `aes_decrypt`. 12-byte IVs may optionally be passed if the mode is "GCM" and 16-byte IVs may be passed if the mode is "CBC". An arbitrary binary value may be passed as additional authenticated data only if "GCM" mode is used. ### Why are the changes needed? Callers may wish to provide their own IV values so that the output ciphertext matches a ciphertext generated outside of Spark. AAD is used to bind some input to a ciphertext and ensure that it is presented during decryption -- often used to scope an operation to a specific context. ### Does this PR introduce _any_ user-facing change? Yes, this change introduces two optional parameters to `aes_encrypt` and one optional parameter to `aes_decrypt`: ``` aes_encrypt(expr, key[, mode[, padding[, iv[, aad) aes_decrypt(expr, key[, mode[, padding[, iv]]]) ``` ### How was this patch tested? ``` build/sbt "sql/test:testOnly org.apache.spark.sql.DataFrameFunctionsSuite -- -z aes" ``` Closes #41488 from sweisdb/SPARK-43290. Authored-by: Steve Weis Signed-off-by: Max Gekk --- .../catalyst/expressions/ExpressionImplUtils.java | 14 + .../spark/sql/catalyst/expressions/misc.scala | 64 +- .../expressions/ExpressionImplUtilsSuite.scala | 23 +++- .../sql-functions/sql-expression-schema.md | 6 +- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 50 + 5 files changed, 127 insertions(+), 30 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java index 6aae649718a..a604e6bf225 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java @@ -111,14 +111,6 @@ public class ExpressionImplUtils { return checkSum % 10 == 0; } - public static byte[] aesEncrypt(byte[] input, byte[] key, UTF8String mode, UTF8String padding) { -return aesEncrypt(input, key, mode, padding, null, null); - } - - public static byte[] aesDecrypt(byte[] input, byte[] key, UTF8String mode, UTF8String padding) { -return aesDecrypt(input, key, mode, padding, null); - } - public static byte[] aesEncrypt(byte[] input, byte[] key, UTF8String mode, @@ -192,7 +184,7 @@ public class ExpressionImplUtils { Cipher cipher = Cipher.getInstance(cipherMode.transformation); if (opmode == Cipher.ENCRYPT_MODE) { // This may be 0-length for ECB -if (iv == null) { +if (iv == null || iv.length == 0) { iv = generateIv(cipherMode); } else if (!cipherMode.usesSpec) { // If the caller passes an IV, ensure the mode actually uses it. 
@@ -210,7 +202,7 @@ public class ExpressionImplUtils { } // If the cipher mode supports additional authenticated data and it is provided, update it -if (aad != null) { +if (aad != null && aad.length != 0) { if (cipherMode.supportsAad != true) { throw QueryExecutionErrors.aesUnsupportedAad(mode); } @@ -231,7 +223,7 @@ public class ExpressionImplUtils { if (cipherMode.usesSpec) { AlgorithmParameterSpec algSpec = getParamSpec(cipherMode, input); cipher.init(opmode, secretKey, algSpec); - if (aad != null) { + if (aad != null && aad.length != 0) { if (cipherMode.supportsAad != true) { throw QueryExecutionErrors.aesUnsupportedAad(mode); } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala index 67328cde71a..92ed0843521 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.sca
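An end-to-end round trip with an explicit IV and AAD, runnable from PySpark SQL (this assumes the released signatures, where `aes_decrypt` takes the AAD as its optional fifth argument and recovers the IV from the ciphertext prefix):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 16-byte key, 12-byte GCM IV (hex-encoded), and an AAD string that must
# be presented again at decryption time. Fixing the IV makes the
# ciphertext reproducible outside of Spark.
row = spark.sql("""
    SELECT cast(aes_decrypt(
             aes_encrypt('Apache Spark', '0000111122223333', 'GCM', 'DEFAULT',
                         unhex('000000000000000000000000'), 'my-aad'),
             '0000111122223333', 'GCM', 'DEFAULT', 'my-aad') AS STRING) AS msg
""").head()
print(row.msg)  # Apache Spark
```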
[spark] branch master updated: [SPARK-42298][SQL] Assign name to _LEGACY_ERROR_TEMP_2132
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c41be4ec0ad [SPARK-42298][SQL] Assign name to _LEGACY_ERROR_TEMP_2132 c41be4ec0ad is described below commit c41be4ec0ad97f587a0581d5583b2ca9975b2a0f Author: Hisoka AuthorDate: Mon Jun 12 23:54:02 2023 +0300 [SPARK-42298][SQL] Assign name to _LEGACY_ERROR_TEMP_2132 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_2132, "CANNOT_PARSE_JSON_ARRAYS_AS_STRUCTS". ### Why are the changes needed? Assign proper name to LEGACY_ERROR_TEMP ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? ./build/sbt "testOnly org.apache.spark.sql.errors.QueryExecutionErrorsSuite" Closes #40632 from Hisoka-X/_LEGACY_ERROR_TEMP_2132. Lead-authored-by: Hisoka Co-authored-by: Jia Fan Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 20 ++-- .../spark/sql/catalyst/json/JacksonParser.scala | 2 +- .../spark/sql/catalyst/util/BadRecordException.scala | 5 + .../spark/sql/catalyst/util/FailureSafeParser.scala | 10 -- .../spark/sql/errors/QueryExecutionErrors.scala | 10 ++ .../catalyst/expressions/JsonExpressionsSuite.scala | 2 +- .../org/apache/spark/sql/CsvFunctionsSuite.scala | 2 +- .../org/apache/spark/sql/JsonFunctionsSuite.scala| 12 ++-- .../spark/sql/errors/QueryExecutionErrorsSuite.scala | 15 +++ .../sql/execution/datasources/csv/CSVSuite.scala | 2 +- .../sql/execution/datasources/json/JsonSuite.scala | 4 ++-- .../spark/sql/hive/thriftserver/CliSuite.scala | 4 ++-- .../ThriftServerWithSparkContextSuite.scala | 4 ++-- 13 files changed, 64 insertions(+), 28 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index a12a8000870..183ea31a7cb 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -1542,7 +1542,20 @@ "message" : [ "Malformed records are detected in record parsing: .", "Parse Mode: . To process malformed records as null result, try setting the option 'mode' as 'PERMISSIVE'." -] +], +"subClass" : { + "CANNOT_PARSE_JSON_ARRAYS_AS_STRUCTS" : { +"message" : [ + "Parsing JSON arrays as structs is forbidden." +] + }, + "WITHOUT_SUGGESTION" : { +"message" : [ + "" +] + } +}, +"sqlState" : "22023" }, "MISSING_AGGREGATION" : { "message" : [ @@ -4692,11 +4705,6 @@ "Exception when registering StreamingQueryListener." ] }, - "_LEGACY_ERROR_TEMP_2132" : { -"message" : [ - "Parsing JSON arrays as structs is forbidden." -] - }, "_LEGACY_ERROR_TEMP_2133" : { "message" : [ "Cannot parse field name , field value , [] as target spark data type []." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala index bf07d65caa0..48ee50938cd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala @@ -144,7 +144,7 @@ class JacksonParser( array.toArray[InternalRow](schema) } case START_ARRAY => -throw QueryExecutionErrors.cannotParseJsonArraysAsStructsError() +throw JsonArraysAsStructsException() } } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala index 67defe78a6c..cfbe9da6ec5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala @@ -41,3 +41,8 @@ case class BadRecordException( @transient record: () => UTF8String, @transient partialResult: () => Option[InternalRow], cause: Throwable) extends Exception(cause) + +/** + * Exception thrown when the underlying parser parses a JSON array as a struct. + */ +case class JsonArraysAsStructsException() extends RuntimeException() diff --git a/sql/catalyst/src/main/scala/org/apache/spark/s
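Triggering the newly named subclass from PySpark, sketched under the assumption that FAILFAST mode surfaces it through `from_json` (input and schema are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("[1, 2]",)], ["js"])

# A JSON array parsed against a struct schema; FAILFAST raises
# MALFORMED_RECORD_IN_PARSING.CANNOT_PARSE_JSON_ARRAYS_AS_STRUCTS
# instead of the old _LEGACY_ERROR_TEMP_2132.
df.select(from_json("js", "a INT", {"mode": "FAILFAST"})).show()
```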
[spark] branch master updated: [SPARK-43971][CONNECT][PYTHON] Support Python's createDataFrame in streaming manner
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 93e0acbf7d9 [SPARK-43971][CONNECT][PYTHON] Support Python's createDataFrame in streaming manner 93e0acbf7d9 is described below commit 93e0acbf7d9fcf3422860b2a5d39379bebf7bc43 Author: Max Gekk AuthorDate: Sat Jun 10 01:25:04 2023 +0300 [SPARK-43971][CONNECT][PYTHON] Support Python's createDataFrame in streaming manner ### What changes were proposed in this pull request? In the PR, I propose to transfer a local relation from **the Python connect client** to the server in a streaming way when it exceeds the size defined by the SQL config `spark.sql.session.localRelationCacheThreshold`. The implementation is similar to https://github.com/apache/spark/pull/40827. In particular (a standalone sketch of the client-side flow follows this commit's diff below): 1. The client applies the `sha256` function over **the proto form** of the local relation; 2. It checks the presence of the relation at the server side by sending the relation hash to the server; 3. If the server doesn't have the local relation, the client transfers it as an artefact with the name `cache/`; 4. Once the relation is already present at the server, or has just been transferred, the client transforms the logical plan by replacing the `LocalRelation` node with a `CachedLocalRelation` carrying the hash. 5. On the other hand, the server converts `CachedLocalRelation` back to `LocalRelation` by retrieving the relation body from the local cache. ### Why are the changes needed? To fix failures when creating a large dataframe from a local collection: ```python pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.RESOURCE_EXHAUSTED details = "Sent message larger than max (134218508 vs. 134217728)" debug_error_string = "UNKNOWN:Error received from peer localhost:50982 {grpc_message:"Sent message larger than max (134218508 vs. 134217728)", grpc_status:8, created_time:"2023-06-09T15:34:08.362797+03:00"} ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ python/run-tests --parallelism=1 --testnames 'pyspark.sql.tests.connect.test_connect_basic SparkConnectBasicTests.test_streaming_local_relation' ``` Closes #41537 from MaxGekk/streaming-createDataFrame-python-4.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- python/pyspark/sql/connect/client/core.py | 3 ++ python/pyspark/sql/connect/plan.py | 34 ++ python/pyspark/sql/connect/session.py | 26 +++-- .../sql/tests/connect/test_connect_basic.py| 19 4 files changed, 79 insertions(+), 3 deletions(-) diff --git a/python/pyspark/sql/connect/client/core.py b/python/pyspark/sql/connect/client/core.py index 25e395356d5..7368521259a 100644 --- a/python/pyspark/sql/connect/client/core.py +++ b/python/pyspark/sql/connect/client/core.py @@ -1257,6 +1257,9 @@ class SparkConnectClient(object): def copy_from_local_to_fs(self, local_path: str, dest_path: str) -> None: self._artifact_manager._add_forward_to_fs_artifacts(local_path, dest_path) +def cache_artifact(self, blob: bytes) -> str: +return self._artifact_manager.cache_artifact(blob) + class RetryState: """ diff --git a/python/pyspark/sql/connect/plan.py b/python/pyspark/sql/connect/plan.py index fc8b37b102c..406f65080d1 100644 --- a/python/pyspark/sql/connect/plan.py +++ b/python/pyspark/sql/connect/plan.py @@ -363,6 +363,10 @@ class LocalRelation(LogicalPlan): plan.local_relation.schema = self._schema return plan +def serialize(self, session: "SparkConnectClient") -> bytes: +p = self.plan(session) +return bytes(p.local_relation.SerializeToString()) + def print(self, indent: int = 0) -> str: return f"{' ' * indent}\n" @@ -374,6 +378,36 @@ class LocalRelation(LogicalPlan): """ +class CachedLocalRelation(LogicalPlan): +"""Creates a CachedLocalRelation plan object based on a hash of a LocalRelation.""" + +def __init__(self, hash: str) -> None: +super().__init__(None) + +self._hash = hash + +def plan(self, session: "SparkConnectClient") -> proto.Relation: +plan = self._create_proto_relation() +clr = plan.cached_local_relation + +if session._user_id: +clr.userId = session._user_id +clr.sessionId = session._session_id +clr.hash = self._h
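Illustrative sketch (not part of the commit): the client-side flow described in the commit message, written in Scala for consistency with the rest of this archive. `isCached` and `transfer` are hypothetical stand-ins for the artifact-manager RPCs; only the shape of the protocol is shown.
```scala
import org.apache.commons.codec.digest.DigestUtils.sha256Hex

// Hash the proto form of a large local relation, probe the server-side
// cache, and only ship the bytes when the server has not seen them yet.
def planLargeLocalRelation(
    serializedRelation: Array[Byte],
    isCached: String => Boolean,
    transfer: (String, Array[Byte]) => Unit): String = {
  // 1. The relation is identified by the sha256 of its serialized form.
  val hash = sha256Hex(serializedRelation)
  // 2-3. Check presence at the server; upload under "cache/<hash>" on a miss.
  if (!isCached(hash)) {
    transfer(s"cache/$hash", serializedRelation)
  }
  // 4. The plan then references CachedLocalRelation(hash) instead of
  //    carrying the rows inline.
  hash
}
```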
[spark] branch master updated (3cae38b4f10 -> 958b8541803)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3cae38b4f10 [SPARK-43612][PYTHON][CONNECT][FOLLOW-UP] Copy dependent data files to data directory add 958b8541803 [SPARK-44006][CONNECT][PYTHON] Support cache artifacts No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/client/artifact.py | 52 +- .../sql/tests/connect/client/test_artifact.py | 10 + 2 files changed, 61 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-43993][SQL][TESTS] Add tests for cache artifacts
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fead8a7962a [SPARK-43993][SQL][TESTS] Add tests for cache artifacts fead8a7962a is described below commit fead8a7962a717aae5cab9eef51eed2ac684f070 Author: Max Gekk AuthorDate: Wed Jun 7 16:00:49 2023 +0300 [SPARK-43993][SQL][TESTS] Add tests for cache artifacts ### What changes were proposed in this pull request? In the PR, I propose to add a test to check two methods of the artifact manager: - `isCachedArtifact()` - `cacheArtifact()` ### Why are the changes needed? To improve test coverage of Artifacts API. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *.ArtifactSuite" ``` Closes #41493 from MaxGekk/test-cache-artifact. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../spark/sql/connect/client/ArtifactManager.scala | 2 +- .../spark/sql/connect/client/ArtifactSuite.scala | 14 .../connect/client/SparkConnectClientSuite.scala | 25 +- 3 files changed, 39 insertions(+), 2 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala index acd9f279c6d..6d0d16df946 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ArtifactManager.scala @@ -108,7 +108,7 @@ class ArtifactManager( */ def addArtifacts(uris: Seq[URI]): Unit = addArtifacts(uris.flatMap(parseArtifacts)) - private def isCachedArtifact(hash: String): Boolean = { + private[client] def isCachedArtifact(hash: String): Boolean = { val artifactName = CACHE_PREFIX + "/" + hash val request = proto.ArtifactStatusesRequest .newBuilder() diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/ArtifactSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/ArtifactSuite.scala index 506ad3625b0..39ab0eef412 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/ArtifactSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/ArtifactSuite.scala @@ -25,6 +25,7 @@ import scala.collection.JavaConverters._ import com.google.protobuf.ByteString import io.grpc.{ManagedChannel, Server} import io.grpc.inprocess.{InProcessChannelBuilder, InProcessServerBuilder} +import org.apache.commons.codec.digest.DigestUtils.sha256Hex import org.scalatest.BeforeAndAfterEach import org.apache.spark.connect.proto @@ -248,4 +249,17 @@ class ArtifactSuite extends ConnectFunSuite with BeforeAndAfterEach { assertFileDataEquality(remainingArtifacts.get(0).getData, Paths.get(file3)) assertFileDataEquality(remainingArtifacts.get(1).getData, Paths.get(file4)) } + + test("cache an artifact and check its presence") { +val s = "Hello, World!" 
+val blob = s.getBytes("UTF-8") +val expectedHash = sha256Hex(blob) +assert(artifactManager.isCachedArtifact(expectedHash) === false) +val actualHash = artifactManager.cacheArtifact(blob) +assert(actualHash === expectedHash) +assert(artifactManager.isCachedArtifact(expectedHash) === true) + +val receivedRequests = service.getAndClearLatestAddArtifactRequests() +assert(receivedRequests.size == 1) + } } diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala index 7a0ad1a9e2a..7e0b687054d 100755 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.connect.client import java.util.concurrent.TimeUnit +import scala.collection.JavaConverters._ import scala.collection.mutable import io.grpc.{Server, StatusRuntimeException} @@ -26,7 +27,7 @@ import io.grpc.stub.StreamObserver import org.scalatest.BeforeAndAfterEach import org.apache.spark.connect.proto -import org.apache.spark.connect.proto.{AddArtifactsRequest, AddArtifactsResponse, AnalyzePlanRequest, AnalyzePlanResponse, ExecutePlanRequest, ExecutePlanResponse, SparkConnectServiceGrpc} +import o
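Standalone sanity check of the digest used by the new test (assuming commons-codec on the classpath); the expected value is the well-known SHA-256 test vector for "Hello, World!".
```scala
import org.apache.commons.codec.digest.DigestUtils.sha256Hex

// sha256Hex returns the lowercase hex SHA-256 digest of the given bytes,
// which is exactly what isCachedArtifact()/cacheArtifact() exchange.
val blob = "Hello, World!".getBytes("UTF-8")
assert(sha256Hex(blob) ==
  "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f")
```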
[spark] branch master updated: [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0cd5ca5a7b3 [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432] 0cd5ca5a7b3 is described below commit 0cd5ca5a7b31f65a005c8ee2e90a6b4a29623ba7 Author: Jiaan Geng AuthorDate: Tue Jun 6 10:28:48 2023 +0300 [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432] ### What changes were proposed in this pull request? The pr aims to assign names to the error class `_LEGACY_ERROR_TEMP_[2426-2432]`. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Exists test cases. Closes #41424 from beliefer/SPARK-43913. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 58 -- .../sql/catalyst/analysis/CheckAnalysis.scala | 51 +++ .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 20 .../CreateTablePartitioningValidationSuite.scala | 22 .../negative-cases/invalid-correlation.sql.out | 6 ++- .../negative-cases/invalid-correlation.sql.out | 6 ++- 6 files changed, 93 insertions(+), 70 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index de80415d85b..8c3c076ce74 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -660,6 +660,11 @@ "The event time has the invalid type , but expected \"TIMESTAMP\"." ] }, + "EXPRESSION_TYPE_IS_NOT_ORDERABLE" : { +"message" : [ + "Column expression cannot be sorted because its type is not orderable." +] + }, "FAILED_EXECUTE_UDF" : { "message" : [ "Failed to execute user defined function (: () => )." @@ -1541,6 +1546,24 @@ ], "sqlState" : "42803" }, + "MISSING_ATTRIBUTES" : { +"message" : [ + "Resolved attribute(s) missing from in operator ." +], +"subClass" : { + "RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION" : { +"message" : [ + "Attribute(s) with the same name appear in the operation: .", + "Please check if the right attribute(s) are used." +] + }, + "RESOLVED_ATTRIBUTE_MISSING_FROM_INPUT" : { +"message" : [ + "" +] + } +} + }, "MISSING_GROUP_BY" : { "message" : [ "The query does not include a GROUP BY clause. Add GROUP BY or turn it into the window functions using OVER clauses." @@ -1945,6 +1968,11 @@ "Query [id = , runId = ] terminated with exception: " ] }, + "SUM_OF_LIMIT_AND_OFFSET_EXCEEDS_MAX_INT" : { +"message" : [ + "The sum of the LIMIT clause and the OFFSET clause must not be greater than the maximum 32-bit integer value (2,147,483,647) but found limit = , offset = ." +] + }, "TABLE_OR_VIEW_ALREADY_EXISTS" : { "message" : [ "Cannot create table or view because it already exists.", @@ -2310,6 +2338,11 @@ "Parameter markers in unexpected statement: . Parameter markers must only be used in a query, or DML statement." ] }, + "PARTITION_WITH_NESTED_COLUMN_IS_UNSUPPORTED" : { +"message" : [ + "Invalid partitioning: is missing or is in a map or array." +] + }, "PIVOT_AFTER_GROUP_BY" : { "message" : [ "PIVOT clause following a GROUP BY clause. Consider pushing the GROUP BY into a subquery." @@ -5525,31 +5558,6 @@ "failed to evaluate expression : " ] }, - "_LEGACY_ERROR_TEMP_2426" : { -"message" : [ - "nondeterministic expression should not appear in grouping expression." 
-] - }, - "_LEGACY_ERROR_TEMP_2427" : { -"message" : [ - "sorting is not supported for columns of type ." -] - }, - "_LEGACY_ERROR_TEMP_2428" : { -"message" : [ - "The sum of the LIMIT clause and the OFFSET clause must not be greater than the maximum 32-bit integer value (2,147,483,647) but found limit = , offset = ." -] - }, - "_LEGACY_ERROR_TEMP_2431" : { -"message" : [ - "Invalid pa
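Illustrative sketch: one query that hits the renamed `SUM_OF_LIMIT_AND_OFFSET_EXCEEDS_MAX_INT` check, assuming an active `spark` session and any existing table `t`.
```scala
// LIMIT + OFFSET must fit in a 32-bit signed integer, so a max-int limit
// combined with any positive offset fails at analysis time.
spark.sql("SELECT * FROM t LIMIT 2147483647 OFFSET 1")
// => AnalysisException: [SUM_OF_LIMIT_AND_OFFSET_EXCEEDS_MAX_INT]
//    ... found limit = 2147483647, offset = 1.
```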
[spark] branch master updated: [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNI
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61e6227fb62 [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNIZE_HIVE_TYPE` 61e6227fb62 is described below commit 61e6227fb62c2452b01ac595c2bc43d4492686a0 Author: itholic AuthorDate: Tue Jun 6 10:25:24 2023 +0300 [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNIZE_HIVE_TYPE` ### What changes were proposed in this pull request? This PR proposes to improve error messages for `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNIZE_HIVE_TYPE`. **NOTE:** This PR is an experimental work that utilizes LLM to enhance error messages. The script was created using the `openai` Python library from OpenAI, and minimal review was conducted by author after executing the script. The five improved error messages were selected by the author. ### Why are the changes needed? For improving errors to make them more actionable and usable. ### Does this PR introduce _any_ user-facing change? No API changes, only error message improvement. ### How was this patch tested? The existing CI should pass. Closes #41455 from itholic/emi_1-5. Authored-by: itholic Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index bceea072e92..de80415d85b 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -114,7 +114,7 @@ }, "CANNOT_DECODE_URL" : { "message" : [ - "Cannot decode url : ." + "The provided URL cannot be decoded: . Please ensure that the URL is properly formatted and try again." ], "sqlState" : "22546" }, @@ -130,7 +130,7 @@ }, "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE" : { "message" : [ - "Failed to merge incompatible data types and ." + "Failed to merge incompatible data types and . Please check the data types of the columns being merged and ensure that they are compatible. If necessary, consider casting the columns to compatible data types before attempting the merge." ], "sqlState" : "42825" }, @@ -153,7 +153,7 @@ }, "CANNOT_PARSE_DECIMAL" : { "message" : [ - "Cannot parse decimal." + "Cannot parse decimal. Please ensure that the input is a valid number with optional decimal point or comma separators." ], "sqlState" : "22018" }, @@ -176,12 +176,12 @@ }, "CANNOT_READ_FILE_FOOTER" : { "message" : [ - "Could not read footer for file: ." + "Could not read footer for file: . Please ensure that the file is in either ORC or Parquet format. If not, please convert it to a valid format. If the file is in the valid format, please check if it is corrupt. If it is, you can choose to either ignore it or fix the corruption." ] }, "CANNOT_RECOGNIZE_HIVE_TYPE" : { "message" : [ - "Cannot recognize hive type string: , column: ." + "Cannot recognize hive type string: , column: . The specified data type for the field cannot be recognized by Spark SQL. 
Please check the data type of the specified field and ensure that it is a valid Spark SQL data type. Refer to the Spark SQL documentation for a list of valid data types and their format. If the data type is correct, please ensure that you are using a supported version of Spark SQL." ], "sqlState" : "429BB" },
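Illustrative sketch of the reworded `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE` message in action; the output path and the `spark` session are assumptions.
```scala
// Two Parquet directories whose shared column disagrees on type; asking the
// reader to merge schemas surfaces the incompatible-merge error.
val dir = "/tmp/merge-demo"
spark.range(1).selectExpr("CAST(id AS INT) AS c").write.parquet(s"$dir/p=1")
spark.range(1).selectExpr("CAST(id AS STRING) AS c").write.parquet(s"$dir/p=2")
spark.read.option("mergeSchema", "true").parquet(dir).printSchema()
// => Failed to merge incompatible data types "INT" and "STRING". ...
```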
[spark] branch master updated (1df1d7661a3 -> d0fe6d4b796)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 1df1d7661a3 [SPARK-43516][ML][PYTHON] Update MLv2 Transformer interfaces add d0fe6d4b796 [SPARK-43948][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059] No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 47 +- .../spark/sql/errors/QueryParsingErrors.scala | 15 --- .../spark/sql/catalyst/parser/DDLParserSuite.scala | 2 +- .../spark/sql/execution/SparkSqlParser.scala | 2 +- .../command/v2/AlterTableReplaceColumnsSuite.scala | 17 +++- .../org/apache/spark/sql/sources/InsertSuite.scala | 12 +++--- 6 files changed, 61 insertions(+), 34 deletions(-)
[spark] branch master updated: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 16ee478a9de [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206 16ee478a9de is described below commit 16ee478a9debe94eadbf62ead072c2ded10220c7 Author: Amanda Liu AuthorDate: Mon Jun 5 22:19:38 2023 +0300 [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206 ### What changes were proposed in this pull request? The PR assigns a more descriptive name to the error class `_LEGACY_ERROR_TEMP_2206` -> `BATCH_METADATA_NOT_FOUND` ### Why are the changes needed? This change improves the error framework by making the error name more descriptive. ### Does this PR introduce any user-facing change? No ### How was this patch tested? The error test will be handled in a future PR (see JIRA ticket: https://issues.apache.org/jira/browse/SPARK-43940) Closes #41387 from asl3/_LEGACY_ERROR_TEMP_2206. Lead-authored-by: Amanda Liu Co-authored-by: asl3 Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 11 ++- .../org/apache/spark/sql/errors/QueryExecutionErrors.scala| 2 +- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index c73223fba39..2da08829862 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -74,6 +74,12 @@ "Cannot convert Avro to SQL because the original encoded data type is , however you're trying to read the field as , which leads to data being read as null. Please provide a wider decimal type to get the correct result. To allow reading null to this field, enable the SQL configuration: ." ] }, + "BATCH_METADATA_NOT_FOUND" : { +"message" : [ + "Unable to find batch ." +], +"sqlState" : "42K03" + }, "BINARY_ARITHMETIC_OVERFLOW" : { "message" : [ " caused overflow." @@ -4978,11 +4984,6 @@ "Cannot set timeout timestamp without enabling event time timeout in [map|flatMapGroupsWithState." ] }, - "_LEGACY_ERROR_TEMP_2206" : { -"message" : [ - "Unable to find batch ." -] - }, "_LEGACY_ERROR_TEMP_2207" : { "message" : [ "Multiple streaming queries are concurrently using ." diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index 7ce3e7a9e7e..fd09e99b9ee 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -2011,7 +2011,7 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase { def batchMetadataFileNotFoundError(batchMetadataFile: Path): SparkFileNotFoundException = { new SparkFileNotFoundException( - errorClass = "_LEGACY_ERROR_TEMP_2206", + errorClass = "BATCH_METADATA_NOT_FOUND", messageParameters = Map( "batchMetadataFile" -> batchMetadataFile.toString())) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 90ec7eaddb6 [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite` 90ec7eaddb6 is described below commit 90ec7eaddb66b6b2fe3afb8cdb68a9cf88f714de Author: panbingkun AuthorDate: Sat Jun 3 22:22:20 2023 +0300 [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite` ### What changes were proposed in this pull request? The pr aims to use `checkError()` to check `Exception` in `*Insert*Suite`, include: - sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedUpdateAsDeleteAndInsertTableSuite - sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite Note: But this pr does not include some of these cases, which directly throw AnalysisExecution, such as: https://github.com/apache/spark/blob/898ad77900d887ac64800a616bd382def816eea6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala#L505-L515 After this PR, I will refactor these, assign them a name, and use the error framework. As these tasks are completed, all exceptions checks in `*Insert*Suite` will eventually be migrated to `checkError`. ### Why are the changes needed? Migration on checkError() will make the tests independent from the text of error messages. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41447 from panbingkun/check_error_for_insert_suites. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../org/apache/spark/sql/SQLInsertTestSuite.scala | 82 ++- ...ltaBasedUpdateAsDeleteAndInsertTableSuite.scala | 11 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 570 ++--- .../org/apache/spark/sql/hive/InsertSuite.scala| 50 +- 4 files changed, 477 insertions(+), 236 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala index 904980d58d6..af85e44519b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql import org.apache.spark.SparkConf +import org.apache.spark.SparkNumberFormatException import org.apache.spark.sql.catalyst.expressions.Hex import org.apache.spark.sql.connector.catalog.InMemoryPartitionTableCatalog import org.apache.spark.sql.internal.SQLConf @@ -181,16 +182,28 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils { } test("insert with column list - mismatched column list size") { -val msgs = Seq("Cannot write to table due to mismatched user specified column size", - "expected 3 columns but found") def test: Unit = { withTable("t1") { val cols = Seq("c1", "c2", "c3") createTable("t1", cols, Seq("int", "long", "string")) -val e1 = intercept[AnalysisException](sql(s"INSERT INTO t1 (c1, c2) values(1, 2, 3)")) -assert(e1.getMessage.contains(msgs(0)) || e1.getMessage.contains(msgs(1))) -val e2 = intercept[AnalysisException](sql(s"INSERT INTO t1 (c1, c2, c3) values(1, 2)")) -assert(e2.getMessage.contains(msgs(0)) || e2.getMessage.contains(msgs(1))) +checkError( + exception = intercept[AnalysisException] { 
+sql(s"INSERT INTO t1 (c1, c2) values(1, 2, 3)") + }, + sqlState = None, + errorClass = "_LEGACY_ERROR_TEMP_1038", + parameters = Map("columnSize" -> "2", "outputSize" -> "3"), + context = ExpectedContext("values(1, 2, 3)", 24, 38) +) +checkError( + exception = intercept[AnalysisException] { +sql(s"INSERT INTO t1 (c1, c2, c3) values(1, 2)") + }, + sqlState = None, + errorClass = "_LEGACY_ERROR_TEMP_1038", + parameters = Map("columnSize" -> "3", "outputSize" -> "2"), + context = ExpectedContext("values(1, 2)", 28, 39) +) } } withSQLConf(SQLConf.ENABLE_DEFAULT_COLUMNS.key -> "false") { @@ -259,10 +272,15 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils { "che
[spark] branch branch-3.4 updated: [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new e140bf719e3 [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc] e140bf719e3 is described below commit e140bf719e3e8d7347f5d00b2ebaf77d6a5b2210 Author: Jiaan Geng AuthorDate: Sat Jun 3 22:15:15 2023 +0300 [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc] ### What changes were proposed in this pull request? This PR used to backport https://github.com/apache/spark/pull/41436 to 3.4 ### Why are the changes needed? Fix the bug doesn't display column's sql for Percentile[Cont|Disc]. ### Does this PR introduce _any_ user-facing change? 'Yes'. Users could see the correct sql information. ### How was this patch tested? Test cases updated. Closes #41445 from beliefer/SPARK-43956_followup. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- .../expressions/aggregate/percentiles.scala| 4 ++-- .../sql-tests/results/percentiles.sql.out | 24 +++--- .../results/postgreSQL/aggregates_part4.sql.out| 8 .../udf/postgreSQL/udf-aggregates_part4.sql.out| 8 4 files changed, 22 insertions(+), 22 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala index 81bc7e51499..8447a5f9b51 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala @@ -368,7 +368,7 @@ case class PercentileCont(left: Expression, right: Expression, reverse: Boolean override def sql(isDistinct: Boolean): String = { val distinct = if (isDistinct) "DISTINCT " else "" val direction = if (reverse) " DESC" else "" -s"$prettyName($distinct${right.sql}) WITHIN GROUP (ORDER BY v$direction)" +s"$prettyName($distinct${right.sql}) WITHIN GROUP (ORDER BY ${left.sql}$direction)" } override protected def withNewChildrenInternal( newLeft: Expression, newRight: Expression): PercentileCont = @@ -408,7 +408,7 @@ case class PercentileDisc( override def sql(isDistinct: Boolean): String = { val distinct = if (isDistinct) "DISTINCT " else "" val direction = if (reverse) " DESC" else "" -s"$prettyName($distinct${right.sql}) WITHIN GROUP (ORDER BY v$direction)" +s"$prettyName($distinct${right.sql}) WITHIN GROUP (ORDER BY ${left.sql}$direction)" } override protected def withNewChildrenInternal( diff --git a/sql/core/src/test/resources/sql-tests/results/percentiles.sql.out b/sql/core/src/test/resources/sql-tests/results/percentiles.sql.out index 38319875c71..cd99ded56bf 100644 --- a/sql/core/src/test/resources/sql-tests/results/percentiles.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/percentiles.sql.out @@ -144,7 +144,7 @@ SELECT FROM basic_pays ORDER BY salary -- !query schema -struct 8900 WINDOW w AS (PARTITION BY department) ORDER BY salary -- !query schema -struct +struct -- !query output 0-10 2-6 @@ -608,7 +608,7 @@ FROM intervals GROUP BY k ORDER BY k -- !query schema -struct +struct -- !query output 0 0 00:00:10.00 00:00:30.0 1 0 00:00:12.50 00:00:17.5 @@ -626,7 +626,7 @@ FROM intervals GROUP BY k ORDER BY k -- !query schema -struct +struct -- !query output 0 0 00:10:00.00 00:30:00.0 1 0 00:12:30.00 
00:17:30.0 @@ -641,7 +641,7 @@ SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY dt DESC) FROM intervals -- !query schema -struct +struct -- !query output 0-10 2-6 @@ -655,7 +655,7 @@ FROM intervals GROUP BY k ORDER BY k -- !query schema -struct +struct -- !query output 0 0 00:00:10.00 00:00:30.0 1 0 00:00:10.00 00:00:20.0 @@ -673,7 +673,7 @@ FROM intervals GROUP BY k ORDER BY k -- !query schema -struct +struct -- !query output 0 0 00:10:00.00 00:30:00.0 1 0 00:10:00.00 00:20:00.0 @@ -689,7 +689,7 @@ SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY dt) FROM intervals -- !query schema -struct +struct -- !query output 1-81-8 1-8 @@ -704,7 +704,7 @@ FROM intervals GROUP BY k ORDER BY k -- !query schema -struct +struct -- !query output 0 0 00:00:20.00 00:00:20.00 00:00:20.
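Illustrative sketch of the fix against the `basic_pays` table used in the golden files above; the generated SQL is visible in the output column name.
```scala
// With the fix, the aggregate's generated SQL carries the real sort column
// instead of the hard-coded placeholder `v`.
val df = spark.sql(
  "SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY salary) FROM basic_pays")
println(df.schema.fieldNames.head)
// before: percentile_cont(0.25) WITHIN GROUP (ORDER BY v)
// after:  percentile_cont(0.25) WITHIN GROUP (ORDER BY salary)
```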
[spark] branch master updated (c3b62708cd6 -> 18b9bd9dcb0)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from c3b62708cd6 [SPARK-43516][ML][FOLLOW-UP] Drop vector type support in Distributed ML for spark connect add 18b9bd9dcb0 [SPARK-43945][SQL][TESTS] Fix bug for `SQLQueryTestSuite` when run on local env No new revisions were added by this update. Summary of changes: sql/core/src/test/resources/sql-tests/results/identifier-clause.sql.out | 2 +- sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-43910][SQL] Strip `__auto_generated_subquery_name` from ids in errors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new adabbb50053 [SPARK-43910][SQL] Strip `__auto_generated_subquery_name` from ids in errors adabbb50053 is described below commit adabbb50053d442c0852c0c39c125a02d777d04e Author: Max Gekk AuthorDate: Thu Jun 1 10:14:12 2023 +0300 [SPARK-43910][SQL] Strip `__auto_generated_subquery_name` from ids in errors ### What changes were proposed in this pull request? In the PR, I propose the drop the prefix `__auto_generated_subquery_name` from SQL ids in errors. ### Why are the changes needed? The changes should improve user experience with Spark SQL by making error messages shorter and more clear. ### Does this PR introduce _any_ user-facing change? Should not. ### How was this patch tested? By running the affected test suites: ``` $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" $ build/sbt "test:testOnly *QueryCompilationErrorsSuite" $ build/sbt "sql/testOnly *QueryExecutionErrorsSuite" ``` Closes #41411 from MaxGekk/strip__auto_generated_subquery_name. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala| 6 +- .../test/resources/sql-tests/analyzer-results/natural-join.sql.out | 2 +- .../src/test/resources/sql-tests/analyzer-results/pivot.sql.out | 2 +- .../test/resources/sql-tests/analyzer-results/udf/udf-pivot.sql.out | 2 +- sql/core/src/test/resources/sql-tests/results/natural-join.sql.out | 2 +- sql/core/src/test/resources/sql-tests/results/pivot.sql.out | 2 +- sql/core/src/test/resources/sql-tests/results/udf/udf-pivot.sql.out | 2 +- .../org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala | 3 ++- .../org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 2 +- 9 files changed, 14 insertions(+), 9 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala index 5460de77a14..885b2f775e0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala @@ -71,7 +71,11 @@ private[sql] trait QueryErrorsBase { } def toSQLId(parts: Seq[String]): String = { -parts.map(quoteIdentifier).mkString(".") +val cleaned = parts match { + case "__auto_generated_subquery_name" :: rest if rest != Nil => rest + case other => other +} +cleaned.map(quoteIdentifier).mkString(".") } def toSQLId(parts: String): String = { diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/natural-join.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/natural-join.sql.out index 8fe2ba77855..987fb3e0a09 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/natural-join.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/natural-join.sql.out @@ -494,7 +494,7 @@ org.apache.spark.sql.AnalysisException "sqlState" : "42703", "messageParameters" : { "objectName" : "`nt2`.`k`", -"proposal" : "`__auto_generated_subquery_name`.`k`, `__auto_generated_subquery_name`.`v1`, `__auto_generated_subquery_name`.`v2`" +"proposal" : "`k`, `v1`, `v2`" }, "queryContext" : [ { "objectType" : "", diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out 
b/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out index e5560c04ff1..d7b77f8ce01 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out @@ -743,7 +743,7 @@ org.apache.spark.sql.AnalysisException "errorClass" : "INCOMPARABLE_PIVOT_COLUMN", "sqlState" : "42818", "messageParameters" : { -"columnName" : "`__auto_generated_subquery_name`.`m`" +"columnName" : "`m`" } } diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/udf/udf-pivot.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/udf/udf-pivot.sql.out index b5f4a6be3b2..fa94f77207b 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/udf/udf-pivot.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/udf/udf-pivot.sql.out @@ -683,7 +683,7 @@ org.apache.spark.sql.AnalysisException "errorClas
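Illustrative reproduction (a sketch): an unresolved column over an unaliased subquery; the proposal no longer leaks the internal alias.
```scala
// The subquery gets the internal alias __auto_generated_subquery_name,
// which used to prefix every suggested column in the error message.
spark.sql("SELECT k2 FROM (SELECT 1 AS k, 2 AS v1)")
// before: [UNRESOLVED_COLUMN] ... proposal: `__auto_generated_subquery_name`.`k`, ...
// after:  [UNRESOLVED_COLUMN] ... proposal: `k`, `v1`
```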
[spark] branch master updated: [SPARK-43867][SQL] Improve suggested candidates for unresolved attribute
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a8893422752 [SPARK-43867][SQL] Improve suggested candidates for unresolved attribute a8893422752 is described below commit a88934227523334550e451e437ce013772001079 Author: Max Gekk AuthorDate: Wed May 31 21:04:44 2023 +0300 [SPARK-43867][SQL] Improve suggested candidates for unresolved attribute ### What changes were proposed in this pull request? In the PR, I propose to change the approach of stripping the common part of candidate qualifiers in `StringUtils.orderSuggestedIdentifiersBySimilarity`: 1. If all candidates have the same qualifier including namespace and table name, drop it. It should be dropped if the base string (unresolved attribute) doesn't include a namespace and table name. For example: - `[ns1.table1.col1, ns1.table1.col2] -> [col1, col2]` for unresolved attribute `col0` - `[ns1.table1.col1, ns1.table1.col2] -> [table1.col1, table1.col2]` for unresolved attribute `table1.col0` 2. If all candidates belong to the same namespace, just drop it. It should be dropped for any non-fully qualified unresolved attribute. For example: - `[ns1.table1.col1, ns1.table2.col2] -> [table1.col1, table2.col2]` for unresolved attribute `col0` or `table0.col0` - `[ns1.table1.col1, ns1.table1.col2] -> [ns1.table1.col1, ns1.table1.col2]` for unresolved attribute `ns0.table0.col0` (fully qualified, so nothing is dropped) 3. Otherwise take the suggested candidates AS IS. 4. Sort the candidate list using the levenshtein distance (a standalone sketch of this heuristic follows the diff below). ### Why are the changes needed? This should improve user experience with Spark SQL by simplifying the error message about an unresolved attribute. ### Does this PR introduce _any_ user-facing change? Yes, it changes the error message. ### How was this patch tested? By running the existing test suites: ``` $ build/sbt "test:testOnly *AnalysisErrorSuite" $ build/sbt "test:testOnly *QueryCompilationErrorsSuite" $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" $ build/sbt "test:testOnly *DatasetUnpivotSuite" $ build/sbt "test:testOnly *DatasetSuite" ``` Closes #41368 from MaxGekk/fix-suggested-column-list.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/CheckAnalysis.scala | 3 +- .../plans/logical/basicLogicalOperators.scala | 2 +- .../spark/sql/catalyst/util/StringUtils.scala | 46 +- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 4 +- .../spark/sql/catalyst/util/StringUtilsSuite.scala | 5 ++- .../columnresolution-negative.sql.out | 2 +- .../analyzer-results/group-by-all.sql.out | 2 +- .../analyzer-results/join-lateral.sql.out | 2 +- .../postgreSQL/aggregates_part1.sql.out| 2 +- .../analyzer-results/postgreSQL/join.sql.out | 6 +-- .../udf/postgreSQL/udf-aggregates_part1.sql.out| 2 +- .../udf/postgreSQL/udf-join.sql.out| 6 +-- .../results/columnresolution-negative.sql.out | 2 +- .../sql-tests/results/group-by-all.sql.out | 2 +- .../sql-tests/results/join-lateral.sql.out | 2 +- .../results/postgreSQL/aggregates_part1.sql.out| 2 +- .../sql-tests/results/postgreSQL/join.sql.out | 6 +-- .../udf/postgreSQL/udf-aggregates_part1.sql.out| 2 +- .../results/udf/postgreSQL/udf-join.sql.out| 6 +-- .../org/apache/spark/sql/DatasetUnpivotSuite.scala | 2 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 4 +- .../sql/errors/QueryCompilationErrorsSuite.scala | 3 +- 22 files changed, 53 insertions(+), 60 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index c46dff1c4bf..594c0b666e8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -139,7 +139,8 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB a: Attribute, errorClass: String): Nothing = { val missingCol = a.sql -val candidates = operator.inputSet.toSeq.map(_.qualifiedName) +val candidates = operator.inputSet.toSeq + .map(attr => attr.qualifier :+ attr.name) val orderedCandidates = StringUtils.orderSuggestedIdentifiersBySimilarity(missingCol, candidates) throw QueryCompilationErrors.unresolvedAttributeError( diff --git a/sql/catalyst/src/main/scala/org/apache/spa
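A self-contained sketch of the heuristic described in the commit message, not the actual `StringUtils` code; it assumes commons-text for the edit distance and represents identifiers as name-part sequences such as `Seq("ns1", "table1", "col1")`.
```scala
import org.apache.commons.text.similarity.LevenshteinDistance

// Strip a qualifier shared by every candidate -- the whole table qualifier
// when the unresolved name is bare, the namespace only when it is
// table-qualified -- then rank the survivors by edit distance.
def orderBySimilarity(base: Seq[String],
                      candidates: Seq[Seq[String]]): Seq[String] = {
  val qualifiers = candidates.map(_.dropRight(1)).distinct
  val stripped =
    if (base.size == 1 && qualifiers.size == 1) {
      candidates.map(_.takeRight(1))   // [ns1.t1.c1, ns1.t1.c2] -> [c1, c2]
    } else if (base.size <= 2 && qualifiers.map(_.headOption).distinct.size == 1) {
      candidates.map(_.takeRight(2))   // [ns1.t1.c1, ns1.t2.c2] -> [t1.c1, t2.c2]
    } else {
      candidates                       // fully qualified base: keep as-is
    }
  val dist = LevenshteinDistance.getDefaultInstance
  stripped.map(_.mkString("."))
    .sortBy(s => dist.apply(s, base.mkString(".")).intValue)
}
```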
[spark] branch master updated (c2060e7c0a3 -> 3457b4be356)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from c2060e7c0a3 [SPARK-43081][ML][FOLLOW-UP] Improve torch distributor data loader code add 3457b4be356 [SPARK-43852][SPARK-43853][SPARK-43854][SPARK-43855][SPARK-43856] Assign names to the error class _LEGACY_ERROR_TEMP_2418-2425 No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 57 -- .../sql/tests/pandas/test_pandas_udf_scalar.py | 4 +- python/pyspark/sql/tests/test_udf.py | 4 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 18 +++ .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 28 --- .../apache/spark/sql/DataFrameAsOfJoinSuite.scala | 29 ++- .../apache/spark/sql/LateralColumnAliasSuite.scala | 32 .../sql/hive/execution/AggregationQuerySuite.scala | 25 ++ .../spark/sql/hive/execution/HiveUDAFSuite.scala | 15 -- .../spark/sql/hive/execution/UDAQuerySuite.scala | 25 ++ 10 files changed, 145 insertions(+), 92 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43882][SQL] Assign name to _LEGACY_ERROR_TEMP_2122
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2687d784fe4 [SPARK-43882][SQL] Assign name to _LEGACY_ERROR_TEMP_2122 2687d784fe4 is described below commit 2687d784fe4d20af321f11074139c0ce382bbaef Author: Jia Fan AuthorDate: Wed May 31 10:26:15 2023 +0300 [SPARK-43882][SQL] Assign name to _LEGACY_ERROR_TEMP_2122 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_2122, "FAILED_PARSE_STRUCT_TYPE". ### Why are the changes needed? Assign proper name to _LEGACY_ERROR_TEMP_2122 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add new test Closes #41381 from Hisoka-X/SPARK-43882_LEGACY_ERROR_TEMP_2122. Authored-by: Jia Fan Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 11 ++- .../org/apache/spark/sql/errors/QueryExecutionErrors.scala| 4 ++-- .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 10 ++ 3 files changed, 18 insertions(+), 7 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 8c3ba1e190d..7f2b1975855 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -634,6 +634,12 @@ ], "sqlState" : "38000" }, + "FAILED_PARSE_STRUCT_TYPE" : { +"message" : [ + "Failed parsing struct: ." +], +"sqlState" : "22018" + }, "FAILED_RENAME_PATH" : { "message" : [ "Failed to rename to as destination already exists." @@ -4563,11 +4569,6 @@ "Do not support type ." ] }, - "_LEGACY_ERROR_TEMP_2122" : { -"message" : [ - "Failed parsing : ." -] - }, "_LEGACY_ERROR_TEMP_2124" : { "message" : [ "Failed to merge decimal types with incompatible scale and ." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index 5daa8ed3b7f..7ce3e7a9e7e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -1305,8 +1305,8 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase { def failedParsingStructTypeError(raw: String): SparkRuntimeException = { new SparkRuntimeException( - errorClass = "_LEGACY_ERROR_TEMP_2122", - messageParameters = Map("simpleString" -> StructType.simpleString, "raw" -> raw)) + errorClass = "FAILED_PARSE_STRUCT_TYPE", + messageParameters = Map("raw" -> toSQLValue(raw, StringType))) } def cannotMergeDecimalTypesWithIncompatibleScaleError( diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala index 4bcb1d115b7..6d2c2600cbb 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala @@ -633,6 +633,16 @@ class QueryExecutionErrorsSuite "config" -> s""""${SQLConf.ANSI_ENABLED.key}"""")) } + test("FAILED_PARSE_STRUCT_TYPE: parsing invalid struct type") { +val raw = """{"type":"array","elementType":"integer","containsNull":false}""" +checkError( + exception = intercept[SparkRuntimeException] { +StructType.fromString(raw) + }, + errorClass = "FAILED_PARSE_STRUCT_TYPE", + parameters = Map("raw" -> s"'$raw'")) + } + test("CAST_OVERFLOW: from long to ANSI intervals") { Seq( LongType -> "9223372036854775807L", - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
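Usage sketch of the method under test; note that `fromString` is package-private to `org.apache.spark.sql`, so this only compiles from Spark-internal code such as the suite above.
```scala
import org.apache.spark.sql.types.StructType

// Round-trip of a struct's JSON form succeeds...
val ok = StructType.fromString(
  """{"type":"struct","fields":[{"name":"a","type":"integer","nullable":false,"metadata":{}}]}""")
println(ok.simpleString) // struct<a:int>

// ...while any non-struct type string now fails under the new class.
StructType.fromString(
  """{"type":"array","elementType":"integer","containsNull":false}""")
// => SparkRuntimeException: [FAILED_PARSE_STRUCT_TYPE] Failed parsing struct: '...'
```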
[spark] branch master updated (7d87fecda70 -> 11390c50972)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 7d87fecda70 [SPARK-43878][BUILD] Upgrade `cyclonedx-maven-plugin` from 2.7.6 to 2.7.9 add 11390c50972 [SPARK-43815][SQL] Add `to_varchar` alias for `to_char` No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../sql-functions/sql-expression-schema.md | 1 + .../sql-tests/analyzer-results/charvarchar.sql.out | 21 +++ .../resources/sql-tests/inputs/charvarchar.sql | 5 + .../sql-tests/results/charvarchar.sql.out | 24 ++ 5 files changed, 52 insertions(+)
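Usage sketch: the new alias is interchangeable with `to_char`; the format and result follow the built-in function examples.
```scala
// Both expressions format the decimal with the same template.
spark.sql("SELECT to_char(78.12, '$99.99'), to_varchar(78.12, '$99.99')").show()
// both columns render $78.12
```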
[spark] branch master updated: [SPARK-43862][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_(1254 & 1315)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0c6ea478d6b [SPARK-43862][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_(1254 & 1315) 0c6ea478d6b is described below commit 0c6ea478d6b448caab5c969be122159acef2bbeb Author: panbingkun AuthorDate: Tue May 30 14:18:26 2023 +0300 [SPARK-43862][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_(1254 & 1315) ### What changes were proposed in this pull request? The pr aims to 1. Assign a name to the error class, include: - _LEGACY_ERROR_TEMP_1254 -> UNSUPPORTED_OVERWRITE.PATH - _LEGACY_ERROR_TEMP_1315 -> UNSUPPORTED_OVERWRITE.TABLE 2. Convert _LEGACY_ERROR_TEMP_0002 to INTERNAL_ERROR. ### Why are the changes needed? - The changes improve the error framework. - Because the subclass `SparkSqlAstBuilder` of `AstBuilder` has already override methods `visitInsertOverwriteDir` and `visitInsertOverwriteHiveDir`. In reality, `SparkSqlParser` is used in the Spark base code , and `SparkSqlAstBuilder` is used, The two exceptions mentioned above in AstBuilder will not be thrown through the user's perspective. https://github.com/apache/spark/blob/88f69d6f92860823b1a90bc162ebca2b7c8132fc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L46-L47 - visitInsertOverwriteDir https://github.com/apache/spark/blob/88f69d6f92860823b1a90bc162ebca2b7c8132fc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L802-L834 - visitInsertOverwriteHiveDir https://github.com/apache/spark/blob/88f69d6f92860823b1a90bc162ebca2b7c8132fc/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L848-L866 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manual testing: $ build/sbt "test:testOnly *DDLParserSuite" $ build/sbt "test:testOnly *InsertSuite" $ build/sbt "test:testOnly *MetastoreDataSourcesSuite" $ build/sbt "test:testOnly *HiveDDLSuite" - Pass GA. Closes #41367 from panbingkun/LEGACY_ERROR_TEMP_1254. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 32 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 4 +- .../spark/sql/errors/QueryCompilationErrors.scala | 18 +++--- .../spark/sql/errors/QueryParsingErrors.scala | 5 +- .../spark/sql/catalyst/analysis/AnalysisTest.scala | 5 ++ .../spark/sql/catalyst/parser/DDLParserSuite.scala | 20 +++ .../org/apache/spark/sql/DataFrameWriter.scala | 6 +- .../apache/spark/sql/execution/command/ddl.scala | 12 +++- .../execution/datasources/DataSourceStrategy.scala | 2 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 70 -- .../spark/sql/hive/MetastoreDataSourcesSuite.scala | 35 ++- .../spark/sql/hive/execution/HiveDDLSuite.scala| 11 ++-- 12 files changed, 149 insertions(+), 71 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 07ff6e1c7c2..8c3ba1e190d 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -2320,6 +2320,23 @@ "grouping()/grouping_id() can only be used with GroupingSets/Cube/Rollup." ] }, + "UNSUPPORTED_OVERWRITE" : { +"message" : [ + "Can't overwrite the target that is also being read from." +], +"subClass" : { + "PATH" : { +"message" : [ + "The target path is ." 
+] + }, + "TABLE" : { +"message" : [ + "The target table is ." +] + } +} + }, "UNSUPPORTED_SAVE_MODE" : { "message" : [ "The save mode is not supported for:" @@ -2477,11 +2494,6 @@ "Invalid InsertIntoContext." ] }, - "_LEGACY_ERROR_TEMP_0002" : { -"message" : [ - "INSERT OVERWRITE DIRECTORY is not supported." -] - }, "_LEGACY_ERROR_TEMP_0004" : { "message" : [ "Empty source for merge: you should specify a source table/subquery in merge." @@ -3669,11 +3681,6 @@ "Cannot alter a table with ALTER VIEW. Please use ALTER TABLE instead." ] }, - "_LEGACY_ERROR_TEMP_1254" : { -"message" : [ - "Cannot overwrite a path that is also being read from." -] - }, "_LEGACY_ERROR_TEMP_1255" : { "message
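Illustrative sketch of the renamed error; whether the `TABLE` or `PATH` leg fires depends on how the relation resolves, so the exact subclass in the comment is an assumption.
```scala
// Overwriting a table from a query that reads the same table.
spark.sql("CREATE TABLE src(i INT) USING parquet")
spark.sql("INSERT OVERWRITE TABLE src SELECT * FROM src")
// => AnalysisException: [UNSUPPORTED_OVERWRITE.TABLE] Can't overwrite the
//    target that is also being read from. The target table is
//    `spark_catalog`.`default`.`src`. (or the PATH leg, for path-based writes)
```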
[spark] branch master updated (27bb384947e -> 8b464df9fcf)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 27bb384947e [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity` add 8b464df9fcf [SPARK-43846][SQL][TESTS] Use checkError() to check Exception in SessionCatalogSuite No new revisions were added by this update. Summary of changes: .../sql/catalyst/catalog/SessionCatalogSuite.scala | 462 - 1 file changed, 277 insertions(+), 185 deletions(-)
[spark] branch master updated (31a8ef803a8 -> 27bb384947e)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 31a8ef803a8 [SPARK-43821][CONNECT][TESTS] Make the prompt for `findJar` method in IntegrationTestUtils clearer add 27bb384947e [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity` No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/StringUtils.scala | 2 +- .../spark/sql/catalyst/util/StringUtilsSuite.scala | 7 ++ .../sql/errors/QueryCompilationErrorsSuite.scala | 27 ++ 3 files changed, 35 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e6e242e0181 [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL` e6e242e0181 is described below commit e6e242e01813ddcc735f61a668059ed648a6cefb Author: panbingkun AuthorDate: Sun May 28 21:15:24 2023 +0300 [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL` ### What changes were proposed in this pull request? The pr aims to convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL` and remove `_LEGACY_ERROR_TEMP_1335`. ### Why are the changes needed? - The changes improve the error framework. - In the Spark base code, `_LEGACY_ERROR_TEMP_1335` is no longer used anywhere. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Add new UT - Pass GA Closes #41349 from panbingkun/SPARK-43839. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json| 10 -- .../apache/spark/sql/errors/QueryCompilationErrors.scala| 6 -- .../sql/execution/datasources/v2/V2SessionCatalog.scala | 6 -- .../apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 13 + 4 files changed, 17 insertions(+), 18 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 36125d2cbae..f7c0879e1a2 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -4015,16 +4015,6 @@ "Cannot specify both version and timestamp when time travelling the table." ] }, - "_LEGACY_ERROR_TEMP_1335" : { "message" : [ " is not a valid timestamp expression for time travel." ] }, - "_LEGACY_ERROR_TEMP_1337" : { "message" : [ "Table does not support time travel." ] }, "_LEGACY_ERROR_TEMP_1338" : { "message" : [ "Sinks cannot request distribution and ordering in continuous execution mode."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 05b829838aa..45a9a03df4d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -3152,12 +3152,6 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase { messageParameters = Map("relationId" -> relationId)) } - def tableNotSupportTimeTravelError(tableName: Identifier): Throwable = { -new AnalysisException( - errorClass = "_LEGACY_ERROR_TEMP_1337", - messageParameters = Map("tableName" -> tableName.toString)) - } - def writeDistributionAndOrderingNotSupportedInContinuousExecution(): Throwable = { new AnalysisException( errorClass = "_LEGACY_ERROR_TEMP_1338", diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala index 437194b7b5b..8234fb5a0b1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala @@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalog) throw QueryCompilationErrors.timeTravelUnsupportedError( toSQLId(catalogTable.identifier.nameParts)) } else { - throw QueryCompilationErrors.tableNotSupportTimeTravelError(ident) + throw QueryCompilationErrors.timeTravelUnsupportedError( +toSQLId(catalogTable.identifier.nameParts)) } - case _ => throw QueryCompilationErrors.tableNotSupportTimeTravelError(ident) + case _ => throw QueryCompilationErrors.timeTravelUnsupportedError( +toSQLId(ident.asTableIdentifier.nameParts)) } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala index 377596466db..4bcb1d115b7 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala @@ -886,6 +886,19 @@ class QueryExecutio
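Illustrative sketch of the converted error: time travel over a plain V1 table; the exact message text is an assumption based on the existing `UNSUPPORTED_FEATURE.TIME_TRAVEL` template.
```scala
// A plain parquet table has no versions to travel to.
spark.sql("CREATE TABLE t(id INT) USING parquet")
spark.sql("SELECT * FROM t VERSION AS OF 1")
// => AnalysisException: [UNSUPPORTED_FEATURE.TIME_TRAVEL]
//    Time travel on the relation: `spark_catalog`.`default`.`t` ...
```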
[spark] branch master updated: [SPARK-43834][SQL] Use error classes in the compilation errors of `ResolveDefaultColumns`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 588188f481d [SPARK-43834][SQL] Use error classes in the compilation errors of `ResolveDefaultColumns` 588188f481d is described below commit 588188f481db899317bdc398438d6bd749224f9f Author: panbingkun AuthorDate: Sun May 28 19:08:25 2023 +0300 [SPARK-43834][SQL] Use error classes in the compilation errors of `ResolveDefaultColumns` ### What changes were proposed in this pull request? The pr aims to use error classes in the compilation errors of `ResolveDefaultColumns`. ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update UT. - Pass GA. Closes #41345 from panbingkun/SPARK-43834. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 27 ++- .../catalyst/util/ResolveDefaultColumnsUtil.scala | 21 +-- .../spark/sql/errors/QueryCompilationErrors.scala | 43 - .../sql/catalyst/catalog/SessionCatalogSuite.scala | 38 +++- .../analysis/ResolveDefaultColumnsSuite.scala | 53 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 206 +++-- 6 files changed, 290 insertions(+), 98 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index c8e11e6e55e..36125d2cbae 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -948,6 +948,28 @@ ], "sqlState" : "42000" }, + "INVALID_DEFAULT_VALUE" : { +"message" : [ + "Failed to execute command because the destination table column has a DEFAULT value ," +], +"subClass" : { + "DATA_TYPE" : { +"message" : [ + "which requires type, but the statement provided a value of incompatible type." +] + }, + "SUBQUERY_EXPRESSION" : { +"message" : [ + "which contains subquery expressions." +] + }, + "UNRESOLVED_EXPRESSION" : { +"message" : [ + "which fails to resolve as a valid expression." +] + } +} + }, "INVALID_DRIVER_MEMORY" : { "message" : [ "System memory must be at least . Please increase heap size using the --driver-memory option or \"\" in Spark configuration." @@ -4048,11 +4070,6 @@ "Failed to execute command because DEFAULT values are not supported when adding new columns to previously existing target data source with table provider: \"\"." ] }, - "_LEGACY_ERROR_TEMP_1347" : { -"message" : [ - "Failed to execute command because subquery expressions are not allowed in DEFAULT values." -] - }, "_LEGACY_ERROR_TEMP_2000" : { "message" : [ ". If necessary set to false to bypass this error." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala index 8c7e2ad4f1d..0f5c413ed78 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala @@ -188,14 +188,13 @@ object ResolveDefaultColumns { parser.parseExpression(defaultSQL) } catch { case ex: ParseException => -throw new AnalysisException( - s"Failed to execute $statementType command because the destination table column " + -s"$colName has a DEFAULT value of $defaultSQL which fails to parse as a valid " + -s"expression: ${ex.getMessage}") +throw QueryCompilationErrors.defaultValuesUnresolvedExprError( + statementType, colName, defaultSQL, ex) } // Check invariants before moving on to analysis. if (parsed.containsPattern(PLAN_EXPRESSION)) { - throw QueryCompilationErrors.defaultValuesMayNotContainSubQueryExpressions() + throw QueryCompilationErrors.defaultValuesMayNotContainSubQueryExpressions( +statementType, colName, defaultSQL) } // Analyze the parse result. val plan = try { @@ -205,10 +204,8 @@ object ResolveDefaultColumns { ConstantFolding(analyzed) } catch { case ex: AnalysisEx
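For illustration, a hypothetical DDL statement that lands in one of the new subclasses (the table name and provider are made up, and the exact message parameters are elided in the JSON above):

```
// A DEFAULT value must be a constant expression; a subquery inside it is now
// reported as INVALID_DEFAULT_VALUE.SUBQUERY_EXPRESSION instead of a legacy code.
sql("CREATE TABLE t (a INT DEFAULT (SELECT 42)) USING parquet")
// => AnalysisException, errorClass = "INVALID_DEFAULT_VALUE.SUBQUERY_EXPRESSION"
```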
[spark] branch master updated: [SPARK-43837][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_103[1-2]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2c0a206a89f [SPARK-43837][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_103[1-2] 2c0a206a89f is described below commit 2c0a206a89ff9042a0577a7f5f30fa20fb8c984a Author: panbingkun AuthorDate: Sun May 28 18:59:20 2023 +0300 [SPARK-43837][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_103[1-2] ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_103[1-2]. ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Update existed UT. - Pass GA. Closes #41346 from panbingkun/SPARK-43837. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 27 +++--- .../spark/sql/errors/QueryCompilationErrors.scala | 18 +--- .../spark/sql/DataFrameWindowFramesSuite.scala | 33 ++ 3 files changed, 58 insertions(+), 20 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 3a11001ad9d..c8e11e6e55e 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -909,6 +909,23 @@ ], "sqlState" : "22003" }, + "INVALID_BOUNDARY" : { +"message" : [ + "The boundary is invalid: ." +], +"subClass" : { + "END" : { +"message" : [ + "Expected the value is '0', '', '[, ]'." +] + }, + "START" : { +"message" : [ + "Expected the value is '0', '', '[, ]'." +] + } +} + }, "INVALID_BUCKET_FILE" : { "message" : [ "Invalid bucket file: ." @@ -3840,16 +3857,6 @@ "Unable to find the column `` given []." ] }, - "_LEGACY_ERROR_TEMP_1301" : { -"message" : [ - "Boundary start is not a valid integer: ." -] - }, - "_LEGACY_ERROR_TEMP_1302" : { -"message" : [ - "Boundary end is not a valid integer: ." -] - }, "_LEGACY_ERROR_TEMP_1304" : { "message" : [ "Unexpected type of the relation ." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 3cb22491aed..18ace731dd4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -2877,14 +2877,24 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase { def invalidBoundaryStartError(start: Long): Throwable = { new AnalysisException( - errorClass = "_LEGACY_ERROR_TEMP_1301", - messageParameters = Map("start" -> start.toString)) + errorClass = "INVALID_BOUNDARY.START", + messageParameters = Map( +"boundary" -> toSQLId("start"), +"invalidValue" -> toSQLValue(start, LongType), +"longMinValue" -> toSQLValue(Long.MinValue, LongType), +"intMinValue" -> toSQLValue(Int.MinValue, IntegerType), +"intMaxValue" -> toSQLValue(Int.MaxValue, IntegerType))) } def invalidBoundaryEndError(end: Long): Throwable = { new AnalysisException( - errorClass = "_LEGACY_ERROR_TEMP_1302", - messageParameters = Map("end" -> end.toString)) + errorClass = "INVALID_BOUNDARY.END", + messageParameters = Map( +"boundary" -> toSQLId("end"), +"invalidValue" -> toSQLValue(end, LongType), +"longMaxValue" -> toSQLValue(Long.MaxValue, LongType), +"intMinValue" -> toSQLValue(Int.MinValue, IntegerType), +"intMaxValue" -> toSQLValue(Int.MaxValue, IntegerType))) } def tableOrViewNotFound(ident: Seq[String]): Throwable = { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala index 48a3d740559..2a81f7e7c2f 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala +++ b/sql/core
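The boundaries in question are the frame bounds of the DataFrame window API. A sketch of input that now trips the named class (assuming an active Spark session; per the diff above, the check fires eagerly while the frame is being built):

```
import org.apache.spark.sql.expressions.Window

// A ROWS frame bound must be 0, Long.MinValue/Long.MaxValue (unbounded), or fit
// into [Int.MinValue, Int.MaxValue]; anything else fails immediately.
Window.orderBy("v").rowsBetween(Int.MinValue.toLong - 1, 0)
// => AnalysisException, errorClass = "INVALID_BOUNDARY.START"
```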
[spark] branch master updated: [SPARK-43820][SPARK-43822][SPARK-43823][SPARK-43826][SPARK-43827] Assign names to the error class _LEGACY_ERROR_TEMP_241[1-7]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fe7bdce8d12 [SPARK-43820][SPARK-43822][SPARK-43823][SPARK-43826][SPARK-43827] Assign names to the error class _LEGACY_ERROR_TEMP_241[1-7] fe7bdce8d12 is described below commit fe7bdce8d121e2733e82706177d34f0342db0cbe Author: Jiaan Geng AuthorDate: Sun May 28 13:50:59 2023 +0300 [SPARK-43820][SPARK-43822][SPARK-43823][SPARK-43826][SPARK-43827] Assign names to the error class _LEGACY_ERROR_TEMP_241[1-7] ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_241[1-7]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Exists test cases. Closes #41339 from beliefer/2411-2417. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 45 +--- .../sql/catalyst/analysis/CheckAnalysis.scala | 32 --- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 20 ++--- .../sql-tests/analyzer-results/percentiles.sql.out | 48 +++--- .../sql-tests/results/percentiles.sql.out | 48 +++--- 5 files changed, 99 insertions(+), 94 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 10a483396e6..3a11001ad9d 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -617,6 +617,11 @@ "Not found an encoder of the type to Spark SQL internal representation. Consider to change the input type to one of supported at '/sql-ref-datatypes.html'." ] }, + "EVENT_TIME_IS_NOT_ON_TIMESTAMP_TYPE" : { +"message" : [ + "The event time has the invalid type , but expected \"TIMESTAMP\"." +] + }, "FAILED_EXECUTE_UDF" : { "message" : [ "Failed to execute user defined function (: () => )." @@ -1371,6 +1376,11 @@ ], "sqlState" : "42903" }, + "INVALID_WINDOW_SPEC_FOR_AGGREGATION_FUNC" : { +"message" : [ + "Cannot specify ORDER BY or a window frame for ." +] + }, "INVALID_WRITE_DISTRIBUTION" : { "message" : [ "The requested write distribution is invalid." @@ -1393,6 +1403,11 @@ } } }, + "JOIN_CONDITION_IS_NOT_BOOLEAN_TYPE" : { +"message" : [ + "The join condition has the invalid type , expected \"BOOLEAN\"." +] + }, "LOCATION_ALREADY_EXISTS" : { "message" : [ "Cannot name the managed table as , as its associated location already exists. Please pick a different table name, or remove the existing location first." @@ -1785,6 +1800,11 @@ ], "sqlState" : "22023" }, + "SEED_EXPRESSION_IS_UNFOLDABLE" : { +"message" : [ + "The seed expression of the expression must be foldable." +] + }, "SORT_BY_WITHOUT_BUCKETING" : { "message" : [ "sortBy must be used together with bucketBy." @@ -5441,31 +5461,6 @@ "failed to evaluate expression : " ] }, - "_LEGACY_ERROR_TEMP_2411" : { -"message" : [ - "Cannot specify order by or frame for ''." -] - }, - "_LEGACY_ERROR_TEMP_2413" : { -"message" : [ - "Input argument to must be a constant." -] - }, - "_LEGACY_ERROR_TEMP_2414" : { -"message" : [ - "Event time must be defined on a window or a timestamp, but is of type ." -] - }, - "_LEGACY_ERROR_TEMP_2416" : { -"message" : [ - "join condition '' of type is not a boolean." -] - }, - "_LEGACY_ERROR_TEMP_2417" : { -"message" : [ - "join condition '' of type is not a boolean." 
-] - }, "_LEGACY_ERROR_TEMP_2418" : { "message" : [ "Input argument tolerance must be a constant." diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 43f12fabf70..cafabb22d10 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/m
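One of the renamed checks, sketched end to end (tables `t1` and `t2` are hypothetical): a join condition that is not BOOLEAN now fails under `JOIN_CONDITION_IS_NOT_BOOLEAN_TYPE` rather than `_LEGACY_ERROR_TEMP_2416`.

```
sql("CREATE TABLE t1 (id INT) USING parquet")
sql("CREATE TABLE t2 (id INT) USING parquet")
sql("SELECT * FROM t1 JOIN t2 ON t1.id + t2.id")
// => AnalysisException, errorClass = "JOIN_CONDITION_IS_NOT_BOOLEAN_TYPE":
//    the condition has the invalid type "INT" where "BOOLEAN" is expected
```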
[spark] branch master updated (7ce4dc64273 -> d052a454fda)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 7ce4dc64273 [SPARK-41775][PYTHON][FOLLOWUP] Use pyspark.cloudpickle instead of `cloudpickle` in torch distributor add d052a454fda [SPARK-43824][SPARK-43825] [SQL] Assign names to the error class _LEGACY_ERROR_TEMP_128[1-2] No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 20 ++-- .../spark/sql/errors/QueryCompilationErrors.scala| 14 +++--- .../apache/spark/sql/execution/command/views.scala | 3 +-- .../spark/sql/execution/SQLViewTestSuite.scala | 16 4 files changed, 26 insertions(+), 27 deletions(-)
[spark] branch master updated (24901bf187f -> bccfe71a32f)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 24901bf187f [SPARK-43808][SQL][TESTS] Use `checkError()` to check `Exception` in `SQLViewTestSuite` add bccfe71a32f [SPARK-43762][SPARK-43763][SPARK-43764][SPARK-43765][SPARK-43766][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_24[06-10] No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 40 - .../sql/catalyst/analysis/CheckAnalysis.scala | 26 +++--- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 9 +++-- .../sql/catalyst/analysis/AnalysisSuite.scala | 42 ++ .../analyzer-results/group-analytics.sql.out | 2 +- .../udf/udf-group-analytics.sql.out| 2 +- .../sql-tests/results/group-analytics.sql.out | 2 +- .../results/udf/udf-group-analytics.sql.out| 2 +- 8 files changed, 66 insertions(+), 59 deletions(-)
[spark] branch master updated (f718b025d87 -> 24901bf187f)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f718b025d87 [SPARK-43802][SQL] Fix codegen for unhex and unbase64 with failOnError=true add 24901bf187f [SPARK-43808][SQL][TESTS] Use `checkError()` to check `Exception` in `SQLViewTestSuite` No new revisions were added by this update. Summary of changes: .../spark/sql/execution/SQLViewTestSuite.scala | 145 ++--- 1 file changed, 95 insertions(+), 50 deletions(-)
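The migration pattern in that suite, sketched with placeholder names (the concrete error classes and parameters vary per test):

```
// Before: brittle assertion on fragments of the rendered message.
val e = intercept[AnalysisException] { sql("ALTER VIEW v AS SELECT 1") }
assert(e.getMessage.contains("some message fragment"))

// After: structured assertion via the checkError() helper from SparkFunSuite.
checkError(
  exception = intercept[AnalysisException] { sql("ALTER VIEW v AS SELECT 1") },
  errorClass = "SOME_ERROR_CLASS",          // placeholder, not a real class name
  parameters = Map("viewName" -> "`v`"))
```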
[spark] branch master updated: [SPARK-43802][SQL] Fix codegen for unhex and unbase64 with failOnError=true
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f718b025d87 [SPARK-43802][SQL] Fix codegen for unhex and unbase64 with failOnError=true f718b025d87 is described below commit f718b025d87ae3726210c60ff71cb34917b32f51 Author: Adam Binford AuthorDate: Fri May 26 20:37:14 2023 +0300 [SPARK-43802][SQL] Fix codegen for unhex and unbase64 with failOnError=true ### What changes were proposed in this pull request? Fixes an error with codegen for unhex and unbase64 expression when failOnError is enabled introduced in https://github.com/apache/spark/pull/37483. ### Why are the changes needed? Codegen fails and Spark falls back to interpreted evaluation: ``` Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 47, Column 1: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 47, Column 1: Unknown variable or type "BASE64" ``` in the code block: ``` /* 107 */ if (!org.apache.spark.sql.catalyst.expressions.UnBase64.isValidBase64(project_value_1)) { /* 108 */ throw QueryExecutionErrors.invalidInputInConversionError( /* 109 */ ((org.apache.spark.sql.types.BinaryType$) references[1] /* to */), /* 110 */ project_value_1, /* 111 */ BASE64, /* 112 */ "try_to_binary"); /* 113 */ } ``` ### Does this PR introduce _any_ user-facing change? Bug fix. ### How was this patch tested? Added to the existing tests so evaluate an expression with failOnError enabled to test that path of the codegen. Closes #41317 from Kimahriman/bug-to-binary-codegen. Authored-by: Adam Binford Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/mathExpressions.scala | 3 +- .../catalyst/expressions/stringExpressions.scala | 3 +- .../expressions/MathExpressionsSuite.scala | 3 ++ .../expressions/StringExpressionsSuite.scala | 4 +- .../sql/errors/QueryExecutionErrorsSuite.scala | 46 -- 5 files changed, 43 insertions(+), 16 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index dcc821a24ea..add59a38b72 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -1172,14 +1172,13 @@ case class Unhex(child: Expression, failOnError: Boolean = false) nullSafeCodeGen(ctx, ev, c => { val hex = Hex.getClass.getName.stripSuffix("$") val maybeFailOnErrorCode = if (failOnError) { -val format = UTF8String.fromString("BASE64"); val binaryType = ctx.addReferenceObj("to", BinaryType, BinaryType.getClass.getName) s""" |if (${ev.value} == null) { | throw QueryExecutionErrors.invalidInputInConversionError( |$binaryType, |$c, - |$format, + |UTF8String.fromString("HEX"), |"try_to_binary"); |} |""".stripMargin diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala index 347dff0f4c4..03596ac40b1 100755 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala @@ -2472,14 +2472,13 @@ case class 
UnBase64(child: Expression, failOnError: Boolean = false) nullSafeCodeGen(ctx, ev, child => { val maybeValidateInputCode = if (failOnError) { val unbase64 = UnBase64.getClass.getName.stripSuffix("$") -val format = UTF8String.fromString("BASE64"); val binaryType = ctx.addReferenceObj("to", BinaryType, BinaryType.getClass.getName) s""" |if (!$unbase64.isValidBase64($child)) { | throw QueryExecutionErrors.invalidInputInConversionError( |$binaryType, |$child, - |$format, + |UTF8String.fromString("BASE64"), |"try_to_binary"); |} """.stripMargin diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ex
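The root cause deserves spelling out: `$format` interpolated a Scala-side `UTF8String` value into the generated Java source, and `UTF8String.toString` renders the payload, so the emitted source contained the bare token `BASE64` where a Java expression was needed. A condensed before/after sketch of the template (here `binaryType` and `child` stand for the codegen variables visible inside `nullSafeCodeGen`):

```
// Before (broken): the value's toString leaks into the generated Java source.
val format = UTF8String.fromString("BASE64")
s"""throw QueryExecutionErrors.invalidInputInConversionError(
   |  $binaryType, $child, $format, "try_to_binary");""".stripMargin
// emits: ...invalidInputInConversionError(references[1], value, BASE64, "try_to_binary");

// After (fixed): emit a Java expression that builds the UTF8String at runtime.
s"""throw QueryExecutionErrors.invalidInputInConversionError(
   |  $binaryType, $child, UTF8String.fromString("BASE64"), "try_to_binary");""".stripMargin
```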
[spark] branch master updated: [SPARK-43794][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1335
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5590c9a4654 [SPARK-43794][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1335 5590c9a4654 is described below commit 5590c9a4654607488379703581e341d4062f9666 Author: panbingkun AuthorDate: Fri May 26 16:37:01 2023 +0300 [SPARK-43794][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1335 ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_1335. ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Update existed UT. Pass GA. Closes #41314 from panbingkun/SPARK-43794. Lead-authored-by: panbingkun Co-authored-by: panbingkun <84731...@qq.com> Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 22 ++ .../sql/catalyst/analysis/TimeTravelSpec.scala | 12 .../spark/sql/errors/QueryCompilationErrors.scala | 6 +++--- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 17 + 4 files changed, 42 insertions(+), 15 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index bbf0368ac59..738e037c39d 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -1326,6 +1326,28 @@ "Cannot create the persistent object of the type because it references to the temporary object of the type . Please make the temporary object persistent, or make the persistent object temporary." ] }, + "INVALID_TIME_TRAVEL_TIMESTAMP_EXPR" : { +"message" : [ + "The time travel timestamp expression is invalid." +], +"subClass" : { + "INPUT" : { +"message" : [ + "Cannot be casted to the \"TIMESTAMP\" type." +] + }, + "NON_DETERMINISTIC" : { +"message" : [ + "Must be deterministic." +] + }, + "UNEVALUABLE" : { +"message" : [ + "Must be evaluable." +] + } +} + }, "INVALID_TYPED_LITERAL" : { "message" : [ "The value of the typed literal is invalid: ." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala index e33ddbb3213..26856d9a5e0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala @@ -38,21 +38,25 @@ object TimeTravelSpec { val ts = timestamp.get assert(ts.resolved && ts.references.isEmpty && !SubqueryExpression.hasSubquery(ts)) if (!Cast.canAnsiCast(ts.dataType, TimestampType)) { -throw QueryCompilationErrors.invalidTimestampExprForTimeTravel(ts) +throw QueryCompilationErrors.invalidTimestampExprForTimeTravel( + "INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.INPUT", ts) } val tsToEval = ts.transform { case r: RuntimeReplaceable => r.replacement case _: Unevaluable => - throw QueryCompilationErrors.invalidTimestampExprForTimeTravel(ts) + throw QueryCompilationErrors.invalidTimestampExprForTimeTravel( +"INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.UNEVALUABLE", ts) case e if !e.deterministic => - throw QueryCompilationErrors.invalidTimestampExprForTimeTravel(ts) + throw QueryCompilationErrors.invalidTimestampExprForTimeTravel( +"INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.NON_DETERMINISTIC", ts) } val tz = Some(conf.sessionLocalTimeZone) // Set `ansiEnabled` to false, so that it can return null for invalid input and we can provide // better error message. val value = Cast(tsToEval, TimestampType, tz, ansiEnabled = false).eval() if (value == null) { -throw QueryCompilationErrors.invalidTimestampExprForTimeTravel(ts) +throw QueryCompilationErrors.invalidTimestampExprForTimeTravel( + "INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.INPUT", ts) } Some(AsOfTimestamp(value.asInstanceOf[Long])) } else if (version.nonEmpty) { diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sq
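Hypothetical expressions that map onto the new subclasses (assuming a table `t` whose source supports time travel):

```
sql("SELECT * FROM t TIMESTAMP AS OF map()")   // INPUT: a MAP cannot be cast to TIMESTAMP
sql("SELECT * FROM t TIMESTAMP AS OF rand()")  // NON_DETERMINISTIC: rand() is not deterministic
// UNEVALUABLE covers expressions the analyzer cannot evaluate eagerly.
```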
[spark] branch master updated: [SPARK-43807][SQL] Migrate _LEGACY_ERROR_TEMP_1269 to PARTITION_SCHEMA_IS_EMPTY
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 45e5f2e375b [SPARK-43807][SQL] Migrate _LEGACY_ERROR_TEMP_1269 to PARTITION_SCHEMA_IS_EMPTY 45e5f2e375b is described below commit 45e5f2e375bec915e1683e6d2a222488ba831c91 Author: Jiaan Geng AuthorDate: Fri May 26 10:58:51 2023 +0300 [SPARK-43807][SQL] Migrate _LEGACY_ERROR_TEMP_1269 to PARTITION_SCHEMA_IS_EMPTY ### What changes were proposed in this pull request? Currently, DS V1 uses `_LEGACY_ERROR_TEMP_1269` and DS V2 uses `INVALID_PARTITION_OPERATION.PARTITION_SCHEMA_IS_EMPTY` when a partition operation is applied to a non-partitioned table. This PR migrates `_LEGACY_ERROR_TEMP_1269` to `PARTITION_SCHEMA_IS_EMPTY`. ### Why are the changes needed? Migrate `_LEGACY_ERROR_TEMP_1269` to `PARTITION_SCHEMA_IS_EMPTY`. ### Does this PR introduce _any_ user-facing change? 'Yes'. The error message changes slightly. ### How was this patch tested? Test case updated. Closes #41325 from beliefer/SPARK-43807. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 5 - .../scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala | 4 ++-- .../apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala | 4 ++-- 3 files changed, 4 insertions(+), 9 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 0246d4f378e..bbf0368ac59 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -3618,11 +3618,6 @@ "Failed to truncate table when removing data of the path: ." ] }, - "_LEGACY_ERROR_TEMP_1269" : { -"message" : [ - "SHOW PARTITIONS is not allowed on a table that is not partitioned: ." -] - }, "_LEGACY_ERROR_TEMP_1270" : { "message" : [ "SHOW CREATE TABLE is not supported on a temporary view: ."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 879bf620188..9921f50014d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -2630,8 +2630,8 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase { def showPartitionNotAllowedOnTableNotPartitionedError(tableIdentWithDB: String): Throwable = { new AnalysisException( - errorClass = "_LEGACY_ERROR_TEMP_1269", - messageParameters = Map("tableIdentWithDB" -> tableIdentWithDB)) + errorClass = "INVALID_PARTITION_OPERATION.PARTITION_SCHEMA_IS_EMPTY", + messageParameters = Map("name" -> toSQLId(tableIdentWithDB))) } def showCreateTableNotSupportedOnTempView(table: String): Throwable = { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala index e67ed807a87..c423bfb9f24 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala @@ -130,8 +130,8 @@ class ShowPartitionsSuite extends ShowPartitionsSuiteBase with CommandSuiteBase exception = intercept[AnalysisException] { sql(sqlText) }, -errorClass = "_LEGACY_ERROR_TEMP_1269", -parameters = Map("tableIdentWithDB" -> tableName)) +errorClass = "INVALID_PARTITION_OPERATION.PARTITION_SCHEMA_IS_EMPTY", +parameters = Map("name" -> tableName)) } }
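End to end, the unified V1 behavior now reads as follows (sketch; the table name is hypothetical, and the identifier in the message is rendered via `toSQLId`):

```
sql("CREATE TABLE t (a INT) USING parquet")  // note: no PARTITIONED BY clause
sql("SHOW PARTITIONS t")
// => AnalysisException,
//    errorClass = "INVALID_PARTITION_OPERATION.PARTITION_SCHEMA_IS_EMPTY",
//    message: "Table <quoted identifier of t> is not partitioned."
```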
[spark] branch master updated: [SPARK-43576][CORE] Remove unused declarations from Core module
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 82bf3fcc81a [SPARK-43576][CORE] Remove unused declarations from Core module 82bf3fcc81a is described below commit 82bf3fcc81ae0be8ce945242ae966cee4fae4104 Author: panbingkun AuthorDate: Fri May 26 10:19:46 2023 +0300 [SPARK-43576][CORE] Remove unused declarations from Core module ### What changes were proposed in this pull request? The pr aims to remove unused declarations from `Core` module ### Why are the changes needed? Make code clean. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #41218 from panbingkun/remove_unused_declaration_core. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../src/main/resources/org/apache/spark/ui/static/executorspage.js | 1 - .../scala/org/apache/spark/deploy/history/ApplicationCache.scala | 1 - core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala | 3 --- core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 5 - core/src/main/scala/org/apache/spark/ui/ToolTips.scala | 7 --- 5 files changed, 17 deletions(-) diff --git a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js index 8c2dc13c35b..92d75c18e49 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js @@ -126,7 +126,6 @@ function totalDurationAlpha(totalGCTime, totalDuration) { (Math.min(totalGCTime / totalDuration + 0.5, 1)) : 1; } -// When GCTimePercent is edited change ToolTips.TASK_TIME to match var GCTimePercent = 0.1; function totalDurationStyle(totalGCTime, totalDuration) { diff --git a/core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala b/core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala index 829631a0454..909f5ea937c 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala @@ -394,7 +394,6 @@ private[history] class ApplicationCacheCheckFilter( val httpRequest = request.asInstanceOf[HttpServletRequest] val httpResponse = response.asInstanceOf[HttpServletResponse] val requestURI = httpRequest.getRequestURI -val operation = httpRequest.getMethod // if the request is for an attempt, check to see if it is in need of delete/refresh // and have the cache update the UI if so diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala index 0d905b46953..cad107256c5 100644 --- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala @@ -404,9 +404,6 @@ private[spark] object HadoopRDD extends Logging { */ val CONFIGURATION_INSTANTIATION_LOCK = new Object() - /** Update the input bytes read metric each time this number of records has been read */ - val RECORDS_BETWEEN_BYTES_READ_METRIC_UPDATES = 256 - /** * The three methods below are helpers for accessing the local map, a property of the SparkEnv of * the local process. 
diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala index d8119fb9498..9582bdbf526 100644 --- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala +++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala @@ -590,11 +590,6 @@ private class ProxyRedirectHandler(_proxyUri: String) extends HandlerWrapper { override def sendRedirect(location: String): Unit = { val newTarget = if (location != null) { val target = new URI(location) -val path = if (target.getPath().startsWith("/")) { - target.getPath() -} else { - req.getRequestURI().stripSuffix("/") + "/" + target.getPath() -} // The target path should already be encoded, so don't re-encode it, just the // proxy address part. val proxyBase = UIUtils.uiRoot(req) diff --git a/core/src/main/scala/org/apache/spark/ui/ToolTips.scala b/core/src/main/scala/org/apache/spark/ui/ToolTips.scala index 587046676ff..b80fba396b3 100644 --- a/core/src/main/scala/org/apache/spark/ui/ToolTips.scala +++ b/core/src/main/scala/org/apache/spark/ui/ToolTips.scala @@ -35,10 +35,6 @@ private[spark] object ToolTips { val OUTPUT = "Bytes written to Hadoop." - val STORAGE_MEMORY = -"Memory used / total available memory for storage of data " + - "like RD
[spark] branch master updated: [SPARK-43791][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1336
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 69803fb0244 [SPARK-43791][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1336 69803fb0244 is described below commit 69803fb0244c9fc110653092bcfab7c221448bce Author: panbingkun AuthorDate: Fri May 26 09:29:21 2023 +0300 [SPARK-43791][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1336 ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_1336. ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Update existed UT. Pass GA. Closes #41309 from panbingkun/LEGACY_ERROR_TEMP_1336. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 10 .../spark/sql/catalyst/analysis/Analyzer.scala | 3 +-- .../sql/catalyst/analysis/CTESubstitution.scala| 3 ++- .../spark/sql/errors/QueryCompilationErrors.scala | 6 ++--- .../spark/sql/execution/datasources/rules.scala| 3 ++- .../datasources/v2/V2SessionCatalog.scala | 4 +++- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 8 +++ .../spark/sql/execution/SQLViewTestSuite.scala | 27 -- 8 files changed, 40 insertions(+), 24 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 7683e7b8650..0246d4f378e 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -2139,6 +2139,11 @@ "Table does not support . Please check the current catalog and namespace to make sure the qualified table name is expected, and also check the catalog implementation which is configured by \"spark.sql.catalog\"." ] }, + "TIME_TRAVEL" : { +"message" : [ + "Time travel on the relation: ." +] + }, "TOO_MANY_TYPE_ARGUMENTS_FOR_UDF_CLASS" : { "message" : [ "UDF class with type arguments." @@ -3916,11 +3921,6 @@ " is not a valid timestamp expression for time travel." ] }, - "_LEGACY_ERROR_TEMP_1336" : { -"message" : [ - "Cannot time travel ." -] - }, "_LEGACY_ERROR_TEMP_1337" : { "message" : [ "Table does not support time travel." 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 604fc3f84c8..dc7134a9605 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -1169,8 +1169,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor throw QueryCompilationErrors.readNonStreamingTempViewError(identifier.quoted) } if (isTimeTravel) { - val target = if (tempViewPlan.isStreaming) "streams" else "views" - throw QueryCompilationErrors.timeTravelUnsupportedError(target) + throw QueryCompilationErrors.timeTravelUnsupportedError(toSQLId(identifier)) } tempViewPlan } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala index 77c687843c3..4e3234f9c0d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala @@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.expressions.SubqueryExpression import org.apache.spark.sql.catalyst.plans.logical.{Command, CTERelationDef, CTERelationRef, InsertIntoDir, LogicalPlan, ParsedStatement, SubqueryAlias, UnresolvedWith, WithCTE} import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.catalyst.trees.TreePattern._ +import org.apache.spark.sql.catalyst.util.TypeUtils._ import org.apache.spark.sql.errors.QueryCompilationErrors import org.apache.spark.sql.internal.SQLConf.{LEGACY_CTE_PRECEDENCE_POLICY, LegacyBehaviorPolicy} @@ -253,7 +254,7 @@ object CTESubstitution extends Rule[LogicalPlan] { _.containsAnyPattern(RELATION_TIME_TRAVEL, UNRESOLVED_RELATION, PLAN_EXPRESS
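A sketch of one of the now-covered paths (the view name is hypothetical): time travel over a temporary view reports the view's identifier under the relation-oriented error class instead of the old hard-coded "views"/"streams" target.

```
spark.range(1).createOrReplaceTempView("v")
sql("SELECT * FROM v VERSION AS OF 1")
// => AnalysisException, errorClass = "UNSUPPORTED_FEATURE.TIME_TRAVEL",
//    parameters = Map("relationId" -> "`v`")
```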
[spark] branch master updated: [SPARK-43749][SPARK-43750][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[4-5]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3a6d2153b93 [SPARK-43749][SPARK-43750][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[4-5] 3a6d2153b93 is described below commit 3a6d2153b93c759b68e5827905d1867ba93ec9cf Author: Jiaan Geng AuthorDate: Thu May 25 20:14:00 2023 +0300 [SPARK-43749][SPARK-43750][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[4-5] ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_240[4-5]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? N/A Closes #41279 from beliefer/INVALID_PARTITION_OPERATION. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 29 +-- .../sql/catalyst/analysis/CheckAnalysis.scala | 8 ++--- .../command/ShowPartitionsSuiteBase.scala | 12 --- .../execution/command/v1/ShowPartitionsSuite.scala | 18 ++ .../command/v2/AlterTableAddPartitionSuite.scala | 20 --- .../command/v2/AlterTableDropPartitionSuite.scala | 19 +++--- .../execution/command/v2/ShowPartitionsSuite.scala | 41 +++--- .../execution/command/v2/TruncateTableSuite.scala | 20 --- 8 files changed, 122 insertions(+), 45 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 1ccbdfdc6eb..7683e7b8650 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -1156,6 +1156,23 @@ }, "sqlState" : "22023" }, + "INVALID_PARTITION_OPERATION" : { +"message" : [ + "The partition command is invalid." +], +"subClass" : { + "PARTITION_MANAGEMENT_IS_UNSUPPORTED" : { +"message" : [ + "Table does not support partition management." +] + }, + "PARTITION_SCHEMA_IS_EMPTY" : { +"message" : [ + "Table is not partitioned." +] + } +} + }, "INVALID_PROPERTY_KEY" : { "message" : [ " is an invalid property key, please use quotes, e.g. SET =." @@ -5374,16 +5391,6 @@ "failed to evaluate expression : " ] }, - "_LEGACY_ERROR_TEMP_2404" : { -"message" : [ - "Table is not partitioned." -] - }, - "_LEGACY_ERROR_TEMP_2405" : { -"message" : [ - "Table does not support partition management." -] - }, "_LEGACY_ERROR_TEMP_2406" : { "message" : [ "invalid cast from to ." 
@@ -5772,4 +5779,4 @@ "Failed to get block , which is not a shuffle block" ] } -} \ No newline at end of file +} diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 407a9d363f4..fac3f491200 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -211,13 +211,13 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB case t: SupportsPartitionManagement => if (t.partitionSchema.isEmpty) { r.failAnalysis( - errorClass = "_LEGACY_ERROR_TEMP_2404", - messageParameters = Map("name" -> r.name)) + errorClass = "INVALID_PARTITION_OPERATION.PARTITION_SCHEMA_IS_EMPTY", + messageParameters = Map("name" -> toSQLId(r.name))) } case _ => r.failAnalysis( -errorClass = "_LEGACY_ERROR_TEMP_2405", -messageParameters = Map("name" -> r.name)) +errorClass = "INVALID_PARTITION_OPERATION.PARTITION_MANAGEMENT_IS_UNSUPPORTED", +messageParameters = Map("name" -> toSQLId(r.name))) } case _ => } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala index 27d2eb98543..462b967a759 100644 --- a/sql/
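The sibling subclass applies when a V2 table does not implement `SupportsPartitionManagement` at all; a sketch (the catalog `testcat` and the table are assumptions):

```
// With an in-memory test catalog whose tables lack partition management:
sql("ALTER TABLE testcat.ns.tbl ADD PARTITION (id = 1)")
// => AnalysisException,
//    errorClass = "INVALID_PARTITION_OPERATION.PARTITION_MANAGEMENT_IS_UNSUPPORTED"
```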
[spark] branch master updated: [SPARK-43786][SQL][TESTS] Add a test for nullability about 'levenshtein' function
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 295f540a92f [SPARK-43786][SQL][TESTS] Add a test for nullability about 'levenshtein' function 295f540a92f is described below commit 295f540a92f9a4bde1da1244901b844223777a78 Author: panbingkun AuthorDate: Thu May 25 15:34:25 2023 +0300 [SPARK-43786][SQL][TESTS] Add a test for nullability about 'levenshtein' function ### What changes were proposed in this pull request? The pr aims to add a test for nullability about 'levenshtein' function. ### Why are the changes needed? Make testing more robust about 'levenshtein' function. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manual testing Closes #41303 from panbingkun/SPARK-43786. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala | 6 ++ 1 file changed, 6 insertions(+) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala index e887c570944..f612c5903dc 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala @@ -129,12 +129,18 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSession { val df = Seq(("kitten", "sitting"), ("frog", "fog")).toDF("l", "r") checkAnswer(df.select(levenshtein($"l", $"r")), Seq(Row(3), Row(1))) checkAnswer(df.selectExpr("levenshtein(l, r)"), Seq(Row(3), Row(1))) +checkAnswer(df.select(levenshtein($"l", lit(null))), Seq(Row(null), Row(null))) +checkAnswer(df.selectExpr("levenshtein(l, null)"), Seq(Row(null), Row(null))) checkAnswer(df.select(levenshtein($"l", $"r", 3)), Seq(Row(3), Row(1))) checkAnswer(df.selectExpr("levenshtein(l, r, 3)"), Seq(Row(3), Row(1))) +checkAnswer(df.select(levenshtein(lit(null), $"r", 3)), Seq(Row(null), Row(null))) +checkAnswer(df.selectExpr("levenshtein(null, r, 3)"), Seq(Row(null), Row(null))) checkAnswer(df.select(levenshtein($"l", $"r", 0)), Seq(Row(-1), Row(-1))) checkAnswer(df.selectExpr("levenshtein(l, r, 0)"), Seq(Row(-1), Row(-1))) +checkAnswer(df.select(levenshtein($"l", lit(null), 0)), Seq(Row(null), Row(null))) +checkAnswer(df.selectExpr("levenshtein(l, null, 0)"), Seq(Row(null), Row(null))) } test("string regex_replace / regex_extract") {
[spark] branch master updated (46949e692e8 -> 0db1f002c09)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 46949e692e8 [SPARK-43545][SQL][PYTHON] Support nested timestamp type add 0db1f002c09 [SPARK-43549][SQL] Convert _LEGACY_ERROR_TEMP_0036 to INVALID_SQL_SYNTAX.ANALYZE_TABLE_UNEXPECTED_NOSCAN No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 5 + .../org/apache/spark/sql/errors/QueryParsingErrors.scala | 4 ++-- .../apache/spark/sql/catalyst/parser/DDLParserSuite.scala| 12 ++-- 3 files changed, 13 insertions(+), 8 deletions(-)
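For illustration, a statement that hits the renamed class (the table name is hypothetical): only `NOSCAN` is accepted after `COMPUTE STATISTICS`.

```
sql("ANALYZE TABLE t COMPUTE STATISTICS xxxx")
// => ParseException, errorClass = "INVALID_SQL_SYNTAX.ANALYZE_TABLE_UNEXPECTED_NOSCAN"
```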
[spark] branch master updated: [SPARK-38464][CORE] Use error classes in org.apache.spark.io
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 76f82bd8c54 [SPARK-38464][CORE] Use error classes in org.apache.spark.io 76f82bd8c54 is described below commit 76f82bd8c54352a0b38c3e1d8de5b24627446b9c Author: Bo Zhang AuthorDate: Wed May 24 14:21:42 2023 +0300 [SPARK-38464][CORE] Use error classes in org.apache.spark.io ### What changes were proposed in this pull request? This PR aims to change exceptions created in package org.apache.spark.io to use error class. This PR also adds `toConf` and `toConfVal` in `SparkCoreErrors`. ### Why are the changes needed? This is to move exceptions created in package org.apache.spark.io to error class. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Updated existing tests. Closes #41277 from bozhang2820/spark-38464. Authored-by: Bo Zhang Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 10 ++ .../scala/org/apache/spark/errors/SparkCoreErrors.scala | 12 .../scala/org/apache/spark/io/CompressionCodec.scala | 15 +++ .../org/apache/spark/io/CompressionCodecSuite.scala | 16 4 files changed, 45 insertions(+), 8 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index fcb9ec249db..1b75f89cc10 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -187,6 +187,16 @@ ], "sqlState" : "22003" }, + "CODEC_NOT_AVAILABLE" : { +"message" : [ + "The codec is not available. Consider to set the config to ." +] + }, + "CODEC_SHORT_NAME_NOT_FOUND" : { +"message" : [ + "Cannot find a short name for the codec ." +] + }, "COLUMN_ALIASES_IS_NOT_ALLOWED" : { "message" : [ "Columns aliases are not allowed in ." 
diff --git a/core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala b/core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala index 8abb2564328..f8e7f2db259 100644 --- a/core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala +++ b/core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala @@ -466,4 +466,16 @@ private[spark] object SparkCoreErrors { "requestedBytes" -> requestedBytes.toString, "receivedBytes" -> receivedBytes.toString).asJava) } + + private def quoteByDefault(elem: String): String = { +"\"" + elem + "\"" + } + + def toConf(conf: String): String = { +quoteByDefault(conf) + } + + def toConfVal(conf: String): String = { +quoteByDefault(conf) + } } diff --git a/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala b/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala index eb3dc938d4d..0bb392deb39 100644 --- a/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala +++ b/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala @@ -26,8 +26,9 @@ import net.jpountz.lz4.{LZ4BlockInputStream, LZ4BlockOutputStream, LZ4Factory} import net.jpountz.xxhash.XXHashFactory import org.xerial.snappy.{Snappy, SnappyInputStream, SnappyOutputStream} -import org.apache.spark.SparkConf +import org.apache.spark.{SparkConf, SparkIllegalArgumentException} import org.apache.spark.annotation.DeveloperApi +import org.apache.spark.errors.SparkCoreErrors.{toConf, toConfVal} import org.apache.spark.internal.config._ import org.apache.spark.util.Utils @@ -88,8 +89,12 @@ private[spark] object CompressionCodec { } catch { case _: ClassNotFoundException | _: IllegalArgumentException => None } -codec.getOrElse(throw new IllegalArgumentException(s"Codec [$codecName] is not available. " + - s"Consider setting $configKey=$FALLBACK_COMPRESSION_CODEC")) +codec.getOrElse(throw new SparkIllegalArgumentException( + errorClass = "CODEC_NOT_AVAILABLE", + messageParameters = Map( +"codecName" -> codecName, +"configKey" -> toConf(configKey), +"configVal" -> toConfVal(FALLBACK_COMPRESSION_CODEC } /** @@ -102,7 +107,9 @@ private[spark] object CompressionCodec { } else { shortCompressionCodecNames .collectFirst { case (k, v) if v == codecName => k } -.getOrElse { throw new IllegalArgumentException(s"No short name for codec $codecName.") } +.getOrElse { throw new SparkIllegalArgumentException( +
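A sketch of the new failure mode (note that the `CompressionCodec` object is `private[spark]`, so this code runs inside Spark itself; the fallback codec named in the rendered message is an assumption):

```
import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

val conf = new SparkConf().set("spark.io.compression.codec", "nonexistent")
CompressionCodec.createCodec(conf)
// => SparkIllegalArgumentException, errorClass = "CODEC_NOT_AVAILABLE", roughly:
//    The codec nonexistent is not available. Consider to set the config
//    "spark.io.compression.codec" to "snappy".
```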
[spark] branch master updated (5f325ec917c -> 85f2cb03c62)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5f325ec917c [SPARK-43747][PYTHON][CONNECT] Implement the pyfile support in SparkSession.addArtifacts add 85f2cb03c62 [SPARK-43493][SQL] Add a max distance argument to the levenshtein() function No new revisions were added by this update. Summary of changes: .../org/apache/spark/unsafe/types/UTF8String.java | 93 +- .../CheckConnectJvmClientCompatibility.scala | 1 + .../explain-results/function_levenshtein.explain | 2 +- .../catalyst/expressions/stringExpressions.scala | 136 +++-- .../expressions/StringExpressionsSuite.scala | 89 ++ .../scala/org/apache/spark/sql/functions.scala | 13 +- .../apache/spark/sql/StringFunctionsSuite.scala| 6 + 7 files changed, 327 insertions(+), 13 deletions(-)
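Usage sketch of the new threshold overload, mirroring the values exercised in the `StringFunctionsSuite` tests above (assumes an active Spark session with implicits imported): distances within the threshold are returned exactly, and -1 is returned once the distance exceeds it, which lets the computation bail out early.

```
import org.apache.spark.sql.functions.levenshtein
import spark.implicits._

val df = Seq(("kitten", "sitting")).toDF("l", "r")
df.select(levenshtein($"l", $"r")).head     // Row(3): the exact distance
df.select(levenshtein($"l", $"r", 3)).head  // Row(3): 3 <= threshold, still exact
df.select(levenshtein($"l", $"r", 0)).head  // Row(-1): distance exceeds threshold
```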
[spark] branch master updated: [SPARK-43649][SPARK-43650][SPARK-43651][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[1-3]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 32d42bbe98d [SPARK-43649][SPARK-43650][SPARK-43651][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[1-3] 32d42bbe98d is described below commit 32d42bbe98da9a7e8c38b9c3187c75dbbfbb Author: Jiaan Geng AuthorDate: Tue May 23 12:41:06 2023 +0300 [SPARK-43649][SPARK-43650][SPARK-43651][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_240[1-3] ### What changes were proposed in this pull request? The pr aims to assign a name to the error class _LEGACY_ERROR_TEMP_240[1-3]. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Exists test cases. Closes #41252 from beliefer/offset-limit-error-improve. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 49 -- .../sql/catalyst/analysis/CheckAnalysis.scala | 18 +++--- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 74 ++ .../sql-tests/analyzer-results/limit.sql.out | 24 --- .../analyzer-results/postgreSQL/limit.sql.out | 8 +-- .../test/resources/sql-tests/results/limit.sql.out | 24 --- .../sql-tests/results/postgreSQL/limit.sql.out | 8 +-- 7 files changed, 136 insertions(+), 69 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index af0471199b7..5d19d180053 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -1052,6 +1052,33 @@ ], "sqlState" : "42613" }, + "INVALID_LIMIT_LIKE_EXPRESSION" : { +"message" : [ + "The limit like expression is invalid." +], +"subClass" : { + "DATA_TYPE" : { +"message" : [ + "The expression must be integer type, but got ." +] + }, + "IS_NEGATIVE" : { +"message" : [ + "The expression must be equal to or greater than 0, but got ." +] + }, + "IS_NULL" : { +"message" : [ + "The evaluated expression must not be null." +] + }, + "IS_UNFOLDABLE" : { +"message" : [ + "The expression must evaluate to a constant value." +] + } +} + }, "INVALID_OPTIONS" : { "message" : [ "Invalid options:" @@ -1230,11 +1257,6 @@ } } }, - "LIMIT_LIKE_EXPRESSION_IS_UNFOLDABLE" : { -"message" : [ - "The expression must evaluate to a constant value, but got ." -] - }, "LOCATION_ALREADY_EXISTS" : { "message" : [ "Cannot name the managed table as , as its associated location already exists. Please pick a different table name, or remove the existing location first." @@ -5260,21 +5282,6 @@ "failed to evaluate expression : " ] }, - "_LEGACY_ERROR_TEMP_2401" : { -"message" : [ - "The expression must be integer type, but got ." -] - }, - "_LEGACY_ERROR_TEMP_2402" : { -"message" : [ - "The evaluated expression must not be null, but got ." -] - }, - "_LEGACY_ERROR_TEMP_2403" : { -"message" : [ - "The expression must be equal to or greater than 0, but got ." -] - }, "_LEGACY_ERROR_TEMP_2404" : { "message" : [ "Table is not partitioned." 
@@ -5673,4 +5680,4 @@ "Failed to get block , which is not a shuffle block" ] } -} +} \ No newline at end of file diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 3240f9bee56..407a9d363f4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -85,27 +85,29 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB private def checkLimitLikeClause(name: String, limitExpr: Expression): Unit = { limitExpr match { case e if !e.foldable => limitExpr.failAnalysis( -errorClass = "LIMIT_LIKE_EXPRESSION_IS_UNFOLDABLE", +errorClass = "INVALI
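Hypothetical queries mapping onto the four new subclasses (the same checks guard LIMIT, OFFSET and TAIL, hence "limit like"; foldability is checked first, per the `CheckAnalysis` diff above):

```
sql("SELECT * FROM t LIMIT true")               // DATA_TYPE: "BOOLEAN" is not an integer type
sql("SELECT * FROM t LIMIT -1")                 // IS_NEGATIVE
sql("SELECT * FROM t LIMIT CAST(NULL AS INT)")  // IS_NULL
sql("SELECT * FROM t LIMIT rand()")             // IS_UNFOLDABLE: not a constant
```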
[spark] branch master updated: [SPARK-43714][SQL][TESTS] When formatting `error-classes.json` file with `SparkThrowableSuite` , the last line of the file should be empty line
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c97a4b55e1d [SPARK-43714][SQL][TESTS] When formatting `error-classes.json` file with `SparkThrowableSuite` , the last line of the file should be empty line c97a4b55e1d is described below commit c97a4b55e1d2f29e576463dbc822f53e9f86a251 Author: panbingkun AuthorDate: Tue May 23 10:36:11 2023 +0300 [SPARK-43714][SQL][TESTS] When formatting `error-classes.json` file with `SparkThrowableSuite` , the last line of the file should be empty line ### What changes were proposed in this pull request? The pr aims to generate a blank line when formatting `error-classes.json` file using `SparkThrowableSuite`. ### Why are the changes needed? - When I format `error-classes.json` file using `SparkThrowableSuite`, I found the last blank line of the file will be erased, which does not comply with universal underlying code specifications, similar: python: https://www.flake8rules.com/rules/W391.html - Promote developer experience. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual testing. Closes #41256 from panbingkun/SPARK-43714. Authored-by: panbingkun Signed-off-by: Max Gekk --- core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index f5b5ad2ab10..e9554da082a 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -21,6 +21,8 @@ import java.io.File import java.nio.charset.StandardCharsets import java.nio.file.Files +import scala.util.Properties.lineSeparator + import com.fasterxml.jackson.annotation.JsonInclude.Include import com.fasterxml.jackson.core.JsonParser.Feature.STRICT_DUPLICATE_DETECTION import com.fasterxml.jackson.core.`type`.TypeReference @@ -92,7 +94,10 @@ class SparkThrowableSuite extends SparkFunSuite { val errorClassesFile = errorJsonFilePath.toFile logInfo(s"Regenerating error class file $errorClassesFile") Files.delete(errorClassesFile.toPath) -FileUtils.writeStringToFile(errorClassesFile, rewrittenString, StandardCharsets.UTF_8) +FileUtils.writeStringToFile( + errorClassesFile, + rewrittenString + lineSeparator, + StandardCharsets.UTF_8) } } else { assert(rewrittenString.trim == errorClassFileContents.trim)
[spark] branch master updated: [SPARK-43591][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0013
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0900419de8c [SPARK-43591][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0013
0900419de8c is described below

commit 0900419de8ca5d98b9921ec9ad2a8783e995f09c
Author: panbingkun
AuthorDate: Mon May 22 23:49:50 2023 +0300

[SPARK-43591][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_0013

### What changes were proposed in this pull request?
This PR assigns a name, `NOT_ALLOWED_IN_FROM`, to the error class `_LEGACY_ERROR_TEMP_0013`.

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #41236 from panbingkun/SPARK-43591.

Lead-authored-by: panbingkun
Co-authored-by: panbingkun <84731...@qq.com>
Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json  | 27 --
 .../spark/sql/errors/QueryParsingErrors.scala     |  6 +--
 .../sql/catalyst/parser/PlanParserSuite.scala     | 62 --
 3 files changed, 82 insertions(+), 13 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index fbb94c59e0e..af0471199b7 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -1311,6 +1311,28 @@
     ],
     "sqlState" : "42000"
   },
+  "NOT_ALLOWED_IN_FROM" : {
+    "message" : [
+      "Not allowed in the FROM clause:"
+    ],
+    "subClass" : {
+      "LATERAL_WITH_PIVOT" : {
+        "message" : [
+          "LATERAL together with PIVOT."
+        ]
+      },
+      "LATERAL_WITH_UNPIVOT" : {
+        "message" : [
+          "LATERAL together with UNPIVOT."
+        ]
+      },
+      "UNPIVOT_WITH_PIVOT" : {
+        "message" : [
+          "UNPIVOT together with PIVOT."
+        ]
+      }
+    }
+  },
   "NOT_A_PARTITIONED_TABLE" : {
     "message" : [
       "Operation is not allowed for because it is not a partitioned table."
     ]
@@ -2209,11 +2231,6 @@
       "DISTRIBUTE BY is not supported."
     ]
   },
-  "_LEGACY_ERROR_TEMP_0013" : {
-    "message" : [
-      "LATERAL cannot be used together with PIVOT in FROM clause."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_0014" : {
     "message" : [
       "TABLESAMPLE does not accept empty inputs."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index 4b6c3645916..28abaeb70ec 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -102,15 +102,15 @@ private[sql] object QueryParsingErrors extends QueryErrorsBase {
   }
 
   def unpivotWithPivotInFromClauseNotAllowedError(ctx: ParserRuleContext): Throwable = {
-    new ParseException("UNPIVOT cannot be used together with PIVOT in FROM clause", ctx)
+    new ParseException(errorClass = "NOT_ALLOWED_IN_FROM.UNPIVOT_WITH_PIVOT", ctx)
   }
 
   def lateralWithPivotInFromClauseNotAllowedError(ctx: ParserRuleContext): Throwable = {
-    new ParseException(errorClass = "_LEGACY_ERROR_TEMP_0013", ctx)
+    new ParseException(errorClass = "NOT_ALLOWED_IN_FROM.LATERAL_WITH_PIVOT", ctx)
   }
 
   def lateralWithUnpivotInFromClauseNotAllowedError(ctx: ParserRuleContext): Throwable = {
-    new ParseException("LATERAL cannot be used together with UNPIVOT in FROM clause", ctx)
+    new ParseException(errorClass = "NOT_ALLOWED_IN_FROM.LATERAL_WITH_UNPIVOT", ctx)
   }
 
   def lateralJoinWithUsingJoinUnsupportedError(ctx: ParserRuleContext): Throwable = {
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
index 76be620f7bc..41e941da908 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
@@ -567,7 +567,7 @@ class PlanParserSuite extends AnalysisTest {
       "select * from t lateral view posexplode(x) posexpl as x, y",
       expected)
 
-    val sql =
+    val sql1 =
      """select *
        |from t
        |lateral vi
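A hedged sketch of what the rename means for callers: parsing a FROM clause that mixes LATERAL VIEW with PIVOT should now surface the named error class. The SQL shape below is illustrative, not taken from the PR's tests:

```scala
import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}

object LateralPivotDemo {
  def main(args: Array[String]): Unit = {
    // LATERAL VIEW combined with PIVOT in one FROM clause is rejected at
    // parse time. Table and column names here are made up for the example.
    val badSql =
      """SELECT * FROM t
        |LATERAL VIEW explode(xs) ex AS x
        |PIVOT (SUM(x) FOR y IN (1, 2))""".stripMargin
    try {
      CatalystSqlParser.parsePlan(badSql)
    } catch {
      case e: ParseException =>
        // Expected: NOT_ALLOWED_IN_FROM.LATERAL_WITH_PIVOT
        println(s"errorClass = ${e.getErrorClass}")
    }
  }
}
```

The same pattern applies to the `UNPIVOT_WITH_PIVOT` and `LATERAL_WITH_UNPIVOT` subclasses.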
[spark] branch master updated (ba2d785b994 -> 6d0607f94de)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

    from ba2d785b994 [SPARK-43290][SQL] Adds AES IV and AAD support to ExpressionImplUtils
     add 6d0607f94de [SPARK-43487][SQL] Fix Nested CTE error message

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json   |  7 +++
 .../spark/sql/errors/QueryCompilationErrors.scala  | 17 +++
 .../sql-tests/analyzer-results/cte-nested.sql.out  | 56 --
 .../resources/sql-tests/results/cte-nested.sql.out | 56 --
 .../sql/errors/QueryCompilationErrorsSuite.scala   | 27 +++
 5 files changed, 107 insertions(+), 56 deletions(-)
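For orientation, a hedged sketch of the nested-CTE shape whose error reporting this commit touches; the query, names, and behavior notes are illustrative and not taken from the PR:

```scala
import org.apache.spark.sql.SparkSession

object NestedCteDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("cte-demo").getOrCreate()
    // An inner CTE shadowing an outer CTE of the same name: the general shape
    // that nested-CTE analysis and its error messages deal with. The exact
    // conditions that raise the reworked error may differ from this example.
    spark.sql("""
      WITH t AS (SELECT 1 AS x),
           u AS (
             WITH t AS (SELECT 2 AS x)
             SELECT * FROM t
           )
      SELECT * FROM u
    """).show()  // the inner t takes precedence, so this prints x = 2
    spark.stop()
  }
}
```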
[spark] branch master updated: [SPARK-43290][SQL] Adds AES IV and AAD support to ExpressionImplUtils
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ba2d785b994 [SPARK-43290][SQL] Adds AES IV and AAD support to ExpressionImplUtils
ba2d785b994 is described below

commit ba2d785b99461871f588de6a8260f3201204f313
Author: Steve Weis
AuthorDate: Mon May 22 22:43:46 2023 +0300

[SPARK-43290][SQL] Adds AES IV and AAD support to ExpressionImplUtils

### What changes were proposed in this pull request?
This change adds support for optional IV and AAD fields to ExpressionImplUtils, the library underlying `aes_encrypt` and `aes_decrypt`. It lets callers specify their own initialization vectors (IVs) for specific use cases and take advantage of AES-GCM's optional additional authenticated data (AAD) input. This change does **not** yet add the support to the user-facing `aes_encrypt` and `aes_decrypt`; that will be added in a follow-up rather than in a single complex change.

### Why are the changes needed?
There are use cases where callers of ExpressionImplUtils via aes_encrypt may want to provide initialization vectors (IVs) or additional authenticated data (AAD). The most common cases are:
1. Ensuring that ciphertext matches values that have been encrypted by external tools. In those cases, the caller must provide an identical IV value.
2. For AES-CBC mode, some callers want to generate deterministic encrypted output.
3. For AES-GCM mode, providing an AAD field binds additional data to a ciphertext so that it can only be decrypted by a caller providing the same value. This is often used to enforce some context.

### Does this PR introduce _any_ user-facing change?
Not yet. This change adds support to the underlying implementation, but does not yet update the SQL support to include the new parameters.

### How was this patch tested?
All existing unit tests still pass, and new tests in `ExpressionImplUtilsSuite` exercise the new code paths:
```
build/sbt "sql/test:testOnly org.apache.spark.sql.DataFrameFunctionsSuite"
build/sbt "catalyst/test:testOnly org.apache.spark.sql.catalyst.expressions.ExpressionImplUtilsSuite"
```

Closes #40970 from sweisdb/SPARK-43290.

Lead-authored-by: Steve Weis
Co-authored-by: sweisdb <60895808+swei...@users.noreply.github.com>
Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json   |  17 +-
 .../catalyst/expressions/ExpressionImplUtils.java  |  98 ++--
 .../spark/sql/errors/QueryExecutionErrors.scala    |  28 ++-
 .../expressions/ExpressionImplUtilsSuite.scala     | 268 ++---
 .../sql/errors/QueryExecutionErrorsSuite.scala     |   4 +-
 5 files changed, 368 insertions(+), 47 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index b5b33758341..b3023fad83b 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -1074,11 +1074,16 @@
       "The value of parameter(s) in is invalid:"
     ],
     "subClass" : {
-      "AES_KEY" : {
+      "AES_CRYPTO_ERROR" : {
         "message" : [
           "detail message: "
         ]
       },
+      "AES_IV_LENGTH" : {
+        "message" : [
+          "supports 16-byte CBC IVs and 12-byte GCM IVs, but got bytes for ."
+        ]
+      },
       "AES_KEY_LENGTH" : {
         "message" : [
           "expects a binary value with 16, 24 or 32 bytes, but got bytes."
@@ -1839,6 +1844,16 @@
          "AES- with the padding by the function."
        ]
      },
+     "AES_MODE_AAD" : {
+       "message" : [
+         " with AES- does not support additional authenticate data (AAD)."
+       ]
+     },
+     "AES_MODE_IV" : {
+       "message" : [
+         " with AES- does not support initialization vectors (IVs)."
+       ]
+     },
      "ANALYZE_UNCACHED_TEMP_VIEW" : {
        "message" : [
          "The ANALYZE TABLE FOR COLUMNS command can operate on temporary views that have been cached already. Consider to cache the view ."
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
index 6843a348006..6aae649718a 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.jav
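To make the new IV and AAD parameters concrete, a standalone JDK-crypto sketch of AES-GCM with a caller-supplied IV and AAD. It mirrors the behavior the commit enables but is not the ExpressionImplUtils code itself:

```scala
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

// Standalone sketch of caller-supplied IVs and AAD for AES-GCM using the JDK
// crypto API. Illustrative only; not the ExpressionImplUtils implementation.
object AesGcmSketch {
  private val TagLengthBits = 128 // GCM authentication tag length
  private val GcmIvBytes = 12     // the 12-byte GCM IV size the new check enforces

  def encrypt(key: Array[Byte], plaintext: Array[Byte],
              iv: Array[Byte], aad: Array[Byte]): Array[Byte] = {
    require(iv.length == GcmIvBytes, s"GCM expects a $GcmIvBytes-byte IV")
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
      new GCMParameterSpec(TagLengthBits, iv))
    cipher.updateAAD(aad) // binds the AAD: decryption fails unless it matches
    cipher.doFinal(plaintext)
  }

  def decrypt(key: Array[Byte], ciphertext: Array[Byte],
              iv: Array[Byte], aad: Array[Byte]): Array[Byte] = {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
      new GCMParameterSpec(TagLengthBits, iv))
    cipher.updateAAD(aad)
    cipher.doFinal(ciphertext) // throws AEADBadTagException on IV/AAD mismatch
  }

  def main(args: Array[String]): Unit = {
    val rng = new SecureRandom()
    val key = new Array[Byte](16); rng.nextBytes(key)
    val iv = new Array[Byte](GcmIvBytes); rng.nextBytes(iv)
    val aad = "tenant-42".getBytes("UTF-8") // context bound into the ciphertext
    val ct = encrypt(key, "hello".getBytes("UTF-8"), iv, aad)
    println(new String(decrypt(key, ct, iv, aad), "UTF-8")) // prints: hello
  }
}
```

Reusing an IV with the same key is unsafe in GCM, which is why random per-message IVs are the default and caller-supplied IVs are reserved for interoperability cases like the ones listed in the commit message.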