karenfeng commented on a change in pull request #34093:
URL: https://github.com/apache/spark/pull/34093#discussion_r717205325
##########
File path: R/pkg/tests/fulltests/test_sparkSQL.R
##########
@@ -3873,7 +3873,8 @@ test_that("Call DataFrameWriter.save() API in Java without path and check argume
# It makes sure that we can omit path argument in write.df API and then it calls
# DataFrameWriter.save() without path.
expect_error(write.df(df, source = "csv"),
- "Error in save : illegal argument - Expected exactly one path to be specified")
+ paste("Error in save : org.apache.spark.SparkIllegalArgumentException:",
Review comment:
Hm... It may be more correct to keep the original behavior. Can you go here https://github.com/apache/spark/blob/e024bdc30620867943b4b926f703f6a5634f9322/R/pkg/R/utils.R#L836 and add another case for `SparkIllegalArgumentException`?
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
Review comment:
We can probably simplify this: CANNOT_CLEAR_SOME_DIRECTORY -> CANNOT_CLEAR_DIRECTORY
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -153,9 +215,17 @@
"message" : [ "Unsupported literal type %s %s" ],
"sqlState" : "0A000"
},
+ "UNSUPPORTED_SAVE_MODE" : {
+ "message" : [ "unsupported save mode %s" ],
+ "sqlState" : "0A000"
+ },
"UNSUPPORTED_SIMPLE_STRING_WITH_NODE_ID" : {
"message" : [ "%s does not implement simpleStringWithNodeId" ]
},
+ "UNSUPPORTED_STREAMED_OPERATOR_BY_DATASOURCE" : {
Review comment:
DATASOURCE -> DATA_SOURCE
##########
File path: core/src/main/scala/org/apache/spark/SparkException.scala
##########
@@ -72,9 +72,11 @@ private[spark] case class ExecutorDeadException(message: String)
/**
 * Exception thrown when Spark returns different result after upgrading to a new version.
 */
-private[spark] class SparkUpgradeException(version: String, message: String, cause: Throwable)
- extends RuntimeException("You may get a different result due to the upgrading of Spark" +
- s" $version: $message", cause)
+private[spark] class SparkUpgradeException(
Review comment:
I'm not sure if it's safe to modify the existing constructor; can we overload it instead?
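For illustration only, here is a minimal self-contained sketch of the overloading idea (the class name, error-class name, and message formatting below are placeholders, not the actual Spark code): keep the pre-existing `(version, message, cause)` signature as a secondary constructor that delegates to the new error-class-based primary constructor, so existing call sites keep compiling.

```scala
// Sketch only: names are illustrative, and the message formatting below is a
// stand-in for whatever the error-class framework actually does.
class SparkUpgradeExceptionSketch(
    errorClass: String,
    messageParameters: Array[String],
    cause: Throwable)
  extends RuntimeException(
    s"[$errorClass] " + messageParameters.mkString(" "), cause) {

  // Secondary constructor keeping the original (version, message, cause)
  // signature, so existing call sites do not need to change.
  def this(version: String, message: String, cause: Throwable) =
    this(
      "UPGRADE_OF_SPARK", // hypothetical error class name
      Array(s"You may get a different result due to the upgrading of Spark $version: $message"),
      cause)
}
```

The point is just that a `def this(version: String, message: String, cause: Throwable)` overload keeps the old signature available for current callers while the primary constructor moves to error classes.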
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
+ },
"CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
"message" : [ "Cannot generate code for expression: %s" ]
},
"CANNOT_PARSE_DECIMAL" : {
"message" : [ "Cannot parse decimal" ],
"sqlState" : "42000"
},
+ "CANNOT_READ_CURRENT_FILE" : {
+ "message" : [ "%s \n It is possible the underlying files have been
updated. You can explicitly invalidate the cache in Spark by running 'REFRESH
TABLE tableName' command in SQL or by recreating the Dataset/DataFrame
involved." ]
+ },
"CANNOT_TERMINATE_GENERATOR" : {
"message" : [ "Cannot terminate expression: %s" ]
},
+ "CANNOT_UPGRADE_IN_READING_DATES" : {
Review comment:
This is a little confusing - the issue is not that they cannot upgrade.
Maybe `READING_AMBIGUOUS_DATES`?
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
+ },
"CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
"message" : [ "Cannot generate code for expression: %s" ]
},
"CANNOT_PARSE_DECIMAL" : {
"message" : [ "Cannot parse decimal" ],
"sqlState" : "42000"
},
+ "CANNOT_READ_CURRENT_FILE" : {
+ "message" : [ "%s \n It is possible the underlying files have been
updated. You can explicitly invalidate the cache in Spark by running 'REFRESH
TABLE tableName' command in SQL or by recreating the Dataset/DataFrame
involved." ]
+ },
"CANNOT_TERMINATE_GENERATOR" : {
"message" : [ "Cannot terminate expression: %s" ]
},
+ "CANNOT_UPGRADE_IN_READING_DATES" : {
+ "message" : [ "You may get a different result due to the upgrading of
Spark %s reading dates before 1582-10-15 or timestamps before
1900-01-01T00:00:00Z from %s files can be ambiguous, as the files may be
written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid
calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See
more details in SPARK-31404. You can set the SQL config '%s' or the datasource
option '%s' to 'LEGACY' to rebase the datetime values w.r.t. the calendar
difference during reading. To read the datetime values as it is, set the SQL
config '%s' or the datasource option '%s' to 'CORRECTED'." ]
+ },
+ "CANNOT_UPGRADE_IN_WRITING_DATES" : {
+ "message" : [ "You may get a different result due to the upgrading of
Spark %s writing dates before 1582-10-15 or timestamps before
1900-01-01T00:00:00Z into %s files can be dangerous, as the files may be read
by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid
calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See
more details in SPARK-31404. You can set %s to 'LEGACY' to rebase the datetime
values w.r.t. the calendar difference during writing, to get maximum
interoperability. Or set %s to 'CORRECTED' to write the datetime values as it
is, if you are 100%% sure that the written files will only be read by Spark
3.0+ or other systems that use Proleptic Gregorian calendar." ]
Review comment:
Missing colon here as well
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
+ },
"CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
"message" : [ "Cannot generate code for expression: %s" ]
},
"CANNOT_PARSE_DECIMAL" : {
"message" : [ "Cannot parse decimal" ],
"sqlState" : "42000"
},
+ "CANNOT_READ_CURRENT_FILE" : {
+ "message" : [ "%s \n It is possible the underlying files have been
updated. You can explicitly invalidate the cache in Spark by running 'REFRESH
TABLE tableName' command in SQL or by recreating the Dataset/DataFrame
involved." ]
+ },
"CANNOT_TERMINATE_GENERATOR" : {
"message" : [ "Cannot terminate expression: %s" ]
},
+ "CANNOT_UPGRADE_IN_READING_DATES" : {
+ "message" : [ "You may get a different result due to the upgrading of
Spark %s reading dates before 1582-10-15 or timestamps before
1900-01-01T00:00:00Z from %s files can be ambiguous, as the files may be
written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid
calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See
more details in SPARK-31404. You can set the SQL config '%s' or the datasource
option '%s' to 'LEGACY' to rebase the datetime values w.r.t. the calendar
difference during reading. To read the datetime values as it is, set the SQL
config '%s' or the datasource option '%s' to 'CORRECTED'." ]
+ },
+ "CANNOT_UPGRADE_IN_WRITING_DATES" : {
Review comment:
WRITING_AMBIGUOUS_DATES may be a better descriptor
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
+ },
"CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
"message" : [ "Cannot generate code for expression: %s" ]
},
"CANNOT_PARSE_DECIMAL" : {
"message" : [ "Cannot parse decimal" ],
"sqlState" : "42000"
},
+ "CANNOT_READ_CURRENT_FILE" : {
+ "message" : [ "%s \n It is possible the underlying files have been
updated. You can explicitly invalidate the cache in Spark by running 'REFRESH
TABLE tableName' command in SQL or by recreating the Dataset/DataFrame
involved." ]
Review comment:
Rather than use a literal `\n` in the message, you can split up the message into separate elements in the array and they'll be joined with newlines later:
```
"message" : [ "%s", "It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." ]
```
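As a rough, self-contained illustration of that joining behavior (the real concatenation happens inside Spark's error-class machinery; the object name and sample argument below are made up for demonstration), the array elements end up joined with newlines before the positional `%s` parameters are substituted:

```scala
object MessageJoinDemo extends App {
  // The two "message" elements as they would appear in error-classes.json.
  val messageTemplate = Seq(
    "%s",
    "It is possible the underlying files have been updated. You can explicitly " +
      "invalidate the cache in Spark by running 'REFRESH TABLE tableName' command " +
      "in SQL or by recreating the Dataset/DataFrame involved.")

  // Join with newlines first, then substitute the single %s parameter.
  val rendered = messageTemplate
    .mkString("\n")
    .format("java.io.FileNotFoundException: file part-00000 does not exist")

  println(rendered)
}
```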
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
+ },
"CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
"message" : [ "Cannot generate code for expression: %s" ]
},
"CANNOT_PARSE_DECIMAL" : {
"message" : [ "Cannot parse decimal" ],
"sqlState" : "42000"
},
+ "CANNOT_READ_CURRENT_FILE" : {
+ "message" : [ "%s \n It is possible the underlying files have been
updated. You can explicitly invalidate the cache in Spark by running 'REFRESH
TABLE tableName' command in SQL or by recreating the Dataset/DataFrame
involved." ]
+ },
"CANNOT_TERMINATE_GENERATOR" : {
"message" : [ "Cannot terminate expression: %s" ]
},
+ "CANNOT_UPGRADE_IN_READING_DATES" : {
+ "message" : [ "You may get a different result due to the upgrading of
Spark %s reading dates before 1582-10-15 or timestamps before
1900-01-01T00:00:00Z from %s files can be ambiguous, as the files may be
written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid
calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See
more details in SPARK-31404. You can set the SQL config '%s' or the datasource
option '%s' to 'LEGACY' to rebase the datetime values w.r.t. the calendar
difference during reading. To read the datetime values as it is, set the SQL
config '%s' or the datasource option '%s' to 'CORRECTED'." ]
Review comment:
There was a colon here before: `Spark %s: reading dates`
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -39,9 +57,31 @@
"message" : [ "Found duplicate keys '%s'" ],
"sqlState" : "23000"
},
+ "END_OF_STREAM" : {
+ "message" : [ "End of stream" ]
+ },
+ "FAILED_CAST_VALUE_TO_DATATYPE_FOR_PARTITION_COLUMN" : {
+ "message" : [ "Failed to cast value `%s` to `%s` for partition column
`%s`" ],
+ "sqlState" : "22023"
+ },
"FAILED_EXECUTE_UDF" : {
"message" : [ "Failed to execute user defined function (%s: (%s) => %s)" ]
},
+ "FAILED_FALLBACK_V1_BECAUSE_OF_INCONSISTENT_SCHEMA" : {
+ "message" : [ "The fallback v1 relation reports inconsistent schema:\n
Schema of v2 scan: %s\nSchema of v1 relation: %s" ]
Review comment:
Split strings into array elements
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
Review comment:
I would consider this to be an unsupported operation with SQLSTATE 0A000
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
Review comment:
I would consider this a syntax error; can you add SQLSTATE 42000?
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -39,9 +57,31 @@
"message" : [ "Found duplicate keys '%s'" ],
"sqlState" : "23000"
},
+ "END_OF_STREAM" : {
+ "message" : [ "End of stream" ]
+ },
+ "FAILED_CAST_VALUE_TO_DATATYPE_FOR_PARTITION_COLUMN" : {
+ "message" : [ "Failed to cast value `%s` to `%s` for partition column
`%s`" ],
+ "sqlState" : "22023"
+ },
"FAILED_EXECUTE_UDF" : {
"message" : [ "Failed to execute user defined function (%s: (%s) => %s)" ]
},
+ "FAILED_FALLBACK_V1_BECAUSE_OF_INCONSISTENT_SCHEMA" : {
+ "message" : [ "The fallback v1 relation reports inconsistent schema:\n
Schema of v2 scan: %s\nSchema of v1 relation: %s" ]
+ },
+ "FAILED_FIND_DATASOURCE" : {
Review comment:
DATASOURCE -> DATA_SOURCE
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -110,6 +153,10 @@
"message" : [ "Unknown static partition column: %s" ],
"sqlState" : "42000"
},
+ "MISSING_STREAMING_SOURCE_SCHEMA" : {
+ "message" : [ "Schema must be specified when creating a streaming source
DataFrame. If some files already exist in the directory, then depending on the
file format you may be able to create a static DataFrame on that directory with
'spark.read.load(directory)' and infer schema from it." ],
+ "sqlState" : "3F000"
Review comment:
Is this the same schema as intended in the SQLSTATE? It seems like that "schema" is referring to schema objects. Maybe this should be 22023?
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -39,9 +57,31 @@
"message" : [ "Found duplicate keys '%s'" ],
"sqlState" : "23000"
},
+ "END_OF_STREAM" : {
+ "message" : [ "End of stream" ]
+ },
+ "FAILED_CAST_VALUE_TO_DATATYPE_FOR_PARTITION_COLUMN" : {
+ "message" : [ "Failed to cast value `%s` to `%s` for partition column
`%s`" ],
+ "sqlState" : "22023"
+ },
"FAILED_EXECUTE_UDF" : {
"message" : [ "Failed to execute user defined function (%s: (%s) => %s)" ]
},
+ "FAILED_FALLBACK_V1_BECAUSE_OF_INCONSISTENT_SCHEMA" : {
+ "message" : [ "The fallback v1 relation reports inconsistent schema:\n
Schema of v2 scan: %s\nSchema of v1 relation: %s" ]
+ },
+ "FAILED_FIND_DATASOURCE" : {
+ "message" : [ "Failed to find data source: %s. Please find packages at
http://spark.apache.org/third-party-projects.html" ]
Review comment:
Would this have a SQLSTATE like 22023?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]