karenfeng commented on a change in pull request #34093:
URL: https://github.com/apache/spark/pull/34093#discussion_r717205325
##########
File path: R/pkg/tests/fulltests/test_sparkSQL.R
##########
@@ -3873,7 +3873,8 @@ test_that("Call DataFrameWriter.save() API in Java without path and check argume
# It makes sure that we can omit path argument in write.df API and then it calls
# DataFrameWriter.save() without path.
expect_error(write.df(df, source = "csv"),
- "Error in save : illegal argument - Expected exactly one path to be specified")
+ paste("Error in save : org.apache.spark.SparkIllegalArgumentException:",
Review comment:
Hm... It may be more correct to keep the original behavior. Can you go here https://github.com/apache/spark/blob/e024bdc30620867943b4b926f703f6a5634f9322/R/pkg/R/utils.R#L836 and add another case for `SparkIllegalArgumentException`?
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
Review comment:
We can probably simplify this: CANNOT_CLEAR_SOME_DIRECTORY -> CANNOT_CLEAR_DIRECTORY
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -153,9 +215,17 @@
"message" : [ "Unsupported literal type %s %s" ],
"sqlState" : "0A000"
},
+ "UNSUPPORTED_SAVE_MODE" : {
+ "message" : [ "unsupported save mode %s" ],
+ "sqlState" : "0A000"
+ },
"UNSUPPORTED_SIMPLE_STRING_WITH_NODE_ID" : {
"message" : [ "%s does not implement simpleStringWithNodeId" ]
},
+ "UNSUPPORTED_STREAMED_OPERATOR_BY_DATASOURCE" : {
Review comment:
DATASOURCE -> DATA_SOURCE
##########
File path: core/src/main/scala/org/apache/spark/SparkException.scala
##########
@@ -72,9 +72,11 @@ private[spark] case class ExecutorDeadException(message: String)
/**
 * Exception thrown when Spark returns different result after upgrading to a new version.
 */
-private[spark] class SparkUpgradeException(version: String, message: String, cause: Throwable)
- extends RuntimeException("You may get a different result due to the upgrading of Spark" +
- s" $version: $message", cause)
+private[spark] class SparkUpgradeException(
Review comment:
I'm not sure if it's safe to modify the existing constructor; can we overload it instead?
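For illustration only, here is a minimal self-contained sketch of the overloading idea (the class name, error-class name, and message formatting below are placeholders, not the actual Spark code): keep the pre-existing `(version, message, cause)` signature as a secondary constructor that delegates to the new error-class-based primary constructor, so existing call sites keep compiling.

```scala
// Sketch only: names are illustrative, and the message formatting below is a
// stand-in for whatever the error-class framework actually does.
class SparkUpgradeExceptionSketch(
    errorClass: String,
    messageParameters: Array[String],
    cause: Throwable)
  extends RuntimeException(
    s"[$errorClass] " + messageParameters.mkString(" "), cause) {

  // Secondary constructor keeping the original (version, message, cause)
  // signature, so existing call sites do not need to change.
  def this(version: String, message: String, cause: Throwable) =
    this(
      "UPGRADE_OF_SPARK", // hypothetical error class name
      Array(s"You may get a different result due to the upgrading of Spark $version: $message"),
      cause)
}
```

The point is just that a `def this(version: String, message: String, cause: Throwable)` overload keeps the old signature available for current callers while the primary constructor moves to error classes.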
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
+ },
"CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
"message" : [ "Cannot generate code for expression: %s" ]
},
"CANNOT_PARSE_DECIMAL" : {
"message" : [ "Cannot parse decimal" ],
"sqlState" : "42000"
},
+ "CANNOT_READ_CURRENT_FILE" : {
+ "message" : [ "%s \n It is possible the underlying files have been
updated. You can explicitly invalidate the cache in Spark by running 'REFRESH
TABLE tableName' command in SQL or by recreating the Dataset/DataFrame
involved." ]
+ },
"CANNOT_TERMINATE_GENERATOR" : {
"message" : [ "Cannot terminate expression: %s" ]
},
+ "CANNOT_UPGRADE_IN_READING_DATES" : {
Review comment:
This is a little confusing - the issue is not that they cannot upgrade.
Maybe `READING_AMBIGUOUS_DATES`?
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
+ },
"CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
"message" : [ "Cannot generate code for expression: %s" ]
},
"CANNOT_PARSE_DECIMAL" : {
"message" : [ "Cannot parse decimal" ],
"sqlState" : "42000"
},
+ "CANNOT_READ_CURRENT_FILE" : {
+ "message" : [ "%s \n It is possible the underlying files have been
updated. You can explicitly invalidate the cache in Spark by running 'REFRESH
TABLE tableName' command in SQL or by recreating the Dataset/DataFrame
involved." ]
+ },
"CANNOT_TERMINATE_GENERATOR" : {
"message" : [ "Cannot terminate expression: %s" ]
},
+ "CANNOT_UPGRADE_IN_READING_DATES" : {
+ "message" : [ "You may get a different result due to the upgrading of
Spark %s reading dates before 1582-10-15 or timestamps before
1900-01-01T00:00:00Z from %s files can be ambiguous, as the files may be
written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid
calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See
more details in SPARK-31404. You can set the SQL config '%s' or the datasource
option '%s' to 'LEGACY' to rebase the datetime values w.r.t. the calendar
difference during reading. To read the datetime values as it is, set the SQL
config '%s' or the datasource option '%s' to 'CORRECTED'." ]
+ },
+ "CANNOT_UPGRADE_IN_WRITING_DATES" : {
+ "message" : [ "You may get a different result due to the upgrading of
Spark %s writing dates before 1582-10-15 or timestamps before
1900-01-01T00:00:00Z into %s files can be dangerous, as the files may be read
by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid
calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See
more details in SPARK-31404. You can set %s to 'LEGACY' to rebase the datetime
values w.r.t. the calendar difference during writing, to get maximum
interoperability. Or set %s to 'CORRECTED' to write the datetime values as it
is, if you are 100%% sure that the written files will only be read by Spark
3.0+ or other systems that use Proleptic Gregorian calendar." ]
Review comment:
Missing colon here as well
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
+ },
"CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
"message" : [ "Cannot generate code for expression: %s" ]
},
"CANNOT_PARSE_DECIMAL" : {
"message" : [ "Cannot parse decimal" ],
"sqlState" : "42000"
},
+ "CANNOT_READ_CURRENT_FILE" : {
+ "message" : [ "%s \n It is possible the underlying files have been
updated. You can explicitly invalidate the cache in Spark by running 'REFRESH
TABLE tableName' command in SQL or by recreating the Dataset/DataFrame
involved." ]
+ },
"CANNOT_TERMINATE_GENERATOR" : {
"message" : [ "Cannot terminate expression: %s" ]
},
+ "CANNOT_UPGRADE_IN_READING_DATES" : {
+ "message" : [ "You may get a different result due to the upgrading of
Spark %s reading dates before 1582-10-15 or timestamps before
1900-01-01T00:00:00Z from %s files can be ambiguous, as the files may be
written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid
calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See
more details in SPARK-31404. You can set the SQL config '%s' or the datasource
option '%s' to 'LEGACY' to rebase the datetime values w.r.t. the calendar
difference during reading. To read the datetime values as it is, set the SQL
config '%s' or the datasource option '%s' to 'CORRECTED'." ]
+ },
+ "CANNOT_UPGRADE_IN_WRITING_DATES" : {
Review comment:
WRITING_AMBIGUOUS_DATES may be a better descriptor
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
+ },
"CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
"message" : [ "Cannot generate code for expression: %s" ]
},
"CANNOT_PARSE_DECIMAL" : {
"message" : [ "Cannot parse decimal" ],
"sqlState" : "42000"
},
+ "CANNOT_READ_CURRENT_FILE" : {
+ "message" : [ "%s \n It is possible the underlying files have been
updated. You can explicitly invalidate the cache in Spark by running 'REFRESH
TABLE tableName' command in SQL or by recreating the Dataset/DataFrame
involved." ]
Review comment:
Rather than use a literal `\n` in the message, you can split up the message into separate elements in the array and they'll be joined with newlines later:
```
"message" : [ "%s", "It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." ]
```
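As a rough, self-contained illustration of that joining behavior (the real concatenation happens inside Spark's error-class machinery; the object name and sample argument below are made up for demonstration), the array elements end up joined with newlines before the positional `%s` parameters are substituted:

```scala
object MessageJoinDemo extends App {
  // The two "message" elements as they would appear in error-classes.json.
  val messageTemplate = Seq(
    "%s",
    "It is possible the underlying files have been updated. You can explicitly " +
      "invalidate the cache in Spark by running 'REFRESH TABLE tableName' command " +
      "in SQL or by recreating the Dataset/DataFrame involved.")

  // Join with newlines first, then substitute the single %s parameter.
  val rendered = messageTemplate
    .mkString("\n")
    .format("java.io.FileNotFoundException: file part-00000 does not exist")

  println(rendered)
}
```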
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
+ },
"CANNOT_GENERATE_CODE_FOR_EXPRESSION" : {
"message" : [ "Cannot generate code for expression: %s" ]
},
"CANNOT_PARSE_DECIMAL" : {
"message" : [ "Cannot parse decimal" ],
"sqlState" : "42000"
},
+ "CANNOT_READ_CURRENT_FILE" : {
+ "message" : [ "%s \n It is possible the underlying files have been
updated. You can explicitly invalidate the cache in Spark by running 'REFRESH
TABLE tableName' command in SQL or by recreating the Dataset/DataFrame
involved." ]
+ },
"CANNOT_TERMINATE_GENERATOR" : {
"message" : [ "Cannot terminate expression: %s" ]
},
+ "CANNOT_UPGRADE_IN_READING_DATES" : {
+ "message" : [ "You may get a different result due to the upgrading of
Spark %s reading dates before 1582-10-15 or timestamps before
1900-01-01T00:00:00Z from %s files can be ambiguous, as the files may be
written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid
calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See
more details in SPARK-31404. You can set the SQL config '%s' or the datasource
option '%s' to 'LEGACY' to rebase the datetime values w.r.t. the calendar
difference during reading. To read the datetime values as it is, set the SQL
config '%s' or the datasource option '%s' to 'CORRECTED'." ]
Review comment:
There was a colon here before: `Spark %s: reading dates`
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -39,9 +57,31 @@
"message" : [ "Found duplicate keys '%s'" ],
"sqlState" : "23000"
},
+ "END_OF_STREAM" : {
+ "message" : [ "End of stream" ]
+ },
+ "FAILED_CAST_VALUE_TO_DATATYPE_FOR_PARTITION_COLUMN" : {
+ "message" : [ "Failed to cast value `%s` to `%s` for partition column
`%s`" ],
+ "sqlState" : "22023"
+ },
"FAILED_EXECUTE_UDF" : {
"message" : [ "Failed to execute user defined function (%s: (%s) => %s)" ]
},
+ "FAILED_FALLBACK_V1_BECAUSE_OF_INCONSISTENT_SCHEMA" : {
+ "message" : [ "The fallback v1 relation reports inconsistent schema:\n
Schema of v2 scan: %s\nSchema of v1 relation: %s" ]
Review comment:
Split strings into array elements
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
+ },
"CANNOT_EVALUATE_EXPRESSION" : {
"message" : [ "Cannot evaluate expression: %s" ]
},
+ "CANNOT_FIND_CLASS_IN_SPARK2" : {
+ "message" : [ "%s was removed in Spark 2.0. Please check if your library
is compatible with Spark 2.0" ]
Review comment:
I would consider this to be an unsupported operation with SQLSTATE 0A000
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -11,19 +11,37 @@
"message" : [ "%s cannot be represented as Decimal(%s, %s)." ],
"sqlState" : "22005"
},
+ "CANNOT_CLEAR_SOME_DIRECTORY" : {
+ "message" : [ "Unable to clear %s directory %s prior to writing to it" ]
+ },
+ "CANNOT_DROP_NONEMPTY_NAMESPACE" : {
+ "message" : [ "Cannot drop a non-empty namespace: %s. Use CASCADE option
to drop a non-empty namespace." ]
Review comment:
I would consider this a syntax error; can you add SQLSTATE 42000?
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -39,9 +57,31 @@
"message" : [ "Found duplicate keys '%s'" ],
"sqlState" : "23000"
},
+ "END_OF_STREAM" : {
+ "message" : [ "End of stream" ]
+ },
+ "FAILED_CAST_VALUE_TO_DATATYPE_FOR_PARTITION_COLUMN" : {
+ "message" : [ "Failed to cast value `%s` to `%s` for partition column
`%s`" ],
+ "sqlState" : "22023"
+ },
"FAILED_EXECUTE_UDF" : {
"message" : [ "Failed to execute user defined function (%s: (%s) => %s)" ]
},
+ "FAILED_FALLBACK_V1_BECAUSE_OF_INCONSISTENT_SCHEMA" : {
+ "message" : [ "The fallback v1 relation reports inconsistent schema:\n
Schema of v2 scan: %s\nSchema of v1 relation: %s" ]
+ },
+ "FAILED_FIND_DATASOURCE" : {
Review comment:
DATASOURCE -> DATA_SOURCE
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -110,6 +153,10 @@
"message" : [ "Unknown static partition column: %s" ],
"sqlState" : "42000"
},
+ "MISSING_STREAMING_SOURCE_SCHEMA" : {
+ "message" : [ "Schema must be specified when creating a streaming source
DataFrame. If some files already exist in the directory, then depending on the
file format you may be able to create a static DataFrame on that directory with
'spark.read.load(directory)' and infer schema from it." ],
+ "sqlState" : "3F000"
Review comment:
Is this the same schema as intended in the SQLSTATE? It seems like that "schema" is referring to schema objects. Maybe this should be 22023?
##########
File path: core/src/main/resources/error/error-classes.json
##########
@@ -39,9 +57,31 @@
"message" : [ "Found duplicate keys '%s'" ],
"sqlState" : "23000"
},
+ "END_OF_STREAM" : {
+ "message" : [ "End of stream" ]
+ },
+ "FAILED_CAST_VALUE_TO_DATATYPE_FOR_PARTITION_COLUMN" : {
+ "message" : [ "Failed to cast value `%s` to `%s` for partition column
`%s`" ],
+ "sqlState" : "22023"
+ },
"FAILED_EXECUTE_UDF" : {
"message" : [ "Failed to execute user defined function (%s: (%s) => %s)" ]
},
+ "FAILED_FALLBACK_V1_BECAUSE_OF_INCONSISTENT_SCHEMA" : {
+ "message" : [ "The fallback v1 relation reports inconsistent schema:\n
Schema of v2 scan: %s\nSchema of v1 relation: %s" ]
+ },
+ "FAILED_FIND_DATASOURCE" : {
+ "message" : [ "Failed to find data source: %s. Please find packages at
http://spark.apache.org/third-party-projects.html" ]
Review comment:
Would this have a SQLSTATE like 22023?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]