allisonwang-db commented on code in PR #47873:
URL: https://github.com/apache/spark/pull/47873#discussion_r1731782588


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SaveIntoDataSourceCommand.scala:
##########
@@ -44,8 +45,23 @@ case class SaveIntoDataSourceCommand(
   override def innerChildren: Seq[QueryPlan[_]] = Seq(query)
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
-    val relation = dataSource.createRelation(
-      sparkSession.sqlContext, mode, options, Dataset.ofRows(sparkSession, query))
+    var relation: BaseRelation = null
+
+    try {
+      relation = dataSource.createRelation(
+        sparkSession.sqlContext, mode, options, Dataset.ofRows(sparkSession, query))
+    } catch {
+      case e @ (_: NullPointerException | _: MatchError | _: ArrayIndexOutOfBoundsException |
+          _: IllegalArgumentException | _: ClassCastException | _: IllegalStateException) =>
+        // These are some of the exceptions thrown by the data source API. We catch these
+        // exceptions here and rethrow QueryCompilationErrors.externalDataSourceException to
+        // provide a more friendly error message for the user. This list is not exhaustive.
+        throw QueryCompilationErrors.externalDataSourceException(Some(e))
+      case _: Throwable =>
+        // Skip other exceptions for now, as they may be handled by some other part of the code.
+    }
+
+    assert(relation != null)

Review Comment:
   How can we guarantee the relation is not null here?
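   One way to make that guarantee hold by construction (a sketch, not the final patch): use the `try` as an expression, so every path either yields a `BaseRelation` or throws, and both the nullable `var` and the `assert` go away:
   ```scala
   val relation: BaseRelation =
     try {
       dataSource.createRelation(
         sparkSession.sqlContext, mode, options, Dataset.ofRows(sparkSession, query))
     } catch {
       case e @ (_: NullPointerException | _: MatchError | _: ArrayIndexOutOfBoundsException |
           _: IllegalArgumentException | _: ClassCastException | _: IllegalStateException) =>
         // Translate known data source API failures into a friendlier error.
         throw QueryCompilationErrors.externalDataSourceException(Some(e))
       // No catch-all arm: any other Throwable propagates, so `relation` is
       // always assigned when control reaches the code below.
     }
   ```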



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SaveIntoDataSourceCommand.scala:
##########
@@ -44,8 +45,23 @@ case class SaveIntoDataSourceCommand(
   override def innerChildren: Seq[QueryPlan[_]] = Seq(query)
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
-    val relation = dataSource.createRelation(
-      sparkSession.sqlContext, mode, options, Dataset.ofRows(sparkSession, query))
+    var relation: BaseRelation = null
+
+    try {
+      relation = dataSource.createRelation(
+        sparkSession.sqlContext, mode, options, Dataset.ofRows(sparkSession, query))
+    } catch {
+      case e @ (_: NullPointerException | _: MatchError | _: ArrayIndexOutOfBoundsException |
+          _: IllegalArgumentException | _: ClassCastException | _: IllegalStateException) =>
+        // These are some of the exceptions thrown by the data source API. We catch these
+        // exceptions here and rethrow QueryCompilationErrors.externalDataSourceException to
+        // provide a more friendly error message for the user. This list is not exhaustive.
+        throw QueryCompilationErrors.externalDataSourceException(Some(e))
+      case _: Throwable =>
+        // Skip other exceptions for now, as they may be handled by some other part of the code.

Review Comment:
   I don’t think we should ignore the other types of errors here.
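   A minimal sketch of the alternative (the helper name `withFriendlyError` is hypothetical; it assumes the same imports already in scope in this file): translate only the known exception types and let everything else propagate unchanged.
   ```scala
   // Hypothetical helper: wraps a data source call, converting known
   // data source API failures while rethrowing everything else.
   private def withFriendlyError[T](body: => T): T =
     try body catch {
       case e @ (_: NullPointerException | _: MatchError | _: ArrayIndexOutOfBoundsException |
           _: IllegalArgumentException | _: ClassCastException | _: IllegalStateException) =>
         throw QueryCompilationErrors.externalDataSourceException(Some(e))
       // No `case _: Throwable` arm: unexpected errors are rethrown by default.
     }

   // Usage inside run():
   //   val relation = withFriendlyError {
   //     dataSource.createRelation(
   //       sparkSession.sqlContext, mode, options, Dataset.ofRows(sparkSession, query))
   //   }
   ```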



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

