zsxwing commented on a change in pull request #26312:
URL: https://github.com/apache/spark/pull/26312#discussion_r505975042



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
     } catch {
       case e: FetchFailedException =>
         throw e
+      case f: FileAlreadyExistsException =>

Review comment:
       > If users really want to retry, I believe it is to catch 
`FileAlreadyExistsException` and do retry in user code with changing filename, 
instead of relying on Spark to retry it.
   
   This applies to any other exceptions as well. Of course, users can retry. But 
since Spark provides task retry, users may hit a rare race that causes 
`FileAlreadyExistsException` and decide to rely on Spark task retry to handle 
it. In addition, some users may just be calling a library or a service to which 
they cannot make changes.
   
   Moreover, this is a behavior change and it may break user applications that 
rely on task retry for `FileAlreadyExistsException`, even though it's documented.
   
   IIUC, this PR seems to be just trying to fail fast when hitting 
`FileAlreadyExistsException` in some rare cases (SPARK-27194). Please correct 
me if I'm wrong. If so, it's not worth making a behavior change in my opinion.
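   To illustrate the user-side alternative quoted above (catching 
`FileAlreadyExistsException` and retrying with a changed filename), a minimal 
sketch could look like this. Note that `writeWithRetry` and its parameters are 
hypothetical helpers for illustration, not Spark API; `writeTo` stands in for 
whatever write call may throw:

```scala
import java.nio.file.FileAlreadyExistsException

// Hypothetical user-side retry loop: vary the target path on each
// attempt so a file left over from a previous attempt does not
// collide again. Returns the path that was successfully written.
def writeWithRetry(basePath: String, maxAttempts: Int)(writeTo: String => Unit): String = {
  var attempt = 0
  while (true) {
    val path = if (attempt == 0) basePath else s"$basePath.retry$attempt"
    try {
      writeTo(path)
      return path
    } catch {
      case _: FileAlreadyExistsException if attempt + 1 < maxAttempts =>
        attempt += 1 // retry under a different name
    }
  }
  basePath // unreachable; satisfies the declared return type
}
```

   This is the kind of retry a user can write today without any Spark change, 
which is part of the argument above that Spark's own task retry need not be 
bypassed for this exception.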
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
