zsxwing commented on a change in pull request #26312:
URL: https://github.com/apache/spark/pull/26312#discussion_r505975042
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
##########
@@ -281,6 +281,10 @@ object FileFormatWriter extends Logging {
} catch {
case e: FetchFailedException =>
throw e
+ case f: FileAlreadyExistsException =>
Review comment:
> If users really want to retry, I believe it is to catch `FileAlreadyExistsException` and do retry in user code with changing filename, instead of relying on Spark to retry it.
This applies to any other exception as well. Of course, users can retry themselves. But since Spark provides task retry, a user may hit a rare race that causes `FileAlreadyExistsException` and decide to rely on Spark task retry to handle it. In addition, some users may just be calling a library or a service that they cannot change.
Moreover, this is a behavior change and it may break user applications that rely on task retry for `FileAlreadyExistsException`, even though it's documented.
IIUC, this PR is just trying to fail fast when hitting `FileAlreadyExistsException` in some rare case (SPARK-27194). Please correct me if I'm wrong. If so, it's not worth making a behavior change, in my opinion.
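To illustrate the user-side alternative mentioned in the quote above, here is a minimal sketch of catching `FileAlreadyExistsException` and retrying with a changed filename in user code. It uses `java.nio.file.FileAlreadyExistsException` to stay self-contained; the names `writeOnce` and `withRetry` are hypothetical helpers, not Spark APIs, and a real output committer would of course involve more than `Files.createFile`.

```scala
import java.nio.file.{Files, Paths, FileAlreadyExistsException}

object UserRetrySketch {
  // Simulated single write attempt: fails if the target path already exists.
  // (Stands in for whatever write the user application performs.)
  def writeOnce(dir: String, name: String): String = {
    val p = Paths.get(dir, name)
    if (Files.exists(p)) throw new FileAlreadyExistsException(p.toString)
    Files.createFile(p)
    p.toString
  }

  // Retry in user code with a changed filename, instead of relying on
  // Spark task retry: each attempt appends a ".retryN" suffix.
  def withRetry(dir: String, base: String, maxAttempts: Int = 3): String = {
    var attempt = 0
    while (true) {
      val name = if (attempt == 0) base else s"$base.retry$attempt"
      try {
        return writeOnce(dir, name)
      } catch {
        case _: FileAlreadyExistsException if attempt < maxAttempts - 1 =>
          attempt += 1 // filename taken: change it and try again
      }
    }
    sys.error("unreachable")
  }
}
```

The point of the sketch is that the retry policy (and the filename change that makes the retry useful) lives in the application, where the user can decide how to resolve the collision, rather than in Spark's task-retry machinery.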
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.