rezasafi commented on issue #26159: [SPARK-29506][SQL] Use dynamicPartitionOverwrite in FileCommitProtocol when insert into hive table
URL: https://github.com/apache/spark/pull/26159#issuecomment-547563204

I think the JIRA, and the discussion on the first PR that was created for it, describe the issue in more detail. Basically, the current dynamic partition overwrite implementation doesn't consider the case where a task may fail during the commit. If a container fails, the task that was running there will fail, but the data it wrote won't be cleaned up, and the rerun of the task will hit a FileAlreadyExistsException. The discussion under PR #24142 also debated a proposed solution because of a concern about duplicate data.
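To make the failure mode concrete, here is a minimal toy model in Python. It is not Spark's code: the function names, the staging layout, and the `part-00000` file name are all hypothetical, and Python's `FileExistsError` only stands in for Hadoop's FileAlreadyExistsException. The one assumption it relies on is that the output path depends on the task and not on the attempt, so a rerun targets the same file.

```python
import os
import tempfile


def write_task_output(staging_dir, partition, task_id, rows):
    """Write rows to a path derived from the task id only (hypothetical layout)."""
    part_dir = os.path.join(staging_dir, partition)
    os.makedirs(part_dir, exist_ok=True)
    path = os.path.join(part_dir, f"part-{task_id:05d}")
    # Mode "x" refuses to overwrite an existing file, like a create call
    # made with overwrite disabled.
    with open(path, "x") as f:
        f.write("\n".join(rows))
    return path


def run_attempt(staging_dir, partition, task_id, rows, fail_during_commit):
    """Run one attempt of a task; optionally die after writing, before commit."""
    path = write_task_output(staging_dir, partition, task_id, rows)
    if fail_during_commit:
        # Simulated container loss: nothing cleans up the file written above.
        raise RuntimeError("container lost during commit")
    return path


def simulate(staging_dir):
    """First attempt fails during commit; the rerun then hits the leftover file."""
    outcomes = []
    for fail_during_commit in (True, False):
        try:
            run_attempt(staging_dir, "dt=2019-10-29", 0, ["a", "b"], fail_during_commit)
            outcomes.append("committed")
        except RuntimeError:
            outcomes.append("failed")
        except FileExistsError:
            outcomes.append("file already exists")
    return outcomes


with tempfile.TemporaryDirectory() as staging:
    print(simulate(staging))  # ['failed', 'file already exists']
```

In this sketch the rerun can never succeed, because the leftover file from the failed attempt is still in place. Writing to a per-attempt path, or deleting the leftover before retrying, would avoid the collision, but whether either is safe with respect to duplicate data is exactly the concern raised in the earlier discussion.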
