Clarkkkkk opened a new pull request #26090: [SPARK-29302] Fix writing file 
collision in dynamic partition overwrite mode within speculative execution
URL: https://github.com/apache/spark/pull/26090
 
 
   ### What changes were proposed in this pull request?
   When inserting into a partitioned DataSource table (the issue does not 
reproduce with a Hive table) with dynamic partition overwrite and speculative 
execution enabled, multiple attempts of the same task try to write the same files.
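
   The collision can be illustrated with a small local-filesystem model. This is a 
simplified sketch, not Spark's code. It assumes that in dynamic partition overwrite 
mode each attempt writes directly into the staging directory, and that the file name 
is built from the task's split index and the job ID but not from the attempt number. 
The helpers `staged_file_path` and `write_attempt` are hypothetical. The refusal to 
overwrite is modelled with Python's exclusive-create mode (`"x"`).

```python
import os
import tempfile

def staged_file_path(staging_dir, partition, split, job_id, ext=".parquet"):
    # Hypothetical model of the staging path used under dynamic partition
    # overwrite: the attempt number does not appear anywhere in the path.
    name = f"part-{split:05d}-{job_id}{ext}"
    return os.path.join(staging_dir, partition, name)

def write_attempt(staging_dir, partition, split, job_id, attempt):
    path = staged_file_path(staging_dir, partition, split, job_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # "x" mode raises FileExistsError if the file already exists,
    # standing in for a create call that refuses to overwrite.
    with open(path, "x") as f:
        f.write(f"attempt {attempt}\n")
    return path

staging = tempfile.mkdtemp()
job_id = "job-123"
first = write_attempt(staging, "p=1", split=0, job_id=job_id, attempt=0)
try:
    # A speculative second attempt of the same task (same split index).
    write_attempt(staging, "p=1", split=0, job_id=job_id, attempt=1)
    collided = False
except FileExistsError:
    collided = True
print(collided)  # True
```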
   
   This PR reuses FileOutputCommitter, which gives each task attempt its own 
working directory, to avoid the write collision. It then renames the files from 
the staging directory to the final output directory using the original logic in 
HadoopMapReduceCommitProtocol#commitJob.
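
   The flow after the fix can be sketched with the same kind of local model. This is a 
hedged approximation, not the PR's implementation. Each attempt writes under its 
own hypothetical `_temporary/attempt_<task>_<attempt>` directory, which is a simplified 
stand-in for FileOutputCommitter's layout. Task commit moves only the winning attempt's 
files into the staging directory. Job commit then replaces each touched partition 
directory in the final output, mirroring the dynamic-overwrite rename step described 
above. All function names here are hypothetical.

```python
import os
import shutil
import tempfile

def attempt_dir(staging_dir, task_id, attempt):
    # Hypothetical per-attempt working directory (simplified layout).
    return os.path.join(staging_dir, "_temporary", f"attempt_{task_id}_{attempt}")

def write_attempt(staging_dir, task_id, attempt, partition, name, data):
    path = os.path.join(attempt_dir(staging_dir, task_id, attempt), partition, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "x") as f:  # no collision: each attempt has its own dir
        f.write(data)

def commit_task(staging_dir, task_id, attempt):
    # Move only the committed attempt's files into the staging directory.
    src_root = attempt_dir(staging_dir, task_id, attempt)
    for dirpath, _, files in os.walk(src_root):
        if not files:
            continue
        dst_dir = os.path.join(staging_dir, os.path.relpath(dirpath, src_root))
        os.makedirs(dst_dir, exist_ok=True)
        for fn in files:
            os.replace(os.path.join(dirpath, fn), os.path.join(dst_dir, fn))

def commit_job(staging_dir, output_dir, partitions):
    # Dynamic partition overwrite: replace each touched partition directory.
    for part in partitions:
        final = os.path.join(output_dir, part)
        if os.path.exists(final):
            shutil.rmtree(final)
        os.makedirs(os.path.dirname(final), exist_ok=True)
        os.rename(os.path.join(staging_dir, part), final)
    shutil.rmtree(staging_dir)  # also drops leftovers of losing attempts

root = tempfile.mkdtemp()
staging = os.path.join(root, ".spark-staging-job-123")
output = os.path.join(root, "table")
name = "part-00000-job-123.parquet"
# The original and speculative attempts of task 0 use the same file name.
write_attempt(staging, 0, 0, "p=1", name, "attempt 0")
write_attempt(staging, 0, 1, "p=1", name, "attempt 1")
commit_task(staging, 0, 1)  # suppose the speculative attempt wins
commit_job(staging, output, ["p=1"])
with open(os.path.join(output, "p=1", name)) as f:
    result = f.read()
print(result)  # attempt 1
```

Because isolation happens per attempt, the file-naming scheme itself does not need to change. Only one attempt's output ever reaches the staging directory.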
   
   
   ### Why are the changes needed?
   Without this change, tasks fail in this scenario.
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   This patch is tested by existing tests in 
org.apache.spark.sql.sources.InsertSuite.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
