turboFei commented on issue #26159: [SPARK-29506][SQL] Use dynamicPartitionOverwrite in FileCommitProtocol when insert into hive table URL: https://github.com/apache/spark/pull/26159#issuecomment-547679404

@rezasafi @viirya Regarding the issue you mentioned: the root cause is that the output filename contains only the task ID, not the attempt ID. We also hit this when Spark speculation is enabled, because a task's output filename conflicts with that of its speculative attempt.

https://github.com/apache/spark/blob/077fb99a26a9e92104503fade25c0a095fec5e5d/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L108-L118

As the code shows, I think the same risk exists when the output committer is not a FileOutputCommitter, but I have not hit that case and I believe it is rare. Having read the comments under PR #24142, there does seem to be a risk in the non-FileOutputCommitter case: if a task is aborted and fails to clean up its output, the result can be duplicated. I opened PR #26086, which fixes the issue for dynamic partition overwrite only.
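To make the collision concrete, here is a minimal sketch (not Spark's actual implementation; the function names and the `-attempt` suffix are hypothetical) of filename generation with only the task's split/job IDs versus with the attempt ID included:

```python
# Hypothetical sketch: why a filename built from only the split and job IDs
# collides between a task and its speculative attempt, while including the
# attempt ID disambiguates them. Not Spark's real HadoopMapReduceCommitProtocol.

def filename_task_only(split: int, job_id: str, ext: str = "") -> str:
    # No attempt ID: both attempts of the same task map to the same name.
    return f"part-{split:05d}-{job_id}{ext}"

def filename_with_attempt(split: int, job_id: str, attempt: int,
                          ext: str = "") -> str:
    # Hypothetical variant: appending the attempt ID makes names unique
    # across a task and its speculative copy.
    return f"part-{split:05d}-{job_id}-attempt{attempt}{ext}"

# A task (attempt 0) and its speculative copy (attempt 1), both for split 3:
a = filename_task_only(3, "job-001")
b = filename_task_only(3, "job-001")
print(a == b)   # True: both attempts target the same file -> conflict

c = filename_with_attempt(3, "job-001", attempt=0)
d = filename_with_attempt(3, "job-001", attempt=1)
print(c == d)   # False: distinct names -> no conflict
```

This is only the naming half of the story; FileOutputCommitter additionally stages each attempt's output in a per-attempt directory, which is why the problem mainly surfaces for committers that don't do that staging.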
