[GitHub] [spark] turboFei edited a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

GitBox Sun, 05 Jul 2020 22:59:19 -0700


turboFei edited a comment on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-653996517



   Just left some comments.
   
   This PR does resolve the issue, but it also involves some costs.
   In this PR, in dynamic partition overwrite mode, each task might create 
multiple partition paths under a unique task attempt output path.
   In fact, dynamic partition overwrite always causes too many small files if 
the user does not repartition by the dynamic partition columns.
   So I am afraid that this PR might create lots of directories at runtime.
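
A minimal sketch of the small-files concern, in plain Python rather than any Spark API. It assumes each task writes one file (under one partition directory) per distinct partition value it holds. The function names, the five example dates, and the task count of 8 are hypothetical and chosen only for illustration.

```python
def count_output_files(task_rows):
    """Count files, assuming one file per distinct partition value per task."""
    return sum(len(set(rows)) for rows in task_rows)


def round_robin(partition_values, num_tasks):
    """Spread rows over tasks without regard to their partition value."""
    tasks = [[] for _ in range(num_tasks)]
    for i, value in enumerate(partition_values):
        tasks[i % num_tasks].append(value)
    return tasks


def repartition_by_value(partition_values, num_tasks):
    """Send all rows that share a partition value to the same task."""
    tasks = [[] for _ in range(num_tasks)]
    for value in partition_values:
        tasks[hash(value) % num_tasks].append(value)
    return tasks


# Hypothetical data: 200 rows spread over 5 dynamic partition values.
dates = ["2020-07-01", "2020-07-02", "2020-07-03", "2020-07-04", "2020-07-05"]
rows = [dates[i % len(dates)] for i in range(200)]

# Every one of the 8 tasks sees all 5 dates: 8 * 5 = 40 files.
files_without_repartition = count_output_files(round_robin(rows, 8))

# Each date lives in exactly one task: 5 files.
files_with_repartition = count_output_files(repartition_by_value(rows, 8))
```

Under these assumptions the file count without repartitioning grows with (number of tasks) x (partition values seen per task), and with a unique output path per task attempt the directory count would grow the same way.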
   
   I prefer #28989. In that PR, I define a Spark staging output committer 
based on the current implementation of HadoopMapReduceCommitProtocol.


