turboFei edited a comment on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-653996517
Just left some comments. This PR does resolve the issue, but it also involves some costs. In this PR, for dynamic partition overwrite mode, each task might create multiple partition paths under a unique task attempt output path. In fact, dynamic partition overwrite always causes too many small files if the user does not repartition by the dynamic partition columns. So I am afraid that this PR might create lots of directories at runtime. I prefer #28989; in that PR, I define a Spark staging output committer based on the current implementation of HadoopMapReduceCommitProtocol.
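To make the small-files concern concrete, here is a toy model in plain Python. It is not Spark code, and it assumes each task writes exactly one file (and so touches one partition directory) per distinct partition value it sees. The function name `count_output_files`, the `dt=...` values, and the task count are all hypothetical and chosen only for illustration.

```python
import zlib
from collections import defaultdict


def count_output_files(rows, num_tasks, repartition_by_key):
    """Count files written under the assumption of one file per
    (task, partition value) pair.

    rows: list of partition-column values, one entry per record.
    """
    values_seen_by_task = defaultdict(set)
    for i, part_value in enumerate(rows):
        if repartition_by_key:
            # Hash-partition on the partition column: every record with the
            # same value lands in the same task.
            task_id = zlib.crc32(part_value.encode()) % num_tasks
        else:
            # No repartition: records are spread across tasks regardless
            # of their partition value (round-robin here).
            task_id = i % num_tasks
        values_seen_by_task[task_id].add(part_value)
    return sum(len(values) for values in values_seen_by_task.values())


# 1000 records over 10 hypothetical daily partitions, written by 8 tasks.
rows = [f"dt=2020-07-{day:02d}" for day in range(1, 11)] * 100
files_without_repartition = count_output_files(rows, 8, repartition_by_key=False)
files_with_repartition = count_output_files(rows, 8, repartition_by_key=True)
print(files_without_repartition, files_with_repartition)
```

In this model, repartitioning by the partition column caps the file count at the number of distinct partition values, while the unrepartitioned write grows with the number of tasks. With a per-task-attempt output path, the same multiplier would apply to directories, which is the cost I am worried about.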