turboFei edited a comment on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-653996517


   Just left some comments.
   
   This PR does resolve the issue, but it also involves some costs.
   With this PR, in dynamic partition overwrite mode, each task might create 
multiple partition paths under its own unique task attempt output path.
   In practice, dynamic partition overwrite already tends to produce too many 
small files if the user does not repartition by the dynamic partition columns.
   So I am afraid that this PR might create a large number of directories at 
runtime.
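
   As a rough illustration of the concern, here is a toy model (plain Python, 
not Spark code) that counts how many (task attempt, partition) directories 
would exist if every task creates one directory per partition value it sees. 
The function names, the 4 tasks, and the 8 partition values are all 
hypothetical and chosen only for illustration; the real layout depends on the 
committer.

```python
def count_partition_dirs(rows, num_tasks, assign_task):
    """Count distinct (task, partition) pairs, i.e. one directory per
    partition value that a given task writes to (toy model)."""
    dirs = set()
    for i, partition_value in enumerate(rows):
        task = assign_task(i, partition_value, num_tasks)
        dirs.add((task, partition_value))
    return len(dirs)


def round_robin(i, partition_value, num_tasks):
    # No repartition by the partition column: rows of one partition
    # end up spread across all tasks.
    return i % num_tasks


def by_partition_column(i, partition_value, num_tasks):
    # Repartition by the partition column: every row of a given
    # partition value is routed to the same task.
    return hash(partition_value) % num_tasks


# Hypothetical input: 8 partition values with 40 rows each.
partitions = [f"dt=2020-07-{d:02d}" for d in range(1, 9)]
rows = [p for p in partitions for _ in range(40)]

without_repartition = count_partition_dirs(rows, 4, round_robin)
with_repartition = count_partition_dirs(rows, 4, by_partition_column)
print(without_repartition, with_repartition)
```

   In this model the directory count grows as tasks x partitions without the 
repartition, and stays at the number of distinct partition values with it.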
   
   I prefer #28989. In that PR, I define a Spark staging output committer based 
on the current implementation of HadoopMapReduceCommitProtocol.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



