[GitHub] [spark] cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.
cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.
URL: https://github.com/apache/spark/pull/22163#issuecomment-584696725

Makes sense to me. @10110346, do you have some performance numbers?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.
cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.
URL: https://github.com/apache/spark/pull/22163#issuecomment-554881072

Any updates here? @Ngone51, can you take it over if it's inactive?
[GitHub] [spark] cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.
cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.
URL: https://github.com/apache/spark/pull/22163#issuecomment-534013793

> Currently, only one record is written to a buffer each time, which increases the number of copies.

This is very confusing. If this were true, I don't think Spark shuffle could have reasonable performance. Looking at the code, it seems what you are trying to do is avoid flushing the buffer to disk when a new partition starts: we can keep writing to the buffer as long as it isn't full, even after hitting a new partition. Can you update the PR description to make this clearer?
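The buffering change described in the last comment can be illustrated with a toy Java model. It compares two flush policies for records that are already sorted by partition id: draining the buffer at every partition boundary, versus draining it only when it is full. Everything here (`BufferedShuffleSketch`, `CountingSink`, `writeSorted`, `flushOnNewPartition`) is hypothetical and is not Spark's actual shuffle writer code. It is a sketch that assumes a single fixed-size buffer and an in-memory "disk" that simply counts write calls.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical toy model, not Spark's shuffle writer: records arrive already
// sorted by partition id and are copied through one fixed-size buffer.
public class BufferedShuffleSketch {

    // Stand-in for the disk: counts write calls and keeps the bytes.
    static final class CountingSink {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int writeCalls = 0;

        void write(byte[] buf, int off, int len) {
            if (len == 0) {
                return; // nothing buffered, so no write call is issued
            }
            writeCalls++;
            bytes.write(buf, off, len);
        }
    }

    // Copies records into the buffer. With flushOnNewPartition = true the
    // buffer is drained at every partition boundary; with false it is drained
    // only when full (and once at the end). Returns bytes written per partition.
    static long[] writeSorted(int[] partitionIds, byte[][] records, int numPartitions,
                              int bufferSize, boolean flushOnNewPartition, CountingSink sink) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        int used = 0;
        int current = -1;
        long[] lengths = new long[numPartitions];
        for (int i = 0; i < records.length; i++) {
            int p = partitionIds[i];
            if (p != current) {
                if (flushOnNewPartition) {
                    sink.write(buffer, 0, used);
                    used = 0;
                }
                current = p;
            }
            byte[] rec = records[i];
            int pos = 0;
            while (pos < rec.length) {
                if (used == bufferSize) { // buffer full: drain it
                    sink.write(buffer, 0, used);
                    used = 0;
                }
                int n = Math.min(bufferSize - used, rec.length - pos);
                System.arraycopy(rec, pos, buffer, used, n);
                used += n;
                pos += n;
            }
            // Partition lengths are tracked separately, so the bytes written
            // are the same whichever flush policy is used.
            lengths[p] += rec.length;
        }
        sink.write(buffer, 0, used); // final flush
        return lengths;
    }

    public static void main(String[] args) {
        // 8 partitions with 4 records of 10 bytes each (320 bytes in total).
        int numPartitions = 8, perPartition = 4, recordSize = 10;
        int n = numPartitions * perPartition;
        int[] ids = new int[n];
        byte[][] recs = new byte[n][];
        for (int i = 0; i < n; i++) {
            ids[i] = i / perPartition;
            recs[i] = new byte[recordSize];
            Arrays.fill(recs[i], (byte) ids[i]);
        }
        CountingSink eager = new CountingSink();
        CountingSink lazy = new CountingSink();
        long[] a = writeSorted(ids, recs, numPartitions, 1024, true, eager);
        long[] b = writeSorted(ids, recs, numPartitions, 1024, false, lazy);
        System.out.println("flush on new partition: " + eager.writeCalls + " write calls");
        System.out.println("flush only when full:   " + lazy.writeCalls + " write calls");
        System.out.println("same bytes and lengths: "
                + (Arrays.equals(eager.bytes.toByteArray(), lazy.bytes.toByteArray())
                   && Arrays.equals(a, b)));
    }
}
```

With these inputs, `main` reports 8 write calls when the buffer is drained at each partition boundary and 1 when it is drained only when full, while the bytes and per-partition lengths are identical. A real shuffle writer also has to record where each partition's data begins in the output file; this sketch approximates that with the `lengths` array only.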