cloud-fan commented on issue #22163: [SPARK-25166][CORE] Reduce the number of write operations for shuffle write.
URL: https://github.com/apache/spark/pull/22163#issuecomment-534013793

> Currently, only one record is written to a buffer each time, which increases the number of copies.

This is very confusing. If this were true, I don't think Spark shuffle could have reasonable performance. From the code, it seems that what you are trying to do is avoid flushing the buffer to disk when a new partition is encountered: we can keep writing to the buffer as long as it is not full, even when we hit a new partition. Can you update the PR description to make this clearer?
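To make the distinction concrete, here is a minimal sketch of the two flushing policies as I read them. It is not Spark's actual shuffle writer: the class `PartitionBufferSketch`, the method `countFlushes`, and the fixed record size are all hypothetical, and the model only counts how often a buffer would be flushed to the underlying sink.

```java
/**
 * Hypothetical model of a shuffle write buffer. It is NOT Spark code;
 * it only counts flushes under two policies for partition-sorted records.
 */
final class PartitionBufferSketch {

    /**
     * @param partitionIds           partition id of each record, in write order
     * @param recordSize             size of every record in bytes (assumed fixed)
     * @param bufferSize             capacity of the write buffer in bytes
     * @param flushOnPartitionChange true  = flush whenever the partition id changes,
     *                               false = flush only when the buffer cannot hold the next record
     * @return number of flushes issued to the underlying sink
     */
    static int countFlushes(int[] partitionIds, int recordSize, int bufferSize,
                            boolean flushOnPartitionChange) {
        if (recordSize <= 0 || recordSize > bufferSize) {
            throw new IllegalArgumentException("record must fit in the buffer");
        }
        int flushes = 0;
        int used = 0;                // bytes currently held in the buffer
        boolean first = true;
        int lastPartition = 0;

        for (int pid : partitionIds) {
            boolean partitionChanged = !first && pid != lastPartition;
            boolean bufferFull = used + recordSize > bufferSize;

            // Flush only if there is something buffered and the policy asks for it.
            if (used > 0 && (bufferFull || (flushOnPartitionChange && partitionChanged))) {
                flushes++;
                used = 0;
            }
            used += recordSize;      // stand-in for copying the record into the buffer
            lastPartition = pid;
            first = false;
        }
        if (used > 0) {
            flushes++;               // final flush of the remaining bytes
        }
        return flushes;
    }

    public static void main(String[] args) {
        int[] ids = {0, 0, 1, 1, 2, 2, 3, 3};
        System.out.println("flush on partition change: " + countFlushes(ids, 16, 64, true));
        System.out.println("flush only when full:      " + countFlushes(ids, 16, 64, false));
    }
}
```

With eight 16-byte records spread over four partitions and a 64-byte buffer, the first policy flushes once per partition (4 flushes) and the second flushes only when the buffer fills (2 flushes). If that is the saving the PR is after, the description should say so.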
