Ngone51 commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write. URL: https://github.com/apache/spark/pull/22163#issuecomment-584509149 After taking another detailed look at this, I feel that this change may not bring the expected performance improvement and is unnecessary. Before this PR, we copy only one record at a time from `recordPage` to `writeBuffer` (even if there is free space left for following records) and call `DiskBlockObjectWriter.write()` after each copy. This PR changes it to copy multiple records at a time until there is no free space left in `writeBuffer`, and then call `DiskBlockObjectWriter.write()` once to write those records in a batch. So it looks like this PR tries to reduce the number of invocations of `DiskBlockObjectWriter.write()` and thereby, I guess, the number of I/O operations. But please note that `DiskBlockObjectWriter` is itself already backed by a buffer (which is larger than `writeBuffer`) in its `BufferedOutputStream`. So it's unnecessary for us to duplicate that work on top of `DiskBlockObjectWriter`. Any thoughts? @10110346 @kiszk @cloud-fan @maropu
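To make the argument concrete, here is a minimal standalone sketch (not Spark code; `CountingSink`, `BufferDemo`, and the buffer/record sizes are all hypothetical choices for illustration) showing why an extra small staging buffer in front of a `BufferedOutputStream` does not reduce the number of writes reaching the underlying sink: the larger buffer coalesces small writes either way.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo: count how many write() calls reach the underlying sink
// when small records pass through a BufferedOutputStream, with and without
// an intermediate staging buffer (playing the role of writeBuffer).
public class BufferDemo {

    // Sink that counts each write() it receives, standing in for disk I/O.
    static class CountingSink extends OutputStream {
        int writeCalls = 0;
        @Override public void write(int b) { writeCalls++; }
        @Override public void write(byte[] b, int off, int len) { writeCalls++; }
    }

    // Write each record directly into the 32 KiB BufferedOutputStream.
    static int countDirect(byte[] record, int numRecords) throws IOException {
        CountingSink sink = new CountingSink();
        try (BufferedOutputStream out = new BufferedOutputStream(sink, 32 * 1024)) {
            for (int i = 0; i < numRecords; i++) {
                out.write(record, 0, record.length);
            }
        }
        return sink.writeCalls;
    }

    // Batch records into a small 4 KiB staging buffer first, then write each
    // full batch into the same 32 KiB BufferedOutputStream.
    static int countBatched(byte[] record, int numRecords) throws IOException {
        CountingSink sink = new CountingSink();
        try (BufferedOutputStream out = new BufferedOutputStream(sink, 32 * 1024)) {
            byte[] staging = new byte[4 * 1024];
            int pos = 0;
            for (int i = 0; i < numRecords; i++) {
                if (pos + record.length > staging.length) {
                    out.write(staging, 0, pos);  // flush the staged batch
                    pos = 0;
                }
                System.arraycopy(record, 0, staging, pos, record.length);
                pos += record.length;
            }
            if (pos > 0) out.write(staging, 0, pos);
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) throws IOException {
        byte[] record = new byte[64];  // a small shuffle record
        int n = 1024;                  // 64 KiB of data in total
        // Both paths hit the sink the same number of times: the extra copy
        // into the staging buffer saves no sink-level I/O.
        System.out.println("direct sink writes:  " + countDirect(record, n));
        System.out.println("batched sink writes: " + countBatched(record, n));
    }
}
```

The batched path still does one extra `System.arraycopy` per record, so under these assumptions it adds CPU work without cutting I/O calls, which is the point being made about layering a batch on top of `DiskBlockObjectWriter`'s existing buffer.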
