Ngone51 commented on issue #22163: [SPARK-25166][CORE]Reduce the number of
write operations for shuffle write.
URL: https://github.com/apache/spark/pull/22163#issuecomment-584509149
After taking another detailed look at this, I feel that this change may not
bring the expected performance improvement and is unnecessary.
Before this PR, we copy only one record at a time from `recordPage` to
`writeBuffer` (even if there is free space remaining for following records) and
call `DiskBlockObjectWriter.write()` after each copy. This PR changes it to
copy multiple records at a time until there is no free space left in
`writeBuffer`, and only then call `DiskBlockObjectWriter.write()` to write
those records in a single batch.
So it looks like this PR tries to reduce the number of invocations of
`DiskBlockObjectWriter.write()` and thereby, I guess, reduce I/O operations.
But please note that `DiskBlockObjectWriter` is itself already backed by a
buffer (which is much bigger than `writeBuffer`) in its
`BufferedOutputStream`. So it's unnecessary for us to duplicate that work on
top of `DiskBlockObjectWriter`.
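To illustrate the point, here is a minimal, self-contained sketch (not Spark code; the class names, record sizes, and buffer sizes are hypothetical) showing that a `BufferedOutputStream` coalesces many small `write()` calls exactly as well as pre-batched larger ones, so the number of writes reaching the underlying stream is the same either way:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferDemo {
    // Counts how many write() calls actually reach the underlying "disk" stream.
    static class CountingOutputStream extends OutputStream {
        int writeCalls = 0;
        @Override public void write(int b) { writeCalls++; }
        @Override public void write(byte[] b, int off, int len) { writeCalls++; }
    }

    // Writes `records` records of `recordSize` bytes through a 32 KB
    // BufferedOutputStream, batching `perBatch` records per write() call,
    // and returns how many writes reached the underlying stream.
    static int underlyingWrites(int recordSize, int records, int perBatch)
            throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink, 32 * 1024)) {
            byte[] chunk = new byte[recordSize * perBatch];
            for (int i = 0; i < records; i += perBatch) {
                out.write(chunk);
            }
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) throws IOException {
        // One write per 16-byte record vs. pre-batched writes of 256 records:
        int perRecord = underlyingWrites(16, 1024, 1);
        int batched   = underlyingWrites(16, 1024, 256);
        // The buffered stream coalesces both into the same number of flushes
        // to the underlying stream, so pre-batching saves no I/O calls here.
        System.out.println(perRecord + " " + batched);
    }
}
```

Under this sketch's assumptions, the extra user-level batching only reduces calls into the buffered stream (cheap in-memory copies), not calls out of it to the actual I/O layer, which is where the cost lives.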
Any thoughts? @10110346 @kiszk @cloud-fan @maropu