[GitHub] [spark] Ngone51 commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.

2020-02-10 Thread GitBox
URL: https://github.com/apache/spark/pull/22163#issuecomment-584509149
 
 
   After taking another detailed look at this, I feel that this change may
not bring the expected performance improvement and is unnecessary.
   
   Before this PR, we copy only one record from `recordPage` to
`writeBuffer` at a time (even if there's free space left for the following
records) and call `DiskBlockObjectWriter.write()` after each copy. This PR
changes it to keep copying records until there's no free space left in
`writeBuffer`, and then call `DiskBlockObjectWriter.write()` once to write
those records as a batch.
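   The two copy strategies described above can be modeled roughly as
follows. This is a simplified sketch, not the actual Spark code: records are
plain byte arrays rather than entries in `recordPage`, the buffer and record
sizes are arbitrary, and the hypothetical `CountingSink` stands in for
`DiskBlockObjectWriter` and only counts `write()` calls.

```java
import java.util.ArrayList;
import java.util.List;

class ShuffleCopySketch {

    /** Hypothetical stand-in for DiskBlockObjectWriter: only counts write() calls and bytes. */
    static final class CountingSink {
        int calls = 0;
        long bytes = 0;

        void write(byte[] buf, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Before the PR (as described): copy one record at a time, one write() per copied chunk. */
    static void perRecord(List<byte[]> records, byte[] writeBuffer, CountingSink sink) {
        for (byte[] record : records) {
            int offset = 0;
            // A record larger than writeBuffer is copied in several chunks.
            while (offset < record.length) {
                int toTransfer = Math.min(writeBuffer.length, record.length - offset);
                System.arraycopy(record, offset, writeBuffer, 0, toTransfer);
                sink.write(writeBuffer, 0, toTransfer);
                offset += toTransfer;
            }
        }
    }

    /** With the PR (as described): pack records until writeBuffer is full, then write once. */
    static void batched(List<byte[]> records, byte[] writeBuffer, CountingSink sink) {
        int filled = 0;
        for (byte[] record : records) {
            int offset = 0;
            while (offset < record.length) {
                int toTransfer = Math.min(writeBuffer.length - filled, record.length - offset);
                System.arraycopy(record, offset, writeBuffer, filled, toTransfer);
                filled += toTransfer;
                offset += toTransfer;
                if (filled == writeBuffer.length) { // buffer full: write it out as one batch
                    sink.write(writeBuffer, 0, filled);
                    filled = 0;
                }
            }
        }
        if (filled > 0) { // write out the partially filled tail
            sink.write(writeBuffer, 0, filled);
        }
    }

    public static void main(String[] args) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add(new byte[100]); // 1000 small records of 100 bytes each
        }

        CountingSink a = new CountingSink();
        perRecord(records, new byte[4096], a);
        CountingSink b = new CountingSink();
        batched(records, new byte[4096], b);

        System.out.println("per-record write() calls: " + a.calls);
        System.out.println("batched write() calls:    " + b.calls);
    }
}
```

   With 1,000 records of 100 bytes and a 4,096-byte `writeBuffer`, the
per-record loop issues 1,000 `write()` calls and the batched loop issues 25
(24 full buffers plus one partial buffer); both write the same 100,000 bytes.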
   
   So it looks like this PR tries to reduce the number of calls to
`DiskBlockObjectWriter.write()` and, I guess, to reduce I/O operations as a
result. But please note that `DiskBlockObjectWriter` is itself already backed
by a buffer (which is bigger than `writeBuffer`) in its
`BufferedOutputStream`. So it's unnecessary for us to duplicate that buffering
on top of `DiskBlockObjectWriter`.
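   To illustrate the point about `BufferedOutputStream`, here is a JDK-only
sketch (not Spark code). The hypothetical `CountingOutputStream` stands in for
the file stream underneath the buffer and counts how often it is actually
written to; the 64-byte buffer and 10-byte chunks are arbitrary sizes chosen
for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferCoalescingSketch {

    /** Hypothetical stand-in for the underlying file stream: counts real writes and bytes. */
    static final class CountingOutputStream extends OutputStream {
        int writeCalls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            writeCalls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
            bytes += len;
        }
    }

    /** Issues numWrites writes of chunkSize bytes through a BufferedOutputStream of bufferSize bytes. */
    static CountingOutputStream run(int bufferSize, int chunkSize, int numWrites) {
        CountingOutputStream underlying = new CountingOutputStream();
        byte[] chunk = new byte[chunkSize];
        // try-with-resources: close() flushes whatever is still held in the buffer.
        try (BufferedOutputStream buffered = new BufferedOutputStream(underlying, bufferSize)) {
            for (int i = 0; i < numWrites; i++) {
                buffered.write(chunk, 0, chunk.length); // small caller-side write
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return underlying;
    }

    public static void main(String[] args) {
        CountingOutputStream out = run(64, 10, 100);
        System.out.println("caller-side write() calls: 100");
        System.out.println("underlying write() calls:  " + out.writeCalls);
        System.out.println("bytes written:             " + out.bytes);
    }
}
```

   With these sizes, the 100 caller-side writes reach the underlying stream
as 17 writes (16 flushes of 60 bytes plus a final 40-byte flush on close),
which is the same kind of coalescing that the PR adds by hand.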
   
   Any thoughts? @10110346 @kiszk @cloud-fan @maropu 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.

2020-02-04 Thread GitBox
URL: https://github.com/apache/spark/pull/22163#issuecomment-582253898
 
 
   Thanks for pinging me. Let me try it in the coming days.

