[GitHub] [spark] cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.

2020-02-11 Thread GitBox
cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of 
write operations for shuffle write.
URL: https://github.com/apache/spark/pull/22163#issuecomment-584696725
 
 
   Makes sense to me. @10110346, do you have any performance numbers?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.

2019-11-17 Thread GitBox
cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of 
write operations for shuffle write.
URL: https://github.com/apache/spark/pull/22163#issuecomment-554881072
 
 
   Any updates here? @Ngone51 can you take it over if it's inactive?





[GitHub] [spark] cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of write operations for shuffle write.

2019-09-23 Thread GitBox
cloud-fan commented on issue #22163: [SPARK-25166][CORE]Reduce the number of 
write operations for shuffle write.
URL: https://github.com/apache/spark/pull/22163#issuecomment-534013793
 
 
   > Currently, only one record is written to a buffer each time, which 
increases the number of copies.
   
   This is very confusing. If this were true, I don't think Spark shuffle could 
have reasonable performance.
   
   Looking at the code, it seems that what you are trying to do is avoid 
flushing the buffer to disk when a new partition is seen. We can keep writing 
to the buffer while it is not full, even if we hit a new partition. Can you 
update the PR description to make this clearer?
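
   To make the distinction concrete, here is a minimal, self-contained sketch of the two flushing policies described above. It is not Spark code: the class `PartitionFlushSketch`, the method `countWrites`, and the fixed record size are all hypothetical, chosen only to count how many write calls reach the underlying sink. It also ignores the per-partition length bookkeeping that a real shuffle writer would presumably still need.

```java
import java.io.ByteArrayOutputStream;

public class PartitionFlushSketch {

    /**
     * Hypothetical helper: buffers fixed-size records that arrive sorted by
     * partition id and returns how many write calls reach the sink.
     * Assumes recordSize <= bufferSize.
     */
    public static int countWrites(int[] partitionIds, int recordSize,
                                  int bufferSize, boolean flushOnNewPartition) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the shuffle file
        byte[] buffer = new byte[bufferSize];
        byte[] record = new byte[recordSize];                     // dummy payload
        int pos = 0;
        int writes = 0;
        int currentPartition = -1;

        for (int partitionId : partitionIds) {
            boolean newPartition = partitionId != currentPartition;
            currentPartition = partitionId;

            boolean bufferFull = pos + recordSize > bufferSize;
            // Policy A flushes at every partition boundary; policy B only when full.
            if (pos > 0 && (bufferFull || (flushOnNewPartition && newPartition))) {
                sink.write(buffer, 0, pos);
                writes++;
                pos = 0;
            }
            System.arraycopy(record, 0, buffer, pos, recordSize);
            pos += recordSize;
        }
        if (pos > 0) {            // final flush of whatever is left in the buffer
            sink.write(buffer, 0, pos);
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        int[] ids = {0, 0, 1, 1, 2, 2, 3, 3};  // two 4-byte records per partition
        System.out.println(countWrites(ids, 4, 16, true));   // flush on each new partition: 4
        System.out.println(countWrites(ids, 4, 16, false));  // flush only when full: 2
    }
}
```

   With four small partitions and a 16-byte buffer, flushing at every partition boundary issues four writes, while flushing only when the buffer is full issues two. The gap grows with the number of small partitions, which is why performance numbers for realistic workloads would be useful.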

