HyukjinKwon opened a new pull request, #38613:
URL: https://github.com/apache/spark/pull/38613

   ### What changes were proposed in this pull request?
   
   This PR is a followup of https://github.com/apache/spark/pull/38468. It 
proposes to remove the notify-wait approach and to introduce a new way to 
collect partitions in parallel and send them in order.
   
   - Previously, it actually waited until all results were stored, and only 
then sent them one by one as Protobuf messages (therefore, notify-wait was 
not in fact needed).
   
       In both the worst and best cases, we always collected all partitions 
first and then sent them partition by partition.
   
   - Now, it sends Protobuf messages in order, starting as soon as the 0th 
partition is available (and then sends the next one once it is available).
   
       In the worst case, we collect all partitions and then send them one by 
one. In the best case, we send each partition as soon as it is collected.
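   
   The new behavior can be sketched roughly as follows. This is a minimal 
Python illustration of the idea only, not the code in this PR: 
`stream_partitions_in_order`, `collect_partition`, and `send` are hypothetical 
names, and a thread pool stands in for however partitions are actually 
collected in parallel.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def stream_partitions_in_order(num_partitions, collect_partition, send):
    """Collect partitions in parallel and send them in index order.

    Hypothetical helper for illustration; not the Spark Connect implementation.
    """
    buffered = {}     # partition index -> result, for partitions not yet sent
    next_to_send = 0  # index of the next partition that must go out

    with ThreadPoolExecutor() as pool:
        # Start collecting every partition in parallel.
        futures = {
            pool.submit(collect_partition, i): i for i in range(num_partitions)
        }
        # Handle partitions in whatever order they happen to finish.
        for future in as_completed(futures):
            buffered[futures[future]] = future.result()
            # Flush every consecutive partition that is already available.
            while next_to_send in buffered:
                send(next_to_send, buffered.pop(next_to_send))
                next_to_send += 1
```

   In this sketch only the consuming loop touches `buffered` and 
`next_to_send`, so no explicit wait/notify is needed. A partition that 
finishes early stays in `buffered` only until all earlier partitions have 
been sent.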
   
   
   ### Why are the changes needed?
   
   For better performance, lower memory usage, and better readability and 
maintainability (by removing synchronization).
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, this feature is not released yet, and this is a performance-only fix.
   
   ### How was this patch tested?
   
   CI in this PR should test it out.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

