mridulm commented on PR #56055:
URL: https://github.com/apache/spark/pull/56055#issuecomment-4814595524

   > "The new ability is cross-stage gang scheduling, and that already implies 
a streaming shuffle"
   
   This is a fair extension to build, and if this is the modeling exercise, 
we should look at extending the existing gang scheduling construct. I 
started off asking about this 
[here](https://github.com/apache/spark/pull/56055#pullrequestreview-4405210859) 
:)
   
   > flows into the ShuffleDependency the exchange creates, and is read by the 
DAGScheduler at stage-creation time.
   
   This is not a hint from the scheduler's perspective.
   We should make it explicit, call it `StreamingShuffleDependency` or 
something similar, and define the `DAGScheduler` contract for how it handles 
such a DAG.
   
   After correctness validation (i.e., checking that the DAG is wired in one 
of the supported modes), this contract would determine when stages become 
eligible for scheduling and how they get scheduled (for example, @cloud-fan's 
articulation of cross-stage gang scheduling, or something similar), as well as 
straggler handling, failure handling, etc.
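   To make the shape of such a contract a bit more concrete, here is a minimal 
toy model in Scala. Everything in it is hypothetical: `Dep`, 
`BlockingShuffleDep`, `StreamingShuffleDep`, `Stage` and `StreamingDagContract` 
are illustration-only names, not Spark classes and not the `DAGScheduler` API. 
It assumes that stages linked by a streaming edge must run concurrently (one 
gang), and the two validation rules are placeholders for whatever the supported 
wiring modes end up being. Straggler and failure handling are left out.

   ```scala
   import scala.collection.mutable

   // Hypothetical toy model for illustration only; these are NOT Spark classes.
   sealed trait Dep { def parent: Int }
   final case class BlockingShuffleDep(parent: Int) extends Dep  // materialized shuffle
   final case class StreamingShuffleDep(parent: Int) extends Dep // explicit streaming edge

   final case class Stage(id: Int, deps: Seq[Dep])

   object StreamingDagContract {

     /** Correctness validation. Both rules are assumptions for this sketch:
       * every parent must exist, and a stage may not mix streaming and blocking parents. */
     def validate(stages: Seq[Stage]): Either[String, Unit] = {
       val ids = stages.map(_.id).toSet
       stages.collectFirst {
         case s if s.deps.exists(d => !ids.contains(d.parent)) =>
           s"stage ${s.id} references an unknown parent"
         case s if s.deps.exists(_.isInstanceOf[StreamingShuffleDep]) &&
                   s.deps.exists(_.isInstanceOf[BlockingShuffleDep]) =>
           s"stage ${s.id} mixes streaming and blocking parents"
       }.toLeft(())
     }

     /** Stages connected by streaming edges are grouped into one gang (union-find). */
     def gangs(stages: Seq[Stage]): Seq[Set[Int]] = {
       val root = mutable.Map.empty[Int, Int]
       stages.foreach(s => root(s.id) = s.id)
       def find(x: Int): Int =
         if (root(x) == x) x else { val r = find(root(x)); root(x) = r; r }
       for (s <- stages; p <- s.deps.collect { case StreamingShuffleDep(p) => p })
         root(find(s.id)) = find(p)
       stages.map(_.id).groupBy(find).values.map(_.toSet).toSeq
     }

     /** A not-yet-completed gang is eligible once every blocking parent of its members has completed. */
     def eligibleGangs(stages: Seq[Stage], completed: Set[Int]): Seq[Set[Int]] = {
       val byId = stages.map(s => s.id -> s).toMap
       gangs(stages).filter { g =>
         !g.exists(completed) &&
         g.forall(id => byId(id).deps.forall {
           case BlockingShuffleDep(p)  => completed(p)
           case StreamingShuffleDep(_) => true // producer is launched in the same gang
         })
       }
     }
   }
   ```

   For example, with stage 1 reading stage 0 through a blocking shuffle and stage 
2 streaming from stage 1, the gangs are `{0}` and `{1, 2}`; only `{0}` is 
eligible at first, and `{1, 2}` is launched together once stage 0 completes. 
Keeping the edge type explicit in the dependency (rather than a hint) is what 
lets validation and eligibility be stated as plain rules over the DAG.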
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to