jerrypeng commented on PR #56055:
URL: https://github.com/apache/spark/pull/56055#issuecomment-4599583040

   @mridulm — This change is not needed when the streaming query is a single 
stage: a single long-running (map) stage runs fine on the existing scheduler, 
which is exactly why RTM support for single-stage stateless queries already 
shipped in 4.1 — no scheduler change required there.
   
   However, multi-stage queries (e.g. stateful queries) are today executed one 
stage at a time, with each stage's shuffle output fully materialized before the 
next stage starts. To reach millisecond-level latencies, we instead need the 
stages of a single query to run concurrently, connected by the streaming 
shuffle (currently being merged in incrementally). Enabling concurrent 
execution of dependent stages within a single job is what requires the 
scheduler change — which is why this work is needed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to