[GitHub] [spark] viirya commented on pull request #30812: [SPARK-33814][SS] Provide preferred locations for stateful operations without reported state store locations

GitBox Mon, 21 Dec 2020 21:58:01 -0800


viirya commented on pull request #30812:
URL: https://github.com/apache/spark/pull/30812#issuecomment-749356702



   > I see. This makes sense. But why do we need to avoid this?
   > What cost did you mean? The execution memory used by states?
   > It would be great if you could explain your case and the issue you would 
like to solve in the PR description.
   
   To avoid skewed memory usage across executors. Yes, it is mainly about 
memory. For streaming queries that store large states, memory usage is heavy. 
I will update the PR description to make this clearer.
   
   > Ideally, we should let the Spark task scheduler do its work rather than 
doing the task scheduling work in SS because we don't have the full context of 
the executors. For example, this PR has to assume each executor has the same 
capability, while the task scheduler knows more about slow and fast executors.
   
   A preferred location doesn't replace the task scheduler; it is just a 
suggestion, and the task scheduler can choose to use it or not. For example, we 
already ask a later batch to schedule tasks on the same executors that stored 
the states in the previous batch. This is how preferred locations work, isn't 
it?
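
   To illustrate "a suggestion, not a replacement": the toy function below is 
a minimal sketch, not Spark's scheduler code. The name `pick_executor` and the 
free-slot map are hypothetical, and the real scheduler also weighs locality 
levels and wait times. It only shows that a preferred location is tried first 
and the scheduler remains free to place the task elsewhere.

```python
from typing import Dict, List, Optional


def pick_executor(preferred: List[str], free_slots: Dict[str, int]) -> Optional[str]:
    """Toy scheduler (hypothetical): treat preferred locations as hints only."""
    # Honor the hint when a preferred executor has a free slot.
    for executor in preferred:
        if free_slots.get(executor, 0) > 0:
            return executor
    # Otherwise the scheduler decides on its own: here, the executor with
    # the most free slots (ties broken by name for determinism).
    candidates = [e for e, n in free_slots.items() if n > 0]
    if not candidates:
        return None  # nothing is free; the task has to wait
    return max(candidates, key=lambda e: (free_slots[e], e))
```

   With `preferred=["exec-1"]`, the task lands on `exec-1` while it has a free 
slot and on some other executor once it does not.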
   
   This PR doesn't assume anything about executor capacity. It only suggests 
that the task scheduler distribute stateful tasks evenly across executors 
where possible, when no state store location is available.
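
   A rough sketch of the even-distribution idea, under stated assumptions: 
`preferred_executors`, its arguments, and the least-loaded tie-breaking are 
all hypothetical and are not taken from the PR's code. Partitions with a 
reported state store location keep it; the rest are spread over the known 
executors so that no single executor is suggested for most of the state.

```python
from typing import Dict, List, Optional


def preferred_executors(
    num_partitions: int,
    reported: Dict[int, str],   # partition id -> executor holding its state
    executors: List[str],       # currently known executors
) -> Dict[int, Optional[str]]:
    """Hypothetical helper: suggest one executor per stateful partition."""
    prefs: Dict[int, Optional[str]] = {}
    load = {e: 0 for e in executors}

    # Partitions with a reported state store location keep that location.
    for p in range(num_partitions):
        if p in reported:
            prefs[p] = reported[p]
            load[reported[p]] = load.get(reported[p], 0) + 1

    # Remaining partitions go to the least-loaded known executor
    # (ties broken by name), which spreads them evenly.
    for p in range(num_partitions):
        if p in prefs:
            continue
        if not executors:
            prefs[p] = None  # no suggestion; the scheduler decides freely
            continue
        target = min(executors, key=lambda e: (load.get(e, 0), e))
        prefs[p] = target
        load[target] += 1
    return prefs
```

   The result is still only a set of suggestions: the task scheduler can 
ignore any of them, for instance when a suggested executor is slow or busy.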
   
   
   

