viirya commented on pull request #30812: URL: https://github.com/apache/spark/pull/30812#issuecomment-749227210
> Can you explain a little why Spark cannot distribute the tasks evenly in the cluster? It would help me understand why this is not a problem for general tasks.

I ran some streaming queries with stateful operations recently. When the first batch takes its payload from the latest offsets, that batch can finish very quickly. An executor may then be assigned more than one task, because it finishes its previous task quickly and becomes available again. Generally this is not a problem: it doesn't matter whether the tasks are evenly distributed, because they all finish very quickly anyway. But for SS stateful tasks, subsequent batches choose task locations based on the previous batch, so the initial skew is carried forward. That is why this is an issue only for SS.
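The feedback loop described above can be sketched with a toy scheduler (hypothetical Python, not Spark code): a fast executor drains most of a tiny first batch, and every later stateful batch inherits that placement because tasks prefer the executor that holds their state. The executor names, speeds, and the `schedule_batch` helper are all illustrative assumptions, not Spark APIs.

```python
import heapq
from collections import Counter

def schedule_batch(num_tasks, executors, speed, preferred=None):
    """Toy scheduler, not Spark's. `speed[e]` is tasks/second.
    With `preferred` set (the stateful-batch case), each task goes back
    to the executor that ran it last batch, regardless of current load."""
    if preferred is not None:
        return list(preferred)
    # Min-heap of (time the executor frees up, executor name):
    # each task goes to whichever executor becomes available first.
    heap = [(0.0, e) for e in executors]
    heapq.heapify(heap)
    placement = []
    for _ in range(num_tasks):
        t, e = heapq.heappop(heap)
        placement.append(e)
        heapq.heappush(heap, (t + 1.0 / speed[e], e))
    return placement

executors = ["exec-1", "exec-2", "exec-3"]
# exec-1 churns through the near-empty first batch far faster than the others.
speed = {"exec-1": 10.0, "exec-2": 1.0, "exec-3": 1.0}

batch0 = schedule_batch(12, executors, speed)
batch1 = schedule_batch(12, executors, speed, preferred=batch0)

print(Counter(batch0))  # heavily skewed toward exec-1
print(batch1 == batch0)  # later batches lock in the same skewed placement
```

The point of the sketch is only the second line of output: once the stateful batches reuse the previous batch's locations, the accidental skew of the quick first batch never rebalances.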
