HeartSaVioR commented on pull request #30812:
URL: https://github.com/apache/spark/pull/30812#issuecomment-750544363


   I have to agree that this approach isn't an ideal one, as the logic is blind to
executors' status and just tries to distribute stateful tasks evenly across
executors. (And there's no guarantee the task scheduler follows the requests.)
The logic probably needs to be updated to reflect the actual state distribution
in every batch, as it can't know how the task scheduler finally makes the
decision.
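   To make the "blind" even distribution concrete, here is a minimal Python
sketch of round-robin assignment of stateful partitions to executors. It is an
illustration only, not the code in PR #30812: the function name
assign_evenly and the executor IDs are hypothetical. Note that it treats every
executor as equally capable and produces only a request; the task scheduler may
still place a task elsewhere.

```python
from collections import Counter
from typing import Dict, List


def assign_evenly(num_partitions: int, executors: List[str]) -> Dict[int, str]:
    """Hypothetical helper: map each stateful partition to an executor, round-robin.

    Every executor is treated as equally capable; executor load, speed, and the
    task scheduler's final placement decision are all ignored.
    """
    if not executors:
        raise ValueError("need at least one executor")
    ordered = sorted(executors)  # deterministic order so batches agree
    return {p: ordered[p % len(ordered)] for p in range(num_partitions)}


if __name__ == "__main__":
    plan = assign_evenly(5, ["exec-2", "exec-1"])
    print(plan)                    # partition -> requested executor
    print(Counter(plan.values()))  # tasks per executor, regardless of capacity
```

   Because the plan is computed without feedback from the scheduler, a fast and
a slow executor receive the same share, which is exactly the limitation above.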
   
   But I also have to agree that there don't seem to be other feasible
approaches without making a major change.
   
   > Ideally, we should let the Spark task scheduler to do its work rather than 
doing the task scheduling work in SS because we don't have the full context of 
the executors. For example, this PR has to assume each executor has the same 
capability, while the task scheduler knows more about slow and fast executors.
   
   The same applies to the task scheduler. The task scheduler doesn't have the
full context of SS characteristics (preferred locations are only hints, not
enforced), and given that the cost of reloading state (retrieving the files
from the remote file system, decompressing them, and loading them into memory)
is not trivial compared to the ideal micro-batch execution time, locality is no
longer just guidance.
   
   Probably the point here is how we view the cost: whether it's negligible
compared to the actual execution or not. The Kafka data source has the same
characteristic (the Kafka client and unread fetched data are cached in the
executor), but it is probably less costly there. With large state, the cost is
no longer negligible.
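   The "negligible or not" question can be phrased as a simple ratio: the state
reload cost (fetch + decompress + load) divided by the expected micro-batch
execution time. The Python sketch below is a toy model under stated
assumptions: the class StateReloadCost, the function locality_matters, and the
10% default threshold are all hypothetical, and the caller supplies the
timings; nothing here is measured Spark behavior.

```python
from dataclasses import dataclass


@dataclass
class StateReloadCost:
    """Hypothetical breakdown (in seconds) of reloading one partition's state
    on an executor that doesn't already hold it."""
    fetch_s: float       # retrieving the state files from the remote file system
    decompress_s: float  # decompressing them
    load_s: float        # loading the state into memory

    @property
    def total_s(self) -> float:
        return self.fetch_s + self.decompress_s + self.load_s


def locality_matters(cost: StateReloadCost, batch_time_s: float,
                     threshold: float = 0.1) -> bool:
    """Return True when a reload would add more than `threshold` (an assumed
    fraction) on top of the expected micro-batch execution time."""
    if batch_time_s <= 0:
        raise ValueError("batch_time_s must be positive")
    return cost.total_s / batch_time_s > threshold
```

   A small state (or a long batch) keeps the ratio low, which matches the Kafka
case; a large state drives the ratio up, which is where locality stops being
optional.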
   
   If we want to draw the ideal picture here, IMO it is to pin executors and
force them to serve these stateful tasks for the lifetime of the query. That
would guarantee these stateful tasks never have to reload state unless there is
a crash. It would not be ideal, though, if the application runs multiple
queries mixing batch and streaming, and the streaming queries have longer
trigger intervals and hence a chance to be idle: either a query would have to
wait to be assigned to the executor, or the executor would have to be allowed
to sit idle for the query.
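   The "pin executors for the lifetime of the query" idea can be sketched as a
sticky assignment: a partition keeps its executor across batches and moves only
when that executor is gone (e.g. after a crash). This is a minimal Python
illustration, not a Spark API; pin_assignments and the least-loaded fallback
are hypothetical choices made for the example.

```python
from typing import Dict, List


def pin_assignments(previous: Dict[int, str], live_executors: List[str],
                    num_partitions: int) -> Dict[int, str]:
    """Hypothetical sticky assignment: keep each stateful partition on the
    executor it was pinned to; reassign it only if that executor is gone."""
    if not live_executors:
        raise ValueError("need at least one live executor")
    live = set(live_executors)
    # Keep every pin whose executor is still alive (no state reload needed).
    result = {p: e for p, e in previous.items() if e in live and p < num_partitions}
    load = {e: 0 for e in sorted(live)}
    for e in result.values():
        load[e] += 1
    # Orphaned (or new) partitions go to the least-loaded live executor;
    # ties are broken by executor ID so the result is deterministic.
    for p in range(num_partitions):
        if p not in result:
            target = min(load, key=lambda e: (load[e], e))
            result[p] = target
            load[target] += 1
    return result
```

   Only the partitions that lost their executor pay the reload cost. The
trade-off from the paragraph above is visible too: a pinned executor stays
reserved for its partitions even while the query sits idle between triggers.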

