Ngone51 commented on pull request #30650:
URL: https://github.com/apache/spark/pull/30650#issuecomment-767433788


   Hi @mridulm  @tgravescs sorry for the delay.
   
   After thinking it over, I believe we should just keep the original behavior for 
the barrier taskset with the legacy delay scheduling. That means we should 
still abort the taskset and throw an exception when tasks are partially 
launched in that case. 
   
   Consider a case under the legacy delay scheduling: say we have 2 tasks 
in a barrier taskset, where one task prefers executor-0 and the other has no 
preferred locations. Meanwhile, we only have the resources (executor-0, 
host-0) and (executor-1, host-1) in each resourceOffers round. Then, within each 
resourceOffers round, one task can always get scheduled at executor-0 first and 
**reset** the timer and the current locality level to PROCESS_LOCAL. And then, of 
course, the other task can get scheduled at PROCESS_LOCAL. And if we try the next 
resourceOffers round, we still cannot launch the whole taskset since we'd start 
from the locality level PROCESS_LOCAL again. Therefore, we'd never have a chance 
to get the barrier taskset launched.
   
   Non-legacy delay scheduling doesn't have this issue because it only resets 
the timer when all tasks get launched in a single resourceOffers round. That 
means the locality level will go up (from local to ANY) as time goes by until 
we launch the taskset successfully. 
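   
   To make the difference between the two reset policies concrete, here is a 
toy model in Python. It is only a sketch and not Spark's actual scheduler 
code: the function name, the three-level list, and the rule that the second 
task needs a level looser than PROCESS_LOCAL are all assumptions made for 
illustration.

```python
# Toy model only: this does NOT reproduce Spark's TaskSetManager logic.
# Level names mirror Spark's locality levels; everything else is hypothetical.
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "ANY"]


def rounds_until_barrier_launch(reset_on_partial_launch, max_rounds=10):
    """Return the 1-based round in which both tasks launch together, or None.

    Assumptions (for illustration only):
      - task A prefers executor-0 and launches in every round;
      - task B launches only once the allowed level is looser than
        PROCESS_LOCAL;
      - the allowed level relaxes by one step after each round unless
        the policy resets it.
    """
    level = 0  # index into LEVELS, starting at PROCESS_LOCAL
    for round_no in range(1, max_rounds + 1):
        launched_a = True        # executor-0 is offered in every round
        launched_b = level > 0   # assumed to need a looser level
        if launched_a and launched_b:
            return round_no      # whole barrier taskset launched in one round
        if reset_on_partial_launch and launched_a:
            level = 0            # legacy-style: any launch resets the level
        else:
            level = min(level + 1, len(LEVELS) - 1)  # level keeps relaxing
    return None


legacy = rounds_until_barrier_launch(reset_on_partial_launch=True)
non_legacy = rounds_until_barrier_launch(reset_on_partial_launch=False)
print("legacy:", legacy, "non-legacy:", non_legacy)
```

   Under these assumptions the legacy-style policy never gets past 
PROCESS_LOCAL, so the two tasks never launch in the same round, while the 
non-legacy-style policy relaxes the level and launches both in a later round.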


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


