tgravescs commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-821191952


   > IIUC, the plugin does not affect or change how the scheduler acts on 
barrier tasks. In the current dequeue logic, the scheduler doesn't have 
different behavior on barrier task/general task. For now if the scheduler 
cannot schedule all barrier tasks at once, it will reset the assigned resource 
offers. It is the same with the plugin.
   
   I didn't see an API for this. informScheduledTask is called at scheduling 
time, so my assumption is that the plugin may be keeping state about where 
tasks have gone, and if assignments are reset I would have expected an API call 
to let the plugin know. In TaskSchedulerImpl.resourceOffers the scheduler may 
have assigned some tasks, but if it doesn't get all the slots needed for the 
barrier stage it resets them. Maybe your intention is that the plugin doesn't 
keep state? Or is the intention that the API would just throw?
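   
   To make the concern concrete, here is a minimal sketch in plain Python (not 
Spark code) of how a stateful plugin could drift out of sync. Only 
informScheduledTask comes from the PR; `StatefulPlugin`, `inform_reset`, and 
`offer_round` are hypothetical names, and the toy scheduler is an assumption 
that only mimics the assign-then-reset behaviour described above.

```python
from collections import defaultdict


class StatefulPlugin:
    """Hypothetical plugin that remembers where tasks were scheduled."""

    def __init__(self):
        self.tasks_per_executor = defaultdict(int)

    def inform_scheduled_task(self, task_id, executor_id):
        # Stand-in for the PR's informScheduledTask callback.
        self.tasks_per_executor[executor_id] += 1

    def inform_reset(self, assignments):
        # Hypothetical callback: nothing like it is named in the discussion.
        for _task_id, executor_id in assignments:
            self.tasks_per_executor[executor_id] -= 1


def offer_round(plugin, barrier_tasks, free_executors, notify_on_reset):
    """Toy stand-in for one barrier scheduling round; returns what launched."""
    assignments = []
    for task_id, executor_id in zip(barrier_tasks, free_executors):
        plugin.inform_scheduled_task(task_id, executor_id)
        assignments.append((task_id, executor_id))
    if len(assignments) < len(barrier_tasks):
        # Not every barrier task got a slot, so the whole round is undone.
        if notify_on_reset:
            plugin.inform_reset(assignments)
        return []
    return assignments


# Three barrier tasks, two free executors: the round is always reset.
stale = StatefulPlugin()
offer_round(stale, ["t0", "t1", "t2"], ["exec-1", "exec-2"], notify_on_reset=False)
stale_counts = dict(stale.tasks_per_executor)

synced = StatefulPlugin()
offer_round(synced, ["t0", "t1", "t2"], ["exec-1", "exec-2"], notify_on_reset=True)
synced_counts = dict(synced.tasks_per_executor)
```

   Without a reset notification the plugin still believes one task is running 
on each executor even though nothing launched; with it, the counts return to 
zero.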
   
   > That is correct. However, even for not first micro-batch, we currently use 
preferred location + non-trivial locality config (e.g., 10h) to force Spark 
schedule tasks to previous locations. I think it is not flexible because 
locality is a global setting. A non-trivial locality config might cause 
sub-optimal result for other stages
   
   I don't completely understand this. Are you just saying that the locality 
setting is not specific enough? I kind of get the first micro-batch case, 
especially with something like dynamic allocation (is that the case here? you 
seem to hint at it in a comment above), but I don't understand the other 
cases. Have you tried the newer locality algorithm versus the old one?
   
   Does this come down to you really just wanting the scheduler to force an 
even distribution, after which locality should work? It seems like you are 
saying it needs more than that and locality isn't enough; I would like to 
understand why.
   
   > non-trivial locality config (e.g., 10h)
   
   I'm not sure what that means. Do you just mean it has more logic for 
figuring out the locality?
   
   Overall I'm fine with having some sort of plugin to allow people to 
experiment, but I also want it to be generic enough to cover the cases I 
mentioned, and I don't want it to cause problems where people shoot themselves 
in the foot and then complain that things aren't working. It would be nice to 
really understand this case to see whether the plugin is needed or whether 
something else can be improved for everyone's benefit.




