tgravescs commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-821191952
> IIUC, the plugin does not affect or change how the scheduler acts on barrier tasks. In the current dequeue logic, the scheduler doesn't have different behavior on barrier task/general task. For now if the scheduler cannot schedule all barrier tasks at once, it will reset the assigned resource offers. It is the same with the plugin.

I didn't see an API for this. informScheduledTask is called at scheduling time, so my assumption is that the plugin may be keeping state about where things have gone, and if things reset I would have expected an API call to let the plugin know. In TaskSchedulerImpl.resourceOffers it may have assigned some tasks, but if it doesn't get all the slots needed for the barrier stage it resets them. Maybe your intention is that the plugin doesn't keep state? Or is the intention that the API would just throw?

> That is correct. However, even for not first micro-batch, we currently use preferred location + non-trivial locality config (e.g., 10h) to force Spark schedule tasks to previous locations. I think it is not flexible because locality is a global setting. A non-trivial locality config might cause sub-optimal result for other stages

I don't completely understand this. Are you just saying that the locality is not specific enough? I kind of get the first micro-batch case, especially perhaps with dynamic allocation. Is that the case here? You seem to hint at it in a comment above, but I don't understand the other cases. Have you tried the newer locality algorithm versus the old one? Does this come down to you really just wanting the scheduler to force an even distribution, after which locality should work? It seems like you are saying it needs more than that and locality isn't enough; I would like to understand why.

> non-trivial locality config (e.g., 10h)

I'm not sure what that means. Do you just mean it has more logic in figuring out the locality?

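To make the barrier-reset question above concrete, here is a toy Python model, not Spark code. `inform_scheduled_task` is loosely modeled on the informScheduledTask hook named in this thread; `inform_reset`, `RecordingPlugin`, and `schedule_barrier_stage` are hypothetical names invented for illustration, and the real TaskSchedulerImpl.resourceOffers logic is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingPlugin:
    """Hypothetical plugin that remembers where tasks were placed."""
    placements: dict = field(default_factory=dict)

    def inform_scheduled_task(self, task_id, executor_id):
        # Loosely modeled on the informScheduledTask hook named in the thread.
        self.placements[task_id] = executor_id

    def inform_reset(self, task_ids):
        # Hypothetical callback; the thread only asks whether one should exist.
        for task_id in task_ids:
            self.placements.pop(task_id, None)


def schedule_barrier_stage(task_ids, free_executors, plugin, notify_on_reset):
    """Assign one task per free executor; a barrier stage is all-or-nothing."""
    assigned = []
    for task_id, executor_id in zip(task_ids, free_executors):
        plugin.inform_scheduled_task(task_id, executor_id)
        assigned.append(task_id)
    if len(assigned) < len(task_ids):
        # Not enough slots for every barrier task: undo the partial assignment.
        if notify_on_reset:
            plugin.inform_reset(assigned)
        return []
    return assigned


tasks = ["t0", "t1", "t2"]
executors = ["exec-1", "exec-2"]  # one slot short of what the barrier needs

silent = RecordingPlugin()
launched_silent = schedule_barrier_stage(tasks, executors, silent, notify_on_reset=False)

notified = RecordingPlugin()
launched_notified = schedule_barrier_stage(tasks, executors, notified, notify_on_reset=True)
```

In this sketch neither run launches anything, but `silent` still believes `t0` and `t1` were placed, while `notified` ends up with no recorded placements. That stale state is the concern for any plugin that keeps state across scheduling calls.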
Overall I'm fine with having some sort of plugin to allow people to experiment, but I also want it generic enough to cover the cases I mentioned, and I don't want it to cause problems where people can shoot themselves in the foot and then complain that things aren't working. It would be nice to really understand this case to see whether the plugin is needed or whether something else can be improved for everyone's benefit.
