tgravescs commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-818756850
I would like to see more overview and design details. I think the idea of having something here is good, because some people may want to cluster tasks while others might want to spread them. You might want to place them based on hardware or something else. I want to understand how flexible the plugin API you are proposing is.

I just saw https://docs.google.com/document/d/1wfEaAZA7t02P6uBH4F3NGuH_qjK5e4X05v1E5pWNhlQ/edit# which has a few details. It would be good to link it from the description.

Questions:

1) What happens with locality? It looks like this is plugged in after locality. Are you disabling locality, or does your use case not have any? If we create a plugin to choose location, I wouldn't necessarily want locality to take effect.

2) "@param tasks The full list of tasks" => is this all tasks, even those that are done? Would you want to know which ones are already running or which ones have succeeded?

3) This is being called from a synchronized block. At the very least we need to document this better, including the effect it could have on scheduling time.

4) It looks like your plugin runs before blacklisting. Is this really what we want, or would the plugin want to know about blacklisting to make a better decision?

5) How does this interact with barrier scheduling, which resets things if it doesn't get scheduled?

I would like to see how this applies to the other use cases I mentioned above before putting this in.
