viirya commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-857859603
> Yes. Thinking about it a bit more: we probably don't even need the stage-level scheduling API here. Once we know the "mapping", we can use it directly in `resourcesMeetTaskRequirements`. The "mapping" is effectively a hard-coded task requirement, so going through the stage-level scheduling API to specify that requirement looks redundant and unnecessary.

That makes sense. Let me rephrase it, and correct me if I misunderstand. Basically, we introduce a new task location, `StateStoreTaskLocation`. RDDs that use a state store return this kind of task location as their preferred locations. When `TaskSetManager` builds the pending task list, it can establish a mapping from these locations, e.g. between specific resources (such as PVCs) and tasks (i.e. state stores). `resourcesMeetTaskRequirements` then uses the mapping directly to schedule tasks.
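To make the idea concrete, here is a minimal sketch of the scheme described above. All names (`StateStoreTaskLocation`, `StateStoreScheduling`, the PVC identifiers, and the `resourcesMeetTaskRequirements`-style check) are illustrative assumptions, not the actual Spark API:

```scala
// Hypothetical task location identifying the state store a task should be
// co-located with (analogous to Spark's HDFSCacheTaskLocation string form).
case class StateStoreTaskLocation(host: String, stateStoreId: Int) {
  override def toString: String = s"state_store_${stateStoreId}_$host"
}

// Sketch of the mapping TaskSetManager could build from preferred locations:
// a specific resource (e.g. a PVC name) maps to the state store it holds.
object StateStoreScheduling {
  // pvcName -> stateStoreId; populated while building pending task lists.
  private val resourceToStore = scala.collection.mutable.Map.empty[String, Int]

  def register(pvcName: String, stateStoreId: Int): Unit =
    resourceToStore(pvcName) = stateStoreId

  // A resourcesMeetTaskRequirements-style check: the executor offer must
  // carry a resource backing the task's state store.
  def resourcesMeetTaskRequirements(
      executorPvcs: Seq[String],
      taskLocation: StateStoreTaskLocation): Boolean =
    executorPvcs.exists { pvc =>
      resourceToStore.get(pvc).contains(taskLocation.stateStoreId)
    }
}
```

With this mapping in place, the scheduler would simply reject offers from executors whose resources do not back the task's state store, without any per-stage resource profile.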
