tgravescs commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-854716584
> But if we still have to introduce many invading changes (e.g., the mapping) even if reusing the stage level scheduling, I think we should revisit our decision.

I agree with this; it's a matter of coming up with the right design to solve the problem and possibly others (in the case of the plugin). If we discuss alternatives that become too complex, we should drop them. But we should have the discussion, like we are.

> BTW, it seems not possible for end users to specify the resource request by themselves as streaming uses the DataFrame API and StateStoreRDD hides from it.

If the user can't specify it themselves with the stage-level API, are you saying Spark would do it internally for the user?

> There is assertion that dynamic allocation should be enabled under stage-level scheduling. I mean if we remove such assertion, will it affect normal cases of stage-level scheduling?

We can relax the requirement if something like this is specified. If we were to allow new ResourceProfiles to fit into existing containers, the requirement would be relaxed for that case as well. We just need to make sure it's clear to the user, so jobs don't hang waiting for containers they will never get.

> I’m thinking about how to establish the “mapping” with stage level scheduling. My current idea is: Add a new type of task location, e.g., StateStoreTaskLocation(host, executorId, StateStoreProviderId), and let BaseStateStoreRDD.getPreferredLocations returns it in string. Then, the TaskSetManager could establish the “mapping” while building the pending task list:

So essentially this is extending the locality feature, and then the only thing you would need in the stage-level scheduling API is the ability to say "use this new locality algorithm for this stage"?
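
For readers following the thread, the quoted task-location idea can be modeled roughly as below. This is a minimal, language-neutral sketch in Python, not Spark code: the `statestore_` tag, the underscore-separated string layout, and the helpers `parse_location` and `build_pending_by_executor` are all hypothetical, loosely modeled on the way Spark's existing executor-cache locations are encoded as `executor_<host>_<executorId>` strings. It assumes that host names and executor IDs contain no underscores.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical prefix; Spark itself defines no such tag.
STATE_STORE_TAG = "statestore_"


@dataclass(frozen=True)
class StateStoreTaskLocation:
    """Illustrative stand-in for the proposed StateStoreTaskLocation."""
    host: str
    executor_id: str
    provider_id: str  # stands in for StateStoreProviderId

    def encode(self) -> str:
        # The string form a getPreferredLocations-style method could return.
        return f"{STATE_STORE_TAG}{self.host}_{self.executor_id}_{self.provider_id}"


def parse_location(s: str) -> Optional[StateStoreTaskLocation]:
    """Decode a location string; return None for any other kind of location."""
    if not s.startswith(STATE_STORE_TAG):
        return None
    # Split at most twice so the provider id may itself contain underscores.
    parts = s[len(STATE_STORE_TAG):].split("_", 2)
    if len(parts) != 3:
        return None
    return StateStoreTaskLocation(*parts)


def build_pending_by_executor(preferred: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Group task indices by the executor that holds their state store.

    `preferred` maps a task index to its preferred-location strings, which is
    roughly the information a scheduler sees when building pending task lists.
    """
    pending: Dict[str, List[int]] = defaultdict(list)
    for task_index, locations in sorted(preferred.items()):
        for s in locations:
            loc = parse_location(s)
            if loc is not None:
                pending[loc.executor_id].append(task_index)
    return dict(pending)
```

In this model, tasks whose preferred locations carry no state-store tag simply stay out of the mapping, so ordinary locality handling would be unaffected; whether the real scheduler change could be kept that isolated is exactly the open question in the discussion.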
