tgravescs commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-854716584


   > But if we still have to introduce many invading changes (e.g., the 
mapping) even if reusing the stage level scheduling, I think we should revisit 
our decision.
   
   I agree with this. It's a matter of coming up with the right design to solve 
the problem and possibly others (in the case of the plugin).  If the 
alternatives we discuss become too complex, we should drop them, but we should 
have the discussion, as we are doing now.
   
   > BTW, it seems not possible for end users to specify the resource request 
by themselves as streaming uses the DataFrame API and StateStoreRDD hides from 
it.
   
   If users can't specify it themselves with the stage-level API, are you 
saying Spark would do it internally for the user?
   
   > There is assertion that dynamic allocation should be enabled under 
stage-level scheduling. I mean if we remove such assertion, will it affect 
normal cases of stage-level scheduling?
   
   We can relax the requirement if something like this is specified. If we were 
to allow new ResourceProfiles to fit into existing containers, that 
requirement would be relaxed for that case as well.  We just need to make sure 
it's clear to users, so jobs don't hang waiting for containers they will never 
get.
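
   As a rough illustration of the "fail fast instead of hanging" point, here is 
a minimal sketch. It is not Spark's actual check; the function name, the 
`allow_reuse_existing_executors` flag, and the dictionary-based resource model 
are all hypothetical and exist only for this example.

```python
from typing import Dict


def check_profile_supported(
    dynamic_allocation_enabled: bool,
    allow_reuse_existing_executors: bool,  # hypothetical opt-in flag
    profile_request: Dict[str, int],       # resources the new profile asks for
    executor_resources: Dict[str, int],    # resources of the existing executors
) -> None:
    """Raise early when a profile could never be satisfied (illustrative only)."""
    # With dynamic allocation, new executors can be requested for the profile.
    if dynamic_allocation_enabled:
        return

    # Without dynamic allocation, the user must opt in to reusing executors.
    if not allow_reuse_existing_executors:
        raise ValueError(
            "Profile needs dynamic allocation or an explicit opt-in to reuse "
            "existing executors"
        )

    # The profile must fit into what the existing executors already provide.
    missing = {
        name: amount
        for name, amount in profile_request.items()
        if executor_resources.get(name, 0) < amount
    }
    if missing:
        raise ValueError(
            f"Profile can never be scheduled on existing executors: {missing}"
        )
```

   The point of the sketch is that the error surfaces at submission time with a 
clear message, rather than the job waiting on containers that never arrive.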
   
   > I’m thinking about how to establish the “mapping” with stage level 
scheduling. My current idea is:
   Add a new type of task location, e.g., StateStoreTaskLocation(host, 
executorId, StateStoreProviderId), and let 
BaseStateStoreRDD.getPreferredLocations return it as a string. Then the 
TaskSetManager could establish the “mapping” while building the pending task 
list:
   
   So essentially this extends the locality feature, and the only thing you 
would need in the stage-level scheduling API is the ability to say "use this 
new locality algorithm for this stage"?
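
   To make sure I am reading the proposal correctly, here is a small sketch of 
how I understand it. This is not code from the PR and does not reproduce 
Spark's real TaskLocation string formats. The `statestore_` tag, the underscore 
separator (which assumes host and executor id contain no underscores), and the 
helper names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical prefix marking a state-store-aware location string.
STATE_STORE_TAG = "statestore_"


@dataclass(frozen=True)
class StateStoreTaskLocation:
    host: str
    executor_id: str
    provider_id: str

    def to_location_string(self) -> str:
        # What a getPreferredLocations-style method would return.
        return f"{STATE_STORE_TAG}{self.host}_{self.executor_id}_{self.provider_id}"


def parse_location(s: str) -> Optional[StateStoreTaskLocation]:
    """Return a StateStoreTaskLocation, or None for any other location string."""
    if not s.startswith(STATE_STORE_TAG):
        return None
    # maxsplit=2 lets the provider id itself contain underscores.
    parts = s[len(STATE_STORE_TAG):].split("_", 2)
    if len(parts) != 3:
        return None
    return StateStoreTaskLocation(*parts)


def build_pending_tasks(
    preferred_locations: Dict[int, List[str]],
) -> Tuple[Dict[str, List[int]], Dict[str, str]]:
    """Build per-executor pending task lists and the store-to-executor mapping."""
    pending_by_executor: Dict[str, List[int]] = defaultdict(list)
    store_to_executor: Dict[str, str] = {}
    for task_index, locations in preferred_locations.items():
        for location_string in locations:
            loc = parse_location(location_string)
            if loc is None:
                continue  # plain host/executor locations are handled elsewhere
            pending_by_executor[loc.executor_id].append(task_index)
            store_to_executor[loc.provider_id] = loc.executor_id
    return dict(pending_by_executor), store_to_executor
```

   If that matches the intent, the mapping falls out of the existing 
preferred-location pass, and stage-level scheduling only has to select this 
behavior for the stage.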




