Ngone51 commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-847498828


   > So, does it mean I can remove the dynamic allocation check for our case 
without affecting classic stage-level scheduling?
   
   Which check are you referring to?
   
   >> This means that if an executor goes down it has to wait for something else on the executor to start up the task-specific state store - what is going to do that in this scenario? Or do you wait a certain period and schedule it anywhere.
   >
   > I think this could be configurable. Users can configure it to schedule anywhere and load from the checkpointed data, or otherwise just fail the streaming app after a certain period.
   
   Yes. Besides, in the default Spark use case, we could also move the state store resources to other active executors (i.e., update the scheduler's `ExecutorData` bookkeeping), and tasks would then reconstruct the state stores on those executors later.
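
   As a purely illustrative sketch of that idea (all names here, e.g. `StoreResource` and `reassignOnExecutorLoss`, are hypothetical and not Spark scheduler internals): when an executor is lost, its state-store entries get handed over to another active executor, and tasks scheduled there rebuild the stores from checkpointed data.

   ```scala
   // Hypothetical sketch only -- not actual Spark scheduler code.
   case class StoreResource(operatorId: Long, partitionId: Int)
   case class ExecutorSlot(executorId: String, storeResources: Set[StoreResource])

   // Naive policy: hand every store owned by the lost executor to the
   // least-loaded active executor; tasks scheduled there would then
   // reconstruct the stores from the checkpointed data.
   def reassignOnExecutorLoss(
       lost: ExecutorSlot,
       active: Seq[ExecutorSlot]): Seq[ExecutorSlot] = {
     val target = active.minBy(_.storeResources.size)
     active.map { e =>
       if (e.executorId == target.executorId)
         e.copy(storeResources = e.storeResources ++ lost.storeResources)
       else e
     }
   }
   ```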
   
   
   >> I think this would mean the scheduler would have some specific logic to be able to match a task id to a state store id, right? Otherwise stage-level scheduling would schedule a task on anything in that list, which seems like at that point it makes the list not relevant if Spark knows how to do some sort of mapping.
   >
   > Hmm, @Ngone51, is that true? For other resources like GPUs it makes sense, but in this case we need a specific (task id)-(resource id, e.g. state store id, PVC claim name, etc.) binding.
   
   That's true. We need the mapping. I thought that using the existing task info (e.g., partitionId) would be enough to match the right state store, but that looks wrong. We'd have to add extra info for the mapping.
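
   To make the ambiguity concrete: a state store instance is identified by more than the partition index. The sketch below mirrors the fields of `org.apache.spark.sql.execution.streaming.state.StateStoreId`; the `StoreResourceMapping` type is purely hypothetical, i.e., the kind of extra info we'd have to add.

   ```scala
   // The real StateStoreId in Spark carries roughly these fields; the same
   // partitionId repeats across operators and store names, so partitionId
   // alone can't identify "the" store for a given task.
   case class StateStoreId(
       checkpointRootLocation: String,
       operatorId: Long,
       partitionId: Int,
       storeName: String = "default")

   // Hypothetical extra mapping the scheduler would need in order to pin a
   // task to the resource (e.g. a PVC claim name) that holds its store.
   case class StoreResourceMapping(
       storeId: StateStoreId,
       resourceName: String)
   ```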
   
   > which seems like at that point makes the list not relevant if Spark knows how to do some sort of mapping.
   
   @tgravescs yes, the goal might be achieved directly with the mapping. But I think that would be the last resort, since the community prefers to introduce less invasive changes when working across modules. That's the reason @viirya proposed the plugin APIs first, and why we're now discussing the possibility of reusing an existing feature, stage-level scheduling.
   But if we still have to introduce many invasive changes (e.g., the mapping) even while reusing stage-level scheduling, I think we should revisit our decision.
   
   
   > it seems like this feature could exist outside of creating a new ResourceProfile with the stage level scheduling APIs, and the user should be able to specify this option that would only work with stateStoreRDD. Is it useful outside of that? I don't see how, unless we added another plugin point for the executor to report back any resource and then came up with some API that it could call to do the mapping of taskId to the resourceId reported back to it.
   
   I think it's only used for `StateStoreRDD` for now. BTW, it doesn't seem possible for end users to specify the resource request themselves, since streaming uses the DataFrame API and `StateStoreRDD` is hidden behind it.
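
   For reference, this is roughly how stage-level scheduling is applied today: the profile is attached per RDD, so a streaming user never gets a handle to call `withResources` on the internal `StateStoreRDD`. (The resource name `statestore` and the discovery script path below are made-up placeholders, not something Spark ships with.)

   ```scala
   import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

   // Ask each executor/task for a custom resource; "statestore" and the
   // discovery script path are placeholders for illustration only.
   val execReqs = new ExecutorResourceRequests()
     .cores(4)
     .resource("statestore", 1, "/opt/spark/discover_statestore.sh")
   val taskReqs = new TaskResourceRequests()
     .cpus(1)
     .resource("statestore", 1)
   val profile = new ResourceProfileBuilder()
     .require(execReqs)
     .require(taskReqs)
     .build

   // Stage-level scheduling is attached per RDD, e.g.:
   //   rdd.withResources(profile).mapPartitions(iter => iter).collect()
   // but a streaming query only exposes the DataFrame API, so end users
   // can't reach the StateStoreRDD to attach a profile like this.
   ```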
   
   

