viirya commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-844836740
> This sounds feasible to me. We can treat the state store as a resource for the streaming task. And since the `StateStoreProviderId` is shared between batches, tasks in different batches must be assigned the same state store as long as they require the same `StateStoreProviderId` (which is guaranteed by the stage level scheduling mechanism). Here's what the pseudo code may look like:

> On the other side, the driver should be able to update `ExecutorData.resourcesInfo` when `StateStoreCoordinatorRef` receives the active state store instance registration, so that the executor would contain the state store resource.

Thanks @xuanyuanking and @Ngone51. This might be a feasible direction to unblock this. Let me think about it and maybe POC it locally.

Roughly looking at it, stage level scheduling can deal with the particular-task-to-particular-executor issue. This looks okay. #32422 is based on the API added here, so there is a hook/API we can rely on to change the scheduler behavior for the first batch. Specifically, this is a streaming concept, so without an API/hook there it would be ugly to add hacky code into the scheduler to fit this need. We can discuss this, of course.

However, I'm not sure stage level scheduling can deal with the executor-lost case. Based on the above comment, it seems it cannot, and that is a major concern for the use case here. During task scheduling, once an executor is lost, we may need the scheduler to be able to re-schedule the task to a particular executor (e.g. one with the reused PVC, in our case).
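To make the two points above concrete, here is a minimal, hypothetical model of the idea being discussed: the driver-side coordinator tracks which executor holds each active state store instance (so the scheduler could treat it as an executor-local resource), and the executor-lost case shows the gap stage level scheduling does not cover on its own. This is plain Python, not Spark code; the class and method names (`register`, `preferred_executor`, `executor_lost`) are illustrative and do not correspond to actual Spark APIs.

```python
class StateStoreCoordinatorModel:
    """Toy driver-side model: maps state store provider ids to the
    executor currently holding that state store instance."""

    def __init__(self):
        # provider_id -> executor_id for active state store instances
        self.active_instances = {}

    def register(self, provider_id, executor_id):
        # On registration the driver could also update the executor's
        # resource info, so the scheduler sees the state store as a
        # resource available on that executor.
        self.active_instances[provider_id] = executor_id

    def preferred_executor(self, provider_id):
        # A task needing this provider id should be scheduled here.
        return self.active_instances.get(provider_id)

    def executor_lost(self, executor_id):
        # The concern raised above: when an executor is lost, its
        # provider ids no longer map anywhere. Extra logic (e.g.
        # tracking which executor picks up the reused PVC) would be
        # needed to re-establish the mapping and re-schedule tasks.
        lost = [p for p, e in self.active_instances.items()
                if e == executor_id]
        for p in lost:
            del self.active_instances[p]
        return lost
```

For example, after `register("q1/op0/part0", "exec-1")`, a task for that provider id prefers `exec-1`; after `executor_lost("exec-1")` the preference is gone, which is exactly the re-scheduling hole discussed in the comment.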
