[jira] [Updated] (SPARK-35022) Task Scheduling Plugin in Spark

L. C. Hsieh (Jira) Mon, 12 Apr 2021 14:56:35 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-35022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


L. C. Hsieh updated SPARK-35022:
--------------------------------
    Description: The Spark scheduler schedules tasks to executors in an 
arbitrary way. The scheduler schedules the tasks by itself. Although there is a 
locality configuration, it is used for data locality purposes. Generally we 
cannot suggest to the scheduler where a task should be scheduled. Normally this 
is not a problem because general tasks are executor-agnostic. But for special 
tasks, for example stateful tasks in Structured Streaming, the state store is 
maintained on the executor side. Changing the task location means reloading 
checkpoint data from the last batch. This hurts performance and also imposes 
some limitations when we want to implement advanced 
features in Structured Streaming.  (was: Spark scheduler schedules tasks to 
executors in an indeterminate way. Although there is locality configuration, 
the configuration is used for data locality purposes. Generally we cannot 
suggest the scheduler where a task should be scheduled to. Normally it is not a 
problem because the general task is executor-agnostic. But for special tasks, 
for example stateful tasks in Structured Streaming, state store is maintained 
at the executor side. Changing task location means reloading checkpoint data 
from the last batch. It has disadvantages from the performance perspective and 
also casts some limitations when we want to implement advanced features in 
Structured Streaming.)

> Task Scheduling Plugin in Spark
> -------------------------------
>
>                 Key: SPARK-35022
>                 URL: https://issues.apache.org/jira/browse/SPARK-35022
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> The Spark scheduler schedules tasks to executors in an arbitrary way. The 
> scheduler schedules the tasks by itself. Although there is a locality 
> configuration, it is used for data locality purposes. Generally we cannot 
> suggest to the scheduler where a task should be scheduled. Normally this is 
> not a problem because general tasks are executor-agnostic. But for special 
> tasks, for example stateful tasks in Structured Streaming, the state store 
> is maintained on the executor side. Changing the task location means 
> reloading checkpoint data from the last batch. This hurts performance and 
> also imposes some limitations when we want to implement advanced features 
> in Structured Streaming.
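
The cost described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not use or imitate Spark's scheduler, and every name in it (count_state_reloads, make_arbitrary, sticky, the executor ids) is hypothetical. It counts how often a stateful partition is placed on an executor that does not already hold its state, which stands in for one reload of checkpoint data.

```python
import random


def count_state_reloads(num_batches, partitions, executors, pick_executor):
    """Count placements that move a partition away from the executor holding its state."""
    state_location = {}  # partition id -> executor that currently holds its state
    reloads = 0
    for _ in range(num_batches):
        for p in partitions:
            chosen = pick_executor(p, executors, state_location.get(p))
            # The first placement of a partition is an initial load, not a reload.
            if p in state_location and state_location[p] != chosen:
                reloads += 1  # state would have to be rebuilt from checkpoint data
            state_location[p] = chosen
    return reloads


def make_arbitrary(seed=0):
    """Hypothetical policy: place each task on a randomly chosen executor."""
    rng = random.Random(seed)

    def pick(partition, executors, previous):
        return rng.choice(executors)

    return pick


def sticky(partition, executors, previous):
    """Hypothetical policy: keep a task on the executor that already holds its state."""
    if previous in executors:
        return previous
    return executors[partition % len(executors)]
```

With three executors, four partitions, and ten batches, the sticky policy never moves a partition, so it reports zero reloads, while the arbitrary policy can trigger a reload on any of the 36 placements after the first batch. The exact count for the arbitrary policy depends on the seed.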



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
