[ https://issues.apache.org/jira/browse/SPARK-35022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
L. C. Hsieh updated SPARK-35022:
--------------------------------
    Description:
The Spark scheduler assigns tasks to executors in an arbitrary way; the scheduler decides task placement by itself. Although there is a locality configuration, it is used for data locality purposes. Generally, we cannot suggest to the scheduler where a task should be scheduled. Normally this is not a problem, because a general task is executor-agnostic. But for special tasks, for example stateful tasks in Structured Streaming, the state store is maintained on the executor side. Changing the task location means reloading checkpoint data from the last batch. This hurts performance and also imposes some limitations when we want to implement advanced features in Structured Streaming.

  (was: Spark scheduler schedules tasks to executors in an indeterminate way. Although there is locality configuration, the configuration is used for data locality purposes. Generally we cannot suggest the scheduler where a task should be scheduled to. Normally it is not a problem because the general task is executor-agnostic. But for special tasks, for example stateful tasks in Structured Streaming, state store is maintained at the executor side. Changing task location means reloading checkpoint data from the last batch. It has disadvantages from the performance perspective and also casts some limitations when we want to implement advanced features in Structured Streaming.)

> Task Scheduling Plugin in Spark
> -------------------------------
>
>                 Key: SPARK-35022
>                 URL: https://issues.apache.org/jira/browse/SPARK-35022
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
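
To make the cost described above concrete, here is a minimal sketch of a toy model. It is not Spark's scheduler or state store API: the function `count_state_reloads`, the placement functions `rotating` and `sticky`, and the executor names are all hypothetical. It assumes that the executor which last ran a stateful partition holds that partition's state, and that running the partition anywhere else forces a reload from the checkpoint.

```python
from typing import Callable, Dict, List


def count_state_reloads(
    num_partitions: int,
    num_batches: int,
    place: Callable[[int, int], str],
) -> int:
    """Count checkpoint reloads in a toy model of stateful micro-batches.

    Assumption (illustrative only): after a batch, the executor that ran
    partition p holds p's latest state. If the next batch runs p on a
    different executor, that executor must reload the state first.
    """
    holder: Dict[int, str] = {}  # partition -> executor holding its state
    reloads = 0
    for batch in range(num_batches):
        for partition in range(num_partitions):
            executor = place(partition, batch)
            # The first batch starts with no state, so nothing is reloaded.
            if partition in holder and holder[partition] != executor:
                reloads += 1
            holder[partition] = executor
    return reloads


# Hypothetical executor IDs.
executors: List[str] = ["exec-0", "exec-1", "exec-2"]


def rotating(partition: int, batch: int) -> str:
    # Stand-in for placement the job cannot influence: the location of
    # every partition shifts on every batch.
    return executors[(partition + batch) % len(executors)]


def sticky(partition: int, batch: int) -> str:
    # Placement a scheduling plugin could suggest: each partition always
    # runs on the same executor.
    return executors[partition % len(executors)]


moving_reloads = count_state_reloads(4, 5, rotating)
sticky_reloads = count_state_reloads(4, 5, sticky)
print(moving_reloads, sticky_reloads)
```

In this model, every change of location costs one reload, while sticky placement costs none. The rotating policy is a deliberately worst-case stand-in; real placement would move tasks less regularly.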