[jira] [Assigned] (SPARK-33827) Unload State Store asap once it becomes inactive

2020-12-27 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-33827:


Assignee: L. C. Hsieh

> Unload State Store asap once it becomes inactive
> 
>
> Key: SPARK-33827
> URL: https://issues.apache.org/jira/browse/SPARK-33827
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> SS maintains state stores in executors across batches. Due to the nature of 
> Spark scheduling, a state store might be allocated on another executor in 
> next batch. The state store in previous batch becomes inactive.
> Now we run a maintenance task periodically to unload inactive state stores. 
> So there will be some delays between a state store becomes inactive and it is 
> unloaded.
> Per the discussion on https://github.com/apache/spark/pull/30770 with 
> [~kabhwan], I think the preference is to unload inactive state store asap.
> However, we can force Spark to always allocate a state store to same 
> executor, by using task locality configuration. This can reduce the 
> possibility to have inactive state store.
> Normally, I think with locality configuration, we might not able to see 
> inactive state store generally. There is still chance an executor can be 
> failed and reallocated, but in this case, inactive state store is also lost 
> too. So it is not an issue.
> So unloading inactive store asap is only useful when we don't use task 
> locality to force state store locality across batches.
> The required change to make driver-executor bi-directional for state store 
> management looks non-trivial. If we already can reduce possibility of 
> inactive store, is it still worth making non-trivial here?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33827) Unload State Store asap once it becomes inactive

2020-12-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33827:


Assignee: Apache Spark

> Unload State Store asap once it becomes inactive
> 
>
> Key: SPARK-33827
> URL: https://issues.apache.org/jira/browse/SPARK-33827
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> SS maintains state stores in executors across batches. Due to the nature of 
> Spark scheduling, a state store might be allocated on another executor in 
> next batch. The state store in previous batch becomes inactive.
> Now we run a maintenance task periodically to unload inactive state stores. 
> So there will be some delays between a state store becomes inactive and it is 
> unloaded.
> Per the discussion on https://github.com/apache/spark/pull/30770 with 
> [~kabhwan], I think the preference is to unload inactive state store asap.
> However, we can force Spark to always allocate a state store to same 
> executor, by using task locality configuration. This can reduce the 
> possibility to have inactive state store.
> Normally, I think with locality configuration, we might not able to see 
> inactive state store generally. There is still chance an executor can be 
> failed and reallocated, but in this case, inactive state store is also lost 
> too. So it is not an issue.
> So unloading inactive store asap is only useful when we don't use task 
> locality to force state store locality across batches.
> The required change to make driver-executor bi-directional for state store 
> management looks non-trivial. If we already can reduce possibility of 
> inactive store, is it still worth making non-trivial here?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33827) Unload State Store asap once it becomes inactive

2020-12-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33827:


Assignee: (was: Apache Spark)

> Unload State Store asap once it becomes inactive
> 
>
> Key: SPARK-33827
> URL: https://issues.apache.org/jira/browse/SPARK-33827
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Priority: Major
>
> SS maintains state stores in executors across batches. Due to the nature of 
> Spark scheduling, a state store might be allocated on another executor in 
> next batch. The state store in previous batch becomes inactive.
> Now we run a maintenance task periodically to unload inactive state stores. 
> So there will be some delays between a state store becomes inactive and it is 
> unloaded.
> Per the discussion on https://github.com/apache/spark/pull/30770 with 
> [~kabhwan], I think the preference is to unload inactive state store asap.
> However, we can force Spark to always allocate a state store to same 
> executor, by using task locality configuration. This can reduce the 
> possibility to have inactive state store.
> Normally, I think with locality configuration, we might not able to see 
> inactive state store generally. There is still chance an executor can be 
> failed and reallocated, but in this case, inactive state store is also lost 
> too. So it is not an issue.
> So unloading inactive store asap is only useful when we don't use task 
> locality to force state store locality across batches.
> The required change to make driver-executor bi-directional for state store 
> management looks non-trivial. If we already can reduce possibility of 
> inactive store, is it still worth making non-trivial here?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org