[
https://issues.apache.org/jira/browse/FLINK-31410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhipeng Zhang updated FLINK-31410:
--
Description:
In Flink ML, we use ListStateWithCache [2] to cache data in memory and on the
filesystem. However, it does not currently support incremental snapshots: it
writes all of the data to the checkpoint stream whenever snapshot is called [1],
which can be inefficient.
[1] https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/DataCacheSnapshot.java#L116
[2] https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/ListStateWithCache.java
was:
In Flink ML, we used ListStateWithCache [2] to cache data in memory and on the
filesystem. However, it does not currently support incremental snapshots: it
writes all of the data to the checkpoint stream when snapshot is called [1],
which can be inefficient.
[1] https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/DataCacheSnapshot.java#L116
[2] https://github.com/apache/flink-ml/blob/master/flink-ml-iteration/src/main/java/org/apache/flink/iteration/datacache/nonkeyed/ListStateWithCache.java
> ListStateWithCache Should support incremental snapshot
> --
>
> Key: FLINK-31410
> URL: https://issues.apache.org/jira/browse/FLINK-31410
> Project: Flink
> Issue Type: Improvement
> Components: Library / Machine Learning
> Affects Versions: ml-2.2.0
> Reporter: Zhipeng Zhang
> Priority: Major
>
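To make the difference between a full and an incremental snapshot concrete, here
is a minimal sketch in plain Java. It is an illustration only and does not use
the Flink ML API: the class AppendOnlyCache, the methods fullSnapshot,
incrementalSnapshot and restore, and the persistedCount field are all
hypothetical names. It also assumes the cached list is append-only and that
every earlier increment stays available for restore, which a real checkpoint
implementation would have to guarantee separately.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of full vs. incremental snapshots; not Flink ML code. */
public class IncrementalSnapshotSketch {

    /** An append-only list that remembers how much of it earlier snapshots covered. */
    static class AppendOnlyCache<T> {
        private final List<T> items = new ArrayList<>();
        // Number of elements already written by previous incremental snapshots.
        private int persistedCount = 0;

        void add(T item) {
            items.add(item);
        }

        /** Full snapshot: copies every element on every call. */
        List<T> fullSnapshot() {
            return new ArrayList<>(items);
        }

        /** Incremental snapshot: copies only the elements appended since the last call. */
        List<T> incrementalSnapshot() {
            List<T> delta = new ArrayList<>(items.subList(persistedCount, items.size()));
            persistedCount = items.size();
            return delta;
        }
    }

    /** Rebuilds the full list by replaying the increments in the order they were taken. */
    static <T> List<T> restore(List<List<T>> deltas) {
        List<T> restored = new ArrayList<>();
        for (List<T> delta : deltas) {
            restored.addAll(delta);
        }
        return restored;
    }

    public static void main(String[] args) {
        AppendOnlyCache<String> cache = new AppendOnlyCache<>();
        List<List<String>> checkpoints = new ArrayList<>();

        cache.add("a");
        cache.add("b");
        checkpoints.add(cache.incrementalSnapshot()); // first increment: [a, b]

        cache.add("c");
        checkpoints.add(cache.incrementalSnapshot()); // second increment: [c] only

        System.out.println(checkpoints); // prints [[a, b], [c]]
        System.out.println(restore(checkpoints).equals(cache.fullSnapshot())); // prints true
    }
}
```

With the full approach, the second snapshot would write all three elements
again; with the incremental approach it writes only the one new element. The
sketch advances persistedCount as soon as the increment is taken, which is a
simplification: a real implementation would likely advance it only once the
checkpoint is confirmed complete.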