I am maintaining state data for a key in ValueState. As per [0] I can clear() state for that key.
[0] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/state.html

Please let me know.

Thanks,

On Thu, Jun 21, 2018 at 1:19 PM sihua zhou <summerle...@163.com> wrote:

> Hi Garvit,
>
> Let's say you clear the state at timestamp t1. The checkpoints
> completed before t1 will still contain the data you cleared, but
> future checkpoints won't contain the cleared data. I'm not sure what
> you mean by "the cleared state", though: currently you can only clear
> one key-value pair of the state at a time; you can't clear the whole
> state.
>
> Best, Sihua
>
> On 06/21/2018 15:41, Garvit Sharma <garvit...@gmail.com> wrote:
>
> So, would it delete all the files in HDFS associated with the cleared
> state?
>
> On Thu, Jun 21, 2018 at 12:58 PM sihua zhou <summerle...@163.com> wrote:
>
>> Hi Garvit,
>>
>> > Now, let's say, we clear the state. Would the state data be removed
>> from HDFS too?
>>
>> The state data would not be removed from HDFS immediately when you clear
>> the state in your job. But after you clear the state in your job,
>> checkpoints completed later won't contain that state any more.
>>
>> > How does Flink manage to clear the state data from state backend on
>> clearing the keyed state?
>>
>> 1. You can use {{state.checkpoints.num-retained}} to set the number
>> of completed checkpoints retained on HDFS.
>> 2. If you set
>> {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION)}}
>> then the checkpoints on HDFS will be removed once your job is finished
>> (or canceled). And if you set
>> {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)}}
>> then the checkpoints will be retained.
>>
>> Please refer to
>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/checkpoints.html
>> to find more information.
>>
>> Additionally, I'd like to give some brief information about the
>> checkpoints on HDFS. In a nutshell, whatever you do with the state in
>> your running job only affects the content of the local state backend.
>> When checkpointing, Flink takes a snapshot of the local state backend
>> and sends it to the checkpoint target directory (in your case, HDFS).
>> The checkpoints on HDFS are like periodic snapshots of the state
>> backend of your job: they can be created or deleted, but never
>> changed. Maybe Stefan (cc) could give you more expert information, and
>> please correct me if I'm wrong.
>>
>> Best, Sihua
>> On 06/21/2018 14:40, Garvit Sharma <garvit...@gmail.com> wrote:
>>
>> Hi,
>>
>> Consider a managed keyed state backed by HDFS with checkpointing enabled.
>> Now, as the state grows the state data will be saved on HDFS.
>>
>> Now, let's say, we clear the state. Would the state data be removed from
>> HDFS too?
>>
>> How does Flink manage to clear the state data from state backend on
>> clearing the keyed state?
>>
>> --
>>
>> Garvit Sharma
>> github.com/garvitlnmiit/
>>
>> No Body is a Scholar by birth, its only hard work and strong
>> determination that makes him master.
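
For readers of the archive, the snapshot behaviour described in this thread can be sketched with a small toy model in plain Java. This is only an illustration under stated assumptions, not Flink code: `LocalState`, `CheckpointStore`, and the key `user-1` are hypothetical names, the map stands in for the local state backend, and the retention limit plays the role of the number of retained completed checkpoints mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these classes are Flink APIs. */
public class CheckpointSketch {

    /** Stands in for the local state backend: one value per key. */
    static final class LocalState {
        private final Map<String, Long> values = new HashMap<>();

        void update(String key, long value) {
            values.put(key, value);
        }

        /** Models clearing the state of one key: only the local entry is removed. */
        void clear(String key) {
            values.remove(key);
        }

        /** A snapshot is an immutable copy of the local state at this moment. */
        Map<String, Long> snapshot() {
            return Collections.unmodifiableMap(new HashMap<>(values));
        }
    }

    /** Stands in for the checkpoint directory; keeps the newest numRetained snapshots. */
    static final class CheckpointStore {
        private final int numRetained;
        private final Deque<Map<String, Long>> completed = new ArrayDeque<>();

        CheckpointStore(int numRetained) {
            this.numRetained = numRetained;
        }

        /** Adds a completed snapshot and drops the oldest ones beyond the limit. */
        void complete(Map<String, Long> snapshot) {
            completed.addLast(snapshot);
            while (completed.size() > numRetained) {
                completed.removeFirst(); // an old snapshot is deleted as a whole, never edited
            }
        }

        Map<String, Long> oldest() {
            return completed.peekFirst();
        }

        Map<String, Long> latest() {
            return completed.peekLast();
        }

        int size() {
            return completed.size();
        }
    }

    public static void main(String[] args) {
        LocalState state = new LocalState();
        CheckpointStore store = new CheckpointStore(2);

        state.update("user-1", 42L);
        store.complete(state.snapshot()); // checkpoint 1, completed before the clear

        state.clear("user-1");            // "timestamp t1" in the thread
        store.complete(state.snapshot()); // checkpoint 2, completed after the clear

        System.out.println(store.oldest().containsKey("user-1")); // true
        System.out.println(store.latest().containsKey("user-1")); // false

        store.complete(state.snapshot()); // checkpoint 3 pushes checkpoint 1 out
        System.out.println(store.oldest().containsKey("user-1")); // false
    }
}
```

In this model the cleared value disappears from storage only when the last snapshot that contains it is dropped, which matches the explanation above that checkpoints are created or deleted but never changed.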