I am maintaining state data for a key in ValueState. As per [0], I can
call clear() on the state for that key.

[0]
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/state.html

Please let me know.

Thanks,


On Thu, Jun 21, 2018 at 1:19 PM sihua zhou <summerle...@163.com> wrote:

> Hi Garvit,
>
> Let's say you clear the state at timestamp t1. Checkpoints completed
> before t1 will still contain the data you cleared, but future
> checkpoints won't contain it. I'm not sure what you mean by "the
> cleared state", though: currently you can only clear one key-value pair
> of the state at a time; you can't clear the whole state.
>
> Best, Sihua
>
> On 06/21/2018 15:41,Garvit Sharma<garvit...@gmail.com>
> <garvit...@gmail.com> wrote:
>
> So, would it delete all the files in HDFS associated with the cleared
> state?
>
> On Thu, Jun 21, 2018 at 12:58 PM sihua zhou <summerle...@163.com> wrote:
>
>> Hi Garvit,
>>
>> > Now, let's say, we clear the state. Would the state data be removed
>> > from HDFS too?
>>
>> The state data would not be removed from HDFS immediately when you clear
>> the state in your job. But after you clear the state in your job,
>> checkpoints completed later won't contain that state any more.
>>
>> > How does Flink manage to clear the state data from state backend on
>> > clearing the keyed state?
>>
>> 1. You can use {{state.checkpoints.num-retained}} to set the number
>> of completed checkpoints retained on HDFS.
>> 2. If you set
>> {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION)}}
>> then the checkpoints on HDFS will be removed once your job is finished
>> (or cancelled). And if you set
>> {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)}}
>> then the checkpoints will be retained.
>>
>> Please refer to
>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/checkpoints.html
>> for more information.
>>
>>
>> Additionally, I'd like to give a brief overview of checkpoints on HDFS.
>> In a nutshell, whatever you do with the state in your running job only
>> affects the content of the local state backend. When checkpointing, Flink
>> takes a snapshot of the local state backend and sends it to the checkpoint
>> target directory (in your case, HDFS). The checkpoints on HDFS are like
>> periodic snapshots of your job's state backend: they can be created or
>> deleted, but never changed. Maybe Stefan (cc) could give you more expert
>> information, and please correct me if I'm wrong.
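
The behaviour described above can be sketched with a toy model in plain Java. This is not Flink code and uses no Flink classes: the class CheckpointModel and its methods (update, clear, checkpoint, anyRetainedContains) are made up purely for illustration, and the retention limit only loosely mimics what state.checkpoints.num-retained is described as doing. It assumes a snapshot is a full copy of the local state, which is a simplification.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, no relation to Flink's real classes.
final class CheckpointModel {
    // Stands in for the local state backend: one value per key.
    private final Map<String, Long> localState = new HashMap<>();
    // Stands in for the checkpoint directory: completed snapshots, oldest first.
    private final Deque<Map<String, Long>> retained = new ArrayDeque<>();
    private final int numRetained;

    CheckpointModel(int numRetained) {
        if (numRetained < 1) {
            throw new IllegalArgumentException("numRetained must be >= 1");
        }
        this.numRetained = numRetained;
    }

    void update(String key, long value) {
        localState.put(key, value);
    }

    // Like clear() on keyed state: drops the entry for one key, locally only.
    void clear(String key) {
        localState.remove(key);
    }

    // Copies the local state into a new read-only snapshot, then discards
    // the oldest snapshots beyond the retention limit.
    Map<String, Long> checkpoint() {
        Map<String, Long> snapshot =
                Collections.unmodifiableMap(new HashMap<>(localState));
        retained.addLast(snapshot);
        while (retained.size() > numRetained) {
            retained.removeFirst();
        }
        return snapshot;
    }

    int retainedCount() {
        return retained.size();
    }

    // True if any snapshot still kept "on disk" holds a value for the key.
    boolean anyRetainedContains(String key) {
        for (Map<String, Long> snapshot : retained) {
            if (snapshot.containsKey(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CheckpointModel job = new CheckpointModel(2);
        job.update("user-1", 10L);
        job.update("user-2", 20L);
        Map<String, Long> before = job.checkpoint(); // completed before t1
        job.clear("user-1");                          // the clear at t1
        Map<String, Long> after = job.checkpoint();   // completed after t1

        System.out.println(before.containsKey("user-1"));      // true
        System.out.println(after.containsKey("user-1"));       // false
        System.out.println(job.anyRetainedContains("user-1")); // true
        job.checkpoint(); // pushes the oldest snapshot out of retention
        System.out.println(job.anyRetainedContains("user-1")); // false
    }
}
```

In this model the cleared key disappears from the retained snapshots only once the older snapshot is discarded by the retention limit, never by editing an existing snapshot.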
>>
>> Best, Sihua
>> On 06/21/2018 14:40,Garvit Sharma<garvit...@gmail.com>
>> <garvit...@gmail.com> wrote:
>>
>> Hi,
>>
>> Consider a managed keyed state backed by HDFS with checkpointing enabled.
>> Now, as the state grows the state data will be saved on HDFS.
>>
>> Now, let's say, we clear the state. Would the state data be removed from
>> HDFS too?
>>
>> How does Flink manage to clear the state data from state backend on
>> clearing the keyed state?
>>
>> --
>>
>> Garvit Sharma
>> github.com/garvitlnmiit/
>>
>> No Body is a Scholar by birth, its only hard work and strong
>> determination that makes him master.
>>
>>
>

-- 

Garvit Sharma
github.com/garvitlnmiit/

No Body is a Scholar by birth, its only hard work and strong determination
that makes him master.
