I think you'd be better off using the State Processor API [1] instead. The
State Processor API has cleaner semantics -- as you'll be seeing a
self-consistent snapshot of all the state -- and it's also much more
performant.

Note also that the Queryable State API is "approaching end of life" [2].
The long-term objective is to replace this with something more useful.
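
For reference, reading keyed MapState out of a savepoint with the State
Processor API looks roughly like the sketch below. This is against the
Flink 1.13-era batch API; the savepoint path, operator uid, state name,
and types are all placeholders you would replace with your own, and for
Parquet output you would swap in a Parquet sink rather than CSV:

```java
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class DumpMapState {

    // Emits every (map key, map value) entry of the MapState for each keyed-stream key.
    static class MapStateReader
            extends KeyedStateReaderFunction<String, Tuple2<String, String>> {

        private transient MapState<String, String> mapState;

        @Override
        public void open(Configuration parameters) {
            // "my-map-state" is a placeholder; it must match the descriptor
            // name used in the original streaming job.
            mapState = getRuntimeContext().getMapState(
                    new MapStateDescriptor<>("my-map-state", String.class, String.class));
        }

        @Override
        public void readKey(String key, Context ctx,
                            Collector<Tuple2<String, String>> out) throws Exception {
            for (java.util.Map.Entry<String, String> e : mapState.entries()) {
                out.collect(Tuple2.of(e.getKey(), e.getValue()));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder savepoint path; the state backend must match the one
        // the savepoint was taken with (RocksDB here).
        ExistingSavepoint savepoint = Savepoint.load(
                env, "hdfs:///path/to/savepoint", new EmbeddedRocksDBStateBackend());

        // "my-operator-uid" is the uid() assigned to the stateful operator.
        DataSet<Tuple2<String, String>> entries =
                savepoint.readKeyedState("my-operator-uid", new MapStateReader());

        entries.writeAsCsv("hdfs:///path/to/output");
        env.execute("dump-map-state");
    }
}
```

Because this runs as an ordinary batch job, you can scale it out by
raising its parallelism, which is what makes it a better fit for a
150 GB state dump than hitting Queryable State key by key.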

Regards,
David

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html
[2] https://flink.apache.org/roadmap.html

On Sun, May 2, 2021 at 9:07 PM Lian Jiang <jiangok2...@gmail.com> wrote:

> Hi,
>
> I am interested in dumping Flink state from RocksDB to a data lake using
> Queryable State
> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/queryable_state/.
> My map state could have 200 million key-value pairs, and the total size
> could be 150 GB. My batch job, scheduled with Airflow, will have one task
> that uses Queryable State to dump the Flink state to the data lake in
> Parquet format so that other Spark tasks can use it.
>
> Is there any scalability concern for using state query in this way?
> Appreciate any insight. Thanks!
>
