I think you'd be better off using the State Processor API [1] instead. The State Processor API has cleaner semantics -- as you'll be seeing a self-consistent snapshot of all the state -- and it's also much more performant.
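For reference, a minimal sketch of what reading that map state with the State Processor API could look like (Flink 1.13-era DataSet-based API). The operator uid "my-operator", state name "my-map-state", the String key/value types, and all paths are hypothetical placeholders -- they would need to match the actual job. The state backend passed to `Savepoint.load` should be compatible with how the snapshot was written (e.g. the RocksDB backend for RocksDB checkpoints):

```java
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class DumpMapState {

    // Emits every (mapKey, mapValue) entry of the MapState for each keyed-state key.
    static class MapStateReader
            extends KeyedStateReaderFunction<String, Tuple2<String, String>> {

        private transient MapState<String, String> mapState;

        @Override
        public void open(Configuration parameters) {
            // Descriptor must match the name and types used in the original job
            // ("my-map-state" and String/String are placeholders here).
            MapStateDescriptor<String, String> desc =
                new MapStateDescriptor<>("my-map-state", String.class, String.class);
            mapState = getRuntimeContext().getMapState(desc);
        }

        @Override
        public void readKey(String key, Context ctx,
                            Collector<Tuple2<String, String>> out) throws Exception {
            for (java.util.Map.Entry<String, String> e : mapState.entries()) {
                out.collect(Tuple2.of(e.getKey(), e.getValue()));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        ExistingSavepoint savepoint = Savepoint.load(
            env,
            "hdfs://namenode/savepoints/savepoint-abc123", // hypothetical savepoint path
            new MemoryStateBackend());                     // match your job's snapshot format

        DataSet<Tuple2<String, String>> entries =
            savepoint.readKeyedState("my-operator", new MapStateReader());

        // Written as CSV here for brevity; a Parquet OutputFormat can be
        // plugged in the same way for the datalake export.
        entries.writeAsCsv("hdfs://namenode/exports/map-state");
        env.execute("dump-map-state");
    }
}
```

Because this runs as a batch job over a self-consistent snapshot, it parallelizes across the key groups rather than issuing 200 million point queries against a running job, which is where the scalability win comes from.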
Note also that the Queryable State API is "approaching end of life" [2]. The long-term objective is to replace it with something more useful.

Regards,
David

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html
[2] https://flink.apache.org/roadmap.html

On Sun, May 2, 2021 at 9:07 PM Lian Jiang <jiangok2...@gmail.com> wrote:
> Hi,
>
> I am interested in dumping Flink state from RocksDB to a datalake using state
> query:
> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/queryable_state/.
> My map state could have 200 million key-value pairs, and the total size
> could be 150 GB. My batch job, scheduled using Airflow, will have one
> task which uses the Flink state query API to dump the Flink state to the
> datalake in Parquet format so other Spark tasks can use it.
>
> Is there any scalability concern with using state query in this way?
> Appreciate any insight. Thanks!