Dear Flink Community,

I am using Flink with the DataStream API and operators implemented using RichedFunctions. I know that Flink provides a set of window-based operators with time-based semantics and tumbling/sliding windows.

By reading the Flink documentation, I understand that there is the possibility to change the memory backend utilized for storing the in-flight state of the operators. For example, using RocksDB for this purpose to cope with a larger-than-memory state. If I am not wrong, to transparently change the backend (e.g., from in-memory to RocksDB) we have to use a proper API to access the state. For example, the Keyed State API with different abstractions such as ValueState<T>, ListState<T>, etc... as reported here <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/fault-tolerance/state/>.

My question is related to the utilization of time-based window operators with the RocksDB backend. Suppose for example very large temporal windows with a huge number of keys in the stream. I am wondering if there is a possibility to use the built-in window operators of Flink (e.g., with an AggregateFunction or a more generic ProcessWindowFunction as here <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/windows/>) transparently with RocksDB support as a state back-end, or if I have to develop the window operator in a raw manner using the Keyed State API (e.g., ListState, AggregateState) for this purpose by implementing the underlying window logic manually in the code of RichedFunction of the operator (e.g., a FlatMap).

Thanks for your support,

--
Gabriele Mencagli

Reply via email to