Hi Krzysztof,

Non-keyed operator state only supports list-like state [1], because there is 
no primary key in operator state. In other words, you cannot use map state in 
a source operator.
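
To illustrate why list-like state does not need a key, here is a minimal 
plain-Java sketch of how such state can be redistributed when parallelism 
changes: the per-subtask lists are treated as one logical list and split into 
roughly equal sublists. This is a simulation only and uses no Flink classes. 
The class name, the helper method, and the exact assignment of elements to 
subtasks are assumptions made for illustration, not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation only; this is not Flink code (hypothetical names).
class ListStateRedistribution {

    // Concatenates the per-subtask lists into one logical list, then splits
    // it into contiguous, near-equal sublists, one per new subtask.
    static List<List<String>> redistribute(List<List<String>> oldSubtasks,
                                           int newParallelism) {
        if (newParallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        List<String> all = new ArrayList<>();
        for (List<String> subtaskState : oldSubtasks) {
            all.addAll(subtaskState);
        }

        List<List<String>> result = new ArrayList<>();
        int base = all.size() / newParallelism;
        int remainder = all.size() % newParallelism;
        int start = 0;
        for (int i = 0; i < newParallelism; i++) {
            // The first `remainder` subtasks each receive one extra element.
            int size = base + (i < remainder ? 1 : 0);
            result.add(new ArrayList<>(all.subList(start, start + size)));
            start += size;
        }
        return result;
    }

    public static void main(String[] args) {
        // Two subtasks before rescaling, three afterwards.
        List<List<String>> before = Arrays.asList(
                Arrays.asList("file-1", "file-2", "file-3"),
                Arrays.asList("file-4", "file-5"));
        System.out.println(redistribute(before, 3));
    }
}
```

Because elements are handed out by position rather than by key, an element 
may land on any subtask after a restore, which is why a keyed structure such 
as a map does not fit this model.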


[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/fault-tolerance/state/#using-operator-state


Best,
Yun Tang
________________________________
From: Krzysztof Chmielewski <krzysiek.chmielew...@gmail.com>
Sent: Thursday, December 23, 2021 6:32
To: user <user@flink.apache.org>
Subject: Operator state in New Source API

Hi,
Is it possible to use managed operator state like MapState in an implementation 
of the new unified source interface [1]? I'm especially interested in using 
managed state in a SplitEnumerator implementation.

I have a use case that is a variation of File Source where I will have a great 
number of files that I need to process, for example a million. I know that 
FileSource maintains a collection of already processed paths in 
ContinuousFileSplitEnumerator object.

In my case I cannot afford to have a million Strings sitting on my heap. I'm 
hoping to use operator state for this and build splits in batches, 
periodically adding new files to the alreadyProcessedPaths collection.

Regards,
Krzysztof Chmielewski


[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/
