Hi Marco,

In the ideal setup, enrichment data that exists in external databases is bootstrapped into the streaming job via Flink's State Processor API, and any follow-up changes to the enrichment data are streamed into the job as a second, unioned input to the enrichment operator. For this solution to scale, lookups to the enrichment data need to be by the same key as the input data, i.e. the enrichment data is co-partitioned with the input data stream.
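
To make the co-partitioning point a bit more concrete, here is a minimal sketch in plain Java. It is not Flink API: the class CoPartitionedEnrichment and its methods are made up for illustration, and a list of hash maps stands in for Flink's keyed state. The only thing it is meant to show is that events and enrichment updates are routed by the same key, so every lookup is served from local state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of co-partitioned enrichment (not Flink API).
final class CoPartitionedEnrichment {
    // One map per parallel subtask; stands in for Flink keyed state.
    private final List<Map<String, String>> partitions = new ArrayList<>();

    CoPartitionedEnrichment(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Both inputs use the same key-to-partition routing.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Second input: a bootstrapped entry or a later change to the enrichment data.
    void onEnrichmentUpdate(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    // First input: an event, enriched only from the partition that owns its key.
    String onEvent(String key, String payload) {
        String enrichment = partitions.get(partitionFor(key)).get(key);
        return payload + "|" + (enrichment == null ? "<missing>" : enrichment);
    }

    public static void main(String[] args) {
        CoPartitionedEnrichment job = new CoPartitionedEnrichment(4);
        job.onEnrichmentUpdate("user-1", "gold");
        System.out.println(job.onEvent("user-1", "click"));
    }
}
```

If the enrichment data were keyed differently from the events, onEvent would have to reach into another partition, which is exactly the cross-partition lookup that stops this setup from scaling.
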
I assume you've already thought about whether this would work for your case, as it's a common setup for streaming enrichment.

Otherwise, if remote database lookups plus local caching in state are a must, I believe your brainstorming is heading in the right direction. I'm personally not familiar with iterative streams in Flink, but in general I think their use is currently discouraged.

On the other hand, I think Stateful Functions' [1] programming abstraction might work here, as it allows arbitrary messaging between functions and cyclic dataflows. There's also an SDK that allows you to embed StateFun functions within a Flink DataStream job [2].

Very briefly, the way you would model this database cache hit / remote lookup is by implementing a function, e.g. called DatabaseCache. The function would expect messages of type Lookup(lookupKey) and reply with a response of Result(lookupKey, value). The abstraction allows you, for each incoming message, to register state (similar to vanilla Flink), as well as to register async operations, which you would use to perform remote database lookups in case of a cache / state miss. It also provides a means for "timers" in the form of delayed messages that the function sends to itself, if you need some mechanism for cache invalidation.

Hope this provides some direction for you to think about!

Cheers,
Gordon

[1] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/
[2] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/sdk/flink-datastream.html
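
P.S. A rough sketch of the DatabaseCache idea described above, in plain Java rather than the actual StateFun SDK. Only the names DatabaseCache, Lookup and Result come from the description; everything else (the HashMap standing in for function state, the remoteLookup callback, the remoteCalls counter and onInvalidate) is an assumption made for illustration. In StateFun the async result and the invalidation "timer" would arrive as messages to the function; here they are modelled as a future callback and a direct method call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical plain-Java model of the DatabaseCache function (not StateFun API).
final class DatabaseCache {
    static final class Lookup {
        final String lookupKey;
        Lookup(String lookupKey) { this.lookupKey = lookupKey; }
    }

    static final class Result {
        final String lookupKey;
        final String value;
        Result(String lookupKey, String value) {
            this.lookupKey = lookupKey;
            this.value = value;
        }
    }

    // Stands in for the function's state.
    private final Map<String, String> cachedValues = new HashMap<>();
    // Stands in for the remote database client.
    private final Function<String, CompletableFuture<String>> remoteLookup;
    // Counts cache misses so the behaviour can be observed.
    int remoteCalls = 0;

    DatabaseCache(Function<String, CompletableFuture<String>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Handles a Lookup message: reply from state on a hit, go remote on a miss.
    // Simplification: with a truly asynchronous client the callback below would
    // run on another thread, so the map would need to be accessed safely.
    CompletableFuture<Result> onLookup(Lookup message) {
        String cached = cachedValues.get(message.lookupKey);
        if (cached != null) {
            return CompletableFuture.completedFuture(new Result(message.lookupKey, cached));
        }
        remoteCalls++;
        return remoteLookup.apply(message.lookupKey).thenApply(value -> {
            cachedValues.put(message.lookupKey, value);
            return new Result(message.lookupKey, value);
        });
    }

    // Stands in for the delayed self-message used for cache invalidation.
    void onInvalidate(String lookupKey) {
        cachedValues.remove(lookupKey);
    }

    public static void main(String[] args) {
        DatabaseCache cache =
            new DatabaseCache(key -> CompletableFuture.completedFuture("row-for-" + key));
        System.out.println(cache.onLookup(new Lookup("42")).join().value);
        System.out.println("remote calls: " + cache.remoteCalls);
    }
}
```

The second lookup for the same key is answered from the map without a remote call, and onInvalidate forces the next lookup to go back to the database.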
