Hi Flink community,

We're building a Flink topology that aggregates events by key. When a key
is seen for the first time, we load its base state from an external store.
After performing some calculations, we emit the result to Kafka, and
another process is responsible for writing it back to the store.

Our goals are:

   1. *Keep per-key state alive for a few hours*, but *remove it if no new
      events arrive during that time*.
   2. *Suppress frequent updates* and *emit output only every 5 seconds*,
      but *only if updates occurred* during that interval.

I see two possible implementation strategies:

   1. *Session windows (processing time):*
      This would allow automatic state cleanup based on inactivity, but I'm
      concerned about *merging session states*, since our logic depends on
      having a well-defined "base state" and applying updates to it; I don't
      see a clean way to implement the merge function for our case.
   2. *KeyedProcessFunction with timers:*
      In this approach, we'd maintain keyed state and use:
      - A *5-second periodic timer* to check for and emit updates (if any).
      - That same timer to also track *last-seen timestamps* and expire
        state after a few idle hours.
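
For concreteness, here is a minimal plain-Java sketch of the per-key
bookkeeping we have in mind for option 2 (class, field names, and the two
thresholds are illustrative placeholders, not our real code). In the actual
job these fields would live in keyed `ValueState`, and `onTick` would be the
body of `onTimer` for the recurring processing-time timer:

```java
import java.util.Optional;

// Hypothetical sketch of the per-key state behind the single-recurring-timer
// pattern: suppress emissions to once per interval, and expire idle keys.
class SuppressedExpiringState {
    static final long EMIT_INTERVAL_MS = 5_000;              // emit at most every 5s
    static final long IDLE_TTL_MS = 3L * 60 * 60 * 1000;     // expire after 3 idle hours

    long aggregate;       // stand-in for the real per-key aggregate
    boolean dirty;        // did any update arrive since the last emit?
    long lastSeenMillis;  // drives idle-based expiry

    SuppressedExpiringState(long baseState, long nowMillis) {
        this.aggregate = baseState;  // loaded from the external store on first event
        this.lastSeenMillis = nowMillis;
    }

    /** Called from processElement() for each incoming event. */
    void onEvent(long delta, long nowMillis) {
        aggregate += delta;
        dirty = true;
        lastSeenMillis = nowMillis;
    }

    /** Fired by the recurring 5-second timer; returns a value to emit, if any. */
    Optional<Long> onTick(long nowMillis) {
        if (!dirty) {
            return Optional.empty();    // suppress: nothing changed this interval
        }
        dirty = false;
        return Optional.of(aggregate);  // emit once per interval, only if updated
    }

    /** Also checked from the same timer: should this key's state be dropped? */
    boolean isExpired(long nowMillis) {
        return nowMillis - lastSeenMillis >= IDLE_TTL_MS;
    }
}
```

In the timer callback we would first check `isExpired`, clear the state (and
stop re-registering the timer) if so, and otherwise call `onTick` and
re-register the timer for five seconds later.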

*My questions:*

   1. Is the session window approach even suitable here, given the merging
      issue described above?
   2. Is a KeyedProcessFunction with a single recurring timer a recommended
      pattern for such use cases?
   3. Are there any caveats, or better alternatives, for managing
      "suppressed + expiring" keyed state?


Thanks in advance!

-- 
Ehud Lev, Staff Engineer
email: ehud....@forter.com  web: www.forter.com
mobile: 052-5832253