Re: Best approach to aggregate state with idle timeout and periodic output
hi, we have a similar use case, and from my experience it is not simple (if at all) to implement your logic when using a session window. Eventually we used the state + timers approach and we have full control of the cleanup and merge logic. Thanks Sigalit On Mon, May 12, 2025 at 1:35 PM Sachin Mittal wrote: > Second approach is good to try out. > I am also solving for a similar problem using this approach only. > > Thanks > Sachin > > > On Mon, 12 May 2025 at 3:52 PM, Ehud Lev wrote: > >> Hi Flink community, >> >> We're building a Flink topology that aggregates events by key. When a key >> is seen for the first time, we load its base state from an external store. >> After performing some calculations, we emit the result to Kafka, and >> another process is responsible for writing it back to the store. >> >> Our goals are: >> >>1. >> >>*Keep per-key state alive for a few hours*, but *remove it if no new >>events arrive during that time*. >>2. >> >>*Suppress frequent updates* and *emit output only every 5 seconds*, >>but *only if updates occurred* during that interval. >> >> I see two possible implementation strategies: >> >>1. >> >>*Session Windows (processing time):* >>This would allow automatic state cleanup based on inactivity, but I'm >>concerned about *merging session states*, since our logic depends on >>having a well-defined "base state" and applying updates to it. I can not >>handle merge basically. >>2. >> >>*KeyedProcessFunction with timers:* >>In this approach, we’d maintain keyed state, and use: >>- >> >> A *5-second periodic timer* to check for and emit updates (if any). >> - >> >> That same timer can also track *last-seen timestamps* to expire >> state after a few idle hours. >> >> *My questions:* >> Is the session window approach even suitable here, given the merging >> issue I have? >> Is the KeyedProcessFunction with a "single recurring timer" a >> recommended pattern in such use cases? >> Are there any caveats or better alternatives for managing "suppressed + >> expiring" keyed state? >> >> >> Thanks in advance! >> >> -- >> Ehud Lev, Staff Engineer >> email: ehud@forter.com web: www.forter.com >> mobile: 052-5832253 >> >
Re: Best approach to aggregate state with idle timeout and periodic output
Second approach is good to try out. I am also solving for a similar problem using this approach only. Thanks Sachin On Mon, 12 May 2025 at 3:52 PM, Ehud Lev wrote: > Hi Flink community, > > We're building a Flink topology that aggregates events by key. When a key > is seen for the first time, we load its base state from an external store. > After performing some calculations, we emit the result to Kafka, and > another process is responsible for writing it back to the store. > > Our goals are: > >1. > >*Keep per-key state alive for a few hours*, but *remove it if no new >events arrive during that time*. >2. > >*Suppress frequent updates* and *emit output only every 5 seconds*, >but *only if updates occurred* during that interval. > > I see two possible implementation strategies: > >1. > >*Session Windows (processing time):* >This would allow automatic state cleanup based on inactivity, but I'm >concerned about *merging session states*, since our logic depends on >having a well-defined "base state" and applying updates to it. I can not >handle merge basically. >2. > >*KeyedProcessFunction with timers:* >In this approach, we’d maintain keyed state, and use: >- > > A *5-second periodic timer* to check for and emit updates (if any). > - > > That same timer can also track *last-seen timestamps* to expire > state after a few idle hours. > > *My questions:* > Is the session window approach even suitable here, given the merging issue > I have? > Is the KeyedProcessFunction with a "single recurring timer" a recommended > pattern in such use cases? > Are there any caveats or better alternatives for managing "suppressed + > expiring" keyed state? > > > Thanks in advance! > > -- > Ehud Lev, Staff Engineer > email: ehud@forter.com web: www.forter.com > mobile: 052-5832253 >
Best approach to aggregate state with idle timeout and periodic output
Hi Flink community, We're building a Flink topology that aggregates events by key. When a key is seen for the first time, we load its base state from an external store. After performing some calculations, we emit the result to Kafka, and another process is responsible for writing it back to the store. Our goals are: 1. *Keep per-key state alive for a few hours*, but *remove it if no new events arrive during that time*. 2. *Suppress frequent updates* and *emit output only every 5 seconds*, but *only if updates occurred* during that interval. I see two possible implementation strategies: 1. *Session Windows (processing time):* This would allow automatic state cleanup based on inactivity, but I'm concerned about *merging session states*, since our logic depends on having a well-defined "base state" and applying updates to it. I can not handle merge basically. 2. *KeyedProcessFunction with timers:* In this approach, we’d maintain keyed state, and use: - A *5-second periodic timer* to check for and emit updates (if any). - That same timer can also track *last-seen timestamps* to expire state after a few idle hours. *My questions:* Is the session window approach even suitable here, given the merging issue I have? Is the KeyedProcessFunction with a "single recurring timer" a recommended pattern in such use cases? Are there any caveats or better alternatives for managing "suppressed + expiring" keyed state? Thanks in advance! -- Ehud Lev, Staff Engineer email: ehud@forter.com web: www.forter.com mobile: 052-5832253