Re: Best approach to aggregate state with idle timeout and periodic output

2025-05-12 Thread Sigalit Eliazov
hi,
we have a similar use case,  and from my experience it is not simple (if at
all) to implement your logic when using a session window.
Eventually we used the state + timers approach and we have full  control of
the cleanup and merge logic.

Thanks
Sigalit

On Mon, May 12, 2025 at 1:35 PM Sachin Mittal  wrote:

> Second approach is good to try out.
> I am also solving for a similar problem using this approach only.
>
> Thanks
> Sachin
>
>
> On Mon, 12 May 2025 at 3:52 PM, Ehud Lev  wrote:
>
>> Hi Flink community,
>>
>> We're building a Flink topology that aggregates events by key. When a key
>> is seen for the first time, we load its base state from an external store.
>> After performing some calculations, we emit the result to Kafka, and
>> another process is responsible for writing it back to the store.
>>
>> Our goals are:
>>
>>1.
>>
>>*Keep per-key state alive for a few hours*, but *remove it if no new
>>events arrive during that time*.
>>2.
>>
>>*Suppress frequent updates* and *emit output only every 5 seconds*,
>>but *only if updates occurred* during that interval.
>>
>> I see two possible implementation strategies:
>>
>>1.
>>
>>*Session Windows (processing time):*
>>This would allow automatic state cleanup based on inactivity, but I'm
>>concerned about *merging session states*, since our logic depends on
>>having a well-defined "base state" and applying updates to it. I can not
>>handle merge basically.
>>2.
>>
>>*KeyedProcessFunction with timers:*
>>In this approach, we’d maintain keyed state, and use:
>>-
>>
>>   A *5-second periodic timer* to check for and emit updates (if any).
>>   -
>>
>>   That same timer can also track *last-seen timestamps* to expire
>>   state after a few idle hours.
>>
>> *My questions:*
>> Is the session window approach even suitable here, given the merging
>> issue I have?
>> Is the KeyedProcessFunction with a "single recurring timer" a
>> recommended pattern in such use cases?
>> Are there any caveats or better alternatives for managing "suppressed +
>> expiring" keyed state?
>>
>>
>> Thanks in advance!
>>
>> --
>> Ehud Lev, Staff Engineer
>> email: ehud@forter.com  web: www.forter.com
>> mobile: 052-5832253
>>
>


Re: Best approach to aggregate state with idle timeout and periodic output

2025-05-12 Thread Sachin Mittal
Second approach is good to try out.
I am also solving for a similar problem using this approach only.

Thanks
Sachin


On Mon, 12 May 2025 at 3:52 PM, Ehud Lev  wrote:

> Hi Flink community,
>
> We're building a Flink topology that aggregates events by key. When a key
> is seen for the first time, we load its base state from an external store.
> After performing some calculations, we emit the result to Kafka, and
> another process is responsible for writing it back to the store.
>
> Our goals are:
>
>1.
>
>*Keep per-key state alive for a few hours*, but *remove it if no new
>events arrive during that time*.
>2.
>
>*Suppress frequent updates* and *emit output only every 5 seconds*,
>but *only if updates occurred* during that interval.
>
> I see two possible implementation strategies:
>
>1.
>
>*Session Windows (processing time):*
>This would allow automatic state cleanup based on inactivity, but I'm
>concerned about *merging session states*, since our logic depends on
>having a well-defined "base state" and applying updates to it. I can not
>handle merge basically.
>2.
>
>*KeyedProcessFunction with timers:*
>In this approach, we’d maintain keyed state, and use:
>-
>
>   A *5-second periodic timer* to check for and emit updates (if any).
>   -
>
>   That same timer can also track *last-seen timestamps* to expire
>   state after a few idle hours.
>
> *My questions:*
> Is the session window approach even suitable here, given the merging issue
> I have?
> Is the KeyedProcessFunction with a "single recurring timer" a recommended
> pattern in such use cases?
> Are there any caveats or better alternatives for managing "suppressed +
> expiring" keyed state?
>
>
> Thanks in advance!
>
> --
> Ehud Lev, Staff Engineer
> email: ehud@forter.com  web: www.forter.com
> mobile: 052-5832253
>


Best approach to aggregate state with idle timeout and periodic output

2025-05-12 Thread Ehud Lev
Hi Flink community,

We're building a Flink topology that aggregates events by key. When a key
is seen for the first time, we load its base state from an external store.
After performing some calculations, we emit the result to Kafka, and
another process is responsible for writing it back to the store.

Our goals are:

   1.

   *Keep per-key state alive for a few hours*, but *remove it if no new
   events arrive during that time*.
   2.

   *Suppress frequent updates* and *emit output only every 5 seconds*,
but *only
   if updates occurred* during that interval.

I see two possible implementation strategies:

   1.

   *Session Windows (processing time):*
   This would allow automatic state cleanup based on inactivity, but I'm
   concerned about *merging session states*, since our logic depends on
   having a well-defined "base state" and applying updates to it. I can not
   handle merge basically.
   2.

   *KeyedProcessFunction with timers:*
   In this approach, we’d maintain keyed state, and use:
   -

  A *5-second periodic timer* to check for and emit updates (if any).
  -

  That same timer can also track *last-seen timestamps* to expire state
  after a few idle hours.

*My questions:*
Is the session window approach even suitable here, given the merging issue
I have?
Is the KeyedProcessFunction with a "single recurring timer" a recommended
pattern in such use cases?
Are there any caveats or better alternatives for managing "suppressed +
expiring" keyed state?


Thanks in advance!

-- 
Ehud Lev, Staff Engineer
email: ehud@forter.com  web: www.forter.com
mobile: 052-5832253