Interesting suggestion. Are you proposing that we replace our original
windowing strategy with a global window + OrderedListState + timer? Then the
state can indeed persist throughout the entire execution.
The only remaining problem, I think, is that the state would be lost whenever
we cancel and redeploy the pipeline.


On 6 May 2025, at 19:10, Kenneth Knowles <k...@apache.org> wrote:

It does sound to me like your use case may be a good fit for using DoFn
with state. I realize now that RequiresTimeSortedInput may not be supported
for the configuration. As a workaround, you can use OrderedListState to
buffer elements and then process them in order using a timer to "wake up"
your DoFn at the time when no more data can arrive for a window.

Kenn
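
The buffer-then-process-in-order workaround described above can be sketched
in plain Java. This is only a simulation under stated assumptions, not Beam
code: the TreeMap stands in for OrderedListState, flushUpTo() stands in for
the timer callback, and every name in it (OrderedBufferSketch, add,
flushUpTo, bufferedCount) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java simulation of "buffer elements, then process them in timestamp
// order when a timer fires". It does NOT use the Beam API; all names are
// hypothetical and chosen for illustration only.
class OrderedBufferSketch {
    // Buffered values keyed by event timestamp (millis). TreeMap keeps the
    // keys sorted, which is the role OrderedListState would play in Beam.
    private final TreeMap<Long, List<String>> buffer = new TreeMap<>();

    // Order-dependent per-key state: the most recently processed value.
    private String lastProcessed = null;

    // Analogue of the per-element callback: only buffer, do not process yet.
    void add(long timestampMillis, String value) {
        buffer.computeIfAbsent(timestampMillis, t -> new ArrayList<>()).add(value);
    }

    // Analogue of the timer firing once no more data can arrive before
    // limitMillis: process everything with timestamp < limitMillis in
    // ascending timestamp order, then drop it from the buffer.
    List<String> flushUpTo(long limitMillis) {
        List<String> processed = new ArrayList<>();
        NavigableMap<Long, List<String>> ready = buffer.headMap(limitMillis, false);
        for (List<String> values : ready.values()) {
            for (String v : values) {
                lastProcessed = v; // update the state in event-time order
                processed.add(v);
            }
        }
        ready.clear(); // headMap is a view, so this removes the flushed entries
        return processed;
    }

    // Number of elements still waiting in the buffer.
    int bufferedCount() {
        return buffer.values().stream().mapToInt(List::size).sum();
    }

    String lastProcessed() {
        return lastProcessed;
    }
}
```

For example, adding "c" at 30, "a" at 10, "b" at 20 and "d" at 50, then
calling flushUpTo(40), returns [a, b, c] and leaves "d" buffered for a later
flush. In an actual pipeline the buffer and the flush would be managed by
the runner per key, so this sketch only shows the ordering logic.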

On Tue, May 6, 2025 at 4:17 AM Shaochen Bai <shaoc...@kisi.io> wrote:

> Hello,
>
> Our objective is to maintain a persistent, time-relevant state per key.
> Currently, we use non-overlapping windows and apply GroupByKey.create() to
> gather an array of windowed data for each key. We then sort the data by
> timestamp and iterate through the array to update the associated state.
>
> At the moment, we rely on external storage (Bigtable) to persist state
> across windows. However, the I/O overhead from reading and writing to
> Bigtable has become significant. As a result, we're exploring the
> possibility of using stateful processing to manage this state more
> efficiently.
>
> Best,
>
> Shaochen
>
> On 5 May 2025, at 17:46, Kenneth Knowles <k...@apache.org> wrote:
>
> Hello!
>
> This is not possible in a simple way, because of the main fact: windows
> are processed simultaneously.
>
> Many windows may have some state and incoming data at the same time, even
> if the time ranges of your windows do not overlap. So, sharing state across
> windows would need concurrency control (potentially distributed concurrency
> control) or it would need to wait for all data to arrive and then sort it.
> Beam state is designed to avoid this, so that it can scale up efficiently.
>
> If you want to have some running accumulator that processes some data in
> order, there may be a way to express it. For example, there is
> a @RequiresTimeSortedInput annotation that can sometimes handle it.
>
> Can you share more about your use case?
>
> Kenn
>
> On Fri, May 2, 2025 at 9:03 AM Shaochen Bai <shaoc...@kisi.io> wrote:
>
>> Hello,
>>
>> I read that in stateful processing
>> <https://beam.apache.org/blog/stateful-processing/>
>> with Apache Beam, *“a state cell is scoped to a key+window pair.”* What
>> if I want to maintain a persistent state *across* windows? Is there a
>> workaround for this, and what are the common practices in such cases?
>>
>> Thanks!
>>
>> Best,
>> Shaochen
>>
>>
>>
>>
>> ---
>> This email is confidential/privileged. If you're not the intended
>> recipient, please delete it and notify us immediately; please do not
>> copy/use/disclose it for any purpose, to anyone. Thank you!
>>
>

