Interesting suggestion. Are you proposing that we transform our original windowing strategy into a global window + OrderedListState +timer? Then the state can indeed persist throughout the entire execution. The only problem left I think is that the state would be lost whenever we cancel and redeploy the pipeline.
On 6 May 2025, at 19:10, Kenneth Knowles <k...@apache.org> wrote: It does sound to me like your use case may be a good fit for using DoFn with state. I realize now that RequiresTimeSortedInput may not be supported for the configuration. As a workaround, you can use OrderedListState to buffer elements and then process them in order using a timer to "wake up" your DoFn at the time when no more data can arrive for a window. Kenn On Tue, May 6, 2025 at 4:17 AM Shaochen Bai <shaoc...@kisi.io> wrote: > Hello, > > Our objective is to maintain a persistent, time-relevant state per key. > What we do now is that we use non-overlapping windows and apply > GroupByKey.create() to gather an array of windowed data for each key. We > then sort the data by timestamp and iterate through the array to update the > associated state. > > At the moment, we rely on external storage (Bigtable) to persist state > across windows. However, the I/O overhead from reading and writing to > Bigtable has become significant. As a result, we're exploring the > possibility of using stateful processing to manage this state more > efficiently. > > Best, > > Shaochen > > On 5 May 2025, at 17:46, Kenneth Knowles <k...@apache.org> wrote: > > Hello! > > This is not possible in a simple way, because of the main fact: windows > are processed simultaneously. > > Many windows may have some state and incoming data at the same time, even > if the time ranges of your windows do not overlap. So, sharing state across > windows would need concurrency control (potentially distributed concurrency > control) or it would need to wait for all data to arrive and then sort it. > Beam state is designed to avoid this, so that it can scale up efficiently. > > If you want to have some running accumulator that processes some data in > order, there may be a way to express it, for example there is > a @RequiresTimeSortedInput annotation that sometimes can handle it. > > Can you share more about your use case? > > Kenn > > On Fri, May 2, 2025 at 9:03 AM Shaochen Bai <shaoc...@kisi.io> wrote: > >> Hello, >> >> I read that in stateful processing >> <https://www.google.com/url?q=https://www.google.com/url?q%3Dhttps://beam.apache.org/blog/stateful-processing/%26source%3Dgmail-imap%26ust%3D1747064833000000%26usg%3DAOvVaw1-iRmfnjXeUF2s_zgI0TRD&source=gmail-imap&ust=1747156242000000&usg=AOvVaw1vHgxBtfjy-eUwiPWzy6ZD> >> with Apache Beam, *“a state cell is scoped to a key+window pair.”* What >> if I want to maintain a persistent state *across* windows? Is there a >> workaround for this, and what are the common practices in such cases? >> >> Thanks! >> >> Best, >> Shaochen >> >> >> >> >> --- >> This email is confidential/privileged. If you're not the intended >> recipient, please delete it and notify us immediately; please do not >> copy/use/disclose it for any purpose, to anyone. Thank you! >> > > > --- > This email is confidential/privileged. If you're not the intended > recipient, please delete it and notify us immediately; please do not > copy/use/disclose it for any purpose, to anyone. Thank you! > -- --- This email is confidential/privileged. If you're not the intended recipient, please delete it and notify us immediately; please do not copy/use/disclose it for any purpose, to anyone. Thank you!