It does sound like your use case may be a good fit for a stateful DoFn. I realize now that @RequiresTimeSortedInput may not be supported for your configuration. As a workaround, you can use OrderedListState to buffer elements, then set a timer to "wake up" your DoFn at the time when no more data can arrive for a window and process the buffered elements in timestamp order.
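To make the buffer-then-flush idea concrete, here is a minimal plain-Java model of it. It is only a sketch and deliberately does not use the Beam API: the TreeMap stands in for OrderedListState, the long field stands in for a per-key value state, and the class and method names (OrderedBufferSketch, processElement, onTimer) are hypothetical names chosen to mirror where @ProcessElement and @OnTimer would sit in a real DoFn. The running sum is an assumed placeholder for whatever state update you actually perform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of the buffer-then-flush pattern. Not Beam API. */
final class OrderedBufferSketch {
    // Stand-in for OrderedListState: values buffered by event timestamp (millis).
    private final TreeMap<Long, List<Long>> buffer = new TreeMap<>();
    // Stand-in for a per-key value state holding the running accumulator.
    private long runningTotal = 0;

    /** Mirrors @ProcessElement: only buffer the element, do not touch the accumulator. */
    public void processElement(long timestampMillis, long value) {
        buffer.computeIfAbsent(timestampMillis, t -> new ArrayList<>()).add(value);
    }

    /**
     * Mirrors @OnTimer: consume every buffered element with a timestamp at or
     * before the firing time, in timestamp order, and return the accumulator
     * value after each element.
     */
    public List<Long> onTimer(long firingTimeMillis) {
        List<Long> totals = new ArrayList<>();
        // headMap is a live view, sorted by timestamp, of the entries that are ready.
        Map<Long, List<Long>> ready = buffer.headMap(firingTimeMillis, true);
        for (List<Long> values : ready.values()) {
            for (long v : values) {
                runningTotal += v;
                totals.add(runningTotal);
            }
        }
        ready.clear(); // removes the consumed range from the backing buffer
        return totals;
    }

    public static void main(String[] args) {
        OrderedBufferSketch state = new OrderedBufferSketch();
        // Elements arrive out of order.
        state.processElement(30, 3);
        state.processElement(10, 1);
        state.processElement(20, 2);
        state.processElement(50, 5);
        System.out.println(state.onTimer(30)); // [1, 3, 6]
        state.processElement(40, 4);
        System.out.println(state.onTimer(60)); // [10, 15]
    }
}
```

The point of the design is that the accumulator is only ever updated inside the timer callback, so arrival order stops mattering. In a real pipeline the firing time would be whatever moment you can treat as "no more data can arrive" for that range (for example the end of the window plus any allowed lateness, which is an assumption about your setup).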

Kenn

On Tue, May 6, 2025 at 4:17 AM Shaochen Bai <shaoc...@kisi.io> wrote:

> Hello,
>
> Our objective is to maintain a persistent, time-relevant state per key.
> What we do now is that we use non-overlapping windows and apply
> GroupByKey.create() to gather an array of windowed data for each key. We
> then sort the data by timestamp and iterate through the array to update the
> associated state.
>
> At the moment, we rely on external storage (Bigtable) to persist state
> across windows. However, the I/O overhead from reading and writing to
> Bigtable has become significant. As a result, we're exploring the
> possibility of using stateful processing to manage this state more
> efficiently.
>
> Best,
>
> Shaochen
>
> On 5 May 2025, at 17:46, Kenneth Knowles <k...@apache.org> wrote:
>
> Hello!
>
> This is not possible in a simple way, because of the main fact: windows
> are processed simultaneously.
>
> Many windows may have some state and incoming data at the same time, even
> if the time ranges of your windows do not overlap. So, sharing state across
> windows would need concurrency control (potentially distributed concurrency
> control) or it would need to wait for all data to arrive and then sort it.
> Beam state is designed to avoid this, so that it can scale up efficiently.
>
> If you want to have some running accumulator that processes some data in
> order, there may be a way to express it, for example there is
> a @RequiresTimeSortedInput annotation that sometimes can handle it.
>
> Can you share more about your use case?
>
> Kenn
>
> On Fri, May 2, 2025 at 9:03 AM Shaochen Bai <shaoc...@kisi.io> wrote:
>
>> Hello,
>>
>> I read that in stateful processing
>> <https://beam.apache.org/blog/stateful-processing/>
>> with Apache Beam, *“a state cell is scoped to a key+window pair.”* What
>> if I want to maintain a persistent state *across* windows? Is there a
>> workaround for this, and what are the common practices in such cases?
>>
>> Thanks!
>>
>> Best,
>> Shaochen
>>
>> ---
>> This email is confidential/privileged. If you're not the intended
>> recipient, please delete it and notify us immediately; please do not
>> copy/use/disclose it for any purpose, to anyone. Thank you!