Also, is it possible to react in real time to each event after a GroupByKey? In the use case above, we need to react to each individual event while still aggregating the events and sessionizing them. I have gone over windowing / triggers / GroupByKey, and so far I haven't figured out a way to do it...
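To make that concrete, the kind of setup I have been experimenting with looks roughly like the following (an untested sketch; the inline Create source, the 10-minute gap, and run_model are just placeholders for our real source, expiry duration, and PyTorch model):

import apache_beam as beam
from apache_beam import window
from apache_beam.transforms import trigger


def run_model(session_id, events):
    # Placeholder for the PyTorch RNN call on the events seen so far.
    events = list(events)
    return {"sessionID": session_id, "num_events": len(events)}


with beam.Pipeline() as p:
    predictions = (
        p
        # Placeholder source; in reality events arrive from our websocket /
        # API Gateway ingestion, already keyed as (sessionID, event).
        | "CreateEvents" >> beam.Create([("s1", {"type": "a"}),
                                         ("s1", {"type": "b"})])
        | "Sessionize" >> beam.WindowInto(
            window.Sessions(10 * 60),  # 10-minute inactivity gap (event time)
            # Fire a pane after every element so we can react per event...
            trigger=trigger.Repeatedly(trigger.AfterCount(1)),
            # ...while each pane still carries the accumulated session so far.
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING)
        | "GroupBySession" >> beam.GroupByKey()
        | "Predict" >> beam.MapTuple(run_model))

I'm not sure this trigger/accumulation combination is the intended way to get a per-event firing in streaming, hence the question.
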
On Wed, Nov 10, 2021 at 4:21 PM Desmond F <[email protected]> wrote:
>
> Hi all,
>
> We have many clients connected via websockets through API Gateway on
> AWS. These clients submit events of various types periodically; each
> event contains a sessionID (generated by the client), and the session
> logically ends when there's no activity for a specified duration of
> time. We have a sequence model (RNN) written in PyTorch that needs to
> send predictions back to the clients for each event processed. The
> input to the model contains the raw events sent in the order they were
> generated.
>
> I believe the pipeline looks something like this: Source -> Group By
> Session ID -> Accumulate Events and Make Predictions
>
> 1. In your opinion, does this use-case fit naturally with the Apache
> Beam programming model?
> 2. We'd like the session to expire after a predefined duration
> (processing time) of no activity; how can this be achieved?
> 3. Before session expiry we'd like to pass the session details to
> another task that batches sessions and writes them to S3.
> 4. The "Accumulate Events and Make Predictions" step should be a
> stateful function that builds the session and, for each event, calls
> the model with the historical events. What is the best way to achieve
> that?
> 5. How do we ensure that events for each key arrive in the same order
> they were produced? Is that already guaranteed?
> 6. The model needs to react to events in real time (i.e., in less than
> 500 ms). Do you believe this is achievable?
>
> Thanks in advance,
> Desmond
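
P.S. On question 4 above: the kind of stateful DoFn I'm picturing looks roughly like this (an untested sketch; run_model, the 10-minute expiry, and the "expired_sessions" output tag are placeholders):

import apache_beam as beam
from apache_beam.coders import PickleCoder
from apache_beam.transforms.timeutil import TimeDomain
from apache_beam.transforms.userstate import BagStateSpec, TimerSpec, on_timer
from apache_beam.utils.timestamp import Duration, Timestamp


def run_model(history):
    # Placeholder for the PyTorch RNN call on the full event history.
    return {"num_events": len(history)}


class AccumulateAndPredict(beam.DoFn):
    # Buffers the events seen so far per sessionID, calls the model on every
    # new event, and flushes the whole session after a period of inactivity.

    EVENTS = BagStateSpec("events", PickleCoder())
    EXPIRY = TimerSpec("expiry", TimeDomain.REAL_TIME)
    INACTIVITY_GAP_SECS = 10 * 60  # placeholder expiry duration

    def process(self,
                element,
                events=beam.DoFn.StateParam(EVENTS),
                expiry=beam.DoFn.TimerParam(EXPIRY)):
        session_id, event = element
        events.add(event)
        history = list(events.read())  # everything seen so far (question 5!)
        # Push the processing-time expiry back on every new event.
        expiry.set(Timestamp.now() + Duration(seconds=self.INACTIVITY_GAP_SECS))
        yield (session_id, run_model(history))  # per-event prediction

    @on_timer(EXPIRY)
    def on_expiry(self, events=beam.DoFn.StateParam(EVENTS)):
        # No activity for INACTIVITY_GAP_SECS: emit the complete session on a
        # tagged output so another branch can batch sessions and write to S3.
        full_session = list(events.read())
        events.clear()
        if full_session:
            yield beam.pvalue.TaggedOutput("expired_sessions", full_session)

I would then apply it to the keyed events with something like
keyed_events | beam.ParDo(AccumulateAndPredict()).with_outputs("expired_sessions", main="predictions"),
but I'm not sure whether a processing-time timer is the recommended way to model the inactivity expiry (question 2), or whether session windows handle this better.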
