Hi all, We have many clients connected via websockets through api gateway on AWS, these clients submit events of various types periodically, each event contains a sessionID (generated by the client), the session logically ends when there's no activity for a specified duration of time. We have a sequence model (RNN) written in PyTorch that needs to send predictions back to the clients for each event processed. The input to the model contains the raw events sent in the order they were generated.
I believe the pipeline looks something like this: Source -> Group By Session ID -> Accumulate Events and Make Predictions 1. In your opinion, does this use-case fit naturally with Apache Beam programming model? 2. We'd like the session to expire after a predefined duration (processing time) of no activity, how can this be achieved? 3. Before session expiry we'd like to produce the session details to another task that batchify sessions and writes them to S3 4. The "Accumulate Events and Make Predictions" should be a stateful function that builds the session and for each event it should call the model with the historical events, what is the best way to achieve that? 5. How to ensure that events for each key arrive in the same order they were produced? Is it already guaranteed? 6. The model needs to react to events in real-time (i.e, in less than 500ms), do you believe this is achievable? Thanks in advance, Desmond
