Also, is it possible to react in real time to each event after a
GroupByKey? In the use case above, we need to react to each event but
still aggregate the events and sessionize them. I went over windowing /
triggers / GroupByKey, and so far I couldn't figure out a way to do
it...
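
The closest thing I could find is speculative (early) trigger firings
on session windows, though I'm not sure it is the right approach. A
minimal sketch of what I mean, assuming the Python SDK and that
"events" is already a keyed PCollection of (session_id, event) pairs
with timestamps attached; the 10-minute gap and the trigger choice are
only illustrative:

    import apache_beam as beam
    from apache_beam.transforms import trigger, window

    # events: a keyed PCollection of (session_id, event) pairs with timestamps.
    sessionized = (
        events
        | 'SessionWindows' >> beam.WindowInto(
            window.Sessions(gap_size=10 * 60),   # illustrative 10-minute inactivity gap
            trigger=trigger.AfterWatermark(
                early=trigger.AfterCount(1)),    # fire an early pane as elements arrive
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING)
        | 'GroupBySession' >> beam.GroupByKey())

With ACCUMULATING mode each early pane re-emits everything buffered so
far for the session, so the downstream step sees the growing session
roughly once per incoming event (runners may batch firings), and the
final pane still carries the complete session.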

On Wed, Nov 10, 2021 at 4:21 PM Desmond F <[email protected]> wrote:
>
> Hi all,
>
> We have many clients connected via WebSockets through API Gateway on
> AWS. These clients periodically submit events of various types; each
> event contains a sessionID (generated by the client), and the session
> logically ends when there is no activity for a specified duration of
> time. We have a sequence model (an RNN) written in PyTorch that needs
> to send predictions back to the clients for each event processed. The
> input to the model contains the raw events, in the order they were
> generated.
>
> I believe the pipeline looks something like this: Source -> Group By
> Session ID -> Accumulate Events and Make Predictions
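
For what it's worth, a rough skeleton of that shape in the Python SDK
might look like the following. ReadFromPubSub is only a stand-in source
(the real source would be whatever connector delivers the API Gateway
events, e.g. Kinesis), the sessionID field name comes from the
description above, and PredictPerEvent is the stateful DoFn sketched
further down:

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def key_by_session(raw):
        # Decode a raw event and key it by the client-generated session ID.
        event = json.loads(raw)
        return event['sessionID'], event

    opts = PipelineOptions(streaming=True)
    with beam.Pipeline(options=opts) as p:
        (p
         | 'Source' >> beam.io.ReadFromPubSub(
             subscription='projects/<project>/subscriptions/<sub>')
         | 'KeyBySession' >> beam.Map(key_by_session)
         | 'AccumulateAndPredict' >> beam.ParDo(PredictPerEvent()))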
>
> 1. In your opinion, does this use case fit naturally with the Apache
> Beam programming model?
> 2. We'd like the session to expire after a predefined duration of
> inactivity (in processing time); how can this be achieved?
> 3. Before session expiry we'd like to produce the session details to
> another task that batches sessions and writes them to S3 (see the
> sketch after this list).
> 4. The "Accumulate Events and Make Predictions" should be a stateful
> function that builds the session and for each event it should call the
> model with the historical events, what is the best way to achieve
> that?
> 5. How can we ensure that events for each key arrive in the same
> order they were produced? Is that already guaranteed?
> 6. The model needs to react to events in real time (i.e., in less
> than 500 ms); do you believe this is achievable?
>
> Thanks in advance,
> Desmond
