Hello, I am trying to figure out if Apache Beam is the right framework for my use case. I have an unbounded stream, and there are a number of questions I would like to ask regarding the records in the stream:
- For example, one question may be trying to find a record with attribute A followed within no more than a minute by a record with attribute B followed within no more than 5 minutes by a record with attribute C. - Another question may be trying to find a sequence of at least N records with attribute X within 5 hours of each other, followed by an record with attribute Y no more than an hour later. - A third question would be seeing if there exist a record with attribute K *not* followed by a record with attribute L in the next 10 minutes. Every time I encounter the pattern of records I’m looking for, I would like to perform an action. If I understand the Beam model correctly, each question would correspond to a separate pipeline I would create, and it also sounds like I’m looking for session windows. However, I’m assuming I would need to tee the input source to all the separate pipelines? I have tried to look for documentation and/or examples on whether Apache Beam can be used to express such a setup and how to do it if so, but I haven’t been able to find anything concrete. Any help would be appreciated. Thanks! Ray
