Do the records have another attribute Z which joins them all together? Are the set of attributes A, B, C, X, Y, K, L are from a fixed set of values like enums or can be mapped onto a certain number of states (like an attribute A > 50 can be mapped onto a state "exceeds threshold")? For your examples, what should occur when there is late data in your three scenarios?
On Wed, Dec 21, 2016 at 9:05 AM, Ray Ruvinskiy <[email protected] > wrote: > Hello, > > I am trying to figure out if Apache Beam is the right framework for my use > case. I have an unbounded stream, and there are a number of questions I > would like to ask regarding the records in the stream: > > - For example, one question may be trying to find a record with attribute > A followed within no more than a minute by a record with attribute B > followed within no more than 5 minutes by a record with attribute C. > - Another question may be trying to find a sequence of at least N records > with attribute X within 5 hours of each other, followed by an record with > attribute Y no more than an hour later. > - A third question would be seeing if there exist a record with attribute > K *not* followed by a record with attribute L in the next 10 minutes. > > Every time I encounter the pattern of records I’m looking for, I would > like to perform an action. If I understand the Beam model correctly, each > question would correspond to a separate pipeline I would create, and it > also sounds like I’m looking for session windows. However, I’m assuming I > would need to tee the input source to all the separate pipelines? I have > tried to look for documentation and/or examples on whether Apache Beam can > be used to express such a setup and how to do it if so, but I haven’t been > able to find anything concrete. Any help would be appreciated. > > Thanks! > > Ray > > >
