Hi All

I am writing a Flink streaming program in which I need to enrich a
DataStream of user events using some static data set (information base, IB).

For E.g. Let's say we have a static data set of buyers and we have an
incoming clickstream of events, for each event we want to add a boolean
flag indicating whether the doer of the event is a buyer or not.

An ideal way to achieve this would be to partition the incoming stream by
user id, have the buyers set available in a DataSet partitioned again by
user id and then do a look up for each event in the stream into this
DataSet.

Since Flink does not allow using DataSets in a streaming program, how can I
achieve the above ?

Another option could be to use Managed Operator State to store buyers set,
but how can I keep this state distributed by user id so as to avoid network
i/o in individual event look ups ? In case of memory state backend, does
state remain distributed by some key, or is it replicated across all
operator subtasks ?

What is the right design pattern to achieve the above enriching requirement
in a Flink streaming program ?


Thanks

Vijay Kansal
Software Development Engineer
LimeRoad

Reply via email to