Hello everyone,

I am in the process of reimplementing an existing pipeline (written in
Java with Kafka) in Apache Beam. The data from the source stream is
complex and has to go through several enrichment steps, which involve
REST API calls and JSON parsing. The key transformation in the existing
pipeline is shown below as a very high-level flow:

*Method A*
----Calls *Method B*
      ----Creates *Map 1, Map 2*
----Calls *Method C*
     ----Read *Map 2*
     ----Create *Map 3*
----*Method C*
     ----Read *Map 3* and
     ----update *Map 1*
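To make the data dependencies concrete, here is a rough plain-Java sketch
of the flow above. The map contents, field names and enrichment logic are
hypothetical placeholders (the real steps call REST APIs and parse JSON).
It reads the outline as Method C both creating Map 3 and then using it to
update Map 1; only the order in which the maps are created, read and
updated is taken from the outline.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the existing flow; all map contents are placeholders.
class MapFlowSketch {
    // Multi-level maps: record id -> (field name -> value).
    final Map<String, Map<String, String>> map1 = new HashMap<>();
    final Map<String, Map<String, String>> map2 = new HashMap<>();

    // Method A: drives the steps in the order shown in the outline.
    void methodA(String recordId, String rawValue) {
        methodB(recordId, rawValue);
        methodC(recordId);
    }

    // Method B: creates entries in Map 1 and Map 2.
    // Placeholder logic; the real pipeline enriches via REST calls and JSON parsing.
    void methodB(String recordId, String rawValue) {
        map1.computeIfAbsent(recordId, k -> new HashMap<>()).put("raw", rawValue);
        map2.computeIfAbsent(recordId, k -> new HashMap<>()).put("lookupKey", rawValue.toUpperCase());
    }

    // Method C: reads Map 2 to build Map 3, then reads Map 3 to update Map 1 in place.
    void methodC(String recordId) {
        Map<String, Map<String, String>> map3 = new HashMap<>();
        String lookupKey = map2.get(recordId).get("lookupKey");
        map3.computeIfAbsent(recordId, k -> new HashMap<>()).put("enriched", "enriched-" + lookupKey);

        // This in-place update of Map 1 is the step with no direct equivalent
        // when Map 1 is passed to a DoFn as a (read-only) Beam side input.
        map1.get(recordId).put("enriched", map3.get(recordId).get("enriched"));
    }

    public static void main(String[] args) {
        MapFlowSketch flow = new MapFlowSketch();
        flow.methodA("r1", "abc");
        System.out.println(flow.map1.get("r1").get("enriched")); // prints enriched-ABC
    }
}
```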

The maps we use are multi-level (nested) maps. I am thinking of creating a
PCollection for each map and passing them as side inputs to a DoFn wherever
a transformation needs two or more maps. However, for certain tasks I want
to make sure I am following the right approach, for instance updating one
of the side-input maps inside a DoFn.

These are my initial thoughts and questions, and I would appreciate some
expert advice on how such an interleaved transformation is typically
designed in Apache Beam. Thanks in advance for your insights.

-- 
Thanks,
Praveen K Viswanathan
