Another way to put this question is, how do we write a beam pipeline for an existing pipeline (in Java) that has a dozen of custom objects and you have to work with multiple HashMaps of those custom objects in order to transform it. Currently, I am writing a beam pipeline by using the same Custom objects, getters and setters and HashMap<CustomObjects> *but inside a DoFn*. Is this the optimal way or does Beam offer something else?
On Mon, Jun 22, 2020 at 3:47 PM Praveen K Viswanathan < [email protected]> wrote: > Hi Luke, > > We can say Map 2 as a kind of a template using which you want to enrich > data in Map 1. As I mentioned in my previous post, this is a high level > scenario. > > All these logic are spread across several classes (with ~4K lines of code > in total). As in any Java application, > > 1. The code has been modularized with multiple method calls > 2. Passing around HashMaps<CustomObject> as argument to each method > 3. Accessing the attributes of the custom object using getters and setters. > > This is a common pattern in a normal Java application but I have not seen > such an example of code in Beam. > > > On Mon, Jun 22, 2020 at 8:23 AM Luke Cwik <[email protected]> wrote: > >> Who reads map 1? >> Can it be stale? >> >> It is unclear what you are trying to do in parallel and why you wouldn't >> stick all this logic into a single DoFn / stateful DoFn. >> >> On Sat, Jun 20, 2020 at 7:14 PM Praveen K Viswanathan < >> [email protected]> wrote: >> >>> Hello Everyone, >>> >>> I am in the process of implementing an existing pipeline (written using >>> Java and Kafka) in Apache Beam. The data from the source stream is >>> contrived and had to go through several steps of enrichment using REST API >>> calls and parsing of JSON data. The key >>> transformation in the existing pipeline is in shown below (a super high >>> level flow) >>> >>> *Method A* >>> ----Calls *Method B* >>> ----Creates *Map 1, Map 2* >>> ----Calls *Method C* >>> ----Read *Map 2* >>> ----Create *Map 3* >>> ----*Method C* >>> ----Read *Map 3* and >>> ----update *Map 1* >>> >>> The Map we use are multi-level maps and I am thinking of having >>> PCollections for each Maps and pass them as side inputs in a DoFn wherever >>> I have transformations that need two or more Maps. But there are certain >>> tasks which I want to make sure that I am following right approach, for >>> instance updating one of the side input maps inside a DoFn. >>> >>> These are my initial thoughts/questions and I would like to get some >>> expert advice on how we typically design such an interleaved transformation >>> in Apache Beam. Appreciate your valuable insights on this. >>> >>> -- >>> Thanks, >>> Praveen K Viswanathan >>> >> > > -- > Thanks, > Praveen K Viswanathan > -- Thanks, Praveen K Viswanathan
