Thank you, Luke. I will work on implementing my use case with a stateful ParDo and come back if I have any questions.
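To make sure I have the pattern right, below is a rough sketch of the stateful ParDo I am planning (Java SDK). The class name, the String key/element types, and the isMatchTrigger() check are placeholders I picked for illustration, not something from your example:

import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class BufferByCustomerFn extends DoFn<KV<String, String>, String> {

  // Bag state holding every element seen so far for this key (customerId).
  @StateId("buffer")
  private final StateSpec<BagState<String>> bufferSpec =
      StateSpecs.bag(StringUtf8Coder.of());

  @ProcessElement
  public void processElement(
      @Element KV<String, String> element,
      @StateId("buffer") BagState<String> buffer,
      OutputReceiver<String> out) {
    // Add the current element to the per-key buffer.
    buffer.add(element.getValue());

    // When the element that completes the match arrives ("element 5" in my
    // case), emit everything buffered for this customerId and clear the bag.
    if (isMatchTrigger(element.getValue())) {
      for (String buffered : buffer.read()) {
        out.output(buffered);
      }
      buffer.clear();
    }
  }

  private boolean isMatchTrigger(String value) {
    // Placeholder: replace with the real join/match condition.
    return value.startsWith("element 5");
  }
}

Upstream of this I would have a simple ParDo (or WithKeys) that extracts the customerId to build the KV<String, String> input, as you described.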
Appreciate your help.

On Fri, Jun 26, 2020 at 8:14 AM Luke Cwik <lc...@google.com> wrote:

> Use a stateful DoFn and buffer the elements in a bag state. You'll want to
> use a key that contains enough data for the join condition you are trying
> to match. For example, if you're trying to match on a customerId, you
> would do something like:
>
> element 1 -> ParDo(extract customer id) -> KV<customer id, element 1> ->
> stateful ParDo(buffer element 1 in bag state)
> ...
> element 5 -> ParDo(extract customer id) -> KV<customer id, element 5> ->
> stateful ParDo(output all elements in bag)
>
> If you are matching on customerId and eventId, then you would use a
> composite key (customerId, eventId).
>
> You can always use a single global key, but you will lose all parallelism
> during processing (for small pipelines this likely won't matter).
>
> On Fri, Jun 26, 2020 at 7:29 AM Praveen K Viswanathan
> <harish.prav...@gmail.com> wrote:
>
>> Hi All - I have a DoFn which generates data (a KV pair) for each element
>> that it processes. It also has to read from that KV for other elements
>> based on a key, which means the KV has to retain all the data that is
>> added to it while processing every element. I was thinking about the
>> "slow-caching side input pattern", but that is more about caching outside
>> the DoFn and then using it inside; it doesn't update the cache inside a
>> DoFn. Please share if anyone has thoughts on how to approach this case.
>>
>> Element 1 > Add a record to a KV > ..... Element 5 > Use the value from
>> the KV if there is a match on the key
>>
>> --
>> Thanks,
>> Praveen K Viswanathan

--
Thanks,
Praveen K Viswanathan