Hi Shivam,

When you say "merge the PCollections", do you mean Flatten, or some kind of join? CoGroupByKey [1] would be a good choice if you need to join based on key. You could then implement application logic that keeps one of the two records, provided there is a way to tell an element of CollectionA from an element of CollectionB by examining only the elements.
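To make the "keep one of the two records" step concrete, here is a minimal sketch of the per-key decision in plain Java. It does not use the Beam API: the two inputs are modeled as in-memory maps from key to the list of values for that key, and the names PreferCollectionA, keepPreferred and merge are made up for illustration. In a real pipeline the same decision would presumably live in a DoFn applied to the CoGroupByKey output, where (in the Java SDK, as far as I know) the values for each input are read from the CoGbkResult by TupleTag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Plain-Java model only: no Beam classes are used here.
// All names in this class are hypothetical.
final class PreferCollectionA {

    // Models the per-key step that would run after a CoGroupByKey:
    // fromA and fromB hold every value seen for one key in each input.
    static <V> List<V> keepPreferred(List<V> fromA, List<V> fromB) {
        // If CollectionA has any value for this key, drop CollectionB's values.
        return fromA.isEmpty() ? fromB : fromA;
    }

    // Models the whole join on in-memory maps (key -> values for that key).
    static <V> Map<String, List<V>> merge(Map<String, List<V>> a,
                                          Map<String, List<V>> b) {
        // Union of keys, sorted only so the output order is deterministic.
        TreeSet<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());

        Map<String, List<V>> merged = new LinkedHashMap<>();
        for (String key : keys) {
            List<V> fromA = a.getOrDefault(key, List.of());
            List<V> fromB = b.getOrDefault(key, List.of());
            merged.put(key, new ArrayList<>(keepPreferred(fromA, fromB)));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> a = Map.of("k1", List.of("v1"));
        Map<String, List<String>> b = Map.of("k1", List.of("v2"), "k2", List.of("v3"));
        // k1 exists in both inputs, so only A's value survives; k2 exists only in B.
        System.out.println(merge(a, b)); // prints {k1=[v1], k2=[v3]}
    }
}
```

Note that a key which appears several times within CollectionA keeps all of its CollectionA values in this sketch; if you need exactly one element per key you would have to pick one inside keepPreferred.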
If there isn't a natural way to determine which element to keep by examining only the elements themselves, you could nest the data one level further in a KV. For example, if CollectionA holds data like KV<k1, v1> and CollectionB holds KV<k1, v2>, you could transform these into something like KV<k1, KV<"COLLECTION_A", v1>> and KV<k1, KV<"COLLECTION_B", v2>>. Then, when you CoGroupByKey, these elements would be grouped because both have key k1, and the source PCollection could be identified from the key of the inner KV.

Thanks,
Evan

[1] https://beam.apache.org/documentation/transforms/java/aggregation/cogroupbykey/

On Wed, Aug 10, 2022 at 3:25 PM Shivam Singhal <[email protected]> wrote:

> I have two PCollections, CollectionA & CollectionB of type KV<String,
> Byte[]>.
>
> I would like to merge them into one PCollection but CollectionA &
> CollectionB might have some elements with the same key. In those repeated
> cases, I would like to keep the element from CollectionA & drop the
> repeated element from CollectionB.
>
> Does anyone know a simple method to do this?
>
> Thanks,
> Shivam Singhal
