Use CoGroupByKey to join the two PCollections and, for each key, emit only the first value (CollectionA's, when present) with the key.
Keys present in both collections will have more than one value in the grouped result, while keys without duplicates will have exactly one value.

On Wed, Aug 10, 2022 at 12:25 PM Shivam Singhal <[email protected]> wrote:

> I have two PCollections, CollectionA & CollectionB of type KV<String,
> Byte[]>.
>
> I would like to merge them into one PCollection but CollectionA &
> CollectionB might have some elements with the same key. In those repeated
> cases, I would like to keep the element from CollectionA & drop the
> repeated element from CollectionB.
>
> Does anyone know a simple method to do this?
>
> Thanks,
> Shivam Singhal
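For illustration, here is a minimal sketch of the per-key choice described above, written in plain Java with no Beam dependency so it can be run on its own. The class `PreferAMerge`, its `Grouped` holder and the methods `coGroup`, `pick` and `merge` are hypothetical names made up for this example, and `byte[]` is used in place of `Byte[]` for brevity. In a real pipeline the grouping step would be CoGroupByKey applied to a KeyedPCollectionTuple built with one TupleTag per input, and the `pick` logic would live in a DoFn that reads each tag's iterable from the CoGbkResult with `getAll(tag)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone model of "CoGroupByKey, then prefer CollectionA".
class PreferAMerge {
    // Stand-in for the per-key result of CoGroupByKey: one list per input.
    static final class Grouped {
        final List<byte[]> fromA = new ArrayList<>();
        final List<byte[]> fromB = new ArrayList<>();
    }

    // Groups both inputs by key, mimicking what CoGroupByKey produces.
    static Map<String, Grouped> coGroup(List<Map.Entry<String, byte[]>> a,
                                        List<Map.Entry<String, byte[]>> b) {
        Map<String, Grouped> out = new TreeMap<>();
        for (Map.Entry<String, byte[]> e : a) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped()).fromA.add(e.getValue());
        }
        for (Map.Entry<String, byte[]> e : b) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped()).fromB.add(e.getValue());
        }
        return out;
    }

    // The per-key choice: A's value when present, otherwise B's.
    // A Grouped only exists for keys seen in at least one input,
    // so fromB is non-empty whenever fromA is empty.
    static byte[] pick(Grouped g) {
        return !g.fromA.isEmpty() ? g.fromA.get(0) : g.fromB.get(0);
    }

    // Emits exactly one value per key.
    static Map<String, byte[]> merge(List<Map.Entry<String, byte[]>> a,
                                     List<Map.Entry<String, byte[]>> b) {
        Map<String, byte[]> out = new TreeMap<>();
        for (Map.Entry<String, Grouped> e : coGroup(a, b).entrySet()) {
            out.put(e.getKey(), pick(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, byte[]>> a = List.of(
            Map.entry("k1", new byte[] {1}), Map.entry("k2", new byte[] {2}));
        List<Map.Entry<String, byte[]>> b = List.of(
            Map.entry("k2", new byte[] {20}), Map.entry("k3", new byte[] {30}));
        for (Map.Entry<String, byte[]> e : merge(a, b).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()[0]);
        }
    }
}
```

With the sample data in `main`, the key `k2` appears in both inputs and the merged result keeps the value from A (2) rather than the one from B (20), while `k1` and `k3` pass through unchanged.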
