tion that compares both input records on all
> (non-distinct-key) fields to determine which record to return.
>
> I would go for the second approach because it is more efficient (no need
> to fully sort before the combiner).
>
> Best, Fabian
>
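A minimal sketch of the second approach follows. It is written in plain Java so it runs without Flink. `Row`, `DedupSketch`, `pick`, and `reduceByKey` are hypothetical names chosen for illustration. The rule "keep the lexicographically smallest value-tuple" is an assumed way of deciding which record to return; the thread does not specify one. In a Flink job, the `pick` logic would presumably live in the `reduce(value1, value2)` method of a `ReduceFunction` applied after grouping on the key field. The map-based loop here only stands in for that grouped reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical record type of the form (key, val1, val2, val3), as in the example data. */
final class Row {
    final int key;
    final int[] vals; // the non-key fields val1..val3

    Row(int key, int... vals) {
        this.key = key;
        this.vals = vals.clone();
    }

    @Override
    public String toString() {
        return key + ", " + Arrays.toString(vals);
    }
}

final class DedupSketch {
    /**
     * Reduce step (assumed selection rule): compares both rows on all non-key
     * fields and returns the lexicographically smaller one. Taking a minimum is
     * commutative and associative, so the result does not depend on arrival
     * order, and a combiner can apply it to partial groups without sorting.
     */
    static Row pick(Row a, Row b) {
        return Arrays.compare(a.vals, b.vals) <= 0 ? a : b;
    }

    /** Local stand-in for a grouped reduce: applies pick() pairwise within each key. */
    static List<Row> reduceByKey(List<Row> rows) {
        Map<Integer, Row> best = new LinkedHashMap<>();
        for (Row r : rows) {
            // merge() calls pick(existing, incoming) when the key is already present
            best.merge(r.key, r, DedupSketch::pick);
        }
        return new ArrayList<>(best.values());
    }
}
```

Duplicate entries such as the two `1, 0,0,0` rows collapse to one. When a key has several distinct value-tuples, the same one is always kept regardless of input order. This order-independence is what lets the function run in a combiner without a prior full sort.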
> 2018-08-09 18:12 GMT
I'm operating on a data set with some challenges to overcome. They are:
1. There may be multiple entries for a single key, and
2. For a single key, there may be multiple unique value-tuples.
For example:
key, val1, val2, val3
1, 0, 0, 0
1, 0, 0, 0
1,