Re: Dataset.distinct - Question on deterministic results

2018-08-10 Thread Will Bastian
tion that compares both input records on all
> (non-distinct-key) fields to determine which record to return.
>
> I would go for the second approach because it is more efficient (no need
> to fully sort before the combiner).
>
> Best, Fabian
>
> 2018-08-09 18:12 GMT
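
A minimal sketch of the idea in the reply above, in plain Java without any Flink dependency: a reduce-style function that compares two records with the same key on all non-key fields and always returns the smaller one, so the surviving record does not depend on arrival order. The class name, the method names, the int[] record layout (index 0 is the key) and the sample rows are assumptions made for illustration; they are not taken from the thread. In Flink the same comparison would presumably sit inside a ReduceFunction applied after groupBy on the key field, but that wiring is not shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeterministicDistinct {

    // Hypothetical record layout: index 0 is the distinct key,
    // the remaining indexes are the value fields (val1, val2, val3).
    // Both records are assumed to have the same length.
    static int[] pick(int[] a, int[] b) {
        // Compare every non-key field in order and keep the smaller record.
        for (int i = 1; i < a.length; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) {
                return c < 0 ? a : b;
            }
        }
        // Records are identical on all value fields, so either one will do.
        return a;
    }

    // Stand-in for "group by key, then reduce": one record survives per key.
    static Map<Integer, int[]> distinctByKey(List<int[]> records) {
        Map<Integer, int[]> result = new LinkedHashMap<>();
        for (int[] r : records) {
            result.merge(r[0], r, DeterministicDistinct::pick);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample rows are made up for this sketch.
        List<int[]> records = Arrays.asList(
            new int[] {1, 0, 1, 0},
            new int[] {1, 0, 0, 0},
            new int[] {1, 0, 0, 0},
            new int[] {2, 5, 5, 5});
        for (int[] r : distinctByKey(records).values()) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Because pick returns the minimum under a fixed field-by-field ordering, it is commutative and associative, which is what allows it to be applied pairwise in any order (for example in a combiner) and still give the same record per key.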

Dataset.distinct - Question on deterministic results

2018-08-09 Thread Will Bastian
I'm operating on a data set with some challenges to overcome. They are:

1. There is a possibility of multiple entries for a single key, and
2. For a single key, there may be multiple unique value-tuples.

For example:

key, val1, val2, val3
1, 0,0,0
1, 0,0,0
1,