Re: Dataset.distinct - Question on deterministic results

2018-08-10 Thread Will Bastian
Fabian,

Thanks for the clear response. You addressed my question, and the suggestions provide clear context on how to address it.

Best,
Will

On Fri, Aug 10, 2018 at 5:52 AM Fabian Hueske wrote:
> Hi Will,
>
> The distinct operator is implemented as a groupBy(distinctKeys) and a
> ReduceFunction

Re: Dataset.distinct - Question on deterministic results

2018-08-10 Thread Fabian Hueske
Hi Will,

The distinct operator is implemented as a groupBy(distinctKeys) and a ReduceFunction that returns the first argument. Hence, it depends on the order in which the records are processed by the ReduceFunction. Flink does not maintain a deterministic order because it is quite expensive in
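
To illustrate the behavior described above, here is a minimal sketch in plain Java. It uses only the standard library and does not call Flink's API. The names Rec, reduceByKey, KEEP_FIRST and KEEP_MIN_PAYLOAD are hypothetical and exist only for this example. It models "group by key, then reduce by returning the first argument" and shows that the surviving record changes with arrival order. The second reducer, which picks the record by a fixed rule (smallest payload), is an assumption about one way to make the result independent of order, not something taken from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class DistinctSketch {

    /** Hypothetical record type: distinctness is decided on `key` only. */
    static final class Rec {
        final String key;
        final int payload;

        Rec(String key, int payload) {
            this.key = key;
            this.payload = payload;
        }

        @Override
        public String toString() {
            return key + ":" + payload;
        }
    }

    /** Models distinct: the reducer returns its first argument, so the first arrival wins. */
    static final BinaryOperator<Rec> KEEP_FIRST = (a, b) -> a;

    /** Assumed alternative: pick by a fixed rule so that arrival order does not matter. */
    static final BinaryOperator<Rec> KEEP_MIN_PAYLOAD =
            (a, b) -> a.payload <= b.payload ? a : b;

    /** Groups records by key and folds each group with `reducer`, in arrival order. */
    static List<Rec> reduceByKey(List<Rec> input, BinaryOperator<Rec> reducer) {
        Map<String, Rec> groups = new LinkedHashMap<>();
        for (Rec r : input) {
            // merge() stores r if the key is new, else stores reducer(existing, r)
            groups.merge(r.key, r, reducer);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        // The same three records in two different arrival orders.
        List<Rec> order1 = Arrays.asList(new Rec("a", 2), new Rec("a", 1), new Rec("b", 7));
        List<Rec> order2 = Arrays.asList(new Rec("a", 1), new Rec("b", 7), new Rec("a", 2));

        System.out.println(reduceByKey(order1, KEEP_FIRST));       // [a:2, b:7]
        System.out.println(reduceByKey(order2, KEEP_FIRST));       // [a:1, b:7]
        System.out.println(reduceByKey(order1, KEEP_MIN_PAYLOAD)); // [a:1, b:7]
        System.out.println(reduceByKey(order2, KEEP_MIN_PAYLOAD)); // [a:1, b:7]
    }
}
```

With KEEP_FIRST the record kept for key "a" differs between the two orders. With KEEP_MIN_PAYLOAD both orders keep the same record, because the choice no longer depends on which record is seen first. When all fields form the distinct key, the duplicates are identical and the choice does not matter.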