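Hi Mary,

groupBy + reduceGroup works across all partitions of a DataSet. The elements of every partition are grouped by key, which potentially creates a new partitioning so that all elements with the same key end up in the same partition, and the reduceGroup function is then executed once for each group. If there are more groups than partitions, a partition simply holds several groups and the function is called once for each of them.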
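To make the behaviour concrete, here is a rough sketch in plain Java. It does not use the Flink API; it only imitates the semantics described above. The class and method names (GroupReduceSketch, shuffleByKey, reduceGroup, run) are made up for illustration, and routing by hashCode modulo the parallelism is a simplifying assumption, not how Flink actually assigns keys to partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupReduceSketch {

    // Stand-in for the shuffle behind groupBy: every element of every input
    // partition is routed to a target partition chosen from its key, so all
    // elements with the same key meet in the same target partition.
    // ASSUMPTION: hashCode modulo parallelism is a simplification.
    static List<Map<String, List<Integer>>> shuffleByKey(
            List<List<String>> inputPartitions, int parallelism) {
        List<Map<String, List<Integer>>> targets = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            targets.add(new TreeMap<>());
        }
        for (List<String> partition : inputPartitions) {
            for (String word : partition) {
                int target = Math.floorMod(word.hashCode(), parallelism);
                // Each occurrence of a word contributes the value 1 to its group.
                targets.get(target)
                       .computeIfAbsent(word, k -> new ArrayList<>())
                       .add(1);
            }
        }
        return targets;
    }

    // Stand-in for the user's reduceGroup function: it is called once per
    // group and sees all values of that group.
    static int reduceGroup(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the shuffle, then calls reduceGroup for every group in every
    // target partition. A partition may hold several groups.
    static Map<String, Integer> run(List<List<String>> inputPartitions, int parallelism) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, List<Integer>> target : shuffleByKey(inputPartitions, parallelism)) {
            for (Map.Entry<String, List<Integer>> group : target.entrySet()) {
                result.put(group.getKey(), reduceGroup(group.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two input partitions, four distinct keys, parallelism 2.
        List<List<String>> input = List.of(
                List.of("a", "b", "c"),
                List.of("a", "c", "d"));
        System.out.println(run(input, 2)); // prints {a=2, b=1, c=2, d=1}
    }
}
```

With four keys and a parallelism of 2, each target partition ends up with two groups, and "a" and "c" are counted correctly even though their elements started out in different input partitions.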
Cheers,
Till

On Thu, Apr 20, 2017 at 5:14 PM, Mary m <mm_program...@yahoo.com> wrote:

> Hi
> If groupBy+reduceGroup is used, does each groupBy+reduceGroup take place
> on a single partition? If yes, if we have more groups than the partitions,
> what happens?
>
> Cheers,
> Mary