Hi Guys,
Thanks so much for your quick replies, much appreciated!
Thanks
Tianji
On Wed, Mar 1, 2017 at 2:53 PM, Matthias J. Sax wrote:
It should be:
groupBy -> always triggers repartitioning
groupByKey -> may trigger repartitioning
And there will not be two repartitioning topics. The repartitioning will
be done by the groupBy/groupByKey operation, and thus, in the
aggregation step we know that data is correctly partitioned and
FYI: The difference between `groupBy` (may trigger re-partitioning) vs.
`groupByKey` (does not trigger re-partitioning) also applies to:
- `map` vs. `mapValues`
- `flatMap` vs. `flatMapValues`
On Wed, Mar 1, 2017 at 8:15 PM, Damian Guy wrote:
If you use stream.groupByKey(), there will be no repartitioning as long
as no key-changing operation (i.e., map, selectKey, flatMap, or transform)
precedes it. If you use stream.groupBy(...), we treat it as a key-changing
operation, so the data must be repartitioned.
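
To make the rule concrete, here is a minimal sketch in plain Java. It is a toy model, not the Kafka Streams API: the class `ModelStream` and its methods are hypothetical names that only track whether a key-changing operation has been applied, which is the condition described above for deciding whether grouping needs a repartition.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, NOT the Kafka Streams API). It records whether
// any preceding operation may have changed the key.
final class ModelStream {
    private final boolean keyMayHaveChanged;
    private final List<String> ops;

    private ModelStream(boolean keyMayHaveChanged, List<String> ops) {
        this.keyMayHaveChanged = keyMayHaveChanged;
        this.ops = ops;
    }

    // A fresh stream read from a topic: the key is untouched.
    static ModelStream source() {
        return new ModelStream(false, new ArrayList<>());
    }

    private ModelStream next(String op, boolean changesKey) {
        List<String> copy = new ArrayList<>(ops);
        copy.add(op);
        return new ModelStream(keyMayHaveChanged || changesKey, copy);
    }

    // Key-changing operations named in the thread.
    ModelStream map()       { return next("map", true); }
    ModelStream selectKey() { return next("selectKey", true); }
    ModelStream flatMap()   { return next("flatMap", true); }
    ModelStream transform() { return next("transform", true); }

    // Value-only operations: the key is guaranteed to stay the same.
    ModelStream mapValues()     { return next("mapValues", false); }
    ModelStream flatMapValues() { return next("flatMapValues", false); }

    // groupByKey: repartition only if a key-changing operation preceded it.
    boolean groupByKeyRepartitions() { return keyMayHaveChanged; }

    // groupBy: itself treated as key-changing, so it always repartitions.
    boolean groupByRepartitions() { return true; }

    public static void main(String[] args) {
        System.out.println(source().mapValues().groupByKeyRepartitions()); // false
        System.out.println(source().map().groupByKeyRepartitions());       // true
        System.out.println(source().groupByRepartitions());                // true
    }
}
```

In this model, preferring `mapValues` over `map` when the key does not need to change keeps a later `groupByKey` free of repartitioning.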