Re: groupBy without auto-repartition topics for Kafka Streams

2017-03-02 Thread Tianji Li

Hi guys,

Thanks so much for your quick replies, much appreciated!

Thanks
Tianji

On Wed, Mar 1, 2017 at 2:53 PM, Matthias J. Sax wrote:
> It should be:
>
> groupBy -> always trigger repartitioning
> groupByKey -> maybe trigger repartitioning
>
> And there will not be two

Re: groupBy without auto-repartition topics for Kafka Streams

2017-03-01 Thread Matthias J. Sax

It should be:

groupBy -> always trigger repartitioning
groupByKey -> maybe trigger repartitioning

And there will not be two repartitioning topics. The repartitioning will be done by the groupBy/groupByKey operation, and thus, in the aggregation step we know that data is correctly partitioned and

Re: groupBy without auto-repartition topics for Kafka Streams

2017-03-01 Thread Michael Noll

FYI: The difference between `groupBy` (may trigger re-partitioning) vs. `groupByKey` (does not trigger re-partitioning) also applies to:

- `map` vs. `mapValues`
- `flatMap` vs. `flatMapValues`

On Wed, Mar 1, 2017 at 8:15 PM, Damian Guy wrote:
> If you use

Re: groupBy without auto-repartition topics for Kafka Streams

2017-03-01 Thread Damian Guy

If you use stream.groupByKey() then there will be no repartitioning as long as there have been no key-changing operations preceding it, i.e., map, selectKey, flatMap, transform. If you use stream.groupBy(...) then we see it as a key-changing operation, hence we need to repartition the data.

On
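
The rule discussed in this thread can be sketched as a small toy model. This is not the Kafka Streams API: `StreamModel` is a hypothetical class, and its methods only borrow the names of the DSL operations mentioned above. It assumes nothing beyond what the replies state: key-changing operations (map, selectKey, flatMap, transform) mark the stream, groupByKey repartitions only if the stream is marked, and groupBy always repartitions.

```java
/**
 * Toy model only (hypothetical, not the Kafka Streams API).
 * It tracks a single flag: has a key-changing operation been applied?
 */
final class StreamModel {
    private final boolean keyMayHaveChanged;

    private StreamModel(boolean keyMayHaveChanged) {
        this.keyMayHaveChanged = keyMayHaveChanged;
    }

    /** A freshly read stream: its keys are assumed to match the partitioning. */
    static StreamModel source() {
        return new StreamModel(false);
    }

    // Key-changing operations named in the thread: each one sets the flag.
    StreamModel map()       { return new StreamModel(true); }
    StreamModel selectKey() { return new StreamModel(true); }
    StreamModel flatMap()   { return new StreamModel(true); }
    StreamModel transform() { return new StreamModel(true); }

    // Value-only operations: the flag is carried over unchanged.
    StreamModel mapValues()     { return this; }
    StreamModel flatMapValues() { return this; }

    /** groupByKey: repartitions only if a key-changing operation preceded it. */
    boolean groupByKeyRepartitions() {
        return keyMayHaveChanged;
    }

    /** groupBy: treated as key-changing itself, so it always repartitions. */
    boolean groupByRepartitions() {
        return true;
    }

    public static void main(String[] args) {
        System.out.println(source().mapValues().groupByKeyRepartitions()); // false
        System.out.println(source().map().groupByKeyRepartitions());       // true
        System.out.println(source().groupByRepartitions());                // true
    }
}
```

In this model a `mapValues` call after `map` does not clear the flag, which matches the "maybe trigger repartitioning" wording for groupByKey: what matters is whether any key-changing step came before it.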