You can use rebalance before keyBy because rebalance returns DataStream. The API does not allow rebalance on keyedStreamed which is returned after keyBy so you are safe.
On Mon 26 Nov, 2018, 2:25 PM Avi Levi <[email protected] wrote: > Ok, thanks for the clarification. but if I use it with keyed state so the > partition is by the key. rebalancing will not shuffle this partitioning ? > e.g > .addSource(source) > .rebalance > .keyBy(_.id) > .mapWithState(...) > > > On Mon, Nov 26, 2018 at 8:32 AM Taher Koitawala <[email protected]> > wrote: > >> Hi Avi, >> No, rebalance is not changing the number of kafka partitions. >> Lets say you have 6 kafka partitions and your flink parallelism is 8, in >> this case using rebalance will send records to all downstream operators in >> a round robin fashion. >> >> Regards, >> Taher Koitawala >> GS Lab Pune >> +91 8407979163 >> >> >> On Mon, Nov 26, 2018 at 11:33 AM Avi Levi <[email protected]> >> wrote: >> >>> Hi >>> Looking at this example >>> <https://github.com/dataArtisans/kafka-example/blob/master/src/main/java/com/dataartisans/ReadFromKafka.java>, >>> doing the "rebalance" (e.g messageStream.rebalance().map(...) ) >>> operation on heavy load stream wouldn't slow the stream ? is the >>> rebalancing action occurs only when there is a partition change ? >>> it says that "the rebelance call is causing a repartitioning of the >>> data so that all machines" is it actually changing the num of >>> partitions of the topic to match the num of flink operators ? >>> >>> Avi >>> >>
