Hi Rajat, > So to route old keys , I will have to route old keys first before I start accepting the new data. Right? The old keys will automatically be routed to different partitions based on the partitioner used (in partitioner.class config).
> A separate partitioner code has to be executed post increase in partitions which will read all records in kafka topic and run partition algorithm and then push it to new partition number simply. , then we can start accepting new messages. Is my understanding correct? Basically that is correct, except that the partitioner won't read records in kafka topics, instead, it reads records in producer. The partitioner exists in producer. When sending records to brokers, the partitioner in producer will decide which topic partition to send to, and find the partition leader broker for that partition. So, take an example. Before increasing the partition count, keys are "key-0", key-1", ... "key-99" Suppose using default partitioner, it maps to 1 partition each key, ex: key-0 -> p-0 (partition 0) key-1 -> p-1 ... key-99 -> p-99 (this is assumption, in reality, there might be some keys send to the same partition) So, after a while, you need to increase the partition count for some reasons. Again, for the records already sent to brokers, you can't change/update them. So, to keep the same key to the same partition after partition count increasing, you can write your own partitioner, to map key-0 -> p-0, key-1-> p-1... key-99 -> p-99. And for other keys not existed before, you use your own algorithm to distribute them, ex: [hash(key) % 10(new added partitions)] That said, if all record keys are identical before and after the partition count increasing, then, there might not be any good way to keep the same key to the same partition, and also leverage all the partitions (including new added partitions). Thank you. Luke On Mon, Nov 22, 2021 at 12:24 AM rajat kumar <[email protected]> wrote: > Hi Luke > > Thanks for responding. So to route old keys , I will have to route old keys > first before I start accepting the new data. Right? > A separate partitioner code has to be executed post increase in partitions > which will read all records in kafka topic and run partition algorithm and > then push it to new partition number simply. , then we can start accepting > new messages. Is my understanding correct? > > On Sun, Nov 21, 2021 at 6:19 PM Luke Chen <[email protected]> wrote: > > > Hello Rajat, > > > > I'm not sure what you mean to "reshuffle messages", because once the > > messages are stored in brokers, they can't be modified anymore. > > But if you want to make the previous added messages route to the same > > partitions after partition increasing, you can write custom partitioner: > > > https://kafka.apache.org/documentation/#producerconfigs_partitioner.class > > > > So, for example, you added 10 partitions for some new keys (ex: key-101 ~ > > key-110), you can write the partitioner to route the old keys to > [hash(key) > > % 100(old partition count)], and new keys route to [hash(key) % 10(new > > added partitions)]. > > > > Thank you. > > Luke > > > > On Sun, Nov 21, 2021 at 7:46 PM rajat kumar <[email protected]> > > wrote: > > > > > Hello Users, > > > > > > I am pretty new to Kafka, we will have key based messages coming up in > > > kafka. > > > We will have a 5 node cluster and I am going ahead with a 100 partition > > for > > > the topic for now. > > > Let's say if there is a need to increase the number of partitions. How > > do I > > > reshuffle messages , since previously added messages would end up in > the > > > wrong partition as per hash partition algo? > > > > > > Thanks > > > Rajat > > > > > >
