Re: Changing Partitions of kafka

Luke Chen Sun, 21 Nov 2021 21:09:30 -0800

Hi Rajat,
> So to route old keys , I will have to route old keys first before I start
accepting the new data. Right?
The old keys will automatically be routed to different partitions based on
the partitioner used (in partitioner.class config).

> A separate partitioner code has to be executed post increase in
partitions which will read all records in kafka topic and run partition
algorithm and then push it to new partition number simply. , then we can
start accepting new messages. Is my understanding correct?
Basically that is correct, except that the partitioner won't read records
in kafka topics, instead, it reads records in producer. The partitioner
exists in producer. When sending records to brokers, the partitioner in
producer will decide which topic partition to send to, and find the
partition leader broker for that partition.

So, take an example.
Before increasing the partition count, keys are "key-0", key-1", ...
"key-99"
Suppose using default partitioner, it maps to 1 partition each key, ex:
key-0 -> p-0 (partition 0)
key-1 -> p-1
...
key-99 -> p-99
(this is assumption, in reality, there might be some keys send to the same
partition)

So, after a while, you need to increase the partition count for some
reasons. Again, for the records already sent to brokers, you can't
change/update them. So, to keep the same key to the same partition after
partition count increasing, you can write your own partitioner, to map
key-0 -> p-0, key-1-> p-1... key-99 -> p-99. And for other keys not existed
before, you use your own algorithm to distribute them, ex: [hash(key) %
10(new added partitions)]

That said, if all record keys are identical before and after the partition
count increasing, then, there might not be any good way to keep the same
key to the same partition, and also leverage all the partitions (including
new added partitions).

Thank you.
Luke

On Mon, Nov 22, 2021 at 12:24 AM rajat kumar <[email protected]>
wrote:

> Hi Luke
>
> Thanks for responding. So to route old keys , I will have to route old keys
> first before I start accepting the new data. Right?
> A separate partitioner code has to be executed post increase in partitions
> which will read all records in kafka topic and run partition algorithm and
> then push it to new partition number simply. , then we can start accepting
> new messages. Is my understanding correct?
>
> On Sun, Nov 21, 2021 at 6:19 PM Luke Chen <[email protected]> wrote:
>
> > Hello Rajat,
> >
> > I'm not sure what you mean to "reshuffle messages", because once the
> > messages are stored in brokers, they can't be modified anymore.
> > But if you want to make the previous added messages route to the same
> > partitions after partition increasing, you can write custom partitioner:
> >
> https://kafka.apache.org/documentation/#producerconfigs_partitioner.class
> >
> > So, for example, you added 10 partitions for some new keys (ex: key-101 ~
> > key-110), you can write the partitioner to route the old keys to
> [hash(key)
> > % 100(old partition count)], and new keys route to [hash(key) % 10(new
> > added partitions)].
> >
> > Thank you.
> > Luke
> >
> > On Sun, Nov 21, 2021 at 7:46 PM rajat kumar <[email protected]>
> > wrote:
> >
> > > Hello Users,
> > >
> > > I am pretty new to Kafka, we will have key based messages coming up in
> > > kafka.
> > > We will have a 5 node cluster and I am going ahead with a 100 partition
> > for
> > > the topic for now.
> > > Let's say if there is a need to increase the number of partitions. How
> > do I
> > > reshuffle messages , since previously added messages would end up in
> the
> > > wrong partition as per hash partition algo?
> > >
> > > Thanks
> > > Rajat
> > >
> >
>

Re: Changing Partitions of kafka

Reply via email to