Another possibility, if you can pause processing, is to create a new topic with
a higher number of partitions, then consume the old topic from the beginning
and produce every event to the new one. Once the copy has caught up, resume
processing against the new topic; all events for a given key will then be in
the partition that the new partition count assigns to that key.
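
The reason for copying into a new topic, rather than only adding partitions to
the old one, is that the partition is derived from the key hash modulo the
partition count, so a different count can send the same key to a different
partition while the events already written stay where they are. A minimal
sketch of that effect follows. It assumes a stand-in hash (Arrays.hashCode over
the UTF-8 bytes), not the murmur2 hash that Kafka's default partitioner uses,
and the class, method, and key names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PartitionCountDemo {

    // Stand-in for "hash(key) % partition_count". Assumption: Arrays.hashCode
    // is used only for illustration; Kafka hashes the serialized key with murmur2.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    // Counts the keys whose partition differs between the two partition counts.
    static int keysThatMove(String[] keys, int oldCount, int newCount) {
        int moved = 0;
        for (String key : keys) {
            if (partitionFor(key, oldCount) != partitionFor(key, newCount)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Hypothetical keys; the thread only says there are 8 unique keys.
        String[] keys = new String[8];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "key-" + (i + 1);
        }
        for (String key : keys) {
            System.out.printf("%s: 6 partitions -> %d, 12 partitions -> %d%n",
                    key, partitionFor(key, 6), partitionFor(key, 12));
        }
        System.out.println("keys that move: " + keysThatMove(keys, 6, 12));
    }
}
```

Any key that moves would have its older events in one partition and its newer
events in another, which is what breaks key-level ordering if partitions are
simply added in place.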

Regards,
Dave

> On Nov 21, 2021, at 7:38 AM, Pushkar Deole <[email protected]> wrote:
> 
> Thanks Luke. I am sure many others have faced this problem before, so I
> would like to know whether there are any existing custom algorithms that
> can be reused.
> 
> Note that we also have a requirement to maintain key-level ordering, so the
> custom partitioner should support that as well.
> 
>> On Sun, Nov 21, 2021, 18:29 Luke Chen <[email protected]> wrote:
>> 
>> Hello Pushkar,
>> The default partitioner assigns a keyed record to
>> "hash(key) % partition_count", so with only a few distinct keys several of
>> them can land on the same partition, which gives the uneven distribution
>> you saw.
>> 
>> Yes, there's a way to solve your problem: a custom partitioner:
>> https://kafka.apache.org/documentation/#producerconfigs_partitioner.class
>> 
>> You can check the Partitioner javadoc here for reference:
>> https://kafka.apache.org/30/javadoc/org/apache/kafka/clients/producer/Partitioner.html
>> 
>> You can also see some examples in the built-in partitioners, e.g.
>> clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java.
>> Basically, you want to focus on the "partition" method and define your own
>> algorithm to distribute the events based on their keys, e.g. key-1 ->
>> partition-1, key-2 -> partition-2, etc.
>> 
>> Thank you.
>> Luke
>> 
>> 
>> On Sat, Nov 20, 2021 at 2:55 PM Pushkar Deole <[email protected]> wrote:
>> 
>>> Hi All,
>>> 
>>> We are experiencing uneven distribution of events across topic
>>> partitions for a small set of unique keys. The details are as follows:
>>> 
>>> 1. topic with 6 partitions
>>> 2. 8 unique keys used to produce events onto the topic
>>> 
>>> We used key-based partitioning while producing events onto the above topic.
>>> Observation: only 3 partitions were utilized for all the events pertaining
>>> to those 8 unique keys.
>>> 
>>> Any idea how the load can be spread evenly across partitions while using a
>>> key-based partitioning strategy? Any help would be greatly appreciated.
>>> 
>>> Note: we cannot use round robin since key-level ordering matters for us.
>>> 
>> 
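
As an illustration of the "key-1 -> partition-1, key-2 -> partition-2" idea
from the quoted reply, here is a minimal sketch of a fixed key-to-partition
assignment. It assumes the full set of keys is known in advance and is the same
on every producer, which the thread does not confirm. The class and method
names are hypothetical, and the code is plain Java with no Kafka dependency; in
a real custom partitioner this logic would sit inside the "partition" method of
a class implementing org.apache.kafka.clients.producer.Partitioner.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

final class FixedKeyAssignment {

    private final Map<String, Integer> assignment = new HashMap<>();
    private final int numPartitions;

    FixedKeyAssignment(String[] knownKeys, int numPartitions) {
        this.numPartitions = numPartitions;
        // Sort and de-duplicate so every producer instance derives the same
        // mapping regardless of the order in which the keys were supplied.
        int index = 0;
        for (String key : new TreeSet<>(Arrays.asList(knownKeys))) {
            assignment.put(key, index % numPartitions);
            index++;
        }
    }

    // A key always maps to the same partition, so key-level ordering is kept.
    int partitionFor(String key) {
        Integer partition = assignment.get(key);
        if (partition != null) {
            return partition;
        }
        // Assumption: keys outside the known set fall back to a simple hash.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Hypothetical key names; the thread only mentions 8 unique keys.
        String[] keys = {"key-1", "key-2", "key-3", "key-4",
                         "key-5", "key-6", "key-7", "key-8"};
        FixedKeyAssignment assignment = new FixedKeyAssignment(keys, 6);
        for (String key : keys) {
            System.out.println(key + " -> partition-" + assignment.partitionFor(key));
        }
    }
}
```

With 8 keys and 6 partitions this uses all 6 partitions, two of which carry two
keys each. The load is only as even as the per-key event volume, and the
mapping must not change while events are in flight if ordering matters.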
