Jason Rosenberg created KAFKA-2172:
--------------------------------------

             Summary: Round-robin partition assignment strategy too restrictive
                 Key: KAFKA-2172
                 URL: https://issues.apache.org/jira/browse/KAFKA-2172
             Project: Kafka
          Issue Type: Bug
            Reporter: Jason Rosenberg


The round-robin partition assignment strategy was introduced for the 
high-level consumer, starting with 0.8.2.1.  This appears to be a very 
attractive feature, but it has an unfortunate restriction that prevents it 
from being easily used: it requires that all consumers in the 
consumer group have identical topic regex selectors and the 
same number of consumer threads.
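
To make the restriction concrete, here is a minimal sketch of the precondition as described above. It is not Kafka's actual code; the function name and the shape of the `consumers` mapping are hypothetical and exist only for illustration.

```python
def round_robin_allowed(consumers):
    """Return True if every consumer in the group has the same topic
    selector and the same thread count.

    `consumers` is a hypothetical mapping of
    consumer id -> (topic_selector, num_threads).
    """
    # Collapse the per-consumer settings into a set; more than one
    # distinct (selector, thread count) pair means the group is mixed.
    return len(set(consumers.values())) <= 1


# A group in the middle of a rolling restart: one node already has the
# new thread count, so the precondition fails for the whole group.
group = {
    "consumer-1": ("events\\..*", 4),
    "consumer-2": ("events\\..*", 2),
}
print(round_robin_allowed(group))
```

Any difference in selector or thread count between two members, however brief, is enough to fail the check.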

It turns out this is not always the case for our deployments.  It's not unusual 
to run multiple consumers within a single process (with different topic 
selectors), or we might have multiple processes dedicated to different topic 
subsets.  Agreed, we could change these to use a separate group id for each 
topic sub-selector (but unfortunately, that's easier said than done).  In several 
cases, we do at least have separate client.ids set for each sub-consumer, so it 
would be incrementally better if we could at least loosen the requirement such 
that the set of topics selected by each groupid/clientid pair is the same.

But, if we want to do a rolling restart for a new version of a consumer config, 
the cluster will likely be in a state where it's not possible to have a single 
config until the full rolling restart completes across all nodes.  This results 
in a consumer outage while the rolling restart is happening.

Finally, it's especially problematic if we want to canary a new version for a 
period before rolling to the whole cluster.

I'm not sure why this restriction should exist (as it obviously does not exist 
for the 'range' assignment strategy).  It seems it could be made to work 
reasonably well with heterogeneous topic selection and heterogeneous thread 
counts.  The documentation states that "The round-robin partition assignor lays 
out all the available partitions and all the available consumer threads. It 
then proceeds to do a round-robin assignment from partition to consumer thread."

If the assignor can "lay out all the available partitions and all the available 
consumer threads", it should be able to uniformly assign partitions to the 
available threads.  In each case, if a thread belongs to a consumer that 
doesn't have that partition selected, just move to the next available thread 
that does have the selection, etc.
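
The "move to the next available thread" idea can be sketched as follows. This is only an illustration of the proposal, not the existing assignor; the function name, the thread ids, and the input shapes are all assumptions. It also does not guarantee a perfectly uniform spread when subscriptions differ widely.

```python
def assign_round_robin(partitions, threads):
    """Round-robin assignment that skips threads not subscribed to a topic.

    partitions: list of (topic, partition_number) tuples.
    threads: hypothetical mapping of thread id -> set of topics selected
             by the consumer that owns the thread.
    Returns a mapping of thread id -> list of assigned partitions.
    """
    thread_ids = sorted(threads)
    assignment = {t: [] for t in thread_ids}
    if not thread_ids:
        return assignment

    cursor = 0  # index of the thread that gets the next offer
    for topic, partition in sorted(partitions):
        # Starting at the cursor, visit each thread at most once.
        for step in range(len(thread_ids)):
            idx = (cursor + step) % len(thread_ids)
            candidate = thread_ids[idx]
            if topic in threads[candidate]:
                assignment[candidate].append((topic, partition))
                cursor = (idx + 1) % len(thread_ids)
                break
        # If no thread selects the topic, the partition stays unassigned.
    return assignment


threads = {
    "c1-0": {"a", "b"},  # consumer c1 selects topics a and b, two threads
    "c1-1": {"a", "b"},
    "c2-0": {"a"},       # consumer c2 selects only topic a, one thread
}
partitions = [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1)]
print(assign_round_robin(partitions, threads))
```

Here the consumers have different selectors and different thread counts, yet every partition still ends up with a thread whose consumer selected its topic.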



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
