Re: Consumer rebalancing based on partition sizes?

2015-06-23 Thread Ewen Cheslack-Postava
Partition assignment currently offers only a few options -- see the
partition.assignment.strategy consumer option (which seems to be listed
twice in the documentation; the second entry has the more detailed
explanation). There has been some discussion of making assignment
strategies user-extensible to support use cases like this.
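
To illustrate what a size-aware strategy might look like if assignment were
extensible, here is a minimal Python sketch. It is not Kafka's API or any
built-in strategy: assign_by_size, the partition sizes, and the consumer
names are all hypothetical. It uses a simple greedy heuristic, taking
partitions from largest to smallest and giving each one to the consumer that
currently holds the least data.

```python
import heapq


def assign_by_size(partition_sizes, consumers):
    """Greedy sketch: largest partition first, to the least-loaded consumer.

    partition_sizes: dict mapping partition id -> size (hypothetical units)
    consumers: list of consumer ids (hypothetical names)
    """
    if not consumers:
        raise ValueError("need at least one consumer")

    # Min-heap of (current load, consumer id); ties break on consumer id.
    heap = [(0, c) for c in sorted(consumers)]
    heapq.heapify(heap)
    assignment = {c: [] for c in consumers}

    # Visit partitions from largest to smallest (ties break on partition id).
    ordered = sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for partition, size in ordered:
        load, consumer = heapq.heappop(heap)
        assignment[consumer].append(partition)
        heapq.heappush(heap, (load + size, consumer))
    return assignment


# Made-up sizes: one large partition, two medium, three small.
sizes = {0: 900, 1: 100, 2: 100, 3: 100, 4: 400, 5: 400}
result = assign_by_size(sizes, ["c1", "c2"])
loads = {c: sum(sizes[p] for p in parts) for c, parts in result.items()}
print(result)
print(loads)
```

In this made-up example one consumer ends up with two partitions and the
other with four, but both hold the same total amount of data. The greedy
heuristic is not optimal in general, and it ignores that partition sizes
change over time.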

Is there a reason your data is unbalanced that might be avoidable? Ideally,
good hashing of keys, combined with a large enough number of keys and a
reasonable (not necessarily uniform) distribution of data across them, leads
to a reasonable balance, although there are certainly some workloads that
are so skewed that this doesn't work out.
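
As a rough illustration of that point, the Python sketch below hashes keys to
partitions and sums the data volume landing on each one. Everything in it is
an assumption made for the example: MD5 stands in for whatever hash the
producer's partitioner actually uses, and the key names, volumes, and
partition count are invented.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition via a hash (MD5 is a stand-in here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def partition_loads(key_volumes, num_partitions):
    """Sum the per-key data volume that lands on each partition."""
    loads = [0] * num_partitions
    for key, volume in key_volumes.items():
        loads[partition_for(key, num_partitions)] += volume
    return loads


# Many keys with modestly non-uniform volumes (1 to 5 units each).
many_keys = {f"user-{i}": 1 + (i % 5) for i in range(10000)}
balanced = partition_loads(many_keys, 8)

# Same keys plus one hypothetical "hot" key carrying most of the data.
skewed_keys = dict(many_keys)
skewed_keys["hot-key"] = 50000
skewed = partition_loads(skewed_keys, 8)

print("many keys: ", balanced)
print("one hot key:", skewed)
```

With many keys the per-partition totals should come out close to one another
even though individual keys differ in volume. A single dominant key, however,
lands entirely on one partition, and no choice of hash function spreads it
out.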



On Tue, Jun 23, 2015 at 7:34 PM, Joel Ohman maelstrom.thunderb...@gmail.com
 wrote:

 Hello!

 I'm working with a topic of widely varying partition sizes. My biggest
 concern is that I have no control over which keys are assigned to which
 consumers in my consumer group, as the amount of data my consumer sees is
 directly reflected in its workload. Is there a way to distribute
 partitions to consumers evenly based on the size of each partition? The
 provided Consumer Rebalancing Algorithm prioritizes assigning consumers
 even numbers of partitions, regardless of their size.

 Regards,
 Joel




-- 
Thanks,
Ewen


Consumer rebalancing based on partition sizes?

2015-06-23 Thread Joel Ohman
Hello!

I'm working with a topic of widely varying partition sizes. My biggest
concern is that I have no control over which keys are assigned to which
consumers in my consumer group, as the amount of data my consumer sees is
directly reflected in its workload. Is there a way to distribute
partitions to consumers evenly based on the size of each partition? The
provided Consumer Rebalancing Algorithm prioritizes assigning consumers
even numbers of partitions, regardless of their size.

Regards,
Joel