Hey Chris,

Thanks for the help!

Is that a limitation of the Flume Kafka sink or of Kafka itself? When I use 
another Kafka producer and publish without a key, it moves randomly among 
the partitions on every publish.
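For context, the "sticky" behaviour described below (pick a random partition, reuse it until the metadata refresh interval elapses, then possibly pick another) can be sketched in pure Python. This is only an illustration of the described behaviour; the class name, method names, and the 600-second default are assumptions, not Kafka's actual API:

```python
import random
import time

class StickyRandomPartitioner:
    """Illustrative sketch of a keyless producer that picks a random
    partition and sticks to it until a refresh interval elapses,
    rather than rotating on every send."""

    def __init__(self, num_partitions, refresh_interval_s=600):
        self.num_partitions = num_partitions
        self.refresh_interval_s = refresh_interval_s
        self._partition = None   # currently "stuck" partition
        self._chosen_at = 0.0    # when it was chosen

    def partition(self, now=None):
        """Return the partition for the next send; re-roll only after
        the refresh interval has elapsed."""
        now = time.time() if now is None else now
        expired = now - self._chosen_at >= self.refresh_interval_s
        if self._partition is None or expired:
            self._partition = random.randrange(self.num_partitions)
            self._chosen_at = now
        return self._partition

p = StickyRandomPartitioner(num_partitions=20)
first = p.partition(now=0)
# Every send inside the refresh window lands on the same partition:
assert all(p.partition(now=t) == first for t in range(0, 600, 60))
# After the window expires, a new random partition may be chosen:
later = p.partition(now=601)
assert 0 <= later < 20
```

Under that model, two consumers would see exactly the symptom described below: one partition (and therefore one consumer) gets everything until the interval expires or the producer restarts.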

-J

Sent via iPhone

> On Jun 7, 2016, at 00:08, Chris Horrocks <[email protected]> wrote:
> 
> The producers bind to random partitions and move every 10 minutes. If you 
> leave it long enough you should see it in the producer flume agent logs, 
> although there's nothing to stop it from "randomly" choosing the same 
> partition twice. Annoyingly there's no concept of producer groups (yet) to 
> ensure that producers apportion the available partitions between them as this 
> would create a synchronisation issue between what should be entirely 
> independent processes. 
> 
> -- 
> Chris Horrocks
>> On 7 June 2016 at 00:32:29, Jason J. W. Williams ([email protected]) 
>> wrote:
>> 
>> Hi,
>> 
>> I'm new to Flume and trying to relay log messages received over a netcat 
>> source to a Kafka sink. 
>> 
>> Everything seems to be fine, except that Flume is acting as if it IS 
>> assigning a partition key to the produced messages even though none is 
>> configured. I'd like the messages to be assigned to random partitions so 
>> that the consumers are load balanced.
>> 
>> * Flume 1.6.0
>> * Kafka 0.9.0.1
>> 
>> Flume config: 
>> https://gist.github.com/williamsjj/8ae025906955fbc4b5f990e162b75d7c
>> 
>> Kafka topic config: kafka-topics --zookeeper localhost/kafka --create 
>> --topic activity.history --partitions 20 --replication-factor 1
>> 
>> Python consumer program: 
>> https://gist.github.com/williamsjj/9e67287f0154816c3a733a39ad008437
>> 
>> Test program (publishes to Flume): 
>> https://gist.github.com/williamsjj/1eb097a187a3edb17ec1a3913e47e58b
>> 
>> The Flume agent listens for connections on TCP port 3132 and publishes the 
>> messages it receives to the Kafka activity.history topic. I'm running two 
>> instances of the Python consumer.
>> 
>> What happens, however, is that all log messages get sent to a single Kafka 
>> consumer. If I restart Flume (leaving the consumers running) and re-run the 
>> test, all messages get published to the other consumer. So it feels like 
>> Flume is assigning a permanent partition key even though one is not defined 
>> (and the assignment should therefore be random).
>> 
>> Any advice is greatly appreciated.
>> 
>> -J
