The producers bind to random partitions and move every 10 minutes. If you 
leave it running long enough you should see the switch in the producer Flume 
agent's logs, although there's nothing to stop it from "randomly" choosing 
the same partition twice. Annoyingly, there's no concept of producer groups 
(yet) to make producers apportion the available partitions between them, as 
that would create a synchronisation problem between what should be entirely 
independent processes. 
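For what it's worth, the ten-minute stickiness comes from the old producer's 
topic.metadata.refresh.interval.ms (default 600000 ms): with no key set, it 
keeps the same partition until the next metadata refresh. A rough sketch of 
two workarounds, assuming the 1.6 Kafka sink passes kafka.-prefixed 
properties through to the producer and uses the "key" event header as the 
message key (both per the 1.6 user guide); the a1/k1/r1 names are 
placeholders, and the other sink settings stay as in your gist: 

# 1) Make the keyless producer re-pick a partition more often 
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink 
a1.sinks.k1.topic = activity.history 
a1.sinks.k1.kafka.topic.metadata.refresh.interval.ms = 10000 

# 2) Or stamp every event with a random UUID in the "key" header, 
# which the sink hashes to spread events across the partitions 
a1.sources.r1.interceptors = i1 
a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder 
a1.sources.r1.interceptors.i1.headerName = key 

Option 2 is the one that gives you genuinely per-message distribution; note 
the UUID interceptor ships with the morphline Solr sink module, so make sure 
that jar is on the agent's classpath. 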

-- 
Chris Horrocks 

On 7 June 2016 at 00:32:29, Jason J. W. Williams 
([email protected]) wrote:

> 
> Hi, 
> 
> I'm new to Flume and trying to relay log messages received over a netcat 
> source to a Kafka sink. 
> 
> Everything seems fine, except that Flume is acting as if it IS assigning a 
> partition key to the produced messages even though none is configured. I'd 
> like the messages assigned to random partitions, so that the consumers are 
> load balanced. 
> 
> * Flume 1.6.0 
> * Kafka 0.9.0.1
> 
> Flume config: 
> https://gist.github.com/williamsjj/8ae025906955fbc4b5f990e162b75d7c 
> 
> Kafka topic config: kafka-topics --zookeeper localhost/kafka --create --topic 
> activity.history --partitions 20 --replication-factor 1
> 
> Python consumer program: 
> https://gist.github.com/williamsjj/9e67287f0154816c3a733a39ad008437 
> 
> Test program (publishes to Flume): 
> https://gist.github.com/williamsjj/1eb097a187a3edb17ec1a3913e47e58b
> 
> The Flume agent listens for connections on TCP port 3132 and publishes 
> received messages to the Kafka activity.history topic. I'm running two 
> instances of the Python consumer. 
> 
> What happens, however, is that all log messages get sent to a single Kafka 
> consumer. If I restart Flume (leaving the consumers running) and re-run the 
> test, all messages get published to the other consumer. So it feels like 
> Flume is assigning a permanent partition key even though one is not defined 
> (and should therefore be random). 
> 
> Any advice is greatly appreciated. 
> 
> -J 
