Hey Chris,

That's great. I'll look into that. Thank you very much!

-J

Sent via iPhone

> On Jun 7, 2016, at 11:36, Chris Horrocks <[email protected]> wrote:
> 
> You can implement a sink group with the load-balancing sink processor 
> (https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors) to 
> spread the load over multiple producers, which partly accounts for the 
> situation where you have one Kafka sink rotating between multiple 
> partitions. It's not an exact science, though: you'd need multiple Kafka 
> sinks (n more than you have partitions in the topic being written to) to 
> make the balance of probabilities such that you were always writing to 
> each partition from at least one sink.
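> 
> For illustration, a minimal sketch of such a sink group (the agent name 
> a1, channel c1, and broker address are placeholder assumptions, not 
> taken from your config):
> 
>     a1.sinks = k1 k2
>     a1.sinkgroups = g1
>     a1.sinkgroups.g1.sinks = k1 k2
>     a1.sinkgroups.g1.processor.type = load_balance
>     a1.sinkgroups.g1.processor.selector = random
>     a1.sinkgroups.g1.processor.backoff = true
> 
>     # Each sink is an independent Kafka producer writing the same topic
>     a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
>     a1.sinks.k1.topic = activity.history
>     a1.sinks.k1.brokerList = localhost:9092
>     a1.sinks.k1.channel = c1
>     a1.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
>     a1.sinks.k2.topic = activity.history
>     a1.sinks.k2.brokerList = localhost:9092
>     a1.sinks.k2.channel = c1
> 
> Each sink runs its own producer, so each one picks (and re-picks) its 
> partition independently.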
> 
> As with all Hadoop technologies: it depends on your use case.
> 
> -- 
> Chris Horrocks
> 
> From: Jason J. W. Williams <[email protected]>
> Reply: [email protected] <[email protected]>
> Date: 7 June 2016 at 19:03:59
> To: [email protected] <[email protected]>
> Subject:  Re: Kafka Sink random partition assignment 
> 
>> Thanks again, Chris. I'm curious, though, why I see the round-robin 
>> behavior I expected when using kafka-console-producer to inject 
>> messages.
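>> 
>> (For concreteness, by "inject messages" I mean a keyless invocation 
>> along these lines, with the broker address being illustrative:
>> 
>>     kafka-console-producer --broker-list localhost:9092 --topic activity.history
>> 
>> With no key set, the console producer's default partitioner chooses the 
>> partition.)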
>> 
>> -J
>> 
>>> On Tuesday, June 7, 2016, Chris Horrocks <[email protected]> wrote:
>>> It's by design in Kafka (and, by extension, Flume). Producers are 
>>> designed to be many-to-one (producers to partitions), so picking a 
>>> random partition every 10 minutes prevents separate producer instances 
>>> from all randomly picking the same partition. 
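>>> 
>>> A throwaway Python sketch of that trade-off (the partition and sink 
>>> counts are made-up numbers, not from your setup): each producer 
>>> independently sticks to one random partition per interval, and we 
>>> estimate how often every partition gets at least one writer.
>>> 
>>>     import random
>>> 
>>>     PARTITIONS = 20   # partitions in the topic
>>>     TRIALS = 10000    # simulated 10-minute intervals
>>> 
>>>     for sinks in (20, 40, 100):
>>>         # count intervals in which the sinks' random picks
>>>         # cover every partition at least once
>>>         covered = sum(
>>>             len({random.randrange(PARTITIONS) for _ in range(sinks)}) == PARTITIONS
>>>             for _ in range(TRIALS)
>>>         )
>>>         print("%3d sinks: all %d partitions written in %.1f%% of intervals"
>>>               % (sinks, PARTITIONS, 100.0 * covered / TRIALS))
>>> 
>>> Even with as many sinks as partitions you almost never cover them all, 
>>> which is why you need substantially more sinks than partitions.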
>>> 
>>> -- 
>>> Chris Horrocks
>>> 
>>> From: Jason Williams <[email protected]>
>>> Reply: [email protected] <[email protected]>
>>> Date: 7 June 2016 at 09:43:34
>>> To: [email protected] <[email protected]>
>>> Subject:  Re: Kafka Sink random partition assignment
>>> 
>>>> Hey Chris,
>>>> 
>>>> Thanks for the help! 
>>>> 
>>>> Is that a limitation of the Flume Kafka sink or Kafka itself? Because when 
>>>> I use another Kafka producer and publish without a key, it randomly moves 
>>>> among the partitions on every publish.
>>>> 
>>>> -J
>>>> 
>>>> Sent via iPhone
>>>> 
>>>> On Jun 7, 2016, at 00:08, Chris Horrocks <[email protected]> wrote:
>>>> 
>>>>> The producers bind to random partitions and move every 10 minutes. If you 
>>>>> leave it running long enough you should see this in the producer Flume 
>>>>> agent logs, although there's nothing to stop a producer from "randomly" 
>>>>> choosing the same partition twice. Annoyingly, there's no concept of 
>>>>> producer groups (yet) to ensure that producers apportion the available 
>>>>> partitions between them, as that would create a synchronisation issue 
>>>>> between what should be entirely independent processes. 
>>>>> 
>>>>> -- 
>>>>> Chris Horrocks
>>>>>> On 7 June 2016 at 00:32:29, Jason J. W. Williams 
>>>>>> ([email protected]) wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I'm new to Flume and trying to relay log messages received over a netcat 
>>>>>> source to a Kafka sink. 
>>>>>> 
>>>>>> Everything seems to be fine, except that Flume is acting as if it IS 
>>>>>> assigning a partition key to the produced messages even though none is 
>>>>>> set. I'd like the messages to be assigned to random partitions so that 
>>>>>> consumers are load balanced.
>>>>>> 
>>>>>> * Flume 1.6.0
>>>>>> * Kafka 0.9.0.1
>>>>>> 
>>>>>> Flume config: 
>>>>>> https://gist.github.com/williamsjj/8ae025906955fbc4b5f990e162b75d7c
>>>>>> 
>>>>>> Kafka topic config: kafka-topics --zookeeper localhost/kafka --create 
>>>>>> --topic activity.history --partitions 20 --replication-factor 1
>>>>>> 
>>>>>> Python consumer program: 
>>>>>> https://gist.github.com/williamsjj/9e67287f0154816c3a733a39ad008437
>>>>>> 
>>>>>> Test program (publishes to Flume): 
>>>>>> https://gist.github.com/williamsjj/1eb097a187a3edb17ec1a3913e47e58b
>>>>>> 
>>>>>> The Flume agent listens for connections on TCP port 3132 and publishes 
>>>>>> received messages to the Kafka activity.history topic. I'm running two 
>>>>>> instances of the Python consumer.
>>>>>> 
>>>>>> What happens, however, is that all log messages get sent to a single 
>>>>>> Kafka consumer... if I restart Flume (leaving the consumers running) and 
>>>>>> re-run the test, all messages get published to the other consumer. So it 
>>>>>> feels like Flume is assigning a permanent partition key even though one 
>>>>>> is not defined (and partition assignment should therefore be random).
>>>>>> 
>>>>>> Any advice is greatly appreciated.
>>>>>> 
>>>>>> -J
