Hey Chris,

That's great. I'll look into that. Thank you very much!
-J

Sent via iPhone

> On Jun 7, 2016, at 11:36, Chris Horrocks <[email protected]> wrote:
>
> You can implement the sink groups load balance processor
> (https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors) to
> spread the load over multiple producers, which partly accounts for the
> situation where you might have one Kafka sink rotating between multiple
> partitions. It's not an exact science, though, as you'd need multiple
> Kafka sinks (more sinks than you have partitions in the topic being
> written to) in order to make the balance of probabilities such that you
> were always writing to each partition from at least one sink.
>
> As with all Hadoop technologies: it depends on your use case.
>
> --
> Chris Horrocks
>
> From: Jason J. W. Williams <[email protected]>
> Reply: [email protected] <[email protected]>
> Date: 7 June 2016 at 19:03:59
> To: [email protected] <[email protected]>
> Subject: Re: Kafka Sink random partition assignment
>
>> Thanks again, Chris. I'm curious why I see the round-robin behavior I
>> expected when using kafka-console-producer to inject messages, though.
>>
>> -J
>>
>>> On Tuesday, June 7, 2016, Chris Horrocks <[email protected]> wrote:
>>> It's by design in Kafka (and by extension Flume). The producers are
>>> designed to be many-to-one (producers to partitions), and as such,
>>> picking a random partition every 10 minutes prevents separate producer
>>> instances from all randomly picking the same partition.
>>>
>>> --
>>> Chris Horrocks
>>>
>>> From: Jason Williams <[email protected]>
>>> Reply: [email protected] <[email protected]>
>>> Date: 7 June 2016 at 09:43:34
>>> To: [email protected] <[email protected]>
>>> Subject: Re: Kafka Sink random partition assignment
>>>
>>>> Hey Chris,
>>>>
>>>> Thanks for the help!
>>>>
>>>> Is that a limitation of the Flume Kafka sink or of Kafka itself? When
>>>> I use another Kafka producer and publish without a key, it randomly
>>>> moves among the partitions on every publish.
>>>>
>>>> -J
>>>>
>>>> Sent via iPhone
>>>>
>>>> On Jun 7, 2016, at 00:08, Chris Horrocks <[email protected]> wrote:
>>>>
>>>>> The producers bind to random partitions and move every 10 minutes.
>>>>> If you leave it long enough you should see it in the producer Flume
>>>>> agent logs, although there's nothing to stop it from "randomly"
>>>>> choosing the same partition twice. Annoyingly, there's no concept of
>>>>> producer groups (yet) to ensure that producers apportion the
>>>>> available partitions between them, as this would create a
>>>>> synchronisation issue between what should be entirely independent
>>>>> processes.
>>>>>
>>>>> --
>>>>> Chris Horrocks
>>>>>
>>>>>> On 7 June 2016 at 00:32:29, Jason J. W. Williams
>>>>>> ([email protected]) wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm new to Flume, and I'm trying to relay log messages received
>>>>>> over a netcat source to a Kafka sink.
>>>>>>
>>>>>> Everything seems to be fine, except that Flume is acting like it IS
>>>>>> assigning a partition key to the produced messages even though none
>>>>>> is assigned. I'd like the messages to be assigned to a random
>>>>>> partition, so that consumers are load balanced.
>>>>>>
>>>>>> * Flume 1.6.0
>>>>>> * Kafka 0.9.0.1
>>>>>>
>>>>>> Flume config:
>>>>>> https://gist.github.com/williamsjj/8ae025906955fbc4b5f990e162b75d7c
>>>>>>
>>>>>> Kafka topic config:
>>>>>> kafka-topics --zookeeper localhost/kafka --create
>>>>>> --topic activity.history --partitions 20 --replication-factor 1
>>>>>>
>>>>>> Python consumer program:
>>>>>> https://gist.github.com/williamsjj/9e67287f0154816c3a733a39ad008437
>>>>>>
>>>>>> Test program (publishes to Flume):
>>>>>> https://gist.github.com/williamsjj/1eb097a187a3edb17ec1a3913e47e58b
>>>>>>
>>>>>> The Flume agent listens on 3132/tcp for connections and publishes
>>>>>> the messages it receives to the Kafka activity.history topic. I'm
>>>>>> running two instances of the Python consumer.
>>>>>>
>>>>>> What happens, however, is that all log messages get sent to a
>>>>>> single Kafka consumer. If I restart Flume (leaving the consumers
>>>>>> running) and re-run the test, all messages get published to the
>>>>>> other consumer. So it feels like Flume is assigning a permanent
>>>>>> partition key even though one is not defined (and should therefore
>>>>>> be random).
>>>>>>
>>>>>> Any advice is greatly appreciated.
>>>>>>
>>>>>> -J
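
For reference, a minimal sketch of the load-balancing sink-group approach
Chris suggests above. The agent, channel, and sink names (a1, c1, k1-k3)
are illustrative, not taken from the linked gist, and per Chris's caveat
you'd want considerably more sinks than this for a 20-partition topic:

    # One channel feeding a group of independent Kafka sinks.
    a1.sinks = k1 k2 k3
    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2 k3
    a1.sinkgroups.g1.processor.type = load_balance
    a1.sinkgroups.g1.processor.selector = round_robin

    # Each sink is its own Kafka producer, so each binds to its own
    # (randomly chosen) partition.
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.topic = activity.history
    a1.sinks.k1.brokerList = localhost:9092
    a1.sinks.k1.channel = c1
    # ...k2 and k3 configured identically.

Because each sink runs its own producer instance, adding sinks raises the
odds that every partition has at least one writer at any given moment; as
Chris notes, it's probabilistic rather than exact.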
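On the round-robin difference Jason observed: Flume 1.6's Kafka sink is
built on Kafka's older producer API, whose default partitioner pins a
null-keyed producer to one partition until the topic metadata refreshes
(topic.metadata.refresh.interval.ms, 600000 ms by default, which matches
the 10-minute rotation Chris describes). The Kafka 0.9
kafka-console-producer uses the newer producer client, whose default
partitioner spreads null-keyed messages across partitions, hence the
different behavior. Assuming the Flume 1.6 convention of passing producer
properties through with a "kafka." prefix, a single sink can be made to
hop partitions more often, e.g.:

    a1.sinks.k1.kafka.topic.metadata.refresh.interval.ms = 60000

Note this only shortens how long one sink stays on a partition; it won't
spread a single burst of messages across partitions.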
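A quick way to confirm where messages are landing, rather than inferring
it from which consumer receives them (broker address assumed to be
localhost:9092):

    kafka-run-class kafka.tools.GetOffsetShell --broker-list localhost:9092 \
        --topic activity.history --time -1

This prints the latest offset for each partition (topic:partition:offset);
if only one partition's offset grows between test runs, the producer is
pinned to that partition.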
