You would run a flume-ng instance (agent) on each node with an Avro sink. Then, on your collector machine, you would run another flume-ng instance with an Avro source that receives the events sent by the nodes.
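
A minimal sketch of that two-tier layout, in case it helps. The agent names (node, collector), the hostname collector.example.com, and port 4545 are placeholders I made up for illustration; MySource and MySink stand in for your custom classes as written in your conf file. On the collector side, the receiving component is configured as a source of type avro.

```properties
# Node agent: runs on every machine, forwards events to the collector over Avro.
node.sources = source1
node.channels = channel1
node.sinks = avroSink

node.channels.channel1.type = memory
node.channels.channel1.capacity = 1000
node.channels.channel1.transactionCapacity = 1000

# Custom source from the original conf file (assumed to be on the classpath).
node.sources.source1.type = MySource
node.sources.source1.channels = channel1

# Avro sink pointing at the collector (hostname and port are placeholders).
node.sinks.avroSink.type = avro
node.sinks.avroSink.hostname = collector.example.com
node.sinks.avroSink.port = 4545
node.sinks.avroSink.channel = channel1

# Collector agent: runs on the collector machine, receives Avro events.
collector.sources = avroSource
collector.channels = channel1
collector.sinks = sink1

collector.channels.channel1.type = memory
collector.channels.channel1.capacity = 1000
collector.channels.channel1.transactionCapacity = 1000

# Avro source listening on the port the node sinks send to.
collector.sources.avroSource.type = avro
collector.sources.avroSource.bind = 0.0.0.0
collector.sources.avroSource.port = 4545
collector.sources.avroSource.channels = channel1

# Custom sink from the original conf file, writing to HDFS.
collector.sinks.sink1.type = MySink
collector.sinks.sink1.channel = channel1
```

With both agents in one file, each machine would start only the agent it needs by passing the matching agent name, for example "flume-ng agent -n node -f <conf file>" on the nodes and "-n collector" on the collector machine.
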
If you run more than one collector, you can set up sink groups and configure them to do failover or load balancing. The concept of a flume master from flume 0.9.x does not exist in flume-ng.

I personally keep the node and collector configs in the same config file under different agent names, and then keep that file synced on all machines.

These two docs are pretty helpful:
https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeUserGuide.rst
https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst

Thanks,
Roy

From: Juan Gentile [mailto:[email protected]]
Sent: Tuesday, October 09, 2012 11:04 AM
To: [email protected]
Subject: Flume-ng - Distributed

Hi,

I'm new to Flume-ng, and I'd like to ask how I can run an agent distributed across a cluster. I've developed my own source and sink: the source reads from a queue and the sink stores the messages it reads to HDFS. If I want to have this running in multiple instances, do I have to submit it on each node?

This is my conf file:

agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 1000
agent1.sources.source1.channels = channel1
agent1.sources.source1.type = MySource
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.type = MySink
agent1.channels = channel1
agent1.sources = source1
agent1.sinks = sink1

I see that there was the concept of a 'master' and a 'node' in the previous version of Flume. Is there something similar here?

Thanks,
Juan
