Hi all. I've been working on this for quite some time and need some advice from the experts. I have a two-tiered Flume architecture:

App Tier (all on one server): 124 ExecSources -> MemoryChannel -> AvroSinks
HDFS Tier (on two servers): AvroSource -> FileChannel -> HDFSSinks

When I run the agents, the HDFS tier keeps up fine with the App Tier. Queue sizes stay between 0 and 10000 (I have a batch size of 10000). All is good.

On the App Tier, when I view the JMX data through jconsole, I watch the size of the MemoryChannel grow steadily until it reaches the max. Then, as expected, it starts throwing exceptions saying it cannot put the batch on the channel.

There seem to be two basic ways to increase the throughput of the App Tier:

1. Increase the MemoryChannel's transactionCapacity and the corresponding AvroSink's batch-size. Both are set to 10000 for me.
2. Increase the number of AvroSinks draining the MemoryChannel. I'm up to 64 sinks now, which round-robin between the two Flume agents on the HDFS tier.

Both of those values (batch size and number of sinks) seem quite high to me. Am I missing something as far as tuning goes? Which would give the greater increase in throughput: more sinks or a larger batch size?

I'm stumped here, but I still think I can get this to work. :) Any suggestions are most welcome.

Thanks for your time.
Chris
