I meant 640,000, not 64,000.
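
For reference, here is the arithmetic behind that figure as a rough sketch. The function names are made up for illustration and are not Flume APIs. The second function rests on a simplifying assumption of mine: that the channel needs room for one source batch on top of whatever the sinks are holding in uncommitted transactions.

```python
# Back-of-the-envelope sizing; illustrative helpers only, not part of Flume.

def events_held_by_sinks(num_sinks: int, sink_batch_size: int) -> int:
    """Worst-case number of events reserved by open sink transactions.

    Each sink holds up to one full batch until its transaction commits.
    """
    return num_sinks * sink_batch_size


def min_channel_capacity(num_sinks: int, sink_batch_size: int,
                         source_batch_size: int) -> int:
    """Assumed lower bound on channel capacity: every sink holds a full
    uncommitted batch and the source still has to fit one batch."""
    return events_held_by_sinks(num_sinks, sink_batch_size) + source_batch_size


if __name__ == "__main__":
    # 64 sinks, each with a batch size of 10000, as in the thread.
    print(events_held_by_sinks(64, 10_000))
    # Fewer sinks and a smaller batch shrink the reservation quickly.
    for sinks, batch in [(64, 10_000), (8, 10_000), (8, 1_000)]:
        print(sinks, batch, events_held_by_sinks(sinks, batch))
```

With 64 sinks and a batch size of 10000 the first call gives 640,000, which is the corrected number above.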

On Tue, Mar 12, 2013 at 2:10 PM, Roshan Naik <[email protected]> wrote:
> Beyond a certain # of sinks it won't help adding more. My suspicion is
> you may have gone way overboard.
>
> If your sink-side batch size is that large and you have 64 sinks in
> the round-robin, it will take a lot of events (64,000) to be pumped
> in by the source before the first event can start trickling out
> of any sink. Also, memory consumption will be quite high: each sink
> will open a transaction and hold on to 10000 events. This is the cause
> of the Memory channel filling up. Until the sink-side transaction is
> committed (i.e. 10k events are pulled), the memory reservation on the
> channel is not relinquished. So your memory channel size will have to
> be really high to support so many sinks, each with such a big batch size.
>
> My gut feel is that your source-side batch size is not much of an
> issue and can be smaller. Increasing the number of sinks will only
> help if the sink is indeed the bott
>
> On Tue, Mar 12, 2013 at 1:43 PM, Chris Neal <[email protected]> wrote:
>> Hi all.
>>
>> I've been working on this for quite some time, and need some advice from the
>> experts. I have a two-tiered Flume architecture:
>>
>> App Tier (all on one server):
>> 124 ExecSources -> MemoryChannel -> AvroSinks
>>
>> HDFS Tier (on two servers):
>> AvroSource -> FileChannel -> HDFSSinks
>>
>> When I run the agents, the HDFS tier is keeping up fine with the App Tier.
>> Queue sizes stay between 0-10000 (I have a batch size of 10000). All is
>> good.
>>
>> On the App Tier, when I view the JMX data through jconsole, I watch the size
>> of the MemoryChannel grow steadily until it reaches the max, then it starts
>> throwing exceptions about not being able to put the batch on the channel, as
>> expected.
>>
>> There seem to be two basic ways to increase the throughput of the App Tier:
>> 1. Increase the MemoryChannel's transactionCapacity and the corresponding
>> AvroSink's batch-size. Both are set to 10000 for me.
>> 2. Increase the number of AvroSinks to drain the MemoryChannel. I'm up to
>> 64 Sinks now, which round-robin between the two Flume Agents on the HDFS
>> tier.
>>
>> Both of those values seem quite high to me (batch size and number of sinks).
>>
>> Am I missing something as far as tuning?
>> Which would allow for a greater increase in throughput: more Sinks or a larger
>> batch size?
>>
>> I'm stumped here. I still think I can get this to work. :)
>>
>> Any suggestions are most welcome.
>> Thanks for your time.
>> Chris
>>
