So, in a 4 channel setup, would I bind each of the 124 sources to all of the 4 channels, or divide them up and put 31 sources on each individual channel? :)
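
To make the two options concrete, here is a rough Python sketch of the arithmetic. It is only an illustration, not a tested Flume setup: the agent name `a1`, the `srcN`/`chN` names, and the helper functions are all made up for this example. It assumes Flume's default replicating channel selector, under which a source bound to several channels copies every event to each of them, so the sketch shows the "divide them up" option. The second helper reproduces Roshan's point from the thread below about how many events open sink transactions can hold at once.

```python
def split_sources(num_sources, num_channels):
    """Assign source indexes to channels in contiguous, near-equal groups."""
    base, extra = divmod(num_sources, num_channels)
    groups, start = [], 0
    for ch in range(num_channels):
        size = base + (1 if ch < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def render_bindings(groups, agent="a1"):
    """Render one '<agent>.sources.<source>.channels = <channel>' line per source.

    The agent, source, and channel names are hypothetical placeholders.
    """
    lines = []
    for ch, group in enumerate(groups):
        for src in group:
            lines.append(f"{agent}.sources.src{src}.channels = ch{ch}")
    return lines


def events_held_by_sinks(num_sinks, sink_batch_size):
    """Upper bound on events sitting in open sink transactions at once."""
    return num_sinks * sink_batch_size


groups = split_sources(124, 4)
print([len(g) for g in groups])          # [31, 31, 31, 31]
print(render_bindings(groups)[0])        # a1.sources.src0.channels = ch0
print(events_held_by_sinks(64, 10000))   # 640000 (the original setup)
print(events_held_by_sinks(16, 1000))    # 16000 (the scaled-back setup)
```

With 4 sinks per channel and a batch size of 1000, each channel would have at most 4000 events tied up in sink transactions, which is one way to sanity-check the per-channel capacity.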

On Tue, Mar 12, 2013 at 4:40 PM, Chris Neal <[email protected]> wrote:

> Beautiful. Will try 4 channels in one Agent first.
> Thanks!
>
>
> On Tue, Mar 12, 2013 at 4:35 PM, Roshan Naik <[email protected]> wrote:
>
>> Even 16 on a single channel might be on the higher side IMHO.
>>
>> Try instead splitting into four channels with 4 sinks each... or even
>> four agents with one channel and 4 sinks each ..... it will reduce
>> contention. be careful to ensure your capacity of each channel is not
>> too high since you now have many channels.
>> -roshan
>>
>> On Tue, Mar 12, 2013 at 2:24 PM, Chris Neal <[email protected]> wrote:
>> > Thanks for the reply. You're definitely on to something with the
>> > ever-increasing number of sinks. :)
>> >
>> > I scaled it back to 16 AvroSinks, and used a
>> > MemoryChannel.transactionCapacity of 1000, and AvroSink.batch-size of
>> > 1000. My ExecSource.batchSize is 100 (I chose this smaller number
>> > because there are so many of them (124), I didn't want 10s of thousands
>> > of events getting dropped on the MemoryChannel at once, rather just
>> > 1000s). With those settings, things are keeping the MemoryChannel
>> > drained. Finally getting somewhere! :)
>> >
>> > Much appreciate the prompt response. If anything else comes to mind,
>> > please do let me know.
>> >
>> > Thanks again.
>> > Chris
>> >
>> >
>> > On Tue, Mar 12, 2013 at 4:12 PM, Roshan Naik <[email protected]> wrote:
>> >>
>> >> i meant 640,000 not 64,000
>> >>
>> >> On Tue, Mar 12, 2013 at 2:10 PM, Roshan Naik <[email protected]>
>> >> wrote:
>> >> > beyond a certain # of sinks it won't help adding more. my suspicion
>> >> > is you may have gone way overboard.
>> >> >
>> >> > if your sink-side batch size is that large and you have 64 sinks in
>> >> > the round-robin.. it will take a lot of events (64,000) to be pumped
>> >> > in by the source order before the first event can start trickling out
>> >> > of any sink. Also memory consumption will be quite high.. each sink
>> >> > will open a transaction and hold on to 10000 events. This is the
>> >> > cause for the Memory channel filling up. Until the sink side
>> >> > transaction is committed (i.e. 10k events are pulled), the memory
>> >> > reservation on the channel is not relinquished. So your memory
>> >> > channel size will have to be really high to support so many sinks
>> >> > each with such a big batch size.
>> >> >
>> >> > My gut feel is that your source-side batch size is not much of an
>> >> > issue and can be smaller. Increasing the number of sinks will only
>> >> > help if the sink is indeed the bott
>> >> >
>> >> > On Tue, Mar 12, 2013 at 1:43 PM, Chris Neal <[email protected]> wrote:
>> >> >> Hi all.
>> >> >>
>> >> >> I've been working on this for quite some time, and need some advice
>> >> >> from the experts. I have a two tiered Flume architecture:
>> >> >>
>> >> >> App Tier (all on one server):
>> >> >> 124 ExecSources -> MemoryChannel -> AvroSinks
>> >> >>
>> >> >> HDFS Tier (on two servers):
>> >> >> AvroSource -> FileChannel -> HDFSSinks
>> >> >>
>> >> >> When I run the agents, the HDFS tier is keeping up fine with the App
>> >> >> Tier. queue sizes stay between 0-10000 (I have a batch size of
>> >> >> 10000). All is good.
>> >> >>
>> >> >> On the App Tier, when I view the JMX data through jconsole, I watch
>> >> >> the size of the MemoryChannel grow steadily until it reaches the
>> >> >> max, then it starts throwing exceptions about not being able to put
>> >> >> the batch on the channel as expected.
>> >> >>
>> >> >> There seem to be two basic ways to increase the throughput of the
>> >> >> App Tier:
>> >> >> 1. Increase the MemoryChannel's transactionCapacity and the
>> >> >> corresponding AvroSink's batch-size. Both are set to 10000 for me.
>> >> >> 2. Increase the number of AvroSinks to drain the MemoryChannel. I'm
>> >> >> up to 64 Sinks now which round-robin between the two Flume Agents on
>> >> >> the HDFS tier.
>> >> >>
>> >> >> Both of those values seem quite high to me (batch size and number of
>> >> >> sinks).
>> >> >>
>> >> >> Am I missing something as far as tuning?
>> >> >> Which would allow for greater increase to throughput, more Sinks or
>> >> >> larger batch size?
>> >> >>
>> >> >> I'm stumped here. I still think I can get this to work. :)
>> >> >>
>> >> >> Any suggestions are most welcome.
>> >> >> Thanks for your time.
>> >> >> Chris
