There would be less contention if you could reduce the sharing... so maybe divide them into 31 per channel. 31 still looks like a huge number. Best if you can consolidate those 31 down to just 1 or 2?
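For illustration only, a four-channel split along those lines might look something like this in a Flume agent properties file. The agent and component names, the tail command, and the Avro sink host/port are all made up for the sketch, not taken from this thread (the `app.sources = ...` / `app.sinks = ...` name lists are omitted for brevity):

```properties
# Hypothetical agent "app": four memory channels, each fed by ~31 of
# the 124 exec sources and drained by its own set of Avro sinks.
app.channels = c1 c2 c3 c4
app.channels.c1.type = memory
app.channels.c1.capacity = 50000
app.channels.c1.transactionCapacity = 1000

# Each source binds to exactly one channel (31 per channel).
app.sources.src1.type = exec
app.sources.src1.command = tail -F /var/log/app1.log
app.sources.src1.channels = c1

# One of the four Avro sinks draining c1 toward the HDFS tier.
app.sinks.k1.type = avro
app.sinks.k1.channel = c1
app.sinks.k1.hostname = hdfs-agent-1.example.com
app.sinks.k1.port = 4141
app.sinks.k1.batch-size = 1000
```

With one channel per group of sources, each source/sink pair contends only with the ~35 other threads on its own channel rather than all 188.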
Keep in mind there is one thread per sink and one per source (unless you are spawning more inside your source / sink). A rule of thumb (actually more like guidance) is 2 to 4 threads per core. So keep an eye out for not overloading your box with too many threads.

On Tue, Mar 12, 2013 at 2:55 PM, Chris Neal <[email protected]> wrote:
> So, in a 4 channel setup, would I bind each of the 124 sources to all of the
> 4 channels, or divide them up and put 31 sources on each individual channel?
> :)
>
>
> On Tue, Mar 12, 2013 at 4:40 PM, Chris Neal <[email protected]> wrote:
>>
>> Beautiful. Will try 4 channels in one Agent first.
>> Thanks!
>>
>>
>> On Tue, Mar 12, 2013 at 4:35 PM, Roshan Naik <[email protected]>
>> wrote:
>>>
>>> Even 16 on a single channel might be on the higher side IMHO.
>>>
>>> Try instead splitting into four channels with 4 sinks each... or even
>>> four agents with one channel and 4 sinks each... it will reduce
>>> contention. Be careful to ensure the capacity of each channel is not
>>> too high, since you now have many channels.
>>> -roshan
>>>
>>> On Tue, Mar 12, 2013 at 2:24 PM, Chris Neal <[email protected]> wrote:
>>> > Thanks for the reply. You're definitely on to something with the
>>> > ever-increasing number of sinks. :)
>>> >
>>> > I scaled it back to 16 AvroSinks, and used a
>>> > MemoryChannel.transactionCapacity of 1000 and an AvroSink batch-size
>>> > of 1000. My ExecSource.batchSize is 100 (I chose this smaller number
>>> > because there are so many of them (124); I didn't want tens of
>>> > thousands of events getting dropped on the MemoryChannel at once,
>>> > rather just thousands). With those settings, things are keeping the
>>> > MemoryChannel drained. Finally getting somewhere! :)
>>> >
>>> > Much appreciate the prompt response. If anything else comes to mind,
>>> > please do let me know.
>>> >
>>> > Thanks again.
>>> > Chris
>>> >
>>> >
>>> >
>>> > On Tue, Mar 12, 2013 at 4:12 PM, Roshan Naik <[email protected]>
>>> > wrote:
>>> >>
>>> >> I meant 640,000, not 64,000.
>>> >>
>>> >> On Tue, Mar 12, 2013 at 2:10 PM, Roshan Naik <[email protected]>
>>> >> wrote:
>>> >> > Beyond a certain # of sinks it won't help to add more. My
>>> >> > suspicion is you may have gone way overboard.
>>> >> >
>>> >> > If your sink-side batch size is that large and you have 64 sinks in
>>> >> > the round-robin, it will take a lot of events (64,000) to be pumped
>>> >> > in by the source before the first event can start trickling out
>>> >> > of any sink. Also memory consumption will be quite high: each sink
>>> >> > will open a transaction and hold on to 10000 events. This is the
>>> >> > cause of the Memory channel filling up. Until the sink-side
>>> >> > transaction is committed (i.e. 10k events are pulled), the memory
>>> >> > reservation on the channel is not relinquished. So your memory
>>> >> > channel size will have to be really high to support so many sinks,
>>> >> > each with such a big batch size.
>>> >> >
>>> >> > My gut feel is that your source-side batch size is not much of an
>>> >> > issue and can be smaller. Increasing the number of sinks will only
>>> >> > help if the sink is indeed the bottleneck.
>>> >> >
>>> >> > On Tue, Mar 12, 2013 at 1:43 PM, Chris Neal <[email protected]>
>>> >> > wrote:
>>> >> >> Hi all.
>>> >> >>
>>> >> >> I've been working on this for quite some time, and need some
>>> >> >> advice from the experts. I have a two-tiered Flume architecture:
>>> >> >>
>>> >> >> App Tier (all on one server):
>>> >> >> 124 ExecSources -> MemoryChannel -> AvroSinks
>>> >> >>
>>> >> >> HDFS Tier (on two servers):
>>> >> >> AvroSource -> FileChannel -> HDFSSinks
>>> >> >>
>>> >> >> When I run the agents, the HDFS tier is keeping up fine with the
>>> >> >> App Tier.
>>> >> >> Queue sizes stay between 0-10000 (I have a batch size of 10000).
>>> >> >> All is good.
>>> >> >>
>>> >> >> On the App Tier, when I view the JMX data through jconsole, I
>>> >> >> watch the size of the MemoryChannel grow steadily until it reaches
>>> >> >> the max; then it starts throwing exceptions about not being able
>>> >> >> to put the batch on the channel, as expected.
>>> >> >>
>>> >> >> There seem to be two basic ways to increase the throughput of the
>>> >> >> App Tier:
>>> >> >> 1. Increase the MemoryChannel's transactionCapacity and the
>>> >> >> corresponding AvroSink's batch-size. Both are set to 10000 for me.
>>> >> >> 2. Increase the number of AvroSinks to drain the MemoryChannel.
>>> >> >> I'm up to 64 Sinks now, which round-robin between the two Flume
>>> >> >> Agents on the HDFS tier.
>>> >> >>
>>> >> >> Both of those values seem quite high to me (batch size and number
>>> >> >> of sinks).
>>> >> >>
>>> >> >> Am I missing something as far as tuning?
>>> >> >> Which would allow for a greater increase in throughput: more Sinks
>>> >> >> or a larger batch size?
>>> >> >>
>>> >> >> I'm stumped here. I still think I can get this to work. :)
>>> >> >>
>>> >> >> Any suggestions are most welcome.
>>> >> >> Thanks for your time.
>>> >> >> Chris
>>> >> >>
>>> >
>>> >
>>
>>
>
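The memory-reservation reasoning in the thread reduces to simple arithmetic: every sink can hold an uncommitted transaction of up to batch-size events, so the channel must be able to reserve at least sinks × batch-size events at once. A quick back-of-the-envelope check using the numbers from the thread (the helper function name is just for illustration):

```python
def min_channel_capacity(num_sinks: int, sink_batch_size: int) -> int:
    """Lower bound on channel capacity: each sink holds an uncommitted
    transaction of up to sink_batch_size events, and the channel's memory
    reservation is not released until that transaction commits."""
    return num_sinks * sink_batch_size

# Original setup: 64 sinks, 10,000-event batches
print(min_channel_capacity(64, 10_000))  # 640000 events reserved at once

# Scaled-back setup: 16 sinks, 1,000-event batches
print(min_channel_capacity(16, 1_000))   # 16000 events
```

This is why scaling back to 16 sinks with 1000-event batches (a 40x smaller worst-case reservation) let the MemoryChannel stay drained.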
