Re: Best way to increase throughput of Exec->Memory->Avro agent.

Chris Neal Tue, 12 Mar 2013 14:55:30 -0700

So, in a 4 channel setup, would I bind each of the 124 sources to all of
the 4 channels, or divide them up and put 31 sources on each individual
channel? :)
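
A note on the two options above: with Flume's default replicating channel selector, a source bound to all 4 channels copies each event into every one of them, so dividing the sources up is the option that actually spreads the load. A minimal sketch of that split in Python, assuming the sources are simply numbered 0..123 (the function name `partition_sources` is hypothetical and not part of Flume):

```python
def partition_sources(num_sources, num_channels):
    """Assign source indices 0..num_sources-1 to channels in contiguous blocks.

    Hypothetical helper for illustration only; Flume itself is configured
    through its properties file, not through this function.
    """
    base, extra = divmod(num_sources, num_channels)
    groups = []
    start = 0
    for ch in range(num_channels):
        # The first `extra` channels take one additional source when the
        # division is not exact.
        size = base + (1 if ch < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


groups = partition_sources(124, 4)
for ch, members in enumerate(groups):
    print(f"channel {ch}: {len(members)} sources "
          f"(first={members[0]}, last={members[-1]})")
```

Each group would then map to the `channels` setting of the sources in that group, one channel per group.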



On Tue, Mar 12, 2013 at 4:40 PM, Chris Neal <[email protected]> wrote:

> Beautiful.  Will try 4 channels in one Agent first.
> Thanks!
>
>
> On Tue, Mar 12, 2013 at 4:35 PM, Roshan Naik <[email protected]> wrote:
>
>> Even 16 on a single channel might be on the higher side IMHO.
>>
>> Try instead splitting into four channels with 4 sinks each, or even
>> four agents with one channel and 4 sinks each. It will reduce
>> contention. Be careful to ensure the capacity of each channel is not
>> too high, since you now have many channels.
>> -roshan
>>
>> On Tue, Mar 12, 2013 at 2:24 PM, Chris Neal <[email protected]> wrote:
>> > Thanks for the reply.  You're definitely on to something with the
>> > ever-increasing number of sinks.  :)
>> >
>> > I scaled it back to 16 AvroSinks, and used a
>> > MemoryChannel.transactionCapacity of 1000, and AvroSink.batch-size of
>> > 1000.  My ExecSource.batchSize is 100 (I chose this smaller number
>> > because there are so many of them (124), I didn't want 10s of thousands
>> > of events getting dropped on the MemoryChannel at once, rather just
>> > 1000s).  With those settings, things are keeping the MemoryChannel
>> > drained.  Finally getting somewhere! :)
>> >
>> > Much appreciate the prompt response.  If anything else comes to mind,
>> > please do let me know.
>> >
>> > Thanks again.
>> > Chris
>> >
>> >
>> >
>> > On Tue, Mar 12, 2013 at 4:12 PM, Roshan Naik <[email protected]> wrote:
>> >>
>> >> I meant 640,000, not 64,000.
>> >>
>> >> On Tue, Mar 12, 2013 at 2:10 PM, Roshan Naik <[email protected]> wrote:
>> >> > Beyond a certain number of sinks, adding more won't help. My
>> >> > suspicion is you may have gone way overboard.
>> >> >
>> >> > If your sink-side batch size is that large and you have 64 sinks in
>> >> > the round-robin, it will take a lot of events (64,000) to be pumped
>> >> > in by the source before the first event can start trickling out
>> >> > of any sink.  Also, memory consumption will be quite high: each sink
>> >> > will open a transaction and hold on to 10000 events. This is the cause
>> >> > of the memory channel filling up. Until the sink-side transaction is
>> >> > committed (i.e. 10k events are pulled), the memory reservation on the
>> >> > channel is not relinquished. So your memory channel size will have to
>> >> > be really high to support so many sinks, each with such a big batch
>> >> > size.
>> >> >
>> >> > My gut feel is that your source-side batch size is not much of an
>> >> > issue and can be smaller. Increasing the number of sinks will only
>> >> > help if the sink is indeed the bottleneck.
>> >> >
>> >> > On Tue, Mar 12, 2013 at 1:43 PM, Chris Neal <[email protected]> wrote:
>> >> >> Hi all.
>> >> >>
>> >> >> I've been working on this for quite some time, and need some advice
>> >> >> from the experts.  I have a two-tiered Flume architecture:
>> >> >>
>> >> >> App Tier (all on one server):
>> >> >>  124 ExecSources -> MemoryChannel -> AvroSinks
>> >> >>
>> >> >> HDFS Tier (on two servers):
>> >> >>   AvroSource -> FileChannel -> HDFSSinks
>> >> >>
>> >> >> When I run the agents, the HDFS tier is keeping up fine with the App
>> >> >> Tier.  Queue sizes stay between 0-10000 (I have a batch size of
>> >> >> 10000).  All is good.
>> >> >>
>> >> >> On the App Tier, when I view the JMX data through jconsole, I watch
>> >> >> the size of the MemoryChannel grow steadily until it reaches the max,
>> >> >> then it starts throwing exceptions about not being able to put the
>> >> >> batch on the channel, as expected.
>> >> >>
>> >> >> There seem to be two basic ways to increase the throughput of the
>> >> >> App Tier:
>> >> >> 1.  Increase the MemoryChannel's transactionCapacity and the
>> >> >> corresponding AvroSink's batch-size.  Both are set to 10000 for me.
>> >> >> 2.  Increase the number of AvroSinks to drain the MemoryChannel.  I'm
>> >> >> up to 64 Sinks now, which round-robin between the two Flume Agents on
>> >> >> the HDFS tier.
>> >> >>
>> >> >> Both of those values seem quite high to me (batch size and number of
>> >> >> sinks).
>> >> >>
>> >> >> Am I missing something as far as tuning?
>> >> >> Which would allow for a greater increase in throughput: more Sinks or
>> >> >> a larger batch size?
>> >> >>
>> >> >> I'm stumped here.  I still think I can get this to work. :)
>> >> >>
>> >> >> Any suggestions are most welcome.
>> >> >> Thanks for your time.
>> >> >> Chris
>> >> >>
>> >
>> >
>>
>
>
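
The arithmetic behind the 640,000 correction quoted above can be checked with a short sketch. It assumes, as described in the thread, that every sink holds one open transaction of batch-size events at a time; the function name `events_held_by_sinks` is hypothetical and not a Flume API.

```python
def events_held_by_sinks(num_sinks, sink_batch_size):
    """Events reserved in the channel if every sink has one open
    transaction of `sink_batch_size` events (simplifying assumption)."""
    return num_sinks * sink_batch_size


# Original setup from the thread: 64 sinks, batch size 10000.
print(events_held_by_sinks(64, 10000))  # 640000

# Scaled-back setup from the thread: 16 sinks, batch size 1000.
print(events_held_by_sinks(16, 1000))   # 16000
```

Under this assumption, the channel capacity has to exceed the first figure before the original configuration can commit a full round of sink transactions, which matches the behaviour reported in the thread.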

Re: Best way to increase throughput of Exec->Memory->Avro agent.
