This use case sounds like a perfect fit for the Spooling Directory source, which will be in the upcoming 1.3 release.
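
Roughly, the agent config would look something like this (the agent/component names, the spool directory path, and the downstream host/port are just placeholders -- double-check the property names against the 1.3 docs once they are out):

  agent.sources = spool
  agent.channels = mem
  agent.sinks = avro1

  # read completed log files dropped into this directory
  agent.sources.spool.type = spooldir
  agent.sources.spool.spoolDir = /var/log/legacy-spool
  agent.sources.spool.channels = mem

  agent.channels.mem.type = memory
  agent.channels.mem.capacity = 10000

  # forward to the Avro source on the downstream machine
  agent.sinks.avro1.type = avro
  agent.sinks.avro1.hostname = collector.example.com
  agent.sinks.avro1.port = 4141
  agent.sinks.avro1.channel = mem

The spooling directory source reads completed, immutable files that your legacy system (or a rotation script) drops into spoolDir, so it avoids tailing the live log file at all.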
Brock

On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran <[email protected]> wrote:
> We will update the checkpoint each time (we may tune this to be periodic),
> but the contents of the memory channel will be in the legacy logs which are
> currently being generated.
>
> Additionally, the sink for the memory channel will be an Avro source on
> another machine.
>
> Does that clear things up?
>
> ________________________________
> From: Brock Noland <[email protected]>
> To: [email protected]; Rahul Ravindran <[email protected]>
> Sent: Tuesday, November 6, 2012 1:44 PM
>
> Subject: Re: Guarantees of the memory channel for delivering to sink
>
> But in your architecture you are going to write the contents of the
> memory channel out? Or did I miss something?
>
> "The checkpoint will be updated each time we perform a successive
> insertion into the memory channel."
>
> On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <[email protected]> wrote:
>> We have a legacy system which writes events to a file (existing log file).
>> This will continue. If I used a file channel, I would double the number of
>> IO operations (writes to the legacy log file, and writes to the WAL).
>>
>> ________________________________
>> From: Brock Noland <[email protected]>
>> To: [email protected]; Rahul Ravindran <[email protected]>
>> Sent: Tuesday, November 6, 2012 1:38 PM
>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>
>> You're still going to be writing out all events, no? So how would the file
>> channel do more IO than that?
>>
>> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <[email protected]> wrote:
>>> Hi,
>>> I am very new to Flume and we are hoping to use it for our log
>>> aggregation into HDFS. I have a few questions below:
>>>
>>> FileChannel will double our disk IO, which will affect IO performance on
>>> certain performance-sensitive machines. Hence, I was hoping to write a
>>> custom Flume source which will use a memory channel and which will perform
>>> checkpointing. The checkpoint will be updated each time we perform a
>>> successive insertion into the memory channel. (I realize that this results
>>> in a risk of data loss, the maximum size of which is the capacity of the
>>> memory channel.)
>>>
>>> As long as there is capacity in the memory channel buffers, does the
>>> memory channel guarantee delivery to a sink (does it wait for
>>> acknowledgements, and retry failed packets)? This would mean that we need
>>> to ensure that we do not exceed the channel capacity.
>>>
>>> I am writing a custom source which will use the memory channel, and which
>>> will catch a ChannelException to identify any channel capacity issues (so,
>>> the buffer used in the memory channel is full because of lagging
>>> sinks/network issues, etc.). Is that a reasonable assumption to make?
>>>
>>> Thanks,
>>> ~Rahul.
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce -
>> http://incubator.apache.org/mrunit/
>>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
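
P.S. If you do end up going the custom source + memory channel route, catching ChannelException in a pollable source's process() loop is a reasonable way to detect a full channel. A bare-bones sketch, just to illustrate the shape (the class name and the readNextLine()/updateCheckpoint() helpers are placeholders for your own tailing and checkpoint logic, not a tested implementation):

import org.apache.flume.ChannelException;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

public class CheckpointingTailSource extends AbstractSource
    implements Configurable, PollableSource {

  @Override
  public void configure(Context context) {
    // read the legacy log path / checkpoint file location from the agent config
  }

  @Override
  public Status process() throws EventDeliveryException {
    byte[] line = readNextLine();      // placeholder for your tailing logic
    if (line == null) {
      return Status.BACKOFF;           // nothing new in the legacy log yet
    }
    Event event = EventBuilder.withBody(line);
    try {
      getChannelProcessor().processEvent(event);
      updateCheckpoint();              // placeholder: advance only after the put succeeds
      return Status.READY;
    } catch (ChannelException e) {
      // memory channel is full (lagging sink, network trouble, ...):
      // back off and retry the same record without moving the checkpoint
      return Status.BACKOFF;
    }
  }

  // Placeholder helpers -- substitute your own file-tailing and checkpoint code.
  private byte[] readNextLine() { return null; }
  private void updateCheckpoint() { }
}

Returning Status.BACKOFF on ChannelException makes the framework sleep briefly before calling process() again, so the source naturally throttles itself when the sink lags.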
