Got it. Thanks

________________________________
 From: Brock Noland <[email protected]>
To: [email protected]; Rahul Ravindran <[email protected]> 
Sent: Wednesday, November 7, 2012 12:14 PM
Subject: Re: Guarantees of the memory channel for delivering to sink
 

The memory channel doesn't know about networks.  The sources and sinks, like the 
Avro source/Avro sink, do. They operate over TCP/IP, and when there is an error 
sending data downstream they roll the transaction back so that no data is lost. 
I believe the docs cover this here: http://flume.apache.org/FlumeUserGuide.html
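To make the rollback concrete, here is a minimal sketch of the take/commit/rollback pattern a sink follows (plain Python, not the real Flume API; `MemoryChannel` and `send_downstream` are illustrative stand-ins for the channel and the Avro RPC call):

```python
import collections

class MemoryChannel:
    """Toy in-memory channel: take() stages events; rollback() returns them."""
    def __init__(self, capacity):
        self.queue = collections.deque()
        self.capacity = capacity
        self.staged = []

    def put(self, event):
        if len(self.queue) + len(self.staged) >= self.capacity:
            raise RuntimeError("channel is full")
        self.queue.append(event)

    def take(self):
        event = self.queue.popleft()
        self.staged.append(event)
        return event

    def commit(self):
        # Downstream accepted the events; forget the staged copies.
        self.staged.clear()

    def rollback(self):
        # Failed transaction: put staged events back at the head of the queue.
        self.queue.extendleft(reversed(self.staged))
        self.staged.clear()

def drain_once(channel, send_downstream):
    """Sink loop body: take one event, send it, commit; roll back on error."""
    event = channel.take()
    try:
        send_downstream(event)   # e.g. the Avro RPC to the remote source
        channel.commit()
        return True
    except IOError:
        channel.rollback()       # event stays in the channel; nothing is lost
        return False
```

The point is that a network failure surfaces as an exception before commit, so the event is still in the channel for the next attempt.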

Brock


On Wed, Nov 7, 2012 at 1:52 PM, Rahul Ravindran <[email protected]> wrote:

Hi,
>
>
>Thanks for the response.
>
>
>Does the memory channel provide transactional guarantees? In the event of a 
>network packet loss, does it retry sending the packet? If we ensure that we do 
>not exceed the capacity for the memory channel, does it continue retrying to 
>send an event to the remote source on failure?
>
>
>Thanks,
>~Rahul.
>
>
>
>________________________________
> From: Brock Noland <[email protected]>
>To: [email protected]; Rahul Ravindran <[email protected]> 
>Sent: Wednesday, November 7, 2012 11:48 AM
>
>Subject: Re: Guarantees of the memory channel for delivering to sink
> 
>
>
>Hi,
>
>
>Yes, if you use the memory channel, you can lose data. To avoid losing data, the 
>file channel needs to write to disk...
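For reference, the channel choice is just a configuration switch; a sketch of the two options (agent name, channel name, and directories are placeholders):

```properties
# Durable file channel: events survive an agent crash, at the cost of disk IO
agent.channels.c1.type = file
agent.channels.c1.checkpointDir = /var/flume/checkpoint
agent.channels.c1.dataDirs = /var/flume/data

# Memory channel alternative: fast, but events are lost if the agent dies
# agent.channels.c1.type = memory
# agent.channels.c1.capacity = 10000
```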
>
>
>Brock
>
>
>On Wed, Nov 7, 2012 at 1:29 PM, Rahul Ravindran <[email protected]> wrote:
>
>Ping on the below questions about new Spool Directory source:
>>
>>
>>If we choose to use the memory channel with this source, to an Avro sink on a 
>>remote box, do we risk data loss in the eventuality of a network 
>>partition/slow network or if the flume-agent on the source box dies?
>>If we choose to use the file channel with this source, we will end up with 
>>double writes to disk, correct? (one for the legacy log files which will be 
>>ingested by the Spool Directory source, and the other for the WAL)
>>
>>
>>
>>
>>
>>________________________________
>> From: Rahul Ravindran <[email protected]>
>>To: "[email protected]" <[email protected]> 
>>Sent: Tuesday, November 6, 2012 3:40 PM
>>
>>Subject: Re: Guarantees of the memory channel for delivering to sink
>> 
>>
>>
>>This is awesome. 
>>This may be perfect for our use case :)
>>
>>
>>When is the 1.3 release expected?
>>
>>
>>Couple of questions for the choice of channel for the new source:
>>
>>
>>If we choose to use the memory channel with this source, to an Avro sink on a 
>>remote box, do we risk data loss in the eventuality of a network 
>>partition/slow network or if the flume-agent on the source box dies?
>>If we choose to use the file channel with this source, we will end up with 
>>double writes to disk, correct? (one for the legacy log files which will be 
>>ingested by the Spool Directory source, and the other for the WAL)
>>
>>
>>Thanks,
>>~Rahul.
>>
>>
>>
>>________________________________
>> From: Brock Noland <[email protected]>
>>To: [email protected]; Rahul Ravindran <[email protected]> 
>>Sent: Tuesday, November 6, 2012 3:05 PM
>>Subject: Re: Guarantees of the memory channel for delivering to sink
>> 
This use case sounds like a perfect use of the Spool Directory source
>>which will be in the upcoming 1.3 release.
>>
>>Brock
>>
>>On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran <[email protected]> wrote:
>>> We will update the checkpoint each time (we may tune this to be periodic)
>>> but the contents of the memory channel will be in the legacy logs which are
>>> currently being generated.
>>>
>>> Additionally, the sink for the memory channel will be an Avro source in
>>> another machine.
>>>
>>> Does that clear things up?
>>>
>>> ________________________________
>>> From: Brock Noland <[email protected]>
>>> To: [email protected]; Rahul Ravindran <[email protected]>
>>> Sent: Tuesday, November 6, 2012 1:44 PM
>>>
>>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>>
>>> But in your architecture you are going to write the contents of the
>>> memory channel out? Or did I miss something?
>>>
>>> "The checkpoint will be updated each time we perform a successive
>>> insertion into the memory channel."
>>>
>>> On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <[email protected]> wrote:
>>>> We have a legacy system which writes events to a file (existing log file).
>>>> This will continue. If I used a file channel, I would double the number of
>>>> IO operations (writes to the legacy log file, and writes to the WAL).
>>>>
>>>> ________________________________
>>>> From: Brock Noland <[email protected]>
>>>> To: [email protected]; Rahul Ravindran <[email protected]>
>>>> Sent: Tuesday, November 6, 2012 1:38 PM
>>>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>>>
>>>> You're still going to be writing out all events, no? So how would file
>>>> channel do more IO than that?
>>>>
>>>> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <[email protected]> wrote:
>>>>> Hi,
>>>>>    I am very new to Flume and we are hoping to use it for our log
>>>>> aggregation into HDFS. I have a few questions below:
>>>>>
>>>>> FileChannel will double our disk IO, which will affect IO performance on
>>>>> certain performance-sensitive machines. Hence, I was hoping to write a
>>>>> custom Flume source which will use a memory channel, and which will
>>>>> perform checkpointing. The checkpoint will be updated each time we
>>>>> perform a successive insertion into the memory channel. (I realize that
>>>>> this results in a risk of data loss, the maximum size of which is the
>>>>> capacity of the memory channel.)
>>>>>
>>>>>    As long as there is capacity in the memory channel buffers, does the
>>>>> memory channel guarantee delivery to a sink (does it wait for
>>>>> acknowledgements, and retry failed packets)? This would mean that we need
>>>>> to
>>>>> ensure that we do not exceed the channel capacity.
>>>>>
>>>>> I am writing a custom source which will use the memory channel, and
>>>>> which will catch a ChannelException to identify any channel capacity
>>>>> issues (so, the buffer used in the memory channel is full because of
>>>>> lagging sinks, network issues, etc.). Is that a reasonable assumption
>>>>> to make?
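For what it's worth, that source-side loop can be sketched like this (plain Python, not the Flume API; `ChannelFullError` stands in for Flume's ChannelException, and `read_legacy_event` for the legacy log tailer):

```python
import time

class ChannelFullError(Exception):
    """Stand-in for Flume's ChannelException on a full channel."""

def run_source(channel, read_legacy_event, backoff_secs=1.0):
    """Source loop: put events into the channel; on a full channel, back off
    and retry the same event, so the checkpoint only advances after a
    successful put."""
    for event in read_legacy_event():
        while True:
            try:
                channel.put(event)
                # Safe to advance the checkpoint past this event now.
                break
            except ChannelFullError:
                # Lagging sink or slow network: wait and retry the same event.
                time.sleep(backoff_secs)
```

The key design choice is that the checkpoint advances only after a successful put, so a full channel stalls ingestion rather than dropping events.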
>>>>>
>>>>> Thanks,
>>>>> ~Rahul.
>>>>
>>>>
>>>>
>>>> --
>>>> Apache MRUnit - Unit testing MapReduce -
>>>> http://incubator.apache.org/mrunit/
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>>>
>>>
>>
>>
>>
>>-- 
>>Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>>
>>
>>
>>
>>
>
>
>
>-- 
>Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>
>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
