Got it. Thanks.
________________________________
From: Brock Noland <[email protected]>
To: [email protected]; Rahul Ravindran <[email protected]>
Sent: Wednesday, November 7, 2012 12:14 PM
Subject: Re: Guarantees of the memory channel for delivering to sink

The memory channel doesn't know about networks. The sources and sinks, like the
Avro source/Avro sink, do. They operate on TCP/IP, and when there is an error
sending data downstream they roll the transaction back so that no data is lost.
I believe the docs cover this here: http://flume.apache.org/FlumeUserGuide.html

Brock

On Wed, Nov 7, 2012 at 1:52 PM, Rahul Ravindran <[email protected]> wrote:
> Hi,
>
> Thanks for the response.
>
> Does the memory channel provide transactional guarantees? In the event of a
> network packet loss, does it retry sending the packet? If we ensure that we
> do not exceed the capacity of the memory channel, does it keep retrying to
> send an event to the remote source on failure?
>
> Thanks,
> ~Rahul.
>
> ________________________________
> From: Brock Noland <[email protected]>
> To: [email protected]; Rahul Ravindran <[email protected]>
> Sent: Wednesday, November 7, 2012 11:48 AM
> Subject: Re: Guarantees of the memory channel for delivering to sink
>
> Hi,
>
> Yes, if you use the memory channel, you can lose data. To not lose data, the
> file channel needs to write to disk...
>
> Brock
>
> On Wed, Nov 7, 2012 at 1:29 PM, Rahul Ravindran <[email protected]> wrote:
>> Ping on the questions below about the new Spool Directory source:
>>
>> If we choose to use the memory channel with this source, to an Avro sink
>> on a remote box, do we risk data loss in the event of a network
>> partition/slow network, or if the flume-agent on the source box dies?
>> If we choose to use the file channel with this source, we will end up with
>> double writes to disk, correct?
>> (one for the legacy log files, which will be ingested by the Spool
>> Directory source, and the other for the WAL)
>>
>> ________________________________
>> From: Rahul Ravindran <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Sent: Tuesday, November 6, 2012 3:40 PM
>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>
>> This is awesome.
>> This may be perfect for our use case :)
>>
>> When is the 1.3 release expected?
>>
>> A couple of questions on the choice of channel for the new source:
>>
>> If we choose to use the memory channel with this source, to an Avro sink
>> on a remote box, do we risk data loss in the event of a network
>> partition/slow network, or if the flume-agent on the source box dies?
>> If we choose to use the file channel with this source, we will end up with
>> double writes to disk, correct? (one for the legacy log files, which will
>> be ingested by the Spool Directory source, and the other for the WAL)
>>
>> Thanks,
>> ~Rahul.
>>
>> ________________________________
>> From: Brock Noland <[email protected]>
>> To: [email protected]; Rahul Ravindran <[email protected]>
>> Sent: Tuesday, November 6, 2012 3:05 PM
>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>
>> This use case sounds like a perfect fit for the Spool Directory source,
>> which will be in the upcoming 1.3 release.
>>
>> Brock
>>
>> On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran <[email protected]> wrote:
>>> We will update the checkpoint each time (we may tune this to be periodic),
>>> but the contents of the memory channel will be in the legacy logs which
>>> are currently being generated.
>>>
>>> Additionally, the sink for the memory channel will be an Avro source on
>>> another machine.
>>>
>>> Does that clear things up?
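The spooldir-to-Avro pipeline discussed in this thread can be written as a single Flume agent configuration. A minimal sketch, assuming the Spooling Directory source from the 1.3 release; the directory paths, hostname, and port are placeholders, not values from this thread:

```properties
# Illustrative agent config: spool directory -> channel -> Avro sink.
a1.sources = src1
a1.channels = ch1
a1.sinks = k1

# Spooling Directory source (1.3+): ingests completed legacy log files
a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /var/log/legacy
a1.sources.src1.channels = ch1

# File channel: durable, but adds the WAL write discussed in this thread
a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /var/flume/checkpoint
a1.channels.ch1.dataDirs = /var/flume/data

# Avro sink: sends to the remote agent's Avro source; on a failed send
# the transaction is rolled back into the channel
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector.example.com
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = ch1
```

Switching `a1.channels.ch1.type` to `memory` (with `capacity` and `transactionCapacity` set) avoids the second disk write, at the cost of losing up to `capacity` events if the agent dies.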
>>> ________________________________
>>> From: Brock Noland <[email protected]>
>>> To: [email protected]; Rahul Ravindran <[email protected]>
>>> Sent: Tuesday, November 6, 2012 1:44 PM
>>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>>
>>> But in your architecture you are going to write the contents of the
>>> memory channel out? Or did I miss something?
>>>
>>> "The checkpoint will be updated each time we perform a successive
>>> insertion into the memory channel."
>>>
>>> On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <[email protected]> wrote:
>>>> We have a legacy system which writes events to a file (the existing log
>>>> file). This will continue. If I used a file channel, I would double the
>>>> number of IO operations (writes to the legacy log file, and writes to
>>>> the WAL).
>>>>
>>>> ________________________________
>>>> From: Brock Noland <[email protected]>
>>>> To: [email protected]; Rahul Ravindran <[email protected]>
>>>> Sent: Tuesday, November 6, 2012 1:38 PM
>>>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>>>
>>>> You're still going to be writing out all events, no? So how would the
>>>> file channel do more IO than that?
>>>>
>>>> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <[email protected]> wrote:
>>>>> Hi,
>>>>> I am very new to Flume, and we are hoping to use it for our log
>>>>> aggregation into HDFS. I have a few questions below:
>>>>>
>>>>> FileChannel will double our disk IO, which will affect IO performance on
>>>>> certain performance-sensitive machines. Hence, I was hoping to write a
>>>>> custom Flume source which will use a memory channel and which will
>>>>> perform checkpointing. The checkpoint will be updated each time we
>>>>> perform a successive insertion into the memory channel. (I realize that
>>>>> this results in a risk of data loss, the maximum size of which is the
>>>>> capacity of the memory channel.)
>>>>>
>>>>> As long as there is capacity in the memory channel buffers, does the
>>>>> memory channel guarantee delivery to a sink (does it wait for
>>>>> acknowledgements, and retry failed packets)? This would mean that we
>>>>> need to ensure that we do not exceed the channel capacity.
>>>>>
>>>>> I am writing a custom source which will use the memory channel, and
>>>>> which will catch a ChannelException to identify any channel capacity
>>>>> issues (i.e., the buffer used by the memory channel is full because of
>>>>> lagging sinks, network issues, etc.). Is that a reasonable assumption
>>>>> to make?
>>>>>
>>>>> Thanks,
>>>>> ~Rahul.
>>>>
>>>> --
>>>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
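Rahul's plan of catching a ChannelException when the channel fills up can be modeled with a bounded queue. A self-contained sketch, not the real Flume API: `BoundedChannel` and `ChannelFullException` are illustrative stand-ins for `MemoryChannel` and `ChannelException`.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Sketch of a source catching a "channel full" condition, in the spirit
// of catching ChannelException from a memory channel's put(). The class
// and exception names are illustrative, not Flume's.
public class SourceSketch {
    static class ChannelFullException extends RuntimeException {}

    static class BoundedChannel {
        private final ArrayBlockingQueue<String> q;
        BoundedChannel(int capacity) { q = new ArrayBlockingQueue<>(capacity); }
        // put() throws when capacity is exhausted, mirroring ChannelException
        void put(String event) {
            if (!q.offer(event)) throw new ChannelFullException();
        }
        int size() { return q.size(); }
    }

    // Returns true if the event was accepted, false if the channel was
    // full -- the point where a real source would back off or spill.
    static boolean tryIngest(BoundedChannel ch, String event) {
        try {
            ch.put(event);
            return true;
        } catch (ChannelFullException e) {
            return false;   // lagging sink / slow network: channel is full
        }
    }

    public static void main(String[] args) {
        BoundedChannel ch = new BoundedChannel(2);
        System.out.println(tryIngest(ch, "e1"));  // true
        System.out.println(tryIngest(ch, "e2"));  // true
        System.out.println(tryIngest(ch, "e3"));  // false: capacity reached
    }
}
```

So yes: treating a failed put as "sinks are lagging or the network is slow" is exactly the signal a capacity-bounded channel gives the source.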

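Brock's point that the Avro source/sink pair rolls a transaction back on a failed send, so no data is lost, can be sketched with a plain deque standing in for the channel. This shows only the rollback idea, not Flume's actual Transaction API; all names here are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of transactional take-and-send: on a send failure
// the event goes back into the channel instead of being dropped.
public class TxSketch {
    static Deque<String> channel = new ArrayDeque<>();

    // Attempt to drain one event to a (possibly failing) downstream hop.
    // Returns true if delivered; on failure the event is restored.
    static boolean drainOne(boolean networkUp) {
        String event = channel.pollFirst();      // "take" inside a transaction
        if (event == null) return true;          // nothing to do
        if (networkUp) {
            return true;                         // "commit": event delivered
        } else {
            channel.addFirst(event);             // "rollback": event restored
            return false;
        }
    }

    public static void main(String[] args) {
        channel.add("e1");
        channel.add("e2");
        drainOne(false);                         // send fails -> rollback
        System.out.println(channel.size());      // still 2, nothing lost
        drainOne(true);                          // send succeeds -> commit
        System.out.println(channel.size());      // 1
    }
}
```

The channel itself stays network-unaware, as Brock says: retry-on-failure lives in the sink's commit/rollback loop, and the durability of whatever is sitting in the channel between retries is what the memory-vs-file choice decides.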