The memory channel doesn't know anything about networks. The sources and sinks, like the Avro source and Avro sink, do: they operate over TCP/IP, and when there is an error sending data downstream the sink rolls the transaction back on the channel so that no data is lost. I believe the docs cover this here: http://flume.apache.org/FlumeUserGuide.html
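
For the curious, that rollback is just the standard channel transaction pattern every Flume sink follows. A rough sketch of a sink's process() method is below; this is not the actual AvroSink code, and the class name SketchSink and the helper sendDownstream() are only placeholders for the real Avro RPC call:

    import org.apache.flume.Channel;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.Transaction;
    import org.apache.flume.sink.AbstractSink;

    // Sketch of the channel transaction pattern a network sink follows.
    // Not the real AvroSink source; sendDownstream() stands in for the RPC call.
    public class SketchSink extends AbstractSink {
      @Override
      public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction txn = channel.getTransaction();
        txn.begin();
        try {
          Event event = channel.take();   // take an event inside the transaction
          if (event == null) {
            txn.commit();                 // nothing queued right now
            return Status.BACKOFF;
          }
          sendDownstream(event);          // placeholder for the network send; may throw
          txn.commit();                   // only now is the event removed from the channel for good
          return Status.READY;
        } catch (Throwable t) {
          txn.rollback();                 // send failed: the event stays in the channel
          throw new EventDeliveryException("Failed to deliver event", t);
        } finally {
          txn.close();
        }
      }

      // Placeholder for the real Avro RPC call to the downstream agent.
      private void sendDownstream(Event event) throws EventDeliveryException {
        // open connection, append event, wait for ack
      }
    }

The flip side, as the thread below notes, is that the memory channel itself lives in the agent's heap: a failed send stays in the channel, but if the agent process dies, whatever the memory channel is holding at that moment is lost.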
Brock

On Wed, Nov 7, 2012 at 1:52 PM, Rahul Ravindran <[email protected]> wrote:
> Hi,
>
> Thanks for the response.
>
> Does the memory channel provide transactional guarantees? In the event of a network packet loss, does it retry sending the packet? If we ensure that we do not exceed the capacity of the memory channel, does it keep retrying to send an event to the remote source on failure?
>
> Thanks,
> ~Rahul.
>
> ------------------------------
> *From:* Brock Noland <[email protected]>
> *To:* [email protected]; Rahul Ravindran <[email protected]>
> *Sent:* Wednesday, November 7, 2012 11:48 AM
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> Hi,
>
> Yes, if you use the memory channel, you can lose data. To not lose data, the file channel needs to write to disk...
>
> Brock
>
> On Wed, Nov 7, 2012 at 1:29 PM, Rahul Ravindran <[email protected]> wrote:
> Ping on the below questions about the new Spool Directory source:
>
> If we choose to use the memory channel with this source, to an Avro sink on a remote box, do we risk data loss in the event of a network partition/slow network, or if the flume-agent on the source box dies?
> If we choose to use the file channel with this source, it will result in double writes to disk, correct? (One for the legacy log files which will be ingested by the Spool Directory source, and the other for the WAL.)
>
> ------------------------------
> *From:* Rahul Ravindran <[email protected]>
> *To:* "[email protected]" <[email protected]>
> *Sent:* Tuesday, November 6, 2012 3:40 PM
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> This is awesome.
> This may be perfect for our use case :)
>
> When is the 1.3 release expected?
>
> A couple of questions on the choice of channel for the new source:
>
> If we choose to use the memory channel with this source, to an Avro sink on a remote box, do we risk data loss in the event of a network partition/slow network, or if the flume-agent on the source box dies?
> If we choose to use the file channel with this source, it will result in double writes to disk, correct? (One for the legacy log files which will be ingested by the Spool Directory source, and the other for the WAL.)
>
> Thanks,
> ~Rahul.
>
> ------------------------------
> *From:* Brock Noland <[email protected]>
> *To:* [email protected]; Rahul Ravindran <[email protected]>
> *Sent:* Tuesday, November 6, 2012 3:05 PM
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> This use case sounds like a perfect fit for the Spool Directory source, which will be in the upcoming 1.3 release.
>
> Brock
>
> On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran <[email protected]> wrote:
> > We will update the checkpoint each time (we may tune this to be periodic), but the contents of the memory channel will be in the legacy logs which are currently being generated.
> >
> > Additionally, the sink for the memory channel will be an Avro source on another machine.
> >
> > Does that clear things up?
> >
> > ________________________________
> > From: Brock Noland <[email protected]>
> > To: [email protected]; Rahul Ravindran <[email protected]>
> > Sent: Tuesday, November 6, 2012 1:44 PM
> > Subject: Re: Guarantees of the memory channel for delivering to sink
> >
> > But in your architecture you are going to write the contents of the memory channel out? Or did I miss something?
> > > > "The checkpoint will be updated each time we perform a successive > > insertion into the memory channel." > > > > On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <[email protected]> > wrote: > >> We have a legacy system which writes events to a file (existing log > file). > >> This will continue. If I used a filechannel, I will be double the number > >> of > >> IO operations(writes to the legacy log file, and writes to WAL). > >> > >> ________________________________ > >> From: Brock Noland <[email protected]> > >> To: [email protected]; Rahul Ravindran <[email protected]> > >> Sent: Tuesday, November 6, 2012 1:38 PM > >> Subject: Re: Guarantees of the memory channel for delivering to sink > >> > >> Your still going to be writing out all events, no? So how would file > >> channel do more IO than that? > >> > >> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <[email protected]> > wrote: > >>> Hi, > >>> I am very new to Flume and we are hoping to use it for our log > >>> aggregation into HDFS. I have a few questions below: > >>> > >>> FileChannel will double our disk IO, which will affect IO performance > on > >>> certain performance sensitive machines. Hence, I was hoping to write a > >>> custom Flume source which will use a memory channel, and which will > >>> perform > >>> checkpointing. The checkpoint will be updated each time we perform a > >>> successive insertion into the memory channel. (I realize that this > >>> results > >>> in a risk of data, the maximum size of which is the capacity of the > >>> memory > >>> channel). > >>> > >>> As long as there is capacity in the memory channel buffers, does the > >>> memory channel guarantee delivery to a sink (does it wait for > >>> acknowledgements, and retry failed packets)? This would mean that we > need > >>> to > >>> ensure that we do not exceed the channel capacity. > >>> > >>> I am writing a custom source which will use the memory channel, and > which > >>> will catch a ChannelException to identify any channel capacity > issues(so, > >>> buffer used in the memory channel is full because of lagging > >>> sinks/network > >>> issues etc). Is that a reasonable assumption to make? > >>> > >>> Thanks, > >>> ~Rahul. > >> > >> > >> > >> -- > >> Apache MRUnit - Unit testing MapReduce - > >> http://incubator.apache.org/mrunit/ > >> > >> > > > > > > > > -- > > Apache MRUnit - Unit testing MapReduce - > http://incubator.apache.org/mrunit/ > > > > > > > > -- > Apache MRUnit - Unit testing MapReduce - > http://incubator.apache.org/mrunit/ > > > > > > > > -- > Apache MRUnit - Unit testing MapReduce - > http://incubator.apache.org/mrunit/ > > > -- Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
