The memory channel doesn't know anything about networks. The sources and sinks, like the Avro source and Avro sink, do: they operate over TCP/IP, and when there is an error sending data downstream the sink rolls the transaction back on the channel so that no data is lost. I believe the docs cover this here: http://flume.apache.org/FlumeUserGuide.html
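
For the curious, that rollback is just the standard channel transaction pattern every Flume sink follows. A rough sketch of a sink's process() method is below; this is not the actual AvroSink code, and the class name SketchSink and the helper sendDownstream() are only placeholders for the real Avro RPC call:

    import org.apache.flume.Channel;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.Transaction;
    import org.apache.flume.sink.AbstractSink;

    // Sketch of the channel transaction pattern a network sink follows.
    // Not the real AvroSink source; sendDownstream() stands in for the RPC call.
    public class SketchSink extends AbstractSink {
      @Override
      public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction txn = channel.getTransaction();
        txn.begin();
        try {
          Event event = channel.take();   // take an event inside the transaction
          if (event == null) {
            txn.commit();                 // nothing queued right now
            return Status.BACKOFF;
          }
          sendDownstream(event);          // placeholder for the network send; may throw
          txn.commit();                   // only now is the event removed from the channel for good
          return Status.READY;
        } catch (Throwable t) {
          txn.rollback();                 // send failed: the event stays in the channel
          throw new EventDeliveryException("Failed to deliver event", t);
        } finally {
          txn.close();
        }
      }

      // Placeholder for the real Avro RPC call to the downstream agent.
      private void sendDownstream(Event event) throws EventDeliveryException {
        // open connection, append event, wait for ack
      }
    }

The flip side, as the thread below notes, is that the memory channel itself lives in the agent's heap: a failed send stays in the channel, but if the agent process dies, whatever the memory channel is holding at that moment is lost.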
Brock

On Wed, Nov 7, 2012 at 1:52 PM, Rahul Ravindran <[email protected]> wrote:
> Hi,
>
> Thanks for the response.
>
> Does the memory channel provide transactional guarantees? In the event of a network packet loss, does it retry sending the packet? If we ensure that we do not exceed the capacity of the memory channel, does it keep retrying to send an event to the remote source on failure?
>
> Thanks,
> ~Rahul.
>
> ------------------------------
> *From:* Brock Noland <[email protected]>
> *To:* [email protected]; Rahul Ravindran <[email protected]>
> *Sent:* Wednesday, November 7, 2012 11:48 AM
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> Hi,
>
> Yes, if you use the memory channel, you can lose data. To not lose data, the file channel needs to write to disk...
>
> Brock
>
> On Wed, Nov 7, 2012 at 1:29 PM, Rahul Ravindran <[email protected]> wrote:
> Ping on the below questions about the new Spool Directory source:
>
> If we choose to use the memory channel with this source, to an Avro sink on a remote box, do we risk data loss in the event of a network partition/slow network, or if the flume-agent on the source box dies?
> If we choose to use the file channel with this source, it will result in double writes to disk, correct? (One for the legacy log files which will be ingested by the Spool Directory source, and the other for the WAL.)
>
> ------------------------------
> *From:* Rahul Ravindran <[email protected]>
> *To:* "[email protected]" <[email protected]>
> *Sent:* Tuesday, November 6, 2012 3:40 PM
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> This is awesome.
> This may be perfect for our use case :)
>
> When is the 1.3 release expected?
>
> A couple of questions on the choice of channel for the new source:
>
> If we choose to use the memory channel with this source, to an Avro sink on a remote box, do we risk data loss in the event of a network partition/slow network, or if the flume-agent on the source box dies?
> If we choose to use the file channel with this source, it will result in double writes to disk, correct? (One for the legacy log files which will be ingested by the Spool Directory source, and the other for the WAL.)
>
> Thanks,
> ~Rahul.
>
> ------------------------------
> *From:* Brock Noland <[email protected]>
> *To:* [email protected]; Rahul Ravindran <[email protected]>
> *Sent:* Tuesday, November 6, 2012 3:05 PM
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> This use case sounds like a perfect fit for the Spool Directory source, which will be in the upcoming 1.3 release.
>
> Brock
>
> On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran <[email protected]> wrote:
> > We will update the checkpoint each time (we may tune this to be periodic), but the contents of the memory channel will be in the legacy logs which are currently being generated.
> >
> > Additionally, the sink for the memory channel will be an Avro source on another machine.
> >
> > Does that clear things up?
> >
> > ________________________________
> > From: Brock Noland <[email protected]>
> > To: [email protected]; Rahul Ravindran <[email protected]>
> > Sent: Tuesday, November 6, 2012 1:44 PM
> > Subject: Re: Guarantees of the memory channel for delivering to sink
> >
> > But in your architecture you are going to write the contents of the memory channel out? Or did I miss something?
> > > > "The checkpoint will be updated each time we perform a successive > > insertion into the memory channel." > > > > On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <[email protected]> > wrote: > >> We have a legacy system which writes events to a file (existing log > file). > >> This will continue. If I used a filechannel, I will be double the number > >> of > >> IO operations(writes to the legacy log file, and writes to WAL). > >> > >> ________________________________ > >> From: Brock Noland <[email protected]> > >> To: [email protected]; Rahul Ravindran <[email protected]> > >> Sent: Tuesday, November 6, 2012 1:38 PM > >> Subject: Re: Guarantees of the memory channel for delivering to sink > >> > >> Your still going to be writing out all events, no? So how would file > >> channel do more IO than that? > >> > >> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <[email protected]> > wrote: > >>> Hi, > >>> I am very new to Flume and we are hoping to use it for our log > >>> aggregation into HDFS. I have a few questions below: > >>> > >>> FileChannel will double our disk IO, which will affect IO performance > on > >>> certain performance sensitive machines. Hence, I was hoping to write a > >>> custom Flume source which will use a memory channel, and which will > >>> perform > >>> checkpointing. The checkpoint will be updated each time we perform a > >>> successive insertion into the memory channel. (I realize that this > >>> results > >>> in a risk of data, the maximum size of which is the capacity of the > >>> memory > >>> channel). > >>> > >>> As long as there is capacity in the memory channel buffers, does the > >>> memory channel guarantee delivery to a sink (does it wait for > >>> acknowledgements, and retry failed packets)? This would mean that we > need > >>> to > >>> ensure that we do not exceed the channel capacity. > >>> > >>> I am writing a custom source which will use the memory channel, and > which > >>> will catch a ChannelException to identify any channel capacity > issues(so, > >>> buffer used in the memory channel is full because of lagging > >>> sinks/network > >>> issues etc). Is that a reasonable assumption to make? > >>> > >>> Thanks, > >>> ~Rahul. > >> > >> > >> > >> -- > >> Apache MRUnit - Unit testing MapReduce - > >> http://incubator.apache.org/mrunit/ > >> > >> > > > > > > > > -- > > Apache MRUnit - Unit testing MapReduce - > http://incubator.apache.org/mrunit/ > > > > > > > > -- > Apache MRUnit - Unit testing MapReduce - > http://incubator.apache.org/mrunit/ > > > > > > > > -- > Apache MRUnit - Unit testing MapReduce - > http://incubator.apache.org/mrunit/ > > > -- Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
