You're still going to be writing out all events, no? So how would the file channel do more IO than that?
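
For reference, here is a minimal sketch of the source-side pattern described in the quoted mail below: advance the checkpoint only after a successful put, and treat a "channel full" exception as a signal to back off and retry. It does not use the Flume API. `BoundedChannel`, `ChannelFullException` and `CheckpointingSource` are hypothetical stand-ins (the exception plays the role of the ChannelException mentioned in the mail), so treat it as an illustration of the idea rather than working Flume code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class CheckpointingSourceSketch {

    /** Hypothetical stand-in for the ChannelException named in the mail. */
    static class ChannelFullException extends RuntimeException {
        ChannelFullException(String message) {
            super(message);
        }
    }

    /** Hypothetical bounded in-memory channel; not Flume's memory channel. */
    static class BoundedChannel {
        private final Queue<String> buffer = new ArrayDeque<>();
        private final int capacity;

        BoundedChannel(int capacity) {
            this.capacity = capacity;
        }

        /** Adds an event, or throws if the buffer is already at capacity. */
        void put(String event) {
            if (buffer.size() >= capacity) {
                throw new ChannelFullException("channel full, capacity=" + capacity);
            }
            buffer.add(event);
        }

        /** Removes and returns the oldest event, or null if empty (a sink would call this). */
        String take() {
            return buffer.poll();
        }

        int size() {
            return buffer.size();
        }
    }

    /** Hypothetical source that moves its checkpoint only after a successful put. */
    static class CheckpointingSource {
        private final BoundedChannel channel;
        private long checkpoint = 0; // number of events handed to the channel so far

        CheckpointingSource(BoundedChannel channel) {
            this.channel = channel;
        }

        /** Returns true if the event was accepted; false means "retry the same event later". */
        boolean offer(String event) {
            try {
                channel.put(event);
                checkpoint++; // only advance once the channel has accepted the event
                return true;
            } catch (ChannelFullException e) {
                return false; // sinks are lagging; the checkpoint stays where it was
            }
        }

        long checkpoint() {
            return checkpoint;
        }
    }

    public static void main(String[] args) {
        BoundedChannel channel = new BoundedChannel(2);
        CheckpointingSource source = new CheckpointingSource(channel);

        for (String line : new String[] {"a", "b", "c"}) {
            boolean accepted = source.offer(line);
            System.out.println(line + " accepted=" + accepted
                    + " checkpoint=" + source.checkpoint());
        }

        channel.take(); // a sink drains one event, freeing a slot
        System.out.println("retry c accepted=" + source.offer("c")
                + " checkpoint=" + source.checkpoint());
    }
}
```

Note that the checkpoint here records what was handed to the channel, not what a sink has delivered, so anything still sitting in the buffer at crash time is lost. That is the exposure the quoted mail already acknowledges, bounded by the channel capacity.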

On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <[email protected]> wrote:
> Hi,
> I am very new to Flume and we are hoping to use it for our log
> aggregation into HDFS. I have a few questions below:
>
> FileChannel will double our disk IO, which will affect IO performance on
> certain performance sensitive machines. Hence, I was hoping to write a
> custom Flume source which will use a memory channel, and which will perform
> checkpointing. The checkpoint will be updated each time we perform a
> successive insertion into the memory channel. (I realize that this results
> in a risk of data, the maximum size of which is the capacity of the memory
> channel).
>
> As long as there is capacity in the memory channel buffers, does the
> memory channel guarantee delivery to a sink (does it wait for
> acknowledgements, and retry failed packets)? This would mean that we need to
> ensure that we do not exceed the channel capacity.
>
> I am writing a custom source which will use the memory channel, and which
> will catch a ChannelException to identify any channel capacity issues(so,
> buffer used in the memory channel is full because of lagging sinks/network
> issues etc). Is that a reasonable assumption to make?
>
> Thanks,
> ~Rahul.

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
