Hi,
   I am very new to Flume and we are hoping to use it for our log aggregation 
into HDFS. I have a few questions below:

FileChannel will double our disk IO, which will affect IO performance on 
certain performance-sensitive machines. Hence, I was hoping to write a custom 
Flume source which will use a memory channel and perform its own 
checkpointing. The checkpoint will be updated each time we perform a successful 
insertion into the memory channel. (I realize that this results in a risk of 
data loss, the maximum size of which is the capacity of the memory channel.)

As long as there is capacity in the memory channel buffers, does the memory 
channel guarantee delivery to a sink (does it wait for acknowledgements, and 
retry failed events)? If so, we would only need to ensure that we do not 
exceed the channel capacity.

I am writing a custom source which will use the memory channel, and which will 
catch a ChannelException to identify any channel capacity issues (that is, the 
buffer used in the memory channel is full because of lagging sinks, network 
issues, etc.). Is it reasonable to assume that a ChannelException signals 
this condition?
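
To make the idea concrete, here is a rough sketch of the checkpointing logic I 
have in mind. It deliberately does not use the Flume API: BoundedChannel and 
ChannelFullException are hypothetical stand-ins for the memory channel and for 
ChannelException, and the checkpoint is simply an index into an array of log 
lines. The only point is that the checkpoint advances after a successful 
insertion and stays put when the channel reports that it is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: none of these types are Flume classes.
class CheckpointingSourceSketch {

    /** Hypothetical stand-in for Flume's ChannelException. */
    static class ChannelFullException extends RuntimeException {
        ChannelFullException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a bounded in-memory channel. */
    static class BoundedChannel {
        private final BlockingQueue<String> queue;

        BoundedChannel(int capacity) {
            this.queue = new ArrayBlockingQueue<>(capacity);
        }

        /** Inserts an event, or throws if the buffer is already full. */
        void put(String event) {
            if (!queue.offer(event)) {
                throw new ChannelFullException("channel is at capacity");
            }
        }

        /** Simulates a sink draining one event; returns null if empty. */
        String take() {
            return queue.poll();
        }
    }

    private final BoundedChannel channel;
    private int checkpoint = 0; // index of the next line not yet accepted

    CheckpointingSourceSketch(BoundedChannel channel) {
        this.channel = channel;
    }

    /**
     * Pushes lines starting at the checkpoint until the channel is full
     * or the input is exhausted. Returns how many lines were accepted.
     */
    int deliver(String[] lines) {
        int accepted = 0;
        while (checkpoint < lines.length) {
            try {
                channel.put(lines[checkpoint]);
            } catch (ChannelFullException e) {
                break; // channel full: back off, checkpoint is not advanced
            }
            checkpoint++; // advance only after a successful insertion
            accepted++;
        }
        return accepted;
    }

    int getCheckpoint() {
        return checkpoint;
    }

    public static void main(String[] args) {
        BoundedChannel channel = new BoundedChannel(2);
        CheckpointingSourceSketch source = new CheckpointingSourceSketch(channel);
        String[] lines = {"line-1", "line-2", "line-3"};

        System.out.println("accepted: " + source.deliver(lines));
        System.out.println("checkpoint: " + source.getCheckpoint());

        channel.take(); // a sink drains one event, freeing capacity
        System.out.println("accepted after drain: " + source.deliver(lines));
        System.out.println("checkpoint: " + source.getCheckpoint());
    }
}
```

In this sketch, anything at or past the checkpoint can be re-read after a 
restart, while events already in the channel but not yet taken by a sink are 
the ones at risk.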

Thanks,
~Rahul.
