Re: Enabling file channel backup checkpoint causes significant disk IO at start-up

Hari Shreedharan Mon, 08 Sep 2014 13:07:57 -0700

This patch should address the issue, if enabled: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commitdiff;h=69fd6b3ad5e5b9ae6f1293b3d8e57ed57fd6701c;hp=f15f20785262ac3cb3e35c2a12e669b7a836d35f


It will be part of the next Flume release (or CDH5.2.0).


--

Thanks,
Hari

Michael Diamant <mailto:[email protected]>
September 8, 2014 at 12:58 PM
My team uses Flume 1.4.0 packaged with CDH5.0.2 via an embedded agent to write to a file channel. From a previous thread started by my colleague, "FileChannel Replays consistently take a long time" and associated issue, https://issues.apache.org/jira/browse/FLUME-2450, it was suggested to use a backup checkpoint directory to avoid lengthy replays. When I enabled the backup checkpoint directory, I observed via iotop near 100% IO by my application with the embedded agent. This level of IO persists for about 30 seconds rendering the application unusable during this time period.
For comparison, I monitored via iotop when backup checkpoint is disabled. IO activity occurs for at most several seconds. That is, there is a qualitative difference when enabling the backup checkpoint directory. I also tried deleting the existing checkpoints/data directories to start with a clean slate. The results of those experiments are in line with my above observations.
Is this expected behavior when using a backup checkpoint directory? Is there any way in which the amount of IO can be reduced? I appreciate feedback and insights because the current behavior is untenable for a production environment.
Thank you,
Michael
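
For readers following along, here is a minimal sketch of the kind of embedded-agent property map that turns on the backup checkpoint for a file channel. The thread does not include the actual configuration, so the class name, the helper method, and the directory paths are all hypothetical. The property names (useDualCheckpoints, backupCheckpointDir, checkpointDir, dataDirs) are the file channel settings from the Flume User Guide, prefixed with "channel." as the embedded agent expects. Source and sink settings are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BackupCheckpointConfig {

    // Builds the channel portion of an embedded-agent configuration.
    // baseDir is a hypothetical root directory chosen for illustration.
    static Map<String, String> fileChannelProperties(String baseDir) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("channel.type", "file");
        props.put("channel.checkpointDir", baseDir + "/checkpoint");
        props.put("channel.dataDirs", baseDir + "/data");
        // Enabling dual checkpoints requires a backup directory that is
        // different from both the checkpoint and the data directories.
        props.put("channel.useDualCheckpoints", "true");
        props.put("channel.backupCheckpointDir", baseDir + "/checkpoint-backup");
        return props;
    }

    public static void main(String[] args) {
        // Print the properties so they can be compared with a real setup.
        for (Map.Entry<String, String> e : fileChannelProperties("/var/lib/flume").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

In a real application the resulting map, together with the source and sink properties, would be passed to the embedded agent's configure method before starting it.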
