Fast replay will start only if the checkpoint files are deleted (or don't exist). If a checkpoint is present, fast replay will not kick in even if the checkpoint files are corrupt or unreadable. Depending on how many events are in the channel, fast replay can also fall victim to Java GC slowdowns, which is something to keep in mind.
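In practice that means you have to stop the agent and remove the checkpoint before restarting if you want fast replay to actually run. A rough sketch, assuming a hypothetical checkpoint path (substitute whatever checkpointDir your channel is configured with):

```shell
#!/bin/sh
# Hypothetical path -- replace with the checkpointDir from your flume.conf.
CHECKPOINT_DIR="/var/lib/flume/file-channel/checkpoint"

# Stop the agent first, then remove the checkpoint; on the next start the
# channel finds no checkpoint and (with use-fast-replay = true) replays the
# data logs directly instead of doing the slow sequential replay.
rm -rf "${CHECKPOINT_DIR}"
echo "removed ${CHECKPOINT_DIR}; restart the agent to trigger fast replay"
```

Note this throws away the checkpoint on purpose, so the subsequent replay still has to read all the data files -- it is only the replay strategy that changes.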
I have filed https://issues.apache.org/jira/browse/FLUME-2155 to improve certain aspects of the File Channel replay. I have some ideas, though I am not entirely sure when I will get time to work on this.

Thanks,
Hari

On Friday, August 9, 2013 at 10:10 AM, Edwin Chiu wrote:

> Thanks, Brock! I'll check out this 1.4 feature.
>
> I already have 1.3 running on production machines, though; it's still
> preferable to keep 1.3 unless there's no way around this potentially
> lengthy log replay.
>
> In my scenario, there are about 4 GB of files under the data directory. My
> system has about 40 GB of free memory. I've restarted Flume with a 36 GB
> max memory setting in flume-env, after setting fast-replay to true.
>
> Resource monitoring shows 36 GB allocated to the Flume process. But while
> the replay was running, it used about the same amount of memory as before
> the new max memory was set in flume-env and with fast-replay off.
>
> Any tips to "force" fast-replay to kick in?
>
> thanks!
>
> - Edwin
>
> On Fri, Aug 9, 2013 at 4:26 AM, Brock Noland <[email protected]> wrote:
> > If fast replay doesn't help, then you don't have enough RAM. I'd suggest
> > you use the new dual checkpoint feature. Note the dual and backup
> > checkpoint configs here:
> >
> > http://flume.apache.org/FlumeUserGuide.html#file-channel
> > http://issues.apache.org/jira/browse/FLUME-1516
> >
> > Brock
> >
> > On Thu, Aug 8, 2013 at 2:48 PM, Edwin Chiu <[email protected]> wrote:
> > > Hi there!
> > >
> > > I'm using flume-ng 1.3.1 (Hortonworks' latest production stable
> > > version as of now) on CentOS 6 with JDK 1.6.
> > >
> > > I'm wondering how to speed up the replay of logs after changing file
> > > channel parameters in flume.conf -- capacity and transactionCapacity.
> > >
> > > It takes hours for the node to catch up and be able to receive and
> > > send events again.
> > >
> > > use-fast-replay = true with a ridiculous amount of max memory doesn't
> > > speed things up.
> > >
> > > Any recommendations to avoid the downtime?
> > >
> > > thanks!
> > >
> > > Ed
> >
> > --
> > Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
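For reference, the knobs discussed in this thread all live in the file channel section of flume.conf. A hedged sketch -- the agent/channel names and paths below are made up, but the property names are the ones documented in the Flume User Guide (use-fast-replay in 1.3+; useDualCheckpoints/backupCheckpointDir from FLUME-1516 in 1.4+):

```properties
# Hypothetical agent ("agent") and channel ("fc") names; paths are examples.
agent.channels.fc.type = file
agent.channels.fc.checkpointDir = /var/lib/flume/file-channel/checkpoint
agent.channels.fc.dataDirs = /var/lib/flume/file-channel/data
agent.channels.fc.capacity = 1000000
agent.channels.fc.transactionCapacity = 10000

# Fast replay: only consulted when no checkpoint exists (see above).
agent.channels.fc.use-fast-replay = true

# Dual/backup checkpoint (Flume 1.4+, FLUME-1516): keeps a backup copy of the
# last good checkpoint so a corrupt or missing primary checkpoint does not
# force a full replay on restart.
agent.channels.fc.useDualCheckpoints = true
agent.channels.fc.backupCheckpointDir = /var/lib/flume/file-channel/checkpoint-backup
```

Changing capacity or transactionCapacity invalidates the existing checkpoint, which is what triggers the lengthy replay Edwin describes; the backup checkpoint mitigates the corrupt-checkpoint case, not the changed-parameters case.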
