I searched through the logs but did not find a BadCheckpointException. I 
extrapolated the 6G from the size of the log-* files being read. Below is a 
snippet of the logs. As you can see from the timestamp of the log lines, the 
channel ch3 was started at time 3:32 and channel startup activity completed 
only at 4:40

07 May 2013 03:32:40,598 INFO  [lifecycleSupervisor-1-1] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-297.meta currentPosition = 1455187648, logWriteOrderID = 
1385582495370
07 May 2013 03:32:40,599 INFO  [lifecycleSupervisor-1-1] 
(org.apache.flume.channel.file.FileChannel.start:309)  - Queue Size after 
replay: 58497343 [channel=ch3]
07 May 2013 03:32:40,628 INFO  [lifecycleSupervisor-1-1] 
(org.apache.flume.instrumentation.MonitoredCounterGroup.register:89)  - 
Monitoried counter group for type: CHANNEL, name: ch3, registered successfully.
07 May 2013 03:32:40,628 INFO  [lifecycleSupervisor-1-1] 
(org.apache.flume.instrumentation.MonitoredCounterGroup.start:73)  - Component 
type: CHANNEL, name: ch3 started
07 May 2013 03:32:40,887 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - 
Encountered EOF at 1623195640 in /flume2/data/log-861
07 May 2013 03:32:45,972 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - 
Encountered EOF at 1623195632 in /flume2/data/log-862
07 May 2013 03:32:47,777 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - 
Encountered EOF at 1623195646 in /flume2/data/log-863
07 May 2013 03:32:49,349 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - 
Encountered EOF at 1623195620 in /flume2/data/log-864
07 May 2013 04:17:37,858 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - 
Encountered EOF at 1442343132 in /flume2/data/log-865
07 May 2013 04:17:37,859 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:320)  - read: 1829658, 
put: 0, take: 45223, rollback: 1, commit: 11, skip: 1784423, eventCount:44000
07 May 2013 04:17:37,860 ERROR [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:354)  - Pending takes 
1473 exist after the end of replay. Duplicate messages will exist in 
destination.
07 May 2013 04:17:37,860 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.Log.replay:399)  - Rolling /flume2/data
07 May 2013 04:17:37,860 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.Log.roll:811)  - Roll start /flume2/data
07 May 2013 04:17:37,861 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$Writer.<init>:171)  - Opened 
/flume2/data/log-866
07 May 2013 04:17:37,862 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.Log.roll:826)  - Roll end
07 May 2013 04:17:37,863 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:109)  - 
Start checkpoint for /flume2/checkpoint/checkpoint, elements to sync = 42527
07 May 2013 04:17:37,865 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:117)  - 
Updating checkpoint metadata: logWriteOrderID: 1385582495371, queueSize: 
46200121, queueHead: 65890565
07 May 2013 04:17:37,906 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-866.meta currentPosition = 0, logWriteOrderID = 1385582495371
07 May 2013 04:17:37,907 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.Log.writeCheckpoint:886)  - Updated checkpoint 
for file: /flume2/data/log-866 position: 0 logWriteOrderID: 1385582495371
07 May 2013 04:17:37,908 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume2/data/log-855
07 May 2013 04:17:37,913 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-855.meta currentPosition = 1592727635, logWriteOrderID = 
1385582495371
07 May 2013 04:17:37,915 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume2/data/log-856
07 May 2013 04:17:37,921 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-856.meta currentPosition = 1601406592, logWriteOrderID = 
1385582495371
07 May 2013 04:17:37,922 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume2/data/log-857
07 May 2013 04:17:37,928 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-857.meta currentPosition = 1608758564, logWriteOrderID = 
1385582495371
07 May 2013 04:17:37,929 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume2/data/log-858
07 May 2013 04:17:37,935 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-858.meta currentPosition = 1558785242, logWriteOrderID = 
1385582495371
07 May 2013 04:17:37,936 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume2/data/log-859
07 May 2013 04:17:37,941 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-859.meta currentPosition = 586542373, logWriteOrderID = 
1385582495371
07 May 2013 04:17:37,943 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume2/data/log-861
07 May 2013 04:17:37,949 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-861.meta currentPosition = 1556011472, logWriteOrderID = 
1385582495371
07 May 2013 04:17:37,951 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume2/data/log-862
07 May 2013 04:17:37,956 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-862.meta currentPosition = 1563024320, logWriteOrderID = 
1385582495371
07 May 2013 04:17:37,958 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume2/data/log-863
07 May 2013 04:17:37,963 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-863.meta currentPosition = 1612213007, logWriteOrderID = 
1385582495371
07 May 2013 04:17:37,965 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume2/data/log-864
07 May 2013 04:17:37,971 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-864.meta currentPosition = 1613618738, logWriteOrderID = 
1385582495371
07 May 2013 04:17:37,972 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume2/data/log-865
07 May 2013 04:17:37,977 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-865.meta currentPosition = 1440488562, logWriteOrderID = 
1385582495371
07 May 2013 04:17:37,979 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.channel.file.FileChannel.start:309)  - Queue Size after 
replay: 46200121 [channel=ch2]
07 May 2013 04:17:37,979 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.instrumentation.MonitoredCounterGroup.register:89)  - 
Monitoried counter group for type: CHANNEL, name: ch2, registered successfully.
07 May 2013 04:17:37,980 INFO  [lifecycleSupervisor-1-3] 
(org.apache.flume.instrumentation.MonitoredCounterGroup.start:73)  - Component 
type: CHANNEL, name: ch2 started
07 May 2013 04:40:37,198 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - 
Encountered EOF at 1522974283 in /flume1/data/log-861
07 May 2013 04:40:37,199 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:320)  - read: 1249010, 
put: 0, take: 121197, rollback: 0, commit: 31, skip: 1127782, eventCount:124000
07 May 2013 04:40:37,199 ERROR [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:354)  - Pending takes 
3504 exist after the end of replay. Duplicate messages will exist in 
destination.
07 May 2013 04:40:37,199 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.Log.replay:399)  - Rolling /flume1/data
07 May 2013 04:40:37,200 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.Log.roll:811)  - Roll start /flume1/data
07 May 2013 04:40:37,200 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$Writer.<init>:171)  - Opened 
/flume1/data/log-862
07 May 2013 04:40:37,202 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.Log.roll:826)  - Roll end
07 May 2013 04:40:37,203 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:109)  - 
Start checkpoint for /flume1/checkpoint/checkpoint, elements to sync = 120496
07 May 2013 04:40:37,204 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:117)  - 
Updating checkpoint metadata: logWriteOrderID: 1385582495372, queueSize: 
29456099, queueHead: 22675305
07 May 2013 04:40:37,238 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-862.meta currentPosition = 0, logWriteOrderID = 1385582495372
07 May 2013 04:40:37,240 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.Log.writeCheckpoint:886)  - Updated checkpoint 
for file: /flume1/data/log-862 position: 0 logWriteOrderID: 1385582495372
07 May 2013 04:40:37,240 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-855
07 May 2013 04:40:37,246 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-855.meta currentPosition = 1547225990, logWriteOrderID = 
1385582495372
07 May 2013 04:40:37,247 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-856
07 May 2013 04:40:37,253 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-856.meta currentPosition = 892596719, logWriteOrderID = 
1385582495372
07 May 2013 04:40:37,255 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-857
07 May 2013 04:40:37,260 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-857.meta currentPosition = 1559591451, logWriteOrderID = 
1385582495372
07 May 2013 04:40:37,262 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-858
07 May 2013 04:40:37,267 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-858.meta currentPosition = 1550429668, logWriteOrderID = 
1385582495372
07 May 2013 04:40:37,269 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-859
07 May 2013 04:40:37,274 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-859.meta currentPosition = 1616923189, logWriteOrderID = 
1385582495372


________________________________
 From: Hari Shreedharan <[email protected]>
To: "[email protected]" <[email protected]>; Rahul Ravindran 
<[email protected]> 
Sent: Monday, May 6, 2013 9:57 PM
Subject: Re: Usage of use-fast-replay for FileChannel
 


Did you have an issue with the checkpoint that the entire 6G of data was 
replayed (look for BadCheckpointException in the logs to figure out if the 
channel was stopped in middle of a checkpoint)?

With the next version of Flume, you should be able to recover even if the 
channel stopped while the checkpoint was being written.

Fast Replay will try to maintain order, but it will require a massive amount of 
memory to run if you have a large number of events. Also, fast replay will only 
run if the checkpoint is corrupt/does not exist.

Hari




On Mon, May 6, 2013 at 9:40 PM, Rahul Ravindran <[email protected]> wrote:

Hi,
>   For FileChannel, how much of a performance improvement in replay times were 
>observed with use-fast-replay? We currently have use-fast-replay set to false 
>and were replaying about 6 G of data. We noticed replay times of about one 
>hour. I looked at the code and it appears that fast-replay does not guarantee 
>the same ordering of events during replay. Is this accurate? Are there any 
>other downsides of using fast-replay? Any stability concerns?
>Thanks,
>~Rahul.

Reply via email to