Brock,

This looks like FLUME-1417. The logs on the JIRA show the problem being hit during startup. While testing that issue, I also managed to hit the "Log Id is null" error at runtime by switching to a small file size and checkpointing very often.
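For anyone trying to reproduce this, a file channel configured along the following lines should force frequent log rolls and checkpoints. This is only a sketch: the agent name, channel name, directories, and the specific values are assumptions chosen for illustration, not the settings used while testing FLUME-1417. checkpointInterval (milliseconds) and maxFileSize (bytes) are the file channel properties described in the Flume user guide; please check them against the version you are running.

```properties
# Hypothetical agent "agent" with a single file channel "fc1".
agent.channels = fc1
agent.channels.fc1.type = file

# Illustrative scratch directories (assumed paths).
agent.channels.fc1.checkpointDir = /tmp/flume-repro/checkpoint
agent.channels.fc1.dataDirs = /tmp/flume-repro/data

# Checkpoint every second rather than at the default interval.
agent.channels.fc1.checkpointInterval = 1000

# Roll data files at roughly 1 MB so that log-N files turn over quickly.
agent.channels.fc1.maxFileSize = 1048576
```

With a source and sink attached and steady traffic, the data directory should cycle through log files quickly, which is the condition described above.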
Thanks,
Hari

--
Hari Shreedharan

On Friday, October 5, 2012 at 11:19 AM, Brock Noland wrote:

> Hi,
>
> Just curious if you got around this or figured out what was going on?
> Makes me a little nervous about a file channel bug.
>
> Brock
>
> On Tue, Oct 2, 2012 at 6:28 AM, Brock Noland <[email protected]> wrote:
> > Also, if you could send us your full log that would be great. The
> > email list doesn't take attachments so either:
> >
> > 1) post it on pastebin
> > or
> > 2) zip it and mail it to me directly
> >
> > Brock
> >
> > On Tue, Oct 2, 2012 at 6:06 AM, Brock Noland <[email protected]> wrote:
> > > Hi,
> > >
> > > What version of flume? If trunk (1.3.0-SNAPSHOT) what is the last
> > > patch you have?
> > >
> > > Can you show us a ls -la of your data and checkpoint directories?
> > >
> > > Brock
> > >
> > > On Tue, Oct 2, 2012 at 3:43 AM, Raymond Ng <[email protected]> wrote:
> > > > Just to add more info to this, I've checked the File channel where a
> > > > "ChannelException: Cannot acquire capacity" is reported against, and can see
> > > > file log-1 has the size of 0 and log-2 has over 300 MB of data, comparing
> > > > with another File channel which has files log-2 and log-3 both with data in
> > > > it but no file log-1 is found.
> > > >
> > > > sounds like log-1 is the one causing the "NullPointerException: LogFile is
> > > > null for id 1" below, and when I restarted flume, I get the following
> > > > warning.
> > > > I can confirm there was no manual tampering in the file channel directory
> > > >
> > > > 2012-10-02 09:38:10,231 INFO [conf-file-poller-0] DefaultLogicalNodeManager.java - Starting Channel probeFileChannel1
> > > > 2012-10-02 09:38:10,239 INFO [conf-file-poller-0] DefaultLogicalNodeManager.java - Starting Channel probeFileChannel3
> > > > 2012-10-02 09:38:10,313 WARN [lifecycleSupervisor-1-2] ReplayHandler.java - Hit EOF on /home/user/flume-ng/filechannel3/data/log-1
> > > > 2012-10-02 09:38:10,314 INFO [lifecycleSupervisor-1-1] DirectMemoryUtils.java - Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null)
> > > > 2012-10-02 09:38:10,317 INFO [lifecycleSupervisor-1-1] DirectMemoryUtils.java - Direct Memory Allocation: Allocation = 1048576, Allocated = 0, MaxDirectMemorySize = 954466304, Remaining = 954466304
> > > > 2012-10-02 09:38:10,381 WARN [lifecycleSupervisor-1-1] LogFile.java - Checkpoint for file(/home/user/flume-ng/filechannel1/data/log-2) is: 1349166469095, which is beyond the requested checkpoint time: 0.
> > > > 2012-10-02 09:38:10,381 WARN [lifecycleSupervisor-1-2] LogFile.java - Checkpoint for file(/home/user/flume-ng/filechannel3/data/log-2) is: 1349166991594, which is beyond the requested checkpoint time: 0.
> > > > 2012-10-02 09:41:52,144 ERROR [lifecycleSupervisor-1-2] ReplayHandler.java - Pending takes 32103 exist after the end of replay. Duplicate messages will exist in destination.
> > > > 2012-10-02 09:41:52,709 INFO [lifecycleSupervisor-1-2] MonitoredCounterGroup.java - Component type: CHANNEL, name: probeFileChannel3 started
> > > > 2012-10-02 09:42:31,413 WARN [lifecycleSupervisor-1-1] LogFile.java - Checkpoint for file(/home/cluster_admin/flume-ng/filechannel1/data/log-3) is: 1349166981020, which is beyond the requested checkpoint time: 0.
> > > > 2012-10-02 09:45:14,836 ERROR [lifecycleSupervisor-1-1] ReplayHandler.java - Pending takes 8409 exist after the end of replay. Duplicate messages will exist in destination.
> > > > 2012-10-02 09:45:15,453 INFO [lifecycleSupervisor-1-1] MonitoredCounterGroup.java - Component type: CHANNEL, name: probeFileChannel1 started
> > > >
> > > > On Tue, Oct 2, 2012 at 9:19 AM, Raymond Ng <[email protected]> wrote:
> > > > > Hi
> > > > >
> > > > > Could I have some advice for the following exception please, is this
> > > > > related to the "ChannelException: Cannot acquire capacity" which I
> > > > > experience from time to time
> > > > >
> > > > > 2012-10-02 09:16:53,563 ERROR [Log-BackgroundWorker] Log.java - General error in checkpoint worker
> > > > > java.lang.NullPointerException
> > > > >     at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:738)
> > > > >     at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:692)
> > > > >     at org.apache.flume.channel.file.Log.access$300(Log.java:57)
> > > > >     at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:892)
> > > > > 2012-10-02 09:16:56,317 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] HDFSEventSink.java - process failed
> > > > > java.lang.NullPointerException: LogFile is null for id 1
> > > > >     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> > > > >     at org.apache.flume.channel.file.Log.get(Log.java:316)
> > > > >     at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:373)
> > > > >     at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
> > > > >     at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:91)
> > > > >     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:383)
> > > > >     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > > > >     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > > > >     at java.lang.Thread.run(Thread.java:662)
> > > > > 2012-10-02 09:16:56,318 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] SinkRunner.java - Unable to deliver event. Exception follows.
> > > > > org.apache.flume.EventDeliveryException: java.lang.NullPointerException: LogFile is null for id 1
> > > > >     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:450)
> > > > >     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > > > >     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > > > >     at java.lang.Thread.run(Thread.java:662)
> > > > > Caused by: java.lang.NullPointerException: LogFile is null for id 1
> > > > >     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> > > > >     at org.apache.flume.channel.file.Log.get(Log.java:316)
> > > > >     at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:373)
> > > > >     at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
> > > > >     at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:91)
> > > > >     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:383)
> > > > >     ... 3 more
> > > > > 2012-10-02 09:16:56,625 ERROR [Log-BackgroundWorker] Log.java - General error in checkpoint worker
> > > > > java.lang.NullPointerException
> > > > >     at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:738)
> > > > >     at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:692)
> > > > >     at org.apache.flume.channel.file.Log.access$300(Log.java:57)
> > > > >     at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:892)
> > > > > 2012-10-02 09:16:59,678 ERROR [Log-BackgroundWorker] Log.java - General error in checkpoint worker
> > > > > java.lang.NullPointerException
> > > > >     at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:738)
> > > > >     at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:692)
> > > > >     at org.apache.flume.channel.file.Log.access$300(Log.java:57)
> > > > >     at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:892)
> > > > > 2012-10-02 09:17:01,318 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] HDFSEventSink.java - process failed
> > > > > java.lang.NullPointerException: LogFile is null for id 1
> > > > >     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> > > > >     at org.apache.flume.channel.file.Log.get(Log.java:316)
> > > > >     at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:373)
> > > > >     at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
> > > > >     at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:91)
> > > > >     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:383)
> > > > >     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > > > >     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > > > >     at java.lang.Thread.run(Thread.java:662)
> > > > > 2012-10-02 09:17:01,318 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] SinkRunner.java - Unable to deliver event. Exception follows.
> > > > > org.apache.flume.EventDeliveryException: java.lang.NullPointerException: LogFile is null for id 1
> > > > >     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:450)
> > > > >     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > > > >     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > > > >     at java.lang.Thread.run(Thread.java:662)
> > > > > Caused by: java.lang.NullPointerException: LogFile is null for id 1
> > > > >     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> > > > >     at org.apache.flume.channel.file.Log.get(Log.java:316)
> > > > >     at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:373)
> > > > >     at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
> > > > >     at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:91)
> > > > >     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:383)
> > > > >     ... 3 more
> > > > >
> > > > > --
> > > > > Rgds
> > > > > Ray
> > > >
> > > > --
> > > > Rgds
> > > > Ray
> > >
> > > --
> > > Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
> >
> > --
> > Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
