OK, I understand. I can't apply the patch; I get a "format failed" error and I'm not sure why. Is this a diff against trunk, or against some local version? I see some changes with no matching lines in the code.
Many thanks,
Anat

On Tue, Sep 24, 2013 at 9:15 PM, Hari Shreedharan <[email protected]> wrote:

That is actually a symptom of the real problem: the remove method ends up hitting the main checkpoint data structure and causes too many ops on the hash map. The real fix is in the patch I mentioned, which reduces the number of ops tremendously.

Thanks,
Hari

On Tuesday, September 24, 2013 at 6:12 AM, Anat Rozenzon wrote:

For example, this stack trace:

"lifecycleSupervisor-1-2" prio=10 tid=0x00007f89141d8800 nid=0x5ac8 runnable [0x00007f89501ad000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Integer.valueOf(Integer.java:642)
        at org.apache.flume.channel.file.EventQueueBackingStoreFile.get(EventQueueBackingStoreFile.java:310)
        at org.apache.flume.channel.file.FlumeEventQueue.get(FlumeEventQueue.java:225)
        at org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:195)
        - locked <0x00000006890f68f0> (a org.apache.flume.channel.file.FlumeEventQueue)
        at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:405)
        at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:328)
        at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
        at org.apache.flume.channel.file.Log.replay(Log.java:430)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
        - locked <0x00000006890ea360> (a org.apache.flume.channel.file.FileChannel)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        - locked <0x00000006890ea360> (a org.apache.flume.channel.file.FileChannel)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

On Tue, Sep 24, 2013 at 4:10 PM, Anat Rozenzon <[email protected]> wrote:

After a deeper dive, it seems that the problem is with the HashMap usage in EventQueueBackingStoreFile.

Almost every time I run jstack, the JVM is inside EventQueueBackingStoreFile.get(), doing either HashMap.containsKey() or Integer.valueOf(). This is because overwriteMap is defined as a regular HashMap<Integer, Long>().

Does your fix solve this issue?

I think maybe using a Long[] would be better.

On Tue, Sep 24, 2013 at 2:34 PM, Anat Rozenzon <[email protected]> wrote:

Thanks Hari, great news, I'll be glad to test it.

However, I don't have an environment with trunk; is there any way I can get it packaged somehow?

On Mon, Sep 23, 2013 at 8:50 PM, Hari Shreedharan <[email protected]> wrote:

How many events does the File Channel get every 30 seconds, and how many get taken out? This is one of the edge cases of the File Channel I have been working on ironing out. There is a patch on https://issues.apache.org/jira/browse/FLUME-2155 (the FLUME-2155-initial.patch file).
If you have data that takes an hour to start, and you don't mind testing out this patch (it might be buggy, cause data loss, hangs, etc., so testing in prod is not recommended), apply it to trunk, test it out, and see if it improves the startup time.

Thanks,
Hari

On Monday, September 23, 2013 at 9:16 AM, Anat Rozenzon wrote:

Hi,

I have a flume instance that is collecting logs from several flume agents using an avro source and a file channel. Recently, when I restart the collector it takes about an hour to start listening on the avro port. PSB a jstack entry; any idea why the startup is so slow?

Thanks,
Anat

"lifecycleSupervisor-1-0" prio=10 tid=0x00007f01505e4800 nid=0x4c78 runnable [0x00007f01441d6000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.flume.channel.file.FlumeEventQueue.get(FlumeEventQueue.java:225)
        at org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:195)
        - locked <0x0000000689149c30> (a org.apache.flume.channel.file.FlumeEventQueue)
        at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:405)
        at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:328)
        at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
        at org.apache.flume.channel.file.Log.replay(Log.java:430)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
        - locked <0x0000000689145ca8> (a org.apache.flume.channel.file.FileChannel)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        - locked <0x0000000689145ca8> (a org.apache.flume.channel.file.FileChannel)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
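A minimal, self-contained sketch of the overhead discussed in the thread above. This is not Flume's actual EventQueueBackingStoreFile code; it only mimics the pattern Anat describes, where every replayed commit ends up probing a HashMap<Integer, Long> (boxing the key through Integer.valueOf() on each call), and contrasts it with the primitive-array idea from the thread. The event count and loop structure are invented for illustration.

    // Illustrative sketch only -- NOT Flume's actual EventQueueBackingStoreFile code.
    // It models the pattern seen in the jstack samples: millions of map operations,
    // each one boxing an int key via Integer.valueOf() and hashing it.
    import java.util.HashMap;
    import java.util.Map;

    public class ReplayCostSketch {

        public static void main(String[] args) {
            final int events = 5_000_000;   // hypothetical number of events replayed at startup

            // Variant 1: boxed map keyed by queue position (plain HashMap<Integer, Long>).
            Map<Integer, Long> overwriteMap = new HashMap<>();
            long t0 = System.nanoTime();
            for (int i = 0; i < events; i++) {
                overwriteMap.put(i, (long) i);       // autoboxes the key on every call
            }
            for (int i = 0; i < events; i++) {
                if (overwriteMap.containsKey(i)) {   // boxes again on every lookup
                    overwriteMap.remove(i);
                }
            }
            System.out.printf("HashMap<Integer, Long>: %d ms%n", (System.nanoTime() - t0) / 1_000_000);

            // Variant 2: primitive array indexed by queue position (the long[] idea from the thread).
            long[] slots = new long[events];
            long t1 = System.nanoTime();
            for (int i = 0; i < events; i++) {
                slots[i] = i;                        // no boxing, no hashing
            }
            for (int i = 0; i < events; i++) {
                slots[i] = 0L;                       // "remove" is just an overwrite
            }
            System.out.printf("long[]:                 %d ms%n", (System.nanoTime() - t1) / 1_000_000);
        }
    }

Note that the array variant only works when the keys (queue positions) are dense and bounded, so this is a sketch of the trade-off rather than a drop-in fix for the channel.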
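A rough way to answer Hari's question about how many events go into and out of the File Channel every 30 seconds. This assumes the agent was started with Flume's built-in JSON monitoring enabled (for example -Dflume.monitoring.type=http -Dflume.monitoring.port=34545) and that the channel's counters (EventPutSuccessCount, EventTakeSuccessCount, ChannelSize) appear in the returned JSON; the host, port, and URL below are assumptions to adjust for the actual setup.

    // Rough helper sketch: take two metric snapshots 30 seconds apart and compare
    // the channel counters by eye (or with a JSON tool) to get put/take rates.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class ChannelRateCheck {

        private static String fetchMetrics(String url) throws Exception {
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        }

        public static void main(String[] args) throws Exception {
            String metricsUrl = "http://localhost:34545/metrics";   // hypothetical host/port

            System.out.println("--- snapshot 1 ---");
            System.out.println(fetchMetrics(metricsUrl));

            Thread.sleep(30_000);

            System.out.println("--- snapshot 2 (30s later) ---");
            System.out.println(fetchMetrics(metricsUrl));
        }
    }

The delta in EventPutSuccessCount versus EventTakeSuccessCount over the 30-second window shows how fast the channel is filling or draining, which also hints at how much data a restart would have to replay.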
