OK, I understand. I can't apply the patch; I get a "format failed" error and I'm not sure why. Is this a diff against trunk, or against some local version? I see some changes with no matching lines in the code.
Many thanks,
Anat

On Tue, Sep 24, 2013 at 9:15 PM, Hari Shreedharan <[email protected]> wrote:

That is actually a symptom of the real problem: the remove method ends up hitting the main checkpoint data structure and causes too many ops on the hash map. The real fix is in the patch I mentioned, which reduces the number of ops tremendously.

Thanks,
Hari

On Tuesday, September 24, 2013 at 6:12 AM, Anat Rozenzon wrote:

For example, this stack trace:

"lifecycleSupervisor-1-2" prio=10 tid=0x00007f89141d8800 nid=0x5ac8 runnable [0x00007f89501ad000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Integer.valueOf(Integer.java:642)
        at org.apache.flume.channel.file.EventQueueBackingStoreFile.get(EventQueueBackingStoreFile.java:310)
        at org.apache.flume.channel.file.FlumeEventQueue.get(FlumeEventQueue.java:225)
        at org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:195)
        - locked <0x00000006890f68f0> (a org.apache.flume.channel.file.FlumeEventQueue)
        at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:405)
        at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:328)
        at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
        at org.apache.flume.channel.file.Log.replay(Log.java:430)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
        - locked <0x00000006890ea360> (a org.apache.flume.channel.file.FileChannel)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        - locked <0x00000006890ea360> (a org.apache.flume.channel.file.FileChannel)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

On Tue, Sep 24, 2013 at 4:10 PM, Anat Rozenzon <[email protected]> wrote:

After a deeper dive, it seems that the problem is with the HashMap usage in EventQueueBackingStoreFile.

Almost every time I run jstack, the JVM is inside EventQueueBackingStoreFile.get(), doing either HashMap.containsKey() or Integer.valueOf(). This is because overwriteMap is defined as a regular HashMap<Integer, Long>().

Does your fix solve this issue?

I think maybe using a Long[] would be better.

On Tue, Sep 24, 2013 at 2:34 PM, Anat Rozenzon <[email protected]> wrote:

Thanks Hari, great news, I'll be glad to test it.

However, I don't have an environment with trunk; is there any way I can get it packaged somehow?

On Mon, Sep 23, 2013 at 8:50 PM, Hari Shreedharan <[email protected]> wrote:

How many events does the File Channel get every 30 seconds, and how many get taken out? This is one of the edge cases of the File Channel I have been working on ironing out. There is a patch on https://issues.apache.org/jira/browse/FLUME-2155 (the FLUME-2155-initial.patch file).
If you have data that takes an hour to start, and you don't mind testing out this patch (it might be buggy, cause data loss, hangs, etc., so testing in prod is not recommended), apply it to trunk, test it out, and see if it improves the startup time.

Thanks,
Hari

On Monday, September 23, 2013 at 9:16 AM, Anat Rozenzon wrote:

Hi,

I have a flume instance that is collecting logs from several flume agents using an avro source and a file channel. Recently, when I restart the collector it takes about an hour to start listening on the avro port. PSB a jstack entry; any idea why the startup is so slow?

Thanks,
Anat

"lifecycleSupervisor-1-0" prio=10 tid=0x00007f01505e4800 nid=0x4c78 runnable [0x00007f01441d6000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.flume.channel.file.FlumeEventQueue.get(FlumeEventQueue.java:225)
        at org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:195)
        - locked <0x0000000689149c30> (a org.apache.flume.channel.file.FlumeEventQueue)
        at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:405)
        at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:328)
        at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
        at org.apache.flume.channel.file.Log.replay(Log.java:430)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
        - locked <0x0000000689145ca8> (a org.apache.flume.channel.file.FileChannel)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        - locked <0x0000000689145ca8> (a org.apache.flume.channel.file.FileChannel)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
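A minimal, self-contained sketch of the overhead discussed in the thread above. This is not Flume's actual EventQueueBackingStoreFile code; it only mimics the pattern Anat describes, where every replayed commit ends up probing a HashMap<Integer, Long> (boxing the key through Integer.valueOf() on each call), and contrasts it with the primitive-array idea from the thread. The event count and loop structure are invented for illustration.

    // Illustrative sketch only -- NOT Flume's actual EventQueueBackingStoreFile code.
    // It models the pattern seen in the jstack samples: millions of map operations,
    // each one boxing an int key via Integer.valueOf() and hashing it.
    import java.util.HashMap;
    import java.util.Map;

    public class ReplayCostSketch {

        public static void main(String[] args) {
            final int events = 5_000_000;   // hypothetical number of events replayed at startup

            // Variant 1: boxed map keyed by queue position (plain HashMap<Integer, Long>).
            Map<Integer, Long> overwriteMap = new HashMap<>();
            long t0 = System.nanoTime();
            for (int i = 0; i < events; i++) {
                overwriteMap.put(i, (long) i);       // autoboxes the key on every call
            }
            for (int i = 0; i < events; i++) {
                if (overwriteMap.containsKey(i)) {   // boxes again on every lookup
                    overwriteMap.remove(i);
                }
            }
            System.out.printf("HashMap<Integer, Long>: %d ms%n", (System.nanoTime() - t0) / 1_000_000);

            // Variant 2: primitive array indexed by queue position (the long[] idea from the thread).
            long[] slots = new long[events];
            long t1 = System.nanoTime();
            for (int i = 0; i < events; i++) {
                slots[i] = i;                        // no boxing, no hashing
            }
            for (int i = 0; i < events; i++) {
                slots[i] = 0L;                       // "remove" is just an overwrite
            }
            System.out.printf("long[]:                 %d ms%n", (System.nanoTime() - t1) / 1_000_000);
        }
    }

Note that the array variant only works when the keys (queue positions) are dense and bounded, so this is a sketch of the trade-off rather than a drop-in fix for the channel.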
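A rough way to answer Hari's question about how many events go into and out of the File Channel every 30 seconds. This assumes the agent was started with Flume's built-in JSON monitoring enabled (for example -Dflume.monitoring.type=http -Dflume.monitoring.port=34545) and that the channel's counters (EventPutSuccessCount, EventTakeSuccessCount, ChannelSize) appear in the returned JSON; the host, port, and URL below are assumptions to adjust for the actual setup.

    // Rough helper sketch: take two metric snapshots 30 seconds apart and compare
    // the channel counters by eye (or with a JSON tool) to get put/take rates.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class ChannelRateCheck {

        private static String fetchMetrics(String url) throws Exception {
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        }

        public static void main(String[] args) throws Exception {
            String metricsUrl = "http://localhost:34545/metrics";   // hypothetical host/port

            System.out.println("--- snapshot 1 ---");
            System.out.println(fetchMetrics(metricsUrl));

            Thread.sleep(30_000);

            System.out.println("--- snapshot 2 (30s later) ---");
            System.out.println(fetchMetrics(metricsUrl));
        }
    }

The delta in EventPutSuccessCount versus EventTakeSuccessCount over the 30-second window shows how fast the channel is filling or draining, which also hints at how much data a restart would have to replay.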
