Hi,

I am having an issue with high CPU usage for deactivated Apache Storm 
topologies. I can reliably re-create the issue using the steps below, but I 
haven't identified the exact cause or a solution yet.
The environment is a Storm cluster on which one topology is running (the 
topology is extremely simple; I used the exclamation example). It is INACTIVE. 
Initially CPU usage is normal. However, when I kill all topology JVM processes 
on all supervisors and let Storm restart them, I find that some time later 
(~15 minutes) the CPU usage per JVM process rises to nearly 100%. This does 
not happen with an ACTIVE topology. I have also tested more than one topology 
and observe the same result whenever they are in the INACTIVE state.
**Steps to re-create** (rough command sketch below):
1. Run 1 topology on an Apache Storm cluster
2. Deactivate it
3. Kill *all* topology JVM processes on all supervisors (Storm will restart 
them)
4. Observe the CPU usage on Supervisors rise to nearly 100% for all *INACTIVE* 
topology JVM processes.
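For concreteness, the steps map to commands roughly like the following (the 
jar path, main class, topology name and `pgrep` pattern are just from my 
environment, so treat them as placeholders):
storm jar storm-starter-1.1.0.jar org.apache.storm.starter.ExclamationTopology exclamation-topology
storm deactivate exclamation-topology
then, on each supervisor, kill the worker JVMs and let the supervisor restart 
them:
kill -9 $(pgrep -f 'org.apache.storm.daemon.worker')
and watch the restarted workers:
top -p $(pgrep -d, -f 'org.apache.storm.daemon.worker')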
**Environment:**
Apache Storm 1.1.0 running on 3 VMs: one Nimbus and 2 Supervisors.
Cluster Summary:
- Supervisors: 2
- Used Slots: 2
- Available Slots: 38
- Total Slots: 40
- Executors: 50
- Tasks: 50
The topology has 2 workers and 50 executors/tasks (threads).
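As a quick sanity check from the CLI I also confirm the status and counts 
with:
storm list
(from memory the output includes the topology name, status, number of tasks 
and number of workers).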
**Investigation so far:**
Apart from being able to reliably re-create the issue, I have identified the 
threads using the most CPU in an affected topology JVM process. There are 102 
threads in total: 97 BLOCKED and 5 IN_NATIVE. The 23 threads using the most 
CPU all have identical stack traces (all in the BLOCKED state):
Thread 28558: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(long) @bci=11, line=338 (Compiled frame)
 - com.lmax.disruptor.MultiProducerSequencer.next(int) @bci=82, line=136 (Compiled frame)
 - com.lmax.disruptor.RingBuffer.next(int) @bci=5, line=260 (Interpreted frame)
 - org.apache.storm.utils.DisruptorQueue.publishDirect(java.util.ArrayList, boolean) @bci=18, line=517 (Interpreted frame)
 - org.apache.storm.utils.DisruptorQueue.access$1000(org.apache.storm.utils.DisruptorQueue, java.util.ArrayList, boolean) @bci=3, line=61 (Interpreted frame)
 - org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher.flush(boolean) @bci=50, line=280 (Interpreted frame)
 - org.apache.storm.utils.DisruptorQueue$Flusher.run() @bci=55, line=303 (Interpreted frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
I identified these threads by using `jstack` to get a thread dump of the process:
jstack -F <pid> > jstack<pid>.txt
and `top` to identify the threads within the process using the most CPU:
top -H -p <pid>
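To match the two, as far as I can tell the thread number in the `jstack -F` 
output (e.g. "Thread 28558") is the OS thread id in decimal, i.e. the same 
value as the PID column from `top -H`. With a regular (non `-F`) jstack dump 
the thread id appears as a hex `nid` instead, so the `top` value needs 
converting first, e.g.:
printf 'nid=0x%x\n' 28558
(28558 being the example thread from the stack above).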

Has anyone come across this or a similar issue before? Any help would be much 
appreciated.

Thanks,
Stuart
