We are running a topology that processes about 10K tuples a second on a single worker. There are about 80 executor threads across 3 different Bolt types, each receiving a tick tuple at a frequency of 1 second. We observe that the tick tuples stop coming in after few hours or sometime even after a day. The nimbus UI also shows that the tick tuples count has frozen whereas the bolts continue to process streams from other spouts. I have taken a stack trace and it shows following:
"Thread-8" daemon prio=10 tid=0x00007f1a0fed2000 nid=0x4438 runnable [0x00007f19a5e01000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:349) at com.lmax.disruptor.AbstractMultithreadedClaimStrategy.waitForFreeSlotAt(AbstractMultithreadedClaimStrategy.java:99) at com.lmax.disruptor.AbstractMultithreadedClaimStrategy.incrementAndGet(AbstractMultithreadedClaimStrategy.java:49) at com.lmax.disruptor.Sequencer.next(Sequencer.java:127) at backtype.storm.utils.DisruptorQueue.publish(DisruptorQueue.java:113) at backtype.storm.disruptor$publish.invoke(disruptor.clj:51) at backtype.storm.disruptor$publish.invoke(disruptor.clj:53) at backtype.storm.daemon.executor$setup_ticks_BANG_$fn__3885.invoke(executor.clj:299) at backtype.storm.timer$schedule_recurring$this__1839.invoke(timer.clj:79) at backtype.storm.timer$mk_timer$thread_fn__1822$fn__1823.invoke(timer.clj:32) at backtype.storm.timer$mk_timer$thread_fn__1822.invoke(timer.clj:25) at clojure.lang.AFn.run(AFn.java:24) at java.lang.Thread.run(Thread.java:744) Locked ownable synchronizers: - None "Thread-268-__system" prio=10 tid=0x00007f1a0ef45000 nid=0x46b5 runnable [0x00007f17ef574000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000073d9a2060> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2176) at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:87) at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:54) at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:56) at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) at backtype.storm.daemon.executor$eval4023$fn__4024$fn__4038$fn__4089.invoke(executor.clj:749) at backtype.storm.util$async_loop$fn__364.invoke(util.clj:400) at clojure.lang.AFn.run(AFn.java:24) at java.lang.Thread.run(Thread.java:744) Locked ownable synchronizers: - None Any pointer on this. We are running storm-0.9.0_wip21. Thanks, Sushant
