Hi Padma,
See inline for my replies....
On Mon, May 12, 2014 at 4:39 PM, padma priya chitturi <
[email protected]> wrote:
> Hi,
>
> A few questions about your issue:
>
> 1. As soon as you start the topology, does the bolt never begin executing
> at all? Or does bolt execution get stuck after processing a few tuples? Can
> you give a clearer picture of this?
>
<Srinath> Bolt execution is perfectly normal at the beginning and for a long
time (about 1-2 days), and then it abruptly stops.
> 2. You said that the behavior is seen on one of the worker processes. So
> is it that on the other worker process the bolt is executed without any
> issues?
>
<Srinath> Yes, I see this on only one process. The bolt tasks in the other
process are working fine.
> 3. What is the source from which spout reads the input ?
>
<Srinath> The spout is reading from a kestrel queue.
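For context, here is a minimal sketch of how such a topology is wired, assuming
the standard storm-kestrel KestrelThriftSpout; the host, queue name, component
names, parallelism values and the ProcessBolt placeholder are illustrative, not
our exact code:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.spout.KestrelThriftSpout;
    import backtype.storm.spout.StringScheme;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Tuple;

    public class ExampleTopology {

        // Placeholder bolt standing in for the real processing bolt.
        public static class ProcessBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                // real processing logic goes here
            }
            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // terminal bolt in this sketch: no output stream declared
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // Spout reads raw strings from a Kestrel queue over thrift
            // (2229 is Kestrel's default thrift port).
            builder.setSpout("kestrel-spout",
                    new KestrelThriftSpout("kestrel.example.local", 2229, "events", new StringScheme()), 4);

            // Bolt with multiple executors, fed by the spout.
            builder.setBolt("process-bolt", new ProcessBolt(), 8)
                   .shuffleGrouping("kestrel-spout");

            Config conf = new Config();
            conf.setNumWorkers(2);  // two worker processes, as in our setup

            StormSubmitter.submitTopology("example-topology", conf, builder.createTopology());
        }
    }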
>
> Also, it would be great if you could provide the nimbus/supervisor and
> worker logs.
>
<Srinath> I'm currently troubleshooting this issue and looking into the worker
logs. I don't see much in the nimbus and supervisor logs, but the worker logs
seem to reveal some clues. I'll share more details on my findings shortly.
>
>
> On Mon, May 12, 2014 at 6:57 AM, Srinath C <[email protected]> wrote:
>
>> Hi,
>> I'm facing a strange issue running a topology on version
>> 0.9.1-incubating with Netty as the transport.
>> The topology has two worker processes on the same worker machine.
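<Srinath> For clarity, by "Netty as the transport" I mean the messaging layer
selected via the storm.messaging.transport setting; a minimal sketch of the
relevant key and class name (this is normally configured cluster-wide in
storm.yaml rather than in topology code):

    import backtype.storm.Config;

    public class TransportNote {
        public static void main(String[] args) {
            Config conf = new Config();
            // Storm 0.9.x selects the Netty-based messaging layer with this key/value pair,
            // usually set in storm.yaml as:
            //   storm.messaging.transport: "backtype.storm.messaging.netty.Context"
            conf.put(Config.STORM_MESSAGING_TRANSPORT, "backtype.storm.messaging.netty.Context");
            System.out.println("transport = " + conf.get(Config.STORM_MESSAGING_TRANSPORT));
        }
    }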
>>
>> To summarize the behavior, on one of the worker processes:
>> - one of the bolts is not getting executed: the bolt has multiple
>> executors but none of them are executing
>> - the spouts in the same worker process are emitting tuples to the
>> bolt, but the bolt is still not executed
>> - after a while the spout itself stops being executed (nextTuple is no
>> longer called)
>>
>> My suspicion is that some internal buffers are getting filled up, so the
>> topology.max.spout.pending limit is hit and Storm no longer invokes the
>> spout. The topology remains hung like this for a long time, probably
>> forever.
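<Srinath> For reference, topology.max.spout.pending is set through the topology
Config; a minimal sketch of how it is typically configured and what it controls
(the value shown is illustrative, not our actual setting):

    import backtype.storm.Config;

    public class MaxSpoutPendingNote {
        public static void main(String[] args) {
            Config conf = new Config();
            // topology.max.spout.pending caps how many tuples emitted by each spout task may be
            // pending (emitted but not yet acked or failed) at once. Once the cap is reached,
            // Storm stops calling nextTuple() on that spout task until acks or fails bring the
            // pending count back down. If a stuck downstream bolt means acks never arrive,
            // that would explain the hang.
            conf.setMaxSpoutPending(1000);  // illustrative value
        }
    }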
>> From the jstack output, I could see that the affected process had 5 fewer
>> threads than a normal process. Those threads had a stack like the one below:
>>
>> Thread 5986: (state = BLOCKED)
>> - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>> - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=226 (Compiled frame)
>> - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) @bci=68, line=2082 (Compiled frame)
>> - java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=122, line=1090 (Compiled frame)
>> - java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=1, line=807 (Interpreted frame)
>> - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=156, line=1068 (Interpreted frame)
>> - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=26, line=1130 (Interpreted frame)
>> - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
>> - java.lang.Thread.run() @bci=11, line=744 (Interpreted frame)
>>
>> Has anyone seen such an issue? Any idea how I can confirm my suspicion
>> that internal buffers are getting filled up? What else can I collect from
>> the processes for troubleshooting?
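<Srinath> One low-effort option for more visibility is Storm's built-in metrics
consumer, which reports the per-component built-in metrics (emit, ack and
execute counts and latencies) through the worker's logging configuration; a
minimal sketch, with an illustrative parallelism hint:

    import backtype.storm.Config;
    import backtype.storm.metric.LoggingMetricsConsumer;

    public class MetricsNote {
        public static void main(String[] args) {
            Config conf = new Config();
            // Register the built-in consumer so Storm's per-component metrics are written out
            // via the worker's logging config. A parallelism hint of 1 runs a single consumer
            // instance.
            conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
        }
    }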
>>
>> Thanks,
>> Srinath.
>>
>>
>