Hi Tim,

You're hitting a ShellSpout death failure that we also ran into in
production at Parse.ly using streamparse. It stems from the ShellSpout
implementation in Storm.

If a ShellBolt's subprocess dies like this (a heartbeat timeout), Storm
automatically restarts the faulty component. But if a ShellSpout's
subprocess does, the topology hangs in just the way you're describing.
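
For anyone unfamiliar with the mechanism: the "subprocess heartbeat
timeout" in the trace quoted below is raised by a periodic timer task that
checks how long it has been since the subprocess last reported in. A
minimal Python sketch of that idea follows. The class and method names are
hypothetical, and this is an illustration only, not Storm's actual
ShellSpout code.

```python
import time


class HeartbeatWatchdog:
    """Hypothetical illustration of a subprocess heartbeat check."""

    def __init__(self, timeout_secs, clock=time.monotonic):
        self.timeout_secs = timeout_secs
        self.clock = clock  # injectable so the check can be tested
        self.last_beat = clock()

    def beat(self):
        # Called whenever the subprocess reports in.
        self.last_beat = self.clock()

    def check(self):
        # Called periodically by a timer. Raises once the subprocess
        # has been silent for longer than the timeout.
        if self.clock() - self.last_beat > self.timeout_secs:
            raise RuntimeError("subprocess heartbeat timeout")
```

The difference between bolts and spouts described above lies in what
happens after that exception is raised, not in the check itself.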

There is a fix for this in Storm 1.0.2 that (we think) addresses the
issue; it is described in JIRA issue STORM-1928
<https://issues.apache.org/jira/browse/STORM-1928>. Since this release is now
available on GitHub <https://github.com/apache/storm/releases/tag/v1.0.2>,
you may want to give it a try and see if the issue goes away.

It would be good for the community to know whether this is actually the
fix that resolves things in 1.0.x. We are also testing some patches
against the 0.9.x line that do the same over at Parse.ly.

--
Andrew Montalenti | CTO, Parse.ly

On Thu, Jul 28, 2016 at 1:30 PM, Tim Hopper <[email protected]>
wrote:

> I’m running streamparse3-based topologies on Storm 1.0.1.
>
> I’m able to improve my throughput by increasing the max pending tuples.
> However, the topology runs for a while and then dies. I get this message in
> the logs:
>
>
> 2016-07-28 16:21:21.946 o.a.s.s.ShellSpout [ERROR] Halting process:
> ShellSpout died.
> java.lang.RuntimeException: subprocess heartbeat timeout
> at
> org.apache.storm.spout.ShellSpout$SpoutHeartbeatTimerTask.run(ShellSpout.java:275)
> [storm-core-1.0.1.jar:1.0.1]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [?:1.8.0_91]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> [?:1.8.0_91]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> [?:1.8.0_91]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> [?:1.8.0_91]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [?:1.8.0_91]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [?:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
> 2016-07-28 16:21:21.955 o.a.s.d.executor [ERROR]
>
>
>
> The bizarre thing to me is that the Storm UI gives no indication of what’s
> going on. No tuples fail. No errors appear. The storm metrics just stop
> changing. The worker processes aren’t restarted. All my statsd metrics
> flatline. It just dies.
>
> Can anyone help me troubleshoot this? From stats, I can see that I’m not
> running out of system memory. Perhaps the heap is filling up?
>
