Yeah, I also think you hit STORM-1928 <https://issues.apache.org/jira/browse/STORM-1928>: ShellSpout should not check the heartbeat while there is no interaction between ShellSpout and the subprocess due to max spout pending.
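Until the fix lands, one possible mitigation is to give the subprocess more heartbeat slack relative to how long max spout pending can stall interaction. A sketch only; the key names are from Storm's defaults.yaml, and the values below are illustrative, not tuned:

```yaml
# Illustrative topology config overrides (example values, not recommendations):
topology.subprocess.timeout.secs: 120   # heartbeat timeout for multilang subprocesses (30 by default)
topology.max.spout.pending: 500         # a lower pending cap shortens throttled stretches
```

Raising the subprocess timeout only papers over the problem, of course; the real fix is to stop the heartbeat check while the spout is intentionally idle.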
Btw, the release vote for Storm 1.0.2 RC4 is now live. I encourage you to
participate in the vote so that we can be more confident that 1.0.2 is
stable.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Sat, Jul 30, 2016 at 4:43 AM, Andrew Montalenti <[email protected]> wrote:

> Hi Tim,
>
> You're actually hitting a ShellSpout death failure that we also
> identified in production at Parse.ly using streamparse. It has to do with
> the ShellSpout implementation.
>
> If a ShellBolt dies this way, Storm automatically restarts the faulty
> component. But if a ShellSpout does, it hangs in just the way you're
> describing.
>
> There is actually a fix pending for this in Storm 1.0.2 that (we think)
> addresses this issue, described in the JIRA issue STORM-1928
> <https://issues.apache.org/jira/browse/STORM-1928>. Since this release is
> now available on GitHub
> <https://github.com/apache/storm/releases/tag/v1.0.2>, you may want to
> give it a try and see if the issue goes away.
>
> It would be good for the community to know whether this is in fact the
> issue that the fix addresses in 1.0.x. We are testing some patches that
> do the same against the 0.9.x line over at Parse.ly.
>
> --
> Andrew Montalenti | CTO, Parse.ly
>
> On Thu, Jul 28, 2016 at 1:30 PM, Tim Hopper <[email protected]> wrote:
>
>> I'm running streamparse 3-based topologies on Storm 1.0.1.
>>
>> I'm able to improve my throughput by increasing the max pending tuples.
>> However, the topology runs for a while and then dies. I get this message
>> in the logs:
>>
>> 2016-07-28 16:21:21.946 o.a.s.s.ShellSpout [ERROR] Halting process:
>> ShellSpout died.
>> java.lang.RuntimeException: subprocess heartbeat timeout
>>     at org.apache.storm.spout.ShellSpout$SpoutHeartbeatTimerTask.run(ShellSpout.java:275) [storm-core-1.0.1.jar:1.0.1]
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_91]
>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_91]
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_91]
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_91]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
>>     at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
>> 2016-07-28 16:21:21.955 o.a.s.d.executor [ERROR]
>>
>> The bizarre thing to me is that the Storm UI gives no indication of
>> what's going on. No tuples fail. No errors appear. The Storm metrics
>> just stop changing. The worker processes aren't restarted. All my statsd
>> metrics flatline. It just dies.
>>
>> Can anyone help me troubleshoot this? From stats, I can see that I'm not
>> running out of system memory. Perhaps the heap is filling up?
