We are currently running a 7-node Storm cluster (1 nimbus and 6 supervisor
nodes, all on Storm 0.9.2) with 3 topologies. Any time we kill a running
topology, the supervisors on all nodes start flapping and we end up in a
mess. To clean this up we end up killing all running topologies, shutting
down the supervisors, cleaning up the storm/storm-local directories on all
supervisor nodes, restarting the supervisor processes and then restarting
the topologies.
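
For reference, the cleanup sequence above can be written out as a dry-run
sketch. It only builds and prints the commands in order; it runs nothing.
The host names, the storm-local path, the supervisor stop/start commands and
the resubmit commands are all placeholders (assumptions, not our actual
setup). The only real CLI call used is "storm kill <topology-name>".

```python
import shlex


def recovery_plan(topologies, supervisor_hosts, local_dir,
                  stop_cmd, start_cmd, resubmit_cmds):
    """Return (where, command) pairs for the manual recovery, in order.

    Nothing is executed. All arguments are hypothetical placeholders that
    must be replaced with values for the cluster in question.
    """
    plan = []
    # 1. Kill every running topology from a machine with the storm client.
    for name in topologies:
        plan.append(("client", "storm kill " + shlex.quote(name)))
    # 2. Shut down the supervisor on every supervisor node.
    for host in supervisor_hosts:
        plan.append((host, stop_cmd))
    # 3. Clean out the supervisor's local state directory on every node.
    for host in supervisor_hosts:
        plan.append((host, "rm -rf -- " + shlex.quote(local_dir) + "/*"))
    # 4. Restart the supervisor processes.
    for host in supervisor_hosts:
        plan.append((host, start_cmd))
    # 5. Resubmit the topologies (commands are topology-specific).
    for cmd in resubmit_cmds:
        plan.append(("client", cmd))
    return plan


if __name__ == "__main__":
    steps = recovery_plan(
        topologies=["topology-a"],                 # placeholder name
        supervisor_hosts=["supervisor-1"],         # placeholder host
        local_dir="/path/to/storm/storm-local",    # placeholder path
        stop_cmd="<stop supervisor>",              # depends on process manager
        start_cmd="<start supervisor>",            # depends on process manager
        resubmit_cmds=["<resubmit topology-a>"],   # placeholder
    )
    for where, cmd in steps:
        print(where + ": " + cmd)
```

Each phase is listed for all hosts before the next phase starts, which
matches the order we follow by hand.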

Has anyone experienced this issue, or does anyone have ideas on how to
resolve it?

This is the log snippet we see in the supervisor logs when this happens:

2015-03-25 11:28:13 b.s.d.supervisor [INFO] Shutting down 4d971d4b-a208-4758-a55e-3e8b34d7531f:ce049d5c-fd4c-499c-ad8d-ef1d8f2b992b
2015-03-25 11:28:13 b.s.event [ERROR] Error when processing event
java.io.IOException: . doesn't exist.
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:157) ~[commons-exec-1.1.jar:1.1]
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147) ~[commons-exec-1.1.jar:1.1]
        at backtype.storm.util$exec_command_BANG_.invoke(util.clj:378) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.util$ensure_process_killed_BANG_.invoke(util.clj:394) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:175) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:240) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.applyToHelper(AFn.java:161) ~[clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.5.1.jar:na]
        at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
        at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
        at backtype.storm.event$event_manager$fn__2378.invoke(event.clj:39) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
2015-03-25 11:28:13 b.s.util [INFO] Halting process: ("Error when processing an event")


There does not appear to be anything corresponding to this in the worker logs.

Ideas??

Thanks
Justin
