I've got the same error. In a running cluster, I kill the supervisor running on
one of the machines, wait until storm reassigns the topology that was on that
machine (called Sync), and then bring the supervisor up again. It immediately
dies, with the following in the log:
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Synchronizing supervisor
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Worker
21f86017-fed3-4e94-93f4-7ea65ca983e3 is :timed-out:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231, :storm-id
"Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1] [11 20] [23
32] [1 10] [33 42]}, :port 6703} at supervisor time-secs 1395917412
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Storm code map:
{"Sync-1-1395916991" "/home/storm/storm/nimbus/stormdist/Sync-1-1395916991",
"Async-2-1395916991" "/home/storm/storm/nimbus/stormdist/Async-2-1395916991"}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Downloaded storm ids:
#{"Sync-1-1395916991"}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] All assignment: {}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] New assignment: {}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Writing new assignment {}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Syncing processes
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Assigned executors: {6703
#backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"Sync-1-1395916991", :executors ([33 42] [22 22] [21 21] [1 10] [43 43] [11 20]
[23 32])}}
2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Allocated:
{"21f86017-fed3-4e94-93f4-7ea65ca983e3" [:timed-out
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231, :storm-id
"Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1] [11 20] [23
32] [1 10] [33 42]}, :port 6703}]}
2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shutting down and clearing state
for id 21f86017-fed3-4e94-93f4-7ea65ca983e3. Current supervisor time:
1395917412. State: :timed-out, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231, :storm-id
"Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1] [11 20] [23
32] [1 10] [33 42]}, :port 6703}
2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shutting down
01e5ad81-57a4-4933-87e5-c487e969b4b3:21f86017-fed3-4e94-93f4-7ea65ca983e3
2014-03-27 10:50:12 b.s.d.supervisor [INFO] Removing code for storm id
Sync-1-1395916991
2014-03-27 10:50:12 b.s.util [DEBUG] Rmr path
/home/storm/storm/supervisor/stormdist/Sync-1-1395916991
2014-03-27 10:50:12 b.s.util [INFO] Error when trying to kill 2160. Process is
probably already dead.
2014-03-27 10:50:12 b.s.util [DEBUG] Removing path
/home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/pids/2160
2014-03-27 10:50:12 b.s.util [DEBUG] Rmr path
/home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/heartbeats
2014-03-27 10:50:12 b.s.util [DEBUG] Removing path
/home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/pids
2014-03-27 10:50:12 b.s.util [DEBUG] Removing path
/home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3
2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shut down
01e5ad81-57a4-4933-87e5-c487e969b4b3:21f86017-fed3-4e94-93f4-7ea65ca983e3
2014-03-27 10:50:12 b.s.util [DEBUG] Making dirs at
/home/storm/storm/workers/9e27425e-6d2b-48cf-a592-8dfc0204f332/pids
2014-03-27 10:50:12 b.s.d.supervisor [INFO] Launching worker with assignment
#backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"Sync-1-1395916991", :executors ([33 42] [22 22] [21 21] [1 10] [43 43] [11 20]
[23 32])} for this supervisor 01e5ad81-57a4-4933-87e5-c487e969b4b3 on port 6703
with id 9e27425e-6d2b-48cf-a592-8dfc0204f332
2014-03-27 10:50:12 b.s.event [ERROR] Error when processing event
java.io.FileNotFoundException: File
'/home/storm/storm/supervisor/stormdist/Sync-1-1395916991/stormconf.ser' does
not exist
at
org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137)
~[commons-io-1.4.jar:1.4]
at
org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135)
~[commons-io-1.4.jar:1.4]
at
backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:177)
~[storm-core-0.9.0.1.jar:na]
at
backtype.storm.daemon.supervisor$fn__6328.invoke(supervisor.clj:410)
~[storm-core-0.9.0.1.jar:na]
at clojure.lang.MultiFn.invoke(MultiFn.java:177)
~[clojure-1.4.0.jar:na]
at
backtype.storm.daemon.supervisor$sync_processes$iter__6219__6223$fn__6224.invoke(supervisor.clj:244)
~[storm-core-0.9.0.1.jar:na]
at clojure.lang.LazySeq.sval(LazySeq.java:42)
~[clojure-1.4.0.jar:na]
at clojure.lang.LazySeq.seq(LazySeq.java:60)
~[clojure-1.4.0.jar:na]
at clojure.lang.RT.seq(RT.java:473) ~[clojure-1.4.0.jar:na]
at clojure.core$seq.invoke(core.clj:133) ~[clojure-1.4.0.jar:na]
at clojure.core$dorun.invoke(core.clj:2725)
~[clojure-1.4.0.jar:na]
at clojure.core$doall.invoke(core.clj:2741)
~[clojure-1.4.0.jar:na]
at
backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:232)
~[storm-core-0.9.0.1.jar:na]
at clojure.lang.AFn.applyToHelper(AFn.java:161)
[clojure-1.4.0.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
at clojure.core$apply.invoke(core.clj:603)
~[clojure-1.4.0.jar:na]
at clojure.core$partial$fn__4070.doInvoke(core.clj:2343)
~[clojure-1.4.0.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:397)
~[clojure-1.4.0.jar:na]
at
backtype.storm.event$event_manager$fn__3072.invoke(event.clj:24)
~[storm-core-0.9.0.1.jar:na]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
2014-03-27 10:50:12 b.s.util [INFO] Halting process: ("Error when processing an
event")
Bringing the supervisor up again after this works, and the supervisor starts
correctly. Any ideas about what may be causing this?
From: Thuy Nguyen [mailto:[email protected]]
Sent: 29 January 2014 02:36
To: [email protected]
Subject: failed to start supervisor with missing stormconf.ser
Hi all,
After supervisor was killed, it failed to be restarted because of missing file
stormconf.ser. The log shows supervisor correctly "Removing code" for the
topology, but then it tried to "Launching worker with assignment" after
/storm-local/supervisor/stormdist/ is already cleaned up, thus, it got
exception about file does not exist. Is it a known bug? Could you please
provide some insights how this happened? After deleting /storm-local directory,
supervisor started successfully.
2014-01-28 16:51:02 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-01-28 16:51:02 o.a.z.ZooKeeper [INFO] Initiating client connection,
connectString=xxx:2181/storm sessionTimeout=20000
watcher=com.netflix.curator.ConnectionState@252a78ee<mailto:watcher=com.netflix.curator.ConnectionState@252a78ee>
2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Opening socket connection to server
xxx:2181
2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Socket connection established to
xxx:2181, initiating session
2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Session establishment complete on
server xxx:2181, sessionid = 0x143b20bc2587b86, negotiated timeout = 20000
2014-01-28 16:51:02 b.s.d.supervisor [INFO] Starting supervisor with id
0aa5860a-248b-4814-b201-7b2f40ce701f at host xxx
2014-01-28 16:51:03 b.s.d.supervisor [INFO] Removing code for storm id
performance_topology-1-1390953444
2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shutting down and clearing state
for id 03ccf938-7276-44f7-ab0e-f2cdfb76d394. Current supervisor time:
1390956663. State: :timed-out, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390956441, :storm-id
"performance_topology-1-1390953444", :executors #
{[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [-1 -1] [1 1]}
, :port 6797}
2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shutting down
0aa5860a-248b-4814-b201-7b2f40ce701f:03ccf938-7276-44f7-ab0e-f2cdfb76d394
kill 5817: No such process
2014-01-28 16:51:03 b.s.util [INFO] Error when trying to kill 5817. Process is
probably already dead.
2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shut down
0aa5860a-248b-4814-b201-7b2f40ce701f:03ccf938-7276-44f7-ab0e-f2cdfb76d394
2014-01-28 16:51:03 b.s.d.supervisor [INFO] Launching worker with assignment
#backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"performance_topology-1-1390953444", :executors ([6 6] [5 5] [7 7] [8 8] [3 3]
[4 4] [2 2] [1 1])} for this supervisor 0aa5860a-248b-4814-b201-7b2f40ce701f on
port 6797 with id 7a34f6bf-5494-4ee7-ba54-bbdfbbcbb861
2014-01-28 16:51:03 b.s.event [ERROR] Error when processing event
java.io.FileNotFoundException: File
'storm-local/supervisor/stormdist/performance_topology-1-1390953444/stormconf.ser'
does not exist
Thanks,
Thuy