Hi,

Which version of Storm and Zookeper are you using? Storm 0.9.1 seems to be
compatible with zookeeper 3.3.3.Try downgrading the zookeeper version to
3.3.3,I think that should resolve the problem.

Thanks
Bijoy


On Thu, Mar 27, 2014 at 4:33 PM, Simon Cooper <
[email protected]> wrote:

>  I've got the same error. In a running cluster, I kill the supervisor
> running on one of the machines, wait until storm reassigns the topology
> that was on that machine (called Sync), and then bring the supervisor up
> again. It immediately dies, with the following in the log:
>
>
>
> 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Synchronizing supervisor
>
> 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Worker
> 21f86017-fed3-4e94-93f4-7ea65ca983e3 is :timed-out:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231,
> :storm-id "Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1]
> [11 20] [23 32] [1 10] [33 42]}, :port 6703} at supervisor time-secs
> 1395917412
>
> 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Storm code map:
> {"Sync-1-1395916991"
> "/home/storm/storm/nimbus/stormdist/Sync-1-1395916991",
> "Async-2-1395916991"
> "/home/storm/storm/nimbus/stormdist/Async-2-1395916991"}
>
> 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Downloaded storm ids:
> #{"Sync-1-1395916991"}
>
> 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] All assignment: {}
>
> 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] New assignment: {}
>
> 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Writing new assignment {}
>
> 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Syncing processes
>
> 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Assigned executors: {6703
> #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "Sync-1-1395916991", :executors ([33 42] [22 22] [21 21] [1 10] [43 43] [11
> 20] [23 32])}}
>
> 2014-03-27 10:50:12 b.s.d.supervisor [DEBUG] Allocated:
> {"21f86017-fed3-4e94-93f4-7ea65ca983e3" [:timed-out
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231,
> :storm-id "Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1]
> [11 20] [23 32] [1 10] [33 42]}, :port 6703}]}
>
> 2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 21f86017-fed3-4e94-93f4-7ea65ca983e3. Current supervisor time:
> 1395917412. State: :timed-out, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1395917231,
> :storm-id "Sync-1-1395916991", :executors #{[43 43] [21 21] [22 22] [-1 -1]
> [11 20] [23 32] [1 10] [33 42]}, :port 6703}
>
> 2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shutting down
> 01e5ad81-57a4-4933-87e5-c487e969b4b3:21f86017-fed3-4e94-93f4-7ea65ca983e3
>
> 2014-03-27 10:50:12 b.s.d.supervisor [INFO] Removing code for storm id
> Sync-1-1395916991
>
> 2014-03-27 10:50:12 b.s.util [DEBUG] Rmr path
> /home/storm/storm/supervisor/stormdist/Sync-1-1395916991
>
> 2014-03-27 10:50:12 b.s.util [INFO] Error when trying to kill 2160.
> Process is probably already dead.
>
> 2014-03-27 10:50:12 b.s.util [DEBUG] Removing path
> /home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/pids/2160
>
> 2014-03-27 10:50:12 b.s.util [DEBUG] Rmr path
> /home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/heartbeats
>
> 2014-03-27 10:50:12 b.s.util [DEBUG] Removing path
> /home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3/pids
>
> 2014-03-27 10:50:12 b.s.util [DEBUG] Removing path
> /home/storm/storm/workers/21f86017-fed3-4e94-93f4-7ea65ca983e3
>
> 2014-03-27 10:50:12 b.s.d.supervisor [INFO] Shut down
> 01e5ad81-57a4-4933-87e5-c487e969b4b3:21f86017-fed3-4e94-93f4-7ea65ca983e3
>
> 2014-03-27 10:50:12 b.s.util [DEBUG] Making dirs at
> /home/storm/storm/workers/9e27425e-6d2b-48cf-a592-8dfc0204f332/pids
>
> 2014-03-27 10:50:12 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "Sync-1-1395916991", :executors ([33 42] [22 22] [21 21] [1 10] [43 43] [11
> 20] [23 32])} for this supervisor 01e5ad81-57a4-4933-87e5-c487e969b4b3 on
> port 6703 with id 9e27425e-6d2b-48cf-a592-8dfc0204f332
>
> 2014-03-27 10:50:12 b.s.event [ERROR] Error when processing event
>
> java.io.FileNotFoundException: File
> '/home/storm/storm/supervisor/stormdist/Sync-1-1395916991/stormconf.ser'
> does not exist
>
>                 at
> org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:137)
> ~[commons-io-1.4.jar:1.4]
>
>                 at
> org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1135)
> ~[commons-io-1.4.jar:1.4]
>
>                 at
> backtype.storm.config$read_supervisor_storm_conf.invoke(config.clj:177)
> ~[storm-core-0.9.0.1.jar:na]
>
>                 at
> backtype.storm.daemon.supervisor$fn__6328.invoke(supervisor.clj:410)
> ~[storm-core-0.9.0.1.jar:na]
>
>                 at clojure.lang.MultiFn.invoke(MultiFn.java:177)
> ~[clojure-1.4.0.jar:na]
>
>                 at
> backtype.storm.daemon.supervisor$sync_processes$iter__6219__6223$fn__6224.invoke(supervisor.clj:244)
> ~[storm-core-0.9.0.1.jar:na]
>
>                 at clojure.lang.LazySeq.sval(LazySeq.java:42)
> ~[clojure-1.4.0.jar:na]
>
>                 at clojure.lang.LazySeq.seq(LazySeq.java:60)
> ~[clojure-1.4.0.jar:na]
>
>                 at clojure.lang.RT.seq(RT.java:473) ~[clojure-1.4.0.jar:na]
>
>                 at clojure.core$seq.invoke(core.clj:133)
> ~[clojure-1.4.0.jar:na]
>
>                 at clojure.core$dorun.invoke(core.clj:2725)
> ~[clojure-1.4.0.jar:na]
>
>                 at clojure.core$doall.invoke(core.clj:2741)
> ~[clojure-1.4.0.jar:na]
>
>                 at
> backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:232)
> ~[storm-core-0.9.0.1.jar:na]
>
>                 at clojure.lang.AFn.applyToHelper(AFn.java:161)
> [clojure-1.4.0.jar:na]
>
>                 at clojure.lang.AFn.applyTo(AFn.java:151)
> [clojure-1.4.0.jar:na]
>
>                 at clojure.core$apply.invoke(core.clj:603)
> ~[clojure-1.4.0.jar:na]
>
>                 at clojure.core$partial$fn__4070.doInvoke(core.clj:2343)
> ~[clojure-1.4.0.jar:na]
>
>                 at clojure.lang.RestFn.invoke(RestFn.java:397)
> ~[clojure-1.4.0.jar:na]
>
>                 at
> backtype.storm.event$event_manager$fn__3072.invoke(event.clj:24)
> ~[storm-core-0.9.0.1.jar:na]
>
>                 at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
>
>                 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
>
> 2014-03-27 10:50:12 b.s.util [INFO] Halting process: ("Error when
> processing an event")
>
>
>
> Bringing the supervisor up again after this works, and the supervisor
> starts correctly. Any ideas about what may be causing this?
>
>
>
> *From:* Thuy Nguyen [mailto:[email protected]]
> *Sent:* 29 January 2014 02:36
> *To:* [email protected]
> *Subject:* failed to start supervisor with missing stormconf.ser
>
>
>
> Hi all,
>
>
>
> After supervisor was killed, it failed to be restarted because of missing
> file stormconf.ser. The log shows supervisor correctly "Removing code"
> for the topology, but then it tried to "Launching worker with assignment"
> after /storm-local/supervisor/stormdist/ is already cleaned up, thus, it
> got exception about file does not exist. Is it a known bug? Could you
> please provide some insights how this happened? After deleting /storm-local
> directory, supervisor started successfully.
>
>
>
> 2014-01-28 16:51:02 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2014-01-28 16:51:02 o.a.z.ZooKeeper [INFO] Initiating client connection,
> connectString=xxx:2181/storm sessionTimeout=20000
> watcher=com.netflix.curator.ConnectionState@252a78ee
> 2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Opening socket connection to
> server xxx:2181
> 2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Socket connection established
> to xxx:2181, initiating session
> 2014-01-28 16:51:02 o.a.z.ClientCnxn [INFO] Session establishment complete
> on server xxx:2181, sessionid = 0x143b20bc2587b86, negotiated timeout =
> 20000
> 2014-01-28 16:51:02 b.s.d.supervisor [INFO] Starting supervisor with id
> 0aa5860a-248b-4814-b201-7b2f40ce701f at host xxx
> 2014-01-28 16:51:03 b.s.d.supervisor [INFO] Removing code for storm id
> performance_topology-1-1390953444
> 2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 03ccf938-7276-44f7-ab0e-f2cdfb76d394. Current supervisor time:
> 1390956663. State: :timed-out, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390956441,
> :storm-id "performance_topology-1-1390953444", :executors #
>
> {[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [-1 -1] [1 1]}
>
> , :port 6797}
> 2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shutting down
> 0aa5860a-248b-4814-b201-7b2f40ce701f:03ccf938-7276-44f7-ab0e-f2cdfb76d394
> kill 5817: No such process
> 2014-01-28 16:51:03 b.s.util [INFO] Error when trying to kill 5817.
> Process is probably already dead.
> 2014-01-28 16:51:03 b.s.d.supervisor [INFO] Shut down
> 0aa5860a-248b-4814-b201-7b2f40ce701f:03ccf938-7276-44f7-ab0e-f2cdfb76d394
> 2014-01-28 16:51:03 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "performance_topology-1-1390953444", :executors ([6 6] [5 5] [7 7] [8 8] [3
> 3] [4 4] [2 2] [1 1])} for this supervisor
> 0aa5860a-248b-4814-b201-7b2f40ce701f on port 6797 with id
> 7a34f6bf-5494-4ee7-ba54-bbdfbbcbb861
> 2014-01-28 16:51:03 b.s.event [ERROR] Error when processing event
> java.io.FileNotFoundException: File
> 'storm-local/supervisor/stormdist/performance_topology-1-1390953444/stormconf.ser'
> does not exist
>
>
>
> Thanks,
>
> Thuy
>

Reply via email to