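This could be due to your storm.local.dir getting corrupted: the
EOFException in your log is thrown while the supervisor deserializes its
LocalState from disk. You can stop the daemons on the affected node,
delete the contents of this directory, and restart the Storm cluster
(nimbus, supervisor).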
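
A minimal sketch of the cleanup step, in Python. It assumes the Storm
daemons on the node are already stopped and that you pass in the
storm.local.dir value from your own storm.yaml (the thread does not show
it). The function name clear_local_dir is only illustrative and is not
part of Storm; the demo runs against a throwaway temp directory.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def clear_local_dir(local_dir: str) -> List[str]:
    """Delete everything inside local_dir but keep the directory itself.

    Returns the names of the removed top-level entries, sorted.
    """
    root = Path(local_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{local_dir} is not a directory")
    removed = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real subdirectory: remove recursively
        else:
            entry.unlink()         # regular file or symlink
        removed.append(entry.name)
    return removed


# Demo on a temp directory with made-up entry names (not Storm's layout).
demo_root = Path(tempfile.mkdtemp())
(demo_root / "subdir").mkdir()
(demo_root / "subdir" / "nested-file").write_bytes(b"")
(demo_root / "stale-file").write_bytes(b"")  # empty file, like a truncated state file
removed = clear_local_dir(str(demo_root))
```

On a real node you would call clear_local_dir with your configured
storm.local.dir and then start nimbus/supervisor again.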
On Wed, Nov 26, 2014, at 01:51 AM, Dimitris Samaras wrote:
> Hi all,
>
> @Harsha, by :
>
> "Everything works fine up with topologies etc, to the point that the
> Storm cluster needs to be restarted. In that case for storm.sh
> (nimbus, super ,ui) to run successfully on a node Storm has to be
> redeployed on that node and reconfigured(storm.yaml)."
>
> I mean that I can deploy a fully functional cluster and run/test the
> topologies properly; everything is fine at runtime. But if a node gets
> restarted (it runs in a VM), e.g. because the host PC restarts, and I
> then execute "storm supervisor" on a supervisor node to bring it back
> up, it does not start!
>
> @Samit, the supervisor.log is:
>
> 2014-11-26 11:26:16 b.s.d.supervisor [INFO] Starting supervisor with
> id ea561988-508d-4593-9873-00f15736a6bf at host Ubuntu14super1
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:host.name=Ubuntu14super1
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.version=1.7.0_72
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.vendor=Oracle Corporation
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.class.path=/usr/local/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm/lib/logback-classic-1.0.6.jar:/usr/local/storm/lib/chill-java-0.3.5.jar:/usr/local/storm/lib/compojure-1.1.3.jar:/usr/local/sto$
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.io.tmpdir=/tmp
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:java.compiler=<NA>
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:os.name=Linux
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:os.arch=amd64
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:os.version=3.13.0-40-generic
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:user.name=dimsam
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:user.home=/home/dimsam
> 2014-11-26 11:35:33 o.a.z.ZooKeeper [INFO] Client environment:user.dir=/usr/local/storm/bin
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:host.name=Ubuntu14super1
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.version=1.7.0_72
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.vendor=Oracle Corporation
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.home=/usr/lib/jvm/java-7-oracle/jre
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.class.path=/usr/local/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm/lib/logback-classic-1.0.6.jar:/usr/local/storm/lib/chill-java-0.3.5.jar:/usr/local/storm/lib/compojure-1.1.3.jar:/usr/l$
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.io.tmpdir=/tmp
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.compiler=<NA>
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.name=Linux
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.arch=amd64
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.version=3.13.0-40-generic
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.name=dimsam
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.home=/home/dimsam
> 2014-11-26 11:35:33 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.dir=/usr/local/storm/bin
> 2014-11-26 11:35:33 b.s.d.supervisor [INFO] Starting Supervisor with conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization" true, "topology.ma$
> 2014-11-26 11:35:34 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2014-11-26 11:35:34 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=195.251.117.209:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@4dddb4e
> 2014-11-26 11:35:34 o.a.z.ClientCnxn [INFO] Opening socket connection to server themis.iti.gr/195.251.117.209:2181. Will not attempt to authenticate using SASL (unknown error)
> 2014-11-26 11:35:34 o.a.z.ClientCnxn [INFO] Socket connection established to themis.iti.gr/195.251.117.209:2181, initiating session
> 2014-11-26 11:35:34 o.a.z.ClientCnxn [INFO] Session establishment complete on server themis.iti.gr/195.251.117.209:2181, sessionid = 0x149eb6ae8d10006, negotiated timeout = 20000
> 2014-11-26 11:35:34 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
> 2014-11-26 11:35:34 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
> 2014-11-26 11:35:34 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
> 2014-11-26 11:35:35 o.a.z.ClientCnxn [INFO] EventThread shut down
> 2014-11-26 11:35:35 o.a.z.ZooKeeper [INFO] Session: 0x149eb6ae8d10006 closed
> 2014-11-26 11:35:35 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2014-11-26 11:35:35 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=195.251.117.209:2181/storm sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@4e451d76
> 2014-11-26 11:35:35 o.a.z.ClientCnxn [INFO] Opening socket connection to server themis.iti.gr/195.251.117.209:2181. Will not attempt to authenticate using SASL (unknown error)
> 2014-11-26 11:35:35 o.a.z.ClientCnxn [INFO] Socket connection established to themis.iti.gr/195.251.117.209:2181, initiating session
> 2014-11-26 11:35:35 o.a.z.ClientCnxn [INFO] Session establishment complete on server themis.iti.gr/195.251.117.209:2181, sessionid = 0x149eb6ae8d10007, negotiated timeout = 20000
> 2014-11-26 11:35:35 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
> 2014-11-26 11:35:35 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
> 2014-11-26 11:35:35 b.s.d.supervisor [INFO] Starting supervisor with id ea561988-508d-4593-9873-00f15736a6bf at host Ubuntu14super1
> 2014-11-26 11:35:36 b.s.event [ERROR] Error when processing event
> java.lang.RuntimeException: java.io.EOFException
>     at backtype.storm.utils.Utils.deserialize(Utils.java:93) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:207) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>     at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
>     at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
>     at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
>     at backtype.storm.event$event_manager$fn__2378.invoke(event.clj:39) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
> Caused by: java.io.EOFException: null
>     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325) ~[na:1.7.0_72]
>     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794) ~[na:1.7.0_72]
>     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801) ~[na:1.7.0_72]
>     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.7.0_72]
>     at backtype.storm.utils.Utils.deserialize(Utils.java:88) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     ... 11 common frames omitted
> 2014-11-26 11:35:36 b.s.event [ERROR] Error when processing event
> java.lang.RuntimeException: java.io.EOFException
>     at backtype.storm.utils.Utils.deserialize(Utils.java:93) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     at backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this__6330.invoke(supervisor.clj:307) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     at backtype.storm.event$event_manager$fn__2378.invoke(event.clj:39) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
> Caused by: java.io.EOFException: null
>     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325) ~[na:1.7.0_72]
>     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794) ~[na:1.7.0_72]
>     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801) ~[na:1.7.0_72]
>     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.7.0_72]
>     at backtype.storm.utils.Utils.deserialize(Utils.java:88) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
>     ... 6 common frames omitted
> 2014-11-26 11:35:36 b.s.util [INFO] Halting process: ("Error when processing an event")
>
>
> The first line is from when the Storm supervisor was running properly!
> After a node restart the supervisor will not start, and I get the rest
> of the log.
>
>
> By "to run successfully on a node, Storm has to be redeployed on that
> node and reconfigured(storm.yaml)" I mean that in order to run the
> supervisor/nimbus again I have to redeploy Storm on every node that
> fails to start! I do not change the config in storm.yaml; I simply
> have to rewrite it with the same values.
>
>
> Thanks again!
>
> 2014-11-25 17:53 GMT+02:00 Harsha <[email protected]>:
>>
>> Dimitris, can you give more details on this: "Everything works fine
>> up with topologies etc, to the point that the Storm cluster needs to
>> be restarted. In that case for storm.sh (nimbus, super ,ui) to run
>> successfully on a node Storm has to be redeployed on that node and
>> reconfigured(storm.yaml)."
>>
>>
>> Is the cluster going down when you deploy a topology? And regarding
>> "to run successfully on a node Storm has to be redeployed on that
>> node and reconfigured(storm.yaml)":
>>
>> what do you mean by reconfiguration? Do you change the storm.yaml
>> values from the previous deployment?
>>
>> -Harsha
>>
>>
>> On Tue, Nov 25, 2014, at 06:24 AM, Samit Sasan wrote:
>>> Can you share the logs?
>>>
>>> -Samit
>>>
>>> On Tue, Nov 25, 2014 at 6:12 PM, Dimitris Samaras
>>> <[email protected]> wrote:
>>>> Hi all,
>>>>
>>>> We are currently testing the Storm framework with 4 VM nodes (1
>>>> nimbus, 3 supervisors) and a single-node ZooKeeper cluster for the
>>>> Storm cluster management. Everything works fine up with topologies etc,
>>>> to the point that the Storm cluster needs to be restarted. In that
>>>> case for storm.sh (nimbus, super ,ui) to run successfully on a node
>>>> Storm has to be redeployed on that node and
>>>> reconfigured(storm.yaml).
>>>>
>>>> Any thoughts? Thanks in advance, Dimitris
>>>
>>
>