Sounds like it's having problems with heartbeats. As I recall, worker -> supervisor heartbeats are written via the local filesystem. What user are you running as? In your storm.yaml file you will see the option "storm.local.dir". Does the user the supervisor/workers are running as have permission to write to that folder?
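For reference, a quick way to check this on one of the supervisor boxes (using the /app/storm path that appears as storm.local.dir in your config dump; the 'storm' user below is only an assumption based on the user.name=storm line in your logs, so substitute whatever your daemons actually run as):

    # which user the supervisor and worker JVMs are running as
    ps -eo user,args | grep backtype.storm.daemon | grep -v grep

    # ownership/permissions on the local dir and its subdirectories
    # (workers/ is where local worker heartbeat state is typically kept in 0.9.x)
    ls -ld /app/storm /app/storm/supervisor /app/storm/workers

    # if ownership is wrong, something like this usually sorts it out
    sudo chown -R storm:storm /app/storm

If the workers can't write their heartbeats there, the supervisor never sees them, marks them :timed-out, and restarts them - which would match the loop in your supervisor log.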
On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <[email protected]> wrote: > New bonus bit of information - the supervisor log has got this going on: > > 2015-04-13 15:13:48 b.s.d.supervisor [INFO] > 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started > 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and clearing > state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current supervisor time: > 1428938060. State: :timed-out, Heartbeat: > #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029, > :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4 > 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] > [15 15] [-1 -1] [1 1]}, :port 6703} > 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down > bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231 > 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284. > Process is probably already dead. > 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down > bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231 > 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with > assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id > "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6 > 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1 > 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703 > with id 786a62ac-2868-4526-a7b2-5c26234bb23e > 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with command: > java -server -Xmx768m -Djava.net.preferIPv4Stack=true > -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib > -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21 > -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml > -Dstorm.id=production-topology-2-1428932653 > -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp > 
/usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar > backtype.storm.daemon.worker production-topology-2-1428932653 > bd340179-5905-4233-ad5d-44b47b228177 6703 > 786a62ac-2868-4526-a7b2-5c26234bb23e > 2015-04-13 15:14:20 b.s.d.supervisor [INFO] > 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started > > On 13 April 2015 at 16:09, Matthew Sharpe <[email protected]> > wrote: > >> OK, in checking the worker process logs I've uncovered a bit more >> information. 
The UI reports 0 emitted from the wordGenerator spout, along >> with 0 transferred, acked and failed. >> >> Sorry - the *.compute.amazonaws.com is redacted (don't know the usual >> way of doing that), I've got a bunch of other stuff on the boxes and in >> troubleshooting things out have been a bit laxer with security than I >> should have been! >> >> The UI shows 5 workers all starting on the same node - each of which has >> a declared uptime of 1s. Looking at the logs on the node, during that time >> the following happens: >> >> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib >> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >> environment:java.io.tmpdir=/tmp >> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >> environment:java.compiler=<NA> >> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment: >> os.name=Linux >> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >> environment:os.arch=amd64 >> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >> environment:os.version=3.13.0-48-generic >> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment: >> user.name=storm >> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >> environment:user.home=/app/home/storm >> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >> environment:user.dir=/ >> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for >> production-topology-2-1428932653 on >> bd340179-5905-4233-ad5d-44b47b228177:6702 with id >> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path" >> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, >> "topology.builtin.metrics.bucket.size.secs" 60, >> "topology.fall.back.on.java.serialization" true, >> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, >> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m >> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000, >> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500, >> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", >> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", >> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm", >> "supervisor.worker.start.timeout.secs" 120, >> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs" >> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, >> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*. >> compute.amazonaws.com", "storm.zookeeper.port" 2181, >> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" >> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" >> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, >> "supervisor.enable" true, "storm.zookeeper.servers" ["*. 
>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.compute.amazonaws.com"], >> "transactional.zookeeper.root" "/transactional", "topology.acker.executors" >> nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil, >> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m >> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5, >> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, >> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m", >> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, >> "topology.tasks" nil, "topology.spout.wait.strategy" >> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending" >> nil, "storm.zookeeper.retry.interval" 1000, " >> topology.sleep.spout.wait.strategy.time.ms" 1, >> "nimbus.topology.validator" >> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" >> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs" >> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" >> 30, "task.refresh.poll.secs" 10, "topology.workers" 1, >> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true", >> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, >> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" >> "backtype.storm.serialization.types.ListDelegateSerializer", >> "topology.disruptor.wait.strategy" >> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30, >> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" >> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port" >> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" >> 5, "storm.thrift.transport" >> "backtype.storm.security.auth.SimpleTransportPlugin", >> "topology.state.synchronization.timeout.secs" 60, >> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" >> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq", >> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" >> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true", >> "storm.cluster.mode" "distributed", "topology.optimize" true, >> "topology.max.task.parallelism" nil} >> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting >> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection, >> connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181 >> ,*.compute.amazonaws.com:2181 sessionTimeout=20000 >> watcher=com.netflix.curator.ConnectionState@585bd6a6 >> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to >> server *.compute.amazonaws.com/*:2181 >> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection established >> to *.compute.amazonaws.com/*:2181, initiating session >> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment >> complete on server *.compute.amazonaws.com/*:2181, sessionid = >> 0x34cb217be05005e, negotiated timeout = 20000 >> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update: >> :connected:none >> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e >> closed >> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting >> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection, >> connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181 >> ,*.compute.amazonaws.com:2181/storm sessionTimeout=20000 >> 
watcher=com.netflix.curator.ConnectionState@4b492c47 >> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down >> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to >> server *.compute.amazonaws.com/*:2181 >> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection established >> to *.compute.amazonaws.com/*:2181, initiating session >> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment >> complete on server *.compute.amazonaws.com/*:2181, sessionid = >> 0x24cb217be030072, negotiated timeout = 20000 >> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer transport >> plugin:backtype.storm.messaging.zmq >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2 2] >> >> >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >> wordGenerator:[11 11] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor >> wordGenerator:[11 11] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(11) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(11) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >> wordGenerator:(11) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >> wordGenerator:[12 12] >> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system >> ["startup"] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >> wordGenerator:[12 12] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor >> wordGenerator:[12 12] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(12) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(12) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >> wordGenerator:(12) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >> wordGenerator:[13 13] >> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system >> ["startup"] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >> wordGenerator:[13 13] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor >> wordGenerator:[13 13] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(13) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(13) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >> wordGenerator:(13) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >> wordGenerator:[14 14] >> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system >> ["startup"] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >> wordGenerator:[14 14] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor >> wordGenerator:[14 14] >> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default >> [bertels] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(14) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(14) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >> wordGenerator:(14) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message >> source: wordGenerator:11, stream: default, id: {}, [bertels] >> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default >> [mike] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message >> source: wordGenerator:12, stream: default, id: {}, [mike] >> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default >> [bertels] >> 2015-04-13 14:58:51 b.s.d.executor 
[INFO] Processing received message >> source: wordGenerator:13, stream: default, id: {}, [bertels] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >> wordGenerator:[15 15] >> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system >> ["startup"] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >> wordGenerator:[15 15] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor >> wordGenerator:[15 15] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(15) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(15) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >> wordGenerator:(15) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __system:[-1 >> -1] >> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system >> ["startup"] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >> __system:[-1 -1] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor >> __system:[-1 -1] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1) >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1 1] >> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default >> [jackson] >> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default >> [mike] >> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message >> source: wordGenerator:11, stream: default, id: {}, [jackson] >> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message >> source: wordGenerator:14, stream: default, id: {}, [mike] >> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default >> [jackson] >> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message >> source: wordGenerator:12, stream: default, id: {}, [jackson] >> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default >> [golda] >> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message >> source: wordGenerator:13, stream: default, id: {}, [golda] >> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1) >> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system >> ["startup"] >> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks >> __acker:[1 1] >> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default >> [jackson] >> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor >> __acker:[1 1] >> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for >> bd340179-5905-4233-ad5d-44b47b228177:6702 >> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1) >> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1) >> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default >> [nathan] >> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default >> [bertels] >> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default >> [golda] >> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message >> source: wordGenerator:11, stream: default, id: {}, [golda] >> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message >> source: wordGenerator:12, stream: default, id: {}, [nathan] >> >> After that, the logs stop (no more messages) and the UI says that the 5 >> executors are now running on another node (and port) - at which point the >> worker log is created on that node and with that number (same messages). 
>> >> So the worker seems to start and start emitting words but the UI isn't >> registered anything as having been emitted. After a short period of time >> the worker seems to stop without logging anything and so a new worker is >> started. >> >> Thanks, >> >> Matt >> >> >> >> On 13 April 2015 at 15:36, Nathan Leung <[email protected]> wrote: >> >>> Did you check the worker process logs? What are the emit counts for the >>> components in the UI? >>> >>> Also I have not run on AWS personally but the following line seems >>> strange to me: >>> >>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*. >>> compute.amazonaws.com"} >>> >>> Generally this would be bound to a specific machine, not *.<domain>. >>> >>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe < >>> [email protected]> wrote: >>> >>>> Hi, >>>> >>>> I've set up a small Storm cluster on AWS that is giving me headaches. >>>> At the moment I've got a situation where the cluster (a nimbus node and 3 >>>> supervisors) will successfully run in local mode my particular topology. >>>> However, as soon as I try to run the topology in distributed mode there are >>>> no errors (that I can see) in any of the logs - the topology appears in the >>>> UI but never shows any tuples emitted by the spouts. >>>> >>>> In order to check it wasn't my topology I cloned the storm-starter >>>> project and have tried with the RollingTopWords example - running locally I >>>> get a stream of output but as soon as I specify that we should run in a >>>> distributed fashion I get a success message (topology successfully >>>> submitted). The topology appears on the UI as successfully submitted but I >>>> see no progress. >>>> >>>> It's a bit of a weird one as I don't know which logs to share to show >>>> the problem as the whole point is I'm not seeing any errors! >>>> >>>> The last entry in the supervisor logs pertains to the set up of the >>>> supervisor (i.e. 
it's not logged anything about the topologies that have >>>> been submitted) >>>> >>>> The last entries in the nimbus logs are probably a bit more >>>> illuminating: >>>> >>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>> production-topology-2-1428932653:[2 2] not alive >>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>> production-topology-2-1428932653:[3 3] not alive >>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>> production-topology-2-1428932653:[4 4] not alive >>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots: >>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700] >>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703] >>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702] >>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701] >>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700] >>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703] >>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702] >>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701] >>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700]) >>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning >>>> production-topology-2-1428932653 to 1 slots >>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 >>>> 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 >>>> 14] [15 15] [1 1]] >>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for >>>> topology id production-topology-2-1428932653: >>>> #backtype.storm.daemon.common.Assignment{:master-code-dir >>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host >>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"}, >>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], >>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1] >>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs >>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6 >>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10 >>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165, >>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}} >>>> >>>> If anybody can help that'd be greatly appreciated! >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> -- >>>> Check out my blog: dogdogfish.com >>>> >>> >>> >> >> >> -- >> Check out my blog: dogdogfish.com >> > > > > -- > Check out my blog: dogdogfish.com >
