It only matters what user the supervisor is run as, so you should be ok.  How
long is it between the worker being started and it getting killed?  What
are the timestamps of the heartbeats in the corresponding worker folder?
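
If it helps, a quick listing of the heartbeat files and their modification
times is usually enough to see whether the worker is still heartbeating.
Something like this rough sketch would do it (the /app/storm path is the
storm.local.dir you mentioned; pass the worker id from the supervisor log):

    import java.io.File;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    // Usage: java HeartbeatTimes <worker-id>
    // Assumes storm.local.dir=/app/storm, as in your setup.
    public class HeartbeatTimes {
        public static void main(String[] args) {
            if (args.length != 1) {
                System.out.println("usage: java HeartbeatTimes <worker-id>");
                return;
            }
            File dir = new File("/app/storm/workers/" + args[0] + "/heartbeats");
            File[] files = dir.listFiles();
            if (files == null) {
                System.out.println("no heartbeat folder at " + dir);
                return;
            }
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            for (File f : files) {
                // print the last-modified time of each heartbeat file
                System.out.println(fmt.format(new Date(f.lastModified())) + "  " + f.getName());
            }
        }
    }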

On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <[email protected]>
wrote:

> I'm running storm as a (specially created) storm user - on both the nimbus
> and the supervisors the process is showing as being run by the storm user.
> And I've got the storm local directory set as /app/storm - owned by storm
> user and with permissions 700. I thought it might be due to me submitting the
> job as a different user, but I've just terminated the old job and submitted
> the new one as the storm user, and I'm still not getting any tuples coming
> through.
>
> For what it's worth, I've looked in the local data directory on the
> supervisors and in the workers folder there is a (non-empty) heartbeats
> folder being created - don't know if that's indicative of heartbeats being
> processed correctly or if that's created at the start of a worker process.
>
> Is there anything else you want to see that would help you diagnose this?
> I can share the UI with you if you think that'd help?
>
> Thanks!
>
>
> On 13 April 2015 at 16:22, Nathan Leung <[email protected]> wrote:
>
>> Sounds like it's having problems with heartbeats.  As I recall worker ->
>> supervisor heartbeats are written using the local filesystem.  What user
>> are you running as?  In your storm.yaml file, you will see the option
>> "storm.local.dir".  Does the user the supervisor / workers are running as
>> have permission to write to this folder?
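>>
>> For example, a quick check along these lines (just a sketch - pass in the
>> storm.local.dir value from your storm.yaml), run as the same user the
>> supervisor and workers run as, should answer that:
>>
>>     import java.io.File;
>>     import java.io.IOException;
>>
>>     // Usage: java LocalDirCheck <storm.local.dir>
>>     public class LocalDirCheck {
>>         public static void main(String[] args) {
>>             File dir = new File(args[0]);
>>             System.out.println("running as: " + System.getProperty("user.name"));
>>             System.out.println("can write " + dir + ": " + dir.canWrite());
>>             try {
>>                 // try to create a file, the way a worker writing heartbeats would
>>                 File probe = File.createTempFile("hb-probe", ".tmp", dir);
>>                 System.out.println("created " + probe + " ok");
>>                 probe.delete();
>>             } catch (IOException e) {
>>                 System.out.println("could not create a file in " + dir + ": " + e);
>>             }
>>         }
>>     }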
>>
>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>> [email protected]> wrote:
>>
>>> New bonus bit of information - the supervisor log has got this going on:
>>>
>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and clearing
>>> state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current supervisor time:
>>> 1428938060. State: :timed-out, Heartbeat:
>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] [4
>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
>>> Process is probably already dead.
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 5] [6
>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 6703
>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>> -Dlogfile.name=worker-6703.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>> -Dstorm.id=production-topology-2-1428932653
>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
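>>>
>>> If I'm reading that right, the heartbeat in the shutdown message
>>> (:time-secs 1428938029) is 31 seconds older than the supervisor clock
>>> (1428938060 - 1428938029 = 31), which is just over the
>>> supervisor.worker.timeout.secs of 30 from the config in my earlier mail -
>>> so the supervisor marks the worker :timed-out and relaunches it.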
>>>
>>> On 13 April 2015 at 16:09, Matthew Sharpe <[email protected]>
>>> wrote:
>>>
>>>> OK, in checking the worker process logs I've uncovered a bit more
>>>> information. The UI reports 0 emitted from the wordGenerator spout, along
>>>> with 0 transferred, acked and failed.
>>>>
>>>> Sorry - the *.compute.amazonaws.com is redacted (I don't know the usual
>>>> way of doing that). I've got a bunch of other stuff on the boxes, and in
>>>> troubleshooting I've been a bit laxer with security than I should have
>>>> been!
>>>>
>>>> The UI shows 5 workers all starting on the same node - each of which
>>>> has a declared uptime of 1s. Looking at the logs on the node, during that
>>>> time the following happens:
>>>>
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:java.io.tmpdir=/tmp
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:java.compiler=<NA>
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>>> os.name=Linux
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:os.arch=amd64
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:os.version=3.13.0-48-generic
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
>>>> user.name=storm
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:user.home=/app/home/storm
>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>> environment:user.dir=/
>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>> production-topology-2-1428932653 on
>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>> "topology.fall.back.on.java.serialization" true,
>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
>>>> "supervisor.worker.start.timeout.secs" 120,
>>>> "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>> "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>> "/transactional", "topology.acker.executors" nil,
>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>> -Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>> "nimbus.topology.validator"
>>>> "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
>>>> [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
>>>> 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>> "topology.disruptor.wait.strategy"
>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>> "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
>>>> 5, "storm.thrift.transport"
>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>> "topology.state.synchronization.timeout.secs" 60,
>>>> "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>> "topology.max.task.parallelism" nil}
>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection
>>>> to server *.compute.amazonaws.com/*:2181
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>> 0x34cb217be05005e, negotiated timeout = 20000
>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>> :connected:none
>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e
>>>> closed
>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>> sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@4b492c47
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection
>>>> to server *.compute.amazonaws.com/*:2181
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>> 0x24cb217be030072, negotiated timeout = 20000
>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer transport
>>>> plugin:backtype.storm.messaging.zmq
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2 2]
>>>>
>>>>
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> wordGenerator:[11 11]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> wordGenerator:[11 11]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>> wordGenerator:(11)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>> wordGenerator:(11)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>> wordGenerator:(11)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>> wordGenerator:[12 12]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> wordGenerator:[12 12]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> wordGenerator:[12 12]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>> wordGenerator:(12)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>> wordGenerator:(12)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>> wordGenerator:(12)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>> wordGenerator:[13 13]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> wordGenerator:[13 13]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> wordGenerator:[13 13]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>> wordGenerator:(13)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>> wordGenerator:(13)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>> wordGenerator:(13)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>> wordGenerator:[14 14]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> wordGenerator:[14 14]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> wordGenerator:[14 14]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [bertels]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>> wordGenerator:(14)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>> wordGenerator:(14)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>> wordGenerator:(14)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [mike]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:12, stream: default, id: {}, [mike]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [bertels]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>> wordGenerator:[15 15]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> wordGenerator:[15 15]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> wordGenerator:[15 15]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>> wordGenerator:(15)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>> wordGenerator:(15)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>> wordGenerator:(15)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __system:[-1
>>>> -1]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>> __system:[-1 -1]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
>>>> __system:[-1 -1]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1)
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1 1]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [jackson]
>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [mike]
>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:14, stream: default, id: {}, [mike]
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [jackson]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [golda]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:13, stream: default, id: {}, [golda]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>> ["startup"]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>> __acker:[1 1]
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [jackson]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
>>>> __acker:[1 1]
>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
>>>> bd340179-5905-4233-ad5d-44b47b228177:6702
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [nathan]
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [bertels]
>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
>>>> [golda]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:11, stream: default, id: {}, [golda]
>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
>>>> source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>
>>>> After that, the logs stop (no more messages) and the UI says that the 5
>>>> executors are now running on another node (and port) - at which point the
>>>> worker log is created on that node with that port number (same messages).
>>>>
>>>> So the worker seems to start and begin emitting words, but the UI isn't
>>>> registering anything as having been emitted. After a short period of time
>>>> the worker seems to stop without logging anything, and so a new worker is
>>>> started.
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>>
>>>>
>>>> On 13 April 2015 at 15:36, Nathan Leung <[email protected]> wrote:
>>>>
>>>>> Did you check the worker process logs?  What are the emit counts for
>>>>> the components in the UI?
>>>>>
>>>>> Also I have not run on AWS personally but the following line seems
>>>>> strange to me:
>>>>>
>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>> compute.amazonaws.com"}
>>>>>
>>>>> Generally this would be bound to a specific machine, not *.<domain>.
>>>>>
>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've set up a small Storm cluster on AWS that is giving me headaches.
>>>>>> At the moment I've got a situation where the cluster (a nimbus node and 3
>>>>>> supervisors) will successfully run my particular topology in local mode.
>>>>>> However, as soon as I try to run the topology in distributed mode there are
>>>>>> no errors (that I can see) in any of the logs - the topology appears in the
>>>>>> UI but never shows any tuples emitted by the spouts.
>>>>>>
>>>>>> In order to check it wasn't my topology I cloned the storm-starter
>>>>>> project and have tried with the RollingTopWords example - running locally I
>>>>>> get a stream of output but as soon as I specify that we should run in a
>>>>>> distributed fashion I get a success message (topology successfully
>>>>>> submitted). The topology appears on the UI as successfully submitted but I
>>>>>> see no progress.
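>>>>>>
>>>>>> For reference, the local/distributed switch in these examples boils down
>>>>>> to something like this (a sketch, not the actual storm-starter source -
>>>>>> the exact class, spout and bolt names may differ): with an argument it
>>>>>> submits to the cluster, without one it runs in a LocalCluster.
>>>>>>
>>>>>>     import backtype.storm.Config;
>>>>>>     import backtype.storm.LocalCluster;
>>>>>>     import backtype.storm.StormSubmitter;
>>>>>>     import backtype.storm.testing.TestWordSpout;
>>>>>>     import backtype.storm.topology.TopologyBuilder;
>>>>>>
>>>>>>     public class RollingTopWords {
>>>>>>         public static void main(String[] args) throws Exception {
>>>>>>             TopologyBuilder builder = new TopologyBuilder();
>>>>>>             builder.setSpout("wordGenerator", new TestWordSpout(), 5);
>>>>>>             // ... rolling-count and ranking bolts omitted for brevity ...
>>>>>>             Config conf = new Config();
>>>>>>
>>>>>>             if (args != null && args.length > 0) {
>>>>>>                 // distributed mode: submits fine, but I never see tuples in the UI
>>>>>>                 conf.setNumWorkers(1);
>>>>>>                 StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
>>>>>>             } else {
>>>>>>                 // local mode: this path prints a stream of words as expected
>>>>>>                 LocalCluster cluster = new LocalCluster();
>>>>>>                 cluster.submitTopology("rolling-top-words", conf, builder.createTopology());
>>>>>>                 Thread.sleep(60000);
>>>>>>                 cluster.shutdown();
>>>>>>             }
>>>>>>         }
>>>>>>     }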
>>>>>>
>>>>>> It's a bit of a weird one as I don't know which logs to share to show
>>>>>> the problem - the whole point is I'm not seeing any errors!
>>>>>>
>>>>>> The last entry in the supervisor logs pertains to the setup of the
>>>>>> supervisor (i.e. it hasn't logged anything about the topologies that have
>>>>>> been submitted).
>>>>>>
>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>> illuminating:
>>>>>>
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3
>>>>>> 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] 
>>>>>> [14
>>>>>> 14] [15 15] [1 1]]
>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>>>>>> topology id production-topology-2-1428932653:
>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", 
>>>>>> :node->host
>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 
>>>>>> 6702],
>>>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, 
>>>>>> :executor->start-time-secs
>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, 
>>>>>> [6
>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 
>>>>>> 1428934165,
>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>
>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>>
>>> --
>>> Check out my blog: dogdogfish.com
>>>
>>
>>
>
>
> --
> Check out my blog: dogdogfish.com
>
