OK, checking the worker process logs has turned up a bit more
information. The UI reports 0 emitted from the wordGenerator spout, along
with 0 transferred, acked, and failed.

Sorry - the *.compute.amazonaws.com hostnames are redacted (I don't know
the usual way of doing that). I've got a bunch of other stuff on the boxes,
and while troubleshooting I've been a bit laxer with security than I
should have been!

The UI shows 5 workers, all starting on the same node, each with a
declared uptime of 1s. Looking at the logs on that node, the following
happens during that window:

2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.io.tmpdir=/tmp
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:java.compiler=<NA>
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
os.name=Linux
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:os.arch=amd64
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:os.version=3.13.0-48-generic
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server environment:
user.name=storm
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:user.home=/app/home/storm
2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
environment:user.dir=/
2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
production-topology-2-1428932653 on
bd340179-5905-4233-ad5d-44b47b228177:6702 with id
ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
"/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
"topology.builtin.metrics.bucket.size.secs" 60,
"topology.fall.back.on.java.serialization" true,
"topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
"topology.skip.missing.kryo.registrations" false, "ui.childopts" "-Xmx768m
-Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 20000,
"nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500,
"nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
"java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
"topology.executor.send.buffer.size" 1024, "storm.local.dir" "/app/storm",
"supervisor.worker.start.timeout.secs" 120,
"topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs"
600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64,
"topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
compute.amazonaws.com", "storm.zookeeper.port" 2181,
"transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size"
1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
"/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
"supervisor.enable" true, "storm.zookeeper.servers" ["*.
compute.amazonaws.com" "*.compute.amazonaws.com" "*.compute.amazonaws.com"],
"transactional.zookeeper.root" "/transactional", "topology.acker.executors"
nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
"drpc.queue.size" 128, "worker.childopts" "-Xmx768m
-Djava.net.preferIPv4Stack=true", "supervisor.heartbeat.frequency.secs" 5,
"topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772,
"supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
"topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
"topology.tasks" nil, "topology.spout.wait.strategy"
"backtype.storm.spout.SleepSpoutWaitStrategy", "topology.max.spout.pending"
nil, "storm.zookeeper.retry.interval" 1000, "
topology.sleep.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator"
"backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports"
[6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs"
120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs"
30, "task.refresh.poll.secs" 10, "topology.workers" 1,
"supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
"nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
"worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
"backtype.storm.serialization.types.ListDelegateSerializer",
"topology.disruptor.wait.strategy"
"com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30,
"storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
"backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port"
3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times"
5, "storm.thrift.transport"
"backtype.storm.security.auth.SimpleTransportPlugin",
"topology.state.synchronization.timeout.secs" 60,
"supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs"
600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
"drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port"
8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
"storm.cluster.mode" "distributed", "topology.optimize" true,
"topology.max.task.parallelism" nil}
2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection,
connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181,
*.compute.amazonaws.com:2181 sessionTimeout=20000
watcher=com.netflix.curator.ConnectionState@585bd6a6
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to
server *.compute.amazonaws.com/*:2181
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection established
to *.compute.amazonaws.com/*:2181, initiating session
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment complete
on server *.compute.amazonaws.com/*:2181, sessionid = 0x34cb217be05005e,
negotiated timeout = 20000
2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
:connected:none
2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: 0x34cb217be05005e closed
2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client connection,
connectString=*.compute.amazonaws.com:2181,*.compute.amazonaws.com:2181,
*.compute.amazonaws.com:2181/storm sessionTimeout=20000
watcher=com.netflix.curator.ConnectionState@4b492c47
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket connection to
server *.compute.amazonaws.com/*:2181
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection established
to *.compute.amazonaws.com/*:2181, initiating session
2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment complete
on server *.compute.amazonaws.com/*:2181, sessionid = 0x24cb217be030072,
negotiated timeout = 20000
2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer transport
plugin:backtype.storm.messaging.zmq
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor counter:[2 2]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
wordGenerator:[11 11]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
wordGenerator:[11 11]
2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(11)
2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(11)
2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
wordGenerator:(11)
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
wordGenerator:[12 12]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
["startup"]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
wordGenerator:[12 12]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
wordGenerator:[12 12]
2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(12)
2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(12)
2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
wordGenerator:(12)
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
wordGenerator:[13 13]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
["startup"]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
wordGenerator:[13 13]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
wordGenerator:[13 13]
2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(13)
2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(13)
2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
wordGenerator:(13)
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
wordGenerator:[14 14]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
["startup"]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
wordGenerator:[14 14]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
wordGenerator:[14 14]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
[bertels]
2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(14)
2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(14)
2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
wordGenerator:(14)
2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
source: wordGenerator:11, stream: default, id: {}, [bertels]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default [mike]
2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
source: wordGenerator:12, stream: default, id: {}, [mike]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
[bertels]
2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
source: wordGenerator:13, stream: default, id: {}, [bertels]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
wordGenerator:[15 15]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator __system
["startup"]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
wordGenerator:[15 15]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
wordGenerator:[15 15]
2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout wordGenerator:(15)
2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout wordGenerator:(15)
2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
wordGenerator:(15)
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __system:[-1 -1]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
["startup"]
2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
__system:[-1 -1]
2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading executor
__system:[-1 -1]
2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt __system:(-1)
2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor __acker:[1 1]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default
[jackson]
2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator default [mike]
2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received message
source: wordGenerator:11, stream: default, id: {}, [jackson]
2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
source: wordGenerator:14, stream: default, id: {}, [mike]
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[jackson]
2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
source: wordGenerator:12, stream: default, id: {}, [jackson]
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[golda]
2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
source: wordGenerator:13, stream: default, id: {}, [golda]
2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __system:(-1)
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system ["startup"]
2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks __acker:[1
1]
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[jackson]
2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading executor
__acker:[1 1]
2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread for
bd340179-5905-4233-ad5d-44b47b228177:6702
2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt __acker:(1)
2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[nathan]
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[bertels]
2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator default
[golda]
2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
source: wordGenerator:11, stream: default, id: {}, [golda]
2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received message
source: wordGenerator:12, stream: default, id: {}, [nathan]

After that, the logs stop (no further messages) and the UI reports that
the 5 executors are now running on a different node (and port) - at which
point a worker log is created on that node, under that port number,
containing the same sequence of messages.

So the worker seems to start and begins emitting words, but the UI never
registers anything as having been emitted (though with
topology.stats.sample.rate at 0.05 and the workers surviving only a second
or so, the counters may simply never catch a sample). After a short period
the worker appears to die without logging anything, and a new worker is
started. The "not alive" messages in the nimbus log below suggest nimbus
isn't seeing heartbeats, so my best guess is that the workers' heartbeats
aren't making it back (the relevant timeouts, supervisor.worker.timeout.secs
and nimbus.task.timeout.secs, are both 30 in the conf dump above).
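
If that's the case, one crude test - a sketch only, with values that are
guesses rather than recommendations - would be to lengthen the relevant
timeouts in storm.yaml on every node and see whether the workers survive
any longer:

# storm.yaml - experimental values, assuming lost/slow worker heartbeats
# (defaults taken from the conf dump above)
supervisor.worker.timeout.secs: 120   # default 30
nimbus.task.timeout.secs: 120         # default 30
nimbus.task.launch.secs: 240          # default 120
#
# If hostname resolution is off (a common AWS gotcha, with nodes
# advertising names their peers can't resolve), pinning each node's name
# may help. storm.local.hostname is a standard Storm config key; the value
# here is just a placeholder:
# storm.local.hostname: <this node's resolvable hostname>

If the workers still die and get reassigned at the same rate, that would
point away from the timeouts and towards the messaging layer or hostname
resolution instead.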

Thanks,

Matt



On 13 April 2015 at 15:36, Nathan Leung <[email protected]> wrote:

> Did you check the worker process logs?  What are the emit counts for the
> components in the UI?
>
> Also I have not run on AWS personally but the following line seems strange
> to me:
>
> :node->host {"bd340179-5905-4233-ad5d-44b47b228177"
> "*.compute.amazonaws.com"}
>
> Generally this would be bound to a specific machine, not *.<domain>.
>
> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <[email protected]
> > wrote:
>
>> Hi,
>>
>> I've set up a small Storm cluster on AWS that is giving me headaches. At
>> the moment I've got a situation where the cluster (a nimbus node and 3
>> supervisors) will successfully run in local mode my particular topology.
>> However, as soon as I try to run the topology in distributed mode there are
>> no errors (that I can see) in any of the logs - the topology appears in the
>> UI but never shows any tuples emitted by the spouts.
>>
>> In order to check it wasn't my topology I cloned the storm-starter
>> project and have tried with the RollingTopWords example - running locally I
>> get a stream of output but as soon as I specify that we should run in a
>> distributed fashion I get a success message (topology successfully
>> submitted). The topology appears on the UI as successfully submitted but I
>> see no progress.
>>
>> It's a bit of a weird one as I don't know which logs to share to show the
>> problem as the whole point is I'm not seeing any errors!
>>
>> The last entry in the supervisor logs pertains to the set up of the
>> supervisor (i.e. it's not logged anything about the topologies that have
>> been submitted)
>>
>> The last entries in the nimbus logs are probably a bit more illuminating:
>>
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>> production-topology-2-1428932653:[2 2] not alive
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>> production-topology-2-1428932653:[3 3] not alive
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>> production-topology-2-1428932653:[4 4] not alive
>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>> production-topology-2-1428932653 to 1 slots
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3]
>> [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>> [15 15] [1 1]]
>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment for
>> topology id production-topology-2-1428932653:
>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", :node->host
>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 6702],
>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, :executor->start-time-secs
>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 1428934165, [6
>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 1428934165, [10
>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 1428934165,
>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>
>> If anybody can help that'd be greatly appreciated!
>>
>> Thanks,
>>
>> Matt
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>


-- 
Check out my blog: dogdogfish.com
