I can only assume that I'm using whatever the default is - it's not a
setting that I've changed anywhere or set on any command line. Am I right
in thinking (hoping) that the number of workers defaults to one? If not,
how do I set it for the RollingTopWords example?
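
For what it's worth, the conf dump in the worker log further down this thread
does show "topology.workers" 1, so hopefully that's the default that applies.
If I did need to set it explicitly, I'm guessing it would look something like
this at submission time (just a rough, untested sketch against the 0.9.x
backtype.storm API - the topology name and the spout/bolt wiring below are
placeholders, not the actual RollingTopWords code):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    // ... spouts and bolts wired up here as in the storm-starter example ...

    Config conf = new Config();
    conf.setNumWorkers(1);  // same effect as topology.workers: 1 in storm.yaml

    StormSubmitter.submitTopology("rolling-top-words", conf,
        builder.createTopology());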

The cluster is being spun up now but will probably take half an hour or so
before it's ready. I'll reply once it's all up and good to go.

On 13 April 2015 at 18:40, Nathan Leung <[email protected]> wrote:

> That looks about right.  Are you running one worker per supervisor right
> now?
>
> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe <[email protected]>
> wrote:
>
>> I was just thinking about that - I'm using EC2 m3.medium instances, which
>> aren't especially RAM-heavy (1GB). Am I right in thinking that to enable
>> GC logging I'd just need to run Storm with the following options in my
>> 'storm.yaml'?
>>
>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"
>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/app/storm/logs/gc-storm-supervisor.log"
>>
>> If so, I'll set up the same cluster but using m3.xlarge instances (15GB
>> of RAM) with those settings. Either way, I'll start setting up the extra
>> large nodes.
>>
>>
>> On 13 April 2015 at 18:10, Nathan Leung <[email protected]> wrote:
>>
>>> Do you have GC logging enabled?  You should have enough RAM for each
>>> worker (for this example) that it's not a problem, but it doesn't hurt to
>>> check.
>>>
>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <
>>> [email protected]> wrote:
>>>
>>>> Not long at all - certainly less than a couple of minutes but longer
>>>> than a few seconds; the two I actually measured lasted 42 seconds and
>>>> then about 5 seconds. So the heartbeats folder behaves like so...
>>>>
>>>> /app/storm/workers/[uuid] is created with an empty heartbeats folder
>>>> inside it
>>>> /app/storm/workers/[uuid]/heartbeats has two timestamped heartbeat files
>>>> added (a couple of examples:
>>>>
>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>>>>
>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
>>>> )
>>>>
>>>> /app/storm/workers/[uuid] is deleted.
>>>>
>>>> I've not seen any heartbeats folder with more than 2 heartbeats in it.
>>>>
>>>>
>>>>
>>>> On 13 April 2015 at 17:00, Nathan Leung <[email protected]> wrote:
>>>>
>>>>> It only matters what user the supervisor is run as, so you should be ok.
>>>>> How long is it between the worker being started and it getting killed?
>>>>> What are the timestamps of the heartbeats in the corresponding worker
>>>>> folder?
>>>>>
>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I'm running Storm as a (specially created) storm user - on both the
>>>>>> nimbus and the supervisors the process shows as being run by the storm
>>>>>> user. And I've got the Storm local directory set to /app/storm - owned
>>>>>> by the storm user and with permissions 700. I thought it might be due
>>>>>> to me submitting the job as a different user, but I've just terminated
>>>>>> the old job and submitted the new one as the storm user and I'm still
>>>>>> not getting any tuples coming through.
>>>>>>
>>>>>> For what it's worth, I've looked in the local data directory on the
>>>>>> supervisors, and in the workers folder there is a (non-empty) heartbeats
>>>>>> folder being created - I don't know if that's indicative of heartbeats
>>>>>> being processed correctly or if it's just created at the start of a
>>>>>> worker process.
>>>>>>
>>>>>> Is there anything else you want to see that would help you diagnose
>>>>>> this? I can share the UI with you if you think that'd help?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> On 13 April 2015 at 16:22, Nathan Leung <[email protected]> wrote:
>>>>>>
>>>>>>> Sounds like it's having problems with heartbeats.  As I recall,
>>>>>>> worker -> supervisor heartbeats are written via the local filesystem.
>>>>>>> What user are you running as?  In your storm.yaml file you will see
>>>>>>> the option "storm.local.dir".  Does the user the supervisor / workers
>>>>>>> are running as have permission to write to this folder?
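>>>>>>>
>>>>>>> A quick standalone sanity check (just a sketch, not Storm code - it
>>>>>>> only assumes storm.local.dir is /app/storm as in your config) would be
>>>>>>> to run something like this as the storm user:
>>>>>>>
>>>>>>>     import java.nio.file.*;
>>>>>>>
>>>>>>>     public class CheckLocalDir {
>>>>>>>         public static void main(String[] args) throws Exception {
>>>>>>>             // Try to create and remove a file under storm.local.dir;
>>>>>>>             // this throws an exception if the user lacks write access.
>>>>>>>             Path probe = Files.createTempFile(Paths.get("/app/storm"),
>>>>>>>                     "write-probe-", ".tmp");
>>>>>>>             System.out.println("Created " + probe + " - write access OK");
>>>>>>>             Files.delete(probe);
>>>>>>>         }
>>>>>>>     }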
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> New bonus bit of information - the supervisor log has got this
>>>>>>>> going on:
>>>>>>>>
>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and
>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current
>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat:
>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 3] 
>>>>>>>> [4
>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 
>>>>>>>> 14]
>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill 5284.
>>>>>>>> Process is probably already dead.
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 
>>>>>>>> 5] [6
>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port 
>>>>>>>> 6703
>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>> -Dlogfile.name=worker-6703.log 
>>>>>>>> -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 -cp
>>>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>>>>
>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <[email protected]
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> OK, in checking the worker process logs I've uncovered a bit more
>>>>>>>>> information. The UI reports 0 emitted from the wordGenerator spout, 
>>>>>>>>> along
>>>>>>>>> with 0 transferred, acked and failed.
>>>>>>>>>
>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (I don't know the
>>>>>>>>> usual way of doing that). I've got a bunch of other stuff on the
>>>>>>>>> boxes, and in troubleshooting things I've been a bit laxer with
>>>>>>>>> security than I should have been!
>>>>>>>>>
>>>>>>>>> The UI shows 5 workers all starting on the same node - each of
>>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the node, 
>>>>>>>>> during
>>>>>>>>> that time the following happens:
>>>>>>>>>
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:java.io.tmpdir=/tmp
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:java.compiler=<NA>
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:os.name=Linux
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:os.arch=amd64
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:os.version=3.13.0-48-generic
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:user.name=storm
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:user.home=/app/home/storm
>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>> environment:user.dir=/
>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>>>>> production-topology-2-1428932653 on
>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000,
>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" 
>>>>>>>>> "-Xmx768m
>>>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 
>>>>>>>>> 20000,
>>>>>>>>> "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 
>>>>>>>>> 500,
>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" 
>>>>>>>>> "/app/storm",
>>>>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>>>>> "topology.enable.message.timeouts" true, 
>>>>>>>>> "nimbus.cleanup.inbox.freq.secs"
>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 
>>>>>>>>> 64,
>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>>>>> "transactional.zookeeper.port" nil, 
>>>>>>>>> "topology.executor.receive.buffer.size"
>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil,
>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>>>>> -Djava.net.preferIPv4Stack=true", 
>>>>>>>>> "supervisor.heartbeat.frequency.secs" 5,
>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 
>>>>>>>>> 3772,
>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3,
>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", 
>>>>>>>>> "topology.max.spout.pending"
>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>>>> "nimbus.topology.validator"
>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", 
>>>>>>>>> "supervisor.slots.ports"
>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, 
>>>>>>>>> "nimbus.task.launch.secs"
>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, 
>>>>>>>>> "topology.message.timeout.secs"
>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 
>>>>>>>>> 30,
>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", 
>>>>>>>>> "drpc.invocations.port"
>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, 
>>>>>>>>> "storm.zookeeper.retry.times"
>>>>>>>>> 5, "storm.thrift.transport"
>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>>>> "supervisor.worker.timeout.secs" 30, 
>>>>>>>>> "nimbus.file.copy.expiration.secs"
>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, 
>>>>>>>>> "ui.port"
>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true",
>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>>>>> "topology.max.task.parallelism" nil}
>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>>>>> sessionTimeout=20000 
>>>>>>>>> watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>>>>> 0x34cb217be05005e, negotiated timeout = 20000
>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>>>>>>> :connected:none
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session:
>>>>>>>>> 0x34cb217be05005e closed
>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>>>>> sessionTimeout=20000 
>>>>>>>>> watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session
>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment
>>>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid =
>>>>>>>>> 0x24cb217be030072, negotiated timeout = 20000
>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> counter:[2 2]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> wordGenerator:[11 11]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor wordGenerator:[11 11]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>> wordGenerator:(11)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>> wordGenerator:(11)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>> wordGenerator:(11)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> __system ["startup"]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor wordGenerator:[12 12]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>> wordGenerator:(12)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>> wordGenerator:(12)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>> wordGenerator:(12)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> __system ["startup"]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor wordGenerator:[13 13]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>> wordGenerator:(13)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>> wordGenerator:(13)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>> wordGenerator:(13)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> __system ["startup"]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor wordGenerator:[14 14]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [bertels]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>> wordGenerator:(14)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>> wordGenerator:(14)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>> wordGenerator:(14)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [mike]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [bertels]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> __system ["startup"]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor wordGenerator:[15 15]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>> wordGenerator:(15)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>> wordGenerator:(15)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>> wordGenerator:(15)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> __system:[-1 -1]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system
>>>>>>>>> ["startup"]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> __system:[-1 -1]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor __system:[-1 -1]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>> __system:(-1)
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>> __acker:[1 1]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [jackson]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [mike]
>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [jackson]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [golda]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>> __system:(-1)
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>>>>>>> ["startup"]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>> __acker:[1 1]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [jackson]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading
>>>>>>>>> executor __acker:[1 1]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread
>>>>>>>>> for bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>> __acker:(1)
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt __acker:(1)
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [nathan]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [bertels]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>> default [golda]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>>>>
>>>>>>>>> After that, the logs stop (no more messages) and the UI says that
>>>>>>>>> the 5 executors are now running on another node (and port) - at
>>>>>>>>> which point a worker log is created on that node for that port (same
>>>>>>>>> messages).
>>>>>>>>>
>>>>>>>>> So the worker seems to start and begins emitting words, but the UI
>>>>>>>>> doesn't register anything as having been emitted. After a short
>>>>>>>>> period of time the worker seems to stop without logging anything,
>>>>>>>>> and so a new worker is started.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Matt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Did you check the worker process logs?  What are the emit counts
>>>>>>>>>> for the components in the UI?
>>>>>>>>>>
>>>>>>>>>> Also I have not run on AWS personally but the following line
>>>>>>>>>> seems strange to me:
>>>>>>>>>>
>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>> compute.amazonaws.com"}
>>>>>>>>>>
>>>>>>>>>> Generally this would be bound to a specific machine, not
>>>>>>>>>> *.<domain>.
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>>>>> headaches. At the moment I've got a situation where the cluster (a
>>>>>>>>>>> nimbus node and 3 supervisors) will successfully run my particular
>>>>>>>>>>> topology in local mode. However, as soon as I try to run the
>>>>>>>>>>> topology in distributed mode there are no errors (that I can see)
>>>>>>>>>>> in any of the logs - the topology appears in the UI but never shows
>>>>>>>>>>> any tuples emitted by the spouts.
>>>>>>>>>>>
>>>>>>>>>>> In order to check it wasn't my topology, I cloned the storm-starter
>>>>>>>>>>> project and tried the RollingTopWords example - running locally I
>>>>>>>>>>> get a stream of output, but as soon as I specify that it should run
>>>>>>>>>>> in a distributed fashion I just get a success message (topology
>>>>>>>>>>> successfully submitted). The topology appears in the UI as
>>>>>>>>>>> successfully submitted but I see no progress.
>>>>>>>>>>>
>>>>>>>>>>> It's a bit of a weird one, as I don't know which logs to share to
>>>>>>>>>>> show the problem - the whole point is that I'm not seeing any
>>>>>>>>>>> errors!
>>>>>>>>>>>
>>>>>>>>>>> The last entry in the supervisor logs pertains to the setup of the
>>>>>>>>>>> supervisor (i.e. it hasn't logged anything about the topologies
>>>>>>>>>>> that have been submitted).
>>>>>>>>>>>
>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>>>>> illuminating:
>>>>>>>>>>>
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots:
>>>>>>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2
>>>>>>>>>>> 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 
>>>>>>>>>>> 12] [13
>>>>>>>>>>> 13] [14 14] [15 15] [1 1]]
>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment
>>>>>>>>>>> for topology id production-topology-2-1428932653:
>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", 
>>>>>>>>>>> :node->host
>>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.compute.amazonaws.com"},
>>>>>>>>>>> :executor->node+port {[2 2] ["bd340179-5905-4233-ad5d-44b47b228177" 
>>>>>>>>>>> 6702],
>>>>>>>>>>> [3 3] ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, 
>>>>>>>>>>> :executor->start-time-secs
>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 
>>>>>>>>>>> 1428934165, [6
>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 
>>>>>>>>>>> 1428934165, [10
>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 
>>>>>>>>>>> 1428934165,
>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>>>>
>>>>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Matt
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>


-- 
Check out my blog: dogdogfish.com
