I've (just about) solved this.

My problem was with reverse DNS lookup on my supervisor node - after adding
hostname-to-IP mappings in /etc/hosts I was able to move on and got a
different error message: https://issues.apache.org/jira/browse/STORM-130
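
For anyone hitting the same thing: the fix was just making sure the nodes can
resolve each other by hostname. A minimal sketch of the sort of /etc/hosts
entries I mean (the hostnames and addresses below are placeholders, not my
real ones):

10.0.0.10   nimbus-node
10.0.0.11   supervisor-1
10.0.0.12   supervisor-2
10.0.0.13   supervisor-3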

I've updated my version of Storm (to 0.9.3), cleared the ZooKeeper state and
the local data directories, and am now able to run the RollingTopWords
topology successfully on the cluster.
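
For reference, by "cleared" I mean roughly the following on each node - the
storm.local.dir path is the one from my storm.yaml and the zkCli.sh location
depends on your ZooKeeper install, so treat it as a sketch:

# with all Storm daemons stopped, wipe the local state dir on every node
rm -rf /app/storm/*
# then drop Storm's znode from ZooKeeper (rmr needs a reasonably recent zkCli)
zkCli.sh -server localhost:2181 rmr /storm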

Thanks for your help, Nathan :)

Now to solve the latest problem...

On 14 April 2015 at 11:30, Matthew Sharpe <[email protected]> wrote:

> Additional bit at the end of the worker GC log:
>
> 2.551: [GC [PSYoungGen: 120832K->16195K(140800K)]
> 120832K->16203K(461824K), 0.0217550 secs] [Times: user=0.09 sys=0.01,
> real=0.02 secs]
> Heap
>  PSYoungGen      total 140800K, used 36644K [0x00000000f0000000,
> 0x00000000f9d00000, 0x0000000100000000)
>   eden space 120832K, 16% used
> [0x00000000f0000000,0x00000000f13f8428,0x00000000f7600000)
>   from space 19968K, 81% used
> [0x00000000f7600000,0x00000000f85d0c90,0x00000000f8980000)
>   to   space 19968K, 0% used
> [0x00000000f8980000,0x00000000f8980000,0x00000000f9d00000)
>  ParOldGen       total 321024K, used 8K [0x00000000d0000000,
> 0x00000000e3980000, 0x00000000f0000000)
>   object space 321024K, 0% used
> [0x00000000d0000000,0x00000000d0002000,0x00000000e3980000)
>  PSPermGen       total 22528K, used 22196K [0x00000000cae00000,
> 0x00000000cc400000, 0x00000000d0000000)
>   object space 22528K, 98% used
> [0x00000000cae00000,0x00000000cc3ad218,0x00000000cc400000)
>
> On 14 April 2015 at 11:15, Matthew Sharpe <[email protected]>
> wrote:
>
>> Thanks for all your help so far - I've set up a cluster with m3.2xlarge
>> EC2 instances (30GB RAM and 8 vCPUs each), which I'm hoping should be
>> enough. I've added GC logging and have switched to one worker per
>> supervisor, i.e. supervisor.slots.ports has gone from:
>>
>> - 6700
>> - 6701
>> - 6702
>> - 6703
>>
>> to
>>
>> - 6700
>>
>> I've also switched all of my JVM heap settings to the values used here:
>> https://github.com/apache/storm/blob/master/conf/defaults.yaml
>> i.e. nimbus: 1024m, ui: 768m, supervisor: 256m, worker: 768m
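>>
>> In storm.yaml terms that now looks roughly like this (just a sketch,
>> trimmed to the heap settings; the GC logging flags I mentioned are left
>> out):
>>
>> supervisor.slots.ports:
>>     - 6700
>> nimbus.childopts: "-Xmx1024m"
>> ui.childopts: "-Xmx768m"
>> supervisor.childopts: "-Xmx256m"
>> worker.childopts: "-Xmx768m"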
>>
>> I'm still not getting any tuples emitted - the supervisor GC log:
>>
>> Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE
>> (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0
>> 20080428 (Red Hat 4.3.0-8)
>> Memory: 4k page, physical 30824244k(27561384k free), swap 0k(0k free)
>> CommandLine flags: -XX:InitialHeapSize=268435456
>> -XX:MaxHeapSize=268435456 -XX:+PrintGC -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseParallelGC
>> 2.058: [GC [PSYoungGen: 66048K->10199K(76800K)] 66048K->10207K(251392K),
>> 0.0158580 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
>> 3.010: [GC [PSYoungGen: 76247K->10746K(76800K)] 76255K->11131K(251392K),
>> 0.0171470 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
>> 264.012: [GC [PSYoungGen: 76794K->7879K(76800K)] 77179K->8272K(251392K),
>> 0.0114030 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
>>
>> The worker GC log:
>>
>> Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE
>> (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0
>> 20080428 (Red Hat 4.3.0-8)
>> Memory: 4k page, physical 30824244k(27356064k free), swap 0k(0k free)
>> CommandLine flags: -XX:InitialHeapSize=493187904
>> -XX:MaxHeapSize=805306368 -XX:+PrintGC -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseParallelGC
>> 2.861: [GC [PSYoungGen: 120832K->16150K(140800K)]
>> 120832K->16158K(461824K), 0.0233930 secs] [Times: user=0.07 sys=0.03,
>> real=0.02 secs]
>>
>> The supervisor log:
>>
>> 2015-04-14 10:04:24 b.s.d.supervisor [INFO]
>> f5888c5a-d36f-4d6b-8b67-4cf20203efef still hasn't started
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down and clearing
>> state for id f5888c5a-d36f-4d6b-8b67-4cf20203efef. Current supervisor time:
>> 1429005898. State: :timed-out, Heartbeat:
>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1429005865,
>> :storm-id "production-topology-1-1429005259", :executors #{[2 2] [3 3] [4
>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>> [15 15] [-1 -1] [1 1]}, :port 6700}
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down
>> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
>> 2015-04-14 10:04:58 b.s.util [INFO] Error when trying to kill 30755.
>> Process is probably already dead.
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shut down
>> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with
>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>> "production-topology-1-1429005259", :executors ([2 2] [3 3] [4 4] [5 5] [6
>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
>> 1])} for this supervisor 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e on port 6700
>> with id e7cfe011-56a4-42d4-8331-4cb055055799
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with
>> command: java -server -Xmx768m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>> -verbose:gc -Xloggc:/app/storm/gc-storm-worker-6700.log
>> -Djava.net.preferIPv4Stack=true
>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>> -Dlogfile.name=worker-6700.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>> -Dstorm.id=production-topology-1-1429005259
>> -Dworker.id=e7cfe011-56a4-42d4-8331-4cb055055799 -Dworker.port=6700 -cp
>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-1-1429005259/stormjar.jar
>> backtype.storm.daemon.worker production-topology-1-1429005259
>> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e 6700
>> e7cfe011-56a4-42d4-8331-4cb055055799
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO]
>> e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started
>> 2015-04-14 10:04:59 b.s.d.supervisor [INFO]
>> e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started
>>
>> The worker log: https://gist.github.com/Kali89/2a3d3aea401f8d1bb195
>>
>> The nimbus log:
>>
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[2 2] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[3 3] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[4 4] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[5 5] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[6 6] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[7 7] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[8 8] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[9 9] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[10 10] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[11 11] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[12 12] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[13 13] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[14 14] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[15 15] not alive
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
>> production-topology-1-1429005259:[1 1] not alive
>> 2015-04-14 10:11:27 b.s.s.EvenScheduler [INFO] Available slots:
>> (["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]
>> ["5f1c9cea-3fcc-4d4e-aa9e-650d837e6a49" 6700])
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassigning
>> production-topology-1-1429005259 to 1 slots
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3]
>> [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
>> [15 15] [1 1]]
>> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Setting new assignment for
>> topology id production-topology-1-1429005259:
>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>> "/app/storm/nimbus/stormdist/production-topology-1-1429005259", :node->host
>> {"eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" "
>> ec2-52-16-76-40.eu-west-1.compute.amazonaws.com"}, :executor->node+port
>> {[2 2] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [3 3]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [4 4]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [5 5]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [6 6]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [7 7]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [8 8]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [9 9]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [10 10]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [11 11]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [12 12]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [13 13]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [14 14]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [15 15]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [1 1]
>> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]}, :executor->start-time-secs
>> {[2 2] 1429006287, [3 3] 1429006287, [4 4] 1429006287, [5 5] 1429006287, [6
>> 6] 1429006287, [7 7] 1429006287, [8 8] 1429006287, [9 9] 1429006287, [10
>> 10] 1429006287, [11 11] 1429006287, [12 12] 1429006287, [13 13] 1429006287,
>> [14 14] 1429006287, [15 15] 1429006287, [1 1] 1429006287}}
>>
>> Sorry for the data dump! For what it's worth, I'm now wondering how the
>> supervisor and worker relationship works. With one 768MB worker per
>> supervisor (node) I'm well below the machine limit, but am I hitting another
>> sort of limit? Does the supervisor daemon on the supervisor node have
>> enough memory with 256MB? Is there a relationship between the worker memory
>> and the supervisor memory?
>>
>> Thanks!
>>
>>
>> On 13 April 2015 at 18:58, Nathan Leung <[email protected]> wrote:
>>
>>> No, the default is 4.  If you are running multiple workers on the same
>>> supervisor node, that could cause problems with 768 MB / worker and 1 GB /
>>> supervisor host.
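>>>
>>> (Rough arithmetic: 4 default slots x 768 MB max heap each is roughly 3 GB
>>> of potential worker heap on a 1 GB m3.medium, before the supervisor's own
>>> JVM is counted.)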
>>>
>>> On Mon, Apr 13, 2015 at 1:55 PM, Matthew Sharpe <
>>> [email protected]> wrote:
>>>
>>>> I can only think that I'll be using whatever the default is - that's
>>>> not a setting that I've changed anywhere or set on any command line. Am I
>>>> right in thinking (hoping) that that'll default to one? If not, how do I
>>>> set it for the RollingTopWords example?
>>>>
>>>> The cluster is being spun up now but will probably take half an hour or
>>>> so before it's ready. I'll reply once it's all up and good to go.
>>>>
>>>> On 13 April 2015 at 18:40, Nathan Leung <[email protected]> wrote:
>>>>
>>>>> That looks about right.  Are you running one worker per supervisor
>>>>> right now?
>>>>>
>>>>> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I was just thinking about that - I'm using EC2 m3.medium instances,
>>>>>> which aren't especially RAM heavy (1GB). Am I right in thinking that to
>>>>>> enable GC logging I'd just need to run Storm with the following options
>>>>>> in my 'storm.yaml'?
>>>>>>
>>>>>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails
>>>>>> -XX:+PrintGCTimeStamps -verbose:gc
>>>>>> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"
>>>>>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails
>>>>>> -XX:+PrintGCTimeStamps -verbose:gc
>>>>>> -Xloggc:/app/storm/logs/gc-storm-supervisor.log"
>>>>>> If so, I'll set up the same cluster but using m3.xlarge instances
>>>>>> (15GB of RAM) with those settings. Either way, I'll start setting up the
>>>>>> extra large nodes.
>>>>>>
>>>>>>
>>>>>> On 13 April 2015 at 18:10, Nathan Leung <[email protected]> wrote:
>>>>>>
>>>>>>> Do you have GC logging enabled?  You should have enough RAM for each
>>>>>>> worker (for this example) that it's not a problem, but it doesn't hurt
>>>>>>> to check.
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Not long at all - certainly less than a couple of minutes but longer
>>>>>>>> than a few seconds; the couple I actually measured lasted 42 seconds
>>>>>>>> and about 5 seconds. So the heartbeats folder behaves like so...
>>>>>>>>
>>>>>>>> /app/storm/workers/[uuid] is created with an empty heartbeats
>>>>>>>> folder inside it
>>>>>>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a
>>>>>>>> couple of examples:
>>>>>>>>
>>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
>>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
>>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
>>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>>>>>>>>
>>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
>>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
>>>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
>>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
>>>>>>>> )
>>>>>>>>
>>>>>>>> /app/storm/workers/[uuid] is deleted.
>>>>>>>>
>>>>>>>> I've not seen any heartbeats folder with more than 2 heartbeats in
>>>>>>>> it.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13 April 2015 at 17:00, Nathan Leung <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It only matters what user the supervisor is run as, so you should be
>>>>>>>>> ok.  How long is it between the worker being started and it getting
>>>>>>>>> killed?  What are the timestamps of the heartbeats in the
>>>>>>>>> corresponding worker folder?
>>>>>>>>>
>>>>>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I'm running Storm as a (specially created) storm user - on both
>>>>>>>>>> the nimbus and the supervisors the process is showing as being run
>>>>>>>>>> by the storm user. And I've got the Storm local directory set as
>>>>>>>>>> /app/storm - owned by the storm user and with permissions 700. I
>>>>>>>>>> thought it might be due to me submitting the job as a different
>>>>>>>>>> user, but I've just terminated the old job and submitted the new
>>>>>>>>>> one as the storm user and am still not getting any tuples coming
>>>>>>>>>> through.
>>>>>>>>>>
>>>>>>>>>> For what it's worth, I've looked in the local data directory on
>>>>>>>>>> the supervisors and in the workers folder there is a (non-empty)
>>>>>>>>>> heartbeats folder being created - I don't know if that's indicative
>>>>>>>>>> of heartbeats being processed correctly or if it's just created at
>>>>>>>>>> the start of a worker process.
>>>>>>>>>>
>>>>>>>>>> Is there anything else you want to see that would help you
>>>>>>>>>> diagnose this? I can share the UI with you if you think that'd help.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 13 April 2015 at 16:22, Nathan Leung <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sounds like it's having problems with heartbeats.  As I recall,
>>>>>>>>>>> worker -> supervisor heartbeats are written using the local
>>>>>>>>>>> filesystem.  What user are you running as?  In your storm.yaml
>>>>>>>>>>> file you will see the option "storm.local.dir".  Does the user the
>>>>>>>>>>> supervisor / workers are running as have permission to write to
>>>>>>>>>>> this folder?
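>>>>>>>>>>>
>>>>>>>>>>> A quick way to check, assuming storm.local.dir is something like
>>>>>>>>>>> /app/storm (substitute whatever your storm.yaml actually says):
>>>>>>>>>>>
>>>>>>>>>>> # who owns the local dir and what are its permissions?
>>>>>>>>>>> ls -ld /app/storm /app/storm/workers
>>>>>>>>>>> # which user are the supervisor and worker JVMs actually running as?
>>>>>>>>>>> ps -eo user,args | grep daemon.supervisor | grep -v grep
>>>>>>>>>>> ps -eo user,args | grep daemon.worker | grep -v grep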
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> New bonus bit of information - the supervisor log has got this
>>>>>>>>>>>> going on:
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and
>>>>>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current
>>>>>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat:
>>>>>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 
>>>>>>>>>>>> 1428938029,
>>>>>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] 
>>>>>>>>>>>> [3 3] [4
>>>>>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] 
>>>>>>>>>>>> [14 14]
>>>>>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill
>>>>>>>>>>>> 5284. Process is probably already dead.
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker
>>>>>>>>>>>> with assignment 
>>>>>>>>>>>> #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] 
>>>>>>>>>>>> [5 5] [6
>>>>>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 
>>>>>>>>>>>> 15] [1
>>>>>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on 
>>>>>>>>>>>> port 6703
>>>>>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker
>>>>>>>>>>>> with command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>>>>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>>>> -Dlogfile.name=worker-6703.log 
>>>>>>>>>>>> -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>>>>>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e 
>>>>>>>>>>>> -Dworker.port=6703 -cp
>>>>>>>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>>>>>>>>
>>>>>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> OK, in checking the worker process logs I've uncovered a bit
>>>>>>>>>>>>> more information. The UI reports 0 emitted from the wordGenerator 
>>>>>>>>>>>>> spout,
>>>>>>>>>>>>> along with 0 transferred, acked and failed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (I don't know
>>>>>>>>>>>>> the usual way of doing that). I've got a bunch of other stuff on
>>>>>>>>>>>>> the boxes, and while troubleshooting I've been a bit laxer with
>>>>>>>>>>>>> security than I should have been!
>>>>>>>>>>>>>
>>>>>>>>>>>>> The UI shows 5 workers all starting on the same node - each of
>>>>>>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the 
>>>>>>>>>>>>> node, during
>>>>>>>>>>>>> that time the following happens:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:java.io.tmpdir=/tmp
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:java.compiler=<NA>
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:os.name=Linux
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:os.arch=amd64
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:os.version=3.13.0-48-generic
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:user.name=storm
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:user.home=/app/home/storm
>>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>>>> environment:user.dir=/
>>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>>>>>>>>> production-topology-2-1428932653 on
>>>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf 
>>>>>>>>>>>>> {"dev.zookeeper.path"
>>>>>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 
>>>>>>>>>>>>> 5000,
>>>>>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" 
>>>>>>>>>>>>> "-Xmx768m
>>>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", 
>>>>>>>>>>>>> "storm.zookeeper.session.timeout" 20000,
>>>>>>>>>>>>> "nimbus.reassign" true, 
>>>>>>>>>>>>> "topology.trident.batch.emit.interval.millis" 500,
>>>>>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" 
>>>>>>>>>>>>> "/app/storm",
>>>>>>>>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>>>>>>>>> "topology.enable.message.timeouts" true, 
>>>>>>>>>>>>> "nimbus.cleanup.inbox.freq.secs"
>>>>>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, 
>>>>>>>>>>>>> "drpc.worker.threads" 64,
>>>>>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>>>>>>>>> "transactional.zookeeper.port" nil, 
>>>>>>>>>>>>> "topology.executor.receive.buffer.size"
>>>>>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, 
>>>>>>>>>>>>> "storm.zookeeper.root"
>>>>>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>>>>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" 
>>>>>>>>>>>>> nil,
>>>>>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", 
>>>>>>>>>>>>> "supervisor.heartbeat.frequency.secs" 5,
>>>>>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, 
>>>>>>>>>>>>> "drpc.port" 3772,
>>>>>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" 
>>>>>>>>>>>>> "-Xmx768m",
>>>>>>>>>>>>> "topology.receiver.buffer.size" 8, 
>>>>>>>>>>>>> "task.heartbeat.frequency.secs" 3,
>>>>>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", 
>>>>>>>>>>>>> "topology.max.spout.pending"
>>>>>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>>>>>>>> "nimbus.topology.validator"
>>>>>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", 
>>>>>>>>>>>>> "supervisor.slots.ports"
>>>>>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, 
>>>>>>>>>>>>> "nimbus.task.launch.secs"
>>>>>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, 
>>>>>>>>>>>>> "topology.message.timeout.secs"
>>>>>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", 
>>>>>>>>>>>>> "nimbus.task.timeout.secs" 30,
>>>>>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, 
>>>>>>>>>>>>> "topology.kryo.factory"
>>>>>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", 
>>>>>>>>>>>>> "drpc.invocations.port"
>>>>>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, 
>>>>>>>>>>>>> "storm.zookeeper.retry.times"
>>>>>>>>>>>>> 5, "storm.thrift.transport"
>>>>>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>>>>>>>> "supervisor.worker.timeout.secs" 30, 
>>>>>>>>>>>>> "nimbus.file.copy.expiration.secs"
>>>>>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, 
>>>>>>>>>>>>> "ui.port"
>>>>>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m 
>>>>>>>>>>>>> -Djava.net.preferIPv4Stack=true",
>>>>>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>>>>>>>>> "topology.max.task.parallelism" nil}
>>>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO]
>>>>>>>>>>>>> Starting
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>>>>>>>>> sessionTimeout=20000 
>>>>>>>>>>>>> watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating
>>>>>>>>>>>>> session
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session
>>>>>>>>>>>>> establishment complete on server *.
>>>>>>>>>>>>> compute.amazonaws.com/*:2181, sessionid = 0x34cb217be05005e,
>>>>>>>>>>>>> negotiated timeout = 20000
>>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state
>>>>>>>>>>>>> update: :connected:none
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session:
>>>>>>>>>>>>> 0x34cb217be05005e closed
>>>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO]
>>>>>>>>>>>>> Starting
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>>>>>>>>> sessionTimeout=20000 
>>>>>>>>>>>>> watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut
>>>>>>>>>>>>> down
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating
>>>>>>>>>>>>> session
>>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session
>>>>>>>>>>>>> establishment complete on server *.
>>>>>>>>>>>>> compute.amazonaws.com/*:2181, sessionid = 0x24cb217be030072,
>>>>>>>>>>>>> negotiated timeout = 20000
>>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>>>>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> counter:[2 2]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks wordGenerator:[11 11]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor wordGenerator:[11 11]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks wordGenerator:[12 12]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor wordGenerator:[12 12]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks wordGenerator:[13 13]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor wordGenerator:[13 13]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks wordGenerator:[14 14]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor wordGenerator:[14 14]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [bertels]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, 
>>>>>>>>>>>>> [bertels]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [mike]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [bertels]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, 
>>>>>>>>>>>>> [bertels]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks wordGenerator:[15 15]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor wordGenerator:[15 15]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> __system:[-1 -1]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks __system:[-1 -1]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor __system:[-1 -1]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>>>>> __system:(-1)
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>>>> __acker:[1 1]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [jackson]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [mike]
>>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, 
>>>>>>>>>>>>> [jackson]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [jackson]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, 
>>>>>>>>>>>>> [jackson]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [golda]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>>>>> __system:(-1)
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker
>>>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor
>>>>>>>>>>>>> tasks __acker:[1 1]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [jackson]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>>>> executor __acker:[1 1]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching
>>>>>>>>>>>>> receive-thread for bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>>>>> __acker:(1)
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>>>>> __acker:(1)
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [nathan]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [bertels]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>>>> default [golda]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, 
>>>>>>>>>>>>> [nathan]
>>>>>>>>>>>>>
>>>>>>>>>>>>> After that, the logs stop (no more messages) and the UI says
>>>>>>>>>>>>> that the 5 executors are now running on another node (and port) - 
>>>>>>>>>>>>> at which
>>>>>>>>>>>>> point the worker log is created on that node and with that number 
>>>>>>>>>>>>> (same
>>>>>>>>>>>>> messages).
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the worker seems to start and begin emitting words, but the
>>>>>>>>>>>>> UI isn't registering anything as having been emitted. After a
>>>>>>>>>>>>> short period of time the worker seems to stop without logging
>>>>>>>>>>>>> anything, and so a new worker is started.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did you check the worker process logs?  What are the emit
>>>>>>>>>>>>>> counts for the components in the UI?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also I have not run on AWS personally but the following line
>>>>>>>>>>>>>> seems strange to me:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>>>> compute.amazonaws.com"}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Generally this would be bound to a specific machine, not
>>>>>>>>>>>>>> *.<domain>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>>>>>>>>> headaches. At the moment I've got a situation where the
>>>>>>>>>>>>>>> cluster (a nimbus node and 3 supervisors) will successfully
>>>>>>>>>>>>>>> run my particular topology in local mode. However, as soon as
>>>>>>>>>>>>>>> I try to run the topology in distributed mode there are no
>>>>>>>>>>>>>>> errors (that I can see) in any of the logs - the topology
>>>>>>>>>>>>>>> appears in the UI but never shows any tuples emitted by the
>>>>>>>>>>>>>>> spouts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In order to check it wasn't my topology I cloned the
>>>>>>>>>>>>>>> storm-starter project and have tried with the RollingTopWords 
>>>>>>>>>>>>>>> example -
>>>>>>>>>>>>>>> running locally I get a stream of output but as soon as I 
>>>>>>>>>>>>>>> specify that we
>>>>>>>>>>>>>>> should run in a distributed fashion I get a success message 
>>>>>>>>>>>>>>> (topology
>>>>>>>>>>>>>>> successfully submitted). The topology appears on the UI as 
>>>>>>>>>>>>>>> successfully
>>>>>>>>>>>>>>> submitted but I see no progress.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It's a bit of a weird one: I don't know which logs to share
>>>>>>>>>>>>>>> to show the problem, as the whole point is that I'm not seeing
>>>>>>>>>>>>>>> any errors!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The last entry in the supervisor logs pertains to the setup
>>>>>>>>>>>>>>> of the supervisor (i.e. it hasn't logged anything about the
>>>>>>>>>>>>>>> topologies that have been submitted).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>>>>>>>>> illuminating:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available
>>>>>>>>>>>>>>> slots: (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors:
>>>>>>>>>>>>>>> [[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 
>>>>>>>>>>>>>>> 11] [12 12]
>>>>>>>>>>>>>>> [13 13] [14 14] [15 15] [1 1]]
>>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new
>>>>>>>>>>>>>>> assignment for topology id production-topology-2-1428932653:
>>>>>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", 
>>>>>>>>>>>>>>> :node->host
>>>>>>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>>>>> compute.amazonaws.com"}, :executor->node+port {[2 2]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [3 3]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, 
>>>>>>>>>>>>>>> :executor->start-time-secs
>>>>>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 
>>>>>>>>>>>>>>> 1428934165, [6
>>>>>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 
>>>>>>>>>>>>>>> 1428934165, [10
>>>>>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 
>>>>>>>>>>>>>>> 1428934165,
>>>>>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>
>
> --
> Check out my blog: dogdogfish.com
>



-- 
Check out my blog: dogdogfish.com
