Thanks for all your help so far - I've set up a cluster with m3.2xlarge EC2
instances (30GB RAM and 8 vCPUs), which I'm hoping should be enough. I've
added GC logging and have switched to one worker per supervisor, i.e.
supervisor.slots.ports has gone from:
- 6700
- 6701
- 6702
- 6703

to

- 6700

I've also switched all of my Java heap settings to the values used in
https://github.com/apache/storm/blob/master/conf/defaults.yaml,
i.e. nimbus: 1024m, ui: 768m, supervisor: 256m, worker: 768m.
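
For reference, I believe the relevant bits of my storm.yaml now look roughly
like this - I've reconstructed it from the launch command and conf dump below,
so treat the exact GC log paths as approximate:

supervisor.slots.ports:
    - 6700
nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Xmx256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/app/storm/gc-storm-supervisor.log"
worker.childopts: "-Xmx768m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/app/storm/gc-storm-worker-%ID%.log -Djava.net.preferIPv4Stack=true"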

I'm still not getting any tuples emitted - the supervisor GC log:

Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE
(1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0
20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 30824244k(27561384k free), swap 0k(0k free)
CommandLine flags: -XX:InitialHeapSize=268435456 -XX:MaxHeapSize=268435456
-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+UseCompressedOops -XX:+UseParallelGC
2.058: [GC [PSYoungGen: 66048K->10199K(76800K)] 66048K->10207K(251392K),
0.0158580 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
3.010: [GC [PSYoungGen: 76247K->10746K(76800K)] 76255K->11131K(251392K),
0.0171470 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
264.012: [GC [PSYoungGen: 76794K->7879K(76800K)] 77179K->8272K(251392K),
0.0114030 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]

The worker GC log:

Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE
(1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0
20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 30824244k(27356064k free), swap 0k(0k free)
CommandLine flags: -XX:InitialHeapSize=493187904 -XX:MaxHeapSize=805306368
-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+UseCompressedOops -XX:+UseParallelGC
2.861: [GC [PSYoungGen: 120832K->16150K(140800K)] 120832K->16158K(461824K),
0.0233930 secs] [Times: user=0.07 sys=0.03, real=0.02 secs]

The supervisor log:

2015-04-14 10:04:24 b.s.d.supervisor [INFO]
f5888c5a-d36f-4d6b-8b67-4cf20203efef still hasn't started
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down and clearing
state for id f5888c5a-d36f-4d6b-8b67-4cf20203efef. Current supervisor time:
1429005898. State: :timed-out, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1429005865,
:storm-id "production-topology-1-1429005259", :executors #{[2 2] [3 3] [4
4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
[15 15] [-1 -1] [1 1]}, :port 6700}
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down
5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
2015-04-14 10:04:58 b.s.util [INFO] Error when trying to kill 30755.
Process is probably already dead.
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shut down
5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with
assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"production-topology-1-1429005259", :executors ([2 2] [3 3] [4 4] [5 5] [6
6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1
1])} for this supervisor 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e on port 6700
with id e7cfe011-56a4-42d4-8331-4cb055055799
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with command:
java -server -Xmx768m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-verbose:gc -Xloggc:/app/storm/gc-storm-worker-6700.log
-Djava.net.preferIPv4Stack=true
 -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
-Dlogfile.name=worker-6700.log -Dstorm.home=/usr/local/storm-0.9.0-wip21
-Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
-Dstorm.id=production-topology-1-1429005259
-Dworker.id=e7cfe011-56a4-42d4-8331-4cb055055799 -Dworker.port=6700 -cp
/usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-1-1429005259/stormjar.jar
backtype.storm.daemon.worker production-topology-1-1429005259
5a1f1d44-4d4f-4309-9ed3-d753da1dda9e 6700
e7cfe011-56a4-42d4-8331-4cb055055799
2015-04-14 10:04:58 b.s.d.supervisor [INFO]
e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started
2015-04-14 10:04:59 b.s.d.supervisor [INFO]
e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started

The worker log: https://gist.github.com/Kali89/2a3d3aea401f8d1bb195

The nimbus log:

2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[2 2] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[3 3] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[4 4] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[5 5] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[6 6] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[7 7] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[8 8] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[9 9] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[10 10] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[11 11] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[12 12] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[13 13] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[14 14] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[15 15] not alive
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor
production-topology-1-1429005259:[1 1] not alive
2015-04-14 10:11:27 b.s.s.EvenScheduler [INFO] Available slots:
(["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]
["5f1c9cea-3fcc-4d4e-aa9e-650d837e6a49" 6700])
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassigning
production-topology-1-1429005259 to 1 slots
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3] [4
4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14]
[15 15] [1 1]]
2015-04-14 10:11:27 b.s.d.nimbus [INFO] Setting new assignment for topology
id production-topology-1-1429005259:
#backtype.storm.daemon.common.Assignment{:master-code-dir
"/app/storm/nimbus/stormdist/production-topology-1-1429005259", :node->host
{"eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" "
ec2-52-16-76-40.eu-west-1.compute.amazonaws.com"}, :executor->node+port {[2
2] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [3 3]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [4 4]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [5 5]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [6 6]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [7 7]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [8 8]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [9 9]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [10 10]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [11 11]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [12 12]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [13 13]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [14 14]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [15 15]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [1 1]
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]}, :executor->start-time-secs
{[2 2] 1429006287, [3 3] 1429006287, [4 4] 1429006287, [5 5] 1429006287, [6
6] 1429006287, [7 7] 1429006287, [8 8] 1429006287, [9 9] 1429006287, [10
10] 1429006287, [11 11] 1429006287, [12 12] 1429006287, [13 13] 1429006287,
[14 14] 1429006287, [15 15] 1429006287, [1 1] 1429006287}}

Sorry for the data dump! For what it's worth, I'm now wondering how the
supervisor and worker relationship works. With one 768MB worker per
supervisor (node) I'm well below the machine limit, but am I hitting some
other sort of limit? Does the supervisor daemon on the supervisor node have
enough memory with 256MB? Is there a relationship between the worker memory
and the supervisor memory?
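
In case it helps frame those questions, these are the heartbeat/timeout
settings I believe are in play, as I understand them - the values are the ones
reported in the worker conf dump quoted further down the thread, none of which
I've overridden:

supervisor.worker.start.timeout.secs: 120  # grace period for a freshly launched worker to write its first local heartbeat
supervisor.worker.timeout.secs: 30         # supervisor clears a worker whose local heartbeat is older than this (the :timed-out state above)
worker.heartbeat.frequency.secs: 1         # how often the worker writes its local heartbeat
task.heartbeat.frequency.secs: 3           # how often executors heartbeat to ZooKeeper
nimbus.task.timeout.secs: 30               # nimbus marks executors "not alive" after this long without a ZooKeeper heartbeat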

Thanks!


On 13 April 2015 at 18:58, Nathan Leung <[email protected]> wrote:

> No, the default is 4.  If you are running multiple workers on the same
> supervisor node, that could cause problems with 768 MB / worker and 1 GB /
> supervisor host.
>
> On Mon, Apr 13, 2015 at 1:55 PM, Matthew Sharpe <[email protected]>
> wrote:
>
>> I can only think that I'll be using whatever the default is - that's not
>> a setting that I've changed anywhere or set on any command line. Am I right
>> in thinking (hoping) that that'll default to one? If not, how do I set it
>> for the RollingTopWords example?
>>
>> The cluster is being spun up now but will probably take half an hour or
>> so before it's ready. I'll reply once it's all up and good to go.
>>
>> On 13 April 2015 at 18:40, Nathan Leung <[email protected]> wrote:
>>
>>> That looks about right.  Are you running one worker per supervisor right
>>> now?
>>>
>>> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe <
>>> [email protected]> wrote:
>>>
>>>> I was just thinking about that - I'm using EC2 m3.medium instances,
>>>> they're not especially RAM heavy (1GB). Am I right in thinking that to
>>>> enable GC logging I'd just need to run Storm with the following options in
>>>> my 'storm.yaml'?
>>>>
>>>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps -verbose:gc
>>>> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"
>>>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps -verbose:gc
>>>> -Xloggc:/app/storm/logs/gc-storm-supervisor.log"
>>>> If so, I'll set up the same cluster but using m3.xlarge instances (15GB
>>>> of RAM) with those settings. Either way, I'll start setting up the extra
>>>> large nodes.
>>>>
>>>>
>>>> On 13 April 2015 at 18:10, Nathan Leung <[email protected]> wrote:
>>>>
>>>>> Do you have GC logging enabled?  You should have enough ram for each
>>>>> worker (for this example) that it's not a problem, but doesn't hurt to
>>>>> check.
>>>>>
>>>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Not long at all - certainly less than a couple of minutes but longer
>>>>>> than a few seconds; the two I actually measured lasted 42 seconds and
>>>>>> then about 5 seconds. So the heartbeats folder behaves like so...
>>>>>>
>>>>>> /app/storm/workers/[uuid] is created with an empty heartbeats folder
>>>>>> inside it
>>>>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a
>>>>>> couple of examples:
>>>>>>
>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941026297.version
>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297
>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:03 1428941027240.version
>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240
>>>>>>
>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941398235.version
>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235
>>>>>> -rw-r--r-- 1 storm storm    0 Apr 13 16:09 1428941399163.version
>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163
>>>>>> )
>>>>>>
>>>>>> /app/storm/workers/[uuid] is deleted.
>>>>>>
>>>>>> I've not seen any heartbeats folder with any more than 2 heartbeats
>>>>>> in.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 13 April 2015 at 17:00, Nathan Leung <[email protected]> wrote:
>>>>>>
>>>>>>> It only matters what user supervisor is run as, so you should be
>>>>>>> ok.  How long is it between the worker being started, and it getting
>>>>>>> killed?  What are the timestamps of the heartbeats in the corresponding
>>>>>>> worker folder?
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I'm running storm as a (specially created) storm user - on both the
>>>>>>>> nimbus and the supervisors the process is showing as being run by the 
>>>>>>>> storm
>>>>>>>> user. And I've got the storm local directory set as /app/storm - owned 
>>>>>>>> by
>>>>>>>> storm user and with permissions 700. Thought it might be due to me
>>>>>>>> submitting the job as a different user but have just terminated the 
>>>>>>>> old job
>>>>>>>> and submitted the new one as the storm user and still not getting any
>>>>>>>> tuples coming through.
>>>>>>>>
>>>>>>>> For what it's worth, I've looked in the local data directory on the
>>>>>>>> supervisors and in the workers folder there is a (non-empty) heartbeats
>>>>>>>> folder being created - don't know if that's indicative of heartbeats 
>>>>>>>> being
>>>>>>>> processed correctly or if that's created at the start of a worker 
>>>>>>>> process.
>>>>>>>>
>>>>>>>> Is there anything else you want to see that would help you diagnose
>>>>>>>> this? I can share the UI with you if you think that'd help?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>> On 13 April 2015 at 16:22, Nathan Leung <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Sounds like it's having problems with heartbeats.  As I recall
>>>>>>>>> worker -> supervisor heartbeats are written using the local 
>>>>>>>>> filesystem.
>>>>>>>>> What user are you running as?  In your storm.yaml file, you will see 
>>>>>>>>> the
>>>>>>>>> option "storm.local.dir".  Does the user supervisor / workers are 
>>>>>>>>> running
>>>>>>>>> as have permissions to write to this folder?
>>>>>>>>>
>>>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> New bonus bit of information - the supervisor log has got this
>>>>>>>>>> going on:
>>>>>>>>>>
>>>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO]
>>>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and
>>>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current
>>>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat:
>>>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029,
>>>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 
>>>>>>>>>> 3] [4
>>>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 
>>>>>>>>>> 14]
>>>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703}
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down
>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill
>>>>>>>>>> 5284. Process is probably already dead.
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down
>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>>>>> assignment 
>>>>>>>>>> #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>>>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 
>>>>>>>>>> 5] [6
>>>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] 
>>>>>>>>>> [1
>>>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on 
>>>>>>>>>> port 6703
>>>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with
>>>>>>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true
>>>>>>>>>>  -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>> -Dlogfile.name=worker-6703.log 
>>>>>>>>>> -Dstorm.home=/usr/local/storm-0.9.0-wip21
>>>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml
>>>>>>>>>> -Dstorm.id=production-topology-2-1428932653
>>>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 
>>>>>>>>>> -cp
>>>>>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar
>>>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653
>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703
>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e
>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO]
>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started
>>>>>>>>>>
>>>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> OK, in checking the worker process logs I've uncovered a bit
>>>>>>>>>>> more information. The UI reports 0 emitted from the wordGenerator 
>>>>>>>>>>> spout,
>>>>>>>>>>> along with 0 transferred, acked and failed.
>>>>>>>>>>>
>>>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (don't know the
>>>>>>>>>>> usual way of doing that), I've got a bunch of other stuff on the 
>>>>>>>>>>> boxes and
>>>>>>>>>>> in troubleshooting things out have been a bit laxer with security 
>>>>>>>>>>> than I
>>>>>>>>>>> should have been!
>>>>>>>>>>>
>>>>>>>>>>> The UI shows 5 workers all starting on the same node - each of
>>>>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the node, 
>>>>>>>>>>> during
>>>>>>>>>>> that time the following happens:
>>>>>>>>>>>
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:java.io.tmpdir=/tmp
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:java.compiler=<NA>
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:os.name=Linux
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:os.arch=amd64
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:os.version=3.13.0-48-generic
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:user.name=storm
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:user.home=/app/home/storm
>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server
>>>>>>>>>>> environment:user.dir=/
>>>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for
>>>>>>>>>>> production-topology-2-1428932653 on
>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id
>>>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path"
>>>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil,
>>>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60,
>>>>>>>>>>> "topology.fall.back.on.java.serialization" true,
>>>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 
>>>>>>>>>>> 5000,
>>>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" 
>>>>>>>>>>> "-Xmx768m
>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" 
>>>>>>>>>>> 20000,
>>>>>>>>>>> "nimbus.reassign" true, 
>>>>>>>>>>> "topology.trident.batch.emit.interval.millis" 500,
>>>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m",
>>>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib",
>>>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" 
>>>>>>>>>>> "/app/storm",
>>>>>>>>>>> "supervisor.worker.start.timeout.secs" 120,
>>>>>>>>>>> "topology.enable.message.timeouts" true, 
>>>>>>>>>>> "nimbus.cleanup.inbox.freq.secs"
>>>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 
>>>>>>>>>>> 64,
>>>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*.
>>>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181,
>>>>>>>>>>> "transactional.zookeeper.port" nil, 
>>>>>>>>>>> "topology.executor.receive.buffer.size"
>>>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root"
>>>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000,
>>>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*.
>>>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*.
>>>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root"
>>>>>>>>>>> "/transactional", "topology.acker.executors" nil,
>>>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" 
>>>>>>>>>>> nil,
>>>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m
>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", 
>>>>>>>>>>> "supervisor.heartbeat.frequency.secs" 5,
>>>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, 
>>>>>>>>>>> "drpc.port" 3772,
>>>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m",
>>>>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 
>>>>>>>>>>> 3,
>>>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy"
>>>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", 
>>>>>>>>>>> "topology.max.spout.pending"
>>>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, "
>>>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1,
>>>>>>>>>>> "nimbus.topology.validator"
>>>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", 
>>>>>>>>>>> "supervisor.slots.ports"
>>>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, 
>>>>>>>>>>> "nimbus.task.launch.secs"
>>>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, 
>>>>>>>>>>> "topology.message.timeout.secs"
>>>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1,
>>>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true",
>>>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05,
>>>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer"
>>>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer",
>>>>>>>>>>> "topology.disruptor.wait.strategy"
>>>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", 
>>>>>>>>>>> "nimbus.task.timeout.secs" 30,
>>>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory"
>>>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", 
>>>>>>>>>>> "drpc.invocations.port"
>>>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, 
>>>>>>>>>>> "storm.zookeeper.retry.times"
>>>>>>>>>>> 5, "storm.thrift.transport"
>>>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin",
>>>>>>>>>>> "topology.state.synchronization.timeout.secs" 60,
>>>>>>>>>>> "supervisor.worker.timeout.secs" 30, 
>>>>>>>>>>> "nimbus.file.copy.expiration.secs"
>>>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq",
>>>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, 
>>>>>>>>>>> "ui.port"
>>>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m 
>>>>>>>>>>> -Djava.net.preferIPv4Stack=true",
>>>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true,
>>>>>>>>>>> "topology.max.task.parallelism" nil}
>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO]
>>>>>>>>>>> Starting
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181
>>>>>>>>>>> sessionTimeout=20000 
>>>>>>>>>>> watcher=com.netflix.curator.ConnectionState@585bd6a6
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating
>>>>>>>>>>> session
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session
>>>>>>>>>>> establishment complete on server *.compute.amazonaws.com/*:2181,
>>>>>>>>>>> sessionid = 0x34cb217be05005e, negotiated timeout = 20000
>>>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update:
>>>>>>>>>>> :connected:none
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session:
>>>>>>>>>>> 0x34cb217be05005e closed
>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO]
>>>>>>>>>>> Starting
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client
>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*.
>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm
>>>>>>>>>>> sessionTimeout=20000 
>>>>>>>>>>> watcher=com.netflix.curator.ConnectionState@4b492c47
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket
>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection
>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating
>>>>>>>>>>> session
>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session
>>>>>>>>>>> establishment complete on server *.compute.amazonaws.com/*:2181,
>>>>>>>>>>> sessionid = 0x24cb217be030072, negotiated timeout = 20000
>>>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer
>>>>>>>>>>> transport plugin:backtype.storm.messaging.zmq
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> counter:[2 2]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> wordGenerator:[11 11]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor wordGenerator:[11 11]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>> wordGenerator:(11)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> wordGenerator:[12 12]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor wordGenerator:[12 12]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>> wordGenerator:(12)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> wordGenerator:[13 13]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor wordGenerator:[13 13]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>> wordGenerator:(13)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> wordGenerator:[14 14]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor wordGenerator:[14 14]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [bertels]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>> wordGenerator:(14)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [mike]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [bertels]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> wordGenerator:[15 15]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor wordGenerator:[15 15]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout
>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout
>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout
>>>>>>>>>>> wordGenerator:(15)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> __system:[-1 -1]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system
>>>>>>>>>>> __system ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> __system:[-1 -1]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor __system:[-1 -1]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>>> __system:(-1)
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor
>>>>>>>>>>> __acker:[1 1]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [jackson]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [mike]
>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [jackson]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [jackson]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [golda]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>>> __system:(-1)
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system
>>>>>>>>>>> ["startup"]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks
>>>>>>>>>>> __acker:[1 1]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [jackson]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading
>>>>>>>>>>> executor __acker:[1 1]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread
>>>>>>>>>>> for bd340179-5905-4233-ad5d-44b47b228177:6702
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt
>>>>>>>>>>> __acker:(1)
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt
>>>>>>>>>>> __acker:(1)
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [nathan]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [bertels]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator
>>>>>>>>>>> default [golda]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda]
>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received
>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan]
>>>>>>>>>>>
>>>>>>>>>>> After that, the logs stop (no more messages) and the UI says
>>>>>>>>>>> that the 5 executors are now running on another node (and port) - 
>>>>>>>>>>> at which
>>>>>>>>>>> point the worker log is created on that node and with that number 
>>>>>>>>>>> (same
>>>>>>>>>>> messages).
>>>>>>>>>>>
>>>>>>>>>>> So the worker seems to start and begins emitting words, but the
>>>>>>>>>>> UI isn't registering anything as having been emitted. After a
>>>>>>>>>>> short period of time the worker seems to stop without logging
>>>>>>>>>>> anything, and so a new worker is started.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Matt
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Did you check the worker process logs?  What are the emit
>>>>>>>>>>>> counts for the components in the UI?
>>>>>>>>>>>>
>>>>>>>>>>>> Also I have not run on AWS personally but the following line
>>>>>>>>>>>> seems strange to me:
>>>>>>>>>>>>
>>>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>> compute.amazonaws.com"}
>>>>>>>>>>>>
>>>>>>>>>>>> Generally this would be bound to a specific machine, not
>>>>>>>>>>>> *.<domain>.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me
>>>>>>>>>>>>> headaches. At the moment I've got a situation where the cluster 
>>>>>>>>>>>>> (a nimbus
>>>>>>>>>>>>> node and 3 supervisors) will successfully run in local mode my 
>>>>>>>>>>>>> particular
>>>>>>>>>>>>> topology. However, as soon as I try to run the topology in 
>>>>>>>>>>>>> distributed mode
>>>>>>>>>>>>> there are no errors (that I can see) in any of the logs - the 
>>>>>>>>>>>>> topology
>>>>>>>>>>>>> appears in the UI but never shows any tuples emitted by the 
>>>>>>>>>>>>> spouts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to check it wasn't my topology I cloned the
>>>>>>>>>>>>> storm-starter project and have tried with the RollingTopWords 
>>>>>>>>>>>>> example -
>>>>>>>>>>>>> running locally I get a stream of output but as soon as I specify 
>>>>>>>>>>>>> that we
>>>>>>>>>>>>> should run in a distributed fashion I get a success message 
>>>>>>>>>>>>> (topology
>>>>>>>>>>>>> successfully submitted). The topology appears on the UI as 
>>>>>>>>>>>>> successfully
>>>>>>>>>>>>> submitted but I see no progress.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's a bit of a weird one as I don't know which logs to share
>>>>>>>>>>>>> to show the problem as the whole point is I'm not seeing any 
>>>>>>>>>>>>> errors!
>>>>>>>>>>>>>
>>>>>>>>>>>>> The last entry in the supervisor logs pertains to the set up
>>>>>>>>>>>>> of the supervisor (i.e. it's not logged anything about the 
>>>>>>>>>>>>> topologies that
>>>>>>>>>>>>> have been submitted)
>>>>>>>>>>>>>
>>>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more
>>>>>>>>>>>>> illuminating:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor
>>>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available
>>>>>>>>>>>>> slots: (["bd340179-5905-4233-ad5d-44b47b228177" 6702]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700]
>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703]
>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702]
>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701]
>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700]
>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703]
>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702]
>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701]
>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700])
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning
>>>>>>>>>>>>> production-topology-2-1428932653 to 1 slots
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors:
>>>>>>>>>>>>> [[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] 
>>>>>>>>>>>>> [12 12]
>>>>>>>>>>>>> [13 13] [14 14] [15 15] [1 1]]
>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment
>>>>>>>>>>>>> for topology id production-topology-2-1428932653:
>>>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir
>>>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", 
>>>>>>>>>>>>> :node->host
>>>>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*.
>>>>>>>>>>>>> compute.amazonaws.com"}, :executor->node+port {[2 2]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [3 3]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1]
>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, 
>>>>>>>>>>>>> :executor->start-time-secs
>>>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] 
>>>>>>>>>>>>> 1428934165, [6
>>>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] 
>>>>>>>>>>>>> 1428934165, [10
>>>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] 
>>>>>>>>>>>>> 1428934165,
>>>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}}
>>>>>>>>>>>>>
>>>>>>>>>>>>> If anybody can help that'd be greatly appreciated!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Check out my blog: dogdogfish.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Check out my blog: dogdogfish.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Check out my blog: dogdogfish.com
>>>>
>>>
>>>
>>
>>
>> --
>> Check out my blog: dogdogfish.com
>>
>
>


-- 
Check out my blog: dogdogfish.com
