I've (just about) solved this. The problem was with reverse DNS lookup on my supervisor node; after adding hostname-to-IP mappings in /etc/hosts I was able to move on and hit a different error message: https://issues.apache.org/jira/browse/STORM-130
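In case it's useful to anyone else, the mappings I mean are just ordinary /etc/hosts entries on each node, so the supervisor can resolve the other nodes by name and map their IPs back to those names. The hostnames and private IPs below are placeholders, not my actual boxes:

10.0.0.11   storm-nimbus.example.internal        storm-nimbus
10.0.0.12   storm-supervisor1.example.internal   storm-supervisor1
10.0.0.13   storm-supervisor2.example.internal   storm-supervisor2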
I've updated my version of Storm (to 0.9.3), cleared the ZooKeeper and Storm local data directories, and am now able to run the RollingTopWords topology successfully on a cluster. Thanks for your help, Nathan :) Now to solve the latest problem... On 14 April 2015 at 11:30, Matthew Sharpe <[email protected]> wrote: > Additional bit at the end of the worker GC log: > > 2.551: [GC [PSYoungGen: 120832K->16195K(140800K)] > 120832K->16203K(461824K), 0.0217550 secs] [Times: user=0.09 sys=0.01, > real=0.02 secs] > Heap > PSYoungGen total 140800K, used 36644K [0x00000000f0000000, > 0x00000000f9d00000, 0x0000000100000000) > eden space 120832K, 16% used > [0x00000000f0000000,0x00000000f13f8428,0x00000000f7600000) > from space 19968K, 81% used > [0x00000000f7600000,0x00000000f85d0c90,0x00000000f8980000) > to space 19968K, 0% used > [0x00000000f8980000,0x00000000f8980000,0x00000000f9d00000) > ParOldGen total 321024K, used 8K [0x00000000d0000000, > 0x00000000e3980000, 0x00000000f0000000) > object space 321024K, 0% used > [0x00000000d0000000,0x00000000d0002000,0x00000000e3980000) > PSPermGen total 22528K, used 22196K [0x00000000cae00000, > 0x00000000cc400000, 0x00000000d0000000) > object space 22528K, 98% used > [0x00000000cae00000,0x00000000cc3ad218,0x00000000cc400000) > > On 14 April 2015 at 11:15, Matthew Sharpe <[email protected]> > wrote: > >> Thanks for all your help so far - I've set up a cluster with m3.2xlarge >> EC2 instances, 30GB RAM and 8 vCPUs which I'm hoping should be enough. I've >> added GC logging and have switched to using one worker per supervisor >> (have gone from: >> - 6700 >> - 6701 >> - 6702 >> - 6703 >> >> to >> >> - 6700). I've also switched all of my Java space declarations to the >> values used in here: >> https://github.com/apache/storm/blob/master/conf/defaults.yaml >> i.e. 
nimbus: 1024m, ui : 768m, supervisor: 256m, worker : 768m >> >> I'm still not getting any tuples emitted - the supervisor GC log: >> >> Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE >> (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0 >> 20080428 (Red Hat 4.3.0-8) >> Memory: 4k page, physical 30824244k(27561384k free), swap 0k(0k free) >> CommandLine flags: -XX:InitialHeapSize=268435456 >> -XX:MaxHeapSize=268435456 -XX:+PrintGC -XX:+PrintGCDetails >> -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseParallelGC >> 2.058: [GC [PSYoungGen: 66048K->10199K(76800K)] 66048K->10207K(251392K), >> 0.0158580 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] >> 3.010: [GC [PSYoungGen: 76247K->10746K(76800K)] 76255K->11131K(251392K), >> 0.0171470 secs] [Times: user=0.07 sys=0.01, real=0.02 secs] >> 264.012: [GC [PSYoungGen: 76794K->7879K(76800K)] 77179K->8272K(251392K), >> 0.0114030 secs] [Times: user=0.08 sys=0.00, real=0.01 secs] >> >> The worker GC log: >> >> Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE >> (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0 >> 20080428 (Red Hat 4.3.0-8) >> Memory: 4k page, physical 30824244k(27356064k free), swap 0k(0k free) >> CommandLine flags: -XX:InitialHeapSize=493187904 >> -XX:MaxHeapSize=805306368 -XX:+PrintGC -XX:+PrintGCDetails >> -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseParallelGC >> 2.861: [GC [PSYoungGen: 120832K->16150K(140800K)] >> 120832K->16158K(461824K), 0.0233930 secs] [Times: user=0.07 sys=0.03, >> real=0.02 secs] >> >> The supervisor log: >> >> 2015-04-14 10:04:24 b.s.d.supervisor [INFO] >> f5888c5a-d36f-4d6b-8b67-4cf20203efef still hasn't started >> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down and clearing >> state for id f5888c5a-d36f-4d6b-8b67-4cf20203efef. Current supervisor time: >> 1429005898. State: :timed-out, Heartbeat: >> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1429005865, >> :storm-id "production-topology-1-1429005259", :executors #{[2 2] [3 3] [4 >> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] >> [15 15] [-1 -1] [1 1]}, :port 6700} >> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down >> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef >> 2015-04-14 10:04:58 b.s.util [INFO] Error when trying to kill 30755. >> Process is probably already dead. 
>> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shut down >> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef >> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with >> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id >> "production-topology-1-1429005259", :executors ([2 2] [3 3] [4 4] [5 5] [6 >> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1 >> 1])} for this supervisor 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e on port 6700 >> with id e7cfe011-56a4-42d4-8331-4cb055055799 >> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with >> command: java -server -Xmx768m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps >> -verbose:gc -Xloggc:/app/storm/gc-storm-worker-6700.log >> -Djava.net.preferIPv4Stack=true >> -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib >> -Dlogfile.name=worker-6700.log -Dstorm.home=/usr/local/storm-0.9.0-wip21 >> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml >> -Dstorm.id=production-topology-1-1429005259 >> -Dworker.id=e7cfe011-56a4-42d4-8331-4cb055055799 -Dworker.port=6700 -cp >> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/s
torm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-1-1429005259/stormjar.jar >> backtype.storm.daemon.worker production-topology-1-1429005259 >> 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e 6700 >> e7cfe011-56a4-42d4-8331-4cb055055799 >> 2015-04-14 10:04:58 b.s.d.supervisor [INFO] >> e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started >> 2015-04-14 10:04:59 b.s.d.supervisor [INFO] >> e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started >> >> The worker log: https://gist.github.com/Kali89/2a3d3aea401f8d1bb195 >> >> The nimbus log: >> >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[2 2] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[3 3] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[4 4] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[5 5] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[6 6] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[7 7] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[8 8] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[9 9] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[10 10] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[11 11] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[12 12] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[13 13] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[14 14] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[15 15] not alive >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor >> production-topology-1-1429005259:[1 1] not alive >> 2015-04-14 10:11:27 b.s.s.EvenScheduler [INFO] Available slots: >> (["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700] >> ["5f1c9cea-3fcc-4d4e-aa9e-650d837e6a49" 6700]) >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassigning >> production-topology-1-1429005259 to 1 slots >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3] >> [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] >> [15 15] [1 1]] >> 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Setting new assignment for >> topology id production-topology-1-1429005259: >> #backtype.storm.daemon.common.Assignment{:master-code-dir >> "/app/storm/nimbus/stormdist/production-topology-1-1429005259", :node->host >> {"eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" " >> ec2-52-16-76-40.eu-west-1.compute.amazonaws.com"}, :executor->node+port >> {[2 
2] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [3 3] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [4 4] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [5 5] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [6 6] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [7 7] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [8 8] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [9 9] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [10 10] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [11 11] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [12 12] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [13 13] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [14 14] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [15 15] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [1 1] >> ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]}, :executor->start-time-secs >> {[2 2] 1429006287, [3 3] 1429006287, [4 4] 1429006287, [5 5] 1429006287, [6 >> 6] 1429006287, [7 7] 1429006287, [8 8] 1429006287, [9 9] 1429006287, [10 >> 10] 1429006287, [11 11] 1429006287, [12 12] 1429006287, [13 13] 1429006287, >> [14 14] 1429006287, [15 15] 1429006287, [1 1] 1429006287}} >> >> Sorry for the data dump! For what it's worth - I'm now wondering how the >> supervisor and worker relationship works? With one 756MB worker per >> supervisor (node) I'm well below the machine limit but am I hitting another >> sort of limit? Does the supervisor daemon on the supervisor node have >> enough memory with 256MB? Is there a relationship between the worker memory >> and the supervisor memory? >> >> Thanks! >> >> >> On 13 April 2015 at 18:58, Nathan Leung <[email protected]> wrote: >> >>> No, the default is 4. If you are running multiple workers on the same >>> supervisor node, that could cause problems with 768 MB / worker and 1 GB / >>> supervisor host. >>> >>> On Mon, Apr 13, 2015 at 1:55 PM, Matthew Sharpe < >>> [email protected]> wrote: >>> >>>> I can only think that I'll be using whatever the default is - that's >>>> not a setting that I've changed anywhere or set on any command line. Am I >>>> right in thinking (hoping) that that'll default to one? If not, how do I >>>> set it for the RollingTopWords example? >>>> >>>> The cluster is being spun up now but will probably take half an hour or >>>> so before it's ready. I'll reply once it's all up and good to go. >>>> >>>> On 13 April 2015 at 18:40, Nathan Leung <[email protected]> wrote: >>>> >>>>> That looks about right. Are you running one worker per supervisor >>>>> right now? >>>>> >>>>> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe < >>>>> [email protected]> wrote: >>>>> >>>>>> I was just thinking about that - I'm using EC2 m3.medium instances, >>>>>> they're not especially RAM heavy (1GB). Am I right in thinking that to >>>>>> enable GC logging I'd just need to run Storm with the following options >>>>>> in >>>>>> my 'storm.yaml'? >>>>>> >>>>>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails >>>>>> -XX:+PrintGCTimeStamps -verbose:gc >>>>>> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log" >>>>>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails >>>>>> -XX:+PrintGCTimeStamps -verbose:gc >>>>>> -Xloggc:/app/storm/logs/gc-storm-supervisor.log" >>>>>> If so, I'll set up the same cluster but using m3.xlarge instances >>>>>> (15GB of RAM) with those settings. Either way, I'll start setting up the >>>>>> extra large nodes. 
>>>>>> >>>>>> >>>>>> On 13 April 2015 at 18:10, Nathan Leung <[email protected]> wrote: >>>>>> >>>>>>> Do you have GC logging enabled? You should have enough ram for each >>>>>>> worker (for this example) that it's not a problem, but doesn't hurt to >>>>>>> check. >>>>>>> >>>>>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Not long at all - certainly less than couple of minutes but longer >>>>>>>> than a few seconds, the couple I actually measured lasted 42 seconds >>>>>>>> followed by about 5 seconds. So the heartbeats folder behaves like >>>>>>>> so... >>>>>>>> >>>>>>>> /app/storm/workers/[uuid] is created with an empty heartbeats >>>>>>>> folder inside it >>>>>>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a >>>>>>>> couple of examples: >>>>>>>> >>>>>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:03 1428941026297.version >>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297 >>>>>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:03 1428941027240.version >>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240 >>>>>>>> >>>>>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:09 1428941398235.version >>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235 >>>>>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:09 1428941399163.version >>>>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163 >>>>>>>> ) >>>>>>>> >>>>>>>> /app/storm/workers/[uuid] is deleted. >>>>>>>> >>>>>>>> I've not seen any heartbeats folder with any more than 2 heartbeats >>>>>>>> in. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 13 April 2015 at 17:00, Nathan Leung <[email protected]> wrote: >>>>>>>> >>>>>>>>> It only matters what user supervisor is run as, so you should be >>>>>>>>> ok. How long is it between the worker being started, and it getting >>>>>>>>> killed? What are the timestamps of the heartbeats in the >>>>>>>>> corresponding >>>>>>>>> worker folder? >>>>>>>>> >>>>>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> I'm running storm as a (specially created) storm user - on both >>>>>>>>>> the nimbus and the supervisors the process is showing as being run >>>>>>>>>> by the >>>>>>>>>> storm user. And I've got the storm local directory set as /app/storm >>>>>>>>>> - >>>>>>>>>> owned by storm user and with permissions 700. Thought it might be >>>>>>>>>> due to me >>>>>>>>>> submitting the job as a different user but have just terminated the >>>>>>>>>> old job >>>>>>>>>> and submitted the new one as the storm user and still not getting any >>>>>>>>>> tuples coming through. >>>>>>>>>> >>>>>>>>>> For what it's worth, I've looked in the local data directory on >>>>>>>>>> the supervisors and in the workers folder there is a (non-empty) >>>>>>>>>> heartbeats >>>>>>>>>> folder being created - don't know if that's indicative of heartbeats >>>>>>>>>> being >>>>>>>>>> processed correctly or if that's created at the start of a worker >>>>>>>>>> process. >>>>>>>>>> >>>>>>>>>> Is there anything else you want to see that would help you >>>>>>>>>> diagnose this? I can share the UI with you if you think that'd help? >>>>>>>>>> >>>>>>>>>> Thanks! >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 13 April 2015 at 16:22, Nathan Leung <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Sounds like it's having problems with heartbeats. As I recall >>>>>>>>>>> worker -> supervisor heartbeats are written using the local >>>>>>>>>>> filesystem. >>>>>>>>>>> What user are you running as? 
In your storm.yaml file, you will >>>>>>>>>>> see the >>>>>>>>>>> option "storm.local.dir". Does the user supervisor / workers are >>>>>>>>>>> running >>>>>>>>>>> as have permissions to write to this folder? >>>>>>>>>>> >>>>>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> New bonus bit of information - the supervisor log has got this >>>>>>>>>>>> going on: >>>>>>>>>>>> >>>>>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO] >>>>>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started >>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and >>>>>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current >>>>>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat: >>>>>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs >>>>>>>>>>>> 1428938029, >>>>>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] >>>>>>>>>>>> [3 3] [4 >>>>>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] >>>>>>>>>>>> [14 14] >>>>>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703} >>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down >>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231 >>>>>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill >>>>>>>>>>>> 5284. Process is probably already dead. >>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down >>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231 >>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker >>>>>>>>>>>> with assignment >>>>>>>>>>>> #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id >>>>>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] >>>>>>>>>>>> [5 5] [6 >>>>>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 >>>>>>>>>>>> 15] [1 >>>>>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on >>>>>>>>>>>> port 6703 >>>>>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e >>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker >>>>>>>>>>>> with command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true >>>>>>>>>>>> -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib >>>>>>>>>>>> -Dlogfile.name=worker-6703.log >>>>>>>>>>>> -Dstorm.home=/usr/local/storm-0.9.0-wip21 >>>>>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml >>>>>>>>>>>> -Dstorm.id=production-topology-2-1428932653 >>>>>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e >>>>>>>>>>>> -Dworker.port=6703 -cp >>>>>>>>>>>> 
/usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar >>>>>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653 >>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703 >>>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e >>>>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] >>>>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started >>>>>>>>>>>> >>>>>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> OK, in 
checking the worker process logs I've uncovered a bit >>>>>>>>>>>>> more information. The UI reports 0 emitted from the wordGenerator >>>>>>>>>>>>> spout, >>>>>>>>>>>>> along with 0 transferred, acked and failed. >>>>>>>>>>>>> >>>>>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (don't know >>>>>>>>>>>>> the usual way of doing that), I've got a bunch of other stuff on >>>>>>>>>>>>> the boxes >>>>>>>>>>>>> and in troubleshooting things out have been a bit laxer with >>>>>>>>>>>>> security than >>>>>>>>>>>>> I should have been! >>>>>>>>>>>>> >>>>>>>>>>>>> The UI shows 5 workers all starting on the same node - each of >>>>>>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the >>>>>>>>>>>>> node, during >>>>>>>>>>>>> that time the following happens: >>>>>>>>>>>>> >>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib >>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>>>> environment:java.io.tmpdir=/tmp >>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>>>> environment:java.compiler=<NA> >>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>>>> environment:os.name=Linux >>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>>>> environment:os.arch=amd64 >>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>>>> environment:os.version=3.13.0-48-generic >>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>>>> environment:user.name=storm >>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>>>> environment:user.home=/app/home/storm >>>>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>>>> environment:user.dir=/ >>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for >>>>>>>>>>>>> production-topology-2-1428932653 on >>>>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id >>>>>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf >>>>>>>>>>>>> {"dev.zookeeper.path" >>>>>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, >>>>>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60, >>>>>>>>>>>>> "topology.fall.back.on.java.serialization" true, >>>>>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" >>>>>>>>>>>>> 5000, >>>>>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" >>>>>>>>>>>>> "-Xmx768m >>>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", >>>>>>>>>>>>> "storm.zookeeper.session.timeout" 20000, >>>>>>>>>>>>> "nimbus.reassign" true, >>>>>>>>>>>>> "topology.trident.batch.emit.interval.millis" 500, >>>>>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", >>>>>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", >>>>>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" >>>>>>>>>>>>> "/app/storm", >>>>>>>>>>>>> "supervisor.worker.start.timeout.secs" 120, >>>>>>>>>>>>> "topology.enable.message.timeouts" true, >>>>>>>>>>>>> "nimbus.cleanup.inbox.freq.secs" >>>>>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, >>>>>>>>>>>>> "drpc.worker.threads" 64, >>>>>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*. 
>>>>>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181, >>>>>>>>>>>>> "transactional.zookeeper.port" nil, >>>>>>>>>>>>> "topology.executor.receive.buffer.size" >>>>>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, >>>>>>>>>>>>> "storm.zookeeper.root" >>>>>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, >>>>>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*. >>>>>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*. >>>>>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root" >>>>>>>>>>>>> "/transactional", "topology.acker.executors" nil, >>>>>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" >>>>>>>>>>>>> nil, >>>>>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m >>>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", >>>>>>>>>>>>> "supervisor.heartbeat.frequency.secs" 5, >>>>>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, >>>>>>>>>>>>> "drpc.port" 3772, >>>>>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" >>>>>>>>>>>>> "-Xmx768m", >>>>>>>>>>>>> "topology.receiver.buffer.size" 8, >>>>>>>>>>>>> "task.heartbeat.frequency.secs" 3, >>>>>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy" >>>>>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", >>>>>>>>>>>>> "topology.max.spout.pending" >>>>>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, " >>>>>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1, >>>>>>>>>>>>> "nimbus.topology.validator" >>>>>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", >>>>>>>>>>>>> "supervisor.slots.ports" >>>>>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, >>>>>>>>>>>>> "nimbus.task.launch.secs" >>>>>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, >>>>>>>>>>>>> "topology.message.timeout.secs" >>>>>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1, >>>>>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true", >>>>>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, >>>>>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" >>>>>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer", >>>>>>>>>>>>> "topology.disruptor.wait.strategy" >>>>>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", >>>>>>>>>>>>> "nimbus.task.timeout.secs" 30, >>>>>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, >>>>>>>>>>>>> "topology.kryo.factory" >>>>>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", >>>>>>>>>>>>> "drpc.invocations.port" >>>>>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, >>>>>>>>>>>>> "storm.zookeeper.retry.times" >>>>>>>>>>>>> 5, "storm.thrift.transport" >>>>>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin", >>>>>>>>>>>>> "topology.state.synchronization.timeout.secs" 60, >>>>>>>>>>>>> "supervisor.worker.timeout.secs" 30, >>>>>>>>>>>>> "nimbus.file.copy.expiration.secs" >>>>>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq", >>>>>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, >>>>>>>>>>>>> "ui.port" >>>>>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m >>>>>>>>>>>>> -Djava.net.preferIPv4Stack=true", >>>>>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true, >>>>>>>>>>>>> "topology.max.task.parallelism" nil} >>>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] >>>>>>>>>>>>> Starting >>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client >>>>>>>>>>>>> connection, 
connectString=*.compute.amazonaws.com:2181,*. >>>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181 >>>>>>>>>>>>> sessionTimeout=20000 >>>>>>>>>>>>> watcher=com.netflix.curator.ConnectionState@585bd6a6 >>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket >>>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181 >>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection >>>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating >>>>>>>>>>>>> session >>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session >>>>>>>>>>>>> establishment complete on server *. >>>>>>>>>>>>> compute.amazonaws.com/*:2181, sessionid = 0x34cb217be05005e, >>>>>>>>>>>>> negotiated timeout = 20000 >>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state >>>>>>>>>>>>> update: :connected:none >>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: >>>>>>>>>>>>> 0x34cb217be05005e closed >>>>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] >>>>>>>>>>>>> Starting >>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client >>>>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*. >>>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm >>>>>>>>>>>>> sessionTimeout=20000 >>>>>>>>>>>>> watcher=com.netflix.curator.ConnectionState@4b492c47 >>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut >>>>>>>>>>>>> down >>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket >>>>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181 >>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection >>>>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating >>>>>>>>>>>>> session >>>>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session >>>>>>>>>>>>> establishment complete on server *. 
>>>>>>>>>>>>> compute.amazonaws.com/*:2181, sessionid = 0x24cb217be030072, >>>>>>>>>>>>> negotiated timeout = 20000 >>>>>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer >>>>>>>>>>>>> transport plugin:backtype.storm.messaging.zmq >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>>>> counter:[2 2] >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor >>>>>>>>>>>>> tasks wordGenerator:[11 11] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>>>> executor wordGenerator:[11 11] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>>>>> wordGenerator:(11) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>>>>> wordGenerator:(11) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>>>>> wordGenerator:(11) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>>>> wordGenerator:[12 12] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> __system ["startup"] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor >>>>>>>>>>>>> tasks wordGenerator:[12 12] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>>>> executor wordGenerator:[12 12] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>>>>> wordGenerator:(12) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>>>>> wordGenerator:(12) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>>>>> wordGenerator:(12) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>>>> wordGenerator:[13 13] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> __system ["startup"] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor >>>>>>>>>>>>> tasks wordGenerator:[13 13] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>>>> executor wordGenerator:[13 13] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>>>>> wordGenerator:(13) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>>>>> wordGenerator:(13) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>>>>> wordGenerator:(13) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>>>> wordGenerator:[14 14] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> __system ["startup"] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor >>>>>>>>>>>>> tasks wordGenerator:[14 14] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>>>> executor wordGenerator:[14 14] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> default [bertels] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>>>>> wordGenerator:(14) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>>>>> wordGenerator:(14) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>>>>> wordGenerator:(14) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, >>>>>>>>>>>>> [bertels] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: 
wordGenerator >>>>>>>>>>>>> default [mike] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> default [bertels] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, >>>>>>>>>>>>> [bertels] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>>>> wordGenerator:[15 15] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> __system ["startup"] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor >>>>>>>>>>>>> tasks wordGenerator:[15 15] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>>>> executor wordGenerator:[15 15] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>>>>> wordGenerator:(15) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>>>>> wordGenerator:(15) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>>>>> wordGenerator:(15) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>>>> __system:[-1 -1] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system >>>>>>>>>>>>> __system ["startup"] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor >>>>>>>>>>>>> tasks __system:[-1 -1] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>>>> executor __system:[-1 -1] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt >>>>>>>>>>>>> __system:(-1) >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>>>> __acker:[1 1] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> default [jackson] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> default [mike] >>>>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, >>>>>>>>>>>>> [jackson] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> default [jackson] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, >>>>>>>>>>>>> [jackson] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> default [golda] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt >>>>>>>>>>>>> __system:(-1) >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker >>>>>>>>>>>>> __system ["startup"] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor >>>>>>>>>>>>> tasks __acker:[1 1] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> default [jackson] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading >>>>>>>>>>>>> executor __acker:[1 1] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching 
>>>>>>>>>>>>> receive-thread for bd340179-5905-4233-ad5d-44b47b228177:6702 >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt >>>>>>>>>>>>> __acker:(1) >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt >>>>>>>>>>>>> __acker:(1) >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> default [nathan] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> default [bertels] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>>>> default [golda] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda] >>>>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, >>>>>>>>>>>>> [nathan] >>>>>>>>>>>>> >>>>>>>>>>>>> After that, the logs stop (no more messages) and the UI says >>>>>>>>>>>>> that the 5 executors are now running on another node (and port) - >>>>>>>>>>>>> at which >>>>>>>>>>>>> point the worker log is created on that node and with that number >>>>>>>>>>>>> (same >>>>>>>>>>>>> messages). >>>>>>>>>>>>> >>>>>>>>>>>>> So the worker seems to start and start emitting words but the >>>>>>>>>>>>> UI isn't registered anything as having been emitted. After a >>>>>>>>>>>>> short period >>>>>>>>>>>>> of time the worker seems to stop without logging anything and so >>>>>>>>>>>>> a new >>>>>>>>>>>>> worker is started. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Matt >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Did you check the worker process logs? What are the emit >>>>>>>>>>>>>> counts for the components in the UI? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Also I have not run on AWS personally but the following line >>>>>>>>>>>>>> seems strange to me: >>>>>>>>>>>>>> >>>>>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*. >>>>>>>>>>>>>> compute.amazonaws.com"} >>>>>>>>>>>>>> >>>>>>>>>>>>>> Generally this would be bound to a specific machine, not >>>>>>>>>>>>>> *.<domain>. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me >>>>>>>>>>>>>>> headaches. At the moment I've got a situation where the cluster >>>>>>>>>>>>>>> (a nimbus >>>>>>>>>>>>>>> node and 3 supervisors) will successfully run in local mode my >>>>>>>>>>>>>>> particular >>>>>>>>>>>>>>> topology. However, as soon as I try to run the topology in >>>>>>>>>>>>>>> distributed mode >>>>>>>>>>>>>>> there are no errors (that I can see) in any of the logs - the >>>>>>>>>>>>>>> topology >>>>>>>>>>>>>>> appears in the UI but never shows any tuples emitted by the >>>>>>>>>>>>>>> spouts. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In order to check it wasn't my topology I cloned the >>>>>>>>>>>>>>> storm-starter project and have tried with the RollingTopWords >>>>>>>>>>>>>>> example - >>>>>>>>>>>>>>> running locally I get a stream of output but as soon as I >>>>>>>>>>>>>>> specify that we >>>>>>>>>>>>>>> should run in a distributed fashion I get a success message >>>>>>>>>>>>>>> (topology >>>>>>>>>>>>>>> successfully submitted). 
The topology appears on the UI as >>>>>>>>>>>>>>> successfully >>>>>>>>>>>>>>> submitted but I see no progress. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It's a bit of a weird one as I don't know which logs to >>>>>>>>>>>>>>> share to show the problem as the whole point is I'm not seeing >>>>>>>>>>>>>>> any errors! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The last entry in the supervisor logs pertains to the set up >>>>>>>>>>>>>>> of the supervisor (i.e. it's not logged anything about the >>>>>>>>>>>>>>> topologies that >>>>>>>>>>>>>>> have been submitted) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more >>>>>>>>>>>>>>> illuminating: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>>>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive >>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>>>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive >>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>>>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive >>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available >>>>>>>>>>>>>>> slots: (["bd340179-5905-4233-ad5d-44b47b228177" 6702] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700] >>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703] >>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702] >>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701] >>>>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700] >>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703] >>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702] >>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701] >>>>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700]) >>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning >>>>>>>>>>>>>>> production-topology-2-1428932653 to 1 slots >>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: >>>>>>>>>>>>>>> [[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 >>>>>>>>>>>>>>> 11] [12 12] >>>>>>>>>>>>>>> [13 13] [14 14] [15 15] [1 1]] >>>>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new >>>>>>>>>>>>>>> assignment for topology id production-topology-2-1428932653: >>>>>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir >>>>>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", >>>>>>>>>>>>>>> :node->host >>>>>>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*. 
>>>>>>>>>>>>>>> compute.amazonaws.com"}, :executor->node+port {[2 2] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [3 3] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1] >>>>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, >>>>>>>>>>>>>>> :executor->start-time-secs >>>>>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] >>>>>>>>>>>>>>> 1428934165, [6 >>>>>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] >>>>>>>>>>>>>>> 1428934165, [10 >>>>>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] >>>>>>>>>>>>>>> 1428934165, >>>>>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}} >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If anybody can help that'd be greatly appreciated! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Check out my blog: dogdogfish.com >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Check out my blog: dogdogfish.com >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Check out my blog: dogdogfish.com >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Check out my blog: dogdogfish.com >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Check out my blog: dogdogfish.com >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Check out my blog: dogdogfish.com >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Check out my blog: dogdogfish.com >>>> >>> >>> >> >> >> -- >> Check out my blog: dogdogfish.com >> > > > > -- > Check out my blog: dogdogfish.com > -- Check out my blog: dogdogfish.com
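For reference, the storm.yaml settings discussed in this thread (a single worker slot per supervisor, the heap sizes from defaults.yaml, and GC logging for the supervisor and worker JVMs) add up to roughly the sketch below. It is assembled from the values quoted in the messages above and is only illustrative, not a copy of the actual cluster's configuration:

storm.local.dir: "/app/storm"

# one worker slot per supervisor node
supervisor.slots.ports:
    - 6700

# heap sizes as in conf/defaults.yaml (nimbus 1024m, ui 768m, supervisor 256m, worker 768m)
nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Xmx256m -XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/app/storm/logs/gc-storm-supervisor.log"
# %ID% expands to the worker port, cf. the gc-storm-worker-6700.log path in the launch command quoted above
worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log"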
