Thanks for all your help so far - I've set up a cluster with m3.2xlarge EC2 instances (30GB RAM and 8 vCPUs each), which I'm hoping should be enough. I've added GC logging and have switched to using one worker per supervisor (supervisor.slots.ports has gone from 6700, 6701, 6702 and 6703 down to just 6700). I've also switched all of my Java heap declarations to the values used in the defaults (https://github.com/apache/storm/blob/master/conf/defaults.yaml), i.e. nimbus: 1024m, ui: 768m, supervisor: 256m, worker: 768m.
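For reference, this is roughly the relevant chunk of my storm.yaml as it now stands (typed from memory rather than copied off the box, so the GC log paths and the exact set of flags may not be character-for-character what's deployed - the worker flags below are based on what I see in the supervisor's "Launching worker with command" line, and %ID% is the placeholder Storm substitutes with the worker port):

supervisor.slots.ports:
    - 6700

nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Xmx256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/app/storm/gc-storm-supervisor.log"
worker.childopts: "-Xmx768m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/app/storm/gc-storm-worker-%ID%.log -Djava.net.preferIPv4Stack=true"

The idea is that each supervisor node now launches a single 768m worker JVM, with each daemon's heap matching the defaults.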
I'm still not getting any tuples emitted. The supervisor GC log:

Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 30824244k(27561384k free), swap 0k(0k free)
CommandLine flags: -XX:InitialHeapSize=268435456 -XX:MaxHeapSize=268435456 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseParallelGC
2.058: [GC [PSYoungGen: 66048K->10199K(76800K)] 66048K->10207K(251392K), 0.0158580 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
3.010: [GC [PSYoungGen: 76247K->10746K(76800K)] 76255K->11131K(251392K), 0.0171470 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
264.012: [GC [PSYoungGen: 76794K->7879K(76800K)] 77179K->8272K(251392K), 0.0114030 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]

The worker GC log:

Java HotSpot(TM) 64-Bit Server VM (24.76-b04) for linux-amd64 JRE (1.7.0_76-b13), built on Dec 18 2014 16:58:39 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 30824244k(27356064k free), swap 0k(0k free)
CommandLine flags: -XX:InitialHeapSize=493187904 -XX:MaxHeapSize=805306368 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseParallelGC
2.861: [GC [PSYoungGen: 120832K->16150K(140800K)] 120832K->16158K(461824K), 0.0233930 secs] [Times: user=0.07 sys=0.03, real=0.02 secs]

The supervisor log:

2015-04-14 10:04:24 b.s.d.supervisor [INFO] f5888c5a-d36f-4d6b-8b67-4cf20203efef still hasn't started
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down and clearing state for id f5888c5a-d36f-4d6b-8b67-4cf20203efef. Current supervisor time: 1429005898. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1429005865, :storm-id "production-topology-1-1429005259", :executors #{[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [-1 -1] [1 1]}, :port 6700}
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shutting down 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef
2015-04-14 10:04:58 b.s.util [INFO] Error when trying to kill 30755. Process is probably already dead.
2015-04-14 10:04:58 b.s.d.supervisor [INFO] Shut down 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e:f5888c5a-d36f-4d6b-8b67-4cf20203efef 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "production-topology-1-1429005259", :executors ([2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1 1])} for this supervisor 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e on port 6700 with id e7cfe011-56a4-42d4-8331-4cb055055799 2015-04-14 10:04:58 b.s.d.supervisor [INFO] Launching worker with command: java -server -Xmx768m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/app/storm/gc-storm-worker-6700.log -Djava.net.preferIPv4Stack=true -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dlogfile.name=worker-6700.log -Dstorm.home=/usr/local/storm-0.9.0-wip21 -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml -Dstorm.id=production-topology-1-1429005259 -Dworker.id=e7cfe011-56a4-42d4-8331-4cb055055799 -Dworker.port=6700 -cp /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/
storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-1-1429005259/stormjar.jar backtype.storm.daemon.worker production-topology-1-1429005259 5a1f1d44-4d4f-4309-9ed3-d753da1dda9e 6700 e7cfe011-56a4-42d4-8331-4cb055055799 2015-04-14 10:04:58 b.s.d.supervisor [INFO] e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started 2015-04-14 10:04:59 b.s.d.supervisor [INFO] e7cfe011-56a4-42d4-8331-4cb055055799 still hasn't started The worker log: https://gist.github.com/Kali89/2a3d3aea401f8d1bb195 The nimbus log: 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[2 2] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[3 3] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[4 4] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[5 5] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[6 6] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[7 7] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[8 8] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[9 9] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[10 10] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[11 11] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[12 12] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[13 13] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[14 14] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[15 15] not alive 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Executor production-topology-1-1429005259:[1 1] not alive 2015-04-14 10:11:27 b.s.s.EvenScheduler [INFO] Available slots: (["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700] ["5f1c9cea-3fcc-4d4e-aa9e-650d837e6a49" 6700]) 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassigning production-topology-1-1429005259 to 1 slots 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] [1 1]] 2015-04-14 10:11:27 b.s.d.nimbus [INFO] Setting new assignment for topology id production-topology-1-1429005259: #backtype.storm.daemon.common.Assignment{:master-code-dir "/app/storm/nimbus/stormdist/production-topology-1-1429005259", :node->host {"eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" " ec2-52-16-76-40.eu-west-1.compute.amazonaws.com"}, :executor->node+port {[2 2] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [3 3] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [4 4] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [5 5] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [6 6] 
["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [7 7] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [8 8] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [9 9] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [10 10] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [11 11] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [12 12] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [13 13] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [14 14] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [15 15] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700], [1 1] ["eb1f8e08-1171-4361-8f25-6fda6fa8e2ca" 6700]}, :executor->start-time-secs {[2 2] 1429006287, [3 3] 1429006287, [4 4] 1429006287, [5 5] 1429006287, [6 6] 1429006287, [7 7] 1429006287, [8 8] 1429006287, [9 9] 1429006287, [10 10] 1429006287, [11 11] 1429006287, [12 12] 1429006287, [13 13] 1429006287, [14 14] 1429006287, [15 15] 1429006287, [1 1] 1429006287}} Sorry for the data dump! For what it's worth - I'm now wondering how the supervisor and worker relationship works? With one 756MB worker per supervisor (node) I'm well below the machine limit but am I hitting another sort of limit? Does the supervisor daemon on the supervisor node have enough memory with 256MB? Is there a relationship between the worker memory and the supervisor memory? Thanks! On 13 April 2015 at 18:58, Nathan Leung <[email protected]> wrote: > No, the default is 4. If you are running multiple workers on the same > supervisor node, that could cause problems with 768 MB / worker and 1 GB / > supervisor host. > > On Mon, Apr 13, 2015 at 1:55 PM, Matthew Sharpe <[email protected]> > wrote: > >> I can only think that I'll be using whatever the default is - that's not >> a setting that I've changed anywhere or set on any command line. Am I right >> in thinking (hoping) that that'll default to one? If not, how do I set it >> for the RollingTopWords example? >> >> The cluster is being spun up now but will probably take half an hour or >> so before it's ready. I'll reply once it's all up and good to go. >> >> On 13 April 2015 at 18:40, Nathan Leung <[email protected]> wrote: >> >>> That looks about right. Are you running one worker per supervisor right >>> now? >>> >>> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe < >>> [email protected]> wrote: >>> >>>> I was just thinking about that - I'm using EC2 m3.medium instances, >>>> they're not especially RAM heavy (1GB). Am I right in thinking that to >>>> enable GC logging I'd just need to run Storm with the following options in >>>> my 'storm.yaml'? >>>> >>>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails >>>> -XX:+PrintGCTimeStamps -verbose:gc >>>> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log" >>>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails >>>> -XX:+PrintGCTimeStamps -verbose:gc >>>> -Xloggc:/app/storm/logs/gc-storm-supervisor.log" >>>> If so, I'll set up the same cluster but using m3.xlarge instances (15GB >>>> of RAM) with those settings. Either way, I'll start setting up the extra >>>> large nodes. >>>> >>>> >>>> On 13 April 2015 at 18:10, Nathan Leung <[email protected]> wrote: >>>> >>>>> Do you have GC logging enabled? You should have enough ram for each >>>>> worker (for this example) that it's not a problem, but doesn't hurt to >>>>> check. 
>>>>> >>>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe < >>>>> [email protected]> wrote: >>>>> >>>>>> Not long at all - certainly less than couple of minutes but longer >>>>>> than a few seconds, the couple I actually measured lasted 42 seconds >>>>>> followed by about 5 seconds. So the heartbeats folder behaves like so... >>>>>> >>>>>> /app/storm/workers/[uuid] is created with an empty heartbeats folder >>>>>> inside it >>>>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a >>>>>> couple of examples: >>>>>> >>>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:03 1428941026297.version >>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297 >>>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:03 1428941027240.version >>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240 >>>>>> >>>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:09 1428941398235.version >>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235 >>>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:09 1428941399163.version >>>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163 >>>>>> ) >>>>>> >>>>>> /app/storm/workers/[uuid] is deleted. >>>>>> >>>>>> I've not seen any heartbeats folder with any more than 2 heartbeats >>>>>> in. >>>>>> >>>>>> >>>>>> >>>>>> On 13 April 2015 at 17:00, Nathan Leung <[email protected]> wrote: >>>>>> >>>>>>> It only matters what user supervisor is run as, so you should be >>>>>>> ok. How long is it between the worker being started, and it getting >>>>>>> killed? What are the timestamps of the heartbeats in the corresponding >>>>>>> worker folder? >>>>>>> >>>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> I'm running storm as a (specially created) storm user - on both the >>>>>>>> nimbus and the supervisors the process is showing as being run by the >>>>>>>> storm >>>>>>>> user. And I've got the storm local directory set as /app/storm - owned >>>>>>>> by >>>>>>>> storm user and with permissions 700. Thought it might be due to me >>>>>>>> submitting the job as a different user but have just terminated the >>>>>>>> old job >>>>>>>> and submitted the new one as the storm user and still not getting any >>>>>>>> tuples coming through. >>>>>>>> >>>>>>>> For what it's worth, I've looked in the local data directory on the >>>>>>>> supervisors and in the workers folder there is a (non-empty) heartbeats >>>>>>>> folder being created - don't know if that's indicative of heartbeats >>>>>>>> being >>>>>>>> processed correctly or if that's created at the start of a worker >>>>>>>> process. >>>>>>>> >>>>>>>> Is there anything else you want to see that would help you diagnose >>>>>>>> this? I can share the UI with you if you think that'd help? >>>>>>>> >>>>>>>> Thanks! >>>>>>>> >>>>>>>> >>>>>>>> On 13 April 2015 at 16:22, Nathan Leung <[email protected]> wrote: >>>>>>>> >>>>>>>>> Sounds like it's having problems with heartbeats. As I recall >>>>>>>>> worker -> supervisor heartbeats are written using the local >>>>>>>>> filesystem. >>>>>>>>> What user are you running as? In your storm.yaml file, you will see >>>>>>>>> the >>>>>>>>> option "storm.local.dir". Does the user supervisor / workers are >>>>>>>>> running >>>>>>>>> as have permissions to write to this folder? 
>>>>>>>>> >>>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> New bonus bit of information - the supervisor log has got this >>>>>>>>>> going on: >>>>>>>>>> >>>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO] >>>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started >>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and >>>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current >>>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat: >>>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029, >>>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 >>>>>>>>>> 3] [4 >>>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 >>>>>>>>>> 14] >>>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703} >>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down >>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231 >>>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill >>>>>>>>>> 5284. Process is probably already dead. >>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down >>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231 >>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with >>>>>>>>>> assignment >>>>>>>>>> #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id >>>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 >>>>>>>>>> 5] [6 >>>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] >>>>>>>>>> [1 >>>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on >>>>>>>>>> port 6703 >>>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e >>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with >>>>>>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true >>>>>>>>>> -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib >>>>>>>>>> -Dlogfile.name=worker-6703.log >>>>>>>>>> -Dstorm.home=/usr/local/storm-0.9.0-wip21 >>>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml >>>>>>>>>> -Dstorm.id=production-topology-2-1428932653 >>>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 >>>>>>>>>> -cp >>>>>>>>>> 
/usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/local/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar >>>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653 >>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703 >>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e >>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] >>>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started >>>>>>>>>> >>>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> OK, in checking the worker 
process logs I've uncovered a bit >>>>>>>>>>> more information. The UI reports 0 emitted from the wordGenerator >>>>>>>>>>> spout, >>>>>>>>>>> along with 0 transferred, acked and failed. >>>>>>>>>>> >>>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (don't know the >>>>>>>>>>> usual way of doing that), I've got a bunch of other stuff on the >>>>>>>>>>> boxes and >>>>>>>>>>> in troubleshooting things out have been a bit laxer with security >>>>>>>>>>> than I >>>>>>>>>>> should have been! >>>>>>>>>>> >>>>>>>>>>> The UI shows 5 workers all starting on the same node - each of >>>>>>>>>>> which has a declared uptime of 1s. Looking at the logs on the node, >>>>>>>>>>> during >>>>>>>>>>> that time the following happens: >>>>>>>>>>> >>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib >>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>> environment:java.io.tmpdir=/tmp >>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>> environment:java.compiler=<NA> >>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>> environment:os.name=Linux >>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>> environment:os.arch=amd64 >>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>> environment:os.version=3.13.0-48-generic >>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>> environment:user.name=storm >>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>> environment:user.home=/app/home/storm >>>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>>> environment:user.dir=/ >>>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for >>>>>>>>>>> production-topology-2-1428932653 on >>>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id >>>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path" >>>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, >>>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60, >>>>>>>>>>> "topology.fall.back.on.java.serialization" true, >>>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" >>>>>>>>>>> 5000, >>>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" >>>>>>>>>>> "-Xmx768m >>>>>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" >>>>>>>>>>> 20000, >>>>>>>>>>> "nimbus.reassign" true, >>>>>>>>>>> "topology.trident.batch.emit.interval.millis" 500, >>>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", >>>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", >>>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" >>>>>>>>>>> "/app/storm", >>>>>>>>>>> "supervisor.worker.start.timeout.secs" 120, >>>>>>>>>>> "topology.enable.message.timeouts" true, >>>>>>>>>>> "nimbus.cleanup.inbox.freq.secs" >>>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" >>>>>>>>>>> 64, >>>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*. 
>>>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181, >>>>>>>>>>> "transactional.zookeeper.port" nil, >>>>>>>>>>> "topology.executor.receive.buffer.size" >>>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" >>>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, >>>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*. >>>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*. >>>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root" >>>>>>>>>>> "/transactional", "topology.acker.executors" nil, >>>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" >>>>>>>>>>> nil, >>>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m >>>>>>>>>>> -Djava.net.preferIPv4Stack=true", >>>>>>>>>>> "supervisor.heartbeat.frequency.secs" 5, >>>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, >>>>>>>>>>> "drpc.port" 3772, >>>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m", >>>>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" >>>>>>>>>>> 3, >>>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy" >>>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", >>>>>>>>>>> "topology.max.spout.pending" >>>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, " >>>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1, >>>>>>>>>>> "nimbus.topology.validator" >>>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", >>>>>>>>>>> "supervisor.slots.ports" >>>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, >>>>>>>>>>> "nimbus.task.launch.secs" >>>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, >>>>>>>>>>> "topology.message.timeout.secs" >>>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1, >>>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true", >>>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, >>>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" >>>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer", >>>>>>>>>>> "topology.disruptor.wait.strategy" >>>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", >>>>>>>>>>> "nimbus.task.timeout.secs" 30, >>>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" >>>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", >>>>>>>>>>> "drpc.invocations.port" >>>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, >>>>>>>>>>> "storm.zookeeper.retry.times" >>>>>>>>>>> 5, "storm.thrift.transport" >>>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin", >>>>>>>>>>> "topology.state.synchronization.timeout.secs" 60, >>>>>>>>>>> "supervisor.worker.timeout.secs" 30, >>>>>>>>>>> "nimbus.file.copy.expiration.secs" >>>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq", >>>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, >>>>>>>>>>> "ui.port" >>>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m >>>>>>>>>>> -Djava.net.preferIPv4Stack=true", >>>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true, >>>>>>>>>>> "topology.max.task.parallelism" nil} >>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] >>>>>>>>>>> Starting >>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client >>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*. 
>>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181 >>>>>>>>>>> sessionTimeout=20000 >>>>>>>>>>> watcher=com.netflix.curator.ConnectionState@585bd6a6 >>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket >>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181 >>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection >>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating >>>>>>>>>>> session >>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session >>>>>>>>>>> establishment complete on server *.compute.amazonaws.com/*:2181, >>>>>>>>>>> sessionid = 0x34cb217be05005e, negotiated timeout = 20000 >>>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update: >>>>>>>>>>> :connected:none >>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: >>>>>>>>>>> 0x34cb217be05005e closed >>>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] >>>>>>>>>>> Starting >>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client >>>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*. >>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm >>>>>>>>>>> sessionTimeout=20000 >>>>>>>>>>> watcher=com.netflix.curator.ConnectionState@4b492c47 >>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down >>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket >>>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181 >>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection >>>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating >>>>>>>>>>> session >>>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session >>>>>>>>>>> establishment complete on server *.compute.amazonaws.com/*:2181, >>>>>>>>>>> sessionid = 0x24cb217be030072, negotiated timeout = 20000 >>>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer >>>>>>>>>>> transport plugin:backtype.storm.messaging.zmq >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>> counter:[2 2] >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>>> wordGenerator:[11 11] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>> executor wordGenerator:[11 11] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>>> wordGenerator:(11) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>>> wordGenerator:(11) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>>> wordGenerator:(11) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>> wordGenerator:[12 12] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> __system ["startup"] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>>> wordGenerator:[12 12] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>> executor wordGenerator:[12 12] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>>> wordGenerator:(12) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>>> wordGenerator:(12) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>>> wordGenerator:(12) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>> wordGenerator:[13 13] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: 
wordGenerator >>>>>>>>>>> __system ["startup"] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>>> wordGenerator:[13 13] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>> executor wordGenerator:[13 13] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>>> wordGenerator:(13) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>>> wordGenerator:(13) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>>> wordGenerator:(13) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>> wordGenerator:[14 14] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> __system ["startup"] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>>> wordGenerator:[14 14] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>> executor wordGenerator:[14 14] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [bertels] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>>> wordGenerator:(14) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>>> wordGenerator:(14) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>>> wordGenerator:(14) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [mike] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [bertels] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>> wordGenerator:[15 15] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> __system ["startup"] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>>> wordGenerator:[15 15] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>> executor wordGenerator:[15 15] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>>> wordGenerator:(15) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>>> wordGenerator:(15) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>>> wordGenerator:(15) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>> __system:[-1 -1] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system >>>>>>>>>>> __system ["startup"] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>>> __system:[-1 -1] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>>> executor __system:[-1 -1] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt >>>>>>>>>>> __system:(-1) >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>>> __acker:[1 1] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [jackson] >>>>>>>>>>> 2015-04-13 14:58:51 
b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [mike] >>>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [jackson] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [jackson] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [golda] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt >>>>>>>>>>> __system:(-1) >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system >>>>>>>>>>> ["startup"] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>>> __acker:[1 1] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [jackson] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading >>>>>>>>>>> executor __acker:[1 1] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread >>>>>>>>>>> for bd340179-5905-4233-ad5d-44b47b228177:6702 >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt >>>>>>>>>>> __acker:(1) >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt >>>>>>>>>>> __acker:(1) >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [nathan] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [bertels] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>>> default [golda] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda] >>>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan] >>>>>>>>>>> >>>>>>>>>>> After that, the logs stop (no more messages) and the UI says >>>>>>>>>>> that the 5 executors are now running on another node (and port) - >>>>>>>>>>> at which >>>>>>>>>>> point the worker log is created on that node and with that number >>>>>>>>>>> (same >>>>>>>>>>> messages). >>>>>>>>>>> >>>>>>>>>>> So the worker seems to start and start emitting words but the UI >>>>>>>>>>> isn't registered anything as having been emitted. After a short >>>>>>>>>>> period of >>>>>>>>>>> time the worker seems to stop without logging anything and so a new >>>>>>>>>>> worker >>>>>>>>>>> is started. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Matt >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Did you check the worker process logs? What are the emit >>>>>>>>>>>> counts for the components in the UI? >>>>>>>>>>>> >>>>>>>>>>>> Also I have not run on AWS personally but the following line >>>>>>>>>>>> seems strange to me: >>>>>>>>>>>> >>>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*. 
>>>>>>>>>>>> compute.amazonaws.com"} >>>>>>>>>>>> >>>>>>>>>>>> Generally this would be bound to a specific machine, not >>>>>>>>>>>> *.<domain>. >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me >>>>>>>>>>>>> headaches. At the moment I've got a situation where the cluster >>>>>>>>>>>>> (a nimbus >>>>>>>>>>>>> node and 3 supervisors) will successfully run in local mode my >>>>>>>>>>>>> particular >>>>>>>>>>>>> topology. However, as soon as I try to run the topology in >>>>>>>>>>>>> distributed mode >>>>>>>>>>>>> there are no errors (that I can see) in any of the logs - the >>>>>>>>>>>>> topology >>>>>>>>>>>>> appears in the UI but never shows any tuples emitted by the >>>>>>>>>>>>> spouts. >>>>>>>>>>>>> >>>>>>>>>>>>> In order to check it wasn't my topology I cloned the >>>>>>>>>>>>> storm-starter project and have tried with the RollingTopWords >>>>>>>>>>>>> example - >>>>>>>>>>>>> running locally I get a stream of output but as soon as I specify >>>>>>>>>>>>> that we >>>>>>>>>>>>> should run in a distributed fashion I get a success message >>>>>>>>>>>>> (topology >>>>>>>>>>>>> successfully submitted). The topology appears on the UI as >>>>>>>>>>>>> successfully >>>>>>>>>>>>> submitted but I see no progress. >>>>>>>>>>>>> >>>>>>>>>>>>> It's a bit of a weird one as I don't know which logs to share >>>>>>>>>>>>> to show the problem as the whole point is I'm not seeing any >>>>>>>>>>>>> errors! >>>>>>>>>>>>> >>>>>>>>>>>>> The last entry in the supervisor logs pertains to the set up >>>>>>>>>>>>> of the supervisor (i.e. it's not logged anything about the >>>>>>>>>>>>> topologies that >>>>>>>>>>>>> have been submitted) >>>>>>>>>>>>> >>>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more >>>>>>>>>>>>> illuminating: >>>>>>>>>>>>> >>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive >>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive >>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive >>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available >>>>>>>>>>>>> slots: (["bd340179-5905-4233-ad5d-44b47b228177" 6702] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700] >>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703] >>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702] >>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701] >>>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700] >>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703] >>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702] >>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701] >>>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700]) >>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning >>>>>>>>>>>>> production-topology-2-1428932653 to 1 slots >>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: >>>>>>>>>>>>> [[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] >>>>>>>>>>>>> [12 12] >>>>>>>>>>>>> [13 13] [14 14] [15 15] [1 1]] >>>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment >>>>>>>>>>>>> for topology id 
production-topology-2-1428932653: >>>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir >>>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", >>>>>>>>>>>>> :node->host >>>>>>>>>>>>> {"bd340179-5905-4233-ad5d-44b47b228177" "*. >>>>>>>>>>>>> compute.amazonaws.com"}, :executor->node+port {[2 2] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [3 3] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1] >>>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, >>>>>>>>>>>>> :executor->start-time-secs >>>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] >>>>>>>>>>>>> 1428934165, [6 >>>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] >>>>>>>>>>>>> 1428934165, [10 >>>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] >>>>>>>>>>>>> 1428934165, >>>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}} >>>>>>>>>>>>> >>>>>>>>>>>>> If anybody can help that'd be greatly appreciated! >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Matt >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Check out my blog: dogdogfish.com >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Check out my blog: dogdogfish.com >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Check out my blog: dogdogfish.com >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Check out my blog: dogdogfish.com >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Check out my blog: dogdogfish.com >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Check out my blog: dogdogfish.com >>>> >>> >>> >> >> >> -- >> Check out my blog: dogdogfish.com >> > > -- Check out my blog: dogdogfish.com
