No, the default is 4 worker slots per supervisor (supervisor.slots.ports: 6700-6703). If multiple workers end up on the same supervisor node that is a problem: two 768 MB worker heaps alone already exceed the 1 GB on an m3.medium supervisor host.
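For a 1 GB m3.medium, one option is to cut the supervisor down to a single slot and shrink the worker heap in storm.yaml. This is only a sketch; the 512m heap is an assumption, so size it to whatever the host can actually spare alongside the supervisor daemon and the OS:

supervisor.slots.ports:
    - 6700
worker.childopts: "-Xmx512m -Djava.net.preferIPv4Stack=true"   # keep your GC logging flags here as well

Note this is separate from how many workers the topology itself requests (topology.workers, which shows as 1 in the config dump below); the four default ports are what allow up to four worker JVMs per supervisor. The supervisors would need a restart to pick up the change.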
On Mon, Apr 13, 2015 at 1:55 PM, Matthew Sharpe <[email protected]> wrote: > I can only think that I'll be using whatever the default is - that's not a > setting that I've changed anywhere or set on any command line. Am I right > in thinking (hoping) that that'll default to one? If not, how do I set it > for the RollingTopWords example? > > The cluster is being spun up now but will probably take half an hour or so > before it's ready. I'll reply once it's all up and good to go. > > On 13 April 2015 at 18:40, Nathan Leung <[email protected]> wrote: > >> That looks about right. Are you running one worker per supervisor right >> now? >> >> On Mon, Apr 13, 2015 at 1:33 PM, Matthew Sharpe <[email protected] >> > wrote: >> >>> I was just thinking about that - I'm using EC2 m3.medium instances, >>> they're not especially RAM heavy (1GB). Am I right in thinking that to >>> enable GC logging I'd just need to run Storm with the following options in >>> my 'storm.yaml'? >>> >>> worker.childopts: "-Xmx768m -XX:MaxPermSize=256m -XX:+PrintGCDetails >>> -XX:+PrintGCTimeStamps -verbose:gc >>> -Xloggc:/app/storm/logs/gc-storm-worker-%ID%.log" >>> supervisor.childopts: "-XX:MaxPermSize=256m -XX:+PrintGCDetails >>> -XX:+PrintGCTimeStamps -verbose:gc >>> -Xloggc:/app/storm/logs/gc-storm-supervisor.log" >>> If so, I'll set up the same cluster but using m3.xlarge instances (15GB >>> of RAM) with those settings. Either way, I'll start setting up the extra >>> large nodes. >>> >>> >>> On 13 April 2015 at 18:10, Nathan Leung <[email protected]> wrote: >>> >>>> Do you have GC logging enabled? You should have enough ram for each >>>> worker (for this example) that it's not a problem, but doesn't hurt to >>>> check. >>>> >>>> On Mon, Apr 13, 2015 at 12:12 PM, Matthew Sharpe < >>>> [email protected]> wrote: >>>> >>>>> Not long at all - certainly less than couple of minutes but longer >>>>> than a few seconds, the couple I actually measured lasted 42 seconds >>>>> followed by about 5 seconds. So the heartbeats folder behaves like so... >>>>> >>>>> /app/storm/workers/[uuid] is created with an empty heartbeats folder >>>>> inside it >>>>> /app/storm/workers/[uuid]/heartbeats has two timestamps added (a >>>>> couple of examples: >>>>> >>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:03 1428941026297.version >>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941026297 >>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:03 1428941027240.version >>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:03 1428941027240 >>>>> >>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:09 1428941398235.version >>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941398235 >>>>> -rw-r--r-- 1 storm storm 0 Apr 13 16:09 1428941399163.version >>>>> -rw-r--r-- 1 storm storm 2.2K Apr 13 16:09 1428941399163 >>>>> ) >>>>> >>>>> /app/storm/workers/[uuid] is deleted. >>>>> >>>>> I've not seen any heartbeats folder with any more than 2 heartbeats in. >>>>> >>>>> >>>>> >>>>> On 13 April 2015 at 17:00, Nathan Leung <[email protected]> wrote: >>>>> >>>>>> It only matters what user supervisor is run as, so you should be ok. >>>>>> How long is it between the worker being started, and it getting killed? >>>>>> What are the timestamps of the heartbeats in the corresponding worker >>>>>> folder? >>>>>> >>>>>> On Mon, Apr 13, 2015 at 11:47 AM, Matthew Sharpe < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> I'm running storm as a (specially created) storm user - on both the >>>>>>> nimbus and the supervisors the process is showing as being run by the >>>>>>> storm >>>>>>> user. 
And I've got the storm local directory set as /app/storm - owned >>>>>>> by >>>>>>> storm user and with permissions 700. Thought it might be due to me >>>>>>> submitting the job as a different user but have just terminated the old >>>>>>> job >>>>>>> and submitted the new one as the storm user and still not getting any >>>>>>> tuples coming through. >>>>>>> >>>>>>> For what it's worth, I've looked in the local data directory on the >>>>>>> supervisors and in the workers folder there is a (non-empty) heartbeats >>>>>>> folder being created - don't know if that's indicative of heartbeats >>>>>>> being >>>>>>> processed correctly or if that's created at the start of a worker >>>>>>> process. >>>>>>> >>>>>>> Is there anything else you want to see that would help you diagnose >>>>>>> this? I can share the UI with you if you think that'd help? >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> >>>>>>> On 13 April 2015 at 16:22, Nathan Leung <[email protected]> wrote: >>>>>>> >>>>>>>> Sounds like it's having problems with heartbeats. As I recall >>>>>>>> worker -> supervisor heartbeats are written using the local filesystem. >>>>>>>> What user are you running as? In your storm.yaml file, you will see >>>>>>>> the >>>>>>>> option "storm.local.dir". Does the user supervisor / workers are >>>>>>>> running >>>>>>>> as have permissions to write to this folder? >>>>>>>> >>>>>>>> On Mon, Apr 13, 2015 at 11:17 AM, Matthew Sharpe < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> New bonus bit of information - the supervisor log has got this >>>>>>>>> going on: >>>>>>>>> >>>>>>>>> 2015-04-13 15:13:48 b.s.d.supervisor [INFO] >>>>>>>>> 3b092266-c3a0-499b-b4e9-bd2c895c5231 still hasn't started >>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down and >>>>>>>>> clearing state for id 3b092266-c3a0-499b-b4e9-bd2c895c5231. Current >>>>>>>>> supervisor time: 1428938060. State: :timed-out, Heartbeat: >>>>>>>>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1428938029, >>>>>>>>> :storm-id "production-topology-2-1428932653", :executors #{[2 2] [3 >>>>>>>>> 3] [4 >>>>>>>>> 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 >>>>>>>>> 14] >>>>>>>>> [15 15] [-1 -1] [1 1]}, :port 6703} >>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shutting down >>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231 >>>>>>>>> 2015-04-13 15:14:20 b.s.util [INFO] Error when trying to kill >>>>>>>>> 5284. Process is probably already dead. 
>>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Shut down >>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:3b092266-c3a0-499b-b4e9-bd2c895c5231 >>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with >>>>>>>>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id >>>>>>>>> "production-topology-2-1428932653", :executors ([2 2] [3 3] [4 4] [5 >>>>>>>>> 5] [6 >>>>>>>>> 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [15 15] >>>>>>>>> [1 >>>>>>>>> 1])} for this supervisor bd340179-5905-4233-ad5d-44b47b228177 on port >>>>>>>>> 6703 >>>>>>>>> with id 786a62ac-2868-4526-a7b2-5c26234bb23e >>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] Launching worker with >>>>>>>>> command: java -server -Xmx768m -Djava.net.preferIPv4Stack=true >>>>>>>>> -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib >>>>>>>>> -Dlogfile.name=worker-6703.log >>>>>>>>> -Dstorm.home=/usr/local/storm-0.9.0-wip21 >>>>>>>>> -Dlogback.configurationFile=/usr/local/storm-0.9.0-wip21/logback/cluster.xml >>>>>>>>> -Dstorm.id=production-topology-2-1428932653 >>>>>>>>> -Dworker.id=786a62ac-2868-4526-a7b2-5c26234bb23e -Dworker.port=6703 >>>>>>>>> -cp >>>>>>>>> /usr/local/storm-0.9.0-wip21/storm-console-logging-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-core-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/storm-netty-0.9.0-wip21.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/jline-0.9.94.jar:/usr/local/storm-0.9.0-wip21/lib/zookeeper-3.3.3.jar:/usr/local/storm-0.9.0-wip21/lib/curator-framework-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/logback-core-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/tools.macro-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar:/usr/local/storm-0.9.0-wip21/lib/commons-io-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/objenesis-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/slf4j-api-1.6.5.jar:/usr/local/storm-0.9.0-wip21/lib/minlog-1.2.jar:/usr/local/storm-0.9.0-wip21/lib/curator-client-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/hiccup-0.3.6.jar:/usr/local/storm-0.9.0-wip21/lib/asm-4.0.jar:/usr/local/storm-0.9.0-wip21/lib/junit-3.8.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.logging-0.2.3.jar:/usr/local/storm-0.9.0-wip21/lib/kryo-2.17.jar:/usr/local/storm-0.9.0-wip21/lib/httpcore-4.1.jar:/usr/local/storm-0.9.0-wip21/lib/clout-1.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/reflectasm-1.07-shaded.jar:/usr/local/storm-0.9.0-wip21/lib/disruptor-2.10.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-exec-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/joda-time-2.0.jar:/usr/local/storm-0.9.0-wip21/lib/core.incubator-0.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/servlet-api-2.5-20081211.jar:/usr/local/storm-0.9.0-wip21/lib/ring-devel-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/math.numeric-tower-0.0.1.jar:/usr/local/storm-0.9.0-wip21/lib/ring-core-1.1.5.jar:/usr/local/storm-0.9.0-wip21/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jgrapht-0.8.3.jar:/usr/local/storm-0.9.0-wip21/lib/commons-logging-1.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/commons-fileupload-1.2.1.jar:/usr/local/storm-0.9.0-wip21/lib/guava-13.0.jar:/usr/local/storm-0.9.0-wip21/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/storm-0.9.0-wip21/lib/ring-servlet-0.3.11.jar:/usr/local/storm-0.9.0-wip21/lib/jetty-util-6.1.26.jar:/usr/local/storm-0.9.0-wip21/lib/mockito-all-1.9.5.jar:/usr/local/storm-0.9.0-wip21/lib/clojure-1.4.0.jar:/usr/local/storm-0.9.0-wip21/lib/libthrift7-0.7.0-2.jar:/usr/lo
cal/storm-0.9.0-wip21/lib/snakeyaml-1.11.jar:/usr/local/storm-0.9.0-wip21/lib/netty-3.6.3.Final.jar:/usr/local/storm-0.9.0-wip21/lib/json-simple-1.1.jar:/usr/local/storm-0.9.0-wip21/lib/clj-stacktrace-0.2.2.jar:/usr/local/storm-0.9.0-wip21/lib/commons-lang-2.5.jar:/usr/local/storm-0.9.0-wip21/lib/commons-codec-1.4.jar:/usr/local/storm-0.9.0-wip21/lib/clj-time-0.4.1.jar:/usr/local/storm-0.9.0-wip21/lib/carbonite-1.5.0.jar:/usr/local/storm-0.9.0-wip21/lib/compojure-1.1.3.jar:/usr/local/storm-0.9.0-wip21/lib/jzmq-2.1.0.jar:/usr/local/storm-0.9.0-wip21/lib/httpclient-4.1.1.jar:/usr/local/storm-0.9.0-wip21/lib/tools.cli-0.2.2.jar:/usr/local/storm-0.9.0-wip21/conf:/app/storm/supervisor/stormdist/production-topology-2-1428932653/stormjar.jar >>>>>>>>> backtype.storm.daemon.worker production-topology-2-1428932653 >>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177 6703 >>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e >>>>>>>>> 2015-04-13 15:14:20 b.s.d.supervisor [INFO] >>>>>>>>> 786a62ac-2868-4526-a7b2-5c26234bb23e still hasn't started >>>>>>>>> >>>>>>>>> On 13 April 2015 at 16:09, Matthew Sharpe < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> OK, in checking the worker process logs I've uncovered a bit more >>>>>>>>>> information. The UI reports 0 emitted from the wordGenerator spout, >>>>>>>>>> along >>>>>>>>>> with 0 transferred, acked and failed. >>>>>>>>>> >>>>>>>>>> Sorry - the *.compute.amazonaws.com is redacted (don't know the >>>>>>>>>> usual way of doing that), I've got a bunch of other stuff on the >>>>>>>>>> boxes and >>>>>>>>>> in troubleshooting things out have been a bit laxer with security >>>>>>>>>> than I >>>>>>>>>> should have been! >>>>>>>>>> >>>>>>>>>> The UI shows 5 workers all starting on the same node - each of >>>>>>>>>> which has a declared uptime of 1s. 
Looking at the logs on the node, >>>>>>>>>> during >>>>>>>>>> that time the following happens: >>>>>>>>>> >>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>> environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib >>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>> environment:java.io.tmpdir=/tmp >>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>> environment:java.compiler=<NA> >>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>> environment:os.name=Linux >>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>> environment:os.arch=amd64 >>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>> environment:os.version=3.13.0-48-generic >>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>> environment:user.name=storm >>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>> environment:user.home=/app/home/storm >>>>>>>>>> 2015-04-13 14:58:48 o.a.z.s.ZooKeeperServer [INFO] Server >>>>>>>>>> environment:user.dir=/ >>>>>>>>>> 2015-04-13 14:58:50 b.s.d.worker [INFO] Launching worker for >>>>>>>>>> production-topology-2-1428932653 on >>>>>>>>>> bd340179-5905-4233-ad5d-44b47b228177:6702 with id >>>>>>>>>> ecbd85cc-c495-4f4d-af28-c3277e3e7531 and conf {"dev.zookeeper.path" >>>>>>>>>> "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, >>>>>>>>>> "topology.builtin.metrics.bucket.size.secs" 60, >>>>>>>>>> "topology.fall.back.on.java.serialization" true, >>>>>>>>>> "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, >>>>>>>>>> "topology.skip.missing.kryo.registrations" false, "ui.childopts" >>>>>>>>>> "-Xmx768m >>>>>>>>>> -Djava.net.preferIPv4Stack=true", "storm.zookeeper.session.timeout" >>>>>>>>>> 20000, >>>>>>>>>> "nimbus.reassign" true, >>>>>>>>>> "topology.trident.batch.emit.interval.millis" 500, >>>>>>>>>> "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", >>>>>>>>>> "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", >>>>>>>>>> "topology.executor.send.buffer.size" 1024, "storm.local.dir" >>>>>>>>>> "/app/storm", >>>>>>>>>> "supervisor.worker.start.timeout.secs" 120, >>>>>>>>>> "topology.enable.message.timeouts" true, >>>>>>>>>> "nimbus.cleanup.inbox.freq.secs" >>>>>>>>>> 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" >>>>>>>>>> 64, >>>>>>>>>> "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "*. >>>>>>>>>> compute.amazonaws.com", "storm.zookeeper.port" 2181, >>>>>>>>>> "transactional.zookeeper.port" nil, >>>>>>>>>> "topology.executor.receive.buffer.size" >>>>>>>>>> 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" >>>>>>>>>> "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, >>>>>>>>>> "supervisor.enable" true, "storm.zookeeper.servers" ["*. >>>>>>>>>> compute.amazonaws.com" "*.compute.amazonaws.com" "*. 
>>>>>>>>>> compute.amazonaws.com"], "transactional.zookeeper.root" >>>>>>>>>> "/transactional", "topology.acker.executors" nil, >>>>>>>>>> "topology.transfer.buffer.size" 1024, "topology.worker.childopts" >>>>>>>>>> nil, >>>>>>>>>> "drpc.queue.size" 128, "worker.childopts" "-Xmx768m >>>>>>>>>> -Djava.net.preferIPv4Stack=true", >>>>>>>>>> "supervisor.heartbeat.frequency.secs" 5, >>>>>>>>>> "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" >>>>>>>>>> 3772, >>>>>>>>>> "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m", >>>>>>>>>> "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, >>>>>>>>>> "topology.tasks" nil, "topology.spout.wait.strategy" >>>>>>>>>> "backtype.storm.spout.SleepSpoutWaitStrategy", >>>>>>>>>> "topology.max.spout.pending" >>>>>>>>>> nil, "storm.zookeeper.retry.interval" 1000, " >>>>>>>>>> topology.sleep.spout.wait.strategy.time.ms" 1, >>>>>>>>>> "nimbus.topology.validator" >>>>>>>>>> "backtype.storm.nimbus.DefaultTopologyValidator", >>>>>>>>>> "supervisor.slots.ports" >>>>>>>>>> [6700 6701 6702 6703], "topology.debug" false, >>>>>>>>>> "nimbus.task.launch.secs" >>>>>>>>>> 120, "nimbus.supervisor.timeout.secs" 60, >>>>>>>>>> "topology.message.timeout.secs" >>>>>>>>>> 30, "task.refresh.poll.secs" 10, "topology.workers" 1, >>>>>>>>>> "supervisor.childopts" "-Djava.net.preferIPv4Stack=true", >>>>>>>>>> "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, >>>>>>>>>> "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" >>>>>>>>>> "backtype.storm.serialization.types.ListDelegateSerializer", >>>>>>>>>> "topology.disruptor.wait.strategy" >>>>>>>>>> "com.lmax.disruptor.BlockingWaitStrategy", >>>>>>>>>> "nimbus.task.timeout.secs" 30, >>>>>>>>>> "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" >>>>>>>>>> "backtype.storm.serialization.DefaultKryoFactory", >>>>>>>>>> "drpc.invocations.port" >>>>>>>>>> 3773, "logviewer.port" 8000, "zmq.threads" 1, >>>>>>>>>> "storm.zookeeper.retry.times" >>>>>>>>>> 5, "storm.thrift.transport" >>>>>>>>>> "backtype.storm.security.auth.SimpleTransportPlugin", >>>>>>>>>> "topology.state.synchronization.timeout.secs" 60, >>>>>>>>>> "supervisor.worker.timeout.secs" 30, >>>>>>>>>> "nimbus.file.copy.expiration.secs" >>>>>>>>>> 600, "storm.messaging.transport" "backtype.storm.messaging.zmq", >>>>>>>>>> "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, >>>>>>>>>> "ui.port" >>>>>>>>>> 8080, "nimbus.childopts" "-Xmx1024m -Djava.net.preferIPv4Stack=true", >>>>>>>>>> "storm.cluster.mode" "distributed", "topology.optimize" true, >>>>>>>>>> "topology.max.task.parallelism" nil} >>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting >>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client >>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*. 
>>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181 >>>>>>>>>> sessionTimeout=20000 >>>>>>>>>> watcher=com.netflix.curator.ConnectionState@585bd6a6 >>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket >>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181 >>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection >>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session >>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment >>>>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid = >>>>>>>>>> 0x34cb217be05005e, negotiated timeout = 20000 >>>>>>>>>> 2015-04-13 14:58:50 b.s.zookeeper [INFO] Zookeeper state update: >>>>>>>>>> :connected:none >>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Session: >>>>>>>>>> 0x34cb217be05005e closed >>>>>>>>>> 2015-04-13 14:58:50 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting >>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ZooKeeper [INFO] Initiating client >>>>>>>>>> connection, connectString=*.compute.amazonaws.com:2181,*. >>>>>>>>>> compute.amazonaws.com:2181,*.compute.amazonaws.com:2181/storm >>>>>>>>>> sessionTimeout=20000 >>>>>>>>>> watcher=com.netflix.curator.ConnectionState@4b492c47 >>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] EventThread shut down >>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Opening socket >>>>>>>>>> connection to server *.compute.amazonaws.com/*:2181 >>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Socket connection >>>>>>>>>> established to *.compute.amazonaws.com/*:2181, initiating session >>>>>>>>>> 2015-04-13 14:58:50 o.a.z.ClientCnxn [INFO] Session establishment >>>>>>>>>> complete on server *.compute.amazonaws.com/*:2181, sessionid = >>>>>>>>>> 0x24cb217be030072, negotiated timeout = 20000 >>>>>>>>>> 2015-04-13 14:58:50 b.s.m.TransportFactory [INFO] Storm peer >>>>>>>>>> transport plugin:backtype.storm.messaging.zmq >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>> counter:[2 2] >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>> wordGenerator:[11 11] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>> executor wordGenerator:[11 11] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>> wordGenerator:(11) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>> wordGenerator:(11) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>> wordGenerator:(11) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>> wordGenerator:[12 12] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> __system ["startup"] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>> wordGenerator:[12 12] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>> executor wordGenerator:[12 12] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>> wordGenerator:(12) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>> wordGenerator:(12) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>> wordGenerator:(12) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>> wordGenerator:[13 13] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> __system ["startup"] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] 
Loaded executor tasks >>>>>>>>>> wordGenerator:[13 13] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>> executor wordGenerator:[13 13] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>> wordGenerator:(13) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>> wordGenerator:(13) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>> wordGenerator:(13) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>> wordGenerator:[14 14] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> __system ["startup"] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>> wordGenerator:[14 14] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>> executor wordGenerator:[14 14] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [bertels] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>> wordGenerator:(14) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>> wordGenerator:(14) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>> wordGenerator:(14) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [bertels] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [mike] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [mike] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [bertels] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [bertels] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>> wordGenerator:[15 15] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> __system ["startup"] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>> wordGenerator:[15 15] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>> executor wordGenerator:[15 15] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opening spout >>>>>>>>>> wordGenerator:(15) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Opened spout >>>>>>>>>> wordGenerator:(15) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Activating spout >>>>>>>>>> wordGenerator:(15) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>> __system:[-1 -1] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: __system __system >>>>>>>>>> ["startup"] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>> __system:[-1 -1] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Finished loading >>>>>>>>>> executor __system:[-1 -1] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Preparing bolt >>>>>>>>>> __system:(-1) >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Loading executor >>>>>>>>>> __acker:[1 1] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [jackson] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [mike] >>>>>>>>>> 2015-04-13 14:58:51 b.s.d.executor [INFO] Processing received >>>>>>>>>> message 
source: wordGenerator:11, stream: default, id: {}, [jackson] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>> message source: wordGenerator:14, stream: default, id: {}, [mike] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [jackson] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [jackson] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [golda] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>> message source: wordGenerator:13, stream: default, id: {}, [golda] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt >>>>>>>>>> __system:(-1) >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: __acker __system >>>>>>>>>> ["startup"] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Loaded executor tasks >>>>>>>>>> __acker:[1 1] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [jackson] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Finished loading >>>>>>>>>> executor __acker:[1 1] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.worker [INFO] Launching receive-thread >>>>>>>>>> for bd340179-5905-4233-ad5d-44b47b228177:6702 >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Preparing bolt >>>>>>>>>> __acker:(1) >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Prepared bolt >>>>>>>>>> __acker:(1) >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [nathan] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [bertels] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.task [INFO] Emitting: wordGenerator >>>>>>>>>> default [golda] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>> message source: wordGenerator:11, stream: default, id: {}, [golda] >>>>>>>>>> 2015-04-13 14:58:52 b.s.d.executor [INFO] Processing received >>>>>>>>>> message source: wordGenerator:12, stream: default, id: {}, [nathan] >>>>>>>>>> >>>>>>>>>> After that, the logs stop (no more messages) and the UI says that >>>>>>>>>> the 5 executors are now running on another node (and port) - at >>>>>>>>>> which point >>>>>>>>>> the worker log is created on that node and with that number (same >>>>>>>>>> messages). >>>>>>>>>> >>>>>>>>>> So the worker seems to start and start emitting words but the UI >>>>>>>>>> isn't registered anything as having been emitted. After a short >>>>>>>>>> period of >>>>>>>>>> time the worker seems to stop without logging anything and so a new >>>>>>>>>> worker >>>>>>>>>> is started. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 13 April 2015 at 15:36, Nathan Leung <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Did you check the worker process logs? What are the emit counts >>>>>>>>>>> for the components in the UI? >>>>>>>>>>> >>>>>>>>>>> Also I have not run on AWS personally but the following line >>>>>>>>>>> seems strange to me: >>>>>>>>>>> >>>>>>>>>>> :node->host {"bd340179-5905-4233-ad5d-44b47b228177" "*. >>>>>>>>>>> compute.amazonaws.com"} >>>>>>>>>>> >>>>>>>>>>> Generally this would be bound to a specific machine, not >>>>>>>>>>> *.<domain>. 
>>>>>>>>>>> >>>>>>>>>>> On Mon, Apr 13, 2015 at 10:22 AM, Matthew Sharpe < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I've set up a small Storm cluster on AWS that is giving me >>>>>>>>>>>> headaches. At the moment I've got a situation where the cluster (a >>>>>>>>>>>> nimbus >>>>>>>>>>>> node and 3 supervisors) will successfully run in local mode my >>>>>>>>>>>> particular >>>>>>>>>>>> topology. However, as soon as I try to run the topology in >>>>>>>>>>>> distributed mode >>>>>>>>>>>> there are no errors (that I can see) in any of the logs - the >>>>>>>>>>>> topology >>>>>>>>>>>> appears in the UI but never shows any tuples emitted by the spouts. >>>>>>>>>>>> >>>>>>>>>>>> In order to check it wasn't my topology I cloned the >>>>>>>>>>>> storm-starter project and have tried with the RollingTopWords >>>>>>>>>>>> example - >>>>>>>>>>>> running locally I get a stream of output but as soon as I specify >>>>>>>>>>>> that we >>>>>>>>>>>> should run in a distributed fashion I get a success message >>>>>>>>>>>> (topology >>>>>>>>>>>> successfully submitted). The topology appears on the UI as >>>>>>>>>>>> successfully >>>>>>>>>>>> submitted but I see no progress. >>>>>>>>>>>> >>>>>>>>>>>> It's a bit of a weird one as I don't know which logs to share >>>>>>>>>>>> to show the problem as the whole point is I'm not seeing any >>>>>>>>>>>> errors! >>>>>>>>>>>> >>>>>>>>>>>> The last entry in the supervisor logs pertains to the set up of >>>>>>>>>>>> the supervisor (i.e. it's not logged anything about the topologies >>>>>>>>>>>> that >>>>>>>>>>>> have been submitted) >>>>>>>>>>>> >>>>>>>>>>>> The last entries in the nimbus logs are probably a bit more >>>>>>>>>>>> illuminating: >>>>>>>>>>>> >>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>>>>>>>>>> production-topology-2-1428932653:[2 2] not alive >>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>>>>>>>>>> production-topology-2-1428932653:[3 3] not alive >>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Executor >>>>>>>>>>>> production-topology-2-1428932653:[4 4] not alive >>>>>>>>>>>> 2015-04-13 14:09:25 b.s.s.EvenScheduler [INFO] Available slots: >>>>>>>>>>>> (["bd340179-5905-4233-ad5d-44b47b228177" 6702] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6701] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6700] >>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6703] >>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6702] >>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6701] >>>>>>>>>>>> ["9de97b6f-3d3f-4797-8ac0-e89501de48d3" 6700] >>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6703] >>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6702] >>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6701] >>>>>>>>>>>> ["2874d18d-e0d1-46e0-a50d-85ce645ec4aa" 6700]) >>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassigning >>>>>>>>>>>> production-topology-2-1428932653 to 1 slots >>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Reassign executors: [[2 >>>>>>>>>>>> 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 >>>>>>>>>>>> 12] [13 >>>>>>>>>>>> 13] [14 14] [15 15] [1 1]] >>>>>>>>>>>> 2015-04-13 14:09:25 b.s.d.nimbus [INFO] Setting new assignment >>>>>>>>>>>> for topology id production-topology-2-1428932653: >>>>>>>>>>>> #backtype.storm.daemon.common.Assignment{:master-code-dir >>>>>>>>>>>> "/app/storm/nimbus/stormdist/production-topology-2-1428932653", >>>>>>>>>>>> :node->host >>>>>>>>>>>> 
{"bd340179-5905-4233-ad5d-44b47b228177" "*. >>>>>>>>>>>> compute.amazonaws.com"}, :executor->node+port {[2 2] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [3 3] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [4 4] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [5 5] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [6 6] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [7 7] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [8 8] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [9 9] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [10 10] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [11 11] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [12 12] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [13 13] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [14 14] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [15 15] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702], [1 1] >>>>>>>>>>>> ["bd340179-5905-4233-ad5d-44b47b228177" 6702]}, >>>>>>>>>>>> :executor->start-time-secs >>>>>>>>>>>> {[2 2] 1428934165, [3 3] 1428934165, [4 4] 1428934165, [5 5] >>>>>>>>>>>> 1428934165, [6 >>>>>>>>>>>> 6] 1428934165, [7 7] 1428934165, [8 8] 1428934165, [9 9] >>>>>>>>>>>> 1428934165, [10 >>>>>>>>>>>> 10] 1428934165, [11 11] 1428934165, [12 12] 1428934165, [13 13] >>>>>>>>>>>> 1428934165, >>>>>>>>>>>> [14 14] 1428934165, [15 15] 1428934165, [1 1] 1428934165}} >>>>>>>>>>>> >>>>>>>>>>>> If anybody can help that'd be greatly appreciated! >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Check out my blog: dogdogfish.com >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Check out my blog: dogdogfish.com >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Check out my blog: dogdogfish.com >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Check out my blog: dogdogfish.com >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Check out my blog: dogdogfish.com >>>>> >>>> >>>> >>> >>> >>> -- >>> Check out my blog: dogdogfish.com >>> >> >> > > > -- > Check out my blog: dogdogfish.com >
