I'd appreciate any insight you all may be able to provide with an issue I'm
facing.
I've run this topology in local mode without issue. However, when deployed
to my cluster (2 supervisors) my workers fail to start.
The worker logs on each node are empty.
The supervisor logs on each node look like this:
2014-02-19 09:51:57 b.s.d.supervisor [INFO] Downloading code for storm id
opixrs-5-1392825117 from /data/storm/nimbus/stormdist/opixrs-5-1392825117
2014-02-19 09:52:00 b.s.util [INFO] Could not extract resources from
/data/storm/supervisor/tmp/c1a236df-ddeb-4bc1-824a-de40dc1888fd/stormjar.jar
2014-02-19 09:52:00 b.s.d.supervisor [INFO] Finished downloading code for
storm id opixrs-5-1392825117 from
/data/storm/nimbus/stormdist/opixrs-5-1392825117
2014-02-19 09:52:00 b.s.d.supervisor [INFO] Launching worker with
assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"opixrs-5-1392825117", :executors ([5 5] [9 9] [1 1])} for this supervisor
e1534292-5a48-4306-b52f-fc80812b12ba on port 6703 with id
539e69da-8eab-4773-b51f-8f70b1bc6222
2014-02-19 09:52:00 b.s.d.supervisor [INFO] Launching worker with command:
java -server -Xmx2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+UseConcMa
...snip....
/data/storm/supervisor/stormdist/opixrs-5-1392825117/stormjar.jar
...snip....
2014-02-19 09:52:00 b.s.d.supervisor [INFO]
539e69da-8eab-4773-b51f-8f70b1bc6222 still hasn't started
2014-02-19 09:52:01 b.s.d.supervisor [INFO]
539e69da-8eab-4773-b51f-8f70b1bc6222 still hasn't started
...snip....
This is followed by many more "still hasn't started" lines for the same
worker, after which the supervisor gives up and tries again.
The Nimbus log shows successful topology submission, but then complains
about:
2014-02-19 09:53:59 b.s.d.nimbus [INFO] Executor opixrs-5-1392825117:[2 2]
not alive
2014-02-19 09:53:59 b.s.d.nimbus [INFO] Executor opixrs-5-1392825117:[3 3]
not alive
... and then reassigns the topology slots and tries again.
I don't see my topology jar anywhere on the supervisor nodes.
I omitted most of the java command from the log snippet above, but its -cp
option references the supervisor's stormdist directory (the path shown
between the snips), and that directory doesn't exist on disk.
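
A check along these lines can confirm exactly which classpath entries are
missing on a supervisor node. This is only a minimal sketch: the function
name is hypothetical, and it assumes the full "Launching worker with
command" line (without the snips) has been copied out of the supervisor log
and is being checked on the node that logged it.

```python
import os
import shlex


def missing_classpath_entries(command):
    """Return the -cp / -classpath entries of a java command line
    that do not exist on the local filesystem."""
    args = shlex.split(command)
    missing = []
    for i, arg in enumerate(args[:-1]):
        if arg in ("-cp", "-classpath"):
            # Classpath entries are separated by os.pathsep (":" on Linux).
            for entry in args[i + 1].split(os.pathsep):
                # A trailing "/*" is a JVM wildcard; check the directory.
                path = entry[:-2] if entry.endswith("/*") else entry
                if path and not os.path.exists(path):
                    missing.append(entry)
    return missing
```

Passing the logged command to missing_classpath_entries() returns a list of
the entries that are absent, which should include the stormjar.jar path if
the stormdist directory was never populated.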
My first thought was some kind of permissions issue, but even after
opening up the permissions on the stormdist and supervisor directories
completely, I still face the same issue. I then checked the hosts files and
verified that every server can reach, and be reached from, Nimbus and
ZooKeeper.
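
A reachability check of this kind can be scripted roughly as follows. This
is a sketch under assumptions: the helper name and the hostnames in the
usage note are hypothetical, and the ports are the usual defaults (6627 for
the Nimbus Thrift port, 2181 for ZooKeeper), which may not match this
cluster's configuration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, running can_connect("nimbus-host", 6627) and
can_connect("zookeeper-host", 2181) on each supervisor (with the real
hostnames substituted) shows whether name resolution and the ports both
work from that node. It only tests TCP reachability, not whether the
service behind the port is healthy.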
Any ideas? Where should I be looking? Has anybody faced similar issues in
the past?
Thanks again for your help.
-Chad