I ran into this problem and could not find a workaround. In my case, the
worker does not commit suicide; the supervisor simply believes the
worker is dead. Here is the output from one of the supervisors:

2015-06-02T05:56:47.355+0000 b.s.d.supervisor [INFO] Downloading code for
storm id asyncVarGenTopology-2-1433224607 from
/usr/local/storm/storm-local/nimbus/stormdist/asyncVarGenTopology-2-1433224607
2015-06-02T05:56:47.710+0000 b.s.d.supervisor [INFO] Finished downloading
code for storm id asyncVarGenTopology-2-1433224607 from
/usr/local/storm/storm-local/nimbus/stormdist/asyncVarGenTopology-2-1433224607
2015-06-02T05:56:47.713+0000 b.s.d.supervisor [INFO] Launching worker with
assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"asyncVarGenTopology-2-1433224607", :executors ([3 3] [38 38] [8 8] [43 43]
[13 13] [48 48] [18 18] [23 23] [28 28] [33 33])} for this supervisor
5dbc466f-24f0-45ba-8880-3d93771a79c2 on port 6703 with id
fe2e3a1f-51e2-4b94-8a31-79773ac8e641
2015-06-02T05:56:47.716+0000 b.s.d.supervisor [INFO] Launching worker with
command: 'java' '-server' '-Xms12G' '-Xmx12G' '-XX:NewSize=6G'
'-XX:+UseParNewGC' '-XX:+UseConcMarkSweepGC'
'-XX:+CMSParallelRemarkEnabled' '-XX:SurvivorRatio=6'
'-XX:MaxTenuringThreshold=2' '-XX:CMSInitiatingOccupancyFraction=70'
'-XX:+UseCMSInitiatingOccupancyOnly' '-XX:+UseTLAB' '-XX:+UseCondCardMark'
'-XX:CMSWaitDuration=5000' '-XX:+CMSScavengeBeforeRemark'
'-XX:+UnlockDiagnosticVMOptions' '-XX:ParGCCardsPerStrideChunk=4096'
'-XX:+ExplicitGCInvokesConcurrent' '-XX:+PrintGCDetails'
'-XX:+PrintGCDateStamps' '-XX:+PrintTenuringDistribution'
'-XX:PrintFLSStatistics=1' '-XX:+PrintPromotionFailure'
'-XX:+PrintGCApplicationStoppedTime' '-XX:+PrintHeapAtGC'
'-XX:+PrintSafepointStatistics' '-Xloggc:/usr/local/storm/logs/gc.log'
'-XX:+UseGCLogFileRotation' '-XX:NumberOfGCLogFiles=10'
'-XX:GCLogFileSize=100M' '-Dcom.sun.management.jmxremote.port=9999'
'-Dcom.sun.management.jmxremote.ssl=false'
'-Dcom.sun.management.jmxremote.authenticate=false'
'-XX:+CMSClassUnloadingEnabled'
'-Djava.library.path=/usr/local/storm/storm-local/supervisor/stormdist/asyncVarGenTopology-2-1433224607/resources/Linux-amd64:/usr/local/storm/storm-local/supervisor/stormdist/asyncVarGenTopology-2-1433224607/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
'-Dlogfile.name=worker-6703.log'
'-Dstorm.home=/usr/local/apache-storm-0.9.4' '-Dstorm.conf.file='
'-Dstorm.options=' '-Dstorm.log.dir=/usr/local/apache-storm-0.9.4/logs'
'-Dlogback.configurationFile=/usr/local/apache-storm-0.9.4/logback/cluster.xml'
'-Dstorm.id=asyncVarGenTopology-2-1433224607'
'-Dworker.id=fe2e3a1f-51e2-4b94-8a31-79773ac8e641' '-Dworker.port=6703'
'-cp'
'/usr/local/apache-storm-0.9.4/lib/clj-time-0.4.1.jar:/usr/local/apache-storm-0.9.4/lib/clojure-1.5.1.jar:/usr/local/apache-storm-0.9.4/lib/logback-classic-1.0.13.jar:/usr/local/apache-storm-0.9.4/lib/ring-servlet-0.3.11.jar:/usr/local/apache-storm-0.9.4/lib/objenesis-1.2.jar:/usr/local/apache-storm-0.9.4/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/apache-storm-0.9.4/lib/ring-core-1.1.5.jar:/usr/local/apache-storm-0.9.4/lib/minlog-1.2.jar:/usr/local/apache-storm-0.9.4/lib/commons-lang-2.5.jar:/usr/local/apache-storm-0.9.4/lib/json-simple-1.1.jar:/usr/local/apache-storm-0.9.4/lib/joda-time-2.0.jar:/usr/local/apache-storm-0.9.4/lib/servlet-api-2.5.jar:/usr/local/apache-storm-0.9.4/lib/commons-logging-1.1.3.jar:/usr/local/apache-storm-0.9.4/lib/jetty-6.1.26.jar:/usr/local/apache-storm-0.9.4/lib/hiccup-0.3.6.jar:/usr/local/apache-storm-0.9.4/lib/commons-fileupload-1.2.1.jar:/usr/local/apache-storm-0.9.4/lib/carbonite-1.4.0.jar:/usr/local/apache-storm-0.9.4/lib/logback-core-1.0.13.jar:/usr/local/apache-storm-0.9.4/lib/compojure-1.1.3.jar:/usr/local/apache-storm-0.9.4/lib/tools.logging-0.2.3.jar:/usr/local/apache-storm-0.9.4/lib/chill-java-0.3.5.jar:/usr/local/apache-storm-0.9.4/lib/slf4j-api-1.7.5.jar:/usr/local/apache-storm-0.9.4/lib/reflectasm-1.07-shaded.jar:/usr/local/apache-storm-0.9.4/lib/snakeyaml-1.11.jar:/usr/local/apache-storm-0.9.4/lib/commons-exec-1.1.jar:/usr/local/apache-storm-0.9.4/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/apache-storm-0.9.4/lib/math.numeric-tower-0.0.1.jar:/usr/local/apache-storm-0.9.4/lib/core.incubator-0.1.0.jar:/usr/local/apache-storm-0.9.4/lib/tools.macro-0.1.0.jar:/usr/local/apache-storm-0.9.4/lib/commons-codec-1.6.jar:/usr/local/apache-storm-0.9.4/lib/ring-devel-0.3.11.jar:/usr/local/apache-storm-0.9.4/lib/asm-4.0.jar:/usr/local/apache-storm-0.9.4/lib/clj-stacktrace-0.2.2.jar:/usr/local/apache-storm-0.9.4/lib/jetty-util-6.1.26.jar:/usr/local/apache-storm-0.9.4/lib/storm-core-0.9.4.jar:/usr/local/apache-storm-0.9.4/lib/clout-1.0.1.jar:/usr/local/apache-storm-0.9.4/lib/kryo-2.21.jar:/usr/local/apache-storm-0.9.4/lib/jgrapht-core-0.9.0.jar:/usr/local/apache-storm-0.9.4/lib/jline-2.11.jar:/usr/local/apache-storm-0.9.4/lib/commons-io-2.4.jar:/usr/local/apache-storm-0.9.4/lib/disruptor-2.10.1.jar:/usr/local/apache-storm-0.9.4/lib/tools.cli-0.2.4.jar:/usr/local/apache-storm-0.9.4/conf:/usr/local/storm/storm-local/supervisor/stormdist/asyncVarGenTopology-2-1433224607/stormjar.jar'
'backtype.storm.daemon.worker' 'asyncVarGenTopology-2-1433224607'
'5dbc466f-24f0-45ba-8880-3d93771a79c2' '6703'
'fe2e3a1f-51e2-4b94-8a31-79773ac8e641'
2015-06-02T05:56:47.717+0000 b.s.d.supervisor [INFO]
fe2e3a1f-51e2-4b94-8a31-79773ac8e641 still hasn't started
2015-06-02T05:56:48.217+0000 b.s.d.supervisor [INFO]
fe2e3a1f-51e2-4b94-8a31-79773ac8e641 still hasn't started
2015-06-02T05:56:48.718+0000 b.s.d.supervisor [INFO]
fe2e3a1f-51e2-4b94-8a31-79773ac8e641 still hasn't started
2015-06-02T05:56:49.218+0000 b.s.d.supervisor [INFO]
fe2e3a1f-51e2-4b94-8a31-79773ac8e641 still hasn't started
2015-06-02T05:56:49.719+0000 b.s.d.supervisor [INFO]
fe2e3a1f-51e2-4b94-8a31-79773ac8e641 still hasn't started
2015-06-02T05:56:50.219+0000 b.s.d.supervisor [INFO]
fe2e3a1f-51e2-4b94-8a31-79773ac8e641 still hasn't started
2015-06-02T05:56:50.719+0000 b.s.d.supervisor [INFO]
fe2e3a1f-51e2-4b94-8a31-79773ac8e641 still hasn't started
2015-06-02T06:09:23.423+0000 b.s.d.supervisor [INFO] Removing code for
storm id asyncVarGenTopology-2-1433224607
2015-06-02T06:09:23.425+0000 b.s.d.supervisor [INFO] Shutting down and
clearing state for id fe2e3a1f-51e2-4b94-8a31-79773ac8e641. Current
supervisor time: 1433225363. State: :disallowed, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1433225130,
:storm-id "asyncVarGenTopology-2-1433224607", :executors #{[3 3] [38 38] [8
8] [43 43] [13 13] [48 48] [18 18] [23 23] [28 28] [-1 -1] [33 33]}, :port
6703}
2015-06-02T06:09:23.425+0000 b.s.d.supervisor [INFO] Shutting down
5dbc466f-24f0-45ba-8880-3d93771a79c2:fe2e3a1f-51e2-4b94-8a31-79773ac8e641
2015-06-02T06:09:24.433+0000 b.s.d.supervisor [INFO] Shut down
5dbc466f-24f0-45ba-8880-3d93771a79c2:fe2e3a1f-51e2-4b94-8a31-79773ac8e641
2015-06-02T06:11:23.425+0000 b.s.d.supervisor [INFO] Downloading code for
storm id asyncVarGenTopology-2-1433224607 from
/usr/local/storm/storm-local/nimbus/stormdist/asyncVarGenTopology-2-1433224607
2015-06-02T06:11:23.665+0000 b.s.d.supervisor [INFO] Finished downloading
code for storm id asyncVarGenTopology-2-1433224607 from
/usr/local/storm/storm-local/nimbus/stormdist/asyncVarGenTopology-2-1433224607
2015-06-02T06:11:23.668+0000 b.s.d.supervisor [INFO] Launching worker with
assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"asyncVarGenTopology-2-1433224607", :executors ([3 3] [38 38] [8 8] [43 43]
[13 13] [48 48] [18 18] [23 23] [28 28] [33 33])} for this supervisor
5dbc466f-24f0-45ba-8880-3d93771a79c2 on port 6703 with id
c98c6803-84d4-493b-a5aa-cc2d24f040ac
2015-06-02T06:11:23.670+0000 b.s.d.supervisor [INFO] Launching worker with
command: 'java' '-server' '-Xms12G' '-Xmx12G' '-XX:NewSize=6G'
'-XX:+UseParNewGC' '-XX:+UseConcMarkSweepGC'
'-XX:+CMSParallelRemarkEnabled' '-XX:SurvivorRatio=6'
'-XX:MaxTenuringThreshold=2' '-XX:CMSInitiatingOccupancyFraction=70'
'-XX:+UseCMSInitiatingOccupancyOnly' '-XX:+UseTLAB' '-XX:+UseCondCardMark'
'-XX:CMSWaitDuration=5000' '-XX:+CMSScavengeBeforeRemark'
'-XX:+UnlockDiagnosticVMOptions' '-XX:ParGCCardsPerStrideChunk=4096'
'-XX:+ExplicitGCInvokesConcurrent' '-XX:+PrintGCDetails'
'-XX:+PrintGCDateStamps' '-XX:+PrintTenuringDistribution'
'-XX:PrintFLSStatistics=1' '-XX:+PrintPromotionFailure'
'-XX:+PrintGCApplicationStoppedTime' '-XX:+PrintHeapAtGC'
'-XX:+PrintSafepointStatistics' '-Xloggc:/usr/local/storm/logs/gc.log'
'-XX:+UseGCLogFileRotation' '-XX:NumberOfGCLogFiles=10'
'-XX:GCLogFileSize=100M' '-Dcom.sun.management.jmxremote.port=9999'
'-Dcom.sun.management.jmxremote.ssl=false'
'-Dcom.sun.management.jmxremote.authenticate=false'
'-XX:+CMSClassUnloadingEnabled'
'-Djava.library.path=/usr/local/storm/storm-local/supervisor/stormdist/asyncVarGenTopology-2-1433224607/resources/Linux-amd64:/usr/local/storm/storm-local/supervisor/stormdist/asyncVarGenTopology-2-1433224607/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
'-Dlogfile.name=worker-6703.log'
'-Dstorm.home=/usr/local/apache-storm-0.9.4' '-Dstorm.conf.file='
'-Dstorm.options=' '-Dstorm.log.dir=/usr/local/apache-storm-0.9.4/logs'
'-Dlogback.configurationFile=/usr/local/apache-storm-0.9.4/logback/cluster.xml'
'-Dstorm.id=asyncVarGenTopology-2-1433224607'
'-Dworker.id=c98c6803-84d4-493b-a5aa-cc2d24f040ac' '-Dworker.port=6703'
'-cp'
'/usr/local/apache-storm-0.9.4/lib/clj-time-0.4.1.jar:/usr/local/apache-storm-0.9.4/lib/clojure-1.5.1.jar:/usr/local/apache-storm-0.9.4/lib/logback-classic-1.0.13.jar:/usr/local/apache-storm-0.9.4/lib/ring-servlet-0.3.11.jar:/usr/local/apache-storm-0.9.4/lib/objenesis-1.2.jar:/usr/local/apache-storm-0.9.4/lib/log4j-over-slf4j-1.6.6.jar:/usr/local/apache-storm-0.9.4/lib/ring-core-1.1.5.jar:/usr/local/apache-storm-0.9.4/lib/minlog-1.2.jar:/usr/local/apache-storm-0.9.4/lib/commons-lang-2.5.jar:/usr/local/apache-storm-0.9.4/lib/json-simple-1.1.jar:/usr/local/apache-storm-0.9.4/lib/joda-time-2.0.jar:/usr/local/apache-storm-0.9.4/lib/servlet-api-2.5.jar:/usr/local/apache-storm-0.9.4/lib/commons-logging-1.1.3.jar:/usr/local/apache-storm-0.9.4/lib/jetty-6.1.26.jar:/usr/local/apache-storm-0.9.4/lib/hiccup-0.3.6.jar:/usr/local/apache-storm-0.9.4/lib/commons-fileupload-1.2.1.jar:/usr/local/apache-storm-0.9.4/lib/carbonite-1.4.0.jar:/usr/local/apache-storm-0.9.4/lib/logback-core-1.0.13.jar:/usr/local/apache-storm-0.9.4/lib/compojure-1.1.3.jar:/usr/local/apache-storm-0.9.4/lib/tools.logging-0.2.3.jar:/usr/local/apache-storm-0.9.4/lib/chill-java-0.3.5.jar:/usr/local/apache-storm-0.9.4/lib/slf4j-api-1.7.5.jar:/usr/local/apache-storm-0.9.4/lib/reflectasm-1.07-shaded.jar:/usr/local/apache-storm-0.9.4/lib/snakeyaml-1.11.jar:/usr/local/apache-storm-0.9.4/lib/commons-exec-1.1.jar:/usr/local/apache-storm-0.9.4/lib/ring-jetty-adapter-0.3.11.jar:/usr/local/apache-storm-0.9.4/lib/math.numeric-tower-0.0.1.jar:/usr/local/apache-storm-0.9.4/lib/core.incubator-0.1.0.jar:/usr/local/apache-storm-0.9.4/lib/tools.macro-0.1.0.jar:/usr/local/apache-storm-0.9.4/lib/commons-codec-1.6.jar:/usr/local/apache-storm-0.9.4/lib/ring-devel-0.3.11.jar:/usr/local/apache-storm-0.9.4/lib/asm-4.0.jar:/usr/local/apache-storm-0.9.4/lib/clj-stacktrace-0.2.2.jar:/usr/local/apache-storm-0.9.4/lib/jetty-util-6.1.26.jar:/usr/local/apache-storm-0.9.4/lib/storm-core-0.9.4.jar:/usr/local/apache-storm-0.9.4/lib/clout-1.0.1.jar:/usr/local/apache-storm-0.9.4/lib/kryo-2.21.jar:/usr/local/apache-storm-0.9.4/lib/jgrapht-core-0.9.0.jar:/usr/local/apache-storm-0.9.4/lib/jline-2.11.jar:/usr/local/apache-storm-0.9.4/lib/commons-io-2.4.jar:/usr/local/apache-storm-0.9.4/lib/disruptor-2.10.1.jar:/usr/local/apache-storm-0.9.4/lib/tools.cli-0.2.4.jar:/usr/local/apache-storm-0.9.4/conf:/usr/local/storm/storm-local/supervisor/stormdist/asyncVarGenTopology-2-1433224607/stormjar.jar'
'backtype.storm.daemon.worker' 'asyncVarGenTopology-2-1433224607'
'5dbc466f-24f0-45ba-8880-3d93771a79c2' '6703'
'c98c6803-84d4-493b-a5aa-cc2d24f040ac'
2015-06-02T06:11:23.671+0000 b.s.d.supervisor [INFO]
c98c6803-84d4-493b-a5aa-cc2d24f040ac still hasn't started
2015-06-02T06:11:24.172+0000 b.s.d.supervisor [INFO]
c98c6803-84d4-493b-a5aa-cc2d24f040ac still hasn't started
2015-06-02T06:11:24.672+0000 b.s.d.supervisor [INFO]
c98c6803-84d4-493b-a5aa-cc2d24f040ac still hasn't started
2015-06-02T06:11:25.172+0000 b.s.d.supervisor [INFO]
c98c6803-84d4-493b-a5aa-cc2d24f040ac still hasn't started
2015-06-02T06:11:25.673+0000 b.s.d.supervisor [INFO]
c98c6803-84d4-493b-a5aa-cc2d24f040ac still hasn't started
2015-06-02T06:11:26.173+0000 b.s.d.supervisor [INFO]
c98c6803-84d4-493b-a5aa-cc2d24f040ac still hasn't started
2015-06-02T06:11:26.673+0000 b.s.d.supervisor [INFO]
c98c6803-84d4-493b-a5aa-cc2d24f040ac still hasn't started
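
The "Shutting down and clearing state" entry at 06:09:23 is the most useful
line in the output above, since it carries the supervisor's clock, the state
it assigned to the worker, and the last heartbeat it read. Below is a rough
Python sketch for pulling those values out of such a line. The helper name
heartbeat_summary and the regular expressions are my own and only assume the
log format shown above; nothing here is part of Storm.

```python
import re

# The shutdown entry from the supervisor log above, joined onto one line.
LINE = (
    "2015-06-02T06:09:23.425+0000 b.s.d.supervisor [INFO] Shutting down and "
    "clearing state for id fe2e3a1f-51e2-4b94-8a31-79773ac8e641. Current "
    "supervisor time: 1433225363. State: :disallowed, Heartbeat: "
    "#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1433225130, "
    ':storm-id "asyncVarGenTopology-2-1433224607", :executors #{[3 3] [38 38] '
    "[8 8] [43 43] [13 13] [48 48] [18 18] [23 23] [28 28] [-1 -1] [33 33]}, "
    ":port 6703}"
)


def heartbeat_summary(line):
    """Return (state, heartbeat_age_secs) parsed from one shutdown log line.

    Hypothetical helper: it assumes the line contains the three fields
    matched below, as in the log excerpt above.
    """
    now = int(re.search(r"Current supervisor time: (\d+)", line).group(1))
    hb_time = int(re.search(r":time-secs (\d+)", line).group(1))
    state = re.search(r"State: :([\w-]+)", line).group(1)
    return state, now - hb_time


state, age = heartbeat_summary(LINE)
print(state, age)  # prints: disallowed 233
```

So the supervisor did read a heartbeat from this worker, written 233 seconds
before the shutdown, and it marked the worker as :disallowed. That suggests
the worker did come up at some point even though the supervisor kept logging
"still hasn't started" right after launch.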


On Fri, May 29, 2015 at 3:09 PM, Jeffery Maass <[email protected]> wrote:

> When you look at the worker logs, do some of the workers sometimes kill
> themselves because there is a missing stormconf.ser file?  If so, grab that
> error message and have fun googling.
>
> Some will say those problems went away with the latest release.
> Apparently, it is complicated.
>
> My best advice is to avoid resource contention issues (failed heartbeats)
> or unhandled exceptions (worker commits suicide) that result in workers
> dying or being killed.  This may lead to a state that I describe as the
> fubar worker loop.  I haven't figured it out, but basically: a worker comes
> up, the supervisor is looking for the wrong worker, the supervisor decides
> the worker is dead and cleans up, the cleanup means the worker no longer
> sees its stormconf.ser, and so it commits suicide.  That's one version of
> events.
>
> Good luck.
>
> Thank you for your time!
>
> +++++++++++++++++++++
> Jeff Maass <[email protected]>
> linkedin.com/in/jeffmaass
> stackoverflow.com/users/373418/maassql
> +++++++++++++++++++++
>
>
> On Fri, May 29, 2015 at 12:36 PM, Grant Overby (groverby) <
> [email protected]> wrote:
>
>>  Supervisor is reporting that the worker “still hasn’t started” and
>> eventually kills and restarts the worker. However, the worker has started
>> and is processing tuples. This repeats indefinitely.
>>
>>  Debugging steps?
>>
>>
>>         *Grant Overby*
>> Software Engineer
>> Cisco.com <http://www.cisco.com/>
>> [email protected]
>> Mobile: *865 724 4910 <865%20724%204910>*
>>
>
