I looked at the supervisor and nimbus logs and confirmed that the topology
is being rebalanced. It could be because of timeout. Could you help me
figure out exactly why this timeout occurs and how I can fix it?

I have enabled GC for my workers and supervisor. Could it be the reason
that worker is not able to send a heartbeat? I tried increasing the heap
size allotted to each worker by tweaking the value of worker.childopts to
have "-Xmx768m". But I did not see any difference in behaviour of my
topology. How can I fix this issue?

here are my supervisor logs -

2014-09-09 16:56:37 b.s.d.supervisor [INFO] Shutting down and clearing
state for id a8733f89-9f41-4624-9bce-d9f2d8c449ee. Current supervisor time:
1410261997. State: :timed-out, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1410261964,
:storm-id "popeyeTopology-2-1410260830", :executors #{[4 4] [7 7] [-1 -1]
[1 1]}, :port 6700}
2014-09-09 16:56:37 b.s.d.supervisor [INFO] Shutting down
9f26d478-1963-425e-a0c2-139712f32b9e:a8733f89-9f41-4624-9bce-d9f2d8c449ee
2014-09-09 16:56:37 b.s.util [INFO] Error when trying to kill 28950.
Process is probably already dead.
2014-09-09 16:56:37 b.s.d.supervisor [INFO] Shut down
9f26d478-1963-425e-a0c2-139712f32b9e:a8733f89-9f41-4624-9bce-d9f2d8c449ee
2014-09-09 16:56:37 b.s.d.supervisor [INFO] Launching worker with
assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"popeyeTopology-2-1410260830", :executors ([7 7] [4 4] [1 1])} for this
supervisor 9f26d478-1963-425e-a0c2-139712f32b9e on port 6700 with id
78233b86-b7f3-4590-9709-ba0d71130d7e
2014-09-09 16:56:37 b.s.d.supervisor [INFO] Launching worker with command:
'/usr/lib/jvm/java-7-oracle/bin/java' '-server' '-Xmx768m' '-verbose:gc'
'-XX:+PrintGCTimeStamps' '-XX:+PrintGCDetails'
'-Dcom.sun.management.jmxremote' '-Dcom.sun.management.jmxremote.ssl=false'
'-Dcom.sun.management.jmxremote.authenticate=false'
'-Dcom.sun.management.jmxremote.port=16700'
'-Djava.library.path=/tmp/stormtmp/supervisor/stormdist/popeyeTopology-2-1410260830/resources/Linux-amd64:/tmp/stormtmp/supervisor/stormdist/popeyeTopology-2-1410260830/resources:/usr/lib/jvm/java-7-oracle/lib'
'-Dlogfile.name=worker-6700.log'
'-Dstorm.home=/home/stormcluster/Storm/apache-storm-0.9.2-incubating'
'-Dlogback.configurationFile=/home/stormcluster/Storm/apache-storm-0.9.2-incubating/logback/cluster.xml'
'-Dstorm.id=popeyeTopology-2-1410260830'
'-Dworker.id=78233b86-b7f3-4590-9709-ba0d71130d7e' '-Dworker.port=6700'
'-cp'
'/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/chill-java-0.3.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-exec-1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-io-2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/joda-time-2.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-servlet-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/scala-library-2.9.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clout-1.0.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/logback-core-1.0.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/json-simple-1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/curator-client-2.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/carbonite-1.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/asm-4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/httpcore-4.3.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jetty-util-6.1.26.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clj-stacktrace-0.2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/log4j-over-slf4j-1.6.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/netty-3.2.2.Final.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/compojure-1.1.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/zookeeper-3.4.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-devel-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jgrapht-core-0.9.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/guava-13.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/objenesis-1.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/minlog-1.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/core.incubator-0.1.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clj-time-0.4.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/servlet-api-2.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/storm-core-0.9.2-incubating.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-logging-1.1.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/logback-classic-1.0.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clojure-1.5.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-jetty-adapter-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.macro-0.1.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/reflectasm-1.07-shaded.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/math.numeric-tower-0.0.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-fileupload-1.2.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/servlet-api-2.5-20081211.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jetty-6.1.26.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jline-2.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.cli-0.2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/slf4j-api-1.6.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/hiccup-0.3.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-codec-1.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/snakeyaml-1.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/zmq.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/netty-3.6.3.Final.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/storm-kafka-0.9.2-incubating.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-core-1.1.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/kafka_2.9.2-0.8.1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.logging-0.2.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/kryo-2.21.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/disruptor-2.10.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/httpclient-4.3.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-lang-2.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/curator-framework-2.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/conf:/tmp/stormtmp/supervisor/stormdist/popeyeTopology-2-1410260830/stormjar.jar'
'backtype.storm.daemon.worker' 'popeyeTopology-2-1410260830'
'9f26d478-1963-425e-a0c2-139712f32b9e' '6700'
'78233b86-b7f3-4590-9709-ba0d71130d7e'
2014-09-09 16:56:37 b.s.d.supervisor [INFO]
78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
2014-09-09 16:56:37 b.s.d.supervisor [INFO]
78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
2014-09-09 16:56:38 b.s.d.supervisor [INFO]
78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
2014-09-09 16:56:38 b.s.d.supervisor [INFO]
78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
2014-09-09 16:56:39 b.s.d.supervisor [INFO]
78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
2014-09-09 16:56:39 b.s.d.supervisor [INFO]
78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
2014-09-09 16:56:40 b.s.d.supervisor [INFO]
78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
2014-09-09 16:56:40 b.s.d.supervisor [INFO]
78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started


Thanks,
Palak

On Mon, Sep 8, 2014 at 6:46 PM, Cyrille Karmann <[email protected]>
wrote:

> Look at the supervisors logs. There could be a timeout that make a
> supervisor to force a worker to shut down, triggering a rebalancing.
>
>
>
> 2014-09-08 7:17 GMT-04:00 Palak Shah <[email protected]>:
>
> I have a topology that uses a Kafka spout to read values from a Kafka
>> queue. I used the kafkaSpout that came with storm-0.9.2-incubating.
>>
>> I observed that the nimbus rebalances the topology very often. The
>> topology suddenly shuts down and starts again with the tasks running on
>> different machines. I wanted to know why storm nimbus is rebalancing my
>> topology, so I observed the storm throughput, latency, load on bolts and
>> even system metrics like cpu and memory utilization, but I could not see a
>> pattern.
>>
>> Can someone explain what are the factors that lead to rebalancing of
>> topology in storm?
>>
>> Thanks,
>> Palak Shah
>>
>
>
>
> --
> Cyrille Karmann
> +1-514-659-1209
> [email protected]
>

Reply via email to