You can also print GC details to log (the following example is verbose but
you can tailor it to your needs):

-Xloggc:/opt/storm/logs/gc-worker-%ID%.log -verbose:gc
-XX:GCLogFileSize=10m -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
-XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps

On Tue, Sep 9, 2014 at 10:15 AM, Cyrille Karmann <[email protected]>
wrote:

> This line:
>  Shutting down and clearing state for id a8733f89-9f41-4624-9bce-d9f2d8c449ee.
> Current supervisor time: 1410261997. State: :timed-out,
> shows that it is a heartbeat time-out.
>
> In my experience, GC in the worker has been the main reason for time-outs
> of the heartbeat. But to be sure, you should try to connect to your worker
> with a tool like JVisualVM and observe what is going on.
>
> That being said, according to your logs, you are using the default GC
> algorithm for your workers, which can make the JVM do significant pauses,
> even if there is plenty of heap. You should use CMS instead, to limit
> pauses. For this, add the following to childopts:
>
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>
>
>
>
> 2014-09-09 7:39 GMT-04:00 Palak Shah <[email protected]>:
>
> I looked at the supervisor and nimbus logs and confirmed that the topology
>> is being rebalanced. It could be because of timeout. Could you help me
>> figure out exactly why this timeout occurs and how I can fix it?
>>
>> I have enabled GC for my workers and supervisor. Could it be the reason
>> that worker is not able to send a heartbeat? I tried increasing the heap
>> size allotted to each worker by tweaking the value of worker.childopts to
>> have "-Xmx768m". But I did not see any difference in behaviour of my
>> topology. How can I fix this issue?
>>
>> here are my supervisor logs -
>>
>> 2014-09-09 16:56:37 b.s.d.supervisor [INFO] Shutting down and clearing
>> state for id a8733f89-9f41-4624-9bce-d9f2d8c449ee. Current supervisor time:
>> 1410261997. State: :timed-out, Heartbeat:
>> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1410261964,
>> :storm-id "popeyeTopology-2-1410260830", :executors #{[4 4] [7 7] [-1 -1]
>> [1 1]}, :port 6700}
>> 2014-09-09 16:56:37 b.s.d.supervisor [INFO] Shutting down
>> 9f26d478-1963-425e-a0c2-139712f32b9e:a8733f89-9f41-4624-9bce-d9f2d8c449ee
>> 2014-09-09 16:56:37 b.s.util [INFO] Error when trying to kill 28950.
>> Process is probably already dead.
>> 2014-09-09 16:56:37 b.s.d.supervisor [INFO] Shut down
>> 9f26d478-1963-425e-a0c2-139712f32b9e:a8733f89-9f41-4624-9bce-d9f2d8c449ee
>> 2014-09-09 16:56:37 b.s.d.supervisor [INFO] Launching worker with
>> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
>> "popeyeTopology-2-1410260830", :executors ([7 7] [4 4] [1 1])} for this
>> supervisor 9f26d478-1963-425e-a0c2-139712f32b9e on port 6700 with id
>> 78233b86-b7f3-4590-9709-ba0d71130d7e
>> 2014-09-09 16:56:37 b.s.d.supervisor [INFO] Launching worker with
>> command: '/usr/lib/jvm/java-7-oracle/bin/java' '-server' '-Xmx768m'
>> '-verbose:gc' '-XX:+PrintGCTimeStamps' '-XX:+PrintGCDetails'
>> '-Dcom.sun.management.jmxremote' '-Dcom.sun.management.jmxremote.ssl=false'
>> '-Dcom.sun.management.jmxremote.authenticate=false'
>> '-Dcom.sun.management.jmxremote.port=16700'
>> '-Djava.library.path=/tmp/stormtmp/supervisor/stormdist/popeyeTopology-2-1410260830/resources/Linux-amd64:/tmp/stormtmp/supervisor/stormdist/popeyeTopology-2-1410260830/resources:/usr/lib/jvm/java-7-oracle/lib'
>> '-Dlogfile.name=worker-6700.log'
>> '-Dstorm.home=/home/stormcluster/Storm/apache-storm-0.9.2-incubating'
>> '-Dlogback.configurationFile=/home/stormcluster/Storm/apache-storm-0.9.2-incubating/logback/cluster.xml'
>> '-Dstorm.id=popeyeTopology-2-1410260830'
>> '-Dworker.id=78233b86-b7f3-4590-9709-ba0d71130d7e' '-Dworker.port=6700'
>> '-cp'
>> '/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/chill-java-0.3.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-exec-1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-io-2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/joda-time-2.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-servlet-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/scala-library-2.9.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clout-1.0.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/logback-core-1.0.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/json-simple-1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/curator-client-2.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/carbonite-1.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/asm-4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/httpcore-4.3.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jetty-util-6.1.26.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clj-stacktrace-0.2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/log4j-over-slf4j-1.6.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/netty-3.2.2.Final.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/compojure-1.1.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/zookeeper-3.4.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-devel-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jgrapht-core-0.9.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/guava-13.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/objenesis-1.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/minlog-1.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/core.incubator-0.1.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clj-time-0.4.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/servlet-api-2.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/storm-core-0.9.2-incubating.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-logging-1.1.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/logback-classic-1.0.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clojure-1.5.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-jetty-adapter-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.macro-0.1.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/reflectasm-1.07-shaded.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/math.numeric-tower-0.0.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-fileupload-1.2.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/servlet-api-2.5-20081211.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jetty-6.1.26.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jline-2.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.cli-0.2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/slf4j-api-1.6.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/hiccup-0.3.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-codec-1.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/snakeyaml-1.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/zmq.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/netty-3.6.3.Final.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/storm-kafka-0.9.2-incubating.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-core-1.1.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/kafka_2.9.2-0.8.1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.logging-0.2.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/kryo-2.21.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/disruptor-2.10.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/httpclient-4.3.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-lang-2.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/curator-framework-2.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/conf:/tmp/stormtmp/supervisor/stormdist/popeyeTopology-2-1410260830/stormjar.jar'
>> 'backtype.storm.daemon.worker' 'popeyeTopology-2-1410260830'
>> '9f26d478-1963-425e-a0c2-139712f32b9e' '6700'
>> '78233b86-b7f3-4590-9709-ba0d71130d7e'
>> 2014-09-09 16:56:37 b.s.d.supervisor [INFO]
>> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
>> 2014-09-09 16:56:37 b.s.d.supervisor [INFO]
>> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
>> 2014-09-09 16:56:38 b.s.d.supervisor [INFO]
>> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
>> 2014-09-09 16:56:38 b.s.d.supervisor [INFO]
>> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
>> 2014-09-09 16:56:39 b.s.d.supervisor [INFO]
>> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
>> 2014-09-09 16:56:39 b.s.d.supervisor [INFO]
>> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
>> 2014-09-09 16:56:40 b.s.d.supervisor [INFO]
>> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
>> 2014-09-09 16:56:40 b.s.d.supervisor [INFO]
>> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
>>
>>
>> Thanks,
>> Palak
>>
>> On Mon, Sep 8, 2014 at 6:46 PM, Cyrille Karmann <[email protected]
>> > wrote:
>>
>>> Look at the supervisors logs. There could be a timeout that make a
>>> supervisor to force a worker to shut down, triggering a rebalancing.
>>>
>>>
>>>
>>> 2014-09-08 7:17 GMT-04:00 Palak Shah <[email protected]>:
>>>
>>> I have a topology that uses a Kafka spout to read values from a Kafka
>>>> queue. I used the kafkaSpout that came with storm-0.9.2-incubating.
>>>>
>>>> I observed that the nimbus rebalances the topology very often. The
>>>> topology suddenly shuts down and starts again with the tasks running on
>>>> different machines. I wanted to know why storm nimbus is rebalancing my
>>>> topology, so I observed the storm throughput, latency, load on bolts and
>>>> even system metrics like cpu and memory utilization, but I could not see a
>>>> pattern.
>>>>
>>>> Can someone explain what are the factors that lead to rebalancing of
>>>> topology in storm?
>>>>
>>>> Thanks,
>>>> Palak Shah
>>>>
>>>
>>>
>>>
>>> --
>>> Cyrille Karmann
>>> +1-514-659-1209
>>> [email protected]
>>>
>>
>>
>
>
> --
> Cyrille Karmann
> +1-514-659-1209
> [email protected]
>

Reply via email to