This is a known issue with how storm stores its metrics. The metrics that appear on the UI for each worker are stored in zookeeper in a per-worker location, so if a worker crashes for some reason and then comes back up, the metrics for that worker are reset to 0. That is why the metrics appear to go backwards. We have plans to fix this, but they are still a work in progress.
The issue you are seeing from your exception is that you are running out of heap space in your worker. Java throws an OutOfMemoryError either when there is not enough heap to satisfy the allocation you are making, or when the JVM has spent over 98% of its time doing GC while trying to find enough memory to make progress. The exception you hit is the latter. This may mean a number of things.

You have 768MB as the default heap size in storm.yaml, so unless you override it when you launch your topology, that is the heap size. I don't know if you are using RAS for scheduling or not, but if not then you don't need to set supervisor.memory.capacity.mb or worker.heap.memory.mb.

Common issues I have seen that make a worker run out of memory:

1) Too many executors in a worker. If you are not using the RAS scheduler the heap size of your worker is hard coded, and if you have too many bolts and spouts in a single worker it can easily run out of memory. To fix this, either launch more workers or increase the heap size by setting topology.worker.childopts (see the sketch below).

2) Your topology cannot keep up with the incoming data and you haven't set topology.max.spout.pending. Max spout pending allows for flow control in your topology when acking is enabled. Without it, if the input data arrives faster than your topology can process it, that data is stored in memory. This can quickly get out of hand and your workers can crash. You can typically see this with the logging metrics consumer: if the receive queue population metric is large, close to the capacity metric, then you know the buffer is filling up. The best way to fix this is to either enable flow control with max spout pending or to increase the parallelism of the components that are slow, but the latter can put you back in situation 1 if you don't also increase the number of workers or the heap size (again, see the sketch below).

3) You might have a memory leak in your code. If that is the case you need to take a heap dump of one of the workers that is in trouble. You can also have the JVM produce a heap dump automatically on an out-of-memory error by changing the worker options (see the storm.yaml example below). If you see lots of things inside of a disruptor queue then it is probably issue 2.

I hope this helps. As for the NPEs, you really need to look at the "Caused by" for anything coming from the disruptor queue, as it is just wrapping the exception that was most likely thrown by something in your topology.
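In case it is useful, here is a minimal sketch of how the settings from points 1 and 2 can be applied when submitting a topology. It assumes the pre-1.0 backtype.storm API that your stack traces show; the class name, the 4 workers, the -Xmx2g value, and the max spout pending of 1000 are placeholders to illustrate the idea, not tuned recommendations:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.metric.LoggingMetricsConsumer;
    import backtype.storm.topology.TopologyBuilder;

    public class SubmitWithMemorySettings {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // ... wire up your spouts and bolts here ...

            Config conf = new Config();

            // Point 1: give each worker more heap than the 768MB default and
            // spread the executors over more workers. -Xmx2g is only an example.
            conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
                     "-Xmx2g -Djava.net.preferIPv4Stack=true");
            conf.setNumWorkers(4);

            // Point 2: enable flow control so pending tuples are not buffered
            // without bound. This only takes effect when acking is enabled.
            conf.setMaxSpoutPending(1000);

            // Log per-component metrics (capacity, receive queue population, ...)
            // so you can see which queues are filling up.
            conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);

            StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
        }
    }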
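For the automatic heap dump in point 3, one option is to add the standard HotSpot JVM flags to worker.childopts in storm.yaml (these are plain JVM options, not Storm settings, and the dump path below is just an example directory):

    worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/storm/data/heapdumps"

You can then open the resulting .hprof file in a heap analyzer such as Eclipse MAT; if most of the retained memory is tuples sitting in a disruptor queue, that points back to issue 2 rather than a leak.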
- Bobby

On Thu, Oct 26, 2017 at 5:04 AM Subarna Chatterjee <[email protected]> wrote:

> Hello,
>
> I am deploying a storm topology on a clustered environment with one master
> and two supervisor nodes. The topology works fine for some time, then the
> number of tuples emitted decreases and I am getting two different errors:
>
> java.lang.RuntimeException: java.lang.NullPointerException at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:135)
> at backtype.storm.utils.DisruptorQueue.consumeBatchWhe
>
> and
>
> apache storm java.lang.OutOfMemoryError: GC overhead limit exceeded at
> java.lang.reflect.Method.copy(Method.java:151) at
> java.lang.reflect.ReflectAccess.copyMethod(ReflectAccess.java:136) at
> sun.reflect.Reflect
>
> In my storm.yaml, the content is as follows:
>
> storm.zookeeper.servers:
>     - "zoo01"
> storm.zookeeper.port: 2181
>
> nimbus.host: "nimbus01"
> nimbus.thrift.port: 6627
>
> nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
>
> ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
>
> supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
>
> worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
>
> storm.local.dir: "/usr/local/storm/data"
>
> java.library.path: "/usr/lib/jvm/java-7-openjdk-amd64/"
>
> supervisor.memory.capacity.mb: 32000
>
> worker.heap.memory.mb: 2000
>
> supervisor.slots.ports:
>     - 6700
>     - 6701
>     - 6702
>     - 6703
>
> Can anyone help?
>
> Thanks a lot! :)
>
> Thanking you,
>
> Dr. Subarna Chatterjee, Ph.D.
> Post-Doctoral Researcher
> Inria, Rennes
> Website: http://chatterjeesubarna.wix.com/subarna
