Set the spout's max pending (topology.max.spout.pending), and make sure that
you ack your messages in the bolt(s).
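The toy model below shows why the max pending setting matters. It is plain Java, not Storm code, and the class and method names are hypothetical. The spout may keep at most maxPending un-acked tuples in flight, and each ack frees one slot.

In a real topology the knob is topology.max.spout.pending (Config.setMaxSpoutPending). It only throttles tuples that the spout emits with a message ID, which KafkaSpout does. A tuple that is never acked stays pending until it times out (topology.message.timeout.secs), so acking in the bolts is what actually frees slots.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model (not Storm code) of how a max-pending limit bounds in-flight
 * tuples. Each emitted tuple stays pending until a bolt acks it; the spout
 * does not emit while pending >= maxPending.
 */
public class MaxPendingSim {

    /**
     * Runs `steps` ticks. Every tick the spout tries to emit one tuple, and
     * every `ackEvery` ticks the bolt acks the oldest pending tuple.
     * maxPending <= 0 means "no limit". Returns the peak pending count.
     */
    public static int peakPending(int steps, int maxPending, int ackEvery) {
        Deque<Integer> pending = new ArrayDeque<>();
        int peak = 0;
        int nextId = 0;
        for (int t = 1; t <= steps; t++) {
            boolean throttled = maxPending > 0 && pending.size() >= maxPending;
            if (!throttled) {
                pending.addLast(nextId++); // spout emits a new tuple
            }
            if (t % ackEvery == 0 && !pending.isEmpty()) {
                pending.removeFirst(); // bolt acks the oldest tuple
            }
            peak = Math.max(peak, pending.size());
        }
        return peak;
    }

    public static void main(String[] args) {
        // The bolt is three times slower than the spout.
        System.out.println("no limit:       " + peakPending(1000, 0, 3));
        System.out.println("max pending 50: " + peakPending(1000, 50, 3));
    }
}
```

Without a limit, the pending count grows with the length of the run (667 after 1000 ticks here). With the limit it stays at 50. The value 50 is arbitrary and only illustrates the cap.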

On Wed, Nov 26, 2014 at 8:39 AM, 张炜 wrote:

> Dear all,
> We frequently run into Java heap out-of-memory errors when running a
> topology that uses KafkaSpout. Please kindly help.
>
> Our scenario: we send large files to Kafka, each about 3 MB in size. We use
> Storm to consume the messages from Kafka (using KafkaSpout), process each
> message line by line, and emit tuples.
> Very frequently we hit memory errors like these:
>
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2271)
>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>     at java.io.ByteArrayOutputStream.ensur
>
> java.lang.OutOfMemoryError: Java heap space
>     at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
>     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.util.Arrays.copyOf(Arrays.java:2367)
>     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
>     at java.lang.Ab
>
>
> Our Storm settings are:
>
> drpc.childopts        -Xmx768m
> supervisor.childopts  -Xmx256m
> worker.childopts      -Xmx1536m -Xms1024m -XX:MaxPermSize=128m -XX:NewSize=512m -XX:MaxNewSize=1024m
>
> Each node in our cluster has 4 CPUs and 8 GB of memory, and we run 4
> workers per node.
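Here is a quick sanity check of the per-node numbers above, written as a small Java calculation (hypothetical class name). It adds up only the -Xmx and worker -XX:MaxPermSize caps from the quoted settings. It assumes the supervisor and the 4 workers share a node, and it treats a co-located DRPC server as optional. Thread stacks, other native memory and the OS itself are not counted, so the real footprint is higher.

```java
/**
 * Back-of-the-envelope maximum JVM memory per node, using only the -Xmx and
 * worker -XX:MaxPermSize values from the quoted settings. Native memory,
 * thread stacks and the OS are ignored.
 */
public class NodeMemoryBudget {
    static final int NODE_MB = 8 * 1024;
    static final int WORKERS_PER_NODE = 4;
    static final int WORKER_XMX_MB = 1536;     // worker.childopts -Xmx1536m
    static final int WORKER_PERM_MB = 128;     // worker.childopts -XX:MaxPermSize=128m
    static final int SUPERVISOR_XMX_MB = 256;  // supervisor.childopts -Xmx256m
    static final int DRPC_XMX_MB = 768;        // drpc.childopts -Xmx768m, only if DRPC runs here

    /** Sum of the heap/PermGen caps that could be committed on one node. */
    static int committedMb(boolean drpcOnNode) {
        int total = WORKERS_PER_NODE * (WORKER_XMX_MB + WORKER_PERM_MB) + SUPERVISOR_XMX_MB;
        return drpcOnNode ? total + DRPC_XMX_MB : total;
    }

    public static void main(String[] args) {
        for (boolean drpc : new boolean[] {false, true}) {
            int used = committedMb(drpc);
            System.out.printf("DRPC on node=%b: %d MB of %d MB, %d MB left%n",
                    drpc, used, NODE_MB, NODE_MB - used);
        }
    }
}
```

That comes to 6912 MB of 8192 MB without DRPC and 7680 MB with it. So the node has little headroom, and each worker's spout, bolts and serialization buffers must fit in 1536 MB of heap.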
>
> We took a heap dump and found a LinkedList object holding a lot of memory;
> it is used by KafkaSpout.
>
> If we understand correctly, this list holds all fetched messages:
> KafkaSpout fetches every message from the current consumed offset up to the
> max offset and stores them in the list. Because the Kafka producer is very
> fast and our line-by-line processing in Storm cannot keep up, the list
> keeps growing.
>
> So my questions are:
> 1) If our analysis is correct, how can we limit the amount of data that
> KafkaSpout fetches each time? For example, instead of fetching everything
> from the current offset to the latest message, fetch a fixed number of
> messages.
>
> 2) If our analysis is not correct, could you suggest where the problem is?
> Also, are our memory settings correct?
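Question 1 asks for a spout-side buffer that never holds the whole backlog. Below is a minimal sketch of that idea: the buffer refills only when it is empty, and each refill pulls at most a fixed byte budget. All class and field names are hypothetical, and this is not storm-kafka's implementation.

Two points are our understanding rather than confirmed facts, so check them against your versions:

- storm-kafka's spout config (KafkaConfig/SpoutConfig) exposes a fetchSizeBytes field that caps the bytes per fetch request.
- With the Kafka 0.8 consumer, a fetch size smaller than the largest message (about 3 MB here) can leave the consumer stuck.

The sketch sidesteps the second issue by always taking at least one message.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical bounded fetch buffer: refills only when empty and pulls at most
 * maxFetchBytes per refill (but always at least one message), so its size is
 * bounded however far behind the consumer is.
 */
public class BoundedFetchBuffer {
    private final List<Integer> log;        // simulated partition: message sizes, index = offset
    private final int maxFetchBytes;        // per-refill byte budget
    private final Deque<Integer> waiting = new ArrayDeque<>(); // fetched, not yet emitted
    private int nextOffset = 0;
    private int peakBuffered = 0;

    public BoundedFetchBuffer(List<Integer> log, int maxFetchBytes) {
        this.log = log;
        this.maxFetchBytes = maxFetchBytes;
    }

    /** Pulls messages from nextOffset until the next one would exceed the budget. */
    private void fill() {
        int bytes = 0;
        while (nextOffset < log.size()) {
            int size = log.get(nextOffset);
            // Take at least one message so an oversized one cannot stall the sketch.
            if (!waiting.isEmpty() && bytes + size > maxFetchBytes) {
                break;
            }
            waiting.addLast(nextOffset);
            nextOffset++;
            bytes += size;
        }
        peakBuffered = Math.max(peakBuffered, waiting.size());
    }

    /** Returns the next offset to emit, or -1 once the log is exhausted. */
    public int next() {
        if (waiting.isEmpty()) {
            fill();
        }
        return waiting.isEmpty() ? -1 : waiting.removeFirst();
    }

    /** Drains `count` equally sized messages; returns {emitted, peakBuffered}. */
    public static int[] drain(int count, int messageBytes, int maxFetchBytes) {
        List<Integer> log = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            log.add(messageBytes);
        }
        BoundedFetchBuffer buf = new BoundedFetchBuffer(log, maxFetchBytes);
        int emitted = 0;
        while (buf.next() >= 0) {
            emitted++;
        }
        return new int[] {emitted, buf.peakBuffered};
    }

    public static void main(String[] args) {
        int mb = 1024 * 1024;
        int[] r = drain(1000, 3 * mb, 10 * mb); // 1000 x 3 MB backlog, 10 MB budget
        System.out.println("emitted " + r[0] + ", peak buffered " + r[1]);
    }
}
```

With a 10 MB budget and 3 MB messages, each refill holds three messages, so the buffer never grows past three entries while all 1000 are still emitted. In practice this bound only helps if the bolts keep up or max spout pending throttles emission.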
>
> Thank you very much for your help!
>
> Regards,
> Sai
>
>
