If you increase the parallelism ten-fold, tripling the number of workers is probably not enough. You could either increase the memory per worker or scale all of your variables (batchSize, maxTasks, numWorkers, parallelismHint) by the same factor; a rough sketch follows below. Hope this helps.
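Something along these lines, as a minimal sketch against the Storm 0.9.x Config API. The scale factor, the base counts, and the heap value are placeholders for illustration, not settings from your topology:

import backtype.storm.Config;

// Rough sketch: scale the related knobs together, or give each worker
// more heap, rather than only raising the worker count.
public class ScalingSketch {
    public static Config buildConfig(int scale) {
        Config conf = new Config();
        conf.setNumWorkers(8 * scale);      // numWorkers scaled by the same factor
        conf.setMaxSpoutPending(5);         // spout pending kept as before
        // batchSize, maxTasks and the parallelism hint are set where the topology
        // is built (e.g. TopologyBuilder.setBolt(..., hint).setNumTasks(...));
        // scale those with the same factor as well.

        // Alternatively, raise the per-worker heap above 600 MB, e.g. via
        // worker.childopts in storm.yaml (or topology.worker.childopts if
        // your Storm version supports it):
        // conf.put("topology.worker.childopts", "-Xmx1024m");
        return conf;
    }
}

The point is simply that per-worker memory demand grows with the parallelism, so either the heap or the other settings have to move with it.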

Best regards
Thomas


On 1/23/2014 4:20 PM, Jean-Sebastien Vachon wrote:

Hi All,

We've been running Storm 0.9.0.1 for some time now in pre-production, and yesterday we started seeing this error in our logs. I've run our topology with a reduced number of workers and it works fine (5 Spout Pending, batchSize=100, maxTasks=5, numWorkers=25, parallelismHint=1). However, if I give it more workers and tasks (5 Spout Pending, batchSize=400, maxTasks=40, numWorkers=80, parallelismHint=10), then the topology seems to hang and starts showing the error below. This same topology used to run fine with batch sizes of up to 2000.

java.lang.OutOfMemoryError: GC overhead limit exceeded
         at com.esotericsoftware.kryo.io.Output.toBytes(Output.java:103)
         at backtype.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:28)
         at backtype.storm.daemon.worker$mk_transfer_fn$fn__5686$fn__5690.invoke(worker.clj:108)
         at backtype.storm.util$fast_list_map.invoke(util.clj:801)
         at backtype.storm.daemon.worker$mk_transfer_fn$fn__5686.invoke(worker.clj:108)
         at backtype.storm.daemon.executor$start_batch_transfer__GT_worker_handler_BANG_$fn__3328.invoke(executor.clj:240)
         at backtype.storm.disruptor$clojure_handler$reify__2962.onEvent(disruptor.clj:43)
         at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87)
         at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:61)
         at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62)
         at backtype.storm.disruptor$consume_loop_STAR_$fn__2975.invoke(disruptor.clj:74)
         at backtype.storm.util$async_loop$fn__444.invoke(util.clj:403)
         at clojure.lang.AFn.run(AFn.java:24)
         at java.lang.Thread.run(Thread.java:662)

Can anyone point me to some configuration options that could help resolve this issue? Each worker currently has 600 MB of RAM; this amount of memory seemed to be sufficient until yesterday.

Buffers are sized using these values (a rough memory sketch follows the list):

topology.executor.receive.buffer.size: 16384

topology.executor.send.buffer.size: 16384

topology.receiver.buffer.size: 8

topology.transfer.buffer.size: 32
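For a rough sense of how far buffers of that size can go against a 600 MB heap, here is a minimal back-of-envelope sketch. The executors-per-worker count and the average retained bytes per queue slot are assumptions for illustration, not measured values:

// Back-of-envelope only: executorsPerWorker and avgEntryBytes are guesses.
public class BufferEstimate {
    public static void main(String[] args) {
        int executorsPerWorker = 10;      // assumption
        int receiveSlots = 16384;         // topology.executor.receive.buffer.size
        int sendSlots = 16384;            // topology.executor.send.buffer.size
        long avgEntryBytes = 1024;        // assumption: ~1 KB retained per slot when full

        long bytes = (long) executorsPerWorker * (receiveSlots + sendSlots) * avgEntryBytes;
        System.out.println(bytes / (1024 * 1024) + " MB");  // ~320 MB of the 600 MB heap
    }
}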

Thanks


