Try increasing nofile and nproc for your storm service account.
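
For example, on a typical Linux install you could raise both limits with a drop-in file under /etc/security/limits.d/ (this assumes the service account is literally named "storm"; the file name and values below are hypothetical, so adjust them to your environment):

    # /etc/security/limits.d/storm.conf  (hypothetical path and values)
    # <domain>  <type>  <item>    <value>
    storm       -       nofile    128000    # max open file descriptors
    storm       -       nproc     65536     # max user processes

You can verify what the workers actually get with "ulimit -n" and "ulimit -u" as the storm user; the supervisors/workers need a restart to pick up new limits.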

Jon

On Mon, Aug 14, 2017, 12:46 Laurens Vets <[email protected]> wrote:

> Hi List,
>
> I'm seeing the following errors in our indexing topology:
>
> kafkaSpout:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at org.apache.kafka.common.utils.Utils.toArray(Utils.java:272)
>     at org.apache.kafka.common.utils.Utils.toArray(Utils.java:265)
>     at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:626)
>     at org.apache.kafka.clients.consumer.internals.Fetcher.parseFetchedData(Fetcher.java:548)
>     at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:354)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1000)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
>     at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:286)
>     at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:224)
>     at org.apache.storm.daemon.executor$fn__6505$fn__6520$fn__6551.invoke(executor.clj:651)
>     at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>     at clojure.lang.AFn.run(AFn.java:22)
>     at java.lang.Thread.run(Thread.java:745)
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>     at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>     at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>     at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
>     at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
>     at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:201)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:999)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
>     at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:286)
>     at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:224)
>     at org.apache.storm.daemon.executor$fn__6505$fn__6520$fn__6551.invoke(executor.clj:651)
>     at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>     at clojure.lang.AFn.run(AFn.java:22)
>     at java.lang.Thread.run(Thread.java:745)
>
> hdfsIndexingBolt:
> java.lang.Exception: WARNING: Default and (likely) unoptimized writer config used for hdfs writer and sensor cloudtrail
>     at org.apache.metron.writer.bolt.BulkMessageWriterBolt.execute(BulkMessageWriterBolt.java:115)
>     at org.apache.storm.daemon.executor$fn__6573$tuple_action_fn__6575.invoke(executor.clj:734)
>     at org.apache.storm.daemon.executor$mk_task_receiver$fn__6494.invoke(executor.clj:466)
>     at org.apache.storm.disruptor$clojure_handler$reify__6007.onEvent(disruptor.clj:40)
>     at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:451)
>     at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:430)
>     at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73)
>     at org.apache.storm.daemon.executor$fn__6573$fn__6586$fn__6639.invoke(executor.clj:853)
>     at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>     at clojure.lang.AFn.run(AFn.java:22)
>     at java.lang.Thread.run(Thread.java:745)
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.util.Arrays.copyOf(Arrays.java:3236)
>     at sun.misc.Resource.getBytes(Resource.java:117)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at org.apache.metron.common.error.MetronError.addStacktrace(MetronError.java:120)
>     at org.apache.metron.common.error.MetronError.getJSONObject(MetronError.java:99)
>     at org.apache.metron.common.utils.ErrorUtils.handleError(ErrorUtils.java:94)
>     at org.apache.metron.writer.BulkWriterComponent.error(BulkWriterComponent.java:81)
>     at org.apache.metron.writer.BulkWriterComponent.write(BulkWriterComponent.java:152)
>     at org.apache.metron.writer.bolt.BulkMessageWriterBolt.execute(BulkMessageWriterBolt.java:117)
>     at org.apache.storm.daemon.executor$fn__6573$tuple_action_fn__6575.invoke(executor.clj:734)
>     at org.apache.storm.daemon.executor$mk_task_receiver$fn__6494.invoke(executor.clj:466)
>     at org.apache.storm.disruptor$clojure_handler$reify__6007.onEvent(disruptor.clj:40)
>     at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:451)
>     at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:430)
>     at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73)
>     at org.apache.storm.daemon.executor$fn__6573$fn__6586$fn__6639.invoke(executor.clj:853)
>     at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>     at clojure.lang.AFn.run(AFn.java:22)
>     at java.lang.Thread.run(Thread.java:745)
>
> Some background information:
> We're currently running Metron on 2 EC2 nodes (32 GB RAM, 8 cores) and have
> only changed the following default options:
> worker.childopts: -Xmx4096m
> topology.acker.executors: from "null" to 1
> logviewer.childopts: from "-Xmx128m" to "-Xmx1024m"
> topology.transfer.buffer.size: from 1024 to 32
> elasticsearch heap_size: 8192m
>
> One node is at 100% CPU and memory, while the other is almost doing nothing...
>
> The messages we're ingesting are only about 1 KB of JSON each, and we're
> limiting ingestion to 1,200 messages/minute via NiFi. Initially,
> everything seemed to be going fine, but then Storm started throwing
> memory errors in various places.
>
> Any idea what might be going on and how I can further troubleshoot this?
>
-- 

Jon
