Try increasing nofile and nproc for your storm service account.
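For example (this is only a rough sketch: it assumes the service account is literally named "storm", that your workers are launched through a PAM session so limits.d applies, and that 65536 is a sane starting value for your environment), something like this in /etc/security/limits.d/storm.conf would raise both limits:

    storm    soft    nofile    65536
    storm    hard    nofile    65536
    storm    soft    nproc     65536
    storm    hard    nproc     65536

After restarting the supervisors you can confirm what the worker processes actually got with "ulimit -n" / "ulimit -u" in a shell for that account, or by looking at /proc/<worker-pid>/limits. If Storm is started through systemd instead, the equivalent settings are LimitNOFILE= and LimitNPROC= in the service unit.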
Jon

On Mon, Aug 14, 2017, 12:46 Laurens Vets <[email protected]> wrote:

> Hi List,
>
> I'm seeing the following errors in our indexing topology:
>
> kafkaSpout:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at org.apache.kafka.common.utils.Utils.toArray(Utils.java:272)
>   at org.apache.kafka.common.utils.Utils.toArray(Utils.java:265)
>   at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:626)
>   at org.apache.kafka.clients.consumer.internals.Fetcher.parseFetchedData(Fetcher.java:548)
>   at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:354)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1000)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
>   at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:286)
>   at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:224)
>   at org.apache.storm.daemon.executor$fn__6505$fn__6520$fn__6551.invoke(executor.clj:651)
>   at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>   at clojure.lang.AFn.run(AFn.java:22)
>   at java.lang.Thread.run(Thread.java:745)
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>   at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
>   at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
>   at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:201)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:999)
>   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
>   at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:286)
>   at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:224)
>   at org.apache.storm.daemon.executor$fn__6505$fn__6520$fn__6551.invoke(executor.clj:651)
>   at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>   at clojure.lang.AFn.run(AFn.java:22)
>   at java.lang.Thread.run(Thread.java:745)
>
> hdfsIndexingBolt:
> java.lang.Exception: WARNING: Default and (likely) unoptimized writer config used for hdfs writer and sensor cloudtrail
>   at org.apache.metron.writer.bolt.BulkMessageWriterBolt.execute(BulkMessageWriterBolt.java:115)
>   at org.apache.storm.daemon.executor$fn__6573$tuple_action_fn__6575.invoke(executor.clj:734)
>   at org.apache.storm.daemon.executor$mk_task_receiver$fn__6494.invoke(executor.clj:466)
>   at org.apache.storm.disruptor$clojure_handler$reify__6007.onEvent(disruptor.clj:40)
>   at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:451)
>   at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:430)
>   at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73)
>   at org.apache.storm.daemon.executor$fn__6573$fn__6586$fn__6639.invoke(executor.clj:853)
>   at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>   at clojure.lang.AFn.run(AFn.java:22)
>   at java.lang.Thread.run(Thread.java:745)
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at java.util.Arrays.copyOf(Arrays.java:3236)
>   at sun.misc.Resource.getBytes(Resource.java:117)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at org.apache.metron.common.error.MetronError.addStacktrace(MetronError.java:120)
>   at org.apache.metron.common.error.MetronError.getJSONObject(MetronError.java:99)
>   at org.apache.metron.common.utils.ErrorUtils.handleError(ErrorUtils.java:94)
>   at org.apache.metron.writer.BulkWriterComponent.error(BulkWriterComponent.java:81)
>   at org.apache.metron.writer.BulkWriterComponent.write(BulkWriterComponent.java:152)
>   at org.apache.metron.writer.bolt.BulkMessageWriterBolt.execute(BulkMessageWriterBolt.java:117)
>   at org.apache.storm.daemon.executor$fn__6573$tuple_action_fn__6575.invoke(executor.clj:734)
>   at org.apache.storm.daemon.executor$mk_task_receiver$fn__6494.invoke(executor.clj:466)
>   at org.apache.storm.disruptor$clojure_handler$reify__6007.onEvent(disruptor.clj:40)
>   at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:451)
>   at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:430)
>   at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73)
>   at org.apache.storm.daemon.executor$fn__6573$fn__6586$fn__6639.invoke(executor.clj:853)
>   at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>   at clojure.lang.AFn.run(AFn.java:22)
>   at java.lang.Thread.run(Thread.java:745)
>
> Some background information:
> We're currently running Metron on 2 EC2 nodes (32 GB RAM, 8 cores) and have only changed the following default options:
> worker.childopts: -Xmx4096m
> topology.acker.executors: from "null" to 1
> logviewer.childopts: from "-Xmx128m" to "-Xmx1024m"
> topology.transfer.buffer.size: from 1024 to 32
> elasticsearch heap_size: 8192m
>
> One node is at 100% load and memory while the other is almost idle...
>
> The messages we're ingesting are only approx. 1 kilobyte of JSON each, and we're limiting ingestion to 1,200 messages/minute via NiFi. Initially everything seemed to be going fine, but then Storm started throwing memory errors in various places.
>
> Any idea what might be going on and how I can further troubleshoot this?

--
Jon
