Hi, try increasing the worker heap size from 1 GB to 4 GB, and also set aside some memory for Elasticsearch.
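
For example, here is a minimal sketch of one way to do that per topology, assuming Storm 1.1.0's Java API (the class name, topology name, and the 4g figures are placeholders for illustration, not taken from your setup):

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    public class SubmitWithLargerHeap {
        public static void main(String[] args) throws Exception {
            // Keep your existing spout/bolt wiring here; omitted in this sketch.
            TopologyBuilder builder = new TopologyBuilder();

            Config conf = new Config();
            // JVM options appended to the command line of every worker
            // launched for this topology (sizes here are placeholders).
            conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xms4g -Xmx4g");

            StormSubmitter.submitTopology("ESTopology", conf, builder.createTopology());
        }
    }

The same JVM options can also be applied cluster-wide through worker.childopts in storm.yaml. That childopts string would also be the place for a larger -XX:MaxDirectMemorySize, if the Elasticsearch client needs more off-heap room than the 128m in your current arguments.
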
On Tue, Mar 27, 2018, 10:55 AM Roshan Naik <[email protected]> wrote:

> One of the spouts or bolts is most likely leaking memory. What components are you using in your topo?
>
> Sent from Yahoo Mail for iPhone <https://overview.mail.yahoo.com/?.src=iOS>
>
> On Monday, March 26, 2018, 7:23 PM, 王 纯超 <[email protected]> wrote:
>
> Hi all,
>
> I am now experiencing a problem where a worker restarts periodically. I checked worker.log and found the information below when the worker shuts down (I do not know exactly whether it is logged before or after the shutdown; my guess is that it is logged before, and that it is the cause of the worker shutdown):
>
> 2018-03-27 09:56:29.331 o.a.s.k.KafkaUtils Thread-63-reader-Tsignalmsg_4-executor[83 83] [INFO] Task [5/10] assigned [Partition{host=132.98.23.31:9092, topic=Tsignalmsg_4, partition=4}]
> 2018-03-27 09:56:29.331 o.a.s.k.ZkCoordinator Thread-63-reader-Tsignalmsg_4-executor[83 83] [INFO] Task [5/10] Deleted partition managers: []
> 2018-03-27 09:56:29.331 o.a.s.k.ZkCoordinator Thread-63-reader-Tsignalmsg_4-executor[83 83] [INFO] Task [5/10] New partition managers: []
> 2018-03-27 09:56:29.331 o.a.s.k.ZkCoordinator Thread-63-reader-Tsignalmsg_4-executor[83 83] [INFO] Task [5/10] Finished refreshing
> 2018-03-27 09:56:32.666 o.e.t.n.Netty4Utils elasticsearch[_client_][transport_client_boss][T#1] [ERROR] fatal error on the network layer
>     at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:185)
>     at org.elasticsearch.transport.netty4.Netty4Transport.lambda$sendMessage$4(Netty4Transport.java:402)
>     at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
>     at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
>     at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
>     at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
>     at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:905)
>     at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:805)
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1291)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
>     at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
>     at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
>     at io.netty.handler.logging.LoggingHandler.write(LoggingHandler.java:249)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
>     at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
>     at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
>     at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.write(Netty4MessageChannelHandler.java:56)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
>     at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38)
>     at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1089)
>     at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1136)
>     at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1078)
>     at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-03-27 09:56:32.675 STDIO Thread-80 [ERROR] Halting due to Out Of Memory Error...Thread-80
> 2018-03-27 09:56:38.560 STDERR Thread-1 [INFO] Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
> 2018-03-27 09:56:40.361 o.a.s.d.worker main [INFO] Launching worker for ESTopology-1-1522058870 on a1cd10fc-dcbd-4e56-a18b-41b6e89985ac:6704
>
> *Below is the GC log:*
>
> 2018-03-27T09:55:36.823+0800: 462.021: [GC (Allocation Failure) 2018-03-27T09:55:36.823+0800: 462.022: [ParNew: 3774912K->317486K(3774912K), 0.1497501 secs] 4704499K->1666793K(10066368K), 0.1503144 secs] [Times: user=1.22 sys=0.65, real=0.15 secs]
> 2018-03-27T09:56:29.597+0800: 514.795: [GC (Allocation Failure) 2018-03-27T09:56:29.597+0800: 514.796: [ParNew: 3673006K->315626K(3774912K), 0.1109489 secs] 5022313K->1664933K(10066368K), 0.1114803 secs] [Times: user=1.37 sys=0.00, real=0.11 secs]
> Heap
>  par new generation   total 3774912K, used 2143135K [0x0000000540000000, 0x0000000640000000, 0x0000000640000000)
>   eden space 3355520K,  54% used [0x0000000540000000, 0x00000005af8ad4c0, 0x000000060cce0000)
>   from space 419392K,   75% used [0x0000000626670000, 0x0000000639aaa870, 0x0000000640000000)
>   to   space 419392K,    0% used [0x000000060cce0000, 0x000000060cce0000, 0x0000000626670000)
>  tenured generation   total 6291456K, used 1349307K [0x0000000640000000, 0x00000007c0000000, 0x00000007c0000000)
>    the space 6291456K,  21% used [0x0000000640000000, 0x00000006925aed18, 0x00000006925aee00, 0x00000007c0000000)
>  Metaspace       used 67106K, capacity 580218K, committed 580352K, reserved 2097152K
>   class space    used 10503K, capacity 10993K, committed 11008K, reserved 1048576K
> tail: gc.log.0.current: file has been truncated
> Java HotSpot(TM) 64-Bit Server VM (25.141-b15) for linux-amd64 JRE (1.8.0_141-b15), built on Jul 12 2017 04:21:34 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 264403580k(155999164k free), swap 67108856k(67066856k free)
> CommandLine flags: -XX:+DisableExplicitGC -XX:GCLogFileSize=1048576 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump -XX:InitialBootClassLoaderMetaspaceSize=536870912 -XX:InitialHeapSize=10737418240 -XX:+ManagementServer -XX:MaxDirectMemorySize=134217728 -XX:MaxHeapSize=10737418240 -XX:MaxMetaspaceFreeRatio=80 -XX:MaxNewSize=4294967296 -XX:MetaspaceSize=1073741824 -XX:NewSize=4294967296 -XX:NumberOfGCLogFiles=10 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseGCLogFileRotation -XX:+UseParNewGC
>
> 2018-03-27T09:56:41.561+0800: 3.529: [GC (Allocation Failure) 2018-03-27T09:56:41.561+0800: 3.529: [ParNew: 3355520K->34156K(3774912K), 0.0419198 secs] 3355520K->34156K(10066368K), 0.0421130 secs] [Times: user=0.33 sys=0.03, real=0.04 secs]
> 2018-03-27T09:56:42.847+0800: 4.815: [GC (Allocation Failure) 2018-03-27T09:56:42.848+0800: 4.816: [ParNew: 3389676K->37180K(3774912K), 0.0218977 secs] 3389676K->37180K(10066368K), 0.0222756 secs] [Times: user=0.26 sys=0.03, real=0.02 secs]
>
> *the last moment of the worker:*
>
> *the JVM arguments:*
> -Dlogging.sensitivity=S3
> -Dlogfile.name=worker.log
> -Dstorm.home=/opt/apache-storm-1.1.0
> -Dworkers.artifacts=/home/hadoop_stq_A/srv/storm/logs/workers-artifacts
> -Dstorm.id=ESTopology-1-1522058870
> -Dworker.id=f7c73b57-2fc7-4453-ad86-d0b491c4ffd7
> -Dworker.port=6704
> -Dstorm.log.dir=/home/hadoop_stq_A/srv/storm/logs
> -Dlog4j.configurationFile=/opt/apache-storm-1.1.0/log4j2/worker.xml
> -DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector
> -Dstorm.local.dir=/home/hadoop_stq_A/srv/storm/storm-local-data/
> -XX:+DisableExplicitGC
> -Xmx10g
> -Xms10g
> -XX:MetaspaceSize=1g
> -XX:InitialBootClassLoaderMetaspaceSize=512m
> -XX:MaxMetaspaceFreeRatio=80
> -XX:MaxDirectMemorySize=128m
> -XX:NewSize=4g
> -XX:MaxNewSize=4g
> -XX:+PrintGCDetails
> -Xloggc:artifacts/gc.log
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+UseParNewGC
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=1M
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=artifacts/heapdump
> -Dcom.sun.management.jmxremote
> -Dcom.sun.managemen
>
> So my questions are:
> 1. How does an OOM occur, given that the worker heap is abundant?
> 2. Does the OOM lead to a worker restart or a topology failure?
>
> In addition, any information helpful in identifying the root cause and solving the problem would be greatly appreciated.
>
> ------------------------------
> [email protected]
