My Java is still rusty, and I often run into java.lang.OutOfMemoryError: GC
overhead limit exceeded...
Yes, I need to look for memory leaks...
But first I need to free up this memory so I can run again without having to
shut down and restart everything.
I've tried running the jcmd <pid> GC.run command on each of the JVM instances on a
TaskManager, but I get a boatload of output like this:
On the host running the command:
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
	at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
	at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:63)
	at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:213)
	at sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:140)
	at sun.tools.jcmd.JCmd.main(JCmd.java:129)

and on the taskmanager log:
"Flink-IPC Server handler 1 on 6121" daemon prio=10 tid=0x00007f5f107ee000 nid=0x8f waiting on condition [0x00007f5eb4803000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000000f37e95c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.flink.runtime.ipc.Server$Handler.run(Server.java:941)

"Flink-IPC Server handler 0 on 6121" daemon prio=10 tid=0x00007f5f107eb800 nid=0x8e waiting on condition [0x00007f5eb4904000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000000f37e95c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.flink.runtime.ipc.Server$Handler.run(Server.java:941)

"Flink-IPC Server listener on 6121" daemon prio=10 tid=0x00007f5f107e9800 nid=0x8d runnable [0x00007f5eb4a05000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
	- locked <0x00000000f385d3c0> (a sun.nio.ch.Util$2)
	- locked <0x00000000f385d3d0> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00000000f385d378> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
	at org.apache.flink.runtime.ipc.Server$Listener.run(Server.java:341)

"Flink-IPC Server Responder" daemon prio=10 tid=0x00007f5f107e8800 nid=0x8c runnable [0x00007f5eb4b06000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
	- locked <0x00000000f387b528> (a sun.nio.ch.Util$2)
	- locked <0x00000000f387b538> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00000000f387b4e0> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
	at org.apache.flink.runtime.ipc.Server$Responder.run(Server.java:506)

"Service Thread" daemon prio=10 tid=0x00007f5f100c2000 nid=0x8a runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f5f100c0000 nid=0x89 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f5f100bd000 nid=0x88 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f5f100b3000 nid=0x87 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f5f1009c800 nid=0x86 in Object.wait() [0x00007f5eb605b000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000f381cc08> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
	- locked <0x00000000f381cc08> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

"Reference Handler" daemon prio=10 tid=0x00007f5f10098800 nid=0x85 in Object.wait() [0x00007f5eb615c000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000f381c820> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:503)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
	- locked <0x00000000f381c820> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f5f1000d800 nid=0x6a in Object.wait() [0x00007f5f178d4000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000fbe14200> (a java.lang.Object)
	at java.lang.Object.wait(Object.java:503)
	at org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.java:1115)
	- locked <0x00000000fbe14200> (a java.lang.Object)

"VM Thread" prio=10 tid=0x00007f5f10096000 nid=0x84 runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f5f10023000 nid=0x6b runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f5f10025000 nid=0x6c runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f5f10027000 nid=0x6d runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f5f10029000 nid=0x6e runnable
"GC task thread#4 (ParallelGC)" prio=10 tid=0x00007f5f1002a800 nid=0x6f runnable
"GC task thread#5 (ParallelGC)" prio=10 tid=0x00007f5f1002c800 nid=0x70 runnable
"GC task thread#6 (ParallelGC)" prio=10 tid=0x00007f5f1002e800 nid=0x71 runnable
"GC task thread#7 (ParallelGC)" prio=10 tid=0x00007f5f10030000 nid=0x72 runnable
"GC task thread#8 (ParallelGC)" prio=10 tid=0x00007f5f10032000 nid=0x73 runnable
"GC task thread#9 (ParallelGC)" prio=10 tid=0x00007f5f10034000 nid=0x74 runnable
"GC task thread#10 (ParallelGC)" prio=10 tid=0x00007f5f10036000 nid=0x75 runnable
"GC task thread#11 (ParallelGC)" prio=10 tid=0x00007f5f10037800 nid=0x76 runnable
"GC task thread#12 (ParallelGC)" prio=10 tid=0x00007f5f10039800 nid=0x77 runnable
"GC task thread#13 (ParallelGC)" prio=10 tid=0x00007f5f1003b800 nid=0x78 runnable
"GC task thread#14 (ParallelGC)" prio=10 tid=0x00007f5f1003d000 nid=0x79 runnable
"GC task thread#15 (ParallelGC)" prio=10 tid=0x00007f5f1003f000 nid=0x7a runnable
"GC task thread#16 (ParallelGC)" prio=10 tid=0x00007f5f10041000 nid=0x7b runnable
"GC task thread#17 (ParallelGC)" prio=10 tid=0x00007f5f10043000 nid=0x7c runnable
"GC task thread#18 (ParallelGC)" prio=10 tid=0x00007f5f10044800 nid=0x7d runnable
"GC task thread#19 (ParallelGC)" prio=10 tid=0x00007f5f10046800 nid=0x7e runnable
"GC task thread#20 (ParallelGC)" prio=10 tid=0x00007f5f10048800 nid=0x7f runnable
"GC task thread#21 (ParallelGC)" prio=10 tid=0x00007f5f1004a000 nid=0x80 runnable
"GC task thread#22 (ParallelGC)" prio=10 tid=0x00007f5f1004c000 nid=0x81 runnable

"VM Periodic Task Thread" prio=10 tid=0x00007f5f100d5000 nid=0x8b waiting on condition

JNI global references: 530
Heap
 PSYoungGen      total 76800K, used 63133K [0x00000000faa80000, 0x0000000100000000, 0x0000000100000000)
  eden space 66048K, 95% used [0x00000000faa80000,0x00000000fe827690,0x00000000feb00000)
  from space 10752K, 0% used [0x00000000ff580000,0x00000000ff580000,0x0000000100000000)
  to   space 10752K, 0% used [0x00000000feb00000,0x00000000feb00000,0x00000000ff580000)
 ParOldGen       total 175104K, used 175046K [0x00000000eff80000, 0x00000000faa80000, 0x00000000faa80000)
  object space 175104K, 99% used [0x00000000eff80000,0x00000000faa71bb0,0x00000000faa80000)
 PSPermGen       total 29696K, used 29267K [0x00000000dff80000, 0x00000000e1c80000, 0x00000000eff80000)
  object space 29696K, 98% used [0x00000000dff80000,0x00000000e1c14d38,0x00000000e1c80000)
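For reference, the per-pool numbers in the heap summary above can also be read from inside a JVM with the standard java.lang.management API. The sketch below (the class name HeapCheck and its helper methods are hypothetical) prints used memory per heap pool and then requests a collection through MemoryMXBean.gc(), which the JDK documents as effectively equivalent to System.gc(), the same call jcmd's GC.run makes in the target JVM. It is only a request, and a collection can only reclaim objects that are no longer reachable; pool names such as "PS Old Gen" assume ParallelGC, as in the dump above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

public class HeapCheck {

    // Used bytes per heap pool, e.g. "PS Eden Space" or "PS Old Gen" under
    // ParallelGC (assumption: names differ for other collectors).
    static Map<String, Long> heapPoolUsage() {
        Map<String, Long> usage = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getUsage() returns null for pools that are no longer valid.
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                usage.put(pool.getName(), pool.getUsage().getUsed());
            }
        }
        return usage;
    }

    // Requests a full GC (same effect as System.gc(), which is what
    // `jcmd <pid> GC.run` calls) and returns the used heap afterwards.
    // Live, still-referenced objects are never freed by this.
    static long gcAndReportUsedHeap() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used %dK of %dK committed%n",
                heap.getUsed() / 1024, heap.getCommitted() / 1024);
        heapPoolUsage().forEach((name, used) ->
                System.out.printf("  %-20s %dK%n", name, used / 1024));
        System.out.printf("after gc(): heap used %dK%n", gcAndReportUsedHeap() / 1024);
    }
}
```

If the old generation is still close to full after the requested collection, the memory is most likely held by reachable objects, which points back to the leak hunt rather than to anything a forced GC can clear.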
Any insight into cleanly forcing a GC to free the memory when this happens?
Thanks!