Hi
I am running Ignite data grid in persistent mode with following
configuration.
20 clients
3 ignite servers (S1,S2,S3) of Xmx 5G started with 2.6.0
I have set WAL and WALArchiving to the same path so that the WAL Archiving
is disabled.
For TcpCommunicationSpi
<property name="usePairedConnections" value="true"></property>
<property name="messageQueueLimit" value="1024"></property>
After 15-20 minutes of run (by that time all clients get connected), S2 goes
down owing to full GC. I have repeated this experiment 3 times and everytime
either of the three servers go down. I am taking heap dump but it is dead
slow. Please look at the time around 10:15.
But I am pasting the ignite server logs for S2
2019-01-03 10:14:49,961 INFO [exchange-worker-#122%ABCD2%] {}
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander
- Starting rebalancing [grp=ProductCache, mode=ASYNC,
fromNode=c861aadf-692c-439b-a27c-5c0c0bc25cc2, partitionsCount=246,
topology=AffinityTopologyVersion [topVer=21, minorTopVer=2],
rebalanceId=281]
2019-01-03 10:14:50,078 INFO [grid-nio-worker-tcp-comm-10-#99%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted
incoming communication connection [locAddr=/127.0.0.1:47100,
rmtAddr=/127.0.0.1:59476]
2019-01-03 10:14:53,948 INFO [db-checkpoint-thread-#123%ABCD2%] {}
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager
- Checkpoint finished [cpId=e8b1e7a7-ab78-49d4-8dda-9752fc91d92d,
pages=343796, markPos=FileWALPointer [idx=49, fileOff=58182175, len=14930],
walSegmentsCleared=0, markDuration=883ms, pagesWrite=3802ms, fsync=5036ms,
total=9922ms]
2019-01-03 10:14:59,241 INFO [grid-nio-worker-tcp-comm-12-#101%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted
incoming communication connection [locAddr=/127.0.0.1:47100,
rmtAddr=/127.0.0.1:59508]
2019-01-03 10:14:59,442 INFO [grid-nio-worker-tcp-comm-14-#103%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted
incoming communication connection [locAddr=/127.0.0.1:47100,
rmtAddr=/127.0.0.1:59510]
2019-01-03 10:15:00,242 ERROR [grid-nio-worker-tcp-comm-12-#101%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Failed to
process selector key [ses=GridSelectorNioSessionImpl
[worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=12,
bytesRcvd=1184852821, bytesSent=2534, bytesRcvd0=42, bytesSent0=38,
select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-12,
igniteInstanceName=ABCD2, finished=false, hashCode=1249811755,
interrupted=false, runner=grid-nio-worker-tcp-comm-12-#101%ABCD2%]]],
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
inRecovery=null, outRecovery=null, super=GridNioSessionImpl
[locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:59508,
createTime=1546510499232, closeTime=0, bytesSent=38, bytesRcvd=42,
bytesSent0=38, bytesRcvd0=42, sndSchedTime=1546510499232,
lastSndTime=1546510499232, lastRcvTime=1546510499232, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@70f888ac, directMode=true],
GridConnectionBytesVerifyFilter], accepted=true]]] java.io.IOException:
Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1250)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2339)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2110)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
2019-01-03 10:15:00,242 WARN [grid-nio-worker-tcp-comm-12-#101%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Closing NIO
session because of unhandled exception [cls=class
o.a.i.i.util.nio.GridNioException, msg=Connection reset by peer]
2019-01-03 10:15:03,906 INFO [grid-nio-worker-tcp-comm-16-#105%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted
incoming communication connection [locAddr=/10.63.216.22:47100,
rmtAddr=/10.63.216.50:56876]
2019-01-03 10:15:04,109 INFO [grid-nio-worker-tcp-comm-18-#107%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted
incoming communication connection [locAddr=/10.63.216.22:47100,
rmtAddr=/10.63.216.50:56878]
2019-01-03 10:15:04,908 ERROR [grid-nio-worker-tcp-comm-16-#105%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Failed to
process selector key [ses=GridSelectorNioSessionImpl
[worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=16,
bytesRcvd=1257163869, bytesSent=2484, bytesRcvd0=42, bytesSent0=38,
select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-16,
igniteInstanceName=ABCD2, finished=false, hashCode=1355017133,
interrupted=false, runner=grid-nio-worker-tcp-comm-16-#105%ABCD2%]]],
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
inRecovery=null, outRecovery=null, super=GridNioSessionImpl
[locAddr=/10.63.216.22:47100, rmtAddr=/10.63.216.50:56876,
createTime=1546510503900, closeTime=0, bytesSent=38, bytesRcvd=42,
bytesSent0=38, bytesRcvd0=42, sndSchedTime=1546510503900,
lastSndTime=1546510503900, lastRcvTime=1546510503900, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@70f888ac, directMode=true],
GridConnectionBytesVerifyFilter], accepted=true]]] java.io.IOException:
Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1250)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2339)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2110)
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
2019-01-03 10:15:04,908 WARN [grid-nio-worker-tcp-comm-16-#105%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Closing NIO
session because of unhandled exception [cls=class
o.a.i.i.util.nio.GridNioException, msg=Connection reset by peer]
2019-01-03 10:15:06,428 INFO [grid-nio-worker-tcp-comm-0-#89%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted
incoming communication connection [locAddr=/127.0.0.1:47100,
rmtAddr=/127.0.0.1:59518]
2019-01-03 10:15:06,630 INFO [grid-nio-worker-tcp-comm-2-#91%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted
incoming communication connection [locAddr=/127.0.0.1:47100,
rmtAddr=/127.0.0.1:59522]
2019-01-03 10:15:07,429 ERROR [grid-nio-worker-tcp-comm-0-#89%ABCD2%] {}
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Failed to
process selector key [ses=GridSelectorNioSessionImpl
[worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=0,
bytesRcvd=949538335, bytesSent=2282, bytesRcvd0=927948, bytesSent0=38,
select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-0,
igniteInstanPattern not found (press RETURN)
GC Logs for S2 (excerpts)
2019-01-03T10:14:58.873+0000: 1544.657: [GC pause (G1 Evacuation Pause)
(mixed), 0.0352468 secs]
[Parallel Time: 29.1 ms, GC Workers: 28]
[GC Worker Start (ms): Min: 1544657.8, Avg: 1544658.3, Max: 1544658.8,
Diff: 1.0]
[Ext Root Scanning (ms): Min: 0.0, Avg: 0.4, Max: 1.2, Diff: 1.2, Sum:
12.5]
[Update RS (ms): Min: 7.9, Avg: 8.4, Max: 11.3, Diff: 3.4, Sum: 235.8]
[Processed Buffers: Min: 1, Avg: 10.8, Max: 20, Diff: 19, Sum: 303]
[Scan RS (ms): Min: 0.0, Avg: 1.3, Max: 1.8, Diff: 1.8, Sum: 36.8]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
Sum: 0.6]
[Object Copy (ms): Min: 15.2, Avg: 16.8, Max: 18.1, Diff: 2.8, Sum:
471.2]
[Termination (ms): Min: 0.0, Avg: 1.3, Max: 1.3, Diff: 1.3, Sum: 35.4]
[Termination Attempts: Min: 1, Avg: 1.4, Max: 3, Diff: 2, Sum: 39]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum:
1.6]
[GC Worker Total (ms): Min: 27.8, Avg: 28.4, Max: 28.8, Diff: 1.0,
Sum: 793.9]
[GC Worker End (ms): Min: 1544686.6, Avg: 1544686.7, Max: 1544686.7,
Diff: 0.1]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.2 ms]
[Other: 4.8 ms]
[Choose CSet: 0.4 ms]
[Ref Proc: 0.7 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.5 ms]
[Humongous Register: 0.2 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 2.0 ms]
[Eden: 224.0M(224.0M)->0.0B(2215.0M) Survivors: 32.0M->32.0M Heap:
1942.5M(5120.0M)->1609.5M(5120.0M)]
[Times: user=0.79 sys=0.01, real=0.04 secs]
2019-01-03T10:15:10.173+0000: 1555.956: [GC pause (G1 Evacuation Pause)
(young) (to-space exhausted), 1.2771571 secs]
[Parallel Time: 1070.4 ms, GC Workers: 28]
[GC Worker Start (ms): Min: 1555957.8, Avg: 1555958.1, Max: 1555958.7,
Diff: 0.9]
[Ext Root Scanning (ms): Min: 0.0, Avg: 0.5, Max: 1.1, Diff: 1.1, Sum:
13.9]
[Update RS (ms): Min: 3.1, Avg: 12.4, Max: 20.2, Diff: 17.1, Sum:
346.2]
[Processed Buffers: Min: 8, Avg: 25.0, Max: 43, Diff: 35, Sum: 701]
[Scan RS (ms): Min: 0.5, Avg: 1.7, Max: 7.9, Diff: 7.4, Sum: 48.5]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.2]
[Object Copy (ms): Min: 1040.3, Avg: 1051.5, Max: 1062.3, Diff: 22.0,
Sum: 29441.9]
[Termination (ms): Min: 0.0, Avg: 3.2, Max: 10.6, Diff: 10.6, Sum:
89.7]
[Termination Attempts: Min: 1, Avg: 11.1, Max: 27, Diff: 26, Sum:
310]
[GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum:
6.5]
[GC Worker Total (ms): Min: 1068.8, Avg: 1069.5, Max: 1070.0, Diff:
1.2, Sum: 29946.8]
[GC Worker End (ms): Min: 1557027.4, Avg: 1557027.6, Max: 1557027.8,
Diff: 0.4]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.5 ms]
[Other: 205.2 ms]
[Evacuation Failure: 197.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.2 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 1.4 ms]
[Humongous Register: 0.3 ms]
[Humongous Reclaim: 0.2 ms]
[Free CSet: 3.2 ms]
****************************************
2019-01-03T10:22:00.265+0000: 1966.049: [GC pause (G1 Evacuation
Pause)
(young) (initial-mark), 0.0210944 secs]
[Parallel Time: 12.1 ms, GC Workers: 28]
[GC Worker Start (ms): Min: 1966050.3, Avg: 1966050.8, Max: 1966051.4,
Diff: 1.1]
[Ext Root Scanning (ms): Min: 1.3, Avg: 4.7, Max: 10.4, Diff: 9.1,
Sum: 132.3]
[Update RS (ms): Min: 0.0, Avg: 0.1, Max: 0.7, Diff: 0.7, Sum: 2.5]
[Processed Buffers: Min: 0, Avg: 1.2, Max: 5, Diff: 5, Sum: 34]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.0]
[Object Copy (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 1.2]
[Termination (ms): Min: 0.0, Avg: 6.2, Max: 9.6, Diff: 9.6, Sum:
172.4]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 28]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum:
0.4]
[GC Worker Total (ms): Min: 10.4, Avg: 11.0, Max: 11.6, Diff: 1.1,
Sum: 308.8]
[GC Worker End (ms): Min: 1966061.8, Avg: 1966061.8, Max: 1966062.2,
Diff: 0.4]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.5 ms]
[Other: 8.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 5.2 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.6 ms]
[Humongous Register: 0.2 ms]
[Humongous Reclaim: 0.5 ms]
[Free CSet: 0.2 ms]
[Eden: 0.0B(256.0M)->0.0B(256.0M) Survivors: 0.0B->0.0B Heap:
5075.6M(5120.0M)->5075.6M(5120.0M)]
[Times: user=0.22 sys=0.00, real=0.02 secs]
2019-01-03T10:22:00.287+0000: 1966.070: [GC
concurrent-root-region-scan-start]
2019-01-03T10:22:00.287+0000: 1966.070: [GC concurrent-root-region-scan-end,
0.0000520 secs]
2019-01-03T10:22:00.287+0000: 1966.070: [GC concurrent-mark-start]
2019-01-03T10:22:00.288+0000: 1966.072: [Full GC (Allocation Failure)
(END)
Can someone help me on the same ? All three times the re-balancing is
started
2019-01-03 10:14:49,961 INFO [exchange-worker-#122%ABCD2%] {}
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander
- Starting rebalancing [grp=ProductCache, mode=ASYNC,
fromNode=c861aadf-692c-439b-a27c-5c0c0bc25cc2, partitionsCount=246,
topology=AffinityTopologyVersion [topVer=21, minorTopVer=2],
rebalanceId=281]
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/