Hello John,

Can you share the full log and cluster configuration?
For instance, the cluster may fail the node due to unresponsive socket
request if this node had long GC pauses due to high load.

> Does this mean this node was also the master?

Not sure I've got your question. Do you mean the coordinator node?

On Tue, 17 Nov 2020 at 20:52, John Smith <[email protected]> wrote:
>
> So if I understand correctly the logs below...
>
> The node that shut off was timing out trying to get partition exchange from 
> the indicated nodes and it shut itself off correct? Does this mean this node 
> was also the master?
>
> 1- The time indicated in the log is that UTC?
> 2- I'm trying to see if it was high IO, but the node that stopped had no high 
> IO.
> 3- the other nodes had an average peek of about 0.3%-0.5% And that was around 
> 16:30 EST So not sure if those times match with the log.
> 4- On super high load days we get up to 7%-9% IO Wait but we never lost a 
> node on those occasions.
> 5- The nodes are persistent nodes so I'm guessing on write there's high IO 
> can we reduce the commit time? but even then I'm not sure it's related. As 
> even on higher load we don't lose the node.
>
> [04:36:07,771][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Completed 
> partition exchange [localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe, 
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
> [topVer=132, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode 
> [id=e5645874-77e9-4455-8db5-07fa63984276, addrs=[127.0.0.1, xxx.xxx.xxx.5], 
> sockAddrs=[/xxx.xxx.xxx.5:0, /127.0.0.1:0], discPort=0, order=108, 
> intOrder=60, lastExchangeTime=1600798045116, loc=false, 
> ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null, 
> durationFromInit=1605328567765]
> [04:36:07,771][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish 
> exchange future [startVer=AffinityTopologyVersion [topVer=133, 
> minorTopVer=0], resVer=null, err=class 
> org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is 
> stopping: xxxxxx]
> [04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Completed 
> partition exchange [localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe, 
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
> [topVer=133, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode 
> [id=13f473cb-5441-460b-9b9f-a23c1bcf3b0b, addrs=[127.0.0.1, xxx.xxx.xxx.3], 
> sockAddrs=[/127.0.0.1:0, /xxx.xxx.xxx.3:0], discPort=0, order=110, 
> intOrder=61, lastExchangeTime=1600798045127, loc=false, 
> ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null, 
> durationFromInit=1605328567765]
> [04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish 
> exchange future [startVer=AffinityTopologyVersion [topVer=134, 
> minorTopVer=0], resVer=null, err=class 
> org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is 
> stopping: xxxxxx]
> [04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Completed 
> partition exchange [localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe, 
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
> [topVer=134, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode 
> [id=ef9f3750-edfa-474e-8e68-6f0abee5095a, addrs=[127.0.0.1, xxx.xxx.xxx.10], 
> sockAddrs=[/127.0.0.1:0, /xxx.xxx.xxx.10:0], discPort=0, order=112, 
> intOrder=62, lastExchangeTime=1600798045137, loc=false, 
> ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null, 
> durationFromInit=1605328567765]
> [04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish 
> exchange future [startVer=AffinityTopologyVersion [topVer=135, 
> minorTopVer=0], resVer=null, err=class 
> org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is 
> stopping: xxxxxx]
> [04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Completed 
> partition exchange [localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe, 
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
> [topVer=135, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode 
> [id=e14e5bea-2784-4a6f-af2f-e98a4bf0ab52, addrs=[127.0.0.1, xxx.xxx.xxx.63], 
> sockAddrs=[xxxxxx-0001/xxx.xxx.xxx.63:47500, /127.0.0.1:47500], 
> discPort=47500, order=118, intOrder=65, lastExchangeTime=1600798763474, 
> loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=true], 
> topVer=null, durationFromInit=1605328567765]
> [04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish 
> exchange future [startVer=AffinityTopologyVersion [topVer=136, 
> minorTopVer=0], resVer=null, err=class 
> org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is 
> stopping: xxxxxx]
> [04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Completed 
> partition exchange [localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe, 
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
> [topVer=136, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode 
> [id=8facbe9c-6d9f-4c2d-8d9c-5c7f38dbb7da, addrs=[127.0.0.1, xxx.xxx.xxx.69], 
> sockAddrs=[/127.0.0.1:47500, xxxxxx-0003/xxx.xxx.xxx.69:47500], 
> discPort=47500, order=120, intOrder=66, lastExchangeTime=1600799256509, 
> loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=true], 
> topVer=null, durationFromInit=1605328567765]
> [04:36:07,796][INFO][db-checkpoint-thread-#56%xxxxxx%][GridCacheDatabaseSharedManager]
>  Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
> checkpointLockHoldTime=14ms, reason='timeout']
> [04:36:07,832][INFO][node-stopper][GridCacheProcessor] Stopped cache 
> [cacheName=carrier-ids-for-phones]
> [04:36:07,839][INFO][node-stopper][GridCacheProcessor] Stopped cache 
> [cacheName=twofa-otp]
> [04:36:07,840][INFO][node-stopper][GridCacheProcessor] Stopped cache 
> [cacheName=SQL_PUBLIC_CARRIER_CODE]
> [04:36:07,841][INFO][node-stopper][GridCacheProcessor] Stopped cache 
> [cacheName=carrier-lookups]
> [04:36:07,842][INFO][node-stopper][GridCacheProcessor] Stopped cache 
> [cacheName=ignite-sys-cache]
> [04:36:08,497][INFO][node-stopper][IgniteKernal%xxxxxx]
>
> >>> +---------------------------------------------------------------------------------+
> >>> Ignite ver. 2.7.0#20181130-sha1:256ae4012cb143b4855b598b740a6f3499ead4db 
> >>> stopped OK
> >>> +---------------------------------------------------------------------------------+
> >>> Ignite instance name: xxxxxx
> >>> Grid uptime: 52 days, 10:25:51.029

Reply via email to