Hello Pavel, 

The PME stuck again and here's the detailed log and thread dump.
node1.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node1.zip>  
node2.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node2.zip>  
node3.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node3.zip>  

This time, the client prints the following log and stuck.

18/07/31 06:40:48 INFO exchange.time: Started exchange init
[topVer=AffinityTopologyVersion [topVer=98, minorTopVer=0], crd=false,
evt=NODE_JOINED, evtNode=e6b3704f-0e12-44bb-b14f-af9b0b49df4e,
customEvt=null, allowMerge=true]
18/07/31 06:40:49 INFO cache.GridCacheProcessor: Started cache
[name=ignite-sys-cache, id=-2100569601, memoryPolicyName=sysMemPlc,
mode=REPLICATED, atomicity=TRANSACTIONAL, backups=2147483647]
18/07/31 06:40:49 INFO tcp.TcpCommunicationSpi: Established outgoing
communication connection [locAddr=/10.255.201.110:59811,
rmtAddr=001/10.240.161.91:47100]
18/07/31 06:40:49 INFO exchange.time: Finished exchange init
[topVer=AffinityTopologyVersion [topVer=98, minorTopVer=0], crd=false]
18/07/31 06:40:59 WARN internal.diagnostic: Failed to wait for partition map
exchange [topVer=AffinityTopologyVersion [topVer=98, minorTopVer=0],
node=e6b3704f-0e12-44bb-b14f-af9b0b49df4e]. Consider changing
TransactionConfiguration.txTimeoutOnPartitionMapSynchronization to non
default value to avoid this message. Dumping pending objects that might be
the cause:
18/07/31 06:40:59 WARN internal.diagnostic: Ready affinity version:
AffinityTopologyVersion [topVer=-1, minorTopVer=0]
18/07/31 06:40:59 WARN internal.diagnostic: Last exchange future:
GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent
[evtNode=TcpDiscoveryNode [id=e6b3704f-0e12-44bb-b14f-af9b0b49df4e,
addrs=[10.255.201.110, 127.0.0.1], sockAddrs=[rpsj2hsn132/10.255.201.110:0,
/127.0.0.1:0], discPort=0, order=98, intOrder=0,
lastExchangeTime=1533019247506, loc=true, ver=2.6.0#20180709-sha1:5faffcee,
isClient=true], topVer=98, nodeId8=e6b3704f, msg=null, type=NODE_JOINED,
tstamp=1533019248647], crd=TcpDiscoveryNode
[id=0b3ced74-9cc4-4b37-9eea-e0b519bc26fd, addrs=[10.240.161.91, 127.0.0.1],
sockAddrs=[/127.0.0.1:49500, 001/10.240.161.91:49500], discPort=49500,
order=1, intOrder=1, lastExchangeTime=1533019248041, loc=false,
ver=2.6.0#20180709-sha1:5faffcee, isClient=false],
exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion
[topVer=98, minorTopVer=0], discoEvt=DiscoveryEvent
[evtNode=TcpDiscoveryNode [id=e6b3704f-0e12-44bb-b14f-af9b0b49df4e,
addrs=[10.255.201.110, 127.0.0.1], sockAddrs=[rpsj2hsn132/10.255.201.110:0,
/127.0.0.1:0], discPort=0, order=98, intOrder=0,
lastExchangeTime=1533019247506, loc=true, ver=2.6.0#20180709-sha1:5faffcee,
isClient=true], topVer=98, nodeId8=e6b3704f, msg=null, type=NODE_JOINED,
tstamp=1533019248647], nodeId=e6b3704f, evt=NODE_JOINED], added=true,
initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true,
hash=118882705], init=true, lastVer=null, partReleaseFut=null,
exchActions=ExchangeActions [startCaches=null, stopCaches=null,
startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null],
affChangeMsg=null, initTs=1533019248697, centralizedAff=false,
forceAffReassignment=false, changeGlobalStateE=null, done=false,
state=CLIENT, evtLatch=0, remaining=[07a24ed6-c6c7-4ee1-94cd-5c8ff95d956d,
0b3ced74-9cc4-4b37-9eea-e0b519bc26fd, 8967f541-5627-43a2-ab74-bb822d12f02b],
super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null,
hash=1778222816]]
18/07/31 06:40:59 WARN cache.GridCachePartitionExchangeManager: First 10
pending exchange futures [total=0]
18/07/31 06:40:59 WARN internal.diagnostic: Last 10 exchange futures (total:
1):
18/07/31 06:40:59 WARN internal.diagnostic: >>>
GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=98,
minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=e6b3704f-0e12-44bb-b14f-af9b0b49df4e, addrs=[10.255.201.110, 127.0.0.1],
sockAddrs=[rpsj2hsn132/10.255.201.110:0, /127.0.0.1:0], discPort=0,
order=98, intOrder=0, lastExchangeTime=1533019247506, loc=true,
ver=2.6.0#20180709-sha1:5faffcee, isClient=true], done=false]
18/07/31 06:40:59 WARN internal.diagnostic: Latch manager state:
ExchangeLatchManager [serverLatches={}, clientLatches={}]
18/07/31 06:40:59 WARN internal.diagnostic: Pending transactions:
18/07/31 06:40:59 WARN internal.diagnostic: Pending explicit locks:
18/07/31 06:40:59 WARN internal.diagnostic: Pending cache futures:
18/07/31 06:40:59 WARN internal.diagnostic: Pending atomic cache futures:
18/07/31 06:40:59 WARN internal.diagnostic: Pending data streamer futures:
18/07/31 06:40:59 WARN internal.diagnostic: Pending transaction deadlock
detection futures:
18/07/31 06:41:08 WARN cache.GridCachePartitionExchangeManager: Failed to
wait for initial partition map exchange. Possible reasons are:
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.
18/07/31 06:41:19 WARN internal.diagnostic: Failed to wait for partition map
exchange [topVer=AffinityTopologyVersion [topVer=98, minorTopVer=0],
node=e6b3704f-0e12-44bb-b14f-af9b0b49df4e]. Consider changing
TransactionConfiguration.txTimeoutOnPartitionMapSynchronization to non
default value to avoid this message. Dumping pending objects that might be
the cause:
18/07/31 06:41:19 WARN internal.diagnostic: Ready affinity version:
AffinityTopologyVersion [topVer=-1, minorTopVer=0]
18/07/31 06:41:19 WARN internal.diagnostic: Last exchange future:
GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent
[evtNode=TcpDiscoveryNode [id=e6b3704f-0e12-44bb-b14f-af9b0b49df4e,
addrs=[10.255.201.110, 127.0.0.1], sockAddrs=[rpsj2hsn132/10.255.201.110:0,
/127.0.0.1:0], discPort=0, order=98, intOrder=0,
lastExchangeTime=1533019247506, loc=true, ver=2.6.0#20180709-sha1:5faffcee,
isClient=true], topVer=98, nodeId8=e6b3704f, msg=null, type=NODE_JOINED,
tstamp=1533019248647], crd=TcpDiscoveryNode
[id=0b3ced74-9cc4-4b37-9eea-e0b519bc26fd, addrs=[10.240.161.91, 127.0.0.1],
sockAddrs=[/127.0.0.1:49500, 001/10.240.161.91:49500], discPort=49500,
order=1, intOrder=1, lastExchangeTime=1533019248041, loc=false,
ver=2.6.0#20180709-sha1:5faffcee, isClient=false],
exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion
[topVer=98, minorTopVer=0], discoEvt=DiscoveryEvent
[evtNode=TcpDiscoveryNode [id=e6b3704f-0e12-44bb-b14f-af9b0b49df4e,
addrs=[10.255.201.110, 127.0.0.1], sockAddrs=[rpsj2hsn132/10.255.201.110:0,
/127.0.0.1:0], discPort=0, order=98, intOrder=0,
lastExchangeTime=1533019247506, loc=true, ver=2.6.0#20180709-sha1:5faffcee,
isClient=true], topVer=98, nodeId8=e6b3704f, msg=null, type=NODE_JOINED,
tstamp=1533019248647], nodeId=e6b3704f, evt=NODE_JOINED], added=true,
initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true,
hash=118882705], init=true, lastVer=null, partReleaseFut=null,
exchActions=ExchangeActions [startCaches=null, stopCaches=null,
startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null],
affChangeMsg=null, initTs=1533019248697, centralizedAff=false,
forceAffReassignment=false, changeGlobalStateE=null, done=false,
state=CLIENT, evtLatch=0, remaining=[07a24ed6-c6c7-4ee1-94cd-5c8ff95d956d,
0b3ced74-9cc4-4b37-9eea-e0b519bc26fd, 8967f541-5627-43a2-ab74-bb822d12f02b],
super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null,
hash=1778222816]]
18/07/31 06:41:19 WARN cache.GridCachePartitionExchangeManager: First 10
pending exchange futures [total=0]
18/07/31 06:41:19 WARN internal.diagnostic: Last 10 exchange futures (total:
1):
18/07/31 06:41:19 WARN internal.diagnostic: >>>
GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=98,
minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=e6b3704f-0e12-44bb-b14f-af9b0b49df4e, addrs=[10.255.201.110, 127.0.0.1],
sockAddrs=[rpsj2hsn132/10.255.201.110:0, /127.0.0.1:0], discPort=0,
order=98, intOrder=0, lastExchangeTime=1533019247506, loc=true,
ver=2.6.0#20180709-sha1:5faffcee, isClient=true], done=false]
18/07/31 06:41:19 WARN internal.diagnostic: Latch manager state:
ExchangeLatchManager [serverLatches={}, clientLatches={}]
18/07/31 06:41:19 WARN internal.diagnostic: Pending transactions:
18/07/31 06:41:19 WARN internal.diagnostic: Pending explicit locks:
18/07/31 06:41:19 WARN internal.diagnostic: Pending cache futures:
18/07/31 06:41:19 WARN internal.diagnostic: Pending atomic cache futures:
18/07/31 06:41:19 WARN internal.diagnostic: Pending data streamer futures:
18/07/31 06:41:19 WARN internal.diagnostic: Pending transaction deadlock
detection futures:
18/07/31 06:41:28 WARN cache.GridCachePartitionExchangeManager: Still
waiting for initial partition map exchange
[fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent
[evtNode=TcpDiscoveryNode [id=e6b3704f-0e12-44bb-b14f-af9b0b49df4e,
addrs=[10.255.201.110, 127.0.0.1], sockAddrs=[rpsj2hsn132/10.255.201.110:0,
/127.0.0.1:0], discPort=0, order=98, intOrder=0,
lastExchangeTime=1533019247506, loc=true, ver=2.6.0#20180709-sha1:5faffcee,
isClient=true], topVer=98, nodeId8=e6b3704f, msg=null, type=NODE_JOINED,
tstamp=1533019248647], crd=TcpDiscoveryNode
[id=0b3ced74-9cc4-4b37-9eea-e0b519bc26fd, addrs=[10.240.161.91, 127.0.0.1],
sockAddrs=[/127.0.0.1:49500, 001/10.240.161.91:49500], discPort=49500,
order=1, intOrder=1, lastExchangeTime=1533019248041, loc=false,
ver=2.6.0#20180709-sha1:5faffcee, isClient=false],
exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion
[topVer=98, minorTopVer=0], discoEvt=DiscoveryEvent
[evtNode=TcpDiscoveryNode [id=e6b3704f-0e12-44bb-b14f-af9b0b49df4e,
addrs=[10.255.201.110, 127.0.0.1], sockAddrs=[rpsj2hsn132/10.255.201.110:0,
/127.0.0.1:0], discPort=0, order=98, intOrder=0,
lastExchangeTime=1533019247506, loc=true, ver=2.6.0#20180709-sha1:5faffcee,
isClient=true], topVer=98, nodeId8=e6b3704f, msg=null, type=NODE_JOINED,
tstamp=1533019248647], nodeId=e6b3704f, evt=NODE_JOINED], added=true,
initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true,
hash=118882705], init=true, lastVer=null, partReleaseFut=null,
exchActions=ExchangeActions [startCaches=null, stopCaches=null,
startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null],
affChangeMsg=null, initTs=1533019248697, centralizedAff=false,
forceAffReassignment=false, changeGlobalStateE=null, done=false,
state=CLIENT, evtLatch=0, remaining=[07a24ed6-c6c7-4ee1-94cd-5c8ff95d956d,
0b3ced74-9cc4-4b37-9eea-e0b519bc26fd, 8967f541-5627-43a2-ab74-bb822d12f02b],
super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null,
hash=1778222816]]]

And servers prints out logs on node1
[2018-07-31T05:43:04,781][WARN
][exchange-worker-#162][GridDhtPartitionsExchangeFuture] Unable to await
partitions release latch within t=2,
pendingAcks=[07a24ed6-c6c7-4ee1-94cd-5c8ff95d956d,
8967f541-5627-43a2-ab74-bb822d12f02b], super=CompletableLatch [id=exchange,
topVerpVer=58, minorTopVer=0]]]
and these logs on node2 and node3
Unable to await partitions release latch within timeout: ClientLatch
[coordinator=TcpDiscoveryNode [id=0b3ced74-9cc4-4b37-9eea-e0b519bc26fd,
addrs=[10.240.161.91, 127.0.0.1], sockAddrs=[        /127.0.0.1:49500],
discPort=49500, order=1, intOrder=1, lastExchangeTime=1532940317188, loc=fa     
  
lse, ver=2.6.0#20180709-sha1:5faffcee, isClient=false], ackSent=true,
super=CompletableLatch [id=exchange,
topVer=AffinityTopologyVersion[topVer=90, minorTopVer=0]]]

I did a little research and found that both ClientLatch and CompletableLatch
in ExchangeLatchManager are introduced by this ticket.
https://issues.apache.org/jira/browse/IGNITE-7871

Could this be the cause of PME freezing?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Reply via email to