Hello Pavel,

I was able to reproduce this issue, and I've attached the DEBUG logs and
thread dumps for the three nodes as you suggested.
Archive.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/Archive.zip>  

This time, there's no "no route to host" exception between server and client
nodes.

Node2 and node3 log "Unable to await partitions release latch within
timeout: ClientLatch" shortly after the cluster starts; node1 doesn't have
any explicit errors.

The cluster begins to freeze about 20 minutes after the data ingestion
starts.

The attached pictures show the running/parked time slices of the data
streaming threads on each of the three nodes.
You can see that node3 freezes first, then node2 freezes.
So the client can only write to node1, which triggers a lot of rebalancing.

node1.png
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node1.png>    
node2.png
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node2.png>  
node3.png
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node3.png>  

By the time I wrote this post, the data ingestion, which usually takes
5 minutes, had still not finished after 1.1 hours.



