It happened in the following order:

1. One server node failed. Please refer to "failed_server_node.log" and "thread_dump.txt".
2. All the other server nodes detected this and isolated the failed node successfully. Please refer to "normal_server_node.log".
3. All the client nodes detected this and isolated the failed server node with no other errors, but PutAll hung on all the client nodes. Please refer to "client_node.log".
FYI, this is Ignite 1.6.0, the latest .NET version. The cache is partitioned with 2 backups and uses OFF_HEAP and PRIMARY_SYNC.

BTW:
- A new client node can connect to the cluster and works very well.
- The failed server could not restart automatically. After the network recovered and I restarted it manually, it worked very well.
- Even after the failed node recovered, all the client nodes still hung and could not recover.

The detailed logs:
failed_server_node.log <http://apache-ignite-users.70518.x6.nabble.com/file/n6642/failed_server_node.log>
normal_server_node.log <http://apache-ignite-users.70518.x6.nabble.com/file/n6642/normal_server_node.log>
client_node.log <http://apache-ignite-users.70518.x6.nabble.com/file/n6642/client_node.log>
thread_dump.txt <http://apache-ignite-users.70518.x6.nabble.com/file/n6642/thread_dump.txt>

Thanks,
-Jason
