The events happened in the following order:

1. One server node failed; please refer to "failed_server_node.log" and
"thread_dump.txt".
2. All the other server nodes detected the failure and isolated the failed
node successfully; please refer to "normal_server_node.log".
3. All the client nodes also detected the failure and isolated the failed
server node with no other errors, but the PutAll calls on all the client
nodes hung; please refer to "client_node.log".

FYI, this is the latest Ignite 1.6.0 .NET version, and the cache is
partitioned with 2 backups and uses OFF_HEAP and PRIMARY_SYNC.

BTW,
 - A new client node can connect to the cluster and works very well.
 - The failed server node couldn't restart automatically; after the network
recovered and I restarted it manually, it worked very well.
 - Even after the failed node recovered, all the client nodes still hung and
couldn't recover.

The detailed logs:
failed_server_node.log
<http://apache-ignite-users.70518.x6.nabble.com/file/n6642/failed_server_node.log>
  
normal_server_node.log
<http://apache-ignite-users.70518.x6.nabble.com/file/n6642/normal_server_node.log>
  
client_node.log
<http://apache-ignite-users.70518.x6.nabble.com/file/n6642/client_node.log>  
thread_dump.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/n6642/thread_dump.txt>  

Thanks,
-Jason



