Hi Ignite team,

In a cluster with 20 server nodes, I manually restarted one server node to test partition rebalancing and reliability. After the restart, the node could not rejoin the topology and kept logging the error below. It kept retrying for a few hours without making any progress.
The log from the other remote server nodes is attached. FYI, we have a large cache with 20 GB of off-heap memory per node. Could you take a look and suggest how to tune this? Any suggestion or advice would be appreciated.

Thanks,
-Jason

[TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=30000]

error_log.txt <http://apache-ignite-users.70518.x6.nabble.com/file/n6987/error_log.txt>
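
For reference, this is roughly how the property named in the message could be raised in Java. It is only a minimal sketch: the class name JoinTimeoutSketch and the 60000 ms value are illustrative assumptions, not a recommended setting, and the IP finder and cache configuration of the real cluster are omitted.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

// Hypothetical class name, used only for this illustration.
public class JoinTimeoutSketch {
    public static void main(String[] args) {
        TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();

        // The log above reports networkTimeout=30000 (milliseconds).
        // 60000 is an arbitrary larger value chosen for illustration only.
        discoverySpi.setNetworkTimeout(60_000L);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discoverySpi);

        // Starts a node with the adjusted discovery settings.
        // A real deployment would also configure the IP finder and caches.
        Ignite ignite = Ignition.start(cfg);
        System.out.println("Nodes in topology: " + ignite.cluster().nodes().size());
    }
}
```

Whether a longer timeout actually resolves the repeated join attempts is unknown; the remote node logs in the attachment may point to a different cause.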
