[ClusterLabs] corosync - CS_ERR_BAD_HANDLE when multiple nodes are starting up

Thomas Lamprecht Wed, 30 Sep 2015 06:20:04 -0700

Hello,

We are using corosync version Needle (2.3.5) for our cluster filesystem (pmxcfs). The situation is the following: first we start pmxcfs, which is a FUSE filesystem, and if a cluster configuration exists, we also start corosync. This allows the filesystem to run on a single-node 'cluster' or to be forced into local mode. We use CPG to send our messages to all members; the filesystem lives in RAM and all filesystem operations are sent 'over the wire'.


The problem is the following:

When we restart all nodes (three in my test case) at the same time, in about 1 out of 10 cases one node, and only one, gets CS_ERR_BAD_HANDLE back when calling cpg_mcast_joined to send out data. corosync-quorumtool shows that we have quorum, and the logs also show a healthy connection to the corosync cluster.

The failing handle is initialized once, when our filesystem initializes. Should it be reinitialized on every reconnect? Restarting the filesystem solves the problem. The strange thing is that it isn't reliably reproducible and often works just fine.
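Whether the handle has to be re-created is exactly the open question here. As a sketch only, suppose the daemon treats CS_ERR_BAD_HANDLE as "this connection is gone". A send wrapper could then drop the handle and re-run the initialize/join sequence once before giving up, while retrying a transient "try again" result a bounded number of times. The result codes, the `cpg_ops_t` hooks, and the functions `next_action` and `send_with_reconnect` are all hypothetical. In real code, the hooks would wrap libcpg's `cpg_initialize()` + `cpg_join()`, `cpg_finalize()` and `cpg_mcast_joined()`, and the codes would map to corosync's `cs_error_t` values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. In real code they would map to
 * corosync's cs_error_t values: CS_OK, CS_ERR_TRY_AGAIN, CS_ERR_BAD_HANDLE. */
typedef enum { SEND_OK, SEND_TRY_AGAIN, SEND_BAD_HANDLE, SEND_OTHER_ERROR } send_result_t;

typedef enum { ACTION_DONE, ACTION_RETRY, ACTION_RECONNECT, ACTION_FAIL } next_action_t;

/* Hypothetical connection hooks. In a pmxcfs-like daemon, connect() might
 * wrap cpg_initialize() + cpg_join(), disconnect() might wrap cpg_finalize(),
 * and send() might wrap cpg_mcast_joined(). */
typedef struct {
    int (*connect)(void *ctx);      /* returns 0 on success */
    void (*disconnect)(void *ctx);
    send_result_t (*send)(void *ctx, const void *buf, size_t len);
    void *ctx;
} cpg_ops_t;

/* Decide what to do after one send attempt: a bad handle gets exactly one
 * reconnect, TRY_AGAIN is retried up to max_try_again times. */
next_action_t next_action(send_result_t r, int try_again_count,
                          int max_try_again, int already_reconnected)
{
    switch (r) {
    case SEND_OK:
        return ACTION_DONE;
    case SEND_TRY_AGAIN:
        return try_again_count < max_try_again ? ACTION_RETRY : ACTION_FAIL;
    case SEND_BAD_HANDLE:
        return already_reconnected ? ACTION_FAIL : ACTION_RECONNECT;
    default:
        return ACTION_FAIL;
    }
}

/* Send one message, re-creating the connection once if the handle went bad. */
send_result_t send_with_reconnect(const cpg_ops_t *ops, const void *buf,
                                  size_t len, int max_try_again)
{
    assert(ops != NULL && ops->send != NULL);
    int try_again_count = 0;
    int reconnected = 0;

    for (;;) {
        send_result_t r = ops->send(ops->ctx, buf, len);
        switch (next_action(r, try_again_count, max_try_again, reconnected)) {
        case ACTION_DONE:
            return r;
        case ACTION_RETRY:
            try_again_count++;          /* real code would back off briefly here */
            break;
        case ACTION_RECONNECT:
            ops->disconnect(ops->ctx);  /* release the stale handle */
            if (ops->connect(ops->ctx) != 0)
                return r;               /* could not re-join the group */
            reconnected = 1;
            break;
        case ACTION_FAIL:
            return r;
        }
    }
}
```

Keeping the decision logic in a separate pure function makes the policy easy to test without a running corosync. Limiting the reconnect to one attempt per message avoids spinning if the handle goes bad again immediately.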


Are there any known problems, or steps we should check?


_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
