Hi,

We have a two node cluster setup, running the following versions:

cluster-glue-1.0.6
resource-agents-1.0.4
pacemaker-1.1.8
corosync-1.4.1
libaio-devel-0.3.106
libibverbs-1.1.3
libqb-0.14.2
librdmacm-1.0.10
libtool-ltdl-1.5.22
pacemaker-cli-1.1.8
pacemaker-cluster-libs-1.1.8
pacemaker-libs-1.1.8
crm-1.2

crm_mon failed on Node-1 with the following error:

    establish cib_ro connection: Resource temporarily unavailable (11)

crm resource cleanup <rsc> failed on Node-1 with the following error:

    Could not establish cib_rw connection: Resource temporarily unavailable (11)
    Error signing on to the CIB service: Transport endpoint is not connected

At that point all the resources were running on both nodes as configured, and all the pacemaker and corosync processes were running.

After some time, Node-1 appeared offline:

    Last updated: Wed Jan 2 11:36:11 2013
    Last change: Wed Jan 2 11:31:43 2013 via crmd on CSS-FU-2
    Stack: openais
    Current DC: CSS-FU-2 - partition with quorum
    Version: 1.1.8-2.el5-394e906
    2 Nodes configured, 2 expected votes
    19 Resources configured.

    Online: [ CSS-FU-2 ]
    OFFLINE: [ CSS-FU-1 ]

Next, stopping the pacemaker service on Node-1 also failed, although it succeeded on Node-2. We had to kill the pacemaker service on Node-1 to bring everything back in sync.

I have collated some of the error and warning logs from that period. They can be found at:
http://dl.dropbox.com/u/20096935/Pacemaker_Stop_Failure/pacemaker1.1_stop_failure.txt

Immediate help required.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org