Hello,

On 02/09/2012 03:29 PM, Karl Rößmann wrote:
> Hi all,
>
> we run a three Node HA Cluster using cLVM and Xen.
>
> After installing some online updates node by node
Were the updates installed while the cluster was in maintenance-mode, or with
the cluster stack shut down on the node that received the updates?

> the cLVM is stuck, and the last (updated) node does not want to join
> the cLVM.
> Two nodes are still running.
> The Xen VMs are running (they have their disks on cLVs),
> but commands like 'lvdisplay' do not work.
>
> Is there a way to recover the cLVM without restarting the whole cluster?

Are there any log entries about the controld on orion1? What is the output
of "crm_mon -1fr"? It looks like there is a problem with starting
dlm_controld.pcmk, and I wonder why orion1 was not fenced on the stop
errors, or did that happen? Did you inspect the output of "dlm_tool ls" and
"dlm_tool dump" on all nodes where the controld is running? (Some example
commands are sketched below your quoted output.)

Your crm_mon output shows orion1 offline; that does not seem to match the
timestamps in your logs?

Regards,
Andreas

--
Need help with Pacemaker? http://www.hastexo.com/now

>
> we have the latest
> SUSE SLES SP1 and HA-Extension including
> pacemaker-1.1.5-5.9.11.1
> corosync-1.3.3-0.3.1
>
> Some ERROR messages:
> Feb 9 12:06:42 orion1 crmd: [6462]: ERROR: process_lrm_event: LRM
> operation clvm:0_start_0 (15) Timed Out (timeout=240000ms)
> Feb 9 12:13:41 orion1 crmd: [6462]: ERROR: process_lrm_event: LRM
> operation cluvg1:2_start_0 (19) Timed Out (timeout=240000ms)
> Feb 9 12:16:21 orion1 crmd: [6462]: ERROR: process_lrm_event: LRM
> operation cluvg1:2_stop_0 (20) Timed Out (timeout=100000ms)
> Feb 9 13:39:10 orion1 crmd: [14350]: ERROR: process_lrm_event: LRM
> operation clvm:0_start_0 (15) Timed Out (timeout=240000ms)
> Feb 9 13:53:38 orion1 crmd: [14350]: ERROR: process_lrm_event: LRM
> operation cluvg1:2_start_0 (19) Timed Out (timeout=240000ms)
> Feb 9 13:56:18 orion1 crmd: [14350]: ERROR: process_lrm_event: LRM
> operation cluvg1:2_stop_0 (20) Timed Out (timeout=100000ms)
>
>
>
> Feb 9 12:11:55 orion2 crm_resource: [13025]: ERROR:
> resource_ipc_timeout: No messages received in 60 seconds
> Feb 9 12:13:41 orion2 crmd: [5882]: ERROR: send_msg_via_ipc: Unknown
> Sub-system (13025_crm_resource)... discarding message.
> Feb 9 12:14:41 orion2 crmd: [5882]: ERROR: print_elem: Aborting
> transition, action lost: [Action 35]: In-flight (id: cluvg1:2_start_0,
> loc: orion1, priority: 0)
> Feb 9 13:54:38 orion2 crmd: [5882]: ERROR: print_elem: Aborting
> transition, action lost: [Action 35]: In-flight (id: cluvg1:2_start_0,
> loc: orion1, priority: 0)
>
>
>
> Some additional information:
>
> crm_mon -1:
> ============
> Last updated: Thu Feb 9 15:10:34 2012
> Stack: openais
> Current DC: orion2 - partition with quorum
> Version: 1.1.5-5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60
> 3 Nodes configured, 3 expected votes
> 17 Resources configured.
> ============
>
> Online: [ orion2 orion7 ]
> OFFLINE: [ orion1 ]
>
>  Clone Set: dlm_clone [dlm]
>      Started: [ orion2 orion7 ]
>      Stopped: [ dlm:0 ]
>  Clone Set: clvm_clone [clvm]
>      Started: [ orion2 orion7 ]
>      Stopped: [ clvm:0 ]
>  sbd_stonith (stonith:external/sbd): Started orion2
>  Clone Set: cluvg1_clone [cluvg1]
>      Started: [ orion2 orion7 ]
>      Stopped: [ cluvg1:2 ]
>  styx (ocf::heartbeat:Xen): Started orion7
>  shib (ocf::heartbeat:Xen): Started orion7
>  wiki (ocf::heartbeat:Xen): Started orion2
>  horde (ocf::heartbeat:Xen): Started orion7
>  www (ocf::heartbeat:Xen): Started orion7
>  enventory (ocf::heartbeat:Xen): Started orion2
>  mailrelay (ocf::heartbeat:Xen): Started orion2
>
>
> crm configure show
> node orion1 \
>         attributes standby="off"
> node orion2 \
>         attributes standby="off"
> node orion7 \
>         attributes standby="off"
> primitive cluvg1 ocf:heartbeat:LVM \
>         params volgrpname="cluvg1" \
>         op start interval="0" timeout="240s" \
>         op stop interval="0" timeout="100s" \
>         meta target-role="Started"
> primitive clvm ocf:lvm2:clvmd \
>         params daemon_timeout="30" \
>         op start interval="0" timeout="240s" \
>         op stop interval="0" timeout="100s" \
>         meta target-role="Started"
> primitive dlm ocf:pacemaker:controld \
>         op monitor interval="120s" \
>         op start interval="0" timeout="240s" \
>         op stop interval="0" timeout="100s" \
>         meta target-role="Started"
> primitive enventory ocf:heartbeat:Xen \
>         meta target-role="Started" allow-migrate="true" \
>         operations $id="enventory-operations" \
>         op monitor interval="10" timeout="30" \
>         op migrate_from interval="0" timeout="600" \
>         op migrate_to interval="0" timeout="600" \
>         params xmfile="/etc/xen/vm/enventory" shutdown_timeout="60"
> primitive horde ocf:heartbeat:Xen \
>         meta target-role="Started" is-managed="true" allow-migrate="true" \
>         operations $id="horde-operations" \
>         op monitor interval="10" timeout="30" \
>         op migrate_from interval="0" timeout="600" \
>         op migrate_to interval="0" timeout="600" \
>         params xmfile="/etc/xen/vm/horde" shutdown_timeout="120"
> primitive sbd_stonith stonith:external/sbd \
>         params sbd_device="/dev/disk/by-id/scsi-360080e50001c150e0000019e4df6d4d5-part1" \
>         meta target-role="started"
> ...
> ...
> ...
> clone cluvg1_clone cluvg1 \
>         meta interleave="true" target-role="started" is-managed="true"
> clone clvm_clone clvm \
>         meta globally-unique="false" interleave="true" target-role="started"
> clone dlm_clone dlm \
>         meta globally-unique="false" interleave="true" target-role="started"
> colocation cluvg1_with_clvm inf: cluvg1_clone clvm_clone
> colocation clvm_with_dlm inf: clvm_clone dlm_clone
> colocation enventory_with_cluvg1 inf: enventory cluvg1_clone
> colocation horde_with_cluvg1 inf: horde cluvg1_clone
> ...
> ... more Xen VMs
> ...
> order cluvg1_before_enventory inf: cluvg1_clone enventory
> order cluvg1_before_horde inf: cluvg1_clone horde
> order clvm_before_cluvg1 inf: clvm_clone cluvg1_clone
> order dlm_before_clvm inf: dlm_clone clvm_clone
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.5-5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="3" \
>         stonith-timeout="420s" \
>         last-lrm-refresh="1328792018"
> rsc_defaults $id="rsc_defaults-options" \
>         resource-stickiness="10"
> op_defaults $id="op_defaults-options" \
>         record-pending="false"
>
>
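P.S. For reference, this is roughly what I mean, only a rough sketch (the log
path assumes the default SLES syslog setup, and the sbd device is the one
from your sbd_stonith primitive):

    # cluster state including fail counts and inactive resources
    crm_mon -1fr

    # on every node where the controld is running: lockspaces and debug log
    dlm_tool ls
    dlm_tool dump

    # on orion1: are dlm_controld.pcmk / clvmd running at all, or hung?
    ps axf | egrep 'dlm_controld|clvmd'
    grep -E 'dlm_controld|clvmd|stonith|sbd' /var/log/messages | tail -n 200

    # did a fencing message for orion1 ever reach the sbd device?
    sbd -d /dev/disk/by-id/scsi-360080e50001c150e0000019e4df6d4d5-part1 list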
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
