Thanks Andrew. Yes, cibadmin -Ql works, but cibadmin -Q does not.
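A minimal shell sketch of the comparison Andrew asked about. As I understand it, a plain `cibadmin -Q` has to be answered through the cluster, while `-l` (`--local`) only reads this node's own copy of the CIB. So "-Ql works but -Q fails" points at the cluster side rather than at the local CIB. The function name `check_cib` is hypothetical, and the sketch assumes `cibadmin` is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): run a cluster-wide CIB query
# and a local-only CIB query and report which of the two succeeds.
# Assumes the cibadmin binary is on the PATH.
check_cib() {
    # Without -l the query is not restricted to this node's CIB copy
    if cibadmin -Q >/dev/null 2>&1; then
        echo "cluster CIB query: ok"
    else
        echo "cluster CIB query: failed"
    fi
    # -l (--local) reads only the CIB copy held on this node
    if cibadmin -Ql >/dev/null 2>&1; then
        echo "local CIB query: ok"
    else
        echo "local CIB query: failed"
    fi
}

check_cib
```

On a node in the state described in this thread, you would expect it to report the cluster query as failed and the local query as ok.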
What is a DC? And here are the logs:

Feb 10 08:57:30 arsvr1 cibadmin: [4264]: info: Invoked: cibadmin -Ql
Feb 10 08:57:32 arsvr1 cibadmin: [4265]: info: Invoked: cibadmin -Q
Feb 10 08:58:04 arsvr1 crmd: [960]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Feb 10 08:58:04 arsvr1 crmd: [960]: info: do_dc_release: DC role released
Feb 10 08:58:04 arsvr1 crmd: [960]: info: do_te_control: Transitioner is now inactive
Feb 10 08:58:08 arsvr1 crmd: [960]: info: update_dc: Set DC to arsvr2 (3.0.1)
Feb 10 08:58:10 arsvr1 attrd: [959]: info: attrd_local_callback: Sending full refresh (origin=crmd)
Feb 10 08:58:10 arsvr1 crmd: [960]: info: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Feb 10 08:58:10 arsvr1 attrd: [959]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Feb 10 08:58:10 arsvr1 attrd: [959]: info: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_mysql:0 (<null>)
Feb 10 08:58:10 arsvr1 attrd: [959]: info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Feb 10 08:58:10 arsvr1 attrd: [959]: info: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_webfs:0 (<null>)
Feb 10 08:58:10 arsvr1 attrd: [959]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (<null>)
Feb 10 08:58:10 arsvr1 attrd: [959]: info: attrd_ha_callback: flush message from arsvr1
Feb 10 08:58:12 arsvr1 attrd: last message repeated 4 times
Feb 10 08:58:12 arsvr1 attrd: [959]: info: attrd_ha_callback: flush message from arsvr2
Feb 10 08:58:12 arsvr1 attrd: [959]: info: attrd_ha_callback: flush message from arsvr2
Feb 10 08:58:12 arsvr1 crmd: [960]: notice: crmd_client_status_callback: Status update: Client arsvr2/crmd now has status [offline] (DC=false)
Feb 10 08:58:12 arsvr1 attrd: [959]: info: attrd_ha_callback: flush message from arsvr2
Feb 10 08:58:12 arsvr1 crmd: [960]: info: crm_update_peer_proc: arsvr2.crmd is now offline
Feb 10 08:58:12 arsvr1 attrd: [959]: info: attrd_ha_callback: flush message from arsvr2
Feb 10 08:58:12 arsvr1 crmd: [960]: info: crmd_client_status_callback: Got client status callback - our DC is dead
Feb 10 08:58:12 arsvr1 crmd: [960]: notice: crmd_client_status_callback: Status update: Client arsvr2/crmd now has status [online] (DC=false)
Feb 10 08:58:12 arsvr1 crmd: [960]: info: crm_update_peer_proc: arsvr2.crmd is now online
Feb 10 08:58:12 arsvr1 crmd: [960]: info: crmd_client_status_callback: Not the DC
Feb 10 08:58:12 arsvr1 crmd: [960]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK origin=crmd_client_status_callback ]
Feb 10 08:58:12 arsvr1 crmd: [960]: info: update_dc: Unset DC arsvr2
Feb 10 08:58:12 arsvr1 attrd: [959]: info: attrd_ha_callback: flush message from arsvr2
Feb 10 08:58:14 arsvr1 heartbeat: [898]: WARN: 1 lost packet(s) for [arsvr2] [131787:131789]
Feb 10 08:58:14 arsvr1 heartbeat: [898]: info: No pkts missing from arsvr2!

Liang Ma
Consultant | SED Systems Inc.
Ground Systems Analyst
Canadian Space Agency
6767, Route de l'Aéroport, Longueuil (St-Hubert), QC, Canada, J3Y 8Y9
Tel: (450) 926-5099 | Fax: (450) 926-5083
E-mail: liang...@space.gc.ca
Web site: www.space.gc.ca

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: February 10, 2011 2:39 AM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Could not connect to the CIB: Remote node did not respond

On Wed, Feb 9, 2011 at 3:59 PM, <liang...@asc-csa.gc.ca> wrote:
> Hi There,
>
> After a network and power shutdown, my LAMP cluster servers were totally
> screwed up.
>
> Now crm status gives me:
>
> crm status
> ============
> Last updated: Wed Feb 9 09:44:17 2011
> Stack: Heartbeat
> Current DC: arsvr2 (bc6bf61d-6b5f-4307-85f3-bf7bb11531bb) - partition with quorum
> Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
> 2 Nodes configured, 1 expected votes
> 4 Resources configured.
> ============
>
> Online: [ arsvr1 arsvr2 ]
>
> None of the resources comes up.
>
> First I found a split brain on the DRBD disks. I fixed that, and the DRBD
> disks are healthy. I can mount them manually without problems.
>
> However, whenever I try to bring up a resource, edit the CIB or even run a
> query, I get errors like the following:
>
> crm resource start fs_mysql
> Call cib_replace failed (-41): Remote node did not respond <null>
>
> crm configure edit
> Could not connect to the CIB: Remote node did not respond
> ERROR: creating tmp shadow __crmshell.2540 failed
>
> cibadmin -Q
> Call cib_query failed (-41): Remote node did not respond <null>
>
> Any idea what I can do to bring the cluster back?

Seems like you don't have a DC.
Hard to say why without logs.

Does cibadmin -Ql work?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker