Thanks, I did a little Googling and found the git repository for pcs. Is there any way to make a two-node cluster work with the stock Debian packages, though? It seems odd that this would be impossible.
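From what I've read so far, the stock Wheezy build (the corosync-plugin one) simply has no quorum API compiled in, and people instead tell Pacemaker itself to ignore quorum loss on two-node clusters. A minimal sketch of what I'm planning to try, assuming the 'crm' shell that ships with the stock pacemaker package:

    # tell Pacemaker not to stop all resources when quorum is lost,
    # which is expected in a two-node cluster when one node dies
    crm configure property no-quorum-policy=ignore
    # without working fencing a two-node cluster risks split-brain;
    # disabling STONITH is for this VirtualBox lab only
    crm configure property stonith-enabled=false
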
On Tue, Oct 1, 2013 at 3:16 PM, Larry Brigman <larry.brig...@gmail.com> wrote:

> pcs is another package you will need to install.
>
> On Oct 1, 2013 9:04 AM, "David Parker" <dpar...@utica.edu> wrote:
>
>> Hello,
>>
>> Sorry for the delay in my reply. I've been doing a lot of
>> experimentation, but so far I've had no luck.
>>
>> Thanks for the suggestion, but it seems I'm not able to use CMAN. I'm
>> running Debian Wheezy with Corosync and Pacemaker installed via
>> apt-get. When I installed CMAN and set up a cluster.conf file,
>> Pacemaker refused to start and said that CMAN was not supported. When
>> CMAN is not installed, Pacemaker starts up fine, but I see these lines
>> in the log:
>>
>> Sep 30 23:36:29 test-vm-1 crmd: [6941]: ERROR: init_quorum_connection: The Corosync quorum API is not supported in this build
>> Sep 30 23:36:29 test-vm-1 pacemakerd: [6932]: ERROR: pcmk_child_exit: Child process crmd exited (pid=6941, rc=100)
>> Sep 30 23:36:29 test-vm-1 pacemakerd: [6932]: WARN: pcmk_child_exit: Pacemaker child process crmd no longer wishes to be respawned. Shutting ourselves down.
>>
>> So, then I checked to see which plugins are supported:
>>
>> # pacemakerd -F
>> Pacemaker 1.1.7 (Build: ee0730e13d124c3d58f00016c3376a1de5323cff)
>>  Supporting:  generated-manpages agent-manpages ncurses heartbeat
>> corosync-plugin snmp libesmtp
>>
>> Am I correct in believing that this Pacemaker package has been
>> compiled without support for any quorum API? If so, does anyone know
>> if there is a Debian package which has the correct support?
>>
>> I also tried compiling libqb, Corosync, and Pacemaker from source via
>> git, following the instructions documented here:
>>
>> http://clusterlabs.org/wiki/SourceInstall
>>
>> I was hopeful that this would work because, as I understand it,
>> Corosync 2.x no longer uses CMAN. Everything compiled and started
>> fine, but the compiled version of Pacemaker did not include either the
>> 'crm' or the 'pcs' command. Do I need to install something else in
>> order to get one of these?
>>
>> Any and all help is greatly appreciated!
>>
>> Thanks,
>> Dave
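This is where I had gotten stuck, by the way: as far as I can tell, neither shell is part of Pacemaker's source tree any more. 'crm' comes from the separate crmsh project, and 'pcs' lives in its own repository, which would explain why my from-source build produced neither. A rough sketch of what I'm trying for pcs, where the repository URL and the Makefile target are just my assumptions from looking around:

    # untested sketch; URL and install target are assumptions on my part
    git clone https://github.com/ClusterLabs/pcs.git
    cd pcs
    make install    # should put the Python-based 'pcs' CLI on the PATH
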
>> On Wed, Sep 25, 2013 at 6:08 AM, David Lang <da...@lang.hm> wrote:
>>
>>> the cluster is trying to reach a quorum (a majority of the nodes
>>> talking to each other), and that is never going to happen with only
>>> one node, so you have to disable this.
>>>
>>> try putting
>>>
>>>   <cman two_node="1" expected_votes="1" transport="udpu"/>
>>>
>>> in your cluster.conf
>>>
>>> David Lang
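For the archives: that two_node hint goes in CMAN's /etc/cluster/cluster.conf. A minimal sketch of the whole file as I understand it (the cluster name is a placeholder of mine; the node names are from my test VMs):

    <?xml version="1.0"?>
    <cluster config_version="1" name="testcluster">
      <!-- two_node/expected_votes let a 2-node cluster keep quorum
           when one node is down -->
      <cman two_node="1" expected_votes="1" transport="udpu"/>
      <clusternodes>
        <clusternode name="test-vm-1" nodeid="1"/>
        <clusternode name="test-vm-2" nodeid="2"/>
      </clusternodes>
    </cluster>

In my case Pacemaker refused to start at all when CMAN was installed, though, so this only helps on a build with CMAN support.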
>>>
>>> On Tue, 24 Sep 2013, David Parker wrote:
>>>
>>>> Date: Tue, 24 Sep 2013 11:48:59 -0400
>>>> From: David Parker <dpar...@utica.edu>
>>>> Reply-To: The Pacemaker cluster resource manager
>>>>     <pacemaker@oss.clusterlabs.org>
>>>> To: The Pacemaker cluster resource manager
>>>>     <pacemaker@oss.clusterlabs.org>
>>>> Subject: Re: [Pacemaker] Corosync won't recover when a node fails
>>>>
>>>> I forgot to mention: the OS is Debian Wheezy 64-bit, Corosync and
>>>> Pacemaker were installed from packages via apt-get, and there are no
>>>> local firewall rules in place:
>>>>
>>>> # iptables -L
>>>> Chain INPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>> Chain FORWARD (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>> Chain OUTPUT (policy ACCEPT)
>>>> target     prot opt source               destination
>>>>
>>>>
>>>> On Tue, Sep 24, 2013 at 11:41 AM, David Parker <dpar...@utica.edu> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have a 2-node cluster using Corosync and Pacemaker, where the
>>>>> nodes are actually two VirtualBox VMs on the same physical machine.
>>>>> I have some resources set up in Pacemaker, and everything works
>>>>> fine if I move them in a controlled way with the "crm_resource -r
>>>>> <resource> --move --node <node>" command.
>>>>>
>>>>> However, when I hard-fail one of the nodes via the "poweroff"
>>>>> command in VirtualBox, which "pulls the plug" on the VM, the
>>>>> resources do not move, and I see the following output in the log on
>>>>> the remaining node:
>>>>>
>>>>> Sep 24 11:20:30 corosync [TOTEM ] The token was lost in the OPERATIONAL state.
>>>>> Sep 24 11:20:30 corosync [TOTEM ] A processor failed, forming new configuration.
>>>>> Sep 24 11:20:30 corosync [TOTEM ] entering GATHER state from 2.
>>>>> Sep 24 11:20:31 test-vm-2 lrmd: [2503]: debug: rsc:drbd_r0:0 monitor[31] (pid 8495)
>>>>> drbd[8495]: 2013/09/24_11:20:31 WARNING: This resource agent is deprecated and may be removed in a future release. See the man page for details. To suppress this warning, set the "ignore_deprecation" resource parameter to true.
>>>>> drbd[8495]: 2013/09/24_11:20:31 WARNING: This resource agent is deprecated and may be removed in a future release. See the man page for details. To suppress this warning, set the "ignore_deprecation" resource parameter to true.
>>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf role r0
>>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Exit code 0
>>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Command output: Secondary/Primary
>>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0
>>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Exit code 0
>>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0: Command output: Connected
>>>>> drbd[8495]: 2013/09/24_11:20:31 DEBUG: r0 status: Secondary/Primary Secondary Primary Connected
>>>>> Sep 24 11:20:31 test-vm-2 lrmd: [2503]: info: operation monitor[31] on drbd_r0:0 for client 2506: pid 8495 exited with return code 0
>>>>> Sep 24 11:20:32 corosync [TOTEM ] entering GATHER state from 0.
>>>>> Sep 24 11:20:34 corosync [TOTEM ] The consensus timeout expired.
>>>>> Sep 24 11:20:34 corosync [TOTEM ] entering GATHER state from 3.
>>>>> Sep 24 11:20:36 corosync [TOTEM ] The consensus timeout expired.
>>>>> Sep 24 11:20:36 corosync [TOTEM ] entering GATHER state from 3.
>>>>> Sep 24 11:20:38 corosync [TOTEM ] The consensus timeout expired.
>>>>> Sep 24 11:20:38 corosync [TOTEM ] entering GATHER state from 3.
>>>>> Sep 24 11:20:40 corosync [TOTEM ] The consensus timeout expired.
>>>>> Sep 24 11:20:40 corosync [TOTEM ] entering GATHER state from 3.
>>>>> Sep 24 11:20:40 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>>>>> Sep 24 11:20:43 corosync [TOTEM ] The consensus timeout expired.
>>>>> Sep 24 11:20:43 corosync [TOTEM ] entering GATHER state from 3.
>>>>> Sep 24 11:20:43 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>>>>> Sep 24 11:20:45 corosync [TOTEM ] The consensus timeout expired.
>>>>> Sep 24 11:20:45 corosync [TOTEM ] entering GATHER state from 3.
>>>>> Sep 24 11:20:45 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>>>>> Sep 24 11:20:47 corosync [TOTEM ] The consensus timeout expired.
>>>>>
>>>>> Those last 3 messages just repeat over and over, the cluster never
>>>>> recovers, and the resources never move. "crm_mon" reports that the
>>>>> resources are still running on the dead node and shows no
>>>>> indication that anything has gone wrong.
>>>>>
>>>>> Does anyone know what the issue could be? My expectation was that
>>>>> the remaining node would become the sole member of the cluster,
>>>>> take over the resources, and everything would keep running.
>>>>>
>>>>> For reference, my corosync.conf file is below:
>>>>>
>>>>> compatibility: whitetank
>>>>>
>>>>> totem {
>>>>>     version: 2
>>>>>     secauth: off
>>>>>     interface {
>>>>>         member {
>>>>>             memberaddr: 192.168.25.201
>>>>>         }
>>>>>         member {
>>>>>             memberaddr: 192.168.25.202
>>>>>         }
>>>>>         ringnumber: 0
>>>>>         bindnetaddr: 192.168.25.0
>>>>>         mcastport: 5405
>>>>>     }
>>>>>     transport: udpu
>>>>> }
>>>>>
>>>>> logging {
>>>>>     fileline: off
>>>>>     to_logfile: yes
>>>>>     to_syslog: yes
>>>>>     debug: on
>>>>>     logfile: /var/log/cluster/corosync.log
>>>>>     timestamp: on
>>>>>     logger_subsys {
>>>>>         subsys: AMF
>>>>>         debug: on
>>>>>     }
>>>>> }
>>>>>
>>>>> Thanks!
>>>>> Dave
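Two things worth checking while that "unable to form a cluster" loop is running, assuming I'm reading the corosync 1.x tools right (the grep pattern is just a guess at the object names):

    corosync-cfgtool -s              # ring status as seen by this node
    corosync-objctl | grep member    # runtime view of the udpu member list
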
--
Dave Parker
Systems Administrator
Utica College
Integrated Information Technology Services
(315) 792-3229
Registered Linux User #408177
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org