On Fri, Oct 8, 2010 at 10:05 PM, Pavlos Parissis <pavlos.paris...@gmail.com> wrote:
> On 8 October 2010 09:29, Andrew Beekhof <and...@beekhof.net> wrote:
>> On Fri, Oct 8, 2010 at 8:34 AM, Pavlos Parissis
>> <pavlos.paris...@gmail.com> wrote:
>>> On 8 October 2010 08:29, Andrew Beekhof <and...@beekhof.net> wrote:
>>>> On Thu, Oct 7, 2010 at 9:58 PM, Pavlos Parissis
>>>> <pavlos.paris...@gmail.com> wrote:
>>>>> On 7 October 2010 09:01, Andrew Beekhof <and...@beekhof.net> wrote:
>>>>>> On Sat, Oct 2, 2010 at 6:31 PM, Pavlos Parissis
>>>>>> <pavlos.paris...@gmail.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am having the same issue again, on a different set of 3 nodes. When
>>>>>>> I try to fail over the resource group manually to the standby node,
>>>>>>> the ms-drbd resource is not moved as well, and as a result the
>>>>>>> resource group is not fully started; only the ip resource is started.
>>>>>>> Any ideas why I am having this issue?
>>>>>>
>>>>>> I think it's a bug that was fixed recently. Could you try the latest
>>>>>> code from Mercurial?
>>>>>
>>>>> 1.1 or 1.2 branch?
>>>>
>>>> 1.1
>>>>
>>> To save time on compiling, I want to use the available 1.1.3 rpms from
>>> the rpm-next repo.
>>> But before I go and recreate the scenario, which means rebuilding 3
>>> nodes, I would like to know if this bug is fixed in 1.1.3.
>>
>> As I said, I believe so.
>>
> I've just upgraded[1] my pacemaker to 1.1.3 and stonithd cannot be
> started; am I missing something?
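(A side note on the original failover question, for reference: a manual `crm resource move pbx_service_01 node-03` does nothing more than insert a location constraint pinning the group to the target node, roughly like the sketch below. The constraint ids shown are the ones crm typically generates, but they may differ by version:)

```
location cli-prefer-pbx_service_01 pbx_service_01 \
        rule $id="cli-prefer-rule-pbx_service_01" inf: #uname eq node-03
```

The DRBD master only follows indirectly, via the fs_01-on-drbd_01 colocation with the Master role, which is presumably why, when the master fails to move, only ip_01 (ordered before fs_01 in the group) manages to start.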
Heartbeat based clusters need the following added to ha.cf:

  apiauth stonith-ng uid=root

>
> Oct 08 21:08:01 node-02 heartbeat: [14192]: info: Starting
> "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 14192)
> Oct 08 21:08:01 node-02 heartbeat: [14193]: info: Starting
> "/usr/lib/heartbeat/attrd" as uid 101 gid 103 (pid 14193)
> Oct 08 21:08:01 node-02 heartbeat: [14194]: info: Starting
> "/usr/lib/heartbeat/crmd" as uid 101 gid 103 (pid 14194)
> Oct 08 21:08:01 node-02 ccm: [14189]: info: Hostname: node-02
> Oct 08 21:08:01 node-02 cib: [14190]: WARN: ccm_connect: CCM Activation failed
> Oct 08 21:08:01 node-02 cib: [14190]: WARN: ccm_connect: CCM
> Connection failed 1 times (30 max)
> Oct 08 21:08:01 node-02 attrd: [14193]: info: Invoked: /usr/lib/heartbeat/attrd
> Oct 08 21:08:01 node-02 stonith-ng: [14192]: info: Invoked:
> /usr/lib/heartbeat/stonithd
> Oct 08 21:08:01 node-02 stonith-ng: [14192]: info:
> G_main_add_SignalHandler: Added signal handler for signal 17
> Oct 08 21:08:01 node-02 heartbeat: [14158]: WARN: Client [stonith-ng]
> pid 14192 failed authorization [no default client auth]
> Oct 08 21:08:01 node-02 heartbeat: [14158]: ERROR:
> api_process_registration_msg: cannot add client(stonith-ng)
> Oct 08 21:08:01 node-02 stonith-ng: [14192]: ERROR:
> register_heartbeat_conn: Cannot sign on with heartbeat:
> Oct 08 21:08:01 node-02 stonith-ng: [14192]: CRIT: main: Cannot sign
> in to the cluster... terminating
> Oct 08 21:08:01 node-02 heartbeat: [14158]: WARN: Managed
> /usr/lib/heartbeat/stonithd process 14192 exited with return code 100.
> Oct 08 21:08:01 node-02 crmd: [14194]: info: Invoked: /usr/lib/heartbeat/crmd
> Oct 08 21:08:01 node-02 crmd: [14194]: info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> Oct 08 21:08:02 node-02 crmd: [14194]: WARN: do_cib_control: Couldn't
> complete CIB registration 1 times... pause and retry
> Oct 08 21:08:04 node-02 cib: [14190]: WARN: ccm_connect: CCM Activation failed
> Oct 08 21:08:04 node-02 cib: [14190]: WARN: ccm_connect: CCM
> Connection failed 2 times (30 max)
> Oct 08 21:08:05 node-02 crmd: [14194]: WARN: do_cib_control: Couldn't
> complete CIB registration 2 times... pause and retry
> [..snip...]
> Oct 08 21:08:33 node-02 crmd: [14194]: ERROR: te_connect_stonith:
> Sign-in failed: triggered a retry
>
> [1] I use CentOS 5.4, and when I did the installation I used the
> following repository:
>
> [r...@node-02 ~]# cat /etc/yum.repos.d/pacemaker.repo
> [clusterlabs]
> name=High Availability/Clustering server technologies (epel-5)
> baseurl=http://www.clusterlabs.org/rpm/epel-5
> type=rpm-md
> gpgcheck=0
> enabled=1
>
> and in order to perform the upgrade I added the following repo:
>
> [clusterlabs-next]
> name=High Availability/Clustering server technologies (epel-5-next)
> baseurl=http://www.clusterlabs.org/rpm-next/epel-5
> metadata_expire=45m
> type=rpm-md
> gpgcheck=0
> enabled=1
>
> Here is the installation/upgrade log, where you can see that only
> pacemaker-libs and pacemaker were upgraded.
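To spell out the ha.cf change above as a runnable sketch: the stock location of the file is /etc/ha.d/ha.cf, but this demo works on a scratch copy (with made-up node names) so it is safe to try anywhere.

```shell
# Sketch only: add the apiauth directive that stonith-ng needs under
# Heartbeat. On a real cluster, point HACF at /etc/ha.d/ha.cf instead.
HACF=$(mktemp)
printf 'autojoin none\nnode node-01 node-02 node-03\n' > "$HACF"

# Append the directive only if it is not already there (idempotent).
grep -q '^apiauth stonith-ng' "$HACF" || \
    echo 'apiauth stonith-ng uid=root' >> "$HACF"

grep '^apiauth' "$HACF"
# -> apiauth stonith-ng uid=root
```

On the real cluster, heartbeat must then be restarted on every node for the directive to take effect.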
> Oct 03 21:06:20 Installed: libibverbs-1.1.3-2.el5.i386
> Oct 03 21:06:25 Installed: lm_sensors-2.10.7-9.el5.i386
> Oct 03 21:06:31 Installed: 1:net-snmp-5.3.2.2-9.el5_5.1.i386
> Oct 03 21:06:31 Installed: librdmacm-1.0.10-1.el5.i386
> Oct 03 21:06:32 Installed: openhpi-libs-2.14.0-5.el5.i386
> Oct 03 21:06:33 Installed: OpenIPMI-libs-2.0.16-7.el5.i386
> Oct 03 21:06:35 Installed: libesmtp-1.0.4-5.el5.i386
> Oct 03 21:06:36 Installed: cluster-glue-libs-1.0.6-1.6.el5.i386
> Oct 03 21:06:37 Installed: heartbeat-libs-3.0.3-2.3.el5.i386
> Oct 03 21:06:39 Installed: corosynclib-1.2.7-1.1.el5.i386
> Oct 03 21:06:42 Installed: cluster-glue-1.0.6-1.6.el5.i386
> Oct 03 21:06:45 Installed: resource-agents-1.0.3-2.6.el5.i386
> Oct 03 21:06:46 Installed: heartbeat-3.0.3-2.3.el5.i386
> Oct 03 21:06:47 Installed: pacemaker-libs-1.0.9.1-1.15.el5.i386
> Oct 03 21:06:49 Installed: pacemaker-1.0.9.1-1.15.el5.i386
> Oct 03 21:06:50 Installed: corosync-1.2.7-1.1.el5.i386
> Oct 08 21:06:37 Updated: pacemaker-libs-1.1.3-1.el5.i386
> Oct 08 21:06:43 Updated: pacemaker-1.1.3-1.el5.i386
>
> and my conf:
>
> [r...@node-02 log]# cibadmin -Ql | grep vali
> <cib validate-with="pacemaker-1.0" crm_feature_set="3.0.2"
> have-quorum="1" dc-uuid="b7764e7b-0a00-4745-8d9e-6911271eefb2"
> admin_epoch="0" epoch="319" num_updates="60">
> [r...@node-02 log]# crm configure show
> node $id="80275014-5efe-4825-a29c-d42610f08cd1" node-02
> node $id="b7764e7b-0a00-4745-8d9e-6911271eefb2" node-03
> node $id="c7459ab3-55b6-4155-946d-5c1ba783507f" node-01
> primitive drbd_01 ocf:linbit:drbd \
>         params drbd_resource="drbd_pbx_service_1" \
>         op monitor interval="30s" \
>         op start interval="0" timeout="240s" \
>         op stop interval="0" timeout="120s"
> primitive drbd_02 ocf:linbit:drbd \
>         params drbd_resource="drbd_pbx_service_2" \
>         op monitor interval="30s" \
>         op start interval="0" timeout="240s" \
>         op stop interval="0" timeout="120s"
> primitive fs_01 ocf:heartbeat:Filesystem \
>         params device="/dev/drbd1" directory="/pbx_service_01" fstype="ext3" \
>         meta migration-threshold="3" failure-timeout="60" \
>         op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s"
> primitive fs_02 ocf:heartbeat:Filesystem \
>         params device="/dev/drbd2" directory="/pbx_service_02" fstype="ext3" \
>         meta migration-threshold="3" failure-timeout="60" \
>         op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s"
> primitive ip_01 ocf:heartbeat:IPaddr2 \
>         params ip="192.168.78.10" cidr_netmask="24" broadcast="192.168.78.255" \
>         meta failure-timeout="120" migration-threshold="3" \
>         op monitor interval="5s"
> primitive ip_02 ocf:heartbeat:IPaddr2 \
>         params ip="192.168.78.20" cidr_netmask="24" broadcast="192.168.78.255" \
>         meta failure-timeout="120" migration-threshold="3" \
>         op monitor interval="5s"
> primitive pbx_01 lsb:znd-pbx_01 \
>         meta failure-timeout="120" migration-threshold="3" target-role="Started" \
>         op monitor interval="20s" timeout="40s" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s"
> primitive pbx_02 ocf:heartbeat:Dummy \
>         params state="/pbx_service_02/Dummy.state" \
>         meta failure-timeout="120" migration-threshold="3" \
>         op monitor interval="20s" timeout="40s"
> primitive sshd-pbx_01 lsb:sshd-pbx_01 \
>         meta target-role="Started" \
>         op monitor interval="10m" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s"
> primitive sshd-pbx_02 lsb:sshd-pbx_02 \
>         meta target-role="Started" \
>         op monitor interval="10m" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s"
> primitive stonith-meatware stonith:meatware \
>         params hostlist="node-01 node-02 node-03" stonith-timeout="60" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s"
> group pbx_service_01 ip_01 fs_01 pbx_01 sshd-pbx_01 \
>         meta target-role="Started"
> group pbx_service_02 ip_02 fs_02 pbx_02 sshd-pbx_02 \
>         meta target-role="Started"
> ms ms-drbd_01 drbd_01 \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true" target-role="Started"
> ms ms-drbd_02 drbd_02 \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true" target-role="Started"
> clone stonith-clone stonith-meatware \
>         meta clone-max="3" clone-node-max="1" target-role="Started" \
>         globally_unique="false"
> location PrimaryNode-drbd_01 ms-drbd_01 100: node-01
> location PrimaryNode-drbd_02 ms-drbd_02 100: node-02
> location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
> location PrimaryNode-pbx_service_02 pbx_service_02 200: node-02
> location SecondaryNode-drbd_01 ms-drbd_01 0: node-03
> location SecondaryNode-drbd_02 ms-drbd_02 0: node-03
> location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
> location SecondaryNode-pbx_service_02 pbx_service_02 10: node-03
> location stonith-node-01 stonith-clone 100: node-01
> location stonith-node-02 stonith-clone 100: node-02
> location stonith-node-03 stonith-clone 100: node-03
> colocation fs_01-on-drbd_01 inf: fs_01 ms-drbd_01:Master
> colocation fs_02-on-drbd_02 inf: fs_02 ms-drbd_02:Master
> order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start
> order pbx_service_02-after-drbd_02 inf: ms-drbd_02:promote pbx_service_02:start
> property $id="cib-bootstrap-options" \
>         stonith-enabled="true" \
>         symmetric-cluster="false" \
>         dc-version="1.1.3-9c2342c0378140df9bed7d192f2b9ed157908007" \
>         cluster-infrastructure="Heartbeat" \
>         last-lrm-refresh="1286195722"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="1000"
> [r...@node-02 log]#
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker