Re: [Pacemaker] crm resource move doesn't move the resource
On 11 October 2010 11:16, Pavlos Parissis wrote:
> [...snip...]
> I recreated the 3 node cluster and I didn't face that issue, but I am
> going to keep an eye on it for a few days and even rerun the whole
> scenario (recreate the 3 node cluster ...) just to be very sure. If I
> don't see it again I will also close the bug report.

I recreated the 3-node cluster using version 1.1.3 just to see if the issue is solved, but it appeared again. So, Andrew, the issue is not solved in 1.1.3. I am going to update the bug report accordingly.

Cheers,
Pavlos

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] crm resource move doesn't move the resource
On 8 October 2010 09:29, Andrew Beekhof wrote:
> [...snip...]
>> But before I go and recreate the scenario, which means rebuild 3
>> nodes, I would like to know if this bug is fixed in 1.1.3
>
> As I said, I believe so.

I recreated the 3 node cluster and I didn't face that issue, but I am going to keep an eye on it for a few days and even rerun the whole scenario (recreate the 3 node cluster ...) just to be very sure. If I don't see it again I will also close the bug report.

Thanks,
Pavlos
Re: [Pacemaker] crm resource move doesn't move the resource
On Fri, Oct 8, 2010 at 10:05 PM, Pavlos Parissis wrote:
> [...snip...]
> I've just upgraded[1] my pacemaker to 1.1.3 and stonithd can not be
> started, am I missing something?
Heartbeat-based clusters need the following added to ha.cf:

apiauth stonith-ng uid=root

> Oct 08 21:08:01 node-02 heartbeat: [14158]: WARN: Client [stonith-ng]
> pid 14192 failed authorization [no default client auth]
> Oct 08 21:08:01 node-02 heartbeat: [14158]: ERROR:
> api_process_registration_msg: cannot add client(stonith-ng)
> Oct 08 21:08:01 node-02 stonith-ng: [14192]: ERROR:
> register_heartbeat_conn: Cannot sign on with heartbeat:
> Oct 08 21:08:01 node-02 stonith-ng: [14192]: CRIT: main: Cannot sign
> in to the cluster... terminating
> [..snip...]
Re: [Pacemaker] crm resource move doesn't move the resource
On 8 October 2010 22:05, Pavlos Parissis wrote:
> [...snip...]
> I've just upgraded[1] my pacemaker to 1.1.3 and stonithd can not be
> started, am I missing something?
> [..snip...]
> Oct 08 21:08:33 node-02 crmd: [14194]: ERROR: te_connect_stonith:
> Sign-in failed: triggered a retry

Solved by adding

apiauth stonith-ng uid=root

to ha.cf. It was mentioned here:
http://www.gossamer-threads.com/lists/linuxha/users/67189#67189
and a patch exists which will make heartbeat not require the apiauth entry:
http://hg.linux-ha.org/dev/rev/9624b66a6b82
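For reference, the fix lands in heartbeat's ha.cf alongside any existing apiauth directives; a minimal fragment might look like the following (uid=root matches the logs above, where stonithd is started as uid 0 — adjust if your packages run it differently):

```
# /etc/ha.d/ha.cf (fragment)
# Allow the stonith-ng client to register with heartbeat; without this
# entry heartbeat rejects it with
# "failed authorization [no default client auth]".
apiauth stonith-ng uid=root
```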
Re: [Pacemaker] crm resource move doesn't move the resource
On 8 October 2010 09:29, Andrew Beekhof wrote:
> [...snip...]
>> to save time on compiling stuff I want to use the available rpms on
>> 1.1.3 version from rpm-next repo.
>> But before I go and recreate the scenario, which means rebuild 3
>> nodes, I would like to know if this bug is fixed in 1.1.3
>
> As I said, I believe so.

I've just upgraded[1] my pacemaker to 1.1.3 and stonithd can not be started; am I missing something?
Oct 08 21:08:01 node-02 heartbeat: [14192]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 14192)
Oct 08 21:08:01 node-02 heartbeat: [14193]: info: Starting "/usr/lib/heartbeat/attrd" as uid 101 gid 103 (pid 14193)
Oct 08 21:08:01 node-02 heartbeat: [14194]: info: Starting "/usr/lib/heartbeat/crmd" as uid 101 gid 103 (pid 14194)
Oct 08 21:08:01 node-02 ccm: [14189]: info: Hostname: node-02
Oct 08 21:08:01 node-02 cib: [14190]: WARN: ccm_connect: CCM Activation failed
Oct 08 21:08:01 node-02 cib: [14190]: WARN: ccm_connect: CCM Connection failed 1 times (30 max)
Oct 08 21:08:01 node-02 attrd: [14193]: info: Invoked: /usr/lib/heartbeat/attrd
Oct 08 21:08:01 node-02 stonith-ng: [14192]: info: Invoked: /usr/lib/heartbeat/stonithd
Oct 08 21:08:01 node-02 stonith-ng: [14192]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 08 21:08:01 node-02 heartbeat: [14158]: WARN: Client [stonith-ng] pid 14192 failed authorization [no default client auth]
Oct 08 21:08:01 node-02 heartbeat: [14158]: ERROR: api_process_registration_msg: cannot add client(stonith-ng)
Oct 08 21:08:01 node-02 stonith-ng: [14192]: ERROR: register_heartbeat_conn: Cannot sign on with heartbeat:
Oct 08 21:08:01 node-02 stonith-ng: [14192]: CRIT: main: Cannot sign in to the cluster... terminating
Oct 08 21:08:01 node-02 heartbeat: [14158]: WARN: Managed /usr/lib/heartbeat/stonithd process 14192 exited with return code 100.
Oct 08 21:08:01 node-02 crmd: [14194]: info: Invoked: /usr/lib/heartbeat/crmd
Oct 08 21:08:01 node-02 crmd: [14194]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 08 21:08:02 node-02 crmd: [14194]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Oct 08 21:08:04 node-02 cib: [14190]: WARN: ccm_connect: CCM Activation failed
Oct 08 21:08:04 node-02 cib: [14190]: WARN: ccm_connect: CCM Connection failed 2 times (30 max)
Oct 08 21:08:05 node-02 crmd: [14194]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
[..snip...]
Oct 08 21:08:33 node-02 crmd: [14194]: ERROR: te_connect_stonith: Sign-in failed: triggered a retry

[1] I use CentOS 5.4 and when I did the installation I used the following repository:

[r...@node-02 ~]# cat /etc/yum.repos.d/pacemaker.repo
[clusterlabs]
name=High Availability/Clustering server technologies (epel-5)
baseurl=http://www.clusterlabs.org/rpm/epel-5
type=rpm-md
gpgcheck=0
enabled=1

and in order to perform the upgrade I added the following repo:

[clusterlabs-next]
name=High Availability/Clustering server technologies (epel-5-next)
baseurl=http://www.clusterlabs.org/rpm-next/epel-5
metadata_expire=45m
type=rpm-md
gpgcheck=0
enabled=1

and here is the installation/upgrade log, where you can see only pacemaker-libs and pacemaker were upgraded:

Oct 03 21:06:20 Installed: libibverbs-1.1.3-2.el5.i386
Oct 03 21:06:25 Installed: lm_sensors-2.10.7-9.el5.i386
Oct 03 21:06:31 Installed: 1:net-snmp-5.3.2.2-9.el5_5.1.i386
Oct 03 21:06:31 Installed: librdmacm-1.0.10-1.el5.i386
Oct 03 21:06:32 Installed: openhpi-libs-2.14.0-5.el5.i386
Oct 03 21:06:33 Installed: OpenIPMI-libs-2.0.16-7.el5.i386
Oct 03 21:06:35 Installed: libesmtp-1.0.4-5.el5.i386
Oct 03 21:06:36 Installed: cluster-glue-libs-1.0.6-1.6.el5.i386
Oct 03 21:06:37 Installed: heartbeat-libs-3.0.3-2.3.el5.i386
Oct 03 21:06:39 Installed: corosynclib-1.2.7-1.1.el5.i386
Oct 03 21:06:42 Installed: cluster-glue-1.0.6-1.6.el5.i386
Oct 03 21:06:45 Installed: resource-agents-1.0.3-2.6.el5.i386
Oct 03 21:06:46 Installed: heartbeat-3.0.3-2.3.el5.i386
Oct 03 21:06:47 Installed: pacemaker-libs-1.0.9.1-1.15.el5.i386
Oct 03 21:06:49 Installed: pacemaker-1.0.9.1-1.15.el5.i386
Oct 03 21:06:50 Installed: corosync
Re: [Pacemaker] crm resource move doesn't move the resource
On Fri, Oct 8, 2010 at 8:34 AM, Pavlos Parissis wrote:
> [...snip...]
> to save time on compiling stuff I want to use the available rpms on
> 1.1.3 version from rpm-next repo.
> But before I go and recreate the scenario, which means rebuild 3
> nodes, I would like to know if this bug is fixed in 1.1.3

As I said, I believe so.
Re: [Pacemaker] crm resource move doesn't move the resource
On 8 October 2010 08:29, Andrew Beekhof wrote:
> [...snip...]
>> 1.1 or 1.2 branch?
>
> 1.1

To save time on compiling stuff I want to use the available rpms of the 1.1.3 version from the rpm-next repo. But before I go and recreate the scenario, which means rebuilding 3 nodes, I would like to know if this bug is fixed in 1.1.3.

Thanks,
Pavlos
Re: [Pacemaker] crm resource move doesn't move the resource
On Thu, Oct 7, 2010 at 9:58 PM, Pavlos Parissis wrote:
> [...snip...]
> 1.1 or 1.2 branch?

1.1

1.2 is a placeholder at the moment; we're not really updating it.
Re: [Pacemaker] crm resource move doesn't move the resource
On 8 October 2010 04:26, jiaju liu wrote:
> [...snip...]
> > I think its a bug that was fixed recently. Could you try the latest
> > code from Mercurial?
> Maybe you should clear failcount

The failcount was 0.
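For anyone following along, the fail count can be checked and cleared from the crm shell roughly as below (resource and node names are taken from this thread; subcommand spelling varies a little between crm shell versions, so treat this as a sketch rather than exact syntax for every release):

```
# Show the fail count of the DRBD master/slave resource on node-03
crm resource failcount ms-drbd_01 show node-03

# Reset it
crm resource failcount ms-drbd_01 delete node-03

# Or clean up the resource's status, which re-probes it and clears failures
crm resource cleanup ms-drbd_01
```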
Re: [Pacemaker] crm resource move doesn't move the resource
Message: 2
Date: Thu, 7 Oct 2010 21:58:29 +0200
From: Pavlos Parissis
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] crm resource move doesn't move the resource
Content-Type: text/plain; charset="utf-8"

On 7 October 2010 09:01, Andrew Beekhof wrote:
> On Sat, Oct 2, 2010 at 6:31 PM, Pavlos Parissis wrote:
> > [...snip...]
> > Any ideas why I am having this issue?
>
> I think its a bug that was fixed recently. Could you try the latest
> code from Mercurial?

Maybe you should clear failcount

1.1 or 1.2 branch?
Re: [Pacemaker] crm resource move doesn't move the resource
On 7 October 2010 09:01, Andrew Beekhof wrote:
> On Sat, Oct 2, 2010 at 6:31 PM, Pavlos Parissis wrote:
> > [...snip...]
> > Any ideas why I am having this issue?
>
> I think its a bug that was fixed recently. Could you try the latest
> code from Mercurial?

1.1 or 1.2 branch?
Re: [Pacemaker] crm resource move doesn't move the resource
On Sat, Oct 2, 2010 at 6:31 PM, Pavlos Parissis wrote:
> Hi,
>
> I am having again the same issue, in a different set of 3 nodes. When I
> try to failover manually the resource group on the standby node, the
> ms-drbd resource is not moved as well and as a result the resource group
> is not fully started, only the ip resource is started.
> Any ideas why I am having this issue?

I think it's a bug that was fixed recently. Could you try the latest code from Mercurial?

> here is the info
> [r...@node-01 ~]# crm resource move pbx_service_01 node-03
> [r...@node-01 ~]# crm resource unmove pbx_service_01
> [r...@node-01 ~]# ptest -Ls
> [...snip...]
> [r...@node-01 ~]# crm status
> [...snip...]
> [r...@node-01 ~]# crm configure show
> [...snip...]
> location PrimaryNode-drbd_01 ms-drbd_01 100: node-01
> location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
> location SecondaryNode-drbd_01 ms-drbd_01 0: node-03
> location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
> colocation fs_01-on-drbd_01 inf: fs_01 ms-drbd_01:Master
> order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
>         cluster-in
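Worth noting: "crm resource move" does not trigger a one-off migration; it works by injecting a location constraint into the CIB, and "unmove" deletes that constraint again. A sketch of what the injected constraint looked like in the crm shell of this era follows; the cli-prefer naming convention is an assumption about this particular version, so treat the identifiers as illustrative:

```
# Hypothetical constraint (crm configure syntax) injected by
# "crm resource move pbx_service_01 node-03" and removed again by
# "crm resource unmove pbx_service_01":
location cli-prefer-pbx_service_01 pbx_service_01 \
        rule $id="cli-prefer-rule-pbx_service_01" inf: #uname eq node-03
```

Because this constraint only targets the group, the ms-drbd master is expected to follow indirectly, via the colocation of fs_01 with ms-drbd_01:Master; the thread is precisely about that promotion not following the move.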
Re: [Pacemaker] crm resource move doesn't move the resource
I am wondering if resource-stickiness="1000" could be the reason for the behaviour I see, but then again, when I recreated the ms-drbd resource on the other cluster, the issue was solved.
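A back-of-the-envelope illustration of that suspicion, using the location scores from this thread and assuming stickiness is simply added to the score of the node currently hosting the group (purely illustrative arithmetic, not Pacemaker code):

```python
# Illustrative score arithmetic only. Pacemaker adds resource-stickiness
# to the score of the node currently running a resource, so a stickiness
# of 1000 can dwarf small location preferences.
stickiness = 1000        # hypothetical resource-stickiness
score_node_01 = 200      # location PrimaryNode-pbx_service_01
score_node_03 = 10       # location SecondaryNode-pbx_service_01

# With the group running on node-01, its effective score there becomes:
effective_node_01 = score_node_01 + stickiness   # 1200
# A manual move must beat that for node-03 to win the allocation:
print(effective_node_01 > score_node_03)         # True: the group stays put
```

In other words, unless the move injects a large enough score for the target node, a high stickiness keeps the resource where it is.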
Re: [Pacemaker] crm resource move doesn't move the resource
Hi,

I am having again the same issue, in a different set of 3 nodes. When I try to manually fail over the resource group to the standby node, the ms-drbd resource is not moved as well, and as a result the resource group is not fully started; only the ip resource is started. Any ideas why I am having this issue? Here is the info:

[r...@node-01 ~]# crm resource move pbx_service_01 node-03
[r...@node-01 ~]# crm resource unmove pbx_service_01
[r...@node-01 ~]# ptest -Ls
Allocation scores:
clone_color: ms-drbd_01 allocation score on node-01: 100
clone_color: ms-drbd_01 allocation score on node-03: 0
clone_color: drbd_01:0 allocation score on node-01: 11100
clone_color: drbd_01:0 allocation score on node-03: 0
clone_color: drbd_01:1 allocation score on node-01: 100
clone_color: drbd_01:1 allocation score on node-03: 11000
native_color: drbd_01:0 allocation score on node-01: 11100
native_color: drbd_01:0 allocation score on node-03: 0
native_color: drbd_01:1 allocation score on node-01: -100
native_color: drbd_01:1 allocation score on node-03: 11000
drbd_01:0 promotion score on node-01: 10100
drbd_01:1 promotion score on node-03: 1
drbd_01:2 promotion score on none: 0
group_color: pbx_service_01 allocation score on node-01: 200
group_color: pbx_service_01 allocation score on node-03: 10
group_color: ip_01 allocation score on node-01: 200
group_color: ip_01 allocation score on node-03: 1010
group_color: fs_01 allocation score on node-01: 0
group_color: fs_01 allocation score on node-03: 0
group_color: pbx_01 allocation score on node-01: 0
group_color: pbx_01 allocation score on node-03: 0
native_color: ip_01 allocation score on node-01: 200
native_color: ip_01 allocation score on node-03: 1010
drbd_01:0 promotion score on node-01: 100
drbd_01:1 promotion score on node-03: -100
drbd_01:2 promotion score on none: 0
native_color: fs_01 allocation score on node-01: -100
native_color: fs_01 allocation score on node-03: -100
native_color: pbx_01 allocation score on node-01: -100
native_color: pbx_01 allocation score on node-03: -100

[r...@node-01 ~]# crm status
Last updated: Sat Oct 2 18:27:32 2010
Stack: Heartbeat
Current DC: node-03 (3dd75a8f-9819-450f-9f18-c27730665925) - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
3 Nodes configured, unknown expected votes
2 Resources configured.

Online: [ node-03 node-01 node-02 ]

 Master/Slave Set: ms-drbd_01
     Masters: [ node-01 ]
     Slaves:  [ node-03 ]
 Resource Group: pbx_service_01
     ip_01      (ocf::heartbeat:IPaddr2):       Started node-03
     fs_01      (ocf::heartbeat:Filesystem):    Stopped
     pbx_01     (lsb:test-01):                  Stopped

[r...@node-01 ~]# crm configure show
node $id="3dd75a8f-9819-450f-9f18-c27730665925" node-03
node $id="4e47db29-5f14-4371-9734-317bf342b8ed" node-02
node $id="a8f56e42-438f-4ea5-a6ba-a7f1d23ed401" node-01
primitive drbd_01 ocf:linbit:drbd \
        params drbd_resource="drbd_pbx_service_1" \
        op monitor interval="30s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="120s"
primitive fs_01 ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/pbx_service_01" fstype="ext3" \
        meta migration-threshold="3" failure-timeout="60" \
        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive ip_01 ocf:heartbeat:IPaddr2 \
        params ip="192.168.78.10" cidr_netmask="24" broadcast="192.168.78.255" \
        meta failure-timeout="120" migration-threshold="3" \
        op monitor interval="5s"
primitive pbx_01 lsb:test-01 \
        meta migration-threshold="3" failure-timeout="60" \
        op monitor interval="20s" timeout="20s" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
group pbx_service_01 ip_01 fs_01 pbx_01 \
        meta target-role="Started"
ms ms-drbd_01 drbd_01 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
location PrimaryNode-drbd_01 ms-drbd_01 100: node-01
location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
location SecondaryNode-drbd_01 ms-drbd_01 0: node-03
location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
colocation fs_01-on-drbd_01 inf: fs_01 ms-drbd_01:Master
order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="Heartbeat" \
        symmetric-cluster="false" \
        stonith-enabled="false"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000"
[r...@node-01 ~]#

Thanks,
Pavlos

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
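For comparison, here is a minimal sketch (the constraint id is hypothetical, not taken from this cluster) of how the whole group, rather than only fs_01, could be colocated with the DRBD master role, so that a manual move of the group should also pull the master along:

```
# hypothetical: colocate the entire group with the Master role, not only fs_01
colocation pbx_service_01-on-drbd_01 inf: pbx_service_01 ms-drbd_01:Master
# keep the existing promote-before-start ordering
order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start
```

With an INFINITY colocation the group can only be placed where ms-drbd_01 can be promoted, so the policy engine has to consider promoting the master on the target node when the group is moved.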
Re: [Pacemaker] crm resource move doesn't move the resource
On 28 September 2010 15:09, Pavlos Parissis wrote:
> Hi,
>
> When I issue "crm resource move pbx_service_01 node-0N" it moves this
> resource group, but the fs_01 resource is not started because drbd_01 is
> still running on the other node and is not moved to node-0N as well, even
> though I have colocation constraints.
> I am pretty sure that I had that working before, but I can't figure out
> why it doesn't work anymore.
> The resources pbx_service_01 and drbd_01 are moved to another node in case
> of failure, but for some reason not manually.
>
> Can you see in my conf where the problem could be? I have already spent
> some time on it and I think I can't see the obvious anymore :-(
>
> [...snip ...]

Just to add that this issue applies to only one of the resource groups, even though the conf is the same for both of them!

So, after hours of running the same test again and again, and reading 10 lines of logs (BTW, it seems they say in a clear way why certain things happen), I decided to recreate the drbd_01 and ms-drbd_01 resources and adjust the order constraints.

Before, it was like this:

order fs_01-after-drbd_01 inf: ms-drbd_01:promote fs_01:start
order fs_02-after-drbd_02 inf: ms-drbd_02:promote fs_02:start
order pbx_01-after-fs_01 inf: fs_01 pbx_01
order pbx_01-after-ip_01 inf: ip_01 pbx_01
order pbx_02-after-fs_02 inf: fs_02 pbx_02
order pbx_02-after-ip_02 inf: ip_02 pbx_02

and now it is like this:

order fs_02-after-drbd_02 inf: ms-drbd_02:promote fs_02:start
order pbx_02-after-fs_02 inf: fs_02 pbx_02
order pbx_02-after-ip_02 inf: ip_02 pbx_02
order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start

As you can see, no major changes. The end result is that now, every time I issue "crm resource move pbx_service_01 node-0N", drbd_01 is promoted on that node as well and the whole resource group is started! So the issue is solved, but I don't like it, for the very simple reason that I don't know why it didn't work before, and that scares me!
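For what it's worth, a note on what the move/unmove pair does under the hood (standard crm shell behaviour as I understand it; the generated constraint and rule ids below are shown for illustration and may differ between versions): `crm resource move` does not migrate anything itself, it only injects a location constraint into the CIB, and `unmove` deletes it again:

```
# roughly what "crm resource move pbx_service_01 node-03" injects:
location cli-prefer-pbx_service_01 pbx_service_01 \
        rule $id="cli-prefer-rule-pbx_service_01" inf: #uname eq node-03
# "crm resource unmove pbx_service_01" removes this constraint again;
# resource-stickiness then decides whether resources stay where they are
```

So whether the rest of the stack follows the group depends entirely on the colocation and order constraints, which is why reshaping them changes the outcome of a manual move.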
Cheers,
Pavlos
[Pacemaker] crm resource move doesn't move the resource
Hi,

When I issue "crm resource move pbx_service_01 node-0N" it moves this resource group, but the fs_01 resource is not started because drbd_01 is still running on the other node and is not moved to node-0N as well, even though I have colocation constraints.
I am pretty sure that I had that working before, but I can't figure out why it doesn't work anymore.
The resources pbx_service_01 and drbd_01 are moved to another node in case of failure, but for some reason not manually.

Can you see in my conf where the problem could be? I have already spent some time on it and I think I can't see the obvious anymore :-(

node $id="b8ad13a6-8a6e-4304-a4a1-8f69fa735100" node-02
node $id="d5557037-cf8f-49b7-95f5-c264927a0c76" node-01
node $id="e5195d6b-ed14-4bb3-92d3-9105543f9251" node-03
primitive drbd_01 ocf:linbit:drbd \
        params drbd_resource="drbd_pbx_service_1" \
        op monitor interval="30s"
primitive drbd_02 ocf:linbit:drbd \
        params drbd_resource="drbd_pbx_service_2" \
        op monitor interval="30s"
primitive fs_01 ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/pbx_service_01" fstype="ext3" \
        meta migration-threshold="3" failure-timeout="60" \
        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20"
primitive fs_02 ocf:heartbeat:Filesystem \
        params device="/dev/drbd2" directory="/pbx_service_02" fstype="ext3" \
        meta migration-threshold="3" failure-timeout="60" \
        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20"
primitive ip_01 ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.10" cidr_netmask="25" broadcast="10.10.10.127" \
        meta failure-timeout="120" migration-threshold="3" \
        op monitor interval="5s"
primitive ip_02 ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.11" cidr_netmask="25" broadcast="10.10.10.127" \
        op monitor interval="5s"
primitive pbx_01 ocf:heartbeat:Dummy \
        params state="/pbx_service_01/Dummy.state" \
        meta failure-timeout="60" migration-threshold="3" target-role="Started" \
        op monitor interval="20s" timeout="40s"
primitive pbx_02 ocf:heartbeat:Dummy \
        params state="/pbx_service_02/Dummy.state" \
        meta failure-timeout="60" migration-threshold="3"
group pbx_service_01 ip_01 fs_01 pbx_01 \
        meta target-role="Started"
group pbx_service_02 ip_02 fs_02 pbx_02 \
        meta target-role="Started"
ms ms-drbd_01 drbd_01 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
ms ms-drbd_02 drbd_02 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
location PrimaryNode-drbd_01 ms-drbd_01 100: node-01
location PrimaryNode-drbd_02 ms-drbd_02 100: node-02
location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
location PrimaryNode-pbx_service_02 pbx_service_02 200: node-02
location SecondaryNode-drbd_01 ms-drbd_01 0: node-03
location SecondaryNode-drbd_02 ms-drbd_02 0: node-03
location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
location SecondaryNode-pbx_service_02 pbx_service_02 10: node-03
colocation fs_01-on-drbd_01 inf: fs_01 ms-drbd_01:Master
colocation fs_02-on-drbd_02 inf: fs_02 ms-drbd_02:Master
colocation pbx_01-with-fs_01 inf: pbx_01 fs_01
colocation pbx_01-with-ip_01 inf: pbx_01 ip_01
colocation pbx_02-with-fs_02 inf: pbx_02 fs_02
colocation pbx_02-with-ip_02 inf: pbx_02 ip_02
order fs_01-after-drbd_01 inf: ms-drbd_01:promote fs_01:start
order fs_02-after-drbd_02 inf: ms-drbd_02:promote fs_02:start
order pbx_01-after-fs_01 inf: fs_01 pbx_01
order pbx_01-after-ip_01 inf: ip_01 pbx_01
order pbx_02-after-fs_02 inf: fs_02 pbx_02
order pbx_02-after-ip_02 inf: ip_02 pbx_02
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        symmetric-cluster="false" \
        last-lrm-refresh="1285323745"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000"
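One side note on the config (an observation, not a fix): a group already implies colocation and ordering between its members, so the explicit pbx_0N-with-* colocations and pbx_0N-after-* orders duplicate what the group definition provides. Roughly:

```
# a group is shorthand for internal ordering + colocation of its members,
# so this line alone already implies, for pbx_service_01:
group pbx_service_01 ip_01 fs_01 pbx_01
#   order:      ip_01 -> fs_01 -> pbx_01  (started in sequence, stopped in reverse)
#   colocation: pbx_01 with fs_01 with ip_01  (all on the same node)
```

Dropping the redundant member constraints should not change behaviour here, but it makes the constraint set easier to reason about.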