Re: [ClusterLabs] MySQL Cluster: Strange behaviour when forcing movement of resources
Hello Ken,

Thank you so much for your remarks. We have just applied them and will perform some tests to see the changes.

Best Regards,

On 16-02-2017 at 16:34, Ken Gaillot wrote:
Re: [ClusterLabs] MySQL Cluster: Strange behaviour when forcing movement of resources
On 02/16/2017 02:26 AM, Félix Díaz de Rada wrote:
[ClusterLabs] MySQL Cluster: Strange behaviour when forcing movement of resources
Hi all,

We are currently setting up a MySQL cluster (Master-Slave) on this platform:

- Two nodes, on RHEL 7.0
- pacemaker-1.1.10-29.el7.x86_64
- corosync-2.3.3-2.el7.x86_64
- pcs-0.9.115-32.el7.x86_64

There is an IP address resource to be used as a "virtual IP".

This is the configuration of the cluster:

Cluster Name: webmobbdprep
Corosync Nodes:
 webmob1bdprep-ges webmob2bdprep-ges
Pacemaker Nodes:
 webmob1bdprep-ges webmob2bdprep-ges

Resources:
 Group: G_MySQL_M
  Meta Attrs: priority=100
  Resource: MySQL_M (class=ocf provider=heartbeat type=mysql_m)
   Attributes: binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe
    config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep
    log=/data/webmob_prep/webmob_prep.err pid=/data/webmob_prep/webmob_rep.pid
    socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql
    test_table=replica.pacemaker_test test_user=root
   Meta Attrs: resource-stickiness=1000
   Operations: promote interval=0s timeout=120 (MySQL_M-promote-timeout-120)
    demote interval=0s timeout=120 (MySQL_M-demote-timeout-120)
    start interval=0s timeout=120s on-fail=restart (MySQL_M-start-timeout-120s-on-fail-restart)
    stop interval=0s timeout=120s (MySQL_M-stop-timeout-120s)
    monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1 (MySQL_M-monitor-interval-60s-timeout-30s)
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=172.18.64.44 nic=ens160:1 cidr_netmask=32
   Meta Attrs: target-role=Started migration-threshold=3 failure-timeout=60s
   Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
    stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
    monitor interval=60s (ClusterIP-monitor-interval-60s)
 Resource: MySQL_S (class=ocf provider=heartbeat type=mysql_s)
  Attributes: binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe
   config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep
   log=/data/webmob_prep/webmob_prep.err pid=/data/webmob_prep/webmob_rep.pid
   socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql
   test_table=replica.pacemaker_test test_user=root
  Meta Attrs: resource-stickiness=0
  Operations: promote interval=0s timeout=120 (MySQL_S-promote-timeout-120)
   demote interval=0s timeout=120 (MySQL_S-demote-timeout-120)
   start interval=0s timeout=120s on-fail=restart (MySQL_S-start-timeout-120s-on-fail-restart)
   stop interval=0s timeout=120s (MySQL_S-stop-timeout-120s)
   monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1 (MySQL_S-monitor-interval-60s-timeout-30s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start MySQL_M then start ClusterIP (Mandatory) (id:order-MySQL_M-ClusterIP-mandatory)
  start G_MySQL_M then start MySQL_S (Mandatory) (id:order-G_MySQL_M-MySQL_S-mandatory)
Colocation Constraints:
  G_MySQL_M with MySQL_S (-100) (id:colocation-G_MySQL_M-MySQL_S-INFINITY)

Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.10-29.el7-368c726
 last-lrm-refresh: 1487148812
 no-quorum-policy: ignore
 stonith-enabled: false
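For reference, the ordering and colocation constraints above correspond to pcs commands roughly like the following (an illustrative reconstruction from the 'pcs config' output, not necessarily the exact commands we ran; syntax as in pcs 0.9.x):

    # Start MySQL_M before the virtual IP, and the whole master group before the slave
    pcs constraint order start MySQL_M then start ClusterIP
    pcs constraint order start G_MySQL_M then start MySQL_S

    # Score -100 anti-colocation: prefer to keep the master group and the slave
    # on different nodes, but do not forbid them from ever sharing a node
    pcs constraint colocation add G_MySQL_M with MySQL_S -100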
Pacemaker works as expected in most situations, but there is one scenario that is really not understandable to us. I will try to describe it:

a - The Master resource (and the Cluster IP address) are active on node 1 and the Slave resource is active on node 2.
b - We force movement of the Master resource to node 2 (the usual pcs command for this is sketched at the end of this message).
c - Pacemaker stops all resources: Master, Slave and Cluster IP.
d - The Master resource and the Cluster IP are started on node 2 (this is OK), but the Slave also tries to start (??). It fails (logically, because the Master resource has been started on the same node), logs an "unknown error", and its state is marked as "failed". This is a capture of 'pcs status' at that point:

 OFFLINE: [ webmob1bdprep-ges ]
 Online: [ webmob2bdprep-ges ]

 Full list of resources:

  Resource Group: G_MySQL_M
      MySQL_M (ocf::heartbeat:mysql_m): Started webmob2bdprep-ges
      ClusterIP (ocf::heartbeat:IPaddr2): Started webmob2bdprep-ges
  MySQL_S (ocf::heartbeat:mysql_s): FAILED webmob2bdprep-ges

 Failed actions:
     MySQL_M_monitor_6 on webmob2bdprep-ges 'master' (8): call=62, status=complete, last-rc-change='Wed Feb 15 11:54:08 2017', queued=0ms, exec=0ms
     MySQL_S_start_0 on webmob2bdprep-ges 'unknown error' (1): call=78, status=complete, last-rc-change='Wed Feb 15 11:54:17 2017', queued=40ms, exec=0ms

 PCSD Status:
   webmob1bdprep-ges: Offline
   webmob2bdprep-ges: Online

e - Pacemaker moves the Slave resource to node 1 and starts it. Now we have both resources started again, Master on node 2 and Slave on node 1.
f - One minute later, Pacemaker restarts both resources (???).

So we are wondering:
- After the migration of the Master resource, why does Pacemaker try to start the Slave resource on the same node where the Master resource has just been started? Why is
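For completeness, the forced movement in step b used the standard pcs move command, roughly like this (illustrative only; the exact resource id and command line are not reproduced above):

    # Ask Pacemaker to run the master group on node 2; pcs implements this by
    # adding a temporary location constraint for the moved resource
    pcs resource move G_MySQL_M webmob2bdprep-ges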