Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)
Thank you very much, Ken! You nailed it, and now it's working :-)

On Tue, Dec 5, 2017 at 5:29 AM, Ken Gaillot wrote:
> [...]
> I just noticed that your cluster has symmetric-cluster=false. That
> means that resources can't run anywhere by default; in order for a
> resource to run, there must be a location constraint allowing it to run
> on a node. Have you added such constraints?
Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)
On Mon, 2017-12-04 at 23:15 +0800, Hui Xiang wrote:
> Thanks Ken very much for the helpful information. It indeed helps a
> lot with debugging.
>
> "Each time the DC decides what to do, there will be a line like
> "... saving inputs in ..." with a file name. The log messages just
> before that may give some useful information."
> - I am unable to find such information in the logs; it only prints
> something like /var/lib/pacemaker/pengine/pe-input-xx

If the cluster has nothing to do, it won't show anything, but if
actions were needed, it should show them, like
"Start myrsc ( node1 )".

Are there any messages with "error" or "warning" in the log?

> When I compare the good cib.xml file with the bad one, they differ in
> the order of "name" and "id", as shown below. Does that matter for the
> CIB to function normally?

No, the XML attributes can be in any order.

I just noticed that your cluster has symmetric-cluster=false. That
means that resources can't run anywhere by default; in order for a
resource to run, there must be a location constraint allowing it to run
on a node. Have you added such constraints?

> name="monitor" timeout="30"/>
> timeout="60"/>
> timeout="60"/>
> name="promote" timeout="60"/>
> name="demote" timeout="60"/>
> timeout="30" id="ovndb-servers-monitor-20"/>
>
> Thanks.
> Hui.
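Ken's point about symmetric-cluster=false can be checked and acted on with a couple of shell helpers. This is a minimal dry-run sketch, assuming the property listing format Hui posted and the resource and node names from this thread: it checks whether symmetric-cluster=false is set and, if so, prints (rather than runs) a pcs command that allows tst-ovndb-master on every node. The helper names is_opt_in_cluster and opt_in_command are hypothetical; the `pcs constraint location <resource> prefers <node>...` syntax is standard pcs, but check it against your pcs version.

```shell
#!/bin/sh
# Dry-run sketch: detect an opt-in cluster (symmetric-cluster=false) and
# print, not run, the pcs command that would enable a resource on nodes.
# is_opt_in_cluster and opt_in_command are hypothetical helper names.

# Succeeds if the property listing on stdin sets symmetric-cluster=false.
is_opt_in_cluster() {
    grep -Eq 'symmetric-cluster=false([^[:alnum:]_-]|$)'
}

# Prints a location constraint allowing resource $1 on every node given
# after it; "prefers" without an explicit score means INFINITY.
opt_in_command() {
    resource=$1
    shift
    printf 'pcs constraint location %s prefers %s\n' "$resource" "$*"
}

# Excerpt of the cluster properties posted in this thread.
props='property cib-bootstrap-options: \
    stonith-enabled=false \
    symmetric-cluster=false \
    last-lrm-refresh=1511802933'

if printf '%s\n' "$props" | is_opt_in_cluster; then
    opt_in_command tst-ovndb-master \
        node-1.domain.tld node-2.domain.tld node-3.domain.tld
fi
```

The other way out is to make the cluster symmetric again with `pcs property set symmetric-cluster=true`, so resources may run on any node unless a constraint bans them.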
Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)
Thanks Ken very much for the helpful information. It indeed helps a lot
with debugging.

"Each time the DC decides what to do, there will be a line like "...
saving inputs in ..." with a file name. The log messages just before
that may give some useful information."
- I am unable to find such information in the logs; it only prints
something like /var/lib/pacemaker/pengine/pe-input-xx

When I compare the good cib.xml file with the bad one, they differ in
the order of "name" and "id", as shown below. Does that matter for the
CIB to function normally?

Thanks.
Hui.

On Sat, Dec 2, 2017 at 5:07 AM, Ken Gaillot wrote:
> [...]
> Each time the DC decides what to do, there will be a line like "...
> saving inputs in ..." with a file name. The log messages just before
> that may give some useful information.
>
> Otherwise, you can take that file, and simulate what the cluster
> decided at that point:
>
>   crm_simulate -Sx $FILENAME
>
> [...]
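Since the log only shows paths like /var/lib/pacemaker/pengine/pe-input-xx, a small helper can pull the most recent such path out of the log so it can be replayed with crm_simulate, as Ken suggests. This is a sketch, assuming the saved inputs are named pe-input-N, pe-warn-N or pe-error-N under /var/lib/pacemaker/pengine (optionally with a .bz2 suffix); latest_pe_input is a hypothetical helper, and the two log lines in the example are made up for illustration.

```shell
#!/bin/sh
# Sketch: find the newest saved scheduler input mentioned in a log and
# print the crm_simulate command that would replay it (dry run).
# latest_pe_input is a hypothetical helper name.

# Reads log text on stdin and prints the last pe-input/pe-warn/pe-error
# path it contains; exits non-zero if there is none.
latest_pe_input() {
    grep -Eo '/var/lib/pacemaker/pengine/pe-(input|warn|error)-[0-9]+(\.bz2)?' |
        tail -n 1 | grep .
}

# Made-up log excerpt; real message text varies by Pacemaker version.
log='Dec  1 10:00:01 node-1 pengine: example /var/lib/pacemaker/pengine/pe-input-41.bz2
Dec  1 10:05:07 node-1 pengine: example /var/lib/pacemaker/pengine/pe-input-42.bz2'

if file=$(printf '%s\n' "$log" | latest_pe_input); then
    # On a cluster node you would run this instead of printing it.
    echo "crm_simulate -Sx $file"
fi
```

On a real node the input would be the cluster log itself, e.g. `latest_pe_input < /var/log/messages` (the log path is an assumption and depends on your syslog and corosync logging setup).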
Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)
On Fri, 2017-12-01 at 09:36 +0800, Hui Xiang wrote:
> Hi all,
>
> I am using the ovndb-servers OCF agent[1], which is a kind of
> multi-state resource. When I create it (please see my previous email),
> the monitor is called only once, and the start operation is never
> called. According to the description below, the monitor operation,
> called once, returned OCF_NOT_RUNNING; should Pacemaker then decide
> to execute the start action based on this return code? Is there any way to

Before Pacemaker does anything with a resource, it first calls a
one-time monitor (called a "probe") to find out the current status of
the resource across the cluster. This allows it to discover if the
service is already running somewhere.

So, you will see those probes for every resource when the cluster
starts, or when the resource is added to the configuration, or when the
resource is cleaned up.

> check what the next action is? Currently nothing happens in my
> environment, and I have tried almost every way I know to debug it,
> with no luck. Could anyone help? Thank you very much.
>
> Monitor Return Code   Description
> OCF_NOT_RUNNING       Stopped
> OCF_SUCCESS           Running (Slave)
> OCF_RUNNING_MASTER    Running (Master)
> OCF_FAILED_MASTER     Failed (Master)
> Other                 Failed (Slave)
>
> [1] https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovndb-servers.ocf
> Hui.
>
> On Thu, Nov 30, 2017 at 6:39 PM, Hui Xiang wrote:
> > The really weird thing is that the monitor is only called once
> > rather than repeatedly as expected. Where should I check for it?
> >
> > On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang wrote:
> > > Thanks Ken very much for your helpful information.
> > >
> > > I am now stuck because I can't see the Pacemaker DC take any
> > > further start/promote etc. action on my resource agent, and I
> > > found no helpful logs.

Each time the DC decides what to do, there will be a line like "...
saving inputs in ..." with a file name. The log messages just before
that may give some useful information.

Otherwise, you can take that file, and simulate what the cluster
decided at that point:

  crm_simulate -Sx $FILENAME

It will first show the status of the cluster at the start of the
decision-making, then a "Transition Summary" with the actions that are
required, then a simulated execution of those actions, and then what
the resulting status would be if those actions succeeded.

That may give you some more information. You can make it more verbose
by using "-Ssx", or by adding "-", but it's not very user-friendly
output.

> > > So my first question is: in what kind of situation will the DC
> > > decide to call the start action? Does the monitor operation need
> > > to return OCF_SUCCESS? In my case it returns OCF_NOT_RUNNING, and
> > > the monitor operation is not called any more, which seems wrong;
> > > I expected it to be called at intervals.

The DC will ask for a start if the configuration and current status
require it. For example, if the resource's current status is stopped,
and the configuration calls for a target role of started (the default),
then it will start it. On the other hand, if the current status is
started, then it doesn't need to do anything -- or, if location
constraints ban all the nodes from running the resource, then it can't
do anything.

So, it's all based on what the current status is (based on the last
monitor result), and what the configuration requires.

> > > The resource agent's monitor logic:
> > > In the xx_monitor function it calls xx_update, and it always hits
> > > "$CRM_MASTER -D;;". What does that usually mean? Will it prevent
> > > the start operation from being called?

Each master/slave resource has a special node attribute with a "master
score" for that node. The node with the highest master score will be
promoted to master. It's up to the resource agent to set this
attribute. The "-D" call you see deletes that attribute (presumably
before updating it later). The master score has no effect on
starting/stopping.
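The rule Ken gives for when the DC asks for a start can be written down as a toy decision function. This is a deliberately simplified model of his explanation, not Pacemaker's actual scheduler: it looks only at the last known status, the target role, and how many nodes the constraints allow (zero, for example, in an opt-in cluster with no location constraints). dc_next_action and its arguments are hypothetical.

```shell
#!/bin/sh
# Toy model of the start decision Ken describes; NOT Pacemaker's real
# scheduler, which also weighs scores, colocation, failures and more.
# dc_next_action CURRENT_STATUS TARGET_ROLE ALLOWED_NODES
#   CURRENT_STATUS  last monitor/probe result: "stopped" or "started"
#   TARGET_ROLE     configured target role (default "Started")
#   ALLOWED_NODES   how many nodes the constraints let it run on
dc_next_action() {
    cur_status=$1
    target_role=${2:-Started}
    allowed=${3:-0}
    if [ "$cur_status" = stopped ] && [ "$target_role" = Started ]; then
        if [ "$allowed" -gt 0 ]; then
            echo start
        else
            # e.g. symmetric-cluster=false with no location constraint
            echo "none (no allowed node)"
        fi
    else
        # Already started, or not meant to be started; this toy model
        # only decides whether a start is needed.
        echo none
    fi
}

dc_next_action stopped Started 3   # prints: start
dc_next_action stopped Started 0   # prints: none (no allowed node)
dc_next_action started Started 3   # prints: none
```

The second call is the situation in this thread: the probe reports "stopped", the target role is the default, but no node is allowed, so nothing is scheduled and no recurring monitor follows.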
Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)
Hi all,

I am using the ovndb-servers OCF agent[1], which is a kind of
multi-state resource. When I create it (please see my previous email),
the monitor is called only once, and the start operation is never
called. According to the description below, the monitor operation,
called once, returned OCF_NOT_RUNNING; should Pacemaker then decide to
execute the start action based on this return code? Is there any way to
check what the next action is? Currently nothing happens in my
environment, and I have tried almost every way I know to debug it, with
no luck. Could anyone help? Thank you very much.

Monitor Return Code   Description
OCF_NOT_RUNNING       Stopped
OCF_SUCCESS           Running (Slave)
OCF_RUNNING_MASTER    Running (Master)
OCF_FAILED_MASTER     Failed (Master)
Other                 Failed (Slave)

[1] https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovndb-servers.ocf
Hui.

On Thu, Nov 30, 2017 at 6:39 PM, Hui Xiang wrote:
> The really weird thing is that the monitor is only called once rather
> than repeatedly as expected. Where should I check for it?
>
> On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang wrote:
> > Thanks Ken very much for your helpful information.
> > [...]
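The return-code table above corresponds to the standard numeric OCF codes (OCF_SUCCESS=0, OCF_NOT_RUNNING=7, OCF_RUNNING_MASTER=8, OCF_FAILED_MASTER=9), which is why Hui's agent log reports "monitor is going to return 7". A small sketch that translates a monitor exit code into the state from the table; describe_monitor_rc is a hypothetical helper, and the values are hard-coded here instead of being sourced from ocf-shellfuncs so the example runs on its own.

```shell
#!/bin/sh
# Sketch: translate a multi-state monitor exit code into the state from
# the table above. Values are the standard OCF return codes, hard-coded
# here rather than taken from ocf-shellfuncs.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8
OCF_FAILED_MASTER=9

describe_monitor_rc() {
    case "$1" in
        "$OCF_NOT_RUNNING")    echo "Stopped" ;;
        "$OCF_SUCCESS")        echo "Running (Slave)" ;;
        "$OCF_RUNNING_MASTER") echo "Running (Master)" ;;
        "$OCF_FAILED_MASTER")  echo "Failed (Master)" ;;
        *)                     echo "Failed (Slave)" ;;
    esac
}

describe_monitor_rc 7   # prints: Stopped
```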
Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)
The really weird thing is that the monitor is only called once rather
than repeatedly as expected. Where should I check for it?

On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang wrote:
> Thanks Ken very much for your helpful information.
> [...]
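The single monitor call is the one-time "probe" that Ken describes in this thread; recurring monitors are a separate operation. One way to tell them apart from inside the agent is the operation interval Pacemaker passes in the environment, which is 0 for a probe. This sketch assumes the agent sees OCF_RESKEY_CRM_meta_interval (in milliseconds), as Pacemaker-run agents normally do. resource-agents' ocf-shellfuncs ships a similar ocf_is_probe helper, but is_probe and monitor_kind below are hypothetical stand-alone versions so the example runs without it.

```shell
#!/bin/sh
# Sketch: classify a monitor invocation as a one-time probe or a
# recurring monitor, based on the interval Pacemaker exports to the agent
# (OCF_RESKEY_CRM_meta_interval, in milliseconds; 0 for a probe).
# is_probe and monitor_kind are hypothetical helper names.

# $1 is the action name; a probe is a monitor with interval 0.
is_probe() {
    [ "$1" = monitor ] && [ "${OCF_RESKEY_CRM_meta_interval:-0}" -eq 0 ]
}

monitor_kind() {
    if is_probe monitor; then
        echo "probe (one-time monitor)"
    else
        echo "recurring monitor, every ${OCF_RESKEY_CRM_meta_interval} ms"
    fi
}

# Simulate both cases in subshells (in a real agent Pacemaker sets this).
(OCF_RESKEY_CRM_meta_interval=0; monitor_kind)       # probe (one-time monitor)
(OCF_RESKEY_CRM_meta_interval=30000; monitor_kind)   # recurring monitor, every 30000 ms
```

Logging the result of something like monitor_kind with ocf_log from the agent's monitor function would show whether the one call seen in the log is the probe.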
Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)
Thanks Ken very much for your helpful information.

I am now stuck because I can't see the Pacemaker DC take any further
start/promote etc. action on my resource agent, and I found no helpful
logs.

So my first question is: in what kind of situation will the DC decide to
call the start action? Does the monitor operation need to return
OCF_SUCCESS? In my case it returns OCF_NOT_RUNNING, and the monitor
operation is not called any more, which seems wrong; I expected it to be
called at intervals.

The resource agent's monitor logic:
In the xx_monitor function it calls xx_update, and it always hits
"$CRM_MASTER -D;;". What does that usually mean? Will it prevent the
start operation from being called?

ovsdb_server_master_update() {
    ocf_log info "ovsdb_server_master_update: $1}"

    case $1 in
    $OCF_SUCCESS)
        $CRM_MASTER -v ${slave_score};;
    $OCF_RUNNING_MASTER)
        $CRM_MASTER -v ${master_score};;
    #*) $CRM_MASTER -D;;
    esac
    ocf_log info "ovsdb_server_master_update end}"
}

ovsdb_server_monitor() {
    ocf_log info "ovsdb_server_monitor"
    ovsdb_server_check_status
    rc=$?

    ovsdb_server_master_update $rc
    ocf_log info "monitor is going to return $rc"
    return $rc
}

Below is my cluster configuration:

1. First, I have a VIP configured.
[root@node-1 ~]# pcs resource show
 vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld

2. Use pcs to create the ovndb-servers resource and the constraint:
[root@node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641 sb_master_port=6642 master
([root@node-1 ~]# pcs resource meta tst-ovndb-master notify=true
Error: unable to find a resource/clone/master/group: tst-ovndb-master) ## returned an error, so I changed to the command below.
[root@node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb notify=true
[root@node-1 ~]# pcs constraint colocation add master tst-ovndb-master with vip__management_old

3. pcs status
[root@node-1 ~]# pcs status
 vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
 Master/Slave Set: tst-ovndb-master [tst-ovndb]
     Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]

4. pcs resource show XXX
[root@node-1 ~]# pcs resource show vip__management_old
 Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
  Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false iptables_comment=default-comment
  Meta Attrs: migration-threshold=3 failure-timeout=60 resource-stickiness=1
  Operations: monitor interval=3 timeout=30 (vip__management_old-monitor-3)
              start interval=0 timeout=30 (vip__management_old-start-0)
              stop interval=0 timeout=30 (vip__management_old-stop-0)
[root@node-1 ~]# pcs resource show tst-ovndb-master
 Master: tst-ovndb-master
  Meta Attrs: notify=true
  Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
   Attributes: manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641 sb_master_port=6642
   Operations: start interval=0s timeout=30s (tst-ovndb-start-timeout-30s)
               stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
               promote interval=0s timeout=50s (tst-ovndb-promote-timeout-50s)
               demote interval=0s timeout=50s (tst-ovndb-demote-timeout-50s)
               monitor interval=30s timeout=20s (tst-ovndb-monitor-interval-30s)
               monitor interval=10s role=Master timeout=20s (tst-ovndb-monitor-interval-10s-role-Master)
               monitor interval=30s role=Slave timeout=20s (tst-ovndb-monitor-interval-30s-role-Slave)

colocation colocation-tst-ovndb-master-vip__management_old-INFINITY inf: tst-ovndb-master:Master vip__management_old:Started

5. I have added logging to every ovndb-servers op; it seems only the monitor op is called, and nothing is promoted by the Pacemaker DC:
<30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: ovsdb_server_monitor
<30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: ovsdb_server_check_status
<30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: return OCFOCF_NOT_RUNNINGG
<30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: ovsdb_server_master_update: 7}
<30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: ovsdb_server_master_update end}
<30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: monitor is going to return 7
<30>Nov 30 15:22:20 node-1 ovndb-servers(undef)[2980970]: INFO: metadata exit OCF_SUCCESS}

6. The cluster properties:
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.12-a14efad \
    cluster-infrastructure=corosync \
    no-quorum-policy=ignore \
    stonith-enabled=false \
    symmetric-cluster=false \
    last-lrm-refresh=1511802933
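Ken's description of the master score in this thread (the agent sets a per-node attribute, the highest score is promoted, and deleting it with -D does not affect start/stop) can be illustrated with a variant of Hui's ovsdb_server_master_update. This is a sketch, not the ovndb-servers agent's code: crm_master is replaced by a hypothetical stub that only prints its arguments, the score values are made up, and the OCF codes are hard-coded. In real agents CRM_MASTER is commonly set to something like "${HA_SBIN_DIR}/crm_master -l reboot".

```shell
#!/bin/sh
# Sketch modelled on Hui's ovsdb_server_master_update(); NOT the real
# ovndb-servers agent. crm_master is replaced by a printing stub.
OCF_SUCCESS=0
OCF_RUNNING_MASTER=8
slave_score=5       # made-up scores; the node with the higher one
master_score=10     # is preferred when Pacemaker promotes

# Hypothetical stub for crm_master: prints instead of changing the CIB.
crm_master_stub() { echo "crm_master $*"; }
CRM_MASTER=crm_master_stub

master_update() {
    case "$1" in
        "$OCF_SUCCESS")        $CRM_MASTER -v "$slave_score" ;;
        "$OCF_RUNNING_MASTER") $CRM_MASTER -v "$master_score" ;;
        # Not running (e.g. 7 = OCF_NOT_RUNNING): delete the score so this
        # node is not preferred for promotion. Start/stop are unaffected.
        *)                     $CRM_MASTER -D ;;
    esac
}

master_update 7   # prints: crm_master -D
master_update 0   # prints: crm_master -v 5
master_update 8   # prints: crm_master -v 10
```

Because the score only influences promotion, an instance that never starts (as in this thread) will not be helped or hurt by whether the -D branch is commented out.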