Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

2017-12-05 Thread Hui Xiang
Thank you very much Ken!! You nailed it, now it's working :-)

On Tue, Dec 5, 2017 at 5:29 AM, Ken Gaillot  wrote:

> On Mon, 2017-12-04 at 23:15 +0800, Hui Xiang wrote:
> > Thanks Ken very much for the helpful information. It indeed helps a
> > lot for debugging.
> >
> >  " Each time the DC decides what to do, there will be a line like
> > "...
> > saving inputs in ..." with a file name. The log messages just before
> > that may give some useful information."
> >   - I am unable to find such information in the logs; it only prints
> > names like /var/lib/pacemaker/pengine/pe-input-xx
>
> If the cluster had nothing to do, it won't show anything, but if
> actions were needed, it should show them, like
> "Start  myrsc ( node1 )".
>
> Are there any messages with "error" or "warning" in the log?
>
> > When I compare the cib.xml file of the good one with the bad one,
> > they differ in the order of "name" and "id" as shown below; does it
> > matter for the cib to function normally?
>
> No, the XML attributes can be any order.
>
> I just noticed that your cluster has symmetric-cluster=false. That
> means that resources can't run anywhere by default; in order for a
> resource to run, there must be a location constraint allowing it to run
> on a node. Have you added such constraints?
>
> >
> >   <op ... name="monitor" timeout="30"/>
> >   <op ... timeout="60"/>
> >   <op ... timeout="60"/>
> >   <op ... name="promote" timeout="60"/>
> >   <op ... name="demote" timeout="60"/>
> >
> >   <op ... timeout="30" id="ovndb-servers-monitor-20"/>
> >   ...
> >
> >
> > Thanks.
> > Hui.
> >
> >
> > On Sat, Dec 2, 2017 at 5:07 AM, Ken Gaillot 
> > wrote:
> > > On Fri, 2017-12-01 at 09:36 +0800, Hui Xiang wrote:
> > > > Hi all,
> > > >
> > > >   I am using the ovndb-servers OCF agent [1], which is a
> > > > multi-state resource. When I create it (please see my previous
> > > > email), the monitor is called only once, and the start operation
> > > > is never called. According to the description below, that one
> > > > monitor call returned OCF_NOT_RUNNING; should Pacemaker decide to
> > > > execute the start action based on this return code? Is there any
> > > > way to
> > >
> > > Before Pacemaker does anything with a resource, it first calls a
> > > one-
> > > time monitor (called a "probe") to find out the current status of
> > > the
> > > resource across the cluster. This allows it to discover if the
> > > service
> > > is already running somewhere.
> > >
> > > So, you will see those probes for every resource when the cluster
> > > starts, or when the resource is added to the configuration, or when
> > > the
> > > resource is cleaned up.
> > >
> > > > check what the next action will be? Currently in my environment
> > > > nothing happens, and I have tried almost every way I know to
> > > > debug it, with no luck; could anyone help? Thank you very much.
> > > >
> > > > Monitor Return Code   Description
> > > > OCF_NOT_RUNNING       Stopped
> > > > OCF_SUCCESS           Running (Slave)
> > > > OCF_RUNNING_MASTER    Running (Master)
> > > > OCF_FAILED_MASTER     Failed (Master)
> > > > Other                 Failed (Slave)
> > > >
> > > >
> > > > [1] https://github.com/openvswitch/ovs/blob/master/ovn/utilities/
> > > ovnd
> > > > b-servers.ocf
> > > > Hui.
> > > >
> > > >
> > > >
> > > > On Thu, Nov 30, 2017 at 6:39 PM, Hui Xiang 
> > > > wrote:
> > > > > The really weird thing is that the monitor is called only once
> > > > > instead of repeatedly as expected; where should I check?
> > > > >
> > > > > On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang  > > >
> > > > > wrote:
> > > > > > Thanks Ken very much for your helpful information.
> > > > > >
> > > > > > I am now blocked: I can't see the Pacemaker DC take any
> > > > > > further start/promote etc. action on my resource agent, and I
> > > > > > have found no helpful logs.
> > >
> > > Each time the DC decides what to do, there will be a line like "...
> > > saving inputs in ..." with a file name. The log messages just
> > > before
> > > that may give some useful information.
> > >
> > > Otherwise, you can take that file, and simulate what the cluster
> > > decided at that point:
> > >
> > >   crm_simulate -Sx $FILENAME
> > >
> > > It will first show the status of the cluster at the start of the
> > > decision-making, then a "Transition Summary" with the actions that
> > > are
> > > required, then a simulated execution of those actions, and then
> > > what
> > > the resulting status would be if those actions succeeded.
> > >
> > > That may give you some more information. You can make it more
> > > verbose
> > > by using "-Ssx", or by adding "-", but it's not very user-
> > > friendly
> > > output.
> > >
> > > > > >
> > > > > > So my first question is 

Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

2017-12-04 Thread Ken Gaillot
On Mon, 2017-12-04 at 23:15 +0800, Hui Xiang wrote:
> Thanks Ken very much for the helpful information. It indeed helps a
> lot for debugging.
> 
>  " Each time the DC decides what to do, there will be a line like
> "...
> saving inputs in ..." with a file name. The log messages just before
> that may give some useful information."
>   - I am unable to find such information in the logs; it only prints
> names like /var/lib/pacemaker/pengine/pe-input-xx

If the cluster had nothing to do, it won't show anything, but if
actions were needed, it should show them, like
"Start  myrsc ( node1 )".

Are there any messages with "error" or "warning" in the log?

> When I compare the cib.xml file of the good one with the bad one, they
> differ in the order of "name" and "id" as shown below; does it matter
> for the cib to function normally?

No, the XML attributes can be any order.

I just noticed that your cluster has symmetric-cluster=false. That
means that resources can't run anywhere by default; in order for a
resource to run, there must be a location constraint allowing it to run
on a node. Have you added such constraints?
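
For illustration, an enabling constraint in an opt-in cluster could look
like this (resource and node names are taken from the configuration shown
later in the thread; the scores are arbitrary examples):

```shell
# symmetric-cluster=false: each resource must be explicitly allowed on a node
pcs constraint location tst-ovndb-master prefers node-1.domain.tld=100
pcs constraint location tst-ovndb-master prefers node-2.domain.tld=50
pcs constraint location tst-ovndb-master prefers node-3.domain.tld=50
```

Without at least one such constraint, the scheduler has no node on which
the resource is permitted, so no start action is ever scheduled.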

> 
>           <op ... name="monitor" timeout="30"/>
>           <op ... timeout="60"/>
>           <op ... timeout="60"/>
>           <op ... name="promote" timeout="60"/>
>           <op ... name="demote" timeout="60"/>
> 
>           <op ... timeout="30" id="ovndb-servers-monitor-20"/>
>           ...
> 
> 
> Thanks.
> Hui.
> 
> 
> On Sat, Dec 2, 2017 at 5:07 AM, Ken Gaillot 
> wrote:
> > On Fri, 2017-12-01 at 09:36 +0800, Hui Xiang wrote:
> > > Hi all,
> > >
> > >   I am using the ovndb-servers OCF agent [1], which is a multi-state
> > > resource. When I create it (please see my previous email), the
> > > monitor is called only once, and the start operation is never
> > > called. According to the description below, that one monitor call
> > > returned OCF_NOT_RUNNING; should Pacemaker decide to execute the
> > > start action based on this return code? Is there any way to
> > 
> > Before Pacemaker does anything with a resource, it first calls a
> > one-
> > time monitor (called a "probe") to find out the current status of
> > the
> > resource across the cluster. This allows it to discover if the
> > service
> > is already running somewhere.
> > 
> > So, you will see those probes for every resource when the cluster
> > starts, or when the resource is added to the configuration, or when
> > the
> > resource is cleaned up.
> > 
> > > check what the next action will be? Currently in my environment
> > > nothing happens, and I have tried almost every way I know to debug
> > > it, with no luck; could anyone help? Thank you very much.
> > >
> > > Monitor Return Code   Description
> > > OCF_NOT_RUNNING       Stopped
> > > OCF_SUCCESS           Running (Slave)
> > > OCF_RUNNING_MASTER    Running (Master)
> > > OCF_FAILED_MASTER     Failed (Master)
> > > Other                 Failed (Slave)
> > >
> > >
> > > [1] https://github.com/openvswitch/ovs/blob/master/ovn/utilities/
> > ovnd
> > > b-servers.ocf
> > > Hui.
> > >
> > >
> > >
> > > On Thu, Nov 30, 2017 at 6:39 PM, Hui Xiang 
> > > wrote:
> > > > The really weird thing is that the monitor is called only once
> > > > instead of repeatedly as expected; where should I check?
> > > >
> > > > On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang  > >
> > > > wrote:
> > > > > Thanks Ken very much for your helpful information.
> > > > >
> > > > > I am now blocked: I can't see the Pacemaker DC take any further
> > > > > start/promote etc. action on my resource agent, and I have
> > > > > found no helpful logs.
> > 
> > Each time the DC decides what to do, there will be a line like "...
> > saving inputs in ..." with a file name. The log messages just
> > before
> > that may give some useful information.
> > 
> > Otherwise, you can take that file, and simulate what the cluster
> > decided at that point:
> > 
> >   crm_simulate -Sx $FILENAME
> > 
> > It will first show the status of the cluster at the start of the
> > decision-making, then a "Transition Summary" with the actions that
> > are
> > required, then a simulated execution of those actions, and then
> > what
> > the resulting status would be if those actions succeeded.
> > 
> > That may give you some more information. You can make it more
> > verbose
> > by using "-Ssx", or by adding "-", but it's not very user-
> > friendly
> > output.
> > 
> > > > >
> > > > > So my first question is: in what kind of situation will the DC
> > > > > decide to call the start action? Does the monitor operation
> > > > > need to return OCF_SUCCESS? In my case it returns
> > > > > OCF_NOT_RUNNING, and the monitor operation is not called any
> > > > > more, which seems wrong, as I felt it should be called at
> > > > > intervals.
> > 
> > 

Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

2017-12-04 Thread Hui Xiang
Thanks Ken very much for the helpful information. It indeed helps a lot for
debugging.

 " Each time the DC decides what to do, there will be a line like "...
saving inputs in ..." with a file name. The log messages just before
that may give some useful information."
  - I am unable to find such information in the logs; it only prints names
like /var/lib/pacemaker/pengine/pe-input-xx

When I compare the cib.xml file of the good one with the bad one, they
differ in the order of "name" and "id" as shown below; does it matter for
the cib to function normally?

  <op ... name="monitor" timeout="30"/>
  <op ... timeout="60"/>
  <op ... timeout="60"/>
  <op ... name="promote" timeout="60"/>
  <op ... name="demote" timeout="60"/>

  <op ... timeout="30" id="ovndb-servers-monitor-20"/>
  ...


Thanks.
Hui.


On Sat, Dec 2, 2017 at 5:07 AM, Ken Gaillot  wrote:

> On Fri, 2017-12-01 at 09:36 +0800, Hui Xiang wrote:
> > Hi all,
> >
> >   I am using the ovndb-servers OCF agent [1], which is a multi-state
> > resource. When I create it (please see my previous email), the monitor
> > is called only once, and the start operation is never called.
> > According to the description below, that one monitor call returned
> > OCF_NOT_RUNNING; should Pacemaker decide to execute the start action
> > based on this return code? Is there any way to
>
> Before Pacemaker does anything with a resource, it first calls a one-
> time monitor (called a "probe") to find out the current status of the
> resource across the cluster. This allows it to discover if the service
> is already running somewhere.
>
> So, you will see those probes for every resource when the cluster
> starts, or when the resource is added to the configuration, or when the
> resource is cleaned up.
>
> > check what the next action will be? Currently in my environment
> > nothing happens, and I have tried almost every way I know to debug it,
> > with no luck; could anyone help? Thank you very much.
> >
> > Monitor Return Code   Description
> > OCF_NOT_RUNNING       Stopped
> > OCF_SUCCESS           Running (Slave)
> > OCF_RUNNING_MASTER    Running (Master)
> > OCF_FAILED_MASTER     Failed (Master)
> > Other                 Failed (Slave)
> >
> >
> > [1] https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovnd
> > b-servers.ocf
> > Hui.
> >
> >
> >
> > On Thu, Nov 30, 2017 at 6:39 PM, Hui Xiang 
> > wrote:
> > > The really weird thing is that the monitor is called only once
> > > instead of repeatedly as expected; where should I check?
> > >
> > > On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang 
> > > wrote:
> > > > Thanks Ken very much for your helpful information.
> > > >
> > > > I am now blocked: I can't see the Pacemaker DC take any further
> > > > start/promote etc. action on my resource agent, and I have found
> > > > no helpful logs.
>
> Each time the DC decides what to do, there will be a line like "...
> saving inputs in ..." with a file name. The log messages just before
> that may give some useful information.
>
> Otherwise, you can take that file, and simulate what the cluster
> decided at that point:
>
>   crm_simulate -Sx $FILENAME
>
> It will first show the status of the cluster at the start of the
> decision-making, then a "Transition Summary" with the actions that are
> required, then a simulated execution of those actions, and then what
> the resulting status would be if those actions succeeded.
>
> That may give you some more information. You can make it more verbose
> by using "-Ssx", or by adding "-", but it's not very user-friendly
> output.
>
> > > >
> > > > So my first question is: in what kind of situation will the DC
> > > > decide to call the start action? Does the monitor operation need
> > > > to return OCF_SUCCESS? In my case it returns OCF_NOT_RUNNING, and
> > > > the monitor operation is not called any more, which seems wrong,
> > > > as I felt it should be called at intervals.
>
> The DC will ask for a start if the configuration and current status
> require it. For example, if the resource's current status is stopped,
> and the configuration calls for a target role of started (the default),
> then it will start it. On the other hand, if the current status is
> started, then it doesn't need to do anything -- or, if location
> constraints ban all the nodes from running the resource, then it can't
> do anything.
>
> So, it's all based on what the current status is (based on the last
> monitor result), and what the configuration requires.
>
> > > >
> > > > The resource agent monitor logic:
> > > > In the xx_monitor function it calls xx_update, and it always hits
> > > > "$CRM_MASTER -D;;"; what does that usually mean? Will it stop the
> > > > start operation from being called?
>
> Each master/slave resource has a special node attribute with a "master
> score" for that node. The node with the highest master score will be
> promoted to master. It's up to the resource agent to set this
> attribute. The "-D" call you see deletes that 

Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

2017-12-01 Thread Ken Gaillot
On Fri, 2017-12-01 at 09:36 +0800, Hui Xiang wrote:
> Hi all,
> 
>   I am using the ovndb-servers OCF agent [1], which is a multi-state
> resource. When I create it (please see my previous email), the monitor
> is called only once, and the start operation is never called. According
> to the description below, that one monitor call returned
> OCF_NOT_RUNNING; should Pacemaker decide to execute the start action
> based on this return code? Is there any way to

Before Pacemaker does anything with a resource, it first calls a one-
time monitor (called a "probe") to find out the current status of the
resource across the cluster. This allows it to discover if the service
is already running somewhere.

So, you will see those probes for every resource when the cluster
starts, or when the resource is added to the configuration, or when the
resource is cleaned up.

> check what the next action will be? Currently in my environment nothing
> happens, and I have tried almost every way I know to debug it, with no
> luck; could anyone help? Thank you very much.
> 
> Monitor Return Code   Description
> OCF_NOT_RUNNING       Stopped
> OCF_SUCCESS           Running (Slave)
> OCF_RUNNING_MASTER    Running (Master)
> OCF_FAILED_MASTER     Failed (Master)
> Other                 Failed (Slave)
> 
> 
> [1] https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovnd
> b-servers.ocf
> Hui.
> 
> 
> 
> On Thu, Nov 30, 2017 at 6:39 PM, Hui Xiang 
> wrote:
> > The really weird thing is that the monitor is called only once
> > instead of repeatedly as expected; where should I check?
> > 
> > On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang 
> > wrote:
> > > Thanks Ken very much for your helpful information.
> > > 
> > > I am now blocked: I can't see the Pacemaker DC take any further
> > > start/promote etc. action on my resource agent, and I have found no
> > > helpful logs.

Each time the DC decides what to do, there will be a line like "...
saving inputs in ..." with a file name. The log messages just before
that may give some useful information.

Otherwise, you can take that file, and simulate what the cluster
decided at that point:

  crm_simulate -Sx $FILENAME

It will first show the status of the cluster at the start of the
decision-making, then a "Transition Summary" with the actions that are
required, then a simulated execution of those actions, and then what
the resulting status would be if those actions succeeded.

That may give you some more information. You can make it more verbose
by using "-Ssx", or by adding "-", but it's not very user-friendly
output.
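
A typical session might look like the following (the file name is
hypothetical; the saved inputs live under /var/lib/pacemaker/pengine, as
mentioned earlier in the thread):

```shell
# Find the newest saved scheduler input, then replay the decision from it
ls -t /var/lib/pacemaker/pengine/pe-input-*.bz2 | head -n1
crm_simulate -Sx /var/lib/pacemaker/pengine/pe-input-100.bz2
# More verbose, including allocation scores:
crm_simulate -Ssx /var/lib/pacemaker/pengine/pe-input-100.bz2
```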

> > > 
> > > So my first question is: in what kind of situation will the DC
> > > decide to call the start action? Does the monitor operation need to
> > > return OCF_SUCCESS? In my case it returns OCF_NOT_RUNNING, and the
> > > monitor operation is not called any more, which seems wrong, as I
> > > felt it should be called at intervals.

The DC will ask for a start if the configuration and current status
require it. For example, if the resource's current status is stopped,
and the configuration calls for a target role of started (the default),
then it will start it. On the other hand, if the current status is
started, then it doesn't need to do anything -- or, if location
constraints ban all the nodes from running the resource, then it can't
do anything.

So, it's all based on what the current status is (based on the last
monitor result), and what the configuration requires.
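
As a toy illustration of that rule (a sketch of the logic described above,
not Pacemaker's actual code): a start is scheduled only when the resource
is stopped, the target role is started, and some node is allowed to run it.

```shell
# Toy model of the DC's start decision (illustrative only).
decide() {
    status=$1    # current status from the last monitor: stopped|started
    target=$2    # configured target-role (default: started)
    allowed=$3   # "yes" if a location constraint permits the resource somewhere
    if [ "$status" = "stopped" ] && [ "$target" = "started" ]; then
        if [ "$allowed" = "yes" ]; then
            echo "start"      # DC schedules a start action
        else
            echo "blocked"    # e.g. symmetric-cluster=false with no constraints
        fi
    else
        echo "nothing"        # already in the desired state
    fi
}

decide stopped started no    # prints "blocked"
```

The last call mirrors the situation in this thread: status is stopped and
target is started, but no node is allowed to run the resource.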

> > > 
> > > The resource agent monitor logic:
> > > In the xx_monitor function it calls xx_update, and it always hits
> > > "$CRM_MASTER -D;;"; what does that usually mean? Will it stop the
> > > start operation from being called?

Each master/slave resource has a special node attribute with a "master
score" for that node. The node with the highest master score will be
promoted to master. It's up to the resource agent to set this
attribute. The "-D" call you see deletes that attribute (presumably
before updating it later).

The master score has no effect on starting/stopping.
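
For reference, $CRM_MASTER in such agents is conventionally a wrapper
around the crm_master tool; a rough sketch (illustrative, not quoted from
the ovndb-servers agent):

```shell
# Typical definition inside a master/slave OCF agent:
CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"

$CRM_MASTER -v 10    # set this node's master score to 10
$CRM_MASTER -D       # delete the attribute; this node won't be promoted
```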

> > > 
> > > ovsdb_server_master_update() {
> > >     ocf_log info "ovsdb_server_master_update: $1}"
> > > 
> > >     case $1 in
> > >         $OCF_SUCCESS)
> > >         $CRM_MASTER -v ${slave_score};;
> > >         $OCF_RUNNING_MASTER)
> > >             $CRM_MASTER -v ${master_score};;
> > >         #*) $CRM_MASTER -D;;
> > >     esac
> > >     ocf_log info "ovsdb_server_master_update end}"
> > > }
> > > 
> > > ovsdb_server_monitor() {
> > >     ocf_log info "ovsdb_server_monitor"
> > >     ovsdb_server_check_status
> > >     rc=$?
> > > 
> > >     ovsdb_server_master_update $rc
> > >     ocf_log info "monitor is going to return $rc"
> > >     return $rc
> > > }
> > > 
> > > 
> > > Below is my cluster configuration:
> > > 
> > > 1. First I have a VIP set up.
> > > [root@node-1 ~]# pcs resource show
> > >  vip__management_old  

Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

2017-11-30 Thread Hui Xiang
Hi all,

  I am using the ovndb-servers OCF agent [1], which is a multi-state
resource. When I create it (please see my previous email), the monitor is
called only once, and the start operation is never called. According to
the description below, that one monitor call returned OCF_NOT_RUNNING;
should Pacemaker decide to execute the start action based on this return
code? Is there any way to check what the next action will be? Currently in
my environment nothing happens, and I have tried almost every way I know
to debug it, with no luck; could anyone help? Thank you very much.

Monitor Return Code   Description
OCF_NOT_RUNNING       Stopped
OCF_SUCCESS           Running (Slave)
OCF_RUNNING_MASTER    Running (Master)
OCF_FAILED_MASTER     Failed (Master)
Other                 Failed (Slave)
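
These names map to numeric exit codes defined by the OCF resource agent
API; the runnable sketch below is illustrative (the values come from the
OCF spec, not from this agent):

```shell
# OCF monitor exit codes corresponding to the table above.
OCF_SUCCESS=0          # Running (Slave)
OCF_NOT_RUNNING=7      # Stopped (a clean "not running", not a failure)
OCF_RUNNING_MASTER=8   # Running (Master)
OCF_FAILED_MASTER=9    # Failed (Master)

# A probe (one-time monitor) that finds nothing running should exit with
# OCF_NOT_RUNNING, telling Pacemaker the resource is cleanly stopped.
toy_monitor() {
    return $OCF_NOT_RUNNING
}

toy_monitor
rc=$?
echo "monitor rc=$rc"    # prints "monitor rc=7"
```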


[1]
https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovndb-servers.ocf
Hui.



On Thu, Nov 30, 2017 at 6:39 PM, Hui Xiang  wrote:

> The really weird thing is that the monitor is called only once instead
> of repeatedly as expected; where should I check?
>
> On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang  wrote:
>
>> Thanks Ken very much for your helpful information.
>>
>> I am now blocked: I can't see the Pacemaker DC take any further
>> start/promote etc. action on my resource agent, and I have found no
>> helpful logs.
>>
>> So my first question is: in what kind of situation will the DC decide
>> to call the start action? Does the monitor operation need to return
>> OCF_SUCCESS? In my case it returns OCF_NOT_RUNNING, and the monitor
>> operation is not called any more, which seems wrong, as I felt it
>> should be called at intervals.
>>
>> The resource agent monitor logic:
>> In the xx_monitor function it calls xx_update, and it always hits
>> "$CRM_MASTER -D;;"; what does that usually mean? Will it stop the
>> start operation from being called?
>>
>> ovsdb_server_master_update() {
>> ocf_log info "ovsdb_server_master_update: $1}"
>>
>> case $1 in
>> $OCF_SUCCESS)
>> $CRM_MASTER -v ${slave_score};;
>> $OCF_RUNNING_MASTER)
>> $CRM_MASTER -v ${master_score};;
>> #*) $CRM_MASTER -D;;
>> esac
>> ocf_log info "ovsdb_server_master_update end}"
>> }
>>
>> ovsdb_server_monitor() {
>> ocf_log info "ovsdb_server_monitor"
>> ovsdb_server_check_status
>> rc=$?
>>
>> ovsdb_server_master_update $rc
>> ocf_log info "monitor is going to return $rc"
>> return $rc
>> }
>>
>>
>> Below is my cluster configuration:
>>
>> 1. First I have a VIP set up.
>> [root@node-1 ~]# pcs resource show
>>  vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>>
>> 2. Use pcs to create ovndb-servers and constraint
>> [root@node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers
>> manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
>> sb_master_port=6642 master
>>  ([root@node-1 ~]# pcs resource meta tst-ovndb-master notify=true
>>   Error: unable to find a resource/clone/master/group:
>> tst-ovndb-master) ## returned error, so I changed into below command.
>> [root@node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb
>> notify=true
>> [root@node-1 ~]# pcs constraint colocation add master tst-ovndb-master
>> with vip__management_old
>>
>> 3. pcs status
>> [root@node-1 ~]# pcs status
>>  vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>>  Master/Slave Set: tst-ovndb-master [tst-ovndb]
>>  Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>>
>> 4. pcs resource show XXX
>> [root@node-1 ~]# pcs resource show  vip__management_old
>>  Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
>>   Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m
>> ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none
>> gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false
>> iptables_comment=default-comment
>>   Meta Attrs: migration-threshold=3 failure-timeout=60
>> resource-stickiness=1
>>   Operations: monitor interval=3 timeout=30 (vip__management_old-monitor-3
>> )
>>   start interval=0 timeout=30 (vip__management_old-start-0)
>>   stop interval=0 timeout=30 (vip__management_old-stop-0)
>> [root@node-1 ~]# pcs resource show tst-ovndb-master
>>  Master: tst-ovndb-master
>>   Meta Attrs: notify=true
>>   Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
>>Attributes: manage_northd=yes master_ip=192.168.0.2
>> nb_master_port=6641 sb_master_port=6642
>>Operations: start interval=0s timeout=30s (tst-ovndb-start-timeout-30s)
>>stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
>>promote interval=0s timeout=50s
>> (tst-ovndb-promote-timeout-50s)
>>demote interval=0s timeout=50s
>> (tst-ovndb-demote-timeout-50s)
>>monitor interval=30s timeout=20s
>> (tst-ovndb-monitor-interval-30s)
>>  

Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

2017-11-30 Thread Hui Xiang
The really weird thing is that the monitor is called only once instead of
repeatedly as expected; where should I check?

On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang  wrote:

> Thanks Ken very much for your helpful information.
>
> I am now blocked: I can't see the Pacemaker DC take any further
> start/promote etc. action on my resource agent, and I have found no
> helpful logs.
>
> So my first question is: in what kind of situation will the DC decide to
> call the start action? Does the monitor operation need to return
> OCF_SUCCESS? In my case it returns OCF_NOT_RUNNING, and the monitor
> operation is not called any more, which seems wrong, as I felt it should
> be called at intervals.
>
> The resource agent monitor logic:
> In the xx_monitor function it calls xx_update, and it always hits
> "$CRM_MASTER -D;;"; what does that usually mean? Will it stop the start
> operation from being called?
>
> ovsdb_server_master_update() {
> ocf_log info "ovsdb_server_master_update: $1}"
>
> case $1 in
> $OCF_SUCCESS)
> $CRM_MASTER -v ${slave_score};;
> $OCF_RUNNING_MASTER)
> $CRM_MASTER -v ${master_score};;
> #*) $CRM_MASTER -D;;
> esac
> ocf_log info "ovsdb_server_master_update end}"
> }
>
> ovsdb_server_monitor() {
> ocf_log info "ovsdb_server_monitor"
> ovsdb_server_check_status
> rc=$?
>
> ovsdb_server_master_update $rc
> ocf_log info "monitor is going to return $rc"
> return $rc
> }
>
>
> Below is my cluster configuration:
>
> 1. First I have a VIP set up.
> [root@node-1 ~]# pcs resource show
>  vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>
> 2. Use pcs to create ovndb-servers and constraint
> [root@node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers
> manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
> sb_master_port=6642 master
>  ([root@node-1 ~]# pcs resource meta tst-ovndb-master notify=true
>   Error: unable to find a resource/clone/master/group:
> tst-ovndb-master) ## returned error, so I changed into below command.
> [root@node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb
> notify=true
> [root@node-1 ~]# pcs constraint colocation add master tst-ovndb-master
> with vip__management_old
>
> 3. pcs status
> [root@node-1 ~]# pcs status
>  vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>  Master/Slave Set: tst-ovndb-master [tst-ovndb]
>  Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>
> 4. pcs resource show XXX
> [root@node-1 ~]# pcs resource show  vip__management_old
>  Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
>   Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m
> ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none
> gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false
> iptables_comment=default-comment
>   Meta Attrs: migration-threshold=3 failure-timeout=60
> resource-stickiness=1
>   Operations: monitor interval=3 timeout=30 (vip__management_old-monitor-3
> )
>   start interval=0 timeout=30 (vip__management_old-start-0)
>   stop interval=0 timeout=30 (vip__management_old-stop-0)
> [root@node-1 ~]# pcs resource show tst-ovndb-master
>  Master: tst-ovndb-master
>   Meta Attrs: notify=true
>   Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
>Attributes: manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
> sb_master_port=6642
>Operations: start interval=0s timeout=30s (tst-ovndb-start-timeout-30s)
>stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
>promote interval=0s timeout=50s
> (tst-ovndb-promote-timeout-50s)
>demote interval=0s timeout=50s
> (tst-ovndb-demote-timeout-50s)
>monitor interval=30s timeout=20s
> (tst-ovndb-monitor-interval-30s)
>monitor interval=10s role=Master timeout=20s
> (tst-ovndb-monitor-interval-10s-role-Master)
>monitor interval=30s role=Slave timeout=20s
> (tst-ovndb-monitor-interval-30s-role-Slave)
>
>
> colocation colocation-tst-ovndb-master-vip__management_old-INFINITY inf:
> tst-ovndb-master:Master vip__management_old:Started
>
> 5. I have put log in every ovndb-servers op, seems only the monitor op is
> being called, no promoted by the pacemaker DC:
> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> ovsdb_server_monitor
> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> ovsdb_server_check_status
> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> return OCFOCF_NOT_RUNNINGG
> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> ovsdb_server_master_update: 7}
> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> ovsdb_server_master_update end}
> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> monitor is 

Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

2017-11-30 Thread Hui Xiang
Thanks Ken very much for your helpful information.

I am now blocked: I can't see the Pacemaker DC take any further
start/promote etc. action on my resource agent, and I have found no
helpful logs.

So my first question is: in what kind of situation will the DC decide to
call the start action? Does the monitor operation need to return
OCF_SUCCESS? In my case it returns OCF_NOT_RUNNING, and the monitor
operation is not called any more, which seems wrong, as I felt it should
be called at intervals.

The resource agent monitor logic:
In the xx_monitor function it calls xx_update, and it always hits
"$CRM_MASTER -D;;"; what does that usually mean? Will it stop the start
operation from being called?

ovsdb_server_master_update() {
ocf_log info "ovsdb_server_master_update: $1}"

case $1 in
$OCF_SUCCESS)
$CRM_MASTER -v ${slave_score};;
$OCF_RUNNING_MASTER)
$CRM_MASTER -v ${master_score};;
#*) $CRM_MASTER -D;;
esac
ocf_log info "ovsdb_server_master_update end}"
}

ovsdb_server_monitor() {
ocf_log info "ovsdb_server_monitor"
ovsdb_server_check_status
rc=$?

ovsdb_server_master_update $rc
ocf_log info "monitor is going to return $rc"
return $rc
}


Below is my cluster configuration:

1. First I have a VIP set up.
[root@node-1 ~]# pcs resource show
 vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld

2. Use pcs to create ovndb-servers and constraint
[root@node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers
manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
sb_master_port=6642 master
 ([root@node-1 ~]# pcs resource meta tst-ovndb-master notify=true
  Error: unable to find a resource/clone/master/group:
tst-ovndb-master) ## returned error, so I changed into below command.
[root@node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb notify=true
[root@node-1 ~]# pcs constraint colocation add master tst-ovndb-master with
vip__management_old

3. pcs status
[root@node-1 ~]# pcs status
 vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
 Master/Slave Set: tst-ovndb-master [tst-ovndb]
 Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]

4. pcs resource show XXX
[root@node-1 ~]# pcs resource show  vip__management_old
 Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
  Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m
ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none
gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false
iptables_comment=default-comment
  Meta Attrs: migration-threshold=3 failure-timeout=60
resource-stickiness=1
  Operations: monitor interval=3 timeout=30 (vip__management_old-monitor-3)
  start interval=0 timeout=30 (vip__management_old-start-0)
  stop interval=0 timeout=30 (vip__management_old-stop-0)
[root@node-1 ~]# pcs resource show tst-ovndb-master
 Master: tst-ovndb-master
  Meta Attrs: notify=true
  Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
   Attributes: manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
sb_master_port=6642
   Operations: start interval=0s timeout=30s (tst-ovndb-start-timeout-30s)
   stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
   promote interval=0s timeout=50s (tst-ovndb-promote-timeout-
50s)
   demote interval=0s timeout=50s (tst-ovndb-demote-timeout-50s)
   monitor interval=30s timeout=20s (tst-ovndb-monitor-interval-
30s)
   monitor interval=10s role=Master timeout=20s
(tst-ovndb-monitor-interval-10s-role-Master)
   monitor interval=30s role=Slave timeout=20s
(tst-ovndb-monitor-interval-30s-role-Slave)


colocation colocation-tst-ovndb-master-vip__management_old-INFINITY inf:
tst-ovndb-master:Master vip__management_old:Started

5. I have put log in every ovndb-servers op, seems only the monitor op is
being called, no promoted by the pacemaker DC:
<30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
ovsdb_server_monitor
<30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
ovsdb_server_check_status
<30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: return
OCFOCF_NOT_RUNNINGG
<30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
ovsdb_server_master_update: 7}
<30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
ovsdb_server_master_update end}
<30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: monitor
is going to return 7
<30>Nov 30 15:22:20 node-1 ovndb-servers(undef)[2980970]: INFO: metadata
exit OCF_SUCCESS}

6. The cluster property:
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.12-a14efad \
cluster-infrastructure=corosync \
no-quorum-policy=ignore \
stonith-enabled=false \
symmetric-cluster=false \
last-lrm-refresh=1511802933