Re: [ClusterLabs] why and when a call of crm_attribute can be delayed ?

2016-05-04 Thread Ken Gaillot
On 04/25/2016 05:02 AM, Jehan-Guillaume de Rorthais wrote:
> Hi all,
> 
> I am facing a strange issue with attrd while doing some testing on a
> three-node cluster with the pgsqlms RA [1].
> 
> pgsqld is my pgsqlms resource in the cluster. pgsql-ha is the master/slave
> setup on top of pgsqld.
> 
> Before triggering a failure, here was the situation:
> 
>   * centos1: pgsql-ha slave
>   * centos2: pgsql-ha slave
>   * centos3: pgsql-ha master
> 
> Then we triggered a failure: the node centos3 was killed using
> 
>   echo c > /proc/sysrq-trigger
> 
> In this situation, the PEngine provided a transition where:
> 
>   * centos3 is fenced 
>   * pgsql-ha on centos2 is promoted
> 
> During the pre-promote notify action in the pgsqlms RA, each remaining
> slave sets a node attribute called lsn_location; see:
> 
>   https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1504
> 
>   crm_attribute -l reboot -t status --node "$nodename" \
> --name lsn_location --update "$node_lsn"
> 
> During the promote action in the pgsqlms RA, the RA checks the
> lsn_location of all the nodes to make sure the local one is higher than
> or equal to all the others. See:
> 
>   https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1292
> 
> This is where we face an attrd behavior we don't understand.
> 
> Although we can see in the log that the RA was able to set its local
> "lsn_location", during the promote action the RA was unable to read its
> local "lsn_location":
> 
>   pgsqlms(pgsqld)[9003]:  2016/04/22_14:46:16  
> INFO: pgsql_notify: promoting instance on node "centos2" 
> 
>   pgsqlms(pgsqld)[9003]:  2016/04/22_14:46:16  
> INFO: pgsql_notify: current node LSN: 0/1EE24000 
> 
>   [...]
> 
>   pgsqlms(pgsqld)[9023]:  2016/04/22_14:46:16
> CRIT: pgsql_promote: can not get current node LSN location
> 
>   Apr 22 14:46:16 [5864] centos2   lrmd:
> notice: operation_finished: pgsqld_promote_0:9023:stderr 
> [ Error performing operation: No such device or address ] 
> 
>   Apr 22 14:46:16 [5864] centos2   lrmd: 
> info: log_finished:  finished - rsc:pgsqld
> action:promote call_id:211 pid:9023 exit-code:1 exec-time:107ms
> queue-time:0ms
> 
> The error comes from:
> 
>   https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1320
> 
> **After** this error, we can see in the log file that attrd set the
> "lsn_location" of centos2:
> 
>   Apr 22 14:46:16 [5865] centos2
> attrd: info: attrd_peer_update:
> Setting lsn_location[centos2]: (null) -> 0/1EE24000 from centos2 
> 
>   Apr 22 14:46:16 [5865] centos2
> attrd: info: write_attribute:   
> Write out of 'lsn_location' delayed: update 189 in progress
> 
> 
> As I understand it, the call of crm_attribute during pre-promote notification
> has been taken into account AFTER the "promote" action, leading to this error.
> Am I right?
> 
> Why and how could this happen? Could it come from the dampen parameter?
> We did not set any dampen anywhere; is there a default value in the
> cluster setup? Can we avoid this behavior?

Unfortunately, that is expected. Both the cluster's call of the RA's
notify action and the RA's call of crm_attribute are asynchronous, so
there is no guarantee that anything done by the pre-promote notify will
be complete (or synchronized across the other cluster nodes) by the time
the promote action is called.

There would be no point in the pre-promote notify waiting for the
attribute value to be retrievable, because the cluster isn't going to
wait for the pre-promote notify to finish before calling promote.

Maybe someone else can come up with a better idea, but I'm thinking the
attribute could be set as timestamp:lsn, and the promote action could
poll attrd repeatedly (for a duration somewhat shorter than the typical
promote timeout) until it gets LSNs with a recent timestamp from all
nodes. One error condition to handle would be if one of the other slaves
happens to fail or be unresponsive at that time.

> Please find attached a tarball with:
>   * all cluster logfiles from the three nodes
>   * the content of /var/lib/pacemaker from the three nodes:
> * CIBs
> * PEngine transitions
> 
> 
> Regards,
> 
> [1] https://github.com/dalibo/PAF
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] ringid interface FAULTY no resource move

2016-05-04 Thread Ken Gaillot
On 05/04/2016 07:14 AM, Rafał Sanocki wrote:
> Hello,
> I can't find what I did wrong. I have a 2-node cluster: Corosync,
> Pacemaker, DRBD. When I unplug a cable, nothing happens.
> 
> Corosync.conf
> 
> # Please read the corosync.conf.5 manual page
> totem {
> version: 2
> crypto_cipher: none
> crypto_hash: none
> rrp_mode: passive
> 
> interface {
> ringnumber: 0
> bindnetaddr: 172.17.10.0
> mcastport: 5401
> ttl: 1
> }
> interface {
> ringnumber: 1
> bindnetaddr: 255.255.255.0
> mcastport: 5409
> ttl: 1
> }

255.255.255.0 is not a valid bindnetaddr. bindnetaddr should be the IP
network address (not netmask) of the desired interface.

Also, the point of RRP is to have two redundant network links, so
unplugging one shouldn't cause problems if the other is still up.
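
For example, if the second NICs were on a 192.168.10.0/24 network (a
purely hypothetical addressing, since the post does not show the second
interface's real subnet), the second ring would look something like:

  interface {
          ringnumber: 1
          bindnetaddr: 192.168.10.0   # network address of the 2nd NIC, not a netmask
          mcastport: 5409
          ttl: 1
  }

and the matching ring1_addr entries in the nodelist would be the nodes'
actual addresses on that network (e.g. 192.168.10.81 and 192.168.10.89)
rather than 255.255.255.x values.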

> 
> transport: udpu
> }
> 
> logging {
> fileline: off
> to_logfile: yes
> to_syslog: yes
> logfile: /var/log/cluster/corosync.log
> debug: off
> timestamp: on
> logger_subsys {
> subsys: QUORUM
> debug: off
> }
> }
> 
> nodelist {
> node {
> ring0_addr: 172.17.10.81
> ring1_addr: 255.255.255.1
> nodeid: 1
> }
> node {
> ring0_addr: 172.17.10.89
> ring1_addr: 255.255.255.9
> nodeid: 2
> }
> 
> }
> quorum {
> # Enable and configure quorum subsystem (default: off)
> # see also corosync.conf.5 and votequorum.5
> provider: corosync_votequorum
> }
> 
> crm config
> 
> crm(live)configure# show
> node 1: cs01A
> node 2: cs01B
> primitive p_drbd2dev ocf:linbit:drbd \
> params drbd_resource=b1 \
> op monitor interval=29s role=Master \
> op monitor interval=31s role=Slave \
> meta target-role=Started
> primitive p_exportfs_fs2 exportfs \
> params fsid=101 directory="/data1/b1"
> options="rw,sync,no_root_squash,insecure,anonuid=100,anongid=101,nohide"
> clientspec="172.17.10.0/255.255.255.0" wait_for_leasetime_on_stop=false \
> op monitor interval=30s \
> op start interval=0 timeout=240s \
> op stop interval=0 timeout=100s \
> meta target-role=Started
> primitive p_ip_2 IPaddr2 \
> params ip=172.17.10.97 nic=neteth0 cidr_netmask=24 \
> op monitor interval=30s timeout=5s \
> meta target-role=Started
> primitive p_mount_fs2 Filesystem \
> params fstype=reiserfs options="noatime,nodiratime,notail"
> directory="/data1" device="/dev/drbd2" \
> op start interval=0 timeout=400s \
> op stop interval=0 timeout=100s \
> op monitor interval=30s \
> meta target-role=Started
> group g_nfs2 p_ip_2 p_mount_fs2 p_exportfs_fs2
> ms ms_drbd2 p_drbd2dev \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true is-managed=true target-role=Slave
> colocation co_drbd2 inf: g_nfs2 ms_drbd2:Master
> order ms_drbd2_order Mandatory: ms_drbd2:promote g_nfs2:start
> property cib-bootstrap-options: \
> stonith-enabled=false \
> have-watchdog=true \
> dc-version=1.1.14-535193a \
> cluster-infrastructure=corosync \
> maintenance-mode=false \
> no-quorum-policy=ignore \
> last-lrm-refresh=1460627538
> 
> 
> # ip addr show
> neteth1:  mtu 1500 qdisc mq portid
> d8d385bda90c state DOWN group default qlen 1000
> link/ether d8:d3:85:aa:aa:aa brd ff:ff:ff:ff:ff:ff
> inet 255.255.255.1/24 brd 255.255.255.255 scope global neteth1
>valid_lft forever preferred_lft forever
> 
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 1
> RING ID 0
> id  = 172.17.10.81
> status  = ring 0 active with no faults
> RING ID 1
> id  = 255.255.255.1
> status  = Marking ringid 1 interface 255.255.255.1 FAULTY
> 
> #crm_mon -A
> 
> Stack: corosync
> Current DC: csb01A (version 1.1.14-535193a) - partition with quorum
> Last updated: Wed May  4 14:11:34 2016  Last change: Thu Apr 14
> 13:06:15 2016 by root via crm_resource on csb01B
> 
> 2 nodes and 5 resources configured: 2 resources DISABLED and 0 BLOCKED
> from being started due to failures
> 
> Online: [ cs01A cs01B ]
> 
>  Resource Group: g_nfs2
>  p_ip_2 (ocf::heartbeat:IPaddr2):   Started csb01A
>  p_mount_fs2(ocf::heartbeat:Filesystem):Started csb01A
>  p_exportfs_fs2 (ocf::heartbeat:exportfs):  Started csb01A
>  Master/Slave Set: ms_drbd2 [p_drbd2dev]
>  Masters: [ csb01A ]
>  Slaves (target-role): [ csb01B ]
> 
> Node Attributes:
> * Node csb01A:
> + master-p_drbd2dev : 1
> * Node csb01B:
> + master-p_drbd2dev : 1000

Re: [ClusterLabs] FR: send failcount to OCF RA start/stop actions

2016-05-04 Thread Ken Gaillot
On 05/04/2016 08:49 AM, Klaus Wenninger wrote:
> On 05/04/2016 02:09 PM, Adam Spiers wrote:
>> Hi all,
>>
>> As discussed with Ken and Andrew at the OpenStack summit last week, we
>> would like Pacemaker to be extended to export the current failcount as
>> an environment variable to OCF RA scripts when they are invoked with
>> 'start' or 'stop' actions.  This would mean that if you have
>> start-failure-is-fatal=false and migration-threshold=3 (say), then you
>> would be able to implement a different behaviour for the third and
>> final 'stop' of a service executed on a node, which is different to
>> the previous 'stop' actions executed just prior to attempting a
>> restart of the service.  (In the non-clone case, this would happen
>> just before migrating the service to another node.)
> So what you actually want to know is how much headroom there still is
> until the resource would be migrated. Wouldn't it then be more useful to
> pass not the failcount but rather the headroom?

Yes, that's the plan: pass a new environment variable with
(migration-threshold - fail-count) when recovering a resource. I haven't
worked out the exact behavior yet, but that's the idea. I do hope to get
this in 1.1.15 since it's a small change.

The advantage over using crm_failcount is that it will be limited to the
current recovery attempt, and it will calculate the headroom as you say,
rather than the raw failcount.
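
From the RA's point of view the intended usage would look roughly like
the fragment below. The variable name is purely illustrative (as said
above, the exact behavior and naming are not worked out yet), and the
nova call mirrors the use case quoted below:

  # Hypothetical stop-action fragment; OCF_RESKEY_CRM_meta_recovery_headroom
  # is an invented name for the planned "headroom" variable.
  if [ "${OCF_RESKEY_CRM_meta_recovery_headroom:-1}" -eq 0 ]; then
      # No retries left on this node: stop scheduling new instances here
      nova service-disable "$(hostname)" nova-compute || true
  fi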

>> One use case for this is to invoke "nova service-disable" if Pacemaker
>> fails to restart the nova-compute service on an OpenStack compute
>> node.
>>
>> Is it feasible to squeeze this in before the 1.1.15 release?
>>
>> Thanks a lot!
>> Adam



Re: [ClusterLabs] Antw: ringid interface FAULTY no resource move

2016-05-04 Thread emmanuel segura
Use fencing (STONITH) and the DRBD fencing handler.
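
That is: configure real node fencing in Pacemaker (the crm configuration
posted elsewhere in the thread has stonith-enabled=false) and enable
DRBD's resource-level fencing, so that DRBD places a constraint against
an outdated peer when replication is interrupted. A sketch of the DRBD
side (the section holding the fencing option can vary between DRBD
versions; check drbd.conf(5)):

  resource b1 {
          disk {
                  fencing resource-only;
          }
          handlers {
                  fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                  after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
          }
  }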

2016-05-04 14:46 GMT+02:00 Rafał Sanocki :
> Resources should move to the second node when any interface is down.
>
>
>
>
> On 2016-05-04 at 14:41, Ulrich Windl wrote:
>
>> Rafal Sanocki wrote on 04.05.2016 at 14:14 in
>> message <78d882b1-a407-31e0-2b9e-b5f8406d4...@gmail.com>:
>>>
>>> Hello,
>>> I can't find what I did wrong. I have a 2-node cluster: Corosync,
>>> Pacemaker, DRBD. When I unplug a cable, nothing happens.
>>
>> "nothing"? The wrong cable?
>>
>> [...]
>>
>> Regards,
>> Ulrich
>>
>>
>>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^



Re: [ClusterLabs] Antw: ringid interface FAULTY no resource move

2016-05-04 Thread Rafał Sanocki

Resources should move to the second node when any interface is down.




On 2016-05-04 at 14:41, Ulrich Windl wrote:

Rafal Sanocki wrote on 04.05.2016 at 14:14 in
message <78d882b1-a407-31e0-2b9e-b5f8406d4...@gmail.com>:

Hello,
I can't find what I did wrong. I have a 2-node cluster: Corosync, Pacemaker,
DRBD. When I unplug a cable, nothing happens.

"nothing"? The wrong cable?

[...]

Regards,
Ulrich





Re: [ClusterLabs] FR: send failcount to OCF RA start/stop actions

2016-05-04 Thread Jehan-Guillaume de Rorthais
On Wed, 4 May 2016 13:09:04 +0100,
Adam Spiers wrote:

> Hi all,

Hello,

> As discussed with Ken and Andrew at the OpenStack summit last week, we
> would like Pacemaker to be extended to export the current failcount as
> an environment variable to OCF RA scripts when they are invoked with
> 'start' or 'stop' actions.  This would mean that if you have
> start-failure-is-fatal=false and migration-threshold=3 (say), then you
> would be able to implement a different behaviour for the third and
> final 'stop' of a service executed on a node, which is different to
> the previous 'stop' actions executed just prior to attempting a
> restart of the service.  (In the non-clone case, this would happen
> just before migrating the service to another node.)
> 
> One use case for this is to invoke "nova service-disable" if Pacemaker
> fails to restart the nova-compute service on an OpenStack compute
> node.
> 
> Is it feasible to squeeze this in before the 1.1.15 release?

Wouldn't it be possible to run the following command from the RA to get its
current failcount?

  crm_failcount --resource "$OCF_RESOURCE_INSTANCE" -G

Moreover, how would you know that the previous failures were all from the
start action? I suppose you would have to track the failcount internally
yourself, wouldn't you? Maybe you could track failures in some fashion using
private attributes (e.g. start_attempt and last_start_ts)?
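
A sketch of that bookkeeping with private node attributes, assuming a
Pacemaker version whose attrd_updater supports --private (1.1.13 or
later); the attribute names are just the ones suggested above:

  # in the start action: count this attempt and remember when it happened
  attempts=$(attrd_updater -n start_attempt -Q 2>/dev/null \
             | sed -n 's/.*value="\([^"]*\)".*/\1/p')
  attrd_updater -n start_attempt -U $(( ${attempts:-0} + 1 )) -p
  attrd_updater -n last_start_ts -U "$(date +%s)" -p

Being private, the attributes live only in attrd and are never written
to the CIB, so updating them does not trigger new transitions.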

Regards,



[ClusterLabs] ringid interface FAULTY no resource move

2016-05-04 Thread Rafał Sanocki

Hello,
I can't find what I did wrong. I have a 2-node cluster: Corosync, Pacemaker,
DRBD. When I unplug a cable, nothing happens.


Corosync.conf

# Please read the corosync.conf.5 manual page
totem {
version: 2
crypto_cipher: none
crypto_hash: none
rrp_mode: passive

interface {
ringnumber: 0
bindnetaddr: 172.17.10.0
mcastport: 5401
ttl: 1
}
interface {
ringnumber: 1
bindnetaddr: 255.255.255.0
mcastport: 5409
ttl: 1
}


transport: udpu
}

logging {
fileline: off
to_logfile: yes
to_syslog: yes
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: QUORUM
debug: off
}
}

nodelist {
node {
ring0_addr: 172.17.10.81
ring1_addr: 255.255.255.1
nodeid: 1
}
node {
ring0_addr: 172.17.10.89
ring1_addr: 255.255.255.9
nodeid: 2
}

}
quorum {
# Enable and configure quorum subsystem (default: off)
# see also corosync.conf.5 and votequorum.5
provider: corosync_votequorum
}

crm config

crm(live)configure# show
node 1: cs01A
node 2: cs01B
primitive p_drbd2dev ocf:linbit:drbd \
params drbd_resource=b1 \
op monitor interval=29s role=Master \
op monitor interval=31s role=Slave \
meta target-role=Started
primitive p_exportfs_fs2 exportfs \
params fsid=101 directory="/data1/b1" 
options="rw,sync,no_root_squash,insecure,anonuid=100,anongid=101,nohide" 
clientspec="172.17.10.0/255.255.255.0" wait_for_leasetime_on_stop=false \

op monitor interval=30s \
op start interval=0 timeout=240s \
op stop interval=0 timeout=100s \
meta target-role=Started
primitive p_ip_2 IPaddr2 \
params ip=172.17.10.97 nic=neteth0 cidr_netmask=24 \
op monitor interval=30s timeout=5s \
meta target-role=Started
primitive p_mount_fs2 Filesystem \
params fstype=reiserfs options="noatime,nodiratime,notail" 
directory="/data1" device="/dev/drbd2" \

op start interval=0 timeout=400s \
op stop interval=0 timeout=100s \
op monitor interval=30s \
meta target-role=Started
group g_nfs2 p_ip_2 p_mount_fs2 p_exportfs_fs2
ms ms_drbd2 p_drbd2dev \
meta master-max=1 master-node-max=1 clone-max=2 
clone-node-max=1 notify=true is-managed=true target-role=Slave

colocation co_drbd2 inf: g_nfs2 ms_drbd2:Master
order ms_drbd2_order Mandatory: ms_drbd2:promote g_nfs2:start
property cib-bootstrap-options: \
stonith-enabled=false \
have-watchdog=true \
dc-version=1.1.14-535193a \
cluster-infrastructure=corosync \
maintenance-mode=false \
no-quorum-policy=ignore \
last-lrm-refresh=1460627538


# ip addr show
neteth1:  mtu 1500 qdisc mq portid 
d8d385bda90c state DOWN group default qlen 1000

link/ether d8:d3:85:aa:aa:aa brd ff:ff:ff:ff:ff:ff
inet 255.255.255.1/24 brd 255.255.255.255 scope global neteth1
   valid_lft forever preferred_lft forever

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id  = 172.17.10.81
status  = ring 0 active with no faults
RING ID 1
id  = 255.255.255.1
status  = Marking ringid 1 interface 255.255.255.1 FAULTY

#crm_mon -A

Stack: corosync
Current DC: csb01A (version 1.1.14-535193a) - partition with quorum
Last updated: Wed May  4 14:11:34 2016  Last change: Thu Apr 14 
13:06:15 2016 by root via crm_resource on csb01B


2 nodes and 5 resources configured: 2 resources DISABLED and 0 BLOCKED 
from being started due to failures


Online: [ cs01A cs01B ]

 Resource Group: g_nfs2
 p_ip_2 (ocf::heartbeat:IPaddr2):   Started csb01A
 p_mount_fs2(ocf::heartbeat:Filesystem):Started csb01A
 p_exportfs_fs2 (ocf::heartbeat:exportfs):  Started csb01A
 Master/Slave Set: ms_drbd2 [p_drbd2dev]
 Masters: [ csb01A ]
 Slaves (target-role): [ csb01B ]

Node Attributes:
* Node csb01A:
+ master-p_drbd2dev : 1
* Node csb01B:
+ master-p_drbd2dev : 1000

--
Rafal Sanocki





[ClusterLabs] FR: send failcount to OCF RA start/stop actions

2016-05-04 Thread Adam Spiers
Hi all,

As discussed with Ken and Andrew at the OpenStack summit last week, we
would like Pacemaker to be extended to export the current failcount as
an environment variable to OCF RA scripts when they are invoked with
'start' or 'stop' actions.  This would mean that if you have
start-failure-is-fatal=false and migration-threshold=3 (say), then you
would be able to implement a different behaviour for the third and
final 'stop' of a service executed on a node, which is different to
the previous 'stop' actions executed just prior to attempting a
restart of the service.  (In the non-clone case, this would happen
just before migrating the service to another node.)

One use case for this is to invoke "nova service-disable" if Pacemaker
fails to restart the nova-compute service on an OpenStack compute
node.

Is it feasible to squeeze this in before the 1.1.15 release?

Thanks a lot!
Adam



Re: [ClusterLabs] OpenStack Summit - Austin recap

2016-05-04 Thread Kristoffer Grönlund
Ken Gaillot  writes:

> Hi all,
>
> Last week's OpenStack Summit in Austin, Texas, was quite an event --
> equal parts spectacle and substance. ;-)
>

Wish I could have been there!

>
> In particular, anyone interested in HA and OpenStack should check out
> Adam Spiers and Dawid Deja's excellent presentation on the state of
> instance HA:
> https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation
>
> I was a bit surprised by how many presenters claimed "HA" as part of
> their topic's features, but when pressed, said their HA solution was
> either planned for the future, or couldn't handle split-brain. It seems
> we still have a lot of work to do to raise awareness of what true HA means.

I've been toying with the idea of an HA presentation called "Anything
they can do we can do better" - but my non-confrontational Swedish
sensibilities are holding me back. Would love to see someone else do
that one though. ;)

This may be due to ignorance on my part, but from what I have seen so
far, I personally think Pacemaker has all of the new container-centric
cluster "solutions", including Kubernetes, beat on everything except
handling network configuration, which admittedly is a big part of
container management...

Cheers,

-- 
// Kristoffer Grönlund
// kgronl...@suse.com
