Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-20 Thread Ken Gaillot
>> - A bonded 10GB network card for data traffic that will be accessed via a 
>> virtual IP managed by pacemaker in 192.168.120.1/24. In the cluster, nodes 
>> MDA1PFP-S01 and MDA1PFP-S02 are assigned to 192.168.120.10 and 
>> 192.168.120.11.
>>
>> - A dedicated back-to-back connection for corosync heartbeats in 
>> 192.168.121.1/24. MDA1PFP-PCS01 and MDA1PFP-S02 are assigned to 
>> 192.168.121.10 and 192.168.121.11. When the cluster is created, we use these 
>> as primary node names and use the 10GB device as a second backup connection 
>> for increased reliability: pcs cluster setup --name MDA1PFP 
>> MDA1PFP-PCS01,MDA1PFP-S01 MDA1PFP-PCS02,MDA1PFP-S02
>>
>> - A dedicated back-to-back connection for DRBD in 192.168.122.1/24. Hosts 
>> MDA1PFP-DRBD01 and MDA1PFP-DRBD02 are assigned 192.168.123.10 and 
>> 192.168.123.11.
> 
> Ah, nice.
> 
>> Given that, I think it is not a node-level failure. pcs status also reports 
>> the nodes as online, so I think this should not trigger fencing from pacemaker.
>>
>>> When DRBD is configured with 'fencing resource-only' and 'fence-peer
>>> "/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network outage,
>>> it will try to add a constraint that prevents the other node from
>>> becoming master. It removes the constraint when connectivity is restored.
>>
>>> I am not familiar with all the under-the-hood details, but IIUC, if
>>> pacemaker actually fences the node, then the other node can still take
>>> over the DRBD. But if there is a network outage and no pacemaker
>>> fencing, then you'll see the behavior you describe -- DRBD prevents
>>> master takeover, to avoid stale data being used.
>>
>> This is my understanding as well, but there should be no network outage for 
>> DRBD. I can reproduce the behavior by stopping cluster nodes, which DRBD 
>> seems to interpret as a network outage since it can no longer communicate 
>> with the shut-down node. Maybe I should ask on the DRBD mailing list?
> 
> OK, I think I follow you now: you're ifdown'ing the data traffic
> interface, but the interfaces for both corosync and DRBD traffic are
> still up. So, pacemaker detects the virtual IP failure on the traffic
> interface, and correctly recovers the IP on the other node, but the DRBD
> master role is not recovered.
> 
> If the behavior goes away when you remove the DRBD fencing config, then
> it sounds like DRBD is seeing it as a network outage, and is adding the
> constraint to prevent a stale master. Yes, I think that would be worth
> bringing up on the DRBD list, though there might be some DRBD users here
> who can chime in, too.
> 
>> Cheers,
>>   Jens
>>
>> 
>> From: Ken Gaillot [kgail...@redhat.com]
>> Sent: Monday, 19 September 2016 16:28
>> To: Auer, Jens; Cluster Labs - All topics related to open-source clustering 
>> welcomed
>> Subject: Re: [ClusterLabs] No DRBD resource promoted to master in 
>> Active/Passive setup
>>
>> On 09/19/2016 02:31 AM, Auer, Jens wrote:
>>> Hi,
>>>
>>> I am not sure that pacemaker should do any fencing here. In my setting, 
>>> corosync is configured to use a back-to-back connection for heartbeats. 
>>> This is a different subnet than the one used by the ping resource that 
>>> checks the network connectivity and detects a failure. In my test, I bring 
>>> down the network device used by ping and this triggers the failover. The 
>>> node status is known by pacemaker since it still receives heartbeats, so 
>>> this is only a resource failure. I asked about fencing conditions a few 
>>> days ago, and was basically assured that a resource failure should not 
>>> trigger STONITH actions if not explicitly configured.
>>
>> Is the network interface being taken down here used for corosync
>> communication? If so, that is a node-level failure, and pacemaker will
>> fence.

Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-20 Thread Auer, Jens
>> But if there is a network outage and no pacemaker
>> fencing, then you'll see the behavior you describe -- DRBD prevents
>> master takeover, to avoid stale data being used.
>
> This is my understanding as well, but there should be no network outage for 
> DRBD. I can reproduce the behavior by stopping cluster nodes, which DRBD seems 
> to interpret as a network outage since it can no longer communicate with the 
> shut-down node. Maybe I should ask on the DRBD mailing list?

OK, I think I follow you now: you're ifdown'ing the data traffic
interface, but the interfaces for both corosync and DRBD traffic are
still up. So, pacemaker detects the virtual IP failure on the traffic
interface, and correctly recovers the IP on the other node, but the DRBD
master role is not recovered.

If the behavior goes away when you remove the DRBD fencing config, then
it sounds like DRBD is seeing it as a network outage, and is adding the
constraint to prevent a stale master. Yes, I think that would be worth
bringing up on the DRBD list, though there might be some DRBD users here
who can chime in, too.

> Cheers,
>   Jens
>
> 
> From: Ken Gaillot [kgail...@redhat.com]
> Sent: Monday, 19 September 2016 16:28
> To: Auer, Jens; Cluster Labs - All topics related to open-source clustering 
> welcomed
> Subject: Re: [ClusterLabs] No DRBD resource promoted to master in 
> Active/Passive setup
>
> On 09/19/2016 02:31 AM, Auer, Jens wrote:
>> Hi,
>>
>> I am not sure that pacemaker should do any fencing here. In my setting, 
>> corosync is configured to use a back-to-back connection for heartbeats. This 
>> is a different subnet than the one used by the ping resource that checks the 
>> network connectivity and detects a failure. In my test, I bring down the 
>> network device used by ping and this triggers the failover. The node status 
>> is known by pacemaker since it still receives heartbeats, so this is only a 
>> resource failure. I asked about fencing conditions a few days ago, and was 
>> basically assured that a resource failure should not trigger STONITH actions 
>> if not explicitly configured.
>
> Is the network interface being taken down here used for corosync
> communication? If so, that is a node-level failure, and pacemaker will
> fence.
>
> There is a bit of a distinction between DRBD fencing and pacemaker
> fencing. The DRBD configuration is designed so that DRBD's fencing
> method is to go through pacemaker.
>
> When DRBD is configured with 'fencing resource-only' and 'fence-peer
> "/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network outage,
> it will try to add a constraint that prevents the other node from
> becoming master. It removes the constraint when connectivity is restored.
>
> I am not familiar with all the under-the-hood details, but IIUC, if
> pacemaker actually fences the node, then the other node can still take
> over the DRBD. But if there is a network outage and no pacemaker
> fencing, then you'll see the behavior you describe -- DRBD prevents
> master takeover, to avoid stale data being used.
>
>
>> I am also wondering why this is "sticky". After a failover test the DRBD 
>> resources are not working even if I restart the cluster on all nodes.
>>
>> Best wishes,
>>   Jens
>>

Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-19 Thread Ken Gaillot
On 09/19/2016 09:48 AM, Auer, Jens wrote:
> Hi,
> 
>> Is the network interface being taken down here used for corosync
>> communication? If so, that is a node-level failure, and pacemaker will
>> fence.
> 
> We have different connections on each server:
> - A bonded 10GB network card for data traffic that will be accessed via a 
> virtual IP managed by pacemaker in 192.168.120.1/24. In the cluster, nodes 
> MDA1PFP-S01 and MDA1PFP-S02 are assigned to 192.168.120.10 and 192.168.120.11.
> 
> - A dedicated back-to-back connection for corosync heartbeats in 
> 192.168.121.1/24. MDA1PFP-PCS01 and MDA1PFP-S02 are assigned to 
> 192.168.121.10 and 192.168.121.11. When the cluster is created, we use these 
> as primary node names and use the 10GB device as a second backup connection 
> for increased reliability: pcs cluster setup --name MDA1PFP 
> MDA1PFP-PCS01,MDA1PFP-S01 MDA1PFP-PCS02,MDA1PFP-S02
> 
> - A dedicated back-to-back connection for DRBD in 192.168.122.1/24. Hosts 
> MDA1PFP-DRBD01 and MDA1PFP-DRBD02 are assigned 192.168.123.10 and 
> 192.168.123.11.

Ah, nice.

> Given that, I think it is not a node-level failure. pcs status also reports 
> the nodes as online, so I think this should not trigger fencing from pacemaker.
> 
>> When DRBD is configured with 'fencing resource-only' and 'fence-peer
>> "/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network outage,
>> it will try to add a constraint that prevents the other node from
>> becoming master. It removes the constraint when connectivity is restored.
> 
>> I am not familiar with all the under-the-hood details, but IIUC, if
>> pacemaker actually fences the node, then the other node can still take
>> over the DRBD. But if there is a network outage and no pacemaker
>> fencing, then you'll see the behavior you describe -- DRBD prevents
>> master takeover, to avoid stale data being used.
> 
> This is my understanding as well, but there should be no network outage for 
> DRBD. I can reproduce the behavior by stopping cluster nodes, which DRBD seems 
> to interpret as a network outage since it can no longer communicate with the 
> shut-down node. Maybe I should ask on the DRBD mailing list?

OK, I think I follow you now: you're ifdown'ing the data traffic
interface, but the interfaces for both corosync and DRBD traffic are
still up. So, pacemaker detects the virtual IP failure on the traffic
interface, and correctly recovers the IP on the other node, but the DRBD
master role is not recovered.

If the behavior goes away when you remove the DRBD fencing config, then
it sounds like DRBD is seeing it as a network outage, and is adding the
constraint to prevent a stale master. Yes, I think that would be worth
bringing up on the DRBD list, though there might be some DRBD users here
who can chime in, too.
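
(For reference, the constraint that the fence-peer handler adds can be checked
for directly; the constraint id below is illustrative, since crm-fence-peer.sh
derives it from the DRBD resource and master/slave resource names:

    # does the fence-peer handler's constraint currently exist?
    pcs constraint --full | grep drbd-fence-by-handler

    # it is normally removed again by crm-unfence-peer.sh once the peers are
    # reconnected and UpToDate; if it lingers after a test, it can be removed
    # by hand, for example:
    pcs constraint remove drbd-fence-by-handler-shared_fs-drbd1_sync

A leftover constraint of this kind would also explain why the state survives
cluster restarts.)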

> Cheers,
>   Jens
> 
> ________________
> From: Ken Gaillot [kgail...@redhat.com]
> Sent: Monday, 19 September 2016 16:28
> To: Auer, Jens; Cluster Labs - All topics related to open-source clustering 
> welcomed
> Subject: Re: [ClusterLabs] No DRBD resource promoted to master in 
> Active/Passive setup
> 
> On 09/19/2016 02:31 AM, Auer, Jens wrote:
>> Hi,
>>
>> I am not sure that pacemaker should do any fencing here. In my setting, 
>> corosync is configured to use a back-to-back connection for heartbeats. This 
>> is a different subnet than the one used by the ping resource that checks the 
>> network connectivity and detects a failure. In my test, I bring down the 
>> network device used by ping and this triggers the failover. The node status 
>> is known by pacemaker since it still receives heartbeats, so this is only a 
>> resource failure. I asked about fencing conditions a few days ago, and was 
>> basically assured that a resource failure should not trigger STONITH actions 
>> if not explicitly configured.

Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-19 Thread Auer, Jens
Hi,

> Is the network interface being taken down here used for corosync
> communication? If so, that is a node-level failure, and pacemaker will
> fence.

We have different connections on each server:
- A bonded 10GB network card for data traffic that will be accessed via a 
virtual IP managed by pacemaker in 192.168.120.1/24. In the cluster, nodes 
MDA1PFP-S01 and MDA1PFP-S02 are assigned to 192.168.120.10 and 192.168.120.11.

- A dedicated back-to-back connection for corosync heartbeats in 
192.168.121.1/24. MDA1PFP-PCS01 and MDA1PFP-S02 are assigned to 192.168.121.10 
and 192.168.121.11. When the cluster is created, we use these as primary node 
names and use the 10GB device as a second backup connection for increased 
reliability: pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 
MDA1PFP-PCS02,MDA1PFP-S02

- A dedicated back-to-back connection for DRBD in 192.168.122.1/24. Hosts 
MDA1PFP-DRBD01 and MDA1PFP-DRBD02 are assigned 192.168.123.10 and 192.168.123.11.

Given that, I think it is not a node-level failure. pcs status also reports the 
nodes as online, so I think this should not trigger fencing from pacemaker.
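
(As a sketch of how to double-check that: with the two-name form of "pcs
cluster setup" used above, corosync should be running with two rings, and the
health of both rings during the test can be verified on each node with standard
corosync tools, e.g.:

    corosync-cfgtool -s                 # status of each ring on this node
    corosync-cmapctl | grep members     # membership as corosync currently sees it

These are generic corosync diagnostics, not anything specific to this setup.)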

> When DRBD is configured with 'fencing resource-only' and 'fence-peer
> "/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network outage,
> it will try to add a constraint that prevents the other node from
> becoming master. It removes the constraint when connectivity is restored.

> I am not familiar with all the under-the-hood details, but IIUC, if
> pacemaker actually fences the node, then the other node can still take
> over the DRBD. But if there is a network outage and no pacemaker
> fencing, then you'll see the behavior you describe -- DRBD prevents
> master takeover, to avoid stale data being used.

This is my understanding as well, but there should be no network outage for 
DRBD. I can reproduce the behavior by stopping cluster nodes, which DRBD seems 
to interpret as a network outage since it can no longer communicate with the 
shut-down node. Maybe I should ask on the DRBD mailing list?
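
(A sketch of how to see what DRBD itself thinks during such a test; the
resource name is taken from the configuration quoted later in this thread:

    drbdadm cstate shared_fs    # connection state (Connected, WFConnection, ...)
    drbdadm role shared_fs      # roles (Primary/Secondary, Primary/Unknown, ...)
    drbdadm dstate shared_fs    # disk states (UpToDate/UpToDate, UpToDate/DUnknown, ...)
    cat /proc/drbd              # the same information in one place

If the peer is shut down, the replication link drops and DRBD typically sits in
WFConnection; from DRBD's point of view that looks much like a network outage,
which fits the behaviour described here.)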

Cheers,
  Jens


From: Ken Gaillot [kgail...@redhat.com]
Sent: Monday, 19 September 2016 16:28
To: Auer, Jens; Cluster Labs - All topics related to open-source clustering 
welcomed
Subject: Re: [ClusterLabs] No DRBD resource promoted to master in 
Active/Passive setup

On 09/19/2016 02:31 AM, Auer, Jens wrote:
> Hi,
>
> I am not sure that pacemaker should do any fencing here. In my setting, 
> corosync is configured to use a back-to-back connection for heartbeats. This 
> is a different subnet than the one used by the ping resource that checks the 
> network connectivity and detects a failure. In my test, I bring down the 
> network device used by ping and this triggers the failover. The node status 
> is known by pacemaker since it still receives heartbeats, so this is only a 
> resource failure. I asked about fencing conditions a few days ago, and was 
> basically assured that a resource failure should not trigger STONITH actions 
> if not explicitly configured.

Is the network interface being taken down here used for corosync
communication? If so, that is a node-level failure, and pacemaker will
fence.

There is a bit of a distinction between DRBD fencing and pacemaker
fencing. The DRBD configuration is designed so that DRBD's fencing
method is to go through pacemaker.

When DRBD is configured with 'fencing resource-only' and 'fence-peer
"/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network outage,
it will try to add a constraint that prevents the other node from
becoming master. It removes the constraint when connectivity is restored.

I am not familiar with all the under-the-hood details, but IIUC, if
pacemaker actually fences the node, then the other node can still take
over the DRBD. But if there is a network outage and no pacemaker
fencing, then you'll see the behavior you describe -- DRBD prevents
master takeover, to avoid stale data being used.


> I am also wondering why this is "sticky". After a failover test the DRBD 
> resources are not working even if I restart the cluster on all nodes.

Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-19 Thread Ken Gaillot
On 09/19/2016 02:31 AM, Auer, Jens wrote:
> Hi,
> 
> I am not sure that pacemaker should do any fencing here. In my setting, 
> corosync is configured to use a back-to-back connection for heartbeats. This 
> is a different subnet than the one used by the ping resource that checks the 
> network connectivity and detects a failure. In my test, I bring down the 
> network device used by ping and this triggers the failover. The node status 
> is known by pacemaker since it still receives heartbeats, so this is only a 
> resource failure. I asked about fencing conditions a few days ago, and was 
> basically assured that a resource failure should not trigger STONITH actions 
> if not explicitly configured.

Is the network interface being taken down here used for corosync
communication? If so, that is a node-level failure, and pacemaker will
fence.

There is a bit of a distinction between DRBD fencing and pacemaker
fencing. The DRBD configuration is designed so that DRBD's fencing
method is to go through pacemaker.

When DRBD is configured with 'fencing resource-only' and 'fence-peer
"/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network outage,
it will try to add a constraint that prevents the other node from
becoming master. It removes the constraint when connectivity is restored.
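
(For readers following along, that configuration usually lives in the DRBD
resource file and looks roughly like the stanza below -- resource name borrowed
from this thread, and exact section placement can vary between DRBD versions,
so treat it as a sketch rather than the poster's actual drbd.conf:

    resource shared_fs {
      disk {
        fencing resource-only;
      }
      handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
      # device/disk/address definitions omitted
    }

crm-fence-peer.sh adds the constraint when the replication link is lost, and
crm-unfence-peer.sh removes it again after a successful resync.)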

I am not familiar with all the under-the-hood details, but IIUC, if
pacemaker actually fences the node, then the other node can still take
over the DRBD. But if there is a network outage and no pacemaker
fencing, then you'll see the behavior you describe -- DRBD prevents
master takeover, to avoid stale data being used.
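
(Concretely, the constraint added by crm-fence-peer.sh is a location rule that
bans the Master role everywhere except on the node that still has up-to-date
data -- roughly this shape in the CIB, with illustrative ids and node name:

    <rsc_location rsc="drbd1_sync" id="drbd-fence-by-handler-shared_fs-drbd1_sync">
      <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-drbd1_sync">
        <expression attribute="#uname" operation="ne" value="MDA1PFP-PCS02"
                    id="drbd-fence-by-handler-expr-drbd1_sync"/>
      </rule>
    </rsc_location>

While that rule is present, pacemaker will refuse to promote DRBD on the other
node, which matches the "no master after failover" symptom in this thread.)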


> I am also wondering why this is "sticky". After a failover test the DRBD 
> resources are not working even if I restart the cluster on all nodes. 
> 
> Best wishes,
>   Jens
> 
> 
>> -----Original Message-----
>> From: Ken Gaillot [mailto:kgail...@redhat.com]
>> Sent: 16 September 2016 17:56
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] No DRBD resource promoted to master in 
>> Active/Passive
>> setup
>>
>> On 09/16/2016 10:02 AM, Auer, Jens wrote:
>>> Hi,
>>>
>>> I have an Active/Passive configuration with a DRBD master/slave resource:
>>>
>>> MDA1PFP-S01 14:40:27 1803 0 ~ # pcs status
>>> Cluster name: MDA1PFP
>>> Last updated: Fri Sep 16 14:41:18 2016    Last change: Fri Sep 16
>>> 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
>>> Stack: corosync
>>> Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition
>>> with quorum
>>> 2 nodes and 7 resources configured
>>>
>>> Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>>>
>>> Full list of resources:
>>>
>>>  Master/Slave Set: drbd1_sync [drbd1]
>>>  Masters: [ MDA1PFP-PCS02 ]
>>>  Slaves: [ MDA1PFP-PCS01 ]
>>>  mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
>>>  Clone Set: ping-clone [ping]
>>>  Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>>>  ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
>>>  shared_fs (ocf::heartbeat:Filesystem): Started MDA1PFP-PCS02
>>>
>>> PCSD Status:
>>>   MDA1PFP-PCS01: Online
>>>   MDA1PFP-PCS02: Online
>>>
>>> Daemon Status:
>>>   corosync: active/disabled
>>>   pacemaker: active/disabled
>>>   pcsd: active/enabled
>>>
>>> MDA1PFP-S01 14:41:19 1804 0 ~ # pcs resource --full
>>>  Master: drbd1_sync
>>>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
>>> clone-node-max=1 notify=true
>>>   Resource: drbd1 (class=ocf provider=linbit type=drbd)
>>>Attributes: drbd_resource=shared_fs
>>>Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
>>>promote interval=0s timeout=90 (drbd1-promote-interval-0s)

Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-19 Thread Auer, Jens
Hi,

I am not sure that pacemaker should do any fencing here. In my setting, 
corosync is configured to use a back-to-back connection for heartbeats. This is 
a different subnet than the one used by the ping resource that checks the 
network connectivity and detects a failure. In my test, I bring down the network 
device used by ping and this triggers the failover. The node status is known by 
pacemaker since it still receives heartbeats, so this is only a resource 
failure. I asked about fencing conditions a few days ago, and was basically 
assured that a resource failure should not trigger STONITH actions if not 
explicitly configured.
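
(Sketch of where that "explicitly configured" part lives: fencing as a whole is
gated by the stonith-enabled cluster property, and a resource failure only
escalates to fencing if the failing operation is configured with on-fail=fence.
Both can be inspected with, for example:

    pcs property list --all | grep stonith-enabled
    pcs resource --full          # look for on-fail= on the operations

A node-level failure (lost corosync membership) is fenced regardless of
per-resource settings, assuming stonith-enabled=true and a fence device exists
-- which is the distinction being drawn in this thread.)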

I am also wondering why this is "sticky". After a failover test the DRBD 
resources are not working even if I restart the cluster on all nodes. 

Best wishes,
  Jens


> -----Original Message-----
> From: Ken Gaillot [mailto:kgail...@redhat.com]
> Sent: 16 September 2016 17:56
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] No DRBD resource promoted to master in 
> Active/Passive
> setup
> 
> On 09/16/2016 10:02 AM, Auer, Jens wrote:
> > Hi,
> >
> > I have an Active/Passive configuration with a DRBD master/slave resource:
> >
> > MDA1PFP-S01 14:40:27 1803 0 ~ # pcs status
> > Cluster name: MDA1PFP
> > Last updated: Fri Sep 16 14:41:18 2016    Last change: Fri Sep 16
> > 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
> > Stack: corosync
> > Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition
> > with quorum
> > 2 nodes and 7 resources configured
> >
> > Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
> >
> > Full list of resources:
> >
> >  Master/Slave Set: drbd1_sync [drbd1]
> >  Masters: [ MDA1PFP-PCS02 ]
> >  Slaves: [ MDA1PFP-PCS01 ]
> >  mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
> >  Clone Set: ping-clone [ping]
> >  Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
> >  ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
> >  shared_fs (ocf::heartbeat:Filesystem): Started MDA1PFP-PCS02
> >
> > PCSD Status:
> >   MDA1PFP-PCS01: Online
> >   MDA1PFP-PCS02: Online
> >
> > Daemon Status:
> >   corosync: active/disabled
> >   pacemaker: active/disabled
> >   pcsd: active/enabled
> >
> > MDA1PFP-S01 14:41:19 1804 0 ~ # pcs resource --full
> >  Master: drbd1_sync
> >   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> > clone-node-max=1 notify=true
> >   Resource: drbd1 (class=ocf provider=linbit type=drbd)
> >Attributes: drbd_resource=shared_fs
> >Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
> >promote interval=0s timeout=90 (drbd1-promote-interval-0s)
> >demote interval=0s timeout=90 (drbd1-demote-interval-0s)
> >stop interval=0s timeout=100 (drbd1-stop-interval-0s)
> >monitor interval=60s (drbd1-monitor-interval-60s)
> >  Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
> >   Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
> >   Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
> >   stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
> >   monitor interval=1s (mda-ip-monitor-interval-1s)
> >  Clone: ping-clone
> >   Resource: ping (class=ocf provider=pacemaker type=ping)
> >Attributes: dampen=5s multiplier=1000 host_list=pf-pep-dev-1
> > timeout=1 attempts=3
> >Operations: start interval=0s timeout=60 (ping-start-interval-0s)
> >stop interval=0s timeout=20 (ping-stop-interval-0s)
> >monitor interval=1 (ping-monitor-interval-1)
> >  Resource: ACTIVE (class=ocf provider=heartbeat type=Dummy)
> >   Operations: start interval=0s timeout=20 (ACTIVE-start-interval-0s)
> >   stop interval=0s timeout=20 (ACTIVE-stop-interval-0s)
> >   monitor interval=10 timeout=20 (ACTIVE-monitor-interval-10)

Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-16 Thread Ken Gaillot
On 09/16/2016 10:02 AM, Auer, Jens wrote:
> Hi,
> 
> I have an Active/Passive configuration with a DRBD master/slave resource:
> 
> MDA1PFP-S01 14:40:27 1803 0 ~ # pcs status
> Cluster name: MDA1PFP
> Last updated: Fri Sep 16 14:41:18 2016    Last change: Fri Sep 16
> 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
> Stack: corosync
> Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition
> with quorum
> 2 nodes and 7 resources configured
> 
> Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
> 
> Full list of resources:
> 
>  Master/Slave Set: drbd1_sync [drbd1]
>  Masters: [ MDA1PFP-PCS02 ]
>  Slaves: [ MDA1PFP-PCS01 ]
>  mda-ip (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
>  Clone Set: ping-clone [ping]
>  Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
>  ACTIVE (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
>  shared_fs (ocf::heartbeat:Filesystem): Started MDA1PFP-PCS02
> 
> PCSD Status:
>   MDA1PFP-PCS01: Online
>   MDA1PFP-PCS02: Online
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> 
> MDA1PFP-S01 14:41:19 1804 0 ~ # pcs resource --full
>  Master: drbd1_sync
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1 notify=true
>   Resource: drbd1 (class=ocf provider=linbit type=drbd)
>Attributes: drbd_resource=shared_fs
>Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
>promote interval=0s timeout=90 (drbd1-promote-interval-0s)
>demote interval=0s timeout=90 (drbd1-demote-interval-0s)
>stop interval=0s timeout=100 (drbd1-stop-interval-0s)
>monitor interval=60s (drbd1-monitor-interval-60s)
>  Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
>   Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
>   stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
>   monitor interval=1s (mda-ip-monitor-interval-1s)
>  Clone: ping-clone
>   Resource: ping (class=ocf provider=pacemaker type=ping)
>Attributes: dampen=5s multiplier=1000 host_list=pf-pep-dev-1
> timeout=1 attempts=3
>Operations: start interval=0s timeout=60 (ping-start-interval-0s)
>stop interval=0s timeout=20 (ping-stop-interval-0s)
>monitor interval=1 (ping-monitor-interval-1)
>  Resource: ACTIVE (class=ocf provider=heartbeat type=Dummy)
>   Operations: start interval=0s timeout=20 (ACTIVE-start-interval-0s)
>   stop interval=0s timeout=20 (ACTIVE-stop-interval-0s)
>   monitor interval=10 timeout=20 (ACTIVE-monitor-interval-10)
>  Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
>   Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
>   stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
>   monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)
> 
> MDA1PFP-S01 14:41:35 1805 0 ~ # pcs constraint --full
> Location Constraints:
>   Resource: mda-ip
> Enabled on: MDA1PFP-PCS01 (score:50)
> (id:location-mda-ip-MDA1PFP-PCS01-50)
> Constraint: location-mda-ip
>   Rule: score=-INFINITY boolean-op=or  (id:location-mda-ip-rule)
> Expression: pingd lt 1  (id:location-mda-ip-rule-expr)
> Expression: not_defined pingd  (id:location-mda-ip-rule-expr-1)
> Ordering Constraints:
>   start ping-clone then start mda-ip (kind:Optional)
> (id:order-ping-clone-mda-ip-Optional)
>   promote drbd1_sync then start shared_fs (kind:Mandatory)
> (id:order-drbd1_sync-shared_fs-mandatory)
> Colocation Constraints:
>   ACTIVE with mda-ip (score:INFINITY) (id:colocation-ACTIVE-mda-ip-INFINITY)
>   drbd1_sync with mda-ip (score:INFINITY) (rsc-role:Master)
> (with-rsc-role:Started) (id:colocation-drbd1_sync-mda-ip-INFINITY)
>   shared_fs with drbd1_sync (score:INFINITY) (rsc-role:Started)
> (with-rsc-role:Master) (id:colocation-shared_fs-drbd1_sync-INFINITY)
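
(For reference, constraints like the ones listed above would typically be
created with pcs commands along these lines; this is only a sketch of how the
listed configuration maps back to commands, not the poster's actual history:

    pcs constraint location mda-ip prefers MDA1PFP-PCS01=50
    pcs constraint location mda-ip rule score=-INFINITY pingd lt 1 or not_defined pingd
    pcs constraint order start ping-clone then start mda-ip kind=Optional
    pcs constraint order promote drbd1_sync then start shared_fs
    pcs constraint colocation add ACTIVE with mda-ip INFINITY
    pcs constraint colocation add master drbd1_sync with mda-ip INFINITY
    pcs constraint colocation add shared_fs with master drbd1_sync INFINITY

The "master <resource>" form is what makes the colocation apply to the Master
role rather than to any running instance.)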
> 
> The cluster starts fine, except that the resources do not start on the
> preferred host. I asked about this in a different question to keep things separated.
> The status after starting is:
> Last updated: Fri Sep 16 14:39:57 2016  Last change: Fri Sep 16
> 14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
> Stack: corosync
> Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition
> with quorum
> 2 nodes and 7 resources configured
> 
> Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
> 
>  Master/Slave Set: drbd1_sync [drbd1]
>  Masters: [ MDA1PFP-PCS02 ]
>  Slaves: [ MDA1PFP-PCS01 ]
> mda-ip  (ocf::heartbeat:IPaddr2): Started MDA1PFP-PCS02
>  Clone Set: ping-clone [ping]
>  Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
> ACTIVE  (ocf::heartbeat:Dummy): Started MDA1PFP-PCS02
> shared_fs (ocf::heartbeat:Filesystem): Started MDA1PFP-PCS02
> 
> From this state, I