Re: [ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.

2017-02-06 Thread Ken Gaillot
On 02/06/2017 09:00 AM, Scott Greenlese wrote:
> Further explanation for my concern about --disabled not taking effect
> until after the iface-bridge was configured ...
>
> The reason I wanted to create the iface-bridge resource "disabled" was
> to allow me the opportunity to impose
> a location constraint / rule on the resource to prevent it from being
> started on certain cluster nodes
> where the specified slave vlan did not exist.
>
> In my case, pacemaker assigned the resource to a cluster node where the
> specified slave vlan did not exist, which in turn
> triggered a fence (off) action against that node (apparently because
> the device could not be stopped, per Ken's reply earlier).
>
> Again, my cluster is configured as "symmetric", so I would have to "opt
> out" my new resource from
> certain cluster nodes via location constraint.
>
> So, if this really is how --disabled is designed to work, is there any
> way to impose a location constraint rule BEFORE
> the iface-bridge resource gets assigned, configured and started on a
> cluster node in a symmetric cluster?

I would expect --disabled to already behave the way you describe, keeping
the resource stopped until it is enabled; I'm not sure what's happening there.

But, you can add a resource and any constraints that apply to it
simultaneously. How to do this depends on whether you want to do it
interactively or scripted, and whether you prefer the low-level tools,
crm shell, or pcs.

If you want to script it via pcs, you can do pcs cluster cib $SOME_FILE,
then pcs -f $SOME_FILE with each configuration command you need (resource
creation, constraints), then pcs cluster cib-push $SOME_FILE --config.
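
A minimal sketch of that scripted pcs sequence, using the resource and node
names from this thread. The scratch file path, the PCS dry-run variable and
the choice of an "avoids" location constraint are assumptions for
illustration, not something confirmed on this cluster; check the pcs help on
your version before relying on it.

```shell
#!/bin/sh
# Sketch: stage a resource and its location constraint in a CIB file, then
# push both to the cluster in one step.
# PCS defaults to "echo pcs" (dry run: only prints the commands). Run with
# PCS=pcs to actually apply them.
PCS="${PCS:-echo pcs}"
CIB_FILE="${CIB_FILE:-/tmp/br0_r1-cib.xml}"   # hypothetical scratch file
AVOID_NODE="${AVOID_NODE:-zs93kjpcs1}"        # node where the start failed in this thread

stage_bridge() {
    # 1. Dump the current CIB to a file.
    $PCS cluster cib "$CIB_FILE" || return 1

    # 2. Edit only the file: create the resource and its constraint.
    $PCS -f "$CIB_FILE" resource create br0_r1 ocf:heartbeat:iface-bridge \
        bridge_name=br0 bridge_slaves=vlan1292 \
        op monitor timeout=20s interval=10s || return 1
    $PCS -f "$CIB_FILE" constraint location br0_r1 avoids "$AVOID_NODE" || return 1

    # 3. Push the configuration section of the file to the live cluster.
    $PCS cluster cib-push "$CIB_FILE" --config
}

stage_bridge
```

Because the resource and the constraint arrive in the same push, the cluster
never sees the resource without its constraint. Run it once as-is to review
the printed commands, then again with PCS=pcs.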

> 
> Thanks,
> 
> Scott Greenlese ... IBM KVM on System Z - Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com

Re: [ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.

2017-02-06 Thread Scott Greenlese

Further explanation for my concern about --disabled not taking effect until
after the iface-bridge was configured ...

The reason I wanted to create the iface-bridge resource "disabled" was to
allow me the opportunity to impose
a location constraint / rule on the resource to prevent it from being
started on certain cluster nodes
where the specified slave vlan did not exist.

In my case, pacemaker assigned the resource to a cluster node where the
specified slave vlan did not exist, which in turn
triggered a fence (off) action against that node (apparently because the
device could not be stopped, per Ken's reply earlier).

Again, my cluster is configured as "symmetric", so I would have to "opt
out" my new resource from
certain cluster nodes via location constraint.

So, if this really is how --disabled is designed to work, is there any way
to impose a location constraint rule BEFORE
the iface-bridge resource gets assigned, configured and started on a
cluster node in a symmetric cluster?

Thanks,

Scott Greenlese ... IBM KVM on System Z - Solutions Test, Poughkeepsie, N.Y.
INTERNET: swgre...@us.ibm.com






Re: [ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.

2017-02-03 Thread Scott Greenlese

Ken,

Thanks for the explanation.

One other thing, relating to the iface-bridge resource creation. I
specified the --disabled flag:

> [root@zs95kj VD]# date;pcs resource create br0_r1
> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
> monitor timeout="20s" interval="10s" --disabled

Does the bridge device have to be successfully configured by pacemaker
before the resource can be disabled? It seems
that was the behavior, since it failed the resource and fenced the
node instead of disabling the resource.
Just checking with you to be sure.

Thanks again..

Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
INTERNET: swgre...@us.ibm.com





From: Ken Gaillot <kgail...@redhat.com>
To: users@clusterlabs.org
Date: 02/02/2017 03:29 PM
Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource
causes cluster node fence action.



On 02/02/2017 02:14 PM, Scott Greenlese wrote:
> Hi folks,
>
> I'm testing iface-bridge resource support on a Linux KVM on System Z
> pacemaker cluster.
>
> pacemaker-1.1.13-10.el7_2.ibm.1.s390x
> corosync-2.3.4-7.el7_2.ibm.1.s390x
>
> I created an iface-bridge resource, but specified a non-existent
> bridge_slaves value, vlan1292 (i.e. vlan1292 doesn't exist).
>
> [root@zs95kj VD]# date;pcs resource create br0_r1
> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
> monitor timeout="20s" interval="10s" --disabled
> Wed Feb 1 17:49:16 EST 2017
> [root@zs95kj VD]#
>
> [root@zs95kj VD]# pcs resource show |grep br0
> br0_r1 (ocf::heartbeat:iface-bridge): FAILED zs93kjpcs1
> [root@zs95kj VD]#
>
> As you can see, the resource was created, but failed to start on the
> target node zs93kjpcs1.
>
> To my surprise, the target node zs93kjpcs1 was unceremoniously fenced.
>
> pacemaker.log shows a fence (off) action initiated against that target
> node, "because of resource failure(s)" :
>
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
> configured' (6) instead of the expected value: 'ok' (0)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
> zs93kjpcs1: not configured (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
> stop failed 'not configured' (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
> configured' (6) instead of the expected value: 'ok' (0)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
> zs93kjpcs1: not configured (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
> stop failed 'not configured' (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:96 ) warning:
> pe_fence_node: Node zs93kjpcs1 will be fenced because of resource failure(s)
>
>
> Thankfully, I was able to successfully create an iface-bridge resource
> when I changed the bridge_slaves value to an existing vlan interface.
>
> My main concern is, why would the response to a failed bridge config
> operation warrant a node fence (off) action? Isn't it enough to just
> fail the resource and try another cluster node,
> or at most, give up if it can't be started / configured on any node?
>
> Is there any way to control this harsh recovery action in the cluster?
>
> Thanks much..
>
>
> Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com

It's actually the stop operation failure that leads to the fence.

If a resource fails to stop, fencing is the only way pacemaker can
recover the resource elsewhere. Consider a database master -- if it
doesn't stop, starting the master elsewhere could lead to severe data
inconsistency.

You can tell pacemaker to not attempt recovery, by setting on-fail=block
on the stop operation, so it doesn't need to fence. Obviously, that
prevents high availability, as manual intervention is required to do
anything further with the service.
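
As a rough illustration of setting on-fail=block on the stop operation, here
is a small sketch for the br0_r1 resource from this thread. The PCS dry-run
variable and the helper function name are made up for the example, and the
interval=0s value is an assumption; verify the syntax against
pcs resource update --help on your version.

```shell
#!/bin/sh
# Sketch: mark a failed stop as "block" so pacemaker leaves the resource
# alone instead of fencing the node.
# PCS defaults to "echo pcs" (dry run: only prints the command). Run with
# PCS=pcs to actually apply it.
PCS="${PCS:-echo pcs}"
RESOURCE="${RESOURCE:-br0_r1}"

# Hypothetical helper: set on-fail=block on the stop operation of "$1".
block_on_stop_failure() {
    $PCS resource update "$1" op stop interval=0s on-fail=block
}

block_on_stop_failure "$RESOURCE"
```

With this in place a failed stop leaves the resource blocked on that node
until someone fixes the underlying problem and clears the failure by hand
(for example with pcs resource cleanup).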

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


