Re: [ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.
On 02/06/2017 09:00 AM, Scott Greenlese wrote:
> Further explanation for my concern about --disabled not taking effect
> until after the iface-bridge was configured ...
>
> The reason I wanted to create the iface-bridge resource "disabled" was
> to allow me the opportunity to impose a location constraint / rule on
> the resource to prevent it from being started on certain cluster nodes,
> where the specified slave vlan did not exist.
>
> In my case, pacemaker assigned the resource to a cluster node where the
> specified slave vlan did not exist, which in turn triggered a fence
> (off) action against that node (apparently, because the device could
> not be stopped, per Ken's reply earlier).
>
> Again, my cluster is configured as "symmetric", so I would have to "opt
> out" my new resource from certain cluster nodes via location constraint.
>
> So, if this really is how --disabled is designed to work, is there any
> way to impose a location constraint rule BEFORE the iface-bridge
> resource gets assigned, configured and started on a cluster node in a
> symmetrical cluster?

I would expect --disabled to behave like that already; I'm not sure
what's happening there.

But, you can add a resource and any constraints that apply to it
simultaneously. How to do this depends on whether you want to do it
interactively or scripted, and whether you prefer the low-level tools,
crm shell, or pcs.

If you want to script it via pcs, you can do pcs cluster cib $SOME_FILE,
then pcs -f $SOME_FILE , then pcs cluster cib-push $SOME_FILE --config.

> Thanks,
>
> Scott Greenlese ... IBM KVM on System Z - Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com
>
> From: Scott Greenlese/Poughkeepsie/IBM@IBMUS
> To: kgail...@redhat.com, Cluster Labs - All topics related to
> open-source clustering welcomed <users@clusterlabs.org>
> Date: 02/03/2017 03:23 PM
> Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource
> causes cluster node fence action.
>
> Ken,
>
> Thanks for the explanation.
>
> One other thing, relating to the iface-bridge resource creation. I
> specified --disabled flag:
>
>> [root@zs95kj VD]# date;pcs resource create br0_r1
>> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
>> monitor timeout="20s" interval="10s" --disabled
>
> Does the bridge device have to be successfully configured by pacemaker
> before disabling the resource? It seems that that was the behavior,
> since it failed the resource and fenced the node instead of disabling
> the resource.
> Just checking with you to be sure.
>
> Thanks again..
>
> Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com
>
> From: Ken Gaillot <kgail...@redhat.com>
> To: users@clusterlabs.org
> Date: 02/02/2017 03:29 PM
> Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource
> causes cluster node fence action.
>
> On 02/02/2017 02:14 PM, Scott Greenlese wrote:
>> Hi folks,
>>
>> I'm testing iface-bridge resource support on a Linux KVM on System Z
>> pacemaker cluster.
>>
>> pacemaker-1.1.13-10.el7_2.ibm.1.s390x
>> corosync-2.3.4-7.el7_2.ibm.1.s390x
>>
>> I created an iface-bridge resource, but specified a non-existent
>> bridge_slaves value, vlan1292 (i.e. vlan1292 doesn't exist).
>>
>> [root@zs95kj VD]# date;pcs resource create br0_r1
>> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
>> monitor timeout="20s" interval="10s" --disabled
>> Wed Feb 1 17:49:16 EST 2017
>> [root@zs95kj VD]#
>>
>> [root@zs95kj VD]# pcs resource show |grep br0
>> br0_r1 (ocf::heartbeat:iface-bridge): FAILED zs93kjpcs1
>> [root@zs95kj VD]#
>>
>> As you can see, the resource was created, but failed to start on the
>> target node zs93kppcs1.
>>
>> To my surprise, the target node zs93kppcs1 was unceremoniously fenced.
>>
>> pacemaker.log shows a fenc
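
The scripted pcs approach described in the reply above can be sketched roughly as follows. This is only an illustration, not a tested procedure: the function name stage_bridge, the PCS dry-run variable, and the choice of an "avoids" location constraint are assumptions added here, while the resource definition is copied from the original command. By default the sketch only prints the pcs commands; set PCS=pcs to actually run them.

```shell
#!/bin/sh
# Sketch: stage a resource and its location constraint in an offline CIB
# copy, then push both to the cluster in one update, so the resource is
# never scheduled without the constraint in place.
# PCS defaults to "echo pcs" (dry run, prints the commands only).
PCS=${PCS:-echo pcs}

# stage_bridge is a hypothetical helper name.
#   $1 = scratch file for the CIB copy
#   $2 = node that lacks the slave vlan and must not run the resource
stage_bridge() {
    cib_file=$1
    avoid_node=$2

    # 1. Save a copy of the live CIB to a file.
    $PCS cluster cib "$cib_file" &&
    # 2. Apply the changes to the file only (-f), not to the live cluster.
    $PCS -f "$cib_file" resource create br0_r1 ocf:heartbeat:iface-bridge \
        bridge_name=br0 bridge_slaves=vlan1292 \
        op monitor timeout=20s interval=10s &&
    $PCS -f "$cib_file" constraint location br0_r1 avoids "$avoid_node" &&
    # 3. Push the modified configuration back in a single step.
    $PCS cluster cib-push "$cib_file" --config
}

# Example call (dry run unless PCS=pcs is set):
#   stage_bridge /tmp/br0_r1.cib.xml zs93kjpcs1
```

In a symmetric (opt-out) cluster, one "avoids" constraint would be needed for each node where the vlan is missing; they can all be added to the same file before the push.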
Re: [ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.
Further explanation for my concern about --disabled not taking effect
until after the iface-bridge was configured ...

The reason I wanted to create the iface-bridge resource "disabled" was
to allow me the opportunity to impose a location constraint / rule on
the resource to prevent it from being started on certain cluster nodes,
where the specified slave vlan did not exist.

In my case, pacemaker assigned the resource to a cluster node where the
specified slave vlan did not exist, which in turn triggered a fence
(off) action against that node (apparently, because the device could
not be stopped, per Ken's reply earlier).

Again, my cluster is configured as "symmetric", so I would have to "opt
out" my new resource from certain cluster nodes via location constraint.

So, if this really is how --disabled is designed to work, is there any
way to impose a location constraint rule BEFORE the iface-bridge
resource gets assigned, configured and started on a cluster node in a
symmetrical cluster?

Thanks,

Scott Greenlese ... IBM KVM on System Z - Solutions Test, Poughkeepsie, N.Y.
INTERNET: swgre...@us.ibm.com

From: Scott Greenlese/Poughkeepsie/IBM@IBMUS
To: kgail...@redhat.com, Cluster Labs - All topics related to
open-source clustering welcomed <users@clusterlabs.org>
Date: 02/03/2017 03:23 PM
Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource
causes cluster node fence action.

Ken,

Thanks for the explanation.

One other thing, relating to the iface-bridge resource creation. I
specified --disabled flag:

> [root@zs95kj VD]# date;pcs resource create br0_r1
> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
> monitor timeout="20s" interval="10s" --disabled

Does the bridge device have to be successfully configured by pacemaker
before disabling the resource? It seems that that was the behavior,
since it failed the resource and fenced the node instead of disabling
the resource.
Just checking with you to be sure.

Thanks again..

Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
INTERNET: swgre...@us.ibm.com

From: Ken Gaillot <kgail...@redhat.com>
To: users@clusterlabs.org
Date: 02/02/2017 03:29 PM
Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource
causes cluster node fence action.

On 02/02/2017 02:14 PM, Scott Greenlese wrote:
> Hi folks,
>
> I'm testing iface-bridge resource support on a Linux KVM on System Z
> pacemaker cluster.
>
> pacemaker-1.1.13-10.el7_2.ibm.1.s390x
> corosync-2.3.4-7.el7_2.ibm.1.s390x
>
> I created an iface-bridge resource, but specified a non-existent
> bridge_slaves value, vlan1292 (i.e. vlan1292 doesn't exist).
>
> [root@zs95kj VD]# date;pcs resource create br0_r1
> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
> monitor timeout="20s" interval="10s" --disabled
> Wed Feb 1 17:49:16 EST 2017
> [root@zs95kj VD]#
>
> [root@zs95kj VD]# pcs resource show |grep br0
> br0_r1 (ocf::heartbeat:iface-bridge): FAILED zs93kjpcs1
> [root@zs95kj VD]#
>
> As you can see, the resource was created, but failed to start on the
> target node zs93kppcs1.
>
> To my surprise, the target node zs93kppcs1 was unceremoniously fenced.
>
> pacemaker.log shows a fence (off) action initiated against that target
> node, "because of resource failure(s)" :
>
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
> configured' (6) instead of the expected value: 'ok' (0)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
> zs93kjpcs1: not configured (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
> stop failed 'not configured' (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
> configured' (6) instead of the expected value: 'ok' (0)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
> zs93kjpcs1: not configured (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
> stop failed 'not configured' (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:96
Re: [ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.
Ken,

Thanks for the explanation.

One other thing, relating to the iface-bridge resource creation. I
specified --disabled flag:

> [root@zs95kj VD]# date;pcs resource create br0_r1
> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
> monitor timeout="20s" interval="10s" --disabled

Does the bridge device have to be successfully configured by pacemaker
before disabling the resource? It seems that that was the behavior,
since it failed the resource and fenced the node instead of disabling
the resource.
Just checking with you to be sure.

Thanks again..

Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
INTERNET: swgre...@us.ibm.com

From: Ken Gaillot <kgail...@redhat.com>
To: users@clusterlabs.org
Date: 02/02/2017 03:29 PM
Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource
causes cluster node fence action.

On 02/02/2017 02:14 PM, Scott Greenlese wrote:
> Hi folks,
>
> I'm testing iface-bridge resource support on a Linux KVM on System Z
> pacemaker cluster.
>
> pacemaker-1.1.13-10.el7_2.ibm.1.s390x
> corosync-2.3.4-7.el7_2.ibm.1.s390x
>
> I created an iface-bridge resource, but specified a non-existent
> bridge_slaves value, vlan1292 (i.e. vlan1292 doesn't exist).
>
> [root@zs95kj VD]# date;pcs resource create br0_r1
> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
> monitor timeout="20s" interval="10s" --disabled
> Wed Feb 1 17:49:16 EST 2017
> [root@zs95kj VD]#
>
> [root@zs95kj VD]# pcs resource show |grep br0
> br0_r1 (ocf::heartbeat:iface-bridge): FAILED zs93kjpcs1
> [root@zs95kj VD]#
>
> As you can see, the resource was created, but failed to start on the
> target node zs93kppcs1.
>
> To my surprise, the target node zs93kppcs1 was unceremoniously fenced.
>
> pacemaker.log shows a fence (off) action initiated against that target
> node, "because of resource failure(s)" :
>
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
> configured' (6) instead of the expected value: 'ok' (0)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
> zs93kjpcs1: not configured (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
> stop failed 'not configured' (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
> configured' (6) instead of the expected value: 'ok' (0)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
> zs93kjpcs1: not configured (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
> stop failed 'not configured' (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:96 ) warning:
> pe_fence_node: Node zs93kjpcs1 will be fenced because of resource
> failure(s)
>
> Thankfully, I was able to successfully create a iface-bridge resource
> when I changed the bridge_slaves value to an existent vlan interface.
>
> My main concern is, why would the response to a failed bridge config
> operation warrant a node fence (off) action? Isn't it enough to just
> fail the resource and try another cluster node,
> or at most, give up if it can't be started / configured on any node?
>
> Is there any way to control this harsh recovery action in the cluster?
>
> Thanks much..
>
> Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com

It's actually the stop operation failure that leads to the fence. If a
resource fails to stop, fencing is the only way pacemaker can recover
the resource elsewhere. Consider a database master -- if it doesn't
stop, starting the master elsewhere could lead to severe data
inconsistency.

You can tell pacemaker to not attempt recovery, by setting on-fail=block
on the stop operation, so it doesn't need to fence. Obviously, that
prevents high availability, as manual intervention is required to do
anything further with the service.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
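
For reference, the on-fail=block suggestion in the reply above might look like this with pcs. This is a hedged sketch: the helper name block_on_stop_failure and the PCS dry-run variable are assumptions added here, and the exact operation syntax may differ between pcs versions, so check "pcs resource update --help" on the installed version first. By default it only prints the command; set PCS=pcs to run it.

```shell
#!/bin/sh
# Sketch: set on-fail=block on a resource's stop operation, so that a
# failed stop leaves the resource blocked for manual cleanup instead of
# triggering a fence of the node.
# PCS defaults to "echo pcs" (dry run, prints the command only).
PCS=${PCS:-echo pcs}

# block_on_stop_failure is a hypothetical helper name.
#   $1 = resource id, e.g. br0_r1
block_on_stop_failure() {
    $PCS resource update "$1" op stop interval=0s on-fail=block
}

# Example call (dry run unless PCS=pcs is set):
#   block_on_stop_failure br0_r1
```

As the reply notes, this trades automatic recovery for safety: after a failed stop the resource stays blocked until an administrator intervenes.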