Ken,

Thanks for the explanation.

One other thing, relating to the iface-bridge resource creation. I specified the --disabled flag:

> [root@zs95kj VD]# date;pcs resource create br0_r1
> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
> monitor timeout="20s" interval="10s" --disabled

Does the bridge device have to be successfully configured by pacemaker before the resource can be disabled? That seems to have been the behavior, since pacemaker failed the resource and fenced the node instead of disabling the resource.

Just checking with you to be sure. Thanks again..

Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
INTERNET: [email protected]

From: Ken Gaillot <[email protected]>
To: [email protected]
Date: 02/02/2017 03:29 PM
Subject: Re: [ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.

On 02/02/2017 02:14 PM, Scott Greenlese wrote:
> Hi folks,
>
> I'm testing iface-bridge resource support on a Linux KVM on System Z
> pacemaker cluster.
>
> pacemaker-1.1.13-10.el7_2.ibm.1.s390x
> corosync-2.3.4-7.el7_2.ibm.1.s390x
>
> I created an iface-bridge resource, but specified a non-existent
> bridge_slaves value, vlan1292 (i.e. vlan1292 doesn't exist).
>
> [root@zs95kj VD]# date;pcs resource create br0_r1
> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
> monitor timeout="20s" interval="10s" --disabled
> Wed Feb 1 17:49:16 EST 2017
> [root@zs95kj VD]#
>
> [root@zs95kj VD]# pcs resource show |grep br0
> br0_r1 (ocf::heartbeat:iface-bridge): FAILED zs93kjpcs1
> [root@zs95kj VD]#
>
> As you can see, the resource was created, but failed to start on the
> target node zs93kppcs1.
>
> To my surprise, the target node zs93kppcs1 was unceremoniously fenced.
>
> pacemaker.log shows a fence (off) action initiated against that target
> node, "because of resource failure(s)" :
>
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
> configured' (6) instead of the expected value: 'ok' (0)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
> zs93kjpcs1: not configured (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
> stop failed 'not configured' (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
> configured' (6) instead of the expected value: 'ok' (0)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
> zs93kjpcs1: not configured (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
> stop failed 'not configured' (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:96 ) warning:
> pe_fence_node: Node zs93kjpcs1 will be fenced because of resource failure (s)
>
>
> Thankfully, I was able to successfully create a iface-bridge resource
> when I changed the bridge_slaves value to an existent vlan interface.
>
> My main concern is, why would the response to a failed bridge config
> operation warrant a node fence (off) action? Isn't it enough to just
> fail the resource and try another cluster node,
> or at most, give up if it can't be started / configured on any node?
>
> Is there any way to control this harsh recovery action in the cluster?
>
> Thanks much..
>
>
> Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
> INTERNET: [email protected]

It's actually the stop operation failure that leads to the fence. If a resource fails to stop, fencing is the only way pacemaker can recover the resource elsewhere. Consider a database master -- if it doesn't stop, starting the master elsewhere could lead to severe data inconsistency.

You can tell pacemaker to not attempt recovery, by setting on-fail=block on the stop operation, so it doesn't need to fence. Obviously, that prevents high availability, as manual intervention is required to do anything further with the service.

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
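
For reference, Ken's suggestion of setting on-fail=block on the stop operation might look roughly like the sketch below. It assumes the resource name br0_r1 from this thread and the `pcs resource update <id> op <action> <options>` form of the pcs CLI; the helper name block_on_stop_failure and the DRY_RUN switch are hypothetical, added only so the command can be inspected before it is run against a real cluster. Check the syntax against the pcs version actually installed.

```shell
#!/bin/sh
# Sketch only: set on-fail=block on the stop operation of an existing
# resource, so a failed stop blocks the resource instead of fencing the node.
# RESOURCE comes from the thread; DRY_RUN and the function name are
# hypothetical conveniences, not part of pcs or pacemaker.

RESOURCE="${RESOURCE:-br0_r1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, anything else = run it

block_on_stop_failure() {
    # Build the pcs command for the resource passed as $1.
    # Assumed syntax: pcs resource update <id> op <action> <options>
    set -- pcs resource update "$1" op stop interval=0s on-fail=block
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: show what would be executed.
        echo "$*"
    else
        # Real run: requires pcs and a running cluster.
        "$@"
    fi
}

block_on_stop_failure "$RESOURCE"
```

Run as-is, the script only prints the command. Setting DRY_RUN=0 would execute it, after which a failed stop should leave the resource blocked until someone intervenes manually, which is the loss of high availability Ken mentions.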
