On 02/02/2017 02:14 PM, Scott Greenlese wrote:
> Hi folks,
>
> I'm testing iface-bridge resource support on a Linux KVM on System Z
> pacemaker cluster.
>
> pacemaker-1.1.13-10.el7_2.ibm.1.s390x
> corosync-2.3.4-7.el7_2.ibm.1.s390x
>
> I created an iface-bridge resource, but specified a non-existent
> bridge_slaves value, vlan1292 (i.e. vlan1292 doesn't exist).
>
> [root@zs95kj VD]# date;pcs resource create br0_r1
> ocf:heartbeat:iface-bridge bridge_name=br0 bridge_slaves=vlan1292 op
> monitor timeout="20s" interval="10s" --disabled
> Wed Feb 1 17:49:16 EST 2017
> [root@zs95kj VD]#
>
> [root@zs95kj VD]# pcs resource show |grep br0
> br0_r1 (ocf::heartbeat:iface-bridge): FAILED zs93kjpcs1
> [root@zs95kj VD]#
>
> As you can see, the resource was created, but failed to start on the
> target node zs93kjpcs1.
>
> To my surprise, the target node zs93kjpcs1 was unceremoniously fenced.
>
> pacemaker.log shows a fence (off) action initiated against that target
> node, "because of resource failure(s)":
>
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
> configured' (6) instead of the expected value: 'ok' (0)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
> zs93kjpcs1: not configured (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
> stop failed 'not configured' (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2719 ) debug:
> determine_op_status: br0_r1_stop_0 on zs93kjpcs1 returned 'not
> configured' (6) instead of the expected value: 'ok' (0)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:2602 ) warning:
> unpack_rsc_op_failure: Processing failed op stop for br0_r1 on
> zs93kjpcs1: not configured (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:3244 ) error:
> unpack_rsc_op: Preventing br0_r1 from re-starting anywhere: operation
> stop failed 'not configured' (6)
> Feb 01 17:55:56 [52941] zs95kj crm_resource: ( unpack.c:96 ) warning:
> pe_fence_node: Node zs93kjpcs1 will be fenced because of resource failure(s)
>
>
> Thankfully, I was able to successfully create an iface-bridge resource
> when I changed the bridge_slaves value to an existing vlan interface.
>
> My main concern is, why would the response to a failed bridge config
> operation warrant a node fence (off) action? Isn't it enough to just
> fail the resource and try another cluster node,
> or at most, give up if it can't be started / configured on any node?
>
> Is there any way to control this harsh recovery action in the cluster?
>
> Thanks much..
>
>
> Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com

It's actually the stop operation failure that leads to the fence. If a
resource fails to stop, fencing is the only way pacemaker can recover
the resource elsewhere. Consider a database master -- if it doesn't
stop, starting the master elsewhere could lead to severe data
inconsistency.

You can tell pacemaker not to attempt recovery, by setting
on-fail=block on the stop operation, so it doesn't need to fence.
Obviously, that prevents high availability, as manual intervention is
required to do anything further with the service.
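
As a rough sketch of the on-fail=block suggestion above, the stop
operation of the resource from this thread (br0_r1) could be updated
with pcs along these lines. The interval=0s value and the exact
subcommand forms are assumptions based on the pcs 0.9 series shipped
with this pacemaker version; check "pcs resource --help" on your
system before relying on them.

```shell
# Assumption: resource id br0_r1, taken from the original post.
# Set on-fail=block on the stop operation so that a failed stop
# blocks the resource instead of escalating to fencing the node.
pcs resource update br0_r1 op stop interval=0s on-fail=block

# Display the resource definition to confirm the stop operation
# now carries on-fail=block.
pcs resource show br0_r1
```

With this in place, a failed stop leaves the resource blocked on that
node until an administrator investigates and cleans it up, so the
cluster will not recover it elsewhere on its own.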