18.07.2018 04:21, Confidential Company writes:
>>> Hi,
>>>
>>> On my two-node active/passive setup, I configured fencing via
>>> fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I
>>> expected that both nodes would be stonithed simultaneously.
>>>
>>> In my test scenario, Node1 had the ClusterIP resource. When I
>>> physically disconnected the service/corosync link, Node1 was fenced
>>> and Node2 stayed alive, even though pcmk_delay=0 was set on both
>>> nodes.
>>>
>>> Can you explain the behavior above?
>>>
>>
>> # node1 could not connect to ESX because the links were disconnected.
>> # That is the most obvious explanation.
>>
>> # You have the logs; you are the only one who can answer this
>> # question with any certainty. Others can only guess.
>>
>> Oops, my bad, I forgot to mention: I have two interfaces on each
>> virtual machine (node). The second interface is used for the ESX
>> links, so fencing can be executed even though the corosync links were
>> disconnected. Looking forward to your response. Thanks.
>
> # Having no fence delay means a death match (each node killing the
> # other) is possible, but it doesn't guarantee that it will happen.
> # Some of the time, one node will detect the outage and fence the
> # other one before the other one can react.
>
> # It's basically an Old West shoot-out -- they may reach for their
> # guns at the same time, but one may be quicker.
>
> # As Andrei suggested, the logs from both nodes could give you a
> # timeline of what happened when.
>
> Hi Andrei, kindly see the logs below. Based on the log timestamps,
> Node1 should have fenced Node2 first, but in the actual test, Node1
> was the one fenced/shut down by Node2.

Node1 tried to fence but failed. It could be connectivity, it could be
credentials.
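One quick way to narrow that down is to run the fence agent by hand on
Node1, outside the cluster. A minimal sketch, reusing the address and
credentials that the fence2 device has in the config quoted further
below (adjust if yours differ); the "list" action only enumerates VMs,
so it is safe to run:

    # On ArcosRhel1: fence2 is the device Node1 uses to fence
    # ArcosRhel2, and it points at 172.16.10.152. A login failure here
    # reproduces the "Unable to connect/login to fencing device"
    # message seen in Node1's log.
    fence_vmware_soap --ip 172.16.10.152 --username admin \
        --password 123pass --ssl-insecure --action list

If that fails the same way, the problem is connectivity or credentials
between Node1 and that ESX host, not Pacemaker itself.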
> Is it possible to have a 2-node active/passive setup in
> pacemaker/corosync where the node that gets disconnected (interface
> down) is the only one that gets fenced?

If you could determine which node was disconnected, you would not need
any fencing at all.

> Thanks guys
>
> LOGS from Node2:
>
> Jul 17 13:33:27 ArcosRhel2 corosync[1048]: [TOTEM ] A processor
> failed, forming new configuration. ...
> Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Node ArcosRhel1
> will be fenced because the node is no longer part of the cluster ...
> Jul 17 13:33:50 ArcosRhel2 stonith-ng[1080]: notice: Operation
> 'reboot' [2323] (call 2 from crmd.1084) for host 'ArcosRhel1' with
> device 'Fence1' returned: 0 (OK)
> Jul 17 13:33:50 ArcosRhel2 stonith-ng[1080]: notice: Operation reboot
> of ArcosRhel1 by ArcosRhel2 for crmd.1084@ArcosRhel2.0426e6e1: OK
> Jul 17 13:33:50 ArcosRhel2 crmd[1084]: notice: Stonith operation
> 2/12:0:0:f9418e1f-1f13-4033-9eaa-aec705f807ef: OK (0)
> Jul 17 13:33:50 ArcosRhel2 crmd[1084]: notice: Peer ArcosRhel1 was
> terminated (reboot) by ArcosRhel2 for ArcosRhel2: OK ...
>
> LOGS from Node1:
>
> Jul 17 13:33:26 ArcoSRhel1 corosync[1464]: [TOTEM ] A processor
> failed, forming new configuration. ...
> Jul 17 13:33:28 ArcoSRhel1 pengine[1476]: warning: Node ArcosRhel2
> will be fenced because the node is no longer part of the cluster ...
> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: warning: Mapping
> action='off' to pcmk_reboot_action='off'
> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: Fence1 can not
> fence (reboot) ArcosRhel2: static-list
> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: fence2 can fence
> (reboot) ArcosRhel2: static-list
> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: Fence1 can not
> fence (reboot) ArcosRhel2: static-list
> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: fence2 can fence
> (reboot) ArcosRhel2: static-list
> Jul 17 13:33:46 ArcoSRhel1 fence_vmware_soap: Unable to connect/login
> to fencing device
> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning:
> fence_vmware_soap[7157] stderr: [ Unable to connect/login to fencing
> device ]
> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning:
> fence_vmware_soap[7157] stderr: [ ]
> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning:
> fence_vmware_soap[7157] stderr: [ ]
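Note also that the config below sets pcmk_delay_max=0s on both fence
devices, which is exactly what allows the shoot-out described above. A
minimal sketch of the usual mitigation, assuming (for illustration
only; pick whichever node should win a tie) that you want ArcosRhel1
to survive a simultaneous fence:

    # Fence1 is the device whose target is ArcosRhel1, so delaying it
    # gives ArcosRhel1 a head start when both nodes shoot at once:
    pcs stonith update Fence1 pcmk_delay_max=10s
    # Leave fence2 at pcmk_delay_max=0s so fencing of ArcosRhel2 is
    # not delayed.

pcmk_delay_max is a random delay (0-10s here), so this makes a death
match unlikely rather than impossible; if I recall correctly, the
fixed pcmk_delay_base only appeared in Pacemaker 1.1.17, so on your
1.1.16 the random delay is what is available. It would not have changed
your test (there Node1 simply could not log in to ESX), but it makes
the outcome of a true tie predictable.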
>>> See my config below:
>>>
>>> [root@ArcosRhel2 cluster]# pcs config
>>> Cluster Name: ARCOSCLUSTER
>>> Corosync Nodes:
>>>  ArcosRhel1 ArcosRhel2
>>> Pacemaker Nodes:
>>>  ArcosRhel1 ArcosRhel2
>>>
>>> Resources:
>>>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>>>   Attributes: cidr_netmask=32 ip=172.16.10.243
>>>   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>>>               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>>>               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>>>
>>> Stonith Devices:
>>>  Resource: Fence1 (class=stonith type=fence_vmware_soap)
>>>   Attributes: action=off ipaddr=172.16.10.151 login=admin passwd=123pass
>>>    pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s
>>>    port=ArcosRhel1(Joniel) ssl_insecure=1 pcmk_delay_max=0s
>>>   Operations: monitor interval=60s (Fence1-monitor-interval-60s)
>>>  Resource: fence2 (class=stonith type=fence_vmware_soap)
>>>   Attributes: action=off ipaddr=172.16.10.152 login=admin passwd=123pass
>>>    pcmk_delay_max=0s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s
>>>    port=ArcosRhel2(Ben) ssl_insecure=1
>>>   Operations: monitor interval=60s (fence2-monitor-interval-60s)
>>> Fencing Levels:
>>>
>>> Location Constraints:
>>>   Resource: Fence1
>>>     Enabled on: ArcosRhel2 (score:INFINITY)
>>> (id:location-Fence1-ArcosRhel2-INFINITY)
>>>   Resource: fence2
>>>     Enabled on: ArcosRhel1 (score:INFINITY)
>>> (id:location-fence2-ArcosRhel1-INFINITY)
>>> Ordering Constraints:
>>> Colocation Constraints:
>>> Ticket Constraints:
>>>
>>> Alerts:
>>>  No alerts defined
>>>
>>> Resources Defaults:
>>>  No defaults set
>>> Operations Defaults:
>>>  No defaults set
>>>
>>> Cluster Properties:
>>>  cluster-infrastructure: corosync
>>>  cluster-name: ARCOSCLUSTER
>>>  dc-version: 1.1.16-12.el7-94ff4df
>>>  have-watchdog: false
>>>  last-lrm-refresh: 1531810841
>>>  stonith-enabled: true
>>>
>>> Quorum:
>>>   Options:

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org