On 07/24/2017 09:46 PM, Kristián Feldsam wrote:
> So why not use some other fencing method, like disabling the port on the switch so that nobody can access the faulty node and write data to it? That is common practice too.

Well, don't get me wrong here. I don't want to hard-sell sbd. I just thought that, very likely, the same requirements that prevent the use of a remote-controlled power switch will also make access to a switch for disabling ports unusable. And if a working qdevice setup is already there, the gap between what he thought he would get from qdevice and what he actually had matches exactly what quorum-based watchdog-fencing provides. But you are of course right. I don't really know the scenario. Maybe fabric fencing is the perfect match - good to mention it here as a possibility.
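For what it's worth, a rough sketch of what that could look like with pcs - a sketch only, where the qnetd host name, watchdog device and timeout values are placeholders and the exact pcs syntax may vary by version:

    # On a separate arbiter host (corosync-qnetd package):
    pcs qdevice setup model net --enable --start

    # On one cluster node (the nodes need the corosync-qdevice package):
    # give the 2-node cluster a real quorum source
    pcs quorum device add model net host=qnetd-host.example.com algorithm=ffsplit

    # Enable watchdog-only (diskless) sbd on the cluster nodes;
    # the cluster has to be restarted for this to take effect
    pcs stonith sbd enable --watchdog=/dev/watchdog

    # Let pacemaker treat an unseen, non-quorate node as fenced once
    # its watchdog must have fired
    pcs property set stonith-watchdog-timeout=10s
    pcs property set stonith-enabled=true

With something like that in place, a node that falls out of the quorate partition self-fences via its watchdog, and the survivors can recover resources without having to reach any external fencing device.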
Regards,
Klaus

> Best regards, Kristián Feldsam
>
>> On 24 Jul 2017, at 21:16, Klaus Wenninger <kwenn...@redhat.com> wrote:
>>
>> On 07/24/2017 08:27 PM, Prasad, Shashank wrote:
>>> My understanding is that SBD will need shared storage between the clustered nodes, and that SBD will need at least 3 nodes in a cluster if used without shared storage.
>>
>> Haven't tried it, to be honest, but the reason for 3 nodes is that without a shared disk you need a real quorum source and not something 'faked' as with the 2-node feature in corosync. But I don't see anything speaking against getting proper quorum via qdevice instead of via a third full cluster node.
>>
>>> Therefore, for systems which do NOT use shared storage between 1+1 HA clustered nodes, SBD may NOT be an option. Correct me if I am wrong.
>>>
>>> For cluster systems using the likes of iDRAC/IMM2 fencing agents, which have power supply units that are redundant but shared with the nodes, the normal fencing mechanisms should work for all resiliency scenarios, except when the IMM2/iDRAC is not reachable for whatever reason. To bail out of those situations in the absence of SBD, I believe user-defined failover hooks (scripts) plugged into Pacemaker Alerts, with sudo permissions for 'hacluster', should help.
>>
>> If you can't see your fencing device, assuming after some time that the corresponding node is probably down is quite risky in my opinion. But why not make sure it is down using a watchdog?
>>
>>> Thanx.
>>>
>>> From: Klaus Wenninger; Sent: Monday, July 24, 2017 11:31 PM; To: Cluster Labs - All topics related to open-source clustering welcomed; Prasad, Shashank; Subject: Re: [ClusterLabs] Two nodes cluster issue
>>>
>>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>>>
>>> Sometimes IPMI fence devices share the node's power supply, and that cannot be avoided. In such scenarios the HA cluster is NOT able to handle the power failure of a node, since the power is shared with its own fence device. IPMI-based fencing can also fail for other reasons.
>>>
>>> A failure to fence the failed node will cause the cluster to be marked UNCLEAN. To get over it, the following command needs to be invoked on the surviving node:
>>>
>>> pcs stonith confirm <failed_node_name> --force
>>>
>>> This can be automated by hooking in a recovery script when the Stonith resource's 'Timed Out' event occurs. To be more specific, Pacemaker Alerts can be used to watch for Stonith timeouts and failures. In that script, essentially all that needs to be executed is the aforementioned command.
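For illustration, a minimal sketch of such an alert hook as described above - the script path, alert id and sudoers entry are placeholders, and it assumes Pacemaker >= 1.1.15 alerts, which export the CRM_alert_* variables to the agent:

    #!/bin/sh
    # hypothetical alert agent, e.g. /usr/local/bin/stonith_fail_hook.sh
    # CRM_alert_kind is "fencing" for fencing events, CRM_alert_rc is the
    # result of the fence operation, CRM_alert_node is the fencing target.
    if [ "${CRM_alert_kind}" = "fencing" ] && [ "${CRM_alert_rc}" != "0" ]; then
        # Fencing failed or timed out: acknowledge it manually.
        # Only safe if something else guarantees the node really is down.
        sudo pcs stonith confirm "${CRM_alert_node}" --force
    fi
    exit 0

It would be registered with something like 'pcs alert create path=/usr/local/bin/stonith_fail_hook.sh id=stonith-hook', and since alerts run as 'hacluster' it needs a sudoers entry along the lines of 'hacluster ALL=(root) NOPASSWD: /usr/sbin/pcs stonith confirm *'. As the reply below points out, though, auto-confirming a failed fence amounts to declaring the node dead without verifying it.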
>>> If I get you right here, you could then disable fencing in the first place. Actually, quorum-based watchdog-fencing is the way to do this in a safe manner. This of course assumes you have a proper source of quorum in your 2-node setup, e.g. via qdevice, or that you use a shared disk with sbd (not pacemaker quorum directly in that case, but a similar mechanism handled inside sbd).
>>>
>>> Since the alerts are issued under the 'hacluster' login, sudo permissions for 'hacluster' need to be configured.
>>>
>>> Thanx.
>>>
>>> From: Klaus Wenninger; Sent: Monday, July 24, 2017 9:24 PM; To: Kristián Feldsam; Cluster Labs - All topics related to open-source clustering welcomed; Subject: Re: [ClusterLabs] Two nodes cluster issue
>>>
>>> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>>>
>>> I personally think that powering off the node via a switched PDU is safer, or not?
>>>
>>> True, if that works in your environment. If you can't do a physical setup where you aren't simultaneously losing connection to both your node and the switch device (or you just want to cover cases where that happens), you have to come up with something else.
>>>
>>> On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenn...@redhat.com> wrote:
>>>
>>> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>>>
>>> I still don't understand why the qdevice concept doesn't help in this situation. Since the master node is down, I would expect quorum to declare it as dead. Why doesn't that happen?
>>>
>>> That is not how quorum works. It just limits decision-making to the quorate subset of the cluster; the unseen nodes are still not known to be down. That is why I suggested quorum-based watchdog-fencing with sbd: it ensures that within a certain time all nodes of the non-quorate part of the cluster are down.
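To put a rough number on "within a certain time": with watchdog-based sbd the two knobs involved are the sbd watchdog timeout and the cluster's stonith-watchdog-timeout. Illustrative values only; the usual guidance is to make the latter about twice the former:

    # /etc/sysconfig/sbd (excerpt)
    SBD_WATCHDOG_DEV=/dev/watchdog
    SBD_WATCHDOG_TIMEOUT=5    # an unhealthy/non-quorate node is reset by its
                              # hardware watchdog within ~5 seconds

    # How long the quorate side waits before it considers the unseen
    # node fenced and starts recovery
    pcs property set stonith-watchdog-timeout=10s

So after stonith-watchdog-timeout has passed, the quorate partition may safely assume the non-quorate nodes have been reset.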
>>> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" <dmitri.maz...@gmail.com> wrote:
>>>
>>> On 2017-07-24 07:51, Tomer Azran wrote:
>>> > We don't have the ability to use it.
>>> > Is that the only solution?
>>>
>>> No, but I'd recommend thinking about it first. Are you sure you will care about your cluster working when your server room is on fire? 'Cause unless you have halon suppression, your server room is a complete write-off anyway. (Think water from sprinklers hitting rich chunky volts in the servers.)
>>>
>>> Dima
>>>
>>> --
>>> Klaus Wenninger
>>> Senior Software Engineer, EMEA ENG Openstack Infrastructure
>>> Red Hat
>>> kwenn...@redhat.com
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org