Down below you replied to 2 threads. I think the latter is the one you intended to reply to ... very confusing ... Sorry for adding more spam - was hesitant - but I think there is a chance it removes some confusion ...
Klaus

On Mon, Sep 4, 2023 at 10:29 PM Adil Bouazzaoui <adilb...@gmail.com> wrote:
> Hi Jan,
>
> To add more information: we deployed a Centreon 2-node HA cluster (Master in
> DC 1 & Slave in DC 2). The quorum device, which is responsible for resolving
> split-brain, is in DC 1 too, and the poller, which does the monitoring, is in
> DC 1 too. The problem is that a VIP address is required (attached to the
> Master node and moved to the Slave on failover) and we don't know which VIP
> we should use. We also don't know the right setup for our scenario, so that
> if DC 1 goes down the Slave in DC 2 becomes the Master - that's why we don't
> know where to place the quorum device and the poller.
>
> I hope to get some ideas so we can set up this cluster correctly.
> Thanks in advance.
>
> Adil Bouazzaoui
> IT Infrastructure engineer
> adil.bouazza...@tmandis.ma
> adilb...@gmail.com
>
> On Mon, 4 Sept 2023 at 15:24, <users-requ...@clusterlabs.org> wrote:
>> Today's Topics:
>>
>>    1. Re: issue during Pacemaker failover testing (Klaus Wenninger)
>>    2. Re: issue during Pacemaker failover testing (Klaus Wenninger)
>>    3. Re: issue during Pacemaker failover testing (David Dolan)
>>    4. Re: Centreon HA Cluster - VIP issue (Jan Friesse)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 4 Sep 2023 14:15:52 +0200
>> From: Klaus Wenninger <kwenn...@redhat.com>
>> To: Cluster Labs - All topics related to open-source clustering
>>     welcomed <users@clusterlabs.org>
>> Cc: David Dolan <daithido...@gmail.com>
>> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
>>
>> On Mon, Sep 4, 2023 at 1:44 PM Andrei Borzenkov <arvidj...@gmail.com> wrote:
>>
>> > On Mon, Sep 4, 2023 at 2:25 PM Klaus Wenninger <kwenn...@redhat.com> wrote:
>> > >
>> > > Or go for qdevice with LMS where I would expect it to be able to really
>> > > go down to a single node left - any of the 2 last ones - as there is
>> > > still qdevice.
>> > > Sry for the confusion btw.
>> > >
>> >
>> > According to documentation, "LMS is also incompatible with quorum
>> > devices, if last_man_standing is specified in corosync.conf then the
>> > quorum device will be disabled".
>> >
>>
>> That is why I said qdevice with LMS - but it was probably not explicit
>> enough without telling that I meant the qdevice algorithm and not
>> the corosync flag.
>>
>> Klaus
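A side note on the qdevice-with-LMS option discussed above: the algorithm is chosen in the quorum.device.net section of corosync.conf, not via the votequorum last_man_standing flag. A minimal sketch - the qnetd host name below is a placeholder, not something from this thread:

    quorum {
        provider: corosync_votequorum
        device {
            model: net
            votes: 1
            net {
                host: qnetd-host     # machine running corosync-qnetd, illustrative name
                algorithm: lms       # qdevice LMS algorithm, distinct from the corosync LMS flag
                tls: on
            }
        }
    }

With a reachable qnetd server, this is what lets a partition shrink down to a single node and still stay quorate, as described above.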
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 4 Sep 2023 14:32:39 +0200
>> From: Klaus Wenninger <kwenn...@redhat.com>
>> To: Cluster Labs - All topics related to open-source clustering
>>     welcomed <users@clusterlabs.org>
>> Cc: David Dolan <daithido...@gmail.com>
>> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
>>
>> On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov <arvidj...@gmail.com> wrote:
>>
>> > On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger <kwenn...@redhat.com> wrote:
>> > >
>> > > On Mon, Sep 4, 2023 at 12:45 PM David Dolan <daithido...@gmail.com> wrote:
>> > >>
>> > >> Hi Klaus,
>> > >>
>> > >> With default quorum options I've performed the following on my 3-node cluster:
>> > >>
>> > >> Bring down cluster services on one node - the running services migrate to another node
>> > >> Wait 3 minutes
>> > >> Bring down cluster services on one of the two remaining nodes - the surviving node in the cluster is then fenced
>> > >>
>> > >> Instead of the surviving node being fenced, I hoped that the services would migrate and run on that remaining node.
>> > >>
>> > >> Just looking for confirmation that my understanding is ok and if I'm missing something?
>> > >
>> > > As said, I've never used it ...
>> > > Well, when down to 2 nodes LMS per definition is getting into trouble, as after another
>> > > outage any of them is gonna be alone. In case of an ordered shutdown this could
>> > > possibly be circumvented though. So I guess your first attempt to enable auto-tie-breaker
>> > > was the right idea. Like this you will have further service at least on one of the nodes.
>> > > So I guess what you were seeing is the right - and unfortunately only possible - behavior.
>> >
>> > I still do not see where fencing comes from. Pacemaker requests
>> > fencing of the missing nodes. It also may request self-fencing, but
>> > not in the default settings. It is rather hard to tell what happens
>> > without logs from the last remaining node.
>> >
>> > That said, the default action is to stop all resources, so the end
>> > result is not very different :)
>> >
>>
>> But you are of course right. The expected behaviour would be that
>> the leftover node stops the resources.
>> But maybe we're missing something here. Hard to tell without
>> the exact configuration including fencing.
>> Again, as already said, I don't know anything about the LMS
>> implementation with corosync. In theory there were both arguments
>> to either suicide (but that would have to be done by pacemaker) or
>> to automatically switch to some 2-node mode once the remaining
>> partition is reduced to just 2, followed by a fence race (when done
>> without the precautions otherwise used for 2-node clusters).
>> But I guess in this case it is none of those 2.
>>
>> Klaus
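For context, the "precautions otherwise used for 2-node clusters" mentioned here are usually, in corosync terms, the votequorum two_node flag (which implies wait_for_all) plus some fencing delay so the two nodes do not shoot each other simultaneously. Purely as an illustration of what that looks like:

    quorum {
        provider: corosync_votequorum
        two_node: 1        # each node may keep quorum alone, once all nodes have been seen
        # wait_for_all: 1  # implied by two_node; cluster waits for all nodes on startup
    }

plus, on the Pacemaker side, something like the priority-fencing-delay cluster property or a pcmk_delay_max setting on one of the fence devices to break the fence race.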
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Mon, 4 Sep 2023 14:44:25 +0100
>> From: David Dolan <daithido...@gmail.com>
>> To: Klaus Wenninger <kwenn...@redhat.com>, arvidj...@gmail.com
>> Cc: Cluster Labs - All topics related to open-source clustering
>>     welcomed <users@clusterlabs.org>
>> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
>>
>> Thanks Klaus/Andrei,
>>
>> So if I understand correctly, what I'm trying probably shouldn't work.
>> I should attempt setting auto_tie_breaker in corosync and remove
>> last_man_standing.
>> Then, I should set up another server with qdevice and configure that using
>> the LMS algorithm.
>>
>> Thanks
>> David
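A rough corosync.conf sketch of the first half of that plan - auto_tie_breaker enabled, last_man_standing removed; values are illustrative only, not a verified configuration:

    quorum {
        provider: corosync_votequorum
        auto_tie_breaker: 1
        auto_tie_breaker_node: lowest   # on an even split, the partition holding the lowest node id keeps quorum
        # last_man_standing: 1          # removed, per the discussion above
    }

The qdevice/LMS half would then go into quorum.device as sketched earlier; note that the votequorum(5) and corosync-qdevice(8) man pages list several votequorum flags as incompatible with a quorum device, so the combination is worth double-checking there.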
>> ------------------------------
>>
>> Message: 4
>> Date: Mon, 4 Sep 2023 16:23:40 +0200
>> From: Jan Friesse <jfrie...@redhat.com>
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] Centreon HA Cluster - VIP issue
>>
>> Hi,
>>
>> On 02/09/2023 17:16, Adil Bouazzaoui wrote:
>> > Hello,
>> >
>> > My name is Adil, I work for Tman company. We are testing the Centreon HA
>> > cluster to monitor our infrastructure for 13 companies. For now we are
>> > using the 100 IT licence to test the platform; once everything is working
>> > fine we can purchase a licence suitable for our case.
>> >
>> > We're stuck at *scenario 2*: setting up the Centreon HA cluster with
>> > Master & Slave in different datacenters.
>> > *Scenario 1*, setting up the cluster with Master & Slave and the VIP
>> > address on the same network (VLAN), is working fine.
>> >
>> > *Scenario 1: Cluster on same network (same DC) ==> works fine*
>> > Master in DC 1 VLAN 1: 172.30.15.10/24
>> > Slave in DC 1 VLAN 1: 172.30.15.20/24
>> > VIP in DC 1 VLAN 1: 172.30.15.30/24
>> > Quorum in DC 1 LAN: 192.168.1.10/24
>> > Poller in DC 1 LAN: 192.168.1.20/24
>> >
>> > *Scenario 2: Cluster on different networks (2 separate DCs connected
>> > with VPN) ==> still not working*
>>
>> corosync on all nodes needs to have a direct connection to any other node.
>> VPN should work as long as routing is correctly configured. What exactly
>> is "still not working"?
>>
>> > Master in DC 1 VLAN 1: 172.30.15.10/24
>> > Slave in DC 2 VLAN 2: 172.30.50.10/24
>> > VIP: example 102.84.30.XXX. We used a public static IP from our internet
>> > service provider; we thought that using an IP from one site's network
>> > won't work - if that site goes down then the VIP won't be reachable!
>> > Quorum: 192.168.1.10/24
>>
>> No clue what you mean by Quorum, but placing it in DC1 doesn't feel right.
>>
>> > Poller: 192.168.1.20/24
>> >
>> > Our *goal* is to have the Master & Slave nodes on different sites, so
>> > when site A goes down, we keep monitoring with the Slave.
>> > The problem is that we don't know how to set up the VIP address, what
>> > kind of VIP address will work, or how the VIP address can work in this
>> > scenario - or whether there is anything else that can replace the VIP
>> > address to make things work.
>> > Also, can we use a backup poller? So if poller 1 on site A goes down,
>> > poller 2 on site B can take the lead?
>> >
>> > We looked everywhere (The Watch, YouTube, Reddit, GitHub...), and we
>> > still couldn't find a workaround!
>> >
>> > The guide we used to deploy the 2-node cluster:
>> > https://docs.centreon.com/docs/installation/installation-of-centreon-ha/overview/
>> >
>> > Attached is the 2 DCs architecture example.
>> >
>> > We appreciate your support.
>> > Thank you in advance.
>> >
>> > Adil Bouazzaoui
>> > IT Infrastructure Engineer
>> > TMAN
>> > adil.bouazza...@tmandis.ma
>> > adilb...@gmail.com
>> > +212 656 29 2020
>>
>> ------------------------------
>>
>> End of Users Digest, Vol 104, Issue 5
>> *************************************
>
> --
>
> *Adil Bouazzaoui*
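On the VIP question itself: in the single-subnet case (scenario 1) the VIP is typically just an ocf:heartbeat:IPaddr2 resource colocated with the Master, along these lines (sketch only, using the scenario 1 address):

    pcs resource create centreon_vip ocf:heartbeat:IPaddr2 \
        ip=172.30.15.30 cidr_netmask=24 op monitor interval=10s

An IPaddr2 VIP like this only works while both nodes sit on the same L2 subnet, which is exactly what is missing in scenario 2 - stretching a single address across two sites needs something outside the cluster itself (a routed/anycast address, DNS-based failover, or a load balancer in front), which is the open question in this thread.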