Re: [ClusterLabs] @ maillist Admins - DMARC (yahoo)

2021-07-13 Thread Andrei Borzenkov
On Mon, Jul 12, 2021 at 5:50 PM wrote: > > On Sat, 2021-07-10 at 12:34 +0100, lejeczek wrote: > > Hi Admins(of this mailing list) > > > > Could you please fix in DMARC(s) so those of us who are on > > Yahoo would be able to receive own emails/thread. > > > > many thanks, L. > > I suppose we

Re: [ClusterLabs] QDevice vs 3rd host for majority node quorum

2021-07-13 Thread Andrei Borzenkov
On 13.07.2021 19:52, Gerry R Sommerville wrote: > Hello everyone, > I am currently comparing using QDevice vs adding a 3rd host to my > even-number-node cluster and I am wondering about the details concerning > network > communication. > For example, say my cluster is utilizing multiple
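
For comparing the two layouts, the vote arithmetic can be sketched roughly as below. This is only an illustration, assuming every node carries one vote and that the extra participant (a third host or a QDevice) contributes exactly one vote; the function names are made up for the example and are not part of corosync.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the partition holds at least a strict majority of the votes."""
    return votes_present >= quorum_threshold(expected_votes)


# Two cluster nodes plus one tie-breaking vote (assumption): 3 expected votes.
expected = 3
print(quorum_threshold(expected))   # votes needed for quorum
print(is_quorate(1 + 1, expected))  # one node that still sees the tie-breaker
print(is_quorate(1, expected))      # one node that sees nobody else
```

In terms of votes alone the two layouts look the same under these assumptions; the networking details asked about in the thread are not covered by this arithmetic.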

Re: [ClusterLabs] unexpected fenced node and promotion of the new master PAF - postgres

2021-07-13 Thread Andrei Borzenkov
On 13.07.2021 23:09, damiano giuliani wrote: > Hi Klaus, thanks for helping, im quite lost because cant find out the > causes. > i attached the corosync logs of all three nodes hoping you guys can find > and hint me something i cant see. i really appreciate the effort. > the old master log seems

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: @ maillist Admins ‑ DMARC (yahoo)

2021-07-14 Thread Andrei Borzenkov
On Wed, Jul 14, 2021 at 10:21 AM Ulrich Windl wrote: > > What I meant is: > The original signature confirms that the message is from the submitter > (author). > After mangling the message, you can't re-testify that the message is still > from that author, but you can testify that the message is

Re: [ClusterLabs] Sub-clusters / super-clusters?

2021-08-03 Thread Andrei Borzenkov
On Tue, Aug 3, 2021 at 11:40 AM Antony Stone wrote: > > To implement the above "one resource which can run anywhere, but only a single > instance", I joined together clusters A and B, and placed the corresponding > location constraints on the resources I want only at A and the ones I want > only

Re: [ClusterLabs] Antw: [EXT] Moving resource only one way

2021-08-03 Thread Andrei Borzenkov
On 03.08.2021 20:19, Ervin Hegedüs wrote: > Hi there, > > Okay, so I thought I'm done, but today I ran into an issue. There are two > nodes, here is the config: > > node 1: sles15-1 > node 2: sles15-2 > primitive virtualip IPaddr2 \ > params ip=192.168.72.27 nic=eth0 cidr_netmask=24 \ >

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Andrei Borzenkov
On Mon, Aug 9, 2021 at 3:07 PM Andreas Janning wrote: > > Hi, > > I have just tried your suggestion by adding > name="interleave" value="true"/> > to the clone configuration. > Unfortunately, the behavior stays the same. The service is still restarted on > the passive node when

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Andrei Borzenkov
On 09.08.2021 16:00, Andreas Janning wrote: > Hi, > > yes, by "service" I meant the apache-clone resource. > > Maybe I can give a more stripped down and detailed example: > > *Given the following configuration:* > [root@pacemaker-test-1 cluster]# pcs cluster cib --config > > > >

Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Andrei Borzenkov
On 09.08.2021 22:57, Reid Wahl wrote: > On Mon, Aug 9, 2021 at 6:19 AM Andrei Borzenkov wrote: > >> On 09.08.2021 16:00, Andreas Janning wrote: >>> Hi, >>> >>> yes, by "service" I meant the apache-clone resource. >>> >>> Maybe I

[ClusterLabs] Pacemaker/corosync behavior in case of partial split brain

2021-08-05 Thread Andrei Borzenkov
Three nodes A, B, C. Communication between A and B is blocked (completely - no packet can pass in either direction). A and B can communicate with C. I expected that the result would be two partitions - (A, C) and (B, C). To my surprise, A went offline leaving (B, C) running. It was always the same node
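
The scenario can be modelled with a small sketch. This is not corosync's membership algorithm, only a hedged illustration that enumerates the largest groups in which every pair of nodes can still talk to each other; the helper name `fully_connected_partitions` is hypothetical.

```python
from itertools import combinations


def fully_connected_partitions(nodes, blocked):
    """Return the maximal node sets in which every pair can communicate."""
    blocked = {frozenset(pair) for pair in blocked}

    def connected(group):
        # A group qualifies only if no pair inside it is blocked.
        return all(frozenset(p) not in blocked for p in combinations(group, 2))

    candidates = [
        set(group)
        for size in range(len(nodes), 0, -1)
        for group in combinations(nodes, size)
        if connected(group)
    ]
    # Drop every candidate that is a strict subset of another candidate.
    return [g for g in candidates if not any(g < other for other in candidates)]


partitions = fully_connected_partitions(["A", "B", "C"], [("A", "B")])
print(partitions)
```

Both (A, C) and (B, C) come out as candidates, but they overlap in C. Assuming a node can be a member of only one partition at a time, only one of the two can actually form, which is consistent with one of A or B ending up alone.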

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Andrei Borzenkov
On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote: > > On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote: > > > Won't something like this work ? Each node in LA will have same score of > > 5000, while other cities will be -5000. > > > > pcs constraint location DummyRes1 rule

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Andrei Borzenkov
On Wed, Aug 4, 2021 at 5:03 PM Antony Stone wrote: > > On Wednesday 04 August 2021 at 13:31:12, Andrei Borzenkov wrote: > > > On Wed, Aug 4, 2021 at 1:48 PM Antony Stone wrote: > > > On Tuesday 03 August 2021 at 12:12:03, Strahil Nikolov via Users wrote: > > >

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-04 Thread Andrei Borzenkov
On 05.08.2021 00:01, Antony Stone wrote: > On Wednesday 04 August 2021 at 22:06:39, Frank D. Engel, Jr. wrote: > >> There is no safe way to do what you are trying to do. >> >> If the resource is on cluster A and contact is lost between clusters A >> and B due to a network failure, how does

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters - working :)

2021-08-06 Thread Andrei Borzenkov
On Fri, Aug 6, 2021 at 3:47 PM Ulrich Windl wrote: > > >>> Antony Stone schrieb am 06.08.2021 um > 14:41 in > Nachricht <202108061441.59936.antony.st...@ha.open.source.it>: > ... > > location pref_A GroupA rule ‑inf: site ne cityA > > location pref_B GroupB rule ‑inf: site ne cityB >
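
The effect of the quoted `rule -inf: site ne cityA` style of location constraint can be sketched as follows. This is a simplified model, not Pacemaker code: it assumes every node defines a `site` attribute, uses Pacemaker's documented value of 1,000,000 for INFINITY, and the node names are invented for the example.

```python
INFINITY = 1_000_000  # Pacemaker's documented numeric value for INFINITY


def location_rule_score(node_attrs, attr, wanted):
    """Score contributed by a rule of the form '-inf: <attr> ne <wanted>'."""
    # The rule matches (and bans the node) when the attribute differs.
    return -INFINITY if node_attrs[attr] != wanted else 0


# Hypothetical nodes, each with a 'site' node attribute.
nodes = {
    "a1": {"site": "cityA"},
    "a2": {"site": "cityA"},
    "b1": {"site": "cityB"},
}

allowed_for_group_a = [
    name
    for name, attrs in nodes.items()
    if location_rule_score(attrs, "site", "cityA") > -INFINITY
]
print(allowed_for_group_a)
```

Under these assumptions only the nodes whose `site` is `cityA` remain eligible for the group.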

Re: [ClusterLabs] Pacemaker/corosync behavior in case of partial split brain

2021-08-06 Thread Andrei Borzenkov
On Thu, Aug 5, 2021 at 9:25 PM Andrei Borzenkov wrote: > > Three nodes A, B, C. Communication between A and B is blocked > (completely - no packet can come in both direction). A and B can > communicate with C. > > I expected that result will be two partitions - (A, C) and (B, C)

Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-06 Thread Andrei Borzenkov
On Fri, Aug 6, 2021 at 3:42 PM Antony Stone wrote: > > On Friday 06 August 2021 at 14:14:09, Andrei Borzenkov wrote: > > > On Thu, Aug 5, 2021 at 3:44 PM Antony Stone wrote: > > > > > > For anyone interested in the detail of how to do this (without needing >

Re: [ClusterLabs] Sub‑clusters / super‑clusters - working :)

2021-08-06 Thread Andrei Borzenkov
On Thu, Aug 5, 2021 at 3:44 PM Antony Stone wrote: > > On Thursday 05 August 2021 at 10:51:37, Antony Stone wrote: > > > On Thursday 05 August 2021 at 07:48:37, Ulrich Windl wrote: > > > > > > Have you ever tried to find out why this happens? (Talking about logs) > > > > Not in detail, no, but

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Move a resource only where another has Started

2021-10-11 Thread Andrei Borzenkov
On 11.10.2021 10:15, Ulrich Windl wrote: >>>> Andrei Borzenkov schrieb am 10.10.2021 um 16:52 in > Nachricht : >> On 10.10.2021 14:29, martin doc wrote: > > ... >> For each resource pacemaker computes allocation scores for each node >> (taking into acc

Re: [ClusterLabs] Antw: [EXT] Move a resource only where another has Started

2021-10-10 Thread Andrei Borzenkov
On 10.10.2021 14:29, martin doc wrote: > ok, I think I've solved my problem or at least part of it. > > The issue was I was not including a "score" in any of my constraint > statements. This meant that "INFINITY" was being used. The result is that the > scores would always be the same. >
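
Why INFINITY scores "would always be the same" can be illustrated with Pacemaker's documented score arithmetic (INFINITY is 1,000,000; anything plus INFINITY is INFINITY; anything combined with -INFINITY is -INFINITY). The helper below is a sketch of those rules written for this example, not code taken from Pacemaker.

```python
INFINITY = 1_000_000  # Pacemaker's documented numeric value for INFINITY


def add_scores(a: int, b: int) -> int:
    """Combine two scores following the documented INFINITY rules."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY  # -INFINITY always wins
    if a >= INFINITY or b >= INFINITY:
        return INFINITY   # otherwise INFINITY absorbs any finite value
    return max(-INFINITY, min(INFINITY, a + b))


# With INFINITY constraints, an extra preference makes no difference.
node1 = add_scores(INFINITY, 100)
node2 = add_scores(INFINITY, 0)
print(node1 == node2)

# With finite scores the same preference is still visible.
print(add_scores(200, 100), add_scores(200, 0))
```

This matches the observation in the message: once a constraint contributes INFINITY, smaller scores can no longer separate the nodes, whereas explicit finite scores can.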

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Coming in Pacemaker 2.1.2: new fencing configuration options

2021-10-12 Thread Andrei Borzenkov
On 12.10.2021 09:27, Ulrich Windl wrote: >>>> Andrei Borzenkov schrieb am 11.10.2021 um 11:43 in > Nachricht > : >> On Mon, Oct 11, 2021 at 9:29 AM Ulrich Windl >> wrote: >> >>>>> Also how long would such a delay be: Long enough until t

[ClusterLabs] No link to https://clusterlabs.org/pacemaker/man/ from main page

2021-10-13 Thread Andrei Borzenkov
I found the page https://clusterlabs.org/pacemaker/man/ only by accident. There is no link to it from anywhere else on this site; at least I have not found one. Logically, I would expect it to be linked from the Documentation section.

Re: [ClusterLabs] DRBD split-brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Andrei Borzenkov
On 20.10.2021 17:54, Ian Diddams wrote: > > > On Wednesday, 20 October 2021, 11:15:48 BST, Andrei Borzenkov > wrote: > > >> You cannot resolve split brain without fencing. This is as simple as >> that. Your pacemaker configuration (from another mail

Re: [ClusterLabs] DRBD split-brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Andrei Borzenkov
On Wed, Oct 20, 2021 at 11:54 AM Ian Diddams via Users wrote: > > So - system logs recently show this > > ESTRELA > Oct 18th > Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 > drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from > peer node > Oct 18

Re: [ClusterLabs] iflabel removed??

2021-10-14 Thread Andrei Borzenkov
On 14.10.2021 18:31, Paul Warwicker wrote: > Hello, > > Has the ability to specify an interface alias been removed? I checked > the archives and the source at > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/IPaddr2 > and it appears to still be valid. Also here >

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-14 Thread Andrei Borzenkov
On 13.10.2021 18:01, martin doc wrote: > In the ping resource script, there's support for "dampen" in the use of > attrd_updater. > > My expectation is that it will cause "ping", "no-ping", "ping" to result in > the service being continually presented as up rather than to flap about. > > In

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-16 Thread Andrei Borzenkov
On 14.10.2021 23:51, martin doc wrote: > > > > From: Andrei Borzenkov , Friday, 15 October 2021 4:59 AM > ... >> Dampening defines delay before attributes are committed to CIB. >> Private attributes are never ever written into CIB, so

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-16 Thread Andrei Borzenkov
On 15.10.2021 13:24, Klaus Wenninger wrote: > On Fri, Oct 15, 2021 at 12:01 PM Andrei Borzenkov > wrote: > >> On Fri, Oct 15, 2021 at 9:25 AM Klaus Wenninger >> wrote: >> >>> Main pain-point here is that ping-RA allows us to configure the count of >

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-15 Thread Andrei Borzenkov
On 15.10.2021 09:24, Klaus Wenninger wrote: > Main pain-point here is that ping-RA allows us to configure the count of > pings sent, but it > is just using the exit-value from ping that becomes negative already when > one of the > answers is missing. Looking closer, this is not true. This is

Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-15 Thread Andrei Borzenkov
On Fri, Oct 15, 2021 at 9:25 AM Klaus Wenninger wrote: > Main pain-point here is that ping-RA allows us to configure the count of > pings sent, but it > is just using the exit-value from ping that becomes negative already when one > of the > answers is missing. Use fping instead? Which is

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.1.2: new fencing configuration options

2021-10-11 Thread Andrei Borzenkov
On Mon, Oct 11, 2021 at 9:29 AM Ulrich Windl wrote: > >> Also how long would such a delay be: Long enough until the other node > >> is > >> fenced, or long enough until the other node was fenced, booted > >> (assuming it > >> does) and is running pacemaker? > > > > The delay should be on the

Re: [ClusterLabs] Question about automating cluster unfencing.

2021-08-28 Thread Andrei Borzenkov
On Fri, Aug 27, 2021 at 8:11 PM Gerry R Sommerville wrote: > > Hey all, > > From what I see in the documentation for fabric fencing, Pacemaker requires > an administrator to login to the node to manually start and unfence the node > after some failure. > >

Re: [ClusterLabs] resource start after network reconnected

2021-11-20 Thread Andrei Borzenkov
On 21.11.2021 00:39, Strahil Nikolov via Users wrote: > Nope, as long as you use SBD's integration with pacemaker. As the 2 nodes can > communicate between each other sbd won't act. I think it was an entry like > this in the /etc/sysconfig/sbd: 'SBD_PACEMAKER=yes' > That's correct except it

Re: [ClusterLabs] resource start after network reconnected

2021-11-18 Thread Andrei Borzenkov
On 18.11.2021 22:33, john tillman wrote: > > Greetings all, > > preamble: RHEL8, PCS 0.10.8, COROSYNC 3.1.0, PACEMAKER 2.0.5 > > I have a mysql resource, cloned, that is behaving the way I wanted. When > the node it is on is unplugged from the network quorum is lost and the > mysqld service

Re: [ClusterLabs] resource start after network reconnected

2021-11-19 Thread Andrei Borzenkov
On 19.11.2021 17:36, john tillman wrote: >> On 18.11.2021 22:33, john tillman wrote: >>> >>> Greetings all, >>> >>> preamble: RHEL8, PCS 0.10.8, COROSYNC 3.1.0, PACEMAKER 2.0.5 >>> >>> I have a mysql resource, cloned, that is behaving the way I wanted. >>> When >>> the node it is on is unplugged

Re: [ClusterLabs] drbd nfs slave not working

2021-11-14 Thread Andrei Borzenkov
On 14.11.2021 19:47, Neil McFadyen wrote: > I have a Ubuntu 20.04 drbd nfs pacemaker/corosync setup for 2 nodes, it > was working fine before but now I can't get the 2nd node to show as a slave > under the Clone Set. So if I do a failover both nodes show as stopped. > >

Re: [ClusterLabs] resource start after network reconnected

2021-11-19 Thread Andrei Borzenkov
On 19.11.2021 19:26, john tillman wrote: ... >>> >>> If pacemaker tries to stop resources due to out of quorum condition, you >>> could set suitable failure-timeout; this will be equivalent to using >>> "pcs >>> resource refresh". Keep in mind that pacemaker only checks for >>> failure-timeout

Re: [ClusterLabs] resource start after network reconnected

2021-11-19 Thread Andrei Borzenkov
On 19.11.2021 20:45, Ken Gaillot wrote: > On Fri, 2021-11-19 at 10:40 -0500, john tillman wrote: > > > >>> If pacemaker tries to stop resources due to out of quorum >>> condition, you >>> could set suitable failure-timeout; this will be equivalent to >>> using "pcs >>> resource refresh". Keep

Re: [ClusterLabs] Fence node when network interface goes down

2021-11-12 Thread Andrei Borzenkov
On 12.11.2021 20:31, S Rogers wrote: > Hi, I'm hoping someone will be able to point me in the right direction. > > I am configuring a two-node active/passive cluster that utilises the > PostgreSQL PAF resource agent. Each node has two NICs, therefore the > cluster is configured with two corosync

Re: [ClusterLabs] Fence node when network interface goes down

2021-11-15 Thread Andrei Borzenkov
On Mon, Nov 15, 2021 at 1:18 PM Klaus Wenninger wrote: > > > > On Mon, Nov 15, 2021 at 10:37 AM S Rogers wrote: >> >> I had thought about doing that, but the cluster is then dependent on the >> external system, and if that external system was to go down or become >> unreachable for any reason

Re: [ClusterLabs] Fence node when network interface goes down

2021-11-15 Thread Andrei Borzenkov
On Mon, Nov 15, 2021 at 3:32 PM S Rogers wrote: >> >> The only solution here - as long as fencing node on external >> connectivity loss is acceptable - is modifying ethmonitor RA to fail >> monitor operation in this case. > > I was hoping to find a way to achieve the desired outcome without

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-10-30 Thread Andrei Borzenkov
On 29.10.2021 18:37, Ken Gaillot wrote: ... To address the original question, this is the log sequence I find most relevant: > Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker- > schedulerd[776553] > (unpack_rsc_op_failure) warning: Unexpected result

Re: [ClusterLabs] Favoured node in priority-fencing-delay

2021-11-02 Thread Andrei Borzenkov
On 03.11.2021 05:21, Alex Zarifoglu wrote: > Hello all, > I have a question about the "priority-fencing-delay" parameter. > This parameter, although very helpful, doesn't handle the scenario where > nodes have equal priority. It is intended exactly for the scenario where nodes have equal

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-10-28 Thread Andrei Borzenkov
On Thu, Oct 28, 2021 at 10:30 AM Ulrich Windl wrote: > > Fencing _is_ a part of failover! > Like any blanket answer, this is mostly incorrect in this context. There are two separate objects here - the remote host itself and the pacemaker resource used to connect to and monitor the state of the remote host.

Re: [ClusterLabs] Cannot ping a secondary address apart from the server which it is assigned to (on Azure)

2021-10-28 Thread Andrei Borzenkov
On Thu, Oct 28, 2021 at 3:43 PM Paul Warwicker wrote: > > Hello, > > I originally posted this in the Azure forums first but have had no replies. > Trying here instead in case anyone has encountered it. > > I am trying to set up a High Availability Cluster in Azure using CentOS 8, > Pacemaker

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-10-29 Thread Andrei Borzenkov
On 29.10.2021 18:16, Andrei Borzenkov wrote: > On 29.10.2021 17:53, Ken Gaillot wrote: >> On Fri, 2021-10-29 at 13:59 +, Gerry R Sommerville wrote: >>> Hey Andrei, >>> >>> Thanks for your response again. The cluster nodes and remote hosts

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-10-29 Thread Andrei Borzenkov
On 29.10.2021 17:53, Ken Gaillot wrote: > On Fri, 2021-10-29 at 13:59 +, Gerry R Sommerville wrote: >> Hey Andrei, >> >> Thanks for your response again. The cluster nodes and remote hosts >> each share two networks, however there is no routing between them. I >> don't suppose there is a

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-10-28 Thread Andrei Borzenkov
On 28.10.2021 20:13, Gerry R Sommerville wrote: > > What we also found to be interesting is that if the cluster is only using a > single heartbeat ring, then srv-2 will get fenced instead, and the So as already suspected you did not actually isolate the node at all. > pacemaker-remote

Re: [ClusterLabs] Inquiry - remote node fencing issue

2021-10-27 Thread Andrei Borzenkov
On Tue, Oct 26, 2021 at 11:09 PM Janghyuk Boo wrote: > > > > Dear Community , > > > > Thank you Ken for your reply last time. > > > > I attached the log messages as requested from the last thread. > > > > I have a Pacemaker cluster with two cluster nodes with two network interfaces > each, and

Re: [ClusterLabs] Antw: [EXT] Inquiry - remote node fencing issue

2021-11-05 Thread Andrei Borzenkov
On 05.11.2021 01:20, Ken Gaillot wrote: >> >> There are two issues discussed in this thread. >> >> 1. Remote node is fenced when connection with this node is lost. For >> all >> I can tell this is intended and expected behavior. That was the >> original >> question. > > It's expected only because

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Andrei Borzenkov
On 31.10.2021 16:48, Strahil Nikolov via Users wrote: > Have you checked the options in /etc/sysconfig/pacemaker as recommended in  > https://documentation.suse.com/sle-ha/15-SP3/html/SLE-HA-all/app-ha-troubleshooting.html#sec-ha-troubleshooting-log > ? > And where exactly it explains how to

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Andrei Borzenkov
t or all of the > pacemaker processes. > > This might be the environment variable you are looking for ? > It sets log level to debug, while I need trace. > Regards, > > Le 31 octobre 2021 09:20:00 GMT+01:00, Andrei Borzenkov > a écrit : >> I think it worked in the past by pa

[ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Andrei Borzenkov
I think it worked in the past by passing a lot of -VVV when starting pacemaker. It does not seem to work now. I can call /usr/sbin/pacemakerd -..., but it does not pass options further to the children it starts. So every other daemon is started without any option and with default log

Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Andrei Borzenkov
On 31.10.2021 19:37, Strahil Nikolov wrote: > At least it's worth trying (/etc/sysconfig/pacemaker): PCMK_trace_files=* commit 85040eb19b9405464b01a7e67eb6769d2a03c611 Author: Ken Gaillot Date: Fri Jun 19 17:49:22 2020 -0500 Doc: sysconfig: remove outdated reference to wildcards in

Re: [ClusterLabs] Cannot ping a secondary address apart from the server which it is assigned to (on Azure)

2021-10-31 Thread Andrei Borzenkov
On 01.11.2021 01:56, Paul Warwicker wrote: > On 28/10/2021 14:30, Andrei Borzenkov wrote: >> For virtual IP you can (should?) use Azure >> load balancers - basically,  you create a pool of one address, Azure >> probes each node and detects which node has IP active. >>

Re: [ClusterLabs] Question: Mount Monitoring for Non-shared File-system

2021-12-07 Thread Andrei Borzenkov
On 07.12.2021 21:35, Asseel Sidique wrote: > Hi Everyone, > I'm looking for some insight on what the best way is to configure mount > monitoring for a cloned database resource. > Consider the resource model below: > * Clone Set: database_1-clone [database_1] (promotable): > * Masters: [

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-20 Thread Andrei Borzenkov
On 21.07.2021 07:28, Strahil Nikolov via Users wrote: > Hi, > consider using a 3rd system as a Q disk. What was not clear in "Quorum is a different concept and doesn't remove the need for fencing"? > Also, you can use iscsi from that node as a SBD device, so you will have > proper fencing .If

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-21 Thread Andrei Borzenkov
On Wed, Jul 21, 2021 at 11:50 AM Frank D. Engel, Jr. wrote: > > OpenVMS can do this sort of thing without a requirement for fencing (you > still need a third disk as a quorum device in a 2-node cluster), but > Linux (at least in its current form) cannot. From what I can tell the > fencing

Re: [ClusterLabs] [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-21 Thread Andrei Borzenkov
pad.com/managing_computers/2007/10/split-brain-quo.html > There are eight possible states that I tried to illustrate on the attached > sketch (S="Split Brain", "Q=Quorum, F=Fencing). > > ;-) > > Regards, > Ulrich > > > >>> Andrei Borzenko

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-22 Thread Andrei Borzenkov
On Thu, Jul 22, 2021 at 1:05 PM Jehan-Guillaume de Rorthais wrote: > To do some rewording in regard with the current topic: if Pacemaker is able to > stop its resources after a quorum lost, it will not reboot, no "death" either. > And how exactly is the remaining quorate partition supposed to

Re: [ClusterLabs] Two node cluster without fencing and no split brain?

2021-07-22 Thread Andrei Borzenkov
On Thu, Jul 22, 2021 at 12:43 PM Jehan-Guillaume de Rorthais wrote: > > On Wed, 21 Jul 2021 12:45:40 -0400 > Digimer wrote: > > > On 2021-07-21 3:26 a.m., Jehan-Guillaume de Rorthais wrote: > > > Hi, > > > > > > On Wed, 21 Jul 2021 04:28:30 + (UTC) > > > Strahil Nikolov via Users wrote: > >

Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-26 Thread Andrei Borzenkov
On Mon, Jul 26, 2021 at 4:53 PM john tillman wrote: > >> > >> Maybe explain how it should work: > >> If the two nodes cannot reach each other, but each can reach the ping > >> node, > >> which node has the quorum then? > >> > > > > Guess both - which is what is played down as 'disadvantage' in the

Re: [ClusterLabs] pcs stonith update problems

2021-07-15 Thread Andrei Borzenkov
On 16.07.2021 01:02, Digimer wrote: > Hi all, > > I've got a predicament... I want to update a stonith resource to > remove an argument. Specifically, when resource move nodes, I want to > change the stonith delay to favour the new host. This involves adding > the 'delay="x"' argument to one

Re: [ClusterLabs] What's wrong with IPsrcaddr?

2022-03-16 Thread Andrei Borzenkov
On 16.03.2022 12:24, ZZ Wave wrote: > Hello. I'm trying to implement floating IP with pacemaker but I can't > get IPsrcaddr to work correctly. I want the following: the floating > IP and its route SRC are started on node1. If node1 loses network > connectivity to node2, node1 should instantly

Re: [ClusterLabs] What's wrong with IPsrcaddr?

2022-03-17 Thread Andrei Borzenkov
There are both .23 alias and def route src. After a network > failure, there is NO default route at all on both nodes and IPsrcaddr > fails, as it requires default route. > I already explained above why IPsrcaddr was not migrated. > > ср, 16 мар. 2022 г. в 19:23, Andrei Borzenkov

Re: [ClusterLabs] constraining multiple cloned resources to the same node

2022-03-15 Thread Andrei Borzenkov
On 15.03.2022 19:35, john tillman wrote: > Hello, > > I'm trying to guarantee that all my cloned drbd resources start on the > same node and I can't figure out the syntax of the constraint to do it. > > I could nominate one of the drbd resources as a "leader" and have all the > others follow it.

Re: [ClusterLabs] constraining multiple cloned resources to the same node

2022-03-15 Thread Andrei Borzenkov
On 15.03.2022 21:53, john tillman wrote: >> On 15.03.2022 19:35, john tillman wrote: >>> Hello, >>> >>> I'm trying to guarantee that all my cloned drbd resources start on the >>> same node and I can't figure out the syntax of the constraint to do it. >>> >>> I could nominate one of the drbd

Re: [ClusterLabs] Filesystem resource agent w/ filesystem attribute 'noauto'

2022-03-09 Thread Andrei Borzenkov
On 09.03.2022 18:45, Asseel Sidique wrote: > Hi Team, > > My question is regarding the filesystem resource agent. In the filesystem > resource > agent > , there is a comment that states: > > # Do not put this

Re: [ClusterLabs] Booth ticket multi-site and quorum /Pacemaker

2022-02-24 Thread Andrei Borzenkov
On Thu, Feb 24, 2022 at 1:17 PM Jan Friesse wrote: > > On 24/02/2022 10:28, Viet Nguyen wrote: > > Hi, > > > > Thank you so so much for your help. May i ask a following up question: > > > > For the option of having one big cluster with 4 nodes without booth, then, > > if one site (having 2 nodes)

Re: [ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-16 Thread Andrei Borzenkov
On 16.02.2022 14:35, Lentes, Bernd wrote: > > > - On Feb 16, 2022, at 12:52 AM, kgaillot kgail...@redhat.com wrote: > > >>> Any idea ? >>> What is about that transition 128, which is aborted ? >> >> A transition is the set of actions that need to be taken in response to >> current

Re: [ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-16 Thread Andrei Borzenkov
On 16.02.2022 20:48, Andrei Borzenkov wrote: > > I guess the real question here is why "Transition aborted" is logged although > transition apparently continues. Transition 128 started at 20:54:30 and > completed > at 21:04:26, but there were multiple "Tr

Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Andrei Borzenkov
On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl wrote: > > On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl wrote: > > ... > > > > > > So what happens most likely is that the watchdog terminates the kdump. > > > In that case all the mess with fence_kdump won't help, right? > > > > You can configure

Re: [ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-23 Thread Andrei Borzenkov
On 23.03.2022 08:30, Balotra, Priyanka wrote: > Hi All, > > We have a scenario on SLES 12 SP3 cluster. > The scenario is explained as follows in the order of events: > > * There is a 2-node cluster (FILE-1, FILE-2) > * The cluster and the resources were up and running fine initially . >

Re: [ClusterLabs] Antw: [EXT] Re: Failed migration causing fencing loop

2022-04-03 Thread Andrei Borzenkov
On 31.03.2022 14:02, Ulrich Windl wrote: "Gao,Yan" schrieb am 31.03.2022 um 11:18 in Nachricht > <67785c2f-f875-cb16-608b-77d63d9b0...@suse.com>: >> On 2022/3/31 9:03, Ulrich Windl wrote: >>> Hi! >>> >>> I just wanted to point out one thing that hit us with SLES15 SP3: >>> Some failed live

Re: [ClusterLabs] Order constraint with a timeout?

2022-03-29 Thread Andrei Borzenkov
On 29.03.2022 15:38, john tillman wrote: >> On 29.03.2022 00:26, john tillman wrote: On Mon, 2022-03-28 at 14:03 -0400, john tillman wrote: > Greetings all, > > Is it possible to have an order constraint with a timeout? I can't > find > one but perhaps I am using the

Re: [ClusterLabs] Order constraint with a timeout?

2022-03-28 Thread Andrei Borzenkov
On 29.03.2022 00:26, john tillman wrote: >> On Mon, 2022-03-28 at 14:03 -0400, john tillman wrote: >>> Greetings all, >>> >>> Is it possible to have an order constraint with a timeout? I can't >>> find >>> one but perhaps I am using the wrong keywords in google. >>> >>> I have several Filesystem

Re: [ClusterLabs] heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-27 Thread Andrei Borzenkov
On Thu, Jan 27, 2022 at 5:10 PM Ulrich Windl wrote: > > Any better ideas anyone? > Perform online upgrade. Any reason you need to do an offline upgrade in the first place?

Re: [ClusterLabs] Antw: Antw: [EXT] Re: heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-28 Thread Andrei Borzenkov
On Fri, Jan 28, 2022 at 11:00 AM Ulrich Windl wrote: > > >>> "Ulrich Windl" schrieb am 28.01.2022 > um > 08:51 in Nachricht <61f3a06602a100047...@gwsmtp.uni-regensburg.de>: > >>>> Andrei Borzenkov schrieb am 28.01.2022 um 06:38 in >

Re: [ClusterLabs] Coming in Pacemaker 2.1.3: multiple-active=stop_unexpected

2022-04-08 Thread Andrei Borzenkov
On 08.04.2022 20:16, Ken Gaillot wrote: > Hi all, > > I'm hoping to have the first release candidate for Pacemaker 2.1.3 > available in a couple of weeks. > > One of the new features will be a new possible value for the "multiple- > active" resource meta-attribute, which specifies how the

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-08-30 Thread Andrei Borzenkov
On Wed, Aug 30, 2023 at 3:34 PM David Dolan wrote: > > Hi All, > > I'm running Pacemaker on Centos7 > Name: pcs > Version : 0.9.169 > Release : 3.el7.centos.3 > Architecture: x86_64 > > > I'm performing some cluster failover tests in a 3 node cluster. We have 3 > resources in the

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger wrote: > > > > On Mon, Sep 4, 2023 at 12:45 PM David Dolan wrote: >> >> Hi Klaus, >> >> With default quorum options I've performed the following on my 3 node cluster >> >> Bring down cluster services on one node - the running services migrate to >>

Re: [ClusterLabs] PAF / PGSQLMS on Ubuntu

2023-09-07 Thread Andrei Borzenkov
On Thu, Sep 7, 2023 at 5:01 PM lejeczek via Users wrote: > > Hi guys. > > I'm trying to set ocf_heartbeat_pgsqlms agent but I get: > ... > Failed Resource Actions: > * PGSQL-PAF-5433 stop on ubusrv3 returned 'invalid parameter' because > 'Parameter "recovery_target_timeline" MUST be set to

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-08-30 Thread Andrei Borzenkov
On 30.08.2023 19:23, David Dolan wrote: Use fencing. Quorum is not a replacement for fencing. With (reliable) fencing you can simply run pacemaker with no-quorum-policy=ignore. The practical problem is that usually the last resort that will work in all cases is SBD + suicide and SBD cannot

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
On Mon, Sep 4, 2023 at 1:45 PM David Dolan wrote: > > Hi Klaus, > > With default quorum options I've performed the following on my 3 node cluster > > Bring down cluster services on one node - the running services migrate to > another node > Wait 3 minutes > Bring down cluster services on one of

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
On Mon, Sep 4, 2023 at 2:25 PM Klaus Wenninger wrote: > > > Or go for qdevice with LMS where I would expect it to be able to really go > down to > a single node left - any of the 2 last ones - as there is still qdevice. > Sry for the confusion btw. > According to documentation, "LMS is also

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
> last_man_standing. > Then, I should set up another server with qdevice and configure that using > the LMS algorithm. > > Thanks > David > > On Mon, 4 Sept 2023 at 13:32, Klaus Wenninger wrote: >> >> >> >> On Mon, Sep 4, 2023 at

Re: [ClusterLabs] Mutually exclusive resources ?

2023-09-27 Thread Andrei Borzenkov
On Wed, Sep 27, 2023 at 3:21 PM Adam Cecile wrote: > > Hello, > > > I'm struggling to understand if it's possible to create some kind of > constraint to avoid two different resources to be running on the same host. > > Basically, I'd like to have floating IP "1" and floating IP "2" always being

Re: [ClusterLabs] Using cluster without fencing

2023-10-16 Thread Andrei Borzenkov
On Mon, Oct 16, 2023 at 9:28 AM Sergey Cherukhin wrote: > > Hello! > > I use Postgresql+Pacemaker+Corosync 3 nodes cluster with 2 Postgresql > instances in synchronous replication mode on two high performance nodes and > Pacemaker+Corosync on the third low performance node for quorum only. At

Re: [ClusterLabs] pacemaker:start-delay

2023-08-18 Thread Andrei Borzenkov
On Fri, Aug 18, 2023 at 12:13 PM Mr.R via Users wrote: > > Hi all, > > There is a problem with the start-delay of monitor during the process of > configuring and starting resources. > > For example, there is the result of resource config. > > Resource: d1 (class=ocf provider=pacemaker type=Dummy)

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-21 Thread Andrei Borzenkov
On 21.04.2022 18:26, john tillman wrote: >> Dne 20. 04. 22 v 20:21 john tillman napsal(a): On 20.04.2022 19:53, john tillman wrote: > I have a two node cluster that won't start any resources if only one > node > is booted; the pacemaker service does not start. > > Once the

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-22 Thread Andrei Borzenkov
On Fri, Apr 22, 2022 at 12:05 PM Tomas Jelinek wrote: > > As discussed in other branches of this thread, you need to figure out > why pacemaker is not starting. Even if one node is not running, corosync > and pacemaker are expected to be able to start on the other node. Well, when trying to

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-22 Thread Andrei Borzenkov
On 22.04.2022 16:01, john tillman wrote: >> On Fri, Apr 22, 2022 at 12:05 PM Tomas Jelinek >> wrote: >>> >>> As discussed in other branches of this thread, you need to figure out >>> why pacemaker is not starting. Even if one node is not running, corosync >>> and pacemaker are expected to be able

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-20 Thread Andrei Borzenkov
On 20.04.2022 19:53, john tillman wrote: > I have a two node cluster that won't start any resources if only one node > is booted; the pacemaker service does not start. > > Once the second node boots up, the first node will start pacemaker and the > resources are started. All is well. But I

Re: [ClusterLabs] More pacemaker oddities while stopping DC

2022-05-27 Thread Andrei Borzenkov
On 25.05.2022 09:47, Gao,Yan via Users wrote: > On 2022/5/25 8:10, Ulrich Windl wrote: >> Hi! >> >> We are still suffering from kernel RAM corruption on the Xen hypervisor when >> a VM or the hypervisor is doing I/O (three months since the bug report at >> SUSE, but no fix or workaround meaning

Re: [ClusterLabs] how does the VirtualDomain RA know with which options it's called ?

2022-05-12 Thread Andrei Borzenkov
On 12.05.2022 21:03, Lentes, Bernd wrote: > Hi, > > from my understanding the resource agents in > /usr/lib/ocf/resource.d/heartbeat are quite similar > to the old scripts in /etc/init.d started by init. > Init starts these scripts with "script [start|stop|reload|restart|status]". > Inside the

Re: [ClusterLabs] Question regarding the security of corosync

2022-06-21 Thread Andrei Borzenkov
On 22.06.2022 02:27, Antony Stone wrote: > On Friday 17 June 2022 at 11:39:14, Mario Freytag wrote: > >> I’d like to ask about the security of corosync. We’re using a Proxmox HA >> setup in our testing environment and need to confirm it’s compliance with >> PCI guidelines. >> >> We have a few

Re: [ClusterLabs] Required guidance w.r.t pacemaker

2022-06-08 Thread Andrei Borzenkov
and sqlserver are running as > different containers > > Regards > Sridharan > > On Wed, 8 Jun 2022 at 18:52, Andrei Borzenkov wrote: > >> On Wed, Jun 8, 2022 at 4:01 PM Sridhar K wrote: >>>

Re: [ClusterLabs] Required guidance w.r.t pacemaker

2022-06-08 Thread Andrei Borzenkov
On Wed, Jun 8, 2022 at 4:01 PM Sridhar K wrote: > > Hi Team, > > Required guidance w.r.t below problem statement > > Need to have a HA setup for SQLServer running as a docker container and HA > managed by the Pacemaker which is running as a separate docker container. > It is very unlikely to be

Re: [ClusterLabs] Required guidance w.r.t pacemaker

2022-06-08 Thread Andrei Borzenkov
On 08.06.2022 17:01, Ken Gaillot wrote: > On Wed, 2022-06-08 at 18:31 +0530, Sridhar K wrote: >> Hi Team, >> >> Required guidance w.r.t below problem statement >> >> Need to have a HA setup for SQLServer running as a docker container >> and HA managed by the Pacemaker which is running as a

Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Andrei Borzenkov
For test purposes, try using a script for ExecStop that loops until sbd is actually stopped. Note that systemd strongly recommends using a synchronous command for ExecStop (we may argue that this should be handled by the service manager itself, but well ...). > > Zoran > > - Original Message ---
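The suggestion above, a stop command that blocks until the daemon is really gone, could look roughly like the following. This is a sketch for testing only, not the sbd package's actual unit or script. The helper names, the timeout values and the use of `pgrep -x sbd` to detect the process are all assumptions.

```python
import subprocess
import time
from typing import Callable


def wait_until_stopped(is_running: Callable[[], bool],
                       timeout: float = 60.0,
                       interval: float = 0.5) -> bool:
    """Poll is_running() until it returns False or the timeout expires.

    Returns True if the process stopped in time, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def sbd_is_running() -> bool:
    """Assumption: the daemon's process name is exactly 'sbd'."""
    result = subprocess.run(["pgrep", "-x", "sbd"],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0  # pgrep exits 0 when a process matched


# Hypothetical use from a stop wrapper, after the stop signal was sent:
#     ok = wait_until_stopped(sbd_is_running, timeout=90)
#     raise SystemExit(0 if ok else 1)
```

Because the wrapper returns only after the process has exited (or the timeout is hit), the stop command is synchronous from systemd's point of view.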

Re: [ClusterLabs] fencing configuration

2022-06-07 Thread Andrei Borzenkov
On 07.06.2022 11:26, Zoran Bošnjak wrote: > > In the test scenario, the dummy resource is currently running on node1. I > have simulated node failure by unplugging the ipmi AND host network > interfaces from node1. The result was that node1 gets rebooted (by watchdog), > but the rest of the
