Re: [ClusterLabs] Pacemaker problems with pingd

2021-08-05 Thread Klaus Wenninger
On Wed, Aug 4, 2021 at 5:30 PM Janusz Jaskiewicz < janusz.jaskiew...@gmail.com> wrote: > Hello. > > Please forgive the length of this email but I wanted to provide as many > details as possible. > > I'm trying to set up a cluster of two nodes for my service. > I have a problem with a scenario

Re: [ClusterLabs] Sub-clusters / super-clusters?

2021-08-03 Thread Klaus Wenninger
On Tue, Aug 3, 2021 at 10:41 AM Antony Stone wrote: > On Tuesday 11 May 2021 at 12:56:01, Strahil Nikolov wrote: > > > Here is the example I had promised: > > > > pcs node attribute server1 city=LA > > pcs node attribute server2 city=NY > > > > # Don't run on any node that is not in LA > > pcs

Re: [ClusterLabs] Antw: Re: [EXT] Re: Two node cluster without fencing and no split brain?

2021-07-23 Thread Klaus Wenninger
On Fri, Jul 23, 2021 at 8:55 AM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > >>> "john tillman" wrote on 22.07.2021 at 16:48 in > message > <1175ffcec0033015e13d11d7821d5acb.squir...@mail.panix.com>: > > There was a lot of discussion on this topic which might have overshadowed

Re: [ClusterLabs] Antw: [EXT] Re: unexpected fenced node and promotion of the new master PAF ‑ postgres

2021-07-14 Thread Klaus Wenninger
oring unknown cluster health > > Jul 13 20:42:15 ltaoperdbs02 sbd[185357]: notice: inquisitor_child: > > Servant cluster is healthy (age: 0) > > Jul 13 20:42:15 ltaoperdbs02 sbd[185357]: notice: watchdog_init: Using > > watchdog device '/dev/watchdog' > >

Re: [ClusterLabs] unexpected fenced node and promotion of the new master PAF - postgres

2021-07-14 Thread Klaus Wenninger
So at least for sbd there don't seem to be systematic issues switching to rt-scheduling as we would see it moaning in the logs above. > this is happening to all 3 nodes, any thoughts? > > Thanks for helping, have a good day > > Damiano > > > Il giorno mer 14 lug 2021 all

Re: [ClusterLabs] QDevice vs 3rd host for majority node quorum

2021-07-14 Thread Klaus Wenninger
On Tue, Jul 13, 2021 at 9:56 PM Strahil Nikolov wrote: > In some cases the third location has a single IP and it makes sense to use > it as QDevice. If it has multiple network connections to that location - > use a full-blown node. If you are intending to use watchdog-fencing via sbd with a

Re: [ClusterLabs] unexpected fenced node and promotion of the new master PAF - postgres

2021-07-14 Thread Klaus Wenninger
; > Jul 13 00:40:47 ltaoperdbs04 stonith-ng[77928]: notice: Operation > 'reboot' > > targeting ltaoperdbs02 on ltaoperdbs03 for > crmd.228700@ltaoperdbs03.f5d882d5: > > OK > > > > i really appreciate the help and what you think about it. > > > > PS

Re: [ClusterLabs] unexpected fenced node and promotion of the new master PAF - postgres

2021-07-13 Thread Klaus Wenninger
On Tue, Jul 13, 2021 at 1:43 PM damiano giuliani < damianogiulian...@gmail.com> wrote: > Hi guys, > I'm back with some PAF postgres cluster problems. > Tonight the cluster fenced the master node and promoted the PAF resource to > a new node. > Everything went fine, except that I really don't know why. >

Re: [ClusterLabs] Antw: [EXT] VIP monitor Timed Out

2021-07-05 Thread Klaus Wenninger
Using DHCP? Maybe a glitch/issue during renewal ... but elaborate monitoring as suggested should show that ... On Mon, Jul 5, 2021 at 9:03 AM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > Hi! > > See "ip_served" and "find_interface" (essentially "$IP2UTIL -o -f $FAMILY > addr >

Re: [ClusterLabs] Antw: [EXT] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-16 Thread Klaus Wenninger
On Wed, Jun 16, 2021 at 11:26 AM Klaus Wenninger wrote: > > > On Wed, Jun 16, 2021 at 10:47 AM Roger Zhou wrote: > >> >> On 6/16/21 3:03 PM, Andrei Borzenkov wrote: >> >> > >> >>> >> >>> We thought that access to storage wa

Re: [ClusterLabs] Antw: [EXT] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-16 Thread Klaus Wenninger
On Wed, Jun 16, 2021 at 10:47 AM Roger Zhou wrote: > > On 6/16/21 3:03 PM, Andrei Borzenkov wrote: > > > > >>> > >>> We thought that access to storage was restored, but one step was > >>> missing so devices appeared empty. > >>> > >>> At this point I tried to restart the pacemaker. But as soon

Re: [ClusterLabs] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-16 Thread Klaus Wenninger
On Tue, Jun 15, 2021 at 10:41 PM Strahil Nikolov wrote: > Maybe you can try: > > while true ; do echo '0' > /proc/sys/kernel/nmi_watchdog ; sleep 1 ; done > > and in another shell stop pacemaker and sbd. > > I guess the only way to easily reproduce is with sbd over iscsi. > > Best Regards, >

[ClusterLabs] sbd v1.5.0

2021-06-14 Thread Klaus Wenninger
Hi sbd - developers & users! Thanks to everybody for contributing to tests and further development. Changes since 1.4.2 - default to resource-syncing with pacemaker in spec-file and configure.ac This default has to match between sbd and pacemaker and thus qualifies this release for a

Re: [ClusterLabs] Cluster Stopped, No Messages?

2021-06-07 Thread Klaus Wenninger
- From: Users On Behalf Of Klaus Wenninger Sent: Monday, May 31, 2021 12:54 AM To: users@clusterlabs.org Subject: Re: [ClusterLabs] Cluster Stopped, No Messages? On 5/29/21 12:21 AM, Strahil Nikolov wrote: I agree -> fencing is mandatory. Agreed that with proper fencing setup the clus

Re: [ClusterLabs] Pacemaker not issuing start command intermittently

2021-05-31 Thread Klaus Wenninger
On 5/29/21 12:05 AM, Strahil Nikolov wrote: Most RA scripts are written in bash. Usually you can change the shebang to '#!/usr/bin/bash -x' or you can set trace_ra=1 via 'pcs resource update RESOURCE trace_ra=1 trace_file=/somepath'. If you don't define trace_file, it should create them in

Re: [ClusterLabs] Cluster Stopped, No Messages?

2021-05-30 Thread Klaus Wenninger
On 5/29/21 12:21 AM, Strahil Nikolov wrote: I agree -> fencing is mandatory. Agreed that with proper fencing setup the cluster wouldn't have run into that state. But still it might be interesting to find out what has happened. Not seeing anything in the log snippet either. Assuming you are

Re: [ClusterLabs] 32 nodes pacemaker cluster setup issue

2021-05-19 Thread Klaus Wenninger
On 5/19/21 1:45 PM, Klaus Wenninger wrote: On 5/19/21 12:54 PM, S Sathish S wrote: Hi Klaus, pacemaker/corosync we generated our own build from clusterlab source code. [root@node1 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.4 (Maipo) [root@node1 ~]# uname -r

Re: [ClusterLabs] 32 nodes pacemaker cluster setup issue

2021-05-19 Thread Klaus Wenninger
On 5/19/21 12:54 PM, S Sathish S wrote: Hi Klaus, pacemaker/corosync we generated our own build from clusterlab source code. [root@node1 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.4 (Maipo) [root@node1 ~]# uname -r 3.10.0-693.82.1.el7.x86_64 [root@node1 ~]#

Re: [ClusterLabs] 32 nodes pacemaker cluster setup issue

2021-05-19 Thread Klaus Wenninger
clusters, so I can't offer direct advice. Perhaps this thread is helpful? https://lists.clusterlabs.org/pipermail/users/2016-August/010999.html You might also find useful advice in 'man 5 corosync.conf'. -- Klaus Wenninger Senior Software Engineer, EMEA ENG Base Operating Systems Red Hat kwenn

Re: [ClusterLabs] DRBD + VDO HowTo?

2021-05-16 Thread Klaus Wenninger
Did you try VDO in sync-mode for the case the flush-fua stuff isn't working through the layers? Did you check that VDO-service is disabled and solely under pacemaker-control and that the dependencies are set correctly? Klaus On 5/17/21 6:17 AM, Eric Robinson wrote: Yes, DRBD is working fine.

Re: [ClusterLabs] bit of wizardry bit of trickery needed.

2021-05-11 Thread Klaus Wenninger
On 5/10/21 7:16 PM, lejeczek wrote: On 10/05/2021 17:04, Andrei Borzenkov wrote: On 10.05.2021 16:48, lejeczek wrote: Hi guys Before I begin my adventure with this I thought I would ask experts if something like below is possible. resourceA if started on nodeA, then nodes B & C start

Re: [ClusterLabs] Sub-clusters / super-clusters?

2021-05-10 Thread Klaus Wenninger
On 5/10/21 2:32 PM, Antony Stone wrote: Hi. I'm using corosync 3.0.1 and pacemaker 2.0.1, currently in the following way: I have two separate clusters of three machines each, one in a data centre in city A, and one in a data centre in city B. Several of the resources being managed by these

Re: [ClusterLabs] Antw: [EXT] Re: VirtualDomain & "deeper" monitors - what/how?

2021-05-03 Thread Klaus Wenninger
On 5/3/21 8:57 AM, Ulrich Windl wrote: Andrei Borzenkov wrote on 30.04.2021 at 18:24 in message : On 30.04.2021 17:57, Ken Gaillot wrote: On Fri, 2021‑04‑30 at 11:00 +0100, lejeczek wrote: Hi guys I'd like to ask around for thoughts & suggestions on any semi/official ways to monitor

Re: [ClusterLabs] Preventing multiple resources from moving at the same time.

2021-04-30 Thread Klaus Wenninger
On 4/30/21 4:04 PM, Matthew Schumacher wrote: On 4/21/21 11:04 AM, Matthew Schumacher wrote: On 4/21/21 10:21 AM, Andrei Borzenkov wrote: If I set the stickiness to 100 then it's a race condition, many times we get the storage layer migrated without VirtualDomain noticing, but if the

Re: [ClusterLabs] Stopping the last node with pcs

2021-04-28 Thread Klaus Wenninger
On 4/28/21 4:10 PM, Ken Gaillot wrote: On Tue, 2021-04-27 at 23:23 -0400, Digimer wrote: Hi all, I noticed something odd. [root@an-a02n01 ~]# pcs cluster status Cluster Status: Cluster Summary: * Stack: corosync * Current DC: an-a02n01 (version 2.0.4-6.el8_3.2-2deceaa3ae) -

Re: [ClusterLabs] Preventing multiple resources from moving at the same time.

2021-04-21 Thread Klaus Wenninger
On 4/20/21 11:43 PM, Ken Gaillot wrote: On Tue, 2021-04-20 at 13:43 -0700, Matthew Schumacher wrote: I have a resource that more or less makes or breaks everything (storage). When I move it, it only takes a few seconds and everything keeps working due to timeouts, however, if another resource

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Node fenced for unknown reason

2021-04-16 Thread Klaus Wenninger
On 4/16/21 8:09 AM, Steffen Vinther Sørensen wrote: On Fri, Apr 16, 2021 at 6:56 AM Andrei Borzenkov wrote: On 15.04.2021 23:09, Steffen Vinther Sørensen wrote: On Thu, Apr 15, 2021 at 3:39 PM Klaus Wenninger wrote: On 4/15/21 3:26 PM, Ulrich Windl wrote: Steffen Vinther Sørensen wrote

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Node fenced for unknown reason

2021-04-15 Thread Klaus Wenninger
On 4/15/21 3:26 PM, Ulrich Windl wrote: Steffen Vinther Sørensen wrote on 15.04.2021 at 14:56 in message : On Thu, Apr 15, 2021 at 2:29 PM Ulrich Windl wrote: Steffen Vinther Sørensen wrote on 15.04.2021 at 13:10 in message : Hi there, In this 3 node cluster, node03 been offline

Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

2021-04-12 Thread Klaus Wenninger
On 4/9/21 5:13 PM, Klaus Wenninger wrote: On 4/9/21 4:04 PM, Klaus Wenninger wrote: On 4/9/21 3:45 PM, Klaus Wenninger wrote: On 4/9/21 3:36 PM, Klaus Wenninger wrote: On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote: Hi Klaus, Thanks for your comment. Hmm ... is that with selinux

Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

2021-04-09 Thread Klaus Wenninger
On 4/9/21 4:04 PM, Klaus Wenninger wrote: On 4/9/21 3:45 PM, Klaus Wenninger wrote: On 4/9/21 3:36 PM, Klaus Wenninger wrote: On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote: Hi Klaus, Thanks for your comment. Hmm ... is that with selinux enabled? Respectively do you see any related

Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

2021-04-09 Thread Klaus Wenninger
On 4/9/21 3:45 PM, Klaus Wenninger wrote: On 4/9/21 3:36 PM, Klaus Wenninger wrote: On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote: Hi Klaus, Thanks for your comment. Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Selinux is not enabled. Isn't

Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

2021-04-09 Thread Klaus Wenninger
On 4/9/21 3:36 PM, Klaus Wenninger wrote: On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote: Hi Klaus, Thanks for your comment. Hmm ... is that with selinux enabled? Respectively do you see any related avc messages? Selinux is not enabled. Isn't crm_mon caused by not returning a response

Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

2021-04-09 Thread Klaus Wenninger
operation. Best Regards, Hideo Yamauchi. - Original Message - From: Klaus Wenninger To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed Cc: Date: 2021/4/9, Fri 21:12 Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control

Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

2021-04-09 Thread Klaus Wenninger
On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote: Hi Ken, Hi All, In the pgsql resource, crm_mon is executed in the process of demote and stop, and the result is processed. However, pacemaker included in RHEL8.4beta fails to execute this crm_mon.  - The problem also occurs on github

Re: [ClusterLabs] how to setup single node cluster

2021-04-08 Thread Klaus Wenninger
On 4/8/21 8:16 AM, Reid Wahl wrote: On Wed, Apr 7, 2021 at 9:46 PM Strahil Nikolov wrote: I always thought that the setup is the same, just the node count is only one. I guess you need pcs, corosync + pacemaker. If RH is going to support it,

Re: [ClusterLabs] Antw: [EXT] staggered resource start/stop

2021-04-06 Thread Klaus Wenninger
On 3/29/21 12:47 PM, Reid Wahl wrote: On Mon, Mar 29, 2021 at 3:35 AM Ulrich Windl wrote: >>> d tbsky <tbs...@gmail.com> wrote on 29.03.2021 at 04:01 in message <t_moml5ecx8ay3gtcfgmofkd...@mail.gmail.com>: > Hi:

Re: [ClusterLabs] Antw: [EXT] Re: Live migration possible with KSM ?

2021-04-06 Thread Klaus Wenninger
On 4/6/21 9:34 AM, Ulrich Windl wrote: "Lentes, Bernd" wrote on 31.03.2021 at 15:58 in message <537587707.120787942.1617199127012.javamail.zim...@helmholtz-muenchen.de>: - On Mar 30, 2021, at 7:54 PM, hunter86 bg hunter86...@yahoo.com wrote: Keep in mind that KSM is highly cpu

Re: [ClusterLabs] staggered resource start/stop

2021-04-06 Thread Klaus Wenninger
On 4/6/21 10:23 AM, d tbsky wrote: Klaus Wenninger Guess that heavily depends on what you are running inside your VMs. If the services inside don't need each other or anything provided by the other cluster-resources (or other way round) or everything is synchronizing independently from

Re: [ClusterLabs] Live migration possible with KSM ?

2021-04-06 Thread Klaus Wenninger
On 3/30/21 7:54 PM, Strahil Nikolov wrote: Keep in mind that KSM is highly cpu intensive and is most suitable for same type of VMs, so similar memory pages will be merged until a change happens (and that change is allocated elsewhere). In oVirt migration is possible with KSM actively working,

Re: [ClusterLabs] staggered resource start/stop

2021-04-05 Thread Klaus Wenninger
On 3/31/21 11:11 AM, d tbsky wrote: Klaus Wenninger In this case it might be useful not to wait some defined time hoping startup of the VM would have gone far enough that the IO load has already decayed enough. What about a resource that checks for something running inside the VM

Re: [ClusterLabs] staggered resource start/stop

2021-03-29 Thread Klaus Wenninger
On 3/29/21 8:44 AM, d tbsky wrote: Reid Wahl An order constraint set with kind=Serialize (which is mentioned in the first reply to the thread you linked) seems like the most logical option to me. You could serialize a set of resource sets, where each inner set contains a VirtualDomain

Re: [ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

2021-03-29 Thread Klaus Wenninger
h a reverse proxy is up to you. The Clusters from Scratch example shows how to do it with a web server, to present the concepts, and you can tailor that to any service that needs to be clustered. On Thursday, March 25, 2021, 05:20:33 PM GMT+4:30, Klaus Wenninger < kwenn...@redhat.com&

Re: [ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

2021-03-25 Thread Klaus Wenninger
ng" IP address, which means the cluster can sometimes run it on the first node and sometimes on the second node. This IP address is the one that users will use to contact the service. That way, users always have a single address that they use, no matter which node is providing the service.

Re: [ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

2021-03-23 Thread Klaus Wenninger
2021, 01:03:39 PM GMT+4:30, Klaus Wenninger wrote: On 3/23/21 9:13 AM, Jason Long wrote: Thank you. But:  https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch06.html ? The floating IP address is:  https://www.clusterlabs.org/pacemaker/doc/en-US/Pacem

Re: [ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

2021-03-23 Thread Klaus Wenninger
On 3/23/21 9:13 AM, Jason Long wrote: Thank you. But:  https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch06.html ? The floating IP address is:  https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_add_a_resource.html

Re: [ClusterLabs] The proxy server received an invalid response from an upstream server.

2021-03-22 Thread Klaus Wenninger
.d/status.conf: No such file or directory If you're using Ubuntu, I believe it's in a different location -- somewhere in /etc/apache2 if memory serves. On Tuesday, March 16, 2021, 07:20:32 PM GMT+3:30, Klaus Wenninger wrote: On 3/16/21 3:18 PM, Ken Gaillot wrote: On Tue, 202

Re: [ClusterLabs] Q: Is there any plan for pcs to support corosync-notifyd?

2021-03-18 Thread Klaus Wenninger
On 3/18/21 9:29 AM, 井上和徳 wrote: On Tue, Mar 16, 2021 at 10:23 PM Jehan-Guillaume de Rorthais wrote: On Tue, 16 Mar 2021, 09:58 井上和徳, wrote: Hi! Cluster (corosync and pacemaker) can be started with pcs, but corosync-notifyd needs to be started separately with systemctl, which is not easy to

Re: [ClusterLabs] The proxy server received an invalid response from an upstream server.

2021-03-16 Thread Klaus Wenninger
On 3/16/21 3:18 PM, Ken Gaillot wrote: On Tue, 2021-03-16 at 09:42 +, Jason Long wrote: Hello, I want to launch a Clustering for my Apache Web Server. I have three servers: 1- Main server that acts as a Reverse Proxy 2- The secondary server that when my main server stopped, work as a

Re: [ClusterLabs] maximum token value (knet)

2021-03-15 Thread Klaus Wenninger
On 3/13/21 12:55 AM, Strahil Nikolov wrote: I will try to get into the details on monday, when I have access to the cluster again. I guess the /var/log/cluster/corosync.log and /etc/corosync/corosync.conf are the most interesting. So far, I have 6 node cluster with separate VLANs for HANA

Re: [ClusterLabs] Q: callback hooks for sbd?

2021-03-12 Thread Klaus Wenninger
, Klaus Wenninger wrote: On 3/11/21 12:30 PM, Ulrich Windl wrote: Hi! I wonder: Is it possible to register some callback to sbd that is called whenever a fencing operation is to be executed? I would like to run some command on the node that is going to be fenced. Don't know of anything

Re: [ClusterLabs] Q: callback hooks for sbd?

2021-03-11 Thread Klaus Wenninger
On 3/11/21 12:30 PM, Ulrich Windl wrote: Hi! I wonder: Is it possible to register some callback to sbd that is called whenever a fencing operation is to be executed? I would like to run some command on the node that is going to be fenced. Don't know of anything that exists you could use for

Re: [ClusterLabs] Antw: Re: Antw: RE: Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway

2021-03-05 Thread Klaus Wenninger
On 3/5/21 6:04 PM, Digimer wrote: > On 2021-03-05 2:14 a.m., Ulrich Windl wrote: How would the fencing be confirmed? I don't know. >>> It's part of the FenceAgentAPI. The cluster invokes the fence agent, >>> passes in variable=value pairs on STDIN, and waits for the agent to >>> exit. It

Re: [ClusterLabs] Antw: Re: Antw: RE: Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway

2021-03-05 Thread Klaus Wenninger
On 3/5/21 8:14 AM, Ulrich Windl wrote: Digimer wrote on 04.03.2021 at 06:35 in message > : >> On 2021-03-03 1:56 a.m., Ulrich Windl wrote: >> Eric Robinson wrote on 02.03.2021 at 19:26 > in >>> message >>> >> > m> > -Original Message- > From: Users On Behalf Of

Re: [ClusterLabs] Antw: [EXT] "Error: unable to fence '001db02a'" but It got fenced anyway

2021-03-02 Thread Klaus Wenninger
On 3/1/21 8:48 AM, Ulrich Windl wrote: Eric Robinson wrote on 28.02.2021 at 16:34 in > message > > >> I just configured STONITH in Azure for the first time. My initial test went >> fine. >> >> On node 001db02a, the command... >> >> # pcs stonith fence 001db02b >> >> ...produced

Re: [ClusterLabs] OCF resource agent is not starting up

2021-03-01 Thread Klaus Wenninger
On 2/26/21 12:39 PM, Niveditha U wrote: > Hi Team, > > We have an XML database called xdb which we want to use as a pcs > resource. Hence, we created a custom resource agent script for the > same. We are able to start/stop the xdb resource using debug-start and > debug-stop but pacemaker is unable

Re: [ClusterLabs] CentOS Stream - rpm packages break update

2021-03-01 Thread Klaus Wenninger
On 2/28/21 12:32 PM, lejeczek wrote: > Hi guys, > > in case a developer(s) who might have something to do with RPM builds > for Centos read this: Looks as if it sees just the pacemaker-packages in AppStream to be updated and not those in HighAvailability. Sure the HighAvailability-repo is active?

Re: [ClusterLabs] alert is not executed

2021-02-16 Thread Klaus Wenninger
On 2/15/21 10:24 PM, Lentes, Bernd wrote: > > - On Feb 15, 2021, at 9:00 PM, kgaillot kgail...@redhat.com wrote: > >> On Mon, 2021-02-15 at 20:47 +0100, Lentes, Bernd wrote: >>> - On Feb 15, 2021, at 4:53 PM, kgaillot kgail...@redhat.com >>> wrote: >>> I'd check for SELinux denials.

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

2021-02-09 Thread Klaus Wenninger
On 2/9/21 4:32 PM, Ulrich Windl wrote: >>>> Klaus Wenninger wrote on 09.02.2021 at 16:12 in > message : >> On 2/9/21 3:10 PM, Ulrich Windl wrote: >>>>>> "Ulrich Windl" wrote on > 09.02.2021 >>> at >>> 15:00 in

Re: [ClusterLabs] Antw: [EXT] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

2021-02-09 Thread Klaus Wenninger
On 2/9/21 3:10 PM, Ulrich Windl wrote: "Ulrich Windl" wrote on 09.02.2021 > at > 15:00 in message <6022956302a10003e...@gwsmtp.uni-regensburg.de>: >> Hi! >> >> I had made a mistake, leading to node h16 to be fenced. After recovery (h16 >> had re‑joined the cluster) I had stopped the

Re: [ClusterLabs] Stopping all nodes causes servers to migrate

2021-01-25 Thread Klaus Wenninger
On 1/25/21 9:51 AM, Jehan-Guillaume de Rorthais wrote: > Hi Digimer, > > On Sun, 24 Jan 2021 15:31:22 -0500 > Digimer wrote: > [...] >> I had a test server (srv01-test) running on node 1 (el8-a01n01), and on >> node 2 (el8-a01n02) I ran 'pcs cluster stop --all'. >> >> It appears like pacemaker

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: A bug? (SLES15 SP2 with "crm resource refresh")

2021-01-13 Thread Klaus Wenninger
On 1/12/21 8:23 AM, Ulrich Windl wrote: Ken Gaillot wrote on 11.01.2021 at 16:45 in message > <3e78312a1c92cde0a1cdd82c2fed33a679f63770.ca...@redhat.com>: > > ... from growing indefinitely). (Plus some timing issues to consider.) >>> Wouldn't a temporary local status variable

Re: [ClusterLabs] Antw: [EXT] Re: Configuring millisecond timestamps in pacemaker.log.

2021-01-12 Thread Klaus Wenninger
On 1/12/21 8:42 AM, Ulrich Windl wrote: Ken Gaillot wrote on 11.01.2021 at 21:16 in > message > : >> Pacemaker doesn't currently support it, sorry. It should be pretty easy >> to add though (when built with libqb 2), so hopefully we can get it in >> 2.1.0. > Some time ago I wrote my own

Re: [ClusterLabs] Pending Fencing Actions shown in pcs status

2021-01-12 Thread Klaus Wenninger
the > topology is not set. > > > Best Regards, > Hideo Yamauchi. > > > ----- Original Message - >> From: Klaus Wenninger >> To: Steffen Vinther Sørensen >> Cc: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to >> open-source clus

Re: [ClusterLabs] Pending Fencing Actions shown in pcs status

2021-01-12 Thread Klaus Wenninger
? > > Best Regards, > Hideo Yamauchi. > > > - Original Message - > > From: Klaus Wenninger <kwenn...@redhat.com> > > To: Steffen Vinther Sørensen <svint...@gmail.com>; Cluster Labs - All topics relat

Re: [ClusterLabs] Pending Fencing Actions shown in pcs status

2021-01-07 Thread Klaus Wenninger
2157ms > > Failed Fencing Actions: > * reboot of kvm03-node02.avigol-gcs.dk failed: delegate=, > client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk, > last-failed='Thu Jan 7 12:48:18 2021' > > > # from /etc/hosts on all 3 nodes: > > 172.31.0.31kvm03-node01

Re: [ClusterLabs] Pending Fencing Actions shown in pcs status

2021-01-07 Thread Klaus Wenninger
Hi Steffen, If you just see the leftover pending-action on one node it would be interesting if restarting of pacemaker on one of the other nodes does sync it to all of the nodes. Regards, Klaus On 1/7/21 9:54 AM, renayama19661...@ybb.ne.jp wrote: > Hi Steffen, > >> Unfortunately not sure about

Re: [ClusterLabs] Running shell command on remote node via corosync messaging infrastructure

2021-01-04 Thread Klaus Wenninger
On 1/4/21 1:50 PM, Christine Caulfield wrote: > > > On 04/01/2021 09:21, Klaus Wenninger wrote: >> On 1/4/21 8:36 AM, Christine Caulfield wrote: >>> >>> >>> On 18/12/2020 20:41, Andrei Borzenkov wrote: >>>> 18.12.2020 21:54, Ken Gaillot wrote:

Re: [ClusterLabs] Fencing explanation

2021-01-04 Thread Klaus Wenninger
On 12/29/20 12:38 AM, Reid Wahl wrote: > Hi, Ignazio. You can set either the delay in one of two ways: > - Using the `delay` attribute, whose value is a bare integer > (representing the number of seconds). This is implemented within the > fencing library (/usr/share/fence/fencing.py). > - Using

Re: [ClusterLabs] Running shell command on remote node via corosync messaging infrastructure

2021-01-04 Thread Klaus Wenninger
On 1/4/21 8:36 AM, Christine Caulfield wrote: > > > On 18/12/2020 20:41, Andrei Borzenkov wrote: >> 18.12.2020 21:54, Ken Gaillot wrote: >>> On Fri, 2020-12-18 at 17:51 +, Animesh Pande wrote: Hello, Is there a tool that would allow for commands to be run on remote nodes in

Re: [ClusterLabs] Running shell command on remote node via corosync messaging infrastructure

2020-12-21 Thread Klaus Wenninger
On 12/18/20 9:41 PM, Andrei Borzenkov wrote: > 18.12.2020 21:54, Ken Gaillot wrote: >> On Fri, 2020-12-18 at 17:51 +, Animesh Pande wrote: >>> Hello, >>> >>> Is there a tool that would allow for commands to be run on remote >>> nodes in the cluster through the corosync messaging layer? I have

Re: [ClusterLabs] query on pacemaker monitor timeout

2020-12-14 Thread Klaus Wenninger
On 12/11/20 7:49 PM, Ken Gaillot wrote: > On Thu, 2020-12-10 at 17:53 +, S Sathish S wrote: >> Hi Team, >> >> Problem Statement: >> >> pcs resource monitor got timed out after 12ms and tried to >> recover resource(application) by stopping and starting first >> occurrence itself. Due to

Re: [ClusterLabs] Calling crm executables via effective uid

2020-12-14 Thread Klaus Wenninger
On 12/11/20 10:20 PM, Alex Zarifoglu wrote: > Hello, > > I have a question regarding running crm commands with the effective uid. > > I am trying to create a tool to manage pacemaker resources for > multiple users. For security reasons, these users will only be able to > create/delete/manage

Re: [ClusterLabs] Antw: [EXT] Recoveing from node failure

2020-12-14 Thread Klaus Wenninger
On 12/14/20 11:48 AM, Gabriele Bulfon wrote: > > Thanks! > > I tried the first option, by adding pcmk_delay_base to the two stonith > primitives. > The first has 1 second, the second has 5 seconds. > It didn't work :( they still killed each other :( > Anything wrong with the way I did it? Maybe 4s difference

Re: [ClusterLabs] Antw: [EXT] sbd v1.4.2

2020-12-08 Thread Klaus Wenninger
On 12/8/20 11:51 AM, Klaus Wenninger wrote: > On 12/3/20 9:29 AM, Reid Wahl wrote: >> On Thu, Dec 3, 2020 at 12:03 AM Ulrich Windl >> wrote: >>> Hi! >>> >>> See comments inline... >>> >>>>>> Klaus Wenninger wrote on 02

Re: [ClusterLabs] Antw: [EXT] sbd v1.4.2

2020-12-08 Thread Klaus Wenninger
On 12/3/20 9:29 AM, Reid Wahl wrote: > On Thu, Dec 3, 2020 at 12:03 AM Ulrich Windl > wrote: >> Hi! >> >> See comments inline... >> >>>>> Klaus Wenninger wrote on 02.12.2020 at 22:05 in >> message <1b29fa92-b1b7-2315-fbcf-078

[ClusterLabs] sbd v1.4.2

2020-12-02 Thread Klaus Wenninger
Hi sbd - developers & users! Thanks to everybody for contributing to tests and further development. Improvements in build/CI-friendliness and added robustness against misconfiguration justify labeling the repo v1.4.2. I tried to quickly summarize the changes in the repo since it was labeled

Re: [ClusterLabs] egards, Q: what does " corosync-cfgtool -s" check actually?

2020-11-20 Thread Klaus Wenninger
On 11/20/20 8:22 AM, Ulrich Windl wrote: > Hi! > > Having a problem, I wonder what "corosync-cfgtool -s" actually checks: > I see on all nodes and all rings "status = ring 0 active with no faults", > but the nodes seem unable to communicate somehow. > Is there a kind of "corosync node ping"

Re: [ClusterLabs] issue

2020-11-17 Thread Klaus Wenninger
On 11/17/20 8:23 AM, Guy Przytula wrote: > > sorry for coming back and thanks for the answers > > but how do you make a relation between your resource(s) and the script ? > Not sure if this really was your question but you might have a look at /usr/lib/ocf/resource.d/heartbeat/... Klaus > > a

Re: [ClusterLabs] Tuchanka

2020-10-05 Thread Klaus Wenninger
On 10/5/20 1:33 PM, Олег Самойлов wrote: > >> On 2 Oct 2020, at 17:19, Klaus Wenninger wrote: >> >>>> My English is poor, I'll try to find other words. My primary and main task >>>> was to create a prototype for an automatic deploy system. So I

Re: [ClusterLabs] Tuchanka

2020-10-02 Thread Klaus Wenninger
On 10/2/20 3:15 PM, Jehan-Guillaume de Rorthais wrote: > On Fri, 2 Oct 2020 15:18:18 +0300 > Олег Самойлов wrote: > >>> On 29 Sep 2020, at 11:34, Jehan-Guillaume de Rorthais >>> wrote: >>> >>> >>> Vagrant uses virtualbox by default, which supports softdog, but it supports >>> many other

Re: [ClusterLabs] Pacemaker not starting

2020-09-29 Thread Klaus Wenninger
ot that I would expect too much insight from it but ...: CIB_file=/var/lib/pacemaker/cib/cib.xml crm_mon Klaus > > On Fri, 25 Sep 2020, 8:46 pm Klaus Wenninger, <kwenn...@redhat.com> wrote: > > On 9/24/20 2:53 PM, Ambadas Kawle wrote: >> Hello Team

Re: [ClusterLabs] Pacemaker not starting

2020-09-25 Thread Klaus Wenninger
On 9/24/20 2:53 PM, Ambadas Kawle wrote: > Hello Team  > > > Please help me to solve this problem You have to provide us with some information about your cluster setup so that we can help. That is why we had asked you for the content of /etc/cluster/cluster.conf and the output of 'pcs config'.

Re: [ClusterLabs] Determine a resource's current host in the CIB

2020-09-24 Thread Klaus Wenninger
On 9/24/20 9:19 AM, Reid Wahl wrote: > **Directly via the CIB**, I don't see a more obvious way than looking > for the most recent (perhaps by last-rc-change) successful > (rc-code="0" or rc-code="8") monitor operation. That might be > error-prone. I haven't looked into exactly how crm_simulate

Re: [ClusterLabs] SBD fencing not working on my two-node cluster

2020-09-22 Thread Klaus Wenninger
On 9/22/20 8:19 AM, Andrei Borzenkov wrote: > 22.09.2020 02:06, Philippe M Stedman wrote: >> Hi Strahil, >> >> Here is the output of those commands I appreciate the help! >> >> # crm config show >> node 1: ceha03 \ >> attributes ethmonitor-ens192=1 >> node 2: ceha04 \ >>

Re: [ClusterLabs] Two-node Pacemaker cluster with "fence_aws" fence agent

2020-09-07 Thread Klaus Wenninger
On 9/4/20 11:24 PM, Digimer wrote: > On 2020-09-04 5:15 p.m., Philippe M Stedman wrote: >> Hi ClusterLabs development, >> >> I am in the process of deploying a two-node cluster on AWS and using the >> fence_aws fence agent for fencing. I was reading through the following >> article about common

Re: [ClusterLabs] Tuchanka

2020-09-03 Thread Klaus Wenninger
On 9/3/20 5:58 PM, Ken Gaillot wrote: > On Wed, 2020-09-02 at 20:33 +0300, Олег Самойлов wrote: >> Hi all. >> >> I have developed a test bed to test high available clusters based on >> Pacemaker and PostgreSQL. The combination of words "test bed" was >> given to me by a dictionary. For an russian

Re: [ClusterLabs] Antw: [EXT] Re: Coming in Pacemaker 2.0.5: better start-up/shutdown coordination with sbd

2020-08-24 Thread Klaus Wenninger
On 8/24/20 8:04 AM, Ulrich Windl wrote: Vladislav Bogdanov wrote on 21.08.2020 at 20:55 > in > message : >> Hi, >> >> btw, is sbd is now able to handle cib diffs internally? >> Last time I tried to use it with frequently changing CIB, it became a >> CPU hog - it requested full CIB copy

Re: [ClusterLabs] Coming in Pacemaker 2.0.5: better start-up/shutdown coordination with sbd

2020-08-23 Thread Klaus Wenninger
On 8/21/20 8:55 PM, Vladislav Bogdanov wrote: > Hi, > > btw, is sbd is now able to handle cib diffs internally? > Last time I tried to use it with frequently changing CIB, it became a > CPU hog - it requested full CIB copy on every change. Actually sbd should have been able to handle cib-diffs

Re: [ClusterLabs] Coming in Pacemaker 2.0.5: better start-up/shutdown coordination with sbd

2020-08-23 Thread Klaus Wenninger
f2d562f0646082c40fa. You shouldn't be able to reproduce this as there was an individual fix for this condition: commit 824fe834c67fb7bae7feb87607381f9fa8fa2945 Author: Klaus Wenninger Date:   Fri Jun 7 19:09:06 2019 +0200     Fix: sbd-pacemaker: assume graceful exit if leftovers are unmanged

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-08-18 Thread Klaus Wenninger
On 8/18/20 9:07 PM, Andrei Borzenkov wrote: > 18.08.2020 17:02, Ken Gaillot wrote: >> On Tue, 2020-08-18 at 08:21 +0200, Klaus Wenninger wrote: >>> On 8/18/20 7:49 AM, Andrei Borzenkov wrote: >>>> 17.08.2020 23:39, Jehan-Guillaume de Rorthais wrote: >>>

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-08-18 Thread Klaus Wenninger
On 8/18/20 7:49 AM, Andrei Borzenkov wrote: > 17.08.2020 23:39, Jehan-Guillaume de Rorthais wrote: >> On Mon, 17 Aug 2020 10:19:45 -0500 >> Ken Gaillot wrote: >> >>> On Fri, 2020-08-14 at 15:09 +0200, Gabriele Bulfon wrote: Thanks to all your suggestions, I now have the systems with stonith

Re: [ClusterLabs] Coming in Pacemaker 2.0.5: on-fail=demote / no-quorum-policy=demote

2020-08-17 Thread Klaus Wenninger
On 8/10/20 6:47 PM, Ken Gaillot wrote: > Hi all, > > Looking ahead to the Pacemaker 2.0.5 release expected at the end of > this year, here is a new feature already in the master branch. > > When configuring resource operations, Pacemaker lets you set an "on- > fail" policy to specify whether to

Re: [ClusterLabs] Clear Pending Fencing Action

2020-08-17 Thread Klaus Wenninger
On 8/3/20 7:04 AM, Reid Wahl wrote: > Hi, Илья. `stonith_admin --cleanup` doesn't get rid of pending > actions, only failed ones. You might be hitting > https://bugs.clusterlabs.org/show_bug.cgi?id=5401. > > I believe a simultaneous reboot of both nodes will clear the pending > actions. I don't

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-08-17 Thread Klaus Wenninger
On 8/17/20 9:19 AM, Andrei Borzenkov wrote: > 17.08.2020 10:06, Klaus Wenninger wrote: >>>> Alternatively, you can set up corosync-qdevice, using a separate system >>>> running qnetd server as a quorum arbitrator. >>>> >>> Any solution that is ba

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-08-17 Thread Klaus Wenninger
On 8/16/20 11:40 AM, Andrei Borzenkov wrote: > 16.08.2020 04:25, Reid Wahl wrote: >> >>> - considering that I have both nodes with stonith against the other node, >>> once the two nodes can communicate, how can I be sure the two nodes will >>> not try to stonith each other? >>> >> The simplest

Re: [ClusterLabs] Pacemaker crashed and produce a coredump file

2020-07-30 Thread Klaus Wenninger
On 7/29/20 10:39 AM, Reid Wahl wrote: > Hi, > > It looks like this is a bug that was fixed in later releases. The > `path` variable was a null pointer when it was passed to > `systemd_unit_exec_with_unit` as the `unit` argument. Commit 62a0d26a >

Re: [ClusterLabs] pacemaker systemd resource

2020-07-22 Thread Klaus Wenninger
On 7/22/20 9:59 AM, Хиль Эдуард wrote: > Hi there! I have 2 nodes with Pacemaker 2.0.3, corosync 3.0.3 on > ubuntu 20 + 1 qdevice. I want to define new resource as systemd > unit *dummy.service *: >   > [Unit] > Description=Dummy > [Service] Type=simple That could do the trick. Actually I thought

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-21 Thread Klaus Wenninger
eued=189ms, exec=534334ms >   * stonith_id_2_start_0 on server2ubuntu1 'error' (1): call=216, > status='complete', exitreason='', last-rc-change='1970-01-08 01:44:53 > +01:00', queued=82ms, exec=564228ms > > Failed Fencing Actions: >   * reboot of server2ubuntu1 failed: delegate=,

Re: [ClusterLabs] fence_virt architecture? (was: Re: Still Beginner STONITH Problem)

2020-07-20 Thread Klaus Wenninger
On 7/20/20 11:09 AM, Andrei Borzenkov wrote: > > > On Mon, Jul 20, 2020 at 11:45 AM Klaus Wenninger <mailto:kwenn...@redhat.com>> wrote: > > On 7/20/20 10:34 AM, Andrei Borzenkov wrote: >> >> >>   >> >> The cpg-

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-20 Thread Klaus Wenninger
> > On 17.07.2020 at 16:49, Strahil Nikolov wrote: > >The simplest way to check if the libvirt's network is NAT (or not)  > is to try to ssh from the first VM to the second one. > That does work without any issue. I can ssh to any server in our > network, host or guest, without a
