[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Q: constrain or delay "probes"?

2021-03-08 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 08.03.2021 um 17:47 in Nachricht <76793e7b39e2194d328821c7ac9a5d3b82778d5e.ca...@redhat.com>: > On Mon, 2021‑03‑08 at 09:57 +0100, Ulrich Windl wrote: >> > > > Reid Wahl schrieb am 08.03.2021 um 08:42 in >> > > > Nac

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Q: constrain or delay "probes"?

2021-03-08 Thread Ulrich Windl
>>> Andrei Borzenkov schrieb am 08.03.2021 um 11:46 in Nachricht <366c7071-8b7e-9ea8-5ea1-cbb6de6d4...@gmail.com>: ... > Probe needs to answer "is resource active *now*". If probe for resource > is impossible until some other resources are active, something is really > wrong with design. Either

[ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: constrain or delay "probes"?

2021-03-08 Thread Ulrich Windl
detail :) I see what you mean. Regards, Ulrich > > On Sun, Mar 7, 2021 at 11:10 PM Ulrich Windl < > ulrich.wi...@rz.uni-regensburg.de> wrote: > >> >>> Reid Wahl schrieb am 05.03.2021 um 21:22 in >> Nachricht >> : >> > On Fri, Mar 5, 2021 a

[ClusterLabs] Antw: [EXT] Re: Q: constrain or delay "probes"?

2021-03-07 Thread Ulrich Windl
>>> Reid Wahl schrieb am 05.03.2021 um 21:22 in Nachricht : > On Fri, Mar 5, 2021 at 10:13 AM Ken Gaillot wrote: > >> On Fri, 2021-03-05 at 11:39 +0100, Ulrich Windl wrote: >> > Hi! >> > >> > I'm unsure what actually causes a problem I

[ClusterLabs] Antw: [EXT] Re: [ClusterLabs Developers] fence-virt: consider to merge into fence-agents git repository

2021-03-07 Thread Ulrich Windl
>>> Digimer schrieb am 05.03.2021 um 18:05 in Nachricht <5c062e2b-8742-4a9a-0e7c-bc8dec251...@alteeve.ca>: > On 2021-03-05 3:34 a.m., Oyvind Albrigtsen wrote: >> Hi, >> >> We are considering to merge the fence-virt repo into the fence-agents >> git repository. >> >> Tell us if you have any

[ClusterLabs] Q: constrain or delay "probes"?

2021-03-05 Thread Ulrich Windl
Hi! I'm unsure what actually causes a problem I see (a resource was "detected running" when it actually was not), but I'm sure some probe started on cluster node start cannot provide a useful result until some other resource has been started. AFAIK there is no way to make a probe obey ordering

[ClusterLabs] Antw: Re: Antw: RE: Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway

2021-03-04 Thread Ulrich Windl
>>> Digimer schrieb am 04.03.2021 um 06:38 in Nachricht <41edb705-6b8a-2221-fc8b-a367aac98...@alteeve.ca>: > On 2021-03-03 6:53 p.m., Eric Robinson wrote: >> >>> -Original Message----- >>> From: Users On Behalf Of Ulrich Windl >>> Se

[ClusterLabs] Antw: Re: Antw: RE: Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway

2021-03-04 Thread Ulrich Windl
>>> Digimer schrieb am 04.03.2021 um 06:35 in Nachricht : > On 2021-03-03 1:56 a.m., Ulrich Windl wrote: >>>>> Eric Robinson schrieb am 02.03.2021 um 19:26 in >> Nachricht >> > m> >> >>>> -Original Message- >>>>

[ClusterLabs] Antw: RE: Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway

2021-03-02 Thread Ulrich Windl
>>> Eric Robinson schrieb am 02.03.2021 um 19:26 in Nachricht >> -Original Message- >> From: Users On Behalf Of Digimer >> Sent: Monday, March 1, 2021 11:02 AM >> To: Cluster Labs - All topics related to open-source clustering welcomed >> ; Ulr

[ClusterLabs] Antw: [EXT] Re: "Error: unable to fence '001db02a'" but It got fenced anyway

2021-02-28 Thread Ulrich Windl
>>> Valentin Vidic schrieb am 28.02.2021 um 16:59 in Nachricht <20210228155921.gm29...@valentin-vidic.from.hr>: > On Sun, Feb 28, 2021 at 03:34:20PM +, Eric Robinson wrote: >> 001db02b rebooted. After it came back up, I tried it in the other direction. >> >> On node 001db02b, the command...

[ClusterLabs] Antw: [EXT] "Error: unable to fence '001db02a'" but It got fenced anyway

2021-02-28 Thread Ulrich Windl
>>> Eric Robinson schrieb am 28.02.2021 um 16:34 in Nachricht > I just configured STONITH in Azure for the first time. My initial test went > fine. > > On node 001db02a, the command... > > # pcs stonith fence 001db02b > > ...produced output... > > 001db02b fenced. > > 001db02b rebooted.

[ClusterLabs] Antw: Re: [EXTERNAL] - Antw: [EXT] OCF resource agent is not starting up

2021-02-28 Thread Ulrich Windl
pt. I couldn't >> find the script under /usr/lib/ocf, Also on the internet. >> >> Regards, >> Niveditha >> -- >> *From:* Users on behalf of Ulrich Windl < >> ulrich.wi...@rz.uni-regensburg.de> >> *Sent:* Frida

[ClusterLabs] Antw: [EXT] Re: Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-28 Thread Ulrich Windl
>>> Eric Robinson schrieb am 26.02.2021 um 19:58 in Nachricht >> -Original Message- >> From: Users On Behalf Of Andrei >> Borzenkov >> Sent: Friday, February 26, 2021 11:27 AM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went

[ClusterLabs] Antw: [EXT] Re: Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-28 Thread Ulrich Windl
>>> Eric Robinson schrieb am 26.02.2021 um 18:23 in Nachricht >> ‑Original Message‑ >> From: Digimer >> Sent: Friday, February 26, 2021 10:35 AM >> To: Cluster Labs ‑ All topics related to open‑source clustering welcomed >> ; Eric Robinson >> Subject: Re: [ClusterLabs] Our 2‑Node

[ClusterLabs] Antw: [EXT] Re: Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-28 Thread Ulrich Windl
>>> Digimer schrieb am 26.02.2021 um 17:34 in Nachricht <699432c7-89a6-41bf-c805-f4a7a0a4a...@alteeve.ca>: > On 2021‑02‑26 11:19 a.m., Eric Robinson wrote: >> At 5:16 am Pacific time Monday, one of our cluster nodes failed and its >> mysql services went down. The cluster did not automatically

[ClusterLabs] Antw: [EXT] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-02-28 Thread Ulrich Windl
>>> Eric Robinson schrieb am 26.02.2021 um 17:19 in Nachricht > At 5:16 am Pacific time Monday, one of our cluster nodes failed and its mysql > services went down. The cluster did not automatically recover. > > We're trying to figure out: > > > 1. Why did it fail? > 2. Why did it not

[ClusterLabs] Antw: Re: [EXTERNAL] - Antw: [EXT] OCF resource agent is not starting up

2021-02-28 Thread Ulrich Windl
ot have ocf-tester script with me to test the RA script. I couldn't > find the script under /usr/lib/ocf, Also on the internet. > > Regards, > Niveditha > > From: Users on behalf of Ulrich Windl > > Sent: Friday, February 26, 2021 5:35

[ClusterLabs] Antw: [EXT] OCF resource agent is not starting up

2021-02-26 Thread Ulrich Windl
>>> Niveditha U schrieb am 26.02.2021 um 12:39 in Nachricht > Hi Team, > > We have xml data base called xdb which we want to use it as pcs resource. > Hence, we created a custom resource agent script for the same. We are able to > start/stop the xdb resource using debug‑start and debug‑stop

[ClusterLabs] Antw: [EXT] Re: Q: effieciently collecting some cluster facts

2021-02-26 Thread Ulrich Windl
> and in the corosync sources tarball tests/testvotequorum1.c > > CHrissie > > > On 25/02/2021 07:16, Ulrich Windl wrote: >> Hi! >> >> I'm thinking about some simple cluster status display that is updated > periodically. >> I wonder how to get some "

[ClusterLabs] Q: effect of "-p" on corosync-quorumtool in SLES15

2021-02-26 Thread Ulrich Windl
Hi! According to the help message "-p" provides "machine readable" output for corosync-quorumtool for "-s" and "-l". However I don't see any considerable format change in output with or without "-p": # /usr/sbin/corosync-quorumtool -l Membership information -- Nodeid

[ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: effieciently collecting some cluster facts

2021-02-25 Thread Ulrich Windl
resources_running="7" type="member"/> > Yes it's all in the CIB, but I don't consider parsing XML efficient ;-) In most cases using XML just speeds up global warming ;-) Regards, Ulrich > > > On Thu, 2021‑02‑25 at 11:26 +0100, Ulrich Windl wrote:

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Non recoverable state of cluster after exit of one node due to killing of processes by oom killer

2021-02-25 Thread Ulrich Windl
for fencing >> > > > dlm_controld[1616]: 91659 lvm_global wait for fencing >> > > > >> > > > These were messages when postgresql‑12 service was being >> > started on >> > > > node2. >> > > > As postgresql service i

[ClusterLabs] Antw: [EXT] Re: Q: effieciently collecting some cluster facts

2021-02-25 Thread Ulrich Windl
> > CHrissie > > > On 25/02/2021 07:16, Ulrich Windl wrote: >> Hi! >> >> I'm thinking about some simple cluster status display that is updated > periodically. >> I wonder how to get some "cluster facts" efficiently. Among those are: >> >

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Non recoverable state of cluster after exit of one node due to killing of processes by oom killer

2021-02-25 Thread Ulrich Windl
>> > > These were messages when postgresql-12 service was being started on >> > > node2. >> > > As postgresql service is dependent on these services(dlm,lvmlockd >> > > and gfs2), it has not started in time on node2. >> > > And node2 fenced it

[ClusterLabs] Antw: [EXT] Re: Resource balancing and "ptest scores"

2021-02-25 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 24.02.2021 um 23:45 in Nachricht <6373352fd18e819bada715a7d610499a658eda29.ca...@redhat.com>: > On Wed, 2021‑02‑24 at 11:16 +0100, Ulrich Windl wrote: >> Hi! >> >> Using a utilization‑based placement strategy (placement‑ >>

[ClusterLabs] Q: effieciently collecting some cluster facts

2021-02-24 Thread Ulrich Windl
Hi! I'm thinking about some simple cluster status display that is updated periodically. I wonder how to get some "cluster facts" efficiently. Among those are: * Is corosync running, and how many nodes can be seen? * Is Pacemaker running, how many nodes does it see, and does it have a quorum? *

[ClusterLabs] Resource balancing and "ptest scores"

2021-02-24 Thread Ulrich Windl
Hi! Using a utilization-based placement strategy (placement-strategy=balanced), I wonder why pacemaker chose node h16 to place a new resource. The situation before placement looks like this: Remaining: h16 capacity: utl_ram=207124 utl_cpu=340 Remaining: h18 capacity: utl_ram=209172 utl_cpu=360

[ClusterLabs] Antw: [EXT] Re: Latest PDF documents have truncated lines

2021-02-22 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 19.02.2021 um 17:48 in Nachricht : > On Fri, 2021‑02‑19 at 17:54 +0300, Andrei Borzenkov wrote: >> In the latest PDF versions I downloaded recently code samples appear >> truncated quite often ‑ they do not fit on page. I compared with >> previous versions I have and

[ClusterLabs] Antw: [EXT] faulty business logic

2021-02-22 Thread Ulrich Windl
>>> lejeczek schrieb am 19.02.2021 um 17:40 in Nachricht : > Hi guys. > > I have a simple cluster with simple constraints: > > Colocation Constraints: >check-jupyterhub with openvpn-server (score:INFINITY) >secret-dropbox with openvpn-server (score:INFINITY) > Ticket Constraints: > >

[ClusterLabs] Missing success log message for resource migration

2021-02-19 Thread Ulrich Windl
Hi! Inspecting the logs after the cluster had rebalanced resources, I'm wondering: It looks as if pacemaker-controld does log a success message when a local migration succeeded, but not if a remote one did. Actions planned: Migrate prm_xen_test-jeos1 ( h16 -> h18 ) Migrate

[ClusterLabs] Antw: [EXT] Colocation per site ?

2021-02-17 Thread Ulrich Windl
>>> Strahil Nikolov schrieb am 17.02.2021 um 17:46 in Nachricht <2134183555.2122291.1613580414...@mail.yahoo.com>: > Hello All, > I'm currently in a process of building SAP HANA Scale-out cluster and the > HANA team has asked that all nodes on the active instance should have one IP > for backup

[ClusterLabs] Antw: [EXT] Feedback wanted: using the systemd message catalog

2021-02-17 Thread Ulrich Windl
Hi Ken, personally I think systemd is already logging too much, and I don't think that adding instructions to many log messages is actually helpful (It could be done as separate log message (maybe at severity info) already). In Windows I see the problem that it's very hard to find real problems

[ClusterLabs] Antw: [EXT] Re: alert is not executed - solved

2021-02-17 Thread Ulrich Windl
>>> "Lentes, Bernd" schrieb am 16.02.2021 >>> um 10:37 in Nachricht <151181584.46426249.1613468259150.javamail.zim...@helmholtz-muenchen.de>: > > - On Feb 15, 2021, at 10:24 PM, Bernd Lentes > bernd.len...@helmholtz-muenchen.de wrote: > >> - On Feb 15, 2021, at 9:00 PM, kgaillot

[ClusterLabs] Antw: Re: Antw: [EXT] Non recoverable state of cluster after exit of one node due to killing of processes by oom killer

2021-02-15 Thread Ulrich Windl
> pcmk_monitor_action=metadata pcmk_reboot_action=off > Meta Attrs: provides=unfencing > Operations: monitor interval=60s (scsi-monitor-interval-60s) > > On Mon, Feb 15, 2021 at 7:17 AM Ulrich Windl < > ulrich.wi...@rz.uni-regensburg.de> wrote: > >> >>> shivra

[ClusterLabs] Antw: [EXT] Re: weird xml snippet in "crm configure show"

2021-02-15 Thread Ulrich Windl
>>> "Lentes, Bernd" schrieb am 13.02.2021 um 01:23 in Nachricht <547781995.41340156.1613175834146.javamail.zim...@helmholtz-muenchen.de>: > > - On Feb 12, 2021, at 12:50 PM, Yan Gao y...@suse.com wrote: > > >> >> >> It seems that crmsh has difficulty parsing the "random" ids of the >>

[ClusterLabs] Antw: [EXT] Non recoverable state of cluster after exit of one node due to killing of processes by oom killer

2021-02-14 Thread Ulrich Windl
>>> shivraj dongawe schrieb am 14.02.2021 um 12:03 in Nachricht : > We are running a two node cluster on Ubuntu 20.04 LTS. Cluster related > package version details are as > follows: pacemaker/focal-updates,focal-security 2.0.3-3ubuntu4.1 amd64 > pacemaker/focal 2.0.3-3ubuntu3 amd64 >

[ClusterLabs] Antw: [EXT] weird xml snippet in "crm configure show"

2021-02-12 Thread Ulrich Windl
>>> "Lentes, Bernd" schrieb am 12.02.2021 um 11:05 in Nachricht <2012472669.39955087.1613124328501.javamail.zim...@helmholtz-muenchen.de>: > Hi, > > i have problems with a configured alert which does not alert anymore. > I played a bit around with it and changed several times the configuration

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

2021-02-11 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 11.02.2021 um 19:13 in Nachricht <5ddea954b8e8a45cf73a7a169752146e27f69083.ca...@redhat.com>: > On Thu, 2021-02-11 at 13:59 +0100, Ulrich Windl wrote: >> Hi! >> >> After that problem I see this in crm_mon output: >> Failed Fe

[ClusterLabs] Antw: Re: Antw: [EXT] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

2021-02-11 Thread Ulrich Windl
is there. Regards, Ulrich >>> Ulrich Windl schrieb am 09.02.2021 um 16:32 in Nachricht <6022AB1C.645 : 161 : 60728>: >>>> Klaus Wenninger schrieb am 09.02.2021 um 16:12 in > Nachricht : > > On 2/9/21 3:10 PM, Ulrich Windl wrote: > >>>>> "

[ClusterLabs] Antw: [EXT] Question: How can i group 2 resources together

2021-02-11 Thread Ulrich Windl
>>> "Ben .T.George" schrieb am 10.02.2021 um 20:28 in Nachricht : > HI > > i have 2 resources and i would like configure in such a way that both > should always run from same node, "from" == "on"? see "colocation" constraints. > > also is it safe to give below values for 2 node cluster: > >

[ClusterLabs] Antw: [EXT] Question: 2 node pcs cluster required quorum and separate Heartbeat Network

2021-02-11 Thread Ulrich Windl
>>> "Ben .T.George" schrieb am 10.02.2021 um 19:56 in Nachricht : > HI > > Is it mandatory for 2 node pcs cluster require a quorum and separate > Heartbeat Network? Question: What do you expect when the network link goes down? > > Regards, > Ben

[ClusterLabs] Antw: [EXT] Re: Help: Cluster resource relocating to rebooted node automatically

2021-02-11 Thread Ulrich Windl
>>> "Ben .T.George" schrieb am 10.02.2021 um 16:14 in Nachricht : > HI > > thanks for the Help and i have done "pcs resource clear" and tried the same > method again, now the resource is not going back. > > One more thing I noticed is that my service was from systemd and I have > created a

[ClusterLabs] Antw: Re: Antw: [EXT] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

2021-02-09 Thread Ulrich Windl
>>> Klaus Wenninger schrieb am 09.02.2021 um 16:12 in Nachricht : > On 2/9/21 3:10 PM, Ulrich Windl wrote: >>>>> "Ulrich Windl" schrieb am 09.02.2021 >> um >> 15:00 in Nachricht <6022956302a10003e...@gwsmtp.uni-regensburg.de>: >

[ClusterLabs] Antw: [EXT] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

2021-02-09 Thread Ulrich Windl
>>> "Ulrich Windl" schrieb am 09.02.2021 um 15:00 in Nachricht <6022956302a10003e...@gwsmtp.uni-regensburg.de>: > Hi! > > I had made a mistake, leading to node h16 to be fenced. After recovery (h16 > had re‑joined the cluster) I had stopp

[ClusterLabs] Antw: Re: Antw: [EXT] Re: Peer (slave) node deleting master's transient_attributes

2021-02-09 Thread Ulrich Windl
age of your "X", then decide what Y should be ;-) Regards, Ulrich > Regards, > Stuart > > On Tue, Feb 9, 2021 at 2:34 AM Ulrich Windl < > ulrich.wi...@rz.uni-regensburg.de> wrote: > >> Hi! >> >> Maybe you just misunderstand what maintenance mo

[ClusterLabs] Another odd message: pacemaker-fenced[31326]: warning: Can't create a sane reply

2021-02-09 Thread Ulrich Windl
Hi! I had made a mistake, leading to node h16 being fenced. After recovery (h16 had re-joined the cluster) I had stopped the node, reconfigured the network, then started the node again. Then I did the same thing (not the unwanted fencing) with h18. When I started the node again, I saw these

[ClusterLabs] Antw: [EXT] Re: Peer (slave) node deleting master's transient_attributes

2021-02-08 Thread Ulrich Windl
Hi! Maybe you just misunderstand what maintenance mode for a single node means: CIB updates will still be performed, but not the resource actions. If CIB updates are sent to another node, that node will perform actions. Maybe just explain what you really want to do with one node in maintenance

[ClusterLabs] Antw: [EXT] Re: node fencing due to "Stonith/shutdown of node .. was not expected" while the node shut down cleanly

2021-02-08 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 08.02.2021 um 17:43 in Nachricht <5ee981d3893dd7712c747661de05240df1ccd8eb.ca...@redhat.com>: > On Mon, 2021‑02‑08 at 08:41 +0100, Ulrich Windl wrote: >> Hi! >> >> There were previous indications of this problem, but today I had it

[ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: starting systemd resources

2021-02-08 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 05.02.2021 um 16:47 in >>> Nachricht <7247097610e6ab4f3a44a7648e0acf32fbdb9937.ca...@redhat.com>: Hi! ... >> Doesn't systemctl return a proper exit status? > > It does, but we don't use systemctl, we use the systemd C library > interface. And unfortunately, our

[ClusterLabs] Antw: Re: Antw: [EXT] Re: failed migration handled the wrong way

2021-02-08 Thread Ulrich Windl
>>> Andrei Borzenkov schrieb am 05.02.2021 um 15:31 in Nachricht <4572fad7-c5ae-6d93-2559-741d052e3...@gmail.com>: > 05.02.2021 12:54, Ulrich Windl пишет: >>>>> Ulrich Windl schrieb am 01.02.2021 um 11:59 in Nachricht <6017DF04.888 : >> 161 : >

[ClusterLabs] node fencing due to "Stonith/shutdown of node .. was not expected" while the node shut down cleanly

2021-02-07 Thread Ulrich Windl
Hi! There were previous indications of this problem, but today I had it again: I restarted a node (h18, DC) via "crm cluster restart", and the node shut down cleanly (at least it came to an end), but when restarting, the node was fenced by the new DC (h16): Feb 08 08:12:24 h18

[ClusterLabs] Antw: [EXT] Re: Q: starting systemd resources

2021-02-05 Thread Ulrich Windl
> (action_complete) debug: nfs-daemon systemd start is now complete > (elapsed=2397ms, remaining=97603ms): ok (0) > Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd [19354] > (log_finished) debug: nfs-daemon monitor (call 20) exited with status > 0 (execution

[ClusterLabs] Antw: [EXT] Re: failed migration handled the wrong way

2021-02-05 Thread Ulrich Windl
>>> Ulrich Windl schrieb am 01.02.2021 um 11:59 in Nachricht <6017DF04.888 : 161 : 60728>: >>>> Andrei Borzenkov schrieb am 01.02.2021 um 11:05 in > Nachricht > : > > On Mon, Feb 1, 2021 at 12:53 PM Ulrich Windl > > wrote: > ... > >

[ClusterLabs] Q: starting systemd resources

2021-02-04 Thread Ulrich Windl
Hi! While analyzing cluster problems I noticed this: Normal resources executed via OCF RAs create two log entries by pacemaker-execd: One when starting the resource and another when the resource completed starting. However for systemd units I only get a start message. Is that intentional? Does

[ClusterLabs] Antw: [EXT] Re: start vs promote drbd m/s colocation constraint

2021-02-03 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 03.02.2021 um 00:02 in Nachricht <396cc52f2d27b8aab611d2312ba172b07bdc9d7f.ca...@redhat.com>: > On Tue, 2021‑02‑02 at 14:27 ‑0700, Brent Jensen wrote: >> I've been trying to get my DRBD cluster on Centos8 / Pacemaker 2 to >> work but have had issues with cluster not

[ClusterLabs] Antw: [EXT] pcs status command output consist of * in each line , is this expected behavior

2021-02-03 Thread Ulrich Windl
>>> S Sathish S schrieb am 02.02.2021 um 07:20 in Nachricht > Hi Team, > > we have taken latest pacemaker version after that we found pcs status > command output consist of * in each line , is this expected behavior. > > https://github.com/ClusterLabs/pacemaker/tree/Pacemaker‑2.0.5 > > pcs

[ClusterLabs] Antw: [EXT] Anyone using remote-clear-port or remote-tls-port for remote CIB administration?

2021-02-03 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 02.02.2021 um 17:40 in Nachricht <5d7d52f14417e6e8baee49dfbc23884b5183b073.ca...@redhat.com>: > Hi all, > > Pacemaker has a feature allowing CIB modifications to be made from > hosts that are not cluster nodes: > >

[ClusterLabs] Q: Should a cleanup reset the failcount also?

2021-02-03 Thread Ulrich Windl
Hi! I'm wondering: I had a failed clone resource. After fixing the problem, I performed a cleanup, but the fail-counts weren't reset (I thought that was the case in older versions of pacemaker): Before: Full List of Resources: * Clone Set: cln_iotw-md10 [prm_iotw-md10]: * Started: [ h19

[ClusterLabs] Antw: [EXT] Re: Peer (slave) node deleting master's transient_attributes

2021-02-01 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 01.02.2021 um 17:27 in Nachricht <9b99d08faf4ddbe496ede10165f586afd81aa850.ca...@redhat.com>: > On Mon, 2021-02-01 at 11:16 -0500, Stuart Massey wrote: >> Andrei, >> You are right, thank you. I have an earlier thread on which I posted >> a pacemaker.log for this issue,

[ClusterLabs] Antw: Re: Antw: [EXT] Re: Disable all resources in a group if one or more of them fail and are unable to reactivate

2021-02-01 Thread Ulrich Windl
> adding this line works for me: >> >> pcs constraint colocation add lta-subscription-backend-ope-s1 with >> s3srvnotificationdispatcher INFINITY >> >> I would like to thank everyone who helped me and spent their time. >> >> Have a good Week! >> >> Best >> >> Damian >>

[ClusterLabs] Antw: Re: Antw: [EXT] Re: failed migration handled the wrong way

2021-02-01 Thread Ulrich Windl
>>> Andrei Borzenkov schrieb am 01.02.2021 um 12:07 in Nachricht : > On Mon, Feb 1, 2021 at 1:59 PM Ulrich Windl > wrote: >> >> But the VM *wasn't* stopped on h16! >> > > I am not sure what you mean here. It was not stopped during migration? > Y

[ClusterLabs] Antw: [EXT] Re: failed migration handled the wrong way

2021-02-01 Thread Ulrich Windl
>>> Andrei Borzenkov schrieb am 01.02.2021 um 11:05 in Nachricht : > On Mon, Feb 1, 2021 at 12:53 PM Ulrich Windl > wrote: ... >> Feb 01 10:33:08 h16 pacemaker‑execd[7464]: notice: > prm_xen_test‑jeos5_stop_0[33137] erro

[ClusterLabs] Antw: [EXT] Re: failed migration handled the wrong way

2021-02-01 Thread Ulrich Windl
>>> Andrei Borzenkov schrieb am 01.02.2021 um 11:05 in Nachricht : > On Mon, Feb 1, 2021 at 12:53 PM Ulrich Windl > wrote: >> >> Hi! >> >> While fighting to get the wrong configuration, I broke libvirt live‑migration Of course I meant "*right*

[ClusterLabs] failed migration handled the wrong way

2021-02-01 Thread Ulrich Windl
Hi! While fighting to get the wrong configuration, I broke libvirt live-migration by not enabling the TLS socket. When testing to live-migrate a VM from h16 to h18, these are the essential events: Feb 01 10:30:10 h16 pacemaker-schedulerd[7466]: notice: * Move prm_cron_snap_test-jeos5

[ClusterLabs] Antw: [EXT] Re: CCIB migration from Pacemaker 1.x to 2.x

2021-01-31 Thread Ulrich Windl
>>> "Sharma, Jaikumar" schrieb am 30.01.2021 um 13:41 in Nachricht >> fence_drac5 , fence_drac (not sure about that) , SBD > I've configured IPMI over LAN giving static IP addresses to both nodes (at > iDRAC level) in cluster and I can power reset/reboot both nodes in the > cluster by

[ClusterLabs] Antw: [EXT] Peer (slave) node deleting master's transient_attributes

2021-01-31 Thread Ulrich Windl
>>> Stuart Massey schrieb am 29.01.2021 um 18:37 in Nachricht : > Can someone help me with this? > Background: > > "node01" is failing, and has been placed in "maintenance" mode. It > occasionally loses connectivity. > > "node02" is able to run our resources > > Consider the following messages

[ClusterLabs] Antw: [EXT] Re: Problem with systemd socket service (start fails when running already)

2021-01-31 Thread Ulrich Windl
>>> Andrei Borzenkov schrieb am 29.01.2021 um 18:36 in Nachricht <7bd34d6c-642f-0e44-e424-1445ebb30...@gmail.com>: > 29.01.2021 14:19, Ulrich Windl пишет: >> Hi! >> >> I'm having an odd failure using a systemd socket unit controlled by the > cl

[ClusterLabs] Problem with systemd socket service (start fails when running already)

2021-01-29 Thread Ulrich Windl
Hi! I'm having an odd failure using a systemd socket unit controlled by the cluster. (Personally I feel: "cluster and systemd: One resource controller too many". But when you need to control a systemd unit...) When the unit is active already, a start operation fails: Jan 29 12:12:46 h16

[ClusterLabs] Antw: [EXT] Re: Disable all resources in a group if one or more of them fail and are unable to reactivate

2021-01-29 Thread Ulrich Windl
>>> Andrei Borzenkov schrieb am 28.01.2021 um 18:30 in Nachricht : > 27.01.2021 22:03, Ken Gaillot пишет: >> >> With a group, later members depend on earlier members. If an earlier >> member can't run, then no members after it can run. >> >> However we can't make the dependency go in both

[ClusterLabs] Antw: Re: Antw: [EXT] Re: Disable all resources in a group if one or more of them fail and are unable to reactivate

2021-01-29 Thread Ulrich Windl
example? Regards, Ulrich > I really hoped there was a solution or workaround for this, but as Ken clarified, > Pacemaker can't handle these exceptions. > > Many thanks for your quick and effective support. > > Have a good evening! > > Damiano > > > Il giorno gio

[ClusterLabs] Q: Using undefined utilization

2021-01-29 Thread Ulrich Windl
Hi again! I had made a mistake: defining resource utilization with a name that doesn't exist as node capacity/utilization (mistyped it). The effect was that the resource was stopped, but unfortunately ptest did not tell me why ("Insufficient node capacity for resource ...") However I'd think

[ClusterLabs] Antw: [EXT] Re: SLES15 SP2: crm shell crashed

2021-01-28 Thread Ulrich Windl
Regards, Ulrich > > Regards, > xin > ________ > From: Users on behalf of Ulrich Windl > > Sent: Thursday, January 28, 2021 4:46 PM > To: users@clusterlabs.org > Subject: [ClusterLabs] SLES15 SP2: crm shell crashed > > Hi! > > Trying

[ClusterLabs] Antw: [EXT] Re: Q: wrong "unexpected shutdown of DC" detected

2021-01-28 Thread Ulrich Windl
Ken, thanks for analyzing the logs! See comments inline... >>> Ken Gaillot schrieb am 27.01.2021 um 19:55 in Nachricht <644fc719a2e8870c332db859bcdef275d986249a.ca...@redhat.com>: > On Wed, 2021‑01‑27 at 12:36 +0100, Ulrich Windl wrote: ... >> Jan 27 10:43:48 h

[ClusterLabs] Antw: [EXT] Re: Disable all resources in a group if one or more of them fail and are unable to reactivate

2021-01-28 Thread Ulrich Windl
>>> damiano giuliani schrieb am 27.01.2021 um 19:25 in Nachricht : > Hi Andrei, Thanks for your help. > If one of my resources in the group fails or the primary node goes down ( > in my case acspcmk-02 ), the probe notices it and pacemaker tries to > restart the whole resource group on the second

[ClusterLabs] Antw: Re: Antw: [EXT] Re: Stopping all nodes causes servers to migrate

2021-01-28 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 27.01.2021 um 18:46 in Nachricht <02cd90fcc10f1021d9f51649e2991da3209a6935.ca...@redhat.com>: > On Wed, 2021-01-27 at 08:35 +0100, Ulrich Windl wrote: >> > > > Tomas Jelinek schrieb am 26.01.2021 um >> > > > 16:15

[ClusterLabs] SLES15 SP2: crm shell crashed

2021-01-28 Thread Ulrich Windl
Hi! Trying to add one resource to my cluster that is very similar to an existing one, I had forgotten to define one related resource. When editing the config to "clone and adjust" constraints, crm shell crashed when saving: crm(live/h16)configure# edit ERROR: constraint

[ClusterLabs] Antw: [EXT] Re: Stopping all nodes causes servers to migrate

2021-01-26 Thread Ulrich Windl
>>> Tomas Jelinek schrieb am 26.01.2021 um 16:15 in Nachricht <48f935a5-184f-d2d7-7f1a-db596aa6c...@redhat.com>: > Dne 25. 01. 21 v 17:01 Ken Gaillot napsal(a): >> On Mon, 2021‑01‑25 at 09:51 +0100, Jehan‑Guillaume de Rorthais wrote: >>> Hi Digimer, >>> >>> On Sun, 24 Jan 2021 15:31:22 ‑0500 >>>

[ClusterLabs] Antw: [EXT] Re: Stop timeout=INFINITY not working

2021-01-26 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 26.01.2021 um 16:08 in Nachricht : > On Tue, 2021‑01‑26 at 02:12 ‑0500, Digimer wrote: >> Hi all, >> >> I created a resource with an INFINITE stop timeout; >> >> pcs resource create srv01‑test ocf:alteeve:server name="srv01‑test" >> meta >> allow‑migrate="true"

[ClusterLabs] Antw: [EXT] Re: Stopping all nodes causes servers to migrate

2021-01-25 Thread Ulrich Windl
>>> Digimer schrieb am 25.01.2021 um 19:18 in Nachricht <18d77f26-b21b-4f2e-184c-c2280876d...@alteeve.ca>: ... > If I understand what's been said in this thread, the host node got a > shutdown request so it migrated the resource. Then the peer (new host) > would have gotten the shutdown request,

[ClusterLabs] Antw: [EXT] Resource migration and constraint timeout

2021-01-25 Thread Ulrich Windl
>>> Strahil Nikolov schrieb am 25.01.2021 um 12:28 in Nachricht <1768184755.3488991.1611574085...@mail.yahoo.com>: > Hi All, > As you all know migrating a resource is actually manipulating the location > constraint for that resource. > Is there any plan for an option to control a default timeout

[ClusterLabs] Need help for "libvirtd[16755]: resource busy: Lockspace resource '...' is not locked"

2021-01-25 Thread Ulrich Windl
Hi! I reconfigured my cluster to let it control virtlockd (instead of just "enable" it in systemd). However I still have problems I don't understand: When live-migrating a Xen PV I still get these messages: Jan 25 12:38:06 h18 virtlockd[42724]: libvirt version: 6.0.0 Jan 25 12:38:06 h18

[ClusterLabs] Antw: [EXT] Re: Stopping all nodes causes servers to migrate

2021-01-25 Thread Ulrich Windl
>>> Jehan-Guillaume de Rorthais schrieb am 25.01.2021 um 09:51 in Nachricht <20210125095132.575f55aa@firost>: > Hi Digimer, > > On Sun, 24 Jan 2021 15:31:22 ‑0500 > Digimer wrote: > [...] >> I had a test server (srv01‑test) running on node 1 (el8‑a01n01), and on >> node 2 (el8‑a01n02) I ran

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] DRBD ms resource keeps getting demoted

2021-01-24 Thread Ulrich Windl
drbd resource > since we put the failing node in maintenance. When you are in maintenance mode, monitor operations won't run AFAIK. > We will watch for a bit longer. > Thanks again > > On Thu, Jan 21, 2021 at 2:23 AM Ulrich Windl < > ulrich.wi...@rz.uni-regensburg.de>

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Q: What is lvmlockd locking?

2021-01-22 Thread Ulrich Windl
>>> Gang He wrote on 22.01.2021 at 09:44 in message : > > On 2021/1/22 16:17, Ulrich Windl wrote: >>>>> Gang He wrote on 22.01.2021 at 09:13 in message >> <1fd1c07d-d12c-fea9-4b17-90a977fe7...@suse.com>: >>> Hi Ulrich, >>

[ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: What is lvmlockd locking?

2021-01-22 Thread Ulrich Windl
mlock But cln_lockspace_ocfs2 provides the shared filesystem that lvmlockd uses. I thought for locking in a cluster it needs a cluster-wide filesystem. > > > Thanks > Gang > > On 2021/1/21 20:08, Ulrich Windl wrote: >>>>> Gang He schrieb am 21.01.2021 um 11:30 in Na

[ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.1.0: noncritical resources

2021-01-21 Thread Ulrich Windl
>>> Ken Gaillot wrote on 22.01.2021 at 00:51 in message : > Hi all, > > A recurring request we've seen from Pacemaker users is a feature called > "non‑critical resources" in a proprietary product and "independent > subtrees" in the old rgmanager project. > > An example is a large database

[ClusterLabs] Antw: [EXT] Re: Q: utilization, stickiness and resource placement

2021-01-21 Thread Ulrich Windl
>>> Ken Gaillot wrote on 21.01.2021 at 17:24 in message <28f8b077a30233efa41d04688eb21e82c8432ddd.ca...@redhat.com>: > On Thu, 2021‑01‑21 at 08:19 +0100, Ulrich Windl wrote: >> Hi! >> >> I have a question about utilization‑based resource placement

[ClusterLabs] Antw: [EXT] Re: Q: What is lvmlockd locking?

2021-01-21 Thread Ulrich Windl
ched. The only VG the cluster node sees is: ph16:~ # vgs VG #PV #LV #SN Attr VSize VFree sys 1 3 0 wz--n- 222.50g 0 Regards, Ulrich > I feel the problem was probably caused by lvmlock resource agent script, > which did not handle this corner case correctly. > > Thanks

[ClusterLabs] Q: What is lvmlockd locking?

2021-01-21 Thread Ulrich Windl
Hi! I have a problem: For tests I had configured lvmlockd. Now that the tests have ended, no LVM is used for cluster resources any more, but lvmlockd is still configured. Unfortunately I ran into this problem: One OCFS2 mount was unmounted successfully, another holding the lockspace for

[ClusterLabs] Antw: Re: Antw: [EXT] DRBD ms resource keeps getting demoted

2021-01-20 Thread Ulrich Windl
IP, since that is the route to the IP addresses resolved for the host >> names; that will certainly be the only route to the quorum device. I can >> say that this cluster has run reasonably well for quite some time with this >> configuration prior to the recently developed hardware issues o

[ClusterLabs] Q: utilization, stickiness and resource placement

2021-01-20 Thread Ulrich Windl
Hi! I have a question about utilization-based resource placement (specifically: placement-strategy=balanced): Assume you have two resource capacities (say A and B) on each node, and each resource also has a utilization parameter for both. Both nodes have enough capacity for a resource to be

[ClusterLabs] Antw: [EXT] Q: placement-strategy=balanced

2021-01-19 Thread Ulrich Windl
>>> "Ulrich Windl" wrote on 15.01.2021 at 09:36 in message <6001541002a10003e...@gwsmtp.uni-regensburg.de>: > Hi! > > The cluster I'm configuring (SLES15 SP2) fenced a node last night. Still > unsure what exactly caused the fencing, but looking at

[ClusterLabs] Antw: Re: Antw: [EXT] Re: What's a "transition", BTW?

2021-01-18 Thread Ulrich Windl
>>> Reid Wahl wrote on 19.01.2021 at 08:22 in message : > On Mon, Jan 18, 2021 at 11:18 PM Ulrich Windl < > ulrich.wi...@rz.uni-regensburg.de> wrote: > >> >>> Ken Gaillot wrote on 18.01.2021 at 19:29 in >> message >> <104

[ClusterLabs] Antw: [EXT] DRBD ms resource keeps getting demoted

2021-01-18 Thread Ulrich Windl
>>> Stuart Massey wrote on 19.01.2021 at 04:46 in message : > So, we have a 2-node cluster with a quorum device. One of the nodes (node1) > is having some trouble, so we have added constraints to prevent any > resources migrating to it, but have not put it in standby, so that drbd in >

[ClusterLabs] Antw: [EXT] Re: What's a "transition", BTW?

2021-01-18 Thread Ulrich Windl
>>> Ken Gaillot wrote on 18.01.2021 at 19:29 in message <1047fd943be77f4a6fd4cd4dd19b65d1550512f8.ca...@redhat.com>: > On Fri, 2021‑01‑15 at 11:40 +0100, Ulrich Windl wrote: >> Hi! >> >> With a cluster recheck interval, I see periodic log messages

[ClusterLabs] Antw: [EXT] Re: Q: placement-strategy=balanced

2021-01-18 Thread Ulrich Windl
>>> Ken Gaillot wrote on 18.01.2021 at 19:20 in message <06d171c5d33bcb20af71d534a94ce26a56bdd530.ca...@redhat.com>: > On Fri, 2021‑01‑15 at 09:36 +0100, Ulrich Windl wrote: >> Hi! >> >> The cluster I'm configuring (SLES15 SP2) fenced a node last night.

[ClusterLabs] Antw: [EXT] CentOS 8 & drbd 9, two drbd devices and colocation

2021-01-18 Thread Ulrich Windl
Hi! It should be easy (I guess), but when requiring both masters to be on the same node, can't you do with one DRBD device (something like putting an LVM VG on that and provide two LVs)? Regards, Ulrich >>> wrote on 18.01.2021 at 18:43 in message : > Hi again, > > I need some help to

[ClusterLabs] Antw: [EXT] Re: Q: When do I need virtlockd?

2021-01-18 Thread Ulrich Windl
>>> Andrei Borzenkov wrote on 18.01.2021 at 10:54 in message : > On Mon, Jan 18, 2021 at 11:55 AM Ulrich Windl > wrote: > . >> >> So can someone explain, or direct me to some helpful docs? >> > > Are you aware of https://libvirt.org/kbase/locki

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Cluster breaks after pcs unstandby node

2021-01-18 Thread Ulrich Windl
d line was never written to persistent journal What might help is running "journalctl -f" on a terminal. So you see the last messages received, even if not written to the filesystem (I think). So when the host is down, you see the last messages. Disk writes frequently miss the

[ClusterLabs] Q: When do I need virtlockd?

2021-01-18 Thread Ulrich Windl
Hi! I'm migrating our Xen PVM environment from SLES11 SP4 to SLES15 SP2. As libvirt seems to be the preferred framework to use, I configured VirtualDomains instead of Xen. I had to move configuration from Xen xm via Xen xl to libvirt. What I couldn't get from the docs is whether and when I
