[ClusterLabs] Antw: Antw: [EXT] Stopping a server failed and fenced, despite disabling stop timeout

2021-01-18 Thread Ulrich Windl
>>> "Ulrich Windl" wrote on 18.01.2021 at 09:28 in message <6005469702a10003e...@gwsmtp.uni-regensburg.de>: >>>> Digimer wrote on 18.01.2021 at 03:11 in message > <816a4d1e-a92d-2a4c-b1a0-cf4353e3f...@alteeve.ca>: >> Hi all,

[ClusterLabs] Antw: [EXT] Stopping a server failed and fenced, despite disabling stop timeout

2021-01-18 Thread Ulrich Windl
>>> Digimer wrote on 18.01.2021 at 03:11 in message <816a4d1e-a92d-2a4c-b1a0-cf4353e3f...@alteeve.ca>: > Hi all, > > Mind the slew of questions, well into testing now and finding lots of > issues. This one is two questions... :) > > I set a server to be unmanaged in pacemaker while the

[ClusterLabs] Antw: [EXT] Re: DRBD 2-node M/S doesn't want to promote new master, Centos 8

2021-01-18 Thread Ulrich Windl
>>> Brent Jensen wrote on 17.01.2021 at 20:00 in >>> message <2ffb6214-6397-8178-b22e-269b38953...@gmail.com>: ... Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: helper command: /sbin/drbdadm disconnected Jan 17 11:48:14 nfs6 drbdadm[797503]: drbdadm: Unknown command 'disconnected' Jan 17

[ClusterLabs] Antw: [EXT] Completely disabled resource failure triggered fencing

2021-01-18 Thread Ulrich Windl
>>> Digimer wrote on 17.01.2021 at 19:45 in message <5c428538-3507-d886-8c54-f63bc4bad...@alteeve.ca>: > Hi all, > > I'm trying to figure out how to define a resource such that if it > fails in any way, it will not cause pacemaker to self-fence. The > reasoning being that there are

[ClusterLabs] Antw: Re: Antw: [EXT] Cluster breaks after pcs unstandby node

2021-01-17 Thread Ulrich Windl
>>> Steffen Vinther Sørensen wrote on 16.01.2021 at 19:28 in message : > Hi and thank you for the insights Hi! ... > I just did a test after the latest adjustments with colocations etc. > trying to standby node02, ends up with node02 being fenced before > migrations complete. Unfortunately

Re: [ClusterLabs] [EXT] DRBD 2-node M/S doesn't want to promote new master, Centos 8

2021-01-15 Thread Ulrich Windl
On 1/15/21 10:10 PM, Brent Jensen wrote: pacemaker-attrd[7671]: notice: Setting master-drbd0[nfs5]: 1 -> 1000 I wonder: Does that mean the stickiness for master is still 1000 on nfs5?

[ClusterLabs] What's a "transition", BTW?

2021-01-15 Thread Ulrich Windl
Hi! With a cluster recheck interval, I see periodic log messages like this: Jan 15 11:05:50 h19 pacemaker-controld[4804]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE Jan 15 11:15:50 h19 pacemaker-controld[4804]: notice: State transition S_IDLE -> S_POLICY_ENGINE Jan 15 11:15:50 h19

[ClusterLabs] Q: placement-strategy=balanced

2021-01-15 Thread Ulrich Windl
Hi! The cluster I'm configuring (SLES15 SP2) fenced a node last night. Still unsure what exactly caused the fencing, but looking at the logs I found this "action plan" that led to fencing: Jan 14 20:05:12 h19 pacemaker-schedulerd[4803]: notice: * Move prm_cron_snap_test-jeos1

[ClusterLabs] Antw: Antw: [EXT] Cluster breaks after pcs unstandby node

2021-01-14 Thread Ulrich Windl
>>> "Ulrich Windl" wrote on 14.01.2021 at 08:26 in message <522e02a10003e...@gwsmtp.uni-regensburg.de>: ... > No idea why, but then: > Jan 04 13:54:18 kvm03-node03 crmd[37819]: notice: Node > kvm03-node02.avigol-gcs.dk state is now lost Sorry, i

[ClusterLabs] Antw: [EXT] Cluster breaks after pcs unstandby node

2021-01-13 Thread Ulrich Windl
Hi! I'm using SLES, but I think your configuration is missing many colocations (IMHO every ordering should have a corresponding colocation). From the logs of node1, this looks odd to me: attrd[11024]:error: Connection to the CPG API failed: Library error (2) After systemd[1]: Unit
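The rule of thumb that every ordering should have a corresponding colocation can be checked mechanically. A minimal sketch, assuming the ordering and colocation constraints have already been extracted from the configuration as plain name pairs; scores, roles and resource sets are ignored, the function is hypothetical (not a crmsh or pcs feature), and the resource names in the test are made up.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def orders_without_colocation(orders: Iterable[Pair], colocations: Iterable[Pair]) -> List[Pair]:
    """Return the (first, then) orderings that have no colocation between the same two resources.

    orders:      (first, then) pairs taken from ordering constraints
    colocations: (rsc, with_rsc) pairs taken from colocation constraints
    Direction is deliberately ignored: any colocation between the two
    resources counts as "corresponding" in this simplified check.
    """
    colocated = {frozenset(pair) for pair in colocations}
    return [pair for pair in orders if frozenset(pair) not in colocated]
```

Ignoring direction keeps the check forgiving; a stricter variant could require the colocation to place the "then" resource with the "first" one.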

[ClusterLabs] Q: List resources affected by utilization limits

2021-01-13 Thread Ulrich Windl
Hi! I made a test: I configured RAM requirements for some test VMs together with node RAM capacities. Things were running fine. Then, as a test, I reduced the RAM capacity of all nodes, and the test VMs were stopped due to insufficient RAM. Now I wonder: is there a command that can list those
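To illustrate why VMs end up stopped when capacity is reduced: with utilization-based placement, a node can host a resource only if its remaining free capacity covers every utilization attribute the resource requires. The sketch below is a simplified first-fit model under that assumption; Pacemaker's real allocation order (priorities, placement-strategy, stickiness) is not modelled, and a node that does not define a required attribute is assumed to have zero capacity for it. The function and the node/VM names in the test are hypothetical.

```python
from typing import Dict, List

Utilization = Dict[str, int]


def unplaceable(capacities: Dict[str, Utilization],
                requirements: Dict[str, Utilization]) -> List[str]:
    """First-fit placement; return the resources that fit on no node."""
    free = {node: dict(cap) for node, cap in capacities.items()}
    stopped = []
    for rsc, need in requirements.items():
        for avail in free.values():
            # A capacity attribute missing on the node counts as 0 (assumption).
            if all(avail.get(attr, 0) >= amount for attr, amount in need.items()):
                for attr, amount in need.items():
                    avail[attr] = avail.get(attr, 0) - amount
                break
        else:
            # No node had enough free capacity: the resource stays stopped.
            stopped.append(rsc)
    return stopped
```

For example, with nodes offering 4096 and 2048 units of hv_memory and three VMs needing 3000, 2000 and 2000, the third VM is the one reported as unplaceable.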

[ClusterLabs] Antw: [EXT] Re: Setup Apache virtual IP SSL certificate config

2021-01-12 Thread Ulrich Windl
Actually I wonder whether an encrypted connection from localhost to localhost makes much sense at all. >>> Ken Gaillot wrote on 12.01.2021 at 18:03 in message <9de95d50930dd3f83461e7eb63cb904c4b1e7f08.ca...@redhat.com>: > I'd try using the name in the certificate instead of localhost >

[ClusterLabs] Questions about the infamous TOTEM retransmit list

2021-01-12 Thread Ulrich Windl
Hi! Before setting up our first pacemaker cluster we thought one low-speed redundant network would be good in addition to the normal high-speed network. However, as it seems now (SLES15 SP2), there is NO reasonable RRP mode to drive such a configuration with corosync. Passive RRP mode with UDPU

[ClusterLabs] Antw: [EXT] Re: Configuring millisecond timestamps in pacemaker.log.

2021-01-11 Thread Ulrich Windl
>>> Ken Gaillot wrote on 11.01.2021 at 21:16 in message : > Pacemaker doesn't currently support it, sorry. It should be pretty easy > to add though (when built with libqb 2), so hopefully we can get it in > 2.1.0. Some time ago I wrote my own set of logging routines, and actually I uses

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: A bug? (SLES15 SP2 with "crm resource refresh")

2021-01-11 Thread Ulrich Windl
>>> Ken Gaillot wrote on 11.01.2021 at 16:45 in >>> message <3e78312a1c92cde0a1cdd82c2fed33a679f63770.ca...@redhat.com>: ... >> > from growing indefinitely). (Plus some timing issues to consider.) >> >> Wouldn't a temporary local status variable do as well? > Hi Ken, I appreciate your

[ClusterLabs] Antw: Re: Antw: [EXT] Re: A bug? (SLES15 SP2 with "crm resource refresh")

2021-01-11 Thread Ulrich Windl
>>> Ken Gaillot wrote on 11.01.2021 at 15:46 in message <3df79a20eb4440357759cca4fe5b0e0729e47085.ca...@redhat.com>: > On Mon, 2021-01-11 at 08:25 +0100, Ulrich Windl wrote: >> > > > Ken Gaillot wrote on 08.01.2021 at

[ClusterLabs] SLES15 SP2: Problem deactivating Clustered RAID

2021-01-11 Thread Ulrich Windl
Hi! I'm fixing my configuration after each unexpected fencing. Probably I missed some dependency, but these messages seem somewhat unexpected to me: Jan 11 15:54:32 h18 Raid1(prm_lockspace_raid_md10)[30630]: INFO: running lsof to list /dev/md10 users... Jan 11 15:54:32 h18 systemd[1]: Stopping

[ClusterLabs] Antw: [EXT] Re: A bug? (SLES15 SP2 with "crm resource refresh")

2021-01-10 Thread Ulrich Windl
>>> Ken Gaillot wrote on 08.01.2021 at 17:38 in message <662b69bff331fae41771cf8833e819c2d5b18044.ca...@redhat.com>: > On Fri, 2021-01-08 at 11:46 +0100, Ulrich Windl wrote: >> Hi! >> >> Trying to reproduce a problem that had occurred in the past after a

[ClusterLabs] A bug in "crm resource stop ..."?

2021-01-08 Thread Ulrich Windl
Hi! When editing my configuration I found this: ... meta priority=123 allow-migrate=true target-role=Stopped \ meta 1: resource-stickiness=0 target-role=Stopped \ meta 2: rule 0: date spec hours=7-19 weekdays=1-5 resource-stickiness=1000 Note the multiple "
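The rule in this configuration is meant to raise stickiness during working hours. A minimal sketch of how such a date spec is evaluated, assuming Pacemaker's documented date_spec semantics: ranges are inclusive (so hours=7-19 covers 07:00 through 19:59) and weekdays run from 1 (Monday) to 7 (Sunday). It also assumes the rule-based set takes precedence over the unconditional one; in Pacemaker's own working-hours example this is achieved by giving the rule-based set a higher score. The function names are hypothetical.

```python
from datetime import datetime


def in_range(value: int, spec: str) -> bool:
    """Evaluate a date_spec field such as "7-19" or "3" (bounds inclusive)."""
    low, _, high = spec.partition("-")
    return int(low) <= value <= int(high or low)


def effective_stickiness(now: datetime) -> int:
    # Rule-based set: "date spec hours=7-19 weekdays=1-5" -> resource-stickiness=1000
    # isoweekday() uses the same numbering as date_spec: Monday=1 ... Sunday=7.
    if in_range(now.hour, "7-19") and in_range(now.isoweekday(), "1-5"):
        return 1000
    # Fallback set without a rule: resource-stickiness=0
    return 0
```

Note the inclusive upper bound: at 19:30 on a weekday the rule still matches, and stickiness only drops at 20:00.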

[ClusterLabs] A bug? (SLES15 SP2 with "crm resource refresh")

2021-01-08 Thread Ulrich Windl
Hi! Trying to reproduce a problem that had occurred in the past after a "crm resource refresh" ("reprobe"), I noticed something on the DC that looks odd to me: Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Forcing the status of all resources to be redetected Jan 08 11:13:21 h16

[ClusterLabs] Antw: [EXT] Re: Pending Fencing Actions shown in pcs status

2021-01-07 Thread Ulrich Windl
>>> wrote on 07.01.2021 at 09:51 in message <91782048.765666.1610009460932.javamail.ya...@mail.yahoo.co.jp>: > Hi Steffen, > Hi Reid, > > The fencing history is kept inside stonith-ng and is not written to cib. So you were asking for a specific section of the CIB like "cibadmin -Q -o

[ClusterLabs] Antw: [EXT] Re: Pending Fencing Actions shown in pcs status

2021-01-07 Thread Ulrich Windl
>>> Steffen Vinther Sørensen wrote on 07.01.2021 at 09:49 in message : > Hi Reid, > > I was under the impression that 'pcs config' was the CIB in a more > friendly format. Here is the 'pcs cluster cib' as requested I'd also think so (+/- parsing and presentation errors) ;-) > > /Steffen

[ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: warning: new_event_notification (4527-22416-14): Broken pipe (32)

2021-01-07 Thread Ulrich Windl
or explaining! Regards, Ulrich > > On Fri, Dec 18, 2020 at 8:15 AM Ken Gaillot wrote: >> >> On Fri, 2020-12-18 at 13:32 +0100, Ulrich Windl wrote: >> > > > > Andrei Borzenkov wrote on 18.12.2020 at >> > > > > 12:17 in >> > >

[ClusterLabs] Antw: [EXT] Cluster breaks after pcs unstandby node

2021-01-04 Thread Ulrich Windl
>>> Steffen Vinther Sørensen wrote on 04.01.2021 at 16:08 in message : > Hi all, > I am trying to stabilize a 3-node CentOS7 cluster for production > usage, VirtualDomains and GFS2 resources. However, the following use > case ends up with node1 fenced, and some Virtualdomains in FAILED >

[ClusterLabs] Antw: [EXT] Observed Difference Between ldirectord and keepalived

2021-01-04 Thread Ulrich Windl
>>> Eric Robinson wrote on 02.01.2021 at 10:11 in message > We recently switched from ldirectord to keepalived. We noticed that, after > the switch, LVS behaves a bit differently with respect to "down" services. > > On ldirectord, a virtual service with 2 realservers displays "Masq0"

[ClusterLabs] Antw: [EXT] Fencing explanation

2021-01-04 Thread Ulrich Windl
>>> Ignazio Cassano wrote on 28.12.2020 at 22:21 in message : > Hello all, I am setting up a pacemaker cluster with centos 7 and ipmi idrac > fencing devices. > What I did not understand is how to set the number of seconds before a node is > rebooted by stonith. Actually the only reason for a

[ClusterLabs] Antw: [EXT] Re: Q: warning: new_event_notification (4527-22416-14): Broken pipe (32)

2020-12-18 Thread Ulrich Windl
>>> Andrei Borzenkov wrote on 18.12.2020 at 12:17 in message : > 18.12.2020 12:00, Ulrich Windl wrote: >> >> Maybe a related question: Do STONITH resources have special rules, meaning > they don't wait for successful fencing? > > pacemaker resources in CIB

[ClusterLabs] Q: warning: new_event_notification (4527-22416-14): Broken pipe (32)

2020-12-18 Thread Ulrich Windl
Hi! I wonder what "warning: new_event_notification (4527-22416-14): Broken pipe (32)" means: A bug? (SLES15 SP2, BTW) It happened after a "crm resource refresh": Dec 18 09:25:51 h16 pacemaker-controld[4527]: notice: Forcing the status of all resources to be redetected Dec 18 09:25:51 h16

[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Ulrich Windl
>>> Andrei Borzenkov wrote on 18.12.2020 at 08:21 in message <58579c3b-33ce-a121-5d67-00305f3d7...@gmail.com>: > 18.12.2020 10:09, Ulrich Windl wrote: >>>>> Andrei Borzenkov wrote on 18.12.2020 at 08:01 in >> message : >>> 17.12.2020 21:3

[ClusterLabs] FYI: crm shell enhancement proposal #699

2020-12-17 Thread Ulrich Windl
Hi! Those using crm shell might be interested in the enhancement proposal I just made: https://github.com/ClusterLabs/crmsh/issues/699 Enhancements to the enhancement welcome ;-) Regards, Ulrich

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Ulrich Windl
>>> Andrei Borzenkov wrote on 18.12.2020 at 08:01 in message : > 17.12.2020 21:30, Ken Gaillot wrote: >> >> This reminded me that some IPMI implementations return "success" for >> commands before they've actually been completed. This is why >> fence_ipmilan has a "power_wait" parameter that

[ClusterLabs] Antw: Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Ulrich Windl
>>> "Ulrich Windl" wrote on 17.12.2020 at 12:23 in message <5fdb3fbe02a10003d...@gwsmtp.uni-regensburg.de>: ... > > I wonder: Did you remove the hostnames from the log messages? Also are the > times in sync, wondering that at the same second a reso

[ClusterLabs] Antw: [EXT] Changing order in resource group after it's created

2020-12-17 Thread Ulrich Windl
>>> Tony Stocker wrote on 17.12.2020 at 12:21 in message : > I have a resource group that has a number of entries. If I want to > reorder them, how do I do that? > > I tried doing this: > > pcs resource update FileMount --after InternalIP > > but got this error: > > Error: Specified

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Ulrich Windl
-------- > -- > > From: Ulrich Windl > To: users@clusterlabs.org > Date: 17 December 2020 7.48.46 CET > Subject: [ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource > > >>>> Gabri

[ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Ulrich Windl
>>> Gabriele Bulfon wrote on 17.12.2020 at 09:11 in message <2129123894.1061.1608192712316@www>: > Yes, sorry took same bash by mistake...here are the correct logs. > > Yes, xstha1 has delay 10s so that I'm giving him precedence, xstha2 has > delay 1s and will be stonithed earlier. >
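The idea behind the different delays can be illustrated with a simplified two-node model. Its assumptions are not taken from Pacemaker's code: both nodes lose contact at the same instant, a request to fence a node waits for the pcmk_delay_base configured on the device that fences that node, and a fencing action needs a fixed action_time to take effect. A node is expected to survive only if its shot lands before the peer even starts shooting back; otherwise both may be fenced. The function name and the action_time parameter are hypothetical.

```python
from typing import Dict, Optional


def fence_race_survivor(delays: Dict[str, float], action_time: float = 0.0) -> Optional[str]:
    """Return the node expected to survive a split, or None if both may be fenced.

    delays maps each node to the pcmk_delay_base of the device that fences *that* node.
    """
    if len(delays) != 2:
        raise ValueError("this model only covers two-node clusters")
    (node_a, delay_a), (node_b, delay_b) = delays.items()
    # node_a starts fencing node_b after delay_b; the shot lands at delay_b + action_time.
    # node_b starts fencing node_a after delay_a; its shot lands at delay_a + action_time.
    if delay_a > delay_b + action_time:
        return node_a  # node_b is gone before it starts fencing node_a
    if delay_b > delay_a + action_time:
        return node_b  # node_a is gone before it starts fencing node_b
    return None  # both requests are in flight: mutual fencing is possible
```

In this model, a 10 s versus 1 s split favours xstha1, but only as long as the fencing action itself takes less than the 9 s gap; a smaller gap than the action time leaves room for both nodes to be fenced.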

[ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Ulrich Windl
>>> Andrei Borzenkov wrote on 17.12.2020 at 09:50 in message : ... > According to logs from xstha1, it started to activate resources only > after stonith was confirmed > > Dec 16 15:08:12 [708] stonith-ng: notice: log_operation: > Operation 'off' [1273] (call 4 from crmd.712) for host

[ClusterLabs] Antw: [EXT] Re: Antw: Another word of warning regarding VirtualDomain and Live Migration

2020-12-16 Thread Ulrich Windl
>>> Ken Gaillot wrote on 16.12.2020 at 20:13 in >>> message : ... >> So the cluster is doing exactly the wrong thing: The VM is still >> active on h16, while a "recovery" on h19 will start it there! So >> _after_ the recovery the VM is duplicated. > > The problem here is that a stop should

[ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: crm resource start/stop completion

2020-12-16 Thread Ulrich Windl
after start, then you got d > completed) > > ________ > From: Users on behalf of Ulrich Windl > > Sent: Wednesday, December 16, 2020 5:57 PM > To: users@clusterlabs.org > Subject: [ClusterLabs] Antw: [EXT] Re: Q: crm resource start/stop completion >

[ClusterLabs] Antw: [EXT] Re: crm shell: "params param"?

2020-12-16 Thread Ulrich Windl
exist Indeed I found out that "param" (most likely caused by some copy error) is a parameter named "param" that has no value ;-) So that's "params param"... > > Thanks for report! > > Regards, > Xin > > From

[ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-16 Thread Ulrich Windl
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets > > > > > > -------- > -- > > From: Ulrich Windl > To: users@clusterlabs.org > Date: 16 December 2020 15.45.36 CET

[ClusterLabs] Antw: [EXT] delaying start of a resource

2020-12-16 Thread Ulrich Windl
>>> Gabriele Bulfon wrote on 16.12.2020 at 15:32 in message <1523391015.734.1608129155836@www>: > Hi, I now have a two-node cluster using stonith with different > pcmk_delay_base, so that node 1 has priority to stonith node 2 in case of > problems. > > Though, there is still one problem:

[ClusterLabs] crm shell: "params param"?

2020-12-16 Thread Ulrich Windl
>>> Ulrich Windl wrote on 16.12.2020 at 14:53 in message <5FDA1167.92C : >>> 161 : 60728>: [...] > primitive prm_xen_test-jeos VirtualDomain \ > params param config="/etc/libvirt/libxl/test-jeos.xml" BTW: Is "params param" a bug in

[ClusterLabs] Antw: [EXT] Re: Antw: Another word of warning regarding VirtualDomain and Live Migration

2020-12-16 Thread Ulrich Windl
>>> Roger Zhou wrote on 16.12.2020 at 13:58 in message <8ab80ef4-462c-421b-09b8-084d270d4...@suse.com>: > On 12/16/20 5:06 PM, Ulrich Windl wrote: >> Hi! >> >> (I changed the subject of the thread) >> VirtualDomain seems to be broken, as it

[ClusterLabs] Antw: [EXT] Re: Q: crm resource start/stop completion

2020-12-16 Thread Ulrich Windl
d press TAB, I see resources that are running already. Maybe they don't have a target-role (so they are started by default). Can you confirm? Regards, Ulrich > > Regards, > Xin > ____ > From: Users on behalf of Ulrich Windl > > Sent: Monday, Dece

[ClusterLabs] Antw: Another word of warning regarding VirtualDomain and Live Migration

2020-12-16 Thread Ulrich Windl
27]: warning: Unexpected result (error: test-jeos: live migration to h19 failed: 1) was recorded for migrate_to of prm_xen_test-jeos on h16 at Dec 16 09:28:46 2020 Amazingly, manual migration using virsh worked: virsh migrate --live test-jeos xen+tls://h18... Regards, Ulrich Windl >>> U

[ClusterLabs] Antw: Another word of warning regarding VirtualDomain and BtrFS (SLES15 SP2)

2020-12-14 Thread Ulrich Windl
:09:27 h16 VirtualDomain(prm_xen_test-jeos)[4850]: INFO: Domain test-jeos already stopped. Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]: error: Calculated transition 669 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-4.bz2 What's going on here? Regards, Ulrich >>> Ulri

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Recoveing from node failure

2020-12-14 Thread Ulrich Windl
https://gabrielebulfon.bandcamp.com/album/exoplanets > > > > > > -------- > -- > > From: Ulrich Windl > To: users@clusterlabs.org > Date: 14 December 2020 11.53.22 CET > Subject: [ClusterLabs] Antw: Re: Antw: [EXT] Recoveing from node failure > > >>

[ClusterLabs] Antw: Re: Antw: [EXT] Recoveing from node failure

2020-12-14 Thread Ulrich Windl
>>> Gabriele Bulfon wrote on 14.12.2020 at 11:48 in message <1065144646.7212.1607942889206@www>: > Thanks! > > I tried the first option, by adding pcmk_delay_base to the two stonith > primitives. > First has 1 second, second has 5 seconds. > It didn't work :( they still killed each other :( >

[ClusterLabs] Q: crm resource start/stop completion

2020-12-14 Thread Ulrich Windl
Hi! I wonder: Would it be difficult, and would it make sense, to change crm shell to complete "resource start " only with those resources that aren't already running (per role), and to complete "resource stop " only with those resources that are running (per role)? I came to this after seeing "3
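A minimal sketch of the proposed completion behaviour, assuming the running state of each resource is already known (for instance from the cluster status). Roles (promoted/unpromoted) are ignored for brevity, and the function is hypothetical, not part of crmsh.

```python
from typing import Dict, List


def complete(subcommand: str, prefix: str, running: Dict[str, bool]) -> List[str]:
    """Return completion candidates for "resource start/stop <prefix>"."""
    if subcommand == "start":
        # Only offer resources that are not running yet.
        candidates = [rsc for rsc, up in running.items() if not up]
    elif subcommand == "stop":
        # Only offer resources that are currently running.
        candidates = [rsc for rsc, up in running.items() if up]
    else:
        candidates = list(running)  # other subcommands keep completing everything
    return sorted(rsc for rsc in candidates if rsc.startswith(prefix))
```

Extending this "per role" would mean tracking the current role of each resource instead of a single boolean.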

[ClusterLabs] Antw: Re: Antw: [EXT] Recoveing from node failure

2020-12-13 Thread Ulrich Windl
e S.r.l. : http://www.sonicle.com > Music: http://www.gabrielebulfon.com > eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets > > > > > > ---- > -- > > From: Ulrich Wi

[ClusterLabs] Another word of warning regarding VirtualDomain and BtrFS (SLES15 SP2)

2020-12-13 Thread Ulrich Windl
Hi! Another word of warning regarding VirtualDomain: While configuring a 3-node cluster with SLES15 SP2 for Xen PVM (using libvirt and the VirtualDomain RA), I had created a test VM using BtrFS. At some point during testing, the cluster ended up with the test VM running on more than one node (for reasons

[ClusterLabs] A word of warning regarding VirtualDomain and utilization

2020-12-11 Thread Ulrich Windl
Hi! A word of warning (for SLES15 SP2): I learned that the VirtualDomain RA sets utilization parameters "cpu" and "hv_memory" for the resource's permanent configuration. That is: You'll see those parameters, even though you did not configure them. So far, so good, but when you defined a

[ClusterLabs] Antw: [EXT] Recoveing from node failure

2020-12-11 Thread Ulrich Windl
Hi! Did you take care of the special "two node" settings (quorum, I mean)? When I use "crm_mon -1Arfj", I see something like " * Current DC: h19 (version 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a) - partition with quorum" What do you see? Regards, Ulrich >>> Gabriele Bulfon

[ClusterLabs] Antw: [EXT] Re: Q: LVM-activate a shared LV

2020-12-11 Thread Ulrich Windl
rap-options: \ > have-watchdog=false \ > stonith-enabled=true \ > dc-version="2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a" \ > cluster-infrastructure=corosync \ > cluster-name=cluster \ > last-lrm-refresh=16

[ClusterLabs] Antw: [EXT] Re: Q: LVM-activate a shared LV

2020-12-10 Thread Ulrich Windl
nough to take care of OCFS2 . > Have you tried that ? > > Best Regards, > Strahil Nikolov > > > > > > > On Thursday, 10 December 2020, 16:55:52 GMT+2, Ulrich Windl > wrote: > > > > > > Hi! > > I configured a clustered LV

[ClusterLabs] Q: validate for VirtualDomain

2020-12-10 Thread Ulrich Windl
Hi! While configuring a VirtualDomain resource, I mistyped the config parameter. I wonder: Should validate-all check whether the config file exists and is readable? The only exception I could imagine is when the config file is provided by another resource that isn't running yet. Despite this, I

[ClusterLabs] Q: LVM-activate a shared LV

2020-12-10 Thread Ulrich Windl
Hi! I configured a clustered LV (I think) for activation on three nodes, but it won't work. Error is: LVM-activate(prm_testVG0_test-jeos_activate)[48844]: ERROR: LV locked by other host: testVG0/test-jeos Failed to lock logical volume testVG0/test-jeos. primitive

[ClusterLabs] Antw: [EXT] Re: resource management of standby node

2020-12-09 Thread Ulrich Windl
>>> Roger Zhou wrote on 09.12.2020 at 03:57 in message <4859f848-3a4a-8996-a6c0-a43640d68...@suse.com>: > On 12/1/20 4:03 PM, Ulrich Windl wrote: >>>>> Ken Gaillot wrote on 30.11.2020 at 19:52 in >>>>> message >> : >>

[ClusterLabs] Antw: [EXT] Re: Q: high-priority messages from DLM?

2020-12-08 Thread Ulrich Windl
although maybe there's a good reason. These >> > get >> > logged with KERN_ERR priority. >> >> I hit Enter and that email sent instead of line-breaking... anyway. >> >> https://github.com/torvalds/linux/blob/master/fs/dlm/dlm_internal.h#L61-L62 >> https

[ClusterLabs] Antw: [EXT] Re: Q: high-priority messages from DLM?

2020-12-08 Thread Ulrich Windl
"connecting to" / "connected to" is probably info or notice, too. Regards, Ulrich > >> >> On Fri, Dec 4, 2020 at 5:32 AM Ulrich Windl >> wrote: >> > >> > Hi! >> > >> > Logging into a server via iDRAC, I see several mess

[ClusterLabs] Q: high-priority messages from DLM?

2020-12-04 Thread Ulrich Windl
Hi! Logging into a server via iDRAC, I see several messages from "dlm:" on the console screen. My obvious explanation is that they are on the screen because journald (SLES15 SP2) treats them as high-priority messages that should go to the screen. However, IMHO they are not: [83035.82]

[ClusterLabs] Antw: [EXT] Re: Preferred node for a service (not constrained)

2020-12-03 Thread Ulrich Windl
>>> Strahil Nikolov wrote on 02.12.2020 at 22:42 in message <311137659.2419591.1606945369...@mail.yahoo.com>: > Constraints' values are varying from: > infinity which equals to score of 100 > to: > - infinity which equals to score of -100 > > You can usually set a positive score on
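For reference, Pacemaker documents INFINITY as 1,000,000 (not 100), and scores are combined with saturating "infinity math": any value plus INFINITY is INFINITY, any value minus INFINITY is -INFINITY, and INFINITY minus INFINITY is -INFINITY, so a -INFINITY constraint always wins. A small Python model of that addition; clamping finite sums into the ±INFINITY range is an assumption of this sketch.

```python
INFINITY = 1_000_000  # Pacemaker's numeric value for "INFINITY"


def add_scores(a: int, b: int) -> int:
    """Combine two Pacemaker-style scores with saturating arithmetic."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY  # -INFINITY dominates, even over +INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    # Finite sums are kept within [-INFINITY, INFINITY] (assumption of this sketch).
    return max(-INFINITY, min(INFINITY, a + b))
```

This is why a single -INFINITY location or colocation constraint cannot be outweighed by any number of positive preferences.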

[ClusterLabs] Antw: [EXT] sbd v1.4.2

2020-12-03 Thread Ulrich Windl
Hi! See comments inline... >>> Klaus Wenninger wrote on 02.12.2020 at 22:05 in message <1b29fa92-b1b7-2315-fbcf-0787ec0e1...@redhat.com>: > Hi sbd - developers & users! > > Thanks to everybody for contributing to tests and > further development. > > Improvements in build/CI-friendlyness

[ClusterLabs] Antw: [EXT] Final Pacemaker 2.0.5 release now available

2020-12-02 Thread Ulrich Windl
>>> Christopher Lumens wrote on 02.12.2020 at 19:14 in message <851583983.28225008.1606932881629.javamail.zim...@redhat.com>: > Hi all, > > The final release of Pacemaker version 2.0.5 is now available at: [...] > > * crm_mon additionally supports a --resource= option for resource-based >

[ClusterLabs] Antw: Re: Antw: [EXT] Re: resource management of standby node

2020-12-01 Thread Ulrich Windl
>>> Ken Gaillot wrote on 30.11.2020 at 19:52 in >>> message : ... > > Though there's nothing wrong with putting all nodes in standby. Another > alternative would be to set the stop-all-resources cluster property. Hi Ken, thanks for the valuable feedback! I was looking for that, but

[ClusterLabs] Antw: Re: Antw: [EXT] Re: resource management of standby node

2020-11-30 Thread Ulrich Windl
>>> Andrei Borzenkov wrote on 30.11.2020 at 16:17 in message <0d510ecb-a2fb-1d97-ad05-c71d0c820...@gmail.com>: [...] >> I thought about that: >> First put all nodes in standby to stop resources, then put all nodes in >> maintenance mode, then edit configuration. > > There is no maintenance

[ClusterLabs] corosync[3520]: [CPG ] *** 0x55ff99d211c0 can't mcast to group dlm:ls:lvm_testVG state:1, error:12

2020-11-30 Thread Ulrich Windl
Hi! In my test cluster using UDPU(!) I saw this syslog message when I shut down a VG: Nov 30 13:38:28 h16 pacemaker-execd[3681]: notice: executing - rsc:prm_testVG_activate action:stop call_id:71 Nov 30 13:38:28 h16 LVM-activate(prm_testVG_activate)[7265]: INFO: Deactivating testVG Nov 30

[ClusterLabs] Antw: [EXT] Re: resource management of standby node

2020-11-30 Thread Ulrich Windl
>>> Andrei Borzenkov wrote on 30.11.2020 at 14:18 in message : > On Mon, Nov 30, 2020 at 3:11 PM Ulrich Windl > wrote: >> >> Hi! >> >> In SLES15 I'm surprised by what a standby node does: My guess was that a > standby node would stop all resou

[ClusterLabs] crm enhancement proposal (configure grep): Opinions?

2020-11-30 Thread Ulrich Windl
Hi! What would users of crm shell think about this enhancement proposal: crm configure grep <pattern> That command would search the configuration for any occurrence of <pattern> and would list the names of the objects where it occurred. That is, if pattern is testXYZ, then all resources either having testXYZ in their name or
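A rough sketch of what the proposed command could do, working on the text that "crm configure show" prints: it joins backslash-continued lines into one definition per object, then reports the ID (the second word) of every definition matching the regular expression. This is not an existing crmsh command; the function name, and treating the second word as the object ID, are assumptions (that holds for primitive, group, clone and constraint definitions).

```python
import re
from typing import List


def configure_grep(config_text: str, pattern: str) -> List[str]:
    """Return the IDs of all objects whose definition matches the regex pattern."""
    definitions: List[str] = []
    current = ""
    for line in config_text.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            current += stripped[:-1] + " "  # continuation line: keep collecting
            continue
        current += stripped
        if current.strip():
            definitions.append(" ".join(current.split()))
        current = ""
    if current.strip():  # dangling continuation at end of input
        definitions.append(" ".join(current.split()))

    regex = re.compile(pattern)
    names = []
    for definition in definitions:
        words = definition.split()
        if len(words) >= 2 and regex.search(definition):
            names.append(words[1].rstrip(":"))  # e.g. "cib-bootstrap-options:" -> ID
    return names
```

Because whole definitions are searched, a pattern such as test-jeos finds both the primitive named after the VM and any constraint that merely references it.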

[ClusterLabs] Q: LVM-activate: "WARNING: You are recommended to activate one LV at a time or use exclusive activation mode."

2020-11-30 Thread Ulrich Windl
Hi! I configured a shared LVM activation as per instructions (I hope) in SLES15 SP2. However I get this warning: LVM-activate(prm_testVG_activate)[57281]: WARNING: You are recommended to activate one LV at a time or use exclusive activation mode. The configuration is: primitive

[ClusterLabs] Note on Raid1 RA and attribute force_clones

2020-11-27 Thread Ulrich Windl
Hi! Reading the metadata of the Raid1 RA in SLES15 SP2, I see: force_clones (boolean, [false]): force ability to run as a clone Activating the same md RAID array on multiple nodes at the same time will result in data corruption and thus is forbidden by default. A safe example could

[ClusterLabs] Q: crm_resource -C

2020-11-26 Thread Ulrich Windl
Hi! It seems there is a change in semantics (or a bug?) from SLES11 to SLES12/SLES15 regarding crm_resource -C -r ...: I had a cloned resource that failed startup due to a missing configuration file: Failed Resource Actions: * prm_test_raid_monitor_0 on h19 'not installed' (5): call=17,

[ClusterLabs] "crm verify": ".. stonith-watchdog-timeout is nonzero"

2020-11-25 Thread Ulrich Windl
Hi! Using SBD, I got this message from crm's top-level "verify": crm(live/h16)# verify Current cluster status: Online: [ h16 h18 h19 ] prm_stonith_sbd(stonith:external/sbd): Started h18 (unpack_config) notice: Watchdog will be used via SBD if fencing is required and

[ClusterLabs] Q: "crm node server" completion

2020-11-25 Thread Ulrich Windl
Hi! Is it intentional that "crm node server" is the only command that cannot complete the node? crmsh-4.2.0+git.1604052559.2a348644-5.26.1 of SLES15 SP2 Regards, Ulrich

[ClusterLabs] Antw: [EXT] Re: Q: cryptic messages from "QB"

2020-11-25 Thread Ulrich Windl
>>> Christine Caulfield wrote on 25.11.2020 at 10:17 in message <56738406-9222-a9f3-c57c-e30400a0b...@redhat.com>: > On 25/11/2020 08:45, Ulrich Windl wrote: >> Hi! >> >> Setting up a cluster in SLES15 SP2, I wonder about a few log messages: >> >

[ClusterLabs] Q: what does "corosync-cfgtool -s" check actually?

2020-11-19 Thread Ulrich Windl
Hi! Having a problem, I wonder what "corosync-cfgtool -s" actually checks: I see "status = ring 0 active with no faults" on all nodes and all rings, but the nodes seem unable to communicate somehow. Is there a kind of "corosync node ping" that actually checks that the local node can

[ClusterLabs] Q: "crm node status" display

2020-11-19 Thread Ulrich Windl
Hi! Setting up a new cluster with SLES15 SP2, I'm wondering: "crm node status" displays XML. Is that the way it should be? h16:~ # crm node crm(live/rksaph16)node# status crmsh-4.2.0+git.1604052559.2a348644-5.26.1.noarch Regards, Ulrich

[ClusterLabs] Minor bug in SLES 15 corosync-2.4.5-6.3.2.x86_64 (unicast, ttl)

2020-11-19 Thread Ulrich Windl
Hi! A short notification: I had set up a new cluster using udpu, finding that ringnumber 0 has a ttl statement ("ttl:1"), but ringnumber 1 did not. So I added one for ringnumber 1, and then I reloaded corosync via corosync-cfgtool -R. Amazingly, when (re)starting corosync, it failed with:

[ClusterLabs] Antw: [EXT] Total resources limitation

2020-11-17 Thread Ulrich Windl
>>> Raffaele Pantaleoni wrote on 17.11.2020 at 08:26 >>> in message : > Hi everyone, > > is there any kind of limitation on the total number of resources that > a cluster can manage? > > The question arises, since the cluster itself is limited to 16 nodes > (real ones I mean) because

[ClusterLabs] Antw: [EXT] issue

2020-11-16 Thread Ulrich Windl
>>> Guy Przytula wrote on 15.11.2020 at 17:51 in message <67ff4715-89b2-ecd3-0d26-d8037a177...@infocura.be>: > I have installed the latest version of pacemaker on redhat 8 > > I wanted to test it out for a cluster for IBM Db2 > > There is only one issue : > > I have 2 nodes : nodep and nodes >

[ClusterLabs] Antw: [EXT] Containers as pcs resource

2020-11-11 Thread Ulrich Windl
Hi! My advice is to write an OCF RA, not a systemd service, because with the former you have more control over what happens. Not to mention ocf-tester... In general you can "attach" pacemaker to a service that is already running, but the RA has to be correct (i.e.: detect the
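Whatever language the RA is written in, the crucial part is a monitor action that reports the state accurately: OCF_SUCCESS (0) when the service is running, OCF_NOT_RUNNING (7) when it is cleanly stopped, and an error code otherwise. Reporting "not running" for a service that is in fact running could lead the cluster to start it a second time. A minimal pidfile-based sketch for POSIX systems; the pidfile approach, and treating a stale pidfile as a failure (OCF_ERR_GENERIC, 1) rather than a clean stop, are design assumptions of this sketch, not requirements of the OCF standard.

```python
import os

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor(pidfile: str) -> int:
    """Map the state of a pidfile-tracked service to an OCF exit code."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return OCF_NOT_RUNNING  # no pidfile: treated as cleanly stopped
    except (OSError, ValueError):
        return OCF_ERR_GENERIC  # unreadable or garbled pidfile: state unknown
    if pid <= 0:
        return OCF_ERR_GENERIC  # never signal process groups by accident
    try:
        os.kill(pid, 0)  # signal 0 only tests whether the process exists
    except ProcessLookupError:
        return OCF_ERR_GENERIC  # stale pidfile: the service died without a clean stop
    except PermissionError:
        return OCF_SUCCESS  # process exists but belongs to another user
    return OCF_SUCCESS
```

Returning an error for a stale pidfile makes the cluster run its recovery (stop, then start) instead of silently assuming a clean stop; an RA could reasonably choose the other way if the service never leaves stale pidfiles.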

[ClusterLabs] Antw: [EXT] Pacemaker 2.0.5-rc2 now available

2020-10-30 Thread Ulrich Windl
>>> Christopher Lumens wrote on 27.10.2020 at 21:58 in message <1537839782.23878350.1603832316012.javamail.zim...@redhat.com>: > The second release candidate for Pacemaker 2.0.5 is now available at: > > https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.0.5-rc2 > > The most

[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-27 Thread Ulrich Windl
>>> "Lentes, Bernd" wrote on 26.10.2020 at 21:44 in message <1480408662.7194527.1603745092927.javamail.zim...@helmholtz-muenchen.de>: > > - On Oct 26, 2020, at 4:09 PM, Ulrich Windl > ulrich.wi...@rz.uni-regensburg.de wrote: > > >>

[ClusterLabs] Antw: Re: Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-27 Thread Ulrich Windl
Strahil Nikolov > > > > > > > On Monday, 26 October 2020, 09:34:31 GMT+2, Ulrich Windl > wrote: > > > > > >>>> Strahil Nikolov wrote on 23.10.2020 at 17:06 in > message <428616368.2019191.1603465603...@mail.yahoo

[ClusterLabs] Antw: Re: Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-26 Thread Ulrich Windl
>>> Ulrich Windl wrote on 26.10.2020 at 16:09 in message <5F96E692.B06 : 161 : 60728>: >>>> "Lentes, Bernd" wrote on 26.10.2020 at > 15:58 in message > <695331508.6039975.1603724333027.javamail.zim...@helmholtz-muenchen.de>: > >

[ClusterLabs] Antw: Re: Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-26 Thread Ulrich Windl
>>> "Lentes, Bernd" wrote on 26.10.2020 at 15:58 in message <695331508.6039975.1603724333027.javamail.zim...@helmholtz-muenchen.de>: > > - On Oct 26, 2020, at 8:41 AM, Ulrich Windl > ulrich.wi...@rz.uni-regensburg.de wrote: > >> "SIG

[ClusterLabs] Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-26 Thread Ulrich Windl
>>> "Lentes, Bernd" wrote on 23.10.2020 at 23:16 in message <1814448122.1773393.1603487817751.javamail.zim...@helmholtz-muenchen.de>: > > - On Oct 23, 2020, at 8:45 PM, Valentin Vidić > vvi...@valentin-vidic.from.hr wrote: > >> On Fri, Oct 23, 2020 at 08:08:31PM +0200, Lentes, Bernd

[ClusterLabs] Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-26 Thread Ulrich Windl
>>> Strahil Nikolov wrote on 23.10.2020 at 17:06 in message <428616368.2019191.1603465603...@mail.yahoo.com>: > why don't you work with something like this: 'op stop interval =300 > timeout=600'. I always thought "interval=" does not make any sense for "start" and "stop", but only for

[ClusterLabs] Antw: [EXT] Re: Upgrading/downgrading cluster configuration

2020-10-26 Thread Ulrich Windl
>>> Strahil Nikolov wrote on 23.10.2020 at 17:04 in message <362944335.2019534.1603465466...@mail.yahoo.com>: > Usually I prefer to use "crm configure show" and later "crm configure edit" > and replace the config. I guess you use "edit" because of the lack of support for massive

[ClusterLabs] Antw: [EXT] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-23 Thread Ulrich Windl
>>> "Lentes, Bernd" wrote on 22.10.2020 at 22:29 in message <684655755.160569.1603398597074.javamail.zim...@helmholtz-muenchen.de>: > Hi guys, > > occasionally stopping a VirtualDomain resource via "crm resource stop" does > not work, and in the end the node is fenced, which is ugly. > I

[ClusterLabs] Antw: [EXT] Best way to obtain timestamp when node was set to "standby"

2020-10-21 Thread Ulrich Windl
Save the name of the current cluster state into a file, and once it changes, record the time. The cluster should be in the state as long as your timestamp says. Like that, I guess... >>> Dirk Gassen wrote on 21.10.2020 at 02:23 in message : > Hi Alltogether, > > What would be the best way

[ClusterLabs] Corosync 3.1.0 token timeout

2020-10-21 Thread Ulrich Windl
>>> Jan Friesse wrote on 20.10.2020 at 18:05 in >>> message <9e9edd13-847c-a81f-9b28-0ecf8f17f...@redhat.com>: > I've forgotten to mention one very important change (in text, release notes > at github release is already fixed): > ... > > - Default token timeout was changed from 1 second to

[ClusterLabs] Antw: [EXT] Corosync 3.1.0 is available at corosync.org!

2020-10-21 Thread Ulrich Windl
>>> Jan Friesse wrote on 20.10.2020 at 17:11 in message : > I am pleased to announce the latest maintenance release of Corosync > 3.1.0 available immediately from GitHub release section at > https://github.com/corosync/corosync/releases or our website at >

[ClusterLabs] Antw: Re: Antw: [EXT] Avoiding self-fence on RA failure

2020-10-08 Thread Ulrich Windl
>>> Digimer wrote on 07.10.2020 at 23:27 in message : > On 2020-10-07 2:35 a.m., Ulrich Windl wrote: >>>>> Digimer wrote on 07.10.2020 at 05:42 in message >> : >>> Hi all, >>> >>> While developing our program (and not being a

[ClusterLabs] Antw: [EXT] Avoiding self-fence on RA failure

2020-10-07 Thread Ulrich Windl
>>> Digimer wrote on 07.10.2020 at 05:42 in message : > Hi all, > > While developing our program (and not being a production cluster), I > find that when I push broken code to a node, causing the RA to fail to > perform an operation, the node gets fenced. (example below). (I see others

[ClusterLabs] Antw: [EXT] mess in the CIB

2020-10-07 Thread Ulrich Windl
>>> "Lentes, Bernd" wrote on 06.10.2020 at 19:57 in message <187946767.10825794.1602007021469.javamail.zim...@helmholtz-muenchen.de>: ... > But vm_ssh ... why are some instance-attributes of it named with > snapanalysis? I don't know why, but you could edit the name to fix it. > I didn't

[ClusterLabs] Antw: [EXT] Re: pacemaker and cluster hostname reconfiguration

2020-10-05 Thread Ulrich Windl
>>> Riccardo Manfrin wrote on 05.10.2020 at >>> 08:49 in message : > Thanks Igor, > I'm afraid I tried this path too although I did not mention it in the issue. > > The procedure I followed was to add the "name" attribute to the node > list in corosync.conf, with the current hostname, then

[ClusterLabs] Antw: Re: Antw: [EXT] How to stop removed resources when replacing cib.xml via cibadmin or crm_shadow

2020-10-01 Thread Ulrich Windl
>>> Igor Tverdovskiy wrote on 01.10.2020 at >>> 11:47 in message : > Hi, > > >> > I have generated proper XML according to predefined templates and apply >> it >> > via >> >> cibadmin --replace --xml-file cib.xml >> >> I think before doing such a thing you should put all nodes into

[ClusterLabs] Antw: [EXT] How to stop removed resources when replacing cib.xml via cibadmin or crm_shadow

2020-10-01 Thread Ulrich Windl
>>> Igor Tverdovskiy wrote on 30.09.2020 at >>> 23:49 in message : > Hi All, > > I have a necessity to apply the whole cib.xml instead of using command line > tools like >> crm configure ... > > I have generated proper XML according to predefined templates and apply it > via >> cibadmin

[ClusterLabs] Antw: [EXT] VirtualDomain stop operation traced - but nothing appears in /var/lib/heartbeat/trace_ra/

2020-09-30 Thread Ulrich Windl
Hi! Just two notes: 1) SLES12 SP5 (pacemaker-1.1.23) is current. 2) There isn't very much debug logging in /usr/lib/ocf/resource.d/heartbeat/VirtualDomain. Maybe add a "set -x" at the second line of the RA, or call ocf-tester for the resource (while the cluster isn't managing it, of course)
