[ClusterLabs] RHEL 7.4 cluster cannot commit suicide (sbd)

2017-08-26 Thread strahil nikolov
SBD_WATCHDOG_TIMEOUT=5 I have used the following example for setting up the sbd: https://access.redhat.com/articles/3099231 Thank you for reading this long e-mail. I would be grateful if someone finds out my mistake. Best Regards, Strahil Nikolov

Re: [ClusterLabs] fencing on iscsi device not working

2019-10-31 Thread Strahil Nikolov
describes how to properly configure fence_scsi and the requirements for using it. Have you checked if your storage supports persistent reservations? Best Regards, Strahil Nikolov On Wednesday, 30 October 2019, 8:42:16 GMT-4, RAM PRASAD TWISTED ILLUSIONS wrote: Hi

Re: [ClusterLabs] connection timed out fence_virsh monitor stonith

2020-02-24 Thread Strahil Nikolov
prefer-fence_zc-mail-1_virsh) > > Resource: fence_zc-mail-2_virsh >Enabled on: zc-mail-1-ha (score:INFINITY) (role: Started) >(id:cli-prefer-fence_zc-mail-2_virsh) > >Ordering Constraints: > >Colocation Constraints: > >Ticket Constraints: I notice that the issue happens at 00:00 on both days. Have you checked for a backup or other cron job that is 'overloading' the virtualization host? Anything in the libvirt logs or in the hosts' /var/log/messages? Best Regards, Strahil Nikolov ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-25 Thread Strahil Nikolov
instead of using such a feature. The interesting part will be the behaviour of the local cluster stack when updates happen. There is a high risk that the node gets fenced due to unresponsiveness (during the update), or if corosync/pacemaker use an old function that has changed in the libs. Best Regards, Strahil Nikolov

Re: [ClusterLabs] Q: rulke-based operation pause/freeze?

2020-03-05 Thread Strahil Nikolov
to forget such a setting. Another approach is to leave the monitoring period high enough, so the cluster won't catch the downtime - but imagine that the downtime of the NFS has to be extended - do you believe that you will be able to change all affected resources in time? Best Regards, Strahil Nikol

Re: [ClusterLabs] Antw: [EXT] Re: clusterlabs.org upgrade done

2020-03-04 Thread Strahil Nikolov
Maybe I will be unsubscribed every 10th email instead of every 5th one. Best Regards, Strahil Nikolov

Re: [ClusterLabs] DRBD not failing over

2020-02-26 Thread Strahil Nikolov
Is your DRBD used as an LVM PV, e.g. as a disk for an iSCSI LUN? If yes, ensure that you have an LVM global filter for the /dev/drbdXYZ and the physical devices (like /dev/sdXYZ) and the wwid. Best Regards, Strahil Nikolov

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

2020-02-27 Thread Strahil Nikolov
rner cases. For a higher-level tool or anything external >to pacemaker, one such corner case is a "time-of-check/time-of-use" >problem -- determining the list of active resources has to be done >separately from configuring the bans, and it's possible the list could >c

[ClusterLabs] SBD on shared disk

2020-02-05 Thread Strahil Nikolov
Hello Community, I'm preparing for my EX436 and I was wondering if there are any drawbacks if a shared LUN is split into 2 partitions, where the first partition is used for SBD and the second one for a shared file system (either XFS for active/passive, or GFS2 for active/active). Do you see any

Re: [ClusterLabs] Why Do Nodes Leave the Cluster?

2020-02-05 Thread Strahil Nikolov
>node { >ring0_addr: 001db01a >nodeid: 1 >} > >node { >ring0_addr: 001db01b >nodeid: 2 >} >} > >quorum { >provider: corosync_votequorum >two_node: 1 >} > >logging { >to_logfile: yes >l

Re: [ClusterLabs] Why Do Nodes Leave the Cluster?

2020-02-06 Thread Strahil Nikolov
is vulnerable to split brain, especially when one of the nodes is syncing (for example after patching) and the source is fenced/lost/disconnected. It's very hard to extract data from a semi-synced drbd. Also, if you need guidance for SELinux, I can point you to my guide in the CentOS for

Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-20 Thread Strahil Nikolov
On February 20, 2020 12:49:43 PM GMT+02:00, Maverick wrote: > >> You really need to debug the start & stop of the resource. >> >> Please try the debug procedure and provide the output: >> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures >>

Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-20 Thread Strahil Nikolov
> >On 20/02/2020 16:46, Strahil Nikolov wrote: >> On February 20, 2020 12:49:43 PM GMT+02:00, Maverick >wrote: >>>> You really need to debug the start & stop of the resource. >>>> >>>> Please try the debug procedure and provide th

Re: [ClusterLabs] "apache httpd program not found" "environment is invalid, resource considered stopped"

2020-02-19 Thread Strahil Nikolov
e to copy/paste: httpd://local/host:1090/server-status Otherwise - check the protocol. As the status URL should be available only from 127.0.0.1, you can use 'http' instead. Best Regards, Strahil Nikolov

Re: [ClusterLabs] Ugrading Ubuntu 14.04 to 16.04 with corosync/pacemaker failed

2020-02-19 Thread Strahil Nikolov
Are you sure that there is no cluster protocol mismatch? A major-version OS upgrade (even if supported by the vendor) must be done offline (with proper testing in

Re: [ClusterLabs] Fedora 31 - systemd based resources don't start

2020-02-20 Thread Strahil Nikolov
>> Check it out - maybe this is your reason. >> >> Best Regards, >> Strahil Nikolov > >Yes, i have stonith disabled, because as soon as the resources startup >fail on boot, node was rebooted. > > >Anyway, i was checking the pacemaker logs and the journal lo

Re: [ClusterLabs] How to unfence without reboot (fence_mpath)

2020-02-16 Thread Strahil Nikolov
op of DRBD /RHEL 7 again/, and it seems that SCSI Reservations Support is OK. Best Regards, Strahil Nikolov On Sunday, 16 February 2020, 23:11:40 GMT-5, Ondrej wrote: Hello Strahil, On 2/17/20 11:54 AM, Strahil Nikolov wrote: > Hello Community, > > This is m

Re: [ClusterLabs] How to unfence without reboot (fence_mpath)

2020-02-17 Thread Strahil Nikolov
On February 17, 2020 3:36:27 PM GMT+02:00, Ondrej wrote: >Hello Strahil, > >On 2/17/20 3:39 PM, Strahil Nikolov wrote: >> Hello Ondrej, >> >> thanks for your reply. I really appreciate that. >> >> I have picked fence_multipath as I'm preparing fo

[ClusterLabs] Saving secret locally

2020-01-18 Thread Strahil Nikolov
in the CIB: crm resource secret set I've been searching in the pcs --help and on https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md, but it seems it's not there or I can't find it. Thanks in advance. Best Regards, Strahil Nikolov

Re: [ClusterLabs] pcs stonith fence - Error: unable to fence

2020-01-18 Thread Strahil Nikolov
Best Regards, Strahil Nikolov On Sunday, 19 January 2020, 00:01:11 GMT+2, Strahil Nikolov wrote: Hi All, I am building a test cluster with fence_rhevm stonith agent on RHEL 7.7 and oVirt 4.3. When I fenced drbd3 from drbd1 using 'pcs stonith fence drbd3' - the fence action

Re: [ClusterLabs] Antw: [EXT] Multiple nfsserver resource groups

2020-03-09 Thread Strahil Nikolov
erkill >>> for our case. >> >> One thing you might consider is to get rid of the groups and use >> explicit colocation and orderings. The advantage will be that you >can >> execute >several >> agents in parallel (e.g. prepare all filesystems in parallel).

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-08 Thread Strahil Nikolov
corosync creates single node >membership. >>> After line becomes active "multicast" is delivered to other nodes >and >>> they move to gather state. >>> >> >> I would expect "reasonable timeout" to also take in account knet >delay. >> >>> So to answer you question. "Delay" is on both nodes side because >link is >>> not established between the nodes. >>> >> >> knet was expected to improve things, was not it? :) >> I would have increased the consensus by several seconds. Best Regards, Strahil Nikolov

Re: [ClusterLabs] When the active node enters the standby state, what should be done to make the VIP not automatically jump

2020-04-15 Thread Strahil Nikolov
hieve the jump of virtual_ip > The mode we use is Active / Passive mode > The Resource Agent we use is ocf: heartbeat: IPaddr2 > Hope you can solve my confusion Hello, Can you provide the version of the stack, your config and the command you run to put the node in sa

Re: [ClusterLabs] NFS in different subnets

2020-04-18 Thread Strahil Nikolov
tions of >Einstein’s brain than in the near certainty that people of equal talent >have lived and died in cotton fields and sweatshops." - Stephen Jay >Gould I don't get something. Why can't this be done? One node is in siteA, one in siteB, and qnet is in a third location. Routing between the 2 subnets is established and symmetrical. Fencing via IPMI or SBD (for example from an HA iSCSI cluster) is configured. The NFS resource is started on 1 node and a special RA is used for the DNS records. If node1 dies, the cluster will fence it and node2 will power up the NFS and update the records. Of course, updating DNS only from 1 side must work for both sites. Best Regards, Strahil Nikolov

Re: [ClusterLabs] Verifying DRBD Run-Time Configuration

2020-04-12 Thread Strahil Nikolov
On April 12, 2020 10:58:39 AM GMT+03:00, Eric Robinson wrote: >> -Original Message- >> From: Strahil Nikolov >> Sent: Sunday, April 12, 2020 2:54 AM >> To: Cluster Labs - All topics related to open-source clustering >welcomed >> ; Eric Robinson >>

Re: [ClusterLabs] When the active node enters the standby state, what should be done to make the VIP not automatically jump

2020-04-16 Thread Strahil Nikolov
nfig and the command >you run to put the node in standby? > >Best Regards, >Strahil Nikolov > >- > >Sorry, I don't know how to reply correctly, so I pasted the previous >chat content on it > >The following are the commands we use > >pcs pro

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-06 Thread Strahil Nikolov
s this thread. >>>> >>>> thanks again Hi Sherrard, Have you tried to increase the qnet timers in the corosync.conf? Best Regards, Strahil Nikolov

Re: [ClusterLabs] Retrofit MySQL with pacemaker?

2020-05-03 Thread Strahil Nikolov
air of systems, you can try the first approach (building a fresh cluster and configuring it while the DB is running). Best Regards, Strahil Nikolov

Re: [ClusterLabs] heartbeat IP chenged to 127.0.0.1

2020-05-13 Thread Strahil Nikolov
># corosync-cfgtool -s >Printing ring status. >Local node ID 2 >RING ID 0 >     id    = 127.0.0.1 >     status    = ring 0 active with no faults >RING ID 1 >     id    = 127.0.0.1 >     status    = ring 1 active with no faults > >What is wrong

Re: [ClusterLabs] Antw: [EXT] Re: heartbeat IP chenged to 127.0.0.1

2020-05-13 Thread Strahil Nikolov
Ulrich > >> >> Disconnect the heartbeat network cable ,and corosync-cfgtool -s: >> >> RING ID 0 >> id= 127.0.0.1 >> status= ring 0 active with no faults >> RING ID 1 >> id= 127.0.0.1 >> status= ring 1 active with

Re: [ClusterLabs] Removing DRBD w/out Data Loss?

2020-09-09 Thread Strahil Nikolov
I would play it safe and leave drbd running, but in single-node mode (no peers). As it won't replicate, it should be as close to bare metal as possible. Best Regards, Strahil Nikolov On Wednesday, 9 September 2020, 15:11:06 GMT+3, Eric Robinson wrote: Valentin -- With DRBD stopped

Re: [ClusterLabs] Maintenance mode status in CIB

2020-10-13 Thread Strahil Nikolov
Yep, both work without affecting the resources: 'crm cluster stop' and 'pcs cluster stop'. Once your maintenance is over, you can start the cluster and everything will be back in maintenance. Best Regards, Strahil Nikolov On Tuesday, 13 October 2020, 19:15:27 GMT+3, Digimer wrote

Re: [ClusterLabs] Maintenance mode status in CIB

2020-10-13 Thread Strahil Nikolov
Also, it's worth mentioning that you can set the whole cluster in global maintenance and power off the stack on all nodes without affecting your resources. I'm not sure if that is even possible in node maintenance. Best Regards, Strahil Nikolov On Tuesday, 13 October 2020, 12:49:38

Re: [ClusterLabs] Open Source Linux Load Balancer with HA and Split Brain Prevention?

2020-10-04 Thread Strahil Nikolov
in the web, haproxy has a ready-to-go resource agent 'ocf:heartbeat:haproxy', so you can give it a try. Best Regards, Strahil Nikolov On Sunday, 4 October 2020, 22:41:59 GMT+3, Eric Robinson wrote: Greetings! We are looking for an open-source Linux load-balancing

Re: [ClusterLabs] Two ethernet adapter within same subnet causing issue on Qdevice

2020-10-06 Thread Strahil Nikolov
I agree, it's more of a routing problem. Actually a static route should fix the issue. Best Regards, Strahil Nikolov On Tuesday, 6 October 2020, 10:50:24 GMT+3, Jan Friesse wrote: Richard, > To clarify my problem, this is more on Qdevice issue I want to fix. The quest

Re: [ClusterLabs] Active-Active cluster CentOS 8

2020-08-23 Thread Strahil Nikolov
There is a topic about that at https://bugs.centos.org/view.php?id=16939 Based on the comments you can obtain it from https://koji.mbox.centos.org/koji/buildinfo?buildID=4801, but I haven't tested it. Best Regards, Strahil Nikolov On Friday, 21 August 2020, 18:30:31 GMT+3, Mark

Re: [ClusterLabs] node utilization attributes are lost during upgrade

2020-08-18 Thread Strahil Nikolov
Wouldn't it be easier to: - set the node in standby - stop the node - remove the node - add it again with the new hostname Best Regards, Strahil Nikolov On 18 August 2020 17:15:49 GMT+03:00, Ken Gaillot wrote: >On Tue, 2020-08-18 at 14:35 +0200, Kadlecsik József wrote: >> Hi, >> >

Re: [ClusterLabs] why is node fenced ?

2020-08-19 Thread Strahil Nikolov
Hi Bernd, As SLES 12 is in such a support phase, I guess SUSE will provide fixes only for SLES 15. It will be best if you open a case with them and ask about that. Best Regards, Strahil Nikolov On 19 August 2020 17:29:32 GMT+03:00, "Lentes, Bernd" wrote: > >- On Au

Re: [ClusterLabs] Format of '--lifetime' in 'pcs resource move'

2020-08-20 Thread Strahil Nikolov
Have you tried the ISO 8601 format? For example: 'PT20M'. The ISO format is described at: https://manpages.debian.org/testing/crmsh/crm.8.en.html Best Regards, Strahil Nikolov On 20 August 2020 13:40:16 GMT+03:00, Digimer wrote: >Hi all, > > Reading the pcs man page for the 'mov
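To make the 'PT20M' suggestion concrete, here is a minimal sketch of how such an ISO 8601 duration maps to seconds. The helper name iso8601_duration_to_seconds is hypothetical and is not part of pcs or crmsh; it only handles days, hours, minutes and seconds, and it says nothing about which forms pcs itself accepts.

```python
import re

# Hypothetical helper (not part of pcs or crmsh): convert a simple ISO 8601
# duration such as 'PT20M' into seconds. Months and years are left out on
# purpose, because their length in seconds is not fixed.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso8601_duration_to_seconds(text: str) -> int:
    """Return the number of seconds described by a duration like 'PT1H30M'."""
    match = _DURATION.match(text)
    if match is None or text in ("P", "PT"):
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    # Missing components count as zero.
    parts = {name: int(value) if value else 0
             for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


if __name__ == "__main__":
    print(iso8601_duration_to_seconds("PT20M"))
```

Read this way, 'PT20M' is a period ("P") with a time part ("T") of 20 minutes.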

Re: [ClusterLabs] Resources restart when a node joins in

2020-08-27 Thread Strahil Nikolov
Hi Quentin, in order to get help it will be easier if you provide both the corosync and pacemaker configuration. Best Regards, Strahil Nikolov On Thursday, 27 August 2020, 17:10:01 GMT+3, Citron Vert wrote: Hi, Sorry for using this email address, my name is Quentin. Thank

Re: [ClusterLabs] DRBD resource not starting

2020-08-15 Thread Strahil Nikolov
And how did you define the drbd resource? Best Regards, Strahil Nikolov On 14 August 2020 18:48:32 GMT+03:00, Gerry Kernan wrote: >Hi >I'm trying to add a drbd resource to pacemaker cluster on centos 7 > > >But getting this error on pcs status >drbd_r0

Re: [ClusterLabs] SBD fencing not working on my two-node cluster

2020-09-22 Thread Strahil Nikolov
=true P.S.: Consider setting the 'resource-stickiness' to '1'. Using partitions is not the best option, but it is better than nothing. Best Regards, Strahil Nikolov On Tuesday, 22 September 2020, 02:06:10 GMT+3, Philippe M Stedman wrote: Hi Strahil, Here is the output of those

Re: [ClusterLabs] SBD fencing not working on my two-node cluster

2020-09-21 Thread Strahil Nikolov
XYZ/". Also, SBD needs at most a 10MB block device and yours seems unnecessarily big. Most probably /dev/sde1 is your problem. Best Regards, Strahil Nikolov On Monday, 21 September 2020, 23:19:47 GMT+3, Philippe M Stedman wrote: Hi, I have been following the instructi

Re: [ClusterLabs] Pacemaker not starting

2020-09-23 Thread Strahil Nikolov
What is the output of 'corosync-quorumtool -s' on both nodes? What is your cluster's configuration: 'crm configure show' or 'pcs config'? Best Regards, Strahil Nikolov On Wednesday, 23 September 2020, 16:07:16 GMT+3, Ambadas Kawle wrote: Hello All We have 2 node with Mysql

Re: [ClusterLabs] Is the "allow_downscale" option supported by Corosync/Pacemaker?

2020-09-25 Thread Strahil Nikolov
I would use 'last_man_standing: 1' + 'wait_for_all: 1'. When you shut down a node gracefully, the quorum is recalculated. You can check the manpage for an explanation. Best Regards, Strahil Nikolov On Friday, 25 September 2020, 01:19:09 GMT+3, Philippe M Stedman wrote: Hi
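As a rough illustration of what "the quorum is recalculated" means, the sketch below applies the usual majority rule (half of the expected votes, rounded down, plus one) to a shrinking cluster. It assumes one vote per node and that the expected votes have already been lowered to the number of remaining nodes; the function names are hypothetical, and timing details such as the recalculation window are covered in the votequorum manpage, not here.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Majority rule: more than half of the expected votes are needed."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return expected_votes // 2 + 1


def thresholds_while_shrinking(start_nodes: int, stop_at: int = 2):
    """List (expected_votes, threshold) pairs as nodes leave one by one.

    Assumption for this sketch: every node has one vote and expected_votes
    is recalculated to the remaining node count after each departure.
    """
    return [(nodes, quorum_threshold(nodes))
            for nodes in range(start_nodes, stop_at - 1, -1)]


if __name__ == "__main__":
    for nodes, needed in thresholds_while_shrinking(5):
        print(f"{nodes} expected votes -> {needed} votes needed for quorum")
```

Without the recalculation, a 5-node cluster would keep requiring 3 votes no matter how many nodes had been shut down cleanly.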

Re: [ClusterLabs] Resources always return to original node

2020-09-26 Thread Strahil Nikolov
Resource stickiness for a group is the sum of all its resources' stickiness -> 5 resources x 100 score (default stickiness) = 500 score. If your location constraint has a bigger number -> it wins :) Best Regards, Strahil Nikolov On Saturday, 26 September 2020, 12:22:32 GMT
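The arithmetic in this reply can be written out as a small sketch. It only models the two numbers mentioned above (the summed stickiness of the group members and a single location constraint score); the function names are hypothetical, and real Pacemaker placement also weighs other constraints and INFINITY scores that are ignored here.

```python
def group_stickiness(member_stickiness):
    """Stickiness of a group: the sum of its members' stickiness values."""
    return sum(member_stickiness)


def location_constraint_wins(member_stickiness, location_score):
    """True when the location constraint score exceeds the group stickiness.

    Simplified model of the rule quoted above: "if your location
    constraint has a bigger number, it wins".
    """
    return location_score > group_stickiness(member_stickiness)


if __name__ == "__main__":
    members = [100] * 5  # 5 resources x 100, as in the example above
    print(group_stickiness(members))
    print(location_constraint_wins(members, 1000))
    print(location_constraint_wins(members, 200))
```

With five members at 100 each, a location preference of 200 would not pull the group back, while one of 1000 would.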

Re: [ClusterLabs] How to stop removed resources when replacing cib.xml via cibadmin or crm_shadow

2020-10-01 Thread Strahil Nikolov
That's the strangest request I have heard so far... What is the reason not to use crmsh or pcs to manage the cluster? About your question: have you tried to load a cib with the old resources stopped, and then another one with the stopped resources removed? Best Regards, Strahil Nikolov

Re: [ClusterLabs] fence_mpath in latest fence-agents: single reservation after fence

2020-06-01 Thread Strahil Nikolov
I don't see the reservation key in multipath.conf. Have you set it up in a unique way (each host has its own key)? Best Regards, Strahil Nikolov On 1 June 2020 16:04:32 GMT+03:00, Rafael David Tinoco wrote: >Hello again, > >Long time I don't show up... I was finishing up details

Re: [ClusterLabs] Triggering script on cib change

2020-09-16 Thread Strahil Nikolov
example is available at: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-eventnotification-haar Best Regards, Strahil Nikolov On Wednesday, 16 September 2020, 09:20:44 GMT+3, Digimer wrote: Is there a way

Re: [ClusterLabs] Adding a node to an active cluster

2020-10-21 Thread Strahil Nikolov
Both SUSE and Red Hat provide utilities to add the node without messing with the configs manually. What is your distro? Best Regards, Strahil Nikolov On Wednesday, 21 October 2020, 17:03:19 GMT+3, Jiaqi Tian1 wrote: Hi, I'm trying to add a new node into an active pacemaker

Re: [ClusterLabs] Upgrading/downgrading cluster configuration

2020-10-23 Thread Strahil Nikolov
Usually I prefer to use "crm configure show" and later "crm configure edit" and replace the config. I am not sure if this will work with such a downgrade scenario, but it shouldn't be a problem. Best Regards, Strahil Nikolov On Thursday, 22 October 2020, 21:30:

Re: [ClusterLabs] VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-23 Thread Strahil Nikolov
Why don't you work with something like this: 'op stop interval=300 timeout=600'? The stop operation will time out according to your requirements, without modifying the script. Best Regards, Strahil Nikolov On Thursday, 22 October 2020, 23:30:08 GMT+3, Lentes, Bernd wrote: Hi guys

Re: [ClusterLabs] Adding a node to an active cluster

2020-10-27 Thread Strahil Nikolov
nd-clusters_considerations-in-adopting-rhel-8#new_commands_for_authenticating_nodes_in_a_cluster Best Regards, Strahil Nikolov On Tuesday, 27 October 2020, 18:06:06 GMT+2, Jiaqi Tian1 wrote: Hi Xin, Thank you. The crmsh version is 4.1.0.0, OS is RHEL 8.0. I have t

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-27 Thread Strahil Nikolov
Ulrich, do you mean '--queue'? Best Regards, Strahil Nikolov On Tuesday, 27 October 2020, 12:15:16 GMT+2, Ulrich Windl wrote: >>> "Lentes, Bernd" wrote on 26.10.2020 at 21:44 in message <1480408662.7194527.1603745092927.javamail.zim...

Re: [ClusterLabs] Antw: [EXT] Re: Upgrading/downgrading cluster configuration

2020-10-26 Thread Strahil Nikolov
>>> Strahil Nikolov schrieb am 23.10.2020 um 17:04 in Nachricht <362944335.2019534.1603465466...@mail.yahoo.com>: > Usually I prefer to use "crm configure show" and later "crm configure edit" > and replace the config. >I guess you use "

Re: [ClusterLabs] Antw: [EXT] Re: VirtualDomain does not stop via "crm resource stop" - modify RA ?

2020-10-26 Thread Strahil Nikolov
I think it's useful - for example, a HANA powers up for 10-15min (even more, depending on the storage tier) - so the default will time out and the fun starts there. Maybe the cluster is just showing them without using them, but it looked quite the opposite. Best Regards, Strahil Nikolov On

Re: [ClusterLabs] fence_virt architecture? (was: Re: Still Beginner STONITH Problem)

2020-07-19 Thread Strahil Nikolov
see any proof that I'm right (libvirt network was in NAT mode) or wrong (VMs using Host's bond in a bridged network). Best Regards, Strahil Nikolov On 19 July 2020 9:45:29 GMT+03:00, Andrei Borzenkov wrote: >18.07.2020 03:36, Reid Wahl wrote: >> I'm not sure that the libvir

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-30 Thread Strahil Nikolov
has no watchdog, you can use the softdog kernel module for Linux. Best Regards, Strahil Nikolov On 29 July 2020 9:01:22 GMT+03:00, Gabriele Bulfon wrote: >That one was taken from a specific implementation on Solaris 11. >The situation is a dual node server with shared storage controller: >bo

Re: [ClusterLabs] Pacemaker crashed and produce a coredump file

2020-07-30 Thread Strahil Nikolov
unplanned downtime. Best Regards, Strahil Nikolov On 29 July 2020 12:46:16 GMT+03:00, lkxjtu wrote: >Hi Reid Wahl, > > >There are more log informations below. The reason seems to be that >communication with DBUS timed out. Any suggestions? > > >1672712 Jul 24 21:20:17 [3945

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-30 Thread Strahil Nikolov
This one links to how to power fence when reservations are removed: https://access.redhat.com/solutions/4526731 Best Regards, Strahil Nikolov On 30 July 2020 9:28:51 GMT+03:00, Andrei Borzenkov wrote: >30.07.2020 08:42, Strahil Nikolov wrote: >> You got plenty of options:

Re: [ClusterLabs] Antw: [EXT] Re: Maximum cluster size with Pacemaker 2.x and Corosync 3.x, and scaling to hundreds of nodes

2020-07-31 Thread Strahil Nikolov
, Strahil Nikolov On 31 July 2020 8:57:29 GMT+03:00, Ulrich Windl wrote: >>>> Ken Gaillot wrote on 30.07.2020 at 16:43 in >message ><93b973947008b62c4848f8a799ddc3f0949451e8.ca...@redhat.com>: >> On Wed, 2020‑07‑29 at 23:12 +, Toby Haynes wrote: >>

Re: [ClusterLabs] About the log indicating RA execution

2020-07-02 Thread Strahil Nikolov
Usually I check the logs on the Designated Coordinator, especially when it was not fenced. Best Regards, Strahil Nikolov On 2 July 2020 12:12:04 GMT+03:00, "井上和徳" wrote: >Hi all, > >We think it is desirable to output the log indicating the start and >finish of

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-07 Thread Strahil Nikolov
. Best Regards, Strahil Nikolov On 7 July 2020 10:11:38 GMT+03:00, "stefan.schm...@farmpartner-tec.com" wrote: > >What does 'virsh list' > >give you on the 2 hosts? Hopefully different names for > >the VMs ... > >Yes, each host shows its

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-07 Thread Strahil Nikolov
We had no issues with fencing, but we got plenty of SAN issues to test the fencing :) Best Regards, Strahil Nikolov

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-06 Thread Strahil Nikolov
- as in many environments it can be dropped by firewalls. Best Regards, Strahil Nikolov On 6 July 2020 12:24:08 GMT+03:00, Klaus Wenninger wrote: >On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote: >> Hello, >> >> >> # fence_xvm -o list >> >> kvm

Re: [ClusterLabs] qnetd and booth arbitrator running together in a 3rd geo site

2020-07-14 Thread Strahil Nikolov
And what about SBD (a.k.a. poison pill)? I've used it reliably with 3 SBDs on a stretched cluster. It never failed to kill the node. Best Regards, Strahil Nikolov On 14 July 2020 14:18:56 GMT+03:00, Rohit Saini wrote: >I dont think my question was very clear. I am stric

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-15 Thread Strahil Nikolov
How did you configure the network on your Ubuntu 20.04 hosts? I tried to set up a bridged connection for the test setup, but obviously I'm missing something. Best Regards, Strahil Nikolov On 14 July 2020 11:06:42 GMT+03:00, "stefan.schm...@farmpartner-tec.com" wrote: >H

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-15 Thread Strahil Nikolov
By default libvirt uses a NAT network and not a routed one - in that case, vm1 won't receive data from host2. Can you provide the networks' XML? Best Regards, Strahil Nikolov On 15 July 2020 13:19:59 GMT+03:00, Klaus Wenninger wrote: >On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-15 Thread Strahil Nikolov
If it is created by libvirt - this is NAT and you will never receive output from the other host. Best Regards, Strahil Nikolov On 15 July 2020 15:05:48 GMT+03:00, "stefan.schm...@farmpartner-tec.com" wrote: >Hello, > >On 15.07.2020 at 13:42 Strahil Nikolov wrote: >

Re: [ClusterLabs] jquery in pcs package

2020-07-02 Thread Strahil Nikolov
Firewalld's add-service (without a zone definition) will add it to the default zone, which by default is public. If you have public and private zones, and the cluster is supposed to communicate over the private VLAN, you can open the port only there. Best Regards, Strahil Nikolov On 2 July

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-09 Thread Strahil Nikolov
account for Red Hat, you can check https://access.redhat.com/solutions/917833 Best Regards, Strahil Nikolov On 9 July 2020 17:01:13 GMT+03:00, "stefan.schm...@farmpartner-tec.com" wrote: >Hello, > >thanks for the advice. I have worked through that list as follows: >

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-08 Thread Strahil Nikolov
pervisors - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests) - fence_xvm on both VMs In your case, the primary suspect is multicast traffic. Best Regards, Strahil Nikolov On 8 July 2020 16:33:45 GMT+03:00, "stefan.schm...@farmpartner-tec.com" wrote: >

Re: [ClusterLabs] pacemaker together with ovirt or Kimchi ?

2020-07-11 Thread Strahil Nikolov
It won't make sense. oVirt has built-in HA for virtual machines. Best Regards, Strahil Nikolov On Saturday, 11 July 2020, 17:50:18 GMT+3, Lentes, Bernd wrote: Hi, i'm having a two node cluster with pacemaker and about 10 virtual domains as resources. It's running fine. I

Re: [ClusterLabs] Antw: [EXT] Failed fencing monitor process (fence_vmware_soap) RHEL 8

2020-06-18 Thread Strahil Nikolov
What about a second fencing mechanism? You can add a shared (independent) vmdk as an sbd device. The reconfiguration will require cluster downtime, but this is only necessary once. Once 2 fencing mechanisms are available, you can configure the order easily. Best Regards, Strahil Nikolov On

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Two node cluster and extended distance/site failure

2020-06-24 Thread Strahil Nikolov
Instead of NFS, iSCSI is also an option. Best Regards, Strahil Nikolov On 24 June 2020 13:42:26 GMT+03:00, Andrei Borzenkov wrote: >24.06.2020 12:20, Ulrich Windl wrote: >>> >>> How Service Guard handles loss of shared storage? >> >> When a node is up

Re: [ClusterLabs] Beginner with STONITH Problem

2020-06-24 Thread Strahil Nikolov
is hosted. Another approach is to use a shared disk (either over iSCSI or SAN) and use sbd for power-based fencing, or use SCSI3 Persistent Reservations (which can also be converted into power-based fencing). Best Regards, Strahil Nikolov On 24 June 2020 13:44:27 GMT+03:00, "stefan

Re: [ClusterLabs] Beginner with STONITH Problem

2020-06-25 Thread Strahil Nikolov
Hi Stefan, this sounds like a firewall issue. Check that port udp/1229 is open for the hypervisors and tcp/1229 for the VMs. P.S.: The protocols are based on my fading memory, so double check them. Best Regards, Strahil Nikolov On 25 June 2020 18:18:46 GMT+03:00, "stefan

Re: [ClusterLabs] DRBD sync stalled at 100% ?

2020-06-28 Thread Strahil Nikolov
I was thinking about a GitHub issue, but it seems that only 'linstor-server' has an issue section. Best Regards, Strahil Nikolov On 28 June 2020 20:13:21 GMT+03:00, Eric Robinson wrote: >I could if linbit had per-incident pricing. Unfortunately, they only >offer yearly contracts,

Re: [ClusterLabs] Suggestions for multiple NFS mounts as LSB script

2020-06-29 Thread Strahil Nikolov
to use your script, you can create a systemd service to call it and ensure (via pacemaker) that the service will always be running. Best Regards, Strahil Nikolov On 29 June 2020 16:15:42 GMT+03:00, Tony Stocker wrote: >Hello > >We have a system which has become critical

Re: [ClusterLabs] Antw: [EXT] Failed fencing monitor process (fence_vmware_soap) RHEL 8

2020-06-18 Thread Strahil Nikolov
Nice to know. Yet, if the monitoring of that fencing device failed, most probably the vCenter was not responding or was unreachable; that's why I suggested sbd. Best Regards, Strahil Nikolov On 18 June 2020 18:24:48 GMT+03:00, Ken Gaillot wrote: >Note that a failed start of a stonith dev

Re: [ClusterLabs] DRBD sync stalled at 100% ?

2020-06-27 Thread Strahil Nikolov
I've seen this on a test setup after multiple network disruptions. I managed to fix it by stopping drbd on all nodes and starting it again. I guess you can schedule downtime and try that approach. Best Regards, Strahil Nikolov On 27 June 2020 16:36:10 GMT+03:00, Eric Robinson wrote

Re: [ClusterLabs] DRBD sync stalled at 100% ?

2020-06-28 Thread Strahil Nikolov
I guess you can open an issue with LINBIT, as you still have the logs. Best Regards, Strahil Nikolov On 28 June 2020 8:19:59 GMT+03:00, Eric Robinson wrote: >I fixed it with a drbd down/up. > >From: Users On Behalf Of Eric Robinson >Sent: Saturday, June 27, 2020 4:32 PM >T

Re: [ClusterLabs] Redudant Ring Network failure

2020-06-09 Thread Strahil Nikolov
Are you using multicast? Best Regards, Strahil Nikolov On 9 June 2020 10:28:25 GMT+03:00, "ROHWEDER-NEUBECK, MICHAEL (EXTERN)" wrote: >Hello, >We have massive problems with the redundant ring operation of our >Corosync / pacemaker 3 Node NFS clusters. > >Most

Re: [ClusterLabs] Redudant Ring Network failure

2020-06-09 Thread Strahil Nikolov
with tcpdump that the heartbeats are received from the remote side. 3. Check for retransmissions or packet loss. Usually you can find more details in the log specified in corosync.conf or in /var/log/messages (and also the journal). Best Regards, Strahil Nikolov On 9 June 2020 21:11:02 GMT+03:00

Re: [ClusterLabs] New user needs some help stabilizing the cluster

2020-06-10 Thread Strahil Nikolov
ings and way new stuff like sctp. P.S.: You can use a second fencing mechanism like 'sbd', a.k.a. "poison pill"; just make the vmdk shared & independent. This way your cluster can operate even when the vCenter is unreachable for any reason. Best Regards, Strahil Nikolov On 10 Jun

Re: [ClusterLabs] Rolling upgrade from Corosync 2.3+ to Corosync 2.99+ or Corosync 3.0+?

2020-06-11 Thread Strahil Nikolov
and immediately bring the resources up 5. Remove access to the shared storage for the old cluster 6. Wipe the old cluster. Downtime will be way shorter. Best Regards, Strahil Nikolov On 11 June 2020 17:48:47 GMT+03:00, Vitaly Zolotusky wrote: >Thank you very much for quick reply! >I wi

Re: [ClusterLabs] New user needs some help stabilizing the cluster

2020-06-12 Thread Strahil Nikolov
Don't forget to increase the consensus! Best Regards, Strahil Nikolov On 11 June 2020 22:11:09 GMT+03:00, Howard wrote: >This is interesting. So it seems that 13,000 ms or 13 seconds is how >long >the VM was frozen during the snapshot backup and 0.8 seconds is the >t
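
The advice to raise the consensus along with the token can be made concrete with a small calculation. This is only a minimal sketch, assuming the rule from corosync.conf(5) that `consensus` must be at least 1.2 times `token`; the 15000 ms token and the helper names `min_consensus_ms` and `totem_fragment` are hypothetical and are not taken from the thread.

```python
def min_consensus_ms(token_ms: int) -> int:
    """Smallest consensus (ms) satisfying consensus >= 1.2 * token."""
    # Integer arithmetic for ceil(token_ms * 12 / 10), avoiding float rounding.
    return (token_ms * 12 + 9) // 10


def totem_fragment(token_ms: int) -> str:
    """Render the two timeouts as a corosync.conf-style fragment (not a full totem section)."""
    return (
        "totem {\n"
        f"    token: {token_ms}\n"
        f"    consensus: {min_consensus_ms(token_ms)}\n"
        "}"
    )


if __name__ == "__main__":
    # Hypothetical token of 15 s, picked only because it exceeds a 13 s freeze.
    print(totem_fragment(15000))
```

Whether a token this large is acceptable depends on how long a failover delay the cluster can tolerate, so treat the numbers as placeholders.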

Re: [ClusterLabs] New user needs some help stabilizing the cluster

2020-06-12 Thread Strahil Nikolov
And I forgot to ask... Are you using a memory-based snapshot? It shouldn't take so long. Best Regards, Strahil Nikolov On 12 June 2020 7:10:38 GMT+03:00, Strahil Nikolov wrote: >Don't forget to increase the consensus! > >Best Regards, >Strahil Nikolov > >On 11 June 2020

Re: [ClusterLabs] Setting up HA cluster on Raspberry pi4 with ubuntu 20.04 aarch64 architecture

2020-06-12 Thread Strahil Nikolov
Out of curiosity, are you running it on SLES/openSUSE? I think it is easier with 'crm cluster start'. Otherwise you can run 'journalctl -u pacemaker.service -e' to find which dependency has failed. Another option is 'systemctl list-dependencies pacemaker.service'. Best Regards, Strahil

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-17 Thread Strahil Nikolov
The simplest way to check whether the libvirt network is NAT (or not) is to try to ssh from the first VM to the second one. I must admit that I was lost when I tried to create a routed network in KVM, so I can't help with that. Best Regards, Strahil Nikolov On 17 July 2020 16:56:44 GMT+03

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-29 Thread Strahil Nikolov
Do you have a reason not to use any of the stonith mechanisms already available? Best Regards, Strahil Nikolov On 28 July 2020 13:26:52 GMT+03:00, Gabriele Bulfon wrote: >Thanks, I attach here the script. >It basically runs ssh on the other node with no password (must be >preconfigured via auth

Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-07-30 Thread Strahil Nikolov
-> it's just a script started by the watchdog.service on the node itself. It should be usable on all Linuxes and many UNIX-like OSes. Best Regards, Strahil Nikolov On 30 July 2020 12:05:39 GMT+03:00, Gabriele Bulfon wrote: >Reading sbd from SuSE I saw that it requires a special

Re: [ClusterLabs] Antw: [EXT] Re: Preferred node for a service (not constrained)

2020-12-03 Thread Strahil Nikolov
The problem with INFINITY is that the moment the node is back, there will be a second failover. This is bad for bulky DBs that take more than 30 min to power down and up (15 min down, 15 min up). Best Regards, Strahil Nikolov On Thursday, 3 December 2020, 10:32:18 GMT+2, Andrei Borzenkov

Re: [ClusterLabs] Q: high-priority messages from DLM?

2020-12-05 Thread Strahil Nikolov
It's more interesting why the connection was closed... Are you sure you didn't have network issues? What is corosync saying in the logs? Off-topic: Are you using DLM with OCFS2? Best Regards, Strahil Nikolov At 10:33 -0800 on 04.12.2020 (Fri), Reid Wahl wrote: > On Fri, Dec 4, 2020 at 10:32

Re: [ClusterLabs] Antw: [EXT] Re: Q: high-priority messages from DLM?

2020-12-08 Thread Strahil Nikolov
Nope, but if you don't use a clustered FS, you could also use plain LVM + tags. As far as I know you need dlm and clvmd for a clustered FS. Best Regards, Strahil Nikolov On Tuesday, 8 December 2020, 10:15:39 GMT+2, Ulrich Windl wrote: >>> Strahil Nikolov wrote on 0

Re: [ClusterLabs] Preferred node for a service (not constrained)

2020-12-02 Thread Strahil Nikolov
node2 to node1. Note: default stickiness is per resource, while the total stickiness score of a group is calculated based on the scores of all resources in it. Best Regards, Strahil Nikolov On Wednesday, 2 December 2020, 16:54:43 GMT+2, Dan Swartzendruber wrote: On 2020-11-30
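
The note on group stickiness can be illustrated with a short calculation. This is a minimal sketch, assuming the group's total is the plain sum of its active members' stickiness values (the snippet only says the total is "calculated based on" them); the resource names, scores, and the helper `group_stickiness` are hypothetical.

```python
from typing import Mapping


def group_stickiness(member_stickiness: Mapping[str, int]) -> int:
    """Total stickiness of a group, assumed to be the sum over its members."""
    return sum(member_stickiness.values())


if __name__ == "__main__":
    # Hypothetical three-member group, each inheriting a default stickiness of 100.
    members = {"vip": 100, "filesystem": 100, "database": 100}
    print(group_stickiness(members))
```

Under that assumption, a location preference for another node would have to outweigh the group's combined score, not the per-resource default, before the group moves back.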

Re: [ClusterLabs] Changing order in resource group after it's created

2020-12-17 Thread Strahil Nikolov
Use the syntax as if your resource had never been in a group and use '--before/--after' to specify the new location. Best Regards, Strahil Nikolov On Thursday, 17 December 2020, 13:21:55 GMT+2, Tony Stocker wrote: I have a resource group that has a number of entries. If I want

Re: [ClusterLabs] Cannot allocate memory in pgsql_monitor

2020-12-10 Thread Strahil Nikolov
Systemd services do not use ulimit, so you need to check "systemctl show pacemaker.service" for any clues. I have seen a similar error on SLES 12 SP2 when the maximum number of tasks was reduced and we were hitting the limit. Best Regards, Strahil Nikolov On Thursday, 10 December 2020

Re: [ClusterLabs] Q: LVM-activate a shared LV

2020-12-10 Thread Strahil Nikolov
I think that dlm + clvmd was enough to take care of OCFS2. Have you tried that? Best Regards, Strahil Nikolov On Thursday, 10 December 2020, 16:55:52 GMT+2, Ulrich Windl wrote: Hi! I configured a clustered LV (I think) for activation on three nodes, but it won't work
