SBD_WATCHDOG_TIMEOUT=5
I have used the following example for setting up the sbd: https://access.redhat.com/articles/3099231
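For context, the relevant part of /etc/sysconfig/sbd in that kind of setup usually looks roughly like the sketch below (the device path and values are placeholders, not taken from this mail):
# /etc/sysconfig/sbd (sketch)
SBD_DEVICE="/dev/disk/by-id/<shared-lun>"   # shared block device visible to all nodes
SBD_WATCHDOG_DEV=/dev/watchdog              # hardware watchdog, or softdog
SBD_WATCHDOG_TIMEOUT=5
SBD_STARTMODE=always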
Thank you for reading this long e-mail. I would be grateful if someone
could spot my mistake.
Best Regards,
Strahil Nikolov
describes how to properly configure fence_scsi and the
requirements for using it.
Have you checked if your storage supports persistent reservations?
Best Regards,
Strahil Nikolov
On Wednesday, 30 October 2019, 8:42:16 GMT-4, RAM PRASAD TWISTED
ILLUSIONS wrote:
Hi
prefer-fence_zc-mail-1_virsh)
>
> Resource: fence_zc-mail-2_virsh
>Enabled on: zc-mail-1-ha (score:INFINITY) (role: Started)
>(id:cli-prefer-fence_zc-mail-2_virsh)
>
>Ordering Constraints:
>
>Colocation Constraints:
>
>Ticket Constraints:
I notice that the issue happens at 00:00 on both days.
Have you checked for a backup or another cron job that is 'overloading' the
virtualization host?
Anything in the libvirt logs or in the hosts' /var/log/messages?
Best Regards,
Strahil Nikolov
instead of using such a feature.
The interesting part will be the behaviour of the local cluster stack when
updates happen. The risk of the node being fenced is high, either due to
unresponsiveness (during the update) or because corosync/pacemaker still use an
old function that has changed in the libraries.
Best Regards,
Strahil Nikolov
to forget such a setting.
Another approach is to leave the monitoring interval high enough so that the
cluster won't catch the downtime - but imagine that the downtime of the NFS has
to be extended - do you believe that you will be able to change all affected
resources in time?
Best Regards,
Strahil Nikolov
Maybe I will be unsubscribed every 10th email instead of every 5th one.
Best Regards,
Strahil Nikolov
Is your DRBD used as an LVM PV -> e.g. as a disk for an iSCSI LUN?
If yes, ensure that you have an LVM global filter that covers the /dev/drbdXYZ
device, the underlying physical devices (like /dev/sdXYZ) and their WWIDs.
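A minimal sketch of such a filter in /etc/lvm/lvm.conf (the device patterns are illustrative and must match your actual DRBD and backing devices):
devices {
    # accept the DRBD device, reject the backing disks and their by-id aliases
    global_filter = [ "a|^/dev/drbd.*|", "r|^/dev/sd.*|", "r|^/dev/disk/by-id/.*|" ]
}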
Best Regards,
Strahil Nikolov
>corner cases. For a higher-level tool or anything external
>to pacemaker, one such corner case is a "time-of-check/time-of-use"
>problem -- determining the list of active resources has to be done
>separately from configuring the bans, and it's possible the list could
>change.
Hello Community,
I'm preparing for my EX436 and I was wondering if there are any drawbacks if a
shared LUN is split into 2 partitions, with the first partition used for SBD
and the second one for a shared file system (either XFS for active/passive, or
GFS2 for active/active).
Do you see any
>node {
>    ring0_addr: 001db01a
>    nodeid: 1
>}
>
>node {
>    ring0_addr: 001db01b
>    nodeid: 2
>}
>}
>
>quorum {
>    provider: corosync_votequorum
>    two_node: 1
>}
>
>logging {
>    to_logfile: yes
>    l
is vulnerable to split brain, especially when one of the nodes is
syncing (for example after patching) and the source is
fenced/lost/disconnected. It's very hard to extract data from a semi-synced
DRBD.
Also, if you need guidance for SELinux, I can point you to my guide in the
CentOS for
On February 20, 2020 12:49:43 PM GMT+02:00, Maverick wrote:
>
>> You really need to debug the start & stop of the resource.
>>
>> Please try the debug procedure and provide the output:
>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>>
>On 20/02/2020 16:46, Strahil Nikolov wrote:
>> On February 20, 2020 12:49:43 PM GMT+02:00, Maverick
>wrote:
>>>> You really need to debug the start & stop of the resource.
>>>>
>>>> Please try the debug procedure and provide th
e to copy/paste:
httpd://local/host:1090/server-status
Otherwise - check the protocol. As the status URL should be available only from
127.0.0.1, you can use 'http' instead.
Best Regards,
Strahil Nikolov
Are you sure that there is no cluster protocol mismatch?
A major OS version upgrade (even if supported by the vendor) must be done
offline (with proper testing in
>> Check it out - maybe this is your reason.
>>
>> Best Regards,
>> Strahil Nikolov
>
>Yes, I have stonith disabled, because as soon as the resources failed to start
>on boot, the node was rebooted.
>
>
>Anyway, I was checking the pacemaker logs and the journal lo
top of DRBD
/RHEL 7 again/, and it seems that SCSI reservations support is OK.
Best Regards,
Strahil Nikolov
On Sunday, 16 February 2020, 23:11:40 GMT-5, Ondrej
wrote:
Hello Strahil,
On 2/17/20 11:54 AM, Strahil Nikolov wrote:
> Hello Community,
>
> This is m
On February 17, 2020 3:36:27 PM GMT+02:00, Ondrej
wrote:
>Hello Strahil,
>
>On 2/17/20 3:39 PM, Strahil Nikolov wrote:
>> Hello Ondrej,
>>
>> thanks for your reply. I really appreciate that.
>>
>> I have picked fence_multipath as I'm preparing fo
in the CIB:
crm resource secret set
I've been searching in 'pcs --help' and in
https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md
, but it seems it's not there, or I can't find it.
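For reference, the crmsh syntax in question, and the lower-level pacemaker tool it wraps, look roughly like this sketch (resource and parameter names are hypothetical):
# crmsh: store the parameter value on each node's local disk instead of in the CIB
crm resource secret my_db set passwd MySecret
# lower-level pacemaker equivalent, usable without crmsh:
cibsecret set my_db passwd MySecret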
Thanks in advance.
Best Regards,
Strahil Nikolov
On Sunday, 19 January 2020, 00:01:11 GMT+2, Strahil Nikolov
wrote:
Hi All,
I am building a test cluster with fence_rhevm stonith agent on RHEL 7.7 and
oVirt 4.3.
When I fenced drbd3 from drbd1 using 'pcs stonith fence drbd3' - the fence
action
>>> overkill
>>> for our case.
>>
>> One thing you might consider is to get rid of the groups and use
>> explicit colocations and orderings. The advantage will be that you can
>> execute several agents in parallel (e.g. prepare all filesystems in parallel).
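As an illustration of that suggestion (the resource names below are made up), a group could be replaced by explicit constraints along these lines:
# colocate both filesystems with the VIP and start them after it;
# with no ordering between fs_data and fs_logs, they can start in parallel
pcs constraint colocation add fs_data with vip INFINITY
pcs constraint colocation add fs_logs with vip INFINITY
pcs constraint order start vip then start fs_data
pcs constraint order start vip then start fs_logs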
corosync creates single node
>membership.
>>> After the link becomes active, "multicast" is delivered to the other nodes
>>> and they move to the gather state.
>>>
>>
>> I would expect a "reasonable timeout" to also take into account the knet
>> delay.
>>
>>> So to answer your question: the "delay" is on both nodes' side because the
>>> link is not established between the nodes.
>>>
>>
>> knet was expected to improve things, was it not? :)
>>
I would increase the consensus timeout by several seconds.
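In corosync.conf that would be something like the sketch below (the values are illustrative; consensus defaults to 1.2 x token):
totem {
    token: 3000
    consensus: 6000    # milliseconds
}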
Best Regards,
Strahil Nikolov
hieve the jump of the virtual_ip
> The mode we use is Active/Passive mode
> The Resource Agent we use is ocf:heartbeat:IPaddr2
> Hope you can solve my confusion
Hello,
Can you provide the version of the stack, your config and the command you run
to put the node in standby?
tions of
>Einstein’s brain than in the near certainty that people of equal talent
>have lived and died in cotton fields and sweatshops." - Stephen Jay
>Gould
There is something I don't get.
Why can't this be done?
One node is in siteA, one in siteB, and the qnetd is at a third location.
Routing between the 2 subnets is established and symmetrical.
Fencing via IPMI or SBD (for example from an HA iSCSI cluster) is configured.
The NFS resource is started on one node and a special RA is used for the DNS
records. If node1 dies, the cluster will fence it and node2 will power up
the NFS and update the records.
Of course, updating DNS from only one side must work for both sites.
Best Regards,
Strahil Nikolov
On April 12, 2020 10:58:39 AM GMT+03:00, Eric Robinson
wrote:
>> -Original Message-
>> From: Strahil Nikolov
>> Sent: Sunday, April 12, 2020 2:54 AM
>> To: Cluster Labs - All topics related to open-source clustering
>welcomed
>> ; Eric Robinson
nfig and the command
>you run to put the node in standby?
>
>Best Regards,
>Strahil Nikolov
>
>-
>
>Sorry, I don't know how to reply correctly, so I pasted the previous
>chat content on it
>
>The following are the commands we use
>
>pcs pro
s this thread.
>>>>
>>>> thanks again
Hi Sherrard,
Have you tried to increase the qnet timers in corosync.conf?
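A sketch of where those knobs live in corosync.conf (the host and values are placeholders; 'timeout' defaults to 10000 ms and 'sync_timeout' to 30000 ms):
quorum {
    provider: corosync_votequorum
    device {
        model: net
        timeout: 20000
        sync_timeout: 60000
        net {
            host: qnetd.example.com
            algorithm: ffsplit
        }
    }
}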
Best Regards,
Strahil Nikolov
air of systems, you can try the first approach (building
a fresh cluster and configuring it while the DB is running).
Best Regards,
Strahil Nikolov
># corosync-cfgtool -s
>Printing ring status.
>Local node ID 2
>RING ID 0
> id = 127.0.0.1
> status = ring 0 active with no faults
>RING ID 1
> id = 127.0.0.1
> status = ring 1 active with no faults
>
>What is wrong
Ulrich
>
>>
>> Disconnect the heartbeat network cable ,and corosync-cfgtool -s:
>>
>> RING ID 0
>> id= 127.0.0.1
>> status= ring 0 active with no faults
>> RING ID 1
>> id= 127.0.0.1
>> status= ring 1 active with
I would play it safe and leave DRBD running, but in single-node mode (no peers).
As it won't replicate, it should be as close to bare metal as possible.
Best Regards,
Strahil Nikolov
On Wednesday, 9 September 2020, 15:11:06 GMT+3, Eric Robinson
wrote:
Valentin --
With DRBD stopped
Yep, both work without affecting the resources:
crm cluster stop
pcs cluster stop
Once your maintenance is over, you can start the cluster and everything will
still be in maintenance mode.
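Roughly, with pcs (crmsh has equivalent commands), the whole sequence could look like this sketch:
pcs property set maintenance-mode=true   # cluster stops managing the resources
pcs cluster stop --all                   # stop corosync/pacemaker on all nodes
# ... perform the maintenance ...
pcs cluster start --all
pcs property set maintenance-mode=false  # hand the resources back to the cluster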
Best Regards,
Strahil Nikolov
On Tuesday, 13 October 2020, 19:15:27 GMT+3, Digimer
wrote
Also, it's worth mentioning that you can put the whole cluster in global
maintenance and power off the stack on all nodes without affecting your
resources.
I'm not sure whether that is possible with node-level maintenance.
Best Regards,
Strahil Nikolov
On Tuesday, 13 October 2020, 12:49:38
in the web, haproxy has a ready-to-go resource
agent 'ocf:heartbeat:haproxy', so you can give it a try.
Best Regards,
Strahil Nikolov
On Sunday, 4 October 2020, 22:41:59 GMT+3, Eric Robinson
wrote:
Greetings!
We are looking for an open-source Linux load-balancing
I agree,
it's more of a routing problem.
Actually a static route should fix the issue.
Best Regards,
Strahil Nikolov
On Tuesday, 6 October 2020, 10:50:24 GMT+3, Jan Friesse
wrote:
Richard ,
> To clarify my problem, this is more on Qdevice issue I want to fix.
The quest
There is a topic about that at https://bugs.centos.org/view.php?id=16939
Based on the comments you can obtain it from
https://koji.mbox.centos.org/koji/buildinfo?buildID=4801 , but I haven't tested
it.
Best Regards,
Strahil Nikolov
On Friday, 21 August 2020, 18:30:31 GMT+3, Mark
Won't it be easier to:
- set the node in standby
- stop the node
- remove the node
- add it again with the new hostname
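A rough pcs sketch of that procedure (node names are hypothetical, and the exact subcommands vary a bit between pcs versions):
pcs node standby oldname
pcs cluster stop oldname
pcs cluster node remove oldname
# after the host has been renamed and authenticated again:
pcs cluster node add newname
pcs cluster start newname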
Best Regards,
Strahil Nikolov
On 18 August 2020 17:15:49 GMT+03:00, Ken Gaillot
wrote:
>On Tue, 2020-08-18 at 14:35 +0200, Kadlecsik József wrote:
>> Hi,
>>
Hi Bernd,
As SLES 12 is in such a late support phase, I guess SUSE will provide fixes
only for SLES 15.
It would be best to open a case with them and ask about that.
Best Regards,
Strahil Nikolov
On 19 August 2020 17:29:32 GMT+03:00, "Lentes, Bernd"
wrote:
>
>- On Au
Have you tried the ISO 8601 format?
For example: 'PT20M'
The ISO format is described at:
https://manpages.debian.org/testing/crmsh/crm.8.en.html
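Applied to a temporary move constraint, that would look roughly like this (resource and node names are made up):
# create a location constraint that expires after 20 minutes
pcs resource move my_resource node2 lifetime=PT20M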
Best Regards,
Strahil Nikolov
On 20 August 2020 13:40:16 GMT+03:00, Digimer wrote:
>Hi all,
>
> Reading the pcs man page for the 'mov
Hi Quentin,
it will be easier to help if you provide both the corosync and
pacemaker configuration.
Best Regards,
Strahil Nikolov
On Thursday, 27 August 2020, 17:10:01 GMT+3, Citron Vert
wrote:
Hi,
Sorry for using this email address, my name is Quentin. Thank
And how did you define the drbd resource?
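For comparison, a typical definition on CentOS 7 looks roughly like the sketch below (the resource names and the DRBD resource 'r0' are placeholders):
pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 \
    op monitor interval=29s role=Master \
    op monitor interval=31s role=Slave
pcs resource master drbd_r0_ms drbd_r0 \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true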
Best Regards,
Strahil Nikolov
On 14 August 2020 18:48:32 GMT+03:00, Gerry Kernan
wrote:
>Hi
>I'm trying to add a drbd resource to a pacemaker cluster on CentOS 7
>
>
>But getting this error on pcs status
>drbd_r0
=true
P.S.: Consider setting the 'resource-stickiness' to '1'. Using partitions is
not the best option, but it is better than nothing.
Best Regards,
Strahil Nikolov
On Tuesday, 22 September 2020, 02:06:10 GMT+3, Philippe M Stedman
wrote:
Hi Strahil,
Here is the output of those
XYZ/". Also, SBD needs at most a ~10MB block device, and yours seems
unnecessarily big.
Most probably /dev/sde1 is your problem.
Best Regards,
Strahil Nikolov
On Monday, 21 September 2020, 23:19:47 GMT+3, Philippe M Stedman
wrote:
Hi,
I have been following the instructi
What is the output of 'corosync-quorumtool -s' on both nodes?
What is your cluster's configuration:
'crm configure show' or 'pcs config'
Best Regards,
Strahil Nikolov
On Wednesday, 23 September 2020, 16:07:16 GMT+3, Ambadas Kawle
wrote:
Hello All
We have 2 node with Mysql
I would use 'last_man_standing: 1' + 'wait_for_all: 1'.
When you shut down a node gracefully, the quorum is recalculated.
You can check the votequorum man page for an explanation.
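In corosync.conf that would be roughly (the window value is the default and is only shown for clarity):
quorum {
    provider: corosync_votequorum
    wait_for_all: 1
    last_man_standing: 1
    last_man_standing_window: 10000   # ms
}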
Best Regards,
Strahil Nikolov
On Friday, 25 September 2020, 01:19:09 GMT+3, Philippe M Stedman
wrote:
Hi
Resource stickiness for a group is the sum of all its resources' stickiness
-> 5 resources x 100 score (default stickiness) = 500 score.
If your location constraint has a bigger number -> it wins :)
Best Regards,
Strahil Nikolov
On Saturday, 26 September 2020, 12:22:32 GMT
That's the strangest request I have heard so far...
What is the reason not to use crmsh or pcs to manage the cluster?
About your question: have you tried to load a CIB with the old resources
stopped, and then another one with the stopped resources removed?
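A sketch of that two-step approach using pacemaker's low-level tools (the file names are arbitrary):
cibadmin --query > step1.xml
# edit step1.xml: set target-role=Stopped on the old resources
cibadmin --replace --xml-file step1.xml
# once everything has stopped, push a second CIB with those resources removed
cibadmin --replace --xml-file step2.xml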
Best Regards,
Strahil Nikolov
I don't see the reservation key in multipath.conf.
Have you set it up in a unique way (each host has its own key)?
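For reference, the key is normally set per host in /etc/multipath.conf, e.g. (the value is a placeholder and must differ on every node):
defaults {
    reservation_key 0x1a2b3c01
}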
Best Regards,
Strahil Nikolov
On 1 June 2020 16:04:32 GMT+03:00, Rafael David Tinoco
wrote:
>Hello again,
>
>Long time I don't show up... I was finishing up details
example is available at :
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-eventnotification-haar
Best Regards,
Strahil Nikolov
On Wednesday, 16 September 2020, 09:20:44 GMT+3, Digimer
wrote:
Is there a way
Both SUSE and Red Hat provide utilities to add the node without messing with
the configs manually.
What is your distro?
Best Regards,
Strahil Nikolov
On Wednesday, 21 October 2020, 17:03:19 GMT+3, Jiaqi Tian1
wrote:
Hi,
I'm trying to add a new node into an active pacemaker
Usually I prefer to use "crm configure show" and later "crm configure edit" and
replace the config.
I am not sure if this will work with such a downgrade scenario, but it
shouldn't be a problem.
Best Regards,
Strahil Nikolov
On Thursday, 22 October 2020, 21:30:
why don't you work with something like this: 'op stop interval=300
timeout=600'?
The stop operation will then time out according to your requirements, without
modifying the script.
Best Regards,
Strahil Nikolov
On Thursday, 22 October 2020, 23:30:08 GMT+3, Lentes, Bernd
wrote:
Hi guys
nd-clusters_considerations-in-adopting-rhel-8#new_commands_for_authenticating_nodes_in_a_cluster
Best Regards,
Strahil Nikolov
On Tuesday, 27 October 2020, 18:06:06 GMT+2, Jiaqi Tian1
wrote:
Hi Xin,
Thank you. The crmsh version is 4.1.0.0, OS is RHEL 8.0.
I have t
Ulrich,
do you mean '--queue' ?
Best Regards,
Strahil Nikolov
On Tuesday, 27 October 2020, 12:15:16 GMT+2, Ulrich Windl
wrote:
>>> "Lentes, Bernd" schrieb am 26.10.2020
um
21:44 in Nachricht
<1480408662.7194527.1603745092927.javamail.zim...
>>> Strahil Nikolov wrote on 23.10.2020 at 17:04 in
message <362944335.2019534.1603465466...@mail.yahoo.com>:
> Usually I prefer to use "crm configure show" and later "crm configure edit"
> and replace the config.
>I guess you use "
I think it's useful - for example, HANA powers up in 10-15 min (even more,
depending on the storage tier) - so the default will time out and the fun
starts there.
Maybe the cluster is just showing them without using them, but it looked quite
the opposite.
Best Regards,
Strahil Nikolov
В
see any proof that I'm right
(libvirt network was in NAT mode) or wrong (VMs using Host's bond in a
bridged network).
Best Regards,
Strahil Nikolov
On 19 July 2020 9:45:29 GMT+03:00, Andrei Borzenkov
wrote:
>18.07.2020 03:36, Reid Wahl wrote:
>> I'm not sure that the libvir
has no watchdog, you can use the softdog kernel module on Linux.
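A minimal sketch of enabling it:
modprobe softdog
echo softdog > /etc/modules-load.d/softdog.conf   # load it on every boot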
Best Regards,
Strahil Nikolov
On 29 July 2020 9:01:22 GMT+03:00, Gabriele Bulfon
wrote:
>That one was taken from a specific implementation on Solaris 11.
>The situation is a dual node server with shared storage controller:
>bo
unplanned downtime.
Best Regards,
Strahil Nikolov
On 29 July 2020 12:46:16 GMT+03:00, lkxjtu wrote:
>Hi Reid Wahl,
>
>
>There are more log informations below. The reason seems to be that
>communication with DBUS timed out. Any suggestions?
>
>
>1672712 Jul 24 21:20:17 [3945
This one links to how to power fence when reservations are removed:
https://access.redhat.com/solutions/4526731
Best Regards,
Strahil Nikolov
On 30 July 2020 9:28:51 GMT+03:00, Andrei Borzenkov
wrote:
>30.07.2020 08:42, Strahil Nikolov wrote:
>> You got plenty of options:
Best Regards,
Strahil Nikolov
On 31 July 2020 8:57:29 GMT+03:00, Ulrich Windl
wrote:
>>>> Ken Gaillot wrote on 30.07.2020 at 16:43 in
>message
><93b973947008b62c4848f8a799ddc3f0949451e8.ca...@redhat.com>:
>> On Wed, 2020‑07‑29 at 23:12 +, Toby Haynes wrote:
>>
Usually I check the logs on the Designated Coordinator , especially when it
was not fenced.
Best Regards,
Strahil Nikolov
On 2 July 2020 12:12:04 GMT+03:00, "井上和徳"
wrote:
>Hi all,
>
>We think it is desirable to output the log indicating the start and
>finish of
.
Best Regards,
Strahil Nikolov
On 7 July 2020 10:11:38 GMT+03:00, "stefan.schm...@farmpartner-tec.com"
wrote:
> >What does 'virsh list'
> >give you on the 2 hosts? Hopefully different names for
> >the VMs ...
>
>Yes, each host shows its
We had no issues with fencing, but we got plenty of SAN issues to test the
fencing :)
Best Regards,
Strahil Nikolov
- as in many
environments it can be dropped by firewalls.
Best Regards,
Strahil Nikolov
On 6 July 2020 12:24:08 GMT+03:00, Klaus Wenninger
wrote:
>On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote:
>> Hello,
>>
>> >> # fence_xvm -o list
>> >> kvm
And what about SBD (a.k.a. poison pill)? I've used it reliably with 3 SBDs
on a stretched cluster. It never failed to kill the node.
Best Regards,
Strahil Nikolov
On 14 July 2020 14:18:56 GMT+03:00, Rohit Saini
wrote:
>I don't think my question was very clear. I am stric
How did you configure the network on your Ubuntu 20.04 hosts? I tried to
set up a bridged connection for the test setup, but obviously I'm missing
something.
Best Regards,
Strahil Nikolov
On 14 July 2020 11:06:42 GMT+03:00, "stefan.schm...@farmpartner-tec.com"
wrote:
>H
By default libvirt uses a NAT network, not a routed one - in such a case, vm1
won't receive data from host2.
Can you provide the networks' XML?
Best Regards,
Strahil Nikolov
On 15 July 2020 13:19:59 GMT+03:00, Klaus Wenninger
wrote:
>On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.
If it is created by libvirt - this is NAT and you will never receive output
from the other host.
Best Regards,
Strahil Nikolov
On 15 July 2020 15:05:48 GMT+03:00, "stefan.schm...@farmpartner-tec.com"
wrote:
>Hello,
>
>On 15.07.2020 at 13:42 Strahil Nikolov wrote:
Firewalld's add-service (without a zone definition) will add it to the default
zone, which by default is 'public'.
If you have public and private zones, and the cluster is supposed to
communicate over the private VLAN, you can open the port only there.
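For example (assuming the private VLAN interfaces are assigned to the 'internal' zone):
firewall-cmd --permanent --zone=internal --add-service=high-availability
firewall-cmd --reload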
Best Regards,
Strahil Nikolov
On 2 July
account for Red Hat, you can check
https://access.redhat.com/solutions/917833
Best Regards,
Strahil Nikolov
On 9 July 2020 17:01:13 GMT+03:00, "stefan.schm...@farmpartner-tec.com"
wrote:
>Hello,
>
>thanks for the advice. I have worked through that list as follows:
hypervisors
- Firewall opened (1229/udp for the hosts, 1229/tcp for the guests)
- fence_xvm on both VMs
In your case, the primary suspect is multicast traffic.
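Based on the ports listed above, the firewall part would be roughly:
# on the hypervisors:
firewall-cmd --permanent --add-port=1229/udp && firewall-cmd --reload
# inside the cluster VMs:
firewall-cmd --permanent --add-port=1229/tcp && firewall-cmd --reload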
Best Regards,
Strahil Nikolov
On 8 July 2020 16:33:45 GMT+03:00, "stefan.schm...@farmpartner-tec.com"
wrote:
>
It won't make sense.
oVirt has a built-in HA for Virtual Machines.
Best Regards,
Strahil Nikolov
On Saturday, 11 July 2020, 17:50:18 GMT+3, Lentes, Bernd
wrote:
Hi,
I'm having a two-node cluster with pacemaker and about 10 virtual domains as
resources.
It's running fine.
I
What about a second fencing mechanism?
You can add a shared (independent) vmdk as an SBD device. The reconfiguration
will require cluster downtime, but this is only necessary once.
Once 2 fencing mechanisms are available, you can configure the order easily.
Best Regards,
Strahil Nikolov
В
Instead of NFS, iSCSI is also an option.
Best Regards,
Strahil Nikolov
On 24 June 2020 13:42:26 GMT+03:00, Andrei Borzenkov
wrote:
>24.06.2020 12:20, Ulrich Windl wrote:
>>>
>>> How Service Guard handles loss of shared storage?
>>
>> When a node is up
is hosted.
Another approach is to use a shared disk (either over iSCSI or SAN) and use
SBD for power-based fencing, or use SCSI-3 Persistent Reservations (which can
also be converted into power-based fencing).
Best Regards,
Strahil Nikolov
On 24 June 2020 13:44:27 GMT+03:00, "stefan
Hi Stefan,
this sounds like a firewall issue.
Check that port udp/1229 is opened for the hypervisors and tcp/1229 for
the VMs.
P.S.: The protocols are based on my fading memory, so double-check them.
Best Regards,
Strahil Nikolov
On 25 June 2020 18:18:46 GMT+03:00, "stefan
I was thinking about a GitHub issue, but it seems that only 'linstor-server'
has an issues section.
Best Regards,
Strahil Nikolov
On 28 June 2020 20:13:21 GMT+03:00, Eric Robinson
wrote:
>I could if linbit had per-incident pricing. Unfortunately, they only
>offer yearly contracts,
to use your script, you can create a systemd service to
call it and ensure (via pacemaker) that the service will always be running.
Best Regards,
Strahil Nikolov
On 29 June 2020 16:15:42 GMT+03:00, Tony Stocker
wrote:
>Hello
>
>We have a system which has become critical
Nice to know.
Yet, if the monitoring of that fencing device failed - most probably the
vCenter was not responding/unreachable - that's why I suggested sbd.
Best Regards,
Strahil Nikolov
On 18 June 2020 18:24:48 GMT+03:00, Ken Gaillot wrote:
>Note that a failed start of a stonith dev
I've seen this on a test setup after multiple network disruptions.
I managed to fix it by stopping DRBD on all nodes and starting it back up.
I guess you can take a downtime and try that approach.
Best Regards,
Strahil Nikolov
On 27 June 2020 16:36:10 GMT+03:00, Eric Robinson
wrote
I guess you can open an issue to linbit, as you still have the logs.
Best Regards,
Strahil Nikolov
On 28 June 2020 8:19:59 GMT+03:00, Eric Robinson
wrote:
>I fixed it with a drbd down/up.
>
>From: Users On Behalf Of Eric Robinson
>Sent: Saturday, June 27, 2020 4:32 PM
>T
Are you using multicast ?
Best Regards,
Strahil Nikolov
On 9 June 2020 10:28:25 GMT+03:00, "ROHWEDER-NEUBECK, MICHAEL (EXTERN)"
wrote:
>Hello,
>We have massive problems with the redundant ring operation of our
>Corosync / pacemaker 3 Node NFS clusters.
>
>Most
with tcpdump that the heartbeats are received from the remote side.
3. Check for retransmissions or packet loss.
Usually you can find more details in the log specified in corosync.conf or in
/var/log/messages (and also the journal).
Best Regards,
Strahil Nikolov
On 9 June 2020 21:11:02 GMT+03:00
ings and way new stuff
like sctp.
P.S.: You can use a second fencing mechanism like 'sbd' a.k.a. "poison pill",
just make the vmdk shared & independent. This way your cluster can operate
even when the vCenter is unreachable for any reason.
Best Regards,
Strahil Nikolov
On 10 Jun
and immediately bring the resources
up
5. Remove access to the shared storage for the old cluster
6. Wipe the old cluster.
Downtime will be way shorter.
Best Regards,
Strahil Nikolov
On 11 June 2020 17:48:47 GMT+03:00, Vitaly Zolotusky
wrote:
>Thank you very much for quick reply!
>I wi
Don't forget to increase the consensus!
Best Regards,
Strahil Nikolov
On 11 June 2020 22:11:09 GMT+03:00, Howard wrote:
>This is interesting. So it seems that 13,000 ms or 13 seconds is how
>long
>the VM was frozen during the snapshot backup and 0.8 seconds is the
>t
And I forgot to ask... Are you using a memory-based snapshot?
It shouldn't take so long.
Best Regards,
Strahil Nikolov
On 12 June 2020 7:10:38 GMT+03:00, Strahil Nikolov
wrote:
>Don't forget to increase the consensus!
>
>Best Regards,
>Strahil Nikolov
>
>On 11 June 2020
Out of curiosity, are you running it on SLES/openSUSE?
I think it is easier with 'crm cluster start'.
Otherwise you can run 'journalctl -u pacemaker.service -e' to find out which
dependency has failed.
Another one is:
'systemctl list-dependencies pacemaker.service'
Best Regards,
Strahil
The simplest way to check whether libvirt's network is NAT (or not) is to try
to ssh from the first VM to the second one.
I should admit that I was lost when I tried to create a routed network in
KVM, so I can't help with that.
Best Regards,
Strahil Nikolov
On 17 July 2020 16:56:44 GMT+03
Do you have a reason not to use one of the stonith agents already available?
Best Regards,
Strahil Nikolov
On 28 July 2020 13:26:52 GMT+03:00, Gabriele Bulfon
wrote:
>Thanks, I attach here the script.
>It basically runs ssh on the other node with no password (must be
>preconfigured via auth
-> it's
just a script started by the watchdog.service on the node itself. It should
be usable on all Linuxes and many UNIX-like OSes.
Best Regards,
Strahil Nikolov
On 30 July 2020 12:05:39 GMT+03:00, Gabriele Bulfon
wrote:
>Reading sbd from SuSE I saw that it requires a special
The problem with INFINITY is that the moment the node is back, there will
be a second failover. This is bad for bulky DBs that power down/up in more than
30 min (15 min down, 15 min up).
Best Regards,
Strahil Nikolov
On Thursday, 3 December 2020, 10:32:18 GMT+2, Andrei Borzenkov
It's more interesting why you got the connection closed...
Are you sure you didn't have network issues? What is corosync saying in
the logs?
Offtopic: Are you using DLM with OCFS2?
Best Regards,
Strahil Nikolov
At 10:33 -0800 on 04.12.2020 (Fri), Reid Wahl wrote:
> On Fri, Dec 4, 2020 at 10:32
Nope,
but if you don't use a clustered FS, you could also use plain LVM + tags.
As far as I know you need dlm and clvmd for a clustered FS.
Best Regards,
Strahil Nikolov
On Tuesday, 8 December 2020, 10:15:39 GMT+2, Ulrich Windl
wrote:
>>> Strahil Nikolov wrote on 0
node2 to node1.
Note: default stickiness is per resource, while the total stickiness score of
a group is calculated based on the scores of all resources in it.
Best Regards,
Strahil Nikolov
On Wednesday, 2 December 2020, 16:54:43 GMT+2, Dan Swartzendruber
wrote:
On 2020-11-30
Use the syntax as if your resource was never in a group and use
'--before/--after' to specify the new location.
Best Regards,
Strahil Nikolov
On Thursday, 17 December 2020, 13:21:55 GMT+2, Tony Stocker
wrote:
I have a resource group that has a number of entries. If I want
systemd services do not use ulimit, so you need to check "systemctl show
pacemaker.service" for any clues.
I have seen a similar error on SLES 12 SP2 when the maximum number of tasks was
reduced and we were hitting the limit.
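A sketch of checking and raising that limit via a systemd drop-in (restarting pacemaker afterwards is disruptive, so plan for it):
systemctl show pacemaker.service -p TasksMax
mkdir -p /etc/systemd/system/pacemaker.service.d
cat > /etc/systemd/system/pacemaker.service.d/tasks.conf <<'EOF'
[Service]
TasksMax=infinity
EOF
systemctl daemon-reload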
Best Regards,
Strahil Nikolov
On Thursday, 10 December 2020
I think that dlm + clvmd were enough to take care of OCFS2.
Have you tried that?
Best Regards,
Strahil Nikolov
On Thursday, 10 December 2020, 16:55:52 GMT+2, Ulrich Windl
wrote:
Hi!
I configured a clustered LV (I think) for activation on three nodes, but it
won't work