Re: [ClusterLabs] Upgrade to OLE8 + Pacemaker

2023-10-06 Thread Jibrail, Qusay (GfK) via Users
Hi,

May I get an answer please?

Kind regards,
––
Qusay Jibrail
Senior Infrastructure Engineer – Linux | GfK IT Services
GfK – an NIQ company | The Netherlands
Krijgsman 22-25 | Amstelveen | 1186 DM
T: +31 88 435 1232 | M: +31 628 927 686


From: Jibrail, Qusay (GfK)
Sent: Wednesday, 4 October 2023 11:11
To: Cluster Labs - All topics related to open-source clustering welcomed 

Subject: RE: [ClusterLabs] Upgrade to OLE8 + Pacemaker

Hi Tomas,

OK, it is getting a little bit complicated.

What about this approach:


  *   pcs cluster stop "server3", upgrade to OLE8, update pacemaker, 
corosync and pcs, check that Postfix is working, wait 1 day.
  *   pcs cluster stop "server4", upgrade to OLE8, update pacemaker, 
corosync and pcs, check that Postfix is working, wait 1 day. Now both 
servers run the same versions of the OS, pacemaker, corosync and pcs.
  *   Then pcs cluster start "server3" and pcs cluster start "server4".

Will the above work?
We will have 2 days without load balancing.

As the server will be rebooted during the OLE8 upgrade, is it better to disable 
the pacemaker, corosync and pcs services from starting after the reboot?
Or will they not start until I run pcs cluster start "server3" and pcs 
cluster start "server4"?
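As far as I know, with pcs a node rejoins the cluster at boot only if the cluster services were enabled (pcs cluster enable); otherwise they stay down until pcs cluster start. A minimal sketch for checking and disabling autostart might look like the following; the stub binaries exist only so the commands can be dry-run off-cluster, and on the real node they would be dropped:

```shell
#!/bin/sh
# Dry-run sketch: stub out systemctl and pcs so the commands below can be
# exercised anywhere; on the real node, drop this stub block.
set -e
stubdir=$(mktemp -d)
for cmd in systemctl pcs; do
  printf '#!/bin/sh\necho "%s $*"\n' "$cmd" > "$stubdir/$cmd"
  chmod +x "$stubdir/$cmd"
done
PATH="$stubdir:$PATH"

# "disabled" in this output means corosync/pacemaker will NOT start at boot:
check=$(systemctl is-enabled corosync pacemaker)
# Belt and braces: keep server3 out of the cluster after the upgrade
# reboot until 'pcs cluster start "server3"' is run explicitly:
disable=$(pcs cluster disable server3)
echo "$check"
echo "$disable"
```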

Kind regards,
Qusay Jibrail


From: Users <users-boun...@clusterlabs.org> On Behalf Of Tomas Jelinek
Sent: Tuesday, 3 October 2023 16:50
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Upgrade to OLE8 + Pacemaker

On 03. 10. 23 at 16:24, Jibrail, Qusay (GfK) via Users wrote:
Hi Reid,

Thank you for the answer.
So my plan will be:


  1.  pcs config backup /root/"Server Name"
  2.  Create a backup of /etc/corosync/
  3.  Create a backup of /etc/postfix
  4.  pcs cluster stop "server3" --> just to fail over to server4.
The command pcs cluster stop "server3" will stop corosync and pacemaker, right?

Hi,

Yes, the 'pcs cluster stop' command stops both pacemaker and corosync.

  5.  Run pcs status on server3, which should give an error message; on 
server4 it should show one offline node and one online node.
  6.  Upgrade server3 to OLE8, which will upgrade these 3 packages to:

      corosync    x86_64   3.1.7-1.el8
      pacemaker   x86_64   2.1.5-9.3.0.1.el8_8
      pcs         x86_64   0.10.15-4.0.1.el8_8.1

  7.  Then run crm_verify to check the configuration. If the verification is 
OK, then:
  8.  pcs cluster start "server3"
  9.  Run pcs status on both nodes.
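For steps 1-3, something along these lines could be used. This is only a sketch: the scratch directories and the stub pcs exist so it can be tried off-cluster, while on server3 the real /, /root and pcs would apply:

```shell
#!/bin/sh
# Sketch of the backup steps, exercised against a scratch directory tree.
set -e
fsroot=$(mktemp -d)   # stands in for / on server3
dest=$(mktemp -d)     # stands in for /root
mkdir -p "$fsroot/etc/corosync" "$fsroot/etc/postfix"
printf '#!/bin/sh\necho "pcs $*"\n' > "$dest/pcs"
chmod +x "$dest/pcs"
PATH="$dest:$PATH"

pcs config backup "$dest/server3"                           # step 1
tar cf "$dest/corosync-etc.tar" -C "$fsroot" etc/corosync   # step 2
tar cf "$dest/postfix-etc.tar"  -C "$fsroot" etc/postfix    # step 3
ls "$dest"
```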

Please see the versions of the currently installed software:
[root@server3 ~]# corosync -v
Corosync Cluster Engine, version '2.4.5'
Copyright (c) 2006-2009 Red Hat, Inc.

[root@server3 ~]# pacemakerd --version
Pacemaker 1.1.23-1.0.1.el7_9.1
Written by Andrew Beekhof

[root@server3 ~]# pcs --version
0.9.169

Did I miss anything?

Corosync 3 is not compatible with Corosync 2. So once you update server3 to 
OLE8, it won't be able to join server4 in the cluster and take over the 
cluster resources.

If you are restricted to two nodes, you may remove server3 from the cluster, 
update server3 to OLE8, and create a one-node cluster on server3. Once you 
have two one-node clusters, move the resources from the server4 cluster to the 
server3 cluster manually. Then destroy the cluster on server4, update server4 
to OLE8, and add server4 to the new cluster.
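If it helps, the sequence Tomas describes might look roughly like this in pcs 0.10 syntax (the version listed above for OLE8). The cluster name "newcluster" is my assumption, and the stub pcs only makes the listing dry-runnable off-cluster:

```shell
#!/bin/sh
# Dry-run sketch of the two-one-node-cluster migration.
set -e
stubdir=$(mktemp -d)
printf '#!/bin/sh\necho "pcs $*"\n' > "$stubdir/pcs"
chmod +x "$stubdir/pcs"
PATH="$stubdir:$PATH"

pcs cluster node remove server3        # on the old cluster, while still on EL7
# ... upgrade server3 to OLE8, then on server3:
pcs host auth server3                  # pcs 0.10 replaces 'pcs cluster auth'
pcs cluster setup newcluster server3   # one-node cluster on server3
pcs cluster start --all
# ... move resources over from the server4 cluster manually, then on server4:
pcs cluster destroy
# ... upgrade server4 to OLE8, then from server3:
pcs cluster node add server4
```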

Regards,
Tomas

Kind regards,
Qusay Jibrail

Re: [ClusterLabs] Synchronous primary doesn't switch to async mode on replica power off

2023-10-06 Thread Sergey Cherukhin
The alert-agent approach is working now. It requires calling "pcs resource
cleanup" as root via sudo, adding "sleep 120" before calling the pcs utility
in the alert agent script, and increasing the alert agent timeout accordingly.

But I don't like this workaround: it takes too long to switch the primary
node to async. A timeout of 60 s is not enough, so I doubled it, but even
60 s is already too much.

Maybe experienced Pacemaker users can advise me on some configuration
options I don't know about that could solve this problem.
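For what it's worth, a minimal sketch of such an alert agent might look like the one below, under the assumptions that the resource is named pgsql and that the settle delay is made configurable (both my inventions, not from the thread). Pacemaker passes alert data to agents via CRM_alert_* environment variables, and the stub pcs lets the agent be exercised off-cluster:

```shell
#!/bin/sh
# Build and dry-run a hypothetical "node lost" alert agent.
set -e
workdir=$(mktemp -d)

cat > "$workdir/pgsql_alert.sh" <<'EOF'
#!/bin/sh
# Act only on "node lost" alerts (for node alerts, CRM_alert_desc carries
# the node state), wait for the cluster to settle, then clean up so the
# primary can drop to async replication.
[ "$CRM_alert_kind" = "node" ] || exit 0
[ "$CRM_alert_desc" = "lost" ] || exit 0
sleep "${SETTLE_SECONDS:-120}"
pcs resource cleanup pgsql
EOF
chmod +x "$workdir/pgsql_alert.sh"

# Dry run, emulating the variables Pacemaker would set, with a stub pcs:
printf '#!/bin/sh\necho "pcs $*"\n' > "$workdir/pcs"
chmod +x "$workdir/pcs"
out=$(CRM_alert_kind=node CRM_alert_desc=lost SETTLE_SECONDS=0 \
      PATH="$workdir:$PATH" "$workdir/pgsql_alert.sh")
echo "$out"
```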

Best regards,
Sergey Cherukhin

On Fri, 6 Oct 2023 at 16:08, Klaus Wenninger wrote:

>
>
> On Fri, Oct 6, 2023 at 8:46 AM Sergey Cherukhin <
> sergey.cheruk...@gmail.com> wrote:
>
>> Hello!
>>
>> I used Microsoft Outlook to send this message and it was sent in the
>> wrong format. I'm sorry. I won't do it again.
>>
>> I use a Postgresql+Pacemaker+Corosync cluster with 2 Postgresql instances
>> in synchronous replication mode. The parameter "rep_mode" is set to "sync",
>> and when I shut down the replica the normal way, the primary node switches
>> to async mode. But when I shut down the replica by powering it off to
>> emulate a power unit failure, the primary remains in sync mode and clients
>> hang on INSERT operations until "pcs resource cleanup" is performed. I
>> created an alert agent to run "pcs resource cleanup" when any node is
>> lost, but this approach doesn't work.
>>
>> What should I do to be sure the primary node will switch to async mode if
>> the replica becomes lost for any cause?
>>
>
> One idea might be running (a) small daemon(s) colocated with the
> Postgresql instance(s) that uses pacemaker tooling to check the state of
> the partner node and, if it isn't there, switches to async mode. You
> could solve this with a small custom resource agent. Actually, a
> persistently running process wouldn't even be necessary; it could be
> done in the monitor operation as well.
> Of course, you could also enhance the monitoring of the Postgresql
> resource agent so that it supports this switching.
> As this would be quite a generic change, it would probably be
> interesting for the community as well.
>
> On the other hand, I would have considered this issue so generic that it
> is hard to believe there is no ready-made / tested solution around
> already.
>
> To make it more reactive (without setting the monitoring interval to
> incredibly low values), using an alert agent (as you already tried), but
> maybe switching directly to async mode, might be worth trying.
> Did you investigate what actually went wrong in your experiments with
> the alert agent? It is interesting that the resource cleanup that
> obviously works from the command line doesn't do the trick when run as
> an alert agent; maybe an SELinux issue ...
>
> Regards,
> Klaus
>
>>
>>
>> Best regards,
>> Sergey Cherukhin
>>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



