[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] resource cloned group colocations

2023-03-02 Thread Ulrich Windl
>>> Gerald Vogt  wrote on 02.03.2023 at 17:27 in message
:
> On 02.03.23 14:51, Ulrich Windl wrote:
> Gerald Vogt  wrote on 02.03.2023 at 14:43 in message
>> <9ba5cd78-7b3d-32ef-38cf-5c5632c46...@spamcop.net>:
>>> On 02.03.23 14:30, Ulrich Windl wrote:
>>> Gerald Vogt  wrote on 02.03.2023 at 08:41 in message
 <624d0b70-5983-4d21-6777-55be91688...@spamcop.net>:
> Hi,
>
> I am setting up a mail relay cluster which main purpose is to maintain
> the service ips via IPaddr2 and move them between cluster nodes when
> necessary.
>
> The service ips should only be active on nodes which are running all
> necessary mail (systemd) services.
>
> So I have set up a resource for each of those services, put them into a
> group in order they should start, cloned the group as they are normally
> supposed to run on the nodes at all times.
>
> Then I added an order constraint
>  start mail-services-clone then start mail1-ip
>  start mail-services-clone then start mail2-ip
>
> and colocations to prefer running the ips on different nodes but only
> with the clone running:
>
>  colocation add mail2-ip with mail1-ip -1000
>  colocation mail1-ip with mail-services-clone
>  colocation mail2-ip with mail-services-clone
>
> as well as a location constraint to prefer running the first ip on the
> first node and the second on the second
>
>  location mail1-ip prefers ha1=2000
>  location mail2-ip prefers ha2=2000
>
> Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
> fine. mail2-ip will be moved immediately to ha3. Good.
>
> However, if pacemaker on ha2 starts up again, it will immediately remove
> mail2-ip from ha3 and keep it offline, while the services in the group are
> starting on ha2. As the services unfortunately take some time to come
> up, mail2-ip is offline for more than a minute.
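
For reference, the setup described above corresponds roughly to the following
pcs commands (a sketch only; the systemd unit names and IP addresses are
placeholders, while the resource and node names follow the thread):

   pcs resource create mail-postfix systemd:postfix --group mail-services
   pcs resource create mail-opendkim systemd:opendkim --group mail-services
   pcs resource clone mail-services

   pcs resource create mail1-ip ocf:heartbeat:IPaddr2 ip=192.0.2.11 cidr_netmask=24
   pcs resource create mail2-ip ocf:heartbeat:IPaddr2 ip=192.0.2.12 cidr_netmask=24

   pcs constraint order start mail-services-clone then start mail1-ip
   pcs constraint order start mail-services-clone then start mail2-ip

   pcs constraint colocation add mail2-ip with mail1-ip -1000
   pcs constraint colocation add mail1-ip with mail-services-clone
   pcs constraint colocation add mail2-ip with mail-services-clone

   pcs constraint location mail1-ip prefers ha1=2000
   pcs constraint location mail2-ip prefers ha2=2000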

 That is because you wanted "mail2-ip prefers ha2=2000", so if the cluster
>>> _can_ run it there, then it will, even if it's running elsewhere.

 Maybe explain what you really want.
>>>
>>> As I wrote before: (and I have "fixed" my copy error above to use
>>> consistent resource names now)
>>>
>>> 1. I want to run all required services on all running nodes at all times.
>>>
>>> 2. I want two service IPs mail1-ip (ip1) and mail2-ip (ip2) running on
>>> the cluster but only on nodes where all required services are already
>>> running (and not just starting)
>>>
>>> 3. Both IPs should be running on two different nodes if possible.
>>>
>>> 4. Preferably mail1-ip should be on node ha1 if ha1 is running with all
>>> required services.
>>>
>>> 5. Preferably mail2-ip should be on node ha2 if ha2 is running with all
>>> required services.
>>>
>>> So most importantly: I want the ip resources mail1-ip and mail2-ip to be
>>> active only on nodes which are already running all services. They should only
>>> be moved to nodes on which all services are already running.
>> 
>> Hi!
>> 
>> Usually I prefer simple solutions over highly complex ones.
>> Would it work to use a negative colocation for both IPs, as well as a
>> stickiness of maybe 500, then reducing the "prefer" value to something as
>> small as 5 or 10?
>> Then the IP will stay elsewhere as long as the "basement services" run
>> there.
>> 
>> This approach does not change the order of resource operations; instead it 
>> kind of minimizes them.
>> In my experience most people overspecify what the cluster should do.
> 
> Well, I guess it's not possible using a group. A group resource seems to 
> satisfy a colocation the moment the first resource in the group has 
> started (or even: is starting?). For a group which takes a long 
> time to completely start, that just doesn't work.
> 
> So I suppose the only two options would be to ungroup everything and 
> create colocation constraints between each individual service and the ip 
> address. Although I'm not sure if that would just have the same issue, 
> just on a smaller scale.
> 
> The other alternative would be to start the services through systemd and 
> make pacemaker depend on them so it starts only after all services are 
> running. Pacemaker then only handles the ip addresses...

Well,

actually I was wondering why you wouldn't run a load balancer on top.
The load balancer will find out which nodes run the software stack (that could 
be controlled by systemd, but controlling it via the cluster is probably easier 
to monitor).

Regards,
Ulrich

> 
> Thanks,
> 
> Gerald



[ClusterLabs] Release crmsh 4.5.0-rc1

2023-03-02 Thread Xin Liang via Users
Hello everyone!

I'm happy to announce the release crmsh 4.5.0-rc1 now is available!

Changes since tag 4.4.1

Features:

  *   Enable setting up and managing the cluster as a non-root user (PR#1009, PR#1123, PR#1135)
  *   Gradually start up a large cluster (PR#985)
  *   Populate advised monitor/start/stop operation values (PR#1038)
  *   Adjust the cluster property priority-fencing-delay automatically (PR#1017)
  *   Add option -x to skip csync2 while bootstrapping (PR#1035)

Major fixes:

  *   Fix: qdevice: Unable to setup qdevice under non-root user (bsc#1208770)
  *   Fix: upgradeutil: do upgrade silently (bsc#1208327)
  *   Fix: bootstrap: crm cluster join ssh raises TypeError (bsc#1208327)
  *   Fix: utils: Change the way to get pacemaker's version (bsc#1208216)
  *   Fix: hawk fails to parse the slash (bsc#1206217)
  *   Fix: extra logs while configuring passwordless (bsc#1207720)
  *   Fix: report: Catch read exception (bsc#1206606)
  *   Fix: bootstrap: Unset SBD_DELAY_START when running 'crm cluster start' 
(bsc#1202177)
  *   Fix: ui_context: redirect foo -h/foo --help to help foo (bsc#1205735)
  *   Fix: qdevice: Adjust SBD_WATCHDOG_TIMEOUT when configuring qdevice not 
using stage (bsc#1205727)
  *   Fix: cibconfig: Complete promotable=true and interleave=true for 
Promoted/Unpromoted resource (bsc#1205522)
  *   Fix: corosync: show corosync ring status if it has a fault (bsc#1205615)
  *   Fix: bootstrap: fix passwordless ssh authentication for hacluster 
automatically when a new node is joining the cluster (bsc#1201785)
  *   Fix: upgradeutil: automated init ssh passwordless auth for hacluster 
after upgrading (bsc#1201785)
  *   fix: log: fail to open log file even if user is in haclient group 
(bsc#1204670)
  *   Fix: sbd: Ask whether to overwrite when given an sbd device in interactive 
mode (bsc#1201428)
  *   Fix: ui_cluster: 'crm cluster stop' failed to stop services (bsc#1203601)
  *   Fix: crash_test: do not use firewalld to isolate a cluster node 
(bsc#1192467)
  *   Fix: parallax: Add LogLevel=error ssh option to filter out warnings 
(bsc#1196726)
  *   Revert "Fix: utils: Only raise exception when return code of systemctl 
command over ssh larger than 4 (bsc#1196726)" (bsc#1202655)
  *   Fix: configure: refresh cib before showing or modifying if no pending 
changes have been made (bsc#1202465)
  *   Fix: bootstrap: Use crmsh.parallax instead of parallax module directly 
(bsc#1202006)

Thanks to everyone who contributed to this release!
For more details on the changes, please see 
https://github.com/ClusterLabs/crmsh/blob/master/ChangeLog
Any feedback and suggestions are very welcome!



Regards,
xin



[ClusterLabs] Release crmsh 4.4.1

2023-03-02 Thread Xin Liang via Users
Hello everyone!

I'm happy to announce the release crmsh 4.4.1 now is available!

Changes since tag 4.4.0

Features:

  *   Enable "crm configure show related:" to show the objects of a given RA type (PR#978)
  *   Parametrize the RA trace log dir (PR#939)
  *   Enable the -N option to set up the current node and peers all together (PR#961)

Major fixes:

  *   Fix: utils: use -o and -n to compare files instead of strings for 
crm_diff (bsc#1201312)
  *   Fix: crm report: use sudo when under non root and hacluster user 
(bsc#1199634)
  *   Fix: utils: wait4dc: Make change since the output of 'crmadmin -S' 
changed (bsc#1199412)
  *   Fix: bootstrap: stop and disable csync2.socket on removed node 
(bsc#1199325)
  *   Fix: crm report: Read data in a safe way, to avoid 
UnicodeDecodeError (bsc#1198180)
  *   Fix: qdevice: Add lock to protect init_db_on_qnetd function (bsc#1197323)
  *   Fix: utils: Only raise exception when return code of systemctl command 
over ssh larger than 4 (bsc#1196726)

Thanks to everyone who contributed to this release!
For more details on the changes, please see 
https://github.com/ClusterLabs/crmsh/blob/master/ChangeLog
Any feedback and suggestions are very welcome!



Regards,
xin



Re: [ClusterLabs] Antw: Re: Antw: [EXT] resource cloned group colocations

2023-03-02 Thread Gerald Vogt

On 02.03.23 14:51, Ulrich Windl wrote:

Gerald Vogt  wrote on 02.03.2023 at 14:43 in message

<9ba5cd78-7b3d-32ef-38cf-5c5632c46...@spamcop.net>:

On 02.03.23 14:30, Ulrich Windl wrote:

Gerald Vogt  wrote on 02.03.2023 at 08:41 in message

<624d0b70-5983-4d21-6777-55be91688...@spamcop.net>:

Hi,

I am setting up a mail relay cluster which main purpose is to maintain
the service ips via IPaddr2 and move them between cluster nodes when
necessary.

The service ips should only be active on nodes which are running all
necessary mail (systemd) services.

So I have set up a resource for each of those services, put them into a
group in order they should start, cloned the group as they are normally
supposed to run on the nodes at all times.

Then I added an order constraint
 start mail-services-clone then start mail1-ip
 start mail-services-clone then start mail2-ip

and colocations to prefer running the ips on different nodes but only
with the clone running:

 colocation add mail2-ip with mail1-ip -1000
 colocation mail1-ip with mail-services-clone
 colocation mail2-ip with mail-services-clone

as well as a location constraint to prefer running the first ip on the
first node and the second on the second

 location mail1-ip prefers ha1=2000
 location mail2-ip prefers ha2=2000

Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
fine. mail2-ip will be moved immediately to ha3. Good.

However, if pacemaker on ha2 starts up again, it will immediately remove
mail2-ip from ha3 and keep it offline, while the services in the group are
starting on ha2. As the services unfortunately take some time to come
up, mail2-ip is offline for more than a minute.


That is because you wanted "mail2-ip prefers ha2=2000", so if the cluster

_can_ run it there, then it will, even if it's running elsewhere.


Maybe explain what you really want.


As I wrote before: (and I have "fixed" my copy error above to use
consistent resource names now)

1. I want to run all required services on all running nodes at all times.

2. I want two service IPs mail1-ip (ip1) and mail2-ip (ip2) running on
the cluster but only on nodes where all required services are already
running (and not just starting)

3. Both IPs should be running on two different nodes if possible.

4. Preferably mail1-ip should be on node ha1 if ha1 is running with all
required services.

5. Preferably mail2-ip should be on node ha2 if ha2 is running with all 
required services.

So most importantly: I want the ip resources mail1-ip and mail2-ip to be 
active only on nodes which are already running all services. They should only 
be moved to nodes on which all services are already running.


Hi!

Usually I prefer simple solutions over highly complex ones.
Would it work to use a negative colocation for both IPs, as well as a stickiness of maybe 
500, then reducing the "prefer" value to something as small as 5 or 10?
Then the IP will stay elsewhere as long as the "basement services" run there.

This approach does not change the order of resource operations; instead it kind 
of minimizes them.
In my experience most people overspecify what the cluster should do.


Well, I guess it's not possible using a group. A group resource seems to 
satisfy a colocation the moment the first resource in the group has 
started (or even: is starting?). For a group which takes a long 
time to completely start, that just doesn't work.


So I suppose the only two options would be to ungroup everything and 
create colocation constraints between each individual service and the ip 
address. Although I'm not sure if that would just have the same issue, 
just on a smaller scale.
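
With the group dissolved into individually cloned services, that would mean
something along these lines (a sketch only; postfix-clone and opendkim-clone
are hypothetical names for the per-service clones):

   for svc in postfix opendkim; do
     pcs constraint colocation add mail1-ip with ${svc}-clone
     pcs constraint colocation add mail2-ip with ${svc}-clone
     pcs constraint order start ${svc}-clone then start mail1-ip
     pcs constraint order start ${svc}-clone then start mail2-ip
   done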


The other alternative would be to start the services through systemd and 
make pacemaker depend on them so it starts only after all services are 
running. Pacemaker then only handles the ip addresses...
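
A rough sketch of that systemd approach (the drop-in directory is standard;
postfix.service and opendkim.service are placeholders for the real mail units):

   mkdir -p /etc/systemd/system/pacemaker.service.d
   printf '[Unit]\nWants=postfix.service opendkim.service\nAfter=postfix.service opendkim.service\n' \
     > /etc/systemd/system/pacemaker.service.d/wait-for-mail.conf
   systemctl daemon-reload

Note that After= only delays pacemaker until the listed units have finished
starting, which reflects actual readiness only for unit types that signal it
(e.g. Type=notify or Type=forking).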


Thanks,

Gerald


[ClusterLabs] Antw: Re: Antw: [EXT] resource cloned group colocations

2023-03-02 Thread Ulrich Windl
>>> Gerald Vogt  wrote on 02.03.2023 at 14:43 in message
<9ba5cd78-7b3d-32ef-38cf-5c5632c46...@spamcop.net>:
> On 02.03.23 14:30, Ulrich Windl wrote:
> Gerald Vogt  wrote on 02.03.2023 at 08:41 in message
>> <624d0b70-5983-4d21-6777-55be91688...@spamcop.net>:
>>> Hi,
>>>
>>> I am setting up a mail relay cluster which main purpose is to maintain
>>> the service ips via IPaddr2 and move them between cluster nodes when
>>> necessary.
>>>
>>> The service ips should only be active on nodes which are running all
>>> necessary mail (systemd) services.
>>>
>>> So I have set up a resource for each of those services, put them into a
>>> group in order they should start, cloned the group as they are normally
>>> supposed to run on the nodes at all times.
>>>
>>> Then I added an order constraint
>>> start mail-services-clone then start mail1-ip
>>> start mail-services-clone then start mail2-ip
>>>
>>> and colocations to prefer running the ips on different nodes but only
>>> with the clone running:
>>>
>>> colocation add mail2-ip with mail1-ip -1000
>>> colocation mail1-ip with mail-services-clone
>>> colocation mail2-ip with mail-services-clone
>>>
>>> as well as a location constraint to prefer running the first ip on the
>>> first node and the second on the second
>>>
>>> location mail1-ip prefers ha1=2000
>>> location mail2-ip prefers ha2=2000
>>>
>>> Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
>>> fine. mail2-ip will be moved immediately to ha3. Good.
>>>
>>> However, if pacemaker on ha2 starts up again, it will immediately remove
>>> mail2-ip from ha3 and keep it offline, while the services in the group are
>>> starting on ha2. As the services unfortunately take some time to come
>>> up, mail2-ip is offline for more than a minute.
>> 
>> That is because you wanted "mail2-ip prefers ha2=2000", so if the cluster 
> _can_ run it there, then it will, even if it's running elsewhere.
>> 
>> Maybe explain what you really want.
> 
> As I wrote before: (and I have "fixed" my copy error above to use 
> consistent resource names now)
> 
> 1. I want to run all required services on all running nodes at all times.
> 
> 2. I want two service IPs mail1-ip (ip1) and mail2-ip (ip2) running on 
> the cluster but only on nodes where all required services are already 
> running (and not just starting)
> 
> 3. Both IPs should be running on two different nodes if possible.
> 
> 4. Preferably mail1-ip should be on node ha1 if ha1 is running with all 
> required services.
> 
> 5. Preferably mail2-ip should be on node ha2 if ha2 is running with all 
> required services.
> 
> So most importantly: I want the ip resources mail1-ip and mail2-ip to be 
> active only on nodes which are already running all services. They should only 
> be moved to nodes on which all services are already running.

Hi!

Usually I prefer simple solutions over highly complex ones.
Would it work to use a negative colocation for both IPs, as well as a 
stickiness of maybe 500, then reducing the "prefer" value to something as small as 
5 or 10?
Then the IP will stay elsewhere as long as the "basement services" run there.

This approach does not change the order of resource operations; instead it kind 
of minimizes them.
In my experience most people overspecify what the cluster should do.
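
A rough sketch of that in pcs syntax (the scores are purely illustrative; the
existing "prefers ha1=2000 / ha2=2000" constraints would be removed or replaced
by the weaker ones, while the -1000 colocation between the IPs stays as it is):

   pcs resource meta mail1-ip resource-stickiness=500
   pcs resource meta mail2-ip resource-stickiness=500
   pcs constraint location mail1-ip prefers ha1=10
   pcs constraint location mail2-ip prefers ha2=10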

Kind regards,
Ulrich Windl

> 
> Thanks,
> 
> Gerald






Re: [ClusterLabs] Antw: [EXT] resource cloned group colocations

2023-03-02 Thread Gerald Vogt

On 02.03.23 14:30, Ulrich Windl wrote:

Gerald Vogt  wrote on 02.03.2023 at 08:41 in message

<624d0b70-5983-4d21-6777-55be91688...@spamcop.net>:

Hi,

I am setting up a mail relay cluster which main purpose is to maintain
the service ips via IPaddr2 and move them between cluster nodes when
necessary.

The service ips should only be active on nodes which are running all
necessary mail (systemd) services.

So I have set up a resource for each of those services, put them into a
group in order they should start, cloned the group as they are normally
supposed to run on the nodes at all times.

Then I added an order constraint
start mail-services-clone then start mail1-ip
start mail-services-clone then start mail2-ip

and colocations to prefer running the ips on different nodes but only
with the clone running:

colocation add mail2-ip with mail1-ip -1000
colocation mail1-ip with mail-services-clone
colocation mail2-ip with mail-services-clone

as well as a location constraint to prefer running the first ip on the
first node and the second on the second

location mail1-ip prefers ha1=2000
location mail2-ip prefers ha2=2000

Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
fine. mail2-ip will be moved immediately to ha3. Good.

However, if pacemaker on ha2 starts up again, it will immediately remove
mail2-ip from ha3 and keep it offline, while the services in the group are
starting on ha2. As the services unfortunately take some time to come
up, mail2-ip is offline for more than a minute.


That is because you wanted "mail2-ip prefers ha2=2000", so if the cluster _can_ 
run it there, then it will, even if it's running elsewhere.

Maybe explain what you really want.


As I wrote before: (and I have "fixed" my copy error above to use 
consistent resource names now)


1. I want to run all required services on all running nodes at all times.

2. I want two service IPs mail1-ip (ip1) and mail2-ip (ip2) running on 
the cluster but only on nodes where all required services are already 
running (and not just starting)


3. Both IPs should be running on two different nodes if possible.

4. Preferably mail1-ip should be on node ha1 if ha1 is running with all 
required services.


5. Preferably mail2-ip should be on node ha2 if ha2 is running with all 
required services.


So most importantly: I want the ip resources mail1-ip and mail2-ip to be 
active only on nodes which are already running all services. They should only 
be moved to nodes on which all services are already running.


Thanks,

Gerald


Re: [ClusterLabs] Antw: [EXT] resource cloned group colocations

2023-03-02 Thread Vladislav Bogdanov
On Thu, 2023-03-02 at 14:30 +0100, Ulrich Windl wrote:
> > > > Gerald Vogt  wrote on 02.03.2023 at 08:41 in message
> <624d0b70-5983-4d21-6777-55be91688...@spamcop.net>:
> > Hi,
> > 
> > I am setting up a mail relay cluster which main purpose is to
> > maintain 
> > the service ips via IPaddr2 and move them between cluster nodes
> > when 
> > necessary.
> > 
> > The service ips should only be active on nodes which are running
> > all 
> > necessary mail (systemd) services.
> > 
> > So I have set up a resource for each of those services, put them
> > into a 
> > group in order they should start, cloned the group as they are
> > normally 
> > supposed to run on the nodes at all times.
> > 
> > Then I added an order constraint
> >    start mail-services-clone then start mail1-ip
> >    start mail-services-clone then start mail2-ip
> > 
> > and colocations to prefer running the ips on different nodes but
> > only 
> > with the clone running:
> > 
> >    colocation add mail2-ip with mail1-ip -1000
> >    colocation ip1 with mail-services-clone
> >    colocation ip2 with mail-services-clone
> > 
> > as well as a location constraint to prefer running the first ip on
> > the 
> > first node and the second on the second
> > 
> >    location ip1 prefers ha1=2000
> >    location ip2 prefers ha2=2000
> > 
> > Now if I stop pacemaker on one of those nodes, e.g. on node ha2,
> > it's 
> > fine. ip2 will be moved immediately to ha3. Good.
> > 
> > However, if pacemaker on ha2 starts up again, it will immediately
> > remove 
> > ip2 from ha3 and keep it offline, while the services in the group
> > are 
> > starting on ha2. As the services unfortunately take some time to
> > come 
> > up, ip2 is offline for more than a minute.
> 
> That is because you wanted "ip2 prefers ha2=2000", so if the cluster
> _can_ run it there, then it will, even if it's running elsewhere.
> 

Pacemaker sometimes places actions in the transition in a suboptimal
order (from the human's point of view).
So instead of

start group on nodeB
stop vip on nodeA
start vip on nodeB

it runs

stop vip on nodeA
start group on nodeB
start vip on nodeB

So, if start of group takes a lot of time, then vip is not available on
any node during that start.

One more technique to minimize the time during which the vip is stopped
would be to add resource migration support to IPaddr2.
That could help, but I'm not sure.
At least I know for sure pacemaker behaves differently with migratable
resources and MAY decide to use the first order I provided.

> Maybe explain what you really want.
> 
> > 
> > It seems the colocations with the clone are already good once the
> > clone 
> > group begins to start services and thus allows the ip to be removed
> > from 
> > the current node.
> > 
> > I was wondering how can I define the colocation to be accepted only
> > if 
> > all services in the clone have been started? And not once the first
> > service in the clone is starting?
> > 
> > Thanks,
> > 
> > Gerald
> > 
> > 



Re: [ClusterLabs] resource cloned group colocations

2023-03-02 Thread Andrei Borzenkov
On Thu, Mar 2, 2023 at 4:16 PM Gerald Vogt  wrote:
>
> On 02.03.23 13:51, Klaus Wenninger wrote:
> > Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
> > fine. ip2 will be moved immediately to ha3. Good.
> >
> > However, if pacemaker on ha2 starts up again, it will immediately
> > remove
> > ip2 from ha3 and keep it offline, while the services in the group are
> > starting on ha2. As the services unfortunately take some time to come
> > up, ip2 is offline for more than a minute.
> >
> > It seems the colocations with the clone are already good once the clone
> > group begins to start services and thus allows the ip to be removed
> > from
> > the current node.
> >
> >
> > To achieve this you have to add orders on top of colocations.
>
> I don't understand that.
>
> "order" and "colocation" are constraints. They work on resources.
>
> I don't see how I could add an order on top of a colocation constraint...
>

You cannot, but an asymmetrical serializing constraint may do it:

first start clone then stop ip

When a new node comes up, pacemaker builds a transition which starts the
clone on the new node and moves the ip (stops it on the old node and starts
it on the new node). These actions are (should be) part of the same
transition, so serializing constraints should apply.
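
In pcs syntax such a one-way ordering might look roughly like this (a sketch of
one possible reading; symmetrical=false keeps it from also imposing the reverse
order on shutdown):

   pcs constraint order start mail-services-clone then stop mail1-ip symmetrical=false
   pcs constraint order start mail-services-clone then stop mail2-ip symmetrical=false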


[ClusterLabs] Antw: [EXT] resource cloned group colocations

2023-03-02 Thread Ulrich Windl
>>> Gerald Vogt  wrote on 02.03.2023 at 08:41 in message
<624d0b70-5983-4d21-6777-55be91688...@spamcop.net>:
> Hi,
> 
> I am setting up a mail relay cluster which main purpose is to maintain 
> the service ips via IPaddr2 and move them between cluster nodes when 
> necessary.
> 
> The service ips should only be active on nodes which are running all 
> necessary mail (systemd) services.
> 
> So I have set up a resource for each of those services, put them into a 
> group in order they should start, cloned the group as they are normally 
> supposed to run on the nodes at all times.
> 
> Then I added an order constraint
>start mail-services-clone then start mail1-ip
>start mail-services-clone then start mail2-ip
> 
> and colocations to prefer running the ips on different nodes but only 
> with the clone running:
> 
>colocation add mail2-ip with mail1-ip -1000
>colocation ip1 with mail-services-clone
>colocation ip2 with mail-services-clone
> 
> as well as a location constraint to prefer running the first ip on the 
> first node and the second on the second
> 
>location ip1 prefers ha1=2000
>location ip2 prefers ha2=2000
> 
> Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's 
> fine. ip2 will be moved immediately to ha3. Good.
> 
> However, if pacemaker on ha2 starts up again, it will immediately remove 
> ip2 from ha3 and keep it offline, while the services in the group are 
> starting on ha2. As the services unfortunately take some time to come 
> up, ip2 is offline for more than a minute.

That is because you wanted "ip2 prefers ha2=2000", so if the cluster _can_ run 
it there, then it will, even if it's running elsewhere.

Maybe explain what you really want.

> 
> It seems the colocations with the clone are already good once the clone 
> group begins to start services and thus allows the ip to be removed from 
> the current node.
> 
> I was wondering how can I define the colocation to be accepted only if 
> all services in the clone have been started? And not once the first 
> service in the clone is starting?
> 
> Thanks,
> 
> Gerald
> 
> 






Re: [ClusterLabs] resource cloned group colocations

2023-03-02 Thread Vladislav Bogdanov
On Thu, 2023-03-02 at 08:41 +0100, Gerald Vogt wrote:
> Hi,
> 
> I am setting up a mail relay cluster which main purpose is to
> maintain 
> the service ips via IPaddr2 and move them between cluster nodes when 
> necessary.
> 
> The service ips should only be active on nodes which are running all 
> necessary mail (systemd) services.
> 
> So I have set up a resource for each of those services, put them into
> a 
> group in order they should start, cloned the group as they are
> normally 
> supposed to run on the nodes at all times.
> 
> Then I added an order constraint
>    start mail-services-clone then start mail1-ip
>    start mail-services-clone then start mail2-ip
> 
> and colocations to prefer running the ips on different nodes but only
> with the clone running:
> 
>    colocation add mail2-ip with mail1-ip -1000
>    colocation ip1 with mail-services-clone
>    colocation ip2 with mail-services-clone
> 
> as well as a location constraint to prefer running the first ip on
> the 
> first node and the second on the second
> 
>    location ip1 prefers ha1=2000
>    location ip2 prefers ha2=2000
> 
> Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
> fine. ip2 will be moved immediately to ha3. Good.
> 
> However, if pacemaker on ha2 starts up again, it will immediately
> remove 
> ip2 from ha3 and keep it offline, while the services in the group are
> starting on ha2. As the services unfortunately take some time to come
> up, ip2 is offline for more than a minute.
> 
> It seems the colocations with the clone are already good once the
> clone 
> group begins to start services and thus allows the ip to be removed
> from 
> the current node.
> 
> I was wondering how can I define the colocation to be accepted only
> if 
> all services in the clone have been started? And not once the first 
> service in the clone is starting?
> 
> Thanks,
> 
> Gerald
> 

I noticed such behavior many years ago - it is especially visible with
long-starting resources, and one of the techniques
to deal with that is to use transient node attributes instead of
colocation/order between the group and the vip.
I'm not sure there is a suitable open-source resource agent which just
manages a specified node attribute, but it should not be
hard to compose one which implements a pseudo-resource handler
together with attrd_updater calls.
Probably you can trim everything ethernet-related from ethmonitor to make
such an almost-dummy resource agent.

Once the RA is there, you can add it as the last resource in the group, and
then rely on the attribute it manages to start your VIP.
That is done with location constraints; just use score-attribute in
their rules -
https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#rule-properties

So, the idea is: your custom RA sets the attribute 'mail-clone-started' to
something like 1,
and you have a location constraint which prevents the cluster from starting
your VIP resource on a node if the value of the 'mail-clone-started' attribute
on that node is less than 1 or not defined.
Once a node has that attribute set (which happens at the very end of the
group's start sequence), then (and only then) the cluster decides to move your
VIP to that node (because of the other location constraints with preferences
you already have).

Just make sure attributes are transient (not stored into CIB).
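
A minimal sketch of such an attribute-only agent, assuming the attribute name
mail-clone-started and leaving out metadata, validation and error handling
(a real agent needs all of those):

   #!/bin/sh
   # Pseudo-resource agent: only maintains a transient node attribute.
   : "${OCF_RESKEY_name:=mail-clone-started}"
   case "$1" in
     start)
       attrd_updater -n "$OCF_RESKEY_name" -U 1    # set the transient attribute
       exit 0 ;;
     stop)
       attrd_updater -n "$OCF_RESKEY_name" -D      # remove it again
       exit 0 ;;
     monitor)
       # report "running" only while the attribute is set to 1 on this node
       attrd_updater -n "$OCF_RESKEY_name" -Q 2>/dev/null | grep -q 'value="1"' \
         && exit 0 || exit 7                       # OCF_SUCCESS / OCF_NOT_RUNNING
       ;;
     *)
       exit 3 ;;                                   # OCF_ERR_UNIMPLEMENTED
   esac

and the matching location rule might then look like this in pcs syntax
(again just a sketch):

   pcs constraint location mail2-ip rule score=-INFINITY \
     not_defined mail-clone-started or mail-clone-started lt integer 1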




Re: [ClusterLabs] resource cloned group colocations

2023-03-02 Thread Gerald Vogt

On 02.03.23 13:51, Klaus Wenninger wrote:

Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
fine. ip2 will be moved immediately to ha3. Good.

However, if pacemaker on ha2 starts up again, it will immediately
remove
ip2 from ha3 and keep it offline, while the services in the group are
starting on ha2. As the services unfortunately take some time to come
up, ip2 is offline for more than a minute.

It seems the colocations with the clone are already good once the clone
group begins to start services and thus allows the ip to be removed
from
the current node.


To achieve this you have to add orders on top of colocations.


I don't understand that.

"order" and "colocation" are constraints. They work on resources.

I don't see how I could add an order on top of a colocation constraint...

Thanks,

Gerald



Re: [ClusterLabs] resource cloned group colocations

2023-03-02 Thread Klaus Wenninger
On Thu, Mar 2, 2023 at 8:41 AM Gerald Vogt  wrote:

> Hi,
>
> I am setting up a mail relay cluster which main purpose is to maintain
> the service ips via IPaddr2 and move them between cluster nodes when
> necessary.
>
> The service ips should only be active on nodes which are running all
> necessary mail (systemd) services.
>
> So I have set up a resource for each of those services, put them into a
> group in order they should start, cloned the group as they are normally
> supposed to run on the nodes at all times.
>
> Then I added an order constraint
>start mail-services-clone then start mail1-ip
>start mail-services-clone then start mail2-ip
>
> and colocations to prefer running the ips on different nodes but only
> with the clone running:
>
>colocation add mail2-ip with mail1-ip -1000
>colocation ip1 with mail-services-clone
>colocation ip2 with mail-services-clone
>
> as well as a location constraint to prefer running the first ip on the
> first node and the second on the second
>
>location ip1 prefers ha1=2000
>location ip2 prefers ha2=2000
>
> Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
> fine. ip2 will be moved immediately to ha3. Good.
>
> However, if pacemaker on ha2 starts up again, it will immediately remove
> ip2 from ha3 and keep it offline, while the services in the group are
> starting on ha2. As the services unfortunately take some time to come
> up, ip2 is offline for more than a minute.
>
> It seems the colocations with the clone are already good once the clone
> group begins to start services and thus allows the ip to be removed from
> the current node.
>

To achieve this you have to add orders on top of colocations.

Klaus


>
> I was wondering how can I define the colocation to be accepted only if
> all services in the clone have been started? And not once the first
> service in the clone is starting?
>
> Thanks,
>
> Gerald
>
>


[ClusterLabs] pcs 0.11.5 released

2023-03-02 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.11.5.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.11.5.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.11.5.zip

Extended validation of resource and stonith attributes, added in the
previous release, is now disabled by default. We plan to enable it in
the future, once related issues in agents are resolved.
Other than that, this release brings a bunch of bug fixes.

Complete change log for this release:
## [0.11.5] - 2023-03-01

### Added
- Warning to `pcs resource|stonith update` commands about not using
  agent self-validation feature when the resource is already
  misconfigured ([rhbz#2151524])
- Add lib command `cluster_property.set_properties` to API v2
- Commands for checking and creating qdevice certificates on the local
  node only

### Fixed
- Graceful stopping pcsd service using `systemctl stop pcsd` command
- Displaying bool and integer values in `pcs resource config` command
  ([rhbz#2151164], [ghissue#604])
- Allow time values in stonith-watchdog-timeout property
  ([rhbz#2158790])
- Enable/Disable sbd when cluster is not running ([rhbz#2166249])
- Confusing error message in `pcs constraint ticket add` command
  ([rhbz#2168617], [ghpull#559])
- Internal server error during cluster setup with Ruby 3.2
- Set `Content-Security-Policy: frame-ancestors 'self'; default-src
  'self'` HTTP header for HTTP 404 responses as well ([rhbz#2160664])
- Validate dates in location constraint rules ([ghpull#644])

### Changed
- Resource/stonith agent self-validation of instance attributes is now
  disabled by default, as many agents do not work with it properly. Use
  flag '--agent-validation' to enable it in supported commands.
  ([rhbz#2159454])


Thanks / congratulations to everyone who contributed to this release,
including lixin, Lucas Kanashiro, Mamoru TASAKA, Michal Pospisil,
Miroslav Lisik, Ondrej Mular, Tomas Jelinek and wangluwei.

Cheers,
Tomas


[ghissue#604]: https://github.com/ClusterLabs/pcs/issues/604
[ghpull#559]: https://github.com/ClusterLabs/pcs/pull/559
[ghpull#644]: https://github.com/ClusterLabs/pcs/pull/644
[rhbz#2151164]: https://bugzilla.redhat.com/show_bug.cgi?id=2151164
[rhbz#2151524]: https://bugzilla.redhat.com/show_bug.cgi?id=2151524
[rhbz#2158790]: https://bugzilla.redhat.com/show_bug.cgi?id=2158790
[rhbz#2159454]: https://bugzilla.redhat.com/show_bug.cgi?id=2159454
[rhbz#2160664]: https://bugzilla.redhat.com/show_bug.cgi?id=2160664
[rhbz#2166249]: https://bugzilla.redhat.com/show_bug.cgi?id=2166249
[rhbz#2168617]: https://bugzilla.redhat.com/show_bug.cgi?id=2168617
