On Friday 06 August 2021 at 14:14:09, Andrei Borzenkov wrote:

> On Thu, Aug 5, 2021 at 3:44 PM Antony Stone wrote:
> >
> > For anyone interested in the detail of how to do this (without needing
> > booth), here is my cluster.conf file, as in "crm configure load replace
> > cluster.conf":
> >
> > --------
> > node tom attribute site=cityA
> > node dick attribute site=cityA
> > node harry attribute site=cityA
> >
> > node fred attribute site=cityB
> > node george attribute site=cityB
> > node ron attribute site=cityB
> >
> > primitive A-float IPaddr2 params ip=192.168.32.250 cidr_netmask=24 meta
> > migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> > on-fail=restart
> > primitive B-float IPaddr2 params ip=192.168.42.250 cidr_netmask=24 meta
> > migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20
> > on-fail=restart
> > primitive Asterisk asterisk meta migration-threshold=3 failure-timeout=60
> > op monitor interval=5 timeout=20 on-fail=restart
> >
> > group GroupA A-float4 resource-stickiness=100
> > group GroupB B-float4 resource-stickiness=100
> > group Anywhere Asterisk resource-stickiness=100
> >
> > location pref_A GroupA rule -inf: site ne cityA
> > location pref_B GroupB rule -inf: site ne cityB
> > location no_pref Anywhere rule -inf: site ne cityA and site ne cityB
> >
> > colocation Ast 100: Anywhere [ cityA cityB ]
>
> You define a resource set, but there are no resources cityA or cityB,
> at least you do not show them. So it is not quite clear what this
> colocation does.

Apologies - I had used different names in my test setup, and converted them to
cityA etc for the sake of continuity in this discussion.

That should be:

colocation Ast 100: Anywhere [ GroupA GroupB ]

> > property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop
>
> If connectivity between (any two) sites is lost you may end up with
> one of A or B going out of quorum.

Agreed.

> While this will stop active resources and restart them on another site,

No. Resources do not start on the "wrong" site because of:

location pref_A GroupA rule -inf: site ne cityA
location pref_B GroupB rule -inf: site ne cityB

The resources in GroupA either run in cityA or they do not run at all.

> there is no coordination between stopping and starting so for some time
> resources will be active on both sites. It is up to you to evaluate whether
> this matters.

Any resource which tried to start at the wrong site would simply fail, because
the IP addresses involved do not work at the "other" site.

> If this matters your solution does not protect against it.
>
> If this does not matter, the usual response is - why do you need a
> cluster in the first place? Why not simply always run asterisk on both
> sites all the time?

Because Asterisk at cityA is bound to a floating IP address, which is held on
one of the three machines in cityA. I can't run Asterisk on all three machines
there because only one of them has the IP address.

Asterisk _does_ normally run on both sites all the time, but only on one
machine at each site.

> > start-failure-is-fatal=false cluster-recheck-interval=60s
> > --------
> >
> > Of course, the group definitions are not needed for single resources, but
> > I shall in practice be using multiple resources which do need groups, so
> > I wanted to ensure I was creating something which would work with that.
>
> > I have tested it by:
> ...
> > - causing a network failure at one city (so it simply disappears without
> > stopping corosync neatly): the other city continues its resources (plus
> > the "anywhere" resource), the isolated city stops
>
> If the site is completely isolated it probably does not matter whether
> anything is active there. It is partial connectivity loss where it
> becomes interesting.

Agreed, however my testing shows that resources which I want running in cityA
are either running there or they're not (they never move to cityB or cityC),
similarly for cityB, and the resources I want just a single instance of are
doing just that, and on the same machine at cityA or cityB as the local
resources are running on.

Thanks for the feedback,

Antony.

-- 
"Measuring average network latency is about as useful as measuring the mean
temperature of patients in a hospital."

 - Stéphane Bortzmeyer

                                                   Please reply to the list;
                                                         please *don't* CC me.
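
For anyone who wants to reason about the three "-inf" location rules without a
test cluster, here is a rough sketch in Python. It is only a toy model of the
rules quoted in this thread, not Pacemaker's scheduler: it ignores quorum,
stickiness, the colocation constraint and failure counts. The names NODE_SITE,
BANNED and eligible_nodes are made up for this illustration.

```python
# Node -> site attribute, copied from the "node ... site=..." lines above.
NODE_SITE = {
    "tom": "cityA", "dick": "cityA", "harry": "cityA",
    "fred": "cityB", "george": "cityB", "ron": "cityB",
}

# One predicate per location rule; True means the -inf score applies,
# i.e. the resource may never run on a node with that site attribute.
BANNED = {
    "GroupA": lambda site: site != "cityA",                        # pref_A
    "GroupB": lambda site: site != "cityB",                        # pref_B
    "Anywhere": lambda site: site != "cityA" and site != "cityB",  # no_pref
}


def eligible_nodes(resource, online):
    """Return the online nodes on which `resource` is not banned (sorted)."""
    banned = BANNED[resource]
    return sorted(node for node in online if not banned(NODE_SITE[node]))


if __name__ == "__main__":
    # Hypothetical scenario: every cityA node has dropped out of the cluster.
    survivors = ["fred", "george", "ron"]
    for resource in BANNED:
        print(resource, eligible_nodes(resource, survivors))
```

With only the cityB nodes online, the model leaves GroupA with no eligible
node at all, which is the "either run in cityA or do not run at all" behaviour
described above, while Anywhere can still be placed on any cityB node.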