On Mon, 2024-08-12 at 22:47 +0300, ale...@pavlyuts.ru wrote:
> Hi All,
>
> We use Pacemaker in a specific scenario where a complex network
> environment, including VLANs, IPs, and routes, is managed by an
> external system and integrated by glue code that:
> 1. Loads the CIB configuration section with cibadmin --query
>    --scope=configuration
> 2. Adds/deletes primitives and constraints
> 3. Applies the new config with cibadmin --replace
>    --scope=configuration --xml-pipe --sync-call
> The CIB is taken from stdout and the new CIB is loaded via stdin,
> all done by Python code.
>
> All resource types are handled with standard ocf:heartbeat resource
> agents.
>
> VLANs are defined as clones to ensure they are up on all nodes.
> Then, order constraints are given to start each IP after its VLAN
> clone (to ensure the VLAN exists), and each route after the proper
> IP.
>
> This works very well on mass-create, but we hit problems on
> mass-delete.
>
> My understanding of Pacemaker's architecture and behavior: when it
> gets a new config, it recalculates resource allocation, builds the
> target map with respect to [co]location constraints, and then
> schedules changes with respect to order constraints. So, if we
> delete VLANs, IPs, and routes at once, we also have to delete their
> constraints. Then the scheduling of resource stops will NOT take
> the order constraints from the OLD config into consideration, and
> all the stops for VLANs, IPs, and routes will run in arbitrary
> order. However:
Correct

> If a VLAN is deleted (stopped), that also deletes all IPs bound to
> the interface, and all routes. Then the IP resource tries to remove
> an IP address that is already gone, and fails. Moreover, as the
> agents run in parallel, one may see the IP active when it checks,
> but the IP is already gone when it tries to delete it. Having
> failed, the resource is left as an orphan (stopped, blocked) and can
> only be cleared with a cleanup command. This also ruins future CIB
> updates.
> The same logic applies between IPs and routes.
>
> After realizing this, I changed the logic to use two stages:
> 1. Read the CIB
> 2. Disable all resources to delete (setting target-role=Stopped)
>    and send it with cibadmin
> 3. Delete all the resources from the CIB and send it with cibadmin

Good plan

> My idea was that Pacemaker would plan and perform the resource
> shutdown at step 2 with respect to the order constraints that are
> still in the config. Then it would be safe to delete the already
> stopped resources.
>
> But I see the same trouble with this two-stage approach. Sometimes
> some resources fail to stop because the entity they reference is
> already deleted.
>
> It seems like one of two things:
> 1. Pacemaker does not respect order constraints when we put the new
>    config section in directly, or
> 2. I misunderstand the --sync-call cibadmin option, and it does not
>    really wait for the new config to be applied but returns
>    immediately, so the deletes start before all the stops complete.
>    I did not find any explanation; my guess was that it should wait
>    for the changes to be applied by Pacemaker, but I am not sure.

The second one: --sync-call only waits for the change to be committed
to the CIB, not for the cluster to respond to it. For that, call
crm_resource --wait afterward.

> I need advice about this situation and more information about the
> --sync-call option. Is this the right approach, or do I need an
> extra delay? Or should I poll the cluster state until everything
> has stopped?
>
> I will be very grateful for any ideas or information!
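To make the two stages concrete, here is a minimal Python sketch of
the glue-code side (the resource IDs and the simplified CIB snippet
below are made up for illustration; a real CIB configuration section
has more structure). Stage 1's output would be pushed with cibadmin
--replace --scope=configuration --xml-pipe --sync-call, then
crm_resource --wait run to let the stops finish in constraint order,
and only then stage 2's output pushed:

```python
import xml.etree.ElementTree as ET

def disable_resources(config_xml: str, rsc_ids: set) -> str:
    """Stage 1: set target-role=Stopped on each resource to delete,
    leaving the ordering constraints in place so stops are sequenced."""
    root = ET.fromstring(config_xml)
    for rsc in root.iter():
        if rsc.tag in ("primitive", "group", "clone") and rsc.get("id") in rsc_ids:
            meta = rsc.find("meta_attributes")
            if meta is None:
                meta = ET.SubElement(rsc, "meta_attributes",
                                     id=rsc.get("id") + "-meta")
            ET.SubElement(meta, "nvpair",
                          id=rsc.get("id") + "-target-role",
                          name="target-role", value="Stopped")
    return ET.tostring(root, encoding="unicode")

def delete_resources(config_xml: str, rsc_ids: set) -> str:
    """Stage 2 (run only after crm_resource --wait returns): drop the
    now-stopped resources and any constraint referencing them."""
    root = ET.fromstring(config_xml)
    resources = root.find("resources")
    for rsc in list(resources):
        if rsc.get("id") in rsc_ids:
            resources.remove(rsc)
    constraints = root.find("constraints")
    for con in list(constraints):
        # Ordering/colocation constraints name resources in these attrs.
        refs = {con.get(a) for a in ("rsc", "with-rsc", "first", "then")}
        if refs & rsc_ids:
            constraints.remove(con)
    return ET.tostring(root, encoding="unicode")
```

The key point is the crm_resource --wait between the two pushes: it
blocks until the cluster has finished reacting to the stage-1 change,
so the stage-2 deletion never races against in-flight stops.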
> Sincerely,
> Alex
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
--
Ken Gaillot <kgail...@redhat.com>