Hi All,
We use Pacemaker in a specific scenario: a complex network environment where VLANs, IPs and routes are managed by an external system and integrated through glue code that does the following:

1. Load the CIB configuration section with cibadmin --query --scope=configuration
2. Add/delete primitives and constraints
3. Apply the new config with cibadmin --replace --scope=configuration --xml-pipe --sync-call

The CIB is taken from stdout, the new CIB is fed in via stdin, and all of this is done from Python code. All resource types are handled by the standard ocf:heartbeat resource agents. VLANs are defined as clones so that they are up on all nodes. Order constraints then make each IP start after the vlan-clone (to ensure the VLAN exists), and each route start after its IP.

This works very well on mass-create, but we have problems on mass-delete.

My understanding of Pacemaker's architecture and behaviour is: when it receives a new config, it recalculates resource allocation, builds the target placement with respect to (co)location constraints, and then schedules the changes with respect to order constraints. So if we delete the VLANs, IPs and routes all at once, we also have to delete their constraints, and the scheduling of the resource stops will then NOT take the order constraints from the OLD config into consideration. As a result, the stops for VLANs, IPs and routes start in random order. However:

1. When a VLAN is deleted (stopped), all IPs bound to that interface are removed with it, and so are all the routes.
2. The IP resource then tries to remove an IP address that is already gone, and fails. Moreover, since the agents run in parallel, an agent may see the IP as active when it checks, but the IP is already gone when it tries to delete it. Because the stop failed, the resource is left as an orphan (stopped, blocked) and can only be cleared with a cleanup command. This also ruins future CIB updates.
3. The same logic applies between IPs and routes.

After realizing this, I changed the logic to use two stages:

1. Read the CIB
2. Disable all resources to be deleted (setting target-role=Stopped) and push the config with cibadmin
3. Delete those resources from the CIB and push the config with cibadmin again

My idea was that Pacemaker would plan and perform the resource shutdown at step 2 with respect to the order constraints still present in the config, and that it would then be safe to delete the already-stopped resources. But I see the same trouble with this two-stage approach: sometimes resources fail to stop because the entity they reference has already been deleted. It looks like one of two things:

1. Pacemaker does not respect order constraints when we replace the config section directly, or
2. I misunderstand the --sync-call cibadmin option: it does not actually wait until the new config is applied but returns immediately, so the deletes start before all the stops have completed.

I did not find any explanation; my guess was that it should wait for the changes to be applied by Pacemaker, but I am not sure. I would appreciate advice about this situation and more information about the --sync-call option. Is this the right approach, or do I need an extra delay? Or should I wait for everything to stop by polling the cluster state?

I will be very grateful for any ideas or information!

Sincerely,
Alex
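P.S. For clarity, here is a rough Python sketch of the two-stage flow I am describing. The helper names and resource ids are made up for illustration, and I have put crm_resource --wait between the stages as a placeholder for "wait until all stops are done", which is exactly the part I am unsure about:

import subprocess
import xml.etree.ElementTree as ET

def read_config():
    # Read the CIB configuration section as an XML element
    out = subprocess.run(
        ["cibadmin", "--query", "--scope=configuration"],
        check=True, capture_output=True, text=True).stdout
    return ET.fromstring(out)

def push_config(config):
    # Replace the configuration section, feeding the new XML on stdin
    subprocess.run(
        ["cibadmin", "--replace", "--scope=configuration",
         "--xml-pipe", "--sync-call"],
        input=ET.tostring(config, encoding="unicode"),
        check=True, text=True)

def wait_for_idle():
    # Block until the cluster has finished all pending actions
    subprocess.run(["crm_resource", "--wait"], check=True)

def set_stopped(config, ids):
    # Stage 1: set target-role=Stopped on every resource we plan to delete
    # (assumes no target-role nvpair exists yet; a real version would update it)
    for res in config.iter():
        if res.tag in ("primitive", "clone", "group") and res.get("id") in ids:
            meta = res.find("meta_attributes")
            if meta is None:
                meta = ET.SubElement(res, "meta_attributes",
                                     id=res.get("id") + "-meta")
            ET.SubElement(meta, "nvpair",
                          id=res.get("id") + "-target-role",
                          name="target-role", value="Stopped")

def delete_resources(config, ids):
    # Stage 2: drop the resources and any simple constraints referencing them
    # (constraints using resource_set children are not handled in this sketch)
    resources = config.find("resources")
    for res in list(resources):
        if res.get("id") in ids:
            resources.remove(res)
    constraints = config.find("constraints")
    for con in list(constraints):
        refs = {con.get(attr) for attr in ("rsc", "with-rsc", "first", "then")}
        if refs & ids:
            constraints.remove(con)

# to_delete = {"vlan42-clone", "ip-10-0-42-1", "route-42"}  # illustrative ids
# cfg = read_config()
# set_stopped(cfg, to_delete)
# push_config(cfg)
# wait_for_idle()        # don't start deleting until all stops have finished
# cfg = read_config()
# delete_resources(cfg, to_delete)
# push_config(cfg)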