On Wed, Aug 28, 2024 at 8:04 PM Ken Gaillot <kgail...@redhat.com> wrote:
> Thanks for the report, and sorry for the slow response.
>
> There is a longstanding goal to improve systemd resource monitoring by
> using DBus signals instead of polling the status:
>
> https://projects.clusterlabs.org/T25
>
> There's a good chance that would avoid the need for the 2-second
> polling at start and stop as well, which would take care of the most
> significant problem here.
>
> I've created a task for avoiding systemd reloads when unnecessary:
>
> https://projects.clusterlabs.org/T870
>
> but that's unlikely to happen, since the executor creates the overrides
> as each start or stop happens, and the executor has no knowledge of
> what other starts or stops might be planned. It would be an intrusive
> change to get that information where it's needed.

And yes - doing a reload only on every 10th occasion is not a good
solution. We had that, and I removed it years ago ;-)

Klaus

> On Fri, 2024-07-05 at 12:27 +0200, Borja Macho wrote:
> > Hi everyone,
> >
> > I have been experiencing this issue with Pacemaker running on Debian
> > 12 (pacemaker 2.1.5-1+deb12u1). This is the information I have found
> > so far:
> >
> > The reload operation takes around 0.5s to finish:
> > ```
> > root@m1:~# time systemctl daemon-reload
> >
> > real 0m0.487s
> > user 0m0.003s
> > sys  0m0.009s
> > ```
> >
> > Before starting a systemd resource, Pacemaker performs a reload (I
> > guess due to the override it adds to the unit file). When more than
> > one systemd resource is started at the same time, these reloads
> > stack one on top of the other, delaying the actual start of the
> > resources.
> >
> > In the log you can see 9 reloads (for 9 resources, wait5_2 to
> > wait5_10, as they start concurrently after wait5_1), each spaced by
> > ~0.5s:
> > ```
> > 2024-07-05T08:00:15.422759 m1 pacemaker-controld[3088077]: notice: Result of start operation for wait5_1 on m1: ok
> > 2024-07-05T08:00:15.426589 m1 pacemaker-controld[3088077]: notice: Requesting local execution of monitor operation for wait5_1 on m1
> > 2024-07-05T08:00:15.427135 m1 pacemaker-controld[3088077]: notice: Requesting local execution of start operation for wait5_2 on m1
> > 2024-07-05T08:00:15.427581 m1 pacemaker-controld[3088077]: notice: Requesting local execution of start operation for wait5_3 on m1
> > 2024-07-05T08:00:15.427978 m1 pacemaker-controld[3088077]: notice: Requesting local execution of start operation for wait5_4 on m1
> > 2024-07-05T08:00:15.428375 m1 pacemaker-controld[3088077]: notice: Requesting local execution of start operation for wait5_5 on m1
> > 2024-07-05T08:00:15.428904 m1 pacemaker-controld[3088077]: notice: Requesting local execution of start operation for wait5_6 on m1
> > 2024-07-05T08:00:15.429259 m1 pacemaker-controld[3088077]: notice: Requesting local execution of start operation for wait5_7 on m1
> > 2024-07-05T08:00:15.429764 m1 pacemaker-controld[3088077]: notice: Result of monitor operation for wait5_1 on m1: ok
> > 2024-07-05T08:00:15.430060 m1 pacemaker-controld[3088077]: notice: Requesting local execution of start operation for wait5_8 on m1
> > 2024-07-05T08:00:15.430501 m1 pacemaker-controld[3088077]: notice: Requesting local execution of start operation for wait5_9 on m1
> > 2024-07-05T08:00:15.430999 m1 pacemaker-controld[3088077]: notice: Requesting local execution of start operation for wait5_10 on m1
> > 2024-07-05T08:00:15.431668 m1 systemd[1]: Reloading.
> > 2024-07-05T08:00:15.845709 m1 systemd[1]: Reloading.
> > 2024-07-05T08:00:16.333209 m1 systemd[1]: Reloading.
> > 2024-07-05T08:00:16.857070 m1 systemd[1]: Reloading.
> > 2024-07-05T08:00:17.330217 m1 systemd[1]: Reloading.
> > 2024-07-05T08:00:17.859113 m1 systemd[1]: Reloading.
> > 2024-07-05T08:00:18.315644 m1 systemd[1]: Reloading.
> > 2024-07-05T08:00:18.749311 m1 systemd[1]: Reloading.
> > 2024-07-05T08:00:19.243338 m1 systemd[1]: Reloading.
> > 2024-07-05T08:00:19.835406 m1 systemd[1]: Starting wait_5_to_start@10.service - Cluster Controlled wait_5_to_start@10...
> > ```
> >
> > That means a delay of around N times the reload time before the
> > actual start of the systemd unit, where N is the number of systemd
> > resources started concurrently.
> >
> > Having a look at the code, I saw a hardcoded maximum of 2 seconds
> > (execd_commands.c -> action_complete() -> delay = QB_MIN(2000,
> > delay);) between sending the start message to D-Bus and performing
> > the first check. The problem lies in the fact that when that first
> > check is performed, some of the systemd services have not yet been
> > started, so they still report the previous 'inactive' state rather
> > than 'activating'.
> >
> > This delay can be seen in the re-scheduling of the monitor in the
> > start operation:
> >
> > First resource started (failure):
> > ```
> > Jul 05 08:02:40.975 m1 pacemaker-execd [3088074] (process_unit_method_reply) debug: DBus request for start of wait5_2 using /org/freedesktop/systemd1/job/639194 succeeded
> > Jul 05 08:02:40.975 m1 pacemaker-execd [3088074] (action_complete) debug: wait5_2 start may still be in progress: re-scheduling (elapsed=487ms, remaining=99513ms, start_delay=2000ms)
> > ```
> >
> > Last resource started (success):
> > ```
> > Jul 05 08:02:44.747 m1 pacemaker-execd [3088074] (process_unit_method_reply) debug: DBus request for start of wait5_10 using /org/freedesktop/systemd1/job/640170 succeeded
> > Jul 05 08:02:44.747 m1 pacemaker-execd [3088074] (action_complete) debug: wait5_10 start may still be in progress: re-scheduling (elapsed=4259ms, remaining=95741ms, start_delay=2000ms)
> > ```
> >
> > Thus any number of concurrent systemd resource starts greater than
> > ceil(2s / reload_time) is prone to failure.
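To put rough numbers on that formula, here is a quick back-of-the-envelope
sketch plugging in the ~0.487s reload time and the 2000ms cap quoted above
(the variable names are mine, not Pacemaker's):

```python
import math

# Figures taken from the report above.
reload_time = 0.487  # seconds per `systemctl daemon-reload`
check_delay = 2.0    # hardcoded cap: delay = QB_MIN(2000, delay);

# Each concurrent start queues another daemon-reload ahead of the
# unit activations, while the first status check fires a fixed
# check_delay after the DBus start request succeeds.
threshold = math.ceil(check_delay / reload_time)
print(threshold)  # 5 -> per the formula above, more than this many
                  # concurrent starts is prone to failure
```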
> >
> > Some extra information:
> > Resources that reached the 'activating' status before those 2
> > seconds ran out started successfully, as they reported 'activating'
> > when the first monitor was performed:
> > ```
> > Jul 05 08:02:44.827 m1 pacemaker-execd [3088074] (log_execute) debug: executing - rsc:wait5_6 action:monitor call_id:249
> > Jul 05 08:02:44.827 m1 pacemaker-execd [3088074] (services__execute_systemd) debug: Performing asynchronous status op on systemd unit wait_5_to_start@6 for resource wait5_6
> > Jul 05 08:02:44.831 m1 pacemaker-execd [3088074] (action_complete) info: wait5_6 monitor is still in progress: re-scheduling (elapsed=4342ms, remaining=95658ms, start_delay=2000ms)
> > ```
> >
> > The systemd service used for those tests:
> > ```
> > root@m1:~# systemctl cat wait_5_to_start@.service
> > # /etc/systemd/system/wait_5_to_start@.service
> > [Unit]
> > Description=notify start after 5 seconds service %i
> >
> > [Service]
> > Type=notify
> > ExecStart=/usr/bin/python3 -c 'import time; import systemd.daemon; time.sleep(5); systemd.daemon.notify("READY=1"); time.sleep(86400)'
> > ```
> >
> > How the resources were created (and tested):
> > ```
> > # for I in $(seq 1 10); do pcs resource create wait5_$I systemd:wait_5_to_start@$I op monitor interval="60s" timeout="100s" op start interval="0s" timeout="100s" op stop interval="0s" timeout="100s" --disabled; done
> > # for I in $(seq 2 10); do pcs constraint colocation add wait5_$I with wait5_1 INFINITY; done
> > # for I in $(seq 2 10); do pcs constraint order start wait5_1 then start wait5_$I kind=Mandatory; done
> > # for I in $(seq 2 10); do pcs resource enable wait5_$I; done
> > # pcs resource move wait5_1
> > ```
> >
> > Best regards,
> > Borja.
>
> --
> Ken Gaillot <kgail...@redhat.com>
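P.S. For anyone curious what the signal-based monitoring Ken mentions
for T25 could look like, here is a minimal illustrative sketch. It is
not Pacemaker's actual implementation; it assumes Python with
dbus-python and PyGObject installed, and the unit name is just the test
unit from Borja's report:

```python
#!/usr/bin/env python3
import dbus
from dbus.mainloop.glib import DBusGMainLoop
from gi.repository import GLib

DBusGMainLoop(set_as_default=True)
bus = dbus.SystemBus()

systemd = bus.get_object('org.freedesktop.systemd1',
                         '/org/freedesktop/systemd1')
manager = dbus.Interface(systemd, 'org.freedesktop.systemd1.Manager')
manager.Subscribe()  # systemd only emits unit signals to subscribers

unit_path = manager.LoadUnit('wait_5_to_start@1.service')

def on_properties_changed(interface, changed, invalidated):
    # State transitions (inactive -> activating -> active) arrive as
    # they happen, so there is no fixed polling delay to race against.
    if 'ActiveState' in changed:
        print('ActiveState ->', changed['ActiveState'])

bus.add_signal_receiver(on_properties_changed,
                        dbus_interface='org.freedesktop.DBus.Properties',
                        signal_name='PropertiesChanged',
                        path=unit_path)
GLib.MainLoop().run()
```

Run it and then `systemctl start wait_5_to_start@1.service` from another
shell to watch the transitions arrive.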
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/