Re: [systemd-devel] Regarding service rate limiting (systemd 237)
On Fri, Jul 22, 2022 at 06:14:11PM +0530, Ani A wrote:
> Found the issue, posting here to close this thread (and possibly help
> someone who might land in this situation!)

Thanks for sharing.

> The daemon which had issues with rate-limit, was invoking some
> `systemctl stop/start `
> commands in its initialization! (probably this has some unwanted side
> effects?)

Timing comes to mind as something that could affect that.

> If I eliminate that, then the rate-limit on the main daemon works fine! :)

Yeah, better to use explicit dependencies (Wants=/After=) instead of such a call-back.

Michal
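For anyone landing here later, a minimal sketch of what such explicit dependencies could look like as a drop-in, instead of calling `systemctl start` from the daemon's own initialization. The name `helper.service` is hypothetical (the thread never names the units being started), and `UNIT_DIR` is a variable made up for this sketch; it defaults to a scratch directory so nothing under /etc is touched unless you point it there.

```shell
# Hypothetical: helper.service stands in for whatever the daemon used to
# start by hand. UNIT_DIR is an assumption of this sketch, not a systemd variable.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
dropin_dir="$UNIT_DIR/my-cdaemon.service.d"
mkdir -p "$dropin_dir"

cat <<'EOF' > "$dropin_dir/10-deps.conf"
[Unit]
# Pull helper.service in and order my-cdaemon.service after it.
Wants=helper.service
After=helper.service
EOF

echo "wrote $dropin_dir/10-deps.conf"
```

With `UNIT_DIR=/etc/systemd/system` the drop-in would land where systemd looks for it; a `systemctl daemon-reload` is needed afterwards.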
Re: [systemd-devel] Regarding service rate limiting (systemd 237)
Hi Michal,

Found the issue, posting here to close this thread (and possibly help someone who might land in this situation!)

The daemon which had issues with rate-limit, was invoking some `systemctl stop/start ` commands in its initialization! (probably this has some unwanted side effects?)

If I eliminate that, then the rate-limit on the main daemon works fine! :)

Thanks.
--
Ani
Re: [systemd-devel] Regarding service rate limiting (systemd 237)
Hello.

On Thu, Jul 14, 2022 at 09:29:37PM +0530, Ani A wrote:
> StartLimitIntervalUSec=5min 20s
> StartLimitBurst=5
> StartLimitAction=none
>
> The time is sufficient for 5 restarts, but still daemon keeps restarting!
>
> Scheduled restart job, restart counter is at 6

If the 5 restarts fit into the 320 seconds, then the start rate limit won't be active. You write that it's sufficient, so it sounds to me like your rate limit is too high to affect the real service.

Therefore, I'd suggest decreasing StartLimitBurst= or prolonging StartLimitIntervalSec= (so that the limit rate is _lower_ than the pathological fatal restart rate).

> Also, how to get rid of this:
>
> Unknown serialization key: ref-gid
>
> ?

Upstream is typically concerned with the last two systemd versions, so unless this happens with v251 or v250, I don't know, I'm sorry.

Michal
Re: [systemd-devel] Regarding service rate limiting (systemd 237)
Hi Michal,

>> systemctl show $UNIT | grep -E
>> "StartLimit.*|InactiveExitTimestamp|ActiveEnterTimestamp"

This is the output from unit show:

InactiveExitTimestamp=Thu 2022-07-14 21:19:16 IST
InactiveExitTimestampMonotonic=3181663063
ActiveEnterTimestamp=Thu 2022-07-14 21:19:16 IST
ActiveEnterTimestampMonotonic=3181663063
StartLimitIntervalUSec=5min 20s
StartLimitBurst=5
StartLimitAction=none

The time is sufficient for 5 restarts, but still daemon keeps restarting!

Scheduled restart job, restart counter is at 6

Also, how to get rid of this:

Unknown serialization key: ref-gid

?

--
Ani
Re: [systemd-devel] Regarding service rate limiting (systemd 237)
On Tue, Jul 12, 2022 at 03:36:55PM +0530, Ani A wrote:
> Demo services work fine, the actual service is quite heavy and takes
> time to startup.
>
> > you may not reach the sufficient fail rate for start limit to kick
> I didn't get this part.

I meant that your values might correspond to too high a (re)start rate, while the real service is slower, i.e. below that limit.

> Say the daemon takes 60s to startup and crash and I set the
> StartLimitIntervalSec=320 This should be sufficient time for 5
> restarts (?)

That gives roughly 320s/5 ~ 64s per (re)start. So I'd say it is borderline whether the limit throttles the service starts or not.

You can check whether the rate limit works for your real service by setting some very long StartLimitIntervalSec= (and then calibrating more precisely).

systemctl show $UNIT | grep -E "StartLimit.*|InactiveExitTimestamp|ActiveEnterTimestamp"

may give you some insight into the timings (but the internal ratelimiting parameters are not available).

> Thanks, I didn't know about systemd-coredump, do I have to install
> this separately?
> I do not see coredump.conf or systemd-coredump service running on my host!
> (Ubuntu 18.04)

Not sure about that distro (and that age). You will ultimately know whether coredump is configured by reading /proc/sys/kernel/core_pattern.

> Also, I would be more interested to get the rate-limiting to work
> rather than daemon respawning indefinitely.

Fair enough (I just wanted to point out that start limiting won't prevent coredump size accumulation).

Michal
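A small sketch of the core_pattern check mentioned above. The helper name `coredump_handler` is made up for this example; it only classifies the pattern string and assumes that a pattern starting with `|` means the kernel pipes core dumps to a program.

```shell
# coredump_handler PATTERN
# Classify a kernel core_pattern value (hypothetical helper, not a systemd tool).
coredump_handler() {
    case $1 in
        '')                    echo unknown ;;          # could not read the pattern
        '|'*systemd-coredump*) echo systemd-coredump ;; # piped to systemd-coredump
        '|'*)                  echo pipe ;;             # piped to some other program
        *)                     echo file ;;             # plain core files on disk
    esac
}

# Inspect the running system (the file only exists on Linux).
coredump_handler "$(cat /proc/sys/kernel/core_pattern 2>/dev/null)"
```

If this prints `file`, core files are written straight to disk and coredump.conf settings such as MaxUse= would not apply to them.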
Re: [systemd-devel] Regarding service rate limiting (systemd 237)
Hi Michal,

>Does your service crash later than the demo service terminates?

Demo services work fine, the actual service is quite heavy and takes time to startup.

> you may not reach the sufficient fail rate for start limit to kick

I didn't get this part.

Say the daemon takes 60s to startup and crash and I set the StartLimitIntervalSec=320. This should be sufficient time for 5 restarts (?)

> I may suggest you to use systemd-coredump and e.g. MaxUse=

Thanks, I didn't know about systemd-coredump, do I have to install this separately?
I do not see coredump.conf or systemd-coredump service running on my host! (Ubuntu 18.04)

Also, I would be more interested to get the rate-limiting to work rather than daemon respawning indefinitely.

Thanks
--
Ani
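The question above can be checked with a bit of arithmetic. This is a simplified model and an assumption, not systemd's actual code: starts are taken to happen every `cycle` seconds (time until the crash plus RestartSec=), the first `burst` starts are allowed, and the next one is refused only if it still falls inside the interval. The function name `limit_can_trigger` is made up for this sketch.

```shell
# limit_can_trigger INTERVAL BURST CYCLE   (all in whole seconds)
# Simplified model, not systemd's implementation: the start after the
# first BURST ones comes BURST*CYCLE seconds after the first start and
# is refused only if that is still inside INTERVAL.
limit_can_trigger() {
    interval=$1 burst=$2 cycle=$3
    if [ $((burst * cycle)) -lt "$interval" ]; then
        echo yes
    else
        echo no
    fi
}

limit_can_trigger 320 5 60   # 5*60 = 300 < 320, prints: yes
limit_can_trigger 320 5 65   # 5*65 = 325 > 320, prints: no
```

Under this model, 60s until the crash plus only a few seconds of RestartSec= or core-dump writing already pushes the cycle past 64s, so a 320s interval with a burst of 5 would sit right at the edge.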
Re: [systemd-devel] Regarding service rate limiting (systemd 237)
Hi.

On Mon, Jul 11, 2022 at 06:26:44PM +0530, Ani A wrote:
> but somehow only with the services that I am trying to rate-limit
> (C,unix daemons), it doesn't work! :(

Does your service crash later than the demo service terminates? (I.e. you may not reach the sufficient fail rate for start limit to kick in.)

> I just want to make sure that the disk is not filled with core files
> (the daemon dumps pretty huge core files), hence [trying] to
> limit it to 5 restarts, but it keeps restarting forever :(

I'd suggest using systemd-coredump and e.g. MaxUse= (see coredump.conf).

Also note that restart limiting would only limit the rate at which core files accumulate; their total size would still be unbounded (without a removal process).

HTH,
Michal
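A minimal sketch of the MaxUse= suggestion as a drop-in file. The `2G` value and the file name `10-limits.conf` are purely illustrative, and `CONF_DIR` is a variable made up for this sketch; it defaults to a scratch directory. This only has an effect when systemd-coredump is the program handling core dumps.

```shell
# CONF_DIR is an assumption of this sketch; on a real host the drop-in
# directory would be /etc/systemd/coredump.conf.d.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
mkdir -p "$CONF_DIR/coredump.conf.d"

cat <<'EOF' > "$CONF_DIR/coredump.conf.d/10-limits.conf"
[Coredump]
# Illustrative value only: cap the disk space used by stored core dumps.
MaxUse=2G
EOF

echo "wrote $CONF_DIR/coredump.conf.d/10-limits.conf"
```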
[systemd-devel] Regarding service rate limiting (systemd 237)
Hello,

I am on Ubuntu 18.04 (systemd version 237). I have been trying to get service rate limiting to work, but not getting it right! I checked/tested many examples with the same directives that I use in my service files, and they all work well, e.g.:

cat <<'EOF' > /usr/local/bin/myservice.sh
#!/usr/bin/env bash
sleep $(( $RANDOM % 15 ))
exit 0
EOF

cat <<EOF > /etc/systemd/system/my.service
[Unit]
StartLimitBurst=5
StartLimitIntervalSec=90

[Service]
ExecStart=/bin/bash /usr/local/bin/myservice.sh
RestartSec=5
Restart=always
EOF

but somehow only with the services that I am trying to rate-limit (C, unix daemons), it doesn't work! :(

I enabled systemd LogLevel=debug but am still not getting any clues; the only odd message I see is:

systemd[1]: my-cdaemon.service: Unknown serialization key: ref-gid
systemd[1]: my-cdaemon.service: Changed dead -> running
systemd[1]: my-cdaemon.service: Unknown serialization key: ref-gid

Is this something to worry about?

Another difference from a conventional service start is that the actual service is invoked like:

/bin/bash -c 'exec /opt/org/bin/my-cdaemon $OPTS >> /var/log/org/my-cdaemon.log 2>&1'

This should be OK?

I just want to make sure that the disk is not filled with core files (the daemon dumps pretty huge core files), hence [trying] to limit it to 5 restarts, but it keeps restarting forever :(

systemd[1]: my-cdaemon.service: Trying to enqueue job my-cdaemon.service/restart/replace
systemd[1]: my-cdaemon.service: Installed new job my-cdaemon.service/restart as 107718
systemd[1]: my-cdaemon.service: Enqueued job my-cdaemon.service/restart as 107718
systemd[1]: my-cdaemon.service: Scheduled restart job, restart counter is at 113. <
systemd[1]: my-cdaemon.service: Changed auto-restart -> dead
systemd[1]: my-cdaemon.service: Job my-cdaemon.service/restart finished, result=done

I tried to factor the _time taken to write the core file_ into `StartLimitIntervalSec' as well, still no luck!

How can I troubleshoot this further?
Is there any way to "know" the internal state that systemd is tracking for this daemon _after 5 restarts_?

--
Ani
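One rough way to observe the restart rate from the outside is to count the "Scheduled restart job" messages quoted above over a time window. This is only a sketch: the helper name `count_restarts` is made up, and it assumes those messages reach the journal on your setup (in this thread they were seen with debug logging enabled).

```shell
# count_restarts: read log lines on stdin, print how many of them are
# "Scheduled restart job" messages (hypothetical helper).
count_restarts() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c 'Scheduled restart job' || true
}

# Self-contained demonstration on two sample lines taken from the post.
printf '%s\n' \
    'systemd[1]: my-cdaemon.service: Scheduled restart job, restart counter is at 113.' \
    'systemd[1]: my-cdaemon.service: Changed auto-restart -> dead' \
    | count_restarts
```

On a live system the input could come from something like `journalctl -u my-cdaemon.service --since "-6min" | count_restarts`. Separately, `systemctl show -p Result my-cdaemon.service` should report `start-limit-hit` once the limit has actually refused a start.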