Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-22 Thread Michal Koutný
On Fri, Jul 22, 2022 at 06:14:11PM +0530, Ani A  wrote:
> Found the issue, posting here to close this thread (and possibly help
> someone who might land in this situation!)

Thanks for sharing.

> The daemon which had issues with rate-limit, was invoking some
> `systemctl stop/start ` commands in its initialization! (probably this
> has some unwanted side effects?)

Timing comes to mind as something that could affect that.

> If I eliminate that, then the rate-limit on the main daemon works fine! :)

Yeah, better use explicit dependencies (Wants=/After=) instead of such a
call-back.
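
A minimal sketch of what such explicit dependencies could look like as a
drop-in. Note that helper.service is a hypothetical name standing in for
whatever unit the daemon was starting via systemctl, and the file is written
to a scratch directory here so the snippet runs without root:

```shell
#!/bin/sh
# Scratch directory so the sketch runs without root; on a real host the
# drop-in would live in /etc/systemd/system/my-cdaemon.service.d/.
dropin_dir=$(mktemp -d)/my-cdaemon.service.d
mkdir -p "$dropin_dir"

# helper.service is a hypothetical name for the unit the daemon used to
# start via systemctl during its own initialization.
cat <<EOF > "$dropin_dir/10-deps.conf"
[Unit]
Wants=helper.service
After=helper.service
EOF

cat "$dropin_dir/10-deps.conf"
```

After copying the file to the real location, `systemctl daemon-reload` makes
systemd pick it up.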

Michal


Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-22 Thread Ani A
Hi Michal,

Found the issue, posting here to close this thread (and possibly help
someone who might land in this situation!)

The daemon which had issues with rate-limit, was invoking some
`systemctl stop/start ` commands in its initialization! (probably this
has some unwanted side effects?)
If I eliminate that, then the rate-limit on the main daemon works fine! :)

Thanks.
--
Ani


Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-14 Thread Michal Koutný
Hello.

On Thu, Jul 14, 2022 at 09:29:37PM +0530, Ani A  wrote:
> StartLimitIntervalUSec=5min 20s   
> StartLimitBurst=5
> StartLimitAction=none
> 
> The time is sufficient for 5 restarts, but still daemon keeps restarting!
> 
> Scheduled restart job, restart counter is at 6

If the 5 restarts fit into the 320 seconds, then the start rate limit
won't be active. You write it's sufficient, so it sounds to me that
your rate limit is too high to affect the real service. Therefore, I'd
suggest decreasing StartLimitBurst= or prolonging StartLimitIntervalSec=
(so that the rate limit is _lower_ than the pathological fatal restart rate).

> Also, how to get rid of this:
> 
>Unknown serialization key: ref-gid
> 
> ?

Upstream is typically concerned only with the last two systemd versions, so
unless this happens with v251 or v250, I don't know, I'm sorry.

Michal


Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-14 Thread Ani A
Hi Michal,

>> systemctl show $UNIT | grep -E "StartLimit.*|InactiveExitTimestamp|ActiveEnterTimestamp"

This is the output from unit show :

InactiveExitTimestamp=Thu 2022-07-14 21:19:16 IST
InactiveExitTimestampMonotonic=3181663063
ActiveEnterTimestamp=Thu 2022-07-14 21:19:16 IST
ActiveEnterTimestampMonotonic=3181663063
StartLimitIntervalUSec=5min 20s   
StartLimitBurst=5
StartLimitAction=none

The time is sufficient for 5 restarts, but still daemon keeps restarting!

Scheduled restart job, restart counter is at 6

Also, how to get rid of this:

   Unknown serialization key: ref-gid

?

--
Ani


Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-12 Thread Michal Koutný
On Tue, Jul 12, 2022 at 03:36:55PM +0530, Ani A  wrote:
> Demo services work fine, the actual service is quite heavy and takes
> time to startup.
> 
> > you may not reach the sufficient fail rate for start limit to kick
> I didn't get this part.

I meant that your values might have corresponded to too high (re)start
rate and the real service is slower, i.e. below that limit.

> Say the daemon takes 60s to startup and crash and I set the
> StartLimitIntervalSec=320 This should be sufficient time for 5
> restarts (?)

That gives roughly 320s/5 ~ 64s per (re)start. So I'd say this is
borderline, whether the limit throttles the service starts or not.
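
A minimal sketch of that arithmetic, assuming a simplified fixed-window model
(the window opens at the first start, and the limit trips when start number
burst+1 still falls inside it). The function name and the 65 s cycle are
made up for illustration; this is not systemd's actual code:

```shell
#!/bin/sh
# Simplified fixed-window model of the start limit (an assumption, not
# systemd's implementation).
# Usage: limit_trips BURST INTERVAL_SEC CYCLE_SEC
# CYCLE_SEC is the time from one start to the next (startup + crash +
# core dump + RestartSec).
limit_trips() {
    burst=$1 interval=$2 cycle=$3
    # Start number burst+1 happens burst*cycle seconds after the first start.
    if [ $((burst * cycle)) -le "$interval" ]; then
        echo "trips"
    else
        echo "never trips"
    fi
}

limit_trips 5 320 60   # 5 * 60 = 300 s, inside the 320 s window
limit_trips 5 320 65   # 5 * 65 = 325 s, e.g. with a hypothetical RestartSec=5
```

Under this model a few extra seconds per cycle are enough to push the sixth
start outside the window, which is why 64 s per (re)start is borderline.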

You can try whether rate limit works for your real service by setting
some very long StartLimitIntervalSec= (and then calibrating more
precisely).
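
For instance, such a calibration run could start from a drop-in like the
following sketch. The 1h value is an arbitrary, deliberately oversized choice,
and the file is written to a scratch directory so the snippet runs without
root:

```shell
#!/bin/sh
# Scratch directory so this runs without root; the real location would be
# /etc/systemd/system/my-cdaemon.service.d/.
dropin_dir=$(mktemp -d)/my-cdaemon.service.d
mkdir -p "$dropin_dir"

# 1h is a hypothetical, deliberately oversized interval: five crash cycles
# of roughly a minute each fit into it many times over.
cat <<EOF > "$dropin_dir/20-start-limit.conf"
[Unit]
StartLimitIntervalSec=1h
StartLimitBurst=5
EOF

cat "$dropin_dir/20-start-limit.conf"
```

If the restarts stop after the fifth attempt with this in place (after
`systemctl daemon-reload`), the interval can then be shortened step by step.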

systemctl show $UNIT | grep -E "StartLimit.*|InactiveExitTimestamp|ActiveEnterTimestamp"

may give you some insight into the timings (but the internal rate-limiting
parameters are not available).


> Thanks, I didn't know about systemd-coredump, do I have to install
> this separately?
> I do not see coredump.conf or systemd-coredump service running on my host!
> (Ubuntu 18.04)

Not sure about that distro (and that age). You will ultimately know whether
coredump is configured by reading
/proc/sys/kernel/core_pattern
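
A small sketch of how that value could be interpreted, relying on the kernel
convention that a pattern starting with `|` pipes the dump to a program. The
function name is made up for illustration:

```shell
#!/bin/sh
# Classify a core_pattern value (hypothetical helper, for illustration).
classify_core_pattern() {
    case "$1" in
        "|"*systemd-coredump*) echo "systemd-coredump" ;;
        "|"*)                  echo "other pipe handler" ;;
        *)                     echo "plain file" ;;
    esac
}

# Inspect the running kernel's setting when it is readable.
if [ -r /proc/sys/kernel/core_pattern ]; then
    classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
```

Only the first result means that coredump.conf settings apply; with a plain
file pattern the core files land wherever the pattern points.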

> Also, I would be more interested to get the rate-limiting to work
> rather than daemon respawning indefinitely.

Fair enough (just wanted to point out that start limiting won't prevent
coredump size accumulation).

Michal


Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-12 Thread Ani A
Hi Michal,

>Does your service crash later than the demo service terminates?
Demo services work fine, the actual service is quite heavy and takes
time to startup.

> you may not reach the sufficient fail rate for start limit to kick
I didn't get this part. Say the daemon takes 60s to startup and crash
and I set the
StartLimitIntervalSec=320
This should be sufficient time for 5 restarts (?)

> I may suggest you to use systemd-coredump and e.g. MaxUse=
Thanks, I didn't know about systemd-coredump, do I have to install
this separately?
I do not see coredump.conf or systemd-coredump service running on my host!
(Ubuntu 18.04)

Also, I would be more interested to get the rate-limiting to work
rather than daemon
respawning indefinitely.

Thanks
--
Ani


Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-12 Thread Michal Koutný
Hi.

On Mon, Jul 11, 2022 at 06:26:44PM +0530, Ani A  wrote:
> but somehow only with the services that I am trying to rate-limit
> (C,unix daemons), it doesn't work! :(

Does your service crash later than the demo service terminates?
(I.e. you may not reach the sufficient fail rate for start limit to kick
in.)

> I just want to make sure that the disk is not filled with core files
> (the daemon dumps pretty huge core files), hence [trying] to
> limit it to 5 restarts, but it keeps restarting forever :(

I'd suggest using systemd-coredump and e.g. MaxUse= (see
coredump.conf).

Also note that restart limiting would only limit the rate at which disk
usage grows due to core file accumulation; the total size would still be
unbounded (without a removal process).
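
As a rough sketch, a MaxUse= cap could be set with a drop-in like this one.
The 2G figure is an arbitrary example, and the file goes to a scratch
directory here so the snippet runs without root:

```shell
#!/bin/sh
# Scratch directory so this runs without root; the real location would be
# /etc/systemd/coredump.conf.d/.
conf_dir=$(mktemp -d)/coredump.conf.d
mkdir -p "$conf_dir"

# 2G is a hypothetical cap on the disk space used by stored core dumps.
cat <<EOF > "$conf_dir/10-limit.conf"
[Coredump]
MaxUse=2G
EOF

cat "$conf_dir/10-limit.conf"
```

This only has an effect when the kernel actually hands core dumps to
systemd-coredump.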

HTH,
Michal


[systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-11 Thread Ani A
Hello,

I am on Ubuntu 18.04 (systemd version 237), I have been trying to get
service rate limiting to work, but not getting it right!
I checked/tested many examples with the same directives that I use in
my service files, and they all work well, e.g.:

cat <<'EOF' > /usr/local/bin/myservice.sh
#!/usr/bin/env bash
sleep $(( $RANDOM % 15 ))
exit 0
EOF

cat <<EOF > /etc/systemd/system/my.service
[Unit]
StartLimitBurst=5
StartLimitIntervalSec=90

[Service]
ExecStart=/bin/bash /usr/local/bin/myservice.sh
RestartSec=5
Restart=always
EOF

but somehow only with the services that I am trying to rate-limit
(C,unix daemons), it doesn't work! :(
I enabled systemd LogLevel=debug but I am still not getting any clues; the only
odd message I see is:

systemd[1]: my-cdaemon.service: Unknown serialization key: ref-gid
systemd[1]: my-cdaemon.service: Changed dead -> running
systemd[1]: my-cdaemon.service: Unknown serialization key: ref-gid

Is this something to worry about ?

Another difference from conventional service start is, the actual
service is invoked like:

 /bin/bash -c 'exec /opt/org/bin/my-cdaemon $OPTS >> /var/log/org/my-cdaemon.log 2>&1'

This should be OK?

I just want to make sure that the disk is not filled with core files
(the daemon dumps pretty huge core files), hence [trying] to
limit it to 5 restarts, but it keeps restarting forever :(

systemd[1]: my-cdaemon.service: Trying to enqueue job my-cdaemon.service/restart/replace
systemd[1]: my-cdaemon.service: Installed new job my-cdaemon.service/restart as 107718
systemd[1]: my-cdaemon.service: Enqueued job my-cdaemon.service/restart as 107718
systemd[1]: my-cdaemon.service: Scheduled restart job, restart counter is at 113.  <
systemd[1]: my-cdaemon.service: Changed auto-restart -> dead
systemd[1]: my-cdaemon.service: Job my-cdaemon.service/restart finished, result=done

I tried to factor in the _time taken to write the core file_ as well, in
`StartLimitIntervalSec`, still no luck!

How can I troubleshoot this further ? Is there any way to "know" the
internal state that systemd is tracking for this daemon _after 5 restarts_ ?

--
Ani