Package: sssd
Version: 2.2.2-1+b1
Severity: important
Tags: upstream

In a setup where sssd uses a remote slapd for NSS, with a somewhat flaky
network in between, sssd_be sometimes gets into a busy loop, using 100%
CPU time on one core.

Debugging showed that sssd has a watchdog to clean up in such cases, but
sssd_be installs a signal handler that prevents the SIGTERM sent to the
process group from being processed correctly, so the process does not exit.

src/util/util_watchdog.c:

    64  /* the watchdog is purposefully *not* handled by the tevent
    65   * signal handler as it is meant to check if the daemon is
    66   * still processing the event queue itself. A stuck process
    67   * may not handle the event queue at all and thus not handle
    68   * signals either */
    69  static void watchdog_handler(int sig)
    70  {
    71
    72      watchdog_detect_timeshift();
    73
    74      /* if a pre-defined number of ticks passed by kills itself */
    75      if (__sync_add_and_fetch(&watchdog_ctx.ticks, 1) > WATCHDOG_MAX_TICKS) {
    76          if (getpid() == getpgrp()) {
    77              kill(-getpgrp(), SIGTERM);
    78          } else {
    79              _exit(1);
    80          }
    81      }
    82  }

(NB: It seems that what is described in the comment was not all too
successful ;)

The signal handler is installed in src/providers/data_provider_be.c:

    static void be_process_finalize(struct tevent_context *ev,
                                    struct tevent_signal *se,
                                    int signum,
                                    int count,
                                    void *siginfo,
                                    void *private_data)
    {
        struct be_ctx *be_ctx;

        be_ctx = talloc_get_type(private_data, struct be_ctx);
        talloc_free(be_ctx);
        orderly_shutdown(0);
    }

    static errno_t be_process_install_sigterm_handler(struct be_ctx *be_ctx)
    {
        struct tevent_signal *sige;

        BlockSignals(false, SIGTERM);

        sige = tevent_add_signal(be_ctx->ev, be_ctx, SIGTERM, SA_SIGINFO,
                                 be_process_finalize, be_ctx);
        if (sige == NULL) {
            DEBUG(SSSDBG_CRIT_FAILURE, "tevent_add_signal failed.\n");
            return ENOMEM;
        }

        return EOK;
    }

Setting a breakpoint on be_process_finalize showed that this function is
never reached, probably because libtevent, with the event loop stuck,
never gets around to calling it.

Two proposals to circumvent this are:

a) Reset the handler before calling kill on the process group in line 77 (e.g.
   signal(SIGTERM, SIG_DFL);)
b) Move the exit call in line 79 out of the branch so it gets called
   unconditionally in case kill() fails to kill the process itself

We tested solution a) in gdb and it caused sssd_be to exit cleanly and
restart, as it should.

Cheers,
Nik

Analysis was sponsored by Teckids e.V. and tarent solutions GmbH.

-- System Information:
Debian Release: bullseye/sid
  APT prefers testing-debug
  APT policy: (500, 'testing-debug'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 5.3.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages sssd depends on:
ii  python3-sss  2.2.2-1+b1
ii  sssd-ad      2.2.2-1+b1
ii  sssd-common  2.2.2-1+b1
ii  sssd-ipa     2.2.2-1+b1
ii  sssd-krb5    2.2.2-1+b1
ii  sssd-ldap    2.2.2-1+b1
ii  sssd-proxy   2.2.2-1+b1

sssd recommends no packages.

sssd suggests no packages.

-- no debconf information
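
Addendum: to illustrate proposals a) and b) outside of sssd, here is a minimal standalone sketch. It is not a patch against sssd and uses neither tevent nor the real watchdog. All names (deferred_sigterm, demo_watchdog, run_stuck_child, install_handler) are hypothetical. The SIGTERM handler that merely records the signal is an assumed stand-in for a handler that relies on a running event loop, and alarm(1) stands in for the watchdog tick.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* All names below are hypothetical; none of this is sssd or tevent code. */
static volatile sig_atomic_t sigterm_pending;   /* recorded, never acted on */
static volatile sig_atomic_t reset_before_kill; /* 1 = apply proposal a) */

/* Assumed stand-in for an event-loop signal handler: it only records
 * SIGTERM and expects a (here: stuck) main loop to act on it later. */
static void deferred_sigterm(int sig)
{
    (void)sig;
    sigterm_pending = 1;
}

/* Modeled on watchdog_handler(), with both proposals applied. */
static void demo_watchdog(int sig)
{
    (void)sig;
    if (getpid() == getpgrp()) {
        if (reset_before_kill) {
            signal(SIGTERM, SIG_DFL);   /* a) restore the default action */
        }
        kill(-getpgrp(), SIGTERM);
    }
    _exit(1);   /* b) reached whenever SIGTERM did not end the process */
}

static int install_handler(int signum, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* Fork a child that spins forever and is only stopped by the watchdog.
 * Returns the child's wait status, or -1 on error. */
static int run_stuck_child(int with_reset)
{
    int status = -1;
    pid_t pid;

    reset_before_kill = with_reset;
    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        volatile unsigned long spin = 0;

        /* Own process group, so kill(-pgrp) reaches only this child. */
        if (setpgid(0, 0) != 0 ||
            install_handler(SIGTERM, deferred_sigterm) != 0 ||
            install_handler(SIGALRM, demo_watchdog) != 0) {
            _exit(2);
        }
        alarm(1);       /* the "watchdog tick" */
        for (;;) {
            spin++;     /* busy loop: the event queue is never run */
        }
    }
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return status;
}
```

Calling run_stuck_child(1) from a main() should yield a child that is terminated by SIGTERM, which matches what we saw with solution a) in gdb. With run_stuck_child(0) the recording handler swallows the signal, and the child should only end through the unconditional _exit(1) of proposal b).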