On 31/01/2019 15.10, Johan Wassberg wrote:
> We are using Radiator with AuthHEIMDALDIGEST and recently upgraded from 4.15 to 4.22. We have noticed that 4.22 leaves a lot of defunct `kdigest` processes, which over time causes Radiator to crash because it has trouble forking new `kdigest` processes.
Thanks for reporting this. I was able to reproduce this.
> One solution could be to run `waitpid` with "-1" in a loop instead of the real pid, and therefore handle all children that sent `SIGCHLD` during the next authentication. Another solution could be to remove `WNOHANG`, making `waitpid` block until the child returns. I am not sure whether that has any performance issues, or why you implemented `waitpid` with `WNOHANG` in the first place.
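The first suggestion above can be sketched roughly as follows. This is only an illustration in Python of the POSIX pattern, not Radiator's actual Perl code; the function names `reap_finished_children` and `spawn_short_lived_child` are hypothetical.

```python
import os
import time


def reap_finished_children():
    """Collect every child that has already exited, without blocking.

    Returns the list of pids that were reaped by this call.
    """
    reaped = []
    while True:
        try:
            # pid -1 means "any child"; WNOHANG makes the call return
            # immediately instead of waiting for a child to exit.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def spawn_short_lived_child():
    """Fork a child that exits immediately (stand-in for a helper process)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    return pid


if __name__ == "__main__":
    started = [spawn_short_lived_child() for _ in range(3)]
    time.sleep(0.2)  # give the children time to exit
    print("started:", started, "reaped:", reap_finished_children())
```

Calling such a loop at the start of each authentication would pick up any children missed earlier, at the cost of reaping children that other parts of the process may have wanted to wait for themselves.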
I'd say removing WNOHANG could work here. When I wrapped waitpid with two debug log calls, the timestamps they reported were typically less than a millisecond apart. This makes me think that waitpid will typically wait for only a very short time.
In fact, I could not reproduce this at first, until I added additional load to the test machine. After this the number of zombies started slowly going up. Based on this I think there was a race, just like you guessed. It also appears that the wait time is very short, and thus it's worth just letting waitpid run a bit longer without WNOHANG.
WNOHANG was likely used because zombies were not seen with it (but the system should have been under more load when this was tested).
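To illustrate the kind of race discussed here, the following Python sketch assumes the parent calls `waitpid` exactly once with `WNOHANG` for a given child. If the child has not yet exited at that moment, the call returns without reaping it, and the child later becomes a zombie. The helper names `fork_child` and `poll_once` are hypothetical, and the sleep merely simulates a child that finishes slightly late on a loaded machine.

```python
import os
import time


def fork_child(lifetime_s):
    """Fork a child that stays alive for lifetime_s seconds, then exits."""
    pid = os.fork()
    if pid == 0:
        time.sleep(lifetime_s)
        os._exit(0)
    return pid


def poll_once(pid):
    """Single non-blocking waitpid: True if reaped, False if still running."""
    reaped_pid, _status = os.waitpid(pid, os.WNOHANG)
    return reaped_pid == pid


if __name__ == "__main__":
    child = fork_child(0.5)
    # The child is still running, so the one-shot poll misses it.
    print("reaped by one-shot poll:", poll_once(child))
    # Without WNOHANG the call blocks until the child exits and reaps it.
    os.waitpid(child, 0)
```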
> Let me know if there is anything else I can provide to help resolve this issue.
Can you try removing WNOHANG and configuring radiusd to use LogMicroseconds? If there are visible changes in performance, then the log lines after the last 'digest command output' and the return code (ACCEPT, etc.) from the AuthBy can be observed more closely. Or you could modify the source to trigger a log message if waitpid starts taking too long. Maybe 2 milliseconds could be a good trigger.
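The idea of logging when waitpid takes too long could look something like the sketch below. It is written in Python purely to show the pattern; the actual change would be made in the Perl source, and the names `wait_for_child`, `SLOW_WAIT_THRESHOLD_S` and the logger name are assumptions made for this example.

```python
import logging
import os
import time

# 2 ms threshold, taken from the suggestion above.
SLOW_WAIT_THRESHOLD_S = 0.002

log = logging.getLogger("waitpid_timing")


def wait_for_child(pid, threshold_s=SLOW_WAIT_THRESHOLD_S):
    """Block until the child exits and warn if the wait was slow.

    Returns (raw wait status, elapsed seconds).
    """
    start = time.monotonic()
    _, status = os.waitpid(pid, 0)  # no WNOHANG: wait until the child exits
    elapsed = time.monotonic() - start
    if elapsed > threshold_s:
        log.warning("waitpid for pid %d took %.3f ms", pid, elapsed * 1000.0)
    return status, elapsed


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    child = os.fork()
    if child == 0:
        os._exit(0)
    status, elapsed = wait_for_child(child)
    print("exit code:", os.WEXITSTATUS(status), "waited (s):", elapsed)
```

A monotonic clock is used so that the measured interval is not affected by system clock adjustments.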
Thanks,
Heikki

--
Heikki Vatiainen <[email protected]>
