On 30/04/19 20:39 +0300, Andrei Borzenkov wrote:
> 30.04.2019 9:51, Jan Friesse wrote:
>>
>>> Now, corosync-qdevice gets SIGTERM as "signal to terminate", but it
>>> installs SIGTERM handler that does not exit and only closes some socket.
>>> May be this should trigger termination of main loop, but somehow it does
>>> not.
>>
>> Yep, this is exactly how qdevice daemon shutdown works. Signal just
>> closes socket (should be signal safe) and poll in main loop do its job
>> so main loop is terminated.
>>
>
> That is bug in corosync 2.4.4 which is still used in TW. stop is using
> pidof, I have two corosync-qdevice processes so corosync-qdevice never
> gets signal in the first place.
>
>
> ++ pidof corosync-qdevice
> + kill -TERM '1812 1811'
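
The trace above suggests that the whole pidof output reaches kill as one quoted word, so kill sees a single bogus argument instead of two PIDs. A minimal sketch of that effect, assuming a POSIX shell with the default IFS; fake_pidof and count_args are hypothetical helpers made up for illustration and are not part of corosync or its init script:

```shell
#!/bin/sh
# Hypothetical stand-in for `pidof corosync-qdevice` when two
# processes with that name are running.
fake_pidof() { echo "1812 1811"; }

# Hypothetical helper: report how many arguments it was given.
count_args() { echo "$#"; }

pids=$(fake_pidof)

# Quoted expansion: a single word "1812 1811", which is not a valid PID.
quoted=$(count_args "$pids")

# Unquoted expansion: field splitting yields two separate PIDs.
unquoted=$(count_args $pids)

echo "quoted=$quoted unquoted=$unquoted"
```

With two matching processes, the quoted form hands kill one argument, which would explain why neither process receives the signal. Splitting the list only works around the symptom; it does not make identification by name any more reliable.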

Needless to say, half of the cluster stack, especially the agents, still makes decisions on overly naive assumptions, relying on an unreliable grip on processes (and on their being singletons, which may not hold, as demonstrated above) by name or PID. Such identification can clash even by pure accident: the PID space typically defaults to just 2^15 slots, so wraparound can come quickly, and someone may invoke a pacemaker daemon merely for a one-off fetch of the metadata it provides this way. Containers only make the situation worse from the host's perspective [1].

Luckily, we've fixed some of these troublemakers in pacemaker with the recent security updates, and there are some interesting synergies possible on the horizon; see "pidfd" among the newest developments in Linux.

[1] e.g. https://lists.clusterlabs.org/pipermail/developers/2017-July/001875.html

-- 
Jan (Poki)