On Fri, Jan 22, 2016 at 07:57:52PM +0300, Vladislav Bogdanov wrote:
> Tried reverting this one and a51b2bb ("If an error occurs unlink the
> lock file and exit with status 1") one-by-one and both together, the
> same result.
>
> So problem seems to be somewhere deeper.
I've got the same fencing problem with dlm-4.0.4 on Debian. According to
the strace of the dlm_controld process, it exits right after poll() is
interrupted by a SIGCHLD signal:
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN},
{fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14,
events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17,
events=POLLIN}], 10, 1000) = 0 (Timeout)
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN},
{fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14,
events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17,
events=POLLIN}], 10, 1000) = 0 (Timeout)
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN},
{fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14,
events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17,
events=POLLIN}], 10, 1000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2279, si_uid=0,
si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn() = -1 EINTR (Interrupted system call)
close(11) = 0
sendto(10, "\240", 1, MSG_NOSIGNAL, NULL, 0) = 1
sendto(17, "\20", 1, MSG_NOSIGNAL, NULL, 0) = 1
poll([{fd=17, events=POLLIN}], 1, 0) = 0 (Timeout)
shutdown(17, SHUT_RDWR) = 0
close(17) = 0
munmap(0x7f5f45c26000, 2105344) = 0
munmap(0x7f5f4aeea000, 8248) = 0
munmap(0x7f5f45a24000, 2105344) = 0
munmap(0x7f5f4aee7000, 8248) = 0
munmap(0x7f5f45822000, 2105344) = 0
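In fact, a recent change in 4.0.4 modifies exactly that part of the
code:
If an error occurs unlink the lock file and exit with status 1
https://git.fedorahosted.org/cgit/dlm.git/commit/?id=a51b2bbe413222829778698e62af88a73ebec233
The bug is caused by missing braces: the commit expanded the body of the
if statement to two statements without bracing them, so only "rv = 0;"
is conditional and the "goto out;" that follows runs on every EINTR.
Any signal that interrupts poll(), including the SIGCHLD above,
therefore makes the daemon leave its main loop.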
Do you think we can get a new release out with this patch? Fencing in
4.0.4 does not work properly because of this issue.
--
Valentin
Index: dlm-4.0.4/dlm_controld/main.c
===================================================================
--- dlm-4.0.4.orig/dlm_controld/main.c
+++ dlm-4.0.4/dlm_controld/main.c
@@ -1028,9 +1028,10 @@ static int loop(void)
 	for (;;) {
 		rv = poll(pollfd, client_maxi + 1, poll_timeout);
 		if (rv == -1 && errno == EINTR) {
-			if (daemon_quit && list_empty(&lockspaces))
+			if (daemon_quit && list_empty(&lockspaces)) {
 				rv = 0;
 				goto out;
+			}
 			if (daemon_quit) {
 				log_error("shutdown ignored, active lockspaces");
 				daemon_quit = 0;