Further investigation (not yet correlated) suggests that this may be a
permissions or a timing issue.
From the straces I can see crmd creating the lrm sockets:
[pid 4433] unlink("/var/run/heartbeat/lrm_cmd_sock") = -1 ENOENT (No such file or directory)
[pid 4433] bind(4, {sa_family=AF_FILE, path="/var/run/heartbeat/lrm_cmd_sock"}, 110) = 0
[pid 4433] chmod("/var/run/heartbeat/lrm_cmd_sock", 0777) = 0
[pid 4433] listen(4, 10) = 0
[pid 4433] fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
[pid 4433] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 4433] socket(PF_FILE, SOCK_STREAM, 0) = 7
[pid 4433] unlink("/var/run/heartbeat/lrm_callback_sock") = -1 ENOENT (No such file or directory)
[pid 4433] bind(7, {sa_family=AF_FILE, path="/var/run/heartbeat/lrm_callback_sock"}, 110) = 0
[pid 4433] chmod("/var/run/heartbeat/lrm_callback_sock", 0777) = 0
and then shortly afterwards deleting them again:
[pid 4433] unlink("/var/run/heartbeat/lrm_cmd_sock" <unfinished ...>
[pid 4436] <... mprotect resumed> ) = 0
[pid 4433] <... unlink resumed> ) = 0
[pid 4425] futex(0x7f33c929b0c4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f33c929b0c0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid 4433] close(7 <unfinished ...>
[pid 4426] <... futex resumed> ) = 0
[pid 4433] <... close resumed> ) = 0
[pid 4426] futex(0x7f33c929b100, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 4433] unlink("/var/run/heartbeat/lrm_callback_sock" <unfinished ...>
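
To help tell a permissions problem apart from a timing one, a small probe along
these lines could be run on each node while corosync is starting. This is only
a sketch: probe_socket is a hypothetical helper of my own, not part of any
cluster tool, and only the two socket paths are taken from the strace above.
It reports whether each path is missing, refuses access, is a leftover with
nothing listening, or is actually accepting connections.

```python
import errno
import os
import socket
import stat

# Paths taken from the strace output above.
LRM_SOCKETS = (
    "/var/run/heartbeat/lrm_cmd_sock",
    "/var/run/heartbeat/lrm_callback_sock",
)


def probe_socket(path):
    """Classify a UNIX socket path (hypothetical helper, for diagnosis only)."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"            # never created, or already unlinked
    except PermissionError:
        return "permission denied"  # cannot even stat the path
    if not stat.S_ISSOCK(mode):
        return "not a socket"
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
    except PermissionError:
        return "permission denied"  # socket exists but we may not connect
    except ConnectionRefusedError:
        return "stale"              # file exists but nothing is listening
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        client.close()
    return "listening"


if __name__ == "__main__":
    for path in LRM_SOCKETS:
        print(path, "->", probe_socket(path))
```

Running it as root and again as the user the cluster daemons run as, a few
times during startup, should show whether the sockets vanish ("missing") or
are present but unusable. Note that the probe opens and immediately closes a
connection, which may show up in the daemon's log.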
On one of the two nodes I did manage to make some progress: I stopped
corosync, waited for a few minutes, then started it again, and now I get the
sockets. (/etc/init.d/corosync restarts and machine reboots were never
successful.)
r...@node1:/var/run/heartbeat# ls -l
total 0
srwxrwxrwx 1 root root 0 2010-11-17 01:22 lrm_callback_sock
srwxrwxrwx 1 root root 0 2010-11-17 01:22 lrm_cmd_sock
drwxr-xr-x 2 root root 40 2010-11-17 01:22 rsctmp
srwxrwxrwx 1 root root 0 2010-11-17 01:22 stonithd
srwxrwxrwx 1 root root 0 2010-11-17 01:22 stonithd_callback
However, I can't repeat this on the second node.
I haven't tried it on the first node again (so at least I can compare things
with the 2nd node).
After
--
do_lrm_control: Failed to sign on to the LRM after upgrade to Maverick
https://bugs.launchpad.net/bugs/676391