** Changed in: cluster-glue (Ubuntu)
Assignee: Andres Rodriguez (andreserl) => (unassigned)
https://bugs.launchpad.net/bugs/1251298
Title:
Failed to sign on to LRMd with Heartbeat/Pacemaker
Status in cluster-glue package in Ubuntu:
Confirmed
Bug description:
I'm running a 2-node heartbeat/pacemaker cluster, which was working fine on
Ubuntu 13.04.
After upgrading from Ubuntu 13.04 to Ubuntu 13.10, heartbeat/pacemaker keeps
restarting the system: crmd fails to sign on to lrmd, and heartbeat's
recovery attempts end in a reboot.
Since one node is already on Ubuntu 13.10 and the other is still on 13.04,
I also tried running without the second node. That leads to the same
behavior, which occurs before any cluster communication happens.
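A quick way to see which lrmd binary is involved (a sketch, assuming the
paths installed by the saucy cluster-glue and pacemaker packages) would be:
# Compare the lrmd shipped by cluster-glue with the one shipped by
# pacemaker, and check which one (if any) is actually running.
ls -l /usr/lib/heartbeat/lrmd /usr/lib/pacemaker/lrmd
ps -ef | grep '[l]rmd'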
Syslog:
Nov 14 15:53:06 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 1 (30 max) times
Nov 14 15:53:06 wolverine crmd[2464]: notice: crmd_client_status_callback:
Status update: Client wolverine.domain.tld/crmd now has status [join] (DC=false)
Nov 14 15:53:06 wolverine crmd[2464]: notice: crmd_client_status_callback:
Status update: Client wolverine.domain.tld/crmd now has status [online]
(DC=false)
Nov 14 15:53:06 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 2 (30 max) times
Nov 14 15:53:06 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 3 (30 max) times
Nov 14 15:53:07 wolverine stonith-ng[2462]: notice: setup_cib: Watching for
stonith topology changes
Nov 14 15:53:07 wolverine stonith-ng[2462]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Nov 14 15:53:08 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 4 (30 max) times
Nov 14 15:53:10 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 5 (30 max) times
Nov 14 15:53:12 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 6 (30 max) times
Nov 14 15:53:14 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 7 (30 max) times
Nov 14 15:53:16 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 8 (30 max) times
Nov 14 15:53:18 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 9 (30 max) times
Nov 14 15:53:20 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 10 (30 max) times
Nov 14 15:53:22 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 11 (30 max) times
Nov 14 15:53:24 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 12 (30 max) times
Nov 14 15:53:26 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 13 (30 max) times
Nov 14 15:53:28 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 14 (30 max) times
Nov 14 15:53:30 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 15 (30 max) times
Nov 14 15:53:32 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 16 (30 max) times
Nov 14 15:53:34 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 17 (30 max) times
Nov 14 15:53:36 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 18 (30 max) times
Nov 14 15:53:38 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 19 (30 max) times
Nov 14 15:53:40 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 20 (30 max) times
Nov 14 15:53:42 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 21 (30 max) times
Nov 14 15:53:44 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 22 (30 max) times
Nov 14 15:53:46 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 23 (30 max) times
Nov 14 15:53:48 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 24 (30 max) times
Nov 14 15:53:50 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 25 (30 max) times
Nov 14 15:53:52 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 26 (30 max) times
Nov 14 15:53:54 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 27 (30 max) times
Nov 14 15:53:56 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 28 (30 max) times
Nov 14 15:53:58 wolverine crmd[2464]: warning: do_lrm_control: Failed to
sign on to the LRM 29 (30 max) times
Nov 14 15:54:00 wolverine crmd[2464]: error: do_lrm_control: Failed to
sign on to the LRM 30 (max) times
Nov 14 15:54:00 wolverine crmd[2464]: error: do_log: FSA: Input I_ERROR
from do_lrm_control() received in state S_STARTING
Nov 14 15:54:00 wolverine crmd[2464]: notice: do_state_transition: State
transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL
origin=do_lrm_control ]
Nov 14 15:54:00 wolverine crmd[2464]: warning: do_recover: Fast-tracking
shutdown in response to errors
Symlinking lrmd from the pacemaker package partly solved this problem:
root@wolverine ~ # mv /usr/lib/heartbeat/lrmd{,.cluster-glue}
root@wolverine ~ # cd /usr/lib/heartbeat/
root@wolverine /usr/lib/heartbeat # ln -s ../pacemaker/lrmd
root@wolverine /usr/lib/heartbeat # ls -la lrmd
lrwxrwxrwx 1 root root 17 Nov 14 16:35 lrmd -> ../pacemaker/lrmd
root@wolverine /usr/lib/heartbeat # ls -la lrmd*
lrwxrwxrwx 1 root root 17 Nov 14 16:35 lrmd -> ../pacemaker/lrmd
-rwxr-xr-x 1 root root 92816 Jul 18 17:55 lrmd.cluster-glue
root@wolverine /usr/lib/heartbeat #
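To keep a workaround like this from being silently undone by the next
cluster-glue upgrade, dpkg-divert should achieve the same thing persistently
(a sketch, untested here; run it on a system where lrmd has not already been
moved aside by hand):
# Divert cluster-glue's lrmd out of the way so package upgrades follow the
# diversion, then point heartbeat at pacemaker's lrmd as above.
dpkg-divert --local --divert /usr/lib/heartbeat/lrmd.cluster-glue \
    --rename /usr/lib/heartbeat/lrmd
ln -s ../pacemaker/lrmd /usr/lib/heartbeat/lrmd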
Stopping heartbeat will still result in an unexpected reboot:
Nov 14 16:37:27 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-backup:1_notify_0 (call=45, rc=0, cib-update=0, confirmed=true)
ok
Nov 14 16:37:28 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-rsyslog:1_notify_0 (call=48, rc=0, cib-update=0, confirmed=true)
ok
Nov 14 16:37:28 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2673
failed authorization [no default client auth]
Nov 14 16:37:28 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:37:28 wolverine attrd[2258]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd-backup:1 (10000)
Nov 14 16:37:28 wolverine attrd[2258]: notice: attrd_perform_update: Sent
update 10: master-drbd-backup:1=10000
Nov 14 16:37:28 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-backup:1_monitor_31000 (call=50, rc=0, cib-update=17,
confirmed=false) ok
Nov 14 16:37:29 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2700
failed authorization [no default client auth]
Nov 14 16:37:29 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:37:29 wolverine attrd[2258]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd-rsyslog:1 (10000)
Nov 14 16:37:29 wolverine attrd[2258]: notice: attrd_perform_update: Sent
update 13: master-drbd-rsyslog:1=10000
Nov 14 16:37:29 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-rsyslog:1_monitor_31000 (call=54, rc=0, cib-update=18,
confirmed=false) ok
Nov 14 16:37:59 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2812
failed authorization [no default client auth]
Nov 14 16:37:59 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:38:00 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2839
failed authorization [no default client auth]
Nov 14 16:38:00 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:38:05 wolverine heartbeat: [2238]: info: killing
/usr/lib/heartbeat/crmd process group 2259 with signal 15
Nov 14 16:38:05 wolverine crmd[2259]: notice: crm_shutdown: Requesting
shutdown, upper limit is 1200000ms
Nov 14 16:38:05 wolverine attrd[2258]: notice: attrd_trigger_update:
Sending flush op to all hosts for: shutdown (1384443485)
Nov 14 16:38:05 wolverine attrd[2258]: notice: attrd_perform_update: Sent
update 16: shutdown=1384443485
Nov 14 16:38:06 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-backup:1_notify_0 (call=57, rc=0, cib-update=0, confirmed=true)
ok
Nov 14 16:38:06 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-rsyslog:1_notify_0 (call=59, rc=0, cib-update=0, confirmed=true)
ok
Nov 14 16:38:07 wolverine kernel: [ 255.385984] d-con backup: Requested
state change failed by peer: Refusing to be Primary while peer is not outdated
(-7)
Nov 14 16:38:07 wolverine kernel: [ 255.386415] d-con backup: peer( Primary
-> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated )
pdsk( UpToDate -> DUnknown )
Nov 14 16:38:07 wolverine kernel: [ 255.386428] d-con backup: asender
terminated
Nov 14 16:38:07 wolverine kernel: [ 255.386438] d-con backup: Terminating
drbd_a_backup
Nov 14 16:38:07 wolverine kernel: [ 255.386693] d-con backup: Connection
closed
Nov 14 16:38:07 wolverine kernel: [ 255.386716] d-con backup: conn(
Disconnecting -> StandAlone )
Nov 14 16:38:07 wolverine kernel: [ 255.386718] d-con backup: receiver
terminated
Nov 14 16:38:07 wolverine kernel: [ 255.386722] d-con backup: Terminating
drbd_r_backup
Nov 14 16:38:07 wolverine kernel: [ 255.386750] block drbd0: disk( Outdated
-> Failed )
Nov 14 16:38:07 wolverine kernel: [ 255.409861] block drbd0: bitmap WRITE of
0 pages took 0 jiffies
Nov 14 16:38:07 wolverine kernel: [ 255.409930] block drbd0: 0 KB (0 bits)
marked out-of-sync by on disk bit-map.
Nov 14 16:38:07 wolverine kernel: [ 255.409943] block drbd0: disk( Failed ->
Diskless )
Nov 14 16:38:07 wolverine kernel: [ 255.410041] block drbd0: drbd_bm_resize
called with capacity == 0
Nov 14 16:38:07 wolverine kernel: [ 255.411773] d-con backup: Terminating
drbd_w_backup
Nov 14 16:38:07 wolverine kernel: [ 255.466428] d-con rsyslog: Requested
state change failed by peer: Refusing to be Primary while peer is not outdated
(-7)
Nov 14 16:38:07 wolverine kernel: [ 255.466796] d-con rsyslog: peer( Primary
-> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated )
pdsk( UpToDate -> DUnknown )
Nov 14 16:38:07 wolverine kernel: [ 255.466814] d-con rsyslog: asender
terminated
Nov 14 16:38:07 wolverine kernel: [ 255.466832] d-con rsyslog: Terminating
drbd_a_rsyslog
Nov 14 16:38:07 wolverine kernel: [ 255.467098] d-con rsyslog: Connection
closed
Nov 14 16:38:07 wolverine kernel: [ 255.467121] d-con rsyslog: conn(
Disconnecting -> StandAlone )
Nov 14 16:38:07 wolverine kernel: [ 255.467123] d-con rsyslog: receiver
terminated
Nov 14 16:38:07 wolverine kernel: [ 255.467128] d-con rsyslog: Terminating
drbd_r_rsyslog
Nov 14 16:38:07 wolverine kernel: [ 255.467169] block drbd1: disk( Outdated
-> Failed )
Nov 14 16:38:07 wolverine kernel: [ 255.481716] block drbd1: bitmap WRITE of
0 pages took 0 jiffies
Nov 14 16:38:07 wolverine kernel: [ 255.481778] block drbd1: 0 KB (0 bits)
marked out-of-sync by on disk bit-map.
Nov 14 16:38:07 wolverine kernel: [ 255.481791] block drbd1: disk( Failed ->
Diskless )
Nov 14 16:38:07 wolverine kernel: [ 255.481881] block drbd1: drbd_bm_resize
called with capacity == 0
Nov 14 16:38:07 wolverine kernel: [ 255.482011] d-con rsyslog: Terminating
drbd_w_rsyslog
Nov 14 16:38:07 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2986
failed authorization [no default client auth]
Nov 14 16:38:07 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:38:07 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2989
failed authorization [no default client auth]
Nov 14 16:38:07 wolverine heartbeat: [2238]: ERROR:
api_process_registration_msg: cannot add client(crm_node)
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd-backup:1 (<null>)
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent
delete 18: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56,
attr=master-drbd-backup:1, id=<n/a>, set=(null), section=status
Nov 14 16:38:07 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-backup:1_stop_0 (call=64, rc=0, cib-update=19, confirmed=true) ok
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-drbd-rsyslog:1 (<null>)
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent
delete 20: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56,
attr=master-drbd-rsyslog:1, id=<n/a>, set=(null), section=status
Nov 14 16:38:07 wolverine crmd[2259]: notice: process_lrm_event: LRM
operation drbd-rsyslog:1_stop_0 (call=67, rc=0, cib-update=20, confirmed=true)
ok
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent
delete 22: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56,
attr=master-drbd-backup:1, id=<n/a>, set=(null), section=status
Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent
delete 24: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56,
attr=master-drbd-rsyslog:1, id=<n/a>, set=(null), section=status
Nov 14 16:38:08 wolverine crmd[2259]: notice: do_state_transition: State
transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE
origin=route_message ]
Nov 14 16:38:08 wolverine crmd[2259]: notice: lrm_state_verify_stopped:
Stopped 0 recurring operations at (null) (1274719202 ops remaining)
Nov 14 16:38:08 wolverine crmd[2259]: notice: do_lrm_control: Disconnected
from the LRM
Nov 14 16:38:08 wolverine ccm: [2254]: info: client (pid=2259) removed from
ccm
Nov 14 16:38:08 wolverine heartbeat: [2238]: EMERG: Rebooting system.
Reason: /usr/lib/heartbeat/crmd
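The crm_node authorization failures above look like heartbeat's client API
refusing pacemaker's crm_node. As a guess (not verified here), an explicit
apiauth entry in /etc/ha.d/ha.cf might allow it to register:
# Hypothetical ha.cf addition: grant the crm_node client access to the
# heartbeat API as the cluster user/group.
apiauth crm_node uid=hacluster gid=haclient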
root@wolverine ~ # lsb_release -rd
Description: Ubuntu 13.10
Release: 13.10
root@wolverine ~ # apt-cache policy cluster-glue
cluster-glue:
  Installed: 1.0.11+hg2754-1.1
  Candidate: 1.0.11+hg2754-1.1
  Version table:
 *** 1.0.11+hg2754-1.1 0
        500 http://de.archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages
        100 /var/lib/dpkg/status
root@wolverine ~ #
root@wolverine ~ # apt-cache policy heartbeat
heartbeat:
  Installed: 1:3.0.5-3.1ubuntu1
  Candidate: 1:3.0.5-3.1ubuntu1
  Version table:
 *** 1:3.0.5-3.1ubuntu1 0
        500 http://de.archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages
        100 /var/lib/dpkg/status
root@wolverine ~ #
root@wolverine ~ # apt-cache policy pacemaker
pacemaker:
  Installed: 1.1.10+git20130802-1ubuntu1
  Candidate: 1.1.10+git20130802-1ubuntu1
  Version table:
 *** 1.1.10+git20130802-1ubuntu1 0
        500 http://de.archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages
        100 /var/lib/dpkg/status
root@wolverine ~ #
Expected:
- Working heartbeat/pacemaker setup after the Ubuntu upgrade
What happened:
- The system reboots after about one minute because of heartbeat's recovery attempts