This is still an issue in Ubuntu 14.04.2 with full updates as of 12 Aug 2015. The symlink workaround resolves the problem, but why not fix the package?
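For reference, the workaround from the bug description amounts to replacing cluster-glue's lrmd with a symlink to pacemaker's lrmd. A minimal sketch, assuming the Ubuntu paths from the report (/usr/lib/heartbeat and /usr/lib/pacemaker); the HB_DIR variable and the existence guard are additions of mine, and the original binary is kept as a .cluster-glue backup so the change can be reverted:

```shell
# Replace cluster-glue's lrmd with a symlink to pacemaker's lrmd.
# HB_DIR defaults to Ubuntu's heartbeat libdir; adjust if your layout differs.
HB_DIR="${HB_DIR:-/usr/lib/heartbeat}"
if [ -e "$HB_DIR/lrmd" ] && [ ! -L "$HB_DIR/lrmd" ]; then
  mv "$HB_DIR/lrmd" "$HB_DIR/lrmd.cluster-glue"   # keep the original as a backup
  ln -s ../pacemaker/lrmd "$HB_DIR/lrmd"          # relative link, as in the report
fi
ls -la "$HB_DIR"/lrmd* 2>/dev/null || true        # verify: lrmd -> ../pacemaker/lrmd
```

To revert, remove the symlink and move lrmd.cluster-glue back into place. Note that a dpkg-managed file replaced this way may be overwritten again on the next cluster-glue upgrade.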
--
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to cluster-glue in Ubuntu.
https://bugs.launchpad.net/bugs/1251298

Title:
  Failed to sign on to LRMd with Heartbeat/Pacemaker

Status in cluster-glue package in Ubuntu:
  Confirmed

Bug description:
  I'm running a two-node heartbeat/pacemaker cluster, which was working
  fine with Ubuntu 13.04. After upgrading from Ubuntu 13.04 to Ubuntu
  13.10, Heartbeat/Pacemaker keeps restarting the system: crmd fails to
  sign on to lrmd, and heartbeat reboots the node while trying to
  recover. As one system is already on Ubuntu 13.10 and the other is
  still running 13.04, I tried it without the second node, which leads
  to the same behavior; it occurs before any cluster communication
  happens.

  Syslog:

  Nov 14 15:53:06 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 1 (30 max) times
  Nov 14 15:53:06 wolverine crmd[2464]: notice: crmd_client_status_callback: Status update: Client wolverine.domain.tld/crmd now has status [join] (DC=false)
  Nov 14 15:53:06 wolverine crmd[2464]: notice: crmd_client_status_callback: Status update: Client wolverine.domain.tld/crmd now has status [online] (DC=false)
  Nov 14 15:53:06 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 2 (30 max) times
  Nov 14 15:53:06 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 3 (30 max) times
  Nov 14 15:53:07 wolverine stonith-ng[2462]: notice: setup_cib: Watching for stonith topology changes
  Nov 14 15:53:07 wolverine stonith-ng[2462]: notice: unpack_config: On loss of CCM Quorum: Ignore
  Nov 14 15:53:08 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 4 (30 max) times
  Nov 14 15:53:10 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 5 (30 max) times
  Nov 14 15:53:12 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 6 (30 max) times
  Nov 14 15:53:14 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times
  Nov 14 15:53:16 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 8 (30 max) times
  Nov 14 15:53:18 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 9 (30 max) times
  Nov 14 15:53:20 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 10 (30 max) times
  Nov 14 15:53:22 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 11 (30 max) times
  Nov 14 15:53:24 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 12 (30 max) times
  Nov 14 15:53:26 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 13 (30 max) times
  Nov 14 15:53:28 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 14 (30 max) times
  Nov 14 15:53:30 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 15 (30 max) times
  Nov 14 15:53:32 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 16 (30 max) times
  Nov 14 15:53:34 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 17 (30 max) times
  Nov 14 15:53:36 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 18 (30 max) times
  Nov 14 15:53:38 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 19 (30 max) times
  Nov 14 15:53:40 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 20 (30 max) times
  Nov 14 15:53:42 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 21 (30 max) times
  Nov 14 15:53:44 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 22 (30 max) times
  Nov 14 15:53:46 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 23 (30 max) times
  Nov 14 15:53:48 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 24 (30 max) times
  Nov 14 15:53:50 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 25 (30 max) times
  Nov 14 15:53:52 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 26 (30 max) times
  Nov 14 15:53:54 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 27 (30 max) times
  Nov 14 15:53:56 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 28 (30 max) times
  Nov 14 15:53:58 wolverine crmd[2464]: warning: do_lrm_control: Failed to sign on to the LRM 29 (30 max) times
  Nov 14 15:54:00 wolverine crmd[2464]: error: do_lrm_control: Failed to sign on to the LRM 30 (max) times
  Nov 14 15:54:00 wolverine crmd[2464]: error: do_log: FSA: Input I_ERROR from do_lrm_control() received in state S_STARTING
  Nov 14 15:54:00 wolverine crmd[2464]: notice: do_state_transition: State transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_lrm_control ]
  Nov 14 15:54:00 wolverine crmd[2464]: warning: do_recover: Fast-tracking shutdown in response to errors

  Symlinking lrmd from the pacemaker package solved this problem partly:

  root@wolverine ~ # mv /usr/lib/heartbeat/lrmd{,.cluster-glue}
  root@wolverine ~ # cd /usr/lib/heartbeat/
  root@wolverine /usr/lib/heartbeat # ln -s ../pacemaker/lrmd
  root@wolverine /usr/lib/heartbeat # ls -la lrmd
  lrwxrwxrwx 1 root root 17 Nov 14 16:35 lrmd -> ../pacemaker/lrmd
  root@wolverine /usr/lib/heartbeat # ls -la lrmd*
  lrwxrwxrwx 1 root root    17 Nov 14 16:35 lrmd -> ../pacemaker/lrmd
  -rwxr-xr-x 1 root root 92816 Jul 18 17:55 lrmd.cluster-glue

  Stopping heartbeat will still result in an unexpected reboot:

  Nov 14 16:37:27 wolverine crmd[2259]: notice: process_lrm_event: LRM operation drbd-backup:1_notify_0 (call=45, rc=0, cib-update=0, confirmed=true) ok
  Nov 14 16:37:28 wolverine crmd[2259]: notice: process_lrm_event: LRM operation drbd-rsyslog:1_notify_0 (call=48, rc=0, cib-update=0, confirmed=true) ok
  Nov 14 16:37:28 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2673 failed authorization [no default client auth]
  Nov 14 16:37:28 wolverine heartbeat: [2238]: ERROR: api_process_registration_msg: cannot add client(crm_node)
  Nov 14 16:37:28 wolverine attrd[2258]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd-backup:1 (10000)
  Nov 14 16:37:28 wolverine attrd[2258]: notice: attrd_perform_update: Sent update 10: master-drbd-backup:1=10000
  Nov 14 16:37:28 wolverine crmd[2259]: notice: process_lrm_event: LRM operation drbd-backup:1_monitor_31000 (call=50, rc=0, cib-update=17, confirmed=false) ok
  Nov 14 16:37:29 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2700 failed authorization [no default client auth]
  Nov 14 16:37:29 wolverine heartbeat: [2238]: ERROR: api_process_registration_msg: cannot add client(crm_node)
  Nov 14 16:37:29 wolverine attrd[2258]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd-rsyslog:1 (10000)
  Nov 14 16:37:29 wolverine attrd[2258]: notice: attrd_perform_update: Sent update 13: master-drbd-rsyslog:1=10000
  Nov 14 16:37:29 wolverine crmd[2259]: notice: process_lrm_event: LRM operation drbd-rsyslog:1_monitor_31000 (call=54, rc=0, cib-update=18, confirmed=false) ok
  Nov 14 16:37:59 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2812 failed authorization [no default client auth]
  Nov 14 16:37:59 wolverine heartbeat: [2238]: ERROR: api_process_registration_msg: cannot add client(crm_node)
  Nov 14 16:38:00 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2839 failed authorization [no default client auth]
  Nov 14 16:38:00 wolverine heartbeat: [2238]: ERROR: api_process_registration_msg: cannot add client(crm_node)
  Nov 14 16:38:05 wolverine heartbeat: [2238]: info: killing /usr/lib/heartbeat/crmd process group 2259 with signal 15
  Nov 14 16:38:05 wolverine crmd[2259]: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
  Nov 14 16:38:05 wolverine attrd[2258]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1384443485)
  Nov 14 16:38:05 wolverine attrd[2258]: notice: attrd_perform_update: Sent update 16: shutdown=1384443485
  Nov 14 16:38:06 wolverine crmd[2259]: notice: process_lrm_event: LRM operation drbd-backup:1_notify_0 (call=57, rc=0, cib-update=0, confirmed=true) ok
  Nov 14 16:38:06 wolverine crmd[2259]: notice: process_lrm_event: LRM operation drbd-rsyslog:1_notify_0 (call=59, rc=0, cib-update=0, confirmed=true) ok
  Nov 14 16:38:07 wolverine kernel: [  255.385984] d-con backup: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
  Nov 14 16:38:07 wolverine kernel: [  255.386415] d-con backup: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
  Nov 14 16:38:07 wolverine kernel: [  255.386428] d-con backup: asender terminated
  Nov 14 16:38:07 wolverine kernel: [  255.386438] d-con backup: Terminating drbd_a_backup
  Nov 14 16:38:07 wolverine kernel: [  255.386693] d-con backup: Connection closed
  Nov 14 16:38:07 wolverine kernel: [  255.386716] d-con backup: conn( Disconnecting -> StandAlone )
  Nov 14 16:38:07 wolverine kernel: [  255.386718] d-con backup: receiver terminated
  Nov 14 16:38:07 wolverine kernel: [  255.386722] d-con backup: Terminating drbd_r_backup
  Nov 14 16:38:07 wolverine kernel: [  255.386750] block drbd0: disk( Outdated -> Failed )
  Nov 14 16:38:07 wolverine kernel: [  255.409861] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
  Nov 14 16:38:07 wolverine kernel: [  255.409930] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
  Nov 14 16:38:07 wolverine kernel: [  255.409943] block drbd0: disk( Failed -> Diskless )
  Nov 14 16:38:07 wolverine kernel: [  255.410041] block drbd0: drbd_bm_resize called with capacity == 0
  Nov 14 16:38:07 wolverine kernel: [  255.411773] d-con backup: Terminating drbd_w_backup
  Nov 14 16:38:07 wolverine kernel: [  255.466428] d-con rsyslog: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
  Nov 14 16:38:07 wolverine kernel: [  255.466796] d-con rsyslog: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
  Nov 14 16:38:07 wolverine kernel: [  255.466814] d-con rsyslog: asender terminated
  Nov 14 16:38:07 wolverine kernel: [  255.466832] d-con rsyslog: Terminating drbd_a_rsyslog
  Nov 14 16:38:07 wolverine kernel: [  255.467098] d-con rsyslog: Connection closed
  Nov 14 16:38:07 wolverine kernel: [  255.467121] d-con rsyslog: conn( Disconnecting -> StandAlone )
  Nov 14 16:38:07 wolverine kernel: [  255.467123] d-con rsyslog: receiver terminated
  Nov 14 16:38:07 wolverine kernel: [  255.467128] d-con rsyslog: Terminating drbd_r_rsyslog
  Nov 14 16:38:07 wolverine kernel: [  255.467169] block drbd1: disk( Outdated -> Failed )
  Nov 14 16:38:07 wolverine kernel: [  255.481716] block drbd1: bitmap WRITE of 0 pages took 0 jiffies
  Nov 14 16:38:07 wolverine kernel: [  255.481778] block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
  Nov 14 16:38:07 wolverine kernel: [  255.481791] block drbd1: disk( Failed -> Diskless )
  Nov 14 16:38:07 wolverine kernel: [  255.481881] block drbd1: drbd_bm_resize called with capacity == 0
  Nov 14 16:38:07 wolverine kernel: [  255.482011] d-con rsyslog: Terminating drbd_w_rsyslog
  Nov 14 16:38:07 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2986 failed authorization [no default client auth]
  Nov 14 16:38:07 wolverine heartbeat: [2238]: ERROR: api_process_registration_msg: cannot add client(crm_node)
  Nov 14 16:38:07 wolverine heartbeat: [2238]: WARN: Client [crm_node] pid 2989 failed authorization [no default client auth]
  Nov 14 16:38:07 wolverine heartbeat: [2238]: ERROR: api_process_registration_msg: cannot add client(crm_node)
  Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd-backup:1 (<null>)
  Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent delete 18: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56, attr=master-drbd-backup:1, id=<n/a>, set=(null), section=status
  Nov 14 16:38:07 wolverine crmd[2259]: notice: process_lrm_event: LRM operation drbd-backup:1_stop_0 (call=64, rc=0, cib-update=19, confirmed=true) ok
  Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd-rsyslog:1 (<null>)
  Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent delete 20: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56, attr=master-drbd-rsyslog:1, id=<n/a>, set=(null), section=status
  Nov 14 16:38:07 wolverine crmd[2259]: notice: process_lrm_event: LRM operation drbd-rsyslog:1_stop_0 (call=67, rc=0, cib-update=20, confirmed=true) ok
  Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent delete 22: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56, attr=master-drbd-backup:1, id=<n/a>, set=(null), section=status
  Nov 14 16:38:07 wolverine attrd[2258]: notice: attrd_perform_update: Sent delete 24: node=19f64c15-2545-4b18-8d1a-39d9c3a88a56, attr=master-drbd-rsyslog:1, id=<n/a>, set=(null), section=status
  Nov 14 16:38:08 wolverine crmd[2259]: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
  Nov 14 16:38:08 wolverine crmd[2259]: notice: lrm_state_verify_stopped: Stopped 0 recurring operations at (null) (1274719202 ops remaining)
  Nov 14 16:38:08 wolverine crmd[2259]: notice: do_lrm_control: Disconnected from the LRM
  Nov 14 16:38:08 wolverine ccm: [2254]: info: client (pid=2259) removed from ccm
  Nov 14 16:38:08 wolverine heartbeat: [2238]: EMERG: Rebooting system. Reason: /usr/lib/heartbeat/crmd

  root@wolverine ~ # lsb_release -rd
  Description:    Ubuntu 13.10
  Release:        13.10
  root@wolverine ~ # apt-cache policy cluster-glue
  cluster-glue:
    Installed: 1.0.11+hg2754-1.1
    Candidate: 1.0.11+hg2754-1.1
    Version table:
   *** 1.0.11+hg2754-1.1 0
          500 http://de.archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages
          100 /var/lib/dpkg/status
  root@wolverine ~ # apt-cache policy heartbeat
  heartbeat:
    Installed: 1:3.0.5-3.1ubuntu1
    Candidate: 1:3.0.5-3.1ubuntu1
    Version table:
   *** 1:3.0.5-3.1ubuntu1 0
          500 http://de.archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages
          100 /var/lib/dpkg/status
  root@wolverine ~ # apt-cache policy pacemaker
  pacemaker:
    Installed: 1.1.10+git20130802-1ubuntu1
    Candidate: 1.1.10+git20130802-1ubuntu1
    Version table:
   *** 1.1.10+git20130802-1ubuntu1 0
          500 http://de.archive.ubuntu.com/ubuntu/ saucy/main amd64 Packages
          100 /var/lib/dpkg/status

  Expected:
  - Working heartbeat/pacemaker setup after the Ubuntu upgrade

  What happened:
  - System reboots after about one minute due to heartbeat recovery attempts

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cluster-glue/+bug/1251298/+subscriptions
_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp

