On Fri, Mar 24, 2017 at 2:10 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 03/24/2017 03:52 PM, Digimer wrote:
> > On 24/03/17 04:44 PM, Seth Reid wrote:
> >> I have a three-node Pacemaker/GFS2 cluster on Ubuntu 16.04. It's not in
> >> production yet because I'm having a problem during fencing. When I
> >> disable the network interface of any one machine, the disabled machine
> >> is properly fenced, leaving me, briefly, with a two-node cluster. A
> >> second node is then fenced off immediately, and the remaining node
> >> appears to try to fence itself off. This leaves two nodes with
> >> corosync/pacemaker stopped, and the remaining machine still in the
> >> cluster but showing an offline node and an UNCLEAN node. What can be
> >> causing this behavior?
> >
> > It looks like the fence attempt failed, leaving the cluster hung. When
> > you say all nodes were fenced, did all nodes actually reboot? Or did the
> > two surviving nodes just lock up? If the latter, then that is the proper
> > response to a failed fence (DLM stays blocked).
>
> See comments inline ...
>
> >> Each machine has a dedicated network interface for the cluster, and
> >> there is a VLAN on the switch devoted to just these interfaces.
> >> In the following, I disabled the interface on node id 2 (b014). Node 1
> >> (b013) is fenced as well. Node 3 (b015) is still up.
> >>
> >> Logs from b013:
> >> Mar 24 16:35:01 b013 CRON[19133]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> >> Mar 24 16:35:13 b013 corosync[2134]: notice  [TOTEM ] A processor failed, forming new configuration.
> >> Mar 24 16:35:13 b013 corosync[2134]:  [TOTEM ] A processor failed, forming new configuration.
> >> Mar 24 16:35:17 b013 corosync[2134]: notice  [TOTEM ] A new membership (192.168.100.13:576) was formed. Members left: 2
> >> Mar 24 16:35:17 b013 corosync[2134]: notice  [TOTEM ] Failed to receive the leave message. failed: 2
> >> Mar 24 16:35:17 b013 corosync[2134]:  [TOTEM ] A new membership (192.168.100.13:576) was formed. Members left: 2
> >> Mar 24 16:35:17 b013 corosync[2134]:  [TOTEM ] Failed to receive the leave message. failed: 2
> >> Mar 24 16:35:17 b013 attrd[2223]: notice: crm_update_peer_proc: Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 cib[2220]: notice: crm_update_peer_proc: Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 cib[2220]: notice: Removing b014-cl/2 from the membership list
> >> Mar 24 16:35:17 b013 cib[2220]: notice: Purged 1 peers with id=2 and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b013 pacemakerd[2187]: notice: crm_reap_unseen_nodes: Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 attrd[2223]: notice: Removing b014-cl/2 from the membership list
> >> Mar 24 16:35:17 b013 attrd[2223]: notice: Purged 1 peers with id=2 and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b013 stonith-ng[2221]: notice: crm_update_peer_proc: Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 stonith-ng[2221]: notice: Removing b014-cl/2 from the membership list
> >> Mar 24 16:35:17 b013 stonith-ng[2221]: notice: Purged 1 peers with id=2 and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b013 dlm_controld[2727]: 3091 fence request 2 pid 19223 nodedown time 1490387717 fence_all dlm_stonith
> >> Mar 24 16:35:17 b013 kernel: [ 3091.800118] dlm: closing connection to node 2
> >> Mar 24 16:35:17 b013 crmd[2227]: notice: crm_reap_unseen_nodes: Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 dlm_stonith: stonith_api_time: Found 0 entries for 2/(null): 0 in progress, 0 completed
> >> Mar 24 16:35:18 b013 stonith-ng[2221]: notice: Operation reboot of b014-cl by b015-cl for stonith-api.19223@b013-cl.7aeb2ffb: OK
> >> Mar 24 16:35:18 b013 stonith-api[19223]: stonith_api_kick: Node 2/(null) kicked: reboot
>
> It looks like the fencing of b014-cl is reported as successful above ...
>
> >> Mar 24 16:35:18 b013 kernel: [ 3092.421495] dlm: closing connection to node 3
> >> Mar 24 16:35:18 b013 kernel: [ 3092.422246] dlm: closing connection to node 1
> >> Mar 24 16:35:18 b013 dlm_controld[2727]: 3092 abandoned lockspace share_data
> >> Mar 24 16:35:18 b013 dlm_controld[2727]: 3092 abandoned lockspace clvmd
> >> Mar 24 16:35:18 b013 kernel: [ 3092.426545] dlm: dlm user daemon left 2 lockspaces
> >> Mar 24 16:35:18 b013 systemd[1]: corosync.service: Main process exited, code=exited, status=255/n/a
>
> ... but then DLM and corosync exit on this node. Pacemaker can only
> exit, and the node gets fenced.
>
> What does your fencing configuration look like?

This is the command I used. b013-cl, for example, is a hosts file entry, so that the cluster uses only the cluster-only network interface.

pcs stonith create fence_wh fence_scsi \
    debug="/var/log/cluster/fence-debug.log" \
    vgs_path="/sbin/vgs" \
    sg_persist_path="/usr/bin/sg_persist" \
    sg_turs_path="/usr/bin/sg_turs" \
    pcmk_reboot_action="off" \
    pcmk_host_list="b013-cl b014-cl b015-cl" \
    pcmk_monitor_action="metadata" \
    meta provides="unfencing" --force

I got the pcmk_monitor_action, pcmk_host_list, pcmk_reboot_action, and --force from various Red Hat articles. I've tried getting fencing to start without these, and it doesn't work.

> >> Mar 24 16:35:18 b013 cib[2220]: error: Connection to the CPG API failed: Library error (2)
> >> Mar 24 16:35:18 b013 systemd[1]: corosync.service: Unit entered failed state.
> >> Mar 24 16:35:18 b013 attrd[2223]: error: Connection to cib_rw failed
> >> Mar 24 16:35:18 b013 systemd[1]: corosync.service: Failed with result 'exit-code'.
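As a side note, a fence_scsi setup like the command above can be sanity-checked from outside Pacemaker. This is only a sketch: the /dev/mapper/shared-disk path below is a placeholder for the actual shared LUN.

```shell
# Show which options this fence_scsi build actually supports
# (useful for spotting options the agent silently ignores):
pcs stonith describe fence_scsi

# List the SCSI-3 persistent-reservation keys registered on the shared
# disk; after unfencing, every cluster node should have its own key
# (placeholder device path -- substitute your shared LUN):
sg_persist --in --read-keys --device=/dev/mapper/shared-disk

# Show the current reservation holder:
sg_persist --in --read-reservation --device=/dev/mapper/shared-disk
```

If a node that was reported as fenced still has its key registered, the fencing did not actually cut it off from storage.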
> >> Mar 24 16:35:18 b013 attrd[2223]: error: Connection to cib_rw[0x560754147990] closed (I/O condition=17)
> >> Mar 24 16:35:18 b013 systemd[1]: pacemaker.service: Main process exited, code=exited, status=107/n/a
> >> Mar 24 16:35:18 b013 pacemakerd[2187]: error: Connection to the CPG API failed: Library error (2)
> >> Mar 24 16:35:18 b013 systemd[1]: pacemaker.service: Unit entered failed state.
> >> Mar 24 16:35:18 b013 attrd[2223]: notice: Disconnecting client 0x560754149000, pid=2227...
> >> Mar 24 16:35:18 b013 systemd[1]: pacemaker.service: Failed with result 'exit-code'.
> >> Mar 24 16:35:18 b013 lrmd[2222]: warning: new_event_notification (2222-2227-8): Bad file descriptor (9)
> >> Mar 24 16:35:18 b013 stonith-ng[2221]: error: Connection to cib_rw failed
> >> Mar 24 16:35:18 b013 stonith-ng[2221]: error: Connection to cib_rw[0x5579c03ecdd0] closed (I/O condition=17)
> >> Mar 24 16:35:18 b013 lrmd[2222]: error: Connection to stonith-ng failed
> >> Mar 24 16:35:18 b013 lrmd[2222]: error: Connection to stonith-ng[0x55888c8ef820] closed (I/O condition=17)
> >> Mar 24 16:37:02 b013 kernel: [ 3196.469475] dlm: node 0: socket error sending to node 2, port 21064, sk_err=113/113
> >> Mar 24 16:37:02 b013 kernel: [ 3196.470675] dlm: node 0: socket error sending to node 2, port 21064, sk_err=113/113
> >> Mar 24 16:37:46 b013 kernel: [ 3240.833544] INFO: task gfs2_quotad:3054 blocked for more than 120 seconds.
> >> Mar 24 16:37:46 b013 kernel: [ 3240.834565]       Not tainted 4.4.0-66-generic #87-Ubuntu
> >> Mar 24 16:37:46 b013 kernel: [ 3240.835413] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836656] gfs2_quotad     D ffff880fd747fa38     0  3054      2 0x00000000
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836663] ffff880fd747fa38 00000001d8144018 ffff880fd975f2c0 ffff880fd7a972c0
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836666] ffff880fd7480000 ffff887fd81447b8 ffff887fd81447d0 ffff881fd7af00b0
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836669] 0000000000000004 ffff880fd747fa50 ffffffff818384d5 ffff880fd7a972c0
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836672] Call Trace:
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836688] [<ffffffff818384d5>] schedule+0x35/0x80
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836695] [<ffffffff8183b380>] rwsem_down_read_failed+0xe0/0x140
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836701] [<ffffffff81406574>] call_rwsem_down_read_failed+0x14/0x30
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836704] [<ffffffff8183a920>] ? down_read+0x20/0x30
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836726] [<ffffffffc0583324>] dlm_lock+0x84/0x1f0 [dlm]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836731] [<ffffffff810b57e3>] ? check_preempt_wakeup+0x193/0x220
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836755] [<ffffffffc06a5da0>] ? gdlm_recovery_result+0x130/0x130 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836764] [<ffffffffc06a5050>] ? gdlm_cancel+0x30/0x30 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836769] [<ffffffff810ab579>] ? ttwu_do_wakeup+0x19/0xe0
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836779] [<ffffffffc06a5499>] gdlm_lock+0x1d9/0x300 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836788] [<ffffffffc06a5050>] ? gdlm_cancel+0x30/0x30 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836798] [<ffffffffc06a5da0>] ? gdlm_recovery_result+0x130/0x130 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836807] [<ffffffffc0686e5f>] do_xmote+0x16f/0x290 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836816] [<ffffffffc068705c>] run_queue+0xdc/0x2d0 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836824] [<ffffffffc06875ef>] gfs2_glock_nq+0x20f/0x410 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836834] [<ffffffffc06a2006>] gfs2_statfs_sync+0x76/0x1c0 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836841] [<ffffffff810ed018>] ? del_timer_sync+0x48/0x50
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836851] [<ffffffffc06a1ffc>] ? gfs2_statfs_sync+0x6c/0x1c0 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836861] [<ffffffffc0697fe3>] quotad_check_timeo.part.18+0x23/0x80 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836871] [<ffffffffc069ad01>] gfs2_quotad+0x241/0x2d0 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836876] [<ffffffff810c41e0>] ? wake_atomic_t_function+0x60/0x60
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836886] [<ffffffffc069aac0>] ? gfs2_wake_up_statfs+0x40/0x40 [gfs2]
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836890] [<ffffffff810a0ba8>] kthread+0xd8/0xf0
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836893] [<ffffffff810a0ad0>] ? kthread_create_on_node+0x1e0/0x1e0
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836897] [<ffffffff8183c98f>] ret_from_fork+0x3f/0x70
> >> Mar 24 16:37:46 b013 kernel: [ 3240.836900] [<ffffffff810a0ad0>] ? kthread_create_on_node+0x1e0/0x1e0
> >>
> >> Logs from b015:
> >> Mar 24 16:35:01 b015 CRON[19781]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> >> Mar 24 16:35:13 b015 corosync[2105]: notice  [TOTEM ] A processor failed, forming new configuration.
> >> Mar 24 16:35:13 b015 corosync[2105]:  [TOTEM ] A processor failed, forming new configuration.
> >> Mar 24 16:35:17 b015 corosync[2105]: notice  [TOTEM ] A new membership (192.168.100.13:576) was formed. Members left: 2
> >> Mar 24 16:35:17 b015 corosync[2105]: notice  [TOTEM ] Failed to receive the leave message. failed: 2
> >> Mar 24 16:35:17 b015 corosync[2105]:  [TOTEM ] A new membership (192.168.100.13:576) was formed. Members left: 2
> >> Mar 24 16:35:17 b015 corosync[2105]:  [TOTEM ] Failed to receive the leave message. failed: 2
> >> Mar 24 16:35:17 b015 attrd[2253]: notice: crm_update_peer_proc: Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b015 attrd[2253]: notice: Removing b014-cl/2 from the membership list
> >> Mar 24 16:35:17 b015 attrd[2253]: notice: Purged 1 peers with id=2 and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b015 stonith-ng[2251]: notice: crm_update_peer_proc: Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b015 stonith-ng[2251]: notice: Removing b014-cl/2 from the membership list
> >> Mar 24 16:35:17 b015 cib[2249]: notice: crm_update_peer_proc: Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b015 crmd[2255]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> >> Mar 24 16:35:17 b015 kernel: [ 3478.622093] dlm: closing connection to node 2
> >> Mar 24 16:35:17 b015 stonith-ng[2251]: notice: Purged 1 peers with id=2 and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b015 cib[2249]: notice: Removing b014-cl/2 from the membership list
> >> Mar 24 16:35:17 b015 cib[2249]: notice: Purged 1 peers with id=2 and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b015 crmd[2255]: notice: crm_reap_unseen_nodes: Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b015 pacemakerd[2159]: notice: crm_reap_unseen_nodes: Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:18 b015 systemd[1]: dev-disk-by\x2did-scsi\x2d36782bcb0007085a70000081958aee1ff.device: Dev dev-disk-by\x2did-scsi\x2d36782bcb0007085a70000081958aee1ff.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.0/0000:08:00.0/host7/port-7:0/end_device-7:0/target7:0:0/7:0:0:0/block/sdc and /sys/devices/virtual/block/dm-0
> >> Mar 24 16:35:18 b015 systemd[1]: dev-disk-by\x2did-wwn\x2d0x6782bcb0007085a70000081958aee1ff.device: Dev dev-disk-by\x2did-wwn\x2d0x6782bcb0007085a70000081958aee1ff.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.0/0000:08:00.0/host7/port-7:0/end_device-7:0/target7:0:0/7:0:0:0/block/sdc and /sys/devices/virtual/block/dm-0
> >> Mar 24 16:35:18 b015 systemd[1]: dev-disk-by\x2did-scsi\x2d36782bcb0007085a70000081958aee1ff.device: Dev dev-disk-by\x2did-scsi\x2d36782bcb0007085a70000081958aee1ff.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.0/0000:08:00.0/host7/port-7:0/end_device-7:0/target7:0:0/7:0:0:0/block/sdc and /sys/devices/virtual/block/dm-0
> >> Mar 24 16:35:18 b015 systemd[1]: dev-disk-by\x2did-wwn\x2d0x6782bcb0007085a70000081958aee1ff.device: Dev dev-disk-by\x2did-wwn\x2d0x6782bcb0007085a70000081958aee1ff.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.0/0000:08:00.0/host7/port-7:0/end_device-7:0/target7:0:0/7:0:0:0/block/sdc and /sys/devices/virtual/block/dm-0
> >> Mar 24 16:35:18 b015 stonith-ng[2251]: warning: fence_scsi[19818] stderr: [ WARNING:root:Parse error: Ignoring unknown option 'port=b014-cl' ]
> >> Mar 24 16:35:18 b015 stonith-ng[2251]: warning: fence_scsi[19818] stderr: [ ]
> >> Mar 24 16:35:18 b015 stonith-ng[2251]: notice: Operation 'reboot' [19818] (call 2 from stonith-api.19223) for host 'b014-cl' with device 'fence_wh' returned: 0 (OK)
> >> Mar 24 16:35:18 b015 stonith-ng[2251]: notice: Operation reboot of b014-cl by b015-cl for stonith-api.19223@b013-cl.7aeb2ffb: OK
> >> Mar 24 16:35:18 b015 dlm_controld[2656]: 3479 fence request 2 pid 19880 nodedown time 1490387717 fence_all dlm_stonith
> >> Mar 24 16:35:18 b015 dlm_controld[2656]: 3479 tell corosync to remove nodeid 1 from cluster
> >> Mar 24 16:35:18 b015 systemd[1]: dev-disk-by\x2did-scsi\x2d36782bcb0007085a70000081958aee1ff.device: Dev dev-disk-by\x2did-scsi\x2d36782bcb0007085a70000081958aee1ff.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.0/0000:08:00.0/host7/port-7:0/end_device-7:0/target7:0:0/7:0:0:0/block/sdc and /sys/devices/virtual/block/dm-0
> >> Mar 24 16:35:18 b015 systemd[1]: dev-disk-by\x2did-wwn\x2d0x6782bcb0007085a70000081958aee1ff.device: Dev dev-disk-by\x2did-wwn\x2d0x6782bcb0007085a70000081958aee1ff.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.0/0000:08:00.0/host7/port-7:0/end_device-7:0/target7:0:0/7:0:0:0/block/sdc and /sys/devices/virtual/block/dm-0
> >> Mar 24 16:35:18 b015 dlm_controld[2656]: 3479 tell corosync to remove nodeid 1 from cluster
> >> Mar 24 16:35:18 b015 dlm_stonith: stonith_api_time: Found 2 entries for 2/(null): 0 in progress, 2 completed
> >> Mar 24 16:35:18 b015 dlm_stonith: stonith_api_time: Node 2/(null) last kicked at: 1490387718
> >> Mar 24 16:35:18 b015 kernel: [ 3479.266118] dlm: closing connection to node 1
> >> Mar 24 16:35:18 b015 kernel: [ 3479.266270] dlm: closing connection to node 3
> >> Mar 24 16:35:18 b015 dlm_controld[2656]: 3479 abandoned lockspace share_data
> >> Mar 24 16:35:18 b015 dlm_controld[2656]: 3479 abandoned lockspace clvmd
> >> Mar 24 16:35:18 b015 kernel: [ 3479.268325] dlm: dlm user daemon left 2 lockspaces
> >>
> >> Mar 24 16:35:21 b015 corosync[2105]: notice  [TOTEM ] A processor failed, forming new configuration.
> >> Mar 24 16:35:21 b015 corosync[2105]:  [TOTEM ] A processor failed, forming new configuration.
> >> Mar 24 16:35:26 b015 corosync[2105]: notice  [TOTEM ] A new membership (192.168.100.15:580) was formed. Members left: 1
> >> Mar 24 16:35:26 b015 corosync[2105]: notice  [TOTEM ] Failed to receive the leave message. failed: 1
> >> Mar 24 16:35:26 b015 corosync[2105]:  [TOTEM ] A new membership (192.168.100.15:580) was formed. Members left: 1
> >> Mar 24 16:35:26 b015 corosync[2105]:  [TOTEM ] Failed to receive the leave message. failed: 1
> >> Mar 24 16:35:26 b015 attrd[2253]: notice: crm_update_peer_proc: Node b013-cl[1] - state is now lost (was member)
> >> Mar 24 16:35:26 b015 attrd[2253]: notice: Removing b013-cl/1 from the membership list
> >> Mar 24 16:35:26 b015 stonith-ng[2251]: notice: crm_update_peer_proc: Node b013-cl[1] - state is now lost (was member)
> >> Mar 24 16:35:26 b015 attrd[2253]: notice: Purged 1 peers with id=1 and/or uname=b013-cl from the membership cache
> >> Mar 24 16:35:26 b015 stonith-ng[2251]: notice: Removing b013-cl/1 from the membership list
> >> Mar 24 16:35:26 b015 pacemakerd[2159]: notice: Membership 580: quorum lost (1)
> >> Mar 24 16:35:26 b015 cib[2249]: notice: crm_update_peer_proc: Node b013-cl[1] - state is now lost (was member)
> >> Mar 24 16:35:26 b015 stonith-ng[2251]: notice: Purged 1 peers with id=1 and/or uname=b013-cl from the membership cache
> >> Mar 24 16:35:26 b015 pacemakerd[2159]: notice: crm_reap_unseen_nodes: Node b013-cl[1] - state is now lost (was member)
> >> Mar 24 16:35:26 b015 cib[2249]: notice: Removing b013-cl/1 from the membership list
> >> Mar 24 16:35:26 b015 cib[2249]: notice: Purged 1 peers with id=1 and/or uname=b013-cl from the membership cache
> >> Mar 24 16:35:26 b015 crmd[2255]: notice: Membership 580: quorum lost (1)
> >> Mar 24 16:35:26 b015 crmd[2255]: notice: crm_reap_unseen_nodes: Node b013-cl[1] - state is now lost (was member)
> >> Mar 24 16:35:26 b015 pengine[2254]: notice: We do not have quorum - fencing and resource management disabled
> >> Mar 24 16:35:26 b015 pengine[2254]: warning: Node b013-cl is unclean because the node is no longer part of the cluster
> >> Mar 24 16:35:26 b015 pengine[2254]: warning: Node b013-cl is unclean
> >> Mar 24 16:35:26 b015 pengine[2254]: warning: Action dlm:1_stop_0 on b013-cl is unrunnable (offline)
> >> Mar 24 16:35:26 b015 pengine[2254]: warning: Action dlm:1_stop_0 on b013-cl is unrunnable (offline)
> >> Mar 24 16:35:26 b015 pengine[2254]: warning: Action clvmd:1_stop_0 on b013-cl is unrunnable (offline)
> >> Mar 24 16:35:26 b015 pengine[2254]: warning: Action clvmd:1_stop_0 on b013-cl is unrunnable (offline)
> >> Mar 24 16:35:26 b015 pengine[2254]: warning: Action gfs2share:1_stop_0 on b013-cl is unrunnable (offline)
> >> Mar 24 16:35:26 b015 pengine[2254]: warning: Action gfs2share:1_stop_0 on b013-cl is unrunnable (offline)
> >> Mar 24 16:35:26 b015 pengine[2254]: warning: Node b013-cl is unclean!
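The "quorum lost / fencing and resource management disabled" messages above are where the surviving node gives up. For DLM/GFS2 clusters, the commonly documented recommendation is to freeze rather than stop resources on quorum loss, so DLM state is preserved until quorum returns. A sketch (verify against your distribution's documentation):

```shell
# Freeze resource management instead of stopping resources when
# quorum is lost (usual recommendation for GFS2/DLM setups):
pcs property set no-quorum-policy=freeze

# Confirm the property took effect:
pcs property list | grep no-quorum-policy
```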
> >> Mar 24 16:35:26 b015 pengine[2254]: notice: Cannot fence unclean nodes until quorum is attained (or no-quorum-policy is set to ignore)
> >> Mar 24 16:35:26 b015 pengine[2254]: notice: Start   fence_wh    (b015-cl - blocked)
> >> Mar 24 16:35:26 b015 pengine[2254]: notice: Stop    dlm:1       (b013-cl - blocked)
> >> Mar 24 16:35:26 b015 pengine[2254]: notice: Stop    clvmd:1     (b013-cl - blocked)
> >> Mar 24 16:35:26 b015 pengine[2254]: notice: Stop    gfs2share:1 (b013-cl - blocked)
> >> Mar 24 16:35:26 b015 pengine[2254]: warning: Calculated Transition 9: /var/lib/pacemaker/pengine/pe-warn-2669.bz2
> >> Mar 24 16:35:26 b015 crmd[2255]: notice: Transition 9 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-2669.bz2): Complete
> >> Mar 24 16:35:26 b015 crmd[2255]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >> Mar 24 16:35:31 b015 controld(dlm)[20000]: ERROR: Uncontrolled lockspace exists, system must reboot. Executing suicide fencing
> >> Mar 24 16:35:31 b015 fence_scsi: Failed: keys cannot be same. You can not fence yourself.
> >> Mar 24 16:35:31 b015 fence_scsi: Please use '-h' for usage
> >> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] stderr: [ WARNING:root:Parse error: Ignoring unknown option 'port=b015-cl' ]
> >> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] stderr: [ ]
> >> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] stderr: [ ERROR:root:Failed: keys cannot be same. You can not fence yourself. ]
> >> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] stderr: [ ]
> >> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] stderr: [ Failed: keys cannot be same. You can not fence yourself. ]
> >> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] stderr: [ ]
> >> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] stderr: [ ERROR:root:Please use '-h' for usage ]
> >> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] stderr: [ ]
> >> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] stderr: [ Please use '-h' for usage ]
> >> Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] stderr: [ ]
> >>
> >> Software versions:
> >> corosync 2.3.5-3ubuntu1
> >> pacemaker-common 1.1.14-2ubuntu1.1
> >> pcs 0.9.149-1ubuntu1
> >> libqb0:amd64 1.0-1ubuntu1
> >> gfs2-utils 3.1.6-0ubuntu3
> >>
> >> -------
> >> Seth Reid
> >> System Operations Engineer
> >> Vendini, Inc.
> >> 415.349.7736
> >> sr...@vendini.com
> >> www.vendini.com
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
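A follow-up note on the fence_scsi stderr quoted above: the agent warns that it is ignoring 'port=b014-cl', and when b015 tries to fence itself it fails with "keys cannot be same". One thing worth trying (a hedged sketch only; the device path is a placeholder for your shared LUN) is giving the agent an explicit device list instead of relying on auto-discovery, then driving a test fence through Pacemaker from a surviving node and checking that exactly one node loses storage access:

```shell
# Pin fence_scsi to the shared device(s) explicitly
# (placeholder path -- substitute your actual shared LUN):
pcs stonith update fence_wh devices="/dev/mapper/shared-disk"

# Ask Pacemaker to fence a peer and watch the logs on the survivors:
stonith_admin --reboot b014-cl
```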