Have you checked this article: Using SCSI Persistent Reservation Fencing (fence_scsi) with pacemaker in a Red Hat High Availability cluster - Red Hat Customer Portal
Have you checked if your storage supports persistent reservations?

Best Regards,
Strahil Nikolov

On Wednesday, 30 October 2019, 8:42:16 GMT-4, RAM PRASAD TWISTED ILLUSIONS <ramd...@gmail.com> wrote:

Hi everyone,

I am trying to set up a storage cluster with two nodes, both running Debian buster. The two nodes, called duke and miles, share a LUN residing on a SAN box as their shared storage device. As you can see in the output of pcs status, all the daemons are active and I can get the nodes online without any issues. However, I cannot get the fencing resources to start.

These two nodes were running Debian jessie before and had access to the same LUN in a storage cluster configuration. Now I am trying to recreate a similar setup with both nodes running the latest Debian. I am not sure if this is relevant, but this LUN already has a shared VG with data on it. Could this be the cause of the trouble? Should I be creating my stonith device on a different/fresh LUN?
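Strahil's question about persistent reservations can be checked from the nodes themselves. Below is a minimal sketch, assuming the sg3_utils package (which provides sg_persist) is installed on duke and miles; the helper name check_pr_support is hypothetical. It only issues PERSISTENT RESERVE IN commands, which read state and do not change the reservations or the data in the existing VG.

```bash
#!/usr/bin/env bash
# Sketch: check whether a LUN answers SCSI-3 Persistent Reservation (PR)
# commands, which fence_scsi depends on. Assumes sg3_utils (sg_persist)
# is installed. check_pr_support is a hypothetical helper, not a real tool.

check_pr_support() {
  local dev="$1"
  if [ -z "$dev" ]; then
    echo "usage: check_pr_support DEVICE" >&2
    return 2
  fi
  if ! command -v sg_persist >/dev/null 2>&1; then
    echo "sg_persist not found (install sg3_utils)" >&2
    return 2
  fi

  # PERSISTENT RESERVE IN / REPORT CAPABILITIES: an error here suggests the
  # target (or the path to it) does not support persistent reservations.
  if ! sg_persist --in --no-inquiry --report-capabilities --device="$dev"; then
    echo "$dev: REPORT CAPABILITIES failed" >&2
    return 1
  fi

  # PERSISTENT RESERVE IN / READ KEYS: lists the registration keys on the LUN.
  # After successful unfencing, a key per cluster node would be expected.
  sg_persist --in --no-inquiry --read-keys --device="$dev"
}
```

Usage: source the file on each node and run, for example, `check_pr_support /dev/disk/by-id/wwn-0x600c0ff0001e8e3c89601b5801000000`, comparing the results from duke and miles.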
#######
pcs status
Cluster name: jazz
Stack: corosync
Current DC: duke (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Wed Oct 30 11:58:19 2019
Last change: Wed Oct 30 11:28:28 2019 by root via cibadmin on duke

2 nodes configured
2 resources configured

Online: [ duke miles ]

Full list of resources:

 fence_duke (stonith:fence_scsi): Stopped
 fence_miles (stonith:fence_scsi): Stopped

Failed Fencing Actions:
* unfencing of duke failed: delegate=, client=pacemaker-controld.1703, origin=duke, last-failed='Wed Oct 30 11:43:29 2019'
* unfencing of miles failed: delegate=, client=pacemaker-controld.1703, origin=duke, last-failed='Wed Oct 30 11:43:29 2019'

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
#######

I used the following commands to add the two fencing devices and set their location constraints.

#######
sudo pcs cluster cib test_cib_cfg
pcs -f test_cib_cfg stonith create fence_duke fence_scsi pcmk_host_list=duke pcmk_reboot_action="off" devices="/dev/disk/by-id/wwn-0x600c0ff0001e8e3c89601b5801000000" meta provides="unfencing"
pcs -f test_cib_cfg stonith create fence_miles fence_scsi pcmk_host_list=miles pcmk_reboot_action="off" devices="/dev/disk/by-id/wwn-0x600c0ff0001e8e3c89601b5801000000" delay=15 meta provides="unfencing"
pcs -f test_cib_cfg constraint location fence_duke avoids duke=INFINITY
pcs -f test_cib_cfg constraint location fence_miles avoids miles=INFINITY
pcs cluster cib-push test_cib_cfg
#######

Here is the output in /var/log/pacemaker/pacemaker.log after adding the fencing resources:

Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (determine_online_status_fencing) info: Node miles is active
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (determine_online_status) info: Node miles is online
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (determine_online_status_fencing) info: Node duke is active
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (determine_online_status) info: Node duke is online
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (unpack_node_loop) info: Node 2 is already processed
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (unpack_node_loop) info: Node 1 is already processed
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (unpack_node_loop) info: Node 2 is already processed
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (unpack_node_loop) info: Node 1 is already processed
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (common_print) info: fence_duke (stonith:fence_scsi): Stopped
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (common_print) info: fence_miles (stonith:fence_scsi): Stopped
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (RecurringOp) info: Start recurring monitor (60s) for fence_duke on miles
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (RecurringOp) info: Start recurring monitor (60s) for fence_miles on duke
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (LogNodeActions) notice: * Fence (on) miles 'required by fence_duke monitor'
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (LogNodeActions) notice: * Fence (on) duke 'required by fence_duke monitor'
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (LogAction) notice: * Start fence_duke ( miles )
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (LogAction) notice: * Start fence_miles ( duke )
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (process_pe_message) notice: Calculated transition 63, saving inputs in /var/lib/pacemaker/pengine/pe-input-23.bz2
Oct 30 12:06:02 duke pacemaker-controld [1703] (do_state_transition) info: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Oct 30 12:06:02 duke pacemaker-controld [1703] (do_te_invoke) info: Processing graph 63 (ref=pe_calc-dc-1572433562-101) derived from /var/lib/pacemaker/pengine/pe-input-23.bz2
Oct 30 12:06:02 duke pacemaker-controld [1703] (te_fence_node) notice: Requesting fencing (on) of node miles | action=5 timeout=60000
Oct 30 12:06:02 duke pacemaker-controld [1703] (te_fence_node) notice: Requesting fencing (on) of node duke | action=2 timeout=60000
Oct 30 12:06:02 duke pacemaker-fenced [1699] (handle_request) notice: Client pacemaker-controld.1703.470f8b4e wants to fence (on) 'miles' with device '(any)'
Oct 30 12:06:02 duke pacemaker-fenced [1699] (initiate_remote_stonith_op) notice: Requesting peer fencing (on) of miles | id=a0ac6e3a-0296-4aff-85e3-c591f75f38d3 state=0
Oct 30 12:06:02 duke pacemaker-fenced [1699] (handle_request) notice: Client pacemaker-controld.1703.470f8b4e wants to fence (on) 'duke' with device '(any)'
Oct 30 12:06:02 duke pacemaker-fenced [1699] (initiate_remote_stonith_op) notice: Requesting peer fencing (on) of duke | id=261d9311-0553-48ff-864f-41d53d12b152 state=0
Oct 30 12:06:02 duke pacemaker-fenced [1699] (can_fence_host_with_device) notice: fence_miles can not fence (on) duke: static-list
Oct 30 12:06:02 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: Query result 1 of 2 from duke for miles/on (0 devices) a0ac6e3a-0296-4aff-85e3-c591f75f38d3
Oct 30 12:06:02 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: Query result 1 of 2 from duke for duke/on (0 devices) 261d9311-0553-48ff-864f-41d53d12b152
Oct 30 12:06:02 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: Query result 2 of 2 from miles for miles/on (0 devices) a0ac6e3a-0296-4aff-85e3-c591f75f38d3
Oct 30 12:06:02 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: All query replies have arrived, continuing (2 expected/2 received)
Oct 30 12:06:02 duke pacemaker-fenced [1699] (stonith_choose_peer) notice: Couldn't find anyone to fence (on) miles with any device
Oct 30 12:06:02 duke pacemaker-fenced [1699] (call_remote_stonith) info: Total timeout set to 60 for peer's fencing of miles for pacemaker-controld.1703|id=a0ac6e3a-0296-4aff-85e3-c591f75f38d3
Oct 30 12:06:02 duke pacemaker-fenced [1699] (call_remote_stonith) info: No peers (out of 2) have devices capable of fencing (on) miles for pacemaker-controld.1703 (0)
Oct 30 12:06:02 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: Query result 2 of 2 from miles for duke/on (0 devices) 261d9311-0553-48ff-864f-41d53d12b152
Oct 30 12:06:02 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: All query replies have arrived, continuing (2 expected/2 received)
Oct 30 12:06:02 duke pacemaker-fenced [1699] (stonith_choose_peer) notice: Couldn't find anyone to fence (on) duke with any device
Oct 30 12:06:02 duke pacemaker-fenced [1699] (call_remote_stonith) info: Total timeout set to 60 for peer's fencing of duke for pacemaker-controld.1703|id=261d9311-0553-48ff-864f-41d53d12b152
Oct 30 12:06:02 duke pacemaker-fenced [1699] (call_remote_stonith) info: No peers (out of 2) have devices capable of fencing (on) duke for pacemaker-controld.1703 (0)
Oct 30 12:06:02 duke pacemaker-fenced [1699] (remote_op_done) error: Operation on of miles by <no-one> for pacemaker-controld.1703@duke.a0ac6e3a: No such device
Oct 30 12:06:02 duke pacemaker-controld [1703] (tengine_stonith_callback) notice: Stonith operation 15/5:63:0:5e3e0ef6-02a5-4f9a-b999-806413a3da12: No such device (-19)
Oct 30 12:06:02 duke pacemaker-controld [1703] (tengine_stonith_callback) notice: Stonith operation 15 for miles failed (No such device): aborting transition.
Oct 30 12:06:02 duke pacemaker-controld [1703] (tengine_stonith_callback) warning: No devices found in cluster to fence miles, giving up
Oct 30 12:06:02 duke pacemaker-controld [1703] (abort_transition_graph) notice: Transition 63 aborted: Stonith failed | source=abort_for_stonith_failure:776 complete=false
Oct 30 12:06:02 duke pacemaker-fenced [1699] (remote_op_done) error: Operation on of duke by <no-one> for pacemaker-controld.1703@duke.261d9311: No such device
Oct 30 12:06:02 duke pacemaker-controld [1703] (tengine_stonith_notify) error: Unfencing of miles by <anyone> failed: No such device (-19)
Oct 30 12:06:02 duke pacemaker-controld [1703] (tengine_stonith_callback) notice: Stonith operation 16/2:63:0:5e3e0ef6-02a5-4f9a-b999-806413a3da12: No such device (-19)
Oct 30 12:06:02 duke pacemaker-controld [1703] (tengine_stonith_callback) notice: Stonith operation 16 for duke failed (No such device): aborting transition.
Oct 30 12:06:02 duke pacemaker-controld [1703] (tengine_stonith_callback) warning: No devices found in cluster to fence duke, giving up
Oct 30 12:06:02 duke pacemaker-controld [1703] (abort_transition_graph) info: Transition 63 aborted: Stonith failed | source=abort_for_stonith_failure:776 complete=false
Oct 30 12:06:02 duke pacemaker-controld [1703] (tengine_stonith_notify) error: Unfencing of duke by <anyone> failed: No such device (-19)
Oct 30 12:06:02 duke pacemaker-controld [1703] (run_graph) notice: Transition 63 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=8, Source=/var/lib/pacemaker/pengine/pe-input-23.bz2): Complete
Oct 30 12:06:02 duke pacemaker-controld [1703] (do_log) info: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Oct 30 12:06:02 duke pacemaker-controld [1703] (do_state_transition) notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Oct 30 12:06:06 duke pacemaker-based [1698] (cib_process_ping) info: Reporting our current digest to duke: c75a23192109201a5ceaa896d6c313cc for 0.28.6 (0x55a5ab8ff1f0 0)
#######

When I tried without explicitly specifying the device in the stonith commands, this is what I ended up with in pacemaker.log:

Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (determine_online_status_fencing) info: Node miles is active
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (determine_online_status) info: Node miles is online
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (determine_online_status_fencing) info: Node duke is active
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (determine_online_status) info: Node duke is online
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (unpack_node_loop) info: Node 2 is already processed
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (unpack_node_loop) info: Node 1 is already processed
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (unpack_node_loop) info: Node 2 is already processed
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (unpack_node_loop) info: Node 1 is already processed
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (common_print) info: fence_duke (stonith:fence_scsi): Stopped
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (common_print) info: fence_miles (stonith:fence_scsi): Stopped
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (RecurringOp) info: Start recurring monitor (60s) for fence_duke on miles
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (RecurringOp) info: Start recurring monitor (60s) for fence_miles on duke
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (LogNodeActions) notice: * Fence (on) miles 'required by fence_duke monitor'
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (LogNodeActions) notice: * Fence (on) duke 'required by fence_duke monitor'
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (LogAction) notice: * Start fence_duke ( miles )
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (LogAction) notice: * Start fence_miles ( duke )
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (process_pe_message) notice: Calculated transition 69, saving inputs in /var/lib/pacemaker/pengine/pe-input-28.bz2
Oct 30 12:22:34 duke pacemaker-controld [1703] (do_state_transition) info: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Oct 30 12:22:34 duke pacemaker-controld [1703] (do_te_invoke) info: Processing graph 69 (ref=pe_calc-dc-1572434554-114) derived from /var/lib/pacemaker/pengine/pe-input-28.bz2
Oct 30 12:22:34 duke pacemaker-controld [1703] (te_fence_node) notice: Requesting fencing (on) of node miles | action=5 timeout=60000
Oct 30 12:22:34 duke pacemaker-controld [1703] (te_fence_node) notice: Requesting fencing (on) of node duke | action=2 timeout=60000
Oct 30 12:22:34 duke pacemaker-fenced [1699] (handle_request) notice: Client pacemaker-controld.1703.470f8b4e wants to fence (on) 'miles' with device '(any)'
Oct 30 12:22:34 duke pacemaker-fenced [1699] (initiate_remote_stonith_op) notice: Requesting peer fencing (on) of miles | id=4d360268-d290-42e6-b28f-fd4d7649613b state=0
Oct 30 12:22:34 duke pacemaker-fenced [1699] (handle_request) notice: Client pacemaker-controld.1703.470f8b4e wants to fence (on) 'duke' with device '(any)'
Oct 30 12:22:34 duke pacemaker-fenced [1699] (initiate_remote_stonith_op) notice: Requesting peer fencing (on) of duke | id=90ca3294-5eb5-4c66-a298-cd5afcbbbd77 state=0
Oct 30 12:22:34 duke pacemaker-fenced [1699] (can_fence_host_with_device) notice: fence_miles can not fence (on) duke: static-list
Oct 30 12:22:34 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: Query result 1 of 2 from duke for miles/on (0 devices) 4d360268-d290-42e6-b28f-fd4d7649613b
Oct 30 12:22:34 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: Query result 2 of 2 from miles for miles/on (0 devices) 4d360268-d290-42e6-b28f-fd4d7649613b
Oct 30 12:22:34 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: All query replies have arrived, continuing (2 expected/2 received)
Oct 30 12:22:34 duke pacemaker-fenced [1699] (stonith_choose_peer) notice: Couldn't find anyone to fence (on) miles with any device
Oct 30 12:22:34 duke pacemaker-fenced [1699] (call_remote_stonith) info: Total timeout set to 60 for peer's fencing of miles for pacemaker-controld.1703|id=4d360268-d290-42e6-b28f-fd4d7649613b
Oct 30 12:22:34 duke pacemaker-fenced [1699] (call_remote_stonith) info: No peers (out of 2) have devices capable of fencing (on) miles for pacemaker-controld.1703 (0)
Oct 30 12:22:34 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: Query result 1 of 2 from miles for duke/on (0 devices) 90ca3294-5eb5-4c66-a298-cd5afcbbbd77
Oct 30 12:22:34 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: Query result 2 of 2 from duke for duke/on (0 devices) 90ca3294-5eb5-4c66-a298-cd5afcbbbd77
Oct 30 12:22:34 duke pacemaker-fenced [1699] (process_remote_stonith_query) info: All query replies have arrived, continuing (2 expected/2 received)
Oct 30 12:22:34 duke pacemaker-fenced [1699] (stonith_choose_peer) notice: Couldn't find anyone to fence (on) duke with any device
Oct 30 12:22:34 duke pacemaker-fenced [1699] (call_remote_stonith) info: Total timeout set to 60 for peer's fencing of duke for pacemaker-controld.1703|id=90ca3294-5eb5-4c66-a298-cd5afcbbbd77
Oct 30 12:22:34 duke pacemaker-fenced [1699] (call_remote_stonith) info: No peers (out of 2) have devices capable of fencing (on) duke for pacemaker-controld.1703 (0)
Oct 30 12:22:34 duke pacemaker-fenced [1699] (remote_op_done) error: Operation on of miles by <no-one> for pacemaker-controld.1703@duke.4d360268: No such device
Oct 30 12:22:34 duke pacemaker-controld [1703] (tengine_stonith_callback) notice: Stonith operation 25/5:69:0:5e3e0ef6-02a5-4f9a-b999-806413a3da12: No such device (-19)
Oct 30 12:22:34 duke pacemaker-controld [1703] (tengine_stonith_callback) notice: Stonith operation 25 for miles failed (No such device): aborting transition.
Oct 30 12:22:34 duke pacemaker-controld [1703] (tengine_stonith_callback) warning: No devices found in cluster to fence miles, giving up
Oct 30 12:22:34 duke pacemaker-controld [1703] (abort_transition_graph) notice: Transition 69 aborted: Stonith failed | source=abort_for_stonith_failure:776 complete=false
Oct 30 12:22:34 duke pacemaker-fenced [1699] (remote_op_done) error: Operation on of duke by <no-one> for pacemaker-controld.1703@duke.90ca3294: No such device
Oct 30 12:22:34 duke pacemaker-controld [1703] (tengine_stonith_notify) error: Unfencing of miles by <anyone> failed: No such device (-19)
Oct 30 12:22:34 duke pacemaker-controld [1703] (tengine_stonith_callback) notice: Stonith operation 26/2:69:0:5e3e0ef6-02a5-4f9a-b999-806413a3da12: No such device (-19)
Oct 30 12:22:34 duke pacemaker-controld [1703] (tengine_stonith_callback) notice: Stonith operation 26 for duke failed (No such device): aborting transition.
Oct 30 12:22:34 duke pacemaker-controld [1703] (tengine_stonith_callback) warning: No devices found in cluster to fence duke, giving up
Oct 30 12:22:34 duke pacemaker-controld [1703] (abort_transition_graph) info: Transition 69 aborted: Stonith failed | source=abort_for_stonith_failure:776 complete=false
Oct 30 12:22:34 duke pacemaker-controld [1703] (tengine_stonith_notify) error: Unfencing of duke by <anyone> failed: No such device (-19)
Oct 30 12:22:34 duke pacemaker-controld [1703] (run_graph) notice: Transition 69 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=8, Source=/var/lib/pacemaker/pengine/pe-input-28.bz2): Complete
Oct 30 12:22:34 duke pacemaker-controld [1703] (do_log) info: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Oct 30 12:22:34 duke pacemaker-controld [1703] (do_state_transition) notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Oct 30 12:22:37 duke pacemaker-based [1698] (cib_process_ping) info: Reporting our current digest to duke: 2eb5c8ee7e7df17c5737befc7d93de76 for 0.37.6 (0x55a5ab900f70 0)
#######

Here is my corosync config for your reference:

# Please read the corosync.conf.5 manual page
totem {
    version: 2
    cluster_name: debian
    token: 3000
    token_retransmits_before_loss_const: 10
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 130.237.191.255
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

nodelist {
    node {
        name: duke
        nodeid: 1
        ring0_addr: XXXXXXXXXX
    }
    node {
        name: miles
        nodeid: 2
        ring0_addr: XXXXXXXXXX
    }
}
#######

I am completely out of ideas about what to do, and I would appreciate any help. Let me know if you guys need more details.

Thanks in advance!
Ram

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
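In both logs, pacemaker-fenced answers every unfencing ("on") request with "Couldn't find anyone to fence (on) ... with any device", and on duke it reports "fence_miles can not fence (on) duke: static-list". A minimal sketch for inspecting the fencer's own view on each node follows, using stonith_admin (shipped with Pacemaker); the helper name show_fencer_view is hypothetical. One thing worth checking with it, offered as a hypothesis and not something confirmed in this thread: fence_scsi performs unfencing on the target node itself, so an `avoids` constraint that keeps fence_duke off duke (and fence_miles off miles) may leave each node without a registered device it can use to unfence itself.

```bash
#!/usr/bin/env bash
# Sketch: print how pacemaker-fenced on the local node sees the fence devices.
# Run it on both duke and miles. stonith_admin ships with Pacemaker;
# show_fencer_view is a hypothetical helper name, not part of any tool.

show_fencer_view() {
  local node="$1"
  if [ -z "$node" ]; then
    echo "usage: show_fencer_view NODE" >&2
    return 2
  fi

  # Devices registered with the fencer on THIS node (an empty list here
  # means this node cannot execute any fence action itself).
  echo "== devices registered with the local fencer =="
  stonith_admin --list-registered

  # Devices that the local fencer believes can fence the given node.
  echo "== devices able to fence $node =="
  stonith_admin --list="$node"

  # Past fencing/unfencing attempts against the given node.
  echo "== fencing history for $node =="
  stonith_admin --history="$node"
}
```

Usage: source the file on each node and run `show_fencer_view duke` and `show_fencer_view miles`; comparing the registered-device lists from the two nodes shows which node, if any, holds a device able to act on each target.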