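SSH-based fencing isn't fencing. A fence method cannot assume that the target is in any way functional. A quick way to see why is to crash a node with 'echo c > /proc/sysrq-trigger' and then try to fence it: with the kernel crashed, nothing is left to accept the SSH login, so the reboot command never runs.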
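To make the point concrete, here is a minimal Python sketch of what an SSH-only fence method can actually learn about its target. It is not the sshbykey script from the original post and not a usable fence agent. The function names, the return strings, and the 3-second timeout are hypothetical, chosen only for illustration.

```python
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def ssh_fence_outcome(host: str, port: int = 22, timeout: float = 3.0) -> str:
    """Classify what an SSH-only fence method can know about the target.

    Hypothetical helper: a real agent would go on to run the remote reboot.
    """
    if ssh_port_reachable(host, port, timeout):
        # Something answered on the SSH port, so a remote "reboot" could at
        # least be attempted. This is the healthy-node case.
        return "can-attempt-reboot"
    # No answer. The node may be powered off, crashed, hung, or merely cut
    # off from the network; over SSH alone these cases look identical.
    return "unknown"
```

For a crashed node the only possible result is "unknown", and a fence method that cannot confirm the target is down cannot truthfully report success. That is the case fencing exists for, which is why real fence devices act from outside the target (power switch, IPMI, hypervisor).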
digimer

On 19/11/13 13:10, Andrey Groshev wrote:
> Hi everyone again.
>
> I started training with STONITH.
> I wrote a little STONITH external script.
> Its basic moments:
> * send the command "reboot" with SSH authentication using a key.
> * The script takes a single argument - the path to the private key.
> * Any node can send reboot any node (even yourself).
>
> In the crm config it looks like this:
> property $id="cib-bootstrap-options" \
> stonith-enabled="true"
> primitive st1 stonith:external/sshbykey \
> params path2key="/opt/cluster_tools_2/keys/root@dev-cluster2-master"
> pcmk_host_check="none"
> clone cloneStonith st1
>
> Made the first test - Ok, node was rebooted and resource are started.
> #export
> path2key=/opt/cluster_tools_2/keys/[email protected]
> # stonith -t external/sshbykey -E dev-cluster2-node1
> info: external_run_cmd: '/usr/lib64/stonith/plugins/external/sshbykey reset
> dev-cluster2-node1' output: Now boot time 1384850888, send reboot
>
> info: external_run_cmd: '/usr/lib64/stonith/plugins/external/sshbykey reset
> dev-cluster2-node1' output: Daration: 1340 sec.
>
> info: external_run_cmd: '/usr/lib64/stonith/plugins/external/sshbykey reset
> dev-cluster2-node1' output: GOOD NEWS: dev-cluster2-node1 booted in 1384864288
>
> Do not worry about attention to the "Duration", this because of the jump time
> before synchronization time in the virtual machine and the server. Here the
> meaning of a change, rather than a specific number of seconds. Next time
> reboot 10 - 20 sec.
>
> But farther, there are problems and questions. :)
> 1.
> Make next test:
> #stonith_admin --reboot=dev-cluster2-node2
> Node reboot, but resource don't start.
> In crm_mon status - Node dev-cluster2-node2 (172793105): pending.
> And it will be hung.
> Next, if I reboot this node in console, or stonith or stonith_admin (the same
> command!) - resources stats.
>
> Portions of the logs:
> trace: unpack_status: Processing node id=172793105,
> uname=dev-cluster2-node2
> trace: find_xml_node: Could not find transient_attributes in
> node_state.
> trace: unpack_instance_attributes: No instance attributes
> trace: unpack_status: determining node state
> trace: determine_online_status_fencing: dev-cluster2-node2:
> in_cluster=false, is_peer=online, join=down, expected=down, term=0
> info: determine_online_status_fencing: - Node dev-cluster2-node2 is
> not ready to run resources
> trace: determine_online_status: Node dev-cluster2-node2 is offline
>
> ........
>
> trace: unpack_status: Processing lrm resource entries on healthy
> node: dev-cluster2-node2
> trace: find_xml_node: Could not find lrm in node_state.
> trace: find_xml_node: Could not find lrm_resources in <NULL>.
> trace: unpack_lrm_resources: Unpacking resources on
> dev-cluster2-node2
>
> ..............
> trace: can_run_resources: dev-cluster2-node2: online=0, unclean=0,
> standby=1, maintenance=0
> trace: check_actions: Skipping param check for dev-cluster2-node2:
> cant run resources
> .......
> trace: native_color: Pre-allloc: VirtualIP allocation score on
> dev-cluster2-node2: 0
> ...........
>
>
> <node id="172793105" uname="dev-cluster2-node2">
> <instance_attributes id="nodes-172793105">
> <nvpair id="nodes-172793105-pgsql-data-status"
> name="pgsql-data-status" value="DISCONNECT"/>
> <nvpair id="nodes-172793105-standby" name="standby" value="false"/>
> <nvpair id="nodes-172793105-thisquorumnode" name="thisquorumnode"
> value="no"/>
> </instance_attributes>
> </node>
>
> Why do that behavior?
>
> 2.
> There is a slight discrepancy in the Pacemaker Expl. and stonith_admin --help.
> stonith_admin --reboot nodename.
> In one case, the sign of equality is, in other - no.
> Not very important, because operate both.
> But when you start to work and something goes wrong, do you think at all
> suspicious things. :)
>
> 3.
> Andrew!
> You promised post about STONITH debug.
>
> 4. (to ALL)
> Also, please tell me the real arguments against the use of the SSH in STONITH.
> I have my own guesses and thoughts, but I would like to know your experience.
>
> My environment:
> corosync-2.3.2
> resource-agents-3.9.5
> pacemaker 1.1.11
> ----
> Thanks in advance,
> Andrey Groshev
>
> _______________________________________________
> Pacemaker mailing list: [email protected]
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
