Hi all.
As far as I know, every stonith plugin is expected to verify that
its target has actually been fenced off from the other nodes before
it returns a successful status for 'reset' or 'off'.
However, I think this verification is an excessive burden for an
individual plugin.
Plugin authors know how to handle the stonith devices they write
plugins for, but they cannot always anticipate the structure of the
clusters on which their plugins will run.
When a cluster administrator tries to use a plugin whose verification
doesn't fit the cluster, the administrator has no choice but to
modify the plugin directly.
This reduces the plugins' adaptability, which is undesirable.
One way to avoid this problem is to establish a scheme or convention
which lets a plugin delegate the verification to other plugins.
The two attached plugins are a sample of this idea. They work together
through the attached cib.xml.
'sshAltered' only shoots its targets and 'pingAllAddr' only checks
whether its targets are still alive.
Here is a slightly more detailed explanation:
When a failure makes it necessary for one node to shoot a corrupted
node, the shooter node first uses 'sshAltered' to shoot the target
node.
'sshAltered' shoots its targets but never exits with a successful
status if the value of the attribute 'shoot_only' is "yes", as in
the attached cib.xml. So the next plugin, if one is defined, will
always be used.
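To make the intended order of events concrete, here is a minimal sketch
of the chain in shell. It is only an illustration: run_fence_chain is a
hypothetical helper, not part of heartbeat or stonithd, and it assumes
that the plugins are tried in the order given and that the first exit
status of 0 ends the chain.

```shell
#!/bin/bash
# Hypothetical sketch: call each plugin's 'reset' in order and stop at the
# first one that exits 0. The name run_fence_chain and the plugins passed
# to it are illustrative assumptions only.
run_fence_chain() {
    local target=$1
    shift
    local plugin
    for plugin in "$@"; do
        if "$plugin" reset "$target"; then
            echo "fenced by $plugin"
            return 0
        fi
    done
    echo "fencing of $target not confirmed"
    return 1
}
```

With shoot_only set to "yes", 'sshAltered' always falls through, so the
final result depends only on the confirming plugin that follows it.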
'pingAllAddr' checks whether the IP addresses of its targets, as
specified in cib.xml, are still alive. If none of the IP addresses
of the target respond, 'pingAllAddr' exits with a successful status,
otherwise it exits with an error status.
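The check itself can be sketched as follows. This is a simplified
variant of the function in the attached plugin, not a replacement for it.
PROBE is a hypothetical override that I added so the logic can be tried
without real network traffic, and the guard against an empty address
list is also an extra that the attached plugin does not have.

```shell
#!/bin/bash
# PROBE is a hypothetical override; by default it is a one-second,
# single-packet ping like the one used in the attached plugin.
PROBE=${PROBE:-ping -w1 -c1}

# Usage: all_addrs_dead "node01 172.20.24.111 192.168.101.1" node01
# Returns 0 only if the entry belongs to the given host and none of its
# addresses answer the probe.
all_addrs_dead() {
    local entry=$1 host=$2
    local addr
    set -- $entry                  # split "host addr1 addr2 ..." into words
    [ "$1" = "$host" ] || return 1 # entry is for another host
    shift
    [ $# -gt 0 ] || return 1       # extra guard: no addresses, nothing confirmed
    for addr in "$@"; do
        if $PROBE "$addr" >/dev/null 2>&1; then
            return 1               # at least one address is still alive
        fi
    done
    return 0
}
```

A single responding address is enough to make the check fail, which is
the conservative choice for confirming that a node is really dead.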
Once 'external/ssh' has been rewritten into 'sshAltered', there
is no need to rewrite it again in order to use other conditions to
confirm the targets' death.
For example, if a cluster uses iSCSI shared storage and
a failover on this cluster must wait for the iSCSI target
devices to sweep their connections to the corrupted node, this can be
done by another type of plugin used in place of 'pingAllAddr'. Its task
is to ask the iSCSI target devices whether connection sweeping has
completed.
The reverse is also true. Any shooting plugin which follows the
convention explained above can work together with 'pingAllAddr'.
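As a rough sketch of what such an alternative confirming plugin might
look like, the following shows only the core of its 'reset' handling.
Everything here is hypothetical: sessions_swept is a placeholder because
the way to query a real iSCSI target depends on the product, and
confirm_reset and retry_interval are names I made up for illustration.

```shell
#!/bin/bash
# Placeholder: replace with a real query that asks the iSCSI target
# whether all connections to host "$1" have been swept.
sessions_swept() {
    return 1
}

# Poll the placeholder check up to $2 times (default 5) and return 0 as
# soon as it reports success; return 1 if it never does.
confirm_reset() {
    local target=$1 tries=${2:-5} i
    for ((i = 0; i < tries; i++)); do
        if sessions_swept "$target"; then
            return 0
        fi
        sleep "${retry_interval:-1}"
    done
    return 1
}
```

The shooting plugin does not need to know anything about this check. It
only has to exit with an error status so that the confirming plugin runs
next.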
The same thing could also be expressed with an additional tag attribute, like this:
<primitive type="external/ssh" class="stonith" task="shoot" ...>
I hope some kind of agreement can be reached on this problem.
Best regards.
--
Takenaka Kazuhiro <[EMAIL PROTECTED]>
#!/bin/bash
# 'sshAltered' is almost the same as 'external/ssh' except for 2 points.
# 1) This plugin logs some debug messages into /var/log/stonith.log.
# 2) This plugin doesn't ping to confirm the death of the target after
# shooting it if the value of ${shoot_only} is "yes".
#
# External STONITH module for ssh.
#
# Copyright (c) 2004 SUSE LINUX AG - Lars Marowsky-Bree <[EMAIL PROTECTED]>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like. Any license provided herein, whether implied or
# otherwise, applies only to this software file. Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
SSH_COMMAND="/usr/bin/ssh -q -x -o PasswordAuthentication=no -o StrictHostKeyChecking=no -n -l root"
#SSH_COMMAND="/usr/bin/ssh -q -x -n -l root"
REBOOT_COMMAND="echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
# Warning: If you select this poweroff command, it'll physically
# power-off the machine, and quite a number of systems won't be remotely
# revivable.
# TODO: Probably should touch a file on the server instead to just
# prevent heartbeat et al from being started after the reboot.
# POWEROFF_COMMAND="echo 'sleep 2; /sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
POWEROFF_COMMAND="echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
# Rewrite the hostlist to accept "," as a delimiter for hostnames too.
hostlist=`echo $hostlist | tr ',' ' '`
is_host_up() {
for j in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
if
ping -w1 -c1 "$1" >/dev/null 2>&1
then
sleep 1
else
return 1
fi
done
return 0
}
savelog() {
echo $(date '+%Y%m%d-%H%M%S') ${0##*/} "$@" >> /var/log/stonith.log; }
EXIT() { savelog EXIT $subcmd "$@"; exit "$@";}
savelog "ARGS" "$@" = $hostlist
subcmd=$1
case $1 in
gethosts)
for h in $hostlist ; do
echo $h
done
EXIT 0
;;
on)
# Can't really be implemented because ssh cannot power on a system
# when it is powered off.
EXIT 1
;;
off)
# Shouldn't really be implemented because if ssh cannot power on a
# system, it shouldn't be allowed to power it off.
EXIT 1
;;
reset)
for h in $hostlist
do
if
[ "$h" != "$2" ]
then
continue
fi
if
case ${livedangerously} in
[Yy]*) is_host_up $h;;
*) true;;
esac
then
$SSH_COMMAND "$2" "$REBOOT_COMMAND"
# Good thing this is only for testing...
# Shooting only,
# in other words, skip status verification of the shot node
if [[ "$shoot_only" = yes ]]; then
EXIT 1
fi
if
is_host_up $h
then
EXIT 1
else
EXIT 0
fi
else
# well... Let's call it successful, after all this is only for testing...
EXIT 0
fi
done
EXIT 1
;;
status)
if
[ -z "$hostlist" ]
then
EXIT 1
fi
for h in $hostlist
do
if
ping -w1 -c1 "$h" 2>&1 | grep "unknown host"
then
EXIT 1
fi
done
EXIT 0
;;
getconfignames)
echo "hostlist"
EXIT 0
;;
getinfo-devid)
echo "ssh STONITH device"
EXIT 0
;;
getinfo-devname)
echo "ssh STONITH external device"
EXIT 0
;;
getinfo-devdescr)
echo "ssh-based Linux host reset"
echo "Fine for testing, but not suitable for production!"
EXIT 0
;;
getinfo-devurl)
echo "http://openssh.org"
EXIT 0
;;
getinfo-xml)
cat << SSHXML
<parameters>
<parameter name="hostlist" unique="1" required="1">
<content type="string" />
<shortdesc lang="en">
Hostlist
</shortdesc>
<longdesc lang="en">
The list of hosts that the STONITH device controls
</longdesc>
</parameter>
<parameter name="livedangerously" unique="0" required="0">
<content type="enum" />
<shortdesc lang="en">
Live Dangerously!!
</shortdesc>
<longdesc lang="en">
Set to "yes" if you want to risk your system's integrity.
Of course, since this plugin isn't for production, using it
in production at all is a bad idea. On the other hand,
setting this parameter to yes makes it an even worse idea.
Viva la Vida Loca!
</longdesc>
</parameter>
</parameters>
SSHXML
EXIT 0
;;
*)
EXIT 1
;;
esac
#!/bin/bash
# 'pingAllAddr' doesn't shoot its targets; this plugin only confirms the death
# of the targets. 'pingAllAddr' pings the IP addresses of the targets specified
# in cib.xml. If none of the IP addresses of the target respond, 'pingAllAddr'
# exits with a successful status, otherwise it exits with an error status.
savelog() {
echo $(date '+%Y%m%d-%H%M%S') ${0##*/} "$@" >> /var/log/stonith.log; }
EXIT() { savelog EXIT $subcmd "$@"; exit "$@";}
are_all_addrs_dead()
{
savelog ENTER are_all_addrs_dead
local host=$1
for name in ${!addrlist*}; do
savelog ADDR $name
eval set -- \$$name
if [[ "$1" = "$host" ]]; then
shift
for addr in "$@"; do
savelog ping $addr
if ping -w1 -c1 "$addr" >/dev/null 2>&1; then
savelog PING OK $addr
return 1
fi
savelog PING NG $addr
done
return 0
fi
done
return 1
}
hostlist=`echo $hostlist | tr ',' ' '`
savelog "ARGS" "$@" = "$hostlist"
subcmd=$1
case $1 in
gethosts)
for h in $hostlist ; do
echo $h
done
EXIT 0
;;
on)
EXIT 1
;;
off)
EXIT 1
;;
reset)
# Default to no wait when initial_wait is unset.
sleep "${initial_wait:-0}"
for h in $hostlist; do
if [ "$h" != "$2" ]; then
continue
fi
savelog CALL are_all_addrs_dead
if are_all_addrs_dead $h; then
EXIT 0
fi
EXIT 1
done
# The target is not in hostlist, so nothing was confirmed.
EXIT 1
;;
status)
if [ -z "$hostlist" ]; then
EXIT 1
fi
EXIT 0
;;
getconfignames)
echo "hostlist"
EXIT 0
;;
getinfo-devid)
echo "isNodeAlive"
EXIT 0
;;
getinfo-devname)
echo "isNodeAlive device"
EXIT 0
;;
getinfo-devdescr)
echo "isNodeAlive"
EXIT 0
;;
getinfo-devurl)
echo "http://127.0.0.1"
EXIT 0
;;
getinfo-xml)
cat << EOX
<parameters>
<parameter name="hostlist" unique="1" required="1">
<content type="string" />
<shortdesc lang="en">
Hostlist
</shortdesc>
<longdesc lang="en">
The list of hosts that the STONITH device controls
</longdesc>
</parameter>
</parameters>
EOX
EXIT 0
;;
*)
EXIT 1
;;
esac
<!-- vim:set sw=2 ts=8: -->
<cib epoch="1" num_updates="1" admin_epoch="0">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<attributes>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
<nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="INFINITY"/>
<nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-INFINITY"/>
<nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="120s"/>
</attributes>
</cluster_property_set>
</crm_config>
<nodes/>
<resources>
<primitive id="dummy" class="ocf" type="Dummy" provider="heartbeat">
<operations>
<op id="dummy:start" name="start" timeout="30" on_fail="restart"/>
<op id="dummy:monitor" name="monitor" timeout="30" on_fail="fence" interval="10"/>
<op id="dummy:stop" name="stop" timeout="30" on_fail="fence"/>
</operations>
</primitive>
<clone id="clnFencing" globally_unique="false">
<instance_attributes id="clnFencing:attr">
<attributes>
<nvpair id="clnFencing:attr:clone_max" name="clone_max" value="2"/>
<nvpair id="clnFencing:attr:clone_node_max" name="clone_node_max" value="1"/>
</attributes>
</instance_attributes>
<group id="grpFencing">
<primitive id="prmSshAltered" class="stonith" type="external/sshAltered">
<operations>
<op id="prmSshAltered:op:monitor" name="monitor" interval="5s" timeout="20s" prereq="nothing"/>
<op id="prmSshAltered:op:start" name="start" timeout="20s" prereq="nothing"/>
</operations>
<instance_attributes id="prmSshAltered:attr">
<attributes>
<nvpair id="prmSshAltered:attr:hostlist" name="hostlist" value="node01,node02"/>
<nvpair id="prmSshAltered:attr:shoot_only" name="shoot_only" value="yes"/>
</attributes>
</instance_attributes>
</primitive>
<primitive id="prmPingAllAddr" class="stonith" type="external/pingAllAddr">
<operations>
<op id="prmPingAllAddr:op:monitor" name="monitor" interval="5s" timeout="20s" prereq="nothing"/>
<op id="prmPingAllAddr:op:start" name="start" timeout="20s" prereq="nothing"/>
</operations>
<instance_attributes id="prmPingAllAddr:attr">
<attributes>
<nvpair id="prmPingAllAddr:attr:hostlist" name="hostlist" value="node01,node02"/>
<nvpair id="prmPingAllAddr:attr:initial_wait" name="initial_wait" value="5"/>
<nvpair id="prmPingAllAddr:attr:addrlist01" name="addrlist01" value="node01 172.20.24.111 192.168.101.1 192.168.102.1 192.168.110.1"/>
<nvpair id="prmPingAllAddr:attr:addrlist02" name="addrlist02" value="node02 172.20.24.112 192.168.101.2 192.168.102.2 192.168.110.2"/>
</attributes>
</instance_attributes>
</primitive>
</group>
</clone>
</resources>
<constraints>
<rsc_location rsc="dummy" id="dummy:location1" >
<rule id="dummy:rule1" score="200">
<expression id="dummy:exp1" attribute="#uname" operation="eq" value="node01"/>
</rule>
<rule id="dummy:rule2" score="100">
<expression id="dummy:exp2" attribute="#uname" operation="eq" value="node02"/>
</rule>
</rsc_location>
</constraints>
</configuration>
</cib>
_______________________________________________
Pacemaker mailing list
[email protected]
http://list.clusterlabs.org/mailman/listinfo/pacemaker