On Sat, Mar 19, 2011 at 6:10 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote: > Hi, > > just bumping this to be not forgotten.
Actually I'd missed that this was a Pacemaker specific one and therefore something I needed to look at :-) > RA runs fine for almost a month, several simulated network outages were > passed with full success, so it could be included in some package. I > think pacemaker, because this RA uses pacemaker-specific calls. > > Andrew? Looks pretty reasonable. Can you resend with hg headers (ie. "hg export") so that you can have the appropriate credit? It should be added to extra/resources > > (I can support this one) > > 23.02.2011 11:53, Vladislav Bogdanov wrote: >> Hi Lars, >> >> thank you for your time and for so detailed review. >> >> Just to dot half of i's (where it is about coding style): >> 1. I strongly prefer to cleanly separate data access from main logic by API. >> 2. I prefer to have non-void functions to return result explicitly >> ("main" too). This will prevent "correct" return code from being lost on >> subsequent code modifications. >> 3. I insist on all variables inside functions to be local. This helps to >> avoid some hardly-debugable logic errors. >> 4. I prefer not to touch global variables inside of lower-layer >> functions. To be honest, I prefer to pass everything needed by function >> in its parameters. This sometimes is hard to do, so sometimes I prefer >> to use global variables relatively high at stack rather then pass them >> down through several more functions. >> 5) I prefer to use sub-shells for recursive function calls. >> >> Please look at one more attached revision, and also in comments inline. >> >> >> ... >>>> <longdesc lang="en"> >>>> Every time the monitor action is run, this resource agent records (in the >>>> CIB) speed of active network interfaces from a list. >>> >>> >>> Where "active" is ... >>> what, exactly? >>> Add some hint on the intended use case and purpose, >>> maybe add an example or two. This is the long description, after all. >> >> Done. >> >>> >>> Also note that this is >>> - linux specific >>> - requires kernel >= 2.6.33, >>> afaict no /sys/class/net/*/speed before that >> >> Ahm, was not aware about that. I need to look again at this because I >> need this to run on RHEL6 too. Anybody knows, does it has this sysfs? >> Let's delay with this for a bit. >> >>> >>>> </longdesc> >>>> <shortdesc lang="en">Network interface status</shortdesc> >>> >>> again, "interface status" may be a bit misleading. >> >> Fixed. >> >>> >>>> Weight of each 10Mbps in interface speed (1Gbps = 100 * 10Mbps). >>> >>> The way you implemented it, you get the speed, then do some math with >>> it, just to arrive at the same value you just read... >>> Why not just give one point per Mbps by default? >> >> Rewrote. >>> >>>> With default value 1Gbps interface will be counted as 1000. >>>> </longdesc> >>>> <shortdesc lang="en">Weight of 10Mbps interface</shortdesc> >>> >>> can someone come up with better wording here? >> >> I tried. >> >>> >>>> <parameter name="dampen" unique="0"> >>>> <longdesc lang="en"> >>>> The time to wait (dampening) further changes occur >>> >>> This english apparently needs fixing >>> already wherever it was copied from ;-) >> >> Hopefully done (although english is not my native lang). >> >>> >>> >>>> </longdesc> >>>> <shortdesc lang="en">Dampening interval</shortdesc> >>>> <content type="integer" default="${OCF_RESKEY_dampen_default}"/> >>>> </parameter> >>>> >>>> <parameter name="debug" unique="0"> >>>> <longdesc lang="en"> >>>> Enables to use default attrd_updater verbose logging on every call. >>> >>> "If enabled, ..." >> >> Re-worded. >>> >>>> </longdesc> >>>> <shortdesc lang="en">Verbose logging</shortdesc> >>>> <content type="string" default="false"/> >>>> </parameter> >>>> >>>> </parameters> >>>> >>>> <actions> >>>> <action name="start" timeout="30" /> >>>> <action name="stop" timeout="30" /> >>>> <action name="reload" timeout="30" /> >>>> <action name="monitor" depth="0" timeout="30" interval="10"/> >>>> <action name="meta-data" timeout="5" /> >>>> <action name="validate-all" timeout="30" /> >>>> </actions> >>>> </resource-agent> >>>> END >>>> } >>>> >>>> usage() { >>>> cat <<END >>>> usage: $0 >>>> {start|stop|monitor|migrate_to|migrate_from|validate-all|meta-data} >>>> >>>> Expects to have a fully populated OCF RA-compliant environment set. >>>> END >>> >>> BTW: >>> cat does an extra fork/exec, and "here documents" go via tmpfiles, >>> which can break things, or slow things down quite a bit. >>> most of the time, I prefer the typically built in echo " >>> ... >>> .... " >>> >>> But that has nothing to do with this RA, even less with its usage(). >>> Actually it comes down to a matter of taste entirely, >>> so just ignore this statement ;-) >> >> Let's leave it as-is. >> >>> >>>> } >>>> >>>> start() { >>>> monitor >>>> if [ $? -eq $OCF_SUCCESS ]; then >>>> return $OCF_SUCCESS >>>> fi >>> >>> monitor && return >> >> Ahm, I'd leave it as-is. It is a bit more readable and probably does not >> cost any extra CPU cycles. >> >>> >>>> ha_pseudo_resource ${ha_pseudo_resource_name} start >>>> update >>> >>> the update has been done in monitor already? >> >> Nope. ha_pseudo_resource ${ha_pseudo_resource_name} monitor returns 7 >> there so update is not called. >> >>> >>>> } >>>> >>>> stop() { >>>> ha_pseudo_resource ${ha_pseudo_resource_name} stop >>>> attrd_updater -D -n ${OCF_RESKEY_name} -d ${OCF_RESKEY_dampen} >>>> ${attrd_options} >>>> return $OCF_SUCCESS >>> >>> do we want to consider the exit code of attrd_updater, or not? >>> if not, what do we do about non-zero exit code? >>> does it have a useful reliable exit code, at all? >>> you don't ignore it in update. >>> Though you ignore the return value of update ... >>> >>> Yeah, right, ping does it the same way probably (I did not check; does >>> it?) -- good to have a reason to review that, too ;-) >> >> We did our best, and failure of attr_updater means much more serious >> problems than this RA han handle. >> >>> >>>> } >>>> >>>> monitor() { >>>> local ret >>>> ha_pseudo_resource ${ha_pseudo_resource_name} monitor >>>> ret=$? >>>> if [ ${ret} -eq $OCF_SUCCESS ] ; then >>>> update >>>> fi >>>> return ${ret} >>> >>> ha_pseudo_resource $ha_pseudo_resource_name monitor || return >>> update >>> >>> and, if that's really necessary, ignore the return code of update: >>> return $OCF_SUCCESS >> >> No, we are interested in NOT_RUNNING too. >> Let's leave as-is. >> >>> >>>> } >>>> >>>> validate() { >>> >>> check for linux, >>> and kernel >= 2.6.33, >>> and sysfs present, >>> and if not, exit with not installed or something? >> >> Again, let's delay it for a bit. >> >>> >>> Ah. just noticed you do that explicitly early on, anyways. >>>> # Check the check interval >>>> if ocf_is_decimal "${OCF_RESKEY_CRM_meta_interval}" && [ >>>> ${OCF_RESKEY_CRM_meta_interval} -gt 0 ]; then >>>> : >>>> else >>>> ocf_log err "Invalid check interval ${OCF_RESKEY_interval}. It >>>> should be positive integer!" >>>> exit $OCF_ERR_CONFIGURED >>>> fi >>>> >>>> # Check the intarfaces list >>> >>> s/tar/ter >> >> Thanks. >> >>> >>>> if [ "x" = "x${OCF_RESKEY_iface}" ]; then >>> >>> You declared yourself as bash. >>> no need for this "x" = "x$VAR" nonsense in bash... >>> in fact, even in sh, I find test -z preferable. >> >> Blindly copied it. Done. >> >>> >>>> ocf_log err "Empty iface parameter. Please specify some network >>>> interface to check" >>> >>> also check for strange characters in iface, >>> it will be used as regex/sed/awk expression later. >> >> Let it be TODO. >> >>> >>>> exit $OCF_ERR_CONFIGURED >>>> fi >>>> >>>> return $OCF_SUCCESS >>>> } >>>> >>>> iface_get_speed() { >>>> local iface=$1 >>>> local operstate >>>> local carrier >>>> local speed >>>> >>>> if [ ! -e "/sys/class/net/${iface}" ] ; then >>>> echo 0 >>>> elif iface_is_bridge ${iface} ; then >>>> bridge_get_speed ${iface} >>>> elif iface_is_bond ${iface} ; then >>>> bond_get_speed ${iface} >>>> elif iface_is_vlan ${iface} ; then >>>> iface_get_speed $( vlan_get_phy ${iface} ) >>>> else >>>> read operstate < "/sys/class/net/${iface}/operstate" >>>> read carrier < "/sys/class/net/${iface}/carrier" >>>> if [ "${operstate}" != "up" ] || [ "${carrier}" != "1" ] ; then >>>> speed="0" >>>> else >>>> read speed < "/sys/class/net/${iface}/speed" >>>> fi >>>> echo ${speed} >>> >>> I'd prefer to have speed non-local, >> >> I'd not. Let's be safe with recursions. >> >>> and be a return value. >> >> Does bash already support 10000 to be function return code? ;) >> >>> >>> then you can do away with the $(iface_get_speed) later as well. >>> >>>> fi >>>> } >>>> >>>> iface_is_vlan() { >>>> local iface=$1 >>>> [ -e "/proc/net/vlan/${iface}" ] && return 0 || return 1 >>>> } >>>> >>>> iface_is_bridge() { >>>> local iface=$1 >>>> [ -e "/sys/class/net/${iface}/bridge" ] && return 0 || return 1 >>>> } >>>> >>>> iface_is_bond() { >>>> local iface=$1 >>>> [ -e "/sys/class/net/${iface}/bonding" ] && return 0 || return 1 >>>> } >>> >>> All these functions: >>> iface_is_vlan() { [ -e "/proc/net/vlan/$1" ]; } >>> iface_is_bridge() { [ -e "/sys/class/net/$1/bridge" ]; } >>> iface_is_bond() { [ -e "/sys/class/net/$1/bonding" ]; } >> >> Please see above about implicit return codes. This should not take any >> CPU cycles. >> >>> >>>> vlan_get_phy() { >>>> local iface=$1 >>>> grep "^${iface} " "/proc/net/vlan/config" | sed -r 's/.*\| +(.*)/\1/' >>> >>> probably safe to assume that on a box running linux kernel >= 2.6.33 >>> the sed supports -r, but you never know ;-) >>> also, if you use sed aleady, have it do the pattern matching as well? >>> >>> vlan_get_phy() { sed -ne "s/^$1 .*| *//p" < /proc/net/vlan/config; } >> >> Ahm, good point, thanks. >> >>> >>>> } >>>> >>>> bridge_is_stp_enabled() { >>>> local iface=$1 >>>> local stp >>>> read stp < "/sys/class/net/${iface}/bridge/stp_state" >>>> [ "${stp}" = "1" ] && return 0 || return 1 >>> >>> how about >>> bridge_is_stp_enabled() { grep 1 < /sys/class/net/$1/bridge/stp_state; } >> >> This will be slower. >> >>> >>> maybe add 2>/dev/null ? >>> >>> or, as we are bash: >>> bridge_is_stp_enabled() { [[ $(</sys/class/net/$1/bridge/stp_state) = >>> 1 ]]; } >> >> I'd say this is unreadable for me. I'd avoid non-trivial constructs with >> no performance gain. And again, implicit vs explicit. >> >>> >>> but some may prefer the more verbose thing. >>> if so, at least do away with the && return || return. >>> shell function return value is the "exit status" of the last command >>> executed, so [[ $stp = 1 ]] is sufficient. >> >> But more error-prone on subsequent code modifications. >> >>> >>>> } >>>> >>>> bridge_get_root_ports() { >>>> local bridge=$1 >>>> local root_id >>>> local root_ports="" >>>> local bridge_id >>>> >>>> read root_id < "/sys/class/net/${bridge}/bridge/root_id" >>>> >>>> for port in /sys/class/net/${bridge}/brif/* ; do >>>> read bridge_id < "${port}/designated_bridge" >>>> if [ "${bridge_id}" = "${root_id}" ] ; then >>>> root_ports="${root_ports} ${port##*/}" >>>> fi >>>> done >>> >>> Yes, I think here the more verbose style is appropriate ;-) >>> >>>> echo "${root_ports# }" >> >> I'd leave this as-is. And, I remade this chunk of code, so this >> construct is (probably) really needed now. Anyways, I hate implicit >> assumptions. >> >>> >>> Though this is not necessary: echo $root_ports would do this just fine. >>> and, again, I'd prefer a non-local variable to pass the value >>> over some $(subshell) thingy. >> >> Did it another way. >> >>> >>>> } >>>> >>>> # From /inlude/linux/if_bridge.h: >>>> #define BR_STATE_DISABLED 0 >>>> #define BR_STATE_LISTENING 1 >>>> #define BR_STATE_LEARNING 2 >>>> #define BR_STATE_FORWARDING 3 >>>> #define BR_STATE_BLOCKING 4 >>>> >>>> bridge_get_active_ports() { >>>> local bridge=$1 >>>> shift 1 >>>> local ports="$*" >>>> local active_ports="" >>>> local port_state >>>> local stp_state=bridge_is_stp_enabled ${bridge} >>> >>> some double quote missing there? >> >> I should be temporarily insane while writing that ;) >> Fixed. >> >>> >>>> local warn=0 >>>> >>>> if [ -z "${ports}" ] || [ "${ports}" = "detect" ] ; then >>>> ports=$( bridge_get_root_ports ${bridge} ) >>> >>> if you did non-local root_ports, this would become >>> bridge_get_root_ports $bridge >>> # root ports now in $root_ports, >>> # so you can assign port=$root_ports, >>> # or just use $root_ports as is. >> >> Rewrote this. >> >>> >>>> fi >>>> >>>> for port in $ports ; do >>> >>> there. no need to trim white space anyways, as I thought. >> >> I'd leave it. >> >>> >>>> if [ ! -e "/sys/class/net/${bridge}/brif/${port}" ] ; then >>>> ocf_log warning "Port ${port} doesn't belong to bridge >>>> ${bridge}" >>>> continue >>>> fi >>>> read port_state < "/sys/class/net/${bridge}/brif/${port}/state" >>>> if [ "${port_state}" = "3" ] ; then >>>> if [ -n "${active_ports}" ] && ${stp_state} ; then >>> >>> no need to re-check stp state for each iteration. >>> do that once, and reference the result here. >> >> This nonsense is fixed. >> >>> >>>> warn=1 >>>> fi >>>> active_ports="${active_ports} ${port}" >>>> fi >>>> done >>>> if [ ${warn} -eq 1 ] ; then >>>> ocf_log warning "More then one upstream port in bridge '${bridge}' >>>> is in forwarding state while STP is enabled: ${active_ports}" >>>> fi >>>> echo "${active_ports# }" >>> >>> again, no need to trim white space, and I prefer non-local documented >>> variables over $(subshell echo) style assignments. >>> >>>> } >>>> >>>> bridge_get_speed() { >>>> local $iface=$1 >>>> >>>> if ! iface_is_bridge ${iface} ; then >>>> echo 0 >>>> return >>>> fi >>>> >>>> local ports=$( bridge_get_active_ports ${iface} >>>> ${OCF_RESKEY_bridge_ports} ) >> >> Still leaving this as subshell, will rethink later. >> >>>> for port in ${ports} ; do >>>> : $(( aggregate_speed += $( iface_get_speed ${port} ) )) >>>> done >>>> echo ${aggregate_speed} >>> >>> This can be rewritten without subshells. >> >> Error-prone because of recursions. >> I'd leave it as-is. >> >>> >>> It could even be done without bashisms, >>> (there is only one bashism in there, afaics) >>> but who cares, we are bash anyways ;-) >>> >>>> } >>>> >>>> bond_get_slaves() { >>>> local iface=$1 >>>> local slaves >>>> read slaves < "/sys/class/net/${iface}/bonding/slaves" >>>> echo ${slaves} >>> >>> if that is what you wanted, why not just say cat? >> >> It is slower. >> >>> or, drop this function completely, as it only adds noise, >>> and later do >>> slaves=$(< /sys/class/net/${iface}/bonding/slaves) >>> avoiding any subshells? >> >> Data access separation. >> >>> >>>> } >>>> >>>> bond_get_active_iface() { >>>> local iface=$1 >>>> local active >>>> read active < "/sys/class/net/${iface}/bonding/active_slave" >>>> echo ${active} >>>> } >>> >>> similar. >>> >>>> >>>> bond_is_balancing() { >>>> local iface=$1 >>>> read mode mode_index < "/sys/class/net/${iface}/bonding/mode" >>> >>> >>> You declare all variables in all your tiny to-be-used-as-subshell >>> functions as local (needlessly, as they are visible in that subshell >>> only anyways... but keep them local, we want to get rid of the subshell >>> echo style assignments), but then here omit the local? how come? >> >> Having function variables local is simply a good habit I think. Like >> having curly braces around one-statement blocks in C. >> I rewrote some functions so they can be used without subshell at a cost >> of additional variable. >> >>> >>> not that it matters, really, they are used only here, anyways. >>> >>>> case ${mode} in >>>> "balance-rr"|"balance-xor"|"802.3ad"|"balance-tlb"|"balance-alb") >>>> return 0 >>>> ;; >>>> *) >>>> return 1 >>>> ;; >>>> esac >>>> } >>>> >>>> bond_get_speed() { >>>> local iface=$1 >>>> local aggregate_speed=0 >>>> >>>> if ! iface_is_bond ${iface} ; then >>>> echo 0 >>>> return >>>> fi >>>> >>>> local slaves=$( bond_get_slaves ${iface} ) >>>> if bond_is_balancing ${iface} ; then >>>> for slave in ${slaves} ; do >>>> : $(( aggregate_speed += $( iface_get_speed ${slave} ) )) >>>> done >>>> # Bonding is unable to get speed*n >>>> : $(( aggregate_speed = aggregate_speed*8/10 )) >>>> else >>>> : $(( aggregate_speed = $( iface_get_speed $( >>>> bond_get_active_iface ${iface} ) ) )) >>>> fi >>>> echo ${aggregate_speed} >>> >>> Again, I'd prefer this to be handled without subshells. >> >> Recursion. >> >>> >>>> } >>>> >>>> update() { >>>> local speed=$( iface_get_speed ${OCF_RESKEY_iface} ) >>>> >>>> : $(( score = speed * ${OCF_RESKEY_weight_base} / 10 )) >>> >>> hmmm... >>> score = speed * 10 / 10 >>> but I complained about that earlier already ;-) >>> >>>> attrd_updater -n ${OCF_RESKEY_name} -v ${score} -d >>>> ${OCF_RESKEY_dampen} ${attrd_options} >>>> rc=$? >>>> case ${rc} in >>>> 0) >>>> ocf_is_true ${OCF_RESKEY_debug} && ocf_log debug "Updated >>>> ${OCF_RESKEY_name} = ${score}" >>>> ;; >>>> *) >>>> ocf_log warn "Could not update ${OCF_RESKEY_name} = ${score}: >>>> rc=${rc}" >>>> ;; >>>> esac >>>> return ${rc} >>> >>> all that fun just to ignore the return value of this function anyways? >> >> No. It was not ignored. >> But I fixed implicit to explicit return in start(). >> >>> >>> >>>> } >>>> >>>> if [ `uname` != "Linux" ] ; then >>>> ocf_log err "This RA works only on linux." >>>> exit $OCF_ERR_INSTALLED >>>> fi >>>> >>>> if ! ocf_is_true ${OCF_RESKEY_CRM_meta_globally_unique} ; then >>>> : ${ha_pseudo_resource_name:="ifspeed-${OCF_RESKEY_name}"} >>>> else >>>> : ${ha_pseudo_resource_name:="ifspeed-${OCF_RESOURCE_INSTANCE}"} >>>> fi >>> >>> At least one of those clone possibilities is probably nonsense, >>> but wellm, that's how this ha_pseudo_resource stuff works. >> >> Dunno, just blindly copied this. >> >>> >>>> attrd_options='-q' >>>> if ocf_is_true ${OCF_RESKEY_debug} ; then >>>> attrd_options='' >>>> fi >>>> >>>> case $__OCF_ACTION in >>>> meta-data) >>>> meta_data >>>> exit $OCF_SUCCESS >>>> ;; >>>> start) >>>> start >>>> ;; >>>> stop) >>>> stop >>>> ;; >>>> monitor) >>>> monitor >>>> ;; >>>> reload) >>>> start >>>> ;; >>>> validate-all) >>>> validate >>>> ;; >>>> usage|help) >>>> usage >>>> exit $OCF_SUCCESS >>>> ;; >>>> *) >>>> usage >>>> exit $OCF_ERR_UNIMPLEMENTED >>>> ;; >>>> esac >>>> exit $? >>> >>> The $? is not necessary there. >> >> I prefer to have it explicitly. >> >>> >>> >>> >>> Good job! >>> >>> >>> _______________________________________________ >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >>> >>> Project Home: http://www.clusterlabs.org >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >>> Bugs: >>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker > > > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: > http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker > _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker