[ClusterLabs] Approach to validate on stop op (Was Re: crmsh configure delete for constraints)

Vladislav Bogdanov Tue, 29 Mar 2016 05:31:39 -0700

10.02.2016 12:31, Vladislav Bogdanov wrote:

10.02.2016 11:38, Ulrich Windl wrote:

Vladislav Bogdanov <[email protected]> wrote on 10.02.2016 at 05:39 in
message <[email protected]>:


[...]

Well, I'd reword. Generally, an RA should not exit with an error if
validation fails on stop.
Is that better?

[...]

As we have different error codes, what type of error?


Any that makes Pacemaker think the resource stop op failed.
OCF_ERR_* in particular.

If Pacemaker gets an error on start, it will run stop with the same
set of parameters anyway. It will then get the error again if that one
came from validation and the RA does not differentiate validation for
start and stop. And then circular fencing over the whole cluster is
triggered for no reason.

Of course, for safety, the RA could save its state if start was
successful and skip validation on stop only if that state is not found.
Otherwise a removed binary or config file would result in the resource
running on several nodes.

Well, this all seems very complicated to turn into a general
algorithm ;)

Well, after some thinking, I've got an approach which sounds both elegant and safe enough to me and my colleagues. Please look at the following excerpt (part of a hypothetical RA before the main 'case'):


-----
VALIDATION_FAILURE_FLAG="${HA_RSCTMP}/${OCF_RESOURCE_INSTANCE}.invalid"

case "${__OCF_ACTION}" in
    meta-data)
        meta_data
        exit $OCF_SUCCESS
        ;;
    usage|help)
        usage
        exit $OCF_SUCCESS
        ;;
    start)
        validate
        ret=$?
        if [ ${ret} -ne $OCF_SUCCESS ] ; then
            touch "${VALIDATION_FAILURE_FLAG}"
            exit ${ret}
        fi
        ;;
    stop)
        validate
        ret=$?
        if [ ${ret} -ne $OCF_SUCCESS ] ; then
            if [ -f "${VALIDATION_FAILURE_FLAG}" ] ; then
                rm -f "${VALIDATION_FAILURE_FLAG}"
                exit $OCF_SUCCESS
            else
                exit ${ret}
            fi
        fi
        ;;
    *) # monitor | notify | reload | etc
        validate
        ret=$?
        if [ ${ret} -ne $OCF_SUCCESS ] ; then
            if ocf_is_probe ; then
                exit $OCF_NOT_RUNNING
            fi
            # $? no longer holds the validation result here, so use the saved code
            exit ${ret}
        fi
        ;;
esac
-----

The above assumes that the validation function does not call exit (and thus uses have_binary instead of check_binary, etc.) but returns an error code.
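
For illustration, a validation function of that kind might look like the sketch below. It is only a sketch: the parameter names OCF_RESKEY_binary and OCF_RESKEY_config are hypothetical, and the fallback definitions at the top exist only so the snippet runs outside a cluster (a real RA would source ocf-shellfuncs, which provides the return codes and have_binary).

```shell
# Fallbacks so the sketch runs standalone; a real RA gets these from ocf-shellfuncs.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_INSTALLED:=5}"

# Stand-in for have_binary, defined only if the real one is not loaded.
if ! command -v have_binary >/dev/null 2>&1 ; then
    have_binary() {
        command -v "$1" >/dev/null 2>&1
    }
fi

# Returns an OCF code instead of exiting, so the caller decides what to do.
# OCF_RESKEY_binary and OCF_RESKEY_config are hypothetical parameters.
validate() {
    if ! have_binary "${OCF_RESKEY_binary:-}" ; then
        return $OCF_ERR_INSTALLED
    fi
    if [ ! -r "${OCF_RESKEY_config:-}" ] ; then
        return $OCF_ERR_INSTALLED
    fi
    return $OCF_SUCCESS
}
```

Because validate only returns, the per-action 'case' above stays in control of whether a failure becomes an exit code, a flag file, or nothing at all.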

The main difference from the current ocf_rarun implementation is that changes to the machine environment (deleted binaries, configs, etc.) still result in a stop failure (and thus fencing) if those changes were made after the successful validation on resource start.
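
To make the two cases concrete, here is a small standalone sketch of the stop-time decision. The helper name stop_precheck and the temporary flag path are made up for the demonstration; in the RA the flag would live under HA_RSCTMP as in the excerpt above.

```shell
# Fallbacks so the sketch runs standalone; a real RA gets these from ocf-shellfuncs.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_INSTALLED:=5}"

# Hypothetical helper: $1 = return code of validate, $2 = path of the flag file.
stop_precheck() {
    if [ "$1" -eq "$OCF_SUCCESS" ] ; then
        return $OCF_SUCCESS     # valid: carry on with the real stop
    fi
    if [ -f "$2" ] ; then
        rm -f "$2"              # start never passed validation,
        return $OCF_SUCCESS     # so nothing can be running here
    fi
    return "$1"                 # validated at start, broken since: stop fails
}

flag="$(mktemp -d)/demo.invalid"    # throwaway stand-in for the real flag file

# Case 1: start failed validation (flag present), so stop reports success.
touch "$flag"
case1=0
stop_precheck "$OCF_ERR_INSTALLED" "$flag" || case1=$?

# Case 2: start validated fine (no flag), binary removed later: stop fails.
case2=0
stop_precheck "$OCF_ERR_INSTALLED" "$flag" || case2=$?

echo "case1=${case1} case2=${case2}"
```

Only the second case leads to a failed stop and hence fencing, which is the behaviour described above.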


I plan to test this approach extensively in my RAs shortly.

Comments are welcome.

Best,
Vladislav


Regards,
Ulrich



_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
