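In the start function, you need to start the resource when monitor doesn't return success. As written, HDFSHA_start only calls hdfs-ha.sh start when the monitor already reports this node as the active namenode.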
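Something along these lines might work. It is only a rough sketch, not a tested agent: the return-code defaults mirror the standard OCF values (normally provided by ocf-shellfuncs), HDFS_HA_SCRIPT is a hypothetical variable I introduced so the script path can be overridden, and the monitor is simplified to two states (a real one should also report a failed namenode as an error rather than "not running").

```shell
# OCF return codes; a real agent gets these from ocf-shellfuncs.
# They are defaulted here only so the sketch is self-contained.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Path from the original post; HDFS_HA_SCRIPT is a hypothetical override.
: "${HDFS_HA_SCRIPT:=/opt/hadoop/sbin/hdfs-ha.sh}"

HDFSHA_monitor() {
    # Host name of the namenode currently reported as active.
    active_nn=$(hdfs haadmin -getAllServiceState | grep active | cut -d":" -f1)
    current_node=$(uname -n)
    if [ "${active_nn}" = "${current_node}" ]; then
        return "$OCF_SUCCESS"
    fi
    # Simplification: anything else is treated as cleanly stopped.
    return "$OCF_NOT_RUNNING"
}

HDFSHA_start() {
    # Already active on this node: nothing to do.
    if HDFSHA_monitor; then
        return "$OCF_SUCCESS"
    fi
    # Monitor did not return success, so actually start the resource.
    "$HDFS_HA_SCRIPT" start || return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}

HDFSHA_stop() {
    # Only stop if this node is currently active; stopping an already
    # stopped resource must still report success.
    if HDFSHA_monitor; then
        "$HDFS_HA_SCRIPT" stop || return "$OCF_ERR_GENERIC"
    fi
    return "$OCF_SUCCESS"
}
```

Note that every path through each function now ends in an explicit return, so the exit code never depends on whatever command happened to run last.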
2018-04-12 23:38 GMT+02:00 Bishoy Mikhael <[email protected]>:
> Hi All,
>
> I'm trying to create a resource agent to promote a standby HDFS namenode
> to active when the virtual IP failover to another node.
>
> I've taken the skeleton from the Dummy OCF agent.
>
> The modifications I've done to the Dummy agent are as follows:
>
> HDFSHA_start() {
>     HDFSHA_monitor
>     if [ $? = $OCF_SUCCESS ]; then
>         /opt/hadoop/sbin/hdfs-ha.sh start
>         return $OCF_SUCCESS
>     fi
> }
>
> HDFSHA_stop() {
>     HDFSHA_monitor
>     if [ $? = $OCF_SUCCESS ]; then
>         /opt/hadoop/sbin/hdfs-ha.sh stop
>     fi
>     return $OCF_SUCCESS
> }
>
> HDFSHA_monitor() {
>     # Monitor _MUST!_ differentiate correctly between running
>     # (SUCCESS), failed (ERROR) or _cleanly_ stopped (NOT RUNNING).
>     # That is THREE states, not just yes/no.
>     active_nn=$(hdfs haadmin -getAllServiceState | grep active | cut -d":" -f 1)
>     current_node=$(uname -n)
>     if [[ ${active_nn} == ${current_node} ]]; then
>         return $OCF_SUCCESS
>     fi
> }
>
> HDFSHA_validate() {
>
>     return $OCF_SUCCESS
> }
>
> I've created the resource as follows:
>
> # pcs resource create hdfs-ha ocf:heartbeat:HDFSHA op monitor interval=30s
>
> The resource fails right away as follows:
>
> > # pcs status
> > Cluster name: hdfs_cluster
> > Stack: corosync
> > Current DC: taulog (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
> > Last updated: Thu Apr 12 03:30:57 2018
> > Last change: Thu Apr 12 03:30:54 2018 by root via cibadmin on lingcod
> >
> > 3 nodes configured
> > 2 resources configured
> >
> > Online: [ dentex lingcod taulog ]
> >
> > Full list of resources:
> >
> > VirtualIP (ocf::heartbeat:IPaddr2): Started taulog
> > hdfs-ha (ocf::heartbeat:HDFSHA): FAILED (blocked)[ taulog dentex ]
> >
> > Failed Actions:
> > * hdfs-ha_stop_0 on taulog 'insufficient privileges' (4): call=12, status=complete, exitreason='none',
> >     last-rc-change='Thu Apr 12 03:17:37 2018', queued=0ms, exec=1ms
> > * hdfs-ha_stop_0 on dentex 'insufficient privileges' (4): call=10, status=complete, exitreason='none',
> >     last-rc-change='Thu Apr 12 03:17:43 2018', queued=0ms, exec=1ms
> >
> > Daemon Status:
> > corosync: active/enabled
> > pacemaker: active/enabled
> > pcsd: active/enabled
>
> I debug the resource as follows, and it returns 0
>
> > # pcs resource debug-monitor hdfs-ha
> > Operation monitor for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
> >  > stderr: DEBUG: hdfs-ha monitor : 0
> >
> > # pcs resource debug-stop hdfs-ha
> > Operation stop for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
> >  > stderr: DEBUG: hdfs-ha stop : 0
> >
> > # pcs resource debug-start hdfs-ha
> > Operation start for hdfs-ha (ocf:heartbeat:HDFSHA) returned 0
> >  > stderr: DEBUG: hdfs-ha start : 0
>
> I don't understand what am I doing wrong!
>
> Regards,
> Bishoy Mikhael
>
> _______________________________________________
> Users mailing list: [email protected]
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
  .~.
  /V\
 // \\
/(   )\
^`~'^
