Another issue can arise from a slow exit of AMFD on the node going down,
i.e. during the 'opensafd stop' flow. I think the local AMFD should mark the
local node as "ABSENT" upon receiving the down event of the local AMFND, as below:
diff --git a/osaf/services/saf/amf/amfd/ndfsm.cc b/osaf/services/saf/amf/amfd/ndfsm.cc
--- a/osaf/services/saf/amf/amfd/ndfsm.cc
+++ b/osaf/services/saf/amf/amfd/ndfsm.cc
@@ -321,6 +321,7 @@ void avd_mds_avnd_down_evh(AVD_CL_CB *cb
 		// Do nothing if the local node goes down. Most likely due to system shutdown.
 		// If node director goes down due to a bug, the AMF watchdog will restart the node.
 		if (node->node_info.nodeId == cb->node_id_avd) {
+			node->node_state = AVD_AVND_STATE_ABSENT;
 			TRACE("Ignoring down event for local node director");
 			goto done;
 		}
This is because, if for some reason there is a small delay before the AMFD exits
(through amfd's stop script), as described below, then during that delay the
other controller would already have become ACTIVE, and the local active AMFD
would have received a CLM cluster track callback marking the local (going-down)
node as exiting the cluster.
Without the above protection (or something similar), this can lead to other problems.
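Separately, the wait suggested in the quoted mail below (check that the amfd
pid has disappeared before stopping dtm) could look roughly like this. It is
only a sketch: wait_for_pid_exit is a hypothetical helper, not an existing
function in the OpenSAF scripts, and how the amfd pid is obtained (pidfile
location etc.) is deliberately left open.

```shell
# Hypothetical helper (not part of the OpenSAF scripts).
# Polls until the process with pid $1 is gone, or until $2 seconds
# (default 10) have elapsed.
# Returns 0 if the process has exited, 1 if it is still there at timeout.
# Note: 'kill -0' only tests whether a signal could be sent; it assumes the
# caller is allowed to signal the process (the stop script runs as root).
wait_for_pid_exit() {
	wfpe_pid=$1
	wfpe_left=${2:-10}
	while kill -0 "$wfpe_pid" 2>/dev/null; do
		if [ "$wfpe_left" -le 0 ]; then
			return 1
		fi
		sleep 1
		wfpe_left=$((wfpe_left - 1))
	done
	return 0
}
```

The stop flow would call it with the amfd pid just before the
'osaf-dtm stop' line, e.g. 'wait_for_pid_exit "$amfd_pid" 10', where
amfd_pid is an assumed variable holding the pid read from amfd's pidfile.
A timeout is used so that a hung amfd cannot block the shutdown forever.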
Comments?
Thanks,
Mathi.
> -----Original Message-----
> From: Mathivanan Naickan Palanivelu
> Sent: Friday, January 10, 2014 10:45 PM
> To: Hans Feldtanders.widell; Venkata Mahesh Alla
> Cc: [email protected]
> Subject: [devel] Possible Time delay between AMFD exit and DTM exit
> during opensaf stop
>
> Hi,
>
> We might have discussed this before, but I think there is a small chance of
> the following happening during the opensafd stop scenario.
>
> for cmd in `ls $pkgclcclidir/osaf-*`; do
>     # skip dtm here to allow shutdown of other services (e.g. amfd)
> ===> if [ "$cmd" != "$pkgclcclidir/osaf-dtm" ] && [ "$cmd" != "$pkgclcclidir/osaf-transport-monitor" ]; then
>         $cmd stop >/dev/null 2>&1
>     fi
> done
> [Mathi]
> The AMFD clc-cli script will have been invoked by the lines above. (This
> does not necessarily mean that the script has finished executing!)
>
> if [ "$MDS_TRANSPORT" = "TIPC" ]; then
> unload_tipc
> else
> # stop dtm, now all dependent services should be stopped
> ====> $pkgclcclidir/osaf-dtm stop >/dev/null 2>&1
> [Mathi]
> By the time osaf-dtm is killed, it is possible that osaf-amfd has still not
> exited.
> Is that possible? If so, we probably have to check and wait for the amfd
> pid to disappear before doing the kill here.
>
> rm -f $pkglocalstatedir/osaf_dtm_intra_server
> fi
>
> What do you say?
>
> Thanks,
> Mathi.
>
> _______________________________________________
> Opensaf-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel