Such a situation could occur only with a buggy driver, and I don't think it would be effective even if OpenSAF handled it. Either way, the buggy driver would only make the cluster more unstable (loss of service) until the issue is actually fixed or the device is replaced.
Talking of the patch, to some extent the best result also depends on how the communication domain between controllers and payloads is configured (for example, base and fabric channels). At the outset I'm thinking that rebooting the controller is perhaps better, but I will get back shortly. One more comment inline below:

Thanks,
Mathi.

> -----Original Message-----
> From: Hans Feldt [mailto:hans.fe...@ericsson.com]
> Sent: Tuesday, November 05, 2013 3:54 PM
> To: Nagendra Kumar; hans.nordeb...@ericsson.com; Praveen Malviya;
> Mathivanan Naickan Palanivelu; Suryanarayana Garlapati
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] amfnd: Reboot payload when link between
> Controller and Payload flickers [#600]
>
> A couple of comments:
>
> * guess this change will be obsoleted by the proposed CLM change #220 ?
> if so it should be made clear in changelogs and code
>
[Mathi] No. The scope of #220 is to componentize CLMNA and to support the use case involving '/etc/init.d/opensafd stop' without a node reboot.
https://sourceforge.net/p/opensaf/tickets/439/ (Enhanced Cluster Management) is planned for 4.5. That ticket will focus on using 3rd party cluster managers for enhanced cluster management.
Cheers,
Mathi.

> * The MDS thread is changing data in the control block. It should only send a
> message to the main thread.
>
> Thanks,
> Hans
>
> On 10/21/2013 01:33 PM, nagendr...@oracle.com wrote:
> >  osaf/services/saf/amf/amfnd/di.cc             |  13 +++++++++----
> >  osaf/services/saf/amf/amfnd/include/avnd_cb.h |   1 +
> >  osaf/services/saf/amf/amfnd/mds.cc            |  11 +++++++++++
> >  3 files changed, 21 insertions(+), 4 deletions(-)
> >
> >
> > diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
> > --- a/osaf/services/saf/amf/amfnd/di.cc
> > +++ b/osaf/services/saf/amf/amfnd/di.cc
> > @@ -437,13 +437,18 @@ uint32_t avnd_evt_mds_avd_dn_evh(AVND_CB
> >
> >      TRACE_ENTER();
> >
> > -    LOG_ER("AMF director unexpectedly crashed");
> > -
> >      /* Don't issue reboot if it has been already issued.*/
> >      if (false == cb->reboot_in_progress) {
> >          cb->reboot_in_progress = true;
> > -        opensaf_reboot(avnd_cb->node_info.nodeId, (char *)avnd_cb->node_info.executionEnvironment.value,
> > -                "local AVD down(Adest) or both AVD down(Vdest) received");
> > +        if(cb->cont_reboot_in_progress == false) {
> > +            LOG_ER("AMF director unexpectedly crashed");
> > +            opensaf_reboot(avnd_cb->node_info.nodeId, (char *)avnd_cb->node_info.executionEnvironment.value,
> > +                    "local AVD down(Adest) or both AVD down(Vdest) received");
> > +        } else {
> > +            opensaf_reboot(avnd_cb->node_info.nodeId, (char *)avnd_cb->node_info.executionEnvironment.value,
> > +                    "Link reset with Act controller");
> > +        }
> > +
> >      }
> >
> >      TRACE_LEAVE();
> > diff --git a/osaf/services/saf/amf/amfnd/include/avnd_cb.h b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
> > --- a/osaf/services/saf/amf/amfnd/include/avnd_cb.h
> > +++ b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
> > @@ -130,6 +130,7 @@ typedef struct avnd_cb_tag {
> >      SaBoolT first_time_up;
> >      bool reboot_in_progress;
> >      AVND_SU *failed_su;
> > +    bool cont_reboot_in_progress;
> >  } AVND_CB;
> >
> >  #define AVND_CB_NULL ((AVND_CB *)0)
> > diff --git a/osaf/services/saf/amf/amfnd/mds.cc b/osaf/services/saf/amf/amfnd/mds.cc
> > --- a/osaf/services/saf/amf/amfnd/mds.cc
> > +++ b/osaf/services/saf/amf/amfnd/mds.cc
> > @@ -386,6 +386,7 @@ uint32_t avnd_mds_rcv(AVND_CB *cb, MDS_C
> >      if ((AVSV_D2N_NODE_UP_MSG == ((AVSV_DND_MSG *)(rcv_info->i_msg))->msg_type) ||
> >          (AVSV_D2N_DATA_VERIFY_MSG == ((AVSV_DND_MSG *)(rcv_info->i_msg))->msg_type)) {
> >          cb->active_avd_adest = rcv_info->i_fr_dest;
> > +        avnd_cb->cont_reboot_in_progress = false;
> >          TRACE_1("Active AVD Adest = %" PRIu64 ,cb->active_avd_adest);
> >      }
> >
> > @@ -560,6 +561,14 @@ uint32_t avnd_mds_svc_evt(AVND_CB *cb, M
> >      case NCSMDS_UP:
> >          switch (evt_info->i_svc_id) {
> >          case NCSMDS_SVC_ID_AVD:
> > +
> > +            if ((m_MDS_DEST_IS_AN_ADEST(evt_info->i_dest) && avnd_cb->cont_reboot_in_progress) &&
> > +                    (m_NCS_NODE_ID_FROM_MDS_DEST(evt_info->i_dest) == cb->active_avd_adest)) {
> > +                memset(&cb->avd_dest, 0, sizeof(MDS_DEST));
> > +                evt = avnd_evt_create(cb, AVND_EVT_MDS_AVD_DN, 0, &evt_info->i_dest, 0, 0, 0);
> > +                break;
> > +            }
> > +
> >              /* create the mds event */
> >              evt = avnd_evt_create(cb, AVND_EVT_MDS_AVD_UP, 0, &evt_info->i_dest, 0, 0, 0);
> >              break;
> > @@ -606,6 +615,8 @@ uint32_t avnd_mds_svc_evt(AVND_CB *cb, M
> >                  /* Supervise our node local director */
> >                  if (evt_info->i_node_id != ncs_get_node_id()) {
> >                      /* Ignore the other AVD Adest Down.*/
> > +                    if(m_NCS_NODE_ID_FROM_MDS_DEST(evt_info->i_dest) == cb->active_avd_adest)
> > +                        avnd_cb->cont_reboot_in_progress = true;
> >                      return rc;
> >                  }
> >              }
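
For illustration only, the second review comment above (the MDS thread should not change the control block but only send a message to the main thread) could look roughly like the sketch below. This is not the amfnd code: `Event`, `Mailbox`, `ControlBlock`, `main_loop` and `run_demo` are hypothetical names, and a `std::queue` guarded by a mutex is assumed as a stand-in for the real event mailbox. Only the flag name `cont_reboot_in_progress` is taken from the patch.

```cpp
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical event types, loosely mirroring the AVD down/up events in the patch.
enum class EventType { ActiveAvdLinkDown, AvdUp, Shutdown };

struct Event {
  EventType type;
  std::uint32_t node_id;
};

// Hypothetical mailbox standing in for the main thread's event queue.
class Mailbox {
 public:
  void post(const Event& e) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push(e);
    }
    cv_.notify_one();
  }

  Event wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !q_.empty(); });
    Event e = q_.front();
    q_.pop();
    return e;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<Event> q_;
};

// Hypothetical control block: only the main thread reads or writes it.
struct ControlBlock {
  bool cont_reboot_in_progress = false;  // flag name taken from the patch
  bool reboot_requested = false;         // assumption: stands in for opensaf_reboot()
};

// Runs on the main thread; the only place where the control block changes.
void main_loop(Mailbox& mbx, ControlBlock& cb) {
  for (;;) {
    Event e = mbx.wait();
    switch (e.type) {
      case EventType::ActiveAvdLinkDown:
        cb.cont_reboot_in_progress = true;
        break;
      case EventType::AvdUp:
        // Link came back after a drop: treat it as a link reset.
        if (cb.cont_reboot_in_progress) cb.reboot_requested = true;
        break;
      case EventType::Shutdown:
        return;
    }
  }
}

// Simulates the transport (MDS) thread: it posts events and never touches cb.
bool run_demo() {
  Mailbox mbx;
  ControlBlock cb;
  std::thread main_thread(main_loop, std::ref(mbx), std::ref(cb));
  std::thread mds_thread([&mbx] {
    mbx.post({EventType::ActiveAvdLinkDown, 1});
    mbx.post({EventType::AvdUp, 1});
    mbx.post({EventType::Shutdown, 0});
  });
  mds_thread.join();
  main_thread.join();
  return cb.reboot_requested;  // safe to read: both threads have been joined
}
```

With this split the flag is never written from the callback thread, so no extra locking is needed around the control block itself; the queue is the only shared state.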