> -----Original Message-----
> From: Mathivanan Naickan Palanivelu [mailto:[email protected]]
> Sent: 6 November 2013 11:59
> To: Hans Feldt; Nagendra Kumar; Hans Nordebäck; Praveen Malviya; 
> Suryanarayana Garlapati
> Cc: [email protected]
> Subject: RE: [PATCH 1 of 1] amfnd: Reboot payload when link between 
> Controller and Payload flickers [#600]
> 
> Such a situation could occur only with a buggy driver, and I don't think
> handling it in OpenSAF would be effective: the buggy driver would still make
> the cluster unstable (loss of service) until the issue is fixed or the device
> is replaced.

TIPC can reset a link in low-memory situations. A patch to improve this is 
pending on netdev: http://patchwork.ozlabs.org/patch/288790/

So it doesn't have to be a "buggy driver".

Thanks,
Hans

> 
> Regarding the patch, the best result also depends to some extent on how the
> communication domain between controllers and payloads is configured (for
> example, base and fabric channels).
> At first glance I think rebooting the controller may be better, but I will
> get back shortly.
> One more comment inline below:
> 
> Thanks,
> Mathi.
> 
> > -----Original Message-----
> > From: Hans Feldt [mailto:[email protected]]
> > Sent: Tuesday, November 05, 2013 3:54 PM
> > To: Nagendra Kumar; [email protected]; Praveen Malviya;
> > Mathivanan Naickan Palanivelu; Suryanarayana Garlapati
> > Cc: [email protected]
> > Subject: Re: [PATCH 1 of 1] amfnd: Reboot payload when link between
> > Controller and Payload flickers [#600]
> >
> > A couple of comments:
> >
> > * I guess this change will be made obsolete by the proposed CLM change #220?
> >     If so, that should be made clear in the changelogs and code.
> >
> [Mathi]
> No. The scope of #220 is to componentize CLMNA and to support
> the use case involving '/etc/init.d/opensafd stop' without a node reboot.
> 
> https://sourceforge.net/p/opensaf/tickets/439/ (Enhanced Cluster Management)
> is planned for 4.5. That ticket will focus on using 3rd party cluster managers
> for enhanced cluster management.
> 
> Cheers,
> Mathi.
> 
> 
> > * The MDS thread is changing data in the control block. It should only send a
> > message to the main thread.
> >
> > Thanks,
> > Hans
> >
> > On 10/21/2013 01:33 PM, [email protected] wrote:
> > >   osaf/services/saf/amf/amfnd/di.cc             |  13 +++++++++----
> > >   osaf/services/saf/amf/amfnd/include/avnd_cb.h |   1 +
> > >   osaf/services/saf/amf/amfnd/mds.cc            |  11 +++++++++++
> > >   3 files changed, 21 insertions(+), 4 deletions(-)
> > >
> > >
> > > diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
> > > --- a/osaf/services/saf/amf/amfnd/di.cc
> > > +++ b/osaf/services/saf/amf/amfnd/di.cc
> > > @@ -437,13 +437,18 @@ uint32_t avnd_evt_mds_avd_dn_evh(AVND_CB
> > >
> > >           TRACE_ENTER();
> > >
> > > - LOG_ER("AMF director unexpectedly crashed");
> > > -
> > >           /* Don't issue reboot if it has been already issued.*/
> > >           if (false == cb->reboot_in_progress) {
> > >                   cb->reboot_in_progress = true;
> > > -         opensaf_reboot(avnd_cb->node_info.nodeId, (char *)avnd_cb->node_info.executionEnvironment.value,
> > > -                         "local AVD down(Adest) or both AVD down(Vdest) received");
> > > +         if(cb->cont_reboot_in_progress == false) {
> > > +                 LOG_ER("AMF director unexpectedly crashed");
> > > +                 opensaf_reboot(avnd_cb->node_info.nodeId, (char *)avnd_cb->node_info.executionEnvironment.value,
> > > +                                 "local AVD down(Adest) or both AVD down(Vdest) received");
> > > +         } else {
> > > +                 opensaf_reboot(avnd_cb->node_info.nodeId, (char *)avnd_cb->node_info.executionEnvironment.value,
> > > +                                 "Link reset with Act controller");
> > > +         }
> > > +
> > >           }
> > >
> > >           TRACE_LEAVE();
> > > diff --git a/osaf/services/saf/amf/amfnd/include/avnd_cb.h b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
> > > --- a/osaf/services/saf/amf/amfnd/include/avnd_cb.h
> > > +++ b/osaf/services/saf/amf/amfnd/include/avnd_cb.h
> > > @@ -130,6 +130,7 @@ typedef struct avnd_cb_tag {
> > >           SaBoolT first_time_up;
> > >           bool reboot_in_progress;
> > >           AVND_SU *failed_su;
> > > + bool cont_reboot_in_progress;
> > >   } AVND_CB;
> > >
> > >   #define AVND_CB_NULL ((AVND_CB *)0)
> > > diff --git a/osaf/services/saf/amf/amfnd/mds.cc b/osaf/services/saf/amf/amfnd/mds.cc
> > > --- a/osaf/services/saf/amf/amfnd/mds.cc
> > > +++ b/osaf/services/saf/amf/amfnd/mds.cc
> > > @@ -386,6 +386,7 @@ uint32_t avnd_mds_rcv(AVND_CB *cb, MDS_C
> > >                   if ((AVSV_D2N_NODE_UP_MSG == ((AVSV_DND_MSG *)(rcv_info->i_msg))->msg_type) ||
> > >                       (AVSV_D2N_DATA_VERIFY_MSG == ((AVSV_DND_MSG *)(rcv_info->i_msg))->msg_type)) {
> > >                           cb->active_avd_adest = rcv_info->i_fr_dest;
> > > +                 avnd_cb->cont_reboot_in_progress = false;
> > >                           TRACE_1("Active AVD Adest = %" PRIu64 ,cb->active_avd_adest);
> > >                   }
> > >
> > > @@ -560,6 +561,14 @@ uint32_t avnd_mds_svc_evt(AVND_CB *cb, M
> > >           case NCSMDS_UP:
> > >                   switch (evt_info->i_svc_id) {
> > >                   case NCSMDS_SVC_ID_AVD:
> > > +
> > > +                 if ((m_MDS_DEST_IS_AN_ADEST(evt_info->i_dest) && avnd_cb->cont_reboot_in_progress) &&
> > > +                         (m_NCS_NODE_ID_FROM_MDS_DEST(evt_info->i_dest) == cb->active_avd_adest)) {
> > > +                         memset(&cb->avd_dest, 0, sizeof(MDS_DEST));
> > > +                         evt = avnd_evt_create(cb, AVND_EVT_MDS_AVD_DN, 0, &evt_info->i_dest, 0, 0, 0);
> > > +                         break;
> > > +                 }
> > > +
> > >                           /* create the mds event */
> > >                           evt = avnd_evt_create(cb, AVND_EVT_MDS_AVD_UP, 0, &evt_info->i_dest, 0, 0, 0);
> > >                           break;
> > > @@ -606,6 +615,8 @@ uint32_t avnd_mds_svc_evt(AVND_CB *cb, M
> > >                                   /* Supervise our node local director */
> > >                                   if (evt_info->i_node_id != ncs_get_node_id()) {
> > >                                           /* Ignore the other AVD Adest Down.*/
> > > +                                         if(m_NCS_NODE_ID_FROM_MDS_DEST(evt_info->i_dest) == cb->active_avd_adest)
> > > +                                                 avnd_cb->cont_reboot_in_progress = true;
> > >                                           return rc;
> > >                                   }
> > >                           }
> > >
> > >

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
