Hi Hans N,
Thanks for pointing it out.
Do you mean, we should add handling of errors like BAD_HANDLE, TIMEOUT in
immutil function and remove handling from Amf code ? If yes, then I can raise
an enhancement ticket on imm util and another on 'Amf/other services in
general' to clean the code. Is that ok?
Thanks
-Nagu
> -----Original Message-----
> From: Hans Nordebäck [mailto:[email protected]]
> Sent: 28 May 2015 16:32
> To: Anders Björnerstedt; Nagendra Kumar; Praveen Malviya
> Cc: [email protected]
> Subject: RE: [devel] [PATCH 1 of 1] amfd: try again if ImplementerClear times
> out
>
> Hi, in ticket #1290 BAD_HANDLE, one comment was that preferably should
> immutil be updated to handle BAD_HANDLE. Probably immutil should also
> handle other errors, e.g. ERR_TIMEOUT, as for this ticket #1360.
>
> /Thanks HansN
>
> -----Original Message-----
> From: Anders Björnerstedt
> Sent: den 27 maj 2015 09:52
> To: Nagendra Kumar; Hans Nordebäck; Praveen Malviya
> Cc: [email protected]
> Subject: RE: [devel] [PATCH 1 of 1] amfd: try again if ImplementerClear times
> out
>
> Hi again,
>
> I understand that this case is apparently due to message loss IMMND->IMMD.
>
> Since change-set:
>
> changeset: 3523:5228e50660b4
> user: Anders Bjornerstedt <[email protected]>
> date: Fri Apr 27 15:02:01 2012 +0200
> summary: Immsv: IMMND subscribes (mds) in normal mode on IMMD
> (#2446)
>
> IMMND subscribes to IMMD sv-id in NORMAL mode, i.e. not REDUNDANT
> mode.
> MDS will then supposedly take care of buffering messages that can not be
> delivered due to absence of IMMD until the IMMD is back. The IMMD itself
> MBCPs fevs mesasages (which ImplementerCLear is) to standby before
> sending.
>
> So this message loss could only happen if the IMMD goes down after having
> received the request, but before having MBCP'ed the request. So this should
> be very rare indeed.
>
> The fix in itself is safe though.
>
> I worry about the followingthough.
> Assume that instead of ImplementerClear, the AMf simply closes the OI
> handle and re-opens it.
> This should have exactly the same effect.
> A close of handle will be registgered and acceptet by the local IMMND and the
> local IMMND will close the implementer by generating that message internally
> to the IMMND. If that message gets lost, then the AMFD would here actually
> get Ok, but we would possibly have an "orphaned" OI.
>
> I have not seen one such case though.
>
> /AndersBj
>
>
>
> -----Original Message-----
> From: Anders Björnerstedt [mailto:[email protected]]
> Sent: den 27 maj 2015 09:23
> To: Nagendra Kumar; Hans Nordebäck; Praveen Malviya
> Cc: [email protected]
> Subject: Re: [devel] [PATCH 1 of 1] amfd: try again if ImplementerClear times
> out
>
> Ok, that should be a very rare error scenario, but in this case for this
> particular operation (ImplementerClear) which is an idempotent operation,
> then repeating the request should work and be safe.
>
> /AndersBj
>
> -----Original Message-----
> From: Nagendra Kumar [mailto:[email protected]]
> Sent: den 27 maj 2015 08:35
> To: Hans Nordebäck; Praveen Malviya
> Cc: [email protected]
> Subject: Re: [devel] [PATCH 1 of 1] amfd: try again if ImplementerClear times
> out
>
> Hi Hans N/Anders Bj,
> Thanks for your review.
> >>> A log record can be written before the "goto try_again".
> I would like to get logs record for BAD_HANDLE and other types of errors also.
> That's why the log record is on top common for all the errors.
>
> >>> An alternative to try_again can be to change IMMA_SYNCR_TIMEOUT
> from the default 10 seconds to a higher value.
> In the scenario of the ticket(specific), Immd has gone down and the packet
> has been lost along with Immd and hence increase of wait time will increase
> the HA unavailability.) So, my thought was it is ok for default wait time for
> 10
> seconds, and try again so that local Immd will respond, hence quicker
> recovery.
>
> Thanks
> -Nagu
>
> > -----Original Message-----
> > From: Hans Nordebäck [mailto:[email protected]]
> > Sent: 26 May 2015 18:15
> > To: Nagendra Kumar; Praveen Malviya
> > Cc: [email protected]
> > Subject: Re: [PATCH 1 of 1] amfd: try again if ImplementerClear times
> > out
> >
> > ack, code review only. Some comments below discussed with AndersBj.
> >
> > A log record can be written before the "goto try_again". An
> > alternative to try_again can be to change IMMA_SYNCR_TIMEOUT from the
> > default 10 seconds to a higher value.
> >
> > /Thanks HansN
> >
> > On 05/12/2015 10:44 AM, [email protected] wrote:
> > > osaf/services/saf/amf/amfd/role.cc | 9 ++++++---
> > > 1 files changed, 6 insertions(+), 3 deletions(-)
> > >
> > >
> > > ImplementerClear returns SA_AIS_ERR_TIMEOUT because the message
> sent
> > > from immnd to immd got lost because of node down.
> > > So, Amf need to handle it as try_again.
> > > After Amf does try_again, Imm returns SA_AIS_ERR_BAD_HANDLE.
> > > Amf handles SA_AIS_ERR_BAD_HANDLE and reinitializes its interface
> > > with
> > Imm.
> > > In handling SA_AIS_ERR_BAD_HANDLE, Amf sets the implementer in
> > avd_imm_reinit_bg_thread, so
> > > when Amf proceeds and calls avd_imm_applier_set, Imm returns
> > > SA_AIS_ERR_INVALID_PARAM. Since, it is expected error from Imm, Amf
> > > ignores it.
> > >
> > > diff --git a/osaf/services/saf/amf/amfd/role.cc
> > b/osaf/services/saf/amf/amfd/role.cc
> > > --- a/osaf/services/saf/amf/amfd/role.cc
> > > +++ b/osaf/services/saf/amf/amfd/role.cc
> > > @@ -659,15 +659,17 @@ void avd_mds_qsd_role_evh(AVD_CL_CB *cb,
> > > _exit(EXIT_FAILURE); // should never get here...
> > > }
> > >
> > > +try_again:
> > > /* Take mutex here to sync with imm reinit thread.*/
> > > osaf_mutex_lock_ordie(&imm_reinit_mutex);
> > > -
> > > /* Give up IMM OI implementer role */
> > > if ((rc = immutil_saImmOiImplementerClear(cb->immOiHandle)) !=
> > SA_AIS_OK) {
> > > osaf_mutex_unlock_ordie(&imm_reinit_mutex);
> > > LOG_ER("FAILOVER Active --> Quiesced FAILED,
> > ImplementerClear failed %u", rc);
> > > if (rc == SA_AIS_ERR_BAD_HANDLE) {
> > > avd_imm_reinit_bg();
> > > + } else if (rc == SA_AIS_ERR_TIMEOUT) {
> > > + goto try_again;
> > > } else
> > > osafassert(0);
> > > } else
> > > @@ -685,10 +687,11 @@ void avd_mds_qsd_role_evh(AVD_CL_CB *cb,
> > > LOG_ER("avd_imm_applier_set FAILED, %u", rc);
> > > if (rc == SA_AIS_ERR_BAD_HANDLE) {
> > > avd_imm_reinit_bg();
> > > - } else if (rc == SA_AIS_ERR_EXIST) {
> > > + } else if ((rc == SA_AIS_ERR_EXIST) || (rc ==
> > SA_AIS_ERR_INVALID_PARAM)) {
> > > /* This may arise if
> > immutil_saImmOiImplementerClear
> > > failed and amf reinitializes imm interface
> > > and
> > > - set applier in avd_imm_reinit_bg_thread.*/
> > > + set applier in avd_imm_reinit_bg_thread. Imm may
> > > + return ERR_EXIST or INVALID_PARAM. */
> > > } else
> > > osafassert(0);
> > > } else
> >
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Opensaf-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Opensaf-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel