Hi AndersBj,

Reviewed and tested the patch.
Ack.

The error string set by setCcbErrorString in the OI callback timeout case
will not reach the OM client, because the reply is sent from
immnd_evt_proc_ccb_finalize, which does not set the error string.
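
To make the point concrete, here is a minimal, hypothetical model in plain
C++. It is not the real immnd code: CcbRecord, OmReply and the two finalize
functions are made-up stand-ins. It only shows that a string recorded on the
CCB by setCcbErrorString is lost unless the code that builds the finalize
reply copies it across; how immnd_evt_proc_ccb_finalize should actually be
changed is not covered here.

```cpp
// Hypothetical model only; names and types are illustrative, not immnd's.
#include <string>
#include <vector>

struct CcbRecord {                        // stand-in for ImmModel's CcbInfo
  unsigned id;
  std::vector<std::string> errorStrings;  // filled by setCcbErrorString()
};

struct OmReply {                          // stand-in for the reply sent to OM
  int error;
  std::vector<std::string> errorStrings;
};

// Records an error string on the CCB, as the timeout path in the patch does.
void setCcbErrorString(CcbRecord& ccb, const std::string& s) {
  ccb.errorStrings.push_back(s);
}

// Models the reported behaviour: the finalize reply carries only the code.
OmReply finalizeReplyWithoutStrings(const CcbRecord& ccb, int err) {
  (void)ccb;  // recorded strings are ignored, so OM never sees them
  return OmReply{err, {}};
}

// Models a hypothetical alternative: the reply also forwards the strings.
OmReply finalizeReplyWithStrings(const CcbRecord& ccb, int err) {
  return OmReply{err, ccb.errorStrings};
}
```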

/Neel.


On Wednesday 06 May 2015 08:07 PM, Anders Bjornerstedt wrote:
>   osaf/libs/common/immsv/include/immsv_api.h |   5 +-
>   osaf/services/saf/immsv/README             |  33 +++++++++++++++++---
>   osaf/services/saf/immsv/immnd/ImmModel.cc  |  47 +++++++++++++++++++++++------
>   osaf/services/saf/immsv/immnd/ImmModel.hh  |   5 +-
>   4 files changed, 71 insertions(+), 19 deletions(-)
>
>
> See the diff for osaf/services/saf/immsv/README for an explanation
> of this enhancement.
>
> diff --git a/osaf/libs/common/immsv/include/immsv_api.h b/osaf/libs/common/immsv/include/immsv_api.h
> --- a/osaf/libs/common/immsv/include/immsv_api.h
> +++ b/osaf/libs/common/immsv/include/immsv_api.h
> @@ -142,11 +142,12 @@ typedef enum {
>   
>   typedef enum {
>       SA_IMM_ADMIN_EXPORT = 1,  /* Defined in A.02.01 declared in  A.03.01 */
> -    SA_IMM_ADMIN_INIT_FROM_FILE = 100 /* Non standard, force PBE disable. */
> +    SA_IMM_ADMIN_INIT_FROM_FILE = 100, /* Non standard, force PBE disable. */
> +    SA_IMM_ADMIN_ABORT_CCBS = 202 /* Non standard, abort non-critical CCBs. */
>   } SaImmMngtAdminOperationT;
>   
>   /*
> - * Special flags only to be used by the imm-dummper, the imm-loader or
> + * Special flags only to be used by the imm-dumper, the imm-loader or
>    * new API functions.
>    *
>    * The first excludes non persistent runtime attributes from the dump.
> diff --git a/osaf/services/saf/immsv/README b/osaf/services/saf/immsv/README
> --- a/osaf/services/saf/immsv/README
> +++ b/osaf/services/saf/immsv/README
> @@ -2302,8 +2302,8 @@ the continuation times out in the server
>   receives an error reply when that om client has NOT also timed out.
>   
>   
> -Improve error diagnostics when PBE is misconfigured.
> -====================================================
> +Improve error diagnostics when PBE is misconfigured (4.6)
> +=========================================================
>   http://sourceforge.net/p/opensaf/tickets/1139
>   
>   Configuration mistakes such as omitting to change immnd.conf to allow PBE
> @@ -2329,15 +2329,15 @@ Error logging been improved and the imm
>   Ccb operation error cases. This should make troubleshooting this issue much faster and easier.
>   
>   
> -IMM API that replaces SaNameT with SaStringT and SA_IMM_ATTR_DN
> -===============================================================
> +IMM API that replaces SaNameT with SaStringT and SA_IMM_ATTR_DN (4.6)
> +=====================================================================
>   http://sourceforge.net/p/opensaf/tickets/643
>   
>   See: osaf/services/saf/immsv/README.SASTRINGT_API for details.
>   
>   
>   Notes on upgrading from OpenSAF 4.[1,2,3,4,5] to OpenSAF (4.6)
> -==========================================================
> +==============================================================
>   OpenSAF4.6 adds new message types that avoid using the SaNameT type (#969).
>   During a rolling upgrade from an earlier OpenSAF release to the 4.6 release there
>   will be nodes executing the older release concurrently with nodes executing OpenSAF 4.6.
> @@ -2376,6 +2376,29 @@ Bit 5 controls OpenSAF4.5 protocols allo
>   Bit 6 controls OpenSAF4.6 protocols allowed or not (normally on/1).
>   
>   
> +Provide an admin-operation for aborting all non-critical CCBs (4.7)
> +===================================================================
> +http://sourceforge.net/p/opensaf/tickets/1107
> +
> +There may arise situations where an open CCB that is not in the critical
> +state, i.e. has not yet entered the commit protocol, is blocking an involved
> +service/OI from performing some other task that is more urgent and more
> +important than completing that CCB. The best example is the AMF, where
> +an si-swap will fail and cause the standby to reboot if it was involved
> +in an open CCB when the si-swap order was issued (see ticket #1105).
> +Ticket #1105 can be fixed by the AMF (active or standby) sending an
> +admin-operation directed at the IMM service requesting it to abort non-critical
> +CCBs. The AMF can use either a synchronous or an asynchronous admin-op.
> +After the admin-operation has been invoked, the AMF should allow a few
> +seconds for the CCB to get aborted and for the AMF OI to get the abort
> +callback for the CCB. That should then clear the path for the AMF standby
> +to succeed with the si-swap.
> +To abort non-critical CCBs, invoke operation id '202'
> +(SA_IMM_ADMIN_ABORT_CCBS) on the IMM service object:
> +
> +     immadm -o 202 safRdn=immManagement,safApp=safImmService
> +
> +
>   ----------------------------------------
>   DEPENDENCIES
>   ============
> diff --git a/osaf/services/saf/immsv/immnd/ImmModel.cc b/osaf/services/saf/immsv/immnd/ImmModel.cc
> --- a/osaf/services/saf/immsv/immnd/ImmModel.cc
> +++ b/osaf/services/saf/immsv/immnd/ImmModel.cc
> @@ -453,6 +453,7 @@ static SaImmRepositoryInitModeT immInitM
>   
>   static SaUint32T ccbIdLongDnGuard  = 0; /* Disallow long DN additions if longDnsAllowed is being changed in ccb*/
>   static bool      sIsLongDnLoaded   = false; /* track long DNs before opensafImm=opensafImm,safApp=safImmService is created */
> +static bool      sAbortNonCriticalCcbs = false; /* Set to true at coord by the special imm admin-op to abort ccbs #1107 */
>   
>   struct AttrFlagIncludes
>   {
> @@ -1252,7 +1253,7 @@ immModel_adminOperationInvoke(IMMND_CB *
>   {
>       return ImmModel::instance(&cb->immModel)->
>           adminOperationInvoke(req, reqConn, reply_dest, inv,
> -        implConn, implNodeId, pbeExpected, displayRes);
> +        implConn, implNodeId, pbeExpected, displayRes, cb->mIsCoord);
>   }
>   
>   SaUint32T  /* Returns admo-id for object if object exists and active admo exists, otherwise zero. */
> @@ -3139,7 +3140,7 @@ ImmModel::classCreate(const ImmsvOmClass
>               if(attr->attrValueType != SA_IMM_ATTR_SANAMET
>                       && !((attr->attrFlags & SA_IMM_ATTR_DN) && (attr->attrValueType == SA_IMM_ATTR_SASTRINGT))) {
>                   LOG_NO("ERR_INVALID_PARAM: Attribute '%s' must be of type SaNameT, "
> -                             "or of type SaStringT with DN flag", attNm);
> +                       "or of type SaStringT with DN flag", attNm);
>                   illegal = 1;
>               }
>   
> @@ -10982,7 +10983,7 @@ SaAisErrorT ImmModel::adminOperationInvo
>                                              SaInvocationT& saInv,
>                                              SaUint32T* implConn,
>                                              unsigned int* implNodeId,
> -                                           bool pbeExpected, bool* displayRes)
> +                                           bool pbeExpected, bool* displayRes, bool isAtCoord)
>   {
>       TRACE_ENTER();
>       SaAisErrorT err = SA_AIS_OK;
> @@ -11179,7 +11180,7 @@ SaAisErrorT ImmModel::adminOperationInvo
>               TRACE_7("Admin op on special object %s whith no implementer ret:%u",
>                   objectName.c_str(), err);
>           } else if(objectName == immManagementDn) {
> -            err = admoImmMngtObject(req);
> +            err = admoImmMngtObject(req, isAtCoord);
>               TRACE_7("Admin op on special object %s whith no implementer ret:%u",
>                   objectName.c_str(), err);
>           } else {
> @@ -11772,7 +11773,7 @@ ImmModel::resourceDisplay(const struct I
>   
>   
>   SaAisErrorT
> -ImmModel::admoImmMngtObject(const ImmsvOmAdminOperationInvoke* req)
> +ImmModel::admoImmMngtObject(const ImmsvOmAdminOperationInvoke* req, bool isAtCoord)
>   {
>       SaAisErrorT err = SA_AIS_ERR_INTERRUPT;
>       /* Function for handling admin-ops directed at the immsv itself.
> @@ -11810,6 +11811,13 @@ ImmModel::admoImmMngtObject(const ImmsvO
>               immInitMode = SA_IMM_INIT_FROM_FILE;
>               LOG_NO("SaImmRepositoryInitModeT FORCED to: SA_IMM_INIT_FROM_FILE");
>           }
> +    } else if (req->operationId == SA_IMM_ADMIN_ABORT_CCBS) { /* Non standard. */
> +        LOG_NO("Received: immadm -o %u safRdn=immManagement,safApp=safImmService",
> +            SA_IMM_ADMIN_ABORT_CCBS);
> +        if(isAtCoord) {
> +            LOG_IN("sAbortNonCriticalCcbs = true;");
> +            sAbortNonCriticalCcbs = true;
> +        }
>       } else {
>           LOG_NO("Invalid operation ID %llu, for operation on %s", (SaUint64T) req->operationId,
>               immManagementDn.c_str());
> @@ -12476,7 +12484,7 @@ ImmModel::cleanTheBasement(InvocVector&
>               //AND ccbIds for ccbs in critical and marked with PbeRestartedId.
>               //Restarted PBE => try to recover outcome BEFORE timeout, making
>               //recovery transparent to user!
> -            //TODO the timeout should not be hardwired, but for now it is.
> +            //Also handle the case of admin-op requesting abort of all non-critical ccbs.
>               TRACE("Checking active ccb %u for deadlock or blocked implementer",
>                   (*i3)->mId);
>               TRACE("state:%u waitsart:%u PberestartId:%u",(*i3)->mState,
> @@ -12484,9 +12492,14 @@ ImmModel::cleanTheBasement(InvocVector&
>   
>               CcbImplementerMap::iterator cim;
>               uint32_t max_oi_timeout = DEFAULT_TIMEOUT_SEC;
> -            for(cim = (*i3)->mImplementers.begin(); cim != (*i3)->mImplementers.end(); ++cim) {
> -                if(cim->second->mImplementer->mTimeout > max_oi_timeout) {
> -                    max_oi_timeout = cim->second->mImplementer->mTimeout;
> +            if(sAbortNonCriticalCcbs) {
> +                LOG_IN("sAbortNonCriticalCcbs is true => set max_oi_timeout to 0");
> +                max_oi_timeout = 0;
> +            } else {
> +                for(cim = (*i3)->mImplementers.begin(); cim != (*i3)->mImplementers.end(); ++cim) {
> +                    if(cim->second->mImplementer->mTimeout > max_oi_timeout) {
> +                        max_oi_timeout = cim->second->mImplementer->mTimeout;
> +                    }
>                   }
>               }
>   
> @@ -12502,6 +12515,15 @@ ImmModel::cleanTheBasement(InvocVector&
>                       oi_timeout = 0;
>                       TRACE_5("CCB %u timeout while waiting on implementer reply",
>                           (*i3)->mId);
> +                    setCcbErrorString(*i3, "Resource Error: CCB timeout while "
> +                         "waiting on implementer reply");
> +                }
> +
> +                if(sAbortNonCriticalCcbs) {
> +                    LOG_NO("CCB %u aborted by: immadm -o %u safRdn=immManagement,safApp=safImmService",
> +                        (*i3)->mId, SA_IMM_ADMIN_ABORT_CCBS);
> +                    setCcbErrorString(*i3, "Resource Error: CCB aborted by admin-operation"
> +                         " '202' on safRdn=immManagement,safApp=safImmService");
>                   }
>   
>                   if((*i3)->mState == IMM_CCB_CRITICAL) {
> @@ -12528,6 +12550,11 @@ ImmModel::cleanTheBasement(InvocVector&
>           }
>       }
>   
> +    if(sAbortNonCriticalCcbs) {
> +        LOG_IN("sAbortNonCriticalCcbs reset to false");
> +        sAbortNonCriticalCcbs = false; /* Reset. */
> +    }
> +
>       while((i3 = ccbsToGc.begin()) != ccbsToGc.end()) {
>           CcbInfo* ccb = (*i3);
>           ccbsToGc.erase(i3);
> @@ -12544,7 +12571,7 @@ ImmModel::cleanTheBasement(InvocVector&
>           //It needs to be long to allow reply on larger batch jobs such as a
>           //schema/class change with instance migration and slow file system.
>           //It can not be infinite as that could cause a memory leak.
> -         if(now - ci2->second.mCreateTime >= (DEFAULT_TIMEOUT_SEC * 20)) {
> +        if(now - ci2->second.mCreateTime >= (DEFAULT_TIMEOUT_SEC * 20)) {
>               TRACE_5("Timeout on PbeRtReqContinuation %llu", ci2->first);
>               pbePrtoReqs.push_back(ci2->second.mConn);
>               sPbeRtReqContinuationMap.erase(ci2);
> diff --git a/osaf/services/saf/immsv/immnd/ImmModel.hh b/osaf/services/saf/immsv/immnd/ImmModel.hh
> --- a/osaf/services/saf/immsv/immnd/ImmModel.hh
> +++ b/osaf/services/saf/immsv/immnd/ImmModel.hh
> @@ -361,7 +361,8 @@ public:
>                                                SaUint32T* implConn,
>                                                unsigned int* implNodeId,
>                                                bool pbeExpected,
> -                                             bool* displayRes);
> +                                             bool* displayRes,
> +                                             bool isAtCoord);
>       
>       // Objects
>       
> @@ -653,7 +654,7 @@ private:
>                                          std::string newClassName,
>                                          bool remove=false);
>       SaAisErrorT        updateImmObject2(const ImmsvOmAdminOperationInvoke* req);
> -    SaAisErrorT        admoImmMngtObject(const ImmsvOmAdminOperationInvoke* req);
> +    SaAisErrorT        admoImmMngtObject(const ImmsvOmAdminOperationInvoke* req, bool isAtCoord);
>       
>       void               addNoDanglingRefs(ObjectInfo *obj);
>       void               removeNoDanglingRefs(
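
For reference, here is the #1107 mechanism as I read it from the patch, as a
simplified, self-contained C++ model. All names (Ccb, AbortModel,
kDefaultTimeoutSec) are hypothetical, and the value 6 for the default timeout
is an assumption; the real cleanTheBasement() has further conditions (PBE
restart and critical-state handling) that are not modelled. The admin-op only
raises a flag at the coordinator, the next sweep treats the OI timeout as 0 so
every waiting non-critical CCB expires, and the flag is then reset.

```cpp
// Simplified model of the #1107 abort mechanism (hypothetical, not ImmModel).
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

enum class CcbState { Operational, Critical };

struct Ccb {
  unsigned id;
  CcbState state;
  int64_t waitStartSec;                    // when the CCB started waiting on an OI reply
  std::vector<uint32_t> implementerTimeouts;
  bool aborted = false;
  std::string errorString;
};

// Assumed stand-in for DEFAULT_TIMEOUT_SEC in ImmModel.cc.
constexpr uint32_t kDefaultTimeoutSec = 6;

struct AbortModel {
  bool abortNonCriticalCcbs = false;       // models sAbortNonCriticalCcbs

  // Models admoImmMngtObject(): only the coordinator raises the flag.
  void onAbortCcbsAdminOp(bool isAtCoord) {
    if (isAtCoord) abortNonCriticalCcbs = true;
  }

  // Models the max_oi_timeout computation: the largest implementer timeout,
  // never below the default, or 0 while an abort has been requested.
  uint32_t maxOiTimeout(const Ccb& ccb) const {
    if (abortNonCriticalCcbs) return 0;
    uint32_t t = kDefaultTimeoutSec;
    for (uint32_t oi : ccb.implementerTimeouts) t = std::max(t, oi);
    return t;
  }

  // One periodic sweep. Critical CCBs are skipped in this model; the real
  // code handles them in a separate branch.
  void cleanTheBasement(std::vector<Ccb>& ccbs, int64_t nowSec) {
    for (Ccb& ccb : ccbs) {
      if (ccb.aborted || ccb.state == CcbState::Critical) continue;
      if (nowSec - ccb.waitStartSec >= static_cast<int64_t>(maxOiTimeout(ccb))) {
        ccb.aborted = true;
        ccb.errorString = abortNonCriticalCcbs
            ? "Resource Error: CCB aborted by admin-operation '202'"
            : "Resource Error: CCB timeout while waiting on implementer reply";
      }
    }
    abortNonCriticalCcbs = false;          // one-shot: reset after the sweep
  }
};
```

Resetting the flag at the end of the sweep is what makes the admin-operation a
one-shot request: CCBs opened afterwards get their normal OI timeouts again.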


_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
