On 01/31/2017 05:54 AM, Neelakanta Reddy wrote: > Hi Rafael, > > Comments inline. > > On 2017/01/30 07:33 PM, Rafael Odzakow wrote: >> ACK, with comments after [rafael] tag >> > I will add the comments, while pushing. >> As an improvement after the retries the operation can fail the >> campaign. Today we only move the problem over to the next campaign? >> > No, the first campaign will not fail and the campaign state will be > moved to campaign completed. > But saAmfSUMaintenanceCampaign is not cleared, when the next campaign > is tried it will fail with the error. That is my point. Why would we want to move a problem over to the next campaign? Is it not better to fail at the right time? > > The patch, retries the modification of saAmfSUMaintenanceCampaign, if > the ccb operation is aborted because of resource abort. > > Thanks, > Neel. >> >> On 01/30/2017 02:25 PM, reddy.neelaka...@oracle.com wrote: >>> osaf/services/saf/smfsv/smfd/SmfImmOperation.cc | 21 >>> ++++++++++++++++++--- >>> osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc | 16 >>> ++++++++++++++-- >>> 2 files changed, 32 insertions(+), 5 deletions(-) >>> >>> >>> When the campaign is commited, while modifying >>> saAmfSUMaintenanceCampaign flags, if any of the IMMND restarts or >>> new node joins the cluster, then the non-commited ccbs will be aborted. >>> >>> This is a rare case, when saAmfSUMaintenanceCampaign will fail due >>> to IMMND sync, and and the next campaign will fails with >>> ER Failed to set maintenance state in step=safSmfStep=0001 >>> >>> diff --git a/osaf/services/saf/smfsv/smfd/SmfImmOperation.cc >>> b/osaf/services/saf/smfsv/smfd/SmfImmOperation.cc >>> --- a/osaf/services/saf/smfsv/smfd/SmfImmOperation.cc >>> +++ b/osaf/services/saf/smfsv/smfd/SmfImmOperation.cc >>> @@ -1033,11 +1033,26 @@ SmfImmModifyOperation::execute(SmfRollba >>> // objectName.length = (SaUint16T) objectLen; >>> // memcpy(objectName.value, object, objectLen); >>> + const SaStringT * errStrings = NULL; >>> result = immutil_saImmOmCcbObjectModify_2(m_ccbHandle, >>> &objectName, (const SaImmAttrModificationT_2 **) >>> - m_immAttrMods); >>> - if (result != SA_AIS_OK) { >>> + m_immAttrMods); >>> + if (result != SA_AIS_OK && result == >>> SA_AIS_ERR_FAILED_OPERATION) { >>> + result = saImmOmCcbGetErrorStrings(m_ccbHandle, &errStrings); >>> + if (errStrings){ >>> + TRACE("Received error string is %s", errStrings[0]); >>> + char * type = NULL; >>> + type = strstr(errStrings[0], "IMM: Resource abort: "); >>> + if(type != NULL) { >>> + TRACE("SA_AIS_ERR_FAILED_OPERATION is modified to >>> SA_AIS_ERR_TRY_AGAIN because of ccb resourse abort" ); >>> + result = SA_AIS_ERR_TRY_AGAIN; >>> + TRACE_LEAVE(); >>> + return result; >>> + } >>> + } >>> + } >>> + if (result != SA_AIS_OK){ >>> LOG_NO("SmfImmModifyOperation::execute, >>> saImmOmCcbObjectModify failed %s", saf_error(result)); >>> - TRACE_LEAVE(); >>> + TRACE_LEAVE(); >>> return result; >>> } >>> diff --git a/osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc >>> b/osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc >>> --- a/osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc >>> +++ b/osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc >>> @@ -1063,6 +1063,7 @@ SmfUpgradeCampaign::resetMaintenanceStat >>> std::list < std::string > objectList; >>> SmfImmUtils immUtil; >>> (void)immUtil.getChildren("", objectList, SA_IMM_SUBTREE, >>> "SaAmfSU"); >>> + SaAisErrorT rc = SA_AIS_OK; >>> //Reset saAmfSUMaintenanceCampaign for all found SUs >>> const std::string campDn = >>> SmfCampaignThread::instance()->campaign()->getDn(); >>> @@ -1093,9 +1094,20 @@ SmfUpgradeCampaign::resetMaintenanceStat >>> } >>> } >>> - if (immUtil.doImmOperations(operations) != SA_AIS_OK) { >>> - LOG_NO("SmfUpgradeStep::setMaintenanceState(), fails to reset all >>> saAmfSUMaintenanceCampaign attr"); >>> + const uint32_t MAX_NO_RETRIES = 2; >>> + uint32_t retry_cnt = 0; >>> + while (++retry_cnt <= MAX_NO_RETRIES) { >>> + rc = immUtil.doImmOperations(operations); >> [rafael] whitespace above >>> + if (rc != SA_AIS_OK && rc == SA_AIS_ERR_TRY_AGAIN) { >>> + /* TRY_AGAIN is returned only when ccb is aborted >> [rafael] whitespace above >>> + with Resource abort in error string. Resurce abort in >>> + error string is checked presently for modify >>> operation */ >> [rafael] no strict rule but I never use this comment style over >> multiple lines. Maybe try: >> /* >> * >> */ >> or >> // >> // >> // >>> + continue; >>> + } >>> } >>> + if (rc != SA_AIS_OK){ >>> + LOG_NO("SmfUpgradeStep::setMaintenanceState(), fails to >>> reset all saAmfSUMaintenanceCampaign attr"); >>> + } >>> //Delete the created SmfImmModifyOperation instances >>> std::list < SmfImmOperation * > ::iterator operIter; >> >
------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel