Hi Alex,

I tested the patch,
when the same scenario is followed in the defect, the campaign is moving 
to SUSPENDED_BY_ERROR_DETECTED.

when the other controller comes up(which is rebooted in the middle of 
si-swap),

a) execute the campaign again, the campaign is moving to EXECUTION_FAILED.
b) rollback the campaign, the campaign us moving to ROLLBACK_COMPLETED.

The case (a), above has to be looked into, why the campaign is moving to 
EXECUTION_FAILED STATE.
The campaign, has to move to EXECUTION_COMPLETED state.

Reviewing the code, let you know if there are any further comments.

Regards,
Neel.


On 2016/10/06 01:50 AM, Alex Jones wrote:
>   osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc |  69 
> +++++++++++++++++---
>   osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh |   6 +
>   2 files changed, 64 insertions(+), 11 deletions(-)
>
>
> Sep 27 00:34:14 q50-s1 osafsmfd[6667]: NO SA_AMF_ADMIN_SI_SWAP [rc=1] 
> successfully initiated
> Sep 27 00:34:15 q50-s1 osafimmnd[6571]: NO ERR_BAD_OPERATION: Mismatch on 
> administrative owner '' != 'SMFSERVICE'
> Sep 27 00:34:17 q50-s1 osafsmfd[6667]: NO Fail to invoke admin operation, 
> rc=SA_AIS_ERR_BAD_OPERATION (20). dn=[safSi=SC-2N,safApp=OpenSAF], opId=[7]
> Sep 27 00:34:17 q50-s1 osafsmfd[6667]: NO Admin op SA_AMF_ADMIN_SI_SWAP fail 
> [rc = 20]
> Sep 27 00:34:17 q50-s1 osafsmfd[6667]: NO CAMP: Procedure 
> safSmfProc=RollingUpgrade returned FAILED
> Sep 27 00:36:14 q50-s1 osafsmfd[6667]: NO Campaign thread does not disappear 
> within 120 seconds after SA_AMF_ADMIN_SI_SWAP, the operation was assumed 
> failed.
> Sep 27 00:36:14 q50-s1 kernel: [14934029.531187] osafsmfd[32024]: segfault at 
> 4 ip 00000000004425b6 sp 00007f67f7ffe1c0 error 4 in osafsmfd[400000+9a000]
> Sep 27 00:36:14 q50-s1 osafamfnd[6649]: NO 
> 'safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
> Recovery is 'nodeFailfast'
> Sep 27 00:36:14 q50-s1 osafamfnd[6649]: ER 
> safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown 
> Recovery is:nodeFailfast
>
> There are a few problems here. One is that the SmfSwapThread is pointing to a
> deleted procedure when the original active controller is reassigned active. 
> The
> second problem is that a new SmfSwapThread is created when the original active
> controller is reassigned active, so now there are two running. The first 
> thread
> tries to use its proc pointer (which has been deleted when the original active
> goes to quiesced) and causes the segfault.
>
> The proposed solution is a little different from that proposed in the ticket
> description. This solution proposes to use the existence of the SmfSwapThread 
> as
> a test. When the original active controller is reassigned active because the
> si-swap failed, it will still remove the RestartIndicator as it does now. But,
> if the SmfSwapThread is still running, it will not create a new one, but 
> update
> it with the recreated procedure pointer, and let it handle the si-swap 
> timeout.
> Then it will report the error. I believe this solution is backwards compatible
> because no IMM changes are made like the ones proposed in the ticket.
>
> diff --git a/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc 
> b/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc
> --- a/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc
> +++ b/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc
> @@ -482,12 +482,18 @@ SmfUpgradeProcedure::switchOver()
>                   osafassert(0);
>           }
>   
> -     TRACE("SmfUpgradeProcedure::switchOver: Create the restart indicator");
> -     
> SmfCampaignThread::instance()->campaign()->getUpgradeCampaign()->createSmfRestartIndicator();
> -
> -     SmfSwapThread *swapThread = new SmfSwapThread(this);
> -     TRACE("SmfUpgradeProcedure::switchOver, Starting SI_SWAP thread");
> -     swapThread->start();
> +     if (!SmfSwapThread::running()) {
> +             TRACE("SmfUpgradeProcedure::switchOver: Create the restart 
> indicator");
> +             
> SmfCampaignThread::instance()->campaign()->getUpgradeCampaign()->createSmfRestartIndicator();
> +
> +             SmfSwapThread *swapThread = new SmfSwapThread(this);
> +             TRACE("SmfUpgradeProcedure::switchOver, Starting SI_SWAP 
> thread");
> +             swapThread->start();
> +     } else {
> +             TRACE("SmfUpgradeProcedure::switchOver, SI_SWAP thread already 
> running");
> +             SmfSwapThread::setProc(this);
> +
> +     }
>   
>       TRACE_LEAVE();
>   }
> @@ -4156,6 +4162,31 @@ SmfUpgradeProcedure::resetProcCounter()
>   /*  Static methods                                                    */
>   /*====================================================================*/
>   
> +SmfSwapThread *SmfSwapThread::me(0);
> +std::mutex SmfSwapThread::m_mutex;
> +
> +/**
> + * SmfSmfSwapThread::running
> + * Is the thread currently running?
> + */
> +bool
> +SmfSwapThread::running(void)
> +{
> +     std::lock_guard<std::mutex> guard(m_mutex);
> +     return me ? true : false;
> +}
> +
> +/**
> + * SmfSmfSwapThread::setProc
> + * Set the procedure pointer to the newly created procedure
> + */
> +void
> +SmfSwapThread::setProc(SmfUpgradeProcedure *newProc)
> +{
> +     std::lock_guard<std::mutex> guard(m_mutex);
> +     me->m_proc = newProc;
> +}
> +
>   /**
>    * SmfSmfSwapThread::main
>    * static main for the thread
> @@ -4181,6 +4212,8 @@ SmfSwapThread::SmfSwapThread(SmfUpgradeP
>       m_proc(i_proc)
>   {
>       sem_init(&m_semaphore, 0, 0);
> +     std::lock_guard<std::mutex> guard(m_mutex);
> +     me = this;
>   }
>   
>   /**
> @@ -4188,6 +4221,8 @@ SmfSwapThread::SmfSwapThread(SmfUpgradeP
>    */
>   SmfSwapThread::~SmfSwapThread()
>   {
> +     std::lock_guard<std::mutex> guard(m_mutex);
> +     me = 0;
>   }
>   
>   /**
> @@ -4309,13 +4344,25 @@ SmfSwapThread::main(void)
>   
>   exit_error:
>           if (SmfCampaignThread::instance() != NULL) {
> -                SmfProcStateExecFailed::instance()->changeState(m_proc, 
> SmfProcStateExecFailed::instance());
> -        }
> -
> -        if (SmfCampaignThread::instance() != NULL) {
> +             std::lock_guard<std::mutex> guard(m_mutex);
> +
> +                SmfProcStateExecuting::instance()->changeState(m_proc, 
> SmfProcStateStepUndone::instance());
> +
> +             // find the failed upgrade step and set it to Undone
> +             std::vector<SmfUpgradeStep *>& 
> upgradeSteps(m_proc->getProcSteps());
> +             for (std::vector<SmfUpgradeStep *>::iterator 
> it(upgradeSteps.begin()); it != upgradeSteps.end(); ++it) {
> +                     if ((*it)->getSwitchOver()) {
> +                             
> (*it)->changeState(SmfStepStateUndone::instance());
> +                             break;
> +                     }
> +             }
> +
> +             std::string error("si-swap of middleware failed");
> +             SmfCampaignThread::instance()->campaign()->setError(error);
> +
>                   CAMPAIGN_EVT *evt = new CAMPAIGN_EVT();
>                   evt->type = CAMPAIGN_EVT_PROCEDURE_RC;
> -                evt->event.procResult.rc = SMF_PROC_FAILED;
> +                evt->event.procResult.rc = SMF_PROC_STEPUNDONE;
>                   evt->event.procResult.procedure = m_proc;
>                   SmfCampaignThread::instance()->send(evt);
>           }
> diff --git a/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh 
> b/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh
> --- a/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh
> +++ b/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh
> @@ -29,6 +29,7 @@
>   #include <vector>
>   #include <list>
>   #include <map>
> +#include <mutex>
>   
>   #include <saSmf.h>
>   #include <saImmOi.h>
> @@ -791,7 +792,12 @@ class SmfSwapThread {
>       ~SmfSwapThread();
>       int start(void);
>   
> +     static bool running(void);
> +     static void setProc(SmfUpgradeProcedure *);
> +
>    private:
> +     static SmfSwapThread *me;
> +     static std::mutex m_mutex;
>   
>       void main(void);
>       int init(void);
>


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to