Hi Neel,

  Where did the failure occur when you re-executed the campaign? In my
test it succeeds.

  Can you send the logs?

Alex

On 10/07/2016 08:49 AM, Neelakanta Reddy wrote:
> Hi Alex,
> 
> I tested the patch. When the scenario from the defect is followed, the
> campaign moves to SUSPENDED_BY_ERROR_DETECTED.
> 
> When the other controller (the one rebooted in the middle of the si-swap)
> comes back up:
> 
> a) Executing the campaign again moves it to EXECUTION_FAILED.
> b) Rolling back the campaign moves it to ROLLBACK_COMPLETED.
> 
> Case (a) needs to be looked into: why does the campaign move to
> EXECUTION_FAILED? It should move to EXECUTION_COMPLETED.
> 
> I am still reviewing the code and will let you know if there are any
> further comments.
> 
> Regards,
> Neel.
> 
> 
> On 2016/10/06 01:50 AM, Alex Jones wrote:
>> osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc | 69 +++++++++++++++++---
>> osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh | 6 +
>> 2 files changed, 64 insertions(+), 11 deletions(-)
>>
>>
>> Sep 27 00:34:14 q50-s1 osafsmfd[6667]: NO SA_AMF_ADMIN_SI_SWAP [rc=1] successfully initiated
>> Sep 27 00:34:15 q50-s1 osafimmnd[6571]: NO ERR_BAD_OPERATION: Mismatch on administrative owner '' != 'SMFSERVICE'
>> Sep 27 00:34:17 q50-s1 osafsmfd[6667]: NO Fail to invoke admin operation, rc=SA_AIS_ERR_BAD_OPERATION (20). dn=[safSi=SC-2N,safApp=OpenSAF], opId=[7]
>> Sep 27 00:34:17 q50-s1 osafsmfd[6667]: NO Admin op SA_AMF_ADMIN_SI_SWAP fail [rc = 20]
>> Sep 27 00:34:17 q50-s1 osafsmfd[6667]: NO CAMP: Procedure safSmfProc=RollingUpgrade returned FAILED
>> Sep 27 00:36:14 q50-s1 osafsmfd[6667]: NO Campaign thread does not disappear within 120 seconds after SA_AMF_ADMIN_SI_SWAP, the operation was assumed failed.
>> Sep 27 00:36:14 q50-s1 kernel: [14934029.531187] osafsmfd[32024]: segfault at 4 ip 00000000004425b6 sp 00007f67f7ffe1c0 error 4 in osafsmfd[400000+9a000]
>> Sep 27 00:36:14 q50-s1 osafamfnd[6649]: NO 'safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
>> Sep 27 00:36:14 q50-s1 osafamfnd[6649]: ER safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
>>
>> There are a few problems here. One is that the SmfSwapThread is pointing to a
>> deleted procedure when the original active controller is reassigned active. The
>> second problem is that a new SmfSwapThread is created when the original active
>> controller is reassigned active, so now there are two running. The first thread
>> tries to use its proc pointer (which has been deleted when the original active
>> goes to quiesced) and causes the segfault.
>>
>> The proposed solution is a little different from that proposed in the ticket
>> description. This solution proposes to use the existence of the SmfSwapThread as
>> a test. When the original active controller is reassigned active because the
>> si-swap failed, it will still remove the RestartIndicator as it does now. But,
>> if the SmfSwapThread is still running, it will not create a new one, but update
>> it with the recreated procedure pointer, and let it handle the si-swap timeout.
>> Then it will report the error. I believe this solution is backwards compatible
>> because no IMM changes are made like the ones proposed in the ticket.
>>
>> diff --git a/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc b/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc
>> --- a/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc
>> +++ b/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc
>> @@ -482,12 +482,18 @@ SmfUpgradeProcedure::switchOver()
>> osafassert(0);
>> }
>>
>> - TRACE("SmfUpgradeProcedure::switchOver: Create the restart indicator");
>> - SmfCampaignThread::instance()->campaign()->getUpgradeCampaign()->createSmfRestartIndicator();
>> -
>> - SmfSwapThread *swapThread = new SmfSwapThread(this);
>> - TRACE("SmfUpgradeProcedure::switchOver, Starting SI_SWAP thread");
>> - swapThread->start();
>> + if (!SmfSwapThread::running()) {
>> + TRACE("SmfUpgradeProcedure::switchOver: Create the restart indicator");
>> + SmfCampaignThread::instance()->campaign()->getUpgradeCampaign()->createSmfRestartIndicator();
>> +
>> + SmfSwapThread *swapThread = new SmfSwapThread(this);
>> + TRACE("SmfUpgradeProcedure::switchOver, Starting SI_SWAP thread");
>> + swapThread->start();
>> + } else {
>> + TRACE("SmfUpgradeProcedure::switchOver, SI_SWAP thread already running");
>> + SmfSwapThread::setProc(this);
>> +
>> + }
>>
>> TRACE_LEAVE();
>> }
>> @@ -4156,6 +4162,31 @@ SmfUpgradeProcedure::resetProcCounter()
>> /* Static methods */
>> /*====================================================================*/
>>
>> +SmfSwapThread *SmfSwapThread::me(0);
>> +std::mutex SmfSwapThread::m_mutex;
>> +
>> +/**
>> + * SmfSmfSwapThread::running
>> + * Is the thread currently running?
>> + */
>> +bool
>> +SmfSwapThread::running(void)
>> +{
>> + std::lock_guard<std::mutex> guard(m_mutex);
>> + return me ? true : false;
>> +}
>> +
>> +/**
>> + * SmfSmfSwapThread::setProc
>> + * Set the procedure pointer to the newly created procedure
>> + */
>> +void
>> +SmfSwapThread::setProc(SmfUpgradeProcedure *newProc)
>> +{
>> + std::lock_guard<std::mutex> guard(m_mutex);
>> + me->m_proc = newProc;
>> +}
>> +
>> /**
>> * SmfSmfSwapThread::main
>> * static main for the thread
>> @@ -4181,6 +4212,8 @@ SmfSwapThread::SmfSwapThread(SmfUpgradeP
>> m_proc(i_proc)
>> {
>> sem_init(&m_semaphore, 0, 0);
>> + std::lock_guard<std::mutex> guard(m_mutex);
>> + me = this;
>> }
>>
>> /**
>> @@ -4188,6 +4221,8 @@ SmfSwapThread::SmfSwapThread(SmfUpgradeP
>> */
>> SmfSwapThread::~SmfSwapThread()
>> {
>> + std::lock_guard<std::mutex> guard(m_mutex);
>> + me = 0;
>> }
>>
>> /**
>> @@ -4309,13 +4344,25 @@ SmfSwapThread::main(void)
>>
>> exit_error:
>> if (SmfCampaignThread::instance() != NULL) {
>> - SmfProcStateExecFailed::instance()->changeState(m_proc, SmfProcStateExecFailed::instance());
>> - }
>> -
>> - if (SmfCampaignThread::instance() != NULL) {
>> + std::lock_guard<std::mutex> guard(m_mutex);
>> +
>> + SmfProcStateExecuting::instance()->changeState(m_proc, SmfProcStateStepUndone::instance());
>> +
>> + // find the failed upgrade step and set it to Undone
>> + std::vector<SmfUpgradeStep *>& upgradeSteps(m_proc->getProcSteps());
>> + for (std::vector<SmfUpgradeStep *>::iterator it(upgradeSteps.begin()); it != upgradeSteps.end(); ++it) {
>> + if ((*it)->getSwitchOver()) {
>> + (*it)->changeState(SmfStepStateUndone::instance());
>> + break;
>> + }
>> + }
>> +
>> + std::string error("si-swap of middleware failed");
>> + SmfCampaignThread::instance()->campaign()->setError(error);
>> +
>> CAMPAIGN_EVT *evt = new CAMPAIGN_EVT();
>> evt->type = CAMPAIGN_EVT_PROCEDURE_RC;
>> - evt->event.procResult.rc = SMF_PROC_FAILED;
>> + evt->event.procResult.rc = SMF_PROC_STEPUNDONE;
>> evt->event.procResult.procedure = m_proc;
>> SmfCampaignThread::instance()->send(evt);
>> }
>> diff --git a/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh b/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh
>> --- a/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh
>> +++ b/osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh
>> @@ -29,6 +29,7 @@
>> #include <vector>
>> #include <list>
>> #include <map>
>> +#include <mutex>
>>
>> #include <saSmf.h>
>> #include <saImmOi.h>
>> @@ -791,7 +792,12 @@ class SmfSwapThread {
>> ~SmfSwapThread();
>> int start(void);
>>
>> + static bool running(void);
>> + static void setProc(SmfUpgradeProcedure *);
>> +
>> private:
>> + static SmfSwapThread *me;
>> + static std::mutex m_mutex;
>>
>> void main(void);
>> int init(void);
>>
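
For readers following the patch, the single-instance guard it adds to
SmfSwapThread can be sketched in isolation. This is a minimal sketch and not
the OpenSAF code: `Procedure`, `SwapWorker`, `procName()` and the free
function `switchOver()` are hypothetical stand-ins, the worker runs no real
thread, and `setProc()` here returns false when no worker exists (the version
in the patch returns void and assumes a worker is running).

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for SmfUpgradeProcedure.
struct Procedure {
    std::string name;
};

// Hypothetical stand-in for SmfSwapThread: at most one live instance,
// tracked through a mutex-protected static pointer.
class SwapWorker {
public:
    explicit SwapWorker(Procedure* proc) : proc_(proc) {
        assert(proc != nullptr);
        std::lock_guard<std::mutex> guard(mutex_);
        instance_ = this;  // register as the live worker
    }

    ~SwapWorker() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (instance_ == this) instance_ = nullptr;  // unregister
    }

    SwapWorker(const SwapWorker&) = delete;
    SwapWorker& operator=(const SwapWorker&) = delete;

    // Is a worker currently registered?
    static bool running() {
        std::lock_guard<std::mutex> guard(mutex_);
        return instance_ != nullptr;
    }

    // Point the live worker at a recreated procedure.
    // Returns false (and does nothing) if no worker is registered.
    static bool setProc(Procedure* proc) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (instance_ == nullptr) return false;
        instance_->proc_ = proc;
        return true;
    }

    // Read the procedure name under the same lock that guards updates.
    std::string procName() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return proc_->name;
    }

private:
    Procedure* proc_;
    static SwapWorker* instance_;
    static std::mutex mutex_;
};

SwapWorker* SwapWorker::instance_ = nullptr;
std::mutex SwapWorker::mutex_;

// Mirrors the branch added to switchOver(): retarget the existing worker
// if there is one, otherwise create a new one and hand it to the caller.
std::unique_ptr<SwapWorker> switchOver(Procedure* proc) {
    if (SwapWorker::setProc(proc)) {
        return nullptr;  // existing worker now uses the new procedure
    }
    return std::unique_ptr<SwapWorker>(new SwapWorker(proc));
}
```

Folding the check and the update into one locked `setProc()` call avoids the
window between `running()` and `setProc()` in which the worker could exit.
Whether that window matters in smfd depends on thread lifetimes that are not
shown in this thread.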

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
