Hi Alex, got it. Ack from me.
Thanks, Neel. On 2017/04/08 03:25 AM, Alex Jones wrote: > > Hi Neel, > > Replies to your comments: > > (1)I don’t see this. With this patch can you get the system into a > state where the suMaintenanceCampaign is set in AMF, the component > crashes right before the SMF commit clears the suMaintenanceCampaign, > and thecmgnError is not set? If so, that’s a bug that we should > address. The SMF-AIS spec states that the suMaintenanceCampaign > attribute is cleared when the campaign is committed. Therefore it is > entirely possible that an SU could become disabled right before the > attribute is cleared, and the campaign still transitions to COMMITTED. > But, cmpnError should still be set in this case. So, as I read the > spec the operator needs to check this after committing, and may need > to repair an SU after committing a campaign, if this corner case occurs. > > (2)Well, the SMF-AIS spec says that asyncFailures are considered in > EVERY state while the suMaintenanceCampaign attribute is set. A > suspend is only done in the states you mention, but cmpnError should > be set in all states. This is already handled by the campaign state > machine. Look at SmfCampState.cc. The ::asyncFailure virtual function > only suspends the campaign in the states that you mention. If an > asyncFailure occurs in any other state, the base class virtual > function is called which does nothing. > > Alex > > *From:*Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com] > *Sent:* Friday, April 7, 2017 5:05 AM > *To:* Alex Jones <alex.jo...@genband.com>; lennart.l...@ericsson.com; > rafael.odza...@ericsson.com > *Cc:* opensaf-devel@lists.sourceforge.net > *Subject:* Re: [PATCH 1 of 1] smfd: fix race condition when detecting > async failures [#2413] > > ------------------------------------------------------------------------ > > NOTICE: This email was received from an EXTERNAL sender > > ------------------------------------------------------------------------ > > > Hi Alex, > > Reviewed and tested the patch. > The patch avoids, the crash in smfd. > > I have the following Questions on the implementation(which was > overlooked at the time of #2145 review): > > 1. If the component is killed when the campaign is in > SA_SMF_CMPG_EXECUTION_COMPLETED state(as in the defect) and > the amfnd does not re-start the component because, > saAmfSUMaintenanceCampaign is set. The saSmfCmpgError indication is not > given in SMF, because the campaign is in the process of committing. > > who does this justify Asynchronous Failures of AMF Entities? > > 2. > according to SMF AIS, the async failures has to be considered Executing, > Suspending, or RollingBack only. > Then how do we prevent in other states? And the AMFND still will not > re-start(correct) because of saAmfSUMaintenanceCampaign is set. > > Thanks, > Neel. > > On 2017/04/06 11:56 PM, Alex Jones wrote: > > src/smf/smfd/SmfCampaignThread.cc | 16 ++++++++++------ > > 1 files changed, 10 insertions(+), 6 deletions(-) > > > > > > smfd core dumps during commit of campaign. > > > > If an AMF SU under maintenance fails right as the campaign commit is > done, there > > is a race condition present. Before SMF clears the suMaintenaceCampaign > > attribute of the SU, if the SU fails, it will send a notification. > Meanwhile, > > the commit deletes upgrade campaign pointer inside smfd. If the > deletion of the > > campaign pointer inside smfd occurs before it receives the NTF event > a crash > > will occur because the campaign pointer is gone. > > > > Solution is to always process NTF events before processing the > termination of > > the campaign. The campaign termination code sets "m_running" to > false, and > > deletes the pointer. This should always be last in the poll loop so > that the > > loop will exit immediately without processing any NTF events (or > other future > > events.) > > > > diff --git a/src/smf/smfd/SmfCampaignThread.cc > b/src/smf/smfd/SmfCampaignThread.cc > > --- a/src/smf/smfd/SmfCampaignThread.cc > > +++ b/src/smf/smfd/SmfCampaignThread.cc > > @@ -907,12 +907,10 @@ int SmfCampaignThread::handleEvents(void > > break; > > } > > > > - /* Process the Mail box events */ > > - if (fds[0].revents & POLLIN) { > > - /* dispatch MBX events */ > > - processEvt(); > > - } > > - > > + /* > > + * Handle NTF events first because processEvt may delete and > terminate the > > + * campaign thread. > > + */ > > if (fds[1].revents & POLLIN) { > > // dispatch NTF events > > rc = saNtfDispatch(m_ntfHandle, SA_DISPATCH_ALL); > > @@ -922,6 +920,12 @@ int SmfCampaignThread::handleEvents(void > > } > > } > > > > + /* Process the Mail box events */ > > + if (fds[0].revents & POLLIN) { > > + /* dispatch MBX events */ > > + processEvt(); > > + } > > + > > m_campaign->updateElapsedTime(); > > } > > TRACE_LEAVE(); > > > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel