Re: [devel] [PATCH 0 of 1] Review Request for Review Request for smfd: Merge rolling to singlestep procedures for several nodes [#1685]
Hi Neel A document ticket is created see [#2017] and document update is ongoing Thanks Lennart > -Original Message- > From: Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com] > Sent: den 8 september 2016 14:30 > To: Rafael Odzakow; Lennart Lund > > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 0 of 1] Review Request for Review Request for smfd: > Merge rolling to singlestep procedures for several nodes [#1685] > > Hi Rafel, > > Reviewed and tested the patch. > Ack with the following comments. > > comments: > > 1. The bt has to be corrected: > > Program terminated with signal 11, Segmentation fault. > #0 0x004293c9 in SmfUpgradeCampaign::execute() () at > SmfUpgradeCampaign.cc:802 > 802 SmfUpgradeCampaign.cc: No such file or directory. > > 2. > > when controller nodes given to nodesForSingleStep needs to be returned > with error > > 3. > The usage of this functionality must be documented in SMF_PR > > Thanks, > Neel. > > On 2016/09/05 01:24 PM, Rafael Odzakow wrote: > > Summary: Merge rolling to singlestep procedures for several nodes > > Review request for Trac Ticket(s): #1685 > > Peer Reviewer(s): lennart, reddy > > Pull request to: <> > > Affected branch(es): <> > > Development branch: <> > > > > > > Impacted area Impact y/n > > > > Docsn > > Build systemn > > RPM/packaging n > > Configuration files n > > Startup scripts n > > SAF servicesn > > OpenSAF servicesy > > Core libraries n > > Samples n > > Tests n > > Other n > > > > > > Comments (indicate scope for each "y" above): > > - > > Added SI-SWAP and protection for openSafSmfExecControl attribute in > SmfConfig. > > > > Protection is added to keep backward compatibility while not allowing > changes > > to happen during upgrade to the Execution Control DN. As a bonus some > old code > > was possible to remove. > > > > changeset 3850c8ad75dcb065cb6eddf94f50958fcce63e60 > > Author: Rafael Odzakow > > Date: Mon, 05 Sep 2016 09:47:53 +0200 > > > > smfd: Merge rolling to singlestep procedures for several nodes > [#1685] > > > > Enables the balanced mode feature. This mode changes the execution > > of rolling procedures to be merged into one or several single steps that are > > spread out across the cluster nodes. This feature is used to give a faster > > upgrade time compared to rolling one node at a time, possibly several > times > > for each node. By splittting the procedures it into several single steps > > across the nodes a total service outage can be avoided. > > > > Added Files: > > > > osaf/services/saf/smfsv/smfd/SmfExecControl.cc > > osaf/services/saf/smfsv/smfd/SmfExecControl.h > > osaf/services/saf/smfsv/smfd/SmfExecControlHdl.cc > > osaf/services/saf/smfsv/smfd/SmfExecControlHdl.h > > > > > > Complete diffstat: > > -- > > osaf/services/saf/smfsv/config/smfsv_classes.xml| 1084 > ++ > -- > - > > osaf/services/saf/smfsv/smfd/Makefile.am| 8 +- > > osaf/services/saf/smfsv/smfd/SmfCampState.cc| 151 > > > > > osaf/services/saf/smfsv/smfd/SmfCampaign.cc |77 +++- > > osaf/services/saf/smfsv/smfd/SmfCampaign.hh | 1 + > > osaf/services/saf/smfsv/smfd/SmfExecControl.cc | 363 > ++ > + > > osaf/services/saf/smfsv/smfd/SmfExecControl.h |59 + > > osaf/services/saf/smfsv/smfd/SmfExecControlHdl.cc | 506 > ++ > + > > osaf/services/saf/smfsv/smfd/SmfExecControlHdl.h| 126 > > > osaf/services/saf/smfsv/smfd/SmfProcState.cc|28 - > > osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc |39 -- > > osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.hh |35 ++--- > > osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc |27 +++- > > osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh |18 ++- > > osaf/services/saf/smfsv/smfd/SmfUpgradeStep.cc |49 > > osaf/services/saf/smfsv/smfd/SmfUpgradeStep.hh |13 +- > > osaf/services/saf/smfsv/smfd/smfd.h | 2 +- > > osaf/services/saf/smfsv/smfd/smfd_campaign_oi.cc|32 - > > osaf/services/saf/smfsv/smfd/smfd_cb.h | 5 - > > 19 files changed, 1842 insertions(+), 781 deletions(-) >
Re: [devel] [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service is filling data [#2002]
Hi Zoran, I realized that the "numberOfItems" related argument is invalid after discussing with the ticket raiser, I misunderstood the word "crashed". So, I will remove that part of the text from the commit message. Thanks to you also for clarifying. Apart from that, the point of contention is about the assert behavior. You have a point that services have to be strict when handling bad SaNameT values. But that principle can be applied only to the extent of returning an API error code and we don't have to assert the user application. Furthermore, what the user passes is irrelevant in this CLM API's context because in this case it is an out-param. i.e. CLM will fill values in that structure before the call returns. What user specifies for an out-param is of no relevance and that's why this patch. Thanks, Mathi. > -Original Message- > From: Zoran Milinkovic [mailto:zoran.milinko...@ericsson.com] > Sent: Friday, September 09, 2016 4:38 PM > To: Mathivanan Naickan Palanivelu > Cc: opensaf-devel@lists.sourceforge.net > Subject: RE: [PATCH 1 of 1] clm: avoid checking length of sanamet when clm > service is filling data [#2002] > > Hi Mathi, > > User cannot have bigger numberOfItem than the structure is allocated. > That's a user's fault and it must be detected in some way. > > If wrong input data is not detected at this place, it must be detected in > another place. > > All at all, I don't see anything wrong with the code. I don't see that the > test > case is a valid test case. > > With the support for long DN, CLM has become more strict for bad SaNameT > values, and assert bad SaNameT values. > > If you want to avoid assert, then the code can be rewritten to handle non- > extended SaNameT values longer than 255 characters. > > BR, > Zoran > > > -Original Message- > From: mathi.naic...@oracle.com [mailto:mathi.naic...@oracle.com] > Sent: den 9 september 2016 11:40 > To: Zoran Milinkovic > Cc: opensaf-devel@lists.sourceforge.net > Subject: [PATCH 1 of 1] clm: avoid checking length of sanamet when clm > service is filling data [#2002] > > osaf/libs/agents/saf/clma/clma_api.c | 23 --- > 1 files changed, 0 insertions(+), 23 deletions(-) > > > When CLM is going to fill the values of notification buffer, there is no need > to > check the length values that is passed by the user. > Also, there can be cases when user has a bigger numofItems, but has > allocated lesser than that. > In such cases, dereferencing the notification can be avoided inside this > function. > > diff --git a/osaf/libs/agents/saf/clma/clma_api.c > b/osaf/libs/agents/saf/clma/clma_api.c > --- a/osaf/libs/agents/saf/clma/clma_api.c > +++ b/osaf/libs/agents/saf/clma/clma_api.c > @@ -117,17 +117,8 @@ static SaAisErrorT clma_validate_flags_b > > /* validate the notify buffer */ > if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) { > - uint32_t i; > - > if (!buf->numberOfItems) > return SA_AIS_ERR_INVALID_PARAM; > - > - // Check that nodeName is not longer than 255 > - for(i=0; inumberOfItems; i++) { > - if(osaf_extended_name_length( > >notification[i].clusterNode.nodeName) >= SA_MAX_NAME_LENGTH) { > - return SA_AIS_ERR_INVALID_PARAM; > - } > - } > } > > /* Validate if flag is TRACK_CURRENT and no callback and no buffer > provided */ @@ -167,24 +158,10 @@ static SaAisErrorT > clma_validate_flags_b > > /* validate the notify buffer */ > if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) { > - uint32_t i; > - > if (!buf->numberOfItems) { > TRACE_LEAVE(); > return SA_AIS_ERR_INVALID_PARAM; > } > - > - // Check that nodeName and EE are not longer than 255 > - for(i=0; inumberOfItems; i++) { > - if(osaf_extended_name_length( > >notification[i].clusterNode.nodeName) >= SA_MAX_NAME_LENGTH) { > - TRACE_LEAVE(); > - return SA_AIS_ERR_INVALID_PARAM; > - } > - if(osaf_extended_name_length( > >notification[i].clusterNode.executionEnvironment) >= > SA_MAX_NAME_LENGTH) { > - TRACE_LEAVE(); > - return SA_AIS_ERR_INVALID_PARAM; > - } > - } > } > > /* Validate if flag is TRACK_CURRENT and no callback and no buffer > provided */ -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service is filling data [#2002]
Hi Mathi, notificationBuffer in only OUT parameter, and input parameters should not be checked. Ack from me. Thanks, Zoran -Original Message- From: mathi.naic...@oracle.com [mailto:mathi.naic...@oracle.com] Sent: den 9 september 2016 11:40 To: Zoran Milinkovic Cc: opensaf-devel@lists.sourceforge.net Subject: [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service is filling data [#2002] osaf/libs/agents/saf/clma/clma_api.c | 23 --- 1 files changed, 0 insertions(+), 23 deletions(-) When CLM is going to fill the values of notification buffer, there is no need to check the length values that is passed by the user. Also, there can be cases when user has a bigger numofItems, but has allocated lesser than that. In such cases, dereferencing the notification can be avoided inside this function. diff --git a/osaf/libs/agents/saf/clma/clma_api.c b/osaf/libs/agents/saf/clma/clma_api.c --- a/osaf/libs/agents/saf/clma/clma_api.c +++ b/osaf/libs/agents/saf/clma/clma_api.c @@ -117,17 +117,8 @@ static SaAisErrorT clma_validate_flags_b /* validate the notify buffer */ if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) { - uint32_t i; - if (!buf->numberOfItems) return SA_AIS_ERR_INVALID_PARAM; - - // Check that nodeName is not longer than 255 - for(i=0; inumberOfItems; i++) { - if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= SA_MAX_NAME_LENGTH) { - return SA_AIS_ERR_INVALID_PARAM; - } - } } /* Validate if flag is TRACK_CURRENT and no callback and no buffer provided */ @@ -167,24 +158,10 @@ static SaAisErrorT clma_validate_flags_b /* validate the notify buffer */ if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) { - uint32_t i; - if (!buf->numberOfItems) { TRACE_LEAVE(); return SA_AIS_ERR_INVALID_PARAM; } - - // Check that nodeName and EE are not longer than 255 - for(i=0; inumberOfItems; i++) { - if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= SA_MAX_NAME_LENGTH) { - TRACE_LEAVE(); - return SA_AIS_ERR_INVALID_PARAM; - } - if(osaf_extended_name_length(>notification[i].clusterNode.executionEnvironment) >= SA_MAX_NAME_LENGTH) { - TRACE_LEAVE(); - return SA_AIS_ERR_INVALID_PARAM; - } - } } /* Validate if flag is TRACK_CURRENT and no callback and no buffer provided */ -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 2 of 4] AMFND: Admin operation continuation if csi completes during headless [#1725 part 1] V1
Hi Minh, I am using 1725_pending_review.tgz (1725_02_V2_bugfix_01_resend_buffer_in_set_leds.diff, 1725_02_V2_bugfix_02_honor_clusterinit_nodesync_timer.diff, 1725_02_V2_bugfix_03_restore_ng_admin.diff, 1725_03_V4_failover_absent_susi_longDn.diff, 1725_04_V2_headless_validation.diff, 1725_05_V2_resend_oper_state.diff, 1725_06a_fullscope_escalation_headless.diff). I am doing basic node reboot validation testing with no faults. Configuration: SU1(act) and SU2(stanby) both on PL-3. TC #1: Start SC-1, PL-3 and PL-5: Unlock SU1 and SU2. Stop SC-1 and stop PL-3, start PL-3 and start SC-1. After SC-1 and PL-3 comes back, ideally SU1 and SU2 should get assignments as Act and Std, but no assignment are being given to SUs on PL-3 and it shows following in status: Only Su2 has Std assignment. safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) safSISU=safSu=PL-5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1 saAmfSISUHAState=STANDBY(2) safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) TC #2: Configuration same as TC#1. Stop PL-3 and don't start. The same issue: safSISU=safSu=PL-5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1 saAmfSISUHAState=STANDBY(2) safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) TC #3: Configured SU1(Act) on PL-3 and SU2(Std) on PL-4. Stop SC-1, stop PL-3 and PL-4, but PL-5 is running. start SC-1, the same issue. TC #4: Same as TC #3, but SU3 configured on PL-5 as spare. SU3 doesn't get any assignment and Sg is unstable. Thanks -Nagu > -Original Message- > From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] > Sent: 18 August 2016 05:46 > To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya; > gary@dektech.com.au; long.hb.ngu...@dektech.com.au; > minh.c...@dektech.com.au > Cc: opensaf-devel@lists.sourceforge.net > Subject: [PATCH 2 of 4] AMFND: Admin operation continuation if csi > completes during headless [#1725 part 1] V1 > > osaf/services/saf/amf/amfnd/di.cc | 199 > + > osaf/services/saf/amf/amfnd/include/avnd_di.h |1 + > 2 files changed, 134 insertions(+), 66 deletions(-) > > > There're two options basically that AMFD can continue admin operation wih > completed csi(s) > > First: AMFD can use the sync SUSI fsm state as latest, AMFD then has to > explore its SUSI assignments with adminStates of relevant entities to > determine which SU should be on call of susi_success(). Deeper level of > exploration for csi addition. It also depends on SG Fsm state which is being > used variously in different SG types. > > Second: AMFD uses the SUSI fsm state read from IMM as latest, and AMFND > needs to resend susi_resp messages which were deferred during headless so > that AMFD can continue the admin operation sequence. Both cases of csi > completion [during or after] headless can run in the same code flow. > > The patch buffers susi_resp_msg during headless stage and resend it to > AMFD after headless. There could be a chance that AMFND sent out susi > response message but AMFD could not receive or process it. This case could > be seen as a defect, which can be fixed by securing the result of sending > susi_resp message from AMFND toward AMFD. > > diff --git a/osaf/services/saf/amf/amfnd/di.cc > b/osaf/services/saf/amf/amfnd/di.cc > --- a/osaf/services/saf/amf/amfnd/di.cc > +++ b/osaf/services/saf/amf/amfnd/di.cc > @@ -805,11 +805,6 @@ uint32_t avnd_di_susi_resp_send(AVND_CB > if (cb->term_state == > AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED) > return rc; > > - if (cb->is_avd_down == true) { > -m_AVND_SU_ALL_SI_RESET(su); > - return rc; > - } > - > // should be in assignment pending state to be here > osafassert(m_AVND_SU_IS_ASSIGN_PEND(su)); > > @@ -820,64 +815,76 @@ uint32_t avnd_di_susi_resp_send(AVND_CB > TRACE_ENTER2("Sending Resp su=%s, si=%s, curr_state=%u, > prv_state=%u", su->name.value, curr_si->name.value,curr_si- > >curr_state,curr_si->prv_state); > /* populate the susi resp msg */ > msg.info.avd = new AVSV_DND_MSG(); > -msg.type = AVND_MSG_AVD; > -msg.info.avd->msg_type = AVSV_N2D_INFO_SU_SI_ASSIGN_MSG; > -msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb- > >snd_msg_id); > -msg.info.avd->msg_info.n2d_su_si_assign.node_id =
[devel] [PATCH 0 of 1] Review Request for amfd: do not flush imm jobs on active controller[#2016]
Summary: amfd: do not flush imm jobs on active controller[#2016] Review request for Trac Ticket(s): #2016 Peer Reviewer(s): AMf devs Pull request to: <> Affected branch(es): ALL Development branch: <> Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - changeset 21daf300e419eed5d9d977bc366451a28506ecfe Author: praveen.malv...@oracle.com Date: Fri, 09 Sep 2016 17:41:00 +0530 amfd: do not flush imm jobs on active controller[#2016] AMF pushes IMM updates in a queue to be updated after finishing main job. currently, standby amfd as a applier flushes job queue if number of elements reaches to 200. Since check is on implementer status, active controller may flush it when it loses implementer status and goes for reinit with IMM(one case is when IMM returns BAD_HANDLE, a rare case). Patch checks the role of amfd besides implemeter status before flusing the job queue. Complete diffstat: -- osaf/services/saf/amf/amfd/imm.cc | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) Testing Commands: - Not able to reproduce. Simulated by forcing AMFD to finalize with IMM. Testing, Expected Results: -- Problem observed while simulating it. Passed with this patch. Conditions of Submission: - Ack from any reviewer. Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 1 of 1] amfd: do not flush imm jobs on active controller[#2016]
osaf/services/saf/amf/amfd/imm.cc | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) AMF pushes IMM updates in a queue to be updated after finishing main job. currently, standby amfd as a applier flushes job queue if number of elements reaches to 200. Since check is on implementer status, active controller may flush it when it loses implementer status and goes for reinit with IMM(one case is when IMM returns BAD_HANDLE, a rare case). Patch checks the role of amfd besides implemeter status before flusing the job queue. diff --git a/osaf/services/saf/amf/amfd/imm.cc b/osaf/services/saf/amf/amfd/imm.cc --- a/osaf/services/saf/amf/amfd/imm.cc +++ b/osaf/services/saf/amf/amfd/imm.cc @@ -385,7 +385,7 @@ AvdJobDequeueResultT Fifo::execute(SaImm if (!avd_cb->active_services_exist) return JOB_ETRYAGAIN; - if (!avd_cb->is_implementer) { + if ((!avd_cb->is_implementer) && (avd_cb->avail_state_avd == SA_AMF_HA_STANDBY)) { check_and_flush_job_queue_standby_amfd(); return JOB_EINVH; } -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] ntf: cluster rebooted with ntfd crashed on both controllers [#2006]
Hi Vu, Please find one comment below. Thanks, Praveen On 08-Sep-16 5:24 PM, Vu Minh Nguyen wrote: > osaf/libs/agents/saf/ntfa/ntfa_api.c | 18 + > osaf/libs/common/ntfsv/include/ntfsv_msg.h | 1 - > osaf/libs/common/ntfsv/ntfsv_mem.c | 2 +- > tests/ntfsv/tet_ntf_common.c | 5 +- > tests/ntfsv/tet_saNtfNotificationSend.c| 92 > ++ > 5 files changed, 113 insertions(+), 5 deletions(-) > > > In AIS, it states "additionalText length must be consistent with > lengthAdditionalText" > But NTFA did not check this. So, when data is passing to LOGA, ntfsv got > invalid param. > > This patch adds the check. Also fix few error in ntftest - using `strlen` to > calculate > the string length instead of `sizeof`. > > diff --git a/osaf/libs/agents/saf/ntfa/ntfa_api.c > b/osaf/libs/agents/saf/ntfa/ntfa_api.c > --- a/osaf/libs/agents/saf/ntfa/ntfa_api.c > +++ b/osaf/libs/agents/saf/ntfa/ntfa_api.c > @@ -550,6 +550,24 @@ static SaAisErrorT checkHeader(v_data *p > return rc; > } > > + // AIS: additionalText must be consistent with lengthAdditionalText > + if (nh->additionalText == NULL && nh->lengthAdditionalText != 0) { > + TRACE_1("Mismatch b/w additionalText and lengthAdditionalText"); > + return SA_AIS_ERR_INVALID_PARAM; > + } > + > + if (nh->lengthAdditionalText > MAX_ADDITIONAL_TEXT_LENGTH) { > + TRACE_1("lengthAdditionalText is too long"); > + return SA_AIS_ERR_INVALID_PARAM; > + } [Paveen] I think above check should also be done as a part of NotifocationAllocate() APIs because it will be too late to return INVALID_PARAM in Send() and force user to reallocate. I know MAX_ADDITIONAL_TEXT_LENGTH is very high. What do you think? > + > + SaUint16T len = nh->lengthAdditionalText; > + SaStringT addT = nh->additionalText; > + if ((addT != NULL) && (len != strlen(addT) + 1)) { > + TRACE_1("Mismatch b/w additionalText and lengthAdditionalText"); > + return SA_AIS_ERR_INVALID_PARAM; > + } > + > return SA_AIS_OK; > } > > diff --git a/osaf/libs/common/ntfsv/include/ntfsv_msg.h > b/osaf/libs/common/ntfsv/include/ntfsv_msg.h > --- a/osaf/libs/common/ntfsv/include/ntfsv_msg.h > +++ b/osaf/libs/common/ntfsv/include/ntfsv_msg.h > @@ -34,7 +34,6 @@ extern "C" { > /*MAX length of addtionaltext conforms to MAX value of logMaxLogrecsize >as mentioned in LOGSV PR doc Section 3.5.2.1*/ > #define MAX_ADDITIONAL_TEXT_LENGTH 65535 > -#define MAX_DISCARDED_NOTIFICATIONS 1024 > > /* Message type enums */ > typedef enum { > diff --git a/osaf/libs/common/ntfsv/ntfsv_mem.c > b/osaf/libs/common/ntfsv/ntfsv_mem.c > --- a/osaf/libs/common/ntfsv/ntfsv_mem.c > +++ b/osaf/libs/common/ntfsv/ntfsv_mem.c > @@ -222,7 +222,7 @@ SaAisErrorT ntfsv_alloc_ntf_header(SaNtf > > /* Additional text */ > if (lengthAdditionalText != 0) { > - notificationHeader->additionalText = > (SaStringT)malloc(lengthAdditionalText * sizeof(char)); > + notificationHeader->additionalText = (SaStringT)calloc(1, > lengthAdditionalText * sizeof(char)); > if (notificationHeader->additionalText == NULL) { > TRACE("Out of memory in additionalText field"); > rc = SA_AIS_ERR_NO_MEMORY; > diff --git a/tests/ntfsv/tet_ntf_common.c b/tests/ntfsv/tet_ntf_common.c > --- a/tests/ntfsv/tet_ntf_common.c > +++ b/tests/ntfsv/tet_ntf_common.c > @@ -797,7 +797,7 @@ void createObjectCreateDeleteNotificatio > ntfHandle, /* handle to Notification Service instance */ > myObjCrDelNotification, > 0, > - (SaUint16T)(sizeof(DEFAULT_ADDITIONAL_TEXT) +1), > + (SaUint16T)(strlen(DEFAULT_ADDITIONAL_TEXT) +1), > 0, > 2, > SA_NTF_ALLOC_SYSTEM_LIMIT), SA_AIS_OK); > @@ -840,7 +840,7 @@ void createObjectCreateDeleteNotificatio > /* set additional text and additional info */ > (void) strncpy(head->additionalText, > DEFAULT_ADDITIONAL_TEXT, > - (SaUint16T)(sizeof(DEFAULT_ADDITIONAL_TEXT) +1)); > + (SaUint16T)(strlen(DEFAULT_ADDITIONAL_TEXT) +1)); > > /* Set source indicator */ > *(myObjCrDelNotification->sourceIndicator) = SA_NTF_UNKNOWN_OPERATION; > @@ -852,7 +852,6 @@ void createObjectCreateDeleteNotificatio > myObjCrDelNotification->objectAttributes[1].attributeId = 1; > myObjCrDelNotification->objectAttributes[1].attributeType = > SA_NTF_VALUE_INT32; > myObjCrDelNotification->objectAttributes[1].attributeValue.int32Val = 2; > - > } > > > diff --git a/tests/ntfsv/tet_saNtfNotificationSend.c > b/tests/ntfsv/tet_saNtfNotificationSend.c > --- a/tests/ntfsv/tet_saNtfNotificationSend.c > +++
Re: [devel] [PATCH 1 of 1] AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987]
Hi Minh, Please find inline. Thanks, Praveen On 09-Sep-16 4:30 PM, minh chau wrote: > Hi Praveen, > > Please see comment in line with [Minh] > > Thanks, > Minh > > On 09/09/16 17:06, praveen malviya wrote: >> Hi Minh, >> >> I could not understand why AMF should become implementer/applier earlier. > [Minh]: We need to read osaAmfSGFsmState for differentiation of > nodegroup operation while exploring SUSI assignment. osafAmfSGFsmState > needs to be available before avd_susi_read_headless_cached_rta(). > avd_susi_read_headless_cached_rta() needs to be available before > avd_sg_read_headless_cached_rta() for purpose of checking > SUOperationList. So I think it's the best if we can retrieve > osafAmfSGFsmState in avd_sg_config_get(). To read osafAmfSGFsmState as > RTA, AMF needs to be implementer before reading RTA, otherwise the > returned value from IMM is dummy (most of the time I got it as 0). If > there's not simpler way to do, I would go for reading osafAmfSGFsmState > in avd_sg_config_get(), but we also need to set applier for standby AMFD > before avd_sg_config_get() to avoid issue in #1720. [Praveen] For runtime non-cached attributes, IMM gives callback to implementer to fetch the latest value. So if implementer is not set, then a client like immlist may face a problem to get the latest value. But osafAmfSGFsmState is a runtime cached attribute, IMM should return the value from its database. Is it a bug or an IMM restriction that if implementer is not registered then it will return dummy value? >> Anyways, please find one query below with [Praveen]. >> >> Thanks, >> Praveen >> >> On 05-Sep-16 7:13 AM, Minh Hon Chau wrote: >>> osaf/services/saf/amf/amfd/include/node.h | 3 + >>> osaf/services/saf/amf/amfd/include/sg.h | 1 + >>> osaf/services/saf/amf/amfd/nodegroup.cc | 83 >>> +++ >>> osaf/services/saf/amf/amfd/role.cc| 20 +++--- >>> osaf/services/saf/amf/amfd/sg.cc | 15 ++-- >>> osaf/services/saf/amf/amfd/sgproc.cc | 2 +- >>> osaf/services/saf/amf/amfd/siass.cc | 4 +- >>> 7 files changed, 109 insertions(+), 19 deletions(-) >>> >>> >>> The SG becomes unstable because some variables used in nodegroup >>> operation are not >>> restored after headless if this admin operation on nodegroup was >>> interrupted just >>> before cluster goes into headless stage. >>> >>> In order to restore nodegroup operation, AMF needs to know exactly >>> whether nodegroup >>> operation was running during headless up on @susi assignment. If susi >>> is in QUIESCED, >>> QUIESCING or being removed while its related entities su, si, sg are >>> not in LOCKED >>> and SHUTTING_DOWN, that means either node or nodegroup MUST be in >>> LOCKED or SHUTTING >>> DOWN. In case of SHUTTING_DOWN saAmfNGAdminState, that's enough to >>> know a nodegroup >>> operation was running. However, if saAmfNGAdminState is in LOCKED, >>> this case is an >>> ambiguity of locking a node. The reason of differentiation of locking >>> a node or node >>> group is because 2N SG uses both AVD_SG_FSM_SG_ADMIN and >>> AVD_SG_FSM_SU_OPER for node >>> group operation while AVD_SG_FSM_SU_OPER is only used for node >>> operation. When 2N SG >>> uses AVD_SG_FSM_SG_ADMIN for nodegroup, the saAmfSGAdminState is >>> borrowed (but not >>> updated to IMM) to run the admin operation sequence. Therefore, after >>> headless if >>> AVD_SG_FSM_SG_ADMIN was being used for nodegroup then >>> saAmfSGAdminState also needs to >>> be set. >>> >>> Because SG FSM state is used to restore nodegroup during restoring >>> susi assignment, >>> the osaAmfSGFsmState (RTA) needs to be read earlier than reading susi >>> assignment. This >>> needs active AMFD become implementer earlier than reading sg object. >>> There was a known >>> ticket reported in 1720, if only make active AMFD as early >>> implementer than it will >>> cause the standby AMFD missing ccb apply callback.This patch also >>> needs to set both >>> active and standby AMFD become implementer and applier earlier so >>> that AMFD can read >>> osaAmfSGFsmState and do not cause regression of 1720. >>> >> >>> diff --git a/osaf/services/saf/amf/amfd/include/node.h >>> b/osaf/services/saf/amf/amfd/include/node.h >>> --- a/osaf/services/saf/amf/amfd/include/node.h >>> +++ b/osaf/services/saf/amf/amfd/include/node.h >>> @@ -45,6 +45,7 @@ >>> #include >>> #include >>> #include >>> +#include >>> >>> class AVD_SU; >>> struct avd_cluster_tag; >>> @@ -232,4 +233,6 @@ extern void ng_complete_admin_op(AVD_AMF >>> extern void avd_ng_admin_state_set(AVD_AMF_NG* ng, SaAmfAdminStateT >>> state); >>> extern bool are_all_ngs_in_unlocked_state(const AVD_AVND *node); >>> extern bool any_ng_in_locked_in_state(const AVD_AVND *node); >>> +void avd_ng_restore_headless_states(AVD_CL_CB *cb, struct >>> avd_su_si_rel_tag* susi); >>> + >>> #endif >>> diff --git a/osaf/services/saf/amf/amfd/include/sg.h >>> b/osaf/services/saf/amf/amfd/include/sg.h >>> ---
Re: [devel] Review request for updates to CLM PR FOR 5.1
Hi Mathi, Ack from me. Thanks, Zoran -Original Message- From: Mathivanan Naickan Palanivelu [mailto:mathi.naic...@oracle.com] Sent: den 9 september 2016 12:23 To: Zoran Milinkovic Cc: opensaf-devel@lists.sourceforge.net Subject: Review request for updates to CLM PR FOR 5.1 Hi, Please find the updated CLM PR doc (with track changes enabled) for review. Thanks, Mathi. -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service is filling data [#2002]
Hi Mathi, User cannot have bigger numberOfItem than the structure is allocated. That's a user's fault and it must be detected in some way. If wrong input data is not detected at this place, it must be detected in another place. All at all, I don't see anything wrong with the code. I don't see that the test case is a valid test case. With the support for long DN, CLM has become more strict for bad SaNameT values, and assert bad SaNameT values. If you want to avoid assert, then the code can be rewritten to handle non-extended SaNameT values longer than 255 characters. BR, Zoran -Original Message- From: mathi.naic...@oracle.com [mailto:mathi.naic...@oracle.com] Sent: den 9 september 2016 11:40 To: Zoran Milinkovic Cc: opensaf-devel@lists.sourceforge.net Subject: [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service is filling data [#2002] osaf/libs/agents/saf/clma/clma_api.c | 23 --- 1 files changed, 0 insertions(+), 23 deletions(-) When CLM is going to fill the values of notification buffer, there is no need to check the length values that is passed by the user. Also, there can be cases when user has a bigger numofItems, but has allocated lesser than that. In such cases, dereferencing the notification can be avoided inside this function. diff --git a/osaf/libs/agents/saf/clma/clma_api.c b/osaf/libs/agents/saf/clma/clma_api.c --- a/osaf/libs/agents/saf/clma/clma_api.c +++ b/osaf/libs/agents/saf/clma/clma_api.c @@ -117,17 +117,8 @@ static SaAisErrorT clma_validate_flags_b /* validate the notify buffer */ if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) { - uint32_t i; - if (!buf->numberOfItems) return SA_AIS_ERR_INVALID_PARAM; - - // Check that nodeName is not longer than 255 - for(i=0; inumberOfItems; i++) { - if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= SA_MAX_NAME_LENGTH) { - return SA_AIS_ERR_INVALID_PARAM; - } - } } /* Validate if flag is TRACK_CURRENT and no callback and no buffer provided */ @@ -167,24 +158,10 @@ static SaAisErrorT clma_validate_flags_b /* validate the notify buffer */ if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) { - uint32_t i; - if (!buf->numberOfItems) { TRACE_LEAVE(); return SA_AIS_ERR_INVALID_PARAM; } - - // Check that nodeName and EE are not longer than 255 - for(i=0; inumberOfItems; i++) { - if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= SA_MAX_NAME_LENGTH) { - TRACE_LEAVE(); - return SA_AIS_ERR_INVALID_PARAM; - } - if(osaf_extended_name_length(>notification[i].clusterNode.executionEnvironment) >= SA_MAX_NAME_LENGTH) { - TRACE_LEAVE(); - return SA_AIS_ERR_INVALID_PARAM; - } - } } /* Validate if flag is TRACK_CURRENT and no callback and no buffer provided */ -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] AMFND: Fix amfnd coredump if sc failover while shutting down [#2008]
Hi Praveen, I saw Hans gave ack on this #2008 yesterday so I pushed it this morning. As if it's simpler fix by introducing new flag then I think you can float another patch. Maybe it's after 5.1RC? Thanks, Minh On 09/09/16 17:22, praveen malviya wrote: > Hi, > > I saw the ticket is pushed. But there is another simpler way of fixing > it on top of previous fix of similar problem which was fixed in > avnd_evt_avd_info_su_si_assign_evh() with following code: > if ((cb->term_state == AVND_TERM_STATE_OPENSAF_SHUTDOWN_INITIATED) || > (cb->term_state == > AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED)) { > if ((su->is_ncs == true) && > (info->msg_act == AVSV_SUSI_ACT_MOD) && > (info->ha_state == SA_AMF_HA_ACTIVE)) { > LOG_NO("shutdown started, failover requested, > escalate to forced shutdown"); > avnd_last_step_clean(cb); > } else { > > Above fix does not distinguish between removal of assignments and > clean ups of comps during shutdown phase. These are two sub-phases. > So simpler fix could be to distinguish between these two sub-phases by > introducing a new flag like > AVND_TERM_STATE_OPENSAF_SHUTDOWN_CLEANUP_STARTED. > > Thanks, > Praveen > > > On 08-Sep-16 5:31 PM, Minh Hon Chau wrote: >> osaf/libs/common/amf/include/amf_db_template.h | 11 +++ >> osaf/services/saf/amf/amfnd/clc.cc | 6 ++ >> osaf/services/saf/amf/amfnd/term.cc| 16 ++-- >> 3 files changed, 19 insertions(+), 14 deletions(-) >> >> >> During cluster shutting down phase, if both controllers do not shutdown >> fast enough and active controller goes down first, then a possibility of >> sc failover happens. In this situation, avnd_last_step_clean() gets >> called >> twice, a coredump is generated. It most likely because deleting >> record in >> nodeid_mdsdest_db and hctypedb but those container still own the key. >> Thus, >> the second call of avnd_last_step_clean() cause coredump. >> >> Patch ensure deletion of nodeid_mdsdest_db and hctypedb is the last step >> before exit >> >> diff --git a/osaf/libs/common/amf/include/amf_db_template.h >> b/osaf/libs/common/amf/include/amf_db_template.h >> --- a/osaf/libs/common/amf/include/amf_db_template.h >> +++ b/osaf/libs/common/amf/include/amf_db_template.h >> @@ -73,6 +73,7 @@ class AmfDb { >>public: >> unsigned int insert(const Key , T *obj); >> void erase(const Key ); >> + void deleteAll(); >> T *find(const Key ); >> T *findNext(const Key ); >> >> @@ -120,6 +121,16 @@ void AmfDb::erase(const Key >> >> // >> template >> +void AmfDb ::deleteAll() { >> + for (const auto& it: db) { >> +delete it.second; >> + } >> + db.clear(); >> +} >> + >> + >> +// >> +template >> T *AmfDb ::find(const Key ) { >>typename AmfDbMap::iterator it = db.find(key); >>if (it == db.end()) >> diff --git a/osaf/services/saf/amf/amfnd/clc.cc >> b/osaf/services/saf/amf/amfnd/clc.cc >> --- a/osaf/services/saf/amf/amfnd/clc.cc >> +++ b/osaf/services/saf/amf/amfnd/clc.cc >> @@ -815,6 +815,8 @@ uint32_t avnd_comp_clc_fsm_run(AVND_CB * >> if (all_comps_terminated()) { >> LOG_NO("Terminated all AMF components"); >> LOG_NO("Shutdown completed, exiting"); >> +cb->nodeid_mdsdest_db.deleteAll(); >> +cb->hctypedb.deleteAll(); >> exit(0); >> } else { >> TRACE("Do nothing"); >> @@ -2301,6 +2303,8 @@ uint32_t avnd_comp_clc_terming_cleansucc >> if (all_comps_terminated()) { >> LOG_NO("Terminated all AMF components"); >> LOG_NO("Shutdown completed, exiting"); >> +cb->nodeid_mdsdest_db.deleteAll(); >> +cb->hctypedb.deleteAll(); >> exit(0); >> } >> } >> @@ -2362,6 +2366,8 @@ uint32_t avnd_comp_clc_terming_cleanfail >> all_comps_terminated()) { >> LOG_WA("Terminated all AMF components (with failures)"); >> LOG_NO("Shutdown completed, exiting"); >> +cb->nodeid_mdsdest_db.deleteAll(); >> +cb->hctypedb.deleteAll(); >> exit(0); >> } >> >> diff --git a/osaf/services/saf/amf/amfnd/term.cc >> b/osaf/services/saf/amf/amfnd/term.cc >> --- a/osaf/services/saf/amf/amfnd/term.cc >> +++ b/osaf/services/saf/amf/amfnd/term.cc >> @@ -58,8 +58,6 @@ extern const AVND_EVT_HDLR g_avnd_func_l >> void avnd_last_step_clean(AVND_CB *cb) >> { >> AVND_COMP *comp; >> -AVND_NODEID_TO_MDSDEST_MAP *rec = nullptr; >> -AVND_HCTYPE *hc = nullptr; >> int cleanup_call_cnt = 0; >> >> TRACE_ENTER(); >> @@ -86,21 +84,11 @@ void avnd_last_step_clean(AVND_CB *cb) >> /* Stop was called early or some other problem */ >> if (cleanup_call_cnt == 0) { >> LOG_NO("No
Re: [devel] [PATCH 1 of 1] AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987]
Hi Praveen, Please see comment in line with [Minh] Thanks, Minh On 09/09/16 17:06, praveen malviya wrote: > Hi Minh, > > I could not understand why AMF should become implementer/applier earlier. [Minh]: We need to read osaAmfSGFsmState for differentiation of nodegroup operation while exploring SUSI assignment. osafAmfSGFsmState needs to be available before avd_susi_read_headless_cached_rta(). avd_susi_read_headless_cached_rta() needs to be available before avd_sg_read_headless_cached_rta() for purpose of checking SUOperationList. So I think it's the best if we can retrieve osafAmfSGFsmState in avd_sg_config_get(). To read osafAmfSGFsmState as RTA, AMF needs to be implementer before reading RTA, otherwise the returned value from IMM is dummy (most of the time I got it as 0). If there's not simpler way to do, I would go for reading osafAmfSGFsmState in avd_sg_config_get(), but we also need to set applier for standby AMFD before avd_sg_config_get() to avoid issue in #1720. > Anyways, please find one query below with [Praveen]. > > Thanks, > Praveen > > On 05-Sep-16 7:13 AM, Minh Hon Chau wrote: >> osaf/services/saf/amf/amfd/include/node.h | 3 + >> osaf/services/saf/amf/amfd/include/sg.h | 1 + >> osaf/services/saf/amf/amfd/nodegroup.cc | 83 >> +++ >> osaf/services/saf/amf/amfd/role.cc| 20 +++--- >> osaf/services/saf/amf/amfd/sg.cc | 15 ++-- >> osaf/services/saf/amf/amfd/sgproc.cc | 2 +- >> osaf/services/saf/amf/amfd/siass.cc | 4 +- >> 7 files changed, 109 insertions(+), 19 deletions(-) >> >> >> The SG becomes unstable because some variables used in nodegroup >> operation are not >> restored after headless if this admin operation on nodegroup was >> interrupted just >> before cluster goes into headless stage. >> >> In order to restore nodegroup operation, AMF needs to know exactly >> whether nodegroup >> operation was running during headless up on @susi assignment. If susi >> is in QUIESCED, >> QUIESCING or being removed while its related entities su, si, sg are >> not in LOCKED >> and SHUTTING_DOWN, that means either node or nodegroup MUST be in >> LOCKED or SHUTTING >> DOWN. In case of SHUTTING_DOWN saAmfNGAdminState, that's enough to >> know a nodegroup >> operation was running. However, if saAmfNGAdminState is in LOCKED, >> this case is an >> ambiguity of locking a node. The reason of differentiation of locking >> a node or node >> group is because 2N SG uses both AVD_SG_FSM_SG_ADMIN and >> AVD_SG_FSM_SU_OPER for node >> group operation while AVD_SG_FSM_SU_OPER is only used for node >> operation. When 2N SG >> uses AVD_SG_FSM_SG_ADMIN for nodegroup, the saAmfSGAdminState is >> borrowed (but not >> updated to IMM) to run the admin operation sequence. Therefore, after >> headless if >> AVD_SG_FSM_SG_ADMIN was being used for nodegroup then >> saAmfSGAdminState also needs to >> be set. >> >> Because SG FSM state is used to restore nodegroup during restoring >> susi assignment, >> the osaAmfSGFsmState (RTA) needs to be read earlier than reading susi >> assignment. This >> needs active AMFD become implementer earlier than reading sg object. >> There was a known >> ticket reported in 1720, if only make active AMFD as early >> implementer than it will >> cause the standby AMFD missing ccb apply callback.This patch also >> needs to set both >> active and standby AMFD become implementer and applier earlier so >> that AMFD can read >> osaAmfSGFsmState and do not cause regression of 1720. >> > >> diff --git a/osaf/services/saf/amf/amfd/include/node.h >> b/osaf/services/saf/amf/amfd/include/node.h >> --- a/osaf/services/saf/amf/amfd/include/node.h >> +++ b/osaf/services/saf/amf/amfd/include/node.h >> @@ -45,6 +45,7 @@ >> #include >> #include >> #include >> +#include >> >> class AVD_SU; >> struct avd_cluster_tag; >> @@ -232,4 +233,6 @@ extern void ng_complete_admin_op(AVD_AMF >> extern void avd_ng_admin_state_set(AVD_AMF_NG* ng, SaAmfAdminStateT >> state); >> extern bool are_all_ngs_in_unlocked_state(const AVD_AVND *node); >> extern bool any_ng_in_locked_in_state(const AVD_AVND *node); >> +void avd_ng_restore_headless_states(AVD_CL_CB *cb, struct >> avd_su_si_rel_tag* susi); >> + >> #endif >> diff --git a/osaf/services/saf/amf/amfd/include/sg.h >> b/osaf/services/saf/amf/amfd/include/sg.h >> --- a/osaf/services/saf/amf/amfd/include/sg.h >> +++ b/osaf/services/saf/amf/amfd/include/sg.h >> @@ -46,6 +46,7 @@ >> class AVD_SU; >> class AVD_SI; >> class AVD_APP; >> +class AVD_AMF_NG; >> >> /* The valid SG FSM states. */ >> typedef enum { >> diff --git a/osaf/services/saf/amf/amfd/nodegroup.cc >> b/osaf/services/saf/amf/amfd/nodegroup.cc >> --- a/osaf/services/saf/amf/amfd/nodegroup.cc >> +++ b/osaf/services/saf/amf/amfd/nodegroup.cc >> @@ -1356,3 +1356,86 @@ void avd_ng_constructor(void) >> ng_ccb_apply_cb); >> } >> >> +/* >> + * @brief: Restore nodegroup related variables which
[devel] Review request for updates to CLM PR FOR 5.1
Hi, Please find the updated CLM PR doc (with track changes enabled) for review. Thanks, Mathi. OpenSAF_CLMSv_PR_5.1_Mathi.odt Description: application/vnd.oasis.opendocument.text -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service is filling data [#2002]
osaf/libs/agents/saf/clma/clma_api.c | 23 --- 1 files changed, 0 insertions(+), 23 deletions(-) When CLM is going to fill the values of notification buffer, there is no need to check the length values that is passed by the user. Also, there can be cases when user has a bigger numofItems, but has allocated lesser than that. In such cases, dereferencing the notification can be avoided inside this function. diff --git a/osaf/libs/agents/saf/clma/clma_api.c b/osaf/libs/agents/saf/clma/clma_api.c --- a/osaf/libs/agents/saf/clma/clma_api.c +++ b/osaf/libs/agents/saf/clma/clma_api.c @@ -117,17 +117,8 @@ static SaAisErrorT clma_validate_flags_b /* validate the notify buffer */ if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) { - uint32_t i; - if (!buf->numberOfItems) return SA_AIS_ERR_INVALID_PARAM; - - // Check that nodeName is not longer than 255 - for(i=0; inumberOfItems; i++) { - if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= SA_MAX_NAME_LENGTH) { - return SA_AIS_ERR_INVALID_PARAM; - } - } } /* Validate if flag is TRACK_CURRENT and no callback and no buffer provided */ @@ -167,24 +158,10 @@ static SaAisErrorT clma_validate_flags_b /* validate the notify buffer */ if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) { - uint32_t i; - if (!buf->numberOfItems) { TRACE_LEAVE(); return SA_AIS_ERR_INVALID_PARAM; } - - // Check that nodeName and EE are not longer than 255 - for(i=0; inumberOfItems; i++) { - if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= SA_MAX_NAME_LENGTH) { - TRACE_LEAVE(); - return SA_AIS_ERR_INVALID_PARAM; - } - if(osaf_extended_name_length(>notification[i].clusterNode.executionEnvironment) >= SA_MAX_NAME_LENGTH) { - TRACE_LEAVE(); - return SA_AIS_ERR_INVALID_PARAM; - } - } } /* Validate if flag is TRACK_CURRENT and no callback and no buffer provided */ -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 0 of 1] Review Request for clm: avoid checking length of sanamet when clm service is filling data [#2002]
Summary: clm: avoid checking length of sanamet when clm service is filling data [#2002] Review request for Trac Ticket(s): #2002 Peer Reviewer(s): Zoran Pull request to: <> Affected branch(es): opensaf-5.1.x, default Development branch: <> Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - changeset 1449b7ad43db3e77226abc433aa1a93ba9d5e166 Author: mathi.naic...@oracle.com Date: Fri, 09 Sep 2016 15:07:36 +0530 clm: avoid checking length of sanamet when clm service is filling data [#2002] When CLM is going to fill the values of notification buffer, there is no need to check the length values that is passed by the user. Also, there can be cases when user has a bigger numofItems, but has allocated lesser than that. In such cases, dereferencing the notification can be avoided inside this function. Complete diffstat: -- osaf/libs/agents/saf/clma/clma_api.c | 23 --- 1 files changed, 0 insertions(+), 23 deletions(-) Testing Commands: - As explained in the ticket. Testing, Expected Results: -- Same as above. Conditions of Submission: - Ack from zoran. Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 n n powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] lga: ignore CLM status if client Initialize with A.2.1 [#1999]
Ack Regards, Vu > -Original Message- > From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] > Sent: Wednesday, September 7, 2016 3:35 PM > To: vu.m.ngu...@dektech.com.au; lennart.l...@ericsson.com; > praveenmalv...@users.sf.net > Cc: opensaf-devel@lists.sourceforge.net > Subject: [PATCH 1 of 1] lga: ignore CLM status if client Initialize with A.2.1 > [#1999] > > osaf/libs/agents/saf/lga/lga.h | 1 + > osaf/libs/agents/saf/lga/lga_api.c | 4 +++- > osaf/libs/agents/saf/lga/lga_mds.c | 12 +--- > tests/logsv/tet_Log_clm.c | 35 > ++- > 4 files changed, 47 insertions(+), 5 deletions(-) > > > Issue : > If client saLogInitialize with A.2.1 and > linked with New agents lib A.2.2 code is considering CLM status. > > But as explained in osaf/services/saf/logsv/README > old lga clients saLogInitialize with A.02.01 are always clm member, > > Fix: > > If client saLogInitialize() with A.2.1 , irrelevant of linked with New agents lib > A.2.2 code > or linked with Old agents lib A.2.1 code, CLM status will be ignored . > > Test code added two new test cases in testsuite 14 > Please check test case 14 of /usr/bin/logtest -e 14 > > diff --git a/osaf/libs/agents/saf/lga/lga.h b/osaf/libs/agents/saf/lga/lga.h > --- a/osaf/libs/agents/saf/lga/lga.h > +++ b/osaf/libs/agents/saf/lga/lga.h > @@ -79,6 +79,7 @@ typedef struct lga_client_hdl_rec { > * streams are recovered > */ >bool is_stale_client; /* Status of client based on the CLM status of node.*/ > + SaVersionT version; /* the API version is being used by client, used for CLM > status */ > } lga_client_hdl_rec_t; > > /* States of the server */ > diff --git a/osaf/libs/agents/saf/lga/lga_api.c > b/osaf/libs/agents/saf/lga/lga_api.c > --- a/osaf/libs/agents/saf/lga/lga_api.c > +++ b/osaf/libs/agents/saf/lga/lga_api.c > @@ -164,7 +164,8 @@ SaAisErrorT saLogInitialize(SaLogHandleT > /* validate the version */ > if ((version->releaseCode == LOG_RELEASE_CODE) && (version- > >majorVersion <= LOG_MAJOR_VERSION)) { > version->majorVersion = LOG_MAJOR_VERSION; > - version->minorVersion = LOG_MINOR_VERSION; > + if (version->minorVersion != LOG_MINOR_VERSION_0) > + version->minorVersion = LOG_MINOR_VERSION; > } else { > TRACE("version FAILED, required: %c.%u.%u, supported: > %c.%u.%u\n", > version->releaseCode, version->majorVersion, version- > >minorVersion, > @@ -264,6 +265,7 @@ SaAisErrorT saLogInitialize(SaLogHandleT > > /* pass the handle value to the appl */ > *logHandle = lga_hdl_rec->local_hdl; > + lga_hdl_rec->version = *version; > > err: > /* free up the response message */ > diff --git a/osaf/libs/agents/saf/lga/lga_mds.c > b/osaf/libs/agents/saf/lga/lga_mds.c > --- a/osaf/libs/agents/saf/lga/lga_mds.c > +++ b/osaf/libs/agents/saf/lga/lga_mds.c > @@ -545,9 +545,15 @@ static uint32_t lga_lgs_msg_proc(lga_cb_ > library handles and all resources > associated with these handles. Hence, it is recommended > for the processes to finalize the > library handles as soon as the processes > detect that the node left the > membership.*/ > - lga_hdl_rec->is_stale_client = true; > - TRACE("CLM_NODE callback > is_stale_client: %d clm_node_state: %d", > - lga_hdl_rec- > >is_stale_client, cb->clm_node_state); > + > + /*Old LGA clients A.02.01 are always > clm member */ > + if (!((lga_hdl_rec- > >version.releaseCode == LOG_RELEASE_CODE_0) && > + (lga_hdl_rec- > >version.majorVersion == LOG_MAJOR_VERSION_0) && > + (lga_hdl_rec- > >version.minorVersion == LOG_MINOR_VERSION_0))) { > + lga_hdl_rec->is_stale_client > = true; > + TRACE("CLM_NODE callback > is_stale_client: %d clm_node_state: %d", > + lga_hdl_rec- > >is_stale_client, cb->clm_node_state); > + } > } > lga_msg_destroy(lgsv_msg); > } > diff --git a/tests/logsv/tet_Log_clm.c b/tests/logsv/tet_Log_clm.c > --- a/tests/logsv/tet_Log_clm.c > +++ b/tests/logsv/tet_Log_clm.c > @@ -95,7 +95,7 @@ void saLogInitializ_14_02(void) > rc = saLogInitialize(, , > ); > saLogFinalize(logHandle); > unlockNode(); > - test_validate(rc, SA_AIS_ERR_UNAVAILABLE); > +
Re: [devel] [PATCH 1 of 1] AMFND: Fix amfnd coredump if sc failover while shutting down [#2008]
Hi, I saw the ticket is pushed. But there is another simpler way of fixing it on top of previous fix of similar problem which was fixed in avnd_evt_avd_info_su_si_assign_evh() with following code: if ((cb->term_state == AVND_TERM_STATE_OPENSAF_SHUTDOWN_INITIATED) || (cb->term_state == AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED)) { if ((su->is_ncs == true) && (info->msg_act == AVSV_SUSI_ACT_MOD) && (info->ha_state == SA_AMF_HA_ACTIVE)) { LOG_NO("shutdown started, failover requested, escalate to forced shutdown"); avnd_last_step_clean(cb); } else { Above fix does not distinguish between removal of assignments and clean ups of comps during shutdown phase. These are two sub-phases. So simpler fix could be to distinguish between these two sub-phases by introducing a new flag like AVND_TERM_STATE_OPENSAF_SHUTDOWN_CLEANUP_STARTED. Thanks, Praveen On 08-Sep-16 5:31 PM, Minh Hon Chau wrote: > osaf/libs/common/amf/include/amf_db_template.h | 11 +++ > osaf/services/saf/amf/amfnd/clc.cc | 6 ++ > osaf/services/saf/amf/amfnd/term.cc| 16 ++-- > 3 files changed, 19 insertions(+), 14 deletions(-) > > > During cluster shutting down phase, if both controllers do not shutdown > fast enough and active controller goes down first, then a possibility of > sc failover happens. In this situation, avnd_last_step_clean() gets called > twice, a coredump is generated. It most likely because deleting record in > nodeid_mdsdest_db and hctypedb but those container still own the key. Thus, > the second call of avnd_last_step_clean() cause coredump. > > Patch ensure deletion of nodeid_mdsdest_db and hctypedb is the last step > before exit > > diff --git a/osaf/libs/common/amf/include/amf_db_template.h > b/osaf/libs/common/amf/include/amf_db_template.h > --- a/osaf/libs/common/amf/include/amf_db_template.h > +++ b/osaf/libs/common/amf/include/amf_db_template.h > @@ -73,6 +73,7 @@ class AmfDb { >public: > unsigned int insert(const Key , T *obj); > void erase(const Key ); > + void deleteAll(); > T *find(const Key ); > T *findNext(const Key ); > > @@ -120,6 +121,16 @@ void AmfDb::erase(const Key > > // > template > +void AmfDb ::deleteAll() { > + for (const auto& it: db) { > + delete it.second; > + } > + db.clear(); > +} > + > + > +// > +template > T *AmfDb ::find(const Key ) { >typename AmfDbMap::iterator it = db.find(key); >if (it == db.end()) > diff --git a/osaf/services/saf/amf/amfnd/clc.cc > b/osaf/services/saf/amf/amfnd/clc.cc > --- a/osaf/services/saf/amf/amfnd/clc.cc > +++ b/osaf/services/saf/amf/amfnd/clc.cc > @@ -815,6 +815,8 @@ uint32_t avnd_comp_clc_fsm_run(AVND_CB * > if (all_comps_terminated()) { > LOG_NO("Terminated all AMF components"); > LOG_NO("Shutdown completed, exiting"); > + cb->nodeid_mdsdest_db.deleteAll(); > + cb->hctypedb.deleteAll(); > exit(0); > } else { > TRACE("Do nothing"); > @@ -2301,6 +2303,8 @@ uint32_t avnd_comp_clc_terming_cleansucc > if (all_comps_terminated()) { > LOG_NO("Terminated all AMF components"); > LOG_NO("Shutdown completed, exiting"); > + cb->nodeid_mdsdest_db.deleteAll(); > + cb->hctypedb.deleteAll(); > exit(0); > } > } > @@ -2362,6 +2366,8 @@ uint32_t avnd_comp_clc_terming_cleanfail > all_comps_terminated()) { > LOG_WA("Terminated all AMF components (with failures)"); > LOG_NO("Shutdown completed, exiting"); > + cb->nodeid_mdsdest_db.deleteAll(); > + cb->hctypedb.deleteAll(); > exit(0); > } > > diff --git a/osaf/services/saf/amf/amfnd/term.cc > b/osaf/services/saf/amf/amfnd/term.cc > --- a/osaf/services/saf/amf/amfnd/term.cc > +++ b/osaf/services/saf/amf/amfnd/term.cc > @@ -58,8 +58,6 @@ extern const AVND_EVT_HDLR g_avnd_func_l > void avnd_last_step_clean(AVND_CB *cb) > { > AVND_COMP *comp; > - AVND_NODEID_TO_MDSDEST_MAP *rec = nullptr; > - AVND_HCTYPE *hc = nullptr; > int cleanup_call_cnt = 0; > > TRACE_ENTER(); > @@ -86,21 +84,11 @@ void avnd_last_step_clean(AVND_CB *cb) > /* Stop was called early or some other problem */ > if (cleanup_call_cnt == 0) { > LOG_NO("No component to terminate, exiting"); > + cb->nodeid_mdsdest_db.deleteAll(); > + cb->hctypedb.deleteAll(); > exit(0); > } > > -
Re: [devel] [PATCH 1 of 1] AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987]
Hi Minh, I could not understand why AMF should become implementer/applier earlier. Anyways, please find one query below with [Praveen]. Thanks, Praveen On 05-Sep-16 7:13 AM, Minh Hon Chau wrote: > osaf/services/saf/amf/amfd/include/node.h | 3 + > osaf/services/saf/amf/amfd/include/sg.h | 1 + > osaf/services/saf/amf/amfd/nodegroup.cc | 83 > +++ > osaf/services/saf/amf/amfd/role.cc| 20 +++--- > osaf/services/saf/amf/amfd/sg.cc | 15 ++-- > osaf/services/saf/amf/amfd/sgproc.cc | 2 +- > osaf/services/saf/amf/amfd/siass.cc | 4 +- > 7 files changed, 109 insertions(+), 19 deletions(-) > > > The SG becomes unstable because some variables used in nodegroup operation > are not > restored after headless if this admin operation on nodegroup was interrupted > just > before cluster goes into headless stage. > > In order to restore nodegroup operation, AMF needs to know exactly whether > nodegroup > operation was running during headless up on @susi assignment. If susi is in > QUIESCED, > QUIESCING or being removed while its related entities su, si, sg are not in > LOCKED > and SHUTTING_DOWN, that means either node or nodegroup MUST be in LOCKED or > SHUTTING > DOWN. In case of SHUTTING_DOWN saAmfNGAdminState, that's enough to know a > nodegroup > operation was running. However, if saAmfNGAdminState is in LOCKED, this case > is an > ambiguity of locking a node. The reason of differentiation of locking a node > or node > group is because 2N SG uses both AVD_SG_FSM_SG_ADMIN and AVD_SG_FSM_SU_OPER > for node > group operation while AVD_SG_FSM_SU_OPER is only used for node operation. > When 2N SG > uses AVD_SG_FSM_SG_ADMIN for nodegroup, the saAmfSGAdminState is borrowed > (but not > updated to IMM) to run the admin operation sequence. Therefore, after > headless if > AVD_SG_FSM_SG_ADMIN was being used for nodegroup then saAmfSGAdminState also > needs to > be set. > > Because SG FSM state is used to restore nodegroup during restoring susi > assignment, > the osaAmfSGFsmState (RTA) needs to be read earlier than reading susi > assignment. This > needs active AMFD become implementer earlier than reading sg object. There > was a known > ticket reported in 1720, if only make active AMFD as early implementer than > it will > cause the standby AMFD missing ccb apply callback.This patch also needs to > set both > active and standby AMFD become implementer and applier earlier so that AMFD > can read > osaAmfSGFsmState and do not cause regression of 1720. > > diff --git a/osaf/services/saf/amf/amfd/include/node.h > b/osaf/services/saf/amf/amfd/include/node.h > --- a/osaf/services/saf/amf/amfd/include/node.h > +++ b/osaf/services/saf/amf/amfd/include/node.h > @@ -45,6 +45,7 @@ > #include > #include > #include > +#include > > class AVD_SU; > struct avd_cluster_tag; > @@ -232,4 +233,6 @@ extern void ng_complete_admin_op(AVD_AMF > extern void avd_ng_admin_state_set(AVD_AMF_NG* ng, SaAmfAdminStateT state); > extern bool are_all_ngs_in_unlocked_state(const AVD_AVND *node); > extern bool any_ng_in_locked_in_state(const AVD_AVND *node); > +void avd_ng_restore_headless_states(AVD_CL_CB *cb, struct avd_su_si_rel_tag* > susi); > + > #endif > diff --git a/osaf/services/saf/amf/amfd/include/sg.h > b/osaf/services/saf/amf/amfd/include/sg.h > --- a/osaf/services/saf/amf/amfd/include/sg.h > +++ b/osaf/services/saf/amf/amfd/include/sg.h > @@ -46,6 +46,7 @@ > class AVD_SU; > class AVD_SI; > class AVD_APP; > +class AVD_AMF_NG; > > /* The valid SG FSM states. */ > typedef enum { > diff --git a/osaf/services/saf/amf/amfd/nodegroup.cc > b/osaf/services/saf/amf/amfd/nodegroup.cc > --- a/osaf/services/saf/amf/amfd/nodegroup.cc > +++ b/osaf/services/saf/amf/amfd/nodegroup.cc > @@ -1356,3 +1356,86 @@ void avd_ng_constructor(void) > ng_ccb_apply_cb); > } > > +/* > + * @brief: Restore nodegroup related variables which are used in nodegroup > + * operations before headless. The main variables are @admin_ng, > + * @ng_using_saAmfSGAdminState, and the saAmfSGAdminState which > + * is borrowed by nodegroup operation on 2N SG > + * @param: control block, and susi is being read after headless > + * @return: void > + */ > + > +void avd_ng_restore_headless_states(AVD_CL_CB *cb, struct avd_su_si_rel_tag* > susi) { > + > + TRACE_ENTER(); > + osafassert(cb->scs_absence_max_duration > 0); > + // In order to restore nodegroup operation, AMF needs to know exactly > + // whether nodegroup operation was running during headless up on @susi > + // If susi is in QUIESCED/QUIESCING or being removed while its related > + // entities su, si, sg are not in LOCKED and SHUTTING_DOWN, that means > + // either node or nodegroup MUST be in LOCKED or SHUTTING_DOWN. > + // In case of SHUTTING_DOWN saAmfNGAdminState, that's enough to know > + // a nodegroup operation was running.