Re: [devel] [PATCH 0 of 1] Review Request for Review Request for smfd: Merge rolling to singlestep procedures for several nodes [#1685]

2016-09-09 Thread Lennart Lund
Hi Neel

A document ticket is created see [#2017] and document update is ongoing

Thanks
Lennart

> -Original Message-
> From: Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com]
> Sent: den 8 september 2016 14:30
> To: Rafael Odzakow ; Lennart Lund
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 0 of 1] Review Request for Review Request for smfd:
> Merge rolling to singlestep procedures for several nodes [#1685]
> 
> Hi Rafel,
> 
> Reviewed and tested the patch.
> Ack with the following comments.
> 
> comments:
> 
> 1.  The bt  has to be corrected:
> 
> Program terminated with signal 11, Segmentation fault.
> #0  0x004293c9 in SmfUpgradeCampaign::execute() () at
> SmfUpgradeCampaign.cc:802
> 802 SmfUpgradeCampaign.cc: No such file or directory.
> 
> 2.
> 
> when controller nodes given to nodesForSingleStep needs to be returned
> with error
> 
> 3.
> The usage of this functionality must be documented in SMF_PR
> 
> Thanks,
> Neel.
> 
> On 2016/09/05 01:24 PM, Rafael Odzakow wrote:
> > Summary: Merge rolling to singlestep procedures for several nodes
> > Review request for Trac Ticket(s): #1685
> > Peer Reviewer(s): lennart, reddy
> > Pull request to: <>
> > Affected branch(es): <>
> > Development branch: <>
> >
> > 
> > Impacted area   Impact y/n
> > 
> >   Docsn
> >   Build systemn
> >   RPM/packaging   n
> >   Configuration files n
> >   Startup scripts n
> >   SAF servicesn
> >   OpenSAF servicesy
> >   Core libraries  n
> >   Samples n
> >   Tests   n
> >   Other   n
> >
> >
> > Comments (indicate scope for each "y" above):
> > -
> >   Added SI-SWAP and protection for openSafSmfExecControl attribute in
> SmfConfig.
> >
> >   Protection is added to keep backward compatibility while not allowing
> changes
> >   to happen during upgrade to the Execution Control DN. As a bonus some
> old code
> >   was possible to remove.
> >
> > changeset 3850c8ad75dcb065cb6eddf94f50958fcce63e60
> > Author: Rafael Odzakow 
> > Date:   Mon, 05 Sep 2016 09:47:53 +0200
> >
> > smfd: Merge rolling to singlestep procedures for several nodes
> [#1685]
> >
> > Enables the balanced mode feature. This mode changes the execution
> > of rolling procedures to be merged into one or several single steps that are
> > spread out across the cluster nodes. This feature is used to give a faster
> > upgrade time compared to rolling one node at a time, possibly several
> times
> > for each node. By splittting the procedures it into several single steps
> > across the nodes a total service outage can be avoided.
> >
> > Added Files:
> > 
> >   osaf/services/saf/smfsv/smfd/SmfExecControl.cc
> >   osaf/services/saf/smfsv/smfd/SmfExecControl.h
> >   osaf/services/saf/smfsv/smfd/SmfExecControlHdl.cc
> >   osaf/services/saf/smfsv/smfd/SmfExecControlHdl.h
> >
> >
> > Complete diffstat:
> > --
> >   osaf/services/saf/smfsv/config/smfsv_classes.xml|  1084
> ++
> --
> -
> >   osaf/services/saf/smfsv/smfd/Makefile.am| 8 +-
> >   osaf/services/saf/smfsv/smfd/SmfCampState.cc|   151 
> > 
> 
> >   osaf/services/saf/smfsv/smfd/SmfCampaign.cc |77 +++-
> >   osaf/services/saf/smfsv/smfd/SmfCampaign.hh | 1 +
> >   osaf/services/saf/smfsv/smfd/SmfExecControl.cc  |   363
> ++
> +
> >   osaf/services/saf/smfsv/smfd/SmfExecControl.h   |59 +
> >   osaf/services/saf/smfsv/smfd/SmfExecControlHdl.cc   |   506
> ++
> +
> >   osaf/services/saf/smfsv/smfd/SmfExecControlHdl.h|   126
> 
> >   osaf/services/saf/smfsv/smfd/SmfProcState.cc|28 -
> >   osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc  |39 --
> >   osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.hh  |35 ++---
> >   osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.cc |27 +++-
> >   osaf/services/saf/smfsv/smfd/SmfUpgradeProcedure.hh |18 ++-
> >   osaf/services/saf/smfsv/smfd/SmfUpgradeStep.cc  |49 
> >   osaf/services/saf/smfsv/smfd/SmfUpgradeStep.hh  |13 +-
> >   osaf/services/saf/smfsv/smfd/smfd.h | 2 +-
> >   osaf/services/saf/smfsv/smfd/smfd_campaign_oi.cc|32 -
> >   osaf/services/saf/smfsv/smfd/smfd_cb.h  | 5 -
> >   19 files changed, 1842 insertions(+), 781 deletions(-)
> 

Re: [devel] [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service is filling data [#2002]

2016-09-09 Thread Mathivanan Naickan Palanivelu
Hi Zoran,

 I realized that the "numberOfItems" related argument is invalid after 
discussing with the ticket raiser,
I misunderstood the word "crashed". So, I will remove that part of the text 
from the commit message. Thanks to you also for clarifying.

Apart from that, the point of contention is about the assert behavior.
You have a point that services have to be strict when handling bad SaNameT 
values. 
But that principle can be applied only to the extent of returning an API error 
code and we don't have to assert the user application.

Furthermore, what the user passes is irrelevant in this CLM API's context 
because in this case
it is an out-param. i.e. CLM will fill values in that structure before the call 
returns.
What user specifies for an out-param is of no relevance and that's why this 
patch.

Thanks,
Mathi.

> -Original Message-
> From: Zoran Milinkovic [mailto:zoran.milinko...@ericsson.com]
> Sent: Friday, September 09, 2016 4:38 PM
> To: Mathivanan Naickan Palanivelu
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1 of 1] clm: avoid checking length of sanamet when clm
> service is filling data [#2002]
> 
> Hi Mathi,
> 
> User cannot have bigger numberOfItem than the structure is allocated.
> That's a user's fault and it must be detected in some way.
> 
> If wrong input data is not detected at this place, it must be detected in
> another place.
> 
> All at all, I don't see anything wrong with the code. I don't see that the 
> test
> case is a valid test case.
> 
> With the support for long DN, CLM has become more strict for bad SaNameT
> values, and assert bad SaNameT values.
> 
> If you want to avoid assert, then the code can be rewritten to handle non-
> extended SaNameT values longer than 255 characters.
> 
> BR,
> Zoran
> 
> 
> -Original Message-
> From: mathi.naic...@oracle.com [mailto:mathi.naic...@oracle.com]
> Sent: den 9 september 2016 11:40
> To: Zoran Milinkovic
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 1 of 1] clm: avoid checking length of sanamet when clm
> service is filling data [#2002]
> 
>  osaf/libs/agents/saf/clma/clma_api.c |  23 ---
>  1 files changed, 0 insertions(+), 23 deletions(-)
> 
> 
> When CLM is going to fill the values of notification buffer, there is no need 
> to
> check the length values that is passed by the user.
> Also, there can be cases when user has a bigger numofItems, but has
> allocated lesser than that.
> In such cases, dereferencing the notification can be avoided inside this
> function.
> 
> diff --git a/osaf/libs/agents/saf/clma/clma_api.c
> b/osaf/libs/agents/saf/clma/clma_api.c
> --- a/osaf/libs/agents/saf/clma/clma_api.c
> +++ b/osaf/libs/agents/saf/clma/clma_api.c
> @@ -117,17 +117,8 @@ static SaAisErrorT clma_validate_flags_b
> 
>   /* validate the notify buffer */
>   if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) {
> - uint32_t i;
> -
>   if (!buf->numberOfItems)
>   return SA_AIS_ERR_INVALID_PARAM;
> -
> - // Check that nodeName is not longer than 255
> - for(i=0; inumberOfItems; i++) {
> - if(osaf_extended_name_length(
> >notification[i].clusterNode.nodeName) >= SA_MAX_NAME_LENGTH) {
> - return SA_AIS_ERR_INVALID_PARAM;
> - }
> - }
>   }
> 
>   /* Validate if flag is TRACK_CURRENT and no callback and no buffer
> provided */ @@ -167,24 +158,10 @@ static SaAisErrorT
> clma_validate_flags_b
> 
>   /* validate the notify buffer */
>   if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) {
> - uint32_t i;
> -
>   if (!buf->numberOfItems) {
>   TRACE_LEAVE();
>   return SA_AIS_ERR_INVALID_PARAM;
>   }
> -
> - // Check that nodeName and EE are not longer than 255
> - for(i=0; inumberOfItems; i++) {
> - if(osaf_extended_name_length(
> >notification[i].clusterNode.nodeName) >= SA_MAX_NAME_LENGTH) {
> - TRACE_LEAVE();
> - return SA_AIS_ERR_INVALID_PARAM;
> - }
> - if(osaf_extended_name_length(
> >notification[i].clusterNode.executionEnvironment) >=
> SA_MAX_NAME_LENGTH) {
> - TRACE_LEAVE();
> - return SA_AIS_ERR_INVALID_PARAM;
> - }
> - }
>   }
> 
>   /* Validate if flag is TRACK_CURRENT and no callback and no buffer
> provided */

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service is filling data [#2002]

2016-09-09 Thread Zoran Milinkovic
Hi Mathi,

notificationBuffer in only OUT parameter, and input parameters should not be 
checked.
Ack from me.

Thanks,
Zoran

-Original Message-
From: mathi.naic...@oracle.com [mailto:mathi.naic...@oracle.com] 
Sent: den 9 september 2016 11:40
To: Zoran Milinkovic
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service 
is filling data [#2002]

 osaf/libs/agents/saf/clma/clma_api.c |  23 ---
 1 files changed, 0 insertions(+), 23 deletions(-)


When CLM is going to fill the values of notification buffer, there is no need 
to check
the length values that is passed by the user.
Also, there can be cases when user has a bigger numofItems, but has allocated 
lesser than that.
In such cases, dereferencing the notification can be avoided inside this 
function.

diff --git a/osaf/libs/agents/saf/clma/clma_api.c 
b/osaf/libs/agents/saf/clma/clma_api.c
--- a/osaf/libs/agents/saf/clma/clma_api.c
+++ b/osaf/libs/agents/saf/clma/clma_api.c
@@ -117,17 +117,8 @@ static SaAisErrorT clma_validate_flags_b
 
/* validate the notify buffer */
if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) {
-   uint32_t i;
-
if (!buf->numberOfItems)
return SA_AIS_ERR_INVALID_PARAM;
-
-   // Check that nodeName is not longer than 255
-   for(i=0; inumberOfItems; i++) {
-   
if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= 
SA_MAX_NAME_LENGTH) {
-   return SA_AIS_ERR_INVALID_PARAM;
-   }
-   }
}
 
/* Validate if flag is TRACK_CURRENT and no callback and no buffer 
provided */
@@ -167,24 +158,10 @@ static SaAisErrorT clma_validate_flags_b
 
/* validate the notify buffer */
if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) {
-   uint32_t i;
-
if (!buf->numberOfItems) {
TRACE_LEAVE();
return SA_AIS_ERR_INVALID_PARAM;
}
-
-   // Check that nodeName and EE are not longer than 255
-   for(i=0; inumberOfItems; i++) {
-   
if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= 
SA_MAX_NAME_LENGTH) {
-   TRACE_LEAVE();
-   return SA_AIS_ERR_INVALID_PARAM;
-   }
-   
if(osaf_extended_name_length(>notification[i].clusterNode.executionEnvironment)
 >= SA_MAX_NAME_LENGTH) {
-   TRACE_LEAVE();
-   return SA_AIS_ERR_INVALID_PARAM;
-   }
-   }
}
 
/* Validate if flag is TRACK_CURRENT and no callback and no buffer 
provided */

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 2 of 4] AMFND: Admin operation continuation if csi completes during headless [#1725 part 1] V1

2016-09-09 Thread Nagendra Kumar
Hi Minh,
I am using 1725_pending_review.tgz 
(1725_02_V2_bugfix_01_resend_buffer_in_set_leds.diff, 
1725_02_V2_bugfix_02_honor_clusterinit_nodesync_timer.diff, 
1725_02_V2_bugfix_03_restore_ng_admin.diff, 
1725_03_V4_failover_absent_susi_longDn.diff, 
1725_04_V2_headless_validation.diff, 1725_05_V2_resend_oper_state.diff, 
1725_06a_fullscope_escalation_headless.diff).

I am doing basic node reboot validation testing with no faults.

Configuration: SU1(act) and SU2(stanby) both on PL-3.

TC #1: Start SC-1, PL-3 and PL-5: Unlock SU1 and SU2. Stop SC-1 and stop PL-3, 
start PL-3 and start SC-1.
After SC-1 and PL-3 comes back, ideally SU1 and SU2 should get assignments as 
Act and Std, but no assignment are being given to SUs on PL-3 and it shows 
following in status:

Only Su2 has Std assignment.

safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

TC #2: Configuration same as TC#1. Stop PL-3 and don't start. The same issue:
safSISU=safSu=PL-5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1
saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

TC #3: Configured SU1(Act) on PL-3 and SU2(Std) on PL-4.
Stop SC-1, stop PL-3 and PL-4, but PL-5 is running. start SC-1, the same issue.

TC #4: Same as TC #3, but SU3 configured on PL-5 as spare. SU3 doesn't get any 
assignment and Sg is unstable.

Thanks
-Nagu

> -Original Message-
> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
> Sent: 18 August 2016 05:46
> To: hans.nordeb...@ericsson.com; Nagendra Kumar; Praveen Malviya;
> gary@dektech.com.au; long.hb.ngu...@dektech.com.au;
> minh.c...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 2 of 4] AMFND: Admin operation continuation if csi
> completes during headless [#1725 part 1] V1
> 
>  osaf/services/saf/amf/amfnd/di.cc |  199 
> +
>  osaf/services/saf/amf/amfnd/include/avnd_di.h |1 +
>  2 files changed, 134 insertions(+), 66 deletions(-)
> 
> 
> There're two options basically that AMFD can continue admin operation wih
> completed csi(s)
> 
> First: AMFD can use the sync SUSI fsm state as latest, AMFD then has to
> explore its SUSI assignments with adminStates of relevant entities to
> determine which SU should be on call of susi_success(). Deeper level of
> exploration for csi addition. It also depends on SG Fsm state which is being
> used variously in different SG types.
> 
> Second: AMFD uses the SUSI fsm state read from IMM as latest, and AMFND
> needs to resend susi_resp messages which were deferred during headless so
> that AMFD can continue the admin operation sequence. Both cases of csi
> completion [during or after] headless can run in the same code flow.
> 
> The patch buffers susi_resp_msg during headless stage and resend it to
> AMFD after headless. There could be a chance that AMFND sent out susi
> response message but AMFD could not receive or process it. This case could
> be seen as a defect, which can be fixed by securing the result of sending
> susi_resp message from AMFND toward AMFD.
> 
> diff --git a/osaf/services/saf/amf/amfnd/di.cc
> b/osaf/services/saf/amf/amfnd/di.cc
> --- a/osaf/services/saf/amf/amfnd/di.cc
> +++ b/osaf/services/saf/amf/amfnd/di.cc
> @@ -805,11 +805,6 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>   if (cb->term_state ==
> AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED)
>   return rc;
> 
> - if (cb->is_avd_down == true) {
> -m_AVND_SU_ALL_SI_RESET(su);
> - return rc;
> - }
> -
>   // should be in assignment pending state to be here
>   osafassert(m_AVND_SU_IS_ASSIGN_PEND(su));
> 
> @@ -820,64 +815,76 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>   TRACE_ENTER2("Sending Resp su=%s, si=%s, curr_state=%u,
> prv_state=%u", su->name.value, curr_si->name.value,curr_si-
> >curr_state,curr_si->prv_state);
>   /* populate the susi resp msg */
>   msg.info.avd = new AVSV_DND_MSG();
> -msg.type = AVND_MSG_AVD;
> -msg.info.avd->msg_type = AVSV_N2D_INFO_SU_SI_ASSIGN_MSG;
> -msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb-
> >snd_msg_id);
> -msg.info.avd->msg_info.n2d_su_si_assign.node_id = 

[devel] [PATCH 0 of 1] Review Request for amfd: do not flush imm jobs on active controller[#2016]

2016-09-09 Thread praveen . malviya
Summary: amfd: do not flush imm jobs on active controller[#2016] 
Review request for Trac Ticket(s): #2016 
Peer Reviewer(s): AMf devs 
Pull request to: <>
Affected branch(es): ALL 
Development branch: <>


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

changeset 21daf300e419eed5d9d977bc366451a28506ecfe
Author: praveen.malv...@oracle.com
Date:   Fri, 09 Sep 2016 17:41:00 +0530

amfd: do not flush imm jobs on active controller[#2016]

AMF pushes IMM updates in a queue to be updated after finishing main 
job.
currently, standby amfd as a applier flushes job queue if number of 
elements
reaches to 200. Since check is on implementer status, active controller 
may
flush it when it loses implementer status and goes for reinit with 
IMM(one
case is when IMM returns BAD_HANDLE, a rare case).

Patch checks the role of amfd besides implemeter status before flusing 
the
job queue.


Complete diffstat:
--
 osaf/services/saf/amf/amfd/imm.cc |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


Testing Commands:
-
Not able to reproduce.
Simulated by forcing AMFD to finalize with IMM.

Testing, Expected Results:
--
Problem observed while simulating it.
Passed with this patch.

Conditions of Submission:
-
Ack from any reviewer.

Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1 of 1] amfd: do not flush imm jobs on active controller[#2016]

2016-09-09 Thread praveen . malviya
 osaf/services/saf/amf/amfd/imm.cc |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


AMF pushes IMM updates in a queue to be updated after finishing main job.
currently, standby amfd as a applier flushes job queue if number of elements
reaches to 200. Since check is on implementer status, active controller may
flush it when it loses implementer status and goes for reinit with IMM(one case
is when IMM returns BAD_HANDLE, a rare case).

Patch checks the role of amfd besides implemeter status before flusing the
job queue.

diff --git a/osaf/services/saf/amf/amfd/imm.cc 
b/osaf/services/saf/amf/amfd/imm.cc
--- a/osaf/services/saf/amf/amfd/imm.cc
+++ b/osaf/services/saf/amf/amfd/imm.cc
@@ -385,7 +385,7 @@ AvdJobDequeueResultT Fifo::execute(SaImm
if (!avd_cb->active_services_exist)
return JOB_ETRYAGAIN;
 
-   if (!avd_cb->is_implementer) {
+   if ((!avd_cb->is_implementer) && (avd_cb->avail_state_avd == 
SA_AMF_HA_STANDBY)) {
check_and_flush_job_queue_standby_amfd();
return JOB_EINVH;
}

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] ntf: cluster rebooted with ntfd crashed on both controllers [#2006]

2016-09-09 Thread praveen malviya
Hi Vu,

Please find one comment below.

Thanks,
Praveen

On 08-Sep-16 5:24 PM, Vu Minh Nguyen wrote:
>  osaf/libs/agents/saf/ntfa/ntfa_api.c   |  18 +
>  osaf/libs/common/ntfsv/include/ntfsv_msg.h |   1 -
>  osaf/libs/common/ntfsv/ntfsv_mem.c |   2 +-
>  tests/ntfsv/tet_ntf_common.c   |   5 +-
>  tests/ntfsv/tet_saNtfNotificationSend.c|  92 
> ++
>  5 files changed, 113 insertions(+), 5 deletions(-)
>
>
> In AIS, it states "additionalText length must be consistent with 
> lengthAdditionalText"
> But NTFA did not check this. So, when data is passing to LOGA, ntfsv got 
> invalid param.
>
> This patch adds the check. Also fix few error in ntftest - using `strlen` to 
> calculate
> the string length instead of `sizeof`.
>
> diff --git a/osaf/libs/agents/saf/ntfa/ntfa_api.c 
> b/osaf/libs/agents/saf/ntfa/ntfa_api.c
> --- a/osaf/libs/agents/saf/ntfa/ntfa_api.c
> +++ b/osaf/libs/agents/saf/ntfa/ntfa_api.c
> @@ -550,6 +550,24 @@ static SaAisErrorT checkHeader(v_data *p
>   return rc;
>   }
>
> + // AIS: additionalText must be consistent with lengthAdditionalText
> + if (nh->additionalText == NULL && nh->lengthAdditionalText != 0) {
> + TRACE_1("Mismatch b/w additionalText and lengthAdditionalText");
> + return SA_AIS_ERR_INVALID_PARAM;
> + }
> +
> + if (nh->lengthAdditionalText > MAX_ADDITIONAL_TEXT_LENGTH) {
> + TRACE_1("lengthAdditionalText is too long");
> + return SA_AIS_ERR_INVALID_PARAM;
> + }
[Paveen] I think above check should also be done as a part of 
NotifocationAllocate() APIs because it will be too late to return 
INVALID_PARAM in Send() and force user to reallocate. I know 
MAX_ADDITIONAL_TEXT_LENGTH is very high.
What do you think?
> +

> + SaUint16T len = nh->lengthAdditionalText;
> + SaStringT addT = nh->additionalText;
> + if ((addT != NULL) && (len != strlen(addT) + 1)) {
> + TRACE_1("Mismatch b/w additionalText and lengthAdditionalText");
> + return SA_AIS_ERR_INVALID_PARAM;
> + }
> +
>   return SA_AIS_OK;
>  }
>
> diff --git a/osaf/libs/common/ntfsv/include/ntfsv_msg.h 
> b/osaf/libs/common/ntfsv/include/ntfsv_msg.h
> --- a/osaf/libs/common/ntfsv/include/ntfsv_msg.h
> +++ b/osaf/libs/common/ntfsv/include/ntfsv_msg.h
> @@ -34,7 +34,6 @@ extern "C" {
>  /*MAX length of addtionaltext conforms to MAX value of logMaxLogrecsize
>as mentioned in LOGSV PR doc Section 3.5.2.1*/
>  #define MAX_ADDITIONAL_TEXT_LENGTH 65535
> -#define MAX_DISCARDED_NOTIFICATIONS 1024
>
>  /* Message type enums */
>  typedef enum {
> diff --git a/osaf/libs/common/ntfsv/ntfsv_mem.c 
> b/osaf/libs/common/ntfsv/ntfsv_mem.c
> --- a/osaf/libs/common/ntfsv/ntfsv_mem.c
> +++ b/osaf/libs/common/ntfsv/ntfsv_mem.c
> @@ -222,7 +222,7 @@ SaAisErrorT ntfsv_alloc_ntf_header(SaNtf
>
>   /* Additional text */
>   if (lengthAdditionalText != 0) {
> - notificationHeader->additionalText = 
> (SaStringT)malloc(lengthAdditionalText * sizeof(char));
> + notificationHeader->additionalText = (SaStringT)calloc(1, 
> lengthAdditionalText * sizeof(char));
>   if (notificationHeader->additionalText == NULL) {
>   TRACE("Out of memory in additionalText field");
>   rc = SA_AIS_ERR_NO_MEMORY;
> diff --git a/tests/ntfsv/tet_ntf_common.c b/tests/ntfsv/tet_ntf_common.c
> --- a/tests/ntfsv/tet_ntf_common.c
> +++ b/tests/ntfsv/tet_ntf_common.c
> @@ -797,7 +797,7 @@ void createObjectCreateDeleteNotificatio
>   ntfHandle, /* handle to Notification Service instance */
>   myObjCrDelNotification,
>   0,
> - (SaUint16T)(sizeof(DEFAULT_ADDITIONAL_TEXT) +1),
> + (SaUint16T)(strlen(DEFAULT_ADDITIONAL_TEXT) +1),
>   0,
>   2,
>   SA_NTF_ALLOC_SYSTEM_LIMIT), SA_AIS_OK);
> @@ -840,7 +840,7 @@ void createObjectCreateDeleteNotificatio
>   /* set additional text and additional info */
>   (void) strncpy(head->additionalText,
>   DEFAULT_ADDITIONAL_TEXT,
> - (SaUint16T)(sizeof(DEFAULT_ADDITIONAL_TEXT) +1));
> + (SaUint16T)(strlen(DEFAULT_ADDITIONAL_TEXT) +1));
>
>   /* Set source indicator */
>   *(myObjCrDelNotification->sourceIndicator) = SA_NTF_UNKNOWN_OPERATION;
> @@ -852,7 +852,6 @@ void createObjectCreateDeleteNotificatio
>   myObjCrDelNotification->objectAttributes[1].attributeId = 1;
>   myObjCrDelNotification->objectAttributes[1].attributeType = 
> SA_NTF_VALUE_INT32;
>   myObjCrDelNotification->objectAttributes[1].attributeValue.int32Val = 2;
> -
>  }
>
>
> diff --git a/tests/ntfsv/tet_saNtfNotificationSend.c 
> b/tests/ntfsv/tet_saNtfNotificationSend.c
> --- a/tests/ntfsv/tet_saNtfNotificationSend.c
> +++ 

Re: [devel] [PATCH 1 of 1] AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987]

2016-09-09 Thread praveen malviya
Hi Minh,

Please find inline.

Thanks,
Praveen

On 09-Sep-16 4:30 PM, minh chau wrote:
> Hi Praveen,
>
> Please see comment in line with [Minh]
>
> Thanks,
> Minh
>
> On 09/09/16 17:06, praveen malviya wrote:
>> Hi Minh,
>>
>> I could not understand why AMF should become implementer/applier earlier.
> [Minh]: We need to read osaAmfSGFsmState for differentiation of
> nodegroup operation while exploring SUSI assignment. osafAmfSGFsmState
> needs to be available before avd_susi_read_headless_cached_rta().
> avd_susi_read_headless_cached_rta() needs to be available before
> avd_sg_read_headless_cached_rta() for purpose of checking
> SUOperationList. So I think it's the best if we can retrieve
> osafAmfSGFsmState in avd_sg_config_get(). To read osafAmfSGFsmState as
> RTA, AMF needs to be implementer before reading RTA, otherwise the
> returned value from IMM is dummy (most of the time I got it as 0). If
> there's not simpler way to do, I would go for reading osafAmfSGFsmState
> in avd_sg_config_get(), but we also need to set applier for standby AMFD
> before avd_sg_config_get() to avoid issue in #1720.
[Praveen] For runtime non-cached attributes, IMM gives callback to 
implementer to fetch the latest value. So if implementer is not set, 
then a client like immlist may face a problem to get the latest value.

But osafAmfSGFsmState is a runtime cached attribute, IMM should return 
the value from its database.
Is it a bug or an IMM restriction that if implementer is not registered 
then it will return dummy value?

>> Anyways, please find one query below with [Praveen].
>>
>> Thanks,
>> Praveen
>>
>> On 05-Sep-16 7:13 AM, Minh Hon Chau wrote:
>>>  osaf/services/saf/amf/amfd/include/node.h |   3 +
>>>  osaf/services/saf/amf/amfd/include/sg.h   |   1 +
>>>  osaf/services/saf/amf/amfd/nodegroup.cc   |  83
>>> +++
>>>  osaf/services/saf/amf/amfd/role.cc|  20 +++---
>>>  osaf/services/saf/amf/amfd/sg.cc  |  15 ++--
>>>  osaf/services/saf/amf/amfd/sgproc.cc  |   2 +-
>>>  osaf/services/saf/amf/amfd/siass.cc   |   4 +-
>>>  7 files changed, 109 insertions(+), 19 deletions(-)
>>>
>>>
>>> The SG becomes unstable because some variables used in nodegroup
>>> operation are not
>>> restored after headless if this admin operation on nodegroup was
>>> interrupted just
>>> before cluster goes into headless stage.
>>>
>>> In order to restore nodegroup operation, AMF needs to know exactly
>>> whether nodegroup
>>> operation was running during headless up on @susi assignment. If susi
>>> is in QUIESCED,
>>> QUIESCING or being removed while its related entities su, si, sg are
>>> not in LOCKED
>>> and SHUTTING_DOWN, that means either node or nodegroup MUST be in
>>> LOCKED or SHUTTING
>>> DOWN. In case of SHUTTING_DOWN saAmfNGAdminState, that's enough to
>>> know a nodegroup
>>> operation was running. However, if saAmfNGAdminState is in LOCKED,
>>> this case is an
>>> ambiguity of locking a node. The reason of differentiation of locking
>>> a node or node
>>> group is because 2N SG uses both AVD_SG_FSM_SG_ADMIN and
>>> AVD_SG_FSM_SU_OPER for node
>>> group operation while AVD_SG_FSM_SU_OPER is only used for node
>>> operation. When 2N SG
>>> uses AVD_SG_FSM_SG_ADMIN for nodegroup, the saAmfSGAdminState is
>>> borrowed (but not
>>> updated to IMM) to run the admin operation sequence. Therefore, after
>>> headless if
>>> AVD_SG_FSM_SG_ADMIN was being used for nodegroup then
>>> saAmfSGAdminState also needs to
>>> be set.
>>>
>>> Because SG FSM state is used to restore nodegroup during restoring
>>> susi assignment,
>>> the osaAmfSGFsmState (RTA) needs to be read earlier than reading susi
>>> assignment. This
>>> needs active AMFD become implementer earlier than reading sg object.
>>> There was a known
>>> ticket reported in 1720, if only make active AMFD as early
>>> implementer than it will
>>> cause the standby AMFD missing ccb apply callback.This patch also
>>> needs to set both
>>> active and standby AMFD become implementer and applier earlier so
>>> that AMFD can read
>>> osaAmfSGFsmState and do not cause regression of 1720.
>>>
>>
>>> diff --git a/osaf/services/saf/amf/amfd/include/node.h
>>> b/osaf/services/saf/amf/amfd/include/node.h
>>> --- a/osaf/services/saf/amf/amfd/include/node.h
>>> +++ b/osaf/services/saf/amf/amfd/include/node.h
>>> @@ -45,6 +45,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  class AVD_SU;
>>>  struct avd_cluster_tag;
>>> @@ -232,4 +233,6 @@ extern void ng_complete_admin_op(AVD_AMF
>>>  extern void avd_ng_admin_state_set(AVD_AMF_NG* ng, SaAmfAdminStateT
>>> state);
>>>  extern bool are_all_ngs_in_unlocked_state(const AVD_AVND *node);
>>>  extern bool any_ng_in_locked_in_state(const AVD_AVND *node);
>>> +void avd_ng_restore_headless_states(AVD_CL_CB *cb, struct
>>> avd_su_si_rel_tag* susi);
>>> +
>>>  #endif
>>> diff --git a/osaf/services/saf/amf/amfd/include/sg.h
>>> b/osaf/services/saf/amf/amfd/include/sg.h
>>> --- 

Re: [devel] Review request for updates to CLM PR FOR 5.1

2016-09-09 Thread Zoran Milinkovic
Hi Mathi,

Ack from me.

Thanks,
Zoran

-Original Message-
From: Mathivanan Naickan Palanivelu [mailto:mathi.naic...@oracle.com] 
Sent: den 9 september 2016 12:23
To: Zoran Milinkovic
Cc: opensaf-devel@lists.sourceforge.net
Subject: Review request for updates to CLM PR FOR 5.1

Hi,
Please find the updated CLM PR doc (with track changes enabled) for review.

Thanks,
Mathi.

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service is filling data [#2002]

2016-09-09 Thread Zoran Milinkovic
Hi Mathi,

User cannot have bigger numberOfItem than the structure is allocated.
That's a user's fault and it must be detected in some way.

If wrong input data is not detected at this place, it must be detected in 
another place.

All at all, I don't see anything wrong with the code. I don't see that the test 
case is a valid test case.

With the support for long DN, CLM has become more strict for bad SaNameT 
values, and assert bad SaNameT values.

If you want to avoid assert, then the code can be rewritten to handle 
non-extended SaNameT values longer than 255 characters.

BR,
Zoran


-Original Message-
From: mathi.naic...@oracle.com [mailto:mathi.naic...@oracle.com] 
Sent: den 9 september 2016 11:40
To: Zoran Milinkovic
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service 
is filling data [#2002]

 osaf/libs/agents/saf/clma/clma_api.c |  23 ---
 1 files changed, 0 insertions(+), 23 deletions(-)


When CLM is going to fill the values of notification buffer, there is no need 
to check the length values that is passed by the user.
Also, there can be cases when user has a bigger numofItems, but has allocated 
lesser than that.
In such cases, dereferencing the notification can be avoided inside this 
function.

diff --git a/osaf/libs/agents/saf/clma/clma_api.c 
b/osaf/libs/agents/saf/clma/clma_api.c
--- a/osaf/libs/agents/saf/clma/clma_api.c
+++ b/osaf/libs/agents/saf/clma/clma_api.c
@@ -117,17 +117,8 @@ static SaAisErrorT clma_validate_flags_b
 
/* validate the notify buffer */
if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) {
-   uint32_t i;
-
if (!buf->numberOfItems)
return SA_AIS_ERR_INVALID_PARAM;
-
-   // Check that nodeName is not longer than 255
-   for(i=0; inumberOfItems; i++) {
-   
if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= 
SA_MAX_NAME_LENGTH) {
-   return SA_AIS_ERR_INVALID_PARAM;
-   }
-   }
}
 
/* Validate if flag is TRACK_CURRENT and no callback and no buffer 
provided */ @@ -167,24 +158,10 @@ static SaAisErrorT clma_validate_flags_b
 
/* validate the notify buffer */
if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) {
-   uint32_t i;
-
if (!buf->numberOfItems) {
TRACE_LEAVE();
return SA_AIS_ERR_INVALID_PARAM;
}
-
-   // Check that nodeName and EE are not longer than 255
-   for(i=0; inumberOfItems; i++) {
-   
if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= 
SA_MAX_NAME_LENGTH) {
-   TRACE_LEAVE();
-   return SA_AIS_ERR_INVALID_PARAM;
-   }
-   
if(osaf_extended_name_length(>notification[i].clusterNode.executionEnvironment)
 >= SA_MAX_NAME_LENGTH) {
-   TRACE_LEAVE();
-   return SA_AIS_ERR_INVALID_PARAM;
-   }
-   }
}
 
/* Validate if flag is TRACK_CURRENT and no callback and no buffer 
provided */

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] AMFND: Fix amfnd coredump if sc failover while shutting down [#2008]

2016-09-09 Thread minh chau
Hi Praveen,

I saw Hans gave ack on this #2008 yesterday so I pushed it this morning.
As if it's simpler fix by introducing new flag then I think you can 
float another patch. Maybe it's after 5.1RC?

Thanks,
Minh
On 09/09/16 17:22, praveen malviya wrote:
> Hi,
>
> I saw the ticket is pushed. But there is another simpler way of fixing 
> it on top of previous fix of similar problem which was fixed in 
> avnd_evt_avd_info_su_si_assign_evh() with following code:
>  if ((cb->term_state == AVND_TERM_STATE_OPENSAF_SHUTDOWN_INITIATED) ||
> (cb->term_state == 
> AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED)) {
> if ((su->is_ncs == true) &&
> (info->msg_act == AVSV_SUSI_ACT_MOD) &&
> (info->ha_state == SA_AMF_HA_ACTIVE)) {
> LOG_NO("shutdown started, failover requested, 
> escalate to forced shutdown");
> avnd_last_step_clean(cb);
> } else {
>
> Above fix does not distinguish between removal of assignments and 
> clean ups of comps during shutdown phase. These are two sub-phases.
> So simpler fix could be to distinguish between these two sub-phases by 
> introducing a new flag like 
> AVND_TERM_STATE_OPENSAF_SHUTDOWN_CLEANUP_STARTED.
>
> Thanks,
> Praveen
>
>
> On 08-Sep-16 5:31 PM, Minh Hon Chau wrote:
>>  osaf/libs/common/amf/include/amf_db_template.h |  11 +++
>>  osaf/services/saf/amf/amfnd/clc.cc |   6 ++
>>  osaf/services/saf/amf/amfnd/term.cc|  16 ++--
>>  3 files changed, 19 insertions(+), 14 deletions(-)
>>
>>
>> During cluster shutting down phase, if both controllers do not shutdown
>> fast enough and active controller goes down first, then a possibility of
>> sc failover happens. In this situation, avnd_last_step_clean() gets 
>> called
>> twice, a coredump is generated. It most likely because deleting 
>> record in
>> nodeid_mdsdest_db and hctypedb but those container still own the key. 
>> Thus,
>> the second call of avnd_last_step_clean() cause coredump.
>>
>> Patch ensure deletion of nodeid_mdsdest_db and hctypedb is the last step
>> before exit
>>
>> diff --git a/osaf/libs/common/amf/include/amf_db_template.h 
>> b/osaf/libs/common/amf/include/amf_db_template.h
>> --- a/osaf/libs/common/amf/include/amf_db_template.h
>> +++ b/osaf/libs/common/amf/include/amf_db_template.h
>> @@ -73,6 +73,7 @@ class AmfDb {
>>public:
>> unsigned int insert(const Key , T *obj);
>> void erase(const Key );
>> +   void deleteAll();
>> T *find(const Key );
>> T *findNext(const Key );
>>
>> @@ -120,6 +121,16 @@ void AmfDb::erase(const Key 
>>
>>  //
>>  template 
>> +void AmfDb::deleteAll() {
>> +  for (const auto& it: db) {
>> +delete it.second;
>> +  }
>> +  db.clear();
>> +}
>> +
>> +
>> +//
>> +template 
>>  T *AmfDb::find(const Key ) {
>>typename AmfDbMap::iterator it = db.find(key);
>>if (it == db.end())
>> diff --git a/osaf/services/saf/amf/amfnd/clc.cc 
>> b/osaf/services/saf/amf/amfnd/clc.cc
>> --- a/osaf/services/saf/amf/amfnd/clc.cc
>> +++ b/osaf/services/saf/amf/amfnd/clc.cc
>> @@ -815,6 +815,8 @@ uint32_t avnd_comp_clc_fsm_run(AVND_CB *
>>  if (all_comps_terminated()) {
>>  LOG_NO("Terminated all AMF components");
>>  LOG_NO("Shutdown completed, exiting");
>> +cb->nodeid_mdsdest_db.deleteAll();
>> +cb->hctypedb.deleteAll();
>>  exit(0);
>>  } else {
>>  TRACE("Do nothing");
>> @@ -2301,6 +2303,8 @@ uint32_t avnd_comp_clc_terming_cleansucc
>>  if (all_comps_terminated()) {
>>  LOG_NO("Terminated all AMF components");
>>  LOG_NO("Shutdown completed, exiting");
>> +cb->nodeid_mdsdest_db.deleteAll();
>> +cb->hctypedb.deleteAll();
>>  exit(0);
>>  }
>>  }
>> @@ -2362,6 +2366,8 @@ uint32_t avnd_comp_clc_terming_cleanfail
>>  all_comps_terminated()) {
>>  LOG_WA("Terminated all AMF components (with failures)");
>>  LOG_NO("Shutdown completed, exiting");
>> +cb->nodeid_mdsdest_db.deleteAll();
>> +cb->hctypedb.deleteAll();
>>  exit(0);
>>  }
>>
>> diff --git a/osaf/services/saf/amf/amfnd/term.cc 
>> b/osaf/services/saf/amf/amfnd/term.cc
>> --- a/osaf/services/saf/amf/amfnd/term.cc
>> +++ b/osaf/services/saf/amf/amfnd/term.cc
>> @@ -58,8 +58,6 @@ extern const AVND_EVT_HDLR g_avnd_func_l
>>  void avnd_last_step_clean(AVND_CB *cb)
>>  {
>>  AVND_COMP *comp;
>> -AVND_NODEID_TO_MDSDEST_MAP *rec = nullptr;
>> -AVND_HCTYPE *hc = nullptr;
>>  int cleanup_call_cnt = 0;
>>
>>  TRACE_ENTER();
>> @@ -86,21 +84,11 @@ void avnd_last_step_clean(AVND_CB *cb)
>>  /* Stop was called early or some other problem */
>>  if (cleanup_call_cnt == 0) {
>>  LOG_NO("No 

Re: [devel] [PATCH 1 of 1] AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987]

2016-09-09 Thread minh chau
Hi Praveen,

Please see comment in line with [Minh]

Thanks,
Minh

On 09/09/16 17:06, praveen malviya wrote:
> Hi Minh,
>
> I could not understand why AMF should become implementer/applier earlier.
[Minh]: We need to read osaAmfSGFsmState for differentiation of 
nodegroup operation while exploring SUSI assignment. osafAmfSGFsmState 
needs to be available before avd_susi_read_headless_cached_rta().
avd_susi_read_headless_cached_rta() needs to be available before 
avd_sg_read_headless_cached_rta() for purpose of checking 
SUOperationList. So I think it's the best if we can retrieve 
osafAmfSGFsmState in avd_sg_config_get(). To read osafAmfSGFsmState as 
RTA, AMF needs to be implementer before reading RTA, otherwise the 
returned value from IMM is dummy (most of the time I got it as 0). If 
there's not simpler way to do, I would go for reading osafAmfSGFsmState 
in avd_sg_config_get(), but we also need to set applier for standby AMFD 
before avd_sg_config_get() to avoid issue in #1720.
> Anyways, please find one query below with [Praveen].
>
> Thanks,
> Praveen
>
> On 05-Sep-16 7:13 AM, Minh Hon Chau wrote:
>>  osaf/services/saf/amf/amfd/include/node.h |   3 +
>>  osaf/services/saf/amf/amfd/include/sg.h   |   1 +
>>  osaf/services/saf/amf/amfd/nodegroup.cc   |  83 
>> +++
>>  osaf/services/saf/amf/amfd/role.cc|  20 +++---
>>  osaf/services/saf/amf/amfd/sg.cc  |  15 ++--
>>  osaf/services/saf/amf/amfd/sgproc.cc  |   2 +-
>>  osaf/services/saf/amf/amfd/siass.cc   |   4 +-
>>  7 files changed, 109 insertions(+), 19 deletions(-)
>>
>>
>> The SG becomes unstable because some variables used in nodegroup 
>> operation are not
>> restored after headless if this admin operation on nodegroup was 
>> interrupted just
>> before cluster goes into headless stage.
>>
>> In order to restore nodegroup operation, AMF needs to know exactly 
>> whether nodegroup
>> operation was running during headless up on @susi assignment. If susi 
>> is in QUIESCED,
>> QUIESCING or being removed while its related entities su, si, sg are 
>> not in LOCKED
>> and SHUTTING_DOWN, that means either node or nodegroup MUST be in 
>> LOCKED or SHUTTING
>> DOWN. In case of SHUTTING_DOWN saAmfNGAdminState, that's enough to 
>> know a nodegroup
>> operation was running. However, if saAmfNGAdminState is in LOCKED, 
>> this case is an
>> ambiguity of locking a node. The reason of differentiation of locking 
>> a node or node
>> group is because 2N SG uses both AVD_SG_FSM_SG_ADMIN and 
>> AVD_SG_FSM_SU_OPER for node
>> group operation while AVD_SG_FSM_SU_OPER is only used for node 
>> operation. When 2N SG
>> uses AVD_SG_FSM_SG_ADMIN for nodegroup, the saAmfSGAdminState is 
>> borrowed (but not
>> updated to IMM) to run the admin operation sequence. Therefore, after 
>> headless if
>> AVD_SG_FSM_SG_ADMIN was being used for nodegroup then 
>> saAmfSGAdminState also needs to
>> be set.
>>
>> Because SG FSM state is used to restore nodegroup during restoring 
>> susi assignment,
>> the osaAmfSGFsmState (RTA) needs to be read earlier than reading susi 
>> assignment. This
>> needs active AMFD become implementer earlier than reading sg object. 
>> There was a known
>> ticket reported in 1720, if only make active AMFD as early 
>> implementer than it will
>> cause the standby AMFD missing ccb apply callback.This patch also 
>> needs to set both
>> active and standby AMFD become implementer and applier earlier so 
>> that AMFD can read
>> osaAmfSGFsmState and do not cause regression of 1720.
>>
>
>> diff --git a/osaf/services/saf/amf/amfd/include/node.h 
>> b/osaf/services/saf/amf/amfd/include/node.h
>> --- a/osaf/services/saf/amf/amfd/include/node.h
>> +++ b/osaf/services/saf/amf/amfd/include/node.h
>> @@ -45,6 +45,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  class AVD_SU;
>>  struct avd_cluster_tag;
>> @@ -232,4 +233,6 @@ extern void ng_complete_admin_op(AVD_AMF
>>  extern void avd_ng_admin_state_set(AVD_AMF_NG* ng, SaAmfAdminStateT 
>> state);
>>  extern bool are_all_ngs_in_unlocked_state(const AVD_AVND *node);
>>  extern bool any_ng_in_locked_in_state(const AVD_AVND *node);
>> +void avd_ng_restore_headless_states(AVD_CL_CB *cb, struct 
>> avd_su_si_rel_tag* susi);
>> +
>>  #endif
>> diff --git a/osaf/services/saf/amf/amfd/include/sg.h 
>> b/osaf/services/saf/amf/amfd/include/sg.h
>> --- a/osaf/services/saf/amf/amfd/include/sg.h
>> +++ b/osaf/services/saf/amf/amfd/include/sg.h
>> @@ -46,6 +46,7 @@
>>  class AVD_SU;
>>  class AVD_SI;
>>  class AVD_APP;
>> +class AVD_AMF_NG;
>>
>>  /* The valid SG FSM states. */
>>  typedef enum {
>> diff --git a/osaf/services/saf/amf/amfd/nodegroup.cc 
>> b/osaf/services/saf/amf/amfd/nodegroup.cc
>> --- a/osaf/services/saf/amf/amfd/nodegroup.cc
>> +++ b/osaf/services/saf/amf/amfd/nodegroup.cc
>> @@ -1356,3 +1356,86 @@ void avd_ng_constructor(void)
>>  ng_ccb_apply_cb);
>>  }
>>
>> +/*
>> + * @brief: Restore nodegroup related variables which 

[devel] Review request for updates to CLM PR FOR 5.1

2016-09-09 Thread Mathivanan Naickan Palanivelu
Hi,
Please find the updated CLM PR doc (with track changes enabled) for review.

Thanks,
Mathi.

OpenSAF_CLMSv_PR_5.1_Mathi.odt
Description: application/vnd.oasis.opendocument.text
--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1 of 1] clm: avoid checking length of sanamet when clm service is filling data [#2002]

2016-09-09 Thread mathi . naickan
 osaf/libs/agents/saf/clma/clma_api.c |  23 ---
 1 files changed, 0 insertions(+), 23 deletions(-)


When CLM is going to fill the values of notification buffer, there is no need 
to check
the length values that is passed by the user.
Also, there can be cases when user has a bigger numofItems, but has allocated 
lesser than that.
In such cases, dereferencing the notification can be avoided inside this 
function.

diff --git a/osaf/libs/agents/saf/clma/clma_api.c 
b/osaf/libs/agents/saf/clma/clma_api.c
--- a/osaf/libs/agents/saf/clma/clma_api.c
+++ b/osaf/libs/agents/saf/clma/clma_api.c
@@ -117,17 +117,8 @@ static SaAisErrorT clma_validate_flags_b
 
/* validate the notify buffer */
if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) {
-   uint32_t i;
-
if (!buf->numberOfItems)
return SA_AIS_ERR_INVALID_PARAM;
-
-   // Check that nodeName is not longer than 255
-   for(i=0; inumberOfItems; i++) {
-   
if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= 
SA_MAX_NAME_LENGTH) {
-   return SA_AIS_ERR_INVALID_PARAM;
-   }
-   }
}
 
/* Validate if flag is TRACK_CURRENT and no callback and no buffer 
provided */
@@ -167,24 +158,10 @@ static SaAisErrorT clma_validate_flags_b
 
/* validate the notify buffer */
if ((flags & SA_TRACK_CURRENT) && buf && buf->notification) {
-   uint32_t i;
-
if (!buf->numberOfItems) {
TRACE_LEAVE();
return SA_AIS_ERR_INVALID_PARAM;
}
-
-   // Check that nodeName and EE are not longer than 255
-   for(i=0; inumberOfItems; i++) {
-   
if(osaf_extended_name_length(>notification[i].clusterNode.nodeName) >= 
SA_MAX_NAME_LENGTH) {
-   TRACE_LEAVE();
-   return SA_AIS_ERR_INVALID_PARAM;
-   }
-   
if(osaf_extended_name_length(>notification[i].clusterNode.executionEnvironment)
 >= SA_MAX_NAME_LENGTH) {
-   TRACE_LEAVE();
-   return SA_AIS_ERR_INVALID_PARAM;
-   }
-   }
}
 
/* Validate if flag is TRACK_CURRENT and no callback and no buffer 
provided */

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0 of 1] Review Request for clm: avoid checking length of sanamet when clm service is filling data [#2002]

2016-09-09 Thread mathi . naickan
Summary: clm: avoid checking length of sanamet when clm service is filling data 
[#2002]
Review request for Trac Ticket(s): #2002
Peer Reviewer(s): Zoran 
Pull request to: <>
Affected branch(es): opensaf-5.1.x, default 
Development branch: <>


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

changeset 1449b7ad43db3e77226abc433aa1a93ba9d5e166
Author: mathi.naic...@oracle.com
Date:   Fri, 09 Sep 2016 15:07:36 +0530

clm: avoid checking length of sanamet when clm service is filling data
[#2002] When CLM is going to fill the values of notification buffer, 
there
is no need to check the length values that is passed by the user. Also,
there can be cases when user has a bigger numofItems, but has allocated
lesser than that. In such cases, dereferencing the notification can be
avoided inside this function.


Complete diffstat:
--
 osaf/libs/agents/saf/clma/clma_api.c |  23 ---
 1 files changed, 0 insertions(+), 23 deletions(-)


Testing Commands:
-
As explained in the ticket.

Testing, Expected Results:
--
Same as above.

Conditions of Submission:
-
Ack from zoran.

Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] lga: ignore CLM status if client Initialize with A.2.1 [#1999]

2016-09-09 Thread Vu Minh Nguyen
Ack

Regards, Vu

> -Original Message-
> From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com]
> Sent: Wednesday, September 7, 2016 3:35 PM
> To: vu.m.ngu...@dektech.com.au; lennart.l...@ericsson.com;
> praveenmalv...@users.sf.net
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 1 of 1] lga: ignore CLM status if client Initialize with
A.2.1
> [#1999]
> 
>  osaf/libs/agents/saf/lga/lga.h |   1 +
>  osaf/libs/agents/saf/lga/lga_api.c |   4 +++-
>  osaf/libs/agents/saf/lga/lga_mds.c |  12 +---
>  tests/logsv/tet_Log_clm.c  |  35
> ++-
>  4 files changed, 47 insertions(+), 5 deletions(-)
> 
> 
> Issue :
> If client saLogInitialize with  A.2.1 and
> linked with New agents lib A.2.2 code is considering CLM status.
> 
> But as explained in osaf/services/saf/logsv/README
> old lga clients  saLogInitialize with A.02.01 are always clm member,
> 
> Fix:
> 
> If  client saLogInitialize() with  A.2.1 , irrelevant of linked with New
agents lib
> A.2.2 code
> or linked with Old agents lib A.2.1 code, CLM status will be ignored .
> 
> Test code added two new test cases  in testsuite 14
> Please check test case 14 of /usr/bin/logtest -e 14
> 
> diff --git a/osaf/libs/agents/saf/lga/lga.h
b/osaf/libs/agents/saf/lga/lga.h
> --- a/osaf/libs/agents/saf/lga/lga.h
> +++ b/osaf/libs/agents/saf/lga/lga.h
> @@ -79,6 +79,7 @@ typedef struct lga_client_hdl_rec {
> * streams are recovered
> */
>bool is_stale_client;  /* Status of client based on the CLM status of
node.*/
> +  SaVersionT version; /* the API version is being used by client, used
for CLM
> status */
>  } lga_client_hdl_rec_t;
> 
>  /* States of the server */
> diff --git a/osaf/libs/agents/saf/lga/lga_api.c
> b/osaf/libs/agents/saf/lga/lga_api.c
> --- a/osaf/libs/agents/saf/lga/lga_api.c
> +++ b/osaf/libs/agents/saf/lga/lga_api.c
> @@ -164,7 +164,8 @@ SaAisErrorT saLogInitialize(SaLogHandleT
>   /* validate the version */
>   if ((version->releaseCode == LOG_RELEASE_CODE) && (version-
> >majorVersion <= LOG_MAJOR_VERSION)) {
>   version->majorVersion = LOG_MAJOR_VERSION;
> - version->minorVersion = LOG_MINOR_VERSION;
> + if (version->minorVersion != LOG_MINOR_VERSION_0)
> + version->minorVersion = LOG_MINOR_VERSION;
>   } else {
>   TRACE("version FAILED, required: %c.%u.%u, supported:
> %c.%u.%u\n",
> version->releaseCode, version->majorVersion, version-
> >minorVersion,
> @@ -264,6 +265,7 @@ SaAisErrorT saLogInitialize(SaLogHandleT
> 
>   /* pass the handle value to the appl */
>   *logHandle = lga_hdl_rec->local_hdl;
> + lga_hdl_rec->version = *version;
> 
>   err:
>   /* free up the response message */
> diff --git a/osaf/libs/agents/saf/lga/lga_mds.c
> b/osaf/libs/agents/saf/lga/lga_mds.c
> --- a/osaf/libs/agents/saf/lga/lga_mds.c
> +++ b/osaf/libs/agents/saf/lga/lga_mds.c
> @@ -545,9 +545,15 @@ static uint32_t lga_lgs_msg_proc(lga_cb_
> library handles and all resources
> associated with these handles. Hence, it is recommended
> for the processes to finalize the
> library handles as soon as the processes
> detect that the node left the
> membership.*/
> - lga_hdl_rec->is_stale_client = true;
> - TRACE("CLM_NODE callback
> is_stale_client: %d clm_node_state: %d",
> - lga_hdl_rec-
> >is_stale_client, cb->clm_node_state);
> +
> + /*Old LGA clients A.02.01 are always
> clm member */
> + if (!((lga_hdl_rec-
> >version.releaseCode == LOG_RELEASE_CODE_0) &&
> + (lga_hdl_rec-
> >version.majorVersion == LOG_MAJOR_VERSION_0) &&
> + (lga_hdl_rec-
> >version.minorVersion == LOG_MINOR_VERSION_0))) {
> + lga_hdl_rec->is_stale_client
> = true;
> + TRACE("CLM_NODE callback
> is_stale_client: %d clm_node_state: %d",
> + lga_hdl_rec-
> >is_stale_client, cb->clm_node_state);
> + }
>   }
>   lga_msg_destroy(lgsv_msg);
>   }
> diff --git a/tests/logsv/tet_Log_clm.c b/tests/logsv/tet_Log_clm.c
> --- a/tests/logsv/tet_Log_clm.c
> +++ b/tests/logsv/tet_Log_clm.c
> @@ -95,7 +95,7 @@ void saLogInitializ_14_02(void)
>   rc = saLogInitialize(, ,
> );
>   saLogFinalize(logHandle);
>   unlockNode();
> - test_validate(rc, SA_AIS_ERR_UNAVAILABLE);
> +

Re: [devel] [PATCH 1 of 1] AMFND: Fix amfnd coredump if sc failover while shutting down [#2008]

2016-09-09 Thread praveen malviya
Hi,

I saw the ticket is pushed. But there is another simpler way of fixing 
it on top of previous fix of similar problem which was fixed in 
avnd_evt_avd_info_su_si_assign_evh() with following code:
  if ((cb->term_state == AVND_TERM_STATE_OPENSAF_SHUTDOWN_INITIATED) ||
 (cb->term_state == 
AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED)) {
 if ((su->is_ncs == true) &&
 (info->msg_act == AVSV_SUSI_ACT_MOD) &&
 (info->ha_state == SA_AMF_HA_ACTIVE)) {
 LOG_NO("shutdown started, failover requested, 
escalate to forced shutdown");
 avnd_last_step_clean(cb);
 } else {

Above fix does not distinguish between removal of assignments and clean 
ups of comps during shutdown phase. These are two sub-phases.
So simpler fix could be to distinguish between these two sub-phases by 
introducing a new flag like 
AVND_TERM_STATE_OPENSAF_SHUTDOWN_CLEANUP_STARTED.

Thanks,
Praveen


On 08-Sep-16 5:31 PM, Minh Hon Chau wrote:
>  osaf/libs/common/amf/include/amf_db_template.h |  11 +++
>  osaf/services/saf/amf/amfnd/clc.cc |   6 ++
>  osaf/services/saf/amf/amfnd/term.cc|  16 ++--
>  3 files changed, 19 insertions(+), 14 deletions(-)
>
>
> During cluster shutting down phase, if both controllers do not shutdown
> fast enough and active controller goes down first, then a possibility of
> sc failover happens. In this situation, avnd_last_step_clean() gets called
> twice, a coredump is generated. It most likely because deleting record in
> nodeid_mdsdest_db and hctypedb but those container still own the key. Thus,
> the second call of avnd_last_step_clean() cause coredump.
>
> Patch ensure deletion of nodeid_mdsdest_db and hctypedb is the last step
> before exit
>
> diff --git a/osaf/libs/common/amf/include/amf_db_template.h 
> b/osaf/libs/common/amf/include/amf_db_template.h
> --- a/osaf/libs/common/amf/include/amf_db_template.h
> +++ b/osaf/libs/common/amf/include/amf_db_template.h
> @@ -73,6 +73,7 @@ class AmfDb {
>public:
> unsigned int insert(const Key , T *obj);
> void erase(const Key );
> +   void deleteAll();
> T *find(const Key );
> T *findNext(const Key );
>
> @@ -120,6 +121,16 @@ void AmfDb::erase(const Key 
>
>  //
>  template 
> +void AmfDb::deleteAll() {
> +  for (const auto& it: db) {
> + delete it.second;
> +  }
> +  db.clear();
> +}
> +
> +
> +//
> +template 
>  T *AmfDb::find(const Key ) {
>typename AmfDbMap::iterator it = db.find(key);
>if (it == db.end())
> diff --git a/osaf/services/saf/amf/amfnd/clc.cc 
> b/osaf/services/saf/amf/amfnd/clc.cc
> --- a/osaf/services/saf/amf/amfnd/clc.cc
> +++ b/osaf/services/saf/amf/amfnd/clc.cc
> @@ -815,6 +815,8 @@ uint32_t avnd_comp_clc_fsm_run(AVND_CB *
>   if (all_comps_terminated()) {
>   LOG_NO("Terminated all AMF components");
>   LOG_NO("Shutdown completed, exiting");
> + cb->nodeid_mdsdest_db.deleteAll();
> + cb->hctypedb.deleteAll();
>   exit(0);
>   } else {
>   TRACE("Do nothing");
> @@ -2301,6 +2303,8 @@ uint32_t avnd_comp_clc_terming_cleansucc
>   if (all_comps_terminated()) {
>   LOG_NO("Terminated all AMF components");
>   LOG_NO("Shutdown completed, exiting");
> + cb->nodeid_mdsdest_db.deleteAll();
> + cb->hctypedb.deleteAll();
>   exit(0);
>   }
>   }
> @@ -2362,6 +2366,8 @@ uint32_t avnd_comp_clc_terming_cleanfail
>   all_comps_terminated()) {
>   LOG_WA("Terminated all AMF components (with failures)");
>   LOG_NO("Shutdown completed, exiting");
> + cb->nodeid_mdsdest_db.deleteAll();
> + cb->hctypedb.deleteAll();
>   exit(0);
>   }
>
> diff --git a/osaf/services/saf/amf/amfnd/term.cc 
> b/osaf/services/saf/amf/amfnd/term.cc
> --- a/osaf/services/saf/amf/amfnd/term.cc
> +++ b/osaf/services/saf/amf/amfnd/term.cc
> @@ -58,8 +58,6 @@ extern const AVND_EVT_HDLR g_avnd_func_l
>  void avnd_last_step_clean(AVND_CB *cb)
>  {
>   AVND_COMP *comp;
> - AVND_NODEID_TO_MDSDEST_MAP *rec = nullptr;
> - AVND_HCTYPE *hc = nullptr;
>   int cleanup_call_cnt = 0;
>
>   TRACE_ENTER();
> @@ -86,21 +84,11 @@ void avnd_last_step_clean(AVND_CB *cb)
>   /* Stop was called early or some other problem */
>   if (cleanup_call_cnt == 0) {
>   LOG_NO("No component to terminate, exiting");
> + cb->nodeid_mdsdest_db.deleteAll();
> + cb->hctypedb.deleteAll();
>   exit(0);
>   }
>
> - 

Re: [devel] [PATCH 1 of 1] AMF: Fix SG unstable from admin continuation of nodegroup after headless [#1987]

2016-09-09 Thread praveen malviya
Hi Minh,

I could not understand why AMF should become implementer/applier earlier.
Anyways, please find one query below with [Praveen].

Thanks,
Praveen

On 05-Sep-16 7:13 AM, Minh Hon Chau wrote:
>  osaf/services/saf/amf/amfd/include/node.h |   3 +
>  osaf/services/saf/amf/amfd/include/sg.h   |   1 +
>  osaf/services/saf/amf/amfd/nodegroup.cc   |  83 
> +++
>  osaf/services/saf/amf/amfd/role.cc|  20 +++---
>  osaf/services/saf/amf/amfd/sg.cc  |  15 ++--
>  osaf/services/saf/amf/amfd/sgproc.cc  |   2 +-
>  osaf/services/saf/amf/amfd/siass.cc   |   4 +-
>  7 files changed, 109 insertions(+), 19 deletions(-)
>
>
> The SG becomes unstable because some variables used in nodegroup operation 
> are not
> restored after headless if this admin operation on nodegroup was interrupted 
> just
> before cluster goes into headless stage.
>
> In order to restore nodegroup operation, AMF needs to know exactly whether 
> nodegroup
> operation was running during headless up on @susi assignment. If susi is in 
> QUIESCED,
> QUIESCING or being removed while its related entities su, si, sg are not in 
> LOCKED
> and SHUTTING_DOWN, that means either node or nodegroup MUST be in LOCKED or 
> SHUTTING
> DOWN. In case of SHUTTING_DOWN saAmfNGAdminState, that's enough to know a 
> nodegroup
> operation was running. However, if saAmfNGAdminState is in LOCKED, this case 
> is an
> ambiguity of locking a node. The reason of differentiation of locking a node 
> or node
> group is because 2N SG uses both AVD_SG_FSM_SG_ADMIN and AVD_SG_FSM_SU_OPER 
> for node
> group operation while AVD_SG_FSM_SU_OPER is only used for node operation. 
> When 2N SG
> uses AVD_SG_FSM_SG_ADMIN for nodegroup, the saAmfSGAdminState is borrowed 
> (but not
> updated to IMM) to run the admin operation sequence. Therefore, after 
> headless if
> AVD_SG_FSM_SG_ADMIN was being used for nodegroup then saAmfSGAdminState also 
> needs to
> be set.
>
> Because SG FSM state is used to restore nodegroup during restoring susi 
> assignment,
> the osaAmfSGFsmState (RTA) needs to be read earlier than reading susi 
> assignment. This
> needs active AMFD become implementer earlier than reading sg object. There 
> was a known
> ticket reported in 1720, if only make active AMFD as early implementer than 
> it will
> cause the standby AMFD missing ccb apply callback.This patch also needs to 
> set both
> active and standby AMFD become implementer and applier earlier so that AMFD 
> can read
> osaAmfSGFsmState and do not cause regression of 1720.
>

> diff --git a/osaf/services/saf/amf/amfd/include/node.h 
> b/osaf/services/saf/amf/amfd/include/node.h
> --- a/osaf/services/saf/amf/amfd/include/node.h
> +++ b/osaf/services/saf/amf/amfd/include/node.h
> @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  class AVD_SU;
>  struct avd_cluster_tag;
> @@ -232,4 +233,6 @@ extern void ng_complete_admin_op(AVD_AMF
>  extern void avd_ng_admin_state_set(AVD_AMF_NG* ng, SaAmfAdminStateT state);
>  extern bool are_all_ngs_in_unlocked_state(const AVD_AVND *node);
>  extern bool any_ng_in_locked_in_state(const AVD_AVND *node);
> +void avd_ng_restore_headless_states(AVD_CL_CB *cb, struct avd_su_si_rel_tag* 
> susi);
> +
>  #endif
> diff --git a/osaf/services/saf/amf/amfd/include/sg.h 
> b/osaf/services/saf/amf/amfd/include/sg.h
> --- a/osaf/services/saf/amf/amfd/include/sg.h
> +++ b/osaf/services/saf/amf/amfd/include/sg.h
> @@ -46,6 +46,7 @@
>  class AVD_SU;
>  class AVD_SI;
>  class AVD_APP;
> +class AVD_AMF_NG;
>
>  /* The valid SG FSM states. */
>  typedef enum {
> diff --git a/osaf/services/saf/amf/amfd/nodegroup.cc 
> b/osaf/services/saf/amf/amfd/nodegroup.cc
> --- a/osaf/services/saf/amf/amfd/nodegroup.cc
> +++ b/osaf/services/saf/amf/amfd/nodegroup.cc
> @@ -1356,3 +1356,86 @@ void avd_ng_constructor(void)
>   ng_ccb_apply_cb);
>  }
>
> +/*
> + * @brief: Restore nodegroup related variables which are used in nodegroup
> + * operations before headless. The main variables are @admin_ng,
> + * @ng_using_saAmfSGAdminState, and the saAmfSGAdminState which
> + * is borrowed by nodegroup operation on 2N SG
> + * @param: control block, and susi is being read after headless
> + * @return: void
> + */
> +
> +void avd_ng_restore_headless_states(AVD_CL_CB *cb, struct avd_su_si_rel_tag* 
> susi) {
> +
> + TRACE_ENTER();
> + osafassert(cb->scs_absence_max_duration > 0);
> + // In order to restore nodegroup operation, AMF needs to know exactly
> + // whether nodegroup operation was running during headless up on @susi
> + // If susi is in QUIESCED/QUIESCING or being removed while its related
> + // entities su, si, sg are not in LOCKED and SHUTTING_DOWN, that means
> + // either node or nodegroup MUST be in LOCKED or SHUTTING_DOWN.
> + // In case of SHUTTING_DOWN saAmfNGAdminState, that's enough to know
> + // a nodegroup operation was running.