Re: [devel] [PATCH 1 of 1] smf: Avoid unconditional sleep when calling adminoperation[#2211] V1

2016-11-30 Thread Lennart Lund
Hi Neel,

Ack with comment

If you copied the loop used with admin operation for node group also this loop 
would get a more predictable timeout. But even if you do that #2212 is still 
needed in order to get rid of code redundancy.

Thanks
Lennart 

> -Original Message-
> From: Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com]
> Sent: den 1 december 2016 06:14
> To: Tai Chi Dinh ; Lennart Lund
> ; Rafael Odzakow
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] smf: Avoid unconditional sleep when calling
> adminoperation[#2211] V1
> 
> Hi ,
> 
> will be pushing the patch if there are no further comments.
> #2212 is opened to change SmfImmUtils AdminOperation the same way as
> Node group admin operation.
> 
> Thanks,
> Neel.
> 
> On 2016/11/30 02:42 PM, Tai Dinh wrote:
> > Thank Neel,
> >
> > ACK from me.
> >
> > /Tai
> > -Original Message-
> > From: reddy.neelaka...@oracle.com
> [mailto:reddy.neelaka...@oracle.com]
> > Sent: Wednesday, November 30, 2016 3:47 PM
> > To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com;
> > tai.d...@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: [PATCH 1 of 1] smf: Avoid unconditional sleep when calling
> > adminoperation[#2211] V1
> >
> >   osaf/services/saf/smfsv/smfd/SmfUtils.cc |  26 +++
> ---
> >   1 files changed, 15 insertions(+), 11 deletions(-)
> >
> >
> > diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> > b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> > --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> > +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> > @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st
> > }
> >
> > /* Call the admin operation */
> > -   do {
> > -   TRACE("call immutil_saImmOmAdminOperationInvoke_2");
> > +
> > +   TRACE("call immutil_saImmOmAdminOperationInvoke_2");
> > +   rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
> > , 0, i_operationId, i_params,
> > +   , i_timeout);
> > +   while ((rc == SA_AIS_OK) && (returnValue ==
> SA_AIS_ERR_TRY_AGAIN) &&
> > (retry > 0) ){
> > +   sleep(2);
> > rc =
> immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
> > , 0, i_operationId, i_params,
> > -  ,
> > i_timeout);
> > -   if (retry <= 0) {
> > -   LOG_NO("Fail to invoke admin operation, too many
> > SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
> > -  i_dn.c_str(), i_operationId);
> > -   rc = SA_AIS_ERR_TRY_AGAIN;
> > -   goto done;
> > -   }
> > -   sleep(2);
> > +   , i_timeout);
> > retry--;
> > -   } while ((rc == SA_AIS_OK) && (returnValue ==
> > SA_AIS_ERR_TRY_AGAIN));
> > +   }
> > +
> > +   if (retry <= 0) {
> > +   LOG_NO("Fail to invoke admin operation, too many
> > SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
> > +   i_dn.c_str(), i_operationId);
> > +   rc = SA_AIS_ERR_TRY_AGAIN;
> > +   goto done;
> > +   }
> >
> > if ( rc != SA_AIS_OK) {
> > LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s],
> > opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId);
> >


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] amfnd: add logs for HC start failures [#340]

2016-11-30 Thread Hans Nordebäck
HI Nagu,

One strategy is in case if an error code is returned to the application, it is 
the application to select (or not) to log this as an error/notify etc. 

/Thanks HansN 

-Original Message-
From: Nagendra Kumar [mailto:nagendr...@oracle.com] 
Sent: den 1 december 2016 07:38
To: Hans Nordebäck ; Praveen Malviya 
; Minh Hon Chau ; Gary 
Lee 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures [#340]

Hi Hans,
I had proposed this long time back in the ticket:

" Nagendra Kumar - 2015-08-07

As part of ticket cleanup, evaluated and decided to keep it as enhancement.
The logging should not be heavy and should only stick to state to the level of 
providing hints to the user. Perhaps, the log should fall under Error category."

>> After running various OpenSAF tests there should be no errors in the syslog 
>> and there may be test case that will fail now, even though there are no 
>> OpenSAF errors?
If there are error situation like the log added, Errors should come into 
syslog. All negative test cases should result into Error, shouldn't they ?

Thanks
-Nagu

> -Original Message-
> From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com]
> Sent: 01 December 2016 11:40
> To: Nagendra Kumar; Praveen Malviya; Minh Hon Chau; Gary Lee
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures 
> [#340]
> 
> Hi Nagu,
> Not sure, but if these errors are not OpenSAF errors, i.e. not 
> depending on application configuration, use  etc.
> should they really be logged as errors? After running various OpenSAF 
> tests there should be no errors in the syslog and there may be test 
> case that will fail now, even though there are no OpenSAF errors?
> /Regards HansN
> 
> -Original Message-
> From: Nagendra Kumar [mailto:nagendr...@oracle.com]
> Sent: den 1 december 2016 06:37
> To: Hans Nordebäck ; Praveen Malviya 
> ; Minh Hon Chau 
> ; Gary Lee 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures 
> [#340]
> 
> Hi Hans,
>   I find it more appropriate here as we are returning Error codes 
> so 
> log the information as Error :)
> 
> Thanks
> -Nagu
> > -Original Message-
> > From: Hans Nordeback [mailto:hans.nordeb...@ericsson.com]
> > Sent: 29 November 2016 14:20
> > To: Nagendra Kumar; Praveen Malviya; minh.c...@dektech.com.au; 
> > gary@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [PATCH 1 of 1] amfnd: add logs for HC start failures 
> > [#340]
> >
> > Hi Nagu,
> >
> > a question, why use LOG_ER and not LOG_WA below? /Thanks HansN
> >
> >
> > On 11/04/2016 09:32 AM, nagendr...@oracle.com wrote:
> > >   osaf/services/saf/amf/amfnd/chc.cc |  7 +++
> > >   1 files changed, 7 insertions(+), 0 deletions(-)
> > >
> > >
> > > diff --git a/osaf/services/saf/amf/amfnd/chc.cc
> > b/osaf/services/saf/amf/amfnd/chc.cc
> > > --- a/osaf/services/saf/amf/amfnd/chc.cc
> > > +++ b/osaf/services/saf/amf/amfnd/chc.cc
> > > @@ -315,6 +315,7 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> > >
> > >   /* get the comp */
> > >   if ((*o_comp = avnd_compdb_rec_get(cb->compdb,
> > Amf::to_string(_start->comp_name))) == nullptr) {
> > > + LOG_ER("Component '%s' doesn't exist in
> > DB", osaf_extended_name_borrow(_start->comp_name));
> > >   *o_amf_rc = SA_AIS_ERR_NOT_EXIST;
> > >   return;
> > >   }
> > > @@ -324,6 +325,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> > >
> > m_AVND_COMP_PRES_STATE_IS_INSTANTIATIONFAILED(*o_comp) ||
> > >
> > m_AVND_COMP_PRES_STATE_IS_TERMINATING(*o_comp) ||
> > >
> > m_AVND_COMP_PRES_STATE_IS_TERMINATIONFAILED(*o_comp)) {
> > > + LOG_ER("Component '%s' is not healthy (pres
> > state '%u')", osaf_extended_name_borrow(_start->comp_name),
> > > + (*o_comp)->pres);
> > >   *o_amf_rc = SA_AIS_ERR_TRY_AGAIN;
> > >   return;
> > >   }
> > > @@ -347,6 +350,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> > >   if (0 == avnd_hcdb_rec_get(cb, _chk)) {
> > >   /* HC instance did not exist, look for 
> > > HC type
> > */
> > >   if (nullptr == avnd_hctypedb_rec_get(cb,
> > (*o_comp)->saAmfCompType, _start->hc_key)) {
> > > + LOG_ER("Health check is not
> > configured for component '%s'",
> > > +
> > 

Re: [devel] [PATCH 2 of 3] cpsv : update cpsv error handing based on leap changes [#2202]

2016-11-30 Thread ramesh betham
Hi Mahesh,

Please find my review comments inline with [Ramesh].

Thanks,
Ramesh.

On 11/29/2016 4:07 PM, mahesh.va...@oracle.com wrote:
>   osaf/libs/common/cpsv/include/cpnd_cb.h   |   4 +-
>   osaf/libs/common/cpsv/include/cpnd_init.h |   8 +-
>   osaf/libs/common/cpsv/include/cpnd_sec.h  |   2 +-
>   osaf/libs/core/include/ncs_osprm.h|   2 +-
>   osaf/services/saf/cpsv/cpnd/cpnd_db.c |  12 ++--
>   osaf/services/saf/cpsv/cpnd/cpnd_evt.c|  82 
> +-
>   osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |  31 ++
>   osaf/services/saf/cpsv/cpnd/cpnd_res.c|  24 +++--
>   osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  12 ++--
>   9 files changed, 103 insertions(+), 74 deletions(-)
>
>
> diff --git a/osaf/libs/common/cpsv/include/cpnd_cb.h 
> b/osaf/libs/common/cpsv/include/cpnd_cb.h
> --- a/osaf/libs/common/cpsv/include/cpnd_cb.h
> +++ b/osaf/libs/common/cpsv/include/cpnd_cb.h
> @@ -341,8 +341,8 @@ uint32_t cpnd_amf_register(CPND_CB *cpnd
>   uint32_t cpnd_amf_deregister(CPND_CB *cpnd_cb);
>   uint32_t cpnd_client_extract_bits(uint32_t bitmap_value, uint32_t 
> *bit_position);
>   uint32_t cpnd_res_ckpt_sec_del(CPND_CKPT_NODE *cp_node);
> -uint32_t cpnd_ckpt_replica_create_res(NCS_OS_POSIX_SHM_REQ_INFO *open_req, 
> char *buf, CPND_CKPT_NODE **cp_node,
> - uint32_t ref_cnt, CKPT_INFO 
> *cp_info, bool shm_alloc_guaranteed);
> +uint32_t cpnd_ckpt_replica_create_res(CPND_CB *cb, NCS_OS_POSIX_SHM_REQ_INFO 
> *open_req, char *buf, CPND_CKPT_NODE **cp_node,
> + uint32_t ref_cnt, CKPT_INFO 
> *cp_info);
>   int32_t cpnd_find_free_loc(CPND_CB *cb, CPND_TYPE_INFO type);
>   uint32_t cpnd_ckpt_write_header(CPND_CB *cb, uint32_t nckpts);
>   uint32_t cpnd_cli_info_write_header(CPND_CB *cb, int32_t n_clients);
> diff --git a/osaf/libs/common/cpsv/include/cpnd_init.h 
> b/osaf/libs/common/cpsv/include/cpnd_init.h
> --- a/osaf/libs/common/cpsv/include/cpnd_init.h
> +++ b/osaf/libs/common/cpsv/include/cpnd_init.h
> @@ -90,7 +90,7 @@ uint32_t cpnd_ckpt_replica_create(CPND_C
>   uint32_t cpnd_ckpt_remote_cpnd_add(CPND_CKPT_NODE *cp_node, MDS_DEST 
> mds_info);
>   uint32_t cpnd_ckpt_remote_cpnd_del(CPND_CKPT_NODE *cp_node, MDS_DEST 
> mds_info);
>   int32_t cpnd_ckpt_get_lck_sec_id(CPND_CKPT_NODE *cp_node);
> -uint32_t cpnd_ckpt_sec_write(CPND_CKPT_NODE *cp_node, CPND_CKPT_SECTION_INFO
> +uint32_t cpnd_ckpt_sec_write(CPND_CB *cb, CPND_CKPT_NODE *cp_node, 
> CPND_CKPT_SECTION_INFO
> *sec_info, const void *data, uint64_t size, uint64_t 
> offset, uint32_t type);
>   uint32_t cpnd_ckpt_sec_read(CPND_CKPT_NODE *cp_node, CPND_CKPT_SECTION_INFO
>*sec_info, void *data, uint64_t size, uint64_t offset);
> @@ -164,7 +164,7 @@ void cpnd_evt_node_getnext(CPND_CB *cb,
>   uint32_t cpnd_evt_node_add(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE 
> *evt_node);
>   uint32_t cpnd_evt_node_del(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE 
> *evt_node);
>   CPND_CKPT_NODE *cpnd_ckpt_node_find_by_name(CPND_CB *cpnd_cb, 
> SaConstStringT ckpt_name);
> -CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CKPT_NODE *cp_node, 
> SaCkptSectionIdT *id, SaTimeT exp_time,
> +CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CB *cb, CPND_CKPT_NODE 
> *cp_node, SaCkptSectionIdT *id, SaTimeT exp_time,
> uint32_t gen_flag);
>   void cpnd_evt_backup_queue_add(CPND_CKPT_NODE *cp_node, CPND_EVT *evt);
>   uint32_t cpnd_ckpt_node_tree_init(CPND_CB *cb);
> @@ -176,8 +176,8 @@ void cpnd_client_node_tree_cleanup(CPND_
>   void cpnd_client_node_tree_destroy(CPND_CB *cb);
>   void cpnd_allrepl_write_evt_node_tree_cleanup(CPND_CB *cb);
>   void cpnd_allrepl_write_evt_node_tree_destroy(CPND_CB *cb);
> -uint32_t cpnd_sec_hdr_update(CPND_CKPT_SECTION_INFO *pSecPtr, CPND_CKPT_NODE 
> *cp_node);
> -uint32_t cpnd_ckpt_hdr_update(CPND_CKPT_NODE *cp_node);
> +uint32_t cpnd_sec_hdr_update(CPND_CB *cb, CPND_CKPT_SECTION_INFO *pSecPtr, 
> CPND_CKPT_NODE *cp_node);
> +uint32_t cpnd_ckpt_hdr_update(CPND_CB *cb, CPND_CKPT_NODE *cp_node);
>   void cpnd_ckpt_node_destroy(CPND_CB *cb, CPND_CKPT_NODE *cp_node);
>   uint32_t cpnd_get_slot_sub_slot_id_from_mds_dest(MDS_DEST dest);
>   uint32_t cpnd_get_slot_sub_slot_id_from_node_id(NCS_NODE_ID i_node_id);
> diff --git a/osaf/libs/common/cpsv/include/cpnd_sec.h 
> b/osaf/libs/common/cpsv/include/cpnd_sec.h
> --- a/osaf/libs/common/cpsv/include/cpnd_sec.h
> +++ b/osaf/libs/common/cpsv/include/cpnd_sec.h
> @@ -39,7 +39,7 @@ CPND_CKPT_SECTION_INFO *
>   cpnd_ckpt_sec_get_create(const CPND_CKPT_NODE *, const SaCkptSectionIdT *);
>   
>   CPND_CKPT_SECTION_INFO *
> -cpnd_ckpt_sec_del(CPND_CKPT_NODE *, SaCkptSectionIdT *);
> +cpnd_ckpt_sec_del(CPND_CB *cb, CPND_CKPT_NODE *, SaCkptSectionIdT *);
>   
>   CPND_CKPT_SECTION_INFO *
>   cpnd_get_sect_with_id(const CPND_CKPT_NODE *, uint32_t lcl_sec_id);
> diff --git 

Re: [devel] [PATCH 1 of 1] amfnd: add logs for HC start failures [#340]

2016-11-30 Thread Nagendra Kumar
Hi Hans,
I had proposed this long time back in the ticket:

" Nagendra Kumar - 2015-08-07

As part of ticket cleanup, evaluated and decided to keep it as enhancement.
The logging should not be heavy and should only stick to state to the level of 
providing hints to the user. Perhaps, the log should fall under Error category."

>> After running various OpenSAF tests there should be no errors in the syslog 
>> and there may be test case that will fail now, even though there are no 
>> OpenSAF errors?
If there are error situation like the log added, Errors should come into 
syslog. All negative test cases should result into Error, shouldn't they ?

Thanks
-Nagu

> -Original Message-
> From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com]
> Sent: 01 December 2016 11:40
> To: Nagendra Kumar; Praveen Malviya; Minh Hon Chau; Gary Lee
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures [#340]
> 
> Hi Nagu,
> Not sure, but if these errors are not OpenSAF errors, i.e. not depending on
> application configuration, use  etc.
> should they really be logged as errors? After running various OpenSAF tests
> there should be no errors in the syslog and there may be test case that will
> fail now, even though there are no OpenSAF errors?
> /Regards HansN
> 
> -Original Message-
> From: Nagendra Kumar [mailto:nagendr...@oracle.com]
> Sent: den 1 december 2016 06:37
> To: Hans Nordebäck ; Praveen Malviya
> ; Minh Hon Chau
> ; Gary Lee 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures [#340]
> 
> Hi Hans,
>   I find it more appropriate here as we are returning Error
> codes so log the information as Error :)
> 
> Thanks
> -Nagu
> > -Original Message-
> > From: Hans Nordeback [mailto:hans.nordeb...@ericsson.com]
> > Sent: 29 November 2016 14:20
> > To: Nagendra Kumar; Praveen Malviya; minh.c...@dektech.com.au;
> > gary@dektech.com.au
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [PATCH 1 of 1] amfnd: add logs for HC start failures
> > [#340]
> >
> > Hi Nagu,
> >
> > a question, why use LOG_ER and not LOG_WA below? /Thanks HansN
> >
> >
> > On 11/04/2016 09:32 AM, nagendr...@oracle.com wrote:
> > >   osaf/services/saf/amf/amfnd/chc.cc |  7 +++
> > >   1 files changed, 7 insertions(+), 0 deletions(-)
> > >
> > >
> > > diff --git a/osaf/services/saf/amf/amfnd/chc.cc
> > b/osaf/services/saf/amf/amfnd/chc.cc
> > > --- a/osaf/services/saf/amf/amfnd/chc.cc
> > > +++ b/osaf/services/saf/amf/amfnd/chc.cc
> > > @@ -315,6 +315,7 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> > >
> > >   /* get the comp */
> > >   if ((*o_comp = avnd_compdb_rec_get(cb->compdb,
> > Amf::to_string(_start->comp_name))) == nullptr) {
> > > + LOG_ER("Component '%s' doesn't exist in
> > DB", osaf_extended_name_borrow(_start->comp_name));
> > >   *o_amf_rc = SA_AIS_ERR_NOT_EXIST;
> > >   return;
> > >   }
> > > @@ -324,6 +325,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> > >
> > m_AVND_COMP_PRES_STATE_IS_INSTANTIATIONFAILED(*o_comp) ||
> > >
> > m_AVND_COMP_PRES_STATE_IS_TERMINATING(*o_comp) ||
> > >
> > m_AVND_COMP_PRES_STATE_IS_TERMINATIONFAILED(*o_comp)) {
> > > + LOG_ER("Component '%s' is not healthy (pres
> > state '%u')", osaf_extended_name_borrow(_start->comp_name),
> > > + (*o_comp)->pres);
> > >   *o_amf_rc = SA_AIS_ERR_TRY_AGAIN;
> > >   return;
> > >   }
> > > @@ -347,6 +350,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> > >   if (0 == avnd_hcdb_rec_get(cb, _chk)) {
> > >   /* HC instance did not exist, look for 
> > > HC type
> > */
> > >   if (nullptr == avnd_hctypedb_rec_get(cb,
> > (*o_comp)->saAmfCompType, _start->hc_key)) {
> > > + LOG_ER("Health check is not
> > configured for component '%s'",
> > > +
> > osaf_extended_name_borrow(_start->comp_name));
> > >   *o_amf_rc =
> > SA_AIS_ERR_NOT_EXIST;
> > >   return;
> > >   }
> > > @@ -356,6 +361,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> > >   tmp_hc_rec.req_hdl = hc_start->hdl;
> > >   /* determine if this healthcheck is already 
> > > active */
> > >   if (0 != m_AVND_COMPDB_REC_HC_GET(**o_comp,
> > tmp_hc_rec)) {
> > > + LOG_ER("Health check is already active for
> 

Re: [devel] [PATCH 1 of 1] amfnd: add logs for HC start failures [#340]

2016-11-30 Thread Hans Nordebäck
Hi Nagu,
Not sure, but if these errors are not OpenSAF errors, i.e. not depending on 
application configuration, use  etc. 
should they really be logged as errors? After running various OpenSAF tests 
there should be no errors in the syslog 
and there may be test case that will fail now, even though there are no OpenSAF 
errors?
/Regards HansN

-Original Message-
From: Nagendra Kumar [mailto:nagendr...@oracle.com] 
Sent: den 1 december 2016 06:37
To: Hans Nordebäck ; Praveen Malviya 
; Minh Hon Chau ; Gary 
Lee 
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures [#340]

Hi Hans,
I find it more appropriate here as we are returning Error codes 
so log the information as Error :)

Thanks
-Nagu
> -Original Message-
> From: Hans Nordeback [mailto:hans.nordeb...@ericsson.com]
> Sent: 29 November 2016 14:20
> To: Nagendra Kumar; Praveen Malviya; minh.c...@dektech.com.au; 
> gary@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] amfnd: add logs for HC start failures 
> [#340]
> 
> Hi Nagu,
> 
> a question, why use LOG_ER and not LOG_WA below? /Thanks HansN
> 
> 
> On 11/04/2016 09:32 AM, nagendr...@oracle.com wrote:
> >   osaf/services/saf/amf/amfnd/chc.cc |  7 +++
> >   1 files changed, 7 insertions(+), 0 deletions(-)
> >
> >
> > diff --git a/osaf/services/saf/amf/amfnd/chc.cc
> b/osaf/services/saf/amf/amfnd/chc.cc
> > --- a/osaf/services/saf/amf/amfnd/chc.cc
> > +++ b/osaf/services/saf/amf/amfnd/chc.cc
> > @@ -315,6 +315,7 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> >
> > /* get the comp */
> > if ((*o_comp = avnd_compdb_rec_get(cb->compdb,
> Amf::to_string(_start->comp_name))) == nullptr) {
> > +   LOG_ER("Component '%s' doesn't exist in
> DB", osaf_extended_name_borrow(_start->comp_name));
> > *o_amf_rc = SA_AIS_ERR_NOT_EXIST;
> > return;
> > }
> > @@ -324,6 +325,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> >
> m_AVND_COMP_PRES_STATE_IS_INSTANTIATIONFAILED(*o_comp) ||
> >
> m_AVND_COMP_PRES_STATE_IS_TERMINATING(*o_comp) ||
> >
> m_AVND_COMP_PRES_STATE_IS_TERMINATIONFAILED(*o_comp)) {
> > +   LOG_ER("Component '%s' is not healthy (pres
> state '%u')", osaf_extended_name_borrow(_start->comp_name),
> > +   (*o_comp)->pres);
> > *o_amf_rc = SA_AIS_ERR_TRY_AGAIN;
> > return;
> > }
> > @@ -347,6 +350,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> > if (0 == avnd_hcdb_rec_get(cb, _chk)) {
> > /* HC instance did not exist, look for HC type
> */
> > if (nullptr == avnd_hctypedb_rec_get(cb,
> (*o_comp)->saAmfCompType, _start->hc_key)) {
> > +   LOG_ER("Health check is not
> configured for component '%s'",
> > +
>   osaf_extended_name_borrow(_start->comp_name));
> > *o_amf_rc =
> SA_AIS_ERR_NOT_EXIST;
> > return;
> > }
> > @@ -356,6 +361,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb,
> > tmp_hc_rec.req_hdl = hc_start->hdl;
> > /* determine if this healthcheck is already active */
> > if (0 != m_AVND_COMPDB_REC_HC_GET(**o_comp,
> tmp_hc_rec)) {
> > +   LOG_ER("Health check is already active for
> component '%s'",
> > +
>   osaf_extended_name_borrow(_start->comp_name));
> > *o_amf_rc = SA_AIS_ERR_EXIST;
> > return;
> > }
> 

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] AMFD: Do not let non-active AMFD receives CLM track callback [#2141]

2016-11-30 Thread Nagendra Kumar
Hi Minh,
But fixing small thing is complicating the code. You can just try to 
stop CLM track again in Quisced->Standby transition and even if it fails you 
may try few times as it is Standby and it is not doing much job accept that it 
is receiving some checkpoints. If that fails, it is ok to reboot the Standby 
node. I am sure this is very rare case and the added code may not be getting 
chance to get executed.
Also, it is important to investigate why CLM is giving try-again.

Thanks
-Nagu

> -Original Message-
> From: minh chau [mailto:minh.c...@dektech.com.au]
> Sent: 01 December 2016 10:02
> To: Nagendra Kumar; Praveen Malviya; hans.nordeb...@ericsson.com;
> gary@dektech.com.au; long.hb.ngu...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] AMFD: Do not let non-active AMFD receives CLM
> track callback [#2141]
> 
> Hi Nagu,
> 
> The patch has already had a variable to memorize the track start/stop status.
> Wouldn't it be nice to avoid a node reboot (as consequence of AMFD's
> exit) due to TRY_AGAIN/TIMEOUT which could be classified as temporary
> service denial? And retry to stop tracking until coming back to as active one.
> 
> Thanks,
> Minh
> 
> On 30/11/16 17:25, Nagendra Kumar wrote:
> > Hi Minh,
> > I think the patch could be made simpler by again try to stop CLM
> track(keep a variable to indicate it wasn't stopped before) when AMFD
> becomes Standby. If it doesn't get stopped again, then exit Amfd as it is kind
> of double fault.
> >
> > Thanks
> > -Nagu
> >
> >> -Original Message-
> >> From: minh chau [mailto:minh.c...@dektech.com.au]
> >> Sent: 30 November 2016 10:40
> >> To: praveen malviya;hans.nordeb...@ericsson.com; Nagendra Kumar;
> >> gary@dektech.com.au;long.hb.ngu...@dektech.com.au
> >> Cc:opensaf-devel@lists.sourceforge.net
> >> Subject: Re: [PATCH 1 of 1] AMFD: Do not let non-active AMFD receives
> >> CLM track callback [#2141]
> >>
> >> Hi Praveen,
> >>
> >> I have seen a limitation of using the same job queue now, that it
> >> will increase the possibility of loss of IMM update job which is
> >> causing incorrect headless recovery. The timer is one alternative. We
> >> had spawned a thread to deal with TRY_AGAIN/TIMEOUT in case of start
> >> cluster tracking, so stop cluster tracking should be handled in a similar
> way.
> >> I will create another ticket and assign it to myself to introduce a
> >> thread handler for using CLM service in AMF. Some background/similar
> >> tickets: #1609, #1531. For the scope of this ticket, the floated
> >> patch + a correction from your comment that standby AMFD responds
> CLM
> >> callback, they should be enough to avoid the reported problem in
> >> #2141. Do you agree?
> >>
> >> Thanks,
> >> Minh
> >>
> >>
> >> On 21/11/16 22:37, praveen malviya wrote:
> >>> Hi Minh,
> >>>
> >>> If the intention is to use same job queue then, I think, one problem
> >>> will come:
> >>>   "Currently AMFD empties job queue when AMFD becomes quiesced in
> >>> avd_mds_qsd_role_evh(). Most probably before emptying, AMFD will try
> >>> to stop CLM tracking and now if the API again fails then this
> >>> failure cannot be handled as AMFD cannot be blocked in
> >>> avd_mds_qsd_role_evh() for long time. And then AMFD will empty the
> job queue."
> >>>
> >>> Can we think of some way to avoid above problem? One solution,
> >>> without using job queue, could be: When CLM track fails, run a timer
> >>> of 1 seconds. When this timer expires, check role of AMFD. If the
> >>> role is still standby and Track stop is pending then now AMFD will give a
> try.
> >>> If it again fails then rerun the timer based on the value of return
> >>> code. We can decide how many times this can be tried.
> >>>
> >>> One more thing. on standby controller if clm track has failed then
> >>> it should respond for any track callback that it receives without
> >>> taking any action. This needs to be done irrespective of the
> >>> approach of the solution.
> >>>
> >>> Thanks,
> >>> Praveen
> >>>
> >>> On 21-Nov-16 4:38 PM, minh chau wrote:
>  Hi Praveen,
> 
>  I am thinking if creating a CLM job (likely IMM update, NTF send
>  job) in order to stop tracking at non-active AMFD.
>  Should it be a better version than this patch that AMFD will still
>  have chance to stop tracking until being promoted back to active role?
> 
>  Thanks,
>  Minh
> 
>  On 18/11/16 11:38, minh chau wrote:
> > Hi Praveen,
> >
> > It has been seen once in swapping 2N Opensaf SI. It was just an
> > error indicating that AMFD failed to stop track callback, after
> > that the si-swap also succeeded. CLM service could be busy during
> > switch-over but AMF (as well as other services) is supposed to
> > handle that return code properly as AMFD has done in track start.
> > Further testing, it exposed a problem due to clm track callback
> received on standby AMFD.
> >
> > AMFD 

Re: [devel] [PATCH 1 of 1] cpnd: fix error handling while section_hdr_update_fail [#2207]

2016-11-30 Thread A V Mahesh
Hi Hoang,

ACK for the attached patch .

-AVM


On 11/29/2016 6:01 PM, A V Mahesh wrote:
> HI All,
>
> Validation of this patch dependent on #2202 , so please find the attached
> re-based patch of this (#2207) ticket.
>
> Hoang  will re-publish the V3 patch once he is in office.
>
> -AVM
>
> On 11/29/2016 4:23 PM, Hoang Vo wrote:
>>   osaf/libs/common/cpsv/include/cpnd_sec.h |   2 +-
>>   osaf/services/saf/cpsv/cpnd/cpnd_db.c|   5 -
>>   osaf/services/saf/cpsv/cpnd/cpnd_evt.c   |   8 
>>   osaf/services/saf/cpsv/cpnd/cpnd_proc.c  |   2 +-
>>   osaf/services/saf/cpsv/cpnd/cpnd_sec.cc  |  25 
>> ++---
>>   5 files changed, 24 insertions(+), 18 deletions(-)
>>
>>
>> problem:
>> the steps to add a section is add_db_tree -> update_sec_hdr -> 
>> update_ckpt_hdr
>> so if an error occur cpsv should handle error in reverse order.
>> currently, section_hdr_update_fails, cpsv revert ckpt_hdr also that 
>> case error
>>
>> solution:
>> only revert db_tree in case section_hdr_update_fails
>>
>> diff --git a/osaf/libs/common/cpsv/include/cpnd_sec.h 
>> b/osaf/libs/common/cpsv/include/cpnd_sec.h
>> --- a/osaf/libs/common/cpsv/include/cpnd_sec.h
>> +++ b/osaf/libs/common/cpsv/include/cpnd_sec.h
>> @@ -39,7 +39,7 @@ CPND_CKPT_SECTION_INFO *
>>   cpnd_ckpt_sec_get_create(const CPND_CKPT_NODE *, const 
>> SaCkptSectionIdT *);
>> CPND_CKPT_SECTION_INFO *
>> -cpnd_ckpt_sec_del(CPND_CKPT_NODE *, SaCkptSectionIdT *);
>> +cpnd_ckpt_sec_del(CPND_CKPT_NODE *, SaCkptSectionIdT *, bool);
>> CPND_CKPT_SECTION_INFO *
>>   cpnd_get_sect_with_id(const CPND_CKPT_NODE *, uint32_t lcl_sec_id);
>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_db.c 
>> b/osaf/services/saf/cpsv/cpnd/cpnd_db.c
>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_db.c
>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_db.c
>> @@ -391,6 +391,7 @@ CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_ad
>>   int32_t lcl_sec_id = 0;
>>   uint32_t rc = NCSCC_RC_SUCCESS;
>>   uint32_t value = 0, i = 0, j = 0;
>> +bool hdr_update = true;
>> TRACE_ENTER();
>>   /* get the lcl_sec_id */
>> @@ -469,8 +470,10 @@ CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_ad
>>   return pSecPtr;
>>  section_hdr_update_fails:
>> +hdr_update = false;
>> +
>>ckpt_hdr_update_fails:
>> -cpnd_ckpt_sec_del(cp_node, id);
>> +cpnd_ckpt_sec_del(cp_node, id, hdr_update);
>>  section_add_fails:
>>   if (pSecPtr->sec_id.id != NULL)
>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c 
>> b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
>> @@ -2357,12 +2357,12 @@ static uint32_t cpnd_evt_proc_ckpt_sect_
>>   /* delete the section */
>>   if (gen_sec_id)
>>   tmp_sec_info =
>> - cpnd_ckpt_sec_del(cp_node, _info->sec_id);
>> + cpnd_ckpt_sec_del(cp_node, _info->sec_id, true);
>>   else
>>   tmp_sec_info =
>> cpnd_ckpt_sec_del(cp_node,
>> evt->info.sec_creatReq.
>> - sec_attri.sectionId);
>> + sec_attri.sectionId, true);
>> if (tmp_sec_info == sec_info) {
>>   cp_node->replica_info.
>> @@ -2494,7 +2494,7 @@ static uint32_t cpnd_evt_proc_ckpt_sect_
>>   rc = cpnd_ckpt_sec_find(cp_node, >info.sec_delReq.sec_id);
>>   if (rc == NCSCC_RC_SUCCESS) {
>>   -sec_info = cpnd_ckpt_sec_del(cp_node, 
>> >info.sec_delReq.sec_id);
>> +sec_info = cpnd_ckpt_sec_del(cp_node, 
>> >info.sec_delReq.sec_id, true);
>>   /* resetting lcl_sec_id mapping */
>>   if (sec_info == NULL) {
>>   rc = NCSCC_RC_FAILURE;
>> @@ -2774,7 +2774,7 @@ static uint32_t cpnd_evt_proc_nd2nd_ckpt
>>   send_evt.info.cpnd.info.sec_delete_rsp.error = 
>> SA_AIS_ERR_TRY_AGAIN;
>>   goto nd_rsp;
>>   }
>> -sec_info = cpnd_ckpt_sec_del(cp_node, 
>> >info.sec_delete_req.sec_id);
>> +sec_info = cpnd_ckpt_sec_del(cp_node, 
>> >info.sec_delete_req.sec_id, true);
>>   if (sec_info == NULL) {
>>   if 
>> (m_CPND_IS_COLLOCATED_ATTR_SET(cp_node->create_attrib.creationFlags)) {
>>   TRACE_4("cpnd ckpt sect del failed for 
>> sec_id:%s,ckpt_id:%llx",
>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c 
>> b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>> @@ -1544,7 +1544,7 @@ uint32_t cpnd_proc_sec_expiry(CPND_CB *c
>>   return NCSCC_RC_FAILURE;
>>   }
>>   -cpnd_ckpt_sec_del(cp_node, _info->sec_id);
>> +cpnd_ckpt_sec_del(cp_node, _info->sec_id, true);
>> cp_node->replica_info.shm_sec_mapping[pSec_info->lcl_sec_id] = 1;
>> /* send out destory to all cpnd's maintaining this ckpt */
>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_sec.cc 
>> 

Re: [devel] [PATCH 1 of 1] smf: Avoid unconditional sleep when calling adminoperation[#2211] V1

2016-11-30 Thread Neelakanta Reddy
Hi ,

will be pushing the patch if there are no further comments.
#2212 is opened to change SmfImmUtils AdminOperation the same way as 
Node group admin operation.

Thanks,
Neel.

On 2016/11/30 02:42 PM, Tai Dinh wrote:
> Thank Neel,
>
> ACK from me.
>
> /Tai
> -Original Message-
> From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com]
> Sent: Wednesday, November 30, 2016 3:47 PM
> To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com;
> tai.d...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 1 of 1] smf: Avoid unconditional sleep when calling
> adminoperation[#2211] V1
>
>   osaf/services/saf/smfsv/smfd/SmfUtils.cc |  26 +++---
>   1 files changed, 15 insertions(+), 11 deletions(-)
>
>
> diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st
>   }
>   
>   /* Call the admin operation */
> - do {
> - TRACE("call immutil_saImmOmAdminOperationInvoke_2");
> + 
> + TRACE("call immutil_saImmOmAdminOperationInvoke_2");
> + rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
> , 0, i_operationId, i_params,
> + , i_timeout);
> + while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN) &&
> (retry > 0) ){
> + sleep(2);
>   rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
> , 0, i_operationId, i_params,
> -,
> i_timeout);
> - if (retry <= 0) {
> - LOG_NO("Fail to invoke admin operation, too many
> SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
> -i_dn.c_str(), i_operationId);
> - rc = SA_AIS_ERR_TRY_AGAIN;
> - goto done;
> - }
> - sleep(2);
> + , i_timeout);
>   retry--;
> - } while ((rc == SA_AIS_OK) && (returnValue ==
> SA_AIS_ERR_TRY_AGAIN));
> + }
> + 
> + if (retry <= 0) {
> + LOG_NO("Fail to invoke admin operation, too many
> SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
> + i_dn.c_str(), i_operationId);
> + rc = SA_AIS_ERR_TRY_AGAIN;
> + goto done;
> + }
>   
>   if ( rc != SA_AIS_OK) {
>   LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s],
> opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId);
>


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] AMFD: Do not let non-active AMFD receives CLM track callback [#2141]

2016-11-30 Thread minh chau
Hi Nagu,

The patch has already had a variable to memorize the track start/stop 
status.
Wouldn't it be nice to avoid a node reboot (as consequence of AMFD's 
exit) due to TRY_AGAIN/TIMEOUT which could be classified as temporary 
service denial? And retry to stop tracking until coming back to as 
active one.

Thanks,
Minh

On 30/11/16 17:25, Nagendra Kumar wrote:
> Hi Minh,
>   I think the patch could be made simpler by again try to stop CLM 
> track(keep a variable to indicate it wasn't stopped before) when AMFD becomes 
> Standby. If it doesn't get stopped again, then exit Amfd as it is kind of 
> double fault.
>
> Thanks
> -Nagu
>
>> -Original Message-
>> From: minh chau [mailto:minh.c...@dektech.com.au]
>> Sent: 30 November 2016 10:40
>> To: praveen malviya;hans.nordeb...@ericsson.com; Nagendra Kumar;
>> gary@dektech.com.au;long.hb.ngu...@dektech.com.au
>> Cc:opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 1 of 1] AMFD: Do not let non-active AMFD receives CLM
>> track callback [#2141]
>>
>> Hi Praveen,
>>
>> I have seen a limitation of using the same job queue now, that it will 
>> increase
>> the possibility of loss of IMM update job which is causing incorrect headless
>> recovery. The timer is one alternative. We had spawned a thread to deal with
>> TRY_AGAIN/TIMEOUT in case of start cluster tracking, so stop cluster
>> tracking should be handled in a similar way.
>> I will create another ticket and assign it to myself to introduce a thread
>> handler for using CLM service in AMF. Some background/similar
>> tickets: #1609, #1531. For the scope of this ticket, the floated patch + a
>> correction from your comment that standby AMFD responds CLM callback,
>> they should be enough to avoid the reported problem in #2141. Do you
>> agree?
>>
>> Thanks,
>> Minh
>>
>>
>> On 21/11/16 22:37, praveen malviya wrote:
>>> Hi Minh,
>>>
>>> If the intention is to use same job queue then, I think, one problem
>>> will come:
>>>   "Currently AMFD empties job queue when AMFD becomes quiesced in
>>> avd_mds_qsd_role_evh(). Most probably before emptying, AMFD will try
>>> to stop CLM tracking and now if the API again fails then this failure
>>> cannot be handled as AMFD cannot be blocked in avd_mds_qsd_role_evh()
>>> for long time. And then AMFD will empty the job queue."
>>>
>>> Can we think of some way to avoid above problem? One solution, without
>>> using job queue, could be: When CLM track fails, run a timer of 1
>>> seconds. When this timer expires, check role of AMFD. If the role is
>>> still standby and Track stop is pending then now AMFD will give a try.
>>> If it again fails then rerun the timer based on the value of return
>>> code. We can decide how many times this can be tried.
>>>
>>> One more thing. on standby controller if clm track has failed then it
>>> should respond for any track callback that it receives without taking
>>> any action. This needs to be done irrespective of the approach of the
>>> solution.
>>>
>>> Thanks,
>>> Praveen
>>>
>>> On 21-Nov-16 4:38 PM, minh chau wrote:
 Hi Praveen,

 I am thinking if creating a CLM job (likely IMM update, NTF send job) in
 order to stop tracking at non-active AMFD.
 Should it be a better version than this patch that AMFD will still have
 chance to stop tracking until being promoted back to active role?

 Thanks,
 Minh

 On 18/11/16 11:38, minh chau wrote:
> Hi Praveen,
>
> It has been seen once in swapping 2N Opensaf SI. It was just an error
> indicating that AMFD failed to stop track callback, after that the
> si-swap also succeeded. CLM service could be busy during switch-over
> but AMF (as well as other services) is supposed to handle that return
> code properly as AMFD has done in track start. Further testing, it
> exposed a problem due to clm track callback received on standby AMFD.
>
> AMFD can not be too busy just try to stop clm track. First option,
> AMFD can start another thread which just tries to stop track callback,
> but that's likely to generate more problems where this standby AMFD is
> promoted back to active in the meantime that thread of stopping track
> is on going. At that time, main thread will be starting track and
> another thread is trying to stop track.
>
> Second option as in the patch, AMFD retries to stop track where
> problem happens in callback. At the time standby AMFD switching back
> to active, we stop and start track again, but it seems no point of
> doing this. I hope it's less impact on existing behavior, do you see
> any problems?
>
> Regarding standby controller with locked/disabled CLM, I think it is
> mentioned (in #1828 or some where else?) that this matter requires
> more thoughts as whether we want Ha role assigned to AMFD in non
> member node. It seems to beyond this ticket.
>
> Thanks,
> Minh
>
> On 17/11/16 22:03, 

Re: [devel] [PATCH 0 of 3] Review Request for leap : now leap library ensure shm availability before writing [#2202]

2016-11-30 Thread Vo Minh Hoang
Dear Mahesh,

ACK all three patches, tested, found no problem.

Sincerely,
Hoang

-Original Message-
From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, November 29, 2016 5:37 PM
To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 0 of 3] Review Request for leap : now leap library ensure
shm availability before writing [#2202]

Summary:leap : now leap library ensure shm availability before writing
[#2202] Review request for Trac Ticket(s): #2202 Peer Reviewer(s): Hoang /
Ramesh Pull request to: <> Affected
branch(es): <> Development branch: <>


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesy
 Core libraries  y
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

changeset 7b53e1b3754622fe90c22c801adeb7df6d808c30
Author: A V Mahesh 
Date:   Tue, 29 Nov 2016 15:59:21 +0530

leap : now leap library ensure shm availability before writing
[#2202]
 Issue  :

If OSAF_CKPT_SHM_ALLOC_GUARANTEE is NOT set and SHM is 100% used in
system
, pnd Segmentation fault (core dumped) at LEAP memcpy().

Fix :

Now LEAP library ensures shm free space before writing This may
degrade
some performance of cpsv , if OSAF_CKPT_SHM_ALLOC_GUARANTEE is set,
cpsv
give natural performance.

changeset 083114e13c00c9c4267ffe65a86c1a97a951b876
Author: A V Mahesh 
Date:   Tue, 29 Nov 2016 16:02:06 +0530

cpsv : update cpsv error handing based on leap changes [#2202]

changeset fb509abb1d1583315f585663fd75bf73e35211a6
Author: A V Mahesh 
Date:   Tue, 29 Nov 2016 16:02:58 +0530

mqsv : update mqsv error handing based on leap changes [#2202]


Complete diffstat:
--
 osaf/libs/common/cpsv/include/cpnd_cb.h   |   4 ++--
 osaf/libs/common/cpsv/include/cpnd_init.h |   8 
 osaf/libs/common/cpsv/include/cpnd_sec.h  |   2 +-
 osaf/libs/core/include/ncs_osprm.h|   2 +-
 osaf/libs/core/leap/os_defs.c |  20 ++--
 osaf/services/saf/cpsv/cpnd/cpnd_db.c |  12 ++--
 osaf/services/saf/cpsv/cpnd/cpnd_evt.c|  82
+---
--
 osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |  31
++-
 osaf/services/saf/cpsv/cpnd/cpnd_res.c|  24 
 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  12 ++--
 osaf/services/saf/glsv/glnd/glnd_shm.c|   2 +-
 osaf/services/saf/mqsv/mqnd/mqnd_shm.c|   2 +-
 12 files changed, 123 insertions(+), 78 deletions(-)


Testing Commands:
-
Create situation that node SHM  reaches 100% usage and then perform any CPSV
operation which writes to SHM

Testing, Expected Results:
--
 <>


Conditions of Submission:
-
 <>


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which 

Re: [devel] [PATCH 2 of 3] cpsv : update cpsv error handing based on leap changes [#2202]

2016-11-30 Thread A V Mahesh
Hi Hoang,

On 12/1/2016 9:15 AM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> I have one small concern about:
> Please consider get CB data by ncshm_take_hdl() or pass in ensures_space as
> parameter.
> When CB is global access, adding it to parameters of many API is a little
> bit inconvenient.
The `cb->shm_alloc_guaranteed = atoi(ptr);` only
update at the tim of cpnd_lib_init() by reading 
OSAF_CKPT_SHM_ALLOC_GUARANTEE env
and no further run-time update, so ideally you dont required lock.

Even the for CPSV API`s cb lock was taken and so the call shm will 
automatically under lock
do see any specific missing place in API code ?

-AVM
> And passing CB for just one ensures_space param might confuse the usage of
> function, maybe in future use.
>
> This is just my opinion, please consider about it.
>
> Thank you and best regards,
> Hoang
>
>
> -Original Message-
> From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com]
> Sent: Tuesday, November 29, 2016 5:37 PM
> To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 2 of 3] cpsv : update cpsv error handing based on leap
> changes [#2202]
>
>   osaf/libs/common/cpsv/include/cpnd_cb.h   |   4 +-
>   osaf/libs/common/cpsv/include/cpnd_init.h |   8 +-
>   osaf/libs/common/cpsv/include/cpnd_sec.h  |   2 +-
>   osaf/libs/core/include/ncs_osprm.h|   2 +-
>   osaf/services/saf/cpsv/cpnd/cpnd_db.c |  12 ++--
>   osaf/services/saf/cpsv/cpnd/cpnd_evt.c|  82
> +-
>   osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |  31 ++
>   osaf/services/saf/cpsv/cpnd/cpnd_res.c|  24 +++--
>   osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  12 ++--
>   9 files changed, 103 insertions(+), 74 deletions(-)
>
>
> diff --git a/osaf/libs/common/cpsv/include/cpnd_cb.h
> b/osaf/libs/common/cpsv/include/cpnd_cb.h
> --- a/osaf/libs/common/cpsv/include/cpnd_cb.h
> +++ b/osaf/libs/common/cpsv/include/cpnd_cb.h
> @@ -341,8 +341,8 @@ uint32_t cpnd_amf_register(CPND_CB *cpnd  uint32_t
> cpnd_amf_deregister(CPND_CB *cpnd_cb);  uint32_t
> cpnd_client_extract_bits(uint32_t bitmap_value, uint32_t *bit_position);
> uint32_t cpnd_res_ckpt_sec_del(CPND_CKPT_NODE *cp_node); -uint32_t
> cpnd_ckpt_replica_create_res(NCS_OS_POSIX_SHM_REQ_INFO *open_req, char *buf,
> CPND_CKPT_NODE **cp_node,
> - uint32_t ref_cnt, CKPT_INFO
> *cp_info, bool shm_alloc_guaranteed);
> +uint32_t cpnd_ckpt_replica_create_res(CPND_CB *cb,
> NCS_OS_POSIX_SHM_REQ_INFO *open_req, char *buf, CPND_CKPT_NODE **cp_node,
> + uint32_t ref_cnt, CKPT_INFO
> *cp_info);
>   int32_t cpnd_find_free_loc(CPND_CB *cb, CPND_TYPE_INFO type);  uint32_t
> cpnd_ckpt_write_header(CPND_CB *cb, uint32_t nckpts);  uint32_t
> cpnd_cli_info_write_header(CPND_CB *cb, int32_t n_clients); diff --git
> a/osaf/libs/common/cpsv/include/cpnd_init.h
> b/osaf/libs/common/cpsv/include/cpnd_init.h
> --- a/osaf/libs/common/cpsv/include/cpnd_init.h
> +++ b/osaf/libs/common/cpsv/include/cpnd_init.h
> @@ -90,7 +90,7 @@ uint32_t cpnd_ckpt_replica_create(CPND_C  uint32_t
> cpnd_ckpt_remote_cpnd_add(CPND_CKPT_NODE *cp_node, MDS_DEST mds_info);
> uint32_t cpnd_ckpt_remote_cpnd_del(CPND_CKPT_NODE *cp_node, MDS_DEST
> mds_info);  int32_t cpnd_ckpt_get_lck_sec_id(CPND_CKPT_NODE *cp_node);
> -uint32_t cpnd_ckpt_sec_write(CPND_CKPT_NODE *cp_node,
> CPND_CKPT_SECTION_INFO
> +uint32_t cpnd_ckpt_sec_write(CPND_CB *cb, CPND_CKPT_NODE *cp_node,
> +CPND_CKPT_SECTION_INFO
> *sec_info, const void *data, uint64_t size,
> uint64_t offset, uint32_t type);  uint32_t cpnd_ckpt_sec_read(CPND_CKPT_NODE
> *cp_node, CPND_CKPT_SECTION_INFO
>*sec_info, void *data, uint64_t size, uint64_t
> offset); @@ -164,7 +164,7 @@ void cpnd_evt_node_getnext(CPND_CB *cb,
> uint32_t cpnd_evt_node_add(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE
> *evt_node);  uint32_t cpnd_evt_node_del(CPND_CB *cb,
> CPSV_CPND_ALL_REPL_EVT_NODE *evt_node);  CPND_CKPT_NODE
> *cpnd_ckpt_node_find_by_name(CPND_CB *cpnd_cb, SaConstStringT ckpt_name);
> -CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CKPT_NODE *cp_node,
> SaCkptSectionIdT *id, SaTimeT exp_time,
> +CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CB *cb, CPND_CKPT_NODE
> +*cp_node, SaCkptSectionIdT *id, SaTimeT exp_time,
> uint32_t gen_flag);
>   void cpnd_evt_backup_queue_add(CPND_CKPT_NODE *cp_node, CPND_EVT *evt);
> uint32_t cpnd_ckpt_node_tree_init(CPND_CB *cb); @@ -176,8 +176,8 @@ void
> cpnd_client_node_tree_cleanup(CPND_
>   void cpnd_client_node_tree_destroy(CPND_CB *cb);  void
> cpnd_allrepl_write_evt_node_tree_cleanup(CPND_CB *cb);  void
> cpnd_allrepl_write_evt_node_tree_destroy(CPND_CB *cb); -uint32_t
> cpnd_sec_hdr_update(CPND_CKPT_SECTION_INFO *pSecPtr, CPND_CKPT_NODE
> *cp_node); -uint32_t cpnd_ckpt_hdr_update(CPND_CKPT_NODE *cp_node);
> +uint32_t cpnd_sec_hdr_update(CPND_CB 

Re: [devel] [PATCH 2 of 3] cpsv : update cpsv error handing based on leap changes [#2202]

2016-11-30 Thread Vo Minh Hoang
Dear Mahesh,

I have one small concern about:
Please consider get CB data by ncshm_take_hdl() or pass in ensures_space as
parameter.
When CB is global access, adding it to parameters of many API is a little
bit inconvenient.
And passing CB for just one ensures_space param might confuse the usage of
function, maybe in future use.

This is just my opinion, please consider about it.

Thank you and best regards,
Hoang


-Original Message-
From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] 
Sent: Tuesday, November 29, 2016 5:37 PM
To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 2 of 3] cpsv : update cpsv error handing based on leap
changes [#2202]

 osaf/libs/common/cpsv/include/cpnd_cb.h   |   4 +-
 osaf/libs/common/cpsv/include/cpnd_init.h |   8 +-
 osaf/libs/common/cpsv/include/cpnd_sec.h  |   2 +-
 osaf/libs/core/include/ncs_osprm.h|   2 +-
 osaf/services/saf/cpsv/cpnd/cpnd_db.c |  12 ++--
 osaf/services/saf/cpsv/cpnd/cpnd_evt.c|  82
+-
 osaf/services/saf/cpsv/cpnd/cpnd_proc.c   |  31 ++
 osaf/services/saf/cpsv/cpnd/cpnd_res.c|  24 +++--
 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc   |  12 ++--
 9 files changed, 103 insertions(+), 74 deletions(-)


diff --git a/osaf/libs/common/cpsv/include/cpnd_cb.h
b/osaf/libs/common/cpsv/include/cpnd_cb.h
--- a/osaf/libs/common/cpsv/include/cpnd_cb.h
+++ b/osaf/libs/common/cpsv/include/cpnd_cb.h
@@ -341,8 +341,8 @@ uint32_t cpnd_amf_register(CPND_CB *cpnd  uint32_t
cpnd_amf_deregister(CPND_CB *cpnd_cb);  uint32_t
cpnd_client_extract_bits(uint32_t bitmap_value, uint32_t *bit_position);
uint32_t cpnd_res_ckpt_sec_del(CPND_CKPT_NODE *cp_node); -uint32_t
cpnd_ckpt_replica_create_res(NCS_OS_POSIX_SHM_REQ_INFO *open_req, char *buf,
CPND_CKPT_NODE **cp_node,
-   uint32_t ref_cnt, CKPT_INFO
*cp_info, bool shm_alloc_guaranteed);
+uint32_t cpnd_ckpt_replica_create_res(CPND_CB *cb,
NCS_OS_POSIX_SHM_REQ_INFO *open_req, char *buf, CPND_CKPT_NODE **cp_node,
+   uint32_t ref_cnt, CKPT_INFO
*cp_info);
 int32_t cpnd_find_free_loc(CPND_CB *cb, CPND_TYPE_INFO type);  uint32_t
cpnd_ckpt_write_header(CPND_CB *cb, uint32_t nckpts);  uint32_t
cpnd_cli_info_write_header(CPND_CB *cb, int32_t n_clients); diff --git
a/osaf/libs/common/cpsv/include/cpnd_init.h
b/osaf/libs/common/cpsv/include/cpnd_init.h
--- a/osaf/libs/common/cpsv/include/cpnd_init.h
+++ b/osaf/libs/common/cpsv/include/cpnd_init.h
@@ -90,7 +90,7 @@ uint32_t cpnd_ckpt_replica_create(CPND_C  uint32_t
cpnd_ckpt_remote_cpnd_add(CPND_CKPT_NODE *cp_node, MDS_DEST mds_info);
uint32_t cpnd_ckpt_remote_cpnd_del(CPND_CKPT_NODE *cp_node, MDS_DEST
mds_info);  int32_t cpnd_ckpt_get_lck_sec_id(CPND_CKPT_NODE *cp_node);
-uint32_t cpnd_ckpt_sec_write(CPND_CKPT_NODE *cp_node,
CPND_CKPT_SECTION_INFO
+uint32_t cpnd_ckpt_sec_write(CPND_CB *cb, CPND_CKPT_NODE *cp_node, 
+CPND_CKPT_SECTION_INFO
  *sec_info, const void *data, uint64_t size,
uint64_t offset, uint32_t type);  uint32_t cpnd_ckpt_sec_read(CPND_CKPT_NODE
*cp_node, CPND_CKPT_SECTION_INFO
 *sec_info, void *data, uint64_t size, uint64_t
offset); @@ -164,7 +164,7 @@ void cpnd_evt_node_getnext(CPND_CB *cb,
uint32_t cpnd_evt_node_add(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE
*evt_node);  uint32_t cpnd_evt_node_del(CPND_CB *cb,
CPSV_CPND_ALL_REPL_EVT_NODE *evt_node);  CPND_CKPT_NODE
*cpnd_ckpt_node_find_by_name(CPND_CB *cpnd_cb, SaConstStringT ckpt_name);
-CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CKPT_NODE *cp_node,
SaCkptSectionIdT *id, SaTimeT exp_time,
+CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CB *cb, CPND_CKPT_NODE 
+*cp_node, SaCkptSectionIdT *id, SaTimeT exp_time,
  uint32_t gen_flag);
 void cpnd_evt_backup_queue_add(CPND_CKPT_NODE *cp_node, CPND_EVT *evt);
uint32_t cpnd_ckpt_node_tree_init(CPND_CB *cb); @@ -176,8 +176,8 @@ void
cpnd_client_node_tree_cleanup(CPND_
 void cpnd_client_node_tree_destroy(CPND_CB *cb);  void
cpnd_allrepl_write_evt_node_tree_cleanup(CPND_CB *cb);  void
cpnd_allrepl_write_evt_node_tree_destroy(CPND_CB *cb); -uint32_t
cpnd_sec_hdr_update(CPND_CKPT_SECTION_INFO *pSecPtr, CPND_CKPT_NODE
*cp_node); -uint32_t cpnd_ckpt_hdr_update(CPND_CKPT_NODE *cp_node);
+uint32_t cpnd_sec_hdr_update(CPND_CB *cb, CPND_CKPT_SECTION_INFO 
+*pSecPtr, CPND_CKPT_NODE *cp_node); uint32_t 
+cpnd_ckpt_hdr_update(CPND_CB *cb, CPND_CKPT_NODE *cp_node);
 void cpnd_ckpt_node_destroy(CPND_CB *cb, CPND_CKPT_NODE *cp_node);
uint32_t cpnd_get_slot_sub_slot_id_from_mds_dest(MDS_DEST dest);  uint32_t
cpnd_get_slot_sub_slot_id_from_node_id(NCS_NODE_ID i_node_id); diff --git
a/osaf/libs/common/cpsv/include/cpnd_sec.h
b/osaf/libs/common/cpsv/include/cpnd_sec.h
--- a/osaf/libs/common/cpsv/include/cpnd_sec.h
+++ b/osaf/libs/common/cpsv/include/cpnd_sec.h
@@ -39,7 +39,7 @@ CPND_CKPT_SECTION_INFO *  

[devel] [PATCH 0 of 1] Review Request for AMF: support restrictions to auto-repair V2 [#2144]

2016-11-30 Thread Alex Jones
Summary: amf: support restrictions to auto-repair
Review request for Trac Ticket(s): 2144
Peer Reviewer(s): praveen, nagu
Pull request to: 
Affected branch(es): default
Development branch: 


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
This patch should be applied on top of the previous one.

changeset e49b3bcc81901987abe8a221c3a561f883b91a57
Author: Alex Jones 
Date:   Wed, 30 Nov 2016 15:32:19 -0500

amf: support restrictions to auto-repair V2 [#2144]

This patch implements section 3.11.1.4.2 of AMF spec (Restrictions to 
Auto-
Repair).


Complete diffstat:
--
 osaf/libs/common/amf/d2nedu.c  |  15 -
 osaf/libs/common/amf/d2nmsg.c  |  12 
 osaf/libs/common/amf/include/amf_d2nmsg.h  |  13 
 osaf/services/saf/amf/amfd/comp.cc |   2 +-
 osaf/services/saf/amf/amfd/include/su.h|   2 +-
 osaf/services/saf/amf/amfd/include/util.h  |   1 +
 osaf/services/saf/amf/amfd/mds.cc  |   2 +-
 osaf/services/saf/amf/amfd/sgproc.cc   |   2 +-
 osaf/services/saf/amf/amfd/su.cc   |  35 +++-
 osaf/services/saf/amf/amfd/util.cc |  74 ++
 osaf/services/saf/amf/amfnd/clc.cc |  28 ++---
 osaf/services/saf/amf/amfnd/err.cc |  20 +-
 osaf/services/saf/amf/amfnd/evt.cc |   2 +
 osaf/services/saf/amf/amfnd/include/avnd_evt.h |   1 +
 osaf/services/saf/amf/amfnd/include/avnd_mds.h |   4 +-
 osaf/services/saf/amf/amfnd/include/avnd_su.h  |   3 +
 osaf/services/saf/amf/amfnd/main.cc|   1 +
 osaf/services/saf/amf/amfnd/mds.cc |   6 +-
 osaf/services/saf/amf/amfnd/su.cc  |  46 
 osaf/services/saf/amf/amfnd/susm.cc|  10 ++-
 20 files changed, 252 insertions(+), 27 deletions(-)


Testing Commands:
-
1) set saAmfSUMaintenanceCampaign for an SU
2) try different failure scenarios on components in the SU (kill a component,
   etc)


Testing, Expected Results:
--
1) auto-repair should never be engaged while the saAmfSUMaintenanceCampaign is
   set


Conditions of Submission:
-



Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer have a badly 

[devel] [PATCH 1 of 1] amf: support restrictions to auto-repair V2 [#2144]

2016-11-30 Thread Alex Jones
 osaf/libs/common/amf/d2nedu.c  |  15 -
 osaf/libs/common/amf/d2nmsg.c  |  12 
 osaf/libs/common/amf/include/amf_d2nmsg.h  |  13 
 osaf/services/saf/amf/amfd/comp.cc |   2 +-
 osaf/services/saf/amf/amfd/include/su.h|   2 +-
 osaf/services/saf/amf/amfd/include/util.h  |   1 +
 osaf/services/saf/amf/amfd/mds.cc  |   2 +-
 osaf/services/saf/amf/amfd/sgproc.cc   |   2 +-
 osaf/services/saf/amf/amfd/su.cc   |  35 +++-
 osaf/services/saf/amf/amfd/util.cc |  74 ++
 osaf/services/saf/amf/amfnd/clc.cc |  28 ++---
 osaf/services/saf/amf/amfnd/err.cc |  20 +-
 osaf/services/saf/amf/amfnd/evt.cc |   2 +
 osaf/services/saf/amf/amfnd/include/avnd_evt.h |   1 +
 osaf/services/saf/amf/amfnd/include/avnd_mds.h |   4 +-
 osaf/services/saf/amf/amfnd/include/avnd_su.h  |   3 +
 osaf/services/saf/amf/amfnd/main.cc|   1 +
 osaf/services/saf/amf/amfnd/mds.cc |   6 +-
 osaf/services/saf/amf/amfnd/su.cc  |  46 
 osaf/services/saf/amf/amfnd/susm.cc|  10 ++-
 20 files changed, 252 insertions(+), 27 deletions(-)


This patch implements section 3.11.1.4.2 of AMF spec (Restrictions to
Auto-Repair).

diff --git a/osaf/libs/common/amf/d2nedu.c b/osaf/libs/common/amf/d2nedu.c
--- a/osaf/libs/common/amf/d2nedu.c
+++ b/osaf/libs/common/amf/d2nedu.c
@@ -414,6 +414,16 @@ uint32_t avsv_edp_dnd_msg(EDU_HDL *hdl, 
{EDU_EXEC, avsv_edp_csi_attr_info, 0, 0, 0,
 
(long)&((AVSV_DND_MSG*)0)->msg_info.d2n_compcsi_assign_msg_info.info.attrs, 0, 
NULL},
 
+   /*AVSV_D2N_SU_MAINTENANCE_MSG_INFO*/
+   {EDU_EXEC, ncs_edp_uns32, 0, 0, 0,
+(long)&((AVSV_DND_MSG 
*)0)->msg_info.d2n_su_maintenance_msg_info.msg_id, 0, NULL},
+{EDU_EXEC, m_NCS_EDP_SACLMNODEIDT, 0, 0, 0,
+(long)&((AVSV_DND_MSG 
*)0)->msg_info.d2n_su_maintenance_msg_info.node_id, 0, NULL},
+   {EDU_EXEC, ncs_edp_sanamet, 0, 0, 0, 
+   
(long)&((AVSV_DND_MSG*)0)->msg_info.d2n_su_maintenance_msg_info.su, 0, NULL},
+   {EDU_EXEC, ncs_edp_sanamet, 0, 0, EDU_EXIT, 
+   
(long)&((AVSV_DND_MSG*)0)->msg_info.d2n_su_maintenance_msg_info.suMaintenanceCampaign,
 0, NULL},
+
{EDU_END, 0, 0, 0, 0, 0, 0, NULL},
};
 
@@ -482,7 +492,8 @@ int avsv_dnd_msg_test_type_fnc(NCSCONTEX
LCL_JMP_OFFSET_AVSV_D2N_REBOOT_MSG = 123,
LCL_JMP_OFFSET_AVSV_N2D_ND_SISU_STATE_INFO_MSG = 125,
LCL_JMP_OFFSET_AVSV_N2D_ND_CSICOMP_STATE_INFO_MSG = 131,
-   LCL_JMP_OFFSET_AVSV_D2N_COMPCSI_ASSIGN_MSG = 137
+   LCL_JMP_OFFSET_AVSV_D2N_COMPCSI_ASSIGN_MSG = 137,
+   LCL_JMP_OFFSET_AVSV_D2N_SU_MAINTENANCE_MSG = 143
};
AVSV_DND_MSG_TYPE type;
 
@@ -554,6 +565,8 @@ int avsv_dnd_msg_test_type_fnc(NCSCONTEX
return LCL_JMP_OFFSET_AVSV_N2D_ND_CSICOMP_STATE_INFO_MSG ;
case AVSV_D2N_COMPCSI_ASSIGN_MSG:
return LCL_JMP_OFFSET_AVSV_D2N_COMPCSI_ASSIGN_MSG;
+   case AVSV_D2N_SU_MAINTENANCE_MSG:
+   return LCL_JMP_OFFSET_AVSV_D2N_SU_MAINTENANCE_MSG;
 
default:
break;
diff --git a/osaf/libs/common/amf/d2nmsg.c b/osaf/libs/common/amf/d2nmsg.c
--- a/osaf/libs/common/amf/d2nmsg.c
+++ b/osaf/libs/common/amf/d2nmsg.c
@@ -513,6 +513,15 @@ static void free_d2n_compcsi_info(AVSV_D
 
 }
 
+static void free_d2n_su_maintenance_info(AVSV_DND_MSG *su_maintenance_msg)
+{
+  AVSV_D2N_SU_MAINTENANCE_MSG_INFO *su_maintenance =
+_maintenance_msg->msg_info.d2n_su_maintenance_msg_info;
+
+   osaf_extended_name_free(_maintenance->su);
+   osaf_extended_name_free(_maintenance->suMaintenanceCampaign);
+}
+
 /
   Name  : avsv_dnd_msg_free
  
@@ -609,6 +618,9 @@ void avsv_dnd_msg_free(AVSV_DND_MSG *msg
case AVSV_D2N_COMPCSI_ASSIGN_MSG:
free_d2n_compcsi_info(msg);
break;
+   case AVSV_D2N_SU_MAINTENANCE_MSG:
+   free_d2n_su_maintenance_info(msg);
+   break;
default:
break;
}
diff --git a/osaf/libs/common/amf/include/amf_d2nmsg.h 
b/osaf/libs/common/amf/include/amf_d2nmsg.h
--- a/osaf/libs/common/amf/include/amf_d2nmsg.h
+++ b/osaf/libs/common/amf/include/amf_d2nmsg.h
@@ -51,6 +51,7 @@ extern "C" {
 #define AVSV_AVD_AVND_MSG_FMT_VER_55
 #define AVSV_AVD_AVND_MSG_FMT_VER_66
 #define AVSV_AVD_AVND_MSG_FMT_VER_77
+#define AVSV_AVD_AVND_MSG_FMT_VER_88
 
 /* Internode/External Components Validation result */
 typedef enum {
@@ -108,6 +109,7 @@ typedef enum {
AVSV_N2D_ND_SISU_STATE_INFO_MSG,
AVSV_N2D_ND_CSICOMP_STATE_INFO_MSG,
AVSV_D2N_COMPCSI_ASSIGN_MSG,
+   

Re: [devel] [PATCH 1 of 1] imm:allow augumentCcbInit with ROF as false in completed callback[#1956] V2

2016-11-30 Thread Zoran Milinkovic
Hi Neelakanta,

The new objectName variable is not used in the code.
I would rather like to see adminOwnerName as some sort of string type than 
using SaNameT. This is not an issue in the patch, only the suggestion.
The reason for this is to try to avoid using SaNameT type as much as possible, 
and replace it with SaImmAdminOwnerNameT (or other sting type).

Find other comments inline

-Original Message-
From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com] 
Sent: den 30 november 2016 06:39
To: Zoran Milinkovic ; Hung Duc Nguyen 

Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] imm:allow augumentCcbInit with ROF as false in 
completed callback[#1956] V2

 osaf/libs/agents/saf/imma/imma_cb.h |   5 +
 osaf/libs/agents/saf/imma/imma_db.c |  99 +
 osaf/libs/agents/saf/imma/imma_oi_api.c |  60 +++
 osaf/libs/agents/saf/imma/imma_proc.c   |  64 ++---
 4 files changed, 133 insertions(+), 95 deletions(-)


Two new variables objectName and adminOwnerName added to imma_oi_ccb_record. 
For CCB create operation adminOwnerName will be added.
For CCB delete/modify operation objectName will be added

diff --git a/osaf/libs/agents/saf/imma/imma_cb.h 
b/osaf/libs/agents/saf/imma/imma_cb.h
--- a/osaf/libs/agents/saf/imma/imma_cb.h
+++ b/osaf/libs/agents/saf/imma/imma_cb.h
@@ -36,6 +36,10 @@ struct imma_oi_ccb_record {
struct imma_callback_info* ccbCallback;
SaImmHandleT privateAugOmHandle; 
SaImmAdminOwnerHandleT privateAoHandle;
+   SaNameT objectName;/* The  Object name from the modif/delete ccb 
operationi. The objectName is used to obtain
+   adminOnerName when saImmOiAugmentCcbInitialize is 
called in completed-callback*/

[Zoran] objectName is not used in the code. It can be removed from the patch

+   SaNameT adminOwnerName; /* adminowner name of the ccb id, assigned at 
ccb-create-callback.The adminOwnerName is used 
+saImmOiAugmentCcbInitialize is called in 
completed-callback*/
 };

 typedef struct imma_client_node {
@@ -197,6 +201,7 @@ int imma_oi_ccb_record_set_critical(IMMA
 int imma_oi_ccb_record_terminate(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT 
ccbId);
 int imma_oi_ccb_record_abort(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT ccbId);
 int imma_oi_ccb_record_exists(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT ccbId);
+struct imma_oi_ccb_record * imma_oi_ccb_record_find(IMMA_CLIENT_NODE *cl_node, 
SaImmOiCcbIdT ccbId);
 int imma_oi_ccb_record_set_error(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT 
ccbId, const SaStringT errorString);
 SaStringT imma_oi_ccb_record_get_error(IMMA_CLIENT_NODE *cl_node, 
SaImmOiCcbIdT ccbId);
 void imma_oi_ccb_allow_error_string(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT 
ccbId);
diff --git a/osaf/libs/agents/saf/imma/imma_db.c 
b/osaf/libs/agents/saf/imma/imma_db.c
--- a/osaf/libs/agents/saf/imma/imma_db.c
+++ b/osaf/libs/agents/saf/imma/imma_db.c
@@ -24,6 +24,8 @@
 */
 
 #include "imma.h"
+#include "osaf_extended_name.h"
+
 /
   Name  : imma_client_tree_init
   Description   : This routine is used to initialize the client tree
@@ -273,6 +275,9 @@ int imma_oi_ccb_record_delete(IMMA_CLIEN
to_delete->mCcbErrorString = NULL;
}
 
+   osaf_extended_name_free(&(to_delete->objectName));
+   osaf_extended_name_free(&(to_delete->adminOwnerName));
+
free(to_delete);
TRACE_LEAVE();
return 1;
@@ -541,6 +546,11 @@ int imma_oi_ccb_record_note_callback(IMM
 */
 
int rs = 0;
+   const SaImmAttrNameT admoNameAttr = SA_IMM_ATTR_ADMIN_OWNER_NAME;
+   SaImmAttrValuesT_2 **attr, **attributes = NULL;
+   SaImmAttrValuesT_2 *attrVal = NULL;
+   size_t attrDataSize = 0;
+
struct imma_oi_ccb_record *tmp = imma_oi_ccb_record_find(cl_node, 
ccbId);
 
if(tmp && !(tmp->isAborted)) {
@@ -553,7 +563,96 @@ int imma_oi_ccb_record_note_callback(IMM
rs = 1;
}
}
+   if(callback){
 
+

[Zoran] Remove extra lines

+   int noOfAttributes = 0;
+
+   /* NOTE: The code below is practically a copy of the code
+  in immOm searchNext, for serving the attrValues structure.
+  This code should be factored out into some common function.
+*/
+   IMMSV_ATTR_VALUES_LIST *p = callback->attrValues;
+   int i = 0;
+   while (p) {
+   ++noOfAttributes;
+   p = p->next;
+   }
+
+   p = callback->attrValues;
+
+
+   if(cl_node->isImmA2fCbk) {
+  

Re: [devel] [PATCH 1 of 1] amfnd: honour max num of retries for comp instantiation during repair[#2059]

2016-11-30 Thread Hans Nordeback
ack, code review only.

One comment though,

it would be good in to reduce the cyclomatic complexity in amf as mentioned

in the refactoring tickets. This is particularly true for e.g clc.cc. 
Last time I run pmccabe the numbers for clc.cc were

around 115 which is extraordinary high, now the number is 181, (not 
introduced by this patch, but the patch adds a small

amount to the numbers). It would be good to re-factor the complex if 
statements to reusable functions instead.

Numbers around 30-40 means the function wil be harder to understand and 
more difficult to get acceptable code coverage.

pmccabe clc.cc | sort -nr | head -10
181 181 255 927 566 clc.cc(927): 
avnd_comp_clc_st_chng_prc
37  39  68  758 154 clc.cc(758): avnd_comp_clc_fsm_run
23  23  34  202485  clc.cc(2024): 
avnd_comp_clc_inst_clean_hdler

/Regards HansN


On 09/29/2016 12:08 PM, praveen.malv...@oracle.com wrote:
>   osaf/services/saf/amf/amfnd/clc.cc |  8 
>   1 files changed, 8 insertions(+), 0 deletions(-)
>
>
> SU stuck in TERMINATING state when comp fails to instantiate as part of 
> repair.
> Repair was performed after comp-failover recovery.
>
> Comp faults with comp-failover recovery. After removal of assignments, AMFNFD
> tries to repair component by trying to instantiate it. Since instantiate 
> scripit
> is not present, comp instantiation fails and AMFND cleans up it succesfully. 
> After
> successful cleanup AMFND niether tries to instantiate comp again not it marks 
> comp
> and SU to INST_FAILED state. AMFND should mark the comp and SU to INST_FAILED 
> state
> after finishing instantiation of comp MAX_TRY times. If the value is not 
> configured
> for the comp then AMFND should honour the default value of 
> saAmfNumMaxInstantiateWithoutDelay=2.
>
> Patch honours saAmfNumMaxInstantiateWithoutDelay and if comp instantiation 
> fails after
> these many retries then both comp and SU moves to INST_FAILED state.
>
> diff --git a/osaf/services/saf/amf/amfnd/clc.cc 
> b/osaf/services/saf/amf/amfnd/clc.cc
> --- a/osaf/services/saf/amf/amfnd/clc.cc
> +++ b/osaf/services/saf/amf/amfnd/clc.cc
> @@ -1785,6 +1785,14 @@ static bool is_failed_comp_eligible_for_
>/*A failed component is eligible for instantiation
>  in the context of comp-restart recovery.*/
>   return true;
> +  } else {//SU is failed.
> +  /*As a part of repair comp instantiation failed
> +and it was cleaned up again.*/
> +  if (m_AVND_COMP_IS_FAILED(comp) && !comp->csi_list.n_nodes &&
> +  (!m_AVND_SU_IS_ADMN_TERM(comp->su)) &&
> +  (avnd_cb->oper_state == SA_AMF_OPERATIONAL_ENABLED) &&
> +  (!m_AVND_IS_SHUTTING_DOWN(avnd_cb)) && 
> !sufailover_in_progress(comp->su))
> + return true;
>}
> }
>   


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] smf: Avoid uncditional sleep when calling adminoperation[#2211]

2016-11-30 Thread Neelakanta Reddy
Hi Lennart,

This defect is related to unconditional delay of 2 seconds.
will open another defect/enhancement  to fix Smfutil unpredictable 
timeout and use smfutil in nodegroups as well .

Thanks,
Neel.

On 2016/11/30 02:54 PM, Lennart Lund wrote:
> Hi Neel
>
> In handling of admin state of node groups there is a similar loop that from 
> the beginning was a copy of the loop in SmfUtils. I have changed this loop to 
> remove the incorrect 2 sec sleep and also fixed the unpredictable timeout 
> time for the loop. I was thinking that the loop in SmfUtils should be fixed 
> in the same way and also make it possible to use the SmfUtil admin operation 
> handling with node groups as well. What do you think about this?
>
> Thanks
> Lennart
>
>> -Original Message-
>> From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com]
>> Sent: den 30 november 2016 08:53
>> To: Lennart Lund ; Rafael Odzakow
>> ; Tai Chi Dinh 
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: [PATCH 1 of 1] smf: Avoid uncditional sleep when calling
>> adminoperation[#2211]
>>
>>   osaf/services/saf/smfsv/smfd/SmfUtils.cc |  26 +++---
>>   1 files changed, 15 insertions(+), 11 deletions(-)
>>
>>
>> diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
>> b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
>> --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
>> +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
>> @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st
>>  }
>>
>>  /* Call the admin operation */
>> -do {
>> -TRACE("call immutil_saImmOmAdminOperationInvoke_2");
>> +
>> +TRACE("call immutil_saImmOmAdminOperationInvoke_2");
>> +rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
>> , 0, i_operationId, i_params,
>> +, i_timeout);
>> +while ((rc == SA_AIS_OK) && (returnValue ==
>> SA_AIS_ERR_TRY_AGAIN)){
>> +sleep(2);
>>  rc =
>> immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
>> , 0, i_operationId, i_params,
>> -   ,
>> i_timeout);
>> -if (retry <= 0) {
>> -LOG_NO("Fail to invoke admin operation, too many
>> SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
>> -   i_dn.c_str(), i_operationId);
>> -rc = SA_AIS_ERR_TRY_AGAIN;
>> -goto done;
>> -}
>> -sleep(2);
>> +, i_timeout);
>>  retry--;
>> -} while ((rc == SA_AIS_OK) && (returnValue ==
>> SA_AIS_ERR_TRY_AGAIN));
>> +}
>> +
>> +if (retry <= 0) {
>> +LOG_NO("Fail to invoke admin operation, too many
>> SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
>> +i_dn.c_str(), i_operationId);
>> +rc = SA_AIS_ERR_TRY_AGAIN;
>> +goto done;
>> +}
>>
>>  if ( rc != SA_AIS_OK) {
>>  LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s],
>> opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId);


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] fm: Add support for differentiating a hung node versus a stopped node V3 [#2160]

2016-11-30 Thread Hans Nordebäck
Hi,

Has anyone had the time to look at this patch?

/Regards HansN

-Original Message-
From: Hans Nordebäck 
Sent: den 23 november 2016 14:45
To: ramesh.bet...@oracle.com; mathi.naic...@oracle.com; Anders Widell 

Cc: opensaf-devel@lists.sourceforge.net
Subject: [devel] [PATCH 1 of 1] fm: Add support for differentiating a hung node 
versus a stopped node V3 [#2160]

 osaf/services/infrastructure/fm/fms/fm_cb.h   |   1 +
 osaf/services/infrastructure/fm/fms/fm_evt.h  |   1 +
 osaf/services/infrastructure/fm/fms/fm_main.c |  54 +++--- 
 osaf/services/infrastructure/fm/fms/fm_mds.c  |  12 +-
 4 files changed, 59 insertions(+), 9 deletions(-)


diff --git a/osaf/services/infrastructure/fm/fms/fm_cb.h 
b/osaf/services/infrastructure/fm/fms/fm_cb.h
--- a/osaf/services/infrastructure/fm/fms/fm_cb.h
+++ b/osaf/services/infrastructure/fm/fms/fm_cb.h
@@ -106,6 +106,7 @@ typedef struct fm_cb {
SaClmHandleT clm_hdl;
bool use_remote_fencing;
SaNameT peer_clm_node_name;
+   bool peer_node_terminated;
 } FM_CB;
 
 extern char *role_string[];
diff --git a/osaf/services/infrastructure/fm/fms/fm_evt.h 
b/osaf/services/infrastructure/fm/fms/fm_evt.h
--- a/osaf/services/infrastructure/fm/fms/fm_evt.h
+++ b/osaf/services/infrastructure/fm/fms/fm_evt.h
@@ -21,6 +21,7 @@
 /* EVT from other GFM over MDS.*/
 typedef enum {
GFM_GFM_EVT_NODE_INFO_EXCHANGE,
+   GFM_GFM_EVT_PEER_IS_TERMINATING,
GFM_GFM_EVT_MAX
 } GFM_GFM_MSG_TYPE;
 
diff --git a/osaf/services/infrastructure/fm/fms/fm_main.c 
b/osaf/services/infrastructure/fm/fms/fm_main.c
--- a/osaf/services/infrastructure/fm/fms/fm_main.c
+++ b/osaf/services/infrastructure/fm/fms/fm_main.c
@@ -59,6 +59,7 @@ char *role_string[] = { "UNDEFINED", "AC  static uint32_t 
fm_agents_startup(void);  static uint32_t fm_get_args(FM_CB *);  static 
uint32_t fms_fms_exchange_node_info(FM_CB *);
+static uint32_t fms_fms_inform_terminating(FM_CB *fm_cb);
 static uint32_t fm_nid_notify(uint32_t);  static uint32_t fm_tmr_start(FM_TMR 
*, SaTimeT);  static SaAisErrorT get_peer_clm_node_name(NODE_ID); @@ -280,6 
+281,7 @@ int main(int argc, char *argv[])
}
 
if (fds[FD_TERM].revents & POLLIN) {
+   fms_fms_inform_terminating(fm_cb);
daemon_exit();
}
 
@@ -622,8 +624,12 @@ static void fm_mbx_msg_handler(FM_CB *fm
 * node_down event has been received.
 */
if (fm_cb->use_remote_fencing) {
-   opensaf_reboot(fm_cb->peer_node_id, 
(char *)fm_cb->peer_clm_node_name.value,
-   "Received Node Down for 
peer controller");
+   if (fm_cb->peer_node_terminated == 
false) {
+   
opensaf_reboot(fm_cb->peer_node_id, (char *)fm_cb->peer_clm_node_name.value,
+   "Received Node 
Down for peer controller");
+   } else {
+   LOG_NO("Peer node %s is 
terminated, fencing will not be performed", fm_cb->peer_clm_node_name.value);
+   }
} else {
opensaf_reboot(fm_cb->peer_node_id, 
(char *)fm_cb->peer_node_name.value,
"Received Node Down for 
peer controller"); @@ -661,11 +667,12 @@ static void fm_mbx_msg_handler(FM_CB 
*fm
 
LOG_NO("Reseting peer controller node id: %x", 
fm_cb->peer_node_id);
if (fm_cb->use_remote_fencing) {
-   LOG_NO("saClmClusterNodeGet succeeded node_id 
0x%X, clm peer node name %s",
-   fm_mbx_evt->node_id, 
fm_cb->peer_clm_node_name.value);
-
-   opensaf_reboot(fm_cb->peer_node_id, (char 
*)fm_cb->peer_clm_node_name.value,
-   "Received Node Down for peer 
controller");
+   if (fm_cb->peer_node_terminated == false) {
+   opensaf_reboot(fm_cb->peer_node_id, 
(char *)fm_cb->peer_clm_node_name.value,
+   "Received Node Down for 
peer controller");
+   } else {
+   LOG_NO("Peer node %s is terminated, 
fencing will not be performed", fm_cb->peer_clm_node_name.value);
+   }
} else {
opensaf_reboot(fm_cb->peer_node_id, (char 
*)fm_cb->peer_node_name.value,

Re: [devel] [PATCH 1 of 1] smf: Avoid uncditional sleep when calling adminoperation[#2211]

2016-11-30 Thread Lennart Lund
Hi Neel

In handling of admin state of node groups there is a similar loop that from the 
beginning was a copy of the loop in SmfUtils. I have changed this loop to 
remove the incorrect 2 sec sleep and also fixed the unpredictable timeout time 
for the loop. I was thinking that the loop in SmfUtils should be fixed in the 
same way and also make it possible to use the SmfUtil admin operation handling 
with node groups as well. What do you think about this?

Thanks
Lennart

> -Original Message-
> From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com]
> Sent: den 30 november 2016 08:53
> To: Lennart Lund ; Rafael Odzakow
> ; Tai Chi Dinh 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 1 of 1] smf: Avoid uncditional sleep when calling
> adminoperation[#2211]
> 
>  osaf/services/saf/smfsv/smfd/SmfUtils.cc |  26 +++---
>  1 files changed, 15 insertions(+), 11 deletions(-)
> 
> 
> diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st
>   }
> 
>   /* Call the admin operation */
> - do {
> - TRACE("call immutil_saImmOmAdminOperationInvoke_2");
> +
> + TRACE("call immutil_saImmOmAdminOperationInvoke_2");
> + rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
> , 0, i_operationId, i_params,
> + , i_timeout);
> + while ((rc == SA_AIS_OK) && (returnValue ==
> SA_AIS_ERR_TRY_AGAIN)){
> + sleep(2);
>   rc =
> immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
> , 0, i_operationId, i_params,
> -,
> i_timeout);
> - if (retry <= 0) {
> - LOG_NO("Fail to invoke admin operation, too many
> SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
> -i_dn.c_str(), i_operationId);
> - rc = SA_AIS_ERR_TRY_AGAIN;
> - goto done;
> - }
> - sleep(2);
> + , i_timeout);
>   retry--;
> - } while ((rc == SA_AIS_OK) && (returnValue ==
> SA_AIS_ERR_TRY_AGAIN));
> + }
> +
> + if (retry <= 0) {
> + LOG_NO("Fail to invoke admin operation, too many
> SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
> + i_dn.c_str(), i_operationId);
> + rc = SA_AIS_ERR_TRY_AGAIN;
> + goto done;
> + }
> 
>   if ( rc != SA_AIS_OK) {
>   LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s],
> opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId);

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] smf: Avoid unconditional sleep when calling adminoperation[#2211] V1

2016-11-30 Thread Tai Dinh
Thank Neel,

ACK from me.

/Tai
-Original Message-
From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com] 
Sent: Wednesday, November 30, 2016 3:47 PM
To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com;
tai.d...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] smf: Avoid unconditional sleep when calling
adminoperation[#2211] V1

 osaf/services/saf/smfsv/smfd/SmfUtils.cc |  26 +++---
 1 files changed, 15 insertions(+), 11 deletions(-)


diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
--- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
+++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
@@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st
}
 
/* Call the admin operation */
-   do {
-   TRACE("call immutil_saImmOmAdminOperationInvoke_2");
+   
+   TRACE("call immutil_saImmOmAdminOperationInvoke_2");
+   rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
, 0, i_operationId, i_params,
+   , i_timeout);
+   while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN) &&
(retry > 0) ){
+   sleep(2);
rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
, 0, i_operationId, i_params,
-  ,
i_timeout);
-   if (retry <= 0) {
-   LOG_NO("Fail to invoke admin operation, too many
SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
-  i_dn.c_str(), i_operationId);
-   rc = SA_AIS_ERR_TRY_AGAIN;
-   goto done;
-   }
-   sleep(2);
+   , i_timeout);
retry--;
-   } while ((rc == SA_AIS_OK) && (returnValue ==
SA_AIS_ERR_TRY_AGAIN));
+   } 
+   
+   if (retry <= 0) {
+   LOG_NO("Fail to invoke admin operation, too many
SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
+   i_dn.c_str(), i_operationId);
+   rc = SA_AIS_ERR_TRY_AGAIN;
+   goto done;
+   }
 
if ( rc != SA_AIS_OK) {
LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s],
opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId);


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1 of 1] smf: Avoid unconditional sleep when calling adminoperation[#2211] V1

2016-11-30 Thread reddy . neelakanta
 osaf/services/saf/smfsv/smfd/SmfUtils.cc |  26 +++---
 1 files changed, 15 insertions(+), 11 deletions(-)


diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc 
b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
--- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
+++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
@@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st
}
 
/* Call the admin operation */
-   do {
-   TRACE("call immutil_saImmOmAdminOperationInvoke_2");
+   
+   TRACE("call immutil_saImmOmAdminOperationInvoke_2");
+   rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, , 
0, i_operationId, i_params,
+   , i_timeout);
+   while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN) && 
(retry > 0) ){
+   sleep(2);
rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, 
, 0, i_operationId, i_params,
-  , 
i_timeout);
-   if (retry <= 0) {
-   LOG_NO("Fail to invoke admin operation, too many 
SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
-  i_dn.c_str(), i_operationId);
-   rc = SA_AIS_ERR_TRY_AGAIN;
-   goto done;
-   }
-   sleep(2);
+   , i_timeout);
retry--;
-   } while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN));
+   } 
+   
+   if (retry <= 0) {
+   LOG_NO("Fail to invoke admin operation, too many 
SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
+   i_dn.c_str(), i_operationId);
+   rc = SA_AIS_ERR_TRY_AGAIN;
+   goto done;
+   }
 
if ( rc != SA_AIS_OK) {
LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s], 
opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId);

--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0 of 1] Review Request for smf: Avoid unconditional sleep when calling adminoperation[#2211] V1

2016-11-30 Thread reddy . neelakanta
Summary:smf: Avoid unconditional sleep when calling adminoperation[#2211] V1 
Review request for Trac Ticket(s): 2211
Peer Reviewer(s): Rafael, Lennart, Tai
Affected branch(es): 5.0.x, 5.1.x, default
Development branch: default


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

changeset e2caf94233ab292190c7efe6623a845d58f798d5
Author: Neelakanta Reddy 
Date:   Wed, 30 Nov 2016 14:13:54 +0530

smf: Avoid unconditional sleep when calling adminoperation[#2211] V1


Complete diffstat:
--
 osaf/services/saf/smfsv/smfd/SmfUtils.cc |  26 +++---
 1 files changed, 15 insertions(+), 11 deletions(-)


Testing Commands:
-
Campaign should involve admin operation

Testing, Expected Results:
--
with this patch unconditional sleep of 2 seconds should not happen

Conditions of Submission:
-
Ack from Reviewers

Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1 of 1] smf: Avoid uncditional sleep when calling adminoperation[#2211]

2016-11-30 Thread Neelakanta Reddy
Hi Tai,

Yes, you are correct.
will add "retry > 0" and re-publish the patch.

Thanks,
Neel.

On 2016/11/30 01:43 PM, Tai Dinh wrote:
> Hi Neel,
>
> NACK with the patch.
> Need to add 'retry > 0'  as part of while loop condition.
>
> /Tai
> -Original Message-
> From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com]
> Sent: Wednesday, November 30, 2016 2:53 PM
> To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com;
> tai.d...@dektech.com.au
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 1 of 1] smf: Avoid uncditional sleep when calling
> adminoperation[#2211]
>
>   osaf/services/saf/smfsv/smfd/SmfUtils.cc |  26 +++---
>   1 files changed, 15 insertions(+), 11 deletions(-)
>
>
> diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
> @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st
>   }
>   
>   /* Call the admin operation */
> - do {
> - TRACE("call immutil_saImmOmAdminOperationInvoke_2");
> + 
> + TRACE("call immutil_saImmOmAdminOperationInvoke_2");
> + rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
> , 0, i_operationId, i_params,
> + , i_timeout);
> + while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN)){
> + sleep(2);
>   rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
> , 0, i_operationId, i_params,
> -,
> i_timeout);
> - if (retry <= 0) {
> - LOG_NO("Fail to invoke admin operation, too many
> SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
> -i_dn.c_str(), i_operationId);
> - rc = SA_AIS_ERR_TRY_AGAIN;
> - goto done;
> - }
> - sleep(2);
> + , i_timeout);
>   retry--;
> - } while ((rc == SA_AIS_OK) && (returnValue ==
> SA_AIS_ERR_TRY_AGAIN));
> + }
> + 
> + if (retry <= 0) {
> + LOG_NO("Fail to invoke admin operation, too many
> SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
> + i_dn.c_str(), i_operationId);
> + rc = SA_AIS_ERR_TRY_AGAIN;
> + goto done;
> + }
>   
>   if ( rc != SA_AIS_OK) {
>   LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s],
> opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId);
>


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 0 of 3] Review Request for leap : now leap library ensure shm availability before writing [#2202]

2016-11-30 Thread A V Mahesh

Hi Hoang,

We are some how able to simulated your test case.
Following are the detailed steps how we reproduced  ,
this test is generating same core dumb as below.

But the provided patch resolved the issue,  can you please test your self
provide your observations , if the test is different please update .

Attached :035_cpsv_2202_V2_debug.patch  & 036_cpsv_2207_debug.patch

Test application  : cpsv_shm_2202.c

==

1)  /etc/init.d/opensafd stop

2)  Change the defaults /dev/shm  size to 3MB

  # vi /etc/fstab  tmpfs

 And add following line

 ` tmpfs   /dev/shm tmpfs 
defaults,size=3m   0 0`


3)  Remount /dev/shm

#mount -o remount /dev/shm

4)  Check /dev/shm reflected with new value

# df -k /dev/shm/

Filesystem 1K-blocks  Used Available Use% Mounted on
tmpfs   3072 0  3072   0% /dev/shm


5) set  ulimit to   unlimited

#ulimit -c  unlimited

6) #/etc/init.d/opensafd start


7)  Compile & run attached test application ( cpsv_shm_2202.c )

#gcc cpsv_shm_2202.c -o ckpt_shm -lSaCkpt

# ./ckpt_shm

8)  Once /dev/shm/ reach 100% Use you will see core dump  same as yours

# df -k /dev/shm/


7) Then we applied the patch  test again with 
`cpsv_2202_V2_debug.patch`  & `cpsv_2207_debug.patch`)  no core dump


   saCkptSectionCreate 1  returned 18. ( no core dump )
==

-AVM



On 11/30/2016 11:37 AM, Vo Minh Hoang wrote:

Dear Mahesh,

Unfortunately, I have just receive information that the same core dump still
occur after applying patch.

Here is dump information in short, please tell me if I can do anything in
support:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7fe314aa0109 in __memcpy_sse2_unaligned () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install
opensaf-ckpt-nodedirector-debuginfo-5.1.0-.0.4997518.sle12.x86_64
(gdb) where
#0  0x7fe314aa0109 in __memcpy_sse2_unaligned () from /lib64/libc.so.6
#1  0x7fe315c26082 in memcpy (__len=, __src=, __dest=)
 at /usr/include/x86_64-linux-gnu/bits/string3.h:51
#2  ncs_os_posix_shm (req=req@entry=0x7ffecb80adb0) at os_defs.c:874
#3  0x00415a80 in cpnd_sec_hdr_update (cb=cb@entry=0x9e57f0,
sec_info=sec_info@entry=0xb8ff60,
 cp_node=cp_node@entry=0xb8e8c0) at cpnd_proc.c:1880
#4  0x00406047 in cpnd_ckpt_sec_add (cb=cb@entry=0x9e57f0,
cp_node=0xb8e8c0, id=0x7fe30c002390,
 exp_time=1480480471343486000, gen_flag=gen_flag@entry=0) at
cpnd_db.c:457
#5  0x0040d17c in cpnd_evt_proc_ckpt_sect_create
(cb=cb@entry=0x9e57f0,
 evt=evt@entry=0x7fe30c01e1d0, sinfo=sinfo@entry=0x7fe30c01e828) at
cpnd_evt.c:2267
#6  0x0040eaf4 in cpnd_process_evt (evt=0x7fe30c01e1c0) at
cpnd_evt.c:227
#7  0x004106cd in cpnd_main_process (cb=cb@entry=0x9e57f0) at
cpnd_init.c:579
#8  0x00405383 in main (argc=, argv=)
at cpnd_main.c:79

Sincerely,
Hoang

-Original Message-
From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com]
Sent: Tuesday, November 29, 2016 5:37 PM
To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 0 of 3] Review Request for leap : now leap library ensure
shm availability before writing [#2202]

Summary:leap : now leap library ensure shm availability before writing
[#2202] Review request for Trac Ticket(s): #2202 Peer Reviewer(s): Hoang /
Ramesh Pull request to: <> Affected
branch(es): <> Development branch: <>


Impacted area   Impact y/n

  Docsn
  Build systemn
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF servicesn
  OpenSAF servicesy
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

changeset 7b53e1b3754622fe90c22c801adeb7df6d808c30
Author: A V Mahesh 
Date:   Tue, 29 Nov 2016 15:59:21 +0530

leap : now leap library ensure shm availability before writing
[#2202]
  Issue :

If OSAF_CKPT_SHM_ALLOC_GUARANTEE is NOT set and SHM is 100% used in
system
, pnd Segmentation fault (core dumped) at LEAP memcpy().

Fix :

Now LEAP library ensures shm free space before writing This may
degrade
some performance of cpsv , if OSAF_CKPT_SHM_ALLOC_GUARANTEE is set,
cpsv
give natural performance.

changeset 083114e13c00c9c4267ffe65a86c1a97a951b876
Author: A V Mahesh 
Date:   Tue, 29 Nov 2016 16:02:06 +0530

cpsv : update cpsv error handing based on leap changes [#2202]

changeset fb509abb1d1583315f585663fd75bf73e35211a6
Author: A V Mahesh 

Re: [devel] [PATCH 1 of 1] smf: Avoid uncditional sleep when calling adminoperation[#2211]

2016-11-30 Thread Tai Dinh
Hi Neel,

NACK with the patch.
Need to add 'retry > 0'  as part of while loop condition.

/Tai
-Original Message-
From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com] 
Sent: Wednesday, November 30, 2016 2:53 PM
To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com;
tai.d...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: [PATCH 1 of 1] smf: Avoid uncditional sleep when calling
adminoperation[#2211]

 osaf/services/saf/smfsv/smfd/SmfUtils.cc |  26 +++---
 1 files changed, 15 insertions(+), 11 deletions(-)


diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
--- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc
+++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc
@@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st
}
 
/* Call the admin operation */
-   do {
-   TRACE("call immutil_saImmOmAdminOperationInvoke_2");
+   
+   TRACE("call immutil_saImmOmAdminOperationInvoke_2");
+   rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
, 0, i_operationId, i_params,
+   , i_timeout);
+   while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN)){
+   sleep(2);
rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle,
, 0, i_operationId, i_params,
-  ,
i_timeout);
-   if (retry <= 0) {
-   LOG_NO("Fail to invoke admin operation, too many
SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
-  i_dn.c_str(), i_operationId);
-   rc = SA_AIS_ERR_TRY_AGAIN;
-   goto done;
-   }
-   sleep(2);
+   , i_timeout);
retry--;
-   } while ((rc == SA_AIS_OK) && (returnValue ==
SA_AIS_ERR_TRY_AGAIN));
+   } 
+   
+   if (retry <= 0) {
+   LOG_NO("Fail to invoke admin operation, too many
SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]",
+   i_dn.c_str(), i_operationId);
+   rc = SA_AIS_ERR_TRY_AGAIN;
+   goto done;
+   }
 
if ( rc != SA_AIS_OK) {
LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s],
opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId);


--
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 0 of 3] Review Request for leap : now leap library ensure shm availability before writing [#2202]

2016-11-30 Thread ramesh betham
Hi Hoang,

Can you please provide the snapshot of "df -h /dev/shm" when the issue 
occurs.

Also.. can you please provide the corresponding core dump file and 
"osafckptnd" process file (process file not required if this image is 
build for SLES x86_64)

Thanks,
Ramesh.

On 11/30/2016 12:10 PM, Vo Minh Hoang wrote:
> Dear Mahesh,o
>
> I will do following your request.
> Please note that test case executing time is very long so result might not
> be available in today.
>
> Thank you and best regards,
> Hoang
>
> -Original Message-
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Wednesday, November 30, 2016 1:35 PM
> To: Vo Minh Hoang 
> Cc: opensaf-devel@lists.sourceforge.net; ramesh.bet...@oracle.com
> Subject: Re: [PATCH 0 of 3] Review Request for leap : now leap library
> ensure shm availability before writing [#2202]
>
> Hi Hoang,
>
> Apply  these change test and provide sys log at the time issue occurred.
>
> 
> 
>
> diff --git a/osaf/libs/core/leap/os_defs.c b/osaf/libs/core/leap/os_defs.c
> --- a/osaf/libs/core/leap/os_defs.c
> +++ b/osaf/libs/core/leap/os_defs.c
> @@ -865,11 +865,15 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S
>   }
>
>   /* Checking whether sufficient shared memory is
> available to write the data, to be safer side ffree reduced to 1 block size
> */
> -   if (req->info.write.i_write_size >
> (statsvfs.f_bfree * statsvfs.f_frsize)) {
> +   if (req->info.write.i_write_size >
> ((statsvfs.f_bfree - 1) * statsvfs.f_frsize)) {
>   syslog(LOG_ERR, "Insufficient shared memory
> space (%ld) available to write the data of size: %ld \n",
>   (statsvfs.f_bfree *
> statsvfs.f_frsize), req->info.write.i_write_size);
>   return NCSCC_RC_FAILURE;
> +   } else {
> +   syslog(LOG_ERR, "Sufficient shared
> memory space (%ld) available to write the data of size: %ld \n",
> +(statsvfs.f_bfree *
> statsvfs.f_frsize), req->info.write.i_write_size);
>   }
> +
>   }
>   memcpy((void *)((char *)req->info.write.i_addr +
> req->info.write.i_offset), req->info.write.i_from_buff,
>   req->info.write.i_write_size);
>
> 
> 
>
> -AVM
>
> On 11/30/2016 11:56 AM, A V Mahesh wrote:
>> Hi Hoang,
>>
>> Thansk for the test .
>>
>> Then it looks issue is not  related to  SHM deficiency ( 100 % used by
>> other application ) can you please  re-test with below changes and
>> that will confirm us it is completely not related to SHM  free size.
>>
>> replacing:
>>
>> `if (req->info.write.i_write_size > (statsvfs.f_bfree *
>> statsvfs.f_frsize)) {`
>>
>> with below, to be safer side ffree reduced to 1 block size :
>>
>> `if (req->info.write.i_write_size > ((statsvfs.f_bfree - 1) *
>> statsvfs.f_frsize)) {
>>
>>
>> -AVM
>>
>>
>> On 11/30/2016 11:37 AM, Vo Minh Hoang wrote:
>>> Dear Mahesh,
>>>
>>> Unfortunately, I have just receive information that the same core
>>> dump still occur after applying patch.
>>>
>>> Here is dump information in short, please tell me if I can do
>>> anything in
>>> support:
>>>
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  0x7fe314aa0109 in __memcpy_sse2_unaligned () from
>>> /lib64/libc.so.6
>>> Missing separate debuginfos, use: zypper install
>>> opensaf-ckpt-nodedirector-debuginfo-5.1.0-.0.4997518.sle12.x86_64
>>> (gdb) where
>>> #0  0x7fe314aa0109 in __memcpy_sse2_unaligned () from
>>> /lib64/libc.so.6
>>> #1  0x7fe315c26082 in memcpy (__len=,
>>> __src=>> out>, __dest=)
>>>   at /usr/include/x86_64-linux-gnu/bits/string3.h:51
>>> #2  ncs_os_posix_shm (req=req@entry=0x7ffecb80adb0) at os_defs.c:874
>>> #3  0x00415a80 in cpnd_sec_hdr_update (cb=cb@entry=0x9e57f0,
>>> sec_info=sec_info@entry=0xb8ff60,
>>>   cp_node=cp_node@entry=0xb8e8c0) at cpnd_proc.c:1880
>>> #4  0x00406047 in cpnd_ckpt_sec_add (cb=cb@entry=0x9e57f0,
>>> cp_node=0xb8e8c0, id=0x7fe30c002390,
>>>   exp_time=1480480471343486000, gen_flag=gen_flag@entry=0) at
>>> cpnd_db.c:457
>>> #5  0x0040d17c in cpnd_evt_proc_ckpt_sect_create
>>> (cb=cb@entry=0x9e57f0,
>>>   evt=evt@entry=0x7fe30c01e1d0, sinfo=sinfo@entry=0x7fe30c01e828)
>>> at
>>> cpnd_evt.c:2267
>>> #6  0x0040eaf4 in cpnd_process_evt (evt=0x7fe30c01e1c0) at
>>> cpnd_evt.c:227
>>> #7  0x004106cd in cpnd_main_process (cb=cb@entry=0x9e57f0) at
>>> cpnd_init.c:579
>>> #8  0x00405383 in main (argc=, argv=>> out>)
>>> at cpnd_main.c:79
>>>
>>> Sincerely,
>>> Hoang
>>>
>>> -Original