Re: [devel] [PATCH 1 of 1] smf: Avoid unconditional sleep when calling adminoperation[#2211] V1
Hi Neel, Ack with comment If you copied the loop used with admin operation for node group also this loop would get a more predictable timeout. But even if you do that #2212 is still needed in order to get rid of code redundancy. Thanks Lennart > -Original Message- > From: Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com] > Sent: den 1 december 2016 06:14 > To: Tai Chi Dinh; Lennart Lund > ; Rafael Odzakow > > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1 of 1] smf: Avoid unconditional sleep when calling > adminoperation[#2211] V1 > > Hi , > > will be pushing the patch if there are no further comments. > #2212 is opened to change SmfImmUtils AdminOperation the same way as > Node group admin operation. > > Thanks, > Neel. > > On 2016/11/30 02:42 PM, Tai Dinh wrote: > > Thank Neel, > > > > ACK from me. > > > > /Tai > > -Original Message- > > From: reddy.neelaka...@oracle.com > [mailto:reddy.neelaka...@oracle.com] > > Sent: Wednesday, November 30, 2016 3:47 PM > > To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com; > > tai.d...@dektech.com.au > > Cc: opensaf-devel@lists.sourceforge.net > > Subject: [PATCH 1 of 1] smf: Avoid unconditional sleep when calling > > adminoperation[#2211] V1 > > > > osaf/services/saf/smfsv/smfd/SmfUtils.cc | 26 +++ > --- > > 1 files changed, 15 insertions(+), 11 deletions(-) > > > > > > diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc > > b/osaf/services/saf/smfsv/smfd/SmfUtils.cc > > --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc > > +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc > > @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st > > } > > > > /* Call the admin operation */ > > - do { > > - TRACE("call immutil_saImmOmAdminOperationInvoke_2"); > > + > > + TRACE("call immutil_saImmOmAdminOperationInvoke_2"); > > + rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, > > , 0, i_operationId, i_params, > > + , i_timeout); > > + while ((rc == SA_AIS_OK) && (returnValue == > SA_AIS_ERR_TRY_AGAIN) && > > (retry > 0) ){ > > + sleep(2); > > rc = > immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, > > , 0, i_operationId, i_params, > > - , > > i_timeout); > > - if (retry <= 0) { > > - LOG_NO("Fail to invoke admin operation, too many > > SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", > > - i_dn.c_str(), i_operationId); > > - rc = SA_AIS_ERR_TRY_AGAIN; > > - goto done; > > - } > > - sleep(2); > > + , i_timeout); > > retry--; > > - } while ((rc == SA_AIS_OK) && (returnValue == > > SA_AIS_ERR_TRY_AGAIN)); > > + } > > + > > + if (retry <= 0) { > > + LOG_NO("Fail to invoke admin operation, too many > > SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", > > + i_dn.c_str(), i_operationId); > > + rc = SA_AIS_ERR_TRY_AGAIN; > > + goto done; > > + } > > > > if ( rc != SA_AIS_OK) { > > LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s], > > opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId); > > -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] amfnd: add logs for HC start failures [#340]
HI Nagu, One strategy is in case if an error code is returned to the application, it is the application to select (or not) to log this as an error/notify etc. /Thanks HansN -Original Message- From: Nagendra Kumar [mailto:nagendr...@oracle.com] Sent: den 1 december 2016 07:38 To: Hans Nordebäck; Praveen Malviya ; Minh Hon Chau ; Gary Lee Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures [#340] Hi Hans, I had proposed this long time back in the ticket: " Nagendra Kumar - 2015-08-07 As part of ticket cleanup, evaluated and decided to keep it as enhancement. The logging should not be heavy and should only stick to state to the level of providing hints to the user. Perhaps, the log should fall under Error category." >> After running various OpenSAF tests there should be no errors in the syslog >> and there may be test case that will fail now, even though there are no >> OpenSAF errors? If there are error situation like the log added, Errors should come into syslog. All negative test cases should result into Error, shouldn't they ? Thanks -Nagu > -Original Message- > From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] > Sent: 01 December 2016 11:40 > To: Nagendra Kumar; Praveen Malviya; Minh Hon Chau; Gary Lee > Cc: opensaf-devel@lists.sourceforge.net > Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures > [#340] > > Hi Nagu, > Not sure, but if these errors are not OpenSAF errors, i.e. not > depending on application configuration, use etc. > should they really be logged as errors? After running various OpenSAF > tests there should be no errors in the syslog and there may be test > case that will fail now, even though there are no OpenSAF errors? > /Regards HansN > > -Original Message- > From: Nagendra Kumar [mailto:nagendr...@oracle.com] > Sent: den 1 december 2016 06:37 > To: Hans Nordebäck ; Praveen Malviya > ; Minh Hon Chau > ; Gary Lee > Cc: opensaf-devel@lists.sourceforge.net > Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures > [#340] > > Hi Hans, > I find it more appropriate here as we are returning Error codes > so > log the information as Error :) > > Thanks > -Nagu > > -Original Message- > > From: Hans Nordeback [mailto:hans.nordeb...@ericsson.com] > > Sent: 29 November 2016 14:20 > > To: Nagendra Kumar; Praveen Malviya; minh.c...@dektech.com.au; > > gary@dektech.com.au > > Cc: opensaf-devel@lists.sourceforge.net > > Subject: Re: [PATCH 1 of 1] amfnd: add logs for HC start failures > > [#340] > > > > Hi Nagu, > > > > a question, why use LOG_ER and not LOG_WA below? /Thanks HansN > > > > > > On 11/04/2016 09:32 AM, nagendr...@oracle.com wrote: > > > osaf/services/saf/amf/amfnd/chc.cc | 7 +++ > > > 1 files changed, 7 insertions(+), 0 deletions(-) > > > > > > > > > diff --git a/osaf/services/saf/amf/amfnd/chc.cc > > b/osaf/services/saf/amf/amfnd/chc.cc > > > --- a/osaf/services/saf/amf/amfnd/chc.cc > > > +++ b/osaf/services/saf/amf/amfnd/chc.cc > > > @@ -315,6 +315,7 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > > > > > /* get the comp */ > > > if ((*o_comp = avnd_compdb_rec_get(cb->compdb, > > Amf::to_string(_start->comp_name))) == nullptr) { > > > + LOG_ER("Component '%s' doesn't exist in > > DB", osaf_extended_name_borrow(_start->comp_name)); > > > *o_amf_rc = SA_AIS_ERR_NOT_EXIST; > > > return; > > > } > > > @@ -324,6 +325,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > > > > m_AVND_COMP_PRES_STATE_IS_INSTANTIATIONFAILED(*o_comp) || > > > > > m_AVND_COMP_PRES_STATE_IS_TERMINATING(*o_comp) || > > > > > m_AVND_COMP_PRES_STATE_IS_TERMINATIONFAILED(*o_comp)) { > > > + LOG_ER("Component '%s' is not healthy (pres > > state '%u')", osaf_extended_name_borrow(_start->comp_name), > > > + (*o_comp)->pres); > > > *o_amf_rc = SA_AIS_ERR_TRY_AGAIN; > > > return; > > > } > > > @@ -347,6 +350,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > > if (0 == avnd_hcdb_rec_get(cb, _chk)) { > > > /* HC instance did not exist, look for > > > HC type > > */ > > > if (nullptr == avnd_hctypedb_rec_get(cb, > > (*o_comp)->saAmfCompType, _start->hc_key)) { > > > + LOG_ER("Health check is not > > configured for component '%s'", > > > + > >
Re: [devel] [PATCH 2 of 3] cpsv : update cpsv error handing based on leap changes [#2202]
Hi Mahesh, Please find my review comments inline with [Ramesh]. Thanks, Ramesh. On 11/29/2016 4:07 PM, mahesh.va...@oracle.com wrote: > osaf/libs/common/cpsv/include/cpnd_cb.h | 4 +- > osaf/libs/common/cpsv/include/cpnd_init.h | 8 +- > osaf/libs/common/cpsv/include/cpnd_sec.h | 2 +- > osaf/libs/core/include/ncs_osprm.h| 2 +- > osaf/services/saf/cpsv/cpnd/cpnd_db.c | 12 ++-- > osaf/services/saf/cpsv/cpnd/cpnd_evt.c| 82 > +- > osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 31 ++ > osaf/services/saf/cpsv/cpnd/cpnd_res.c| 24 +++-- > osaf/services/saf/cpsv/cpnd/cpnd_sec.cc | 12 ++-- > 9 files changed, 103 insertions(+), 74 deletions(-) > > > diff --git a/osaf/libs/common/cpsv/include/cpnd_cb.h > b/osaf/libs/common/cpsv/include/cpnd_cb.h > --- a/osaf/libs/common/cpsv/include/cpnd_cb.h > +++ b/osaf/libs/common/cpsv/include/cpnd_cb.h > @@ -341,8 +341,8 @@ uint32_t cpnd_amf_register(CPND_CB *cpnd > uint32_t cpnd_amf_deregister(CPND_CB *cpnd_cb); > uint32_t cpnd_client_extract_bits(uint32_t bitmap_value, uint32_t > *bit_position); > uint32_t cpnd_res_ckpt_sec_del(CPND_CKPT_NODE *cp_node); > -uint32_t cpnd_ckpt_replica_create_res(NCS_OS_POSIX_SHM_REQ_INFO *open_req, > char *buf, CPND_CKPT_NODE **cp_node, > - uint32_t ref_cnt, CKPT_INFO > *cp_info, bool shm_alloc_guaranteed); > +uint32_t cpnd_ckpt_replica_create_res(CPND_CB *cb, NCS_OS_POSIX_SHM_REQ_INFO > *open_req, char *buf, CPND_CKPT_NODE **cp_node, > + uint32_t ref_cnt, CKPT_INFO > *cp_info); > int32_t cpnd_find_free_loc(CPND_CB *cb, CPND_TYPE_INFO type); > uint32_t cpnd_ckpt_write_header(CPND_CB *cb, uint32_t nckpts); > uint32_t cpnd_cli_info_write_header(CPND_CB *cb, int32_t n_clients); > diff --git a/osaf/libs/common/cpsv/include/cpnd_init.h > b/osaf/libs/common/cpsv/include/cpnd_init.h > --- a/osaf/libs/common/cpsv/include/cpnd_init.h > +++ b/osaf/libs/common/cpsv/include/cpnd_init.h > @@ -90,7 +90,7 @@ uint32_t cpnd_ckpt_replica_create(CPND_C > uint32_t cpnd_ckpt_remote_cpnd_add(CPND_CKPT_NODE *cp_node, MDS_DEST > mds_info); > uint32_t cpnd_ckpt_remote_cpnd_del(CPND_CKPT_NODE *cp_node, MDS_DEST > mds_info); > int32_t cpnd_ckpt_get_lck_sec_id(CPND_CKPT_NODE *cp_node); > -uint32_t cpnd_ckpt_sec_write(CPND_CKPT_NODE *cp_node, CPND_CKPT_SECTION_INFO > +uint32_t cpnd_ckpt_sec_write(CPND_CB *cb, CPND_CKPT_NODE *cp_node, > CPND_CKPT_SECTION_INFO > *sec_info, const void *data, uint64_t size, uint64_t > offset, uint32_t type); > uint32_t cpnd_ckpt_sec_read(CPND_CKPT_NODE *cp_node, CPND_CKPT_SECTION_INFO >*sec_info, void *data, uint64_t size, uint64_t offset); > @@ -164,7 +164,7 @@ void cpnd_evt_node_getnext(CPND_CB *cb, > uint32_t cpnd_evt_node_add(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE > *evt_node); > uint32_t cpnd_evt_node_del(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE > *evt_node); > CPND_CKPT_NODE *cpnd_ckpt_node_find_by_name(CPND_CB *cpnd_cb, > SaConstStringT ckpt_name); > -CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CKPT_NODE *cp_node, > SaCkptSectionIdT *id, SaTimeT exp_time, > +CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CB *cb, CPND_CKPT_NODE > *cp_node, SaCkptSectionIdT *id, SaTimeT exp_time, > uint32_t gen_flag); > void cpnd_evt_backup_queue_add(CPND_CKPT_NODE *cp_node, CPND_EVT *evt); > uint32_t cpnd_ckpt_node_tree_init(CPND_CB *cb); > @@ -176,8 +176,8 @@ void cpnd_client_node_tree_cleanup(CPND_ > void cpnd_client_node_tree_destroy(CPND_CB *cb); > void cpnd_allrepl_write_evt_node_tree_cleanup(CPND_CB *cb); > void cpnd_allrepl_write_evt_node_tree_destroy(CPND_CB *cb); > -uint32_t cpnd_sec_hdr_update(CPND_CKPT_SECTION_INFO *pSecPtr, CPND_CKPT_NODE > *cp_node); > -uint32_t cpnd_ckpt_hdr_update(CPND_CKPT_NODE *cp_node); > +uint32_t cpnd_sec_hdr_update(CPND_CB *cb, CPND_CKPT_SECTION_INFO *pSecPtr, > CPND_CKPT_NODE *cp_node); > +uint32_t cpnd_ckpt_hdr_update(CPND_CB *cb, CPND_CKPT_NODE *cp_node); > void cpnd_ckpt_node_destroy(CPND_CB *cb, CPND_CKPT_NODE *cp_node); > uint32_t cpnd_get_slot_sub_slot_id_from_mds_dest(MDS_DEST dest); > uint32_t cpnd_get_slot_sub_slot_id_from_node_id(NCS_NODE_ID i_node_id); > diff --git a/osaf/libs/common/cpsv/include/cpnd_sec.h > b/osaf/libs/common/cpsv/include/cpnd_sec.h > --- a/osaf/libs/common/cpsv/include/cpnd_sec.h > +++ b/osaf/libs/common/cpsv/include/cpnd_sec.h > @@ -39,7 +39,7 @@ CPND_CKPT_SECTION_INFO * > cpnd_ckpt_sec_get_create(const CPND_CKPT_NODE *, const SaCkptSectionIdT *); > > CPND_CKPT_SECTION_INFO * > -cpnd_ckpt_sec_del(CPND_CKPT_NODE *, SaCkptSectionIdT *); > +cpnd_ckpt_sec_del(CPND_CB *cb, CPND_CKPT_NODE *, SaCkptSectionIdT *); > > CPND_CKPT_SECTION_INFO * > cpnd_get_sect_with_id(const CPND_CKPT_NODE *, uint32_t lcl_sec_id); > diff --git
Re: [devel] [PATCH 1 of 1] amfnd: add logs for HC start failures [#340]
Hi Hans, I had proposed this long time back in the ticket: " Nagendra Kumar - 2015-08-07 As part of ticket cleanup, evaluated and decided to keep it as enhancement. The logging should not be heavy and should only stick to state to the level of providing hints to the user. Perhaps, the log should fall under Error category." >> After running various OpenSAF tests there should be no errors in the syslog >> and there may be test case that will fail now, even though there are no >> OpenSAF errors? If there are error situation like the log added, Errors should come into syslog. All negative test cases should result into Error, shouldn't they ? Thanks -Nagu > -Original Message- > From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] > Sent: 01 December 2016 11:40 > To: Nagendra Kumar; Praveen Malviya; Minh Hon Chau; Gary Lee > Cc: opensaf-devel@lists.sourceforge.net > Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures [#340] > > Hi Nagu, > Not sure, but if these errors are not OpenSAF errors, i.e. not depending on > application configuration, use etc. > should they really be logged as errors? After running various OpenSAF tests > there should be no errors in the syslog and there may be test case that will > fail now, even though there are no OpenSAF errors? > /Regards HansN > > -Original Message- > From: Nagendra Kumar [mailto:nagendr...@oracle.com] > Sent: den 1 december 2016 06:37 > To: Hans Nordebäck; Praveen Malviya > ; Minh Hon Chau > ; Gary Lee > Cc: opensaf-devel@lists.sourceforge.net > Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures [#340] > > Hi Hans, > I find it more appropriate here as we are returning Error > codes so log the information as Error :) > > Thanks > -Nagu > > -Original Message- > > From: Hans Nordeback [mailto:hans.nordeb...@ericsson.com] > > Sent: 29 November 2016 14:20 > > To: Nagendra Kumar; Praveen Malviya; minh.c...@dektech.com.au; > > gary@dektech.com.au > > Cc: opensaf-devel@lists.sourceforge.net > > Subject: Re: [PATCH 1 of 1] amfnd: add logs for HC start failures > > [#340] > > > > Hi Nagu, > > > > a question, why use LOG_ER and not LOG_WA below? /Thanks HansN > > > > > > On 11/04/2016 09:32 AM, nagendr...@oracle.com wrote: > > > osaf/services/saf/amf/amfnd/chc.cc | 7 +++ > > > 1 files changed, 7 insertions(+), 0 deletions(-) > > > > > > > > > diff --git a/osaf/services/saf/amf/amfnd/chc.cc > > b/osaf/services/saf/amf/amfnd/chc.cc > > > --- a/osaf/services/saf/amf/amfnd/chc.cc > > > +++ b/osaf/services/saf/amf/amfnd/chc.cc > > > @@ -315,6 +315,7 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > > > > > /* get the comp */ > > > if ((*o_comp = avnd_compdb_rec_get(cb->compdb, > > Amf::to_string(_start->comp_name))) == nullptr) { > > > + LOG_ER("Component '%s' doesn't exist in > > DB", osaf_extended_name_borrow(_start->comp_name)); > > > *o_amf_rc = SA_AIS_ERR_NOT_EXIST; > > > return; > > > } > > > @@ -324,6 +325,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > > > > m_AVND_COMP_PRES_STATE_IS_INSTANTIATIONFAILED(*o_comp) || > > > > > m_AVND_COMP_PRES_STATE_IS_TERMINATING(*o_comp) || > > > > > m_AVND_COMP_PRES_STATE_IS_TERMINATIONFAILED(*o_comp)) { > > > + LOG_ER("Component '%s' is not healthy (pres > > state '%u')", osaf_extended_name_borrow(_start->comp_name), > > > + (*o_comp)->pres); > > > *o_amf_rc = SA_AIS_ERR_TRY_AGAIN; > > > return; > > > } > > > @@ -347,6 +350,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > > if (0 == avnd_hcdb_rec_get(cb, _chk)) { > > > /* HC instance did not exist, look for > > > HC type > > */ > > > if (nullptr == avnd_hctypedb_rec_get(cb, > > (*o_comp)->saAmfCompType, _start->hc_key)) { > > > + LOG_ER("Health check is not > > configured for component '%s'", > > > + > > osaf_extended_name_borrow(_start->comp_name)); > > > *o_amf_rc = > > SA_AIS_ERR_NOT_EXIST; > > > return; > > > } > > > @@ -356,6 +361,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > > tmp_hc_rec.req_hdl = hc_start->hdl; > > > /* determine if this healthcheck is already > > > active */ > > > if (0 != m_AVND_COMPDB_REC_HC_GET(**o_comp, > > tmp_hc_rec)) { > > > + LOG_ER("Health check is already active for >
Re: [devel] [PATCH 1 of 1] amfnd: add logs for HC start failures [#340]
Hi Nagu, Not sure, but if these errors are not OpenSAF errors, i.e. not depending on application configuration, use etc. should they really be logged as errors? After running various OpenSAF tests there should be no errors in the syslog and there may be test case that will fail now, even though there are no OpenSAF errors? /Regards HansN -Original Message- From: Nagendra Kumar [mailto:nagendr...@oracle.com] Sent: den 1 december 2016 06:37 To: Hans Nordebäck; Praveen Malviya ; Minh Hon Chau ; Gary Lee Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [PATCH 1 of 1] amfnd: add logs for HC start failures [#340] Hi Hans, I find it more appropriate here as we are returning Error codes so log the information as Error :) Thanks -Nagu > -Original Message- > From: Hans Nordeback [mailto:hans.nordeb...@ericsson.com] > Sent: 29 November 2016 14:20 > To: Nagendra Kumar; Praveen Malviya; minh.c...@dektech.com.au; > gary@dektech.com.au > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1 of 1] amfnd: add logs for HC start failures > [#340] > > Hi Nagu, > > a question, why use LOG_ER and not LOG_WA below? /Thanks HansN > > > On 11/04/2016 09:32 AM, nagendr...@oracle.com wrote: > > osaf/services/saf/amf/amfnd/chc.cc | 7 +++ > > 1 files changed, 7 insertions(+), 0 deletions(-) > > > > > > diff --git a/osaf/services/saf/amf/amfnd/chc.cc > b/osaf/services/saf/amf/amfnd/chc.cc > > --- a/osaf/services/saf/amf/amfnd/chc.cc > > +++ b/osaf/services/saf/amf/amfnd/chc.cc > > @@ -315,6 +315,7 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > > > /* get the comp */ > > if ((*o_comp = avnd_compdb_rec_get(cb->compdb, > Amf::to_string(_start->comp_name))) == nullptr) { > > + LOG_ER("Component '%s' doesn't exist in > DB", osaf_extended_name_borrow(_start->comp_name)); > > *o_amf_rc = SA_AIS_ERR_NOT_EXIST; > > return; > > } > > @@ -324,6 +325,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > > m_AVND_COMP_PRES_STATE_IS_INSTANTIATIONFAILED(*o_comp) || > > > m_AVND_COMP_PRES_STATE_IS_TERMINATING(*o_comp) || > > > m_AVND_COMP_PRES_STATE_IS_TERMINATIONFAILED(*o_comp)) { > > + LOG_ER("Component '%s' is not healthy (pres > state '%u')", osaf_extended_name_borrow(_start->comp_name), > > + (*o_comp)->pres); > > *o_amf_rc = SA_AIS_ERR_TRY_AGAIN; > > return; > > } > > @@ -347,6 +350,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > if (0 == avnd_hcdb_rec_get(cb, _chk)) { > > /* HC instance did not exist, look for HC type > */ > > if (nullptr == avnd_hctypedb_rec_get(cb, > (*o_comp)->saAmfCompType, _start->hc_key)) { > > + LOG_ER("Health check is not > configured for component '%s'", > > + > osaf_extended_name_borrow(_start->comp_name)); > > *o_amf_rc = > SA_AIS_ERR_NOT_EXIST; > > return; > > } > > @@ -356,6 +361,8 @@ void avnd_comp_hc_param_val(AVND_CB *cb, > > tmp_hc_rec.req_hdl = hc_start->hdl; > > /* determine if this healthcheck is already active */ > > if (0 != m_AVND_COMPDB_REC_HC_GET(**o_comp, > tmp_hc_rec)) { > > + LOG_ER("Health check is already active for > component '%s'", > > + > osaf_extended_name_borrow(_start->comp_name)); > > *o_amf_rc = SA_AIS_ERR_EXIST; > > return; > > } > -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] AMFD: Do not let non-active AMFD receives CLM track callback [#2141]
Hi Minh, But fixing small thing is complicating the code. You can just try to stop CLM track again in Quisced->Standby transition and even if it fails you may try few times as it is Standby and it is not doing much job accept that it is receiving some checkpoints. If that fails, it is ok to reboot the Standby node. I am sure this is very rare case and the added code may not be getting chance to get executed. Also, it is important to investigate why CLM is giving try-again. Thanks -Nagu > -Original Message- > From: minh chau [mailto:minh.c...@dektech.com.au] > Sent: 01 December 2016 10:02 > To: Nagendra Kumar; Praveen Malviya; hans.nordeb...@ericsson.com; > gary@dektech.com.au; long.hb.ngu...@dektech.com.au > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [PATCH 1 of 1] AMFD: Do not let non-active AMFD receives CLM > track callback [#2141] > > Hi Nagu, > > The patch has already had a variable to memorize the track start/stop status. > Wouldn't it be nice to avoid a node reboot (as consequence of AMFD's > exit) due to TRY_AGAIN/TIMEOUT which could be classified as temporary > service denial? And retry to stop tracking until coming back to as active one. > > Thanks, > Minh > > On 30/11/16 17:25, Nagendra Kumar wrote: > > Hi Minh, > > I think the patch could be made simpler by again try to stop CLM > track(keep a variable to indicate it wasn't stopped before) when AMFD > becomes Standby. If it doesn't get stopped again, then exit Amfd as it is kind > of double fault. > > > > Thanks > > -Nagu > > > >> -Original Message- > >> From: minh chau [mailto:minh.c...@dektech.com.au] > >> Sent: 30 November 2016 10:40 > >> To: praveen malviya;hans.nordeb...@ericsson.com; Nagendra Kumar; > >> gary@dektech.com.au;long.hb.ngu...@dektech.com.au > >> Cc:opensaf-devel@lists.sourceforge.net > >> Subject: Re: [PATCH 1 of 1] AMFD: Do not let non-active AMFD receives > >> CLM track callback [#2141] > >> > >> Hi Praveen, > >> > >> I have seen a limitation of using the same job queue now, that it > >> will increase the possibility of loss of IMM update job which is > >> causing incorrect headless recovery. The timer is one alternative. We > >> had spawned a thread to deal with TRY_AGAIN/TIMEOUT in case of start > >> cluster tracking, so stop cluster tracking should be handled in a similar > way. > >> I will create another ticket and assign it to myself to introduce a > >> thread handler for using CLM service in AMF. Some background/similar > >> tickets: #1609, #1531. For the scope of this ticket, the floated > >> patch + a correction from your comment that standby AMFD responds > CLM > >> callback, they should be enough to avoid the reported problem in > >> #2141. Do you agree? > >> > >> Thanks, > >> Minh > >> > >> > >> On 21/11/16 22:37, praveen malviya wrote: > >>> Hi Minh, > >>> > >>> If the intention is to use same job queue then, I think, one problem > >>> will come: > >>> "Currently AMFD empties job queue when AMFD becomes quiesced in > >>> avd_mds_qsd_role_evh(). Most probably before emptying, AMFD will try > >>> to stop CLM tracking and now if the API again fails then this > >>> failure cannot be handled as AMFD cannot be blocked in > >>> avd_mds_qsd_role_evh() for long time. And then AMFD will empty the > job queue." > >>> > >>> Can we think of some way to avoid above problem? One solution, > >>> without using job queue, could be: When CLM track fails, run a timer > >>> of 1 seconds. When this timer expires, check role of AMFD. If the > >>> role is still standby and Track stop is pending then now AMFD will give a > try. > >>> If it again fails then rerun the timer based on the value of return > >>> code. We can decide how many times this can be tried. > >>> > >>> One more thing. on standby controller if clm track has failed then > >>> it should respond for any track callback that it receives without > >>> taking any action. This needs to be done irrespective of the > >>> approach of the solution. > >>> > >>> Thanks, > >>> Praveen > >>> > >>> On 21-Nov-16 4:38 PM, minh chau wrote: > Hi Praveen, > > I am thinking if creating a CLM job (likely IMM update, NTF send > job) in order to stop tracking at non-active AMFD. > Should it be a better version than this patch that AMFD will still > have chance to stop tracking until being promoted back to active role? > > Thanks, > Minh > > On 18/11/16 11:38, minh chau wrote: > > Hi Praveen, > > > > It has been seen once in swapping 2N Opensaf SI. It was just an > > error indicating that AMFD failed to stop track callback, after > > that the si-swap also succeeded. CLM service could be busy during > > switch-over but AMF (as well as other services) is supposed to > > handle that return code properly as AMFD has done in track start. > > Further testing, it exposed a problem due to clm track callback > received on standby AMFD. > > > > AMFD
Re: [devel] [PATCH 1 of 1] cpnd: fix error handling while section_hdr_update_fail [#2207]
Hi Hoang, ACK for the attached patch . -AVM On 11/29/2016 6:01 PM, A V Mahesh wrote: > HI All, > > Validation of this patch dependent on #2202 , so please find the attached > re-based patch of this (#2207) ticket. > > Hoang will re-publish the V3 patch once he is in office. > > -AVM > > On 11/29/2016 4:23 PM, Hoang Vo wrote: >> osaf/libs/common/cpsv/include/cpnd_sec.h | 2 +- >> osaf/services/saf/cpsv/cpnd/cpnd_db.c| 5 - >> osaf/services/saf/cpsv/cpnd/cpnd_evt.c | 8 >> osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 2 +- >> osaf/services/saf/cpsv/cpnd/cpnd_sec.cc | 25 >> ++--- >> 5 files changed, 24 insertions(+), 18 deletions(-) >> >> >> problem: >> the steps to add a section is add_db_tree -> update_sec_hdr -> >> update_ckpt_hdr >> so if an error occur cpsv should handle error in reverse order. >> currently, section_hdr_update_fails, cpsv revert ckpt_hdr also that >> case error >> >> solution: >> only revert db_tree in case section_hdr_update_fails >> >> diff --git a/osaf/libs/common/cpsv/include/cpnd_sec.h >> b/osaf/libs/common/cpsv/include/cpnd_sec.h >> --- a/osaf/libs/common/cpsv/include/cpnd_sec.h >> +++ b/osaf/libs/common/cpsv/include/cpnd_sec.h >> @@ -39,7 +39,7 @@ CPND_CKPT_SECTION_INFO * >> cpnd_ckpt_sec_get_create(const CPND_CKPT_NODE *, const >> SaCkptSectionIdT *); >> CPND_CKPT_SECTION_INFO * >> -cpnd_ckpt_sec_del(CPND_CKPT_NODE *, SaCkptSectionIdT *); >> +cpnd_ckpt_sec_del(CPND_CKPT_NODE *, SaCkptSectionIdT *, bool); >> CPND_CKPT_SECTION_INFO * >> cpnd_get_sect_with_id(const CPND_CKPT_NODE *, uint32_t lcl_sec_id); >> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_db.c >> b/osaf/services/saf/cpsv/cpnd/cpnd_db.c >> --- a/osaf/services/saf/cpsv/cpnd/cpnd_db.c >> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_db.c >> @@ -391,6 +391,7 @@ CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_ad >> int32_t lcl_sec_id = 0; >> uint32_t rc = NCSCC_RC_SUCCESS; >> uint32_t value = 0, i = 0, j = 0; >> +bool hdr_update = true; >> TRACE_ENTER(); >> /* get the lcl_sec_id */ >> @@ -469,8 +470,10 @@ CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_ad >> return pSecPtr; >> section_hdr_update_fails: >> +hdr_update = false; >> + >>ckpt_hdr_update_fails: >> -cpnd_ckpt_sec_del(cp_node, id); >> +cpnd_ckpt_sec_del(cp_node, id, hdr_update); >> section_add_fails: >> if (pSecPtr->sec_id.id != NULL) >> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c >> b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c >> --- a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c >> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c >> @@ -2357,12 +2357,12 @@ static uint32_t cpnd_evt_proc_ckpt_sect_ >> /* delete the section */ >> if (gen_sec_id) >> tmp_sec_info = >> - cpnd_ckpt_sec_del(cp_node, _info->sec_id); >> + cpnd_ckpt_sec_del(cp_node, _info->sec_id, true); >> else >> tmp_sec_info = >> cpnd_ckpt_sec_del(cp_node, >> evt->info.sec_creatReq. >> - sec_attri.sectionId); >> + sec_attri.sectionId, true); >> if (tmp_sec_info == sec_info) { >> cp_node->replica_info. >> @@ -2494,7 +2494,7 @@ static uint32_t cpnd_evt_proc_ckpt_sect_ >> rc = cpnd_ckpt_sec_find(cp_node, >info.sec_delReq.sec_id); >> if (rc == NCSCC_RC_SUCCESS) { >> -sec_info = cpnd_ckpt_sec_del(cp_node, >> >info.sec_delReq.sec_id); >> +sec_info = cpnd_ckpt_sec_del(cp_node, >> >info.sec_delReq.sec_id, true); >> /* resetting lcl_sec_id mapping */ >> if (sec_info == NULL) { >> rc = NCSCC_RC_FAILURE; >> @@ -2774,7 +2774,7 @@ static uint32_t cpnd_evt_proc_nd2nd_ckpt >> send_evt.info.cpnd.info.sec_delete_rsp.error = >> SA_AIS_ERR_TRY_AGAIN; >> goto nd_rsp; >> } >> -sec_info = cpnd_ckpt_sec_del(cp_node, >> >info.sec_delete_req.sec_id); >> +sec_info = cpnd_ckpt_sec_del(cp_node, >> >info.sec_delete_req.sec_id, true); >> if (sec_info == NULL) { >> if >> (m_CPND_IS_COLLOCATED_ATTR_SET(cp_node->create_attrib.creationFlags)) { >> TRACE_4("cpnd ckpt sect del failed for >> sec_id:%s,ckpt_id:%llx", >> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >> b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >> --- a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >> @@ -1544,7 +1544,7 @@ uint32_t cpnd_proc_sec_expiry(CPND_CB *c >> return NCSCC_RC_FAILURE; >> } >> -cpnd_ckpt_sec_del(cp_node, _info->sec_id); >> +cpnd_ckpt_sec_del(cp_node, _info->sec_id, true); >> cp_node->replica_info.shm_sec_mapping[pSec_info->lcl_sec_id] = 1; >> /* send out destory to all cpnd's maintaining this ckpt */ >> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_sec.cc >>
Re: [devel] [PATCH 1 of 1] smf: Avoid unconditional sleep when calling adminoperation[#2211] V1
Hi , will be pushing the patch if there are no further comments. #2212 is opened to change SmfImmUtils AdminOperation the same way as Node group admin operation. Thanks, Neel. On 2016/11/30 02:42 PM, Tai Dinh wrote: > Thank Neel, > > ACK from me. > > /Tai > -Original Message- > From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com] > Sent: Wednesday, November 30, 2016 3:47 PM > To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com; > tai.d...@dektech.com.au > Cc: opensaf-devel@lists.sourceforge.net > Subject: [PATCH 1 of 1] smf: Avoid unconditional sleep when calling > adminoperation[#2211] V1 > > osaf/services/saf/smfsv/smfd/SmfUtils.cc | 26 +++--- > 1 files changed, 15 insertions(+), 11 deletions(-) > > > diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc > b/osaf/services/saf/smfsv/smfd/SmfUtils.cc > --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc > +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc > @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st > } > > /* Call the admin operation */ > - do { > - TRACE("call immutil_saImmOmAdminOperationInvoke_2"); > + > + TRACE("call immutil_saImmOmAdminOperationInvoke_2"); > + rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, > , 0, i_operationId, i_params, > + , i_timeout); > + while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN) && > (retry > 0) ){ > + sleep(2); > rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, > , 0, i_operationId, i_params, > -, > i_timeout); > - if (retry <= 0) { > - LOG_NO("Fail to invoke admin operation, too many > SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", > -i_dn.c_str(), i_operationId); > - rc = SA_AIS_ERR_TRY_AGAIN; > - goto done; > - } > - sleep(2); > + , i_timeout); > retry--; > - } while ((rc == SA_AIS_OK) && (returnValue == > SA_AIS_ERR_TRY_AGAIN)); > + } > + > + if (retry <= 0) { > + LOG_NO("Fail to invoke admin operation, too many > SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", > + i_dn.c_str(), i_operationId); > + rc = SA_AIS_ERR_TRY_AGAIN; > + goto done; > + } > > if ( rc != SA_AIS_OK) { > LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s], > opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId); > -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] AMFD: Do not let non-active AMFD receives CLM track callback [#2141]
Hi Nagu, The patch has already had a variable to memorize the track start/stop status. Wouldn't it be nice to avoid a node reboot (as consequence of AMFD's exit) due to TRY_AGAIN/TIMEOUT which could be classified as temporary service denial? And retry to stop tracking until coming back to as active one. Thanks, Minh On 30/11/16 17:25, Nagendra Kumar wrote: > Hi Minh, > I think the patch could be made simpler by again try to stop CLM > track(keep a variable to indicate it wasn't stopped before) when AMFD becomes > Standby. If it doesn't get stopped again, then exit Amfd as it is kind of > double fault. > > Thanks > -Nagu > >> -Original Message- >> From: minh chau [mailto:minh.c...@dektech.com.au] >> Sent: 30 November 2016 10:40 >> To: praveen malviya;hans.nordeb...@ericsson.com; Nagendra Kumar; >> gary@dektech.com.au;long.hb.ngu...@dektech.com.au >> Cc:opensaf-devel@lists.sourceforge.net >> Subject: Re: [PATCH 1 of 1] AMFD: Do not let non-active AMFD receives CLM >> track callback [#2141] >> >> Hi Praveen, >> >> I have seen a limitation of using the same job queue now, that it will >> increase >> the possibility of loss of IMM update job which is causing incorrect headless >> recovery. The timer is one alternative. We had spawned a thread to deal with >> TRY_AGAIN/TIMEOUT in case of start cluster tracking, so stop cluster >> tracking should be handled in a similar way. >> I will create another ticket and assign it to myself to introduce a thread >> handler for using CLM service in AMF. Some background/similar >> tickets: #1609, #1531. For the scope of this ticket, the floated patch + a >> correction from your comment that standby AMFD responds CLM callback, >> they should be enough to avoid the reported problem in #2141. Do you >> agree? >> >> Thanks, >> Minh >> >> >> On 21/11/16 22:37, praveen malviya wrote: >>> Hi Minh, >>> >>> If the intention is to use same job queue then, I think, one problem >>> will come: >>> "Currently AMFD empties job queue when AMFD becomes quiesced in >>> avd_mds_qsd_role_evh(). Most probably before emptying, AMFD will try >>> to stop CLM tracking and now if the API again fails then this failure >>> cannot be handled as AMFD cannot be blocked in avd_mds_qsd_role_evh() >>> for long time. And then AMFD will empty the job queue." >>> >>> Can we think of some way to avoid above problem? One solution, without >>> using job queue, could be: When CLM track fails, run a timer of 1 >>> seconds. When this timer expires, check role of AMFD. If the role is >>> still standby and Track stop is pending then now AMFD will give a try. >>> If it again fails then rerun the timer based on the value of return >>> code. We can decide how many times this can be tried. >>> >>> One more thing. on standby controller if clm track has failed then it >>> should respond for any track callback that it receives without taking >>> any action. This needs to be done irrespective of the approach of the >>> solution. >>> >>> Thanks, >>> Praveen >>> >>> On 21-Nov-16 4:38 PM, minh chau wrote: Hi Praveen, I am thinking if creating a CLM job (likely IMM update, NTF send job) in order to stop tracking at non-active AMFD. Should it be a better version than this patch that AMFD will still have chance to stop tracking until being promoted back to active role? Thanks, Minh On 18/11/16 11:38, minh chau wrote: > Hi Praveen, > > It has been seen once in swapping 2N Opensaf SI. It was just an error > indicating that AMFD failed to stop track callback, after that the > si-swap also succeeded. CLM service could be busy during switch-over > but AMF (as well as other services) is supposed to handle that return > code properly as AMFD has done in track start. Further testing, it > exposed a problem due to clm track callback received on standby AMFD. > > AMFD can not be too busy just try to stop clm track. First option, > AMFD can start another thread which just tries to stop track callback, > but that's likely to generate more problems where this standby AMFD is > promoted back to active in the meantime that thread of stopping track > is on going. At that time, main thread will be starting track and > another thread is trying to stop track. > > Second option as in the patch, AMFD retries to stop track where > problem happens in callback. At the time standby AMFD switching back > to active, we stop and start track again, but it seems no point of > doing this. I hope it's less impact on existing behavior, do you see > any problems? > > Regarding standby controller with locked/disabled CLM, I think it is > mentioned (in #1828 or some where else?) that this matter requires > more thoughts as whether we want Ha role assigned to AMFD in non > member node. It seems to beyond this ticket. > > Thanks, > Minh > > On 17/11/16 22:03,
Re: [devel] [PATCH 0 of 3] Review Request for leap : now leap library ensure shm availability before writing [#2202]
Dear Mahesh, ACK all three patches, tested, found no problem. Sincerely, Hoang -Original Message- From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] Sent: Tuesday, November 29, 2016 5:37 PM To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: [PATCH 0 of 3] Review Request for leap : now leap library ensure shm availability before writing [#2202] Summary:leap : now leap library ensure shm availability before writing [#2202] Review request for Trac Ticket(s): #2202 Peer Reviewer(s): Hoang / Ramesh Pull request to: <> Affected branch(es): <> Development branch: <> Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesn OpenSAF servicesy Core libraries y Samples n Tests n Other n Comments (indicate scope for each "y" above): - changeset 7b53e1b3754622fe90c22c801adeb7df6d808c30 Author: A V MaheshDate: Tue, 29 Nov 2016 15:59:21 +0530 leap : now leap library ensure shm availability before writing [#2202] Issue : If OSAF_CKPT_SHM_ALLOC_GUARANTEE is NOT set and SHM is 100% used in system , pnd Segmentation fault (core dumped) at LEAP memcpy(). Fix : Now LEAP library ensures shm free space before writing This may degrade some performance of cpsv , if OSAF_CKPT_SHM_ALLOC_GUARANTEE is set, cpsv give natural performance. changeset 083114e13c00c9c4267ffe65a86c1a97a951b876 Author: A V Mahesh Date: Tue, 29 Nov 2016 16:02:06 +0530 cpsv : update cpsv error handing based on leap changes [#2202] changeset fb509abb1d1583315f585663fd75bf73e35211a6 Author: A V Mahesh Date: Tue, 29 Nov 2016 16:02:58 +0530 mqsv : update mqsv error handing based on leap changes [#2202] Complete diffstat: -- osaf/libs/common/cpsv/include/cpnd_cb.h | 4 ++-- osaf/libs/common/cpsv/include/cpnd_init.h | 8 osaf/libs/common/cpsv/include/cpnd_sec.h | 2 +- osaf/libs/core/include/ncs_osprm.h| 2 +- osaf/libs/core/leap/os_defs.c | 20 ++-- osaf/services/saf/cpsv/cpnd/cpnd_db.c | 12 ++-- osaf/services/saf/cpsv/cpnd/cpnd_evt.c| 82 +--- -- osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 31 ++- osaf/services/saf/cpsv/cpnd/cpnd_res.c| 24 osaf/services/saf/cpsv/cpnd/cpnd_sec.cc | 12 ++-- osaf/services/saf/glsv/glnd/glnd_shm.c| 2 +- osaf/services/saf/mqsv/mqnd/mqnd_shm.c| 2 +- 12 files changed, 123 insertions(+), 78 deletions(-) Testing Commands: - Create situation that node SHM reaches 100% usage and then perform any CPSV operation which writes to SHM Testing, Expected Results: -- <> Conditions of Submission: - <> Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which
Re: [devel] [PATCH 2 of 3] cpsv : update cpsv error handing based on leap changes [#2202]
Hi Hoang, On 12/1/2016 9:15 AM, Vo Minh Hoang wrote: > Dear Mahesh, > > I have one small concern about: > Please consider get CB data by ncshm_take_hdl() or pass in ensures_space as > parameter. > When CB is global access, adding it to parameters of many API is a little > bit inconvenient. The `cb->shm_alloc_guaranteed = atoi(ptr);` only update at the tim of cpnd_lib_init() by reading OSAF_CKPT_SHM_ALLOC_GUARANTEE env and no further run-time update, so ideally you dont required lock. Even the for CPSV API`s cb lock was taken and so the call shm will automatically under lock do see any specific missing place in API code ? -AVM > And passing CB for just one ensures_space param might confuse the usage of > function, maybe in future use. > > This is just my opinion, please consider about it. > > Thank you and best regards, > Hoang > > > -Original Message- > From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] > Sent: Tuesday, November 29, 2016 5:37 PM > To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com > Cc: opensaf-devel@lists.sourceforge.net > Subject: [PATCH 2 of 3] cpsv : update cpsv error handing based on leap > changes [#2202] > > osaf/libs/common/cpsv/include/cpnd_cb.h | 4 +- > osaf/libs/common/cpsv/include/cpnd_init.h | 8 +- > osaf/libs/common/cpsv/include/cpnd_sec.h | 2 +- > osaf/libs/core/include/ncs_osprm.h| 2 +- > osaf/services/saf/cpsv/cpnd/cpnd_db.c | 12 ++-- > osaf/services/saf/cpsv/cpnd/cpnd_evt.c| 82 > +- > osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 31 ++ > osaf/services/saf/cpsv/cpnd/cpnd_res.c| 24 +++-- > osaf/services/saf/cpsv/cpnd/cpnd_sec.cc | 12 ++-- > 9 files changed, 103 insertions(+), 74 deletions(-) > > > diff --git a/osaf/libs/common/cpsv/include/cpnd_cb.h > b/osaf/libs/common/cpsv/include/cpnd_cb.h > --- a/osaf/libs/common/cpsv/include/cpnd_cb.h > +++ b/osaf/libs/common/cpsv/include/cpnd_cb.h > @@ -341,8 +341,8 @@ uint32_t cpnd_amf_register(CPND_CB *cpnd uint32_t > cpnd_amf_deregister(CPND_CB *cpnd_cb); uint32_t > cpnd_client_extract_bits(uint32_t bitmap_value, uint32_t *bit_position); > uint32_t cpnd_res_ckpt_sec_del(CPND_CKPT_NODE *cp_node); -uint32_t > cpnd_ckpt_replica_create_res(NCS_OS_POSIX_SHM_REQ_INFO *open_req, char *buf, > CPND_CKPT_NODE **cp_node, > - uint32_t ref_cnt, CKPT_INFO > *cp_info, bool shm_alloc_guaranteed); > +uint32_t cpnd_ckpt_replica_create_res(CPND_CB *cb, > NCS_OS_POSIX_SHM_REQ_INFO *open_req, char *buf, CPND_CKPT_NODE **cp_node, > + uint32_t ref_cnt, CKPT_INFO > *cp_info); > int32_t cpnd_find_free_loc(CPND_CB *cb, CPND_TYPE_INFO type); uint32_t > cpnd_ckpt_write_header(CPND_CB *cb, uint32_t nckpts); uint32_t > cpnd_cli_info_write_header(CPND_CB *cb, int32_t n_clients); diff --git > a/osaf/libs/common/cpsv/include/cpnd_init.h > b/osaf/libs/common/cpsv/include/cpnd_init.h > --- a/osaf/libs/common/cpsv/include/cpnd_init.h > +++ b/osaf/libs/common/cpsv/include/cpnd_init.h > @@ -90,7 +90,7 @@ uint32_t cpnd_ckpt_replica_create(CPND_C uint32_t > cpnd_ckpt_remote_cpnd_add(CPND_CKPT_NODE *cp_node, MDS_DEST mds_info); > uint32_t cpnd_ckpt_remote_cpnd_del(CPND_CKPT_NODE *cp_node, MDS_DEST > mds_info); int32_t cpnd_ckpt_get_lck_sec_id(CPND_CKPT_NODE *cp_node); > -uint32_t cpnd_ckpt_sec_write(CPND_CKPT_NODE *cp_node, > CPND_CKPT_SECTION_INFO > +uint32_t cpnd_ckpt_sec_write(CPND_CB *cb, CPND_CKPT_NODE *cp_node, > +CPND_CKPT_SECTION_INFO > *sec_info, const void *data, uint64_t size, > uint64_t offset, uint32_t type); uint32_t cpnd_ckpt_sec_read(CPND_CKPT_NODE > *cp_node, CPND_CKPT_SECTION_INFO >*sec_info, void *data, uint64_t size, uint64_t > offset); @@ -164,7 +164,7 @@ void cpnd_evt_node_getnext(CPND_CB *cb, > uint32_t cpnd_evt_node_add(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE > *evt_node); uint32_t cpnd_evt_node_del(CPND_CB *cb, > CPSV_CPND_ALL_REPL_EVT_NODE *evt_node); CPND_CKPT_NODE > *cpnd_ckpt_node_find_by_name(CPND_CB *cpnd_cb, SaConstStringT ckpt_name); > -CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CKPT_NODE *cp_node, > SaCkptSectionIdT *id, SaTimeT exp_time, > +CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CB *cb, CPND_CKPT_NODE > +*cp_node, SaCkptSectionIdT *id, SaTimeT exp_time, > uint32_t gen_flag); > void cpnd_evt_backup_queue_add(CPND_CKPT_NODE *cp_node, CPND_EVT *evt); > uint32_t cpnd_ckpt_node_tree_init(CPND_CB *cb); @@ -176,8 +176,8 @@ void > cpnd_client_node_tree_cleanup(CPND_ > void cpnd_client_node_tree_destroy(CPND_CB *cb); void > cpnd_allrepl_write_evt_node_tree_cleanup(CPND_CB *cb); void > cpnd_allrepl_write_evt_node_tree_destroy(CPND_CB *cb); -uint32_t > cpnd_sec_hdr_update(CPND_CKPT_SECTION_INFO *pSecPtr, CPND_CKPT_NODE > *cp_node); -uint32_t cpnd_ckpt_hdr_update(CPND_CKPT_NODE *cp_node); > +uint32_t cpnd_sec_hdr_update(CPND_CB
Re: [devel] [PATCH 2 of 3] cpsv : update cpsv error handing based on leap changes [#2202]
Dear Mahesh, I have one small concern about: Please consider get CB data by ncshm_take_hdl() or pass in ensures_space as parameter. When CB is global access, adding it to parameters of many API is a little bit inconvenient. And passing CB for just one ensures_space param might confuse the usage of function, maybe in future use. This is just my opinion, please consider about it. Thank you and best regards, Hoang -Original Message- From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] Sent: Tuesday, November 29, 2016 5:37 PM To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: [PATCH 2 of 3] cpsv : update cpsv error handing based on leap changes [#2202] osaf/libs/common/cpsv/include/cpnd_cb.h | 4 +- osaf/libs/common/cpsv/include/cpnd_init.h | 8 +- osaf/libs/common/cpsv/include/cpnd_sec.h | 2 +- osaf/libs/core/include/ncs_osprm.h| 2 +- osaf/services/saf/cpsv/cpnd/cpnd_db.c | 12 ++-- osaf/services/saf/cpsv/cpnd/cpnd_evt.c| 82 +- osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 31 ++ osaf/services/saf/cpsv/cpnd/cpnd_res.c| 24 +++-- osaf/services/saf/cpsv/cpnd/cpnd_sec.cc | 12 ++-- 9 files changed, 103 insertions(+), 74 deletions(-) diff --git a/osaf/libs/common/cpsv/include/cpnd_cb.h b/osaf/libs/common/cpsv/include/cpnd_cb.h --- a/osaf/libs/common/cpsv/include/cpnd_cb.h +++ b/osaf/libs/common/cpsv/include/cpnd_cb.h @@ -341,8 +341,8 @@ uint32_t cpnd_amf_register(CPND_CB *cpnd uint32_t cpnd_amf_deregister(CPND_CB *cpnd_cb); uint32_t cpnd_client_extract_bits(uint32_t bitmap_value, uint32_t *bit_position); uint32_t cpnd_res_ckpt_sec_del(CPND_CKPT_NODE *cp_node); -uint32_t cpnd_ckpt_replica_create_res(NCS_OS_POSIX_SHM_REQ_INFO *open_req, char *buf, CPND_CKPT_NODE **cp_node, - uint32_t ref_cnt, CKPT_INFO *cp_info, bool shm_alloc_guaranteed); +uint32_t cpnd_ckpt_replica_create_res(CPND_CB *cb, NCS_OS_POSIX_SHM_REQ_INFO *open_req, char *buf, CPND_CKPT_NODE **cp_node, + uint32_t ref_cnt, CKPT_INFO *cp_info); int32_t cpnd_find_free_loc(CPND_CB *cb, CPND_TYPE_INFO type); uint32_t cpnd_ckpt_write_header(CPND_CB *cb, uint32_t nckpts); uint32_t cpnd_cli_info_write_header(CPND_CB *cb, int32_t n_clients); diff --git a/osaf/libs/common/cpsv/include/cpnd_init.h b/osaf/libs/common/cpsv/include/cpnd_init.h --- a/osaf/libs/common/cpsv/include/cpnd_init.h +++ b/osaf/libs/common/cpsv/include/cpnd_init.h @@ -90,7 +90,7 @@ uint32_t cpnd_ckpt_replica_create(CPND_C uint32_t cpnd_ckpt_remote_cpnd_add(CPND_CKPT_NODE *cp_node, MDS_DEST mds_info); uint32_t cpnd_ckpt_remote_cpnd_del(CPND_CKPT_NODE *cp_node, MDS_DEST mds_info); int32_t cpnd_ckpt_get_lck_sec_id(CPND_CKPT_NODE *cp_node); -uint32_t cpnd_ckpt_sec_write(CPND_CKPT_NODE *cp_node, CPND_CKPT_SECTION_INFO +uint32_t cpnd_ckpt_sec_write(CPND_CB *cb, CPND_CKPT_NODE *cp_node, +CPND_CKPT_SECTION_INFO *sec_info, const void *data, uint64_t size, uint64_t offset, uint32_t type); uint32_t cpnd_ckpt_sec_read(CPND_CKPT_NODE *cp_node, CPND_CKPT_SECTION_INFO *sec_info, void *data, uint64_t size, uint64_t offset); @@ -164,7 +164,7 @@ void cpnd_evt_node_getnext(CPND_CB *cb, uint32_t cpnd_evt_node_add(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE *evt_node); uint32_t cpnd_evt_node_del(CPND_CB *cb, CPSV_CPND_ALL_REPL_EVT_NODE *evt_node); CPND_CKPT_NODE *cpnd_ckpt_node_find_by_name(CPND_CB *cpnd_cb, SaConstStringT ckpt_name); -CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CKPT_NODE *cp_node, SaCkptSectionIdT *id, SaTimeT exp_time, +CPND_CKPT_SECTION_INFO *cpnd_ckpt_sec_add(CPND_CB *cb, CPND_CKPT_NODE +*cp_node, SaCkptSectionIdT *id, SaTimeT exp_time, uint32_t gen_flag); void cpnd_evt_backup_queue_add(CPND_CKPT_NODE *cp_node, CPND_EVT *evt); uint32_t cpnd_ckpt_node_tree_init(CPND_CB *cb); @@ -176,8 +176,8 @@ void cpnd_client_node_tree_cleanup(CPND_ void cpnd_client_node_tree_destroy(CPND_CB *cb); void cpnd_allrepl_write_evt_node_tree_cleanup(CPND_CB *cb); void cpnd_allrepl_write_evt_node_tree_destroy(CPND_CB *cb); -uint32_t cpnd_sec_hdr_update(CPND_CKPT_SECTION_INFO *pSecPtr, CPND_CKPT_NODE *cp_node); -uint32_t cpnd_ckpt_hdr_update(CPND_CKPT_NODE *cp_node); +uint32_t cpnd_sec_hdr_update(CPND_CB *cb, CPND_CKPT_SECTION_INFO +*pSecPtr, CPND_CKPT_NODE *cp_node); uint32_t +cpnd_ckpt_hdr_update(CPND_CB *cb, CPND_CKPT_NODE *cp_node); void cpnd_ckpt_node_destroy(CPND_CB *cb, CPND_CKPT_NODE *cp_node); uint32_t cpnd_get_slot_sub_slot_id_from_mds_dest(MDS_DEST dest); uint32_t cpnd_get_slot_sub_slot_id_from_node_id(NCS_NODE_ID i_node_id); diff --git a/osaf/libs/common/cpsv/include/cpnd_sec.h b/osaf/libs/common/cpsv/include/cpnd_sec.h --- a/osaf/libs/common/cpsv/include/cpnd_sec.h +++ b/osaf/libs/common/cpsv/include/cpnd_sec.h @@ -39,7 +39,7 @@ CPND_CKPT_SECTION_INFO *
[devel] [PATCH 0 of 1] Review Request for AMF: support restrictions to auto-repair V2 [#2144]
Summary: amf: support restrictions to auto-repair Review request for Trac Ticket(s): 2144 Peer Reviewer(s): praveen, nagu Pull request to: Affected branch(es): default Development branch: Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - This patch should be applied on top of the previous one. changeset e49b3bcc81901987abe8a221c3a561f883b91a57 Author: Alex JonesDate: Wed, 30 Nov 2016 15:32:19 -0500 amf: support restrictions to auto-repair V2 [#2144] This patch implements section 3.11.1.4.2 of AMF spec (Restrictions to Auto- Repair). Complete diffstat: -- osaf/libs/common/amf/d2nedu.c | 15 - osaf/libs/common/amf/d2nmsg.c | 12 osaf/libs/common/amf/include/amf_d2nmsg.h | 13 osaf/services/saf/amf/amfd/comp.cc | 2 +- osaf/services/saf/amf/amfd/include/su.h| 2 +- osaf/services/saf/amf/amfd/include/util.h | 1 + osaf/services/saf/amf/amfd/mds.cc | 2 +- osaf/services/saf/amf/amfd/sgproc.cc | 2 +- osaf/services/saf/amf/amfd/su.cc | 35 +++- osaf/services/saf/amf/amfd/util.cc | 74 ++ osaf/services/saf/amf/amfnd/clc.cc | 28 ++--- osaf/services/saf/amf/amfnd/err.cc | 20 +- osaf/services/saf/amf/amfnd/evt.cc | 2 + osaf/services/saf/amf/amfnd/include/avnd_evt.h | 1 + osaf/services/saf/amf/amfnd/include/avnd_mds.h | 4 +- osaf/services/saf/amf/amfnd/include/avnd_su.h | 3 + osaf/services/saf/amf/amfnd/main.cc| 1 + osaf/services/saf/amf/amfnd/mds.cc | 6 +- osaf/services/saf/amf/amfnd/su.cc | 46 osaf/services/saf/amf/amfnd/susm.cc| 10 ++- 20 files changed, 252 insertions(+), 27 deletions(-) Testing Commands: - 1) set saAmfSUMaintenanceCampaign for an SU 2) try different failure scenarios on components in the SU (kill a component, etc) Testing, Expected Results: -- 1) auto-repair should never be engaged while the saAmfSUMaintenanceCampaign is set Conditions of Submission: - Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) ___ Your computer have a badly
[devel] [PATCH 1 of 1] amf: support restrictions to auto-repair V2 [#2144]
osaf/libs/common/amf/d2nedu.c | 15 - osaf/libs/common/amf/d2nmsg.c | 12 osaf/libs/common/amf/include/amf_d2nmsg.h | 13 osaf/services/saf/amf/amfd/comp.cc | 2 +- osaf/services/saf/amf/amfd/include/su.h| 2 +- osaf/services/saf/amf/amfd/include/util.h | 1 + osaf/services/saf/amf/amfd/mds.cc | 2 +- osaf/services/saf/amf/amfd/sgproc.cc | 2 +- osaf/services/saf/amf/amfd/su.cc | 35 +++- osaf/services/saf/amf/amfd/util.cc | 74 ++ osaf/services/saf/amf/amfnd/clc.cc | 28 ++--- osaf/services/saf/amf/amfnd/err.cc | 20 +- osaf/services/saf/amf/amfnd/evt.cc | 2 + osaf/services/saf/amf/amfnd/include/avnd_evt.h | 1 + osaf/services/saf/amf/amfnd/include/avnd_mds.h | 4 +- osaf/services/saf/amf/amfnd/include/avnd_su.h | 3 + osaf/services/saf/amf/amfnd/main.cc| 1 + osaf/services/saf/amf/amfnd/mds.cc | 6 +- osaf/services/saf/amf/amfnd/su.cc | 46 osaf/services/saf/amf/amfnd/susm.cc| 10 ++- 20 files changed, 252 insertions(+), 27 deletions(-) This patch implements section 3.11.1.4.2 of AMF spec (Restrictions to Auto-Repair). diff --git a/osaf/libs/common/amf/d2nedu.c b/osaf/libs/common/amf/d2nedu.c --- a/osaf/libs/common/amf/d2nedu.c +++ b/osaf/libs/common/amf/d2nedu.c @@ -414,6 +414,16 @@ uint32_t avsv_edp_dnd_msg(EDU_HDL *hdl, {EDU_EXEC, avsv_edp_csi_attr_info, 0, 0, 0, (long)&((AVSV_DND_MSG*)0)->msg_info.d2n_compcsi_assign_msg_info.info.attrs, 0, NULL}, + /*AVSV_D2N_SU_MAINTENANCE_MSG_INFO*/ + {EDU_EXEC, ncs_edp_uns32, 0, 0, 0, +(long)&((AVSV_DND_MSG *)0)->msg_info.d2n_su_maintenance_msg_info.msg_id, 0, NULL}, +{EDU_EXEC, m_NCS_EDP_SACLMNODEIDT, 0, 0, 0, +(long)&((AVSV_DND_MSG *)0)->msg_info.d2n_su_maintenance_msg_info.node_id, 0, NULL}, + {EDU_EXEC, ncs_edp_sanamet, 0, 0, 0, + (long)&((AVSV_DND_MSG*)0)->msg_info.d2n_su_maintenance_msg_info.su, 0, NULL}, + {EDU_EXEC, ncs_edp_sanamet, 0, 0, EDU_EXIT, + (long)&((AVSV_DND_MSG*)0)->msg_info.d2n_su_maintenance_msg_info.suMaintenanceCampaign, 0, NULL}, + {EDU_END, 0, 0, 0, 0, 0, 0, NULL}, }; @@ -482,7 +492,8 @@ int avsv_dnd_msg_test_type_fnc(NCSCONTEX LCL_JMP_OFFSET_AVSV_D2N_REBOOT_MSG = 123, LCL_JMP_OFFSET_AVSV_N2D_ND_SISU_STATE_INFO_MSG = 125, LCL_JMP_OFFSET_AVSV_N2D_ND_CSICOMP_STATE_INFO_MSG = 131, - LCL_JMP_OFFSET_AVSV_D2N_COMPCSI_ASSIGN_MSG = 137 + LCL_JMP_OFFSET_AVSV_D2N_COMPCSI_ASSIGN_MSG = 137, + LCL_JMP_OFFSET_AVSV_D2N_SU_MAINTENANCE_MSG = 143 }; AVSV_DND_MSG_TYPE type; @@ -554,6 +565,8 @@ int avsv_dnd_msg_test_type_fnc(NCSCONTEX return LCL_JMP_OFFSET_AVSV_N2D_ND_CSICOMP_STATE_INFO_MSG ; case AVSV_D2N_COMPCSI_ASSIGN_MSG: return LCL_JMP_OFFSET_AVSV_D2N_COMPCSI_ASSIGN_MSG; + case AVSV_D2N_SU_MAINTENANCE_MSG: + return LCL_JMP_OFFSET_AVSV_D2N_SU_MAINTENANCE_MSG; default: break; diff --git a/osaf/libs/common/amf/d2nmsg.c b/osaf/libs/common/amf/d2nmsg.c --- a/osaf/libs/common/amf/d2nmsg.c +++ b/osaf/libs/common/amf/d2nmsg.c @@ -513,6 +513,15 @@ static void free_d2n_compcsi_info(AVSV_D } +static void free_d2n_su_maintenance_info(AVSV_DND_MSG *su_maintenance_msg) +{ + AVSV_D2N_SU_MAINTENANCE_MSG_INFO *su_maintenance = +_maintenance_msg->msg_info.d2n_su_maintenance_msg_info; + + osaf_extended_name_free(_maintenance->su); + osaf_extended_name_free(_maintenance->suMaintenanceCampaign); +} + / Name : avsv_dnd_msg_free @@ -609,6 +618,9 @@ void avsv_dnd_msg_free(AVSV_DND_MSG *msg case AVSV_D2N_COMPCSI_ASSIGN_MSG: free_d2n_compcsi_info(msg); break; + case AVSV_D2N_SU_MAINTENANCE_MSG: + free_d2n_su_maintenance_info(msg); + break; default: break; } diff --git a/osaf/libs/common/amf/include/amf_d2nmsg.h b/osaf/libs/common/amf/include/amf_d2nmsg.h --- a/osaf/libs/common/amf/include/amf_d2nmsg.h +++ b/osaf/libs/common/amf/include/amf_d2nmsg.h @@ -51,6 +51,7 @@ extern "C" { #define AVSV_AVD_AVND_MSG_FMT_VER_55 #define AVSV_AVD_AVND_MSG_FMT_VER_66 #define AVSV_AVD_AVND_MSG_FMT_VER_77 +#define AVSV_AVD_AVND_MSG_FMT_VER_88 /* Internode/External Components Validation result */ typedef enum { @@ -108,6 +109,7 @@ typedef enum { AVSV_N2D_ND_SISU_STATE_INFO_MSG, AVSV_N2D_ND_CSICOMP_STATE_INFO_MSG, AVSV_D2N_COMPCSI_ASSIGN_MSG, +
Re: [devel] [PATCH 1 of 1] imm:allow augumentCcbInit with ROF as false in completed callback[#1956] V2
Hi Neelakanta, The new objectName variable is not used in the code. I would rather like to see adminOwnerName as some sort of string type than using SaNameT. This is not an issue in the patch, only the suggestion. The reason for this is to try to avoid using SaNameT type as much as possible, and replace it with SaImmAdminOwnerNameT (or other sting type). Find other comments inline -Original Message- From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com] Sent: den 30 november 2016 06:39 To: Zoran Milinkovic; Hung Duc Nguyen Cc: opensaf-devel@lists.sourceforge.net Subject: [PATCH 1 of 1] imm:allow augumentCcbInit with ROF as false in completed callback[#1956] V2 osaf/libs/agents/saf/imma/imma_cb.h | 5 + osaf/libs/agents/saf/imma/imma_db.c | 99 + osaf/libs/agents/saf/imma/imma_oi_api.c | 60 +++ osaf/libs/agents/saf/imma/imma_proc.c | 64 ++--- 4 files changed, 133 insertions(+), 95 deletions(-) Two new variables objectName and adminOwnerName added to imma_oi_ccb_record. For CCB create operation adminOwnerName will be added. For CCB delete/modify operation objectName will be added diff --git a/osaf/libs/agents/saf/imma/imma_cb.h b/osaf/libs/agents/saf/imma/imma_cb.h --- a/osaf/libs/agents/saf/imma/imma_cb.h +++ b/osaf/libs/agents/saf/imma/imma_cb.h @@ -36,6 +36,10 @@ struct imma_oi_ccb_record { struct imma_callback_info* ccbCallback; SaImmHandleT privateAugOmHandle; SaImmAdminOwnerHandleT privateAoHandle; + SaNameT objectName;/* The Object name from the modif/delete ccb operationi. The objectName is used to obtain + adminOnerName when saImmOiAugmentCcbInitialize is called in completed-callback*/ [Zoran] objectName is not used in the code. It can be removed from the patch + SaNameT adminOwnerName; /* adminowner name of the ccb id, assigned at ccb-create-callback.The adminOwnerName is used +saImmOiAugmentCcbInitialize is called in completed-callback*/ }; typedef struct imma_client_node { @@ -197,6 +201,7 @@ int imma_oi_ccb_record_set_critical(IMMA int imma_oi_ccb_record_terminate(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT ccbId); int imma_oi_ccb_record_abort(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT ccbId); int imma_oi_ccb_record_exists(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT ccbId); +struct imma_oi_ccb_record * imma_oi_ccb_record_find(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT ccbId); int imma_oi_ccb_record_set_error(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT ccbId, const SaStringT errorString); SaStringT imma_oi_ccb_record_get_error(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT ccbId); void imma_oi_ccb_allow_error_string(IMMA_CLIENT_NODE *cl_node, SaImmOiCcbIdT ccbId); diff --git a/osaf/libs/agents/saf/imma/imma_db.c b/osaf/libs/agents/saf/imma/imma_db.c --- a/osaf/libs/agents/saf/imma/imma_db.c +++ b/osaf/libs/agents/saf/imma/imma_db.c @@ -24,6 +24,8 @@ */ #include "imma.h" +#include "osaf_extended_name.h" + / Name : imma_client_tree_init Description : This routine is used to initialize the client tree @@ -273,6 +275,9 @@ int imma_oi_ccb_record_delete(IMMA_CLIEN to_delete->mCcbErrorString = NULL; } + osaf_extended_name_free(&(to_delete->objectName)); + osaf_extended_name_free(&(to_delete->adminOwnerName)); + free(to_delete); TRACE_LEAVE(); return 1; @@ -541,6 +546,11 @@ int imma_oi_ccb_record_note_callback(IMM */ int rs = 0; + const SaImmAttrNameT admoNameAttr = SA_IMM_ATTR_ADMIN_OWNER_NAME; + SaImmAttrValuesT_2 **attr, **attributes = NULL; + SaImmAttrValuesT_2 *attrVal = NULL; + size_t attrDataSize = 0; + struct imma_oi_ccb_record *tmp = imma_oi_ccb_record_find(cl_node, ccbId); if(tmp && !(tmp->isAborted)) { @@ -553,7 +563,96 @@ int imma_oi_ccb_record_note_callback(IMM rs = 1; } } + if(callback){ + [Zoran] Remove extra lines + int noOfAttributes = 0; + + /* NOTE: The code below is practically a copy of the code + in immOm searchNext, for serving the attrValues structure. + This code should be factored out into some common function. +*/ + IMMSV_ATTR_VALUES_LIST *p = callback->attrValues; + int i = 0; + while (p) { + ++noOfAttributes; + p = p->next; + } + + p = callback->attrValues; + + + if(cl_node->isImmA2fCbk) { +
Re: [devel] [PATCH 1 of 1] amfnd: honour max num of retries for comp instantiation during repair[#2059]
ack, code review only. One comment though, it would be good in to reduce the cyclomatic complexity in amf as mentioned in the refactoring tickets. This is particularly true for e.g clc.cc. Last time I run pmccabe the numbers for clc.cc were around 115 which is extraordinary high, now the number is 181, (not introduced by this patch, but the patch adds a small amount to the numbers). It would be good to re-factor the complex if statements to reusable functions instead. Numbers around 30-40 means the function wil be harder to understand and more difficult to get acceptable code coverage. pmccabe clc.cc | sort -nr | head -10 181 181 255 927 566 clc.cc(927): avnd_comp_clc_st_chng_prc 37 39 68 758 154 clc.cc(758): avnd_comp_clc_fsm_run 23 23 34 202485 clc.cc(2024): avnd_comp_clc_inst_clean_hdler /Regards HansN On 09/29/2016 12:08 PM, praveen.malv...@oracle.com wrote: > osaf/services/saf/amf/amfnd/clc.cc | 8 > 1 files changed, 8 insertions(+), 0 deletions(-) > > > SU stuck in TERMINATING state when comp fails to instantiate as part of > repair. > Repair was performed after comp-failover recovery. > > Comp faults with comp-failover recovery. After removal of assignments, AMFNFD > tries to repair component by trying to instantiate it. Since instantiate > scripit > is not present, comp instantiation fails and AMFND cleans up it succesfully. > After > successful cleanup AMFND niether tries to instantiate comp again not it marks > comp > and SU to INST_FAILED state. AMFND should mark the comp and SU to INST_FAILED > state > after finishing instantiation of comp MAX_TRY times. If the value is not > configured > for the comp then AMFND should honour the default value of > saAmfNumMaxInstantiateWithoutDelay=2. > > Patch honours saAmfNumMaxInstantiateWithoutDelay and if comp instantiation > fails after > these many retries then both comp and SU moves to INST_FAILED state. > > diff --git a/osaf/services/saf/amf/amfnd/clc.cc > b/osaf/services/saf/amf/amfnd/clc.cc > --- a/osaf/services/saf/amf/amfnd/clc.cc > +++ b/osaf/services/saf/amf/amfnd/clc.cc > @@ -1785,6 +1785,14 @@ static bool is_failed_comp_eligible_for_ >/*A failed component is eligible for instantiation > in the context of comp-restart recovery.*/ > return true; > + } else {//SU is failed. > + /*As a part of repair comp instantiation failed > +and it was cleaned up again.*/ > + if (m_AVND_COMP_IS_FAILED(comp) && !comp->csi_list.n_nodes && > + (!m_AVND_SU_IS_ADMN_TERM(comp->su)) && > + (avnd_cb->oper_state == SA_AMF_OPERATIONAL_ENABLED) && > + (!m_AVND_IS_SHUTTING_DOWN(avnd_cb)) && > !sufailover_in_progress(comp->su)) > + return true; >} > } > -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] smf: Avoid uncditional sleep when calling adminoperation[#2211]
Hi Lennart, This defect is related to unconditional delay of 2 seconds. will open another defect/enhancement to fix Smfutil unpredictable timeout and use smfutil in nodegroups as well . Thanks, Neel. On 2016/11/30 02:54 PM, Lennart Lund wrote: > Hi Neel > > In handling of admin state of node groups there is a similar loop that from > the beginning was a copy of the loop in SmfUtils. I have changed this loop to > remove the incorrect 2 sec sleep and also fixed the unpredictable timeout > time for the loop. I was thinking that the loop in SmfUtils should be fixed > in the same way and also make it possible to use the SmfUtil admin operation > handling with node groups as well. What do you think about this? > > Thanks > Lennart > >> -Original Message- >> From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com] >> Sent: den 30 november 2016 08:53 >> To: Lennart Lund; Rafael Odzakow >> ; Tai Chi Dinh >> Cc: opensaf-devel@lists.sourceforge.net >> Subject: [PATCH 1 of 1] smf: Avoid uncditional sleep when calling >> adminoperation[#2211] >> >> osaf/services/saf/smfsv/smfd/SmfUtils.cc | 26 +++--- >> 1 files changed, 15 insertions(+), 11 deletions(-) >> >> >> diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc >> b/osaf/services/saf/smfsv/smfd/SmfUtils.cc >> --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc >> +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc >> @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st >> } >> >> /* Call the admin operation */ >> -do { >> -TRACE("call immutil_saImmOmAdminOperationInvoke_2"); >> + >> +TRACE("call immutil_saImmOmAdminOperationInvoke_2"); >> +rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, >> , 0, i_operationId, i_params, >> +, i_timeout); >> +while ((rc == SA_AIS_OK) && (returnValue == >> SA_AIS_ERR_TRY_AGAIN)){ >> +sleep(2); >> rc = >> immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, >> , 0, i_operationId, i_params, >> - , >> i_timeout); >> -if (retry <= 0) { >> -LOG_NO("Fail to invoke admin operation, too many >> SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", >> - i_dn.c_str(), i_operationId); >> -rc = SA_AIS_ERR_TRY_AGAIN; >> -goto done; >> -} >> -sleep(2); >> +, i_timeout); >> retry--; >> -} while ((rc == SA_AIS_OK) && (returnValue == >> SA_AIS_ERR_TRY_AGAIN)); >> +} >> + >> +if (retry <= 0) { >> +LOG_NO("Fail to invoke admin operation, too many >> SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", >> +i_dn.c_str(), i_operationId); >> +rc = SA_AIS_ERR_TRY_AGAIN; >> +goto done; >> +} >> >> if ( rc != SA_AIS_OK) { >> LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s], >> opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId); -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] fm: Add support for differentiating a hung node versus a stopped node V3 [#2160]
Hi, Has anyone had the time to look at this patch? /Regards HansN -Original Message- From: Hans Nordebäck Sent: den 23 november 2016 14:45 To: ramesh.bet...@oracle.com; mathi.naic...@oracle.com; Anders WidellCc: opensaf-devel@lists.sourceforge.net Subject: [devel] [PATCH 1 of 1] fm: Add support for differentiating a hung node versus a stopped node V3 [#2160] osaf/services/infrastructure/fm/fms/fm_cb.h | 1 + osaf/services/infrastructure/fm/fms/fm_evt.h | 1 + osaf/services/infrastructure/fm/fms/fm_main.c | 54 +++--- osaf/services/infrastructure/fm/fms/fm_mds.c | 12 +- 4 files changed, 59 insertions(+), 9 deletions(-) diff --git a/osaf/services/infrastructure/fm/fms/fm_cb.h b/osaf/services/infrastructure/fm/fms/fm_cb.h --- a/osaf/services/infrastructure/fm/fms/fm_cb.h +++ b/osaf/services/infrastructure/fm/fms/fm_cb.h @@ -106,6 +106,7 @@ typedef struct fm_cb { SaClmHandleT clm_hdl; bool use_remote_fencing; SaNameT peer_clm_node_name; + bool peer_node_terminated; } FM_CB; extern char *role_string[]; diff --git a/osaf/services/infrastructure/fm/fms/fm_evt.h b/osaf/services/infrastructure/fm/fms/fm_evt.h --- a/osaf/services/infrastructure/fm/fms/fm_evt.h +++ b/osaf/services/infrastructure/fm/fms/fm_evt.h @@ -21,6 +21,7 @@ /* EVT from other GFM over MDS.*/ typedef enum { GFM_GFM_EVT_NODE_INFO_EXCHANGE, + GFM_GFM_EVT_PEER_IS_TERMINATING, GFM_GFM_EVT_MAX } GFM_GFM_MSG_TYPE; diff --git a/osaf/services/infrastructure/fm/fms/fm_main.c b/osaf/services/infrastructure/fm/fms/fm_main.c --- a/osaf/services/infrastructure/fm/fms/fm_main.c +++ b/osaf/services/infrastructure/fm/fms/fm_main.c @@ -59,6 +59,7 @@ char *role_string[] = { "UNDEFINED", "AC static uint32_t fm_agents_startup(void); static uint32_t fm_get_args(FM_CB *); static uint32_t fms_fms_exchange_node_info(FM_CB *); +static uint32_t fms_fms_inform_terminating(FM_CB *fm_cb); static uint32_t fm_nid_notify(uint32_t); static uint32_t fm_tmr_start(FM_TMR *, SaTimeT); static SaAisErrorT get_peer_clm_node_name(NODE_ID); @@ -280,6 +281,7 @@ int main(int argc, char *argv[]) } if (fds[FD_TERM].revents & POLLIN) { + fms_fms_inform_terminating(fm_cb); daemon_exit(); } @@ -622,8 +624,12 @@ static void fm_mbx_msg_handler(FM_CB *fm * node_down event has been received. */ if (fm_cb->use_remote_fencing) { - opensaf_reboot(fm_cb->peer_node_id, (char *)fm_cb->peer_clm_node_name.value, - "Received Node Down for peer controller"); + if (fm_cb->peer_node_terminated == false) { + opensaf_reboot(fm_cb->peer_node_id, (char *)fm_cb->peer_clm_node_name.value, + "Received Node Down for peer controller"); + } else { + LOG_NO("Peer node %s is terminated, fencing will not be performed", fm_cb->peer_clm_node_name.value); + } } else { opensaf_reboot(fm_cb->peer_node_id, (char *)fm_cb->peer_node_name.value, "Received Node Down for peer controller"); @@ -661,11 +667,12 @@ static void fm_mbx_msg_handler(FM_CB *fm LOG_NO("Reseting peer controller node id: %x", fm_cb->peer_node_id); if (fm_cb->use_remote_fencing) { - LOG_NO("saClmClusterNodeGet succeeded node_id 0x%X, clm peer node name %s", - fm_mbx_evt->node_id, fm_cb->peer_clm_node_name.value); - - opensaf_reboot(fm_cb->peer_node_id, (char *)fm_cb->peer_clm_node_name.value, - "Received Node Down for peer controller"); + if (fm_cb->peer_node_terminated == false) { + opensaf_reboot(fm_cb->peer_node_id, (char *)fm_cb->peer_clm_node_name.value, + "Received Node Down for peer controller"); + } else { + LOG_NO("Peer node %s is terminated, fencing will not be performed", fm_cb->peer_clm_node_name.value); + } } else { opensaf_reboot(fm_cb->peer_node_id, (char *)fm_cb->peer_node_name.value,
Re: [devel] [PATCH 1 of 1] smf: Avoid uncditional sleep when calling adminoperation[#2211]
Hi Neel In handling of admin state of node groups there is a similar loop that from the beginning was a copy of the loop in SmfUtils. I have changed this loop to remove the incorrect 2 sec sleep and also fixed the unpredictable timeout time for the loop. I was thinking that the loop in SmfUtils should be fixed in the same way and also make it possible to use the SmfUtil admin operation handling with node groups as well. What do you think about this? Thanks Lennart > -Original Message- > From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com] > Sent: den 30 november 2016 08:53 > To: Lennart Lund; Rafael Odzakow > ; Tai Chi Dinh > Cc: opensaf-devel@lists.sourceforge.net > Subject: [PATCH 1 of 1] smf: Avoid uncditional sleep when calling > adminoperation[#2211] > > osaf/services/saf/smfsv/smfd/SmfUtils.cc | 26 +++--- > 1 files changed, 15 insertions(+), 11 deletions(-) > > > diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc > b/osaf/services/saf/smfsv/smfd/SmfUtils.cc > --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc > +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc > @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st > } > > /* Call the admin operation */ > - do { > - TRACE("call immutil_saImmOmAdminOperationInvoke_2"); > + > + TRACE("call immutil_saImmOmAdminOperationInvoke_2"); > + rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, > , 0, i_operationId, i_params, > + , i_timeout); > + while ((rc == SA_AIS_OK) && (returnValue == > SA_AIS_ERR_TRY_AGAIN)){ > + sleep(2); > rc = > immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, > , 0, i_operationId, i_params, > -, > i_timeout); > - if (retry <= 0) { > - LOG_NO("Fail to invoke admin operation, too many > SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", > -i_dn.c_str(), i_operationId); > - rc = SA_AIS_ERR_TRY_AGAIN; > - goto done; > - } > - sleep(2); > + , i_timeout); > retry--; > - } while ((rc == SA_AIS_OK) && (returnValue == > SA_AIS_ERR_TRY_AGAIN)); > + } > + > + if (retry <= 0) { > + LOG_NO("Fail to invoke admin operation, too many > SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", > + i_dn.c_str(), i_operationId); > + rc = SA_AIS_ERR_TRY_AGAIN; > + goto done; > + } > > if ( rc != SA_AIS_OK) { > LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s], > opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId); -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] smf: Avoid unconditional sleep when calling adminoperation[#2211] V1
Thank Neel, ACK from me. /Tai -Original Message- From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com] Sent: Wednesday, November 30, 2016 3:47 PM To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com; tai.d...@dektech.com.au Cc: opensaf-devel@lists.sourceforge.net Subject: [PATCH 1 of 1] smf: Avoid unconditional sleep when calling adminoperation[#2211] V1 osaf/services/saf/smfsv/smfd/SmfUtils.cc | 26 +++--- 1 files changed, 15 insertions(+), 11 deletions(-) diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc b/osaf/services/saf/smfsv/smfd/SmfUtils.cc --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st } /* Call the admin operation */ - do { - TRACE("call immutil_saImmOmAdminOperationInvoke_2"); + + TRACE("call immutil_saImmOmAdminOperationInvoke_2"); + rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, , 0, i_operationId, i_params, + , i_timeout); + while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN) && (retry > 0) ){ + sleep(2); rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, , 0, i_operationId, i_params, - , i_timeout); - if (retry <= 0) { - LOG_NO("Fail to invoke admin operation, too many SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", - i_dn.c_str(), i_operationId); - rc = SA_AIS_ERR_TRY_AGAIN; - goto done; - } - sleep(2); + , i_timeout); retry--; - } while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN)); + } + + if (retry <= 0) { + LOG_NO("Fail to invoke admin operation, too many SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", + i_dn.c_str(), i_operationId); + rc = SA_AIS_ERR_TRY_AGAIN; + goto done; + } if ( rc != SA_AIS_OK) { LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s], opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId); -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 1 of 1] smf: Avoid unconditional sleep when calling adminoperation[#2211] V1
osaf/services/saf/smfsv/smfd/SmfUtils.cc | 26 +++--- 1 files changed, 15 insertions(+), 11 deletions(-) diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc b/osaf/services/saf/smfsv/smfd/SmfUtils.cc --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st } /* Call the admin operation */ - do { - TRACE("call immutil_saImmOmAdminOperationInvoke_2"); + + TRACE("call immutil_saImmOmAdminOperationInvoke_2"); + rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, , 0, i_operationId, i_params, + , i_timeout); + while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN) && (retry > 0) ){ + sleep(2); rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, , 0, i_operationId, i_params, - , i_timeout); - if (retry <= 0) { - LOG_NO("Fail to invoke admin operation, too many SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", - i_dn.c_str(), i_operationId); - rc = SA_AIS_ERR_TRY_AGAIN; - goto done; - } - sleep(2); + , i_timeout); retry--; - } while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN)); + } + + if (retry <= 0) { + LOG_NO("Fail to invoke admin operation, too many SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", + i_dn.c_str(), i_operationId); + rc = SA_AIS_ERR_TRY_AGAIN; + goto done; + } if ( rc != SA_AIS_OK) { LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s], opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId); -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
[devel] [PATCH 0 of 1] Review Request for smf: Avoid unconditional sleep when calling adminoperation[#2211] V1
Summary:smf: Avoid unconditional sleep when calling adminoperation[#2211] V1 Review request for Trac Ticket(s): 2211 Peer Reviewer(s): Rafael, Lennart, Tai Affected branch(es): 5.0.x, 5.1.x, default Development branch: default Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesy OpenSAF servicesn Core libraries n Samples n Tests n Other n Comments (indicate scope for each "y" above): - changeset e2caf94233ab292190c7efe6623a845d58f798d5 Author: Neelakanta ReddyDate: Wed, 30 Nov 2016 14:13:54 +0530 smf: Avoid unconditional sleep when calling adminoperation[#2211] V1 Complete diffstat: -- osaf/services/saf/smfsv/smfd/SmfUtils.cc | 26 +++--- 1 files changed, 15 insertions(+), 11 deletions(-) Testing Commands: - Campaign should involve admin operation Testing, Expected Results: -- with this patch unconditional sleep of 2 seconds should not happen Conditions of Submission: - Ack from Reviewers Arch Built StartedLinux distro --- mipsn n mips64 n n x86 n n x86_64 y y powerpc n n powerpc64 n n Reviewer Checklist: --- [Submitters: make sure that your review doesn't trigger any checkmarks!] Your checkin has not passed review because (see checked entries): ___ Your RR template is generally incomplete; it has too many blank entries that need proper data filled in. ___ You have failed to nominate the proper persons for review and push. ___ Your patches do not have proper short+long header ___ You have grammar/spelling in your header that is unacceptable. ___ You have exceeded a sensible line length in your headers/comments/text. ___ You have failed to put in a proper Trac Ticket # into your commits. ___ You have incorrectly put/left internal data in your comments/files (i.e. internal bug tracking tool IDs, product names etc) ___ You have not given any evidence of testing beyond basic build tests. Demonstrate some level of runtime or other sanity testing. ___ You have ^M present in some of your files. These have to be removed. ___ You have needlessly changed whitespace or added whitespace crimes like trailing spaces, or spaces before tabs. ___ You have mixed real technical changes with whitespace and other cosmetic code cleanup changes. These have to be separate commits. ___ You need to refactor your submission into logical chunks; there is too much content into a single commit. ___ You have extraneous garbage in your review (merge commits etc) ___ You have giant attachments which should never have been sent; Instead you should place your content in a public tree to be pulled. ___ You have too many commits attached to an e-mail; resend as threaded commits, or place in a public tree for a pull. ___ You have resent this content multiple times without a clear indication of what has changed between each re-send. ___ You have failed to adequately and individually address all of the comments and change requests that were proposed in the initial review. ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) ___ Your computer have a badly configured date and time; confusing the the threaded patch review. ___ Your changes affect IPC mechanism, and you don't present any results for in-service upgradability test. ___ Your changes affect user manual and documentation, your patch series do not contain the patch that updates the Doxygen manual. -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 1 of 1] smf: Avoid uncditional sleep when calling adminoperation[#2211]
Hi Tai, Yes, you are correct. will add "retry > 0" and re-publish the patch. Thanks, Neel. On 2016/11/30 01:43 PM, Tai Dinh wrote: > Hi Neel, > > NACK with the patch. > Need to add 'retry > 0' as part of while loop condition. > > /Tai > -Original Message- > From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com] > Sent: Wednesday, November 30, 2016 2:53 PM > To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com; > tai.d...@dektech.com.au > Cc: opensaf-devel@lists.sourceforge.net > Subject: [PATCH 1 of 1] smf: Avoid uncditional sleep when calling > adminoperation[#2211] > > osaf/services/saf/smfsv/smfd/SmfUtils.cc | 26 +++--- > 1 files changed, 15 insertions(+), 11 deletions(-) > > > diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc > b/osaf/services/saf/smfsv/smfd/SmfUtils.cc > --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc > +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc > @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st > } > > /* Call the admin operation */ > - do { > - TRACE("call immutil_saImmOmAdminOperationInvoke_2"); > + > + TRACE("call immutil_saImmOmAdminOperationInvoke_2"); > + rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, > , 0, i_operationId, i_params, > + , i_timeout); > + while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN)){ > + sleep(2); > rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, > , 0, i_operationId, i_params, > -, > i_timeout); > - if (retry <= 0) { > - LOG_NO("Fail to invoke admin operation, too many > SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", > -i_dn.c_str(), i_operationId); > - rc = SA_AIS_ERR_TRY_AGAIN; > - goto done; > - } > - sleep(2); > + , i_timeout); > retry--; > - } while ((rc == SA_AIS_OK) && (returnValue == > SA_AIS_ERR_TRY_AGAIN)); > + } > + > + if (retry <= 0) { > + LOG_NO("Fail to invoke admin operation, too many > SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", > + i_dn.c_str(), i_operationId); > + rc = SA_AIS_ERR_TRY_AGAIN; > + goto done; > + } > > if ( rc != SA_AIS_OK) { > LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s], > opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId); > -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 0 of 3] Review Request for leap : now leap library ensure shm availability before writing [#2202]
Hi Hoang, We are some how able to simulated your test case. Following are the detailed steps how we reproduced , this test is generating same core dumb as below. But the provided patch resolved the issue, can you please test your self provide your observations , if the test is different please update . Attached :035_cpsv_2202_V2_debug.patch & 036_cpsv_2207_debug.patch Test application : cpsv_shm_2202.c == 1) /etc/init.d/opensafd stop 2) Change the defaults /dev/shm size to 3MB # vi /etc/fstab tmpfs And add following line ` tmpfs /dev/shm tmpfs defaults,size=3m 0 0` 3) Remount /dev/shm #mount -o remount /dev/shm 4) Check /dev/shm reflected with new value # df -k /dev/shm/ Filesystem 1K-blocks Used Available Use% Mounted on tmpfs 3072 0 3072 0% /dev/shm 5) set ulimit to unlimited #ulimit -c unlimited 6) #/etc/init.d/opensafd start 7) Compile & run attached test application ( cpsv_shm_2202.c ) #gcc cpsv_shm_2202.c -o ckpt_shm -lSaCkpt # ./ckpt_shm 8) Once /dev/shm/ reach 100% Use you will see core dump same as yours # df -k /dev/shm/ 7) Then we applied the patch test again with `cpsv_2202_V2_debug.patch` & `cpsv_2207_debug.patch`) no core dump saCkptSectionCreate 1 returned 18. ( no core dump ) == -AVM On 11/30/2016 11:37 AM, Vo Minh Hoang wrote: Dear Mahesh, Unfortunately, I have just receive information that the same core dump still occur after applying patch. Here is dump information in short, please tell me if I can do anything in support: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x7fe314aa0109 in __memcpy_sse2_unaligned () from /lib64/libc.so.6 Missing separate debuginfos, use: zypper install opensaf-ckpt-nodedirector-debuginfo-5.1.0-.0.4997518.sle12.x86_64 (gdb) where #0 0x7fe314aa0109 in __memcpy_sse2_unaligned () from /lib64/libc.so.6 #1 0x7fe315c26082 in memcpy (__len=, __src=, __dest=) at /usr/include/x86_64-linux-gnu/bits/string3.h:51 #2 ncs_os_posix_shm (req=req@entry=0x7ffecb80adb0) at os_defs.c:874 #3 0x00415a80 in cpnd_sec_hdr_update (cb=cb@entry=0x9e57f0, sec_info=sec_info@entry=0xb8ff60, cp_node=cp_node@entry=0xb8e8c0) at cpnd_proc.c:1880 #4 0x00406047 in cpnd_ckpt_sec_add (cb=cb@entry=0x9e57f0, cp_node=0xb8e8c0, id=0x7fe30c002390, exp_time=1480480471343486000, gen_flag=gen_flag@entry=0) at cpnd_db.c:457 #5 0x0040d17c in cpnd_evt_proc_ckpt_sect_create (cb=cb@entry=0x9e57f0, evt=evt@entry=0x7fe30c01e1d0, sinfo=sinfo@entry=0x7fe30c01e828) at cpnd_evt.c:2267 #6 0x0040eaf4 in cpnd_process_evt (evt=0x7fe30c01e1c0) at cpnd_evt.c:227 #7 0x004106cd in cpnd_main_process (cb=cb@entry=0x9e57f0) at cpnd_init.c:579 #8 0x00405383 in main (argc=, argv=) at cpnd_main.c:79 Sincerely, Hoang -Original Message- From: mahesh.va...@oracle.com [mailto:mahesh.va...@oracle.com] Sent: Tuesday, November 29, 2016 5:37 PM To: hoang.m...@dektech.com.au; ramesh.bet...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: [PATCH 0 of 3] Review Request for leap : now leap library ensure shm availability before writing [#2202] Summary:leap : now leap library ensure shm availability before writing [#2202] Review request for Trac Ticket(s): #2202 Peer Reviewer(s): Hoang / Ramesh Pull request to: <> Affected branch(es): <> Development branch: <> Impacted area Impact y/n Docsn Build systemn RPM/packaging n Configuration files n Startup scripts n SAF servicesn OpenSAF servicesy Core libraries y Samples n Tests n Other n Comments (indicate scope for each "y" above): - changeset 7b53e1b3754622fe90c22c801adeb7df6d808c30 Author: A V MaheshDate: Tue, 29 Nov 2016 15:59:21 +0530 leap : now leap library ensure shm availability before writing [#2202] Issue : If OSAF_CKPT_SHM_ALLOC_GUARANTEE is NOT set and SHM is 100% used in system , pnd Segmentation fault (core dumped) at LEAP memcpy(). Fix : Now LEAP library ensures shm free space before writing This may degrade some performance of cpsv , if OSAF_CKPT_SHM_ALLOC_GUARANTEE is set, cpsv give natural performance. changeset 083114e13c00c9c4267ffe65a86c1a97a951b876 Author: A V Mahesh Date: Tue, 29 Nov 2016 16:02:06 +0530 cpsv : update cpsv error handing based on leap changes [#2202] changeset fb509abb1d1583315f585663fd75bf73e35211a6 Author: A V Mahesh
Re: [devel] [PATCH 1 of 1] smf: Avoid uncditional sleep when calling adminoperation[#2211]
Hi Neel, NACK with the patch. Need to add 'retry > 0' as part of while loop condition. /Tai -Original Message- From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com] Sent: Wednesday, November 30, 2016 2:53 PM To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com; tai.d...@dektech.com.au Cc: opensaf-devel@lists.sourceforge.net Subject: [PATCH 1 of 1] smf: Avoid uncditional sleep when calling adminoperation[#2211] osaf/services/saf/smfsv/smfd/SmfUtils.cc | 26 +++--- 1 files changed, 15 insertions(+), 11 deletions(-) diff --git a/osaf/services/saf/smfsv/smfd/SmfUtils.cc b/osaf/services/saf/smfsv/smfd/SmfUtils.cc --- a/osaf/services/saf/smfsv/smfd/SmfUtils.cc +++ b/osaf/services/saf/smfsv/smfd/SmfUtils.cc @@ -583,19 +583,23 @@ SmfImmUtils::callAdminOperation(const st } /* Call the admin operation */ - do { - TRACE("call immutil_saImmOmAdminOperationInvoke_2"); + + TRACE("call immutil_saImmOmAdminOperationInvoke_2"); + rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, , 0, i_operationId, i_params, + , i_timeout); + while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN)){ + sleep(2); rc = immutil_saImmOmAdminOperationInvoke_2(m_ownerHandle, , 0, i_operationId, i_params, - , i_timeout); - if (retry <= 0) { - LOG_NO("Fail to invoke admin operation, too many SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", - i_dn.c_str(), i_operationId); - rc = SA_AIS_ERR_TRY_AGAIN; - goto done; - } - sleep(2); + , i_timeout); retry--; - } while ((rc == SA_AIS_OK) && (returnValue == SA_AIS_ERR_TRY_AGAIN)); + } + + if (retry <= 0) { + LOG_NO("Fail to invoke admin operation, too many SA_AIS_ERR_TRY_AGAIN, giving up. dn=[%s], opId=[%u]", + i_dn.c_str(), i_operationId); + rc = SA_AIS_ERR_TRY_AGAIN; + goto done; + } if ( rc != SA_AIS_OK) { LOG_NO("Fail to invoke admin operation, rc=%s. dn=[%s], opId=[%u]",saf_error(rc), i_dn.c_str(), i_operationId); -- ___ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] [PATCH 0 of 3] Review Request for leap : now leap library ensure shm availability before writing [#2202]
Hi Hoang, Can you please provide the snapshot of "df -h /dev/shm" when the issue occurs. Also.. can you please provide the corresponding core dump file and "osafckptnd" process file (process file not required if this image is build for SLES x86_64) Thanks, Ramesh. On 11/30/2016 12:10 PM, Vo Minh Hoang wrote: > Dear Mahesh,o > > I will do following your request. > Please note that test case executing time is very long so result might not > be available in today. > > Thank you and best regards, > Hoang > > -Original Message- > From: A V Mahesh [mailto:mahesh.va...@oracle.com] > Sent: Wednesday, November 30, 2016 1:35 PM > To: Vo Minh Hoang> Cc: opensaf-devel@lists.sourceforge.net; ramesh.bet...@oracle.com > Subject: Re: [PATCH 0 of 3] Review Request for leap : now leap library > ensure shm availability before writing [#2202] > > Hi Hoang, > > Apply these change test and provide sys log at the time issue occurred. > > > > > diff --git a/osaf/libs/core/leap/os_defs.c b/osaf/libs/core/leap/os_defs.c > --- a/osaf/libs/core/leap/os_defs.c > +++ b/osaf/libs/core/leap/os_defs.c > @@ -865,11 +865,15 @@ uint32_t ncs_os_posix_shm(NCS_OS_POSIX_S > } > > /* Checking whether sufficient shared memory is > available to write the data, to be safer side ffree reduced to 1 block size > */ > - if (req->info.write.i_write_size > > (statsvfs.f_bfree * statsvfs.f_frsize)) { > + if (req->info.write.i_write_size > > ((statsvfs.f_bfree - 1) * statsvfs.f_frsize)) { > syslog(LOG_ERR, "Insufficient shared memory > space (%ld) available to write the data of size: %ld \n", > (statsvfs.f_bfree * > statsvfs.f_frsize), req->info.write.i_write_size); > return NCSCC_RC_FAILURE; > + } else { > + syslog(LOG_ERR, "Sufficient shared > memory space (%ld) available to write the data of size: %ld \n", > +(statsvfs.f_bfree * > statsvfs.f_frsize), req->info.write.i_write_size); > } > + > } > memcpy((void *)((char *)req->info.write.i_addr + > req->info.write.i_offset), req->info.write.i_from_buff, > req->info.write.i_write_size); > > > > > -AVM > > On 11/30/2016 11:56 AM, A V Mahesh wrote: >> Hi Hoang, >> >> Thansk for the test . >> >> Then it looks issue is not related to SHM deficiency ( 100 % used by >> other application ) can you please re-test with below changes and >> that will confirm us it is completely not related to SHM free size. >> >> replacing: >> >> `if (req->info.write.i_write_size > (statsvfs.f_bfree * >> statsvfs.f_frsize)) {` >> >> with below, to be safer side ffree reduced to 1 block size : >> >> `if (req->info.write.i_write_size > ((statsvfs.f_bfree - 1) * >> statsvfs.f_frsize)) { >> >> >> -AVM >> >> >> On 11/30/2016 11:37 AM, Vo Minh Hoang wrote: >>> Dear Mahesh, >>> >>> Unfortunately, I have just receive information that the same core >>> dump still occur after applying patch. >>> >>> Here is dump information in short, please tell me if I can do >>> anything in >>> support: >>> >>> Program terminated with signal SIGSEGV, Segmentation fault. >>> #0 0x7fe314aa0109 in __memcpy_sse2_unaligned () from >>> /lib64/libc.so.6 >>> Missing separate debuginfos, use: zypper install >>> opensaf-ckpt-nodedirector-debuginfo-5.1.0-.0.4997518.sle12.x86_64 >>> (gdb) where >>> #0 0x7fe314aa0109 in __memcpy_sse2_unaligned () from >>> /lib64/libc.so.6 >>> #1 0x7fe315c26082 in memcpy (__len=, >>> __src=>> out>, __dest=) >>> at /usr/include/x86_64-linux-gnu/bits/string3.h:51 >>> #2 ncs_os_posix_shm (req=req@entry=0x7ffecb80adb0) at os_defs.c:874 >>> #3 0x00415a80 in cpnd_sec_hdr_update (cb=cb@entry=0x9e57f0, >>> sec_info=sec_info@entry=0xb8ff60, >>> cp_node=cp_node@entry=0xb8e8c0) at cpnd_proc.c:1880 >>> #4 0x00406047 in cpnd_ckpt_sec_add (cb=cb@entry=0x9e57f0, >>> cp_node=0xb8e8c0, id=0x7fe30c002390, >>> exp_time=1480480471343486000, gen_flag=gen_flag@entry=0) at >>> cpnd_db.c:457 >>> #5 0x0040d17c in cpnd_evt_proc_ckpt_sect_create >>> (cb=cb@entry=0x9e57f0, >>> evt=evt@entry=0x7fe30c01e1d0, sinfo=sinfo@entry=0x7fe30c01e828) >>> at >>> cpnd_evt.c:2267 >>> #6 0x0040eaf4 in cpnd_process_evt (evt=0x7fe30c01e1c0) at >>> cpnd_evt.c:227 >>> #7 0x004106cd in cpnd_main_process (cb=cb@entry=0x9e57f0) at >>> cpnd_init.c:579 >>> #8 0x00405383 in main (argc=, argv=>> out>) >>> at cpnd_main.c:79 >>> >>> Sincerely, >>> Hoang >>> >>> -Original