Hi Mahesh,
By looking to the ticket #1616 description again, I think the scenario and the cause of the problem that you mention is quite different with the ones in the #1616. It would be clearer if this problem is fixed in a new ticket. How would you think? Best regards, Nhat Pham From: Nhat Pham [mailto:[email protected]] Sent: Tuesday, January 5, 2016 11:18 AM To: 'A V Mahesh' <[email protected]>; '[email protected]' <[email protected]> Subject: RE: [devel] [PATCH 1 of 1] cpsv: improve handling unlink and close non-collocated checkpoint [#1616] Hi Mahesh, Ill include the fix in the new patch for this ticket and send it to you for review. Best regards, Nhat Pham From: A V Mahesh [mailto:[email protected]] Sent: Tuesday, January 5, 2016 10:46 AM To: Nhat Pham <[email protected] <mailto:[email protected]> >; [email protected] <mailto:[email protected]> Subject: Re: [devel] [PATCH 1 of 1] cpsv: improve handling unlink and close non-collocated checkpoint [#1616] Hi Nhat Pham, On 1/5/2016 8:19 AM, Nhat Pham wrote: But the replica IMM objects are not created. This is because when unlink the checkpoint, the CPD doesnt delete relating nodes from ckpt_reploc_tree. This should be fixed too. We will do in a New ticket ? or you would like to adders in this ticket it self -AVM On 1/5/2016 8:19 AM, Nhat Pham wrote: Hi Mahesh, For the case you described below, a new replica is created with same name. root@PL-3:~# ll /run/shm total 696 drwxrwxrwt 2 root root 100 Jan 5 09:02 ./ drwxr-xr-x 11 root root 440 Jan 5 09:01 ../ -rw-r--r-- 1 root root 704008 Jan 5 09:00 opensaf_CPND_CHECKPOINT_INFO_131855 -rw-r--r-- 1 root root 8872 Jan 5 09:00 opensaf_safCkpt=tes_131855_1 -rw-r--r-- 1 root root 8872 Jan 5 09:02 opensaf_safCkpt=tes_131855_2 But the replica IMM objects are not created. This is because when unlink the checkpoint, the CPD doesnt delete relating nodes from ckpt_reploc_tree. This should be fixed too. root@PL-3:~# immfind | grep safCkpt safApp=safCkptService safCkpt=test ç=============== No Replica IMM object is created Best regards, Nhat Pham ==== Create checkpoint safCkpt=test root@PL-3:~# immfind | grep safCkpt safApp=safCkptService safCkpt=test safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=test safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=test safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=test root@PL-3:~# ll /run/shm/ total 692 drwxrwxrwt 2 root root 80 Jan 5 09:00 ./ drwxr-xr-x 11 root root 440 Jan 5 09:01 ../ -rw-r--r-- 1 root root 704008 Jan 5 09:00 opensaf_CPND_CHECKPOINT_INFO_131855 -rw-r--r-- 1 root root 8872 Jan 5 09:00 opensaf_safCkpt=tes_131855_1 ==== Unlink checkpoint safCkpt=test checkpoint is still being used. root@PL-3:~# ll /run/shm/ total 692 drwxrwxrwt 2 root root 80 Jan 5 09:00 ./ drwxr-xr-x 11 root root 440 Jan 5 09:01 ../ -rw-r--r-- 1 root root 704008 Jan 5 09:00 opensaf_CPND_CHECKPOINT_INFO_131855 -rw-r--r-- 1 root root 8872 Jan 5 09:00 opensaf_safCkpt=tes_131855_1 root@PL-3:~# immfind | grep safCkpt safApp=safCkptService ==== Create a new checkpoint with same name safCkpt=test root@PL-3:~# immfind | grep safCkpt safApp=safCkptService safCkpt=test ç=============== No Replica IMM object is created root@PL-3:~# ll /run/shm total 696 drwxrwxrwt 2 root root 100 Jan 5 09:02 ./ drwxr-xr-x 11 root root 440 Jan 5 09:01 ../ -rw-r--r-- 1 root root 704008 Jan 5 09:00 opensaf_CPND_CHECKPOINT_INFO_131855 -rw-r--r-- 1 root root 8872 Jan 5 09:00 opensaf_safCkpt=tes_131855_1 -rw-r--r-- 1 root root 8872 Jan 5 09:02 opensaf_safCkpt=tes_131855_2 -----Original Message----- From: A V Mahesh [mailto:[email protected]] Sent: Monday, January 4, 2016 11:29 AM To: [email protected] <mailto:[email protected]> Subject: Re: [devel] [PATCH 1 of 1] cpsv: improve handling unlink and close non-collocated checkpoint [#1616] Hi Nhat Pham, As this patch fixed the issue of `unlinked checkpoint is deleted although there is a client using the checkpoint` Can you please verify following subsequent case which is also to be addressed : Say multiple application opened checkpoint and one application closed with unlink , still some application are uniting the checkpoint , at that moment the checkpoint is re-created by specifying in an open call the SA_CKPT_CHECKPOINT_CREATE flag and the same name of the checkpoint which is to be unlinked but checkpoint is possibly not yet finally deleted. According to CPSV specification A new instance of the checkpoint is created while the old instance of the checkpoint is possibly not yet finally deleted. this means , at CPND while open called the SA_CKPT_CHECKPOINT_CREATE flag and the same name of the checkpoint where checkpoint exist with same name and unlink is already marked for that old instence CPND should create a New of checkpoint/ replica ( this mean at a given point of time two instance of cktp/replicas exist on cluster) -AVM On 12/9/2015 11:48 AM, A V Mahesh wrote: > Hi, > > Ok got it , I will review both cases > > -AVM > > On 12/9/2015 11:34 AM, Nhat Pham wrote: >> Hi Mahesh, >> >> For 'several problem', I mean 3 use cases where: >> 1,2 : the checkpoint replicas are not deleted immediately even no >> client exists >> 3: the checkpoint is deleted although there is a client using the >> checkpoint. >> >> The patch only addresses the problem in these 3 use cases. >> >> Best regards, >> Nhat Pham >> >> -----Original Message----- >> From: A V Mahesh [mailto:[email protected]] >> Sent: Wednesday, December 9, 2015 12:21 PM >> To: Nhat Pham <[email protected] <mailto:[email protected]> >; [email protected] <mailto:[email protected]> >> Cc: [email protected] <mailto:[email protected]> >> Subject: Re: [PATCH 1 of 1] cpsv: improve handling unlink and close >> non-collocated checkpoint [#1616] >> >> Hi Nhat >> >> >>There are several problems relating to closing and unlinking >> non-collocated checkpoint. >> >> I can see only one problem unlinked non-collocated checkpoint is not >> getting deleted immediate even No client exist for that non-collocated checkpoint. >> >> I see 1,2 ,3 are use-case of non-collocated checkpoint , in all >> cases the the non-collocated checkpoint is not getting deleted >> immediately is that you mean by `several problems` ? >> >> Please let me know is any other portable exist and it is being >> addressed in this patch , so that I can look the patch in that point of view as well . >> >> -AVM >> >> >> On 12/9/2015 8:06 AM, Nhat Pham wrote: >>> osaf/services/saf/cpsv/cpnd/cpnd_evt.c | 51 +++++++++++++------------ >>> osaf/services/saf/cpsv/cpnd/cpnd_proc.c | 66 >> ++++++++++++++++++-------------- >>> 2 files changed, 64 insertions(+), 53 deletions(-) >>> >>> >>> Problem: >>> -------- >>> There are several problems relating to closing and unlinking >> non-collocated checkpoint. >>> 1. A non-collocated checkpoint is firstly created on SC-2. It is >>> closed on >> SC-2. It is opened on PL-3. >>> It is unlinked. It is closes on PL-3. The replicas on SCs are not >>> destroyed although the checkpoint is unlinked and no client is using it. >>> >>> 2. A non-collocated checkpoint is firstly created on PL-3. It is >>> closed on >> PL-3. It is opened on SC-2. >>> It is unlinked. It is closes on SC-2. The replicas on SCs and PL-3 >>> are not destroyed although the checkpoint is unlinked and no client >>> is using >> it. >>> 3. A non-collocated checkpoint is firstly created on PL-3. It is >>> closed on >> PL-3. It is opened on PL-4. >>> It is unlinked. The replicas on SCs and PL-3 are destroyed although >>> the >> checkpoint is using on PL-4. >>> Solution: >>> --------- >>> The main cause of above problems is to use checking if >>> non-collocated replica is on PL to decide destroying the replicas. >>> This mechanism is not correct in some cases. The solution is use >>> another mechanism which checks if there is any client using the >>> checkpoint on the cluster by >> verifying if the retention duration timer is active or not. >>> Test: >>> ----- >>> Following test cases were executed for both non-collocated and >>> collocated checkpoint to verify the solution: >>> 1. verify_unlink_ckpt_created_on_sc_before_close_it_from_sc >>> 2. verify_unlink_ckpt_created_on_sc_before_close_it_from_pl >>> 3. verify_unlink_ckpt_created_on_sc_after_close_it >>> 4. verify_unlink_ckpt_created_on_pl_before_close_it_from_pl >>> 5. verify_unlink_ckpt_created_on_pl_before_close_it_from_sc >>> 6. verify_unlink_ckpt_created_on_pl_before_close_it_from_other_pl >>> 7. verify_unlink_ckpt_created_on_pl_after_close_it >>> >>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c >>> b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c >>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c >>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c >>> @@ -26,7 +26,7 @@ >>> >>> #include "cpnd.h" >>> >>> -extern uint32_t cpnd_ckpt_non_collocated_rplica_close(CPND_CB *cb, >>> CPND_CKPT_NODE *cp_node, SaAisErrorT *error); >>> +extern uint32_t cpnd_proc_rdset_start(CPND_CB *cb, CPND_CKPT_NODE >>> +*cp_node); >>> extern uint32_t cpnd_proc_non_colloc_rt_expiry(CPND_CB *cb, >>> SaCkptCheckpointHandleT ckpt_id); >>> >>> static uint32_t cpnd_evt_proc_cb_dump(CPND_CB *cb); @@ -1194,8 >>> +1194,7 @@ static uint32_t cpnd_evt_proc_ckpt_unlin >>> >> /******************************************************************** >> ******* >> * >>> * Name : cpnd_evt_proc_ckpt_unlink_info >>> * >>> - * Description : Function to process check point unlink >>> - * from Applications. >>> + * Description : Function to process checkpoint unlink event from CPD >>> * >>> * Arguments : CPND_CB *cb - CPND CB pointer >>> * CPSV_EVT *evt - Received Event structure >>> @@ -1209,10 +1208,11 @@ static uint32_t cpnd_evt_proc_ckpt_unlin >>> { >>> uint32_t rc = NCSCC_RC_SUCCESS; >>> CPND_CKPT_NODE *cp_node = NULL; >>> - SaAisErrorT error; >>> + SaAisErrorT error = SA_AIS_OK; >>> CPSV_SEND_INFO sinfo_cpa; >>> CPSV_EVT send_evt; >>> bool sinfo_cpa_flag = false; >>> + bool destroy_replica = false; >>> >>> TRACE_ENTER(); >>> memset(&send_evt, '\0', sizeof(CPSV_EVT)); @@ -1220,25 +1220,35 >>> @@ static uint32_t cpnd_evt_proc_ckpt_unlin >>> if (cp_node == NULL) { >>> TRACE_4("cpnd ckpt node get failed for >> ckpt_id:%llx",evt->info.ckpt_ulink.ckpt_id); >>> rc = NCSCC_RC_FAILURE; >>> - send_evt.info.cpa.info.ulinkRsp.error = >> SA_AIS_ERR_NOT_EXIST; >>> + error = SA_AIS_ERR_NOT_EXIST; >>> goto agent_rsp; >>> } >>> >>> sinfo_cpa = cp_node->cpa_sinfo; >>> sinfo_cpa_flag = cp_node->cpa_sinfo_flag; >>> + >>> if (cp_node->is_close == true) { >>> - send_evt.info.cpa.info.ulinkRsp.error = SA_AIS_OK; >>> + /* For non-collocated checkpoint if retention duration timer >> is active >>> + * (i.e the checkpoint is not opened by any client in >> cluster) the replica >>> + * should be destroyed in this case */ >>> + if >> (!m_CPND_IS_COLLOCATED_ATTR_SET(cp_node->create_attrib.creationFlags) >> ) { >>> + if (cp_node->ret_tmr.is_active) { >>> + TRACE_1("cpnd destroy replica ckpt_id:%llx - >> No client opens the non-collocated checkpoint ", >>> + cp_node->ckpt_id); >>> + destroy_replica = true; >>> + } >>> + } >>> + /* For collocated checkpoint, there is no client opening the >> checkpoint on this >>> + * node. The replica should be destroyed. */ >>> + else >>> + destroy_replica = true; >>> + } >>> + >>> + if (destroy_replica == true) { >>> /* check timer is present,if yes...stop the timer and >> destroy shm_info and the node */ >>> if (cp_node->ret_tmr.is_active) >>> cpnd_tmr_stop(&cp_node->ret_tmr); >>> >>> - if >> (!m_CPND_IS_COLLOCATED_ATTR_SET(cp_node->create_attrib.creationFlags) >> ) { >>> - if >> (cpnd_is_noncollocated_replica_present_on_payload(cb, cp_node)) { >>> - rc = NCSCC_RC_SUCCESS; >>> - goto agent_rsp; >>> - } >>> - } >>> - >>> rc = cpnd_ckpt_replica_destroy(cb, cp_node, &error); >>> if (rc == NCSCC_RC_FAILURE) { >>> TRACE_4("cpnd ckpt replica destroy failed for >> ckpt_id:%llx,error >>> %u",cp_node->ckpt_id, error); @@ -1260,8 +1270,6 @@ static uint32_t >>> cpnd_evt_proc_ckpt_unlin >>> >>> } >>> TRACE_4("cpnd proc ckpt unlink set for >>> ckpt_id:%llx",cp_node->ckpt_id); >>> - >>> - send_evt.info.cpa.info.ulinkRsp.error = SA_AIS_OK; >>> } >>> >>> agent_rsp: >>> @@ -1269,6 +1277,7 @@ static uint32_t cpnd_evt_proc_ckpt_unlin >>> if (sinfo_cpa_flag == 1) { >>> send_evt.type = CPSV_EVT_TYPE_CPA; >>> send_evt.info.cpa.type = CPA_EVT_ND2A_CKPT_UNLINK_RSP; >>> + send_evt.info.cpa.info.ulinkRsp.error = error; >>> rc = cpnd_mds_send_rsp(cb, &sinfo_cpa, &send_evt); >>> >>> } >>> @@ -1767,7 +1776,6 @@ static uint32_t cpnd_evt_proc_ckpt_activ >>> static uint32_t cpnd_evt_proc_ckpt_rdset_info(CPND_CB *cb, >>> CPND_EVT >> *evt, CPSV_SEND_INFO *sinfo) >>> { >>> CPND_CKPT_NODE *cp_node = NULL; >>> - SaAisErrorT error = SA_AIS_OK; >>> >>> TRACE_ENTER(); >>> /* get cp_node from ckpt_info_db */ @@ -1791,14 +1799,9 @@ >>> static uint32_t cpnd_evt_proc_ckpt_rdset >>> } >>> >>> if (evt->info.rdset.type == CPSV_CKPT_RDSET_START) { >>> - if >> (!m_CPND_IS_COLLOCATED_ATTR_SET(cp_node->create_attrib.creationFlags) >> ) { >>> - if (cpnd_ckpt_non_collocated_rplica_close(cb, >> cp_node, &error) == NCSCC_RC_FAILURE) { >>> - TRACE_4("cpnd ckpt relica close failed for >> client_hdl:%llx,ckpt_id:%llx",evt->info.closeReq.client_hdl, >> cp_node->ckpt_id); >>> - >>> - } >>> - TRACE_LEAVE(); >>> - return NCSCC_RC_SUCCESS; >>> - } >>> + cpnd_proc_rdset_start(cb, cp_node); >>> + TRACE_LEAVE(); >>> + return NCSCC_RC_SUCCESS; >>> } >>> >>> /* if timer already started on one of the node then what to do!!! >>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >>> b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c >>> @@ -2297,53 +2297,61 @@ uint32_t cpnd_ckpt_replica_close(CPND_CB >>> } >>> >>> >> /******************************************************************** >> ******* >> ************* >>> - * Name : cpnd_ckpt_non_collocated_rplica_close >>> + * Name : cpnd_proc_rdset_start >>> * >>> - * Description : This is the function close the non_collocated Ckpt >> Replica >>> + * Description : This is the function process the event >> CPSV_CKPT_RDSET_START >>> + * This event is only applicable for non-collocated >> checkpoint >>> * Arguments : cb - CPND Control Block pointer >>> * cp_node - pointer to checkpoint node >>> * >>> * Return Values : NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE >>> >>> ******************************************************************** >>> ** >>> *******************/ >>> >>> -uint32_t cpnd_ckpt_non_collocated_rplica_close(CPND_CB *cb, >>> CPND_CKPT_NODE *cp_node, SaAisErrorT *error) >>> +uint32_t cpnd_proc_rdset_start(CPND_CB *cb, CPND_CKPT_NODE >>> +*cp_node) >>> { >>> SaTimeT presentTime; >>> + SaAisErrorT error = SA_AIS_OK; >>> uint32_t rc = NCSCC_RC_SUCCESS; >>> >>> TRACE_ENTER(); >>> - if (cp_node->ckpt_lcl_ref_cnt == 0) { >>> >>> - cp_node->is_close = true; >>> - cpnd_restart_set_close_flag(cb, cp_node); >>> + if >> (m_CPND_IS_COLLOCATED_ATTR_SET(cp_node->create_attrib.creationFlags)) >> { >>> + TRACE_LEAVE(); >>> + return NCSCC_RC_SUCCESS; >>> + } >>> >>> - if (cp_node->is_unlink != true && >>> - >> (m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(cp_node->create_attrib.retention >> Duratio >> n) != 0)) { >>> - m_GET_TIME_STAMP(presentTime); >>> - cpnd_restart_update_timer(cb, cp_node, presentTime); >>> + if (cp_node->ckpt_lcl_ref_cnt != 0) { >>> + LOG_ER("cpnd receives CPND_EVT_D2ND_RDSET_INFO with START >> while ckpt_lcl_ref_cnt = %d", cp_node->ckpt_lcl_ref_cnt); >>> + TRACE_LEAVE(); >>> + return NCSCC_RC_FAILURE; >>> + } >>> >>> - cp_node->ret_tmr.type = >> CPND_TMR_TYPE_NON_COLLOC_RETENTION; >>> - cp_node->ret_tmr.uarg = cb->cpnd_cb_hdl_id; >>> - cp_node->ret_tmr.ckpt_id = cp_node->ckpt_id; >>> - cpnd_tmr_start(&cp_node->ret_tmr, >>> - >> m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(cp_node->create_attrib.retentionD >> uration >> )); >>> - TRACE_1("cpnd ckpt ret tmr success >> ckpt_id:%llx",cp_node->ckpt_id); >>> - } else { >>> - /* Check for Non-Collocated Replica */ >>> - if >> (cpnd_is_noncollocated_replica_present_on_payload(cb, cp_node)) { >>> - return NCSCC_RC_SUCCESS; >>> - } >>> - rc = cpnd_ckpt_replica_destroy(cb, cp_node, error); >>> - if (rc == NCSCC_RC_FAILURE) { >>> - TRACE_4("cpnd ckpt replica destroy failed >> ckpt_id:%llx",cp_node->ckpt_id); >>> - return NCSCC_RC_FAILURE; >>> - } >>> - TRACE_1("cpnd ckpt replica destroy failed >> ckpt_id:%llx",cp_node->ckpt_id); >>> + cp_node->is_close = true; >>> + cpnd_restart_set_close_flag(cb, cp_node); >>> >>> - cpnd_restart_shm_ckpt_free(cb, cp_node); >>> - cpnd_ckpt_node_destroy(cb, cp_node); >>> + if (cp_node->is_unlink != true && >>> + >> (m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(cp_node->create_attrib.retention >> Duratio >> n) != 0)) { >>> + m_GET_TIME_STAMP(presentTime); >>> + cpnd_restart_update_timer(cb, cp_node, presentTime); >>> + >>> + cp_node->ret_tmr.type = CPND_TMR_TYPE_NON_COLLOC_RETENTION; >>> + cp_node->ret_tmr.uarg = cb->cpnd_cb_hdl_id; >>> + cp_node->ret_tmr.ckpt_id = cp_node->ckpt_id; >>> + cpnd_tmr_start(&cp_node->ret_tmr, >>> + >> m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(cp_node->create_attrib.retentionD >> uration >> )); >>> + TRACE_1("cpnd ckpt ret tmr success >> ckpt_id:%llx",cp_node->ckpt_id); >>> + } else { >>> + rc = cpnd_ckpt_replica_destroy(cb, cp_node, &error); >>> + if (rc == NCSCC_RC_FAILURE) { >>> + LOG_ER("cpnd ckpt replica destroy failed >> ckpt_id:%llx, error:%d",cp_node->ckpt_id, error); >>> + return NCSCC_RC_FAILURE; >>> } >>> + TRACE_1("cpnd ckpt replica destroy success >>> +ckpt_id:%llx",cp_node->ckpt_id); >>> + >>> + cpnd_restart_shm_ckpt_free(cb, cp_node); >>> + cpnd_ckpt_node_destroy(cb, cp_node); >>> } >>> + >>> TRACE_LEAVE(); >>> return NCSCC_RC_SUCCESS; >>> } > > ---------------------------------------------------------------------- > -------- _______________________________________________ > Opensaf-devel mailing list > [email protected] <mailto:[email protected]> > https://lists.sourceforge.net/lists/listinfo/opensaf-devel ---------------------------------------------------------------------------- -- _______________________________________________ Opensaf-devel mailing list [email protected] <mailto:[email protected]> https://lists.sourceforge.net/lists/listinfo/opensaf-devel ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
