Hi Nhat,

Please see below for my comments, tagged with [AVM].

-AVM

On 12/2/2015 12:33 PM, Nhat Pham wrote:
> Hi Mahesh,
>
> Tickets #1615 and #1616 report two different problems, although the
> steps to reproduce them are quite similar.
>
> The problem scenarios are quite tricky, I think.
>
> Please see my comments below.
>
> Best regards,
>
> Nhat Pham
>
> *From:* A V Mahesh [mailto:[email protected]]
> *Sent:* Wednesday, December 2, 2015 12:04 PM
> *To:* Nhat Pham <[email protected]>; [email protected]
> *Cc:* [email protected]
> *Subject:* Re: [PATCH 1 of 1] cpsv: cpd broadcasts
> CPND_EVT_D2ND_CKPT_RDSET with STOP [#1615]
>
> Hi Nhat,
>
> On 12/2/2015 7:42 AM, Nhat Pham wrote:
> > Problem:
> > --------
> > A non-collocated checkpoint is first created on SC-2. Then the
> > checkpoint is closed on SC-2.
> > The CPD broadcasts CPND_EVT_D2ND_CKPT_RDSET with START to start the
> > retention duration timer on the CPNDs because there is no user. During
> > that time the checkpoint is opened again and used on PL-3.
> > After the retention duration, the checkpoint is destroyed on both SC-1
> > and SC-2.
>
> I interpreted "opened again and used on PL-3" as meaning that PL-3 had
> also opened the non-collocated checkpoint the first time, i.e. both SC-2
> and PL-3 opened/closed it in sequence. So I thought the CPND on PL-3
> already had the checkpoint in its database, and that we were then trying
> to re-open the checkpoint on PL-3.
>
> The reproduction steps of ticket #1616 mention that PL-3 had also
> opened the ckpt, so I carried out the same test and concluded that the
> #1615 and #1616 test cases are the same except that Unlink is called.
> That is why I suggested addressing both together in one go, with and
> without Unlink.
>
> Now I understand they are NOT related test cases. My current
> understanding of the test cases for ticket #1616 and ticket #1615 is as
> follows, and you addressed #1615 in this patch. Please confirm:
>
> ==========================================
> Ticket #1616 - Non-collocated ckpt
>
> S1. Create a checkpoint on SC-2 (OpenFlags with
> SA_CKPT_CHECKPOINT_CREATE) /success/
> S2. Close the checkpoint on SC-2 (the retention timer starts because
> of the simple Close) /success/
> S3. Open the checkpoint on PL-3 /success/
> S4. Unlink the checkpoint on PL-3 /success/
> S5. Close the checkpoint on PL-3 /success/
>
> After this, the replicas will be deleted immediately on SC-1 & SC-2 (no
> retention timer starts because Unlink was called); then a subsequent
> checkpoint Create with the same name on any node (PL-3/SC-2/SC-1):
>
> [Nhat Pham] Actually, the replicas on SC-1 and SC-2 are not deleted
> immediately after this step. This leads to the problem after S6.

[AVM] If no process has the checkpoint open (S2 & S5 closed the
checkpoint) when saCkptCheckpointUnlink() is invoked, the checkpoint is
deleted immediately. I assume that in your test case no process has the
checkpoint open after S5 and before S6, so we need to fix this issue
first. That fix will also resolve the issue of the IMM objects not being
created.

> S6. Create the checkpoint on PL-3 (OpenFlags with
> SA_CKPT_CHECKPOINT_CREATE)
>
> The checkpoint is created successfully with replicas, BUT the IMM
> objects are not created in the IMM database.
>
> [Nhat Pham] It should be "Create the checkpoint on SC-2" (not PL-3).
>
> Actually, the checkpoint is not created in this case because the CPND
> on SC-2 finds that the checkpoint exists.
>
> It just informs the CPD with CPD_EVT_ND2D_CKPT_USR_INFO. Thus the IMM
> objects are not created.

[AVM] This shouldn't happen if no process has the checkpoint open when
saCkptCheckpointUnlink() is invoked.

> ==========================================
>
> ==========================================
> Ticket #1615 - Non-collocated ckpt
>
> S1. Create a checkpoint on SC-1 (OpenFlags with
> SA_CKPT_CHECKPOINT_CREATE) /success/
> S2. Open the checkpoint on SC-2 /success/
> S3. Close the checkpoint on SC-1 /success/
> S4. Close the checkpoint on SC-2 /success/ (after this, the replicas
> still exist on SC-1 & SC-2 and the retention timer has started)
>
> [Nhat Pham] S2 and S4 might not be necessary to start the retention
> timer.
>
> S5. Open the checkpoint on PL-3 /success/ (the retention timer is still
> running on SC-1 & SC-2; it was not stopped)
> S6. Create Section fails with SA_AIS_ERR_NOT_EXIST
>
> The reason Section Create returns SA_AIS_ERR_NOT_EXIST is that the
> retention timer on SC-1 & SC-2 was not stopped even after step S5,
> the subsequent checkpoint Create with the same name on any node
> (PL-3/SC-2/SC-1).
>
> [Nhat Pham] Regarding "the subsequent checkpoint Create with same
> name on any node (PL-3/SC-2/SC-1)", do you mean that:
>
> The checkpoint with the same name can be re-created on any node after
> this step?

[AVM] I mean opening the same checkpoint (S5. Open checkpoint on PL-3).
While the retention timer is running we can reopen the checkpoint, and
that will stop the retention timer.

> ==========================================
>
> -AVM
>
> On 12/2/2015 7:42 AM, Nhat Pham wrote:
> > Hi Mahesh,
> >
> > I'm not clear about your proposal below. Could you please help make
> > it clearer? Thanks.
> >
> > My understanding of the existing implementation:
> > In case the non-collocated checkpoint exists on the controllers, when the
> > checkpoint is opened on a PL for the first time
> > (i.e. the PL doesn't know whether the checkpoint exists, and the cp_node
> > doesn't exist in the CPND database),
> > the CPND on the PL sends CPD_EVT_ND2D_CKPT_CREATE to the CPD to create the
> > checkpoint.
> > The CPD finds that the checkpoint exists, so it returns the message
> > CPND_EVT_D2ND_CKPT_INFO with create_replica == false.
> > The CPND updates its database with a new checkpoint node without creating a
> > replica on the PL.
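For readers following the open path described above, here is a minimal C sketch of the branch in cpnd_evt_proc_ckpt_open() that this thread keeps returning to. It only mirrors the condition quoted further down in the thread (`cp_node != NULL && cp_node->is_unlink == false`); the enum and struct names are hypothetical stand-ins, not OpenSAF definitions, and everything after the branch (MDS messaging, database updates) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two messages the CPND can send to the CPD
 * on an open request; these are NOT the OpenSAF event definitions. */
typedef enum {
    ACTION_SEND_CKPT_CREATE, /* i.e. CPD_EVT_ND2D_CKPT_CREATE */
    ACTION_SEND_USR_INFO     /* i.e. CPD_EVT_ND2D_CKPT_USR_INFO */
} open_action;

/* Hypothetical stand-in for the CPND's local checkpoint node;
 * only the field used by the quoted condition is modelled. */
typedef struct {
    bool is_unlink;
} local_ckpt_node;

/* Mirrors the condition quoted from cpnd_evt_proc_ckpt_open():
 *   if ((cp_node = cpnd_ckpt_node_find_by_name(cb, ckpt_name)) != NULL
 *       && cp_node->is_unlink == false)
 * 'node' plays the role of the lookup result (NULL when not found). */
static open_action cpnd_open_action(const local_ckpt_node *node)
{
    if (node != NULL && !node->is_unlink)
        return ACTION_SEND_USR_INFO; /* known checkpoint: report the new user */
    /* Unknown OR unlinked: both fall into the else branch and ask the
     * CPD to create the checkpoint. */
    return ACTION_SEND_CKPT_CREATE;
}
```

With this branch, a node whose is_unlink flag is set takes the same CPD_EVT_ND2D_CKPT_CREATE path as an unknown checkpoint, which is the else case AVM suggests handling separately.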
> >
> > Dec 1 9:03:35.780616 osafckptnd [468:cpsv_evt.c:2199] TR cpnd <<== CPND_EVT_A2ND_CKPT_OPEN(hdl=1, safCkpt=test3) from node 0x2030F
> > Dec 1 9:03:35.780874 osafckptnd [468:cpsv_evt.c:2195] TR cpnd ==>> CPD_EVT_ND2D_CKPT_CREATE(safCkpt=test3, creationFlags=0x2) to CPD
> > Dec 1 9:03:35.782806 osafckptnd [468:cpsv_evt.c:2201] TR cpnd <<== [3] CPND_EVT_D2ND_CKPT_INFO(err=1, active=0x2020F, create_rep=false) from CPD
> >
> > So, the flow in this case is:
> > cpnd_evt_proc_ckpt_open() --> CPD_EVT_ND2D_CKPT_CREATE --> cpd_evt_proc_ckpt_create()
> >
> > Best regards,
> > Nhat Pham
> >
> > -----Original Message-----
> > From: A V Mahesh [mailto:[email protected]]
> > Sent: Tuesday, December 1, 2015 5:51 PM
> > To: Nhat Pham <[email protected]>; [email protected]
> > Cc: [email protected]
> > Subject: Re: [PATCH 1 of 1] cpsv: cpd broadcasts CPND_EVT_D2ND_CKPT_RDSET
> > with STOP [#1615]
> >
> > Hi,
> > We need to check what cpnd_ckpt_node_find_by_name() returns on PL-3 if
> > replicas of a non-collocated ckpt exist on the controllers in the unlinked state.
> >
> > If it returns NULL, we also need to check whether any non-collocated replica
> > exists on the controller nodes while opening a checkpoint from PL-3. We are not
> > supposed to create a new replica on PL-3 if a replica exists on the
> > controllers (SC-1 & SC-2).
> >
> > -AVM
> >
> > On 12/1/2015 3:47 PM, A V Mahesh wrote:
> > > Hi,
> > >
> > > We may need to handle the else branch of the condition below, for the
> > > `cp_node->is_unlink == true` case, in function cpnd_evt_proc_ckpt_open():
> > >
> > > `if(((cp_node = cpnd_ckpt_node_find_by_name(cb, ckpt_name)) != NULL)
> > > && cp_node->is_unlink == false) {`
> > >
> > > -AVM
> > >
> > > On 12/1/2015 3:25 PM, A V Mahesh wrote:
> > > > Hi,
> > > >
> > > > The approach for stopping the retention timer of an existing ckpt should
> > > > be different; it should go through
> > > >
> > > > cpnd_evt_proc_ckpt_open() --> cpnd_send_ckpt_usr_info_to_cpd -->
> > > > CPD_EVT_ND2D_CKPT_USR_INFO --> cpd_evt_proc_ckpt_usr_info()
> > > >
> > > > So please make the change based on this flow in
> > > > cpd_evt_proc_ckpt_usr_info() and republish the patch.
> > > >
> > > > -AVM
> > > >
> > > > On 12/1/2015 12:25 PM, Nhat Pham wrote:
> > > > > osaf/libs/common/cpsv/include/cpd_proc.h | 2 ++
> > > > > osaf/services/saf/cpsv/cpd/cpd_evt.c | 8 +++++++-
> > > > > osaf/services/saf/cpsv/cpd/cpd_proc.c | 22 ++++++++++++++++++++++
> > > > > 3 files changed, 31 insertions(+), 1 deletions(-)
> > > > >
> > > > >
> > > > > Problem:
> > > > > --------
> > > > > A non-collocated checkpoint is first created on SC-2. Then the
> > > > > checkpoint is closed on SC-2.
> > > > > The CPD broadcasts CPND_EVT_D2ND_CKPT_RDSET with START to start the
> > > > > retention duration timer on the CPNDs because there is no user. During
> > > > > that time the checkpoint is opened again and used on PL-3.
> > > > > After the retention duration, the checkpoint is destroyed on both SC-1
> > > > > and SC-2.
> > > > >
> > > > > Solution:
> > > > > ---------
> > > > > The problem happens because the CPD doesn't broadcast
> > > > > CPND_EVT_D2ND_CKPT_RDSET with STOP when the checkpoint is opened
> > > > > again on PL-3. The CPD is updated to broadcast
> > > > > CPND_EVT_D2ND_CKPT_RDSET with STOP when the checkpoint is opened
> > > > > again.
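The Problem/Solution text above, together with the [AVM] comments on Unlink, describes retention-timer behaviour that can be captured as a small state machine. The sketch below is a hypothetical, single-checkpoint model (invented names, no replicas, no messaging; not OpenSAF code) of the rules discussed in this thread: the last close starts the retention timer unless the checkpoint is unlinked, in which case it is deleted immediately, and re-opening while the timer runs must stop it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cluster-wide view of ONE checkpoint name; invented for
 * illustration, it does not correspond to any OpenSAF structure. */
typedef struct {
    bool exists;           /* replicas still present */
    bool unlinked;         /* saCkptCheckpointUnlink() was called */
    unsigned num_users;    /* open handles on all nodes */
    bool rd_timer_running; /* retention-duration timer armed */
} ckpt_model;

/* Open with the CREATE flag: create if absent, otherwise reuse.
 * Re-opening must stop a running retention timer (ticket #1615). */
static void model_open(ckpt_model *c)
{
    assert(c != NULL);
    /* Opening a name that is unlinked but still held open would yield a
     * second, distinct checkpoint; this model does not cover that case. */
    assert(!(c->exists && c->unlinked));
    if (!c->exists) {
        c->exists = true;
        c->unlinked = false;
        c->num_users = 0;
    }
    c->num_users++;
    c->rd_timer_running = false;
}

/* Unlink: delete at once if nobody has the checkpoint open. */
static void model_unlink(ckpt_model *c)
{
    assert(c != NULL && c->exists);
    c->unlinked = true;
    if (c->num_users == 0) {
        c->exists = false;
        c->rd_timer_running = false;
    }
}

/* Close: the last close deletes an unlinked checkpoint immediately,
 * otherwise it starts the retention timer. */
static void model_close(ckpt_model *c)
{
    assert(c != NULL && c->num_users > 0);
    c->num_users--;
    if (c->num_users == 0) {
        if (c->unlinked)
            c->exists = false;
        else
            c->rd_timer_running = true;
    }
}

/* Retention timer expiry: only an armed timer deletes the checkpoint. */
static void model_rd_expire(ckpt_model *c)
{
    assert(c != NULL);
    if (c->rd_timer_running) {
        c->rd_timer_running = false;
        c->exists = false;
    }
}
```

Replaying ticket #1615 against this model (create, close, open on another node, timer expiry) leaves the checkpoint alive because the re-open clears the timer. Per the patch description, the real CPNDs never got the equivalent of that step because no RDSET STOP was broadcast.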
> > > > >
> > > > > diff --git a/osaf/libs/common/cpsv/include/cpd_proc.h b/osaf/libs/common/cpsv/include/cpd_proc.h
> > > > > --- a/osaf/libs/common/cpsv/include/cpd_proc.h
> > > > > +++ b/osaf/libs/common/cpsv/include/cpd_proc.h
> > > > > @@ -71,6 +71,8 @@ uint32_t cpd_proc_retention_set(CPD_CB *
> > > > >  uint32_t cpd_proc_unlink_set(CPD_CB *cb, CPD_CKPT_INFO_NODE **ckpt_node,
> > > > >                               CPD_CKPT_MAP_INFO *map_info, SaNameT *ckpt_name);
> > > > > +void cpd_proc_broadcast_RDSET_STOP(SaCkptCheckpointHandleT ckpt_id, CPD_CB *cb);
> > > > > +
> > > > >  void cpd_cb_dump(void);
> > > > >  uint32_t cpd_mbcsv_chgrole(CPD_CB *cb);
> > > > > diff --git a/osaf/services/saf/cpsv/cpd/cpd_evt.c b/osaf/services/saf/cpsv/cpd/cpd_evt.c
> > > > > --- a/osaf/services/saf/cpsv/cpd/cpd_evt.c
> > > > > +++ b/osaf/services/saf/cpsv/cpd/cpd_evt.c
> > > > > @@ -355,8 +355,14 @@ static uint32_t cpd_evt_proc_ckpt_create
> > > > >      }
> > > > >      if (is_first_rep)
> > > > >          TRACE_2("cpd ckpt create success for first replica ckpt_id:%llx,dest :%"PRIu64,map_info->ckpt_id,sinfo->dest);
> > > > > -    else
> > > > > +    else
> > > > >          TRACE_2("cpd ckpt create success ckpt_id:%llx,dest :%"PRIu64,map_info->ckpt_id,sinfo->dest);
> > > > > +
> > > > > +
> > > > > +    /* In case the first user re-creates the existing non-collocated checkpoint. All CPND should stop RD timer */
> > > > > +    if ((is_first_rep == false) && (!(map_info->attributes.creationFlags & SA_CKPT_CHECKPOINT_COLLOCATED)))
> > > > > +        if (ckpt_node->num_users == 1)
> > > > > +            cpd_proc_broadcast_RDSET_STOP(ckpt_node->ckpt_id, cb);
> > > > >  
> > > > >      TRACE_LEAVE();
> > > > >      return proc_rc;
> > > > > diff --git a/osaf/services/saf/cpsv/cpd/cpd_proc.c b/osaf/services/saf/cpsv/cpd/cpd_proc.c
> > > > > --- a/osaf/services/saf/cpsv/cpd/cpd_proc.c
> > > > > +++ b/osaf/services/saf/cpsv/cpd/cpd_proc.c
> > > > > @@ -1251,3 +1251,25 @@ uint32_t cpd_ckpt_reploc_imm_object_dele
> > > > >      }
> > > > >      return NCSCC_RC_SUCCESS;
> > > > >  }
> > > > > +
> > > > > +/******************************************************************************************
> > > > > + * Name          : cpd_proc_broadcast_RDSET_STOP
> > > > > + *
> > > > > + * Description   : This routine broadcasts the message CPND_EVT_D2ND_CKPT_RDSET with STOP
> > > > > + *
> > > > > + * Return Values : None
> > > > > + *
> > > > > + * Notes         : None
> > > > > + ******************************************************************************************/
> > > > > +
> > > > > +void cpd_proc_broadcast_RDSET_STOP(SaCkptCheckpointHandleT ckpt_id, CPD_CB *cb)
> > > > > +{
> > > > > +    CPSV_EVT send_evt;
> > > > > +
> > > > > +    memset(&send_evt, 0, sizeof(CPSV_EVT));
> > > > > +    send_evt.type = CPSV_EVT_TYPE_CPND;
> > > > > +    send_evt.info.cpnd.type = CPND_EVT_D2ND_CKPT_RDSET;
> > > > > +    send_evt.info.cpnd.info.rdset.ckpt_id = ckpt_id;
> > > > > +    send_evt.info.cpnd.info.rdset.type = CPSV_CKPT_RDSET_STOP;
> > > > > +    cpd_mds_bcast_send(cb, &send_evt, NCSMDS_SVC_ID_CPND);
> > > > > +}
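The trigger the patch adds to cpd_evt_proc_ckpt_create() can be read as a pure predicate. The sketch below restates it, plus a hypothetical variant keyed on the user-count transition, which is roughly what the same rule might look like if it lived in cpd_evt_proc_ckpt_usr_info() as AVM requests. HYP_COLLOCATED_FLAG is a stand-in for SA_CKPT_CHECKPOINT_COLLOCATED and its value here is for illustration only; neither function exists in OpenSAF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SA_CKPT_CHECKPOINT_COLLOCATED (saCkpt.h);
 * the value is for illustration only. */
#define HYP_COLLOCATED_FLAG 0x8u

/* Restates the condition the patch adds to cpd_evt_proc_ckpt_create():
 * broadcast RDSET STOP only when an already existing (not first-replica),
 * non-collocated checkpoint now has exactly one user. */
static bool patch_should_broadcast_stop(bool is_first_rep,
                                        unsigned creation_flags,
                                        unsigned num_users)
{
    if (is_first_rep)
        return false;
    if (creation_flags & HYP_COLLOCATED_FLAG)
        return false;
    return num_users == 1;
}

/* Hypothetical variant for the user-info path: stop the retention
 * timers when a non-collocated checkpoint goes from zero users to at
 * least one. The node that reported the new user is not an input. */
static bool should_stop_on_user_change(bool collocated,
                                       unsigned users_before,
                                       unsigned users_after)
{
    return !collocated && users_before == 0 && users_after > 0;
}
```

Keying on the 0 to 1 transition rather than on the create path means the same decision would apply whether the re-open arrives as CPD_EVT_ND2D_CKPT_CREATE or as CPD_EVT_ND2D_CKPT_USR_INFO, which is the distinction between the two flows debated above.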

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
